Wire rate limit middleware into proxy runner chain #4652
Conversation
Codecov Report

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main    #4652      +/-   ##
==========================================
- Coverage   68.90%   68.75%   -0.15%
==========================================
  Files         509      507       -2
  Lines       52698    52699       +1
==========================================
- Hits        36310    36233      -77
- Misses      13580    13664      +84
+ Partials     2808     2802       -6
```
jhrozek
left a comment
Review from Claude Code — 4 inline comments, all non-blocking nits.
0f476a5 to
38e5bb3
Compare
38e5bb3 to
2c8481a
Compare
ChrisJBurns
left a comment
Multi-Agent Consensus Review
Agents consulted: mcp-protocol-expert, kubernetes-expert, security-advisor, code-reviewer, toolhive-expert
Consensus Summary
| # | Finding | Consensus | Severity | Action |
|---|---------|-----------|----------|--------|
| 1 | CRD type `v1alpha1.RateLimitConfig` embedded directly in RunConfig API contract | 9/10 | MEDIUM | Discuss |
| 2 | HTTP 500 for missing `ParsedMCPRequest` inconsistent with other middleware | 7/10 | MEDIUM | Discuss |
| 3 | Missing unit test: rate limit configured without Redis | 8/10 | LOW | Fix |
| 4 | Hardcoded `redisImage` instead of `images.RedisImage` | 8/10 | LOW | Fix |
| 5 | Missing `JustAfterEach` state dump in E2E suite | 7/10 | LOW | Fix |
| 6 | Duplicate `retrySeconds` computation | 7/10 | LOW | Fix |
| 7 | `AddMiddleware(MiddlewareType, ...)` vs `config.Type` | 7/10 | LOW | Fix |
Overall
Well-structured PR with clean commit separation, correct middleware chain ordering, and solid test coverage (unit + E2E). The middleware follows established factory patterns, fail-open on Redis errors is the right resilience trade-off, and the security properties are sound — Redis key isolation is correct, no rate limit bypass vectors exist, and tool name lookups use pre-built maps (not dynamic key interpolation).
The most architecturally notable point is embedding v1alpha1.RateLimitConfig directly in RunConfig, which is documented as ToolHive's API contract. Other middleware configs (authz, audit, telemetry) use pkg/-local types, and ScalingConfig/SessionRedisConfig are runner-local types with conversion at the operator boundary. The CRD coupling is a team design decision worth making explicitly. The remaining findings are test polish items.
Generated with Claude Code
```go
// RateLimitConfig contains the CRD rate limiting configuration.
// When set, rate limiting middleware is added to the proxy middleware chain.
RateLimitConfig *v1alpha1.RateLimitConfig `json:"rate_limit_config,omitempty" yaml:"rate_limit_config,omitempty"`
```
[MEDIUM | 9/10 consensus] This embeds a CRD type directly in RunConfig, which is documented as "part of ToolHive's API contract." Other middleware configs (AuthzConfig, AuditConfig, TelemetryConfig) use pkg/-local types, and ScalingConfig/SessionRedisConfig are runner-local types with conversion at the operator boundary. A CRD version bump (v1alpha1 → v1beta1) would be a breaking change to RunConfig serialization, and the Swagger output exposes internal CRD type paths.
Consider defining runner-local types mirroring the CRD structure (like ScalingConfig/SessionRedisConfig) with conversion at the operator boundary — or accept the coupling with a comment noting the v1alpha1 dependency and migration implications.
Intentional. Rate limiting is tightly coupled to the consuming MCPServer and is only used in the Kubernetes operator context. A CRD version rename (v1alpha1 → v1beta1) changes the import path but not the struct layout or JSON tags, so serialization is unaffected. The risk of not building a runner-local mirror is low and can be deferred.
```go
return fmt.Errorf("rate limit middleware requires a Redis address")
}
```

```go
// TODO: share a Redis client builder with session storage to get TLS,
```
[MEDIUM | 7/10 consensus] Returning HTTP 500 here is inconsistent with other middleware handling the same condition:
- authz middleware (`pkg/authz/middleware.go:192`): returns HTTP 400
- webhook middleware (`pkg/webhook/validating/middleware.go:88`): passes through to the next handler

Since the method filter on line 88 already handles non-tools/call requests, the pass-through approach (like webhooks) may be safer — it avoids rejecting legitimate non-JSON-RPC requests (health checks, GET for SSE streams) if middleware ordering changes in the future.
Fixed. Non-JSON-RPC requests (health checks, SSE streams) have no MCP context and now pass through unconditionally. The nil check and method check are combined into a single guard.
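The combined guard described in the reply can be sketched as a pure predicate. Identifiers are illustrative; `parsedRequest` stands in for the MCP parser's `ParsedMCPRequest`.

```go
package main

// parsedRequest stands in for the MCP parser's ParsedMCPRequest (assumption).
type parsedRequest struct {
	Method string
}

// shouldRateLimit is true only for tools/call requests that carry MCP context;
// everything else (health checks, SSE streams, other MCP methods) passes
// through unconditionally — the nil check and method check form one guard.
func shouldRateLimit(req *parsedRequest) bool {
	return req != nil && req.Method == "tools/call"
}
```

A single predicate like this keeps the pass-through behavior identical whether the request lacks MCP context entirely or is simply a different MCP method.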
```go
MaxTokens:    10,
RefillPeriod: metav1.Duration{Duration: time.Minute},
},
},
```
[LOW | 8/10 consensus] Missing error path test case: RateLimitConfig set but ScalingConfig/SessionRedis nil. This is the most important validation at the RunConfig level since CEL only covers CRD admission, not direct RunConfig construction.
```go
},
tests := []struct {
	name         string
	config       *RunConfig
	wantAppended bool
	wantErr      bool
}{
	{
		name:         "nil RateLimitConfig returns input unchanged",
		config:       &RunConfig{},
		wantAppended: false,
	},
	{
		name: "rate limit without Redis returns error",
		config: &RunConfig{
			RateLimitConfig: &v1alpha1.RateLimitConfig{
				Shared: &v1alpha1.RateLimitBucket{
					MaxTokens:    10,
					RefillPeriod: metav1.Duration{Duration: time.Minute},
				},
			},
		},
		wantErr: true,
	},
	{
```
Fixed. Added test case for RateLimitConfig set with nil ScalingConfig/SessionRedis — verifies the error mentions sessionStorage.
```go
"bytes"
"context"
"encoding/json"
"fmt"
```
[LOW | 8/10 consensus] test/e2e/images/images.go already exports images.RedisImage with the same value "redis:7-alpine", and the scaling tests in virtualmcp/ use it. Using the centralized constant keeps image versions in sync for automated dependency updates (Renovate).
Fixed. Now uses images.RedisImage from the centralized test/e2e/images/ package.
Implement the rate limit middleware factory and handler that wires into the proxy runner middleware chain. The handler:

- Passes through when MCP parser context is missing (non-JSON-RPC requests like health checks and SSE streams)
- Only rate-limits tools/call requests; other methods pass through
- Fails open on Redis errors (logs warning, allows request)
- Returns HTTP 429 with JSON-RPC -32029 error and Retry-After header

The factory pings Redis at startup with a 5s timeout to fail fast on misconfiguration rather than silently failing open on every request.

Tests use a dummy Limiter implementation to verify handler behavior without Redis: allowed, rejected, fail-open, missing context passthrough, and non-tools/call passthrough.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register the rate limit middleware factory and add it to the proxy runner middleware chain, positioned after the MCP parser (needs the tool name from context) and before validating webhooks. The operator reconciler maps spec.rateLimiting directly to the RunConfig, carrying the CRD type without intermediate config types. The middleware factory creates a Redis client from session storage config and builds the Limiter at startup.

Move populateScalingConfig before PopulateMiddlewareConfigs in the reconciler so SessionRedis config is available when the rate limit middleware reads it. Validate that Redis session storage is configured when rate limiting is enabled, with a clear error message.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ginkgo-based E2E tests that deploy Redis and an MCPServer with rate limiting in a Kind cluster and verify runtime enforcement:

- AC7: Send requests exceeding the shared limit, verify HTTP 429 with JSON-RPC error code -32029 and retryAfterSeconds in error data
- AC8: Deploy MCPServer with both shared and per-tool rate limit config, verify the CRD is accepted and the server reaches Running phase

Creates a NodePort service targeting the MCPServer proxy pods for test access. Uses images.RedisImage for centralized version management.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Rate limiting CRD types and the token-bucket Limiter package were
merged in #4577. This PR wires the limiter into the proxy runner
middleware chain so rate limiting is enforced at runtime.
- Rate-limits only `tools/call` requests; returns HTTP 429 with JSON-RPC error code `-32029` and a `Retry-After` header on rejection, and fails open on Redis errors
- Positioned after the MCP parser in the middleware chain
- Maps `spec.rateLimiting` from the CRD to the RunConfig in the operator reconciler, carrying the CRD type directly
- E2E tests deploy Redis and an MCPServer in a Kind cluster and verify rate limit enforcement
Closes #4551
Type of change
Test plan
- `task test`
- `task test-e2e`
- `task lint-fix`

Unit tests (12 new):
- missing MCP context (500), non-tools/call passthrough
- presence/absence

E2E tests (2 new):
- 429 with JSON-RPC -32029 and retryAfterSeconds
- reaches Running phase
Changes
- `pkg/ratelimit/middleware.go`
- `pkg/runner/middleware.go`
- `pkg/runner/config.go`: `RateLimitConfig` and `RateLimitNamespace` fields
- `pkg/runner/config_builder.go`: `WithRateLimitConfig` builder option
- `cmd/thv-operator/controllers/mcpserver_runconfig.go`: maps `spec.rateLimiting` to RunConfig
- `test/e2e/thv-operator/acceptance_tests/`

Special notes for reviewers
- `ParsedMCPRequest` is missing from context — this catches middleware ordering bugs at runtime
- Only `tools/call` requests are rate-limited; other MCP methods pass through unconditionally
- `SessionRedisConfig` + the `THV_SESSION_REDIS_PASSWORD` env var, sharing connection info with session storage
- follows `test/e2e/thv-operator/virtualmcp/` patterns

Large PR Justification
1140 lines across 11 files, but 844 lines are tests (unit + E2E
infrastructure). The 296 lines of production code span 5 files that
form a single logical change: middleware + wiring + operator mapping.