
Wire rate limit middleware into proxy runner chain#4652

Open
jerm-dro wants to merge 3 commits into main from jerm-dro/rate-limit-middleware

Conversation

Contributor

@jerm-dro jerm-dro commented Apr 7, 2026

Summary

Rate limiting CRD types and the token-bucket Limiter package were
merged in #4577. This PR wires the limiter into the proxy runner
middleware chain so rate limiting is enforced at runtime.

  • Adds HTTP middleware that reads the parsed MCP request from context,
    rate-limits only tools/call requests, returns HTTP 429 with
    JSON-RPC error code -32029 and Retry-After header on rejection,
    and fails open on Redis errors
  • Registers the middleware factory and positions it after auth and MCP
    parser in the middleware chain
  • Maps spec.rateLimiting from the CRD to the RunConfig in the
    operator reconciler, carrying the CRD type directly
  • Adds Ginkgo E2E acceptance tests that deploy Redis + MCPServer in a
    Kind cluster and verify rate limit enforcement
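
Under those rules, the handler logic can be sketched as follows. This is a minimal sketch, not the real ratelimit package API: the limiter interface, function names, and message text are illustrative, and the real middleware reads the method and tool name from the parsed MCP request in the request context rather than taking them as parameters.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// limiter stands in for the token-bucket Limiter merged in #4577;
// this interface shape is an assumption for the sketch.
type limiter interface {
	Allow(tool string) (allowed bool, retryAfterSeconds int, err error)
}

// rateLimitHandler sketches the behavior described above: only
// tools/call is limited, limiter (Redis) errors fail open, and a
// rejection returns HTTP 429 with JSON-RPC code -32029 and a
// Retry-After header.
func rateLimitHandler(lim limiter, method, tool string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if method != "tools/call" {
			next.ServeHTTP(w, r) // other MCP methods pass through
			return
		}
		allowed, retry, err := lim.Allow(tool)
		if err != nil || allowed {
			next.ServeHTTP(w, r) // fail open on Redis errors
			return
		}
		w.Header().Set("Retry-After", fmt.Sprintf("%d", retry))
		w.Header().Set("Content-Type", "application/json")
		w.WriteHeader(http.StatusTooManyRequests)
		_ = json.NewEncoder(w).Encode(map[string]any{
			"jsonrpc": "2.0",
			"error": map[string]any{
				"code":    -32029,
				"message": "rate limit exceeded",
				"data":    map[string]int{"retryAfterSeconds": retry},
			},
		})
	})
}

// denyAll is a fixed limiter used to exercise the rejection path.
type denyAll struct{}

func (denyAll) Allow(string) (bool, int, error) { return false, 7, nil }

// demoRejected runs one rejected tools/call through the handler and
// reports the recorded status and Retry-After header.
func demoRejected() (int, string) {
	next := http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) { w.WriteHeader(http.StatusOK) })
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodPost, "/messages", nil)
	rateLimitHandler(denyAll{}, "tools/call", "echo", next).ServeHTTP(rec, req)
	return rec.Code, rec.Header().Get("Retry-After")
}
```

The sketch keeps the ordering constraint visible: headers must be set before WriteHeader, and the JSON-RPC error body is written only on the rejection path.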

Closes #4551

Type of change

  • New feature

Test plan

  • Unit tests (task test)
  • E2E tests (task test-e2e)
  • Linting (task lint-fix)

Unit tests (12 new):

  • Middleware handler: allowed, rejected (429 + -32029), fail-open,
    missing MCP context (500), non-tools/call passthrough
  • Wiring: addRateLimitMiddleware nil/valid, PopulateMiddlewareConfigs
    presence/absence
  • Operator: spec.rateLimiting flows to RunConfig

E2E tests (2 new):

  • AC7: Burst of requests exceeds shared limit, 4th request gets HTTP
    429 with JSON-RPC -32029 and retryAfterSeconds
  • AC8: MCPServer with both shared and per-tool config is accepted and
    reaches Running phase

Changes

File Change
pkg/ratelimit/middleware.go New: factory, handler, JSON-RPC error helpers, MiddlewareParams
pkg/runner/middleware.go Register factory, add helper, call after MCP parser
pkg/runner/config.go Add RateLimitConfig and RateLimitNamespace fields
pkg/runner/config_builder.go Add WithRateLimitConfig builder option
cmd/thv-operator/controllers/mcpserver_runconfig.go Map spec.rateLimiting to RunConfig
test/e2e/thv-operator/acceptance_tests/ New: Ginkgo E2E suite with Redis deploy and rate limit tests

Special notes for reviewers

  • The middleware errors with HTTP 500 if ParsedMCPRequest is missing
    from context — this catches middleware ordering bugs at runtime
  • Only tools/call requests are rate-limited; other MCP methods pass
    through unconditionally
  • Redis client is created from SessionRedisConfig + the
    THV_SESSION_REDIS_PASSWORD env var, sharing connection info with
    session storage
  • E2E tests follow the test/e2e/thv-operator/virtualmcp/ patterns

Large PR Justification

1140 lines across 11 files, but 844 lines are tests (unit + E2E
infrastructure). The 296 lines of production code span 5 files that
form a single logical change: middleware + wiring + operator mapping.

Generated with Claude Code

@github-actions github-actions bot added the size/XL Extra large PR: 1000+ lines changed label Apr 7, 2026
@jerm-dro jerm-dro marked this pull request as draft April 7, 2026 21:20

codecov bot commented Apr 7, 2026

Codecov Report

❌ Patch coverage is 60.39604% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.75%. Comparing base (257c467) to head (54376f3).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
pkg/ratelimit/middleware.go 50.00% 35 Missing and 1 partial ⚠️
pkg/runner/middleware.go 80.95% 2 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4652      +/-   ##
==========================================
- Coverage   68.90%   68.75%   -0.15%     
==========================================
  Files         509      507       -2     
  Lines       52698    52699       +1     
==========================================
- Hits        36310    36233      -77     
- Misses      13580    13664      +84     
+ Partials     2808     2802       -6     

☔ View full report in Codecov by Sentry.

@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 15f2cd5 to 90ebabc on April 7, 2026 21:33
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 90ebabc to 7063692 on April 7, 2026 21:47
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 7063692 to d28ef3e on April 7, 2026 22:05
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from d28ef3e to 4d1d6aa on April 7, 2026 22:28
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 4d1d6aa to 0f476a5 on April 7, 2026 23:06
@jerm-dro jerm-dro marked this pull request as ready for review April 7, 2026 23:28
Contributor

@jhrozek jhrozek left a comment


Review from Claude Code — 4 inline comments, all non-blocking nits.

@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 0f476a5 to 38e5bb3 on April 8, 2026 14:31
@jerm-dro jerm-dro requested a review from jhrozek April 8, 2026 15:17
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 38e5bb3 to 2c8481a on April 8, 2026 18:42
Collaborator

@ChrisJBurns ChrisJBurns left a comment


Multi-Agent Consensus Review

Agents consulted: mcp-protocol-expert, kubernetes-expert, security-advisor, code-reviewer, toolhive-expert

Consensus Summary

# | Finding | Consensus | Severity | Action
1 | CRD type v1alpha1.RateLimitConfig embedded directly in RunConfig API contract | 9/10 | MEDIUM | Discuss
2 | HTTP 500 for missing ParsedMCPRequest inconsistent with other middleware | 7/10 | MEDIUM | Discuss
3 | Missing unit test: rate limit configured without Redis | 8/10 | LOW | Fix
4 | Hardcoded redisImage instead of images.RedisImage | 8/10 | LOW | Fix
5 | Missing JustAfterEach state dump in E2E suite | 7/10 | LOW | Fix
6 | Duplicate retrySeconds computation | 7/10 | LOW | Fix
7 | AddMiddleware(MiddlewareType, ...) vs config.Type | 7/10 | LOW | Fix

Overall

Well-structured PR with clean commit separation, correct middleware chain ordering, and solid test coverage (unit + E2E). The middleware follows established factory patterns, fail-open on Redis errors is the right resilience trade-off, and the security properties are sound — Redis key isolation is correct, no rate limit bypass vectors exist, and tool name lookups use pre-built maps (not dynamic key interpolation).

The most architecturally notable point is embedding v1alpha1.RateLimitConfig directly in RunConfig, which is documented as ToolHive's API contract. Other middleware configs (authz, audit, telemetry) use pkg/-local types, and ScalingConfig/SessionRedisConfig are runner-local types with conversion at the operator boundary. The CRD coupling is a team design decision worth making explicitly. The remaining findings are test polish items.


Generated with Claude Code


// RateLimitConfig contains the CRD rate limiting configuration.
// When set, rate limiting middleware is added to the proxy middleware chain.
RateLimitConfig *v1alpha1.RateLimitConfig `json:"rate_limit_config,omitempty" yaml:"rate_limit_config,omitempty"`
Collaborator


[MEDIUM | 9/10 consensus] This embeds a CRD type directly in RunConfig, which is documented as "part of ToolHive's API contract." Other middleware configs (AuthzConfig, AuditConfig, TelemetryConfig) use pkg/-local types, and ScalingConfig/SessionRedisConfig are runner-local types with conversion at the operator boundary. A CRD version bump (v1alpha1 → v1beta1) would be a breaking change to RunConfig serialization, and the Swagger output exposes internal CRD type paths.

Consider defining runner-local types mirroring the CRD structure (like ScalingConfig/SessionRedisConfig) with conversion at the operator boundary — or accept the coupling with a comment noting the v1alpha1 dependency and migration implications.

Contributor Author


Intentional. Rate limiting is tightly coupled to the consuming MCPServer and is only used in the Kubernetes operator context. A CRD version rename (v1alpha1 → v1beta1) changes the import path but not the struct layout or JSON tags, so serialization is unaffected. The risk of not building a runner-local mirror is low and can be deferred.

return fmt.Errorf("rate limit middleware requires a Redis address")
}

// TODO: share a Redis client builder with session storage to get TLS,
Collaborator


[MEDIUM | 7/10 consensus] Returning HTTP 500 here is inconsistent with other middleware handling the same condition:

  • authz middleware (pkg/authz/middleware.go:192): returns HTTP 400
  • webhook middleware (pkg/webhook/validating/middleware.go:88): passes through to next handler

Since the method filter on line 88 already handles non-tools/call requests, the pass-through approach (like webhooks) may be safer — it avoids rejecting legitimate non-JSON-RPC requests (health checks, GET for SSE streams) if middleware ordering changes in the future.

Contributor Author


Fixed. Non-JSON-RPC requests (health checks, SSE streams) have no MCP context and now pass through unconditionally. The nil check and method check are combined into a single guard.
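
The combined guard can be sketched as follows; the type name stands in for whatever value the MCP parser middleware stores in the request context.

```go
package main

// parsedMCPRequest stands in for the MCP parser's context value;
// the name and shape are illustrative.
type parsedMCPRequest struct {
	Method string
}

// shouldRateLimit implements the single guard described above:
// requests with no parsed MCP request (health checks, GET for SSE
// streams) and non-tools/call methods both pass through unlimited.
func shouldRateLimit(parsed *parsedMCPRequest) bool {
	return parsed != nil && parsed.Method == "tools/call"
}
```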

MaxTokens: 10,
RefillPeriod: metav1.Duration{Duration: time.Minute},
},
},
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW | 8/10 consensus] Missing error path test case: RateLimitConfig set but ScalingConfig/SessionRedis nil. This is the most important validation at the RunConfig level since CEL only covers CRD admission, not direct RunConfig construction.

Suggested change
},
	tests := []struct {
		name         string
		config       *RunConfig
		wantAppended bool
		wantErr      bool
	}{
		{
			name:         "nil RateLimitConfig returns input unchanged",
			config:       &RunConfig{},
			wantAppended: false,
		},
		{
			name: "rate limit without Redis returns error",
			config: &RunConfig{
				RateLimitConfig: &v1alpha1.RateLimitConfig{
					Shared: &v1alpha1.RateLimitBucket{
						MaxTokens:    10,
						RefillPeriod: metav1.Duration{Duration: time.Minute},
					},
				},
			},
			wantErr: true,
		},
		{

Contributor Author


Fixed. Added test case for RateLimitConfig set with nil ScalingConfig/SessionRedis — verifies the error mentions sessionStorage.

"bytes"
"context"
"encoding/json"
"fmt"
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[LOW | 8/10 consensus] test/e2e/images/images.go already exports images.RedisImage with the same value "redis:7-alpine", and the scaling tests in virtualmcp/ use it. Using the centralized constant keeps image versions in sync for automated dependency updates (Renovate).

Contributor Author


Fixed. Now uses images.RedisImage from the centralized test/e2e/images/ package.

jerm-dro and others added 3 commits April 8, 2026 13:16
Implement the rate limit middleware factory and handler that wires into
the proxy runner middleware chain.

The handler:
- Passes through when MCP parser context is missing (non-JSON-RPC
  requests like health checks and SSE streams)
- Only rate-limits tools/call requests; other methods pass through
- Fails open on Redis errors (logs warning, allows request)
- Returns HTTP 429 with JSON-RPC -32029 error and Retry-After header

The factory pings Redis at startup with a 5s timeout to fail fast on
misconfiguration rather than silently failing open on every request.

Tests use a dummy Limiter implementation to verify handler behavior
without Redis: allowed, rejected, fail-open, missing context
passthrough, and non-tools/call passthrough.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Register the rate limit middleware factory and add it to the proxy
runner middleware chain, positioned after MCP parser (needs tool name
from context) and before validating webhooks.

The operator reconciler maps spec.rateLimiting directly to the
RunConfig, carrying the CRD type without intermediate config types.
The middleware factory creates a Redis client from session storage
config and builds the Limiter at startup.

Move populateScalingConfig before PopulateMiddlewareConfigs in the
reconciler so SessionRedis config is available when the rate limit
middleware reads it. Validate that Redis session storage is configured
when rate limiting is enabled, with a clear error message.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ginkgo-based E2E tests that deploy Redis and an MCPServer with rate
limiting in a Kind cluster and verify runtime enforcement:

- AC7: Send requests exceeding the shared limit, verify HTTP 429 with
  JSON-RPC error code -32029 and retryAfterSeconds in error data
- AC8: Deploy MCPServer with both shared and per-tool rate limit config,
  verify the CRD is accepted and the server reaches Running phase

Creates a NodePort service targeting the MCPServer proxy pods for
test access. Uses images.RedisImage for centralized version management.

Part of #4551

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jerm-dro jerm-dro force-pushed the jerm-dro/rate-limit-middleware branch from 2c8481a to 54376f3 on April 8, 2026 20:16
@jerm-dro jerm-dro requested a review from ChrisJBurns April 8, 2026 20:26

Labels

size/XL Extra large PR: 1000+ lines changed


Development

Successfully merging this pull request may close these issues.

Configure global rate limits on MCPServer
