feat: add ai-cache plugin#13578
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new ai-cache APISIX plugin that provides an L1 exact-match cache for non-streaming LLM requests handled by ai-proxy, using Redis as the backend and exposing cache debug headers.
Changes:
- Introduces the
ai-cacheplugin implementation, schema, and keying logic (SHA-256 fingerprint + configurable scope). - Adds an end-to-end test suite covering MISS/HIT, bypassing, TTL expiry, scope isolation, and fail-open behavior.
- Wires the plugin into the default plugin lists and build/install packaging.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
apisix/plugins/ai-cache.lua |
Core plugin logic: lookup on access, capture on body/log, Redis integration, cache headers. |
apisix/plugins/ai-cache/schema.lua |
JSON schema for plugin configuration, leveraging apisix.utils.redis-schema via policy + if/then. |
apisix/plugins/ai-cache/key.lua |
Cache key fingerprinting (protocol/model/messages/params) and scope computation. |
t/plugin/ai-cache.t |
New functional + unit tests for cache behavior and edge cases. |
t/admin/plugins.t |
Adds ai-cache to the admin plugin list expectation. |
conf/config.yaml.example |
Adds ai-cache to the example plugin list with priority comment. |
apisix/cli/config.lua |
Adds ai-cache to the CLI’s default plugin list. |
Makefile |
Installs the apisix/plugins/ai-cache/ directory Lua modules during make install. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ss_on
Encode the request fingerprint with rapidjson (sort_keys) plus a
to_rapidjson_value pass that maps the JSON null sentinel and array_mt
tables, mirroring ai-transport/http.lua. core.json.stably_encode (dkjson)
raised on the cjson null sentinel, so a body carrying an explicit null
(e.g. OpenAI's "stop": null) would error out of the access phase.
Replace the cache_bypass var-ref opt-out with bypass_on: an array of
{header, equals} rules that skip the cache when a request header exactly
equals its value (per rfcs#78). Exact header == value only; any matching
rule triggers BYPASS.
Tests: add a null-body fingerprint regression, migrate the bypass tests
to bypass_on, and cover multiple rules where any match bypasses.
… update fingerprinting logic
Document the ai-cache plugin: description, full attribute table (incl. all Redis policy fields), and Admin API / ADC / Ingress Controller examples covering cache MISS/HIT and bypass_on. Add the page to the en and zh plugin sidebars.
…oute cache sharing scenarios
nic-6443
left a comment
There was a problem hiding this comment.
Thanks for the quick turnaround — all my comments are addressed: per-route scoping by default with share_across_routes opt-out, red:close() on Redis errors instead of pooling a broken connection, the dead layers knob dropped, and the canonical encoding pulled up into core.json.canonical_encode (nicely de-duped with ai-transport). LGTM.
Description
Adds a new
ai-cacheplugin that caches LLM responses and replays them for subsequent requests that resolve to the same prompt, cutting upstream token cost and latency for repetitive workloads (FAQ bots, document Q&A, translation).This PR implements the exact (L1) cache layer:
temperature,top_p,max_tokens,tools, …). Provider-agnostic viaai-protocols, so it works for every chat protocolai-proxysupports (OpenAI Chat, Anthropic Messages, Bedrock Converse, OpenAI Responses). The key uses the client-requested model (the effective model fromai-proxy'soptions.model/ multi-instance selection isn't known until after the lookup); if differently-modelled routes share one Redis + scope, isolate them via a separate Redis orcache_key.include_vars(e.g.route_id).apisix.utils.redis-schemavia thepolicy+if/thenconvention used bylimit-count/limit-req/limit-conn.cache_key.include_consumer/include_vars).bypass_onopt-out (exact request-header match);max_cache_body_sizecap;X-AI-Cache-Status/X-AI-Cache-Ageresponse headers; fails open (proxies as a normal miss) when Redis is unreachable.ai-proxy(priority1035) and depends onai-proxy/ai-proxy-multi.Semantic cache, streaming support, and observability are planned as follow-up PRs. User-facing documentation will be added in a later PR once the series is further along.
Which issue(s) this PR fixes:
Related to #13290
Checklist