
docs(examples): add prompt-caching example covering 3 patterns#1480

Open
HermeticOrmus wants to merge 1 commit into anthropics:main from HermeticOrmus:feature/prompt-caching-example

Conversation

@HermeticOrmus HermeticOrmus commented Apr 30, 2026

Why

The examples/ directory has agents, batch, streaming, structured outputs, thinking, and MCP tools — but no prompt-caching example. The feature is documented at platform.claude.com, but a developer browsing the SDK won't find a runnable demonstration of where to put `cache_control={"type": "ephemeral"}`, what the response usage fields look like on a cache hit, or which patterns are worth caching.

This adds one runnable file covering the three patterns most callers actually use.

What changed

examples/prompt_caching.py — one __main__ script with three numbered sections:

  1. Cache the system prompt — chatbot/agent with a long instruction set; cache_control on the system block.
  2. Cache system + tool definitions — agent loop with tools; cache_control on the last tool entry (the bigger win, since tool defs are usually larger than the system prompt).
  3. Cache long static context — RAG / Q&A over a fixed document; cache_control on the document text block, follow-up question in a second block.
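For orientation, the three placements can be sketched as request payloads. This is a sketch, not the PR's actual file; the tool names, texts, and field contents are placeholders — only the position of `cache_control` reflects the patterns described above.

```python
# Where cache_control goes in each pattern (request payloads only; texts and
# tool names are placeholders, not the PR's actual file).
CACHE = {"type": "ephemeral"}

# 1. Cache the system prompt: mark the (long) system text block.
system_cached = [
    {"type": "text", "text": "LONG INSTRUCTION SET ...", "cache_control": CACHE},
]

# 2. Cache system + tool definitions: mark the LAST tool entry — the cache
#    prefix covers everything up to and including the marked block.
tools_cached = [
    {"name": "get_weather", "description": "...", "input_schema": {"type": "object"}},
    {"name": "get_time", "description": "...", "input_schema": {"type": "object"},
     "cache_control": CACHE},
]

# 3. Cache long static context: mark the document block; the follow-up
#    question goes in a second, uncached block.
messages_cached = [
    {"role": "user", "content": [
        {"type": "text", "text": "FIXED DOCUMENT TEXT ...", "cache_control": CACHE},
        {"type": "text", "text": "What does section 2 say?"},
    ]},
]
```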

Each section has a first call that creates the cache and a second call that hits it. A show_usage() helper prints input / cache_creation / cache_read / output token counts so the reader can confirm the hit visually.

The script padding ensures each cache target exceeds the per-model floor (~1024 tokens) so the cache actually takes effect when run.
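One way such a padding check can be done is sketched below. The 4-characters-per-token ratio is a crude heuristic (the real script could call `client.messages.count_tokens()` for an exact figure), and 1024 is the Sonnet/Opus floor; smaller Haiku models require 2048.

```python
MIN_CACHEABLE_TOKENS = 1024  # Sonnet/Opus floor; Haiku models need 2048

def rough_token_count(text: str) -> int:
    # Crude heuristic: ~4 characters per token for English prose.
    return len(text) // 4

def pad_to_floor(text: str, filler: str = " lorem ipsum") -> str:
    """Append filler until the text clears the per-model cache floor."""
    while rough_token_count(text) < MIN_CACHEABLE_TOKENS:
        text += filler
    return text
```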

How to test

```shell
ANTHROPIC_API_KEY=... ./examples/prompt_caching.py
```

Expected: section [1] first-call shows non-zero cache_creation_input_tokens; second-call shows non-zero cache_read_input_tokens. Same shape for [2] and [3].

Linted clean: `ruff check examples/prompt_caching.py` passes and `ruff format --check examples/prompt_caching.py` reports no changes. The project's 120-character line length is respected.

Notes

  • Uses the sync API since it dominates examples/. An async twin (prompt_caching_async.py) is a natural follow-up if maintainers prefer the same coverage in both.
  • Model pinned to claude-sonnet-4-5-20250929 (the most-used model across examples/).
  • This wasn't run against a live API from the contributor environment; verification against cache_creation_input_tokens / cache_read_input_tokens is recommended before merge.

@HermeticOrmus HermeticOrmus requested a review from a team as a code owner April 30, 2026 15:28
