
feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0#3844

Open
max-deygin-traceloop wants to merge 15 commits into main from max/tlp-1929-openai-instrumentation

Conversation

Contributor

@max-deygin-traceloop max-deygin-traceloop commented Mar 24, 2026

Summary

Update OpenAI instrumentation to align with OTel GenAI Semantic Conventions 0.5.0.

  • Migrate span attributes from custom SpanAttributes to upstream gen_ai_attributes and openai_attributes (gen_ai.request.frequency_penalty,
    gen_ai.request.presence_penalty, gen_ai.operation.name, etc.)
  • Replace deprecated gen_ai.openai.request.service_tier / gen_ai.openai.response.service_tier with openai.request.service_tier / openai.response.service_tier
  • Add gen_ai.input.messages / gen_ai.output.messages support (use_messages_attributes flag) following the OTel JSON schema (role, parts[{type, content}],
    finish_reason)
  • Map OpenAI's tool_calls finish reason to semconv's tool_call
  • Bump opentelemetry-semantic-conventions-ai dependency to >=0.5.0,<0.6.0 and opentelemetry-semantic-conventions to >=0.60b1
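
As a concrete illustration of the message-schema and finish-reason bullets above, the structured attributes serialize to JSON arrays shaped roughly like this (a simplified sketch of the 0.5.0 shape — the literal values and the `normalize_finish_reason` helper are illustrative, not the instrumentation's actual code):

```python
import json

# Illustrative input messages following the OTel schema fields named above:
# role, parts[{type, content}].
input_messages = [
    {"role": "system", "parts": [{"type": "text", "content": "You are helpful."}]},
    {"role": "user", "parts": [{"type": "text", "content": "What is OTel?"}]},
]

# OpenAI reports the finish reason "tool_calls"; semconv 0.5.0 uses "tool_call".
FINISH_REASON_MAP = {"tool_calls": "tool_call"}

def normalize_finish_reason(reason):
    # Pass through already-conformant values unchanged.
    return FINISH_REASON_MAP.get(reason, reason)

# Output messages additionally carry finish_reason.
output_messages = [
    {
        "role": "assistant",
        "parts": [{"type": "text", "content": "OpenTelemetry is..."}],
        "finish_reason": normalize_finish_reason("tool_calls"),
    }
]

# The span attributes hold the JSON-serialized arrays.
span_attributes = {
    "gen_ai.input.messages": json.dumps(input_messages),
    "gen_ai.output.messages": json.dumps(output_messages),
}
```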

Test plan

  • 216 unit tests pass (VCR cassettes)
  • New test_messages_attributes.py covers input/output message formatting (multi-turn, tool calls, edge cases)
  • New test_chat_with_messages_attributes VCR test validates end-to-end attribute shape
  • Live smoke test against real OpenAI API (gpt-4.1-nano)
  • E2E semconv compliance suite: 7 scenarios / 51 checks pass (chat, streaming, tools, embeddings, multi-choice, errors)

Summary by CodeRabbit

  • New Features

    • Added a public toggle (default off) to emit structured gen_ai.input.messages and gen_ai.output.messages on spans.
  • Refactor

    • Replaced legacy LLM span keys with GenAI semantic-convention keys for operation, usage, headers, tools, and streaming; improved message serialization and normalized finish_reason values; adjusted tool-call field naming.
  • Tests

    • Added unit and VCR tests for message-attribute formatting and event emission; updated many assertions to use GenAI keys; expanded filtered HTTP headers.
  • Chores

    • Bumped semantic-conventions version ranges.


coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Switches many span semantic keys from LLM to GenAI conventions, adds Config.use_messages_attributes and an OpenAIInstrumentor parameter to optionally serialize inputs/outputs as structured gen_ai.input.messages/gen_ai.output.messages, updates wrappers/models/tests accordingly, and bumps semantic-conventions dependency ranges.

Changes

Cohort / File(s) / Summary

  • Config & Instrumentor
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py, packages/opentelemetry-instrumentation-openai/pyproject.toml
    Added use_messages_attributes: bool to OpenAIInstrumentor.__init__ and Config (default False); updated opentelemetry semantic-conventions dependency ranges.
  • Shared semconv replacement & headers handling
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
    Replaced many SpanAttributes.LLM_* keys with GenAIAttributes.GEN_AI_*; changed the tool/function attribute prefix to GEN_AI_TOOL_DEFINITIONS.*; computes headers = extra_headers or headers and emits GEN_AI_HEADERS only when non-None.
  • Chat & Completion wrappers (message feature gated)
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
    Set GenAI operation-name attributes; introduced Config.use_messages_attributes gating to choose between legacy per-choice attributes and the new _set_input_messages/_set_output_messages flows that serialize structured messages to gen_ai.input.messages / gen_ai.output.messages; added finish-reason normalization.
  • Embeddings, Assistant, Realtime, Responses
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py, .../v1/assistant_wrappers.py, .../v1/realtime_wrappers.py, .../v1/responses_wrappers.py
    Replaced LLM semconv keys with GenAI equivalents for operation name, usage totals, OpenAI base/version, and streaming flags; tightened modalities emission; process_content_block now derives type from the payload; updated tool-definition attribute keys.
  • Event models
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
    _FunctionToolCall TypedDict: renamed function_name → name and changed the arguments type/optionality to NotRequired[Optional[str]].
  • Tests — fixtures, cassettes, assertions, new tests
    packages/opentelemetry-instrumentation-openai/tests/conftest.py, packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py, packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/*, packages/opentelemetry-instrumentation-openai/tests/traces/*, packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py
    Added an instrument_with_messages_attributes fixture; expanded VCR filter_headers; added VCR cassettes; new unit tests for message-attribute serialization and event emission; updated many tests to assert GEN_AI_* keys instead of legacy LLM_* keys.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through spans with nimble paws,

swapped old keys for GenAI laws.
Messages nested, tidy and bright,
a config switch to show them right.
I thumped the logs and danced all night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0' clearly and specifically summarizes the main change: updating OpenAI instrumentation to support the new GenAI Semantic Conventions 0.5.0 standard, which aligns with the extensive modifications across all files in the changeset.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)

626-630: ⚠️ Potential issue | 🟡 Minor

Confusing variable names in token usage assertion.

The variable names don't match the attributes being retrieved:

  • completion_tokens → GEN_AI_USAGE_TOTAL_TOKENS
  • prompt_tokens → GEN_AI_USAGE_INPUT_TOKENS
  • total_tokens → GEN_AI_USAGE_OUTPUT_TOKENS

The assertion completion_tokens + prompt_tokens == total_tokens effectively checks total + input == output, which is mathematically incorrect. This appears to be a pre-existing issue, but worth fixing for clarity.

♻️ Suggested fix
-    completion_tokens = open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS)
-    prompt_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
-    total_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)
-    if completion_tokens and prompt_tokens and total_tokens:
-        assert completion_tokens + prompt_tokens == total_tokens
+    input_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
+    output_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)
+    total_tokens = open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS)
+    if input_tokens and output_tokens and total_tokens:
+        assert input_tokens + output_tokens == total_tokens
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py`
around lines 626 - 630, The three variables are misnamed and the assertion is
checking the wrong relationship: change the variable names so they match the
attributes (e.g., total_tokens =
open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS),
input_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS),
output_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)) and
update the assertion to assert input_tokens + output_tokens == total_tokens
(keeping the existing guard that all three are truthy).
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

94-109: Also prove the legacy prompt/completion attributes stay off.

This test only checks that gen_ai.input.messages and gen_ai.output.messages exist. It would still pass if the span emitted both the new payload and the old gen_ai.prompt.* / gen_ai.completion.* keys, which is the regression this flag is most likely to introduce.

Proposed assertion additions
     assert output_messages[0]["finish_reason"] == "stop"
     assert len(output_messages[0]["parts"]) >= 1
     assert output_messages[0]["parts"][0]["type"] == "text"
     assert "content" in output_messages[0]["parts"][0]
+    assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" not in open_ai_span.attributes
+    assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" not in open_ai_span.attributes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 94 - 109, Add assertions to ensure legacy prompt/completion
attributes are not present on the span: check the span attributes
(open_ai_span.attributes) do not contain any keys starting with "gen_ai.prompt."
or "gen_ai.completion." (for example by asserting not
any(k.startswith("gen_ai.prompt.") or k.startswith("gen_ai.completion.") for k
in open_ai_span.attributes.keys())). Keep these checks near the existing
input/output message validations so the test fails if legacy fields are emitted
alongside the new gen_ai.input.messages/gen_ai.output.messages attributes.
packages/opentelemetry-instrumentation-openai/tests/conftest.py (1)

191-200: Restore the prior config value in teardown.

This fixture mutates a process-global singleton and always tears it back down to False. Capturing the previous value makes it compose safely with any future override and avoids silent state bleed if the default changes.

Proposed fix
 @pytest.fixture(scope="function")
 def instrument_with_messages_attributes(
     instrument_legacy, reader, tracer_provider, logger_provider, meter_provider
 ):
     instrumentor = instrument_legacy
+    previous = Config.use_messages_attributes
     Config.use_messages_attributes = True
 
     yield instrumentor
 
-    Config.use_messages_attributes = False
+    Config.use_messages_attributes = previous
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/conftest.py` around lines
191 - 200, The fixture instrument_with_messages_attributes currently sets
Config.use_messages_attributes = True and always resets it to False; instead
capture the previous value before mutating (prev =
Config.use_messages_attributes) and after yield restore it
(Config.use_messages_attributes = prev) so the fixture composes safely with
other tests and preserves prior global state; locate this logic in the
instrument_with_messages_attributes fixture and replace the unconditional
teardown with restoring the saved prev value.
packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py (2)

103-117: Consider strengthening assertions for None content tests.

The tests verify that parts is not None, but it would be more explicit to verify the expected behavior. Based on the implementation in _set_input_messages, when content=None, the parts will be an empty list [].

💡 Optional: More explicit assertions
     def test_user_with_none_content(self, mock_span):
         """User message with None content should not crash or produce null parts."""
         _set_input_messages(mock_span, [{"role": "user", "content": None}])
         result = _get_input_messages(mock_span)
 
         assert len(result) == 1
-        assert result[0]["parts"] is not None
+        assert result[0]["parts"] == []
 
     def test_system_with_none_content(self, mock_span):
         """System message with None content should not crash or produce null parts."""
         _set_input_messages(mock_span, [{"role": "system", "content": None}])
         result = _get_input_messages(mock_span)
 
         assert len(result) == 1
-        assert result[0]["parts"] is not None
+        assert result[0]["parts"] == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 103 - 117, Change the two tests test_user_with_none_content and
test_system_with_none_content to assert the explicit expected value for parts
(an empty list) rather than only checking that parts is not None; locate the
assertions after calling _set_input_messages and _get_input_messages and replace
the "parts is not None" checks with an equality check against [] to reflect the
implementation of _set_input_messages.

211-217: Test assertion is incomplete.

The test verifies the function doesn't crash, but doesn't verify the expected output. Based on the implementation, when the only choice has message=None, the function will set GEN_AI_OUTPUT_MESSAGES to an empty JSON array "[]".

💡 Optional: Add explicit assertion
     def test_none_message_in_choice(self, mock_span):
         """Choice with message=None should not crash."""
         choices = [
             {"index": 0, "message": None, "finish_reason": "content_filter"},
         ]
         _set_output_messages(mock_span, choices)
-        # Should not crash; attribute may or may not be set
+        # Should not crash; attribute should be set to empty array
+        result = _get_output_messages(mock_span)
+        assert result == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 211 - 217, The test should assert the expected attribute value when
the only choice has message=None: after calling _set_output_messages(mock_span,
choices) verify that mock_span.set_attribute was called with the
GEN_AI_OUTPUT_MESSAGES key and a value representing an empty JSON array (i.e.,
"[]"); update the test_none_message_in_choice test to check
mock_span.set_attribute(GEN_AI_OUTPUT_MESSAGES, "[]") (or equivalent call
recorded on mock_span) so the behavior is explicitly validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`:
- Around line 135-141: Don't serialize raw caller metadata into spans: remove
direct calls that stringify kwargs.get("user"), kwargs.get("headers"), and
kwargs.get("extra_headers") and instead validate and sanitize or omit them
before setting attributes; update uses of _set_span_attribute(span,
SpanAttributes.GEN_AI_USER, ...) and GEN_AI_HEADERS to first check for presence
(avoid recording the literal "None"), apply a sanitizer/allowlist that strips
PII/tokens (or only pick specific safe keys), and only call _set_span_attribute
with the sanitized or explicitly allowed value (or skip setting the attribute
entirely if nothing safe remains).

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py`:
- Around line 169-179: The _set_input_messages function currently drops
empty-string prompts and collapses list prompts to the first element; update it
so it preserves list-shaped prompts and allows empty-string items: remove the
truthiness check that returns on empty prompt, treat prompt uniformly (if it's a
list use it as-is, otherwise wrap non-list prompts into a single-item list),
then JSON-serialize an array mapping each item to
{"role":"user","parts":[{"content": item, "type":"text"}]} and pass that to
_set_span_attribute with GenAIAttributes.GEN_AI_INPUT_MESSAGES so batched
prompts and token arrays are fully recorded.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py`:
- Around line 5-7: The TypedDict _FunctionToolCall currently declares arguments:
Optional[dict[str, Any]] but the instrumentation emits/passes a raw JSON string;
update the type to match the emitted payload by changing the annotation on
arguments in _FunctionToolCall to Optional[str] (or, if you intentionally
support both parsed and raw forms, make it explicit as Optional[Union[str,
dict[str, Any]]]) so type-checkers and downstream callers see the correct shape;
ensure the change targets the _FunctionToolCall declaration and any related type
imports (e.g., add typing.Union if chosen).

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py`:
- Around line 123-126: Replace the incorrect use of
GenAIAttributes.GEN_AI_OPERATION_NAME for modalities and the non-standard
operation value "realtime": update the call that sets
f"{GenAIAttributes.GEN_AI_OPERATION_NAME}.modalities" (within the
_set_span_attribute call that references session.modalities) to instead set the
semantic attribute for output modality (use gen_ai.output.type) with one of the
well-known values like "text", "json", or "speech"; and ensure any span
attribute assignment that sets GenAIAttributes.GEN_AI_OPERATION_NAME (search for
usages that assign "realtime") uses one of the allowed operation names ("chat",
"create_agent", "embeddings", "execute_tool", "generate_content",
"invoke_agent", "retrieval", or "text_completion") or is replaced with a
justified custom attribute name if a non-standard operation is truly required.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml`:
- Around line 78-105: The cassette test_chat_with_messages_attributes.yaml
contains sensitive response headers (openai-organization, openai-project,
set-cookie, x-request-id) that weren't scrubbed; update the VCR scrubber used
for this suite by adding response filtering in before_record (or extending
filter_headers) to remove or replace those header keys (openai-organization,
openai-project, set-cookie, x-request-id) from recorded responses, then re-run
the tests to re-record test_chat_with_messages_attributes.yaml so the cassette
no longer contains those sensitive values.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml`:
- Around line 33-34: The cassette contains account-scoped headers (traceparent,
openai-organization, openai-project, set-cookie, x-request-id) that must be
scrubbed before committing; add VCR header filters or a
before_record/before_record_response hook to replace those header values with
deterministic placeholders (e.g., <TRACEPARENT>, <ORG>, <PROJECT>, <SET_COOKIE>,
<X_REQUEST_ID>), update any matching body fields if present, then re-record the
cassette (the YAML under
test_completions/test_completion_with_messages_attributes.yaml and other
affected cassettes referenced in lines ~79-110) so the committed cassette
contains only scrubbed placeholders.

---

Outside diff comments:
In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py`:
- Around line 626-630: The three variables are misnamed and the assertion is
checking the wrong relationship: change the variable names so they match the
attributes (e.g., total_tokens =
open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS),
input_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS),
output_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)) and
update the assertion to assert input_tokens + output_tokens == total_tokens
(keeping the existing guard that all three are truthy).

---

Nitpick comments:
In `@packages/opentelemetry-instrumentation-openai/tests/conftest.py`:
- Around line 191-200: The fixture instrument_with_messages_attributes currently
sets Config.use_messages_attributes = True and always resets it to False;
instead capture the previous value before mutating (prev =
Config.use_messages_attributes) and after yield restore it
(Config.use_messages_attributes = prev) so the fixture composes safely with
other tests and preserves prior global state; locate this logic in the
instrument_with_messages_attributes fixture and replace the unconditional
teardown with restoring the saved prev value.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`:
- Around line 94-109: Add assertions to ensure legacy prompt/completion
attributes are not present on the span: check the span attributes
(open_ai_span.attributes) do not contain any keys starting with "gen_ai.prompt."
or "gen_ai.completion." (for example by asserting not
any(k.startswith("gen_ai.prompt.") or k.startswith("gen_ai.completion.") for k
in open_ai_span.attributes.keys())). Keep these checks near the existing
input/output message validations so the test fails if legacy fields are emitted
alongside the new gen_ai.input.messages/gen_ai.output.messages attributes.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`:
- Around line 103-117: Change the two tests test_user_with_none_content and
test_system_with_none_content to assert the explicit expected value for parts
(an empty list) rather than only checking that parts is not None; locate the
assertions after calling _set_input_messages and _get_input_messages and replace
the "parts is not None" checks with an equality check against [] to reflect the
implementation of _set_input_messages.
- Around line 211-217: The test should assert the expected attribute value when
the only choice has message=None: after calling _set_output_messages(mock_span,
choices) verify that mock_span.set_attribute was called with the
GEN_AI_OUTPUT_MESSAGES key and a value representing an empty JSON array (i.e.,
"[]"); update the test_none_message_in_choice test to check
mock_span.set_attribute(GEN_AI_OUTPUT_MESSAGES, "[]") (or equivalent call
recorded on mock_span) so the behavior is explicitly validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a10d3c9a-1341-48bb-a2ef-ae05c4745cff

📥 Commits

Reviewing files that changed from the base of the PR and between 616c2a9 and a83cdbc.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (26)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_functions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

80-85: ⚠️ Potential issue | 🟡 Minor

Skip tool parameters when the schema is absent.

json.dumps(function.get("parameters")) turns a missing schema into the literal string "null", so spans claim a parameters payload even when none was provided. Check that parameters is not None before setting the attribute.

💡 Suggested fix
     for i, function in enumerate(functions):
         prefix = f"{GenAIAttributes.GEN_AI_TOOL_DEFINITIONS}.{i}"
         _set_span_attribute(span, f"{prefix}.name", function.get("name"))
         _set_span_attribute(span, f"{prefix}.description", function.get("description"))
-        _set_span_attribute(
-            span, f"{prefix}.parameters", json.dumps(function.get("parameters"))
-        )
+        parameters = function.get("parameters")
+        if parameters is not None:
+            _set_span_attribute(span, f"{prefix}.parameters", json.dumps(parameters))
@@
     for i, tool in enumerate(tools):
         function = tool.get("function")
         if not function:
             continue
 
         prefix = f"{GenAIAttributes.GEN_AI_TOOL_DEFINITIONS}.{i}"
         _set_span_attribute(span, f"{prefix}.name", function.get("name"))
         _set_span_attribute(span, f"{prefix}.description", function.get("description"))
-        _set_span_attribute(
-            span, f"{prefix}.parameters", json.dumps(function.get("parameters"))
-        )
+        parameters = function.get("parameters")
+        if parameters is not None:
+            _set_span_attribute(span, f"{prefix}.parameters", json.dumps(parameters))

Also applies to: 97-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`
around lines 80 - 85, The code is unconditionally serializing
function.get("parameters") which turns a missing schema into the string "null";
update the logic in the block that builds tool definition span attributes (the
code using GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, _set_span_attribute and the
local variable function) to first check whether function.get("parameters") is
not None and only call _set_span_attribute for the ".parameters" key when
parameters is present; apply the same guard to the other identical block around
the 97-102 region so you don't record a "null" parameters payload in spans.
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (1)

169-183: ⚠️ Potential issue | 🟠 Major

Handle non-string list prompts as one completion input.

The completions client still allows prompt to be a string, an array of strings, an array of tokens, or an array of token arrays. This branch treats every list as “one message per element”, so tokenized prompts get split across multiple gen_ai.input.messages entries and nested token arrays end up as list-valued content. Restrict the multi-message path to list[str] and serialize other non-string shapes as a single message. (raw.githubusercontent.com)

💡 Suggested fix
 def _set_input_messages(span, prompt):
     if not span.is_recording() or prompt is None:
         return
 
-    prompts = prompt if isinstance(prompt, list) else [prompt]
+    prompts = (
+        prompt
+        if isinstance(prompt, list) and all(isinstance(p, str) for p in prompt)
+        else [prompt]
+    )
 
     messages = [
-        {"role": "user", "parts": [{"content": p, "type": "text"}]}
+        {
+            "role": "user",
+            "parts": [
+                {
+                    "content": p if isinstance(p, str) else json.dumps(p),
+                    "type": "text",
+                }
+            ],
+        }
         for p in prompts
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py`
around lines 169 - 183, The _set_input_messages function currently treats any
list prompt as multiple string messages which splits token lists; update it so
only a list of strings (list[str]) is expanded into multiple messages, and for
any other prompt shape (e.g., list of tokens or nested lists) serialize the
entire prompt as a single message content; modify the branch in
_set_input_messages (referencing the function name and
GenAIAttributes.GEN_AI_INPUT_MESSAGES and _set_span_attribute) to detect
isinstance(prompt, list) and all(isinstance(p, str) for p in prompt) before
creating multiple {"role":"user","parts":[{"content":..., "type":"text"}]}
entries, otherwise create a single message with the serialized prompt value.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

135-138: ⚠️ Potential issue | 🟠 Major

Don't emit raw user and header values into spans.

OpenAI documents user as an end-user identifier and extra_headers as arbitrary request headers. Recording both verbatim here turns stable identifiers and possible auth/session data into durable telemetry; sanitize/allowlist them or drop them entirely before calling _set_span_attribute. (raw.githubusercontent.com)

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`:
- Around line 80-85: The code is unconditionally serializing
function.get("parameters") which turns a missing schema into the string "null";
update the logic in the block that builds tool definition span attributes (the
code using GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, _set_span_attribute and the
local variable function) to first check whether function.get("parameters") is
not None and only call _set_span_attribute for the ".parameters" key when
parameters is present; apply the same guard to the other identical block around
the 97-102 region so you don't record a "null" parameters payload in spans.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3419d11-cacd-4bd5-a8c7-e9f12d1ed800

📥 Commits

Reviewing files that changed from the base of the PR and between a83cdbc and 1de5ff4.

📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
✅ Files skipped from review due to trivial changes (3)
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py (1)

292-300: Make None explicit in request-attribute tests.

Line 294 and Line 300 currently validate “missing key” behavior, while the docstrings describe explicit None. Passing headers=None / user=None directly would better lock intent and future-proof these tests.

Suggested refinement
 class TestRequestAttributesSerialization:
     def test_none_headers_not_recorded_as_string(self, mock_span):
         """kwargs with headers=None should not record literal 'None'."""
-        _set_request_attributes(mock_span, {"model": "gpt-4"})
+        _set_request_attributes(mock_span, {"model": "gpt-4", "headers": None})
         val = mock_span._attrs.get(SpanAttributes.GEN_AI_HEADERS)
         assert val != "None", "Should not record literal string 'None'"
 
     def test_none_user_not_recorded(self, mock_span):
         """kwargs with user=None should not record attribute."""
-        _set_request_attributes(mock_span, {"model": "gpt-4"})
+        _set_request_attributes(mock_span, {"model": "gpt-4", "user": None})
         assert SpanAttributes.GEN_AI_USER not in mock_span._attrs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 292 - 300, The tests claim to check explicit None values but
currently omit them; update test_none_headers_not_recorded_as_string to call
_set_request_attributes(mock_span, {"model": "gpt-4"}, headers=None) and assert
that SpanAttributes.GEN_AI_HEADERS is either absent or None (not the literal
string "None"), and update test_none_user_not_recorded to call
_set_request_attributes(mock_span, {"model": "gpt-4"}, user=None) and assert
that the user attribute is not present (or is None) in mock_span._attrs; use the
existing test function names (test_none_headers_not_recorded_as_string,
test_none_user_not_recorded) and the helper
_set_request_attributes/SpanAttributes symbols to locate and change the calls
and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf095afb-423d-45a1-9f9e-4de2c2e07909

📥 Commits

Reviewing files that changed from the base of the PR and between 1de5ff4 and c2b8059.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py

@max-deygin-traceloop max-deygin-traceloop changed the title OpenAI instrumentation to support OTel GenAI Semantic Conventions 0.5.0 feat(OpenAI) instrumentation to support OTel GenAI Semantic Conventions 0.5.0 Mar 24, 2026
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1929-openai-instrumentation branch from c2b8059 to 33f2b26 Compare March 24, 2026 14:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

261-271: ⚠️ Potential issue | 🟠 Major

Handle dict-shaped usage before dereferencing token fields.

traced_response.usage can be a plain dict here: the fallback type alias is Dict[str, Any], and Lines 529, 595, 695, and 757 also rehydrate saved state through model_dump(). This block still uses attribute access for input_tokens, output_tokens, total_tokens, and input_tokens_details, so those paths can raise here and skip span enrichment.

🐛 Proposed fix
     _set_span_attribute(span, OpenAIAttributes.OPENAI_RESPONSE_SERVICE_TIER, traced_response.response_service_tier)
     if usage := traced_response.usage:
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-        _set_span_attribute(
-            span, SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
-        if usage.input_tokens_details:
+        input_tokens = usage.get("input_tokens") if isinstance(usage, dict) else getattr(usage, "input_tokens", None)
+        output_tokens = usage.get("output_tokens") if isinstance(usage, dict) else getattr(usage, "output_tokens", None)
+        total_tokens = usage.get("total_tokens") if isinstance(usage, dict) else getattr(usage, "total_tokens", None)
+        input_tokens_details = (
+            usage.get("input_tokens_details") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens_details", None)
+        )
+
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+        _set_span_attribute(span, SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS, total_tokens)
+        if input_tokens_details:
             _set_span_attribute(
                 span,
                 SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                usage.input_tokens_details.cached_tokens,
+                input_tokens_details.get("cached_tokens") if isinstance(input_tokens_details, dict)
+                else getattr(input_tokens_details, "cached_tokens", None),
             )

Also applies to: 275-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py`
around lines 261 - 271, traced_response.usage may be a dict instead of an object
with attributes, so update the span-enrichment code (the block that reads
traced_response.usage and calls _set_span_attribute with
GenAIAttributes/SpanAttributes) to handle both dict and attribute-style objects:
detect with isinstance(usage, dict) and read tokens via
usage.get("input_tokens"), usage.get("output_tokens"),
usage.get("total_tokens"), and nested usage.get("input_tokens_details",
{}).get("cached_tokens") when dict; otherwise use the existing attribute access
(usage.input_tokens, usage.output_tokens, usage.total_tokens,
usage.input_tokens_details.cached_tokens). Apply the same fix to the analogous
usage-handling block later in the file (the second occurrence that currently
uses attribute access).
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

135-138: ⚠️ Potential issue | 🟠 Major

Don't persist raw caller metadata into spans.

The None guard helps, but Line 135 and Lines 136-138 still copy caller-provided user identifiers and full header dictionaries verbatim into telemetry. That can durably leak auth tokens, cookies, tenant IDs, or other PII into tracing backends. Please drop these attributes or emit only an allowlisted/sanitized subset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`
around lines 135 - 138, The code currently records raw caller-supplied
identifiers and full header dicts into spans via _set_span_attribute(span,
SpanAttributes.GEN_AI_USER, kwargs.get("user")) and _set_span_attribute(span,
SpanAttributes.GEN_AI_HEADERS, str(headers)), which can leak PII/tokens; remove
these direct writes and either omit these attributes entirely or replace them
with a sanitized/allowlisted representation. Update the block that reads
kwargs.get("user") and kwargs.get("extra_headers")/kwargs.get("headers") so it
does not call _set_span_attribute with raw values — instead either (a) do
nothing, (b) set a safe boolean/enum like SpanAttributes.GEN_AI_USER_PRESENT or
SpanAttributes.GEN_AI_HEADERS_PRESENT, or (c) build a sanitized dict by
lowercasing header names and including only an allowlist (e.g.,
"content-type","user-agent","accept") and/or mask all values before calling
_set_span_attribute; ensure changes are applied where _set_span_attribute,
SpanAttributes.GEN_AI_USER, and SpanAttributes.GEN_AI_HEADERS are referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py`:
- Around line 5-7: The _FunctionToolCall TypedDict currently declares arguments
as a required key (arguments: Optional[str]) which forces the key to exist even
when absent in real payloads; update the _FunctionToolCall declaration to make
the arguments key optional by changing its annotation to
NotRequired[Optional[str]] and add/import NotRequired from typing_extensions
(same import pattern used elsewhere in this package) so emitted/test payloads
that omit the arguments key match the type model.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py`:
- Around line 318-330: The code is referencing a non-existent semconv constant
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS which will raise AttributeError; update
the instrumentation in responses_wrappers.py to use the literal string key
"gen_ai.tool.definitions" wherever GenAIAttributes.GEN_AI_TOOL_DEFINITIONS is
used (e.g., in the calls to _set_span_attribute that set description,
parameters, name), and ensure all occurrences are consistently replaced so span
attributes are set without relying on the missing semconv constant.
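The `NotRequired` change described in the first inline comment above can be sketched as follows (a minimal sketch; on Python 3.11+ `NotRequired` ships in `typing`, with `typing_extensions` as the fallback the comment mentions):

```python
from typing import Optional, TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import NotRequired

class _FunctionToolCall(TypedDict):
    name: str
    # NotRequired lets real payloads omit the key entirely, while
    # Optional still allows an explicit None when the key is present.
    arguments: NotRequired[Optional[str]]

call: _FunctionToolCall = {"name": "get_weather"}  # valid without "arguments"
assert "arguments" not in call
```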


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c9ec5df-f688-4d8a-8a41-3d6ada1449fc

📥 Commits

Reviewing files that changed from the base of the PR and between c2b8059 and 33f2b26.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_functions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py
✅ Files skipped from review due to trivial changes (13)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/init.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py (1)

68-211: Optional: parameterize event-type/prompt-state cases to reduce duplication.

Both MessageEvent and ChoiceEvent suites repeat the same patch/assert scaffolding; a paramized matrix would keep intent while shrinking test maintenance surface.

♻️ Example refactor sketch
+@pytest.mark.parametrize(
+    "event_factory,send_prompts,tool_call_factory",
+    [
+        ("message", True, _make_tool_call_without_arguments),
+        ("message", False, _make_tool_call_without_arguments),
+        ("message", False, _make_tool_call_with_none_arguments),
+        ("message", False, _make_tool_call_with_arguments),
+        ("choice", True, _make_tool_call_without_arguments),
+        ("choice", False, _make_tool_call_without_arguments),
+    ],
+)
+def test_emit_tool_call_arguments_behavior(mock_event_logger, event_factory, send_prompts, tool_call_factory):
+    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py`
around lines 68 - 211, Combine the duplicate MessageEvent and ChoiceEvent tests
into a parametric test that iterates over event classes (MessageEvent,
ChoiceEvent), tool-call variants (_make_tool_call_without_arguments,
_make_tool_call_with_none_arguments, _make_tool_call_with_arguments), and
should_send_prompts True/False; in the single test call emit_event(event) and
assert mock_event_logger.emit was called and that
body["tool_calls"][0]["function"] either retains name "get_weather" when
arguments are absent/should_send_prompts=True or does not contain the
"arguments" key when should_send_prompts=False or arguments are None, reusing
the existing helpers and patches for should_emit_events and should_send_prompts
and keeping mock_event_logger fixture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58362de4-f7a3-4762-9218-57218fe75a46

📥 Commits

Reviewing files that changed from the base of the PR and between 33f2b26 and b094ce3.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py

max-deygin-traceloop and others added 2 commits March 25, 2026 14:59
…bility

Bring back all LLM_* constants removed in v0.5.0 with their original
string values. This allows non-migrated instrumentation packages to
depend on semconv >=0.5.1 without code changes.

- Add legacy LLM_* constants to SpanAttributes with TODO comments
- Add GEN_AI_USAGE_CACHE_*_DEPRECATED for value-changed cache attrs
- Update _testing.py to verify legacy constants are present
- Bump semconv to 0.5.1
- Update all 32 packages to depend on >=0.5.1,<0.6.0
- Add local uv source overrides for semconv in all packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

def _map_finish_reason(reason):
    if not reason:
        return "stop"

Suggested change
-        return "stop"
+        return reason

        })
    elif role == "user":
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

You need a `_content_to_parts` utility method here, as you did in the Anthropic PR, instead of passing the content through as-is.

        })
    elif role in ["system", "developer"]:
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

Same here.

        })
    elif role == "assistant":
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

Same here.

Comment on lines +548 to +551
    if Config.use_messages_attributes:
        _set_output_messages(span, choices)
    else:
        _legacy_set_completions(span, choices)

Do we really need this condition?

"parts": parts,
"finish_reason": _map_finish_reason(choice.get("finish_reason")),
})
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(messages))

In the Anthropic PR you also added the finish reasons as a top-level attribute, not only inside the messages JSON, so please add that here as well.

Comment on lines 334 to 340
    _set_span_attribute(
        span,
        f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
        traced_response.instructions,
    )
    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system")
    prompt_index += 1

Isn't that the old format?

It should be:

              input_messages.append({
                    "role": "system",
                    "parts": [{"type": "text", "content": traced_response.instructions}],
                })

Also make sure it's covered by tests.

    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant")
    if traced_response.output_text:
        _set_span_attribute(
            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", traced_response.output_text

Same here.

@doronkopit5 doronkopit5 changed the title feat(OpenAI) instrumentation to support OTel GenAI Semantic Conventions 0.5.0 feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0 Mar 25, 2026
max-deygin-traceloop and others added 12 commits March 25, 2026 17:06
These dev-only overrides don't belong in the semconv branch.
Only traceloop-sdk and langchain retain their pre-existing overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1929-openai-instrumentation branch from b094ce3 to 4c08e21 Compare March 25, 2026 16:22
