
feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0#3844

Open
max-deygin-traceloop wants to merge 15 commits into main from max/tlp-1929-openai-instrumentation

Conversation

Contributor

@max-deygin-traceloop max-deygin-traceloop commented Mar 24, 2026

Summary

Update OpenAI instrumentation to align with OTel GenAI Semantic Conventions 0.5.0.

  • Migrate span attributes from custom SpanAttributes to upstream gen_ai_attributes and openai_attributes (gen_ai.request.frequency_penalty,
    gen_ai.request.presence_penalty, gen_ai.operation.name, etc.)
  • Replace deprecated gen_ai.openai.request.service_tier / gen_ai.openai.response.service_tier with openai.request.service_tier / openai.response.service_tier
  • Add gen_ai.input.messages / gen_ai.output.messages support (use_messages_attributes flag) following the OTel JSON schema (role, parts[{type, content}],
    finish_reason)
  • Map OpenAI's tool_calls finish reason to semconv's tool_call
  • Bump opentelemetry-semantic-conventions-ai dependency to >=0.5.0,<0.6.0 and opentelemetry-semantic-conventions to >=0.60b1
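
As a concrete illustration of the message-schema and finish-reason bullets above, the structured attributes serialize to JSON arrays shaped roughly like this (a simplified sketch of the 0.5.0 shape — the literal values and the `normalize_finish_reason` helper are illustrative, not the instrumentation's actual code):

```python
import json

# Illustrative input messages following the OTel schema fields named above:
# role, parts[{type, content}].
input_messages = [
    {"role": "system", "parts": [{"type": "text", "content": "You are helpful."}]},
    {"role": "user", "parts": [{"type": "text", "content": "What is OTel?"}]},
]

# OpenAI reports the finish reason "tool_calls"; semconv 0.5.0 uses "tool_call".
FINISH_REASON_MAP = {"tool_calls": "tool_call"}

def normalize_finish_reason(reason):
    # Pass through already-conformant values unchanged.
    return FINISH_REASON_MAP.get(reason, reason)

# Output messages additionally carry finish_reason.
output_messages = [
    {
        "role": "assistant",
        "parts": [{"type": "text", "content": "OpenTelemetry is..."}],
        "finish_reason": normalize_finish_reason("tool_calls"),
    }
]

# The span attributes hold the JSON-serialized arrays.
span_attributes = {
    "gen_ai.input.messages": json.dumps(input_messages),
    "gen_ai.output.messages": json.dumps(output_messages),
}
```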

Test plan

  • 216 unit tests pass (VCR cassettes)
  • New test_messages_attributes.py covers input/output message formatting (multi-turn, tool calls, edge cases)
  • New test_chat_with_messages_attributes VCR test validates end-to-end attribute shape
  • Live smoke test against real OpenAI API (gpt-4.1-nano)
  • E2E semconv compliance suite: 7 scenarios / 51 checks pass (chat, streaming, tools, embeddings, multi-choice, errors)

Summary by CodeRabbit

  • New Features

    • Added a public toggle (default off) to emit structured gen_ai.input.messages and gen_ai.output.messages on spans.
  • Refactor

    • Replaced legacy LLM span keys with GenAI semantic-convention keys for operation, usage, headers, tools, and streaming; improved message serialization and normalized finish_reason values; adjusted tool-call field naming.
  • Tests

    • Added unit and VCR tests for message-attribute formatting and event emission; updated many assertions to use GenAI keys; expanded filtered HTTP headers.
  • Chores

    • Bumped semantic-conventions version ranges.


coderabbitai bot commented Mar 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough


Switches many span semantic keys from LLM to GenAI conventions, adds Config.use_messages_attributes and an OpenAIInstrumentor parameter to optionally serialize inputs/outputs as structured gen_ai.input.messages/gen_ai.output.messages, updates wrappers/models/tests accordingly, and bumps semantic-conventions dependency ranges.

Changes

Cohort / File(s) / Summary

  • Config & Instrumentor
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py, packages/opentelemetry-instrumentation-openai/pyproject.toml
    Added use_messages_attributes: bool to OpenAIInstrumentor.__init__ and Config (default False); updated opentelemetry semantic-conventions dependency ranges.
  • Shared semconv replacement & headers handling
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
    Replaced many SpanAttributes.LLM_* keys with GenAIAttributes.GEN_AI_*; changed the tool/function attribute prefix to GEN_AI_TOOL_DEFINITIONS.*; computes headers = extra_headers or headers and emits GEN_AI_HEADERS only when non-None.
  • Chat & Completion wrappers (message feature gated)
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py, packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
    Set GenAI operation-name attributes; introduced Config.use_messages_attributes gating to choose between legacy per-choice attributes and the new _set_input_messages/_set_output_messages flows that serialize structured messages to gen_ai.input.messages / gen_ai.output.messages; added finish-reason normalization.
  • Embeddings, Assistant, Realtime, Responses
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py, .../v1/assistant_wrappers.py, .../v1/realtime_wrappers.py, .../v1/responses_wrappers.py
    Replaced LLM semconv keys with GenAI equivalents for operation name, usage totals, OpenAI base/version, and streaming flags; tightened modalities emission; process_content_block now derives type from the payload; updated tool-definition attribute keys.
  • Event models
    packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
    _FunctionToolCall TypedDict: renamed function_name → name and changed the arguments type/optionality to NotRequired[Optional[str]].
  • Tests — fixtures, cassettes, assertions, new tests
    packages/opentelemetry-instrumentation-openai/tests/conftest.py, packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py, packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/*, packages/opentelemetry-instrumentation-openai/tests/traces/*, packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py, packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py
    Added an instrument_with_messages_attributes fixture; expanded VCR filter_headers; added VCR cassettes; new unit tests for message-attribute serialization and event emission; updated many tests to assert GEN_AI_* keys instead of legacy LLM_* keys.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I hopped through spans with nimble paws,

swapped old keys for GenAI laws.
Messages nested, tidy and bright,
a config switch to show them right.
I thumped the logs and danced all night.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0' clearly and specifically summarizes the main change: updating OpenAI instrumentation to support the new GenAI Semantic Conventions 0.5.0 standard, which aligns with the extensive modifications across all files in the changeset.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py (1)

626-630: ⚠️ Potential issue | 🟡 Minor

Confusing variable names in token usage assertion.

The variable names don't match the attributes being retrieved:

  • completion_tokens → GEN_AI_USAGE_TOTAL_TOKENS
  • prompt_tokens → GEN_AI_USAGE_INPUT_TOKENS
  • total_tokens → GEN_AI_USAGE_OUTPUT_TOKENS

The assertion completion_tokens + prompt_tokens == total_tokens effectively checks total + input == output, which is mathematically incorrect. This appears to be a pre-existing issue, but worth fixing for clarity.

♻️ Suggested fix
-    completion_tokens = open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS)
-    prompt_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
-    total_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)
-    if completion_tokens and prompt_tokens and total_tokens:
-        assert completion_tokens + prompt_tokens == total_tokens
+    input_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS)
+    output_tokens = open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)
+    total_tokens = open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS)
+    if input_tokens and output_tokens and total_tokens:
+        assert input_tokens + output_tokens == total_tokens
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py`
around lines 626 - 630, The three variables are misnamed and the assertion is
checking the wrong relationship: change the variable names so they match the
attributes (e.g., total_tokens =
open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS),
input_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS),
output_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)) and
update the assertion to assert input_tokens + output_tokens == total_tokens
(keeping the existing guard that all three are truthy).
🧹 Nitpick comments (4)
packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py (1)

94-109: Also prove the legacy prompt/completion attributes stay off.

This test only checks that gen_ai.input.messages and gen_ai.output.messages exist. It would still pass if the span emitted both the new payload and the old gen_ai.prompt.* / gen_ai.completion.* keys, which is the regression this flag is most likely to introduce.

Proposed assertion additions
     assert output_messages[0]["finish_reason"] == "stop"
     assert len(output_messages[0]["parts"]) >= 1
     assert output_messages[0]["parts"][0]["type"] == "text"
     assert "content" in output_messages[0]["parts"][0]
+    assert f"{GenAIAttributes.GEN_AI_PROMPT}.0.content" not in open_ai_span.attributes
+    assert f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content" not in open_ai_span.attributes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`
around lines 94 - 109, Add assertions to ensure legacy prompt/completion
attributes are not present on the span: check the span attributes
(open_ai_span.attributes) do not contain any keys starting with "gen_ai.prompt."
or "gen_ai.completion." (for example by asserting not
any(k.startswith("gen_ai.prompt.") or k.startswith("gen_ai.completion.") for k
in open_ai_span.attributes.keys())). Keep these checks near the existing
input/output message validations so the test fails if legacy fields are emitted
alongside the new gen_ai.input.messages/gen_ai.output.messages attributes.
packages/opentelemetry-instrumentation-openai/tests/conftest.py (1)

191-200: Restore the prior config value in teardown.

This fixture mutates a process-global singleton and always tears it back down to False. Capturing the previous value makes it compose safely with any future override and avoids silent state bleed if the default changes.

Proposed fix
 @pytest.fixture(scope="function")
 def instrument_with_messages_attributes(
     instrument_legacy, reader, tracer_provider, logger_provider, meter_provider
 ):
     instrumentor = instrument_legacy
+    previous = Config.use_messages_attributes
     Config.use_messages_attributes = True
 
     yield instrumentor
 
-    Config.use_messages_attributes = False
+    Config.use_messages_attributes = previous
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/opentelemetry-instrumentation-openai/tests/conftest.py` around lines
191 - 200, The fixture instrument_with_messages_attributes currently sets
Config.use_messages_attributes = True and always resets it to False; instead
capture the previous value before mutating (prev =
Config.use_messages_attributes) and after yield restore it
(Config.use_messages_attributes = prev) so the fixture composes safely with
other tests and preserves prior global state; locate this logic in the
instrument_with_messages_attributes fixture and replace the unconditional
teardown with restoring the saved prev value.
packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py (2)

103-117: Consider strengthening assertions for None content tests.

The tests verify that parts is not None, but it would be more explicit to verify the expected behavior. Based on the implementation in _set_input_messages, when content=None, the parts will be an empty list [].

💡 Optional: More explicit assertions
     def test_user_with_none_content(self, mock_span):
         """User message with None content should not crash or produce null parts."""
         _set_input_messages(mock_span, [{"role": "user", "content": None}])
         result = _get_input_messages(mock_span)
 
         assert len(result) == 1
-        assert result[0]["parts"] is not None
+        assert result[0]["parts"] == []
 
     def test_system_with_none_content(self, mock_span):
         """System message with None content should not crash or produce null parts."""
         _set_input_messages(mock_span, [{"role": "system", "content": None}])
         result = _get_input_messages(mock_span)
 
         assert len(result) == 1
-        assert result[0]["parts"] is not None
+        assert result[0]["parts"] == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 103 - 117, Change the two tests test_user_with_none_content and
test_system_with_none_content to assert the explicit expected value for parts
(an empty list) rather than only checking that parts is not None; locate the
assertions after calling _set_input_messages and _get_input_messages and replace
the "parts is not None" checks with an equality check against [] to reflect the
implementation of _set_input_messages.

211-217: Test assertion is incomplete.

The test verifies the function doesn't crash, but doesn't verify the expected output. Based on the implementation, when the only choice has message=None, the function will set GEN_AI_OUTPUT_MESSAGES to an empty JSON array "[]".

💡 Optional: Add explicit assertion
     def test_none_message_in_choice(self, mock_span):
         """Choice with message=None should not crash."""
         choices = [
             {"index": 0, "message": None, "finish_reason": "content_filter"},
         ]
         _set_output_messages(mock_span, choices)
-        # Should not crash; attribute may or may not be set
+        # Should not crash; attribute should be set to empty array
+        result = _get_output_messages(mock_span)
+        assert result == []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 211 - 217, The test should assert the expected attribute value when
the only choice has message=None: after calling _set_output_messages(mock_span,
choices) verify that mock_span.set_attribute was called with the
GEN_AI_OUTPUT_MESSAGES key and a value representing an empty JSON array (i.e.,
"[]"); update the test_none_message_in_choice test to check
mock_span.set_attribute(GEN_AI_OUTPUT_MESSAGES, "[]") (or equivalent call
recorded on mock_span) so the behavior is explicitly validated.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`:
- Around line 135-141: Don't serialize raw caller metadata into spans: remove
direct calls that stringify kwargs.get("user"), kwargs.get("headers"), and
kwargs.get("extra_headers") and instead validate and sanitize or omit them
before setting attributes; update uses of _set_span_attribute(span,
SpanAttributes.GEN_AI_USER, ...) and GEN_AI_HEADERS to first check for presence
(avoid recording the literal "None"), apply a sanitizer/allowlist that strips
PII/tokens (or only pick specific safe keys), and only call _set_span_attribute
with the sanitized or explicitly allowed value (or skip setting the attribute
entirely if nothing safe remains).

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py`:
- Around line 169-179: The _set_input_messages function currently drops
empty-string prompts and collapses list prompts to the first element; update it
so it preserves list-shaped prompts and allows empty-string items: remove the
truthiness check that returns on empty prompt, treat prompt uniformly (if it's a
list use it as-is, otherwise wrap non-list prompts into a single-item list),
then JSON-serialize an array mapping each item to
{"role":"user","parts":[{"content": item, "type":"text"}]} and pass that to
_set_span_attribute with GenAIAttributes.GEN_AI_INPUT_MESSAGES so batched
prompts and token arrays are fully recorded.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py`:
- Around line 5-7: The TypedDict _FunctionToolCall currently declares arguments:
Optional[dict[str, Any]] but the instrumentation emits/passes a raw JSON string;
update the type to match the emitted payload by changing the annotation on
arguments in _FunctionToolCall to Optional[str] (or, if you intentionally
support both parsed and raw forms, make it explicit as Optional[Union[str,
dict[str, Any]]]) so type-checkers and downstream callers see the correct shape;
ensure the change targets the _FunctionToolCall declaration and any related type
imports (e.g., add typing.Union if chosen).

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py`:
- Around line 123-126: Replace the incorrect use of
GenAIAttributes.GEN_AI_OPERATION_NAME for modalities and the non-standard
operation value "realtime": update the call that sets
f"{GenAIAttributes.GEN_AI_OPERATION_NAME}.modalities" (within the
_set_span_attribute call that references session.modalities) to instead set the
semantic attribute for output modality (use gen_ai.output.type) with one of the
well-known values like "text", "json", or "speech"; and ensure any span
attribute assignment that sets GenAIAttributes.GEN_AI_OPERATION_NAME (search for
usages that assign "realtime") uses one of the allowed operation names ("chat",
"create_agent", "embeddings", "execute_tool", "generate_content",
"invoke_agent", "retrieval", or "text_completion") or is replaced with a
justified custom attribute name if a non-standard operation is truly required.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml`:
- Around line 78-105: The cassette test_chat_with_messages_attributes.yaml
contains sensitive response headers (openai-organization, openai-project,
set-cookie, x-request-id) that weren't scrubbed; update the VCR scrubber used
for this suite by adding response filtering in before_record (or extending
filter_headers) to remove or replace those header keys (openai-organization,
openai-project, set-cookie, x-request-id) from recorded responses, then re-run
the tests to re-record test_chat_with_messages_attributes.yaml so the cassette
no longer contains those sensitive values.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml`:
- Around line 33-34: The cassette contains account-scoped headers (traceparent,
openai-organization, openai-project, set-cookie, x-request-id) that must be
scrubbed before committing; add VCR header filters or a
before_record/before_record_response hook to replace those header values with
deterministic placeholders (e.g., <TRACEPARENT>, <ORG>, <PROJECT>, <SET_COOKIE>,
<X_REQUEST_ID>), update any matching body fields if present, then re-record the
cassette (the YAML under
test_completions/test_completion_with_messages_attributes.yaml and other
affected cassettes referenced in lines ~79-110) so the committed cassette
contains only scrubbed placeholders.

---

Outside diff comments:
In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py`:
- Around line 626-630: The three variables are misnamed and the assertion is
checking the wrong relationship: change the variable names so they match the
attributes (e.g., total_tokens =
open_ai_span.attributes.get(SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS),
input_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS),
output_tokens =
open_ai_span.attributes.get(GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS)) and
update the assertion to assert input_tokens + output_tokens == total_tokens
(keeping the existing guard that all three are truthy).

---

Nitpick comments:
In `@packages/opentelemetry-instrumentation-openai/tests/conftest.py`:
- Around line 191-200: The fixture instrument_with_messages_attributes currently
sets Config.use_messages_attributes = True and always resets it to False;
instead capture the previous value before mutating (prev =
Config.use_messages_attributes) and after yield restore it
(Config.use_messages_attributes = prev) so the fixture composes safely with
other tests and preserves prior global state; locate this logic in the
instrument_with_messages_attributes fixture and replace the unconditional
teardown with restoring the saved prev value.

In `@packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py`:
- Around line 94-109: Add assertions to ensure legacy prompt/completion
attributes are not present on the span: check the span attributes
(open_ai_span.attributes) do not contain any keys starting with "gen_ai.prompt."
or "gen_ai.completion." (for example by asserting not
any(k.startswith("gen_ai.prompt.") or k.startswith("gen_ai.completion.") for k
in open_ai_span.attributes.keys())). Keep these checks near the existing
input/output message validations so the test fails if legacy fields are emitted
alongside the new gen_ai.input.messages/gen_ai.output.messages attributes.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`:
- Around line 103-117: Change the two tests test_user_with_none_content and
test_system_with_none_content to assert the explicit expected value for parts
(an empty list) rather than only checking that parts is not None; locate the
assertions after calling _set_input_messages and _get_input_messages and replace
the "parts is not None" checks with an equality check against [] to reflect the
implementation of _set_input_messages.
- Around line 211-217: The test should assert the expected attribute value when
the only choice has message=None: after calling _set_output_messages(mock_span,
choices) verify that mock_span.set_attribute was called with the
GEN_AI_OUTPUT_MESSAGES key and a value representing an empty JSON array (i.e.,
"[]"); update the test_none_message_in_choice test to check
mock_span.set_attribute(GEN_AI_OUTPUT_MESSAGES, "[]") (or equivalent call
recorded on mock_span) so the behavior is explicitly validated.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a10d3c9a-1341-48bb-a2ef-ae05c4745cff

📥 Commits

Reviewing files that changed from the base of the PR and between 616c2a9 and a83cdbc.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (26)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_functions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

80-85: ⚠️ Potential issue | 🟡 Minor

Skip tool parameters when the schema is absent.

json.dumps(function.get("parameters")) turns a missing schema into the literal string "null", so spans claim a parameters payload even when none was provided. Check that parameters is not None before setting the attribute.

💡 Suggested fix
     for i, function in enumerate(functions):
         prefix = f"{GenAIAttributes.GEN_AI_TOOL_DEFINITIONS}.{i}"
         _set_span_attribute(span, f"{prefix}.name", function.get("name"))
         _set_span_attribute(span, f"{prefix}.description", function.get("description"))
-        _set_span_attribute(
-            span, f"{prefix}.parameters", json.dumps(function.get("parameters"))
-        )
+        parameters = function.get("parameters")
+        if parameters is not None:
+            _set_span_attribute(span, f"{prefix}.parameters", json.dumps(parameters))
@@
     for i, tool in enumerate(tools):
         function = tool.get("function")
         if not function:
             continue
 
         prefix = f"{GenAIAttributes.GEN_AI_TOOL_DEFINITIONS}.{i}"
         _set_span_attribute(span, f"{prefix}.name", function.get("name"))
         _set_span_attribute(span, f"{prefix}.description", function.get("description"))
-        _set_span_attribute(
-            span, f"{prefix}.parameters", json.dumps(function.get("parameters"))
-        )
+        parameters = function.get("parameters")
+        if parameters is not None:
+            _set_span_attribute(span, f"{prefix}.parameters", json.dumps(parameters))

Also applies to: 97-102

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`
around lines 80 - 85, The code is unconditionally serializing
function.get("parameters") which turns a missing schema into the string "null";
update the logic in the block that builds tool definition span attributes (the
code using GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, _set_span_attribute and the
local variable function) to first check whether function.get("parameters") is
not None and only call _set_span_attribute for the ".parameters" key when
parameters is present; apply the same guard to the other identical block around
the 97-102 region so you don't record a "null" parameters payload in spans.
♻️ Duplicate comments (2)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py (1)

169-183: ⚠️ Potential issue | 🟠 Major

Handle non-string list prompts as one completion input.

The completions client still allows prompt to be a string, an array of strings, an array of tokens, or an array of token arrays. This branch treats every list as “one message per element”, so tokenized prompts get split across multiple gen_ai.input.messages entries and nested token arrays end up as list-valued content. Restrict the multi-message path to list[str] and serialize other non-string shapes as a single message. (raw.githubusercontent.com)

💡 Suggested fix
 def _set_input_messages(span, prompt):
     if not span.is_recording() or prompt is None:
         return
 
-    prompts = prompt if isinstance(prompt, list) else [prompt]
+    prompts = (
+        prompt
+        if isinstance(prompt, list) and all(isinstance(p, str) for p in prompt)
+        else [prompt]
+    )
 
     messages = [
-        {"role": "user", "parts": [{"content": p, "type": "text"}]}
+        {
+            "role": "user",
+            "parts": [
+                {
+                    "content": p if isinstance(p, str) else json.dumps(p),
+                    "type": "text",
+                }
+            ],
+        }
         for p in prompts
     ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py`
around lines 169 - 183, The _set_input_messages function currently treats any
list prompt as multiple string messages which splits token lists; update it so
only a list of strings (list[str]) is expanded into multiple messages, and for
any other prompt shape (e.g., list of tokens or nested lists) serialize the
entire prompt as a single message content; modify the branch in
_set_input_messages (referencing the function name and
GenAIAttributes.GEN_AI_INPUT_MESSAGES and _set_span_attribute) to detect
isinstance(prompt, list) and all(isinstance(p, str) for p in prompt) before
creating multiple {"role":"user","parts":[{"content":..., "type":"text"}]}
entries, otherwise create a single message with the serialized prompt value.
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

135-138: ⚠️ Potential issue | 🟠 Major

Don't emit raw user and header values into spans.

OpenAI documents user as an end-user identifier and extra_headers as arbitrary request headers. Recording both verbatim here turns stable identifiers and possible auth/session data into durable telemetry; sanitize/allowlist them or drop them entirely before calling _set_span_attribute. (raw.githubusercontent.com)

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`:
- Around line 80-85: The code is unconditionally serializing
function.get("parameters") which turns a missing schema into the string "null";
update the logic in the block that builds tool definition span attributes (the
code using GenAIAttributes.GEN_AI_TOOL_DEFINITIONS, _set_span_attribute and the
local variable function) to first check whether function.get("parameters") is
not None and only call _set_span_attribute for the ".parameters" key when
parameters is present; apply the same guard to the other identical block around
the 97-102 region so you don't record a "null" parameters payload in spans.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f3419d11-cacd-4bd5-a8c7-e9f12d1ed800

📥 Commits

Reviewing files that changed from the base of the PR and between a83cdbc and 1de5ff4.

📒 Files selected for processing (12)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
✅ Files skipped from review due to trivial changes (3)
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py (1)

292-300: Make None explicit in request-attribute tests.

Line 294 and Line 300 currently validate “missing key” behavior, while the docstrings describe explicit None. Passing headers=None / user=None directly would better lock intent and future-proof these tests.

Suggested refinement
 class TestRequestAttributesSerialization:
     def test_none_headers_not_recorded_as_string(self, mock_span):
         """kwargs with headers=None should not record literal 'None'."""
-        _set_request_attributes(mock_span, {"model": "gpt-4"})
+        _set_request_attributes(mock_span, {"model": "gpt-4", "headers": None})
         val = mock_span._attrs.get(SpanAttributes.GEN_AI_HEADERS)
         assert val != "None", "Should not record literal string 'None'"
 
     def test_none_user_not_recorded(self, mock_span):
         """kwargs with user=None should not record attribute."""
-        _set_request_attributes(mock_span, {"model": "gpt-4"})
+        _set_request_attributes(mock_span, {"model": "gpt-4", "user": None})
         assert SpanAttributes.GEN_AI_USER not in mock_span._attrs
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py`
around lines 292 - 300, The tests claim to check explicit None values but
currently omit them; update test_none_headers_not_recorded_as_string to call
_set_request_attributes(mock_span, {"model": "gpt-4"}, headers=None) and assert
that SpanAttributes.GEN_AI_HEADERS is either absent or None (not the literal
string "None"), and update test_none_user_not_recorded to call
_set_request_attributes(mock_span, {"model": "gpt-4"}, user=None) and assert
that the user attribute is not present (or is None) in mock_span._attrs; use the
existing test function names (test_none_headers_not_recorded_as_string,
test_none_user_not_recorded) and the helper
_set_request_attributes/SpanAttributes symbols to locate and change the calls
and assertions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bf095afb-423d-45a1-9f9e-4de2c2e07909

📥 Commits

Reviewing files that changed from the base of the PR and between 1de5ff4 and c2b8059.

📒 Files selected for processing (1)
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py

@max-deygin-traceloop max-deygin-traceloop changed the title OpenAI instrumentation to support OTel GenAI Semantic Conventions 0.5.0 feat(OpenAI) instrumentation to support OTel GenAI Semantic Conventions 0.5.0 Mar 24, 2026
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1929-openai-instrumentation branch from c2b8059 to 33f2b26 Compare March 24, 2026 14:56

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py (1)

261-271: ⚠️ Potential issue | 🟠 Major

Handle dict-shaped usage before dereferencing token fields.

traced_response.usage can be a plain dict here: the fallback type alias is Dict[str, Any], and Lines 529, 595, 695, and 757 also rehydrate saved state through model_dump(). This block still uses attribute access for input_tokens, output_tokens, total_tokens, and input_tokens_details, so those paths can raise here and skip span enrichment.

🐛 Proposed fix
     _set_span_attribute(span, OpenAIAttributes.OPENAI_RESPONSE_SERVICE_TIER, traced_response.response_service_tier)
     if usage := traced_response.usage:
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, usage.input_tokens)
-        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, usage.output_tokens)
-        _set_span_attribute(
-            span, SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS, usage.total_tokens
-        )
-        if usage.input_tokens_details:
+        input_tokens = usage.get("input_tokens") if isinstance(usage, dict) else getattr(usage, "input_tokens", None)
+        output_tokens = usage.get("output_tokens") if isinstance(usage, dict) else getattr(usage, "output_tokens", None)
+        total_tokens = usage.get("total_tokens") if isinstance(usage, dict) else getattr(usage, "total_tokens", None)
+        input_tokens_details = (
+            usage.get("input_tokens_details") if isinstance(usage, dict)
+            else getattr(usage, "input_tokens_details", None)
+        )
+
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_INPUT_TOKENS, input_tokens)
+        _set_span_attribute(span, GenAIAttributes.GEN_AI_USAGE_OUTPUT_TOKENS, output_tokens)
+        _set_span_attribute(span, SpanAttributes.GEN_AI_USAGE_TOTAL_TOKENS, total_tokens)
+        if input_tokens_details:
             _set_span_attribute(
                 span,
                 SpanAttributes.GEN_AI_USAGE_CACHE_READ_INPUT_TOKENS,
-                usage.input_tokens_details.cached_tokens,
+                input_tokens_details.get("cached_tokens") if isinstance(input_tokens_details, dict)
+                else getattr(input_tokens_details, "cached_tokens", None),
             )

Also applies to: 275-289

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py`
around lines 261 - 271, traced_response.usage may be a dict instead of an object
with attributes, so update the span-enrichment code (the block that reads
traced_response.usage and calls _set_span_attribute with
GenAIAttributes/SpanAttributes) to handle both dict and attribute-style objects:
detect with isinstance(usage, dict) and read tokens via
usage.get("input_tokens"), usage.get("output_tokens"),
usage.get("total_tokens"), and nested usage.get("input_tokens_details",
{}).get("cached_tokens") when dict; otherwise use the existing attribute access
(usage.input_tokens, usage.output_tokens, usage.total_tokens,
usage.input_tokens_details.cached_tokens). Apply the same fix to the analogous
usage-handling block later in the file (the second occurrence that currently
uses attribute access).
♻️ Duplicate comments (1)
packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py (1)

135-138: ⚠️ Potential issue | 🟠 Major

Don't persist raw caller metadata into spans.

The None guard helps, but Line 135 and Lines 136-138 still copy caller-provided user identifiers and full header dictionaries verbatim into telemetry. That can durably leak auth tokens, cookies, tenant IDs, or other PII into tracing backends. Please drop these attributes or emit only an allowlisted/sanitized subset.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py`
around lines 135 - 138, The code currently records raw caller-supplied
identifiers and full header dicts into spans via _set_span_attribute(span,
SpanAttributes.GEN_AI_USER, kwargs.get("user")) and _set_span_attribute(span,
SpanAttributes.GEN_AI_HEADERS, str(headers)), which can leak PII/tokens; remove
these direct writes and either omit these attributes entirely or replace them
with a sanitized/allowlisted representation. Update the block that reads
kwargs.get("user") and kwargs.get("extra_headers")/kwargs.get("headers") so it
does not call _set_span_attribute with raw values — instead either (a) do
nothing, (b) set a safe boolean/enum like SpanAttributes.GEN_AI_USER_PRESENT or
SpanAttributes.GEN_AI_HEADERS_PRESENT, or (c) build a sanitized dict by
lowercasing header names and including only an allowlist (e.g.,
"content-type","user-agent","accept") and/or mask all values before calling
_set_span_attribute; ensure changes are applied where _set_span_attribute,
SpanAttributes.GEN_AI_USER, and SpanAttributes.GEN_AI_HEADERS are referenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py`:
- Around line 5-7: The _FunctionToolCall TypedDict currently declares arguments
as a required key (arguments: Optional[str]) which forces the key to exist even
when absent in real payloads; update the _FunctionToolCall declaration to make
the arguments key optional by changing its annotation to
NotRequired[Optional[str]] and add/import NotRequired from typing_extensions
(same import pattern used elsewhere in this package) so emitted/test payloads
that omit the arguments key match the type model.

In
`@packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py`:
- Around line 318-330: The code is referencing a non-existent semconv constant
GenAIAttributes.GEN_AI_TOOL_DEFINITIONS which will raise AttributeError; update
the instrumentation in responses_wrappers.py to use the literal string key
"gen_ai.tool.definitions" wherever GenAIAttributes.GEN_AI_TOOL_DEFINITIONS is
used (e.g., in the calls to _set_span_attribute that set description,
parameters, name), and ensure all occurrences are consistently replaced so span
attributes are set without relying on the missing semconv constant.
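The `NotRequired` change described in the first inline comment above can be sketched as follows (a minimal sketch; on Python 3.11+ `NotRequired` ships in `typing`, with `typing_extensions` as the fallback the comment mentions):

```python
from typing import Optional, TypedDict

try:
    from typing import NotRequired  # Python 3.11+
except ImportError:  # older interpreters
    from typing_extensions import NotRequired

class _FunctionToolCall(TypedDict):
    name: str
    # NotRequired lets real payloads omit the key entirely, while
    # Optional still allows an explicit None when the key is present.
    arguments: NotRequired[Optional[str]]

call: _FunctionToolCall = {"name": "get_weather"}  # valid without "arguments"
assert "arguments" not in call
```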


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8c9ec5df-f688-4d8a-8a41-3d6ada1449fc

📥 Commits

Reviewing files that changed from the base of the PR and between c2b8059 and 33f2b26.

⛔ Files ignored due to path filters (1)
  • packages/opentelemetry-instrumentation-openai/uv.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/__init__.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/embeddings_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/responses_wrappers.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_azure.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_functions.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_messages_attributes.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py
✅ Files skipped from review due to trivial changes (13)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/config.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_streaming_with_api_usage.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat_parse.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_assistant.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_embeddings.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_vision.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_completions/test_completion_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_responses.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/cassettes/test_chat/test_chat_with_messages_attributes.yaml
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/assistant_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/chat_wrappers.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/completion_wrappers.py
🚧 Files skipped from review as they are similar to previous changes (7)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/init.py
  • packages/opentelemetry-instrumentation-openai/pyproject.toml
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_realtime.py
  • packages/opentelemetry-instrumentation-openai/tests/conftest.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_completions.py
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/v1/realtime_wrappers.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_chat.py


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py (1)

68-211: Optional: parameterize event-type/prompt-state cases to reduce duplication.

Both MessageEvent and ChoiceEvent suites repeat the same patch/assert scaffolding; a paramized matrix would keep intent while shrinking test maintenance surface.

♻️ Example refactor sketch
+@pytest.mark.parametrize(
+    "event_factory,send_prompts,tool_call_factory",
+    [
+        ("message", True, _make_tool_call_without_arguments),
+        ("message", False, _make_tool_call_without_arguments),
+        ("message", False, _make_tool_call_with_none_arguments),
+        ("message", False, _make_tool_call_with_arguments),
+        ("choice", True, _make_tool_call_without_arguments),
+        ("choice", False, _make_tool_call_without_arguments),
+    ],
+)
+def test_emit_tool_call_arguments_behavior(mock_event_logger, event_factory, send_prompts, tool_call_factory):
+    ...
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py`
around lines 68 - 211, Combine the duplicate MessageEvent and ChoiceEvent tests
into a parametric test that iterates over event classes (MessageEvent,
ChoiceEvent), tool-call variants (_make_tool_call_without_arguments,
_make_tool_call_with_none_arguments, _make_tool_call_with_arguments), and
should_send_prompts True/False; in the single test call emit_event(event) and
assert mock_event_logger.emit was called and that
body["tool_calls"][0]["function"] either retains name "get_weather" when
arguments are absent/should_send_prompts=True or does not contain the
"arguments" key when should_send_prompts=False or arguments are None, reusing
the existing helpers and patches for should_emit_events and should_send_prompts
and keeping mock_event_logger fixture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 58362de4-f7a3-4762-9218-57218fe75a46

📥 Commits

Reviewing files that changed from the base of the PR and between 33f2b26 and b094ce3.

📒 Files selected for processing (2)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py
  • packages/opentelemetry-instrumentation-openai/tests/traces/test_event_emitter.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • packages/opentelemetry-instrumentation-openai/opentelemetry/instrumentation/openai/shared/event_models.py

max-deygin-traceloop and others added 2 commits March 25, 2026 14:59
…bility

Bring back all LLM_* constants removed in v0.5.0 with their original
string values. This allows non-migrated instrumentation packages to
depend on semconv >=0.5.1 without code changes.

- Add legacy LLM_* constants to SpanAttributes with TODO comments
- Add GEN_AI_USAGE_CACHE_*_DEPRECATED for value-changed cache attrs
- Update _testing.py to verify legacy constants are present
- Bump semconv to 0.5.1
- Update all 32 packages to depend on >=0.5.1,<0.6.0
- Add local uv source overrides for semconv in all packages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

def _map_finish_reason(reason):
    if not reason:
        return "stop"

Suggested change
-        return "stop"
+        return reason

        })
    elif role == "user":
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

You need a `_content_to_parts` utility method here, as you did in the Anthropic PR, instead of passing the content through as-is.

        })
    elif role in ["system", "developer"]:
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

Same here.

        })
    elif role == "assistant":
        content = msg.get("content")
        parts = [{"content": content, "type": "text"}] if isinstance(content, str) else (content or [])

Same here.

Comment on lines +548 to +551
    if Config.use_messages_attributes:
        _set_output_messages(span, choices)
    else:
        _legacy_set_completions(span, choices)

Do we really need this condition?

"parts": parts,
"finish_reason": _map_finish_reason(choice.get("finish_reason")),
})
_set_span_attribute(span, GenAIAttributes.GEN_AI_OUTPUT_MESSAGES, json.dumps(messages))

In the Anthropic PR you also added the finish reasons as a top-level attribute, not only inside the messages JSON, so please add that here as well.

Comment on lines 334 to 340
    _set_span_attribute(
        span,
        f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.content",
        traced_response.instructions,
    )
    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_PROMPT}.{prompt_index}.role", "system")
    prompt_index += 1

Isn't that the old format?

It should be:

              input_messages.append({
                    "role": "system",
                    "parts": [{"type": "text", "content": traced_response.instructions}],
                })

Also make sure it's covered by tests.

    _set_span_attribute(span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.role", "assistant")
    if traced_response.output_text:
        _set_span_attribute(
            span, f"{GenAIAttributes.GEN_AI_COMPLETION}.0.content", traced_response.output_text

Same here.

@doronkopit5 doronkopit5 changed the title feat(OpenAI) instrumentation to support OTel GenAI Semantic Conventions 0.5.0 feat(open-ai): instrumentation to support OTel GenAI Semantic Conventions 0.5.0 Mar 25, 2026
max-deygin-traceloop and others added 12 commits March 25, 2026 17:06
These dev-only overrides don't belong in the semconv branch.
Only traceloop-sdk and langchain retain their pre-existing overrides.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
# Conflicts:
#	packages/opentelemetry-semantic-conventions-ai/opentelemetry/semconv_ai/__init__.py
@max-deygin-traceloop max-deygin-traceloop force-pushed the max/tlp-1929-openai-instrumentation branch from b094ce3 to 4c08e21 Compare March 25, 2026 16:22
