crewAI

andre/crewAI

Fork 0

mirror of https://github.com/crewAIInc/crewAI.git synced 2026-07-01 21:28:10 +00:00

Files

History

Lucas Gomide cab3319af9

CodeQL Advanced / Analyze (actions) (push) Has been cancelled

Details

CodeQL Advanced / Analyze (python) (push) Has been cancelled

Details

Vulnerability Scan / pip-audit (push) Has been cancelled

Details

Build uv cache / build-cache (3.10) (push) Has been cancelled

Details

Build uv cache / build-cache (3.11) (push) Has been cancelled

Details

Build uv cache / build-cache (3.12) (push) Has been cancelled

Details

Build uv cache / build-cache (3.13) (push) Has been cancelled

Details

feat(otel): surface real finish_reason + sampling params + response.id on LLM events (#5945 )

* feat(otel): surface real finish_reason + sampling params + response.id on LLM events

Companion to the OTel GenAI emitter compliance work in crewai-enterprise
(CON-172). Today the enterprise emitter reads these fields off the OSS
LLM events via `getattr(..., None)`, so it produces valid (but partial)
spans against the existing OSS surface. This change makes those fields
first-class on the events so spans can carry the real provider data.

What this adds:

- `LLMCallStartedEvent` gains the sampling-param fields the emitter needs
for `gen_ai.request.*`: `temperature`, `top_p`, `max_tokens`, `stream`,
`seed`, `stop_sequences`, `frequency_penalty`, `presence_penalty`, `n`.
All optional; existing call sites keep working.
- `BaseLLM._emit_call_started_event` introspects those values off `self`
(the LLM instance) via `getattr(..., None)` so every provider gets the
fields propagated for free without per-provider plumbing.
- `LLMCallCompletedEvent` gains `finish_reason: str | None` and
`response_id: str | None`. A field validator coerces any non-string
value (MagicMock, unexpected provider object) to None so the event
never raises on construction.
- `LLM._emit_call_completed_event` accepts both as kwargs.
- `LLM` (LiteLLM path) gets a defensive `_extract_finish_reason_and_response_id`
helper that handles both streaming (`StreamingChoices`) and non-streaming
(`Choices`) shapes and is wired into every completion-event emission site.
- Provider completions extract native values from their SDK responses and
pass them through:
- OpenAI: `_extract_responses_finish_reason_and_id` for Responses-API,
`_extract_finish_reason_and_id` for Chat-Completions.
- Anthropic: `_extract_finish_reason_and_id` (Messages API + streaming).
- Bedrock: `_extract_finish_reason_and_id` (`stopReason` from converse).
- Gemini: `_extract_finish_reason_and_id` (`finish_reason` from candidates).
- Azure: inherits via OpenAI sub-class; adds the helper for Azure-specific
response shapes.
- openai_compatible: inherits from OpenAICompletion, no edits needed.

Compatibility:

- All new fields are optional with sensible defaults. No existing call
sites need to change.
- The validator on `LLMCallCompletedEvent` swallows non-string values for
the new fields so legacy mocks / exotic provider types don't blow up
event construction.
- Enterprise side already reads these fields defensively, so OSS and
enterprise can merge independently and cut on the same synchronized
release.

Tested against the full LLM + events + provider test suite — all green;
the 14 pre-existing multimodal failures on main are unrelated and
reproduce without this diff.

* fix(bedrock): propagate finish_reason + response_id on async paths

The original commit covered every provider's sync path and Bedrock's
sync streaming path, but two Bedrock async paths still emitted
LLMCallCompletedEvent without finish_reason/response_id:

- _ahandle_converse: the final fallback emit_call_completed_event call
was missing both fields. Added stop_reason + response_id matching the
other emission sites in the same function.

- _ahandle_streaming_converse: response_id was never seeded from the
initial response object, and stream_finish_reason wasn't propagated
to the structured-output and final-text emissions. Now extracts
response_id up front and threads stream_finish_reason through every
completion event.

Adds a dedicated test file covering the new event fields end-to-end:
- LLMCallCompletedEvent.finish_reason / response_id Pydantic validation
(string accepted, None default, non-string coerced to None).
- LLMCallStartedEvent sampling params (all nine fields accepted, default
to None).
- BaseLLM._emit_call_started_event introspecting sampling params off
self, with explicit kwargs overriding.
- BaseLLM._emit_call_completed_event passing finish_reason/response_id
through to the event.
- LLM._extract_finish_reason_and_response_id across the LiteLLM shapes
(non-streaming response, streaming chunk, dict, missing fields,
non-string values, unexpected input).

* fix(otel): correct streaming finish_reason + bedrock response_id semantics

Two correctness fixes uncovered while landing the OTel finish_reason +
response_id plumbing:

- LiteLLM streaming (sync + async): `stream_options={"include_usage": True}`
causes LiteLLM to emit a final usage-only chunk with `choices=[]`. The
post-loop `_extract_finish_reason_and_response_id(last_chunk)` silently
returned `(None, None)` because the last chunk has no choices, even though
earlier chunks carried `finish_reason="stop"`. Track both fields
incrementally inside the loop (mirroring how OpenAI/Gemini/Azure already
handle their native streams) and use the tracked values for the
LLMCallCompletedEvent emission and the partial-response error path.

- Bedrock Converse: `ResponseMetadata.RequestId` is an AWS infra trace id,
not a model-level response id (semantically different from OpenAI's
`chatcmpl-XXX`). Return None for `response_id` rather than mislead
downstream telemetry consumers. The audit-fix's async propagation chain
still works — None propagates through unchanged.

Adds `test_llm_streaming_finish_reason.py` pinning both the sync and async
LiteLLM streaming paths against the include_usage chunk shape.

* refactor(otel): unify LLM event introspection + drop redundant defensive code

Three cohesion cleanups uncovered during PR review, all behavior-preserving:

- LLM.call / LLM.acall in llm.py now delegate to BaseLLM._emit_call_started_event
instead of constructing LLMCallStartedEvent inline. The base helper already
introspects sampling params off self via getattr; the inline duplication was
accidental, not justified, and a duplication risk if anyone adds a tenth
OTel sampling param later.

- Extracted lib/crewai/llms/_finish_reason_utils.py:extract_choices_finish_reason_and_id
as the shared extractor for the choices-based response shape. OpenAI Chat,
Azure, and LiteLLM all read the same shape (response.id + choices[0].finish_reason)
as both object attrs and dict keys. Providers with genuinely different shapes
- Anthropic (stop_reason), Bedrock (stopReason), Gemini (protobuf enum),
OpenAI Responses (status) - keep their own provider-specific helpers.

- Dropped redundant try/except (AttributeError, TypeError) wrappers around
bare getattr(obj, "field", None) calls across the new extraction helpers.
getattr with a default already suppresses AttributeError, and the inner
isinstance / dict.get / int-coercion ops can't raise TypeError in practice.
Kept the catches that legitimately guard against IndexError (e.g. choices[0]
on an empty list).

Tests: 600 passed, 23 skipped, 14 pre-existing multimodal failures unchanged.
Added 12 parametrized tests for the shared helper covering object + dict
shapes, missing fields, non-string coercion, and never-raises invariants.

* chore(otel): drop dead last_chunk variable from async streaming

The streaming-fix commit (49e5581b5) replaced the post-loop
`_extract_finish_reason_and_response_id(last_chunk)` call with the
incrementally-tracked `stream_finish_reason` / `stream_response_id`,
which removed the only reader of `last_chunk` in
`_ahandle_streaming_response`. The declaration and per-iteration
assignment were left behind — harmless but confusing for future
readers because the sync sibling still legitimately uses `last_chunk`
(for usage and content fallbacks via `_handle_streaming_callbacks`).

The async path inlines its usage extraction directly inside the loop
(`chunk.model_extra.get("usage")`), so there's no fallback consumer.
Drop both lines.

Sync path untouched — `last_chunk` there is still load-bearing.

* fix(otel): coerce non-list stop_sequences to list[str] on LLMCallStartedEvent

Observed in Datadog: gen_ai.request.stop_sequences on a Gemini/Vertex
span surfaced the textproto repr of a google.protobuf.struct_pb2.ListValue
(values { string_value: "\nObservation:" }) instead of a real Sequence[str].

Root cause is upstream - a Vertex AI / Gemini code path stores the stop
list in a protobuf container (RepeatedScalarContainer or ListValue) rather
than a plain Python list. When that container reaches LLMCallStartedEvent
and then BaseLLM._emit_call_started_event hands it to the OTel SDK as a
span attribute, the SDK falls back to str(value) because the type isn't a
recognised Sequence[str] - producing the protobuf textproto string instead
of an array attribute.

* chore: fix ruff lint findings

* refactor(otel): declare sampling params on BaseLLM + honor stop overrides + dict chunk id

* fix: widen max_tokens to int | float | None + apply ruff format

* fix(otel): coerce unknown finish_reason / response_id to None instead of stringifying

* fix(otel): extract Azure stream finish_reason/id before usage-continue

Match the LiteLLM ordering so a finish_reason or response id riding on a
usage-carrying chunk isn't dropped by the early `continue`.

* fix(otel): report effective max_tokens cap + bedrock structured finish_reason

2026-06-05 07:23:38 -04:00

anthropic

Fix structured output leaks in tool-calling loops (#5897 )

2026-05-27 13:20:53 -07:00

azure