mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-07-02 21:58:11 +00:00
Some checks failed
CodeQL Advanced / Analyze (actions) (push) Has been cancelled
CodeQL Advanced / Analyze (python) (push) Has been cancelled
Vulnerability Scan / pip-audit (push) Has been cancelled
Build uv cache / build-cache (3.10) (push) Has been cancelled
Build uv cache / build-cache (3.11) (push) Has been cancelled
Build uv cache / build-cache (3.12) (push) Has been cancelled
Build uv cache / build-cache (3.13) (push) Has been cancelled
* feat(otel): surface real finish_reason + sampling params + response.id on LLM events
Companion to the OTel GenAI emitter compliance work in crewai-enterprise
(CON-172). Today the enterprise emitter reads these fields off the OSS
LLM events via `getattr(..., None)`, so it produces valid (but partial)
spans against the existing OSS surface. This change makes those fields
first-class on the events so spans can carry the real provider data.
What this adds:
- `LLMCallStartedEvent` gains the sampling-param fields the emitter needs
for `gen_ai.request.*`: `temperature`, `top_p`, `max_tokens`, `stream`,
`seed`, `stop_sequences`, `frequency_penalty`, `presence_penalty`, `n`.
All optional; existing call sites keep working.
- `BaseLLM._emit_call_started_event` introspects those values off `self`
(the LLM instance) via `getattr(..., None)` so every provider gets the
fields propagated for free without per-provider plumbing.
- `LLMCallCompletedEvent` gains `finish_reason: str | None` and
`response_id: str | None`. A field validator coerces any non-string
value (MagicMock, unexpected provider object) to None so the event
never raises on construction.
- `LLM._emit_call_completed_event` accepts both as kwargs.
- `LLM` (LiteLLM path) gets a defensive `_extract_finish_reason_and_response_id`
helper that handles both streaming (`StreamingChoices`) and non-streaming
(`Choices`) shapes and is wired into every completion-event emission site.
- Provider completions extract native values from their SDK responses and
pass them through:
- OpenAI: `_extract_responses_finish_reason_and_id` for Responses-API,
`_extract_finish_reason_and_id` for Chat-Completions.
- Anthropic: `_extract_finish_reason_and_id` (Messages API + streaming).
- Bedrock: `_extract_finish_reason_and_id` (`stopReason` from converse).
- Gemini: `_extract_finish_reason_and_id` (`finish_reason` from candidates).
- Azure: inherits via OpenAI sub-class; adds the helper for Azure-specific
response shapes.
- openai_compatible: inherits from OpenAICompletion, no edits needed.
Compatibility:
- All new fields are optional with sensible defaults. No existing call
sites need to change.
- The validator on `LLMCallCompletedEvent` swallows non-string values for
the new fields so legacy mocks / exotic provider types don't blow up
event construction.
- Enterprise side already reads these fields defensively, so OSS and
enterprise can merge independently and cut on the same synchronized
release.
Tested against the full LLM + events + provider test suite — all green;
the 14 pre-existing multimodal failures on main are unrelated and
reproduce without this diff.
* fix(bedrock): propagate finish_reason + response_id on async paths
The original commit covered every provider's sync path and Bedrock's
sync streaming path, but two Bedrock async paths still emitted
LLMCallCompletedEvent without finish_reason/response_id:
- _ahandle_converse: the final fallback emit_call_completed_event call
was missing both fields. Added stop_reason + response_id matching the
other emission sites in the same function.
- _ahandle_streaming_converse: response_id was never seeded from the
initial response object, and stream_finish_reason wasn't propagated
to the structured-output and final-text emissions. Now extracts
response_id up front and threads stream_finish_reason through every
completion event.
Adds a dedicated test file covering the new event fields end-to-end:
- LLMCallCompletedEvent.finish_reason / response_id Pydantic validation
(string accepted, None default, non-string coerced to None).
- LLMCallStartedEvent sampling params (all nine fields accepted, default
to None).
- BaseLLM._emit_call_started_event introspecting sampling params off
self, with explicit kwargs overriding.
- BaseLLM._emit_call_completed_event passing finish_reason/response_id
through to the event.
- LLM._extract_finish_reason_and_response_id across the LiteLLM shapes
(non-streaming response, streaming chunk, dict, missing fields,
non-string values, unexpected input).
* fix(otel): correct streaming finish_reason + bedrock response_id semantics
Two correctness fixes uncovered while landing the OTel finish_reason +
response_id plumbing:
- LiteLLM streaming (sync + async): `stream_options={"include_usage": True}`
causes LiteLLM to emit a final usage-only chunk with `choices=[]`. The
post-loop `_extract_finish_reason_and_response_id(last_chunk)` silently
returned `(None, None)` because the last chunk has no choices, even though
earlier chunks carried `finish_reason="stop"`. Track both fields
incrementally inside the loop (mirroring how OpenAI/Gemini/Azure already
handle their native streams) and use the tracked values for the
LLMCallCompletedEvent emission and the partial-response error path.
- Bedrock Converse: `ResponseMetadata.RequestId` is an AWS infra trace id,
not a model-level response id (semantically different from OpenAI's
`chatcmpl-XXX`). Return None for `response_id` rather than mislead
downstream telemetry consumers. The audit-fix's async propagation chain
still works — None propagates through unchanged.
Adds `test_llm_streaming_finish_reason.py` pinning both the sync and async
LiteLLM streaming paths against the include_usage chunk shape.
* refactor(otel): unify LLM event introspection + drop redundant defensive code
Three cohesion cleanups uncovered during PR review, all behavior-preserving:
- LLM.call / LLM.acall in llm.py now delegate to BaseLLM._emit_call_started_event
instead of constructing LLMCallStartedEvent inline. The base helper already
introspects sampling params off self via getattr; the inline duplication was
accidental, not justified, and a duplication risk if anyone adds a tenth
OTel sampling param later.
- Extracted lib/crewai/llms/_finish_reason_utils.py:extract_choices_finish_reason_and_id
as the shared extractor for the choices-based response shape. OpenAI Chat,
Azure, and LiteLLM all read the same shape (response.id + choices[0].finish_reason)
as both object attrs and dict keys. Providers with genuinely different shapes
- Anthropic (stop_reason), Bedrock (stopReason), Gemini (protobuf enum),
OpenAI Responses (status) - keep their own provider-specific helpers.
- Dropped redundant try/except (AttributeError, TypeError) wrappers around
bare getattr(obj, "field", None) calls across the new extraction helpers.
getattr with a default already suppresses AttributeError, and the inner
isinstance / dict.get / int-coercion ops can't raise TypeError in practice.
Kept the catches that legitimately guard against IndexError (e.g. choices[0]
on an empty list).
Tests: 600 passed, 23 skipped, 14 pre-existing multimodal failures unchanged.
Added 12 parametrized tests for the shared helper covering object + dict
shapes, missing fields, non-string coercion, and never-raises invariants.
* chore(otel): drop dead last_chunk variable from async streaming
The streaming-fix commit (49e5581b5) replaced the post-loop
`_extract_finish_reason_and_response_id(last_chunk)` call with the
incrementally-tracked `stream_finish_reason` / `stream_response_id`,
which removed the only reader of `last_chunk` in
`_ahandle_streaming_response`. The declaration and per-iteration
assignment were left behind — harmless but confusing for future
readers because the sync sibling still legitimately uses `last_chunk`
(for usage and content fallbacks via `_handle_streaming_callbacks`).
The async path inlines its usage extraction directly inside the loop
(`chunk.model_extra.get("usage")`), so there's no fallback consumer.
Drop both lines.
Sync path untouched — `last_chunk` there is still load-bearing.
* fix(otel): coerce non-list stop_sequences to list[str] on LLMCallStartedEvent
Observed in Datadog: gen_ai.request.stop_sequences on a Gemini/Vertex
span surfaced the textproto repr of a google.protobuf.struct_pb2.ListValue
(values { string_value: "\nObservation:" }) instead of a real Sequence[str].
Root cause is upstream - a Vertex AI / Gemini code path stores the stop
list in a protobuf container (RepeatedScalarContainer or ListValue) rather
than a plain Python list. When that container reaches LLMCallStartedEvent
and then BaseLLM._emit_call_started_event hands it to the OTel SDK as a
span attribute, the SDK falls back to str(value) because the type isn't a
recognised Sequence[str] - producing the protobuf textproto string instead
of an array attribute.
* chore: fix ruff lint findings
* refactor(otel): declare sampling params on BaseLLM + honor stop overrides + dict chunk id
* fix: widen max_tokens to int | float | None + apply ruff format
* fix(otel): coerce unknown finish_reason / response_id to None instead of stringifying
* fix(otel): extract Azure stream finish_reason/id before usage-continue
Match the LiteLLM ordering so a finish_reason or response id riding on a
usage-carrying chunk isn't dropped by the early `continue`.
* fix(otel): report effective max_tokens cap + bedrock structured finish_reason
97 lines
2.9 KiB
Python
97 lines
2.9 KiB
Python
"""Regression: LiteLLM emits a final usage-only chunk (choices=[]) when
|
|
``stream_options.include_usage`` is set. The old post-loop
|
|
``_extract_finish_reason_and_response_id(last_chunk)`` then silently returned
|
|
(None, None). These tests pin that we capture finish_reason/response_id
|
|
incrementally during the stream loop instead.
|
|
"""
|
|
from __future__ import annotations
|
|
|
|
from typing import Any
|
|
from unittest.mock import patch
|
|
|
|
import pytest
|
|
|
|
from crewai.events.event_bus import CrewAIEventsBus
|
|
from crewai.events.types.llm_events import LLMCallCompletedEvent
|
|
from crewai.llm import LLM
|
|
|
|
|
|
@pytest.fixture
|
|
def mock_emit():
|
|
with patch.object(CrewAIEventsBus, "emit") as mock:
|
|
yield mock
|
|
|
|
|
|
def _completed_event(mock_emit) -> LLMCallCompletedEvent:
|
|
matches = [
|
|
call.kwargs["event"]
|
|
for call in mock_emit.call_args_list
|
|
if isinstance(call.kwargs.get("event"), LLMCallCompletedEvent)
|
|
]
|
|
assert matches, "expected an LLMCallCompletedEvent to be emitted"
|
|
assert len(matches) == 1, f"expected one completed event, got {len(matches)}"
|
|
return matches[0]
|
|
|
|
|
|
def _chunks_with_usage_tail() -> list[dict[str, Any]]:
|
|
"""Three-chunk stream mirroring LiteLLM's include_usage behavior:
|
|
two content chunks where the second carries finish_reason="stop",
|
|
then a final usage-only chunk with choices=[]."""
|
|
return [
|
|
{
|
|
"id": "chatcmpl-stream-1",
|
|
"choices": [
|
|
{"delta": {"content": "hi"}, "finish_reason": None}
|
|
],
|
|
},
|
|
{
|
|
"id": "chatcmpl-stream-1",
|
|
"choices": [
|
|
{"delta": {"content": " there"}, "finish_reason": "stop"}
|
|
],
|
|
},
|
|
{
|
|
"id": "chatcmpl-stream-1",
|
|
"choices": [],
|
|
"usage": {
|
|
"prompt_tokens": 1,
|
|
"completion_tokens": 2,
|
|
"total_tokens": 3,
|
|
},
|
|
},
|
|
]
|
|
|
|
|
|
def test_sync_stream_emits_finish_reason_and_response_id_from_loop(mock_emit):
|
|
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=True)
|
|
|
|
with patch("crewai.llm.litellm.completion", return_value=iter(_chunks_with_usage_tail())):
|
|
result = llm.call("anything")
|
|
|
|
assert result == "hi there"
|
|
|
|
event = _completed_event(mock_emit)
|
|
assert event.finish_reason == "stop"
|
|
assert event.response_id == "chatcmpl-stream-1"
|
|
|
|
|
|
@pytest.mark.asyncio
|
|
async def test_async_stream_emits_finish_reason_and_response_id_from_loop(mock_emit):
|
|
llm = LLM(model="gpt-4o-mini", is_litellm=True, stream=True)
|
|
|
|
async def _aiter():
|
|
for chunk in _chunks_with_usage_tail():
|
|
yield chunk
|
|
|
|
async def _acompletion(*_args, **_kwargs):
|
|
return _aiter()
|
|
|
|
with patch("crewai.llm.litellm.acompletion", side_effect=_acompletion):
|
|
result = await llm.acall("anything")
|
|
|
|
assert result == "hi there"
|
|
|
|
event = _completed_event(mock_emit)
|
|
assert event.finish_reason == "stop"
|
|
assert event.response_id == "chatcmpl-stream-1"
|