Files
Lucas Gomide 887adafd2c fix: aggregate token usage across all LLM calls (#6122)
* feat: aggregate LLM token usage at the flow level

Introduces `flow.usage_metrics`, a snapshot of every LLMCallCompletedEvent
emitted under the flow's `current_flow_id` for the duration of one kickoff
(or resume) call. Aggregation happens on the singleton event bus so it
covers crews, direct `LLM.call`s, and nested listener calls — solving the
mismatch where the SDK reported only the last crew's usage while the
Enterprise UI showed the correct full total.

Co-authored-by: Cursor <cursoragent@cursor.com>

* refactor: centralize provider key normalization in UsageMetrics

Add UsageMetrics.from_provider_dict to normalize raw LLM usage dicts
across providers (LiteLLM, native Anthropic, native Gemini, OpenAI
nested cached). BaseLLM._track_token_usage_internal and the flow-level
aggregator now share this single source of truth, so `flow.usage_metrics`
agrees with per-LLM totals on every provider — including the native
Anthropic path that emits `input_tokens`/`output_tokens` instead of
`prompt_tokens`/`completion_tokens`.

* fix: flush event bus before reading aggregated usage_metrics

`crewai_event_bus.emit` dispatches LLMCallCompletedEvent handlers on a
ThreadPoolExecutor (fire-and-forget), so a flow whose last LLM call
completes right before kickoff_async/resume_async returns can detach
the usage listener while that handler is still queued, leaving its
tokens off `flow.usage_metrics`. Match `Crew.kickoff()` and call
`crewai_event_bus.flush()` in both finally blocks so every handler
drains before the listener is detached.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
2026-06-12 12:55:22 -04:00
..
2026-06-10 14:34:30 -04:00