Two correctness fixes uncovered while landing the OTel finish_reason +
response_id plumbing:
- LiteLLM streaming (sync + async): `stream_options={"include_usage": True}`
causes LiteLLM to emit a final usage-only chunk with `choices=[]`. The
post-loop `_extract_finish_reason_and_response_id(last_chunk)` silently
returned `(None, None)` because the last chunk has no choices, even though
earlier chunks carried `finish_reason="stop"`. Track both fields
incrementally inside the loop (mirroring how OpenAI/Gemini/Azure already
handle their native streams) and use the tracked values for the
LLMCallCompletedEvent emission and the partial-response error path.
- Bedrock Converse: `ResponseMetadata.RequestId` is an AWS infra trace id,
not a model-level response id (semantically different from OpenAI's
`chatcmpl-XXX`). Return None for `response_id` rather than mislead
downstream telemetry consumers. The audit-fix's async propagation chain
still works — None propagates through unchanged.
Adds `test_llm_streaming_finish_reason.py` pinning both the sync and async
LiteLLM streaming paths against the include_usage chunk shape.
The original commit covered every provider's sync path and Bedrock's
sync streaming path, but two Bedrock async paths still emitted
LLMCallCompletedEvent without finish_reason/response_id:
- _ahandle_converse: the final fallback emit_call_completed_event call
was missing both fields. Added stop_reason + response_id matching the
other emission sites in the same function.
- _ahandle_streaming_converse: response_id was never seeded from the
initial response object, and stream_finish_reason wasn't propagated
to the structured-output and final-text emissions. Now extracts
response_id up front and threads stream_finish_reason through every
completion event.
Adds a dedicated test file covering the new event fields end-to-end:
- LLMCallCompletedEvent.finish_reason / response_id Pydantic validation
(string accepted, None default, non-string coerced to None).
- LLMCallStartedEvent sampling params (all nine fields accepted, default
to None).
- BaseLLM._emit_call_started_event introspecting sampling params off
self, with explicit kwargs overriding.
- BaseLLM._emit_call_completed_event passing finish_reason/response_id
through to the event.
- LLM._extract_finish_reason_and_response_id across the LiteLLM shapes
(non-streaming response, streaming chunk, dict, missing fields,
non-string values, unexpected input).
* feat: enhance StdioTransport to prevent environment variable leakage
- Replaced os.environ.copy() with get_default_environment() to ensure only allowed environment variables are passed to the MCP server.
- Added tests to verify that ambient environment variables do not leak and that user-supplied environment variables can override defaults.
* feat: add environment variable filtering hook to StdioTransport
- Introduced an optional `_env_filter_hook` to allow extensions to modify the environment variables passed to MCP servers, enabling features like credential stripping.
- Updated tests to ensure the filtering hook is applied correctly after merging user-supplied and default environment variables.
* Fix structured output leaks in tool-calling loops
* addressing comments
* drop scripts
* Update Gemini agent tests to include structured output with thoughts and bump model version to 2.5-flash
* merge
* Update Anthropic test cases to use new model and tool structure
- Changed the model from "claude-3-5-haiku-20241022" to "claude-sonnet-4-6" in the test setup.
- Updated the request and response formats in the YAML test cassette to reflect the new tool structure and improved content formatting.
- Adjusted the expected response body to match the new output format from the assistant, including changes in tool usage and response details.
- Increased rate limit values in the response headers for better testing scenarios.
* adjusted bedrock cassettes
* adjusting cassettes for bedrock
* fix test
* Update VCR configuration to use 'host' instead of 'bedrock_host' for request matching
* feat(planning): enhance planning configuration and observation handling
- Introduced attribute in to control LLM calls after each step.
- Updated to set default to 1 when planning is enabled without explicit config.
- Modified to support heuristic observations when LLM calls are disabled.
- Adjusted to respect and settings for step observations.
- Added tests to verify behavior of new configurations and ensure correct observation handling across different reasoning efforts.
* fix(agent_executor): update handling of failed steps in low effort mode
- Adjusted logic to ensure that failed steps are recorded without marking them as completed when using low reasoning effort.
- Introduced feedback for failed steps, allowing the process to continue while tracking failures.
- Added a test to verify that failed steps are correctly marked without triggering a replan.
- And linted
* linted
Every agent kickoff calls _use_trained_data, which calls
CrewTrainingHandler(...).load(). Since #4827 wrapped load() in store_lock,
that means every kickoff acquires the cross-process (Redis-backed when
REDIS_URL is set) lock even on deployments that never train and have no
trained-agents file on disk.
Move the missing/empty-file short-circuit above store_lock so the lock is
only acquired when there is actually a file to read. save() and the real
read remain locked.
- callable_to_string returns None for lambdas/closures instead of an
unresolvable dotted path; Crew filters Nones out of restored callback
lists.
- EventNode.event serializer honors info.mode so mode='json' calls cascade
properly into nested event payloads.
- RagTool.adapter serializes to None (post-validator rebuilds from
config); concrete adapters hold runtime state that can't be round-tripped.
Move scope restoration from Crew-level global push to a per-task push
inside Task via resume_task_scope() in event_context. Fixes orphan
task_started warning, hierarchical resume (manager_agent now eligible
for _resuming), and parallel async resume (each contextvars copy owns
its own scope). Tests added.
* feat: add Skills Repository — registry, cache, CLI, and SDK integration
Adds a Skills Repository feature allowing users to publish, install,
and use skills from the CrewAI registry with @org/skill-name refs.
## What's New
### SDK (lib/crewai/)
- SkillFrontmatter: added optional 'version' field (backward compatible)
- SkillCacheManager: manages ~/.crewai/skills/{org}/{name}/ with
.crewai_meta.json tracking, path-traversal-safe tar extraction
- SkillRegistry: parse @org/skill-name refs, local-first resolution
(./skills/ > cache > download), interactive prompt on first use,
CI-mode guard (CREWAI_NONINTERACTIVE/CI env vars)
- Agent.skills and Crew.skills widened to accept str refs (@org/name)
- set_skills() resolves registry refs with org-prefixed dedup keys
- New events: SkillDownloadStartedEvent, SkillDownloadCompletedEvent
### CLI (lib/cli/)
- crewai skill create <name> — context-aware (project vs standalone)
- crewai skill install @org/name — downloads to ./skills/ or cache
- crewai skill publish — ZIP + upload to org registry
- crewai skill list — show installed skills
### PlusAPI (lib/crewai-core/)
- Added SKILLS_RESOURCE, get_skill(), publish_skill(), list_skills()
### Scaffolding
- crew and flow templates now include skills/ directory
### Tests
- 91 SDK skill tests + 15 CLI skill tests, all passing
* fix: address all CI failures and CodeRabbit review comments
Lint:
- Remove unused imports (click, pytest, json)
- Replace try-except-pass with logging (S110)
- Fix unprotected zipfile.extractall (S202)
Security:
- Path traversal: startswith → is_relative_to for tar extraction
- Add path traversal protection to ZIP extraction via _safe_extract_zip
- Both cache.py and CLI main.py hardened
Type checker:
- Fix import path: crewai.events.event_bus (not crewai_event_bus)
- Remove unused type: ignore comments
- Fix type mismatches in set_skills() variable types
Code quality:
- Fix f-string interpolation in SkillNotCachedError
- Use ValidationError instead of Exception in test
* style: ruff format + autofix remaining lint errors
* refactor: reuse SDK parser and SkillCacheManager in CLI
- _parse_frontmatter() now delegates to crewai.skills.parser.parse_frontmatter
when available, with a minimal fallback for CLI-only installs
- install() global cache path now reuses SkillCacheManager.store() instead
of duplicating metadata writing logic
* refactor: add _print_current_organization to SkillCommand (matches ToolCommand pattern)
* fix: write .crewai_meta.json in fallback install path
CodeRabbit caught that the ImportError fallback in install() didn't write
cache metadata, making skills invisible to 'crewai skill list'.
* fix: tighten @org/name ref validation to prevent path traversal
Reject refs with multiple slashes (@org/a/b), dot segments (@../skill),
or leading dots in org/name. Applied to both CLI install() and SDK
parse_registry_ref() so the contract is enforced consistently.
* fix: update test assertions to match tightened error messages
* fix: align OSS client with AMP API contract
- download_skill(): fetch download_url (presigned URL) instead of
expecting inline base64. Falls back to 'file' field for compat.
- Read 'latest_version' field, fall back to 'version'
- Same fixes applied to CLI install() command
* fix: publish as tar.gz (matches AMP content_type validation) + add zip fallback to SDK cache
CLI publish:
- _build_skill_zip → _build_skill_tarball (tar.gz format)
- Content type: application/x-gzip (matches SkillVersion validation)
SDK cache:
- store() now tries tar.gz first, falls back to zip extraction
- Added _safe_extract_zip for path-traversal-safe zip handling
- Both formats work for download/install regardless of server format
---------
Co-authored-by: João Moura <joaomdmoura@gmail.com>
When a tool with result_as_answer=True raises an exception, the agent
was receiving result_as_answer=True and returning the error string as
the final answer. Now we set result_as_answer=False when an error event
is emitted, allowing the agent to reflect and retry.
FixescrewAIInc/crewAI#5156
---------
Co-authored-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
## Summary
- Reverts `b0e2fda` ("fix(flow): add execution_id separate from state.id", COR-48): removes `Flow.execution_id` and points `current_flow_id` / `current_flow_request_id` back at `flow_id` (i.e. `state.id`). The separate per-run tracking id was no longer the right abstraction once `restore_from_state_id` reshapes how `state.id` is assigned;
- Adds an optional `restore_from_state_id` kwarg to `Flow.kickoff` / `Flow.kickoff_async` that hydrates state from a previously-persisted flow's latest snapshot
- Reassigns `state.id` to a fresh value (or `inputs["id"]` if pinned) so the new run's `@persist` writes don't extend the source's history
- Existing `inputs["id"]` resume, `@persist`, and `from_checkpoint` paths are unchanged
## Problem
`@persist` only supports *resume* today: `kickoff(inputs={"id": <uuid>})` hydrates state and continues writing under the same `flow_uuid`. There's no way to **fork** — hydrate from a snapshot but persist under a separate key, leaving the source's history intact. This PR adds that.
| | `state.id` after kickoff | `@persist` writes land under |
|---|---|---|
| `inputs["id"]` (resume) | supplied id | supplied id (extends history) |
| `restore_from_state_id` (fork) | fresh id, or `inputs["id"]` if pinned | new id (source preserved) |
## Behavior
| `inputs.id` | `restore_from_state_id` | Effect |
|---|---|---|
| — | — | Fresh kickoff |
| set | — | Existing resume |
| — | UUID | Fork — new `state.id`, hydrated from source |
| set | UUID | Fork into a pinned `state.id`, hydrated from source |
- Source not found → silent fallback (mirrors existing resume)
- Both `from_checkpoint` and `restore_from_state_id` set → `ValueError`
- `restore_from_state_id=None` → byte-identical to current main
## Design
Fork hydration runs before the existing `inputs` block in `kickoff_async`. On a hit, it calls the same `_restore_state` primitive used by resume, then overwrites `state.id` with a fresh UUID (or `inputs["id"]`). A `fork_succeeded` flag gates the existing `inputs["id"]` path so we don't double-load. `_completed_methods` / `_is_execution_resuming` are intentionally untouched — skip-completed-methods remains the territory of `apply_checkpoint` and `from_pending`.
## Test plan
- [ ] `pytest tests/test_flow_persistence.py` — 5 new tests (four-row matrix, not-found fallback, default no-op, conflict raise) + 6 existing as regression
- [ ] `pytest tests/test_flow.py` — broader flow suite
- [ ] Manual end-to-end against an HITL `@persist` flow
* feat(azure): forward credential_scopes to Azure AI Inference client
Adds a credential_scopes field to the native Azure AI Inference
provider and a matching AZURE_CREDENTIAL_SCOPES env var
(comma-separated). The value is forwarded to ChatCompletionsClient /
AsyncChatCompletionsClient when set, letting keyless / Entra-based
callers target a specific Azure AD audience (e.g.
https://cognitiveservices.azure.com/.default) without subclassing the
provider. Matches the upstream azure.ai.inference SDK kwarg of the
same name.
Lazy build re-reads the env var so an LLM constructed at module
import (before deployment env vars are set) still picks up scopes —
same pattern as the existing AZURE_API_KEY / AZURE_ENDPOINT lazy
reads. to_config_dict round-trips the field.
* refactor(azure): tighten credential_scopes env handling
Address review feedback:
- Move os.getenv into the helper so AZURE_CREDENTIAL_SCOPES appears once
- Match the surrounding api_key/endpoint `or` style in the validator
- Drop the list() defensive copy in to_config_dict — every other field
in that method (and the base class's `stop`) is assigned by reference
* feat(flow): add optional key param to @persist decorator
Allows users to specify which state attribute to use as the
persistence key instead of always defaulting to state.id.
Usage: @persist(key='conversation_id')
Falls back to state.id when key is not provided (no breaking change).
Raises ValueError if the specified key is missing or falsy on state.
* docs(flow): document @persist key parameter for custom persistence keys
* fix(flow): use explicit None check for persist key to avoid empty-string fallback
---------
Co-authored-by: iris-clawd <iris-clawd@anthropic.com>
Co-authored-by: iris-clawd <iris@crewai.com>
Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
CrewAgentExecutor is reused across sequential tasks but invoke/ainvoke
only appended to self.messages and never reset self.iterations, so
task 2 inherited task 1's history and iteration count.
* fix(flow): add execution_id separate from state.id (COR-48)
When a consumer passes `id` in `kickoff(inputs=...)`, that value
overwrites the flow's state.id — which was also being used as the
execution tracking identity for telemetry, tracing, and external
correlation. Two kickoffs sharing the same consumer id ended up
with the same tracking id, breaking any downstream system that
joins on it.
Introduces `Flow.execution_id`: a stable per-run identifier stored
as a `PrivateAttr` on the `Flow` model, exposed via property +
setter. It defaults to a fresh `uuid4` per instance, is never
touched by `inputs["id"]`, and can be assigned by outer systems
that already have an execution identity (e.g. a task id).
Switches the `current_flow_id` / `current_flow_request_id`
ContextVars to seed from `execution_id` so OTel spans emitted by
`FlowTrackable` children correlate on the stable tracking key.
`state.id` keeps its existing override semantics for
persistence/restore — consumers resuming a persisted flow via
`inputs["id"]` work exactly as before.
Adds tests covering default uniqueness per instance, immunity to
consumer `inputs["id"]`, context-var propagation, absence from
serialized state, and parity for dict-state flows.
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
Enables keyless Azure auth (OIDC Workload Identity Federation, Managed
Identity, Azure CLI, env-configured Service Principal) without any
crewAI-specific configuration. Customers whose deployment environment
already sets the standard azure-identity env vars get keyless auth for
free; the existing API-key path is unchanged.
Linear: FAC-40
* fix: merge execution metadata on duplicate batch initialization in TraceBatchManager
- Updated TraceBatchManager to merge execution metadata when a batch is initialized multiple times.
- Enhanced logging to reflect the merging of metadata during duplicate initialization.
- Added a test case to verify that execution metadata is correctly merged when initializing a batch after a lazy action.
* drop env events emitting from traces listener
Add fork classmethod, _restore_runtime, and _restore_event_scope
to BaseAgent. Fix from_checkpoint to set runtime state on the
event bus and restore event scopes. Store kickoff event ID across
checkpoints to skip re-emission on resume. Handle agent entity
type in checkpoint CLI and TUI.
The test_older_than tests in both JSON and SQLite prune suites used
hardcoded 2026-04-17 timestamps for the 'new' checkpoint. Once that
date passes, the checkpoint is older than 1 day and gets pruned along
with the 'old' one, causing assert count >= 1 to fail (count=0).
Use 2099-01-01 for the 'new' checkpoint so tests remain stable.
Co-authored-by: Joao Moura <joaomdmoura@gmail.com>
Add three new CLI subcommands to improve checkpoint UX:
- `crewai checkpoint resume [id]` skips the TUI and resumes from the
latest or specified checkpoint directly
- `crewai checkpoint diff <id1> <id2>` compares two checkpoints showing
changes in metadata, inputs, task status, and outputs
- `crewai checkpoint prune --keep N --older-than Xd` removes old
checkpoints from JSON dirs or SQLite databases
Also writes a resume hint to stderr after every checkpoint save so
users discover the command without needing to know it exists.