Two correctness fixes uncovered while landing the OTel finish_reason +
response_id plumbing:
- LiteLLM streaming (sync + async): `stream_options={"include_usage": True}`
causes LiteLLM to emit a final usage-only chunk with `choices=[]`. The
post-loop `_extract_finish_reason_and_response_id(last_chunk)` silently
returned `(None, None)` because the last chunk has no choices, even though
earlier chunks carried `finish_reason="stop"`. Track both fields
incrementally inside the loop (mirroring how OpenAI/Gemini/Azure already
handle their native streams) and use the tracked values for the
LLMCallCompletedEvent emission and the partial-response error path.
- Bedrock Converse: `ResponseMetadata.RequestId` is an AWS infra trace id,
not a model-level response id (semantically different from OpenAI's
`chatcmpl-XXX`). Return None for `response_id` rather than mislead
downstream telemetry consumers. The audit-fix's async propagation chain
still works — None propagates through unchanged.
Adds `test_llm_streaming_finish_reason.py` pinning both the sync and async
LiteLLM streaming paths against the include_usage chunk shape.
The original commit covered every provider's sync path and Bedrock's
sync streaming path, but two Bedrock async paths still emitted
LLMCallCompletedEvent without finish_reason/response_id:
- _ahandle_converse: the final fallback emit_call_completed_event call
was missing both fields. Added stop_reason + response_id matching the
other emission sites in the same function.
- _ahandle_streaming_converse: response_id was never seeded from the
initial response object, and stream_finish_reason wasn't propagated
to the structured-output and final-text emissions. Now extracts
response_id up front and threads stream_finish_reason through every
completion event.
Adds a dedicated test file covering the new event fields end-to-end:
- LLMCallCompletedEvent.finish_reason / response_id Pydantic validation
(string accepted, None default, non-string coerced to None).
- LLMCallStartedEvent sampling params (all nine fields accepted, default
to None).
- BaseLLM._emit_call_started_event introspecting sampling params off
self, with explicit kwargs overriding.
- BaseLLM._emit_call_completed_event passing finish_reason/response_id
through to the event.
- LLM._extract_finish_reason_and_response_id across the LiteLLM shapes
(non-streaming response, streaming chunk, dict, missing fields,
non-string values, unexpected input).
Companion to the OTel GenAI emitter compliance work in crewai-enterprise
(CON-172). Today the enterprise emitter reads these fields off the OSS
LLM events via `getattr(..., None)`, so it produces valid (but partial)
spans against the existing OSS surface. This change makes those fields
first-class on the events so spans can carry the real provider data.
What this adds:
- `LLMCallStartedEvent` gains the sampling-param fields the emitter needs
for `gen_ai.request.*`: `temperature`, `top_p`, `max_tokens`, `stream`,
`seed`, `stop_sequences`, `frequency_penalty`, `presence_penalty`, `n`.
All optional; existing call sites keep working.
- `BaseLLM._emit_call_started_event` introspects those values off `self`
(the LLM instance) via `getattr(..., None)` so every provider gets the
fields propagated for free without per-provider plumbing.
- `LLMCallCompletedEvent` gains `finish_reason: str | None` and
`response_id: str | None`. A field validator coerces any non-string
value (MagicMock, unexpected provider object) to None so the event
never raises on construction.
- `LLM._emit_call_completed_event` accepts both as kwargs.
- `LLM` (LiteLLM path) gets a defensive `_extract_finish_reason_and_response_id`
helper that handles both streaming (`StreamingChoices`) and non-streaming
(`Choices`) shapes and is wired into every completion-event emission site.
- Provider completions extract native values from their SDK responses and
pass them through:
- OpenAI: `_extract_responses_finish_reason_and_id` for Responses-API,
`_extract_finish_reason_and_id` for Chat-Completions.
- Anthropic: `_extract_finish_reason_and_id` (Messages API + streaming).
- Bedrock: `_extract_finish_reason_and_id` (`stopReason` from converse).
- Gemini: `_extract_finish_reason_and_id` (`finish_reason` from candidates).
- Azure: inherits via OpenAI sub-class; adds the helper for Azure-specific
response shapes.
- openai_compatible: inherits from OpenAICompletion, no edits needed.
Compatibility:
- All new fields are optional with sensible defaults. No existing call
sites need to change.
- The validator on `LLMCallCompletedEvent` swallows non-string values for
the new fields so legacy mocks / exotic provider types don't blow up
event construction.
- Enterprise side already reads these fields defensively, so OSS and
enterprise can merge independently and cut on the same synchronized
release.
Tested against the full LLM + events + provider test suite — all green;
the 14 pre-existing multimodal failures on main are unrelated and
reproduce without this diff.
* feat: enhance StdioTransport to prevent environment variable leakage
- Replaced os.environ.copy() with get_default_environment() to ensure only allowed environment variables are passed to the MCP server.
- Added tests to verify that ambient environment variables do not leak and that user-supplied environment variables can override defaults.
* feat: add environment variable filtering hook to StdioTransport
- Introduced an optional `_env_filter_hook` to allow extensions to modify the environment variables passed to MCP servers, enabling features like credential stripping.
- Updated tests to ensure the filtering hook is applied correctly after merging user-supplied and default environment variables.
* Fix structured output leaks in tool-calling loops
* addressing comments
* drop scripts
* Update Gemini agent tests to include structured output with thoughts and bump model version to 2.5-flash
* merge
* Update Anthropic test cases to use new model and tool structure
- Changed the model from "claude-3-5-haiku-20241022" to "claude-sonnet-4-6" in the test setup.
- Updated the request and response formats in the YAML test cassette to reflect the new tool structure and improved content formatting.
- Adjusted the expected response body to match the new output format from the assistant, including changes in tool usage and response details.
- Increased rate limit values in the response headers for better testing scenarios.
* adjusted bedrock cassettes
* adjusting cassettes for bedrock
* fix test
* Update VCR configuration to use 'host' instead of 'bedrock_host' for request matching
* feat(planning): enhance planning configuration and observation handling
- Introduced attribute in to control LLM calls after each step.
- Updated to set default to 1 when planning is enabled without explicit config.
- Modified to support heuristic observations when LLM calls are disabled.
- Adjusted to respect and settings for step observations.
- Added tests to verify behavior of new configurations and ensure correct observation handling across different reasoning efforts.
* fix(agent_executor): update handling of failed steps in low effort mode
- Adjusted logic to ensure that failed steps are recorded without marking them as completed when using low reasoning effort.
- Introduced feedback for failed steps, allowing the process to continue while tracking failures.
- Added a test to verify that failed steps are correctly marked without triggering a replan.
- And linted
* linted
Every agent kickoff calls _use_trained_data, which calls
CrewTrainingHandler(...).load(). Since #4827 wrapped load() in store_lock,
that means every kickoff acquires the cross-process (Redis-backed when
REDIS_URL is set) lock even on deployments that never train and have no
trained-agents file on disk.
Move the missing/empty-file short-circuit above store_lock so the lock is
only acquired when there is actually a file to read. save() and the real
read remain locked.
- callable_to_string returns None for lambdas/closures instead of an
unresolvable dotted path; Crew filters Nones out of restored callback
lists.
- EventNode.event serializer honors info.mode so mode='json' calls cascade
properly into nested event payloads.
- RagTool.adapter serializes to None (post-validator rebuilds from
config); concrete adapters hold runtime state that can't be round-tripped.
Move scope restoration from Crew-level global push to a per-task push
inside Task via resume_task_scope() in event_context. Fixes orphan
task_started warning, hierarchical resume (manager_agent now eligible
for _resuming), and parallel async resume (each contextvars copy owns
its own scope). Tests added.
llm and prompt were declared required with exclude=True, making the
model un-restorable from its own serialized output. Mirror the
CrewAgentExecutor pattern: make them nullable with default None, keep
exclude=True, and re-attach llm on the resume path alongside the other
re-attached fields. Guard the two prompt-deref sites so the runtime
invariant survives the looser type.
* feat: add Skills Repository — registry, cache, CLI, and SDK integration
Adds a Skills Repository feature allowing users to publish, install,
and use skills from the CrewAI registry with @org/skill-name refs.
## What's New
### SDK (lib/crewai/)
- SkillFrontmatter: added optional 'version' field (backward compatible)
- SkillCacheManager: manages ~/.crewai/skills/{org}/{name}/ with
.crewai_meta.json tracking, path-traversal-safe tar extraction
- SkillRegistry: parse @org/skill-name refs, local-first resolution
(./skills/ > cache > download), interactive prompt on first use,
CI-mode guard (CREWAI_NONINTERACTIVE/CI env vars)
- Agent.skills and Crew.skills widened to accept str refs (@org/name)
- set_skills() resolves registry refs with org-prefixed dedup keys
- New events: SkillDownloadStartedEvent, SkillDownloadCompletedEvent
### CLI (lib/cli/)
- crewai skill create <name> — context-aware (project vs standalone)
- crewai skill install @org/name — downloads to ./skills/ or cache
- crewai skill publish — ZIP + upload to org registry
- crewai skill list — show installed skills
### PlusAPI (lib/crewai-core/)
- Added SKILLS_RESOURCE, get_skill(), publish_skill(), list_skills()
### Scaffolding
- crew and flow templates now include skills/ directory
### Tests
- 91 SDK skill tests + 15 CLI skill tests, all passing
* fix: address all CI failures and CodeRabbit review comments
Lint:
- Remove unused imports (click, pytest, json)
- Replace try-except-pass with logging (S110)
- Fix unprotected zipfile.extractall (S202)
Security:
- Path traversal: startswith → is_relative_to for tar extraction
- Add path traversal protection to ZIP extraction via _safe_extract_zip
- Both cache.py and CLI main.py hardened
Type checker:
- Fix import path: crewai.events.event_bus (not crewai_event_bus)
- Remove unused type: ignore comments
- Fix type mismatches in set_skills() variable types
Code quality:
- Fix f-string interpolation in SkillNotCachedError
- Use ValidationError instead of Exception in test
* style: ruff format + autofix remaining lint errors
* refactor: reuse SDK parser and SkillCacheManager in CLI
- _parse_frontmatter() now delegates to crewai.skills.parser.parse_frontmatter
when available, with a minimal fallback for CLI-only installs
- install() global cache path now reuses SkillCacheManager.store() instead
of duplicating metadata writing logic
* refactor: add _print_current_organization to SkillCommand (matches ToolCommand pattern)
* fix: write .crewai_meta.json in fallback install path
CodeRabbit caught that the ImportError fallback in install() didn't write
cache metadata, making skills invisible to 'crewai skill list'.
* fix: tighten @org/name ref validation to prevent path traversal
Reject refs with multiple slashes (@org/a/b), dot segments (@../skill),
or leading dots in org/name. Applied to both CLI install() and SDK
parse_registry_ref() so the contract is enforced consistently.
* fix: update test assertions to match tightened error messages
* fix: align OSS client with AMP API contract
- download_skill(): fetch download_url (presigned URL) instead of
expecting inline base64. Falls back to 'file' field for compat.
- Read 'latest_version' field, fall back to 'version'
- Same fixes applied to CLI install() command
* fix: publish as tar.gz (matches AMP content_type validation) + add zip fallback to SDK cache
CLI publish:
- _build_skill_zip → _build_skill_tarball (tar.gz format)
- Content type: application/x-gzip (matches SkillVersion validation)
SDK cache:
- store() now tries tar.gz first, falls back to zip extraction
- Added _safe_extract_zip for path-traversal-safe zip handling
- Both formats work for download/install regardless of server format
---------
Co-authored-by: João Moura <joaomdmoura@gmail.com>
Adds typed containers for wire payloads, literal aliases for HTTP method
and log type, and Ffnal markers on resource constants. Updates
upstream returns in project_utils.py and deploy/main.py to match
the new contracts.
In `_execute_task_with_a2a` and its async variant, the try body
sets `task.output_pydantic = None` before returning an A2A
response. The finally block then checks
`if task.output_pydantic is not None` before restoring the
original value — but since it was just set to None, the condition
is always False and the original value is never restored. This
permanently mutates the Task object.
Remove the guard so `output_pydantic` is unconditionally restored,
matching the unconditional restoration of `description` and
`response_model` in the same block.
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
When a tool with result_as_answer=True raises an exception, the agent
was receiving result_as_answer=True and returning the error string as
the final answer. Now we set result_as_answer=False when an error event
is emitted, allowing the agent to reflect and retry.
FixescrewAIInc/crewAI#5156
---------
Co-authored-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
## Summary
- Reverts `b0e2fda` ("fix(flow): add execution_id separate from state.id", COR-48): removes `Flow.execution_id` and points `current_flow_id` / `current_flow_request_id` back at `flow_id` (i.e. `state.id`). The separate per-run tracking id was no longer the right abstraction once `restore_from_state_id` reshapes how `state.id` is assigned;
- Adds an optional `restore_from_state_id` kwarg to `Flow.kickoff` / `Flow.kickoff_async` that hydrates state from a previously-persisted flow's latest snapshot
- Reassigns `state.id` to a fresh value (or `inputs["id"]` if pinned) so the new run's `@persist` writes don't extend the source's history
- Existing `inputs["id"]` resume, `@persist`, and `from_checkpoint` paths are unchanged
## Problem
`@persist` only supports *resume* today: `kickoff(inputs={"id": <uuid>})` hydrates state and continues writing under the same `flow_uuid`. There's no way to **fork** — hydrate from a snapshot but persist under a separate key, leaving the source's history intact. This PR adds that.
| | `state.id` after kickoff | `@persist` writes land under |
|---|---|---|
| `inputs["id"]` (resume) | supplied id | supplied id (extends history) |
| `restore_from_state_id` (fork) | fresh id, or `inputs["id"]` if pinned | new id (source preserved) |
## Behavior
| `inputs.id` | `restore_from_state_id` | Effect |
|---|---|---|
| — | — | Fresh kickoff |
| set | — | Existing resume |
| — | UUID | Fork — new `state.id`, hydrated from source |
| set | UUID | Fork into a pinned `state.id`, hydrated from source |
- Source not found → silent fallback (mirrors existing resume)
- Both `from_checkpoint` and `restore_from_state_id` set → `ValueError`
- `restore_from_state_id=None` → byte-identical to current main
## Design
Fork hydration runs before the existing `inputs` block in `kickoff_async`. On a hit, it calls the same `_restore_state` primitive used by resume, then overwrites `state.id` with a fresh UUID (or `inputs["id"]`). A `fork_succeeded` flag gates the existing `inputs["id"]` path so we don't double-load. `_completed_methods` / `_is_execution_resuming` are intentionally untouched — skip-completed-methods remains the territory of `apply_checkpoint` and `from_pending`.
## Test plan
- [ ] `pytest tests/test_flow_persistence.py` — 5 new tests (four-row matrix, not-found fallback, default no-op, conflict raise) + 6 existing as regression
- [ ] `pytest tests/test_flow.py` — broader flow suite
- [ ] Manual end-to-end against an HITL `@persist` flow
* feat(crewai-tools): add highlights to ExaSearchTool, rename from EXASearchTool
- Add a highlights init param so agents can get token-efficient excerpts instead of full pages
- Rename EXASearchTool to ExaSearchTool; keep EXASearchTool as a deprecated alias so existing imports keep working
- Update the docs and example to use highlights as the recommended option
- Add a small note that says Exa is the fastest and most accurate web search API
- Add tests for the new highlights param and the deprecation alias
* fix(crewai-tools): import order and module-level Exa for tests
- Reorder std-lib imports so ruff is happy with force-sort-within-sections.
- Import Exa at module level (with a fallback) so the existing test mocks resolve.
The lazy install prompt still works if exa_py is missing.
- Allow content and summary to be a dict, matching highlights.
- Trim test file to the cases this PR introduces (highlights param and the
EXASearchTool deprecation alias). Existing init-shape tests stay.
Co-Authored-By: ishan <ishan@exa.ai>
* chore(crewai-tools): drop self-explanatory comment on schema alias
Co-Authored-By: ishan <ishan@exa.ai>
* docs(crewai-tools): default highlights to True, drop summary from examples
Co-Authored-By: ishan <ishan@exa.ai>
* docs(crewai-tools): simplify highlights examples to highlights=True
Co-Authored-By: ishan <ishan@exa.ai>
* feat(crewai-tools): add x-exa-integration header for usage tracking
Co-Authored-By: ishan <ishan@exa.ai>
* docs(crewai-tools): add Exa MCP section and resources links
Co-Authored-By: ishan <ishan@exa.ai>
---------
Co-authored-by: ishan <ishan@exa.ai>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
* feat(azure): forward credential_scopes to Azure AI Inference client
Adds a credential_scopes field to the native Azure AI Inference
provider and a matching AZURE_CREDENTIAL_SCOPES env var
(comma-separated). The value is forwarded to ChatCompletionsClient /
AsyncChatCompletionsClient when set, letting keyless / Entra-based
callers target a specific Azure AD audience (e.g.
https://cognitiveservices.azure.com/.default) without subclassing the
provider. Matches the upstream azure.ai.inference SDK kwarg of the
same name.
Lazy build re-reads the env var so an LLM constructed at module
import (before deployment env vars are set) still picks up scopes —
same pattern as the existing AZURE_API_KEY / AZURE_ENDPOINT lazy
reads. to_config_dict round-trips the field.
* refactor(azure): tighten credential_scopes env handling
Address review feedback:
- Move os.getenv into the helper so AZURE_CREDENTIAL_SCOPES appears once
- Match the surrounding api_key/endpoint `or` style in the validator
- Drop the list() defensive copy in to_config_dict — every other field
in that method (and the base class's `stop`) is assigned by reference