Two correctness fixes uncovered while landing the OTel finish_reason +
response_id plumbing:
- LiteLLM streaming (sync + async): `stream_options={"include_usage": True}`
causes LiteLLM to emit a final usage-only chunk with `choices=[]`. The
post-loop `_extract_finish_reason_and_response_id(last_chunk)` silently
returned `(None, None)` because the last chunk has no choices, even though
earlier chunks carried `finish_reason="stop"`. Track both fields
incrementally inside the loop (mirroring how OpenAI/Gemini/Azure already
handle their native streams) and use the tracked values for the
LLMCallCompletedEvent emission and the partial-response error path.
- Bedrock Converse: `ResponseMetadata.RequestId` is an AWS infra trace id,
not a model-level response id (semantically different from OpenAI's
`chatcmpl-XXX`). Return None for `response_id` rather than mislead
downstream telemetry consumers. The audit-fix's async propagation chain
still works — None propagates through unchanged.
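The incremental tracking described above can be sketched as follows. This is a minimal illustration, not the actual LiteLLM path: `Chunk`, `Choice`, and `track_stream` are hypothetical stand-ins for the streamed objects and the loop body.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple


@dataclass
class Choice:
    """Hypothetical stand-in for one streaming choice."""
    finish_reason: Optional[str] = None


@dataclass
class Chunk:
    """Hypothetical stand-in for one streamed chunk."""
    id: Optional[str] = None
    choices: List[Choice] = field(default_factory=list)


def track_stream(chunks: Iterable[Chunk]) -> Tuple[Optional[str], Optional[str]]:
    """Return (finish_reason, response_id) tracked across the whole stream."""
    finish_reason: Optional[str] = None
    response_id: Optional[str] = None
    for chunk in chunks:
        # Keep the first string id seen; later chunks normally repeat it.
        if response_id is None and isinstance(chunk.id, str):
            response_id = chunk.id
        # A usage-only chunk has choices == [], so it never overwrites
        # a finish_reason captured from an earlier chunk.
        for choice in chunk.choices:
            if isinstance(choice.finish_reason, str):
                finish_reason = choice.finish_reason
    return finish_reason, response_id


stream = [
    Chunk(id="chatcmpl-1", choices=[Choice()]),
    Chunk(id="chatcmpl-1", choices=[Choice(finish_reason="stop")]),
    Chunk(id="chatcmpl-1", choices=[]),  # final usage-only chunk
]
result = track_stream(stream)
```

Reading only the last chunk after the loop would see `choices=[]` and lose the `"stop"` value that the second chunk carried.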
Adds `test_llm_streaming_finish_reason.py` pinning both the sync and async
LiteLLM streaming paths against the include_usage chunk shape.
The original commit covered every provider's sync path and Bedrock's
sync streaming path, but two Bedrock async paths still emitted
LLMCallCompletedEvent without finish_reason/response_id:
- _ahandle_converse: the final fallback emit_call_completed_event call
was missing both fields. Added stop_reason + response_id matching the
other emission sites in the same function.
- _ahandle_streaming_converse: response_id was never seeded from the
initial response object, and stream_finish_reason wasn't propagated
to the structured-output and final-text emissions. Now extracts
response_id up front and threads stream_finish_reason through every
completion event.
Adds a dedicated test file covering the new event fields end-to-end:
- LLMCallCompletedEvent.finish_reason / response_id Pydantic validation
(string accepted, None default, non-string coerced to None).
- LLMCallStartedEvent sampling params (all nine fields accepted, default
to None).
- BaseLLM._emit_call_started_event introspecting sampling params off
self, with explicit kwargs overriding.
- BaseLLM._emit_call_completed_event passing finish_reason/response_id
through to the event.
- LLM._extract_finish_reason_and_response_id across the LiteLLM shapes
(non-streaming response, streaming chunk, dict, missing fields,
non-string values, unexpected input).
Companion to the OTel GenAI emitter compliance work in crewai-enterprise
(CON-172). Today the enterprise emitter reads these fields off the OSS
LLM events via `getattr(..., None)`, so it produces valid (but partial)
spans against the existing OSS surface. This change makes those fields
first-class on the events so spans can carry the real provider data.
What this adds:
- `LLMCallStartedEvent` gains the sampling-param fields the emitter needs
for `gen_ai.request.*`: `temperature`, `top_p`, `max_tokens`, `stream`,
`seed`, `stop_sequences`, `frequency_penalty`, `presence_penalty`, `n`.
All optional; existing call sites keep working.
- `BaseLLM._emit_call_started_event` introspects those values off `self`
(the LLM instance) via `getattr(..., None)` so every provider gets the
fields propagated for free without per-provider plumbing.
- `LLMCallCompletedEvent` gains `finish_reason: str | None` and
`response_id: str | None`. A field validator coerces any non-string
value (MagicMock, unexpected provider object) to None so the event
never raises on construction.
- `LLM._emit_call_completed_event` accepts both as kwargs.
- `LLM` (LiteLLM path) gets a defensive `_extract_finish_reason_and_response_id`
helper that handles both streaming (`StreamingChoices`) and non-streaming
(`Choices`) shapes and is wired into every completion-event emission site.
- Provider completions extract native values from their SDK responses and
pass them through:
- OpenAI: `_extract_responses_finish_reason_and_id` for Responses-API,
`_extract_finish_reason_and_id` for Chat-Completions.
- Anthropic: `_extract_finish_reason_and_id` (Messages API + streaming).
- Bedrock: `_extract_finish_reason_and_id` (`stopReason` from converse).
- Gemini: `_extract_finish_reason_and_id` (`finish_reason` from candidates).
- Azure: inherits via the OpenAI subclass; adds the helper for Azure-specific
response shapes.
- openai_compatible: inherits from OpenAICompletion, no edits needed.
Compatibility:
- All new fields are optional with sensible defaults. No existing call
sites need to change.
- The validator on `LLMCallCompletedEvent` swallows non-string values for
the new fields so legacy mocks / exotic provider types don't blow up
event construction.
- Enterprise side already reads these fields defensively, so OSS and
enterprise can merge independently and cut on the same synchronized
release.
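The coercion rule for the new fields, and the defensive read on the consumer side, might look roughly like this. It is a standard-library sketch only: the real events use a Pydantic field validator, and `coerce_str_or_none` and `read_event_fields` are hypothetical names.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional
from unittest.mock import MagicMock


def coerce_str_or_none(value: Any) -> Optional[str]:
    """Keep real strings; turn anything else (mocks, SDK objects) into None."""
    return value if isinstance(value, str) else None


def read_event_fields(event: Any) -> Dict[str, Optional[str]]:
    """Read the optional fields defensively, as an emitter might."""
    return {
        "finish_reason": coerce_str_or_none(getattr(event, "finish_reason", None)),
        "response_id": coerce_str_or_none(getattr(event, "response_id", None)),
    }


# An event that carries real provider data.
full = read_event_fields(SimpleNamespace(finish_reason="stop", response_id="resp-1"))

# A legacy event object without the new attributes.
legacy = read_event_fields(SimpleNamespace())

# A mock returns another mock for any attribute; both fields coerce to None.
mocked = read_event_fields(MagicMock())
```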
Tested against the full LLM + events + provider test suite — all green;
the 14 pre-existing multimodal failures on main are unrelated and
reproduce without this diff.
* feat: enhance StdioTransport to prevent environment variable leakage
- Replaced os.environ.copy() with get_default_environment() to ensure only allowed environment variables are passed to the MCP server.
- Added tests to verify that ambient environment variables do not leak and that user-supplied environment variables can override defaults.
* feat: add environment variable filtering hook to StdioTransport
- Introduced an optional `_env_filter_hook` to allow extensions to modify the environment variables passed to MCP servers, enabling features like credential stripping.
- Updated tests to ensure the filtering hook is applied correctly after merging user-supplied and default environment variables.
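The merge order described in the two commits above (allowed defaults, then user-supplied values, then the optional hook) can be illustrated with a small sketch. `ALLOWED`, `build_env`, and `strip_tokens` are hypothetical; the real transport takes its defaults from `get_default_environment()`, whose allow-list is not reproduced here.

```python
import os
from typing import Callable, Dict, Mapping, Optional

# Hypothetical allow-list standing in for the real defaults.
ALLOWED = ("PATH", "HOME")

EnvHook = Callable[[Dict[str, str]], Dict[str, str]]


def build_env(
    user_env: Optional[Mapping[str, str]] = None,
    env_filter_hook: Optional[EnvHook] = None,
    ambient: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build the child-process environment without copying everything."""
    source = os.environ if ambient is None else ambient
    # Start from allowed variables only, so ambient secrets do not leak.
    env = {key: source[key] for key in ALLOWED if key in source}
    # User-supplied values override the defaults.
    env.update(user_env or {})
    # The hook runs last, after defaults and user values are merged.
    if env_filter_hook is not None:
        env = env_filter_hook(env)
    return env


def strip_tokens(env: Dict[str, str]) -> Dict[str, str]:
    """Example hook: drop anything that looks like a credential."""
    return {k: v for k, v in env.items() if "TOKEN" not in k}


ambient = {"PATH": "/bin", "HOME": "/root", "AWS_SECRET_ACCESS_KEY": "secret"}
plain = build_env(ambient=ambient)
overridden = build_env({"PATH": "/opt/bin"}, ambient=ambient)
filtered = build_env({"API_TOKEN": "t", "MODE": "dev"}, strip_tokens, ambient)
```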
* Fix structured output leaks in tool-calling loops
* addressing comments
* drop scripts
* Update Gemini agent tests to include structured output with thoughts and bump model version to 2.5-flash
* merge
* Update Anthropic test cases to use new model and tool structure
- Changed the model from "claude-3-5-haiku-20241022" to "claude-sonnet-4-6" in the test setup.
- Updated the request and response formats in the YAML test cassette to reflect the new tool structure and improved content formatting.
- Adjusted the expected response body to match the new output format from the assistant, including changes in tool usage and response details.
- Increased rate limit values in the response headers for better testing scenarios.
* adjusted bedrock cassettes
* adjusting cassettes for bedrock
* fix test
* Update VCR configuration to use 'host' instead of 'bedrock_host' for request matching
* feat(planning): enhance planning configuration and observation handling
- Introduced an attribute to control LLM calls after each step.
- Set the default to 1 when planning is enabled without explicit config.
- Added support for heuristic observations when LLM calls are disabled.
- Adjusted step observations to respect the related settings.
- Added tests to verify behavior of new configurations and ensure correct observation handling across different reasoning efforts.
* fix(agent_executor): update handling of failed steps in low effort mode
- Adjusted logic to ensure that failed steps are recorded without marking them as completed when using low reasoning effort.
- Introduced feedback for failed steps, allowing the process to continue while tracking failures.
- Added a test to verify that failed steps are correctly marked without triggering a replan.
- And linted
* linted
Every agent kickoff calls _use_trained_data, which calls
CrewTrainingHandler(...).load(). Since #4827 wrapped load() in store_lock,
that means every kickoff acquires the cross-process (Redis-backed when
REDIS_URL is set) lock even on deployments that never train and have no
trained-agents file on disk.
Move the missing/empty-file short-circuit above store_lock so the lock is
only acquired when there is actually a file to read. save() and the real
read remain locked.
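The reordering can be sketched as below. This is an assumption-laden illustration: `CountingLock` stands in for `store_lock`, `load` is a simplified reader, and JSON is used only to keep the example self-contained (the on-disk format of the real handler is not stated here).

```python
import json
import os
import tempfile
import threading
from typing import Any, Dict


class CountingLock:
    """Hypothetical stand-in for store_lock that counts acquisitions."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.acquired = 0

    def __enter__(self) -> "CountingLock":
        self._lock.acquire()
        self.acquired += 1
        return self

    def __exit__(self, *exc: Any) -> bool:
        self._lock.release()
        return False


def load(path: str, lock: CountingLock) -> Dict[str, Any]:
    """Return the stored data, taking the lock only if there is a file."""
    # Short-circuit first: a missing or empty file needs no lock.
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return {}
    # The real read stays under the lock.
    with lock:
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)


lock = CountingLock()
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "trained_agents.json")
    missing = load(path, lock)
    acquired_when_missing = lock.acquired

    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"agent": {"quality": 9}}, handle)
    present = load(path, lock)
    acquired_when_present = lock.acquired
```

With the check placed first, a deployment that never trains pays only for a file-system stat on each kickoff.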
- callable_to_string returns None for lambdas/closures instead of an
unresolvable dotted path; Crew filters Nones out of restored callback
lists.
- EventNode.event serializer honors info.mode so mode='json' calls cascade
properly into nested event payloads.
- RagTool.adapter serializes to None (post-validator rebuilds from
config); concrete adapters hold runtime state that can't be round-tripped.
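The first item can be illustrated with a sketch of the detection rule. The function names below are hypothetical and the check on `__qualname__` is an assumption about how unresolvable callables might be recognized, not the actual implementation.

```python
import json
from typing import Any, Callable, List, Optional


def callable_to_string_sketch(fn: Callable[..., Any]) -> Optional[str]:
    """Return an importable dotted path, or None if one cannot exist."""
    module = getattr(fn, "__module__", None)
    qualname = getattr(fn, "__qualname__", None)
    if not module or not qualname:
        return None
    # Lambdas and closures carry these markers in __qualname__ and
    # cannot be re-imported by dotted path.
    if "<lambda>" in qualname or "<locals>" in qualname:
        return None
    return f"{module}.{qualname}"


def drop_unresolvable(paths: List[Optional[str]]) -> List[str]:
    """Filter the Nones out of a restored callback list."""
    return [path for path in paths if path is not None]


def make_closure() -> Callable[[], None]:
    def inner() -> None:
        pass

    return inner


serialized = [
    callable_to_string_sketch(json.dumps),
    callable_to_string_sketch(lambda value: value),
    callable_to_string_sketch(make_closure()),
]
restored = drop_unresolvable(serialized)
```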
Move scope restoration from Crew-level global push to a per-task push
inside Task via resume_task_scope() in event_context. Fixes orphan
task_started warning, hierarchical resume (manager_agent now eligible
for _resuming), and parallel async resume (each contextvars copy owns
its own scope). Tests added.
llm and prompt were declared required with exclude=True, making the
model un-restorable from its own serialized output. Mirror the
CrewAgentExecutor pattern: make them nullable with default None, keep
exclude=True, and re-attach llm on the resume path alongside the other
re-attached fields. Guard the two prompt-deref sites so the runtime
invariant survives the looser type.
* feat(tools): declare env_vars on DatabricksQueryTool
Add EnvVar import and env_vars field to DatabricksQueryTool so the host
UI knows which environment variables the tool requires. Both auth paths
(DATABRICKS_HOST+TOKEN or DATABRICKS_CONFIG_PROFILE) are marked
required=False with descriptions explaining the alternative.
* chore: update tool specifications
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(tools): correct mongdb typo to pymongo in package_dependencies
The `package_dependencies` field in `MongoDBVectorSearchTool` referenced
the non-existent package `mongdb` instead of the actual PyPI package
`pymongo`, which is the driver imported and used throughout the file.
* chore: update tool specifications
---------
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* feat: add Skills Repository — registry, cache, CLI, and SDK integration
Adds a Skills Repository feature allowing users to publish, install,
and use skills from the CrewAI registry with @org/skill-name refs.
## What's New
### SDK (lib/crewai/)
- SkillFrontmatter: added optional 'version' field (backward compatible)
- SkillCacheManager: manages ~/.crewai/skills/{org}/{name}/ with
.crewai_meta.json tracking, path-traversal-safe tar extraction
- SkillRegistry: parse @org/skill-name refs, local-first resolution
(./skills/ > cache > download), interactive prompt on first use,
CI-mode guard (CREWAI_NONINTERACTIVE/CI env vars)
- Agent.skills and Crew.skills widened to accept str refs (@org/name)
- set_skills() resolves registry refs with org-prefixed dedup keys
- New events: SkillDownloadStartedEvent, SkillDownloadCompletedEvent
### CLI (lib/cli/)
- crewai skill create <name> — context-aware (project vs standalone)
- crewai skill install @org/name — downloads to ./skills/ or cache
- crewai skill publish — ZIP + upload to org registry
- crewai skill list — show installed skills
### PlusAPI (lib/crewai-core/)
- Added SKILLS_RESOURCE, get_skill(), publish_skill(), list_skills()
### Scaffolding
- crew and flow templates now include skills/ directory
### Tests
- 91 SDK skill tests + 15 CLI skill tests, all passing
* fix: address all CI failures and CodeRabbit review comments
Lint:
- Remove unused imports (click, pytest, json)
- Replace try-except-pass with logging (S110)
- Fix unprotected zipfile.extractall (S202)
Security:
- Path traversal: startswith → is_relative_to for tar extraction
- Add path traversal protection to ZIP extraction via _safe_extract_zip
- Both cache.py and CLI main.py hardened
Type checker:
- Fix import path: crewai.events.event_bus (not crewai_event_bus)
- Remove unused type: ignore comments
- Fix type mismatches in set_skills() variable types
Code quality:
- Fix f-string interpolation in SkillNotCachedError
- Use ValidationError instead of Exception in test
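The reason for switching from `startswith` to `is_relative_to` in the security fixes above is easiest to see with a sibling directory that shares a prefix. The sketch below uses pure POSIX paths and hypothetical helper names; real extraction code would resolve paths on the actual file system.

```python
import posixpath
from pathlib import PurePosixPath


def naive_is_inside(dest: str, member_name: str) -> bool:
    """Prefix check on strings: accepts sibling dirs sharing a prefix."""
    target = posixpath.normpath(posixpath.join(dest, member_name))
    return target.startswith(posixpath.normpath(dest))


def safe_is_inside(dest: str, member_name: str) -> bool:
    """Component-wise check (Python 3.9+): target must be under dest."""
    base = PurePosixPath(posixpath.normpath(dest))
    # Normalize first so ".." segments are collapsed before comparing.
    target = PurePosixPath(posixpath.normpath(posixpath.join(dest, member_name)))
    return target.is_relative_to(base)


dest = "/cache/skill"
escape = "../skill_evil/payload.py"  # normalizes to /cache/skill_evil/payload.py

naive_result = naive_is_inside(dest, escape)
safe_result = safe_is_inside(dest, escape)
```

The string prefix check accepts `/cache/skill_evil/...` because it begins with `/cache/skill`, while the component-wise comparison rejects it.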
* style: ruff format + autofix remaining lint errors
* refactor: reuse SDK parser and SkillCacheManager in CLI
- _parse_frontmatter() now delegates to crewai.skills.parser.parse_frontmatter
when available, with a minimal fallback for CLI-only installs
- install() global cache path now reuses SkillCacheManager.store() instead
of duplicating metadata writing logic
* refactor: add _print_current_organization to SkillCommand (matches ToolCommand pattern)
* fix: write .crewai_meta.json in fallback install path
CodeRabbit caught that the ImportError fallback in install() didn't write
cache metadata, making skills invisible to 'crewai skill list'.
* fix: tighten @org/name ref validation to prevent path traversal
Reject refs with multiple slashes (@org/a/b), dot segments (@../skill),
or leading dots in org/name. Applied to both CLI install() and SDK
parse_registry_ref() so the contract is enforced consistently.
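A validation rule of this shape could be sketched as follows. The allowed character set is an assumption made for illustration, and `parse_ref_sketch` is a hypothetical name rather than the real `parse_registry_ref()`.

```python
import re
from typing import Tuple

# Assumed segment grammar: starts with an alphanumeric (so no leading
# dot), then alphanumerics, dots, underscores, or hyphens. No slashes.
_SEGMENT = r"[A-Za-z0-9][A-Za-z0-9._-]*"
_REF_RE = re.compile(rf"@({_SEGMENT})/({_SEGMENT})")


def parse_ref_sketch(ref: str) -> Tuple[str, str]:
    """Split '@org/name' into (org, name) or raise ValueError."""
    # fullmatch rejects extra slashes and trailing content outright.
    match = _REF_RE.fullmatch(ref)
    if match is None:
        raise ValueError(f"Invalid skill reference: {ref!r}")
    return match.group(1), match.group(2)


def is_valid(ref: str) -> bool:
    try:
        parse_ref_sketch(ref)
    except ValueError:
        return False
    return True


parsed = parse_ref_sketch("@acme/web-search")
checks = {
    "@org/a/b": is_valid("@org/a/b"),        # multiple slashes
    "@../skill": is_valid("@../skill"),      # dot segment as org
    "@org/.hidden": is_valid("@org/.hidden"),  # leading dot in name
    "@org/..": is_valid("@org/.."),          # dot segment as name
}
```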
* fix: update test assertions to match tightened error messages
* fix: align OSS client with AMP API contract
- download_skill(): fetch download_url (presigned URL) instead of
expecting inline base64. Falls back to 'file' field for compat.
- Read 'latest_version' field, fall back to 'version'
- Same fixes applied to CLI install() command
* fix: publish as tar.gz (matches AMP content_type validation) + add zip fallback to SDK cache
CLI publish:
- _build_skill_zip → _build_skill_tarball (tar.gz format)
- Content type: application/x-gzip (matches SkillVersion validation)
SDK cache:
- store() now tries tar.gz first, falls back to zip extraction
- Added _safe_extract_zip for path-traversal-safe zip handling
- Both formats work for download/install regardless of server format
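The "tar.gz first, then zip" fallback could look roughly like this. The sketch only lists member names from in-memory archives; `list_archive_members` is a hypothetical helper, and real extraction would also apply the path-traversal checks mentioned above.

```python
import io
import tarfile
import zipfile
from typing import List


def list_archive_members(data: bytes) -> List[str]:
    """Try to read data as tar.gz; fall back to zip if that fails."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
            return sorted(member.name for member in tar.getmembers())
    except tarfile.ReadError:
        # Not a gzip-compressed tarball; try the zip format next.
        pass
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return sorted(archive.namelist())


def make_tar_gz(name: str, content: bytes) -> bytes:
    """Build a one-file tar.gz in memory (demo helper)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        tar.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def make_zip(name: str, content: bytes) -> bytes:
    """Build a one-file zip in memory (demo helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w") as archive:
        archive.writestr(name, content)
    return buffer.getvalue()


from_tar = list_archive_members(make_tar_gz("SKILL.md", b"# skill"))
from_zip = list_archive_members(make_zip("SKILL.md", b"# skill"))
```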
---------
Co-authored-by: João Moura <joaomdmoura@gmail.com>
Adds typed containers for wire payloads, literal aliases for HTTP method
and log type, and `Final` markers on resource constants. Updates
upstream returns in project_utils.py and deploy/main.py to match
the new contracts.
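For readers unfamiliar with these typing constructs, the general pattern looks like the sketch below. Every name and value here (`HttpMethod`, `LogType`, `DeployPayload`, `EXAMPLE_RESOURCE`, `build_payload`) is hypothetical and chosen for illustration; none is taken from the actual change.

```python
from typing import Final, Literal, TypedDict

# Literal aliases restrict a string to a closed set for the type checker.
HttpMethod = Literal["GET", "POST", "PUT", "PATCH", "DELETE"]
LogType = Literal["deployment", "installation"]  # hypothetical values


class DeployPayload(TypedDict):
    """Hypothetical typed container for a wire payload."""
    name: str
    log_type: LogType


# Final tells the type checker this constant must not be reassigned.
EXAMPLE_RESOURCE: Final = "/example/resource"  # hypothetical path


def build_payload(name: str, log_type: LogType) -> DeployPayload:
    """Return a payload whose keys and value types are checked statically."""
    return {"name": name, "log_type": log_type}


payload = build_payload("my-crew", "deployment")
```

At runtime a `TypedDict` is an ordinary `dict`; the benefit is that a type checker flags missing keys, misspelled keys, and out-of-set literal values before the request is sent.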
In `_execute_task_with_a2a` and its async variant, the try body
sets `task.output_pydantic = None` before returning an A2A
response. The finally block then checks
`if task.output_pydantic is not None` before restoring the
original value — but since it was just set to None, the condition
is always False and the original value is never restored. This
permanently mutates the Task object.
Remove the guard so `output_pydantic` is unconditionally restored,
matching the unconditional restoration of `description` and
`response_model` in the same block.
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
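The `output_pydantic` guard bug described above can be reproduced in miniature. `run_buggy` and `run_fixed` are hypothetical reductions of the try/finally shape, using a `SimpleNamespace` in place of a real Task.

```python
from types import SimpleNamespace


def run_buggy(task: SimpleNamespace) -> str:
    original = task.output_pydantic
    try:
        task.output_pydantic = None  # cleared before returning
        return "a2a-response"
    finally:
        # The field was just set to None, so this guard is never true
        # and the original value is never restored.
        if task.output_pydantic is not None:
            task.output_pydantic = original


def run_fixed(task: SimpleNamespace) -> str:
    original = task.output_pydantic
    try:
        task.output_pydantic = None
        return "a2a-response"
    finally:
        # Unconditional restore, like the other fields in the block.
        task.output_pydantic = original


buggy_task = SimpleNamespace(output_pydantic=dict)
fixed_task = SimpleNamespace(output_pydantic=dict)
buggy_result = run_buggy(buggy_task)
fixed_result = run_fixed(fixed_task)
```

After the buggy version runs, the task keeps `output_pydantic = None` for every later use; the fixed version leaves the task as it found it.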