Root cause: test_gap_implementations.py assigned directly to
crewai_event_bus.emit (instance attribute), which shadowed the class
method even after restoration. Later tests using patch.object on the
class couldn't intercept calls.
Also converts all 19 positional crewai_event_bus.emit() calls across 8
new_agent files to use the event= keyword argument, matching the
pattern in llm.py. Adds <summary> tag stripping for both ainvoke() and
astream() to prevent summarization prompt leakage in agent responses.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove dead `env_vars.get("MODEL")` check in _setup_env (always truthy
since MODEL is set two lines above)
- Fix test_sync_delegation mock: use return_value instead of side_effect
list and disable planning to prevent StopAsyncIteration on Python 3.10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- test_streaming_properties_from_docs: add record_mode="none" so VCR never
falls through to the real OpenAI API; cassette already exists.
- gitpython >=3.1.50 (GHSA-mv93-w799-cj2w)
- langchain-core >=1.3.1 (GHSA-pjwx-r37v-7724; resolves to 1.3.3)
- urllib3 >=2.7.0 (GHSA-qccp-gfcp-xxvc, GHSA-mf9v-mfxr-j63j; 2.6.4 was never released)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Replace S101 assert guards with explicit if/raise RuntimeError in
benchmark.py and cli.py (3 locations)
- Fix test_create_llm_from_env_with_unaccepted_attributes to use
DEFAULT_LLM_MODEL with clear=True so the assertion isn't brittle
against the hardcoded model name
- Add n_iterations loop to _test_new_agents (was unused, now mirrors
_train_new_agents iteration pattern)
- Consolidate dotenv loading in cli.py and agent_tui.py to use the
existing load_env_vars() from utils.py instead of duplicating logic
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_lite_agent_standalone_still_works: replace real LLM with Mock to
avoid ConnectionError hitting OpenAI in CI
- coworker_tools.py:352: add type: ignore[import-not-found] for crewai.a2a.client
- coworker_tools.py:415: filter BaseException instances from gather results
so return type matches list[str]
- executor.py:740: add type: ignore[import-not-found] for checkpoint_events
- executor.py:2245: guard r.content access with isinstance(r, Message) check
- flow.py:3259: cast model_dump() result to dict[str, Any]
- flow.py: fix response/future no-redef errors by hoisting declarations
and renaming coro_future to avoid duplicate type annotations
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- B904: raise KeyboardInterrupt from err in cli_provider.py
- mypy: add TYPE_CHECKING import for SQLiteConversationStorage, annotate
_initialized class var in TaskScheduler, fix Match type params and
Returning Any in create_agent.py
- tests: mock aget_llm_response in 3 integration tests that fail when
network is blocked but OPENAI_API_KEY is set
- flow.py: use asyncio.run_coroutine_threadsafe() instead of asyncio.run()
when a loop is already running in ask() and say()
- cli.py: fix threshold=0.0 treated as falsy by using `is not None` check
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Introduced a new `LoadedCases` class to encapsulate benchmark cases and optional thresholds, improving data management.
- Updated `load_benchmark_cases` function to support loading cases from both bare arrays and object wrappers with a threshold.
- Modified CLI options to allow dynamic threshold configuration, defaulting to a value from `config.json` if not specified.
- Enhanced error handling for invalid benchmark case formats and added tests to validate new functionality.
These changes aim to improve the flexibility and usability of benchmark case management within the CrewAI framework.
- Added a `_safe_render` function to escape Rich markup and convert markdown to Rich format.
- Implemented token-by-token streaming for agent responses in the TUI, improving user experience during interactions.
- Updated the CLI to allow selection of LLM providers and models, enhancing flexibility in agent creation.
- Refactored benchmark case paths to use a `tests` directory instead of `benchmarks`.
- Introduced a `last_stream_result` property in the `NewAgent` class to retrieve the latest streaming response.
These changes aim to provide a more interactive and user-friendly experience in managing agents within the CrewAI framework.
- Introduced a new `create_agent` command for interactive agent definition.
- Added `agent_tui.py` for a conversational TUI supporting multi-agent interactions.
- Updated CLI to support agent creation and training workflows.
- Enhanced `.gitignore` to exclude demo files and configuration artifacts.
- Implemented a benchmark runner for testing agent performance against defined cases.
This commit lays the groundwork for a more interactive and user-friendly experience in managing agents within the CrewAI framework.
When a tool with result_as_answer=True raises an exception, the agent
was receiving result_as_answer=True and returning the error string as
the final answer. Now we set result_as_answer=False when an error event
is emitted, allowing the agent to reflect and retry.
FixescrewAIInc/crewAI#5156
---------
Co-authored-by: NIK-TIGER-BILL <nik.tiger.bill@github.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
## Summary
- Reverts `b0e2fda` ("fix(flow): add execution_id separate from state.id", COR-48): removes `Flow.execution_id` and points `current_flow_id` / `current_flow_request_id` back at `flow_id` (i.e. `state.id`). The separate per-run tracking id was no longer the right abstraction once `restore_from_state_id` reshapes how `state.id` is assigned;
- Adds an optional `restore_from_state_id` kwarg to `Flow.kickoff` / `Flow.kickoff_async` that hydrates state from a previously-persisted flow's latest snapshot
- Reassigns `state.id` to a fresh value (or `inputs["id"]` if pinned) so the new run's `@persist` writes don't extend the source's history
- Existing `inputs["id"]` resume, `@persist`, and `from_checkpoint` paths are unchanged
## Problem
`@persist` only supports *resume* today: `kickoff(inputs={"id": <uuid>})` hydrates state and continues writing under the same `flow_uuid`. There's no way to **fork** — hydrate from a snapshot but persist under a separate key, leaving the source's history intact. This PR adds that.
| | `state.id` after kickoff | `@persist` writes land under |
|---|---|---|
| `inputs["id"]` (resume) | supplied id | supplied id (extends history) |
| `restore_from_state_id` (fork) | fresh id, or `inputs["id"]` if pinned | new id (source preserved) |
## Behavior
| `inputs.id` | `restore_from_state_id` | Effect |
|---|---|---|
| — | — | Fresh kickoff |
| set | — | Existing resume |
| — | UUID | Fork — new `state.id`, hydrated from source |
| set | UUID | Fork into a pinned `state.id`, hydrated from source |
- Source not found → silent fallback (mirrors existing resume)
- Both `from_checkpoint` and `restore_from_state_id` set → `ValueError`
- `restore_from_state_id=None` → byte-identical to current main
## Design
Fork hydration runs before the existing `inputs` block in `kickoff_async`. On a hit, it calls the same `_restore_state` primitive used by resume, then overwrites `state.id` with a fresh UUID (or `inputs["id"]`). A `fork_succeeded` flag gates the existing `inputs["id"]` path so we don't double-load. `_completed_methods` / `_is_execution_resuming` are intentionally untouched — skip-completed-methods remains the territory of `apply_checkpoint` and `from_pending`.
## Test plan
- [ ] `pytest tests/test_flow_persistence.py` — 5 new tests (four-row matrix, not-found fallback, default no-op, conflict raise) + 6 existing as regression
- [ ] `pytest tests/test_flow.py` — broader flow suite
- [ ] Manual end-to-end against an HITL `@persist` flow
* feat(azure): forward credential_scopes to Azure AI Inference client
Adds a credential_scopes field to the native Azure AI Inference
provider and a matching AZURE_CREDENTIAL_SCOPES env var
(comma-separated). The value is forwarded to ChatCompletionsClient /
AsyncChatCompletionsClient when set, letting keyless / Entra-based
callers target a specific Azure AD audience (e.g.
https://cognitiveservices.azure.com/.default) without subclassing the
provider. Matches the upstream azure.ai.inference SDK kwarg of the
same name.
Lazy build re-reads the env var so an LLM constructed at module
import (before deployment env vars are set) still picks up scopes —
same pattern as the existing AZURE_API_KEY / AZURE_ENDPOINT lazy
reads. to_config_dict round-trips the field.
* refactor(azure): tighten credential_scopes env handling
Address review feedback:
- Move os.getenv into the helper so AZURE_CREDENTIAL_SCOPES appears once
- Match the surrounding api_key/endpoint `or` style in the validator
- Drop the list() defensive copy in to_config_dict — every other field
in that method (and the base class's `stop`) is assigned by reference
* feat(flow): add optional key param to @persist decorator
Allows users to specify which state attribute to use as the
persistence key instead of always defaulting to state.id.
Usage: @persist(key='conversation_id')
Falls back to state.id when key is not provided (no breaking change).
Raises ValueError if the specified key is missing or falsy on state.
* docs(flow): document @persist key parameter for custom persistence keys
* fix(flow): use explicit None check for persist key to avoid empty-string fallback
---------
Co-authored-by: iris-clawd <iris-clawd@anthropic.com>
Co-authored-by: iris-clawd <iris@crewai.com>
Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
CrewAgentExecutor is reused across sequential tasks but invoke/ainvoke
only appended to self.messages and never reset self.iterations, so
task 2 inherited task 1's history and iteration count.
* fix(flow): add execution_id separate from state.id (COR-48)
When a consumer passes `id` in `kickoff(inputs=...)`, that value
overwrites the flow's state.id — which was also being used as the
execution tracking identity for telemetry, tracing, and external
correlation. Two kickoffs sharing the same consumer id ended up
with the same tracking id, breaking any downstream system that
joins on it.
Introduces `Flow.execution_id`: a stable per-run identifier stored
as a `PrivateAttr` on the `Flow` model, exposed via property +
setter. It defaults to a fresh `uuid4` per instance, is never
touched by `inputs["id"]`, and can be assigned by outer systems
that already have an execution identity (e.g. a task id).
Switches the `current_flow_id` / `current_flow_request_id`
ContextVars to seed from `execution_id` so OTel spans emitted by
`FlowTrackable` children correlate on the stable tracking key.
`state.id` keeps its existing override semantics for
persistence/restore — consumers resuming a persisted flow via
`inputs["id"]` work exactly as before.
Adds tests covering default uniqueness per instance, immunity to
consumer `inputs["id"]`, context-var propagation, absence from
serialized state, and parity for dict-state flows.
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
Enables keyless Azure auth (OIDC Workload Identity Federation, Managed
Identity, Azure CLI, env-configured Service Principal) without any
crewAI-specific configuration. Customers whose deployment environment
already sets the standard azure-identity env vars get keyless auth for
free; the existing API-key path is unchanged.
Linear: FAC-40
* fix: merge execution metadata on duplicate batch initialization in TraceBatchManager
- Updated TraceBatchManager to merge execution metadata when a batch is initialized multiple times.
- Enhanced logging to reflect the merging of metadata during duplicate initialization.
- Added a test case to verify that execution metadata is correctly merged when initializing a batch after a lazy action.
* drop env events emitting from traces listener
Add fork classmethod, _restore_runtime, and _restore_event_scope
to BaseAgent. Fix from_checkpoint to set runtime state on the
event bus and restore event scopes. Store kickoff event ID across
checkpoints to skip re-emission on resume. Handle agent entity
type in checkpoint CLI and TUI.
The test_older_than tests in both JSON and SQLite prune suites used
hardcoded 2026-04-17 timestamps for the 'new' checkpoint. Once that
date passes, the checkpoint is older than 1 day and gets pruned along
with the 'old' one, causing assert count >= 1 to fail (count=0).
Use 2099-01-01 for the 'new' checkpoint so tests remain stable.
Co-authored-by: Joao Moura <joaomdmoura@gmail.com>
Add three new CLI subcommands to improve checkpoint UX:
- `crewai checkpoint resume [id]` skips the TUI and resumes from the
latest or specified checkpoint directly
- `crewai checkpoint diff <id1> <id2>` compares two checkpoints showing
changes in metadata, inputs, task status, and outputs
- `crewai checkpoint prune --keep N --older-than Xd` removes old
checkpoints from JSON dirs or SQLite databases
Also writes a resume hint to stderr after every checkpoint save so
users discover the command without needing to know it exists.