crewAI

mirror of https://github.com/crewAIInc/crewAI.git synced 2026-07-01 13:18:10 +00:00

Author	SHA1	Message	Date
Greyson LaLonde	bba48ec9df	refactor: pin crewai-a2a version and move a2a tests to standalone package Pin crewai-a2a to 1.13.0a6 to match workspace versioning convention. Move all a2a tests and cassettes from lib/crewai to lib/crewai-a2a, add crewai-a2a to devtools bump tooling, and update pytest/ruff/mypy configs.	2026-04-02 04:10:13 +08:00
Greyson LaLonde	2ada20e9c6	refactor: extract a2a module into standalone workspace package Move lib/crewai/src/crewai/a2a to lib/crewai-a2a/src/crewai_a2a as a separate workspace member. crewai's `[a2a]` optional extra now pulls in crewai-a2a instead of listing raw deps. All imports updated from crewai.a2a.* to crewai_a2a.*. Handle PydanticUndefinedAnnotation in crewai/__init__.py model_rebuild to break circular import at load time.	2026-04-02 02:32:14 +08:00
João Moura	258f31d44c	docs: update changelog and version for v1.13.0a6 (#5214) 1.13.0a6	2026-04-01 14:26:07 -03:00
João Moura	68720fd4e5	feat: bump versions to 1.13.0a6 (#5213)	2026-04-01 14:23:44 -03:00
alex-clawd	3132910084	perf: reduce framework overhead — lazy event bus, skip tracing when disabled (#5187) * perf: reduce framework overhead for NVIDIA benchmarks - Lazy initialize event bus thread pool and event loop on first emit() instead of at import time (~200ms savings) - Skip trace listener registration (50+ handlers) when tracing disabled - Skip trace prompt in non-interactive contexts (isatty check) to avoid 20s timeout in CI/Docker/API servers - Skip flush() when no events were emitted (avoids 30s timeout waste) - Add _has_pending_events flag to track if any events were emitted - Add _executor_initialized flag for lazy init double-checked locking All existing behavior preserved when tracing IS enabled. No public APIs changed - only conditional guards added. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * fix: address PR review comments — tracing override, executor init order, stdin guard, unused import Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: fix ruff formatting in trace_listener.py and utils.py --------- Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com> Co-authored-by: Iris Clawd <iris@crewai.com> Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>	2026-04-01 14:17:57 -03:00
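Commit 3132910084 moves event-bus setup to the first `emit()` and relies on double-checked locking for that lazy setup. The sketch below shows the pattern under stated assumptions. `LazyEventBus`, its `emit`/`flush`/`shutdown` methods, and the default worker count are hypothetical. Only the flag names `_executor_initialized` and `_has_pending_events` come from the commit message. It is not crewAI's event bus implementation.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor, wait
from typing import Any, Callable, List, Optional


class LazyEventBus:
    """Hypothetical event bus that creates its thread pool on the first emit(), not at import."""

    def __init__(self, max_workers: int = 4) -> None:
        self._lock = threading.Lock()
        self._max_workers = max_workers
        self._executor: Optional[ThreadPoolExecutor] = None
        self._executor_initialized = False  # flag name taken from the commit message
        self._has_pending_events = False  # becomes True once anything is emitted
        self._futures: List[Future] = []

    def _ensure_executor(self) -> ThreadPoolExecutor:
        # First check without the lock keeps the common path cheap once initialized.
        if not self._executor_initialized:
            with self._lock:
                # Second check under the lock: another thread may have initialized it first.
                if not self._executor_initialized:
                    self._executor = ThreadPoolExecutor(max_workers=self._max_workers)
                    self._executor_initialized = True
        assert self._executor is not None
        return self._executor

    def emit(self, handler: Callable[[Any], Any], event: Any) -> Future:
        future = self._ensure_executor().submit(handler, event)
        with self._lock:
            self._futures.append(future)
            self._has_pending_events = True
        return future

    def flush(self, timeout: float = 30.0) -> bool:
        """Wait for submitted handlers; True if all finished within `timeout`."""
        if not self._has_pending_events:
            return True  # nothing was ever emitted, so skip waiting entirely
        with self._lock:
            futures = list(self._futures)
        _, not_done = wait(futures, timeout=timeout)
        return not not_done

    def shutdown(self) -> None:
        if self._executor is not None:
            self._executor.shutdown(wait=True)
```

Until something is emitted, no threads exist and `flush()` returns immediately. This mirrors the savings the commit describes for runs where tracing is disabled.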
Lucas Gomide	c8f3a96779	docs: fix RBAC permission levels to match actual UI options (#5210)	2026-04-01 10:35:06 -04:00
João Moura	18ada25f01	docs: update changelog and version for v1.13.0a5 (#5200) 1.13.0a5	2026-04-01 04:00:09 -03:00
João Moura	146da8d73a	feat: bump versions to 1.13.0a5 (#5199)	2026-04-01 03:59:07 -03:00
Greyson LaLonde	98c6109214	docs: update changelog and version for v1.13.0a4 1.13.0a4	2026-04-01 05:08:12 +08:00
Greyson LaLonde	54a9174c12	feat: bump versions to 1.13.0a4	2026-04-01 05:01:29 +08:00
Greyson LaLonde	c26ae969b3	docs: update changelog and version for v1.13.0a3 1.13.0a3	2026-04-01 04:16:25 +08:00
Greyson LaLonde	205555b786	feat: bump versions to 1.13.0a3	2026-04-01 04:02:29 +08:00
Greyson LaLonde	d6714a0e60	refactor: convert Flow to Pydantic BaseModel	2026-04-01 03:48:41 +08:00
dependabot[bot]	107bc7f7be	chore(deps): bump the security-updates group across 1 directory with 2 updates (#5088) Bumps the security-updates group with 2 updates in the / directory: [nltk](https://github.com/nltk/nltk) and [pypdf](https://github.com/py-pdf/pypdf). Updates `nltk` from 3.9.3 to 3.9.4 - [Changelog](https://github.com/nltk/nltk/blob/develop/ChangeLog) - [Commits](https://github.com/nltk/nltk/compare/3.9.3...3.9.4) Updates `pypdf` from 6.9.1 to 6.9.2 - [Release notes](https://github.com/py-pdf/pypdf/releases) - [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md) - [Commits](https://github.com/py-pdf/pypdf/compare/6.9.1...6.9.2) --- updated-dependencies: - dependency-name: nltk dependency-version: 3.9.4 dependency-type: indirect dependency-group: security-updates - dependency-name: pypdf dependency-version: 6.9.2 dependency-type: indirect dependency-group: security-updates ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-03-31 14:03:42 -05:00
iris-clawd	b1f49b1356	docs: fix inaccuracies in agent-capabilities across all languages (#5191) - Apps run locally (with CREWAI_PLATFORM_INTEGRATION_TOKEN env var), not remotely - Apps auth is an integration token, not OAuth - Updated comparison tables and card descriptions in en, pt-BR, ko, ar	2026-03-31 15:00:00 -03:00
iris-clawd	accae5ca43	docs: Add Agent Capabilities overview and improve Skills documentation (#5189) * docs: add Agent Capabilities overview page and improve Skills docs - New 'Agent Capabilities' page explaining all 5 extension types (Tools, MCPs, Apps, Skills, Knowledge) with comparison table and decision guide - Rewrite Skills page with practical examples showing Skills + Tools patterns, common FAQ, and Skills vs Knowledge comparison - Add cross-reference callout on Tools page linking to the capabilities overview - Add agent-capabilities to Core Concepts navigation (after agents) * docs: add pt-BR and ko translations for agent-capabilities and updated skills/tools * docs: add Arabic (ar) translations for agent-capabilities and updated skills/tools	2026-03-31 14:47:38 -03:00
Lucas Gomide	68e943be68	feat: emit token usage data in LLMCallCompletedEvent	2026-04-01 00:18:36 +08:00
Greyson LaLonde	3283a00e31	fix(deps): cap lancedb below 0.30.1 for Windows compatibility lancedb 0.30.1 dropped the win_amd64 wheel, breaking installation on Windows. Pin to <0.30.1 so uv resolves to a version that still ships Windows binaries.	2026-03-31 16:59:45 +08:00
Greyson LaLonde	dfc0f9a317	refactor: replace InstanceOf[T] with plain type annotations * refactor: replace InstanceOf[T] with plain type annotations InstanceOf[] is a Pydantic validation wrapper that adds runtime isinstance checks. Plain type annotations are sufficient here since the models already use arbitrary_types_allowed or the types are BaseModel subclasses. * refactor: convert BaseKnowledgeStorage to BaseModel * fix: update tests for BaseKnowledgeStorage BaseModel conversion * fix: correct embedder config structure in test	2026-03-31 08:11:21 +08:00
Greyson LaLonde	ef79456968	chore: remove unused third_party LLM directory	2026-03-31 07:33:56 +08:00
Greyson LaLonde	6c7ea422e7	refactor: convert LLM classes to Pydantic BaseModel	2026-03-31 07:07:11 +08:00
Lorenze Jay	bb9bcd6823	refactor: remove unused and methods from (#5172) This commit cleans up the class by removing the and methods, which are no longer needed. The changes help streamline the code and improve maintainability.	2026-03-30 15:01:58 -07:00
Lucas Gomide	ac14b9127e	fix: handle GPT-5.x models not supporting the `stop` API parameter (#5144) GPT-5.x models reject the `stop` parameter at the API level with "Unsupported parameter: 'stop' is not supported with this model". This breaks CrewAI executions when routing through LiteLLM (e.g. via OpenAI-compatible gateways like Asimov), because the LiteLLM fallback path always includes `stop` in the API request params. The native OpenAI provider was unaffected because it never sends `stop` to the API — it applies stop words client-side via `_apply_stop_words()`. However, when the request goes through LiteLLM (custom endpoints, proxy gateways), `stop` is sent as an API parameter and GPT-5.x rejects it. Additionally, the existing retry logic that catches this error only matched the OpenAI API error format ("Unsupported parameter") but missed LiteLLM's own pre-validation error format ("does not support parameters"), so the self-healing retry never triggered for LiteLLM-routed calls.	2026-03-30 11:36:51 -04:00
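Commit ac14b9127e combines two ideas: enforcing stop words on the client side, and recognizing both error formats in the retry check. A minimal sketch of both follows. The helper names (`apply_stop_words`, `is_unsupported_stop_error`, `complete`) and the caller-supplied `call` function, which stands in for the provider request, are assumptions. This is not crewAI's `_apply_stop_words()` or its LiteLLM code path. The two error fragments are the ones quoted in the commit.

```python
from typing import Callable, Sequence

# Error fragments quoted in the commit: OpenAI's API error and LiteLLM's pre-validation error.
_UNSUPPORTED_MARKERS = ("Unsupported parameter", "does not support parameters")


def apply_stop_words(text: str, stop: Sequence[str]) -> str:
    """Truncate text at the earliest occurrence of any (non-empty) stop sequence."""
    cut = len(text)
    for word in stop:
        idx = text.find(word) if word else -1
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


def is_unsupported_stop_error(message: str) -> bool:
    """True if the message looks like a rejection of the `stop` parameter in either format."""
    return "stop" in message and any(marker in message for marker in _UNSUPPORTED_MARKERS)


def complete(call: Callable[[dict], str], model: str, messages: list, stop: Sequence[str]) -> str:
    """Send `stop` to the API; if the model rejects it, retry without it and stop client-side."""
    params = {"model": model, "messages": messages}
    try:
        text = call({**params, "stop": list(stop)} if stop else params)
    except Exception as exc:
        if not stop or not is_unsupported_stop_error(str(exc)):
            raise
        text = call(params)
    # Applying stop words locally is harmless when the server already honored them.
    return apply_stop_words(text, stop)
```

Matching on both fragments is what lets the retry fire whether the rejection comes from the OpenAI API or from LiteLLM's own validation.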
Thiago Moretto	98b7626784	feat: extract and publish tool metadata to AMP (#4298) * Exporting tool's metadata to AMP - initial work * Fix payload (nest under `tools` key) * Remove debug message + code simplification * Printing out detected tools * Extract module name * fix: address PR review feedback for tool metadata extraction - Use sha256 instead of md5 for module name hashing (lint S324) - Filter required list to match filtered properties in JSON schema * fix: Use sha256 instead of md5 for module name hashing (lint S324) - Add missing mocks to metadata extraction failure test * style: fix ruff formatting * fix: resolve mypy type errors in utils.py * fix: address bot review feedback on tool metadata - Use `is not None` instead of truthiness check so empty tools list is sent to the API rather than being silently dropped as None - Strip __init__ suffix from module path for tools in __init__.py files - Extend _unwrap_schema to handle function-before, function-wrap, and definitions wrapper types * fix: capture env_vars declared with Field(default_factory=...) When env_vars uses default_factory, pydantic stores a callable in the schema instead of a static default value. Fall back to calling the factory when no static default is present. --------- Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>	2026-03-30 09:21:53 -04:00
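Commit 98b7626784 describes three pieces of tool-metadata handling: schema filtering, module naming, and a `default_factory` fallback. The sketch below illustrates each. All function names, the 12-character hash length, and the flat JSON-schema shape are assumptions for illustration. This is not the code that publishes metadata to AMP.

```python
import hashlib
from typing import Any, Callable, Optional


def filter_schema(schema: dict[str, Any], exclude: set[str]) -> dict[str, Any]:
    """Drop excluded properties and keep `required` consistent with what remains."""
    props = {k: v for k, v in schema.get("properties", {}).items() if k not in exclude}
    required = [name for name in schema.get("required", []) if name in props]
    return {**schema, "properties": props, "required": required}


def module_name(module_path: str) -> str:
    """Report tools defined in a package's __init__.py under the package name."""
    suffix = ".__init__"
    return module_path[: -len(suffix)] if module_path.endswith(suffix) else module_path


def module_hash(module_path: str, length: int = 12) -> str:
    """Stable short identifier derived with sha256 (md5 is flagged by lint rule S324)."""
    return hashlib.sha256(module_path.encode("utf-8")).hexdigest()[:length]


def resolve_default(
    field_schema: dict[str, Any], default_factory: Optional[Callable[[], Any]] = None
) -> Any:
    """Prefer a static schema default; otherwise call the field's default_factory, if any."""
    if "default" in field_schema:
        return field_schema["default"]
    if default_factory is not None:
        return default_factory()
    return None
```

Without the `required` filtering, an excluded property could still appear in `required`, leaving a schema that declares a field mandatory while omitting its definition.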
iris-clawd	e21c506214	docs: Add comprehensive SSO configuration guide (#5152) * docs: add comprehensive SSO configuration guide Add SSO documentation page covering all supported identity providers for both SaaS (AMP) and Factory deployments. Includes: - Provider overview (WorkOS, Entra ID, Okta, Auth0, Keycloak) - SaaS vs Factory SSO availability - Step-by-step setup guides per provider with env vars - CLI authentication via Device Authorization Grant - RBAC integration overview - Troubleshooting common SSO issues - Complete environment variables reference Placed in the Manage nav group alongside RBAC. * fix: add key icon to SSO docs page * fix: broken links in SSO docs (installation, configuration)	2026-03-28 13:15:34 +08:00
Greyson LaLonde	9fe0c15549	docs: update changelog and version for v1.13.0rc1 1.13.0rc1	2026-03-27 11:30:45 +08:00
Greyson LaLonde	78d8ddb649	feat: bump versions to 1.13.0rc1	2026-03-27 11:26:04 +08:00
Greyson LaLonde	1b2062009a	docs: update changelog and version for v1.13.0a2 1.13.0a2	2026-03-27 04:05:32 +08:00
Greyson LaLonde	886aa4ba8f	feat: bump versions to 1.13.0a2	2026-03-27 04:00:59 +08:00
Greyson LaLonde	5bec000b21	feat: auto-update deployment test repo during release After PyPI publish, clones crewAIInc/crew_deployment_test, bumps the crewai[tools] pin to the new version, regenerates uv.lock, and pushes to main. Includes retry logic for CDN propagation delays.	2026-03-27 03:54:10 +08:00
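Commit 5bec000b21 bumps the pin in the deployment test repo. One way to do that is shown below, assuming the repo declares the dependency as a quoted `crewai[tools]` entry in `pyproject.toml`. `bump_crewai_pin` is a hypothetical helper. Cloning, regenerating the lockfile with `uv lock`, and pushing are left out.

```python
import re

# Matches a quoted "crewai[tools]" requirement with an optional version specifier.
_PIN_RE = re.compile(r'"crewai\[tools\](?:[<>=!~][^"]*)?"')


def bump_crewai_pin(pyproject_text: str, new_version: str) -> str:
    """Rewrite every quoted crewai[tools] requirement to an exact pin on new_version."""
    new_text, count = _PIN_RE.subn(lambda _m: f'"crewai[tools]=={new_version}"', pyproject_text)
    if count == 0:
        raise ValueError("no quoted crewai[tools] requirement found")
    return new_text
```

Raising when nothing matches keeps a release script from silently pushing an unchanged file.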
Greyson LaLonde	2965384907	feat: improve enterprise release resilience and UX - Add --skip-to-enterprise flag to resume just Phase 3 after a failure - Add --prerelease=allow to uv sync for alpha/beta/rc versions - Retry uv sync up to 10 times to handle PyPI CDN propagation delay - Update pyproject.toml [project] version field (fixes apps/api version) - Print PR URL after creating enterprise bump PR	2026-03-27 03:36:56 +08:00
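Commit 2965384907 adds prerelease detection for `uv sync` and a retry loop for CDN propagation. A sketch of both follows. The helper names, the regex (which only recognizes `a`, `b`, and `rc` suffixes), and the default delay are assumptions. The injectable `run` parameter exists only so the loop can be exercised without actually invoking `uv`.

```python
import re
import subprocess
import time
from typing import Callable

# Matches PEP 440-style alpha/beta/rc suffixes such as 1.13.0a2, 1.13.0b1, 1.13.0rc1.
_PRERELEASE_RE = re.compile(r"\d(?:a|b|rc)\d+$")


def is_prerelease(version: str) -> bool:
    return _PRERELEASE_RE.search(version) is not None


def uv_sync_args(version: str) -> list[str]:
    """Build the `uv sync` command, allowing prereleases only when the target version is one."""
    args = ["uv", "sync"]
    if is_prerelease(version):
        args.append("--prerelease=allow")
    return args


def run_with_retries(
    cmd: list[str],
    attempts: int = 10,
    delay: float = 15.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Re-run `cmd` until it succeeds, e.g. while a new release propagates through the PyPI CDN."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        result = run(cmd, capture_output=True, text=True)
        if result.returncode == 0:
            return result
        if attempt < attempts:
            time.sleep(delay)  # give the CDN time before the next attempt
    raise RuntimeError(f"{cmd} failed after {attempts} attempts: {result.stderr}")
```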
Greyson LaLonde	032ef06ef6	docs: update changelog and version for v1.13.0a1 1.13.0a1	2026-03-27 03:07:26 +08:00
Greyson LaLonde	0ce9567cfc	feat: bump versions to 1.13.0a1	2026-03-27 03:00:29 +08:00
Greyson LaLonde	d7252bfee7	fix: pin Node to LTS 22 in docs broken links workflow Mintlify doesn't support Node 25+, and `node-version: latest` was pulling 25.8.2 causing the workflow to fail.	2026-03-27 02:36:11 +08:00
Greyson LaLonde	10fc3796bb	fix: bust uv cache for freshly published packages in enterprise release	2026-03-27 02:21:31 +08:00
iris-clawd	52249683a7	docs: comprehensive RBAC permissions matrix and deployment guide (#5112) - Add full feature permissions matrix (11 features × permission levels) - Document Owner vs Member default permissions - Add deployment guide: what permissions are needed to deploy from GitHub or Zip - Document entity-level permissions (deployment permission types: run, traces, manage_settings, HITL, full_access) - Document entity RBAC for env vars, LLM connections, and Git repositories - Add common role patterns: Developer, Viewer/Stakeholder, Ops/Platform Admin - Add quick-reference table for minimum deployment permissions Addresses user feedback that RBAC was too restrictive and unclear: members didn't know which permissions to configure for a developer profile.	2026-03-26 12:30:17 -04:00
João Moura	6193e082e1	docs: update changelog and version for v1.12.2 (#5103) 1.12.2	2026-03-26 03:54:26 -03:00
João Moura	33f33c6fcc	feat: bump versions to 1.12.2 (#5101)	2026-03-26 03:33:10 -03:00
alex-clawd	74976b157d	fix: preserve method return value as flow output for @human_feedback with emit (#5099) * fix: preserve method return value as flow output for @human_feedback with emit When a @human_feedback decorated method with emit= is the final method in a flow (no downstream listeners triggered), the flow's final output was incorrectly set to the collapsed outcome string (e.g., 'approved') instead of the method's actual return value (e.g., a state dict). Root cause: _process_feedback() returns the collapsed_outcome string when emit is set, and this string was being stored as the method's result in _method_outputs. The fix: 1. In human_feedback.py: After _process_feedback, stash the real method_output on the flow instance as _human_feedback_method_output when emit is set. 2. In flow.py: After appending a method result to _method_outputs, check if _human_feedback_method_output is set. If so, replace the last entry with the stashed real output and clear the stash. This ensures: - Routing still works correctly (collapsed outcome used for @listen matching) - The flow's final result is the actual method return value - If downstream listeners execute, their results become the final output Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com> * style: ruff format flow.py * fix: use per-method dict stash for concurrency safety and None returns Addresses review comments: - Replace single flow-level slot with dict keyed by method name, safe under concurrent @human_feedback+emit execution - Dict key presence (not value) indicates stashed output, correctly preserving None return values - Added test for None return value preservation --------- Co-authored-by: Joao Moura <joao@crewai.com> Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>	2026-03-26 03:28:17 -03:00
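A simplified model of the fix in 74976b157d follows. `FlowSketch`, its methods, and the `_human_feedback_method_outputs` attribute are hypothetical stand-ins for the real `flow.py` and `human_feedback.py` logic. The sketch shows why a dict keyed by method name, checked by key presence, preserves a `None` return value while routing still sees the collapsed outcome.

```python
from typing import Any


class FlowSketch:
    """Hypothetical, heavily simplified flow illustrating the per-method output stash."""

    def __init__(self) -> None:
        self._method_outputs: list[Any] = []
        self._human_feedback_method_outputs: dict[str, Any] = {}

    def run_feedback_method(self, name: str, method_output: Any, outcome: str, emit: bool) -> Any:
        if emit:
            # Routing (@listen matching) needs the collapsed outcome such as "approved",
            # so stash the real return value under this method's name.
            self._human_feedback_method_outputs[name] = method_output
            return outcome
        return method_output

    def record_result(self, name: str, result: Any) -> None:
        self._method_outputs.append(result)
        # Key presence, not truthiness, signals a stash, so a stashed None is still restored.
        if name in self._human_feedback_method_outputs:
            self._method_outputs[-1] = self._human_feedback_method_outputs.pop(name)

    @property
    def final_output(self) -> Any:
        return self._method_outputs[-1] if self._method_outputs else None
```

A single flow-level slot would be overwritten if two such methods ran concurrently. A truthiness check such as `if stash:` would also skip a legitimate `None` result.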
Greyson LaLonde	bd03f6cf64	feat: add enterprise release phase to devtools release	2026-03-26 12:22:37 +08:00
Rip&Tear	a91cd1a7d7	Revise security policy and reporting instructions (#5096) * Revise security policy and reporting instructions Updated the security reporting process and contact details. * Update .github/security.md	2026-03-26 10:50:21 +08:00
João Moura	66dee3195f	docs: update changelog and version for v1.12.1 (#5095) 1.12.1	2026-03-25 22:52:11 -03:00
João Moura	034f576dc0	feat: bump versions to 1.12.1 (#5094) * chore: bump version to 1.12.1 across all modules * feat: bump versions to 1.12.1	2026-03-25 22:45:33 -03:00
Lucas Gomide	918654318b	feat: add request_id to HumanFeedbackRequestedEvent (#5092) * feat: add request_id to HumanFeedbackRequestedEvent Allow platforms to attach a correlation identifier to human feedback requests so downstream consumers can deterministically match spans to their corresponding feedback records * feat: add request_id to HumanFeedbackReceivedEvent for correlation Without request_id on the received event, consumers cannot correlate a feedback response back to its originating request. Both sides of the request/response pair need the correlation identifier. --------- Co-authored-by: Alex <alex@crewai.com>	2026-03-25 22:43:24 -03:00
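Commit 918654318b adds a `request_id` so that feedback requests and responses can be paired. A sketch of that correlation follows. The dataclasses are hypothetical, trimmed-down stand-ins for `HumanFeedbackRequestedEvent` and `HumanFeedbackReceivedEvent`, with their other fields omitted. Treating `request_id` as optional is an assumption.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class FeedbackRequested:
    method_name: str
    message: str
    request_id: Optional[str] = None  # correlation id attached by the platform


@dataclass
class FeedbackReceived:
    method_name: str
    feedback: str
    request_id: Optional[str] = None  # the same id, carried on the response side


def new_request_id() -> str:
    return str(uuid.uuid4())


def correlate(
    requests: list[FeedbackRequested], responses: list[FeedbackReceived]
) -> list[tuple[FeedbackRequested, FeedbackReceived]]:
    """Pair each response with its originating request by request_id; unmatched ones are skipped."""
    by_id = {r.request_id: r for r in requests if r.request_id is not None}
    return [(by_id[resp.request_id], resp) for resp in responses if resp.request_id in by_id]
```

The join only works because both event types carry the id, which is the point of the second change in this commit.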
João Moura	371e6cfd11	docs: update changelog and version for v1.12.0 (#5091) 1.12.0	2026-03-25 22:07:28 -03:00
João Moura	6fd70ce6e5	chore: bump version to 1.14.0 across all modules (#5090) * chore: bump version to 1.14.0 across all modules * chore: downgrade version to 1.12.0 across all modules	2026-03-25 22:03:37 -03:00
alex-clawd	c183b77991	fix: address Copilot review on OpenAI-compatible providers (#5042) (#5089) - Delegate supports_function_calling() to parent (handles o1 models via OpenRouter) - Guard empty env vars in base_url resolution - Fix misleading comment about model validation rules - Remove unused MagicMock import - Use 'is not None' for env var restoration in tests Co-authored-by: Joao Moura <joao@crewai.com>	2026-03-25 18:22:13 -03:00
Greyson LaLonde	b5a0d6e709	docs: update changelog and version for v1.12.0a3 1.12.0a3	2026-03-26 04:17:37 +08:00
Greyson LaLonde	454156cff9	feat: bump versions to 1.12.0a3	2026-03-26 04:12:49 +08:00
Tiago Freire	d86707da3d	Fix: bad credentials for traces batch push (404) (#4947) ## Summary ### Core fixes <details> <summary><b>Fix silent 404 cascade on trace event send</b></summary> When `_initialize_backend_batch` failed, `trace_batch_id` was left populated with a client-generated UUID never registered server-side. All subsequent event sends hit a non-existent batch endpoint and returned 404. Now all three failure paths (None response, non-2xx status, exception) clear `trace_batch_id`. </details> <details> <summary><b>Fix first-time deferred batch init silently skipped</b></summary> First-time users have `is_tracing_enabled_in_context() = False` by design. This caused `_initialize_backend_batch` to return early without creating the batch, and `finalize_batch` to skip finalization (same guard). The first-time handler now passes `skip_context_check=True` to bypass both guards, calls `_finalize_backend_batch` directly, gates `backend_initialized` on actual success, checks `_send_events_to_backend` return status (marking batch as failed on 500), captures event count/duration/batch ID before they're consumed by send/finalize, and cleans up all singleton state via `_reset_batch_state()` on every exit path. </details> <details> <summary><b>Sync <code>is_current_batch_ephemeral</code> on batch creation success</b></summary> When the batch is successfully created on the server, `is_current_batch_ephemeral` is now synced with the actual `use_ephemeral` value used. This prevents endpoint mismatches where the batch was created on one endpoint but events and finalization were sent to a different one, resulting in 404. </details> <details> <summary><b>Route <code>mark_trace_batch_as_failed</code> to correct endpoint for ephemeral batches</b></summary> `mark_trace_batch_as_failed` always routed to the non-ephemeral endpoint (`/tracing/batches/{id}`), causing 404s when called on ephemeral batches — the same class of endpoint mismatch this PR aims to fix. 
Added `mark_ephemeral_trace_batch_as_failed` to `PlusAPI` and a `_mark_batch_as_failed` helper on `TraceBatchManager` that routes based on `is_current_batch_ephemeral`. </details> <details> <summary><b>Gate <code>backend_initialized</code> on actual init success (non-first-time path)</b></summary> On the non-first-time path, `backend_initialized` was set to `True` unconditionally after `_initialize_backend_batch` returned. With the new failure-path cleanup that clears `trace_batch_id`, this created an inconsistent state: `backend_initialized=True` + `trace_batch_id=None`. Now set via `self.trace_batch_id is not None`. </details> ### Resilience improvements <details> <summary><b>Retry transient failures on batch creation</b></summary> `_initialize_backend_batch` now retries up to 2 times with 200ms backoff on transient failures (None response, 5xx, network errors). Non-transient 4xx errors are not retried. The short backoff minimizes lock hold time on the non-first-time path where `_batch_ready_cv` is held. </details> <details> <summary><b>Fall back to ephemeral on server auth rejection</b></summary> When the non-ephemeral endpoint returns 401/403 (expired token, revoked credentials, key rotation), the client automatically switches to ephemeral tracing instead of losing traces. The fallback forwards `skip_context_check` and is guarded against infinite recursion — if ephemeral also fails, `trace_batch_id` is cleared normally. </details> <details> <summary><b>Fix action-event race initializing batch as non-ephemeral</b></summary> `_handle_action_event` called `batch_manager.initialize_batch()` directly, defaulting `use_ephemeral=False`. When a `DefaultEnvEvent` or `LLMCallStartedEvent` fired before `CrewKickoffStartedEvent` in the thread pool, the batch was locked in as non-ephemeral. Now routes through `_initialize_batch()` which computes `use_ephemeral` from `_check_authenticated()`. 
</details> <details> <summary><b>Guard <code>_mark_batch_as_failed</code> against cascading network errors</b></summary> When `_finalize_backend_batch` failed with a network error (e.g. `[Errno 54] Connection reset by peer`), the exception handler called `_mark_batch_as_failed` — which also makes an HTTP request on the same dead connection. That second failure was unhandled. Now wrapped in a try/except so it logs at debug level instead of propagating. </details> <details> <summary><b>Design decision: first-time users always use ephemeral</b></summary> First-time trace collection always creates ephemeral batches, regardless of authentication status. This is intentional: 1. The first-time handler UX is built around ephemeral traces — it displays an access code, a 24-hour expiry link, and opens the browser to the ephemeral trace viewer. Non-ephemeral batches don't produce these artifacts, so the handler would fall through to the "Local Traces Collected" fallback even when traces were successfully sent. 2. The server handles account linking automatically — `LinkEphemeralTracesJob` runs on user signup and migrates ephemeral traces to permanent records. Logged-in users can access their traces via their dashboard regardless. 3. Checking auth during batch setup broke event collection — moving `_check_authenticated()` into `_initialize_batch` caused the batch initialization to fail silently during the flow/crew start event handler, preventing all event collection. Keeping the first-time path fast and side-effect-free preserves event collection. The auth check is deferred to the non-first-time path (second run onwards), where `is_tracing_enabled_in_context()` is `True` and the normal tracing pipeline handles everything — including the 401/403 ephemeral fallback. 
</details> ### Manual tests <details> <summary><b>Matrix</b></summary>

| Scenario | First run | Second run |
|----------|-----------|------------|
| Logged out, fresh `.crewai_user.json` | Ephemeral trace created, URL returned | Ephemeral trace created, URL returned |
| Logged in, fresh `.crewai_user.json` | Ephemeral trace created, URL returned | Trace batch finalized, URL returned |
| Flow execution | Tested with `poem_flow` | Tested with `poem_flow` |
| Crew execution | Tested with `hitl_crew` | Tested with `hitl_crew` |

</details>	2026-03-25 16:00:05 -04:00
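Commit d86707da3d sets four rules for batch creation:

- Retry transient failures up to 2 times with a 200ms backoff.
- Never retry other 4xx responses.
- Fall back once to ephemeral on 401/403.
- Clear `trace_batch_id` on every failure path.

They can be sketched as below. `BatchInitSketch` and the injected `create` callable, which returns an HTTP status or `None`, are hypothetical. This is not the `TraceBatchManager` or `PlusAPI` code. Only `trace_batch_id` and `is_current_batch_ephemeral` are names taken from the commit.

```python
import time
from typing import Callable, Optional


class BatchInitSketch:
    """Hypothetical stand-in for the batch-creation rules described in #4947."""

    def __init__(
        self, create: Callable[[bool], Optional[int]], retries: int = 2, backoff: float = 0.2
    ) -> None:
        # create(use_ephemeral) returns an HTTP status, or None when there is no response;
        # it may also raise ConnectionError for network failures.
        self._create = create
        self._retries = retries
        self._backoff = backoff
        self.trace_batch_id: Optional[str] = None
        self.is_current_batch_ephemeral = False

    @staticmethod
    def _is_transient(status: Optional[int]) -> bool:
        return status is None or status >= 500

    def initialize(self, batch_id: str, use_ephemeral: bool, _fallback_done: bool = False) -> bool:
        self.trace_batch_id = batch_id
        for attempt in range(self._retries + 1):
            try:
                status = self._create(use_ephemeral)
            except ConnectionError:
                status = None  # network errors count as transient
            if status is not None and 200 <= status < 300:
                # Keep events and finalization on the endpoint the batch was created on.
                self.is_current_batch_ephemeral = use_ephemeral
                return True
            if status in (401, 403) and not use_ephemeral and not _fallback_done:
                # Auth rejected: fall back to ephemeral once instead of losing the trace.
                return self.initialize(batch_id, use_ephemeral=True, _fallback_done=True)
            if not self._is_transient(status) or attempt == self._retries:
                break
            time.sleep(self._backoff)
        # Every failure path clears the id so later sends don't target an unregistered batch.
        self.trace_batch_id = None
        return False
```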


2170 Commits