feat: add native OpenTelemetry instrumentation

Open spans directly on the user's thread so that stdlib log records
emitted during hot paths like `Crew.kickoff`, `BaseTool.run`, and
`LLM.call` carry the active trace context and correlate with the
spans they belong to — a gap the previous metrics-only telemetry
could not close. Introduces a `crewai.telemetry.otel` module
exposing `operation` and `follows_from`, instruments the execution
hot paths, and propagates the active context across every
parallel-dispatch site. Depends only on `opentelemetry-api` so
provider and exporter choice stays with the host application per the
standard OTel library pattern; without an installed SDK the
`ProxyTracer` keeps everything as a NoOp.
Lucas Gomide
2026-06-22 15:58:39 -03:00
parent 4cbfbdb232
commit fb4b2afb77
27 changed files with 1637 additions and 515 deletions

@@ -327,6 +327,7 @@
"pages": [
"edge/en/observability/tracing",
"edge/en/observability/overview",
"edge/en/observability/opentelemetry",
"edge/en/observability/arize-phoenix",
"edge/en/observability/braintrust",
"edge/en/observability/datadog",

@@ -4,6 +4,37 @@ description: "Product updates, improvements, and bug fixes for CrewAI"
icon: "clock"
mode: "wide"
---
<Update label="Unreleased">
## Native OpenTelemetry instrumentation
CrewAI now ships native [OpenTelemetry](https://opentelemetry.io/) spans
for every major step of execution: crew kickoffs, task runs, agent
steps, tool calls, LLM requests, flow methods, memory reads/writes,
knowledge queries, A2A delegations, agent reasoning, and LLM
guardrails. See the new [OpenTelemetry guide](/en/observability/opentelemetry)
for the complete attribute reference and configuration recipes.
**What this means for existing OTel users:** if your application already
installs a `TracerProvider` that exports to Datadog, Honeycomb, Tempo,
Jaeger, an OTLP collector, etc., crewAI spans will appear alongside your
service traces automatically, with no code changes required. Logs emitted
while a crewAI span is active are correlated with the trace once the
standard OpenTelemetry `LoggingHandler` is attached.
Spans are opt-in by construction: when no SDK provider is installed,
every instrumentation point degrades to a no-op span with effectively
zero overhead. To enable head sampling for production:
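```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
# Sample 10% of root traces.
provider = TracerProvider(sampler=ParentBased(root=TraceIdRatioBased(0.1)))
# The sampler only takes effect once the provider is registered globally.
trace.set_tracer_provider(provider)
```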
</Update>
<Update label="Jun 18, 2026">
## v1.14.8a2

@@ -0,0 +1,184 @@
---
title: OpenTelemetry
description: Native OpenTelemetry spans for kickoffs, tasks, agents, tools, LLM calls, memory, and flows
icon: signal-stream
mode: "wide"
---
# Native OpenTelemetry Instrumentation
crewAI emits native [OpenTelemetry](https://opentelemetry.io/) spans for every
major step of execution: crew kickoffs, task runs, agent steps, tool calls,
LLM requests, flow methods, memory reads/writes, knowledge queries, A2A
delegations, agent reasoning, and LLM guardrails.
The instrumentation is **always on** — there is nothing to install or
configure inside crewAI itself. When no OpenTelemetry SDK is registered,
spans degrade to no-ops with effectively zero overhead. The moment your
application installs a `TracerProvider`, the same spans become real spans
that are exported to whatever backend you've configured.
This is the right integration point if you already operate an OpenTelemetry
collector (Datadog, Honeycomb, New Relic, Jaeger, Tempo, Splunk, Elastic,
or self-hosted OTLP) and want crewAI traces to land alongside your existing
service traces — with correlated logs.
## Quickstart
Install the SDK and an exporter — crewAI itself only depends on the
OpenTelemetry **API**, never the SDK.
```bash
uv add opentelemetry-sdk opentelemetry-exporter-otlp
```
Then install a provider once at startup, before you import or instantiate
any crew:
```python
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
provider = TracerProvider(resource=Resource.create({"service.name": "my-crew-app"}))
provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace.set_tracer_provider(provider)
from crewai import Agent, Crew, Task
crew = Crew(agents=[...], tasks=[...])
crew.kickoff() # spans are now exported to your OTLP endpoint
```
## What gets instrumented
Every span uses the tracer name `"crewai"` and follows the
`crewai.<component>.<field>` attribute naming convention.
| Span name | Where it opens | Key attributes |
| ---------------------- | ----------------------------------------- | --------------------------------------------------------------- |
| `execute crew` | `Crew.kickoff` | `crewai.crew.name`, `crewai.crew.id` |
| `execute task` | `Task.execute_sync` / `Task.execute_async`| `crewai.task.name`, `crewai.task.id` |
| `execute agent` | `Agent.execute_task` | `crewai.agent.role`, `crewai.agent.id` |
| `call tool` | `BaseTool.run` / `Tool.run` | `crewai.tool.name` |
| `call llm` | `LLM.call` and provider completions | `crewai.llm.model` |
| `execute flow` | `Flow.kickoff_async` | `crewai.flow.name`, `crewai.flow.id` |
| `execute flow method` | `Flow._execute_method` | `crewai.flow.name`, `crewai.flow.method` |
| `resume flow` | `Flow._resume_async_body` | `crewai.flow.name`, `crewai.flow.id` |
| `remember memory` | `UnifiedMemory.remember` | `crewai.memory.source_type` |
| `recall memory` | `UnifiedMemory.recall` | `crewai.memory.source_type`, `crewai.memory.depth` |
| `query knowledge` | `Knowledge.query` / `Knowledge.aquery` | `crewai.knowledge.sources` |
| `a2a delegate` | `aexecute_a2a_delegation` | `crewai.a2a.endpoint`, `crewai.a2a.is_multiturn`, `crewai.a2a.turn_number` |
| `agent reason` | `ReasoningHandler.handle_agent_reasoning` | `crewai.agent.role`, `crewai.task.id` |
| `guard llm` | `LLMGuardrail.__call__` | `crewai.guardrail.type` |
Spans nest naturally — a `call tool` span sits inside its `execute agent`
parent, which sits inside `execute task`, which sits inside `execute crew`.
## Correlating logs with traces
Because crewAI opens its spans through the OpenTelemetry API on the
calling thread, any record logged through the stdlib `logging` module
while a crewAI span is active carries that span's `trace_id` and
`span_id`, once you attach the OTel `LoggingHandler` to the root logger:
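```python
import logging
from opentelemetry._logs import set_logger_provider
from opentelemetry.sdk._logs import LoggerProvider, LoggingHandler
from opentelemetry.sdk._logs.export import BatchLogRecordProcessor
from opentelemetry.exporter.otlp.proto.http._log_exporter import OTLPLogExporter
log_provider = LoggerProvider()
log_provider.add_log_record_processor(BatchLogRecordProcessor(OTLPLogExporter()))
set_logger_provider(log_provider)  # also register it as the global logger provider
root = logging.getLogger()
root.setLevel(logging.INFO)  # the root logger defaults to WARNING, which would drop INFO records
root.addHandler(LoggingHandler(level=logging.INFO, logger_provider=log_provider))
```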
Now every log line emitted while a span is active carries the span's
identifiers, letting you jump from a trace to its logs (and back) in
your observability backend.
## Sampler configuration
`TracerProvider` defaults to `ParentBased(ALWAYS_ON)`: every root trace is
sampled, and child spans follow their parent's decision. For production
workloads you'll usually want head sampling. The most common choices:
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased
# Sample 10% of root traces. Spans that have a parent (including a remote
# caller propagated via `traceparent`) inherit the parent's decision, so a
# trace sampled upstream stays sampled through crewAI's spans.
sampler = ParentBased(root=TraceIdRatioBased(0.1))
provider = TracerProvider(sampler=sampler)
```
```python
# "Always sample errors" is not possible with a head sampler alone: the
# sampler runs when a span starts, before any error is known, so a later
# `trace.get_current_span().set_attribute(...)` cannot change the decision.
# To keep every failed trace, sample 100% in the SDK and apply a tail-based
# policy downstream, e.g. the OpenTelemetry Collector's `tail_sampling`
# processor with a `status_code` policy.
```
For testing, swap in `ALWAYS_ON` or `ALWAYS_OFF`:
```python
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ALWAYS_ON
provider = TracerProvider(sampler=ALWAYS_ON)
```
## Adding custom attributes
You can enrich crewAI spans from anywhere in user code (a tool, a
callback, a custom Flow method) using the standard OpenTelemetry API:
```python
from opentelemetry import trace
def my_tool(tenant_id: str):
    # Attach attributes to whichever span is active (usually crewAI's `call tool` span).
    span = trace.get_current_span()
    span.set_attribute("myapp.tenant_id", tenant_id)
    span.set_attribute("myapp.request_priority", "high")
    ...
```
These attributes attach to whichever crewAI span is currently active
(usually the surrounding `call tool` span).
## Disabling
There are two common ways to disable instrumentation:
- **Do not install a `TracerProvider`.** Spans become no-ops with
near-zero cost.
- **Install a sampler that always drops.** Useful when you want to keep
the provider and exporter wiring in place. Note that `ALWAYS_OFF` drops
spans from every tracer on that provider, not only crewAI's:
```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ALWAYS_OFF
provider = TracerProvider(sampler=ALWAYS_OFF)
trace.set_tracer_provider(provider)
```
You can also set `OTEL_SDK_DISABLED=true` in the environment — the SDK
honors it and returns no-op tracers regardless of what you configure.
## Continuity across HITL resume
When a `Flow` resumes after a Human-in-the-Loop pause, the resumed trace
is causally related to the paused trace but not in a parent/child
relationship. crewAI exposes a `follows_from` helper for this:
```python
from crewai.telemetry.otel import follows_from, operation
with operation("resume flow", links=[follows_from(prev_trace_id, prev_span_id)]):
...
```
The link carries the `crewai.link.type = "follows_from"` attribute so
downstream tooling can render it as a causal-but-not-parent edge.