mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-04-14 15:02:37 +00:00
- Rewrite TUI with Tree widget showing branch/fork lineage - Add Resume and Fork buttons in detail panel with Collapsible entities - Show branch and parent_id in detail panel and CLI info output - Auto-detect .checkpoints.db when default dir missing - Append .db to location for SqliteProvider when no extension set - Fix RuntimeState.from_checkpoint not setting provider/location - Fork now writes initial checkpoint on new branch - Add from_checkpoint, fork, and CLI docs to checkpointing.mdx
306 lines
9.3 KiB
Plaintext
306 lines
9.3 KiB
Plaintext
---
|
|
title: Checkpointing
|
|
description: Automatically save execution state so crews, flows, and agents can resume after failures.
|
|
icon: floppy-disk
|
|
mode: "wide"
|
|
---
|
|
|
|
<Warning>
|
|
Checkpointing is in early release. APIs may change in future versions.
|
|
</Warning>
|
|
|
|
## Overview
|
|
|
|
Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running completed work.
|
|
|
|
## Quick Start
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
|
|
crew = Crew(
|
|
agents=[...],
|
|
tasks=[...],
|
|
checkpoint=True, # uses defaults: ./.checkpoints, on task_completed
|
|
)
|
|
result = crew.kickoff()
|
|
```
|
|
|
|
Checkpoint files are written to `./.checkpoints/` after each completed task.
|
|
|
|
## Configuration
|
|
|
|
Use `CheckpointConfig` for full control:
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
|
|
crew = Crew(
|
|
agents=[...],
|
|
tasks=[...],
|
|
checkpoint=CheckpointConfig(
|
|
location="./my_checkpoints",
|
|
on_events=["task_completed", "crew_kickoff_completed"],
|
|
max_checkpoints=5,
|
|
),
|
|
)
|
|
```
|
|
|
|
### CheckpointConfig Fields
|
|
|
|
| Field | Type | Default | Description |
|
|
|:------|:-----|:--------|:------------|
|
|
| `location` | `str` | `"./.checkpoints"` | Storage destination — a directory for `JsonProvider`, a database file path for `SqliteProvider` |
|
|
| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint |
|
|
| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend |
|
|
| `max_checkpoints` | `int \| None` | `None` | Max checkpoints to keep. Oldest are pruned after each write. Pruning is handled by the provider. |
|
|
| `restore_from` | `Path \| str \| None` | `None` | Path to a checkpoint to restore from. Used when passing config via a kickoff method's `from_checkpoint` parameter. |
|
|
|
|
### Inheritance and Opt-Out
|
|
|
|
The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`:
|
|
|
|
| Value | Behavior |
|
|
|:------|:---------|
|
|
| `None` (default) | Inherit from parent. An agent inherits its crew's config. |
|
|
| `True` | Enable with defaults. |
|
|
| `False` | Explicit opt-out. Stops inheritance from parent. |
|
|
| `CheckpointConfig(...)` | Custom configuration. |
|
|
|
|
```python
|
|
crew = Crew(
|
|
agents=[
|
|
Agent(role="Researcher", ...), # inherits crew's checkpoint
|
|
Agent(role="Writer", ..., checkpoint=False), # opted out, no checkpoints
|
|
],
|
|
tasks=[...],
|
|
checkpoint=True,
|
|
)
|
|
```
|
|
|
|
## Resuming from a Checkpoint
|
|
|
|
Pass a `CheckpointConfig` with `restore_from` to any kickoff method. The crew restores from that checkpoint, skips completed tasks, and resumes.
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
|
|
crew = Crew(agents=[...], tasks=[...])
|
|
result = crew.kickoff(
|
|
from_checkpoint=CheckpointConfig(
|
|
restore_from="./my_checkpoints/20260407T120000_abc123.json",
|
|
),
|
|
)
|
|
```
|
|
|
|
Remaining `CheckpointConfig` fields apply to the new run, so checkpointing continues after the restore.
|
|
|
|
You can also use the classmethod directly:
|
|
|
|
```python
|
|
config = CheckpointConfig(restore_from="./my_checkpoints/20260407T120000_abc123.json")
|
|
crew = Crew.from_checkpoint(config)
|
|
result = crew.kickoff()
|
|
```
|
|
|
|
## Forking from a Checkpoint
|
|
|
|
`fork()` restores a checkpoint and starts a new execution branch. Useful for exploring alternative paths from the same point.
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
|
|
config = CheckpointConfig(restore_from="./my_checkpoints/20260407T120000_abc123.json")
|
|
crew = Crew.fork(config, branch="experiment-a")
|
|
result = crew.kickoff(inputs={"strategy": "aggressive"})
|
|
```
|
|
|
|
Each fork gets a unique lineage ID so checkpoints from different branches don't collide. The `branch` label is optional and auto-generated if omitted.
|
|
|
|
## Works on Crew, Flow, and Agent
|
|
|
|
### Crew
|
|
|
|
```python
|
|
crew = Crew(
|
|
agents=[researcher, writer],
|
|
tasks=[research_task, write_task, review_task],
|
|
checkpoint=CheckpointConfig(location="./crew_cp"),
|
|
)
|
|
```
|
|
|
|
Default trigger: `task_completed` (one checkpoint per finished task).
|
|
|
|
### Flow
|
|
|
|
```python
|
|
from crewai.flow.flow import Flow, start, listen
|
|
from crewai import CheckpointConfig
|
|
|
|
class MyFlow(Flow):
|
|
@start()
|
|
def step_one(self):
|
|
return "data"
|
|
|
|
@listen(step_one)
|
|
def step_two(self, data):
|
|
return process(data)
|
|
|
|
flow = MyFlow(
|
|
checkpoint=CheckpointConfig(
|
|
location="./flow_cp",
|
|
on_events=["method_execution_finished"],
|
|
),
|
|
)
|
|
result = flow.kickoff()
|
|
|
|
# Resume
|
|
config = CheckpointConfig(restore_from="./flow_cp/20260407T120000_abc123.json")
|
|
flow = MyFlow.from_checkpoint(config)
|
|
result = flow.kickoff()
|
|
```
|
|
|
|
### Agent
|
|
|
|
```python
|
|
agent = Agent(
|
|
role="Researcher",
|
|
goal="Research topics",
|
|
backstory="Expert researcher",
|
|
checkpoint=CheckpointConfig(
|
|
location="./agent_cp",
|
|
on_events=["lite_agent_execution_completed"],
|
|
),
|
|
)
|
|
result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}])
|
|
```
|
|
|
|
## Storage Providers
|
|
|
|
CrewAI ships with two checkpoint storage providers.
|
|
|
|
### JsonProvider (default)
|
|
|
|
Writes each checkpoint as a separate JSON file. Simple, human-readable, easy to inspect.
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
from crewai.state import JsonProvider
|
|
|
|
crew = Crew(
|
|
agents=[...],
|
|
tasks=[...],
|
|
checkpoint=CheckpointConfig(
|
|
location="./my_checkpoints",
|
|
provider=JsonProvider(), # this is the default
|
|
max_checkpoints=5, # prunes oldest files
|
|
),
|
|
)
|
|
```
|
|
|
|
Files are named `<timestamp>_<uuid>.json` inside the location directory.
|
|
|
|
### SqliteProvider
|
|
|
|
Stores all checkpoints in a single SQLite database file. Better for high-frequency checkpointing and avoids many small files.
|
|
|
|
```python
|
|
from crewai import Crew, CheckpointConfig
|
|
from crewai.state import SqliteProvider
|
|
|
|
crew = Crew(
|
|
agents=[...],
|
|
tasks=[...],
|
|
checkpoint=CheckpointConfig(
|
|
location="./.checkpoints.db",
|
|
provider=SqliteProvider(),
|
|
max_checkpoints=50,
|
|
),
|
|
)
|
|
```
|
|
|
|
WAL journal mode is enabled for concurrent read access.
|
|
|
|
## Event Types
|
|
|
|
The `on_events` field accepts any combination of event type strings. Common choices:
|
|
|
|
| Use Case | Events |
|
|
|:---------|:-------|
|
|
| After each task (Crew) | `["task_completed"]` |
|
|
| After each flow method | `["method_execution_finished"]` |
|
|
| After agent execution | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` |
|
|
| On crew completion only | `["crew_kickoff_completed"]` |
|
|
| After every LLM call | `["llm_call_completed"]` |
|
|
| On everything | `["*"]` |
|
|
|
|
<Warning>
|
|
Using `["*"]` or high-frequency events like `llm_call_completed` will write many checkpoint files and may impact performance. Use `max_checkpoints` to limit disk usage.
|
|
</Warning>
|
|
|
|
## Manual Checkpointing
|
|
|
|
For full control, register your own event handler and call `state.checkpoint()` directly:
|
|
|
|
```python
|
|
from crewai.events.event_bus import crewai_event_bus
|
|
from crewai.events.types.llm_events import LLMCallCompletedEvent
|
|
|
|
# Sync handler
|
|
@crewai_event_bus.on(LLMCallCompletedEvent)
|
|
def on_llm_done(source, event, state):
|
|
path = state.checkpoint("./my_checkpoints")
|
|
print(f"Saved checkpoint: {path}")
|
|
|
|
# Async handler
|
|
@crewai_event_bus.on(LLMCallCompletedEvent)
|
|
async def on_llm_done_async(source, event, state):
|
|
path = await state.acheckpoint("./my_checkpoints")
|
|
print(f"Saved checkpoint: {path}")
|
|
```
|
|
|
|
The `state` argument is the `RuntimeState` passed automatically by the event bus when your handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/en/concepts/event-listener) documentation.
|
|
|
|
Checkpointing is best-effort: if a checkpoint write fails, the error is logged but execution continues uninterrupted.
|
|
|
|
## CLI
|
|
|
|
The `crewai checkpoint` command gives you a TUI for browsing, inspecting, resuming, and forking checkpoints. It auto-detects whether your checkpoints are JSON files or a SQLite database.
|
|
|
|
```bash
|
|
# Launch the TUI — auto-detects .checkpoints/ or .checkpoints.db
|
|
crewai checkpoint
|
|
|
|
# Point at a specific location
|
|
crewai checkpoint --location ./my_checkpoints
|
|
crewai checkpoint --location ./.checkpoints.db
|
|
```
|
|
|
|
<Frame>
|
|
<img src="/images/checkpointing.png" alt="Checkpoint TUI" />
|
|
</Frame>
|
|
|
|
The left panel is a tree view. Checkpoints are grouped by branch, and forks nest under the checkpoint they diverged from. Select a checkpoint to see its metadata, entity state, and task progress in the detail panel. Hit **Resume** to pick up where it left off, or **Fork** to start a new branch from that point.
|
|
|
|
### Editing inputs and task outputs
|
|
|
|
When a checkpoint is selected, the detail panel shows:
|
|
|
|
- **Inputs** — if the original kickoff had inputs (e.g. `{topic}`), they appear as editable fields pre-filled with the original values. Change them before resuming or forking.
|
|
- **Task outputs** — completed tasks show their output in editable text areas. Edit a task's output to change the context that downstream tasks receive. When you modify a task output and hit Fork, all subsequent tasks are invalidated and re-run with the new context.
|
|
|
|
This is useful for "what if" exploration — fork from a checkpoint, tweak a task's result, and see how it changes downstream behavior.
|
|
|
|
### Subcommands
|
|
|
|
```bash
|
|
# List all checkpoints
|
|
crewai checkpoint list ./my_checkpoints
|
|
|
|
# Inspect a specific checkpoint
|
|
crewai checkpoint info ./my_checkpoints/20260407T120000_abc123.json
|
|
|
|
# Inspect latest in a SQLite database
|
|
crewai checkpoint info ./.checkpoints.db
|
|
```
|