feat: add CheckpointConfig for automatic checkpointing

Greyson LaLonde
2026-04-07 05:34:25 +08:00
committed by GitHub
parent 86ce54fc82
commit c4e2d7ea3b
13 changed files with 2113 additions and 775 deletions


@@ -0,0 +1,187 @@
---
title: Checkpointing
description: Automatically save execution state so crews, flows, and agents can resume after failures.
icon: floppy-disk
mode: "wide"
---
<Warning>
Checkpointing is in early release. APIs may change in future versions.
</Warning>
## Overview
Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running completed work.
## Quick Start
```python
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=True,  # uses defaults: ./.checkpoints, on task_completed
)
result = crew.kickoff()
```
Checkpoint files are written to `./.checkpoints/` after each completed task.
## Configuration
Use `CheckpointConfig` for full control:
```python
from crewai import Crew, CheckpointConfig

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=CheckpointConfig(
        directory="./my_checkpoints",
        on_events=["task_completed", "crew_kickoff_completed"],
        max_checkpoints=5,
    ),
)
```
### CheckpointConfig Fields
| Field | Type | Default | Description |
|:------|:-----|:--------|:------------|
| `directory` | `str` | `"./.checkpoints"` | Path for checkpoint files |
| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint |
| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend |
| `max_checkpoints` | `int \| None` | `None` | Maximum number of files to keep; oldest deleted first |
### Inheritance and Opt-Out
The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`:
| Value | Behavior |
|:------|:---------|
| `None` (default) | Inherit from parent. An agent inherits its crew's settings. |
| `True` | Enable with defaults. |
| `False` | Explicit opt-out. Stops inheritance from parent. |
| `CheckpointConfig(...)` | Custom configuration. |
```python
from crewai import Agent, Crew

crew = Crew(
    agents=[
        Agent(role="Researcher", goal=..., backstory=...),  # inherits the crew's checkpoint
        Agent(role="Writer", goal=..., backstory=..., checkpoint=False),  # opted out, no checkpoints
    ],
    tasks=[...],
    checkpoint=True,
)
```
## Resuming from a Checkpoint
```python
from crewai import Crew

# Restore and resume
crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json")
result = crew.kickoff()  # resumes after the last completed task
```
The restored crew skips already-completed tasks and resumes from the first incomplete one.
## Works on Crew, Flow, and Agent
### Crew
```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task, review_task],
    checkpoint=CheckpointConfig(directory="./crew_cp"),
)
```
Default trigger: `task_completed` (one checkpoint per completed task).
### Flow
```python
from crewai import CheckpointConfig
from crewai.flow.flow import Flow, listen, start


class MyFlow(Flow):
    @start()
    def step_one(self):
        return "data"

    @listen(step_one)
    def step_two(self, data):
        return data.upper()  # placeholder for your own processing


flow = MyFlow(
    checkpoint=CheckpointConfig(
        directory="./flow_cp",
        on_events=["method_execution_finished"],
    ),
)
result = flow.kickoff()

# Resume
flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json")
result = flow.kickoff()
```
### Agent
```python
from crewai import Agent, CheckpointConfig

agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    checkpoint=CheckpointConfig(
        directory="./agent_cp",
        on_events=["lite_agent_execution_completed"],
    ),
)
result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}])
```
## Event Types
The `on_events` field accepts any combination of event type strings. Common choices:
| Use Case | Events |
|:---------|:-------|
| After each task (Crew) | `["task_completed"]` |
| After each flow method | `["method_execution_finished"]` |
| After agent execution | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` |
| On crew completion only | `["crew_kickoff_completed"]` |
| After every LLM call | `["llm_call_completed"]` |
| On everything | `["*"]` |
<Warning>
Using `["*"]` or high-frequency events like `llm_call_completed` will write many checkpoint files and may impact performance. Use `max_checkpoints` to limit disk usage.
</Warning>
## Manual Checkpointing
For full control, register your own event handler and call `state.checkpoint()` directly:
```python
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.llm_events import LLMCallCompletedEvent


# Sync handler
@crewai_event_bus.on(LLMCallCompletedEvent)
def on_llm_done(source, event, state):
    path = state.checkpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")


# Async handler
@crewai_event_bus.on(LLMCallCompletedEvent)
async def on_llm_done_async(source, event, state):
    path = await state.acheckpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")
```
The `state` argument is the `RuntimeState` that the event bus passes automatically when your handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/ar/concepts/event-listener) documentation.
Checkpointing is best-effort: if a checkpoint write fails, the error is logged but execution continues uninterrupted.

File diff suppressed because it is too large.


@@ -0,0 +1,187 @@
---
title: Checkpointing
description: Automatically save execution state so crews, flows, and agents can resume after failures.
icon: floppy-disk
mode: "wide"
---
<Warning>
Checkpointing is in early release. APIs may change in future versions.
</Warning>
## Overview
Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running completed work.
## Quick Start
```python
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=True,  # uses defaults: ./.checkpoints, on task_completed
)
result = crew.kickoff()
```
Checkpoint files are written to `./.checkpoints/` after each completed task.
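To resume from the newest file without copying its name by hand, you could pick it out of the checkpoint directory. The sketch below is a hypothetical helper, not part of crewai, and it assumes file names start with a zero-padded timestamp like the `20260407T120000_abc123.json` examples on this page, so the lexically largest name is the most recent.

```python
from __future__ import annotations

from pathlib import Path


def latest_checkpoint(directory: str = "./.checkpoints") -> Path | None:
    """Return the newest *.json checkpoint in `directory`, or None if there is none.

    Hypothetical helper. Assumes names look like <timestamp>_<id>.json with a
    zero-padded timestamp (e.g. 20260407T120000), so sorting by name sorts by age.
    """
    files = sorted(Path(directory).glob("*.json"))  # oldest first under that assumption
    return files[-1] if files else None
```

If a file is found, you would pass `str(path)` to `Crew.from_checkpoint(...)`.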
## Configuration
Use `CheckpointConfig` for full control:
```python
from crewai import Crew, CheckpointConfig

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=CheckpointConfig(
        directory="./my_checkpoints",
        on_events=["task_completed", "crew_kickoff_completed"],
        max_checkpoints=5,
    ),
)
```
### CheckpointConfig Fields
| Field | Type | Default | Description |
|:------|:-----|:--------|:------------|
| `directory` | `str` | `"./.checkpoints"` | Filesystem path for checkpoint files |
| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint |
| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend |
| `max_checkpoints` | `int \| None` | `None` | Max files to keep; oldest pruned first |
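The `max_checkpoints` rule (keep at most N files, delete the oldest first) can be modeled in a few lines. This is a hypothetical sketch of the retention policy, not crewai's implementation; it assumes checkpoints are `*.json` files in a single directory whose names sort chronologically.

```python
from __future__ import annotations

from pathlib import Path


def prune_checkpoints(directory: str, max_checkpoints: int | None) -> list[Path]:
    """Delete the oldest checkpoint files so that at most `max_checkpoints` remain.

    Hypothetical helper. None means "no limit". Assumes file names sort
    chronologically (timestamp prefix). Returns the deleted paths, oldest first.
    """
    if max_checkpoints is None:
        return []  # unlimited: keep every file
    files = sorted(Path(directory).glob("*.json"))  # oldest first
    excess = len(files) - max_checkpoints
    if excess <= 0:
        return []  # already within the limit
    deleted = files[:excess]
    for path in deleted:
        path.unlink()
    return deleted
```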
### Inheritance and Opt-Out
The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`:
| Value | Behavior |
|:------|:---------|
| `None` (default) | Inherit from parent. An agent inherits its crew's config. |
| `True` | Enable with defaults. |
| `False` | Explicit opt-out. Stops inheritance from parent. |
| `CheckpointConfig(...)` | Custom configuration. |
```python
from crewai import Agent, Crew

crew = Crew(
    agents=[
        Agent(role="Researcher", goal=..., backstory=...),  # inherits crew's checkpoint
        Agent(role="Writer", goal=..., backstory=..., checkpoint=False),  # opted out, no checkpoints
    ],
    tasks=[...],
    checkpoint=True,
)
```
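The four cases in the inheritance table reduce to a small resolution rule. The sketch below models it with a stand-in dataclass; `Config` and `resolve` are hypothetical names used only for illustration, not crewai APIs, and the defaults mirror the `CheckpointConfig` fields table.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Config:
    """Hypothetical stand-in for CheckpointConfig, using the documented defaults."""
    directory: str = "./.checkpoints"
    on_events: list[str] = field(default_factory=lambda: ["task_completed"])
    max_checkpoints: int | None = None


def resolve(own: Config | bool | None, parent: Config | None) -> Config | None:
    """Return the effective config for an object, or None if checkpointing is off.

    own:    the object's own `checkpoint` value
    parent: the parent's already-resolved config (e.g. the crew's)
    """
    if own is None:
        return parent  # inherit from parent
    if own is False:
        return None  # explicit opt-out: the parent's config is ignored
    if own is True:
        return Config()  # enable with defaults
    return own  # custom configuration
```

Applied to the example above, the Researcher resolves to the crew's default config and the Writer resolves to `None`.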
## Resuming from a Checkpoint
```python
from crewai import Crew

# Restore and resume
crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json")
result = crew.kickoff()  # resumes after the last completed task
```
The restored crew skips already-completed tasks and resumes from the first incomplete one.
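Conceptually, resuming a sequential crew means finding the first task without a recorded result. The function below is a hypothetical model of that behavior using plain task names; crewai's actual checkpoint format and bookkeeping may differ.

```python
def remaining_tasks(tasks: list[str], completed: set[str]) -> list[str]:
    """Return the tasks still to run, starting at the first incomplete one.

    Hypothetical model: tasks run in order, and `completed` holds the names
    of tasks recorded as finished in the checkpoint.
    """
    for index, task in enumerate(tasks):
        if task not in completed:
            return tasks[index:]  # skip everything before the first gap
    return []  # every task already finished
```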
## Works on Crew, Flow, and Agent
### Crew
```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task, review_task],
    checkpoint=CheckpointConfig(directory="./crew_cp"),
)
```
Default trigger: `task_completed` (one checkpoint per finished task).
### Flow
```python
from crewai import CheckpointConfig
from crewai.flow.flow import Flow, listen, start


class MyFlow(Flow):
    @start()
    def step_one(self):
        return "data"

    @listen(step_one)
    def step_two(self, data):
        return data.upper()  # placeholder for your own processing


flow = MyFlow(
    checkpoint=CheckpointConfig(
        directory="./flow_cp",
        on_events=["method_execution_finished"],
    ),
)
result = flow.kickoff()

# Resume
flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json")
result = flow.kickoff()
```
### Agent
```python
from crewai import Agent, CheckpointConfig

agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    checkpoint=CheckpointConfig(
        directory="./agent_cp",
        on_events=["lite_agent_execution_completed"],
    ),
)
result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}])
```
## Event Types
The `on_events` field accepts any combination of event type strings. Common choices:
| Use Case | Events |
|:---------|:-------|
| After each task (Crew) | `["task_completed"]` |
| After each flow method | `["method_execution_finished"]` |
| After agent execution | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` |
| On crew completion only | `["crew_kickoff_completed"]` |
| After every LLM call | `["llm_call_completed"]` |
| On everything | `["*"]` |
<Warning>
Using `["*"]` or high-frequency events like `llm_call_completed` will write many checkpoint files and may impact performance. Use `max_checkpoints` to limit disk usage.
</Warning>
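To see why `["*"]` and `llm_call_completed` produce so many files, it helps to count triggers over a run. This sketch assumes a simple matching rule (an exact event-type string matches itself and `"*"` matches everything), and the event stream is a made-up example rather than real crewai output.

```python
def should_checkpoint(event_type: str, on_events: list[str]) -> bool:
    """Assumed matching rule: exact match, or "*" matches every event."""
    return "*" in on_events or event_type in on_events


def count_checkpoints(events: list[str], on_events: list[str]) -> int:
    """Count how many checkpoints a run emitting `events` would write."""
    return sum(should_checkpoint(event, on_events) for event in events)


# Hypothetical run: two tasks, each making two LLM calls, then crew completion.
run = [
    "llm_call_completed", "llm_call_completed", "task_completed",
    "llm_call_completed", "llm_call_completed", "task_completed",
    "crew_kickoff_completed",
]

for on_events in (["crew_kickoff_completed"], ["task_completed"], ["llm_call_completed"], ["*"]):
    print(on_events, count_checkpoints(run, on_events))
```

Under these assumptions the count for `llm_call_completed` grows with every LLM call, while `task_completed` grows only with the number of tasks.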
## Manual Checkpointing
For full control, register your own event handler and call `state.checkpoint()` directly:
```python
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.llm_events import LLMCallCompletedEvent


# Sync handler
@crewai_event_bus.on(LLMCallCompletedEvent)
def on_llm_done(source, event, state):
    path = state.checkpoint("./my_checkpoints")
    print(f"Saved checkpoint: {path}")


# Async handler
@crewai_event_bus.on(LLMCallCompletedEvent)
async def on_llm_done_async(source, event, state):
    path = await state.acheckpoint("./my_checkpoints")
    print(f"Saved checkpoint: {path}")
```
The `state` argument is the `RuntimeState` passed automatically by the event bus when your handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/en/concepts/event-listener) documentation.
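The "3 parameters" rule can be pictured as the bus inspecting each handler's signature before calling it. The snippet below is a hypothetical model of that dispatch for synchronous handlers, written with the standard `inspect` module; it is not crewai's event bus code.

```python
import inspect


def dispatch(handler, source, event, state):
    """Call handler with (source, event, state) if it takes 3 parameters, else (source, event).

    Hypothetical model of the documented behavior; sync handlers only.
    """
    if len(inspect.signature(handler).parameters) == 3:
        return handler(source, event, state)  # handler wants the RuntimeState
    return handler(source, event)
```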
Checkpointing is best-effort: if a checkpoint write fails, the error is logged but execution continues uninterrupted.
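Best-effort here means a failed write is caught and logged rather than raised. A minimal sketch of that pattern, using only the standard `logging` module; `best_effort` is a hypothetical name, not a crewai function.

```python
import logging

logger = logging.getLogger("checkpoint_sketch")


def best_effort(write, *args, **kwargs):
    """Run a checkpoint write; on any exception, log it and return None.

    Hypothetical wrapper: the caller keeps executing whether or not the write succeeded.
    """
    try:
        return write(*args, **kwargs)
    except Exception:
        logger.exception("Checkpoint write failed; continuing execution")
        return None
```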


@@ -0,0 +1,187 @@
---
title: Checkpointing
description: Automatically save execution state so crews, flows, and agents can resume after failures.
icon: floppy-disk
mode: "wide"
---
<Warning>
Checkpointing is in early release. APIs may change in future versions.
</Warning>
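## Overview
Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running work that has already completed.
## Quick Start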
```python
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=True,  # uses defaults: ./.checkpoints, task_completed event
)
result = crew.kickoff()
```
Checkpoint files are written to `./.checkpoints/` after each task completes.
## Configuration
Use `CheckpointConfig` for fine-grained control:
```python
from crewai import Crew, CheckpointConfig

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=CheckpointConfig(
        directory="./my_checkpoints",
        on_events=["task_completed", "crew_kickoff_completed"],
        max_checkpoints=5,
    ),
)
```
### CheckpointConfig Fields
| Field | Type | Default | Description |
|:------|:-----|:--------|:------------|
| `directory` | `str` | `"./.checkpoints"` | Path for checkpoint files |
| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint |
| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend |
| `max_checkpoints` | `int \| None` | `None` | Maximum number of files to keep; oldest deleted first |
### Inheritance and Opt-Out
The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`:
| Value | Behavior |
|:------|:---------|
| `None` (default) | Inherit from parent. An agent inherits its crew's settings. |
| `True` | Enable with defaults. |
| `False` | Explicit opt-out. Stops inheritance from parent. |
| `CheckpointConfig(...)` | Custom configuration. |
```python
from crewai import Agent, Crew

crew = Crew(
    agents=[
        Agent(role="Researcher", goal=..., backstory=...),  # inherits the crew's checkpoint
        Agent(role="Writer", goal=..., backstory=..., checkpoint=False),  # opted out, no checkpoints
    ],
    tasks=[...],
    checkpoint=True,
)
```
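## Resuming from a Checkpoint
```python
from crewai import Crew

# Restore and resume
crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json")
result = crew.kickoff()  # resumes after the last completed task
```
The restored crew skips already-completed tasks and resumes from the first incomplete one.
## Available on Crew, Flow, and Agent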
### Crew
```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task, review_task],
    checkpoint=CheckpointConfig(directory="./crew_cp"),
)
```
Default trigger: `task_completed` (one checkpoint per completed task).
### Flow
```python
from crewai import CheckpointConfig
from crewai.flow.flow import Flow, listen, start


class MyFlow(Flow):
    @start()
    def step_one(self):
        return "data"

    @listen(step_one)
    def step_two(self, data):
        return data.upper()  # placeholder for your own processing


flow = MyFlow(
    checkpoint=CheckpointConfig(
        directory="./flow_cp",
        on_events=["method_execution_finished"],
    ),
)
result = flow.kickoff()

# Resume
flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json")
result = flow.kickoff()
```
### Agent
```python
from crewai import Agent, CheckpointConfig

agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    checkpoint=CheckpointConfig(
        directory="./agent_cp",
        on_events=["lite_agent_execution_completed"],
    ),
)
result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}])
```
## Event Types
The `on_events` field accepts any combination of event type strings. Common choices:
| Use Case | Events |
|:---------|:-------|
| After each task completes (Crew) | `["task_completed"]` |
| After each flow method completes | `["method_execution_finished"]` |
| After agent execution completes | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` |
| Only on crew completion | `["crew_kickoff_completed"]` |
| After every LLM call | `["llm_call_completed"]` |
| All events | `["*"]` |
<Warning>
Using `["*"]` or high-frequency events such as `llm_call_completed` creates many checkpoint files and can affect performance. Use `max_checkpoints` to limit disk usage.
</Warning>
## Manual Checkpointing
For full control, you can register your own event handler and call `state.checkpoint()` directly:
```python
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.llm_events import LLMCallCompletedEvent


# Sync handler
@crewai_event_bus.on(LLMCallCompletedEvent)
def on_llm_done(source, event, state):
    path = state.checkpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")


# Async handler
@crewai_event_bus.on(LLMCallCompletedEvent)
async def on_llm_done_async(source, event, state):
    path = await state.acheckpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")
```
The `state` argument is the `RuntimeState` that the event bus passes automatically when the handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/ko/concepts/event-listener) documentation.
Checkpointing is best-effort: if a checkpoint write fails, the error is logged, but execution continues without interruption.


@@ -0,0 +1,187 @@
---
title: Checkpointing
description: Automatically save execution state so crews, flows, and agents can resume after failures.
icon: floppy-disk
mode: "wide"
---
<Warning>
Checkpointing is in early release. APIs may change in future versions.
</Warning>
## Overview
Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running work that is already complete.
## Quick Start
```python
from crewai import Crew

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=True,  # uses defaults: ./.checkpoints, on task_completed
)
result = crew.kickoff()
```
Checkpoint files are written to `./.checkpoints/` after each completed task.
## Configuration
Use `CheckpointConfig` for full control:
```python
from crewai import Crew, CheckpointConfig

crew = Crew(
    agents=[...],
    tasks=[...],
    checkpoint=CheckpointConfig(
        directory="./my_checkpoints",
        on_events=["task_completed", "crew_kickoff_completed"],
        max_checkpoints=5,
    ),
)
```
### CheckpointConfig Fields
| Field | Type | Default | Description |
|:------|:-----|:--------|:------------|
| `directory` | `str` | `"./.checkpoints"` | Path for checkpoint files |
| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint |
| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend |
| `max_checkpoints` | `int \| None` | `None` | Maximum number of files to keep; the oldest are removed first |
### Inheritance and Opt-Out
The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`:
| Value | Behavior |
|:------|:---------|
| `None` (default) | Inherit from parent. An agent inherits the crew's configuration. |
| `True` | Enable with defaults. |
| `False` | Explicit opt-out. Stops inheritance from parent. |
| `CheckpointConfig(...)` | Custom configuration. |
```python
from crewai import Agent, Crew

crew = Crew(
    agents=[
        Agent(role="Researcher", goal=..., backstory=...),  # inherits the crew's checkpoint
        Agent(role="Writer", goal=..., backstory=..., checkpoint=False),  # disabled, no checkpoints
    ],
    tasks=[...],
    checkpoint=True,
)
```
## Resuming from a Checkpoint
```python
from crewai import Crew

# Restore and resume
crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json")
result = crew.kickoff()  # resumes after the last completed task
```
The restored crew skips already-completed tasks and resumes from the first incomplete one.
## Works on Crew, Flow, and Agent
### Crew
```python
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task, review_task],
    checkpoint=CheckpointConfig(directory="./crew_cp"),
)
```
Default trigger: `task_completed` (one checkpoint per finished task).
### Flow
```python
from crewai import CheckpointConfig
from crewai.flow.flow import Flow, listen, start


class MyFlow(Flow):
    @start()
    def step_one(self):
        return "data"

    @listen(step_one)
    def step_two(self, data):
        return data.upper()  # placeholder for your own processing


flow = MyFlow(
    checkpoint=CheckpointConfig(
        directory="./flow_cp",
        on_events=["method_execution_finished"],
    ),
)
result = flow.kickoff()

# Resume
flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json")
result = flow.kickoff()
```
### Agent
```python
from crewai import Agent, CheckpointConfig

agent = Agent(
    role="Researcher",
    goal="Research topics",
    backstory="Expert researcher",
    checkpoint=CheckpointConfig(
        directory="./agent_cp",
        on_events=["lite_agent_execution_completed"],
    ),
)
result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}])
```
## Event Types
The `on_events` field accepts any combination of event type strings. Common choices:
| Use Case | Events |
|:---------|:-------|
| After each task (Crew) | `["task_completed"]` |
| After each flow method | `["method_execution_finished"]` |
| After agent execution | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` |
| Only on crew completion | `["crew_kickoff_completed"]` |
| After each LLM call | `["llm_call_completed"]` |
| On everything | `["*"]` |
<Warning>
Using `["*"]` or high-frequency events like `llm_call_completed` will write many checkpoint files and may impact performance. Use `max_checkpoints` to limit disk usage.
</Warning>
## Manual Checkpointing
For full control, register your own event handler and call `state.checkpoint()` directly:
```python
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.llm_events import LLMCallCompletedEvent


# Sync handler
@crewai_event_bus.on(LLMCallCompletedEvent)
def on_llm_done(source, event, state):
    path = state.checkpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")


# Async handler
@crewai_event_bus.on(LLMCallCompletedEvent)
async def on_llm_done_async(source, event, state):
    path = await state.acheckpoint("./my_checkpoints")
    print(f"Checkpoint saved: {path}")
```
The `state` argument is the `RuntimeState` passed automatically by the event bus when your handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/pt-BR/concepts/event-listener) documentation.
Checkpointing is best-effort: if a checkpoint write fails, the error is logged, but execution continues without interruption.