From d77e7b3139f41a51964c77e09fcf75edbda7d325 Mon Sep 17 00:00:00 2001 From: Greyson LaLonde Date: Fri, 22 May 2026 21:14:05 +0800 Subject: [PATCH] docs: restructure checkpointing page --- docs/ar/concepts/checkpointing.mdx | 486 +++++++++++++++-------- docs/en/concepts/checkpointing.mdx | 544 +++++++++++++++----------- docs/ko/concepts/checkpointing.mdx | 484 +++++++++++++++-------- docs/pt-BR/concepts/checkpointing.mdx | 488 +++++++++++++++-------- 4 files changed, 1283 insertions(+), 719 deletions(-) diff --git a/docs/ar/concepts/checkpointing.mdx b/docs/ar/concepts/checkpointing.mdx index 578f04be9..78acd10c7 100644 --- a/docs/ar/concepts/checkpointing.mdx +++ b/docs/ar/concepts/checkpointing.mdx @@ -5,225 +5,385 @@ icon: floppy-disk mode: "wide" --- - -الـ Checkpointing في اصدار مبكر. قد تتغير واجهات البرمجة في الاصدارات المستقبلية. - +الـ Checkpointing يحفظ لقطة من حالة التنفيذ اثناء التشغيل بحيث يمكن لطاقم او تدفق او وكيل الاستئناف بعد الفشل او التفرع الى فرع بديل. -## نظرة عامة + + + كيف يعمل الـ Checkpointing: الاحداث والتخزين والوراثة. + + + دليل 5 دقائق: تشغيل، ايقاف، استئناف. + + + وصفات مركزة على المهام لسير العمل الشائع. + + + `CheckpointConfig` والاحداث والمزودات وسطر الاوامر. + + -يقوم الـ Checkpointing بحفظ حالة التنفيذ تلقائيا اثناء التشغيل. اذا فشل طاقم او تدفق او وكيل اثناء التنفيذ، يمكنك الاستعادة من اخر نقطة حفظ والاستئناف دون اعادة تنفيذ العمل المكتمل. +## الشرح -## البداية السريعة +### ما هي نقطة الحفظ -```python -from crewai import Crew, CheckpointConfig +نقطة الحفظ هي لقطة متسلسلة من `RuntimeState` تكتب في نقطة معينة من التنفيذ. تسجل اي المهام اكتملت ومخرجاتها والمدخلات الحالية ومعرف نسب يحدد التشغيل. -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=True, # يستخدم الافتراضيات: ./.checkpoints, عند task_completed -) -result = crew.kickoff() -``` +عند الاستعادة من نقطة حفظ، يعيد CrewAI بناء تلك الحالة ويتخطى العمل المكتمل ويستمر. عند التفرع، يستعيد CrewAI الحالة تحت نسب جديد بحيث لا يتداخل الفرع الجديد مع التشغيل الاصلي. -تتم كتابة ملفات نقاط الحفظ في `./.checkpoints/` بعد اكتمال كل مهمة. +### متى تكتب نقاط الحفظ -## التكوين +الـ Checkpointing مدفوع بالاحداث. يشترك وقت التشغيل في الاحداث التي تحددها عبر `on_events` ويكتب نقطة حفظ عند اطلاق احدها. الافتراضي `task_completed` ينتج نقطة حفظ لكل مهمة منتهية — توازن معقول بين الدقة واستخدام القرص. الاحداث عالية التردد مثل `llm_call_completed` متاحة للاستعادة الدقيقة لكنها تكتب ملفات اكثر بكثير. -استخدم `CheckpointConfig` للتحكم الكامل: +### التخزين -```python -from crewai import Crew, CheckpointConfig +يتضمن CrewAI مزودين: -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - on_events=["task_completed", "crew_kickoff_completed"], - max_checkpoints=5, - ), -) -``` +- `JsonProvider` يكتب ملفا لكل نقطة حفظ. قابل للقراءة وسهل التفقد. +- `SqliteProvider` يكتب الى قاعدة بيانات SQLite واحدة. افضل لنقاط الحفظ عالية التردد. -### حقول CheckpointConfig +كلاهما يحذف اقدم نقاط الحفظ عند تحديد `max_checkpoints`. -| الحقل | النوع | الافتراضي | الوصف | -|:------|:------|:----------|:------| -| `location` | `str` | `"./.checkpoints"` | مسار ملفات نقاط الحفظ | -| `on_events` | `list[str]` | `["task_completed"]` | انواع الاحداث التي تطلق نقطة حفظ | -| `provider` | `BaseProvider` | `JsonProvider()` | واجهة التخزين | -| `max_checkpoints` | `int \| None` | `None` | الحد الاقصى للملفات؛ يتم حذف الاقدم اولا | + +كتابة نقاط الحفظ بافضل جهد. فشل نقطة حفظ يسجل لكنه لا يقاطع التشغيل. + -### الوراثة والانسحاب +### نموذج الوراثة -يقبل حقل `checkpoint` في Crew و Flow و Agent قيم `CheckpointConfig` او `True` او `False` او `None`: +`Crew` و`Flow` و`Agent` كلها تقبل وسيط `checkpoint`. يرث الابناء من الاب ما لم يحددوا قيمتهم الخاصة او يمرروا `False` للانسحاب. فعل الـ Checkpointing مرة واحدة على الطاقم وتشارك كل الوكلاء، او استبعد وكيلا واحدا بشكل انتقائي. -| القيمة | السلوك | -|:-------|:-------| -| `None` (افتراضي) | يرث من الاصل. الوكيل يرث اعدادات الطاقم. | -| `True` | تفعيل بالاعدادات الافتراضية. | -| `False` | انسحاب صريح. يوقف الوراثة من الاصل. | -| `CheckpointConfig(...)` | اعدادات مخصصة. | +## درس تطبيقي: استئناف طاقم فاشل -```python -crew = Crew( - agents=[ - Agent(role="Researcher", ...), # يرث checkpoint من الطاقم - Agent(role="Writer", ..., checkpoint=False), # منسحب، بدون نقاط حفظ - ], - tasks=[...], - checkpoint=True, -) -``` +هذا الدليل يستغرق حوالي 5 دقائق. ستشغل طاقما بمهمتين، توقفه في المنتصف، ثم تستأنف من نقطة الحفظ المحفوظة. -## الاستئناف من نقطة حفظ + + + ```python + from crewai import Agent, Crew, Task -```python -# استعادة واستئناف -crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json") -result = crew.kickoff() # يستأنف من اخر مهمة مكتملة -``` + researcher = Agent(role="Researcher", goal="Research", backstory="Expert") + writer = Agent(role="Writer", goal="Write", backstory="Expert") -يتخطى الطاقم المستعاد المهام المكتملة ويستأنف من اول مهمة غير مكتملة. + crew = Crew( + agents=[researcher, writer], + tasks=[ + Task(description="Research AI trends", agent=researcher, expected_output="bullets"), + Task(description="Write a summary", agent=writer, expected_output="paragraph"), + ], + checkpoint=True, + ) + ``` + + + ```python + result = crew.kickoff() + ``` -## يعمل على Crew و Flow و Agent + اضغط `Ctrl+C` بعد انتهاء المهمة الاولى. في `./.checkpoints/`، الملف بصيغة `_.json` هو نقطة الحفظ. + + + ```python + from crewai import CheckpointConfig -### Crew + result = crew.kickoff( + from_checkpoint=CheckpointConfig( + restore_from="./.checkpoints/_.json", + ), + ) + ``` -```python -crew = Crew( - agents=[researcher, writer], - tasks=[research_task, write_task, review_task], - checkpoint=CheckpointConfig(location="./crew_cp"), -) -``` + يتم تخطي مهمة البحث، ويعمل الكاتب على مخرجات البحث المحفوظة، وينتهي الطاقم. + + -المشغل الافتراضي: `task_completed` (نقطة حفظ واحدة لكل مهمة مكتملة). +## ادلة عملية -### Flow + + + ```python + crew = Crew(agents=[...], tasks=[...], checkpoint=True) + ``` -```python -from crewai.flow.flow import Flow, start, listen -from crewai import CheckpointConfig + يكتب الى `./.checkpoints/` عند كل `task_completed`. + -class MyFlow(Flow): - @start() - def step_one(self): - return "data" + + ```python + from crewai import Crew, CheckpointConfig - @listen(step_one) - def step_two(self, data): - return process(data) + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + on_events=["task_completed", "crew_kickoff_completed"], + max_checkpoints=5, + ), + ) + ``` + -flow = MyFlow( - checkpoint=CheckpointConfig( - location="./flow_cp", - on_events=["method_execution_finished"], - ), -) -result = flow.kickoff() + + + ```python JsonProvider + from crewai import Crew, CheckpointConfig + from crewai.state import JsonProvider -# استئناف -flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json") -result = flow.kickoff() -``` + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + provider=JsonProvider(), + max_checkpoints=5, + ), + ) + ``` + ```python SqliteProvider + from crewai import Crew, CheckpointConfig + from crewai.state import SqliteProvider -### Agent + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./.checkpoints.db", + provider=SqliteProvider(), + max_checkpoints=50, + ), + ) + ``` + -```python -agent = Agent( - role="Researcher", - goal="Research topics", - backstory="Expert researcher", - checkpoint=CheckpointConfig( - location="./agent_cp", - on_events=["lite_agent_execution_completed"], - ), -) -result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) -``` + + SQLite يفعل وضع journal WAL للقراءات المتزامنة. يفضل لنقاط الحفظ عالية التردد. + + -## مزودات التخزين + + ```python + crew = Crew( + agents=[ + Agent(role="Researcher", ...), + Agent(role="Writer", ..., checkpoint=False), + ], + tasks=[...], + checkpoint=True, + ) + ``` + -يتضمن CrewAI مزودي تخزين لنقاط الحفظ. + + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.from_checkpoint(config) + result = crew.kickoff() + ``` + -### JsonProvider (افتراضي) + + `fork()` يستعيد نقطة حفظ تحت نسب جديد بحيث لا يتصادم التشغيل الجديد مع الاصلي. -يكتب كل نقطة حفظ كملف JSON منفصل. + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.fork(config, branch="experiment-a") + result = crew.kickoff(inputs={"strategy": "aggressive"}) + ``` -```python -from crewai import Crew, CheckpointConfig -from crewai.state import JsonProvider + تسمية `branch` اختيارية؛ يتم انشاء واحدة اذا اغفلت. + -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - provider=JsonProvider(), - max_checkpoints=5, - ), -) -``` + + + + ```python + crew = Crew( + agents=[researcher, writer], + tasks=[research_task, write_task, review_task], + checkpoint=CheckpointConfig(location="./crew_cp"), + ) + ``` -### SqliteProvider + المشغل الافتراضي: `task_completed`. + + + ```python + from crewai.flow.flow import Flow, start, listen + from crewai import CheckpointConfig -يخزن جميع نقاط الحفظ في ملف قاعدة بيانات SQLite واحد. + class MyFlow(Flow): + @start() + def step_one(self): + return "data" -```python -from crewai import Crew, CheckpointConfig -from crewai.state import SqliteProvider + @listen(step_one) + def step_two(self, data): + return process(data) -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./.checkpoints.db", - provider=SqliteProvider(), - ), -) -``` + flow = MyFlow( + checkpoint=CheckpointConfig( + location="./flow_cp", + on_events=["method_execution_finished"], + ), + ) + result = flow.kickoff() + config = CheckpointConfig(restore_from="./flow_cp/.json") + flow = MyFlow.from_checkpoint(config) + result = flow.kickoff() + ``` + + + ```python + agent = Agent( + role="Researcher", + goal="Research topics", + backstory="Expert researcher", + checkpoint=CheckpointConfig( + location="./agent_cp", + on_events=["lite_agent_execution_completed"], + ), + ) + result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) + ``` + + + -## انواع الاحداث + + سجل معالجا على اي حدث واستدع `state.checkpoint()`. -يقبل حقل `on_events` اي مجموعة من سلاسل انواع الاحداث. الخيارات الشائعة: + + ```python Sync + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent + + @crewai_event_bus.on(LLMCallCompletedEvent) + def on_llm_done(source, event, state): + path = state.checkpoint("./my_checkpoints") + print(f"تم حفظ نقطة الحفظ: {path}") + ``` + ```python Async + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent + + @crewai_event_bus.on(LLMCallCompletedEvent) + async def on_llm_done_async(source, event, state): + path = await state.acheckpoint("./my_checkpoints") + print(f"تم حفظ نقطة الحفظ: {path}") + ``` + + + يتم تمرير وسيط `state` تلقائيا عندما يقبل المعالج ثلاثة معاملات. راجع [Event Listeners](/ar/concepts/event-listener) لقائمة الاحداث الكاملة. + + + + ```bash + crewai checkpoint # كشف تلقائي لـ .checkpoints/ او .checkpoints.db + crewai checkpoint --location ./my_checkpoints + crewai checkpoint --location ./.checkpoints.db + ``` + + + Checkpoint TUI + + + اللوحة اليسرى تجمع نقاط الحفظ حسب الفرع؛ التفرعات تتداخل تحت ابيها. اختيار نقطة حفظ يعرض بياناتها الوصفية وحالة الكيان وتقدم المهام. **Resume** يكمل التشغيل؛ **Fork** يبدا فرعا جديدا. + + لوحة التفاصيل تعرض منطقتين قابلتين للتحرير: + + - **Inputs** — مدخلات الـ kickoff الاصلية، معبأة مسبقا وقابلة للتحرير. + - **مخرجات المهام** — مخرجات المهام المكتملة. تحرير مخرج والضغط على **Fork** يبطل المهام التابعة لتعاد بالسياق المعدل. + + + مفيد لاستكشاف "ماذا لو": تفرع، عدل، راقب. + + + + + ```bash + crewai checkpoint list ./my_checkpoints + crewai checkpoint info ./my_checkpoints/.json + crewai checkpoint info ./.checkpoints.db + ``` + + + +## المرجع + +### `CheckpointConfig` + + + وجهة التخزين. مجلد لـ `JsonProvider`، مسار ملف قاعدة بيانات لـ `SqliteProvider`. + + + + انواع الاحداث التي تطلق نقطة حفظ. راجع [انواع الاحداث](#انواع-الاحداث). + + + + واجهة التخزين. `JsonProvider` او `SqliteProvider`. + + + + الحد الاقصى لنقاط الحفظ المحتفظ بها. الاقدم تحذف بعد كل كتابة. + + + + نقطة الحفظ المراد استعادتها عند تمريرها عبر `from_checkpoint`. + + +### قيم حقل `checkpoint` + +مقبولة في `Crew` و`Flow` و`Agent`. + + + يرث من الاب. + + + + تفعيل بالاعدادات الافتراضية. + + + + انسحاب صريح. يوقف الوراثة. + + + + اعدادات مخصصة. + + +### انواع الاحداث + +قيم شائعة لـ `on_events`: | حالة الاستخدام | الاحداث | |:---------------|:--------| -| بعد كل مهمة (Crew) | `["task_completed"]` | +| بعد كل مهمة | `["task_completed"]` | | بعد كل طريقة في التدفق | `["method_execution_finished"]` | | بعد تنفيذ الوكيل | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` | | عند اكتمال الطاقم فقط | `["crew_kickoff_completed"]` | | بعد كل استدعاء LLM | `["llm_call_completed"]` | -| على كل شيء | `["*"]` | +| كل شيء | `["*"]` | -استخدام `["*"]` او احداث عالية التردد مثل `llm_call_completed` سيكتب العديد من ملفات نقاط الحفظ وقد يؤثر على الاداء. استخدم `max_checkpoints` للحد من استخدام المساحة. +`["*"]` والاحداث عالية التردد مثل `llm_call_completed` تكتب نقاط حفظ كثيرة وقد تضر بالاداء. استخدمها مع `max_checkpoints`. -## نقاط الحفظ اليدوية +### مزودات التخزين -للتحكم الكامل، سجل معالج الاحداث الخاص بك واستدع `state.checkpoint()` مباشرة: + + ملف واحد لكل نقطة حفظ بصيغة `_.json` داخل `location`. + -```python -from crewai.events.event_bus import crewai_event_bus -from crewai.events.types.llm_events import LLMCallCompletedEvent + + ملف قاعدة بيانات واحد في `location` مع journaling WAL. + -# معالج متزامن -@crewai_event_bus.on(LLMCallCompletedEvent) -def on_llm_done(source, event, state): - path = state.checkpoint("./my_checkpoints") - print(f"تم حفظ نقطة الحفظ: {path}") +### سطر الاوامر -# معالج غير متزامن -@crewai_event_bus.on(LLMCallCompletedEvent) -async def on_llm_done_async(source, event, state): - path = await state.acheckpoint("./my_checkpoints") - print(f"تم حفظ نقطة الحفظ: {path}") -``` - -وسيط `state` هو `RuntimeState` الذي يتم تمريره تلقائيا بواسطة ناقل الاحداث عندما يقبل المعالج 3 معاملات. يمكنك تسجيل معالجات على اي نوع حدث مدرج في وثائق [Event Listeners](/ar/concepts/event-listener). - -الـ Checkpointing يعمل بافضل جهد: اذا فشلت كتابة نقطة حفظ، يتم تسجيل الخطأ ولكن التنفيذ يستمر دون انقطاع. +| الامر | الغرض | +|:------|:------| +| `crewai checkpoint` | تشغيل TUI؛ كشف التخزين تلقائيا. | +| `crewai checkpoint --location ` | تشغيل TUI على موقع محدد. | +| `crewai checkpoint list ` | سرد نقاط الحفظ. | +| `crewai checkpoint info ` | تفقد ملف نقطة حفظ او اخر مدخل في قاعدة بيانات SQLite. | diff --git a/docs/en/concepts/checkpointing.mdx b/docs/en/concepts/checkpointing.mdx index d6430eb6f..6c513a6ad 100644 --- a/docs/en/concepts/checkpointing.mdx +++ b/docs/en/concepts/checkpointing.mdx @@ -5,301 +5,385 @@ icon: floppy-disk mode: "wide" --- - -Checkpointing is in early release. APIs may change in future versions. - +Checkpointing saves a snapshot of execution state during a run so a crew, flow, or agent can resume after a failure or be forked into an alternate branch. -## Overview + + + How checkpointing works: events, storage, and inheritance. + + + A 5-minute walkthrough: run, interrupt, resume. + + + Task-focused recipes for common workflows. + + + `CheckpointConfig`, events, providers, and CLI. + + -Checkpointing automatically saves execution state during a run. If a crew, flow, or agent fails mid-execution, you can restore from the last checkpoint and resume without re-running completed work. +## Explanation -## Quick Start +### What a checkpoint is -```python -from crewai import Crew, CheckpointConfig +A checkpoint is a serialized snapshot of `RuntimeState` written at a point in execution. It records which tasks have completed, their outputs, the current inputs, and a lineage ID that identifies the run. -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=True, # uses defaults: ./.checkpoints, on task_completed -) -result = crew.kickoff() -``` +When you restore from a checkpoint, CrewAI rebuilds that state, skips already-completed work, and continues. When you fork from one, CrewAI restores the state under a new lineage so the new branch and the original run do not overwrite each other. -Checkpoint files are written to `./.checkpoints/` after each completed task. +### When checkpoints are written -## Configuration +Checkpointing is event-driven. The runtime subscribes to events you select via `on_events` and writes a checkpoint each time one fires. The default `task_completed` produces one checkpoint per finished task — a sensible tradeoff between granularity and disk use. Higher-frequency events like `llm_call_completed` are available for fine-grained recovery but write far more files. -Use `CheckpointConfig` for full control: +### Storage -```python -from crewai import Crew, CheckpointConfig +Two providers ship with CrewAI: -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - on_events=["task_completed", "crew_kickoff_completed"], - max_checkpoints=5, - ), -) -``` +- `JsonProvider` writes one file per checkpoint. Human-readable and easy to inspect. +- `SqliteProvider` writes to a single SQLite database. Better for high-frequency checkpointing. -### CheckpointConfig Fields +Both prune oldest checkpoints when `max_checkpoints` is set. -| Field | Type | Default | Description | -|:------|:-----|:--------|:------------| -| `location` | `str` | `"./.checkpoints"` | Storage destination — a directory for `JsonProvider`, a database file path for `SqliteProvider` | -| `on_events` | `list[str]` | `["task_completed"]` | Event types that trigger a checkpoint | -| `provider` | `BaseProvider` | `JsonProvider()` | Storage backend | -| `max_checkpoints` | `int \| None` | `None` | Max checkpoints to keep. Oldest are pruned after each write. Pruning is handled by the provider. | -| `restore_from` | `Path \| str \| None` | `None` | Path to a checkpoint to restore from. Used when passing config via a kickoff method's `from_checkpoint` parameter. | + +Checkpoint writes are best-effort. A failed checkpoint is logged but does not interrupt the run. + -### Inheritance and Opt-Out +### Inheritance model -The `checkpoint` field on Crew, Flow, and Agent accepts `CheckpointConfig`, `True`, `False`, or `None`: +`Crew`, `Flow`, and `Agent` all accept a `checkpoint` argument. Children inherit from their parent unless they set their own value or pass `False` to opt out. Enable checkpointing once on the crew and every agent participates, or selectively exclude one agent. -| Value | Behavior | -|:------|:---------| -| `None` (default) | Inherit from parent. An agent inherits its crew's config. | -| `True` | Enable with defaults. | -| `False` | Explicit opt-out. Stops inheritance from parent. | -| `CheckpointConfig(...)` | Custom configuration. | +## Tutorial: Resume a failing crew -```python -crew = Crew( - agents=[ - Agent(role="Researcher", ...), # inherits crew's checkpoint - Agent(role="Writer", ..., checkpoint=False), # opted out, no checkpoints - ], - tasks=[...], - checkpoint=True, -) -``` +This walkthrough takes ~5 minutes. You will run a two-task crew, kill it midway, and resume from the saved checkpoint. -## Resuming from a Checkpoint + + + ```python + from crewai import Agent, Crew, Task -Pass a `CheckpointConfig` with `restore_from` to any kickoff method. The crew restores from that checkpoint, skips completed tasks, and resumes. + researcher = Agent(role="Researcher", goal="Research", backstory="Expert") + writer = Agent(role="Writer", goal="Write", backstory="Expert") -```python -from crewai import Crew, CheckpointConfig + crew = Crew( + agents=[researcher, writer], + tasks=[ + Task(description="Research AI trends", agent=researcher, expected_output="bullets"), + Task(description="Write a summary", agent=writer, expected_output="paragraph"), + ], + checkpoint=True, + ) + ``` + + + ```python + result = crew.kickoff() + ``` -crew = Crew(agents=[...], tasks=[...]) -result = crew.kickoff( - from_checkpoint=CheckpointConfig( - restore_from="./my_checkpoints/20260407T120000_abc123.json", - ), -) -``` + Press `Ctrl+C` after the first task finishes. Look in `./.checkpoints/` — a file named `_.json` is the checkpoint. + + + ```python + from crewai import CheckpointConfig -Remaining `CheckpointConfig` fields apply to the new run, so checkpointing continues after the restore. + result = crew.kickoff( + from_checkpoint=CheckpointConfig( + restore_from="./.checkpoints/_.json", + ), + ) + ``` -You can also use the classmethod directly: + The research task is skipped, the writer runs against the saved research output, and the crew finishes. + + -```python -config = CheckpointConfig(restore_from="./my_checkpoints/20260407T120000_abc123.json") -crew = Crew.from_checkpoint(config) -result = crew.kickoff() -``` +## How-to guides -## Forking from a Checkpoint + + + ```python + crew = Crew(agents=[...], tasks=[...], checkpoint=True) + ``` -`fork()` restores a checkpoint and starts a new execution branch. Useful for exploring alternative paths from the same point. + Writes to `./.checkpoints/` on every `task_completed`. + -```python -from crewai import Crew, CheckpointConfig + + ```python + from crewai import Crew, CheckpointConfig -config = CheckpointConfig(restore_from="./my_checkpoints/20260407T120000_abc123.json") -crew = Crew.fork(config, branch="experiment-a") -result = crew.kickoff(inputs={"strategy": "aggressive"}) -``` + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + on_events=["task_completed", "crew_kickoff_completed"], + max_checkpoints=5, + ), + ) + ``` + -Each fork gets a unique lineage ID so checkpoints from different branches don't collide. The `branch` label is optional and auto-generated if omitted. + + + ```python JsonProvider + from crewai import Crew, CheckpointConfig + from crewai.state import JsonProvider -## Works on Crew, Flow, and Agent + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + provider=JsonProvider(), + max_checkpoints=5, + ), + ) + ``` + ```python SqliteProvider + from crewai import Crew, CheckpointConfig + from crewai.state import SqliteProvider -### Crew + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./.checkpoints.db", + provider=SqliteProvider(), + max_checkpoints=50, + ), + ) + ``` + -```python -crew = Crew( - agents=[researcher, writer], - tasks=[research_task, write_task, review_task], - checkpoint=CheckpointConfig(location="./crew_cp"), -) -``` + + SQLite enables WAL journal mode for concurrent reads. Prefer it for high-frequency checkpointing. + + -Default trigger: `task_completed` (one checkpoint per finished task). + + ```python + crew = Crew( + agents=[ + Agent(role="Researcher", ...), + Agent(role="Writer", ..., checkpoint=False), + ], + tasks=[...], + checkpoint=True, + ) + ``` + -### Flow + + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.from_checkpoint(config) + result = crew.kickoff() + ``` + -```python -from crewai.flow.flow import Flow, start, listen -from crewai import CheckpointConfig + + `fork()` restores a checkpoint under a fresh lineage so the new run does not collide with the original. -class MyFlow(Flow): - @start() - def step_one(self): - return "data" + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.fork(config, branch="experiment-a") + result = crew.kickoff(inputs={"strategy": "aggressive"}) + ``` - @listen(step_one) - def step_two(self, data): - return process(data) + The `branch` label is optional; one is generated if omitted. + -flow = MyFlow( - checkpoint=CheckpointConfig( - location="./flow_cp", - on_events=["method_execution_finished"], - ), -) -result = flow.kickoff() + + + + ```python + crew = Crew( + agents=[researcher, writer], + tasks=[research_task, write_task, review_task], + checkpoint=CheckpointConfig(location="./crew_cp"), + ) + ``` -# Resume -config = CheckpointConfig(restore_from="./flow_cp/20260407T120000_abc123.json") -flow = MyFlow.from_checkpoint(config) -result = flow.kickoff() -``` + Default trigger: `task_completed`. + + + ```python + from crewai.flow.flow import Flow, start, listen + from crewai import CheckpointConfig -### Agent + class MyFlow(Flow): + @start() + def step_one(self): + return "data" -```python -agent = Agent( - role="Researcher", - goal="Research topics", - backstory="Expert researcher", - checkpoint=CheckpointConfig( - location="./agent_cp", - on_events=["lite_agent_execution_completed"], - ), -) -result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) -``` + @listen(step_one) + def step_two(self, data): + return process(data) -## Storage Providers + flow = MyFlow( + checkpoint=CheckpointConfig( + location="./flow_cp", + on_events=["method_execution_finished"], + ), + ) + result = flow.kickoff() -CrewAI ships with two checkpoint storage providers. + config = CheckpointConfig(restore_from="./flow_cp/.json") + flow = MyFlow.from_checkpoint(config) + result = flow.kickoff() + ``` + + + ```python + agent = Agent( + role="Researcher", + goal="Research topics", + backstory="Expert researcher", + checkpoint=CheckpointConfig( + location="./agent_cp", + on_events=["lite_agent_execution_completed"], + ), + ) + result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) + ``` + + + -### JsonProvider (default) + + Register a handler on any event and call `state.checkpoint()`. -Writes each checkpoint as a separate JSON file. Simple, human-readable, easy to inspect. + + ```python Sync + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent -```python -from crewai import Crew, CheckpointConfig -from crewai.state import JsonProvider + @crewai_event_bus.on(LLMCallCompletedEvent) + def on_llm_done(source, event, state): + path = state.checkpoint("./my_checkpoints") + print(f"Saved checkpoint: {path}") + ``` + ```python Async + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - provider=JsonProvider(), # this is the default - max_checkpoints=5, # prunes oldest files - ), -) -``` + @crewai_event_bus.on(LLMCallCompletedEvent) + async def on_llm_done_async(source, event, state): + path = await state.acheckpoint("./my_checkpoints") + print(f"Saved checkpoint: {path}") + ``` + -Files are named `_.json` inside the location directory. + A `state` argument is supplied automatically when the handler takes three parameters. See [Event Listeners](/en/concepts/event-listener) for the full event catalog. + -### SqliteProvider + + ```bash + crewai checkpoint # auto-detects .checkpoints/ or .checkpoints.db + crewai checkpoint --location ./my_checkpoints + crewai checkpoint --location ./.checkpoints.db + ``` -Stores all checkpoints in a single SQLite database file. Better for high-frequency checkpointing and avoids many small files. + + Checkpoint TUI + -```python -from crewai import Crew, CheckpointConfig -from crewai.state import SqliteProvider + The left panel groups checkpoints by branch; forks nest under their parent. Selecting a checkpoint shows its metadata, entity state, and task progress. **Resume** continues the run; **Fork** starts a new branch. -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./.checkpoints.db", - provider=SqliteProvider(), - max_checkpoints=50, - ), -) -``` + The detail panel exposes two editable areas: -WAL journal mode is enabled for concurrent read access. + - **Inputs** — original kickoff inputs, pre-filled and editable. + - **Task outputs** — outputs of completed tasks. Editing an output and hitting **Fork** invalidates downstream tasks so they re-run against the modified context. -## Event Types + + Useful for "what if" exploration: fork, tweak, observe. + + -The `on_events` field accepts any combination of event type strings. Common choices: + + ```bash + crewai checkpoint list ./my_checkpoints + crewai checkpoint info ./my_checkpoints/.json + crewai checkpoint info ./.checkpoints.db + ``` + + -| Use Case | Events | +## Reference + +### `CheckpointConfig` + + + Storage destination. A directory for `JsonProvider`, a database file path for `SqliteProvider`. + + + + Event types that trigger a checkpoint. See [event types](#event-types). + + + + Storage backend. Either `JsonProvider` or `SqliteProvider`. + + + + Maximum checkpoints to retain. Oldest are pruned after each write. + + + + Checkpoint to restore from when passed via `from_checkpoint`. + + +### `checkpoint` field values + +Accepted by `Crew`, `Flow`, and `Agent`. + + + Inherit from parent. + + + + Enable with defaults. + + + + Explicit opt-out. Stops inheritance. + + + + Custom configuration. + + +### Event types + +Common values for `on_events`: + +| Use case | Events | |:---------|:-------| -| After each task (Crew) | `["task_completed"]` | +| After each task | `["task_completed"]` | | After each flow method | `["method_execution_finished"]` | | After agent execution | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` | | On crew completion only | `["crew_kickoff_completed"]` | | After every LLM call | `["llm_call_completed"]` | -| On everything | `["*"]` | +| Everything | `["*"]` | -Using `["*"]` or high-frequency events like `llm_call_completed` will write many checkpoint files and may impact performance. Use `max_checkpoints` to limit disk usage. +`["*"]` and high-frequency events like `llm_call_completed` write many checkpoints and can degrade performance. Pair them with `max_checkpoints`. -## Manual Checkpointing +### Storage providers -For full control, register your own event handler and call `state.checkpoint()` directly: + + One file per checkpoint, named `_.json` inside `location`. + -```python -from crewai.events.event_bus import crewai_event_bus -from crewai.events.types.llm_events import LLMCallCompletedEvent + + Single database file at `location` with WAL journaling. + -# Sync handler -@crewai_event_bus.on(LLMCallCompletedEvent) -def on_llm_done(source, event, state): - path = state.checkpoint("./my_checkpoints") - print(f"Saved checkpoint: {path}") +### CLI -# Async handler -@crewai_event_bus.on(LLMCallCompletedEvent) -async def on_llm_done_async(source, event, state): - path = await state.acheckpoint("./my_checkpoints") - print(f"Saved checkpoint: {path}") -``` - -The `state` argument is the `RuntimeState` passed automatically by the event bus when your handler accepts 3 parameters. You can register handlers on any event type listed in the [Event Listeners](/en/concepts/event-listener) documentation. - -Checkpointing is best-effort: if a checkpoint write fails, the error is logged but execution continues uninterrupted. - -## CLI - -The `crewai checkpoint` command gives you a TUI for browsing, inspecting, resuming, and forking checkpoints. It auto-detects whether your checkpoints are JSON files or a SQLite database. - -```bash -# Launch the TUI — auto-detects .checkpoints/ or .checkpoints.db -crewai checkpoint - -# Point at a specific location -crewai checkpoint --location ./my_checkpoints -crewai checkpoint --location ./.checkpoints.db -``` - - - Checkpoint TUI - - -The left panel is a tree view. Checkpoints are grouped by branch, and forks nest under the checkpoint they diverged from. Select a checkpoint to see its metadata, entity state, and task progress in the detail panel. Hit **Resume** to pick up where it left off, or **Fork** to start a new branch from that point. - -### Editing inputs and task outputs - -When a checkpoint is selected, the detail panel shows: - -- **Inputs** — if the original kickoff had inputs (e.g. `{topic}`), they appear as editable fields pre-filled with the original values. Change them before resuming or forking. -- **Task outputs** — completed tasks show their output in editable text areas. Edit a task's output to change the context that downstream tasks receive. When you modify a task output and hit Fork, all subsequent tasks are invalidated and re-run with the new context. - -This is useful for "what if" exploration — fork from a checkpoint, tweak a task's result, and see how it changes downstream behavior. - -### Subcommands - -```bash -# List all checkpoints -crewai checkpoint list ./my_checkpoints - -# Inspect a specific checkpoint -crewai checkpoint info ./my_checkpoints/20260407T120000_abc123.json - -# Inspect latest in a SQLite database -crewai checkpoint info ./.checkpoints.db -``` +| Command | Purpose | +|:--------|:--------| +| `crewai checkpoint` | Launch the TUI; auto-detect storage. | +| `crewai checkpoint --location ` | Launch the TUI against a specific location. | +| `crewai checkpoint list ` | List checkpoints. | +| `crewai checkpoint info ` | Inspect a checkpoint file or the latest entry in a SQLite database. | diff --git a/docs/ko/concepts/checkpointing.mdx b/docs/ko/concepts/checkpointing.mdx index 643c6d9c1..842ec4354 100644 --- a/docs/ko/concepts/checkpointing.mdx +++ b/docs/ko/concepts/checkpointing.mdx @@ -5,194 +5,360 @@ icon: floppy-disk mode: "wide" --- - -체크포인팅은 초기 릴리스 단계입니다. API는 향후 버전에서 변경될 수 있습니다. - +체크포인팅은 실행 중 실행 상태의 스냅샷을 저장하여 크루, 플로우, 에이전트가 실패 후 재개하거나 대체 브랜치로 분기될 수 있도록 합니다. -## 개요 + + + 체크포인팅의 작동 방식: 이벤트, 스토리지, 상속. + + + 5분 가이드: 실행, 중단, 재개. + + + 일반적인 워크플로우를 위한 작업 중심 레시피. + + + `CheckpointConfig`, 이벤트, 프로바이더, CLI. + + -체크포인팅은 실행 중 자동으로 실행 상태를 저장합니다. 크루, 플로우 또는 에이전트가 실행 도중 실패하면 마지막 체크포인트에서 복원하여 이미 완료된 작업을 다시 실행하지 않고 재개할 수 있습니다. +## 설명 -## 빠른 시작 +### 체크포인트란 -```python -from crewai import Crew, CheckpointConfig +체크포인트는 실행의 특정 시점에 기록된 `RuntimeState`의 직렬화된 스냅샷입니다. 어떤 태스크가 완료되었는지, 그 출력값, 현재 입력값, 그리고 실행을 식별하는 lineage ID를 기록합니다. -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=True, # 기본값 사용: ./.checkpoints, task_completed 이벤트 -) -result = crew.kickoff() -``` +체크포인트에서 복원하면 CrewAI는 해당 상태를 재구성하고 이미 완료된 작업을 건너뛰고 계속 진행합니다. 포크하면 CrewAI는 새 lineage 아래에 상태를 복원하여 새 브랜치와 원본 실행이 서로 덮어쓰지 않도록 합니다. -각 태스크가 완료된 후 `./.checkpoints/`에 체크포인트 파일이 기록됩니다. +### 체크포인트가 기록되는 시점 -## 설정 +체크포인팅은 이벤트 기반입니다. 런타임은 `on_events`로 선택한 이벤트를 구독하고, 이벤트가 발생할 때마다 체크포인트를 기록합니다. 기본값 `task_completed`는 완료된 태스크당 하나의 체크포인트를 생성합니다 — 세분화와 디스크 사용의 합리적인 균형입니다. `llm_call_completed`와 같은 고빈도 이벤트는 더 세밀한 복구를 위해 사용 가능하지만 훨씬 많은 파일을 기록합니다. -`CheckpointConfig`를 사용하여 세부 설정을 제어합니다: +### 스토리지 -```python -from crewai import Crew, CheckpointConfig +CrewAI에는 두 가지 프로바이더가 포함되어 있습니다: -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - on_events=["task_completed", "crew_kickoff_completed"], - max_checkpoints=5, - ), -) -``` +- `JsonProvider`는 체크포인트당 하나의 파일을 기록합니다. 사람이 읽기 쉽고 검사하기 편리합니다. +- `SqliteProvider`는 단일 SQLite 데이터베이스에 기록합니다. 고빈도 체크포인팅에 적합합니다. -### CheckpointConfig 필드 +`max_checkpoints`가 설정되면 두 프로바이더 모두 가장 오래된 체크포인트를 자동으로 제거합니다. -| 필드 | 타입 | 기본값 | 설명 | -|:-----|:-----|:-------|:-----| -| `location` | `str` | `"./.checkpoints"` | 체크포인트 파일 경로 | -| `on_events` | `list[str]` | `["task_completed"]` | 체크포인트를 트리거하는 이벤트 타입 | -| `provider` | `BaseProvider` | `JsonProvider()` | 스토리지 백엔드 | -| `max_checkpoints` | `int \| None` | `None` | 보관할 최대 파일 수; 오래된 것부터 삭제 | + +체크포인트 기록은 best-effort 방식입니다. 실패한 체크포인트는 로그에 기록되지만 실행을 중단시키지 않습니다. + -### 상속 및 옵트아웃 +### 상속 모델 -Crew, Flow, Agent의 `checkpoint` 필드는 `CheckpointConfig`, `True`, `False`, `None`을 받습니다: +`Crew`, `Flow`, `Agent` 모두 `checkpoint` 인수를 받습니다. 자식은 자체 값을 설정하거나 `False`를 전달하여 옵트아웃하지 않는 한 부모로부터 상속합니다. 크루에서 체크포인팅을 한 번 활성화하면 모든 에이전트가 참여하거나, 특정 에이전트만 선택적으로 제외할 수 있습니다. -| 값 | 동작 | -|:---|:-----| -| `None` (기본값) | 부모에서 상속. 에이전트는 크루의 설정을 상속합니다. | -| `True` | 기본값으로 활성화. | -| `False` | 명시적 옵트아웃. 부모 상속을 중단합니다. | -| `CheckpointConfig(...)` | 사용자 정의 설정. | +## 튜토리얼: 실패한 크루 재개하기 -```python -crew = Crew( - agents=[ - Agent(role="Researcher", ...), # 크루의 checkpoint 상속 - Agent(role="Writer", ..., checkpoint=False), # 옵트아웃, 체크포인트 없음 - ], - tasks=[...], - checkpoint=True, -) -``` +이 가이드는 약 5분이 소요됩니다. 두 개의 태스크가 있는 크루를 실행하고 중간에 종료한 다음, 저장된 체크포인트에서 재개합니다. -## 체크포인트에서 재개 + + + ```python + from crewai import Agent, Crew, Task -```python -# 복원 및 재개 -crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json") -result = crew.kickoff() # 마지막으로 완료된 태스크부터 재개 -``` + researcher = Agent(role="Researcher", goal="Research", backstory="Expert") + writer = Agent(role="Writer", goal="Write", backstory="Expert") -복원된 크루는 이미 완료된 태스크를 건너뛰고 첫 번째 미완료 태스크부터 재개합니다. + crew = Crew( + agents=[researcher, writer], + tasks=[ + Task(description="Research AI trends", agent=researcher, expected_output="bullets"), + Task(description="Write a summary", agent=writer, expected_output="paragraph"), + ], + checkpoint=True, + ) + ``` + + + ```python + result = crew.kickoff() + ``` -## Crew, Flow, Agent에서 사용 가능 + 첫 번째 태스크가 완료된 후 `Ctrl+C`를 누릅니다. `./.checkpoints/` 디렉토리에서 `_.json` 형식의 파일이 체크포인트입니다. + + + ```python + from crewai import CheckpointConfig -### Crew + result = crew.kickoff( + from_checkpoint=CheckpointConfig( + restore_from="./.checkpoints/_.json", + ), + ) + ``` -```python -crew = Crew( - agents=[researcher, writer], - tasks=[research_task, write_task, review_task], - checkpoint=CheckpointConfig(location="./crew_cp"), -) -``` + 연구 태스크는 건너뛰고, 작성자는 저장된 연구 출력에 대해 실행되며, 크루가 완료됩니다. + + -기본 트리거: `task_completed` (완료된 태스크당 하나의 체크포인트). +## 사용 방법 -### Flow + + + ```python + crew = Crew(agents=[...], tasks=[...], checkpoint=True) + ``` -```python -from crewai.flow.flow import Flow, start, listen -from crewai import CheckpointConfig + `task_completed` 이벤트마다 `./.checkpoints/`에 기록합니다. + -class MyFlow(Flow): - @start() - def step_one(self): - return "data" + + ```python + from crewai import Crew, CheckpointConfig - @listen(step_one) - def step_two(self, data): - return process(data) + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + on_events=["task_completed", "crew_kickoff_completed"], + max_checkpoints=5, + ), + ) + ``` + -flow = MyFlow( - checkpoint=CheckpointConfig( - location="./flow_cp", - on_events=["method_execution_finished"], - ), -) -result = flow.kickoff() + + + ```python JsonProvider + from crewai import Crew, CheckpointConfig + from crewai.state import JsonProvider -# 재개 -flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json") -result = flow.kickoff() -``` + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + provider=JsonProvider(), + max_checkpoints=5, + ), + ) + ``` + ```python SqliteProvider + from crewai import Crew, CheckpointConfig + from crewai.state import SqliteProvider -### Agent + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./.checkpoints.db", + provider=SqliteProvider(), + max_checkpoints=50, + ), + ) + ``` + -```python -agent = Agent( - role="Researcher", - goal="Research topics", - backstory="Expert researcher", - checkpoint=CheckpointConfig( - location="./agent_cp", - on_events=["lite_agent_execution_completed"], - ), -) -result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) -``` + + SQLite는 동시 읽기를 위해 WAL 저널 모드를 활성화합니다. 고빈도 체크포인팅에는 SQLite를 선호하세요. + + -## 스토리지 프로바이더 + + ```python + crew = Crew( + agents=[ + Agent(role="Researcher", ...), + Agent(role="Writer", ..., checkpoint=False), + ], + tasks=[...], + checkpoint=True, + ) + ``` + -CrewAI는 두 가지 체크포인트 스토리지 프로바이더를 제공합니다. + + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.from_checkpoint(config) + result = crew.kickoff() + ``` + -### JsonProvider (기본값) + + `fork()`는 새 lineage 아래에 체크포인트를 복원하여 새 실행이 원본과 충돌하지 않도록 합니다. -각 체크포인트를 별도의 JSON 파일로 저장합니다. + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.fork(config, branch="experiment-a") + result = crew.kickoff(inputs={"strategy": "aggressive"}) + ``` -```python -from crewai import Crew, CheckpointConfig -from crewai.state import JsonProvider + `branch` 레이블은 선택 사항이며, 생략하면 자동 생성됩니다. + -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - provider=JsonProvider(), - max_checkpoints=5, - ), -) -``` + + + + ```python + crew = Crew( + agents=[researcher, writer], + tasks=[research_task, write_task, review_task], + checkpoint=CheckpointConfig(location="./crew_cp"), + ) + ``` -### SqliteProvider + 기본 트리거: `task_completed`. + + + ```python + from crewai.flow.flow import Flow, start, listen + from crewai import CheckpointConfig -모든 체크포인트를 단일 SQLite 데이터베이스 파일에 저장합니다. + class MyFlow(Flow): + @start() + def step_one(self): + return "data" -```python -from crewai import Crew, CheckpointConfig -from crewai.state import SqliteProvider + @listen(step_one) + def step_two(self, data): + return process(data) -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./.checkpoints.db", - provider=SqliteProvider(), - ), -) -``` + flow = MyFlow( + checkpoint=CheckpointConfig( + location="./flow_cp", + on_events=["method_execution_finished"], + ), + ) + result = flow.kickoff() + config = CheckpointConfig(restore_from="./flow_cp/.json") + flow = MyFlow.from_checkpoint(config) + result = flow.kickoff() + ``` + + + ```python + agent = Agent( + role="Researcher", + goal="Research topics", + backstory="Expert researcher", + checkpoint=CheckpointConfig( + location="./agent_cp", + on_events=["lite_agent_execution_completed"], + ), + ) + result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) + ``` + + + -## 이벤트 타입 + + 모든 이벤트에 핸들러를 등록하고 `state.checkpoint()`를 호출합니다. -`on_events` 필드는 이벤트 타입 문자열의 조합을 받습니다. 일반적인 선택: + + ```python Sync + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent + + @crewai_event_bus.on(LLMCallCompletedEvent) + def on_llm_done(source, event, state): + path = state.checkpoint("./my_checkpoints") + print(f"체크포인트 저장: {path}") + ``` + ```python Async + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent + + @crewai_event_bus.on(LLMCallCompletedEvent) + async def on_llm_done_async(source, event, state): + path = await state.acheckpoint("./my_checkpoints") + print(f"체크포인트 저장: {path}") + ``` + + + 핸들러가 세 개의 매개변수를 받을 때 `state` 인수가 자동으로 제공됩니다. 전체 이벤트 카탈로그는 [Event Listeners](/ko/concepts/event-listener) 문서를 참조하세요. + + + + ```bash + crewai checkpoint # .checkpoints/ 또는 .checkpoints.db 자동 감지 + crewai checkpoint --location ./my_checkpoints + crewai checkpoint --location ./.checkpoints.db + ``` + + + Checkpoint TUI + + + 왼쪽 패널은 체크포인트를 브랜치별로 그룹화하며, 포크는 부모 아래에 중첩됩니다. 체크포인트를 선택하면 메타데이터, 엔티티 상태, 태스크 진행 상황이 표시됩니다. **Resume**은 실행을 계속하고, **Fork**는 새 브랜치를 시작합니다. + + 세부 정보 패널에는 두 개의 편집 가능한 영역이 있습니다: + + - **Inputs** — 원래 kickoff의 입력으로, 미리 채워져 있으며 편집 가능합니다. + - **태스크 출력** — 완료된 태스크의 출력. 출력을 편집하고 **Fork**를 누르면 다운스트림 태스크가 무효화되어 수정된 컨텍스트로 다시 실행됩니다. + + + "what if" 탐색에 유용합니다: 포크, 조정, 관찰. + + + + + ```bash + crewai checkpoint list ./my_checkpoints + crewai checkpoint info ./my_checkpoints/.json + crewai checkpoint info ./.checkpoints.db + ``` + + + +## 레퍼런스 + +### `CheckpointConfig` + + + 스토리지 대상. `JsonProvider`는 디렉토리, `SqliteProvider`는 데이터베이스 파일 경로. + + + + 체크포인트를 트리거하는 이벤트 타입. [이벤트 타입](#이벤트-타입) 참조. + + + + 스토리지 백엔드. `JsonProvider` 또는 `SqliteProvider`. + + + + 보관할 최대 체크포인트 수. 각 기록 후 가장 오래된 것이 제거됩니다. + + + + `from_checkpoint`를 통해 전달될 때 복원할 체크포인트. + + +### `checkpoint` 필드 값 + +`Crew`, `Flow`, `Agent`에서 사용 가능. + + + 부모에서 상속. + + + + 기본값으로 활성화. + + + + 명시적 옵트아웃. 상속을 중단합니다. + + + + 사용자 정의 설정. + + +### 이벤트 타입 + +`on_events`에 대한 일반적인 값: | 사용 사례 | 이벤트 | |:----------|:-------| -| 각 태스크 완료 후 (Crew) | `["task_completed"]` | +| 각 태스크 완료 후 | `["task_completed"]` | | 각 플로우 메서드 완료 후 | `["method_execution_finished"]` | | 에이전트 실행 완료 후 | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` | | 크루 완료 시에만 | `["crew_kickoff_completed"]` | @@ -200,30 +366,24 @@ crew = Crew( | 모든 이벤트 | `["*"]` | -`["*"]` 또는 `llm_call_completed`와 같은 고빈도 이벤트를 사용하면 많은 체크포인트 파일이 생성되어 성능에 영향을 줄 수 있습니다. `max_checkpoints`를 사용하여 디스크 사용량을 제한하세요. +`["*"]` 및 `llm_call_completed`와 같은 고빈도 이벤트는 많은 체크포인트를 기록하고 성능을 저하시킬 수 있습니다. `max_checkpoints`와 함께 사용하세요. -## 수동 체크포인팅 +### 스토리지 프로바이더 -완전한 제어를 위해 자체 이벤트 핸들러를 등록하고 `state.checkpoint()`를 직접 호출할 수 있습니다: + + 체크포인트당 하나의 파일, `location` 내부에 `_.json` 형식으로 명명. + -```python -from crewai.events.event_bus import crewai_event_bus -from crewai.events.types.llm_events import LLMCallCompletedEvent + + WAL 저널링이 있는 `location`의 단일 데이터베이스 파일. + -# 동기 핸들러 -@crewai_event_bus.on(LLMCallCompletedEvent) -def on_llm_done(source, event, state): - path = state.checkpoint("./my_checkpoints") - print(f"체크포인트 저장: {path}") +### CLI -# 비동기 핸들러 -@crewai_event_bus.on(LLMCallCompletedEvent) -async def on_llm_done_async(source, event, state): - path = await state.acheckpoint("./my_checkpoints") - print(f"체크포인트 저장: {path}") -``` - -`state` 인수는 핸들러가 3개의 매개변수를 받을 때 이벤트 버스가 자동으로 전달하는 `RuntimeState`입니다. [Event Listeners](/ko/concepts/event-listener) 문서에 나열된 모든 이벤트 타입에 핸들러를 등록할 수 있습니다. - -체크포인팅은 best-effort입니다: 체크포인트 기록이 실패하면 오류가 로그에 기록되지만 실행은 중단 없이 계속됩니다. +| 명령 | 목적 | +|:-----|:-----| +| `crewai checkpoint` | TUI 실행; 스토리지 자동 감지. | +| `crewai checkpoint --location ` | 특정 위치에 대해 TUI 실행. | +| `crewai checkpoint list ` | 체크포인트 나열. | +| `crewai checkpoint info ` | 체크포인트 파일 또는 SQLite 데이터베이스의 최신 항목 검사. | diff --git a/docs/pt-BR/concepts/checkpointing.mdx b/docs/pt-BR/concepts/checkpointing.mdx index 25db59713..5028f4a4f 100644 --- a/docs/pt-BR/concepts/checkpointing.mdx +++ b/docs/pt-BR/concepts/checkpointing.mdx @@ -5,225 +5,385 @@ icon: floppy-disk mode: "wide" --- - -O checkpointing esta em versao inicial. As APIs podem mudar em versoes futuras. - +O checkpointing salva um snapshot do estado de execucao durante uma execucao para que uma crew, flow ou agente possa retomar apos uma falha ou ser bifurcado em uma branch alternativa. -## Visao Geral + + + Como o checkpointing funciona: eventos, armazenamento e heranca. + + + Um passo a passo de 5 minutos: executar, interromper, retomar. + + + Receitas focadas em tarefas para fluxos comuns. + + + `CheckpointConfig`, eventos, provedores e CLI. + + -O checkpointing salva automaticamente o estado de execucao durante uma execucao. Se uma crew, flow ou agente falhar no meio da execucao, voce pode restaurar a partir do ultimo checkpoint e retomar sem reexecutar o trabalho ja concluido. +## Explicacao -## Inicio Rapido +### O que e um checkpoint -```python -from crewai import Crew, CheckpointConfig +Um checkpoint e um snapshot serializado do `RuntimeState` gravado em um ponto da execucao. Ele registra quais tarefas foram concluidas, suas saidas, os inputs atuais e um ID de linhagem que identifica a execucao. -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=True, # usa padroes: ./.checkpoints, em task_completed -) -result = crew.kickoff() -``` +Ao restaurar a partir de um checkpoint, o CrewAI reconstroi esse estado, pula o trabalho ja concluido e continua. Ao fazer fork, o CrewAI restaura o estado sob uma nova linhagem para que a nova branch e a execucao original nao se sobreponham. -Os arquivos de checkpoint sao gravados em `./.checkpoints/` apos cada tarefa concluida. +### Quando os checkpoints sao gravados -## Configuracao +O checkpointing e orientado a eventos. O runtime se inscreve nos eventos selecionados em `on_events` e grava um checkpoint sempre que um e disparado. O padrao `task_completed` produz um checkpoint por tarefa finalizada — um equilibrio razoavel entre granularidade e uso de disco. Eventos de alta frequencia como `llm_call_completed` estao disponiveis para recuperacao mais granular, mas gravam muito mais arquivos. -Use `CheckpointConfig` para controle total: +### Armazenamento -```python -from crewai import Crew, CheckpointConfig +Dois provedores acompanham o CrewAI: -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - on_events=["task_completed", "crew_kickoff_completed"], - max_checkpoints=5, - ), -) -``` +- `JsonProvider` grava um arquivo por checkpoint. Legivel e facil de inspecionar. +- `SqliteProvider` grava em um unico banco SQLite. Melhor para checkpointing de alta frequencia. -### Campos do CheckpointConfig +Ambos removem os checkpoints mais antigos quando `max_checkpoints` esta definido. -| Campo | Tipo | Padrao | Descricao | -|:------|:-----|:-------|:----------| -| `location` | `str` | `"./.checkpoints"` | Caminho para os arquivos de checkpoint | -| `on_events` | `list[str]` | `["task_completed"]` | Tipos de evento que acionam um checkpoint | -| `provider` | `BaseProvider` | `JsonProvider()` | Backend de armazenamento | -| `max_checkpoints` | `int \| None` | `None` | Maximo de arquivos a manter; os mais antigos sao removidos primeiro | + +As gravacoes de checkpoint sao best-effort. Um checkpoint que falha e registrado em log, mas nao interrompe a execucao. + -### Heranca e Desativacao +### Modelo de heranca -O campo `checkpoint` em Crew, Flow e Agent aceita `CheckpointConfig`, `True`, `False` ou `None`: +`Crew`, `Flow` e `Agent` aceitam um argumento `checkpoint`. Filhos herdam do pai a menos que definam seu proprio valor ou passem `False` para desativar. Ative o checkpointing uma vez na crew e todos os agentes participam, ou exclua um agente seletivamente. -| Valor | Comportamento | -|:------|:--------------| -| `None` (padrao) | Herda do pai. Um agente herda a configuracao da crew. | -| `True` | Ativa com padroes. | -| `False` | Desativacao explicita. Interrompe a heranca do pai. | -| `CheckpointConfig(...)` | Configuracao personalizada. | +## Tutorial: Retomar uma crew com falha -```python -crew = Crew( - agents=[ - Agent(role="Researcher", ...), # herda checkpoint da crew - Agent(role="Writer", ..., checkpoint=False), # desativado, sem checkpoints - ], - tasks=[...], - checkpoint=True, -) -``` +Este passo a passo leva cerca de 5 minutos. Voce executara uma crew de duas tarefas, a interrompera no meio e a retomara a partir do checkpoint salvo. -## Retomando a partir de um Checkpoint + + + ```python + from crewai import Agent, Crew, Task -```python -# Restaurar e retomar -crew = Crew.from_checkpoint("./my_checkpoints/20260407T120000_abc123.json") -result = crew.kickoff() # retoma a partir da ultima tarefa concluida -``` + researcher = Agent(role="Researcher", goal="Research", backstory="Expert") + writer = Agent(role="Writer", goal="Write", backstory="Expert") -A crew restaurada pula tarefas ja concluidas e retoma a partir da primeira incompleta. + crew = Crew( + agents=[researcher, writer], + tasks=[ + Task(description="Research AI trends", agent=researcher, expected_output="bullets"), + Task(description="Write a summary", agent=writer, expected_output="paragraph"), + ], + checkpoint=True, + ) + ``` + + + ```python + result = crew.kickoff() + ``` -## Funciona em Crew, Flow e Agent + Pressione `Ctrl+C` apos a primeira tarefa concluir. Em `./.checkpoints/`, um arquivo `_.json` e o checkpoint. + + + ```python + from crewai import CheckpointConfig -### Crew + result = crew.kickoff( + from_checkpoint=CheckpointConfig( + restore_from="./.checkpoints/_.json", + ), + ) + ``` -```python -crew = Crew( - agents=[researcher, writer], - tasks=[research_task, write_task, review_task], - checkpoint=CheckpointConfig(location="./crew_cp"), -) -``` + A tarefa de pesquisa e pulada, o escritor executa contra a saida de pesquisa salva e a crew finaliza. + + -Gatilho padrao: `task_completed` (um checkpoint por tarefa finalizada). +## Guias de uso -### Flow + + + ```python + crew = Crew(agents=[...], tasks=[...], checkpoint=True) + ``` -```python -from crewai.flow.flow import Flow, start, listen -from crewai import CheckpointConfig + Grava em `./.checkpoints/` em cada `task_completed`. + -class MyFlow(Flow): - @start() - def step_one(self): - return "data" + + ```python + from crewai import Crew, CheckpointConfig - @listen(step_one) - def step_two(self, data): - return process(data) + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + on_events=["task_completed", "crew_kickoff_completed"], + max_checkpoints=5, + ), + ) + ``` + -flow = MyFlow( - checkpoint=CheckpointConfig( - location="./flow_cp", - on_events=["method_execution_finished"], - ), -) -result = flow.kickoff() + + + ```python JsonProvider + from crewai import Crew, CheckpointConfig + from crewai.state import JsonProvider -# Retomar -flow = MyFlow.from_checkpoint("./flow_cp/20260407T120000_abc123.json") -result = flow.kickoff() -``` + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./my_checkpoints", + provider=JsonProvider(), + max_checkpoints=5, + ), + ) + ``` + ```python SqliteProvider + from crewai import Crew, CheckpointConfig + from crewai.state import SqliteProvider -### Agent + crew = Crew( + agents=[...], + tasks=[...], + checkpoint=CheckpointConfig( + location="./.checkpoints.db", + provider=SqliteProvider(), + max_checkpoints=50, + ), + ) + ``` + -```python -agent = Agent( - role="Researcher", - goal="Research topics", - backstory="Expert researcher", - checkpoint=CheckpointConfig( - location="./agent_cp", - on_events=["lite_agent_execution_completed"], - ), -) -result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) -``` + + O SQLite ativa o modo journal WAL para leituras concorrentes. Prefira-o para checkpointing de alta frequencia. + + -## Provedores de Armazenamento + + ```python + crew = Crew( + agents=[ + Agent(role="Researcher", ...), + Agent(role="Writer", ..., checkpoint=False), + ], + tasks=[...], + checkpoint=True, + ) + ``` + -O CrewAI inclui dois provedores de armazenamento para checkpoints. + + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.from_checkpoint(config) + result = crew.kickoff() + ``` + -### JsonProvider (padrao) + + `fork()` restaura um checkpoint sob uma nova linhagem para que a nova execucao nao colida com a original. -Grava cada checkpoint como um arquivo JSON separado. + ```python + config = CheckpointConfig(restore_from="./my_checkpoints/.json") + crew = Crew.fork(config, branch="experiment-a") + result = crew.kickoff(inputs={"strategy": "aggressive"}) + ``` -```python -from crewai import Crew, CheckpointConfig -from crewai.state import JsonProvider + O label `branch` e opcional; um e gerado se omitido. + -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./my_checkpoints", - provider=JsonProvider(), - max_checkpoints=5, - ), -) -``` + + + + ```python + crew = Crew( + agents=[researcher, writer], + tasks=[research_task, write_task, review_task], + checkpoint=CheckpointConfig(location="./crew_cp"), + ) + ``` -### SqliteProvider + Gatilho padrao: `task_completed`. + + + ```python + from crewai.flow.flow import Flow, start, listen + from crewai import CheckpointConfig -Armazena todos os checkpoints em um unico arquivo SQLite. + class MyFlow(Flow): + @start() + def step_one(self): + return "data" -```python -from crewai import Crew, CheckpointConfig -from crewai.state import SqliteProvider + @listen(step_one) + def step_two(self, data): + return process(data) -crew = Crew( - agents=[...], - tasks=[...], - checkpoint=CheckpointConfig( - location="./.checkpoints.db", - provider=SqliteProvider(), - ), -) -``` + flow = MyFlow( + checkpoint=CheckpointConfig( + location="./flow_cp", + on_events=["method_execution_finished"], + ), + ) + result = flow.kickoff() + config = CheckpointConfig(restore_from="./flow_cp/.json") + flow = MyFlow.from_checkpoint(config) + result = flow.kickoff() + ``` + + + ```python + agent = Agent( + role="Researcher", + goal="Research topics", + backstory="Expert researcher", + checkpoint=CheckpointConfig( + location="./agent_cp", + on_events=["lite_agent_execution_completed"], + ), + ) + result = agent.kickoff(messages=[{"role": "user", "content": "Research AI trends"}]) + ``` + + + -## Tipos de Evento + + Registre um handler em qualquer evento e chame `state.checkpoint()`. -O campo `on_events` aceita qualquer combinacao de strings de tipo de evento. Escolhas comuns: + + ```python Sync + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent -| Caso de Uso | Eventos | + @crewai_event_bus.on(LLMCallCompletedEvent) + def on_llm_done(source, event, state): + path = state.checkpoint("./my_checkpoints") + print(f"Checkpoint salvo: {path}") + ``` + ```python Async + from crewai.events.event_bus import crewai_event_bus + from crewai.events.types.llm_events import LLMCallCompletedEvent + + @crewai_event_bus.on(LLMCallCompletedEvent) + async def on_llm_done_async(source, event, state): + path = await state.acheckpoint("./my_checkpoints") + print(f"Checkpoint salvo: {path}") + ``` + + + Um argumento `state` e fornecido automaticamente quando o handler recebe tres parametros. Veja [Event Listeners](/pt-BR/concepts/event-listener) para o catalogo completo de eventos. + + + + ```bash + crewai checkpoint # detecta automaticamente .checkpoints/ ou .checkpoints.db + crewai checkpoint --location ./my_checkpoints + crewai checkpoint --location ./.checkpoints.db + ``` + + + Checkpoint TUI + + + O painel esquerdo agrupa checkpoints por branch; forks aninham sob seu pai. Selecionar um checkpoint mostra seus metadados, estado da entidade e progresso das tarefas. **Resume** continua a execucao; **Fork** inicia uma nova branch. + + O painel de detalhes expoe duas areas editaveis: + + - **Inputs** — os inputs originais do kickoff, preenchidos e editaveis. + - **Saidas das tarefas** — saidas das tarefas concluidas. Editar uma saida e pressionar **Fork** invalida tarefas downstream para que sejam reexecutadas com o contexto modificado. + + + Util para exploracao de cenarios: fork, ajuste, observe. + + + + + ```bash + crewai checkpoint list ./my_checkpoints + crewai checkpoint info ./my_checkpoints/.json + crewai checkpoint info ./.checkpoints.db + ``` + + + +## Referencia + +### `CheckpointConfig` + + + Destino do armazenamento. Diretorio para `JsonProvider`, caminho de arquivo de banco para `SqliteProvider`. + + + + Tipos de evento que disparam um checkpoint. Veja [tipos de evento](#tipos-de-evento). + + + + Backend de armazenamento. `JsonProvider` ou `SqliteProvider`. + + + + Maximo de checkpoints a reter. Os mais antigos sao removidos apos cada gravacao. + + + + Checkpoint a restaurar quando passado via `from_checkpoint`. + + +### Valores do campo `checkpoint` + +Aceito por `Crew`, `Flow` e `Agent`. + + + Herda do pai. + + + + Ativa com padroes. + + + + Desativacao explicita. Interrompe a heranca. + + + + Configuracao personalizada. + + +### Tipos de evento + +Valores comuns para `on_events`: + +| Caso de uso | Eventos | |:------------|:--------| -| Apos cada tarefa (Crew) | `["task_completed"]` | +| Apos cada tarefa | `["task_completed"]` | | Apos cada metodo do flow | `["method_execution_finished"]` | | Apos execucao do agente | `["agent_execution_completed"]`, `["lite_agent_execution_completed"]` | | Apenas na conclusao da crew | `["crew_kickoff_completed"]` | | Apos cada chamada LLM | `["llm_call_completed"]` | -| Em tudo | `["*"]` | +| Tudo | `["*"]` | -Usar `["*"]` ou eventos de alta frequencia como `llm_call_completed` gravara muitos arquivos de checkpoint e pode impactar o desempenho. Use `max_checkpoints` para limitar o uso de disco. +`["*"]` e eventos de alta frequencia como `llm_call_completed` gravam muitos checkpoints e podem degradar o desempenho. Combine com `max_checkpoints`. -## Checkpointing Manual +### Provedores de armazenamento -Para controle total, registre seu proprio handler de evento e chame `state.checkpoint()` diretamente: + + Um arquivo por checkpoint, nomeado `_.json` dentro de `location`. + -```python -from crewai.events.event_bus import crewai_event_bus -from crewai.events.types.llm_events import LLMCallCompletedEvent + + Arquivo de banco unico em `location` com journaling WAL. + -# Handler sincrono -@crewai_event_bus.on(LLMCallCompletedEvent) -def on_llm_done(source, event, state): - path = state.checkpoint("./my_checkpoints") - print(f"Checkpoint salvo: {path}") +### CLI -# Handler assincrono -@crewai_event_bus.on(LLMCallCompletedEvent) -async def on_llm_done_async(source, event, state): - path = await state.acheckpoint("./my_checkpoints") - print(f"Checkpoint salvo: {path}") -``` - -O argumento `state` e o `RuntimeState` passado automaticamente pelo barramento de eventos quando seu handler aceita 3 parametros. Voce pode registrar handlers em qualquer tipo de evento listado na documentacao de [Event Listeners](/pt-BR/concepts/event-listener). - -O checkpointing e best-effort: se uma gravacao de checkpoint falhar, o erro e registrado no log, mas a execucao continua sem interrupcao. +| Comando | Proposito | +|:--------|:----------| +| `crewai checkpoint` | Inicia a TUI; detecta o armazenamento automaticamente. | +| `crewai checkpoint --location ` | Inicia a TUI em uma localizacao especifica. | +| `crewai checkpoint list ` | Lista checkpoints. | +| `crewai checkpoint info ` | Inspeciona um arquivo de checkpoint ou a entrada mais recente em um banco SQLite. |