crewAI/lib/crewai/tests/test_async_human_feedback.py
João Moura bb477f8a91
JSON first crews (#6131)
* feat(cli): introduce JSON crew project support and TUI enhancements

- Added support for creating and running JSON-defined crew projects, allowing users to scaffold projects with a new `create_json_crew.py` file.
- Implemented a full-screen Textual TUI for crew execution in `crew_run_tui.py`, enhancing user interaction with a two-column layout.
- Updated `run_crew.py` to prioritize JSON crew projects and added daemon mode for running without TUI.
- Introduced interactive pickers in `tui_picker.py` for improved CLI prompts.
- Enhanced validation for JSON crew files in `validate.py` to ensure proper structure and agent definitions.
- Updated `.gitignore` to exclude demo and crewai directories.

* feat: update LLM model references to gpt-5.4-mini

- Changed default LLM model from gpt-4o-mini to gpt-5.4-mini across various files, including CLI options, JSON crew configurations, and agent definitions.
- Enhanced benchmark and human feedback functionalities to utilize the new model.
- Improved user interface elements in the TUI for better interaction and feedback during execution.
- Added support for new skills directory in JSON crew project creation.

* feat(benchmark): add crew-level benchmarking functionality

- Introduced a new `benchmark` command in the CLI for crew-level benchmarking, allowing users to specify agents, models, and timeout settings.
- Implemented `CrewBenchmarkCase` to handle crew-level benchmark cases with inputs and criteria.
- Enhanced the benchmark runner to support progress tracking and detailed reporting of results for multiple models.
- Added tests for loading crew benchmark cases and validating their structure.
- Updated existing benchmark functions to accommodate the new crew-level execution model.

* feat(cli): enhance JSON crew project functionality and TUI improvements

- Added optional agent-level guardrails and advanced options in JSON crew configurations to improve output validation and flexibility.
- Updated the TUI to better handle plan step statuses, including visual indicators for task completion and failure.
- Introduced methods for parsing and managing step observation events, ensuring accurate updates to task statuses during execution.
- Enhanced validation for JSON crew projects, ensuring proper structure and error handling for agent and task definitions.
- Added comprehensive tests for new features and validation logic, ensuring robustness in JSON crew project handling.

* refactor(cli): streamline JSON crew project handling and improve validation

- Refactored JSON crew project loading and validation logic to enhance clarity and maintainability.
- Introduced utility functions for finding JSON crew files, improving code reuse across modules.
- Removed deprecated benchmark functionality and associated tests to simplify the codebase.
- Updated CLI commands to utilize the new JSON project structure, ensuring compatibility with recent changes.
- Enhanced test coverage for JSON crew project features, ensuring robust validation and error handling.

* feat(cli): enhance activity log navigation and focus management

- Added functionality to focus on the activity log when navigating through log entries.
- Implemented refresh logic for the log panel to ensure updates are displayed correctly during navigation.
- Improved keyboard navigation for log entries, allowing users to expand and scroll through logs seamlessly.
- Added tests to verify the correct behavior of log navigation and focus management in the TUI.

* feat(cli): enhance JSON crew project interaction and input handling

- Introduced a new function to enable prompt line editing for better user experience during input prompts.
- Updated the JSON crew project wizards to show interpolation hints for dynamic values, improving user guidance.
- Enhanced the handling of missing input placeholders by prompting users for required values during crew setup.
- Refactored the crew run logic to ensure proper loading and preparation of JSON-defined crews, including runtime input management.
- Added tests to verify the correct behavior of new input handling features and JSON crew project interactions.

* feat(cli): improve crew project input prompts and event handling

- Enhanced the `_prompt_text` function to allow for configurable spacing before prompts, improving user experience during input collection.
- Updated the wizards for agent and task creation to utilize the new prompt configuration, ensuring a more compact and streamlined interaction.
- Introduced new plan step lifecycle events (`PlanStepStartedEvent`, `PlanStepCompletedEvent`) to better track the execution status of plan steps.
- Refactored the step executor to emit these events during the execution of tasks, improving observability and debugging capabilities.
- Added tests to verify the correct behavior of new prompt handling and event emissions during crew project execution.
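
The event-emission change in the last two bullets amounts to a loop that brackets each plan step with a started event and a completed event. Below is a minimal sketch. `StepStarted`, `StepCompleted`, and `run_steps` are hypothetical stand-ins for `PlanStepStartedEvent`, `PlanStepCompletedEvent`, and the step executor, and a plain list of listener callables takes the place of crewAI's event bus.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class StepStarted:
    # Hypothetical stand-in for PlanStepStartedEvent.
    step_index: int
    description: str


@dataclass(frozen=True)
class StepCompleted:
    # Hypothetical stand-in for PlanStepCompletedEvent.
    step_index: int
    result: str


StepEvent = Union[StepStarted, StepCompleted]


def run_steps(
    steps: list[tuple[str, Callable[[], str]]],
    listeners: list[Callable[[StepEvent], None]],
) -> list[str]:
    """Run each plan step, emitting a started event before it and a completed event after it."""

    def emit(event: StepEvent) -> None:
        # Deliver the event to every subscriber, in registration order.
        for listener in listeners:
            listener(event)

    results: list[str] = []
    for index, (description, action) in enumerate(steps):
        emit(StepStarted(index, description))
        result = action()
        emit(StepCompleted(index, result))
        results.append(result)
    return results
```

A UI listener can mark a step "running" on `StepStarted` and "done" on `StepCompleted` using only `step_index`, without inspecting the executor's internals.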

* fix: refine json-first crew interactions

* fix: prioritize common json crew tools

* fix: make json crew more tools expandable

* fix: show json crew tools by category

* feat(memory): update default embedder to OpenAI text-embedding-3-large and enhance memory compatibility

- Changed the default embedding model for Memory to OpenAI text-embedding-3-large, which uses 3072-dimensional vectors.
- Added warnings regarding compatibility issues with existing local memory stores created with 1536-dimensional embeddings.
- Updated documentation to reflect the new default embedder and its configuration options.
- Enhanced the CLI and codebase to support the new embedding model across various components, ensuring a seamless transition for users.
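
The compatibility warning comes down to vector length. A store filled with 1536-dimensional embeddings cannot be searched or extended with 3072-dimensional ones. Below is a minimal sketch of a save-time guard, assuming vectors arrive as plain float sequences. `check_batch_dimensions` and the local `EmbeddingDimensionMismatchError` are illustrative only and do not reproduce the LanceDB or Qdrant Edge storage code.

```python
from __future__ import annotations

from collections.abc import Sequence


class EmbeddingDimensionMismatchError(ValueError):
    """Illustrative stand-in; the commit log states the real error subclasses ValueError."""


def check_batch_dimensions(
    vectors: Sequence[Sequence[float]], store_dim: int | None
) -> int | None:
    """Validate every vector in a save batch against one dimension.

    store_dim is the dimension already recorded by the store, or None if the
    store is empty, in which case the first vector sets the expectation.
    Returns the dimension the batch was validated against.
    """
    if not vectors:
        return store_dim
    expected = store_dim if store_dim is not None else len(vectors[0])
    # Check every record, not only the first, so mixed batches are caught.
    for index, vector in enumerate(vectors):
        if len(vector) != expected:
            raise EmbeddingDimensionMismatchError(
                f"record {index} has {len(vector)} dimensions but the store expects "
                f"{expected}; reset the memory store or pin the embedder that built it"
            )
    return expected
```

Raising an error here is safer than zero-filling or truncating vectors, which would silently corrupt similarity scores.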

* fix: address PR review feedback for JSON-first crews

Review blockers:
- Forward trained_agents_file to JSON crews: crewai run -f now exports
  CREWAI_TRAINED_AGENTS_FILE for the in-process JSON crew path
- Wizard agent picker: Esc/cancel now reprompts instead of silently
  assigning the first agent
- JSON tool resolution hard-fails: unknown tool names, missing custom
  tool files, and invalid custom tool modules raise JSONProjectError
  with actionable messages instead of warn-and-continue
- Embedding dimension mismatch: LanceDB and Qdrant Edge storages raise
  EmbeddingDimensionMismatchError with reset/pin guidance instead of
  silently zero-filling vectors or returning empty search results
- Custom tool code execution documented in loader docstring and the
  scaffolded project README
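
Below is a sketch of the switch from warn-and-continue to a hard failure, assuming a simple name-to-factory registry. The error name `JSONProjectError` comes from the commit, but this local class, the `TOOL_REGISTRY` contents, and `resolve_tools` are hypothetical. The real loader also resolves `custom:` tools, which this sketch leaves out.

```python
from __future__ import annotations

import difflib
from collections.abc import Callable


class JSONProjectError(Exception):
    """Local stand-in for the loader's project error."""


# Hypothetical registry: declared tool name -> zero-argument factory.
TOOL_REGISTRY: dict[str, Callable[[], object]] = {
    "file_read_tool": object,
    "web_search_tool": object,
}


def resolve_tools(names: list[str]) -> list[object]:
    """Instantiate every declared tool; raise if any name is unknown."""
    unknown = [name for name in names if name not in TOOL_REGISTRY]
    if unknown:
        hints = []
        for name in unknown:
            # Suggest the closest registered name to make the error actionable.
            close = difflib.get_close_matches(name, list(TOOL_REGISTRY), n=1)
            suffix = f" (did you mean {close[0]!r}?)" if close else ""
            hints.append(f"{name!r}{suffix}")
        raise JSONProjectError("unknown tool(s): " + ", ".join(hints))
    return [TOOL_REGISTRY[name]() for name in names]
```

Collecting every unknown name before raising lets a user fix the whole project in one pass.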

CI fixes:
- ruff format across lib/
- All 133 PR-introduced mypy errors fixed (llm.py lazy-litellm and
  cli.py lazy command shims now use TYPE_CHECKING imports; textual
  is_mounted misuse fixed; pick_many overloads; misc annotations)
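
The lazy-import fix relies on the standard `typing.TYPE_CHECKING` idiom. mypy sees a module-level import and can check annotations, while the runtime import waits until the function actually runs. Below is a sketch that uses the stdlib `decimal` module as a stand-in for a heavy dependency such as litellm. `parse_amount` is hypothetical.

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only type checkers evaluate this import; it never runs at runtime.
    from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Parse a decimal amount, importing the module only on first call."""
    # Deferred runtime import keeps module import time low.
    import decimal

    return decimal.Decimal(text)
```

Because of `from __future__ import annotations`, the `Decimal` return annotation stays an unevaluated string at runtime, so the name does not need to exist outside type checking.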

Bot review comments:
- Empty except blocks now have explanatory comments or debug logging
- Removed unused _C_BG/_C_PANEL/_C_BORDER globals and redundant
  import re; tests use a single import style for create_json_crew

Tests: trained-agents propagation, wizard cancel, tool resolution
failures, and dimension mismatch guidance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: address second round of PR review comments

Cursor Bugbot:
- Wizard agent slugs: strip to [a-z0-9_] and fall back to agent_<n> so
  symbol-only roles can't produce an empty agents/.jsonc filename
- Wizard task names: dedupe against prior task names and fall back to
  task_<n> for symbol-only descriptions
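
Below is a sketch of the slug and task-name fallbacks. The exact normalisation crewAI applies may differ. `agent_slug` and `unique_task_name` are hypothetical helpers.

```python
from __future__ import annotations

import re


def agent_slug(role: str, index: int) -> str:
    """Lower-case the role, keep only [a-z0-9_], and fall back to agent_<n>."""
    slug = re.sub(r"[^a-z0-9_]+", "_", role.lower()).strip("_")
    return slug or f"agent_{index}"


def unique_task_name(description: str, index: int, existing: set[str]) -> str:
    """Derive a task name, fall back to task_<n>, and dedupe against prior names."""
    base = re.sub(r"[^a-z0-9_]+", "_", description.lower()).strip("_")
    base = base or f"task_{index}"
    name, suffix = base, 2
    # Append _2, _3, ... until the name is unused.
    while name in existing:
        name = f"{base}_{suffix}"
        suffix += 1
    existing.add(name)
    return name
```

A symbol-only role such as `"???"` collapses to an empty slug, and that is exactly the case the fallback covers.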

CodeRabbit:
- Agent.message(): import Task explicitly at runtime instead of relying
  on the namespace injection done by crewai/__init__
- Async executor: move the native-tools-unsupported fallback from
  _ainvoke_loop_react (self-recursion) to _ainvoke_loop_native_tools,
  mirroring the sync implementation
- StepExecutor downgrade: keep the in-step conversation and append the
  text-tooling instructions instead of rebuilding messages, so completed
  native tool calls are not re-executed
- crewai-files: extension-based MIME lookup now runs before byte
  sniffing so csv/xml types are not degraded to text/plain
- Memory storages: validate every record in a save() batch against a
  consistent embedding dimension (LanceDB previously checked only the
  first record); added mixed-batch tests
- _print_post_tui_summary now typed against CrewRunApp
- Docs: Azure OpenAI default embedder change called out in the memory
  migration warning and provider table
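
The stdlib `mimetypes` module illustrates why the lookup order matters. A sniffer that only recognises binary signatures typically reports CSV and XML as generic text, whereas the file extension identifies them. `detect_mime` and its two-entry signature table are hypothetical and are not the crewai-files code. Exact extension mappings can also vary with the platform's MIME database.

```python
from __future__ import annotations

import mimetypes

# Hypothetical magic-byte table, consulted only when the extension is inconclusive.
_SIGNATURES: list[tuple[bytes, str]] = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
]


def detect_mime(filename: str, head: bytes) -> str:
    """Prefer the extension-based type; fall back to sniffing the leading bytes."""
    guessed, _encoding = mimetypes.guess_type(filename)
    if guessed is not None:
        return guessed
    for signature, mime in _SIGNATURES:
        if head.startswith(signature):
            return mime
    # Undecodable bytes are treated as opaque binary, otherwise plain text.
    try:
        head.decode("utf-8")
    except UnicodeDecodeError:
        return "application/octet-stream"
    return "text/plain"
```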

Code quality bots:
- Removed unused _C_YELLOW/_C_CYAN (crew_run_tui) and _GREEN (tui_picker)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli): accordion tool picker in JSON crew wizard

The flat tool list had grown to ~90 rows. The picker now shows:
- Common tools always visible at the top
- Every other category as a single expandable row with tool and
  selection counts (e.g. "Search & Research  (27 tools, 2 selected)")
- Expanding a category collapses the previously expanded one
- Selections persist across expand/collapse via new preselected
  support in pick_many; cursor follows the toggled category row

tui_picker gains preselected + initial_cursor options on pick_many,
and Esc in multi-select now confirms the current selection instead of
discarding it (required so collapsing can't silently drop choices).
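
The accordion behaviour reduces to a small piece of state: at most one expanded category, plus a selection set that lives outside the visible rows so that collapsing never drops a choice. Below is a sketch with hypothetical names (`Accordion`, `toggle`, `rows`). It does not reproduce `pick_many` or its `preselected` and `initial_cursor` options.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Accordion:
    common: list[str]
    categories: dict[str, list[str]]
    expanded: str | None = None
    # Selections are stored independently of which rows are visible.
    selected: set[str] = field(default_factory=set)

    def toggle(self, category: str) -> None:
        """Expand a category and collapse any other; toggling it again collapses it."""
        self.expanded = None if self.expanded == category else category

    def rows(self) -> list[str]:
        """Render visible rows: common tools first, then one header per category."""
        rows = list(self.common)
        for name, tools in self.categories.items():
            chosen = sum(tool in self.selected for tool in tools)
            rows.append(f"{name}  ({len(tools)} tools, {chosen} selected)")
            if name == self.expanded:
                rows.extend(f"  {tool}" for tool in tools)
        return rows
```

Keeping the selection count in each collapsed header shows the user which categories still hold choices.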

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor(cli): remove --daemon flag from crewai run

The flag only affected JSON crew projects — classic and flow projects
ignored it entirely, which made the behavior inconsistent. Removed the
option, the daemon code path (_run_json_crew_daemon), and its helper
(_load_json_crew_with_inputs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: update run command tests after --daemon removal

lib/crewai/tests/cli/test_run_crew.py still asserted the old
run_crew(trained_agents_file=..., daemon=False) call signature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): exit codes, mid-run quit, async statuses, hyphen placeholders

Addresses the latest Bugbot review round:

- Failed JSON crew runs now exit non-zero (SystemExit(1)) so scripts
  and CI don't treat failures as success, mirroring the classic path
- Quitting the TUI mid-run now ends the process (os._exit(130));
  kickoff runs in a thread worker that cannot be force-cancelled, so
  letting the CLI return would leave LLM/tool work burning tokens in
  the background
- Sidebar task statuses are now async-safe: completion/failure events
  resolve the task's own row via identity instead of assuming the most
  recently started task, and starting a task no longer blanket-marks
  earlier active rows as done
- The runtime-input prompt regex now accepts hyphenated placeholder
  names ({my-topic}), matching kickoff's interpolation pattern
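
Below is a sketch of a placeholder pattern that accepts hyphens. The regex is an assumption chosen for illustration. The commit states only that the prompt regex now matches kickoff's interpolation pattern, and that pattern is not reproduced here. `missing_inputs` is hypothetical.

```python
from __future__ import annotations

import re

# Assumed pattern: a letter or underscore, then letters, digits, underscores or hyphens.
PLACEHOLDER = re.compile(r"\{([A-Za-z_][A-Za-z0-9_\-]*)\}")


def missing_inputs(templates: list[str], provided: dict[str, str]) -> list[str]:
    """Return placeholder names used in templates but absent from provided, in order."""
    missing: list[str] = []
    for template in templates:
        for name in PLACEHOLDER.findall(template):
            if name not in provided and name not in missing:
                missing.append(name)
    return missing
```

Requiring a letter or underscore right after `{` keeps literal JSON such as `{"key": 1}` from being read as a placeholder.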

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: validation safety, custom tool sandboxing, TUI log integrity, memory error surfacing

- Deploy validation no longer executes project code: validation mode
  checks tool declarations structurally (well-formed entries, custom
  tool file exists) without importing or instantiating anything.
  custom:<name> resolution only happens on the actual run path.
- custom:<name> is constrained to [A-Za-z_][A-Za-z0-9_]* and the
  resolved path must stay inside the project's tools/ directory, so
  custom:../foo or absolute-path names cannot execute code outside it.
  Tool paths resolve relative to the crew project root, not cwd.
- TUI task logs are built from per-task state captured at task start
  (idx, description, agent, start time); an out-of-order completion
  takes its output from the event and no longer steals or resets the
  current task's streamed steps/output.
- EmbeddingDimensionMismatchError now inherits ValueError instead of
  RuntimeError so background saves surface it through
  MemorySaveFailedEvent instead of silently dropping the save; the
  shutdown catch in _background_encode_batch is narrowed to the
  "cannot schedule new futures" case.

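Both `custom:<name>` checks fit in a few lines with `re.fullmatch` and `pathlib`. `custom_tool_path` is a hypothetical helper, and `ValueError` stands in for the loader's real error. `Path.is_relative_to` requires Python 3.9 or later.

```python
from __future__ import annotations

import re
from pathlib import Path

_TOOL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def custom_tool_path(spec: str, project_root: Path) -> Path:
    """Map 'custom:<name>' to <project_root>/tools/<name>.py, rejecting escapes."""
    prefix = "custom:"
    if not spec.startswith(prefix):
        raise ValueError(f"not a custom tool spec: {spec!r}")
    name = spec[len(prefix):]
    # The identifier pattern already excludes '/', '.', and therefore '..'.
    if not _TOOL_NAME.fullmatch(name):
        raise ValueError(f"invalid custom tool name: {name!r}")
    # Resolve relative to the project root, not the current working directory.
    tools_dir = (project_root / "tools").resolve()
    path = (tools_dir / f"{name}.py").resolve()
    # Defence in depth: also rejects a symlinked file that points outside tools/.
    if not path.is_relative_to(tools_dir):
        raise ValueError(f"custom tool path escapes {tools_dir}: {path}")
    return path
```

Validation mode can stop after the name and existence checks. Only the run path would go on to import the resolved file.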
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): declared project type wins over crew.json presence

A flow project that also contains a crew.json(c) file now runs and
validates as the flow it declares in pyproject.toml instead of being
hijacked by the JSON crew path. Both crewai run (_has_json_crew) and
deploy validation (_is_json_crew) check tool.crewai.type; a missing or
unreadable pyproject still means a bare JSON crew project.

Also documents why StepObservationFailedEvent intentionally marks the
plan step "done": the event signals an observer failure, not a step
failure, and the executor continues past it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): type the declared_type locals so mypy stays clean

Comparing an Any-typed .get() chain returns Any, which tripped
no-any-return on the previous commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 04:19:48 -03:00


"""Tests for async human feedback functionality.
This module tests the async/non-blocking human feedback flow, including:
- PendingFeedbackContext creation and serialization
- HumanFeedbackPending exception handling
- HumanFeedbackProvider protocol
- ConsoleProvider
- Flow.from_pending() and Flow.resume()
- SQLite persistence with pending feedback
"""

from __future__ import annotations

import json
import os
import tempfile
from datetime import datetime
from typing import Any
from unittest.mock import MagicMock, patch

import pytest
from pydantic import BaseModel

from crewai.flow import Flow, HumanFeedbackResult, start, listen, human_feedback
from crewai.flow.async_feedback import (
    ConsoleProvider,
    HumanFeedbackPending,
    HumanFeedbackProvider,
    PendingFeedbackContext,
)
from crewai.flow.persistence import SQLiteFlowPersistence


# PendingFeedbackContext Tests


class TestPendingFeedbackContext:
    """Tests for PendingFeedbackContext dataclass."""

    def test_create_basic_context(self) -> None:
        """Test creating a basic pending feedback context."""
        context = PendingFeedbackContext(
            flow_id="test-flow-123",
            flow_class="myapp.flows.ReviewFlow",
            method_name="review_content",
            method_output="Content to review",
            message="Please review this content:",
        )

        assert context.flow_id == "test-flow-123"
        assert context.flow_class == "myapp.flows.ReviewFlow"
        assert context.method_name == "review_content"
        assert context.method_output == "Content to review"
        assert context.message == "Please review this content:"
        assert context.emit is None
        assert context.default_outcome is None
        assert context.metadata == {}
        assert isinstance(context.requested_at, datetime)

    def test_create_context_with_emit(self) -> None:
        """Test creating context with routing outcomes."""
        context = PendingFeedbackContext(
            flow_id="test-flow-456",
            flow_class="myapp.flows.ApprovalFlow",
            method_name="submit_for_approval",
            method_output={"document": "content"},
            message="Approve or reject:",
            emit=["approved", "rejected", "needs_revision"],
            default_outcome="needs_revision",
            llm="gpt-4o-mini",
        )

        assert context.emit == ["approved", "rejected", "needs_revision"]
        assert context.default_outcome == "needs_revision"
        assert context.llm == "gpt-4o-mini"

    def test_to_dict_serialization(self) -> None:
        """Test serializing context to dictionary."""
        context = PendingFeedbackContext(
            flow_id="test-flow-789",
            flow_class="myapp.flows.TestFlow",
            method_name="test_method",
            method_output={"key": "value"},
            message="Test message",
            emit=["yes", "no"],
            metadata={"channel": "#reviews"},
        )

        result = context.to_dict()

        assert result["flow_id"] == "test-flow-789"
        assert result["flow_class"] == "myapp.flows.TestFlow"
        assert result["method_name"] == "test_method"
        assert result["method_output"] == {"key": "value"}
        assert result["message"] == "Test message"
        assert result["emit"] == ["yes", "no"]
        assert result["metadata"] == {"channel": "#reviews"}
        assert "requested_at" in result

    def test_from_dict_deserialization(self) -> None:
        """Test deserializing context from dictionary."""
        data = {
            "flow_id": "test-flow-abc",
            "flow_class": "myapp.flows.TestFlow",
            "method_name": "my_method",
            "method_output": "output value",
            "message": "Feedback message",
            "emit": ["option_a", "option_b"],
            "default_outcome": "option_a",
            "metadata": {"user_id": "123"},
            "llm": "gpt-4o-mini",
            "requested_at": "2024-01-15T10:30:00",
        }

        context = PendingFeedbackContext.from_dict(data)

        assert context.flow_id == "test-flow-abc"
        assert context.flow_class == "myapp.flows.TestFlow"
        assert context.method_name == "my_method"
        assert context.emit == ["option_a", "option_b"]
        assert context.default_outcome == "option_a"
        assert context.llm == "gpt-4o-mini"

    def test_roundtrip_serialization(self) -> None:
        """Test that to_dict/from_dict roundtrips correctly."""
        original = PendingFeedbackContext(
            flow_id="roundtrip-test",
            flow_class="test.TestFlow",
            method_name="test",
            method_output={"nested": {"data": [1, 2, 3]}},
            message="Test",
            emit=["a", "b"],
            metadata={"key": "value"},
        )

        serialized = original.to_dict()
        restored = PendingFeedbackContext.from_dict(serialized)

        assert restored.flow_id == original.flow_id
        assert restored.flow_class == original.flow_class
        assert restored.method_name == original.method_name
        assert restored.method_output == original.method_output
        assert restored.emit == original.emit
        assert restored.metadata == original.metadata


# HumanFeedbackPending Exception Tests


class TestHumanFeedbackPending:
    """Tests for HumanFeedbackPending exception."""

    def test_basic_exception(self) -> None:
        """Test creating basic pending exception."""
        context = PendingFeedbackContext(
            flow_id="exc-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(context=context)

        assert exc.context == context
        assert exc.callback_info == {}
        assert "exc-test" in str(exc)
        assert "method" in str(exc)

    def test_exception_with_callback_info(self) -> None:
        """Test pending exception with callback information."""
        context = PendingFeedbackContext(
            flow_id="callback-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(
            context=context,
            callback_info={
                "webhook_url": "https://example.com/webhook",
                "slack_thread": "123456",
            },
        )

        assert exc.callback_info["webhook_url"] == "https://example.com/webhook"
        assert exc.callback_info["slack_thread"] == "123456"

    def test_exception_with_custom_message(self) -> None:
        """Test pending exception with custom message."""
        context = PendingFeedbackContext(
            flow_id="msg-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(
            context=context,
            message="Custom pending message",
        )

        assert str(exc) == "Custom pending message"

    def test_exception_is_catchable(self) -> None:
        """Test that exception can be caught and handled."""
        context = PendingFeedbackContext(
            flow_id="catch-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        with pytest.raises(HumanFeedbackPending) as exc_info:
            raise HumanFeedbackPending(context=context)

        assert exc_info.value.context.flow_id == "catch-test"


# HumanFeedbackProvider Protocol Tests


class TestHumanFeedbackProvider:
    """Tests for HumanFeedbackProvider protocol."""

    def test_protocol_compliance_sync_provider(self) -> None:
        """Test that sync provider complies with protocol."""

        class SyncProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                return "sync feedback"

        provider = SyncProvider()
        assert isinstance(provider, HumanFeedbackProvider)

    def test_protocol_compliance_async_provider(self) -> None:
        """Test that async provider complies with protocol."""

        class AsyncProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(context=context)

        provider = AsyncProvider()
        assert isinstance(provider, HumanFeedbackProvider)


# ConsoleProvider Tests


class TestConsoleProvider:
    """Tests for ConsoleProvider."""

    def test_provider_initialization(self) -> None:
        """Test console provider initialization."""
        provider = ConsoleProvider()
        assert provider.verbose is True

        quiet_provider = ConsoleProvider(verbose=False)
        assert quiet_provider.verbose is False


# SQLite Persistence Tests for Async Feedback


class TestSQLitePendingFeedback:
    """Tests for SQLite persistence with pending feedback."""

    def test_save_and_load_pending_feedback(self) -> None:
        """Test saving and loading pending feedback context."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            context = PendingFeedbackContext(
                flow_id="persist-test-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output={"data": "test"},
                message="Review this:",
                emit=["approved", "rejected"],
                llm="gpt-4o-mini",
            )
            state_data = {"counter": 10, "items": ["a", "b"]}

            persistence.save_pending_feedback(
                flow_uuid="persist-test-123",
                context=context,
                state_data=state_data,
            )

            result = persistence.load_pending_feedback("persist-test-123")

            assert result is not None
            loaded_state, loaded_context = result
            assert loaded_state["counter"] == 10
            assert loaded_state["items"] == ["a", "b"]
            assert loaded_context.flow_id == "persist-test-123"
            assert loaded_context.emit == ["approved", "rejected"]

    def test_load_nonexistent_pending_feedback(self) -> None:
        """Test loading pending feedback that doesn't exist."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            result = persistence.load_pending_feedback("nonexistent-id")
            assert result is None

    def test_clear_pending_feedback(self) -> None:
        """Test clearing pending feedback after resume."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            context = PendingFeedbackContext(
                flow_id="clear-test",
                flow_class="test.Flow",
                method_name="method",
                method_output="output",
                message="message",
            )

            persistence.save_pending_feedback(
                flow_uuid="clear-test",
                context=context,
                state_data={"key": "value"},
            )

            assert persistence.load_pending_feedback("clear-test") is not None

            persistence.clear_pending_feedback("clear-test")

            assert persistence.load_pending_feedback("clear-test") is None

    def test_replace_existing_pending_feedback(self) -> None:
        """Test that saving pending feedback replaces existing entry."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            flow_id = "replace-test"

            context1 = PendingFeedbackContext(
                flow_id=flow_id,
                flow_class="test.Flow",
                method_name="method1",
                method_output="output1",
                message="message1",
            )
            persistence.save_pending_feedback(
                flow_uuid=flow_id,
                context=context1,
                state_data={"version": 1},
            )

            context2 = PendingFeedbackContext(
                flow_id=flow_id,
                flow_class="test.Flow",
                method_name="method2",
                method_output="output2",
                message="message2",
            )
            persistence.save_pending_feedback(
                flow_uuid=flow_id,
                context=context2,
                state_data={"version": 2},
            )

            result = persistence.load_pending_feedback(flow_id)
            assert result is not None
            state, context = result
            assert state["version"] == 2
            assert context.method_name == "method2"


# Custom Async Provider Tests


class TestCustomAsyncProvider:
    """Tests for custom async providers."""

    def test_provider_raises_pending_exception(self) -> None:
        """Test that async provider raises HumanFeedbackPending."""

        class WebhookProvider:
            def __init__(self, webhook_url: str):
                self.webhook_url = webhook_url

            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(
                    context=context,
                    callback_info={"url": f"{self.webhook_url}/{context.flow_id}"},
                )

        provider = WebhookProvider("https://example.com/api")
        context = PendingFeedbackContext(
            flow_id="webhook-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )
        mock_flow = MagicMock()

        with pytest.raises(HumanFeedbackPending) as exc_info:
            provider.request_feedback(context, mock_flow)

        assert exc_info.value.callback_info["url"] == (
            "https://example.com/api/webhook-test"
        )


# Flow.from_pending and resume Tests


class TestFlowResumeWithFeedback:
    """Tests for Flow.from_pending and resume."""

    def test_from_pending_uses_default_persistence(self) -> None:
        """Test that from_pending uses SQLiteFlowPersistence by default."""

        class TestFlow(Flow):
            @start()
            def begin(self):
                return "started"

        with pytest.raises(ValueError, match="No pending feedback found"):
            TestFlow.from_pending("nonexistent-id")

    def test_from_pending_raises_for_missing_flow(self) -> None:
        """Test that from_pending raises error for nonexistent flow."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                def begin(self):
                    return "started"

            with pytest.raises(ValueError, match="No pending feedback found"):
                TestFlow.from_pending("nonexistent-id", persistence)

    def test_from_pending_restores_state(self) -> None:
        """Test that from_pending correctly restores flow state."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestState(BaseModel):
                id: str = "test-restore-123"
                counter: int = 0

            class TestFlow(Flow[TestState]):
                @start()
                def begin(self):
                    return "started"

            # Manually save pending feedback
            context = PendingFeedbackContext(
                flow_id="test-restore-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output="content",
                message="Review:",
            )
            persistence.save_pending_feedback(
                flow_uuid="test-restore-123",
                context=context,
                state_data={"id": "test-restore-123", "counter": 42},
            )

            flow = TestFlow.from_pending("test-restore-123", persistence)

            assert flow._pending_feedback_context is not None
            assert flow._pending_feedback_context.flow_id == "test-restore-123"
            assert flow._is_execution_resuming is True
            assert flow.state.counter == 42

    def test_resume_without_pending_raises_error(self) -> None:
        """Test that resume raises error without pending context."""

        class TestFlow(Flow):
            @start()
            def begin(self):
                return "started"

        flow = TestFlow()

        with pytest.raises(ValueError, match="No pending feedback context"):
            flow.resume("some feedback")

    def test_resume_from_async_context_raises_error(self) -> None:
        """Test that resume() raises RuntimeError when called from async context."""
        import asyncio

        class TestFlow(Flow):
            @start()
            def begin(self):
                return "started"

        async def call_resume_from_async():
            with tempfile.TemporaryDirectory() as tmpdir:
                db_path = os.path.join(tmpdir, "test.db")
                persistence = SQLiteFlowPersistence(db_path)

                context = PendingFeedbackContext(
                    flow_id="async-context-test",
                    flow_class="TestFlow",
                    method_name="begin",
                    method_output="output",
                    message="Review:",
                )
                persistence.save_pending_feedback(
                    flow_uuid="async-context-test",
                    context=context,
                    state_data={"id": "async-context-test"},
                )

                flow = TestFlow.from_pending("async-context-test", persistence)

                # This should raise RuntimeError because we're in an async context
                with pytest.raises(
                    RuntimeError, match="cannot be called from within an async context"
                ):
                    flow.resume("feedback")

        asyncio.run(call_resume_from_async())

    @pytest.mark.asyncio
    async def test_resume_async_direct(self) -> None:
        """Test resume_async() can be called directly in async context."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review:")
                def generate(self):
                    return "content"

                @listen(generate)
                def process(self, result):
                    return f"processed: {result.feedback}"

            context = PendingFeedbackContext(
                flow_id="async-direct-test",
                flow_class="TestFlow",
                method_name="generate",
                method_output="content",
                message="Review:",
            )
            persistence.save_pending_feedback(
                flow_uuid="async-direct-test",
                context=context,
                state_data={"id": "async-direct-test"},
            )

            flow = TestFlow.from_pending("async-direct-test", persistence)

            with patch("crewai.flow.runtime.crewai_event_bus.emit"):
                result = await flow.resume_async("async feedback")

            assert flow.last_human_feedback is not None
            assert flow.last_human_feedback.feedback == "async feedback"

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_resume_basic(self, mock_emit: MagicMock) -> None:
        """Test basic resume functionality."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review this:")
                def generate(self):
                    return "generated content"

                @listen(generate)
                def process(self, feedback_result):
                    return f"Processed: {feedback_result.feedback}"

            # Manually save pending feedback (simulating async pause)
            context = PendingFeedbackContext(
                flow_id="resume-test-123",
                flow_class="test.TestFlow",
                method_name="generate",
                method_output="generated content",
                message="Review this:",
            )
            persistence.save_pending_feedback(
                flow_uuid="resume-test-123",
                context=context,
                state_data={"id": "resume-test-123"},
            )

            flow = TestFlow.from_pending("resume-test-123", persistence)
            result = flow.resume("looks good!")

            assert flow.last_human_feedback is not None
            assert flow.last_human_feedback.feedback == "looks good!"
            assert flow.last_human_feedback.output == "generated content"
            assert persistence.load_pending_feedback("resume-test-123") is None

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_terminal_resume_without_emit_returns_feedback_result(
        self, mock_emit: MagicMock
    ) -> None:
        """Terminal resumed non-emit methods return the full feedback result."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review this:", metadata={"stage": "draft"})
                def generate(self):
                    return {"content": "generated content"}

            context = PendingFeedbackContext(
                flow_id="terminal-non-emit-test-123",
                flow_class="test.TestFlow",
                method_name="generate",
                method_output={"content": "generated content"},
                message="Review this:",
                metadata={"stage": "draft"},
            )
            persistence.save_pending_feedback(
                flow_uuid="terminal-non-emit-test-123",
                context=context,
                state_data={"id": "terminal-non-emit-test-123"},
            )

            flow = TestFlow.from_pending("terminal-non-emit-test-123", persistence)
            result = flow.resume("looks good!")

            assert isinstance(result, HumanFeedbackResult)
            assert result.output == {"content": "generated content"}
            assert result.feedback == "looks good!"
            assert result.outcome is None
            assert result.metadata == {"stage": "draft"}
            assert flow.method_outputs == [result]
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_resume_routing(self, mock_emit: MagicMock) -> None:
"""Test resume with routing."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
class TestFlow(Flow):
result_path: str = ""
@start()
@human_feedback(
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
def review(self):
return "content"
@listen("approved")
def handle_approved(self):
self.result_path = "approved"
return "Approved!"
@listen("rejected")
def handle_rejected(self):
self.result_path = "rejected"
return "Rejected!"
context = PendingFeedbackContext(
flow_id="route-test-123",
flow_class="test.TestFlow",
method_name="review",
method_output="content",
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
persistence.save_pending_feedback(
flow_uuid="route-test-123",
context=context,
state_data={"id": "route-test-123"},
)
flow = TestFlow.from_pending("route-test-123", persistence)
with patch.object(flow, "_collapse_to_outcome", return_value="approved"):
flow.resume("yes, this looks great")
assert flow.last_human_feedback.outcome == "approved"
assert flow.result_path == "approved"
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_terminal_resume_with_emit_returns_method_output(
self, mock_emit: MagicMock
) -> None:
"""A terminal resumed method with emit returns the original method output."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
method_output = {"content": "original content", "status": "ready"}
class TestFlow(Flow):
@start()
@human_feedback(
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
def review(self):
return method_output
context = PendingFeedbackContext(
flow_id="terminal-route-test-123",
flow_class="test.TestFlow",
method_name="review",
method_output=method_output,
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
persistence.save_pending_feedback(
flow_uuid="terminal-route-test-123",
context=context,
state_data={"id": "terminal-route-test-123"},
)
flow = TestFlow.from_pending("terminal-route-test-123", persistence)
with patch.object(flow, "_collapse_to_outcome", return_value="approved"):
result = flow.resume("yes, this looks great")
assert result == method_output
assert flow.method_outputs == [method_output]
assert flow.last_human_feedback.outcome == "approved"
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_resume_records_method_output_before_downstream_listeners(
self, mock_emit: MagicMock
) -> None:
"""The resumed method's output is recorded in method_outputs before downstream listeners run."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
class TestFlow(Flow):
@start()
@human_feedback(message="Review:")
def review(self):
return "generated content"
@listen(review)
def downstream(self, result):
self.state["seen_outputs"] = self.method_outputs
return f"downstream:{result.output}"
context = PendingFeedbackContext(
flow_id="listener-output-test-123",
flow_class="test.TestFlow",
method_name="review",
method_output="generated content",
message="Review:",
)
persistence.save_pending_feedback(
flow_uuid="listener-output-test-123",
context=context,
state_data={"id": "listener-output-test-123"},
)
flow = TestFlow.from_pending("listener-output-test-123", persistence)
result = flow.resume("looks good")
assert result == "downstream:generated content"
assert len(flow.state["seen_outputs"]) == 1
seen_output = flow.state["seen_outputs"][0]
assert isinstance(seen_output, HumanFeedbackResult)
assert seen_output.output == "generated content"
assert seen_output.feedback == "looks good"
# Integration Tests with @human_feedback decorator
class TestAsyncHumanFeedbackIntegration:
"""Integration tests for async human feedback with decorator."""
def test_decorator_with_provider_parameter(self) -> None:
"""Test that decorator accepts provider parameter."""
class MockProvider:
def request_feedback(
self, context: PendingFeedbackContext, flow: Flow
) -> str:
raise HumanFeedbackPending(context=context)
class TestFlow(Flow):
@start()
@human_feedback(
message="Review:",
provider=MockProvider(),
)
def review(self):
return "content"
flow = TestFlow()
method = getattr(flow, "review")
assert hasattr(method, "__human_feedback_config__")
assert method.__human_feedback_config__.provider is not None
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_async_provider_pauses_flow(self, mock_emit: MagicMock) -> None:
"""Test that async provider pauses flow execution."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
class PausingProvider:
def __init__(self, persistence: SQLiteFlowPersistence):
self.persistence = persistence
def request_feedback(
self, context: PendingFeedbackContext, flow: Flow
) -> str:
self.persistence.save_pending_feedback(
flow_uuid=context.flow_id,
context=context,
state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
)
raise HumanFeedbackPending(
context=context,
callback_info={"saved": True},
)
class TestFlow(Flow):
@start()
@human_feedback(
message="Review:",
provider=PausingProvider(persistence),
)
def generate(self):
return "generated content"
flow = TestFlow(persistence=persistence)
# kickoff now returns HumanFeedbackPending instead of raising it
result = flow.kickoff()
assert isinstance(result, HumanFeedbackPending)
assert result.callback_info["saved"] is True
flow_id = result.context.flow_id
persisted = persistence.load_pending_feedback(flow_id)
assert persisted is not None
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_full_async_flow_cycle(self, mock_emit: MagicMock) -> None:
"""Test complete async flow: start -> pause -> resume."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
flow_id_holder: list[str] = []
class SaveAndPauseProvider:
def __init__(self, persistence: SQLiteFlowPersistence):
self.persistence = persistence
def request_feedback(
self, context: PendingFeedbackContext, flow: Flow
) -> str:
flow_id_holder.append(context.flow_id)
self.persistence.save_pending_feedback(
flow_uuid=context.flow_id,
context=context,
state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
)
raise HumanFeedbackPending(context=context)
class ReviewFlow(Flow):
processed_feedback: str = ""
@start()
@human_feedback(
message="Review this content:",
provider=SaveAndPauseProvider(persistence),
)
def generate(self):
return "AI generated content"
@listen(generate)
def process(self, feedback_result):
self.processed_feedback = feedback_result.feedback
return f"Final: {feedback_result.feedback}"
flow1 = ReviewFlow(persistence=persistence)
result = flow1.kickoff()
# kickoff now returns HumanFeedbackPending instead of raising it
assert isinstance(result, HumanFeedbackPending)
assert len(flow_id_holder) == 1
paused_flow_id = flow_id_holder[0]
flow2 = ReviewFlow.from_pending(paused_flow_id, persistence)
flow2.resume("This is my feedback")
assert flow2.last_human_feedback.feedback == "This is my feedback"
assert flow2.processed_feedback == "This is my feedback"
# Auto-persistence Tests
class TestAutoPersistence:
"""Tests for automatic persistence when no persistence is provided."""
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_auto_persistence_when_none_provided(self, mock_emit: MagicMock) -> None:
"""Test that persistence is auto-created when HumanFeedbackPending is raised."""
class PausingProvider:
def request_feedback(
self, context: PendingFeedbackContext, flow: Flow
) -> str:
raise HumanFeedbackPending(
context=context,
callback_info={"paused": True},
)
class TestFlow(Flow):
@start()
@human_feedback(
message="Review:",
provider=PausingProvider(),
)
def generate(self):
return "content"
flow = TestFlow()
assert flow.persistence is None
# kickoff should auto-create persistence when HumanFeedbackPending is raised
result = flow.kickoff()
assert isinstance(result, HumanFeedbackPending)
# Persistence should have been auto-created
assert flow.persistence is not None
flow_id = result.context.flow_id
loaded = flow.persistence.load_pending_feedback(flow_id)
assert loaded is not None
class TestCollapseToOutcomeJsonParsing:
"""Tests for _collapse_to_outcome JSON parsing edge cases."""
def test_json_string_response_is_parsed(self) -> None:
"""Test that JSON string response from LLM is correctly parsed."""
flow = Flow()
with patch("crewai.llm.LLM") as MockLLM:
mock_llm = MagicMock()
# Simulate the LLM returning a JSON string instead of a bare outcome (regression case)
mock_llm.call.return_value = '{"outcome": "approved"}'
MockLLM.return_value = mock_llm
result = flow._collapse_to_outcome(
feedback="I approve this",
outcomes=["approved", "rejected"],
llm="gpt-4o-mini",
)
assert result == "approved"
def test_plain_string_response_is_matched(self) -> None:
"""Test that plain string response is correctly matched."""
flow = Flow()
with patch("crewai.llm.LLM") as MockLLM:
mock_llm = MagicMock()
# Simulate LLM returning plain outcome string
mock_llm.call.return_value = "rejected"
MockLLM.return_value = mock_llm
result = flow._collapse_to_outcome(
feedback="This is not good",
outcomes=["approved", "rejected"],
llm="gpt-4o-mini",
)
assert result == "rejected"
def test_invalid_json_falls_back_to_matching(self) -> None:
"""Test that invalid JSON falls back to string matching."""
flow = Flow()
with patch("crewai.llm.LLM") as MockLLM:
mock_llm = MagicMock()
mock_llm.call.return_value = "{invalid json but says approved"
MockLLM.return_value = mock_llm
result = flow._collapse_to_outcome(
feedback="looks good",
outcomes=["approved", "rejected"],
llm="gpt-4o-mini",
)
assert result == "approved"
def test_llm_exception_falls_back_to_simple_prompting(self) -> None:
"""Test that LLM exception triggers fallback to simple prompting."""
flow = Flow()
with patch("crewai.llm.LLM") as MockLLM:
mock_llm = MagicMock()
# First call raises, second call succeeds (fallback)
mock_llm.call.side_effect = [
Exception("Structured output failed"),
"approved",
]
MockLLM.return_value = mock_llm
result = flow._collapse_to_outcome(
feedback="I approve",
outcomes=["approved", "rejected"],
llm="gpt-4o-mini",
)
assert result == "approved"
# Verify it was called twice (initial + fallback)
assert mock_llm.call.call_count == 2
class TestLLMObjectPreservedInContext:
"""Tests that BaseLLM objects have their model string preserved in PendingFeedbackContext."""
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_basellm_object_model_string_survives_roundtrip(self, mock_emit: MagicMock) -> None:
"""Test that when llm is a BaseLLM object, its serialized config (including the
provider-prefixed model string) is stored in the context, so outcome collapsing
still works after an async pause/resume.
Regression case: the sync path keeps the LLM object in memory, but the async path
serializes the context, and the LLM object used to be discarded (stored as None).
Resume then skipped classification and always fell back to emit[0].
"""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
from crewai.llm import LLM
mock_llm_obj = LLM(
model="llama3",
provider="ollama",
base_url="http://localhost:11434",
)
class PausingProvider:
def __init__(self, persistence: SQLiteFlowPersistence):
self.persistence = persistence
self.captured_context: PendingFeedbackContext | None = None
def request_feedback(
self, context: PendingFeedbackContext, flow: Flow
) -> str:
self.captured_context = context
self.persistence.save_pending_feedback(
flow_uuid=context.flow_id,
context=context,
state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
)
raise HumanFeedbackPending(context=context)
provider = PausingProvider(persistence)
class TestFlow(Flow):
result_path: str = ""
@start()
@human_feedback(
message="Approve?",
emit=["needs_changes", "approved"],
llm=mock_llm_obj,
default_outcome="approved",
provider=provider,
)
def review(self):
return "content for review"
@listen("approved")
def handle_approved(self):
self.result_path = "approved"
return "Approved!"
@listen("needs_changes")
def handle_changes(self):
self.result_path = "needs_changes"
return "Changes needed"
flow1 = TestFlow(persistence=persistence)
result = flow1.kickoff()
assert isinstance(result, HumanFeedbackPending)
assert provider.captured_context is not None
assert isinstance(provider.captured_context.llm, dict)
assert provider.captured_context.llm["model"] == "ollama/llama3"
flow_id = result.context.flow_id
loaded = persistence.load_pending_feedback(flow_id)
assert loaded is not None
_, loaded_context = loaded
assert isinstance(loaded_context.llm, dict)
assert loaded_context.llm["model"] == "ollama/llama3"
flow2 = TestFlow.from_pending(flow_id, persistence)
assert flow2._pending_feedback_context is not None
assert isinstance(flow2._pending_feedback_context.llm, dict)
assert flow2._pending_feedback_context.llm["model"] == "ollama/llama3"
with patch.object(flow2, "_collapse_to_outcome", return_value="approved") as mock_collapse:
flow2.resume("this looks good, proceed!")
# The key assertion: _collapse_to_outcome was called (not skipped due to llm=None)
mock_collapse.assert_called_once()
call_kwargs = mock_collapse.call_args
assert call_kwargs.kwargs["feedback"] == "this looks good, proceed!"
assert call_kwargs.kwargs["outcomes"] == ["needs_changes", "approved"]
# LLM should be a live object (from _human_feedback_llm) or reconstructed, not None
assert call_kwargs.kwargs["llm"] is not None
assert getattr(call_kwargs.kwargs["llm"], "model", None) == "llama3"
assert flow2.last_human_feedback.outcome == "approved"
assert flow2.result_path == "approved"
def test_string_llm_still_works(self) -> None:
"""Test that passing llm as a string still works correctly."""
context = PendingFeedbackContext(
flow_id="str-llm-test",
flow_class="test.Flow",
method_name="review",
method_output="output",
message="Review:",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
serialized = context.to_dict()
restored = PendingFeedbackContext.from_dict(serialized)
assert restored.llm == "gpt-4o-mini"
def test_none_llm_when_no_model_attr(self) -> None:
"""Test that llm is None when object has no model attribute."""
from crewai.flow.human_feedback import _serialize_llm_for_context
mock_obj = MagicMock(spec=[])
assert _serialize_llm_for_context(mock_obj) is None
def test_provider_prefix_added_to_bare_model(self) -> None:
"""Test that provider prefix is added when model has no slash."""
from crewai.flow.human_feedback import _serialize_llm_for_context
from crewai.llm import LLM
llm = LLM(
model="llama3",
provider="ollama",
base_url="http://localhost:11434",
)
result = _serialize_llm_for_context(llm)
assert isinstance(result, dict)
assert result["model"] == "ollama/llama3"
def test_provider_prefix_not_doubled_when_already_present(self) -> None:
"""Test that provider prefix is not added when model already has a slash."""
from crewai.flow.human_feedback import _serialize_llm_for_context
from crewai.llm import LLM
llm = LLM(model="ollama/llama3", base_url="http://localhost:11434")
result = _serialize_llm_for_context(llm)
assert isinstance(result, dict)
assert result["model"] == "ollama/llama3"
def test_no_provider_attr_falls_back_to_bare_model(self) -> None:
"""Test that objects without to_config_dict fall back to model string."""
from crewai.flow.human_feedback import _serialize_llm_for_context
mock_obj = MagicMock(spec=[])
mock_obj.model = "gpt-4o-mini"
assert _serialize_llm_for_context(mock_obj) == "gpt-4o-mini"
class TestAsyncHumanFeedbackEdgeCases:
"""Edge case tests for async human feedback."""
def test_pending_context_with_complex_output(self) -> None:
"""Test context with complex nested output."""
complex_output = {
"items": [{"id": 1, "name": "Item 1"}, {"id": 2, "name": "Item 2"}],
"metadata": {"total": 2, "page": 1},
"nested": {"deep": {"value": "test"}},
}
context = PendingFeedbackContext(
flow_id="complex-test",
flow_class="test.Flow",
method_name="method",
method_output=complex_output,
message="Review:",
)
# Serialize and deserialize
serialized = context.to_dict()
json_str = json.dumps(serialized)
restored = PendingFeedbackContext.from_dict(json.loads(json_str))
assert restored.method_output == complex_output
def test_empty_feedback_uses_default_outcome(self) -> None:
"""Test that empty feedback uses default outcome during resume."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
class TestFlow(Flow):
@start()
def generate(self):
return "content"
context = PendingFeedbackContext(
flow_id="default-test",
flow_class="test.Flow",
method_name="generate",
method_output="content",
message="Review:",
emit=["approved", "rejected"],
default_outcome="approved",
llm="gpt-4o-mini",
)
persistence.save_pending_feedback(
flow_uuid="default-test",
context=context,
state_data={"id": "default-test"},
)
flow = TestFlow.from_pending("default-test", persistence)
with patch("crewai.flow.runtime.crewai_event_bus.emit"):
flow.resume("")
assert flow.last_human_feedback.outcome == "approved"
def test_resume_without_feedback_uses_default(self) -> None:
"""Test that resume() can be called without feedback argument."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test.db")
persistence = SQLiteFlowPersistence(db_path)
class TestFlow(Flow):
@start()
def step(self):
return "output"
context = PendingFeedbackContext(
flow_id="no-feedback-test",
flow_class="TestFlow",
method_name="step",
method_output="test output",
message="Review:",
emit=["approved", "rejected"],
default_outcome="approved",
llm="gpt-4o-mini",
)
persistence.save_pending_feedback(
flow_uuid="no-feedback-test",
context=context,
state_data={"id": "no-feedback-test"},
)
flow = TestFlow.from_pending("no-feedback-test", persistence)
with patch("crewai.flow.runtime.crewai_event_bus.emit"):
flow.resume()
assert flow.last_human_feedback.outcome == "approved"
assert flow.last_human_feedback.feedback == ""
class TestResumeLLMFromSerializedContext:
"""Resume rebuilds the collapse LLM from the serialized context alone."""
@patch("crewai.flow.runtime.crewai_event_bus.emit")
def test_resume_builds_llm_from_serialized_context(
self, mock_emit: MagicMock
) -> None:
"""Resume passes a BaseLLM built from the context's llm string to _collapse_to_outcome."""
with tempfile.TemporaryDirectory() as tmpdir:
db_path = os.path.join(tmpdir, "test_flows.db")
persistence = SQLiteFlowPersistence(db_path)
class TestFlow(Flow):
@start()
@human_feedback(
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
def review(self):
return "content"
context = PendingFeedbackContext(
flow_id="fallback-test",
flow_class="TestFlow",
method_name="review",
method_output="content",
message="Approve?",
emit=["approved", "rejected"],
llm="gpt-4o-mini",
)
persistence.save_pending_feedback(
flow_uuid="fallback-test",
context=context,
state_data={"id": "fallback-test"},
)
flow = TestFlow.from_pending("fallback-test", persistence)
captured_llm = []
def capture_llm(feedback, outcomes, llm):
captured_llm.append(llm)
return "approved"
with patch.object(flow, "_collapse_to_outcome", side_effect=capture_llm):
flow.resume("looks good!")
assert len(captured_llm) == 1
from crewai.llms.base_llm import BaseLLM as BaseLLMClass
assert isinstance(captured_llm[0], BaseLLMClass)
assert captured_llm[0].model == "gpt-4o-mini"