mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-07-01 05:08:12 +00:00
* feat(cli): introduce JSON crew project support and TUI enhancements
  - Added support for creating and running JSON-defined crew projects, allowing users to scaffold projects with a new `create_json_crew.py` file.
  - Implemented a full-screen Textual TUI for crew execution in `crew_run_tui.py`, enhancing user interaction with a two-column layout.
  - Updated `run_crew.py` to prioritize JSON crew projects and added daemon mode for running without TUI.
  - Introduced interactive pickers in `tui_picker.py` for improved CLI prompts.
  - Enhanced validation for JSON crew files in `validate.py` to ensure proper structure and agent definitions.
  - Updated `.gitignore` to exclude demo and crewai directories.

* feat: update LLM model references to gpt-5.4-mini
  - Changed default LLM model from gpt-4o-mini to gpt-5.4-mini across various files, including CLI options, JSON crew configurations, and agent definitions.
  - Enhanced benchmark and human feedback functionalities to utilize the new model.
  - Improved user interface elements in the TUI for better interaction and feedback during execution.
  - Added support for new skills directory in JSON crew project creation.

* feat(benchmark): add crew-level benchmarking functionality
  - Introduced a new `benchmark` command in the CLI for crew-level benchmarking, allowing users to specify agents, models, and timeout settings.
  - Implemented `CrewBenchmarkCase` to handle crew-level benchmark cases with inputs and criteria.
  - Enhanced the benchmark runner to support progress tracking and detailed reporting of results for multiple models.
  - Added tests for loading crew benchmark cases and validating their structure.
  - Updated existing benchmark functions to accommodate the new crew-level execution model.

* feat(cli): enhance JSON crew project functionality and TUI improvements
  - Added optional agent-level guardrails and advanced options in JSON crew configurations to improve output validation and flexibility.
  - Updated the TUI to better handle plan step statuses, including visual indicators for task completion and failure.
  - Introduced methods for parsing and managing step observation events, ensuring accurate updates to task statuses during execution.
  - Enhanced validation for JSON crew projects, ensuring proper structure and error handling for agent and task definitions.
  - Added comprehensive tests for new features and validation logic, ensuring robustness in JSON crew project handling.

* refactor(cli): streamline JSON crew project handling and improve validation
  - Refactored JSON crew project loading and validation logic to enhance clarity and maintainability.
  - Introduced utility functions for finding JSON crew files, improving code reuse across modules.
  - Removed deprecated benchmark functionality and associated tests to simplify the codebase.
  - Updated CLI commands to utilize the new JSON project structure, ensuring compatibility with recent changes.
  - Enhanced test coverage for JSON crew project features, ensuring robust validation and error handling.

* feat(cli): enhance activity log navigation and focus management
  - Added functionality to focus on the activity log when navigating through log entries.
  - Implemented refresh logic for the log panel to ensure updates are displayed correctly during navigation.
  - Improved keyboard navigation for log entries, allowing users to expand and scroll through logs seamlessly.
  - Added tests to verify the correct behavior of log navigation and focus management in the TUI.

* feat(cli): enhance JSON crew project interaction and input handling
  - Introduced a new function to enable prompt line editing for better user experience during input prompts.
  - Updated the JSON crew project wizards to show interpolation hints for dynamic values, improving user guidance.
  - Enhanced the handling of missing input placeholders by prompting users for required values during crew setup.
  - Refactored the crew run logic to ensure proper loading and preparation of JSON-defined crews, including runtime input management.
  - Added tests to verify the correct behavior of new input handling features and JSON crew project interactions.

* feat(cli): improve crew project input prompts and event handling
  - Enhanced the `_prompt_text` function to allow for configurable spacing before prompts, improving user experience during input collection.
  - Updated the wizards for agent and task creation to utilize the new prompt configuration, ensuring a more compact and streamlined interaction.
  - Introduced new plan step lifecycle events (`PlanStepStartedEvent`, `PlanStepCompletedEvent`) to better track the execution status of plan steps.
  - Refactored the step executor to emit these events during the execution of tasks, improving observability and debugging capabilities.
  - Added tests to verify the correct behavior of new prompt handling and event emissions during crew project execution.

* fix: refine json-first crew interactions

* fix: prioritize common json crew tools

* fix: make json crew more tools expandable

* fix: show json crew tools by category

* feat(memory): update default embedder to OpenAI text-embedding-3-large and enhance memory compatibility
  - Changed the default embedding model for Memory to OpenAI text-embedding-3-large, which uses 3072-dimensional vectors.
  - Added warnings regarding compatibility issues with existing local memory stores created with 1536-dimensional embeddings.
  - Updated documentation to reflect the new default embedder and its configuration options.
  - Enhanced the CLI and codebase to support the new embedding model across various components, ensuring a seamless transition for users.

* fix: address PR review feedback for JSON-first crews
  Review blockers:
  - Forward trained_agents_file to JSON crews: crewai run -f now exports CREWAI_TRAINED_AGENTS_FILE for the in-process JSON crew path
  - Wizard agent picker: Esc/cancel now reprompts instead of silently assigning the first agent
  - JSON tool resolution hard-fails: unknown tool names, missing custom tool files, and invalid custom tool modules raise JSONProjectError with actionable messages instead of warn-and-continue
  - Embedding dimension mismatch: LanceDB and Qdrant Edge storages raise EmbeddingDimensionMismatchError with reset/pin guidance instead of silently zero-filling vectors or returning empty search results
  - Custom tool code execution documented in loader docstring and the scaffolded project README
  CI fixes:
  - ruff format across lib/
  - All 133 PR-introduced mypy errors fixed (llm.py lazy-litellm and cli.py lazy command shims now use TYPE_CHECKING imports; textual is_mounted misuse fixed; pick_many overloads; misc annotations)
  Bot review comments:
  - Empty except blocks now have explanatory comments or debug logging
  - Removed unused _C_BG/_C_PANEL/_C_BORDER globals and redundant import re; tests use a single import style for create_json_crew
  Tests: trained-agents propagation, wizard cancel, tool resolution failures, and dimension mismatch guidance.
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: address second round of PR review comments
  Cursor Bugbot:
  - Wizard agent slugs: strip to [a-z0-9_] and fall back to agent_<n> so symbol-only roles can't produce an empty agents/.jsonc filename
  - Wizard task names: dedupe against prior task names and fall back to task_<n> for symbol-only descriptions
  CodeRabbit:
  - Agent.message(): import Task explicitly at runtime instead of relying on the namespace injection done by crewai/__init__
  - Async executor: move the native-tools-unsupported fallback from _ainvoke_loop_react (self-recursion) to _ainvoke_loop_native_tools, mirroring the sync implementation
  - StepExecutor downgrade: keep the in-step conversation and append the text-tooling instructions instead of rebuilding messages, so completed native tool calls are not re-executed
  - crewai-files: extension-based MIME lookup now runs before byte sniffing so csv/xml types are not degraded to text/plain
  - Memory storages: validate every record in a save() batch against a consistent embedding dimension (LanceDB previously checked only the first record); added mixed-batch tests
  - _print_post_tui_summary now typed against CrewRunApp
  - Docs: Azure OpenAI default embedder change called out in the memory migration warning and provider table
  Code quality bots:
  - Removed unused _C_YELLOW/_C_CYAN (crew_run_tui) and _GREEN (tui_picker)
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli): accordion tool picker in JSON crew wizard
  The flat tool list had grown to ~90 rows. The picker now shows:
  - Common tools always visible at the top
  - Every other category as a single expandable row with tool and selection counts (e.g. "Search & Research (27 tools, 2 selected)")
  - Expanding a category collapses the previously expanded one
  - Selections persist across expand/collapse via new preselected support in pick_many; cursor follows the toggled category row
  tui_picker gains preselected + initial_cursor options on pick_many, and Esc in multi-select now confirms the current selection instead of discarding it (required so collapsing can't silently drop choices).
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor(cli): remove --daemon flag from crewai run
  The flag only affected JSON crew projects — classic and flow projects ignored it entirely, which made the behavior inconsistent. Removed the option, the daemon code path (_run_json_crew_daemon), and its helper (_load_json_crew_with_inputs).
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: update run command tests after --daemon removal
  lib/crewai/tests/cli/test_run_crew.py still asserted the old run_crew(trained_agents_file=..., daemon=False) call signature.
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): exit codes, mid-run quit, async statuses, hyphen placeholders
  Addresses the latest Bugbot review round:
  - Failed JSON crew runs now exit non-zero (SystemExit(1)) so scripts and CI don't treat failures as success, mirroring the classic path
  - Quitting the TUI mid-run now ends the process (os._exit(130)); kickoff runs in a thread worker that cannot be force-cancelled, so letting the CLI return would leave LLM/tool work burning tokens in the background
  - Sidebar task statuses are now async-safe: completion/failure events resolve the task's own row via identity instead of assuming the most recently started task, and starting a task no longer blanket-marks earlier active rows as done
  - The runtime-input prompt regex now accepts hyphenated placeholder names ({my-topic}), matching kickoff's interpolation pattern
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: validation safety, custom tool sandboxing, TUI log integrity, memory error surfacing
  - Deploy validation no longer executes project code: validation mode checks tool declarations structurally (well-formed entries, custom tool file exists) without importing or instantiating anything. custom:<name> resolution only happens on the actual run path.
  - custom:<name> is constrained to [A-Za-z_][A-Za-z0-9_]* and the resolved path must stay inside the project's tools/ directory, so custom:../foo or absolute-path names cannot execute code outside it. Tool paths resolve relative to the crew project root, not cwd.
  - TUI task logs are built from per-task state captured at task start (idx, description, agent, start time); an out-of-order completion takes its output from the event and no longer steals or resets the current task's streamed steps/output.
  - EmbeddingDimensionMismatchError now inherits ValueError instead of RuntimeError so background saves surface it through MemorySaveFailedEvent instead of silently dropping the save; the shutdown catch in _background_encode_batch is narrowed to the "cannot schedule new futures" case.
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): declared project type wins over crew.json presence
  A flow project that also contains a crew.json(c) file now runs and validates as the flow it declares in pyproject.toml instead of being hijacked by the JSON crew path. Both crewai run (_has_json_crew) and deploy validation (_is_json_crew) check tool.crewai.type; a missing or unreadable pyproject still means a bare JSON crew project.
  Also documents why StepObservationFailedEvent intentionally marks the plan step "done": the event signals an observer failure, not a step failure, and the executor continues past it.
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): type the declared_type locals so mypy stays clean
  Comparing an Any-typed .get() chain returns Any, which tripped no-any-return on the previous commit.
  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
1356 lines
49 KiB
Python
"""Tests for async human feedback functionality.

This module tests the async/non-blocking human feedback flow, including:
- PendingFeedbackContext creation and serialization
- HumanFeedbackPending exception handling
- HumanFeedbackProvider protocol
- ConsoleProvider
- Flow.from_pending() and Flow.resume()
- SQLite persistence with pending feedback
"""

from __future__ import annotations

import json
import os
import tempfile
from datetime import datetime
from typing import Any
from unittest.mock import MagicMock, patch

import pytest
from pydantic import BaseModel

from crewai.flow import Flow, HumanFeedbackResult, start, listen, human_feedback
from crewai.flow.async_feedback import (
    ConsoleProvider,
    HumanFeedbackPending,
    HumanFeedbackProvider,
    PendingFeedbackContext,
)
from crewai.flow.persistence import SQLiteFlowPersistence


# PendingFeedbackContext Tests


class TestPendingFeedbackContext:
    """Tests for PendingFeedbackContext dataclass."""

    def test_create_basic_context(self) -> None:
        """Test creating a basic pending feedback context."""
        context = PendingFeedbackContext(
            flow_id="test-flow-123",
            flow_class="myapp.flows.ReviewFlow",
            method_name="review_content",
            method_output="Content to review",
            message="Please review this content:",
        )

        assert context.flow_id == "test-flow-123"
        assert context.flow_class == "myapp.flows.ReviewFlow"
        assert context.method_name == "review_content"
        assert context.method_output == "Content to review"
        assert context.message == "Please review this content:"
        assert context.emit is None
        assert context.default_outcome is None
        assert context.metadata == {}
        assert isinstance(context.requested_at, datetime)

    def test_create_context_with_emit(self) -> None:
        """Test creating context with routing outcomes."""
        context = PendingFeedbackContext(
            flow_id="test-flow-456",
            flow_class="myapp.flows.ApprovalFlow",
            method_name="submit_for_approval",
            method_output={"document": "content"},
            message="Approve or reject:",
            emit=["approved", "rejected", "needs_revision"],
            default_outcome="needs_revision",
            llm="gpt-4o-mini",
        )

        assert context.emit == ["approved", "rejected", "needs_revision"]
        assert context.default_outcome == "needs_revision"
        assert context.llm == "gpt-4o-mini"

    def test_to_dict_serialization(self) -> None:
        """Test serializing context to dictionary."""
        context = PendingFeedbackContext(
            flow_id="test-flow-789",
            flow_class="myapp.flows.TestFlow",
            method_name="test_method",
            method_output={"key": "value"},
            message="Test message",
            emit=["yes", "no"],
            metadata={"channel": "#reviews"},
        )

        result = context.to_dict()

        assert result["flow_id"] == "test-flow-789"
        assert result["flow_class"] == "myapp.flows.TestFlow"
        assert result["method_name"] == "test_method"
        assert result["method_output"] == {"key": "value"}
        assert result["message"] == "Test message"
        assert result["emit"] == ["yes", "no"]
        assert result["metadata"] == {"channel": "#reviews"}
        assert "requested_at" in result

    def test_from_dict_deserialization(self) -> None:
        """Test deserializing context from dictionary."""
        data = {
            "flow_id": "test-flow-abc",
            "flow_class": "myapp.flows.TestFlow",
            "method_name": "my_method",
            "method_output": "output value",
            "message": "Feedback message",
            "emit": ["option_a", "option_b"],
            "default_outcome": "option_a",
            "metadata": {"user_id": "123"},
            "llm": "gpt-4o-mini",
            "requested_at": "2024-01-15T10:30:00",
        }

        context = PendingFeedbackContext.from_dict(data)

        assert context.flow_id == "test-flow-abc"
        assert context.flow_class == "myapp.flows.TestFlow"
        assert context.method_name == "my_method"
        assert context.emit == ["option_a", "option_b"]
        assert context.default_outcome == "option_a"
        assert context.llm == "gpt-4o-mini"

    def test_roundtrip_serialization(self) -> None:
        """Test that to_dict/from_dict roundtrips correctly."""
        original = PendingFeedbackContext(
            flow_id="roundtrip-test",
            flow_class="test.TestFlow",
            method_name="test",
            method_output={"nested": {"data": [1, 2, 3]}},
            message="Test",
            emit=["a", "b"],
            metadata={"key": "value"},
        )

        serialized = original.to_dict()
        restored = PendingFeedbackContext.from_dict(serialized)

        assert restored.flow_id == original.flow_id
        assert restored.flow_class == original.flow_class
        assert restored.method_name == original.method_name
        assert restored.method_output == original.method_output
        assert restored.emit == original.emit
        assert restored.metadata == original.metadata


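# Illustrative sketch, not part of crewAI: the serialization tests above rely on
# a context object surviving a to_dict()/from_dict() round trip. The stand-in
# below shows one way such a round trip can work, assuming the timestamp is
# stored as an ISO 8601 string. `SketchContext`, its fields, and
# `sketch_json_roundtrip` are hypothetical names chosen for illustration;
# PendingFeedbackContext's actual implementation may differ.
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class SketchContext:
    flow_id: str
    method_output: Any
    metadata: dict[str, Any] = field(default_factory=dict)
    requested_at: datetime = field(default_factory=datetime.now)

    def to_dict(self) -> dict[str, Any]:
        # datetime is not JSON-serializable, so store it as an ISO 8601 string.
        return {
            "flow_id": self.flow_id,
            "method_output": self.method_output,
            "metadata": self.metadata,
            "requested_at": self.requested_at.isoformat(),
        }

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> "SketchContext":
        # Parse the ISO string back into a datetime; other fields pass through.
        return cls(
            flow_id=data["flow_id"],
            method_output=data["method_output"],
            metadata=data.get("metadata", {}),
            requested_at=datetime.fromisoformat(data["requested_at"]),
        )


def sketch_json_roundtrip(context: SketchContext) -> SketchContext:
    """Push a context through JSON text and rebuild it from the parsed dict."""
    return SketchContext.from_dict(json.loads(json.dumps(context.to_dict())))

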
# HumanFeedbackPending Exception Tests


class TestHumanFeedbackPending:
    """Tests for HumanFeedbackPending exception."""

    def test_basic_exception(self) -> None:
        """Test creating basic pending exception."""
        context = PendingFeedbackContext(
            flow_id="exc-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(context=context)

        assert exc.context == context
        assert exc.callback_info == {}
        assert "exc-test" in str(exc)
        assert "method" in str(exc)

    def test_exception_with_callback_info(self) -> None:
        """Test pending exception with callback information."""
        context = PendingFeedbackContext(
            flow_id="callback-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(
            context=context,
            callback_info={
                "webhook_url": "https://example.com/webhook",
                "slack_thread": "123456",
            },
        )

        assert exc.callback_info["webhook_url"] == "https://example.com/webhook"
        assert exc.callback_info["slack_thread"] == "123456"

    def test_exception_with_custom_message(self) -> None:
        """Test pending exception with custom message."""
        context = PendingFeedbackContext(
            flow_id="msg-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        exc = HumanFeedbackPending(
            context=context,
            message="Custom pending message",
        )

        assert str(exc) == "Custom pending message"

    def test_exception_is_catchable(self) -> None:
        """Test that exception can be caught and handled."""
        context = PendingFeedbackContext(
            flow_id="catch-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )

        with pytest.raises(HumanFeedbackPending) as exc_info:
            raise HumanFeedbackPending(context=context)

        assert exc_info.value.context.flow_id == "catch-test"


# HumanFeedbackProvider Protocol Tests


class TestHumanFeedbackProvider:
    """Tests for HumanFeedbackProvider protocol."""

    def test_protocol_compliance_sync_provider(self) -> None:
        """Test that sync provider complies with protocol."""

        class SyncProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                return "sync feedback"

        provider = SyncProvider()
        assert isinstance(provider, HumanFeedbackProvider)

    def test_protocol_compliance_async_provider(self) -> None:
        """Test that async provider complies with protocol."""

        class AsyncProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(context=context)

        provider = AsyncProvider()
        assert isinstance(provider, HumanFeedbackProvider)


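# Illustrative sketch, not part of crewAI: the two compliance tests above pass
# for classes that never inherit from the protocol. That is the behaviour of a
# typing.Protocol decorated with @runtime_checkable, where isinstance() only
# checks that the named methods exist and does not inspect signatures or
# behaviour. `SketchProvider` and the three classes below are hypothetical
# stand-ins; it is an assumption that HumanFeedbackProvider is declared this way.
from typing import Protocol, runtime_checkable


@runtime_checkable
class SketchProvider(Protocol):
    def request_feedback(self, context: object, flow: object) -> str: ...


class SketchBlockingProvider:
    def request_feedback(self, context: object, flow: object) -> str:
        # Returns feedback immediately, like the sync provider above.
        return "feedback"


class SketchDeferredProvider:
    def request_feedback(self, context: object, flow: object) -> str:
        # Never returns; RuntimeError stands in for HumanFeedbackPending here.
        raise RuntimeError("feedback pending")


class SketchUnrelated:
    def ask(self) -> str:
        # Lacks request_feedback, so it does not satisfy the protocol.
        return "feedback"


def satisfies_sketch_provider(candidate: object) -> bool:
    """Structural check: True when the object exposes request_feedback."""
    return isinstance(candidate, SketchProvider)

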
# ConsoleProvider Tests


class TestConsoleProvider:
    """Tests for ConsoleProvider."""

    def test_provider_initialization(self) -> None:
        """Test console provider initialization."""
        provider = ConsoleProvider()
        assert provider.verbose is True

        quiet_provider = ConsoleProvider(verbose=False)
        assert quiet_provider.verbose is False


# SQLite Persistence Tests for Async Feedback


class TestSQLitePendingFeedback:
    """Tests for SQLite persistence with pending feedback."""

    def test_save_and_load_pending_feedback(self) -> None:
        """Test saving and loading pending feedback context."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            context = PendingFeedbackContext(
                flow_id="persist-test-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output={"data": "test"},
                message="Review this:",
                emit=["approved", "rejected"],
                llm="gpt-4o-mini",
            )

            state_data = {"counter": 10, "items": ["a", "b"]}

            persistence.save_pending_feedback(
                flow_uuid="persist-test-123",
                context=context,
                state_data=state_data,
            )

            result = persistence.load_pending_feedback("persist-test-123")

            assert result is not None
            loaded_state, loaded_context = result
            assert loaded_state["counter"] == 10
            assert loaded_state["items"] == ["a", "b"]
            assert loaded_context.flow_id == "persist-test-123"
            assert loaded_context.emit == ["approved", "rejected"]

    def test_load_nonexistent_pending_feedback(self) -> None:
        """Test loading pending feedback that doesn't exist."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            result = persistence.load_pending_feedback("nonexistent-id")
            assert result is None

    def test_clear_pending_feedback(self) -> None:
        """Test clearing pending feedback after resume."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            context = PendingFeedbackContext(
                flow_id="clear-test",
                flow_class="test.Flow",
                method_name="method",
                method_output="output",
                message="message",
            )

            persistence.save_pending_feedback(
                flow_uuid="clear-test",
                context=context,
                state_data={"key": "value"},
            )

            assert persistence.load_pending_feedback("clear-test") is not None

            persistence.clear_pending_feedback("clear-test")

            assert persistence.load_pending_feedback("clear-test") is None

    def test_replace_existing_pending_feedback(self) -> None:
        """Test that saving pending feedback replaces existing entry."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            flow_id = "replace-test"

            context1 = PendingFeedbackContext(
                flow_id=flow_id,
                flow_class="test.Flow",
                method_name="method1",
                method_output="output1",
                message="message1",
            )
            persistence.save_pending_feedback(
                flow_uuid=flow_id,
                context=context1,
                state_data={"version": 1},
            )

            context2 = PendingFeedbackContext(
                flow_id=flow_id,
                flow_class="test.Flow",
                method_name="method2",
                method_output="output2",
                message="message2",
            )
            persistence.save_pending_feedback(
                flow_uuid=flow_id,
                context=context2,
                state_data={"version": 2},
            )

            result = persistence.load_pending_feedback(flow_id)
            assert result is not None
            state, context = result
            assert state["version"] == 2
            assert context.method_name == "method2"


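# Illustrative sketch, not part of crewAI: the persistence tests above expect a
# second save under the same flow id to replace the first, and a load of an
# unknown id to return None. One way to get those semantics from SQLite is a
# primary key on the flow id combined with INSERT OR REPLACE. The table name
# `sketch_pending`, its columns, and the helper functions are hypothetical;
# SQLiteFlowPersistence's real schema is not shown in this file.
import json
import sqlite3
from typing import Any, Optional


def sketch_open() -> sqlite3.Connection:
    """Open an in-memory database holding one row per pending flow."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sketch_pending ("
        "flow_uuid TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def sketch_save(conn: sqlite3.Connection, flow_uuid: str, payload: dict[str, Any]) -> None:
    # The primary key makes a second save for the same id overwrite the first.
    conn.execute(
        "INSERT OR REPLACE INTO sketch_pending (flow_uuid, payload) VALUES (?, ?)",
        (flow_uuid, json.dumps(payload)),
    )
    conn.commit()


def sketch_load(conn: sqlite3.Connection, flow_uuid: str) -> Optional[dict[str, Any]]:
    # Return None for an unknown id instead of raising.
    row = conn.execute(
        "SELECT payload FROM sketch_pending WHERE flow_uuid = ?", (flow_uuid,)
    ).fetchone()
    return json.loads(row[0]) if row else None

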
# Custom Async Provider Tests


class TestCustomAsyncProvider:
    """Tests for custom async providers."""

    def test_provider_raises_pending_exception(self) -> None:
        """Test that async provider raises HumanFeedbackPending."""

        class WebhookProvider:
            def __init__(self, webhook_url: str):
                self.webhook_url = webhook_url

            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(
                    context=context,
                    callback_info={"url": f"{self.webhook_url}/{context.flow_id}"},
                )

        provider = WebhookProvider("https://example.com/api")
        context = PendingFeedbackContext(
            flow_id="webhook-test",
            flow_class="test.Flow",
            method_name="method",
            method_output="output",
            message="message",
        )
        mock_flow = MagicMock()

        with pytest.raises(HumanFeedbackPending) as exc_info:
            provider.request_feedback(context, mock_flow)

        assert exc_info.value.callback_info["url"] == (
            "https://example.com/api/webhook-test"
        )


# Flow.from_pending and resume Tests


class TestFlowResumeWithFeedback:
    """Tests for Flow.from_pending and resume."""

    def test_from_pending_uses_default_persistence(self) -> None:
        """Test that from_pending uses SQLiteFlowPersistence by default."""

        class TestFlow(Flow):
            @start()
            def begin(self):
                return "started"

        with pytest.raises(ValueError, match="No pending feedback found"):
            TestFlow.from_pending("nonexistent-id")

    def test_from_pending_raises_for_missing_flow(self) -> None:
        """Test that from_pending raises error for nonexistent flow."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                def begin(self):
                    return "started"

            with pytest.raises(ValueError, match="No pending feedback found"):
                TestFlow.from_pending("nonexistent-id", persistence)

    def test_from_pending_restores_state(self) -> None:
        """Test that from_pending correctly restores flow state."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestState(BaseModel):
                id: str = "test-restore-123"
                counter: int = 0

            class TestFlow(Flow[TestState]):
                @start()
                def begin(self):
                    return "started"

            # Manually save pending feedback
            context = PendingFeedbackContext(
                flow_id="test-restore-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output="content",
                message="Review:",
            )
            persistence.save_pending_feedback(
                flow_uuid="test-restore-123",
                context=context,
                state_data={"id": "test-restore-123", "counter": 42},
            )

            flow = TestFlow.from_pending("test-restore-123", persistence)

            assert flow._pending_feedback_context is not None
            assert flow._pending_feedback_context.flow_id == "test-restore-123"
            assert flow._is_execution_resuming is True
            assert flow.state.counter == 42

    def test_resume_without_pending_raises_error(self) -> None:
        """Test that resume raises error without pending context."""

        class TestFlow(Flow):
            @start()
            def begin(self):
                return "started"

        flow = TestFlow()

        with pytest.raises(ValueError, match="No pending feedback context"):
            flow.resume("some feedback")

def test_resume_from_async_context_raises_error(self) -> None:
|
|
"""Test that resume() raises RuntimeError when called from async context."""
|
|
import asyncio
|
|
|
|
class TestFlow(Flow):
|
|
@start()
|
|
def begin(self):
|
|
return "started"
|
|
|
|
async def call_resume_from_async():
|
|
with tempfile.TemporaryDirectory() as tmpdir:
|
|
db_path = os.path.join(tmpdir, "test.db")
|
|
persistence = SQLiteFlowPersistence(db_path)
|
|
|
|
context = PendingFeedbackContext(
|
|
flow_id="async-context-test",
|
|
flow_class="TestFlow",
|
|
method_name="begin",
|
|
method_output="output",
|
|
message="Review:",
|
|
)
|
|
persistence.save_pending_feedback(
|
|
flow_uuid="async-context-test",
|
|
context=context,
|
|
state_data={"id": "async-context-test"},
|
|
)
|
|
|
|
flow = TestFlow.from_pending("async-context-test", persistence)
|
|
|
|
# This should raise RuntimeError because we're in an async context
|
|
with pytest.raises(RuntimeError, match="cannot be called from within an async context"):
|
|
flow.resume("feedback")
|
|
|
|
asyncio.run(call_resume_from_async())
|
|
|
|
@pytest.mark.asyncio
|
|
async def test_resume_async_direct(self) -> None:
|
|
"""Test resume_async() can be called directly in async context."""
|
|
with tempfile.TemporaryDirectory() as tmpdir:
|
|
db_path = os.path.join(tmpdir, "test.db")
|
|
persistence = SQLiteFlowPersistence(db_path)
|
|
|
|
class TestFlow(Flow):
|
|
@start()
|
|
@human_feedback(message="Review:")
|
|
def generate(self):
|
|
return "content"
|
|
|
|
@listen(generate)
|
|
def process(self, result):
|
|
return f"processed: {result.feedback}"
|
|
|
|
context = PendingFeedbackContext(
|
|
flow_id="async-direct-test",
|
|
flow_class="TestFlow",
|
|
method_name="generate",
|
|
method_output="content",
|
|
message="Review:",
|
|
)
|
|
persistence.save_pending_feedback(
|
|
flow_uuid="async-direct-test",
|
|
context=context,
|
|
state_data={"id": "async-direct-test"},
|
|
)
|
|
|
|
flow = TestFlow.from_pending("async-direct-test", persistence)
|
|
|
|
with patch("crewai.flow.runtime.crewai_event_bus.emit"):
|
|
result = await flow.resume_async("async feedback")
|
|
|
|
assert flow.last_human_feedback is not None
|
|
assert flow.last_human_feedback.feedback == "async feedback"
|
|
|
|
    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_resume_basic(self, mock_emit: MagicMock) -> None:
        """Test basic resume functionality."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review this:")
                def generate(self):
                    return "generated content"

                @listen(generate)
                def process(self, feedback_result):
                    return f"Processed: {feedback_result.feedback}"

            # Manually save pending feedback (simulating async pause)
            context = PendingFeedbackContext(
                flow_id="resume-test-123",
                flow_class="test.TestFlow",
                method_name="generate",
                method_output="generated content",
                message="Review this:",
            )
            persistence.save_pending_feedback(
                flow_uuid="resume-test-123",
                context=context,
                state_data={"id": "resume-test-123"},
            )

            flow = TestFlow.from_pending("resume-test-123", persistence)
            result = flow.resume("looks good!")

            assert flow.last_human_feedback is not None
            assert flow.last_human_feedback.feedback == "looks good!"
            assert flow.last_human_feedback.output == "generated content"

            assert persistence.load_pending_feedback("resume-test-123") is None

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_terminal_resume_without_emit_returns_feedback_result(
        self, mock_emit: MagicMock
    ) -> None:
        """Terminal resumed non-emit methods return the full feedback result."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review this:", metadata={"stage": "draft"})
                def generate(self):
                    return {"content": "generated content"}

            context = PendingFeedbackContext(
                flow_id="terminal-non-emit-test-123",
                flow_class="test.TestFlow",
                method_name="generate",
                method_output={"content": "generated content"},
                message="Review this:",
                metadata={"stage": "draft"},
            )
            persistence.save_pending_feedback(
                flow_uuid="terminal-non-emit-test-123",
                context=context,
                state_data={"id": "terminal-non-emit-test-123"},
            )

            flow = TestFlow.from_pending("terminal-non-emit-test-123", persistence)
            result = flow.resume("looks good!")

            assert isinstance(result, HumanFeedbackResult)
            assert result.output == {"content": "generated content"}
            assert result.feedback == "looks good!"
            assert result.outcome is None
            assert result.metadata == {"stage": "draft"}
            assert flow.method_outputs == [result]

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_resume_routing(self, mock_emit: MagicMock) -> None:
        """Test resume with routing."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                result_path: str = ""

                @start()
                @human_feedback(
                    message="Approve?",
                    emit=["approved", "rejected"],
                    llm="gpt-4o-mini",
                )
                def review(self):
                    return "content"

                @listen("approved")
                def handle_approved(self):
                    self.result_path = "approved"
                    return "Approved!"

                @listen("rejected")
                def handle_rejected(self):
                    self.result_path = "rejected"
                    return "Rejected!"

            context = PendingFeedbackContext(
                flow_id="route-test-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output="content",
                message="Approve?",
                emit=["approved", "rejected"],
                llm="gpt-4o-mini",
            )
            persistence.save_pending_feedback(
                flow_uuid="route-test-123",
                context=context,
                state_data={"id": "route-test-123"},
            )

            flow = TestFlow.from_pending("route-test-123", persistence)

            with patch.object(flow, "_collapse_to_outcome", return_value="approved"):
                result = flow.resume("yes, this looks great")

            assert flow.last_human_feedback.outcome == "approved"
            assert flow.result_path == "approved"

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_terminal_resume_with_emit_returns_method_output(
        self, mock_emit: MagicMock
    ) -> None:
        """Terminal resumed emit methods return the original method output."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)
            method_output = {"content": "original content", "status": "ready"}

            class TestFlow(Flow):
                @start()
                @human_feedback(
                    message="Approve?",
                    emit=["approved", "rejected"],
                    llm="gpt-4o-mini",
                )
                def review(self):
                    return method_output

            context = PendingFeedbackContext(
                flow_id="terminal-route-test-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output=method_output,
                message="Approve?",
                emit=["approved", "rejected"],
                llm="gpt-4o-mini",
            )
            persistence.save_pending_feedback(
                flow_uuid="terminal-route-test-123",
                context=context,
                state_data={"id": "terminal-route-test-123"},
            )

            flow = TestFlow.from_pending("terminal-route-test-123", persistence)

            with patch.object(flow, "_collapse_to_outcome", return_value="approved"):
                result = flow.resume("yes, this looks great")

            assert result == method_output
            assert flow.method_outputs == [method_output]
            assert flow.last_human_feedback.outcome == "approved"

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_resume_records_method_output_before_downstream_listeners(
        self, mock_emit: MagicMock
    ) -> None:
        """Downstream listeners can read outputs from the resumed method."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(message="Review:")
                def review(self):
                    return "generated content"

                @listen(review)
                def downstream(self, result):
                    self.state["seen_outputs"] = self.method_outputs
                    return f"downstream:{result.output}"

            context = PendingFeedbackContext(
                flow_id="listener-output-test-123",
                flow_class="test.TestFlow",
                method_name="review",
                method_output="generated content",
                message="Review:",
            )
            persistence.save_pending_feedback(
                flow_uuid="listener-output-test-123",
                context=context,
                state_data={"id": "listener-output-test-123"},
            )

            flow = TestFlow.from_pending("listener-output-test-123", persistence)
            result = flow.resume("looks good")

            assert result == "downstream:generated content"
            assert len(flow.state["seen_outputs"]) == 1
            seen_output = flow.state["seen_outputs"][0]
            assert isinstance(seen_output, HumanFeedbackResult)
            assert seen_output.output == "generated content"
            assert seen_output.feedback == "looks good"


# Integration Tests with @human_feedback decorator


class TestAsyncHumanFeedbackIntegration:
    """Integration tests for async human feedback with decorator."""

    def test_decorator_with_provider_parameter(self) -> None:
        """Test that decorator accepts provider parameter."""

        class MockProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(context=context)

        class TestFlow(Flow):
            @start()
            @human_feedback(
                message="Review:",
                provider=MockProvider(),
            )
            def review(self):
                return "content"

        flow = TestFlow()
        method = getattr(flow, "review")
        assert hasattr(method, "__human_feedback_config__")
        assert method.__human_feedback_config__.provider is not None

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_async_provider_pauses_flow(self, mock_emit: MagicMock) -> None:
        """Test that async provider pauses flow execution."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class PausingProvider:
                def __init__(self, persistence: SQLiteFlowPersistence):
                    self.persistence = persistence

                def request_feedback(
                    self, context: PendingFeedbackContext, flow: Flow
                ) -> str:
                    self.persistence.save_pending_feedback(
                        flow_uuid=context.flow_id,
                        context=context,
                        state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
                    )
                    raise HumanFeedbackPending(
                        context=context,
                        callback_info={"saved": True},
                    )

            class TestFlow(Flow):
                @start()
                @human_feedback(
                    message="Review:",
                    provider=PausingProvider(persistence),
                )
                def generate(self):
                    return "generated content"

            flow = TestFlow(persistence=persistence)

            # kickoff now returns HumanFeedbackPending instead of raising it
            result = flow.kickoff()

            assert isinstance(result, HumanFeedbackPending)
            assert result.callback_info["saved"] is True

            flow_id = result.context.flow_id

            persisted = persistence.load_pending_feedback(flow_id)
            assert persisted is not None

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_full_async_flow_cycle(self, mock_emit: MagicMock) -> None:
        """Test complete async flow: start -> pause -> resume."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            flow_id_holder: list[str] = []

            class SaveAndPauseProvider:
                def __init__(self, persistence: SQLiteFlowPersistence):
                    self.persistence = persistence

                def request_feedback(
                    self, context: PendingFeedbackContext, flow: Flow
                ) -> str:
                    flow_id_holder.append(context.flow_id)
                    self.persistence.save_pending_feedback(
                        flow_uuid=context.flow_id,
                        context=context,
                        state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
                    )
                    raise HumanFeedbackPending(context=context)

            class ReviewFlow(Flow):
                processed_feedback: str = ""

                @start()
                @human_feedback(
                    message="Review this content:",
                    provider=SaveAndPauseProvider(persistence),
                )
                def generate(self):
                    return "AI generated content"

                @listen(generate)
                def process(self, feedback_result):
                    self.processed_feedback = feedback_result.feedback
                    return f"Final: {feedback_result.feedback}"

            flow1 = ReviewFlow(persistence=persistence)
            result = flow1.kickoff()

            # kickoff now returns HumanFeedbackPending instead of raising it
            assert isinstance(result, HumanFeedbackPending)
            assert len(flow_id_holder) == 1
            paused_flow_id = flow_id_holder[0]

            flow2 = ReviewFlow.from_pending(paused_flow_id, persistence)
            result = flow2.resume("This is my feedback")

            assert flow2.last_human_feedback.feedback == "This is my feedback"
            assert flow2.processed_feedback == "This is my feedback"


# Edge Case Tests


class TestAutoPersistence:
    """Tests for automatic persistence when no persistence is provided."""

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_auto_persistence_when_none_provided(self, mock_emit: MagicMock) -> None:
        """Test that persistence is auto-created when HumanFeedbackPending is raised."""

        class PausingProvider:
            def request_feedback(
                self, context: PendingFeedbackContext, flow: Flow
            ) -> str:
                raise HumanFeedbackPending(
                    context=context,
                    callback_info={"paused": True},
                )

        class TestFlow(Flow):
            @start()
            @human_feedback(
                message="Review:",
                provider=PausingProvider(),
            )
            def generate(self):
                return "content"

        flow = TestFlow()
        assert flow.persistence is None

        # kickoff should auto-create persistence when HumanFeedbackPending is raised
        result = flow.kickoff()

        assert isinstance(result, HumanFeedbackPending)

        # Persistence should have been auto-created
        assert flow.persistence is not None

        flow_id = result.context.flow_id
        loaded = flow.persistence.load_pending_feedback(flow_id)
        assert loaded is not None


class TestCollapseToOutcomeJsonParsing:
    """Tests for _collapse_to_outcome JSON parsing edge cases."""

    def test_json_string_response_is_parsed(self) -> None:
        """Test that JSON string response from LLM is correctly parsed."""
        flow = Flow()

        with patch("crewai.llm.LLM") as MockLLM:
            mock_llm = MagicMock()
            # Simulate LLM returning JSON string (the bug we fixed)
            mock_llm.call.return_value = '{"outcome": "approved"}'
            MockLLM.return_value = mock_llm

            result = flow._collapse_to_outcome(
                feedback="I approve this",
                outcomes=["approved", "rejected"],
                llm="gpt-4o-mini",
            )

            assert result == "approved"

    def test_plain_string_response_is_matched(self) -> None:
        """Test that plain string response is correctly matched."""
        flow = Flow()

        with patch("crewai.llm.LLM") as MockLLM:
            mock_llm = MagicMock()
            # Simulate LLM returning plain outcome string
            mock_llm.call.return_value = "rejected"
            MockLLM.return_value = mock_llm

            result = flow._collapse_to_outcome(
                feedback="This is not good",
                outcomes=["approved", "rejected"],
                llm="gpt-4o-mini",
            )

            assert result == "rejected"

    def test_invalid_json_falls_back_to_matching(self) -> None:
        """Test that invalid JSON falls back to string matching."""
        flow = Flow()

        with patch("crewai.llm.LLM") as MockLLM:
            mock_llm = MagicMock()
            mock_llm.call.return_value = "{invalid json but says approved"
            MockLLM.return_value = mock_llm

            result = flow._collapse_to_outcome(
                feedback="looks good",
                outcomes=["approved", "rejected"],
                llm="gpt-4o-mini",
            )

            assert result == "approved"

    def test_llm_exception_falls_back_to_simple_prompting(self) -> None:
        """Test that LLM exception triggers fallback to simple prompting."""
        flow = Flow()

        with patch("crewai.llm.LLM") as MockLLM:
            mock_llm = MagicMock()
            # First call raises, second call succeeds (fallback)
            mock_llm.call.side_effect = [
                Exception("Structured output failed"),
                "approved",
            ]
            MockLLM.return_value = mock_llm

            result = flow._collapse_to_outcome(
                feedback="I approve",
                outcomes=["approved", "rejected"],
                llm="gpt-4o-mini",
            )

            assert result == "approved"
            # Verify it was called twice (initial + fallback)
            assert mock_llm.call.call_count == 2


class TestLLMObjectPreservedInContext:
    """Tests that BaseLLM objects have their model string preserved in PendingFeedbackContext."""

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_basellm_object_model_string_survives_roundtrip(self, mock_emit: MagicMock) -> None:
        """Test that when llm is a BaseLLM object, its model string is stored in context
        so that outcome collapsing works after async pause/resume.

        This is the exact bug: locally the sync path keeps the LLM object in memory,
        but in production the async path serializes the context and the LLM object was
        discarded (stored as None), causing resume to skip classification and always
        fall back to emit[0].
        """
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            from crewai.llm import LLM
            mock_llm_obj = LLM(
                model="llama3",
                provider="ollama",
                base_url="http://localhost:11434",
            )

            class PausingProvider:
                def __init__(self, persistence: SQLiteFlowPersistence):
                    self.persistence = persistence
                    self.captured_context: PendingFeedbackContext | None = None

                def request_feedback(
                    self, context: PendingFeedbackContext, flow: Flow
                ) -> str:
                    self.captured_context = context
                    self.persistence.save_pending_feedback(
                        flow_uuid=context.flow_id,
                        context=context,
                        state_data=flow.state if isinstance(flow.state, dict) else flow.state.model_dump(),
                    )
                    raise HumanFeedbackPending(context=context)

            provider = PausingProvider(persistence)

            class TestFlow(Flow):
                result_path: str = ""

                @start()
                @human_feedback(
                    message="Approve?",
                    emit=["needs_changes", "approved"],
                    llm=mock_llm_obj,
                    default_outcome="approved",
                    provider=provider,
                )
                def review(self):
                    return "content for review"

                @listen("approved")
                def handle_approved(self):
                    self.result_path = "approved"
                    return "Approved!"

                @listen("needs_changes")
                def handle_changes(self):
                    self.result_path = "needs_changes"
                    return "Changes needed"

            flow1 = TestFlow(persistence=persistence)
            result = flow1.kickoff()
            assert isinstance(result, HumanFeedbackPending)

            assert provider.captured_context is not None
            assert isinstance(provider.captured_context.llm, dict)
            assert provider.captured_context.llm["model"] == "ollama/llama3"

            flow_id = result.context.flow_id
            loaded = persistence.load_pending_feedback(flow_id)
            assert loaded is not None
            _, loaded_context = loaded
            assert isinstance(loaded_context.llm, dict)
            assert loaded_context.llm["model"] == "ollama/llama3"

            flow2 = TestFlow.from_pending(flow_id, persistence)
            assert flow2._pending_feedback_context is not None
            assert isinstance(flow2._pending_feedback_context.llm, dict)
            assert flow2._pending_feedback_context.llm["model"] == "ollama/llama3"

            with patch.object(flow2, "_collapse_to_outcome", return_value="approved") as mock_collapse:
                flow2.resume("this looks good, proceed!")

            # The key assertion: _collapse_to_outcome was called (not skipped due to llm=None)
            mock_collapse.assert_called_once()
            call_kwargs = mock_collapse.call_args
            assert call_kwargs.kwargs["feedback"] == "this looks good, proceed!"
            assert call_kwargs.kwargs["outcomes"] == ["needs_changes", "approved"]
            # LLM should be a live object (from _human_feedback_llm) or reconstructed, not None
            assert call_kwargs.kwargs["llm"] is not None
            assert getattr(call_kwargs.kwargs["llm"], "model", None) == "llama3"
            assert flow2.last_human_feedback.outcome == "approved"
            assert flow2.result_path == "approved"

    def test_string_llm_still_works(self) -> None:
        """Test that passing llm as a string still works correctly."""
        context = PendingFeedbackContext(
            flow_id="str-llm-test",
            flow_class="test.Flow",
            method_name="review",
            method_output="output",
            message="Review:",
            emit=["approved", "rejected"],
            llm="gpt-4o-mini",
        )

        serialized = context.to_dict()
        restored = PendingFeedbackContext.from_dict(serialized)
        assert restored.llm == "gpt-4o-mini"

    def test_none_llm_when_no_model_attr(self) -> None:
        """Test that llm is None when object has no model attribute."""
        from crewai.flow.human_feedback import _serialize_llm_for_context

        mock_obj = MagicMock(spec=[])
        assert _serialize_llm_for_context(mock_obj) is None

    def test_provider_prefix_added_to_bare_model(self) -> None:
        """Test that provider prefix is added when model has no slash."""
        from crewai.flow.human_feedback import _serialize_llm_for_context
        from crewai.llm import LLM

        llm = LLM(
            model="llama3",
            provider="ollama",
            base_url="http://localhost:11434",
        )
        result = _serialize_llm_for_context(llm)
        assert isinstance(result, dict)
        assert result["model"] == "ollama/llama3"

    def test_provider_prefix_not_doubled_when_already_present(self) -> None:
        """Test that provider prefix is not added when model already has a slash."""
        from crewai.flow.human_feedback import _serialize_llm_for_context
        from crewai.llm import LLM

        llm = LLM(model="ollama/llama3", base_url="http://localhost:11434")
        result = _serialize_llm_for_context(llm)
        assert isinstance(result, dict)
        assert result["model"] == "ollama/llama3"

    def test_no_provider_attr_falls_back_to_bare_model(self) -> None:
        """Test that objects without to_config_dict fall back to model string."""
        from crewai.flow.human_feedback import _serialize_llm_for_context

        mock_obj = MagicMock(spec=[])
        mock_obj.model = "gpt-4o-mini"
        assert _serialize_llm_for_context(mock_obj) == "gpt-4o-mini"


class TestAsyncHumanFeedbackEdgeCases:
    """Edge case tests for async human feedback."""

    def test_pending_context_with_complex_output(self) -> None:
        """Test context with complex nested output."""
        complex_output = {
            "items": [{"id": 1, "name": "Item 1"}, {"id": 2, "name": "Item 2"}],
            "metadata": {"total": 2, "page": 1},
            "nested": {"deep": {"value": "test"}},
        }

        context = PendingFeedbackContext(
            flow_id="complex-test",
            flow_class="test.Flow",
            method_name="method",
            method_output=complex_output,
            message="Review:",
        )

        # Serialize and deserialize
        serialized = context.to_dict()
        json_str = json.dumps(serialized)
        restored = PendingFeedbackContext.from_dict(json.loads(json_str))

        assert restored.method_output == complex_output

    def test_empty_feedback_uses_default_outcome(self) -> None:
        """Test that empty feedback uses default outcome during resume."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                def generate(self):
                    return "content"

            context = PendingFeedbackContext(
                flow_id="default-test",
                flow_class="test.Flow",
                method_name="generate",
                method_output="content",
                message="Review:",
                emit=["approved", "rejected"],
                default_outcome="approved",
                llm="gpt-4o-mini",
            )
            persistence.save_pending_feedback(
                flow_uuid="default-test",
                context=context,
                state_data={"id": "default-test"},
            )

            flow = TestFlow.from_pending("default-test", persistence)

            with patch("crewai.flow.runtime.crewai_event_bus.emit"):
                result = flow.resume("")

            assert flow.last_human_feedback.outcome == "approved"

    def test_resume_without_feedback_uses_default(self) -> None:
        """Test that resume() can be called without feedback argument."""
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                def step(self):
                    return "output"

            context = PendingFeedbackContext(
                flow_id="no-feedback-test",
                flow_class="TestFlow",
                method_name="step",
                method_output="test output",
                message="Review:",
                emit=["approved", "rejected"],
                default_outcome="approved",
                llm="gpt-4o-mini",
            )
            persistence.save_pending_feedback(
                flow_uuid="no-feedback-test",
                context=context,
                state_data={"id": "no-feedback-test"},
            )

            flow = TestFlow.from_pending("no-feedback-test", persistence)

            with patch("crewai.flow.runtime.crewai_event_bus.emit"):
                result = flow.resume()

            assert flow.last_human_feedback.outcome == "approved"
            assert flow.last_human_feedback.feedback == ""


class TestResumeLLMFromSerializedContext:
    """Resume rebuilds the collapse LLM from the serialized context alone."""

    @patch("crewai.flow.runtime.crewai_event_bus.emit")
    def test_resume_builds_llm_from_serialized_context(
        self, mock_emit: MagicMock
    ) -> None:
        with tempfile.TemporaryDirectory() as tmpdir:
            db_path = os.path.join(tmpdir, "test_flows.db")
            persistence = SQLiteFlowPersistence(db_path)

            class TestFlow(Flow):
                @start()
                @human_feedback(
                    message="Approve?",
                    emit=["approved", "rejected"],
                    llm="gpt-4o-mini",
                )
                def review(self):
                    return "content"

            context = PendingFeedbackContext(
                flow_id="fallback-test",
                flow_class="TestFlow",
                method_name="review",
                method_output="content",
                message="Approve?",
                emit=["approved", "rejected"],
                llm="gpt-4o-mini",
            )
            persistence.save_pending_feedback(
                flow_uuid="fallback-test",
                context=context,
                state_data={"id": "fallback-test"},
            )

            flow = TestFlow.from_pending("fallback-test", persistence)

            captured_llm = []

            def capture_llm(feedback, outcomes, llm):
                captured_llm.append(llm)
                return "approved"

            with patch.object(flow, "_collapse_to_outcome", side_effect=capture_llm):
                flow.resume("looks good!")

            assert len(captured_llm) == 1
            from crewai.llms.base_llm import BaseLLM as BaseLLMClass
            assert isinstance(captured_llm[0], BaseLLMClass)
            assert captured_llm[0].model == "gpt-4o-mini"