Files
crewAI/lib/crewai/tests/agents/test_lite_agent.py
João Moura bb477f8a91
Some checks failed
CodeQL Advanced / Analyze (actions) (push) Has been cancelled
CodeQL Advanced / Analyze (python) (push) Has been cancelled
Check Documentation Broken Links / Check broken links (push) Has been cancelled
Vulnerability Scan / pip-audit (push) Has been cancelled
Nightly Canary Release / Check for new commits (push) Has been cancelled
Nightly Canary Release / Build nightly packages (push) Has been cancelled
Nightly Canary Release / Publish nightly to PyPI (push) Has been cancelled
Mark stale issues and pull requests / stale (push) Has been cancelled
JSON first crews (#6131)
* feat(cli): introduce JSON crew project support and TUI enhancements

- Added support for creating and running JSON-defined crew projects, allowing users to scaffold projects with a new `create_json_crew.py` file.
- Implemented a full-screen Textual TUI for crew execution in `crew_run_tui.py`, enhancing user interaction with a two-column layout.
- Updated `run_crew.py` to prioritize JSON crew projects and added daemon mode for running without TUI.
- Introduced interactive pickers in `tui_picker.py` for improved CLI prompts.
- Enhanced validation for JSON crew files in `validate.py` to ensure proper structure and agent definitions.
- Updated `.gitignore` to exclude demo and crewai directories.

* feat: update LLM model references to gpt-5.4-mini

- Changed default LLM model from gpt-4o-mini to gpt-5.4-mini across various files, including CLI options, JSON crew configurations, and agent definitions.
- Enhanced benchmark and human feedback functionalities to utilize the new model.
- Improved user interface elements in the TUI for better interaction and feedback during execution.
- Added support for new skills directory in JSON crew project creation.

* feat(benchmark): add crew-level benchmarking functionality

- Introduced a new `benchmark` command in the CLI for crew-level benchmarking, allowing users to specify agents, models, and timeout settings.
- Implemented `CrewBenchmarkCase` to handle crew-level benchmark cases with inputs and criteria.
- Enhanced the benchmark runner to support progress tracking and detailed reporting of results for multiple models.
- Added tests for loading crew benchmark cases and validating their structure.
- Updated existing benchmark functions to accommodate the new crew-level execution model.

* feat(cli): enhance JSON crew project functionality and TUI improvements

- Added optional agent-level guardrails and advanced options in JSON crew configurations to improve output validation and flexibility.
- Updated the TUI to better handle plan step statuses, including visual indicators for task completion and failure.
- Introduced methods for parsing and managing step observation events, ensuring accurate updates to task statuses during execution.
- Enhanced validation for JSON crew projects, ensuring proper structure and error handling for agent and task definitions.
- Added comprehensive tests for new features and validation logic, ensuring robustness in JSON crew project handling.

* refactor(cli): streamline JSON crew project handling and improve validation

- Refactored JSON crew project loading and validation logic to enhance clarity and maintainability.
- Introduced utility functions for finding JSON crew files, improving code reuse across modules.
- Removed deprecated benchmark functionality and associated tests to simplify the codebase.
- Updated CLI commands to utilize the new JSON project structure, ensuring compatibility with recent changes.
- Enhanced test coverage for JSON crew project features, ensuring robust validation and error handling.

* feat(cli): enhance activity log navigation and focus management

- Added functionality to focus on the activity log when navigating through log entries.
- Implemented refresh logic for the log panel to ensure updates are displayed correctly during navigation.
- Improved keyboard navigation for log entries, allowing users to expand and scroll through logs seamlessly.
- Added tests to verify the correct behavior of log navigation and focus management in the TUI.

* feat(cli): enhance JSON crew project interaction and input handling

- Introduced a new function to enable prompt line editing for better user experience during input prompts.
- Updated the JSON crew project wizards to show interpolation hints for dynamic values, improving user guidance.
- Enhanced the handling of missing input placeholders by prompting users for required values during crew setup.
- Refactored the crew run logic to ensure proper loading and preparation of JSON-defined crews, including runtime input management.
- Added tests to verify the correct behavior of new input handling features and JSON crew project interactions.

* feat(cli): improve crew project input prompts and event handling

- Enhanced the `_prompt_text` function to allow for configurable spacing before prompts, improving user experience during input collection.
- Updated the wizards for agent and task creation to utilize the new prompt configuration, ensuring a more compact and streamlined interaction.
- Introduced new plan step lifecycle events (`PlanStepStartedEvent`, `PlanStepCompletedEvent`) to better track the execution status of plan steps.
- Refactored the step executor to emit these events during the execution of tasks, improving observability and debugging capabilities.
- Added tests to verify the correct behavior of new prompt handling and event emissions during crew project execution.

* fix: refine json-first crew interactions

* fix: prioritize common json crew tools

* fix: make json crew more tools expandable

* fix: show json crew tools by category

* feat(memory): update default embedder to OpenAI text-embedding-3-large and enhance memory compatibility

- Changed the default embedding model for Memory to OpenAI text-embedding-3-large, which uses 3072-dimensional vectors.
- Added warnings regarding compatibility issues with existing local memory stores created with 1536-dimensional embeddings.
- Updated documentation to reflect the new default embedder and its configuration options.
- Enhanced the CLI and codebase to support the new embedding model across various components, ensuring a seamless transition for users.

* fix: address PR review feedback for JSON-first crews

Review blockers:
- Forward trained_agents_file to JSON crews: crewai run -f now exports
  CREWAI_TRAINED_AGENTS_FILE for the in-process JSON crew path
- Wizard agent picker: Esc/cancel now reprompts instead of silently
  assigning the first agent
- JSON tool resolution hard-fails: unknown tool names, missing custom
  tool files, and invalid custom tool modules raise JSONProjectError
  with actionable messages instead of warn-and-continue
- Embedding dimension mismatch: LanceDB and Qdrant Edge storages raise
  EmbeddingDimensionMismatchError with reset/pin guidance instead of
  silently zero-filling vectors or returning empty search results
- Custom tool code execution documented in loader docstring and the
  scaffolded project README

CI fixes:
- ruff format across lib/
- All 133 PR-introduced mypy errors fixed (llm.py lazy-litellm and
  cli.py lazy command shims now use TYPE_CHECKING imports; textual
  is_mounted misuse fixed; pick_many overloads; misc annotations)

Bot review comments:
- Empty except blocks now have explanatory comments or debug logging
- Removed unused _C_BG/_C_PANEL/_C_BORDER globals and redundant
  import re; tests use a single import style for create_json_crew

Tests: trained-agents propagation, wizard cancel, tool resolution
failures, and dimension mismatch guidance.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: address second round of PR review comments

Cursor Bugbot:
- Wizard agent slugs: strip to [a-z0-9_] and fall back to agent_<n> so
  symbol-only roles can't produce an empty agents/.jsonc filename
- Wizard task names: dedupe against prior task names and fall back to
  task_<n> for symbol-only descriptions

CodeRabbit:
- Agent.message(): import Task explicitly at runtime instead of relying
  on the namespace injection done by crewai/__init__
- Async executor: move the native-tools-unsupported fallback from
  _ainvoke_loop_react (self-recursion) to _ainvoke_loop_native_tools,
  mirroring the sync implementation
- StepExecutor downgrade: keep the in-step conversation and append the
  text-tooling instructions instead of rebuilding messages, so completed
  native tool calls are not re-executed
- crewai-files: extension-based MIME lookup now runs before byte
  sniffing so csv/xml types are not degraded to text/plain
- Memory storages: validate every record in a save() batch against a
  consistent embedding dimension (LanceDB previously checked only the
  first record); added mixed-batch tests
- _print_post_tui_summary now typed against CrewRunApp
- Docs: Azure OpenAI default embedder change called out in the memory
  migration warning and provider table

Code quality bots:
- Removed unused _C_YELLOW/_C_CYAN (crew_run_tui) and _GREEN (tui_picker)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli): accordion tool picker in JSON crew wizard

The flat tool list had grown to ~90 rows. The picker now shows:
- Common tools always visible at the top
- Every other category as a single expandable row with tool and
  selection counts (e.g. "Search & Research  (27 tools, 2 selected)")
- Expanding a category collapses the previously expanded one
- Selections persist across expand/collapse via new preselected
  support in pick_many; cursor follows the toggled category row

tui_picker gains preselected + initial_cursor options on pick_many,
and Esc in multi-select now confirms the current selection instead of
discarding it (required so collapsing can't silently drop choices).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor(cli): remove --daemon flag from crewai run

The flag only affected JSON crew projects — classic and flow projects
ignored it entirely, which made the behavior inconsistent. Removed the
option, the daemon code path (_run_json_crew_daemon), and its helper
(_load_json_crew_with_inputs).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: update run command tests after --daemon removal

lib/crewai/tests/cli/test_run_crew.py still asserted the old
run_crew(trained_agents_file=..., daemon=False) call signature.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): exit codes, mid-run quit, async statuses, hyphen placeholders

Addresses the latest Bugbot review round:

- Failed JSON crew runs now exit non-zero (SystemExit(1)) so scripts
  and CI don't treat failures as success, mirroring the classic path
- Quitting the TUI mid-run now ends the process (os._exit(130));
  kickoff runs in a thread worker that cannot be force-cancelled, so
  letting the CLI return would leave LLM/tool work burning tokens in
  the background
- Sidebar task statuses are now async-safe: completion/failure events
  resolve the task's own row via identity instead of assuming the most
  recently started task, and starting a task no longer blanket-marks
  earlier active rows as done
- The runtime-input prompt regex now accepts hyphenated placeholder
  names ({my-topic}), matching kickoff's interpolation pattern

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: validation safety, custom tool sandboxing, TUI log integrity, memory error surfacing

- Deploy validation no longer executes project code: validation mode
  checks tool declarations structurally (well-formed entries, custom
  tool file exists) without importing or instantiating anything.
  custom:<name> resolution only happens on the actual run path.
- custom:<name> is constrained to [A-Za-z_][A-Za-z0-9_]* and the
  resolved path must stay inside the project's tools/ directory, so
  custom:../foo or absolute-path names cannot execute code outside it.
  Tool paths resolve relative to the crew project root, not cwd.
- TUI task logs are built from per-task state captured at task start
  (idx, description, agent, start time); an out-of-order completion
  takes its output from the event and no longer steals or resets the
  current task's streamed steps/output.
- EmbeddingDimensionMismatchError now inherits ValueError instead of
  RuntimeError so background saves surface it through
  MemorySaveFailedEvent instead of silently dropping the save; the
  shutdown catch in _background_encode_batch is narrowed to the
  "cannot schedule new futures" case.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): declared project type wins over crew.json presence

A flow project that also contains a crew.json(c) file now runs and
validates as the flow it declares in pyproject.toml instead of being
hijacked by the JSON crew path. Both crewai run (_has_json_crew) and
deploy validation (_is_json_crew) check tool.crewai.type; a missing or
unreadable pyproject still means a bare JSON crew project.

Also documents why StepObservationFailedEvent intentionally marks the
plan step "done": the event signals an observer failure, not a step
failure, and the executor continues past it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): type the declared_type locals so mypy stays clean

Comparing an Any-typed .get() chain returns Any, which tripped
no-any-return on the previous commit.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-14 04:19:48 -03:00

1141 lines
37 KiB
Python

# mypy: ignore-errors
import threading
from collections import defaultdict
from typing import cast
from unittest.mock import Mock, patch
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.agent_events import LiteAgentExecutionStartedEvent
from crewai.events.types.tool_usage_events import ToolUsageStartedEvent
from crewai.lite_agent import LiteAgent
from crewai.lite_agent_output import LiteAgentOutput
from crewai.llms.base_llm import BaseLLM
from pydantic import BaseModel, Field
import pytest
from crewai import LLM, Agent
from crewai.flow import Flow, start
from crewai.tools import BaseTool
from crewai.types.usage_metrics import UsageMetrics
class SecretLookupTool(BaseTool):
name: str = "secret_lookup"
description: str = "A tool to lookup secrets"
def _run(self) -> str:
return "SUPERSECRETPASSWORD123"
class WebSearchTool(BaseTool):
"""Tool for searching the web for information."""
name: str = "search_web"
description: str = "Search the web for information about a topic."
def _run(self, query: str) -> str:
"""Search the web for information about a topic."""
if "tokyo" in query.lower():
return "Tokyo's population in 2023 was approximately 21 million people in the city proper, and 37 million in the greater metropolitan area."
if "climate change" in query.lower() and "coral" in query.lower():
return "Climate change severely impacts coral reefs through: 1) Ocean warming causing coral bleaching, 2) Ocean acidification reducing calcification, 3) Sea level rise affecting light availability, 4) Increased storm frequency damaging reef structures. Sources: NOAA Coral Reef Conservation Program, Global Coral Reef Alliance."
return f"Found information about {query}: This is a simulated search result for demonstration purposes."
class CalculatorTool(BaseTool):
"""Tool for performing calculations."""
name: str = "calculate"
description: str = "Calculate the result of a mathematical expression."
def _run(self, expression: str) -> str:
"""Calculate the result of a mathematical expression."""
try:
result = eval(expression, {"__builtins__": {}}) # noqa: S307
return f"The result of {expression} is {result}"
except Exception as e:
return f"Error calculating {expression}: {e!s}"
# Define a custom response format using Pydantic
class ResearchResult(BaseModel):
"""Structure for research results."""
main_findings: str = Field(description="The main findings from the research")
key_points: list[str] = Field(description="List of key points")
sources: list[str] = Field(description="List of sources used")
@pytest.mark.vcr()
@pytest.mark.parametrize("verbose", [True, False])
def test_agent_kickoff_preserves_parameters(verbose):
"""Test that Agent.kickoff() uses the correct parameters from the Agent."""
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Test response"
mock_llm.stop = []
from crewai.types.usage_metrics import UsageMetrics
mock_usage_metrics = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
mock_llm.get_token_usage_summary.return_value = mock_usage_metrics
custom_tools = [WebSearchTool(), CalculatorTool()]
max_iter = 10
agent = Agent(
role="Test Agent",
goal="Test Goal",
backstory="Test Backstory",
llm=mock_llm,
tools=custom_tools,
max_iter=max_iter,
verbose=verbose,
)
result = agent.kickoff("Test query")
assert agent.role == "Test Agent"
assert agent.goal == "Test Goal"
assert agent.backstory == "Test Backstory"
assert len(agent.tools) == 2
assert isinstance(agent.tools[0], WebSearchTool)
assert isinstance(agent.tools[1], CalculatorTool)
assert agent.max_iter == max_iter
assert agent.verbose == verbose
assert result is not None
assert result.raw is not None
@pytest.mark.vcr()
def test_lite_agent_with_tools():
"""Test that Agent can use tools."""
llm = LLM(model="gpt-4o-mini")
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant who can search for information about the population of Tokyo.",
llm=llm,
tools=[WebSearchTool()],
verbose=True,
)
result = agent.kickoff(
"What is the population of Tokyo and how many people would that be per square kilometer if Tokyo's area is 2,194 square kilometers?"
)
assert "21 million" in result.raw or "37 million" in result.raw, (
"Agent should find Tokyo's population"
)
assert "per square kilometer" in result.raw, (
"Agent should calculate population density"
)
received_events = []
event_received = threading.Event()
@crewai_event_bus.on(ToolUsageStartedEvent)
def event_handler(source, event):
received_events.append(event)
event_received.set()
agent.kickoff("What are the effects of climate change on coral reefs?")
assert event_received.wait(timeout=5), "Timeout waiting for tool usage events"
assert len(received_events) > 0, "Tool usage events should be emitted"
event = received_events[0]
assert isinstance(event, ToolUsageStartedEvent)
assert event.agent_role == "Research Assistant"
assert event.tool_name == "search_web"
@pytest.mark.vcr()
def test_lite_agent_structured_output():
"""Test that Agent can return a simple structured output."""
class SimpleOutput(BaseModel):
"""Simple structure for agent outputs."""
summary: str = Field(description="A brief summary of findings")
confidence: int = Field(description="Confidence level from 1-100")
web_search_tool = WebSearchTool()
llm = LLM(model="gpt-4o-mini")
agent = Agent(
role="Info Gatherer",
goal="Provide brief information",
backstory="You gather and summarize information quickly.",
llm=llm,
tools=[web_search_tool],
verbose=True,
)
result = agent.kickoff(
"What is the population of Tokyo? Return your structured output in JSON format with the following fields: summary, confidence",
response_format=SimpleOutput,
)
assert result.pydantic is not None, "Should return a Pydantic model"
output = cast(SimpleOutput, result.pydantic)
assert isinstance(output.summary, str), "Summary should be a string"
assert len(output.summary) > 0, "Summary should not be empty"
assert isinstance(output.confidence, int), "Confidence should be an integer"
assert 1 <= output.confidence <= 100, "Confidence should be between 1 and 100"
assert "tokyo" in output.summary.lower() or "population" in output.summary.lower()
assert result.usage_metrics is not None
return result
@pytest.mark.vcr()
def test_lite_agent_returns_usage_metrics():
"""Test that LiteAgent returns usage metrics."""
llm = LLM(model="gpt-4o-mini")
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant who can search for information about the population of Tokyo.",
llm=llm,
tools=[WebSearchTool()],
verbose=True,
)
result = agent.kickoff(
"What is the population of Tokyo? Return your structured output in JSON format with the following fields: summary, confidence"
)
assert result.usage_metrics is not None
assert result.usage_metrics["total_tokens"] > 0
@pytest.mark.vcr()
def test_lite_agent_output_includes_messages():
"""Test that LiteAgentOutput includes messages from agent execution."""
llm = LLM(model="gpt-4o-mini")
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant who can search for information about the population of Tokyo.",
llm=llm,
tools=[WebSearchTool()],
verbose=True,
)
result = agent.kickoff("What is the population of Tokyo?")
assert isinstance(result, LiteAgentOutput)
assert hasattr(result, "messages")
assert isinstance(result.messages, list)
assert len(result.messages) > 0
@pytest.mark.vcr()
@pytest.mark.asyncio
async def test_lite_agent_returns_usage_metrics_async():
"""Test that LiteAgent returns usage metrics when run asynchronously."""
llm = LLM(model="gpt-4o-mini")
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant who can search for information about the population of Tokyo.",
llm=llm,
tools=[WebSearchTool()],
verbose=True,
)
result = await agent.kickoff_async(
"What is the population of Tokyo? Return your structured output in JSON format with the following fields: summary, confidence"
)
assert isinstance(result, LiteAgentOutput)
assert (
"21 million" in result.raw
or "37 million" in result.raw
or "21000000" in result.raw
or "37000000" in result.raw
)
assert result.usage_metrics is not None
assert result.usage_metrics["total_tokens"] > 0
class TestFlow(Flow):
"""A test flow that creates and runs an agent."""
def __init__(self, llm, tools):
self.llm = llm
self.tools = tools
super().__init__()
@start()
def start(self):
agent = Agent(
role="Test Agent",
goal="Test Goal",
backstory="Test Backstory",
llm=self.llm,
tools=self.tools,
)
return agent.kickoff("Test query")
def verify_agent_flow_context(result, agent, flow):
"""Verify that both the result and agent have the correct flow context."""
assert result._flow_id == flow.flow_id # type: ignore[attr-defined]
assert result._request_id == flow.flow_id # type: ignore[attr-defined]
assert agent is not None
assert agent._flow_id == flow.flow_id # type: ignore[attr-defined]
assert agent._request_id == flow.flow_id # type: ignore[attr-defined]
def test_sets_flow_context_when_inside_flow():
"""Test that an Agent can be created and executed inside a Flow context."""
captured_event = None
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Test response"
mock_llm.stop = []
from crewai.types.usage_metrics import UsageMetrics
mock_usage_metrics = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
mock_llm.get_token_usage_summary.return_value = mock_usage_metrics
class MyFlow(Flow):
@start()
def start(self):
agent = Agent(
role="Test Agent",
goal="Test Goal",
backstory="Test Backstory",
llm=mock_llm,
tools=[WebSearchTool()],
)
return agent.kickoff("Test query")
flow = MyFlow()
event_received = threading.Event()
@crewai_event_bus.on(LiteAgentExecutionStartedEvent)
def capture_event(source, event):
nonlocal captured_event
captured_event = event
event_received.set()
result = flow.kickoff()
assert event_received.wait(timeout=5), "Timeout waiting for agent execution event"
assert captured_event is not None
assert captured_event.agent_info["role"] == "Test Agent"
assert result is not None
@pytest.mark.vcr()
def test_guardrail_is_called_using_string():
"""Test that a string guardrail triggers events and retries correctly.
Uses a callable guardrail that deterministically fails on the first
attempt and passes on the second. This tests the guardrail event
machinery (started/completed events, retry loop) without depending
on the LLM to comply with contradictory constraints.
"""
guardrail_events: dict[str, list] = defaultdict(list)
from crewai.events.event_types import (
LLMGuardrailCompletedEvent,
LLMGuardrailStartedEvent,
)
# Deterministic guardrail: fail first call, pass second
call_count = {"n": 0}
def fail_then_pass_guardrail(output):
call_count["n"] += 1
if call_count["n"] == 1:
return (False, "Missing required format — please use a numbered list")
return (True, output)
agent = Agent(
role="Sports Analyst",
goal="List the best soccer players",
backstory="You are an expert at gathering and organizing information.",
guardrail=fail_then_pass_guardrail,
guardrail_max_retries=3,
)
condition = threading.Condition()
@crewai_event_bus.on(LLMGuardrailStartedEvent)
def capture_guardrail_started(source, event):
assert isinstance(source, Agent)
with condition:
guardrail_events["started"].append(event)
condition.notify()
@crewai_event_bus.on(LLMGuardrailCompletedEvent)
def capture_guardrail_completed(source, event):
assert isinstance(source, Agent)
with condition:
guardrail_events["completed"].append(event)
condition.notify()
result = agent.kickoff(messages="Top 5 best soccer players in the world?")
with condition:
success = condition.wait_for(
lambda: len(guardrail_events["started"]) >= 2
and any(e.success for e in guardrail_events["completed"]),
timeout=10,
)
assert success, "Timeout waiting for successful guardrail event"
assert len(guardrail_events["started"]) >= 2
assert len(guardrail_events["completed"]) >= 2
assert not guardrail_events["completed"][0].success
successful_events = [e for e in guardrail_events["completed"] if e.success]
assert len(successful_events) >= 1, "Expected at least one successful guardrail completion"
assert result is not None
@pytest.mark.vcr()
def test_guardrail_is_called_using_callable():
guardrail_events: dict[str, list] = defaultdict(list)
from crewai.events.event_types import (
LLMGuardrailCompletedEvent,
LLMGuardrailStartedEvent,
)
condition = threading.Condition()
@crewai_event_bus.on(LLMGuardrailStartedEvent)
def capture_guardrail_started(source, event):
with condition:
guardrail_events["started"].append(event)
condition.notify()
@crewai_event_bus.on(LLMGuardrailCompletedEvent)
def capture_guardrail_completed(source, event):
with condition:
guardrail_events["completed"].append(event)
condition.notify()
agent = Agent(
role="Sports Analyst",
goal="Gather information about the best soccer players",
backstory="""You are an expert at gathering and organizing information. You carefully collect details and present them in a structured way.""",
guardrail=lambda output: (True, "Pelé - Santos, 1958"),
)
result = agent.kickoff(messages="Top 1 best players in the world?")
with condition:
success = condition.wait_for(
lambda: len(guardrail_events["started"]) >= 1
and len(guardrail_events["completed"]) >= 1,
timeout=10,
)
assert success, "Timeout waiting for all guardrail events"
assert len(guardrail_events["started"]) == 1
assert len(guardrail_events["completed"]) == 1
assert guardrail_events["completed"][0].success
assert "Pelé - Santos, 1958" in result.raw
@pytest.mark.vcr()
def test_guardrail_reached_attempt_limit():
guardrail_events: dict[str, list] = defaultdict(list)
from crewai.events.event_types import (
LLMGuardrailCompletedEvent,
LLMGuardrailStartedEvent,
)
condition = threading.Condition()
@crewai_event_bus.on(LLMGuardrailStartedEvent)
def capture_guardrail_started(source, event):
with condition:
guardrail_events["started"].append(event)
condition.notify()
@crewai_event_bus.on(LLMGuardrailCompletedEvent)
def capture_guardrail_completed(source, event):
with condition:
guardrail_events["completed"].append(event)
condition.notify()
agent = Agent(
role="Sports Analyst",
goal="Gather information about the best soccer players",
backstory="""You are an expert at gathering and organizing information. You carefully collect details and present them in a structured way.""",
guardrail=lambda output: (
False,
"You are not allowed to include Brazilian players",
),
guardrail_max_retries=2,
)
with pytest.raises(
Exception, match="Agent's guardrail failed validation after 2 retries"
):
agent.kickoff(messages="Top 10 best players in the world?")
with condition:
success = condition.wait_for(
lambda: len(guardrail_events["started"]) >= 3
and len(guardrail_events["completed"]) >= 3,
timeout=10,
)
assert success, "Timeout waiting for all guardrail events"
assert len(guardrail_events["started"]) == 3 # 2 retries + 1 initial call
assert len(guardrail_events["completed"]) == 3 # 2 retries + 1 initial call
assert not guardrail_events["completed"][0].success
assert not guardrail_events["completed"][1].success
assert not guardrail_events["completed"][2].success
@pytest.mark.vcr()
def test_agent_output_when_guardrail_returns_base_model():
class Player(BaseModel):
name: str
country: str
agent = Agent(
role="Sports Analyst",
goal="Gather information about the best soccer players",
backstory="""You are an expert at gathering and organizing information. You carefully collect details and present them in a structured way.""",
guardrail=lambda output: (
True,
Player(name="Lionel Messi", country="Argentina"),
),
)
result = agent.kickoff(messages="Top 10 best players in the world?")
assert result.pydantic == Player(name="Lionel Messi", country="Argentina")
def test_lite_agent_with_custom_llm_and_guardrails():
"""Test that CustomLLM (inheriting from BaseLLM) works with guardrails."""
class CustomLLM(BaseLLM):
def __init__(self, response: str = "Custom response"):
super().__init__(model="custom-model")
self.response = response
self.call_count = 0
def call(
self,
messages,
tools=None,
callbacks=None,
available_functions=None,
from_task=None,
from_agent=None,
response_model=None,
) -> str:
self.call_count += 1
if "valid" in str(messages) and "feedback" in str(messages):
return '{"valid": true, "feedback": null}'
if "Thought:" in str(messages):
return f"Thought: I will analyze soccer players\nFinal Answer: {self.response}"
return self.response
def supports_function_calling(self) -> bool:
return False
def supports_stop_words(self) -> bool:
return False
def get_context_window_size(self) -> int:
return 4096
custom_llm = CustomLLM(response="Brazilian soccer players are the best!")
agent = LiteAgent(
role="Sports Analyst",
goal="Analyze soccer players",
backstory="You analyze soccer players and their performance.",
llm=custom_llm,
guardrail="Only include Brazilian players",
)
result = agent.kickoff("Tell me about the best soccer players")
assert custom_llm.call_count > 0
assert "Brazilian" in result.raw
custom_llm2 = CustomLLM(response="Original response")
def test_guardrail(output):
return (True, "Modified by guardrail")
agent2 = LiteAgent(
role="Test Agent",
goal="Test goal",
backstory="Test backstory",
llm=custom_llm2,
guardrail=test_guardrail,
)
result2 = agent2.kickoff("Test message")
assert result2.raw == "Modified by guardrail"
@pytest.mark.vcr()
def test_lite_agent_with_invalid_llm():
"""Test that LiteAgent raises proper error when create_llm returns None."""
with patch("crewai.lite_agent.create_llm", return_value=None):
with pytest.raises(ValueError) as exc_info:
LiteAgent(
role="Test Agent",
goal="Test goal",
backstory="Test backstory",
llm="invalid-model",
)
assert "Expected LLM instance of type BaseLLM" in str(exc_info.value)
@patch.dict("os.environ", {"CREWAI_PLATFORM_INTEGRATION_TOKEN": "test_token"})
@patch("crewai_tools.tools.crewai_platform_tools.crewai_platform_action_tool.requests.post")
@patch("crewai_tools.tools.crewai_platform_tools.crewai_platform_tool_builder.requests.get")
@pytest.mark.vcr()
def test_agent_kickoff_with_platform_tools(mock_get, mock_post):
"""Test that Agent.kickoff() properly integrates platform tools with LiteAgent"""
mock_response = Mock()
mock_response.raise_for_status.return_value = None
mock_response.json.return_value = {
"actions": {
"github": [
{
"name": "create_issue",
"description": "Create a GitHub issue",
"parameters": {
"type": "object",
"properties": {
"title": {"type": "string", "description": "Issue title"},
"body": {"type": "string", "description": "Issue body"},
},
"required": ["title"],
},
}
]
}
}
mock_get.return_value = mock_response
mock_post_response = Mock()
mock_post_response.ok = True
mock_post_response.json.return_value = {
"success": True,
"issue_url": "https://github.com/test/repo/issues/1"
}
mock_post.return_value = mock_post_response
agent = Agent(
role="Test Agent",
goal="Test goal",
backstory="Test backstory",
llm=LLM(model="gpt-3.5-turbo"),
apps=["github"],
verbose=True
)
result = agent.kickoff("Create a GitHub issue")
assert isinstance(result, LiteAgentOutput)
assert result.raw is not None
@patch.dict("os.environ", {"EXA_API_KEY": "test_exa_key"})
@patch("crewai.agent.Agent.get_mcp_tools")
@pytest.mark.vcr()
def test_agent_kickoff_with_mcp_tools(mock_get_mcp_tools):
"""Test that Agent.kickoff() properly integrates MCP tools with LiteAgent"""
class MockMCPTool(BaseTool):
name: str = "exa_search"
description: str = "Search the web using Exa"
def _run(self, query: str) -> str:
return f"Mock search results for: {query}"
mock_get_mcp_tools.return_value = [MockMCPTool()]
agent = Agent(
role="Test Agent",
goal="Test goal",
backstory="Test backstory",
llm=LLM(model="gpt-3.5-turbo"),
mcps=["https://mcp.exa.ai/mcp?api_key=test_exa_key&profile=research"],
verbose=True
)
result = agent.kickoff("Search for information about AI")
assert isinstance(result, LiteAgentOutput)
assert result.raw is not None
mock_get_mcp_tools.assert_called_once_with(["https://mcp.exa.ai/mcp?api_key=test_exa_key&profile=research"])
from crewai.flow.flow import listen
@pytest.mark.vcr()
def test_lite_agent_inside_flow_sync():
"""Test that LiteAgent.kickoff() works magically inside a Flow.
This tests the "magic auto-async" pattern where calling agent.kickoff()
from within a Flow automatically detects the event loop and returns a
coroutine that the Flow framework awaits. Users don't need to use async/await.
"""
execution_log = []
class TestFlow(Flow):
@start()
def run_agent(self):
execution_log.append("flow_started")
agent = Agent(
role="Test Agent",
goal="Answer questions",
backstory="A helpful test assistant",
llm=LLM(model="gpt-4o-mini"),
verbose=False,
)
# Magic: just call kickoff() normally - it auto-detects Flow context
result = agent.kickoff(messages="What is 2+2? Reply with just the number.")
execution_log.append("agent_completed")
return result
flow = TestFlow()
result = flow.kickoff()
assert "flow_started" in execution_log
assert "agent_completed" in execution_log
assert result is not None
assert isinstance(result, LiteAgentOutput)
@pytest.mark.vcr()
def test_lite_agent_inside_flow_with_tools():
"""Test that LiteAgent with tools works correctly inside a Flow."""
class TestFlow(Flow):
@start()
def run_agent_with_tools(self):
agent = Agent(
role="Calculator Agent",
goal="Perform calculations",
backstory="A math expert",
llm=LLM(model="gpt-4o-mini"),
tools=[CalculatorTool()],
verbose=False,
)
result = agent.kickoff(messages="Calculate 10 * 5")
return result
flow = TestFlow()
result = flow.kickoff()
assert result is not None
assert isinstance(result, LiteAgentOutput)
assert result.raw is not None
@pytest.mark.vcr()
def test_multiple_agents_in_same_flow():
"""Test that multiple LiteAgents can run sequentially in the same Flow."""
class MultiAgentFlow(Flow):
@start()
def first_step(self):
agent1 = Agent(
role="First Agent",
goal="Greet users",
backstory="A friendly greeter",
llm=LLM(model="gpt-4o-mini"),
verbose=False,
)
return agent1.kickoff(messages="Say hello")
@listen(first_step)
def second_step(self, first_result):
agent2 = Agent(
role="Second Agent",
goal="Say goodbye",
backstory="A polite farewell agent",
llm=LLM(model="gpt-4o-mini"),
verbose=False,
)
return agent2.kickoff(messages="Say goodbye")
flow = MultiAgentFlow()
result = flow.kickoff()
assert result is not None
assert isinstance(result, LiteAgentOutput)
@pytest.mark.vcr()
def test_lite_agent_kickoff_async_inside_flow():
"""Test that Agent.kickoff_async() works correctly from async Flow methods."""
class AsyncAgentFlow(Flow):
@start()
async def async_agent_step(self):
agent = Agent(
role="Async Test Agent",
goal="Answer questions asynchronously",
backstory="An async helper",
llm=LLM(model="gpt-4o-mini"),
verbose=False,
)
result = await agent.kickoff_async(messages="What is 3+3?")
return result
flow = AsyncAgentFlow()
result = flow.kickoff()
assert result is not None
assert isinstance(result, LiteAgentOutput)
@pytest.mark.vcr()
def test_lite_agent_standalone_still_works():
"""Test that LiteAgent.kickoff() still works normally outside of a Flow.
This verifies that the magic auto-async pattern doesn't break standalone usage
where there's no event loop running.
"""
agent = Agent(
role="Standalone Agent",
goal="Answer questions",
backstory="A helpful assistant",
llm=LLM(model="gpt-4o-mini"),
verbose=False,
)
result = agent.kickoff(messages="What is 5+5? Reply with just the number.")
assert result is not None
assert isinstance(result, LiteAgentOutput)
assert result.raw is not None
def test_agent_kickoff_with_files_parameter():
"""Test that Agent.kickoff() accepts and passes files to the executor."""
from unittest.mock import Mock, patch
from crewai_files import File
from crewai.types.usage_metrics import UsageMetrics
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: I can see the file content."
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = Agent(
role="File Analyzer",
goal="Analyze files",
backstory="An agent that analyzes files",
llm=mock_llm,
verbose=False,
)
test_file = File(source=b"mock pdf content")
input_files = {"document.pdf": test_file}
with patch.object(
agent, "_prepare_kickoff", wraps=agent._prepare_kickoff
) as mock_prepare:
result = agent.kickoff(messages="Analyze the document", input_files=input_files)
mock_prepare.assert_called_once()
call_args = mock_prepare.call_args
assert call_args.args[0] == "Analyze the document"
called_files = call_args.kwargs.get("input_files") or call_args.args[2]
assert "document.pdf" in called_files
assert called_files["document.pdf"] is test_file
assert result is not None
def test_prepare_kickoff_extracts_files_from_messages():
"""Test that _prepare_kickoff extracts files from messages."""
from unittest.mock import Mock
from crewai_files import File
from crewai.types.usage_metrics import UsageMetrics
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Done."
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = Agent(
role="Test Agent",
goal="Test files",
backstory="Test backstory",
llm=mock_llm,
verbose=False,
)
test_file = File(source=b"mock image content")
messages = [
{"role": "user", "content": "Analyze this", "files": {"img.png": test_file}}
]
executor, inputs, agent_info, parsed_tools = agent._prepare_kickoff(messages=messages)
assert "files" in inputs
assert "img.png" in inputs["files"]
assert inputs["files"]["img.png"] is test_file
def test_prepare_kickoff_merges_files_from_messages_and_parameter():
"""Test that _prepare_kickoff merges files from messages and parameter."""
from unittest.mock import Mock
from crewai_files import File
from crewai.types.usage_metrics import UsageMetrics
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Done."
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = Agent(
role="Test Agent",
goal="Test files",
backstory="Test backstory",
llm=mock_llm,
verbose=False,
)
msg_file = File(source=b"message file content")
param_file = File(source=b"param file content")
messages = [
{"role": "user", "content": "Analyze these", "files": {"from_msg.png": msg_file}}
]
input_files = {"from_param.pdf": param_file}
executor, inputs, agent_info, parsed_tools = agent._prepare_kickoff(
messages=messages, input_files=input_files
)
assert "files" in inputs
assert "from_msg.png" in inputs["files"]
assert "from_param.pdf" in inputs["files"]
assert inputs["files"]["from_msg.png"] is msg_file
assert inputs["files"]["from_param.pdf"] is param_file
def test_prepare_kickoff_param_files_override_message_files():
"""Test that files parameter overrides files from messages with same name."""
from unittest.mock import Mock
from crewai_files import File
from crewai.types.usage_metrics import UsageMetrics
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Done."
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = Agent(
role="Test Agent",
goal="Test files",
backstory="Test backstory",
llm=mock_llm,
verbose=False,
)
msg_file = File(source=b"message file content")
param_file = File(source=b"param file content")
messages = [
{"role": "user", "content": "Analyze", "files": {"same.png": msg_file}}
]
input_files = {"same.png": param_file}
executor, inputs, agent_info, parsed_tools = agent._prepare_kickoff(
messages=messages, input_files=input_files
)
assert "files" in inputs
assert inputs["files"]["same.png"] is param_file
def test_lite_agent_verbose_false_suppresses_printer_output():
"""Test that setting verbose=False suppresses all printer output."""
from crewai.agents.parser import AgentFinish
from crewai.types.usage_metrics import UsageMetrics
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Hello!"
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
with pytest.warns(FutureWarning):
agent = LiteAgent(
role="Test Agent",
goal="Test goal",
backstory="Test backstory",
llm=mock_llm,
verbose=False,
)
mock_printer = Mock()
with patch("crewai.lite_agent.PRINTER", mock_printer):
result = agent.kickoff("Say hello")
assert result is not None
assert isinstance(result, LiteAgentOutput)
mock_printer.print.assert_not_called()
@pytest.mark.filterwarnings("ignore:LiteAgent is deprecated")
def test_lite_agent_memory_none_default():
"""With memory=None (default), _memory is None and no memory is used."""
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Ok"
mock_llm.stop = []
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=10,
prompt_tokens=5,
completion_tokens=5,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = LiteAgent(
role="Test",
goal="Test goal",
backstory="Test backstory",
llm=mock_llm,
memory=None,
verbose=False,
)
assert agent._memory is None
@pytest.mark.filterwarnings("ignore:LiteAgent is deprecated")
def test_lite_agent_memory_true_resolves_to_default_memory():
"""With memory=True, _memory is a Memory instance."""
from crewai.memory.unified_memory import Memory
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: Ok"
mock_llm.stop = []
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=10,
prompt_tokens=5,
completion_tokens=5,
cached_prompt_tokens=0,
successful_requests=1,
)
agent = LiteAgent(
role="Test",
goal="Test goal",
backstory="Test backstory",
llm=mock_llm,
memory=True,
verbose=False,
)
assert agent._memory is not None
assert isinstance(agent._memory, Memory)
assert agent._memory.llm is agent.llm
@pytest.mark.filterwarnings("ignore:LiteAgent is deprecated")
def test_lite_agent_memory_instance_recall_and_save_called():
"""With a custom memory instance, kickoff calls recall and then extract_memories/remember."""
mock_llm = Mock(spec=LLM)
mock_llm.call.return_value = "Final Answer: The answer is 42."
mock_llm.stop = []
mock_llm.supports_stop_words.return_value = False
mock_llm.get_token_usage_summary.return_value = UsageMetrics(
total_tokens=10,
prompt_tokens=5,
completion_tokens=5,
cached_prompt_tokens=0,
successful_requests=1,
)
mock_memory = Mock()
mock_memory.read_only = False
mock_memory.recall.return_value = []
mock_memory.extract_memories.return_value = ["Fact one.", "Fact two."]
agent = LiteAgent(
role="Test",
goal="Test goal",
backstory="Test backstory",
llm=mock_llm,
memory=mock_memory,
verbose=False,
)
assert agent._memory is mock_memory
agent.kickoff("What is the answer?")
mock_memory.recall.assert_called_once()
call_kw = mock_memory.recall.call_args[1]
assert call_kw.get("limit") == 10
# depth is not passed explicitly; Memory.recall() defaults to "deep"
mock_memory.extract_memories.assert_called_once()
mock_memory.remember_many.assert_called_once_with(
["Fact one.", "Fact two."], agent_role="Test"
)