mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-07-01 13:18:10 +00:00
* feat(cli): introduce JSON crew project support and TUI enhancements
- Added support for creating and running JSON-defined crew projects, allowing users to scaffold projects with a new `create_json_crew.py` file.
- Implemented a full-screen Textual TUI for crew execution in `crew_run_tui.py`, enhancing user interaction with a two-column layout.
- Updated `run_crew.py` to prioritize JSON crew projects and added daemon mode for running without TUI.
- Introduced interactive pickers in `tui_picker.py` for improved CLI prompts.
- Enhanced validation for JSON crew files in `validate.py` to ensure proper structure and agent definitions.
- Updated `.gitignore` to exclude demo and crewai directories.

* feat: update LLM model references to gpt-5.4-mini
- Changed default LLM model from gpt-4o-mini to gpt-5.4-mini across various files, including CLI options, JSON crew configurations, and agent definitions.
- Enhanced benchmark and human feedback functionalities to utilize the new model.
- Improved user interface elements in the TUI for better interaction and feedback during execution.
- Added support for new skills directory in JSON crew project creation.

* feat(benchmark): add crew-level benchmarking functionality
- Introduced a new `benchmark` command in the CLI for crew-level benchmarking, allowing users to specify agents, models, and timeout settings.
- Implemented `CrewBenchmarkCase` to handle crew-level benchmark cases with inputs and criteria.
- Enhanced the benchmark runner to support progress tracking and detailed reporting of results for multiple models.
- Added tests for loading crew benchmark cases and validating their structure.
- Updated existing benchmark functions to accommodate the new crew-level execution model.

* feat(cli): enhance JSON crew project functionality and TUI improvements
- Added optional agent-level guardrails and advanced options in JSON crew configurations to improve output validation and flexibility.
- Updated the TUI to better handle plan step statuses, including visual indicators for task completion and failure.
- Introduced methods for parsing and managing step observation events, ensuring accurate updates to task statuses during execution.
- Enhanced validation for JSON crew projects, ensuring proper structure and error handling for agent and task definitions.
- Added comprehensive tests for new features and validation logic, ensuring robustness in JSON crew project handling.

* refactor(cli): streamline JSON crew project handling and improve validation
- Refactored JSON crew project loading and validation logic to enhance clarity and maintainability.
- Introduced utility functions for finding JSON crew files, improving code reuse across modules.
- Removed deprecated benchmark functionality and associated tests to simplify the codebase.
- Updated CLI commands to utilize the new JSON project structure, ensuring compatibility with recent changes.
- Enhanced test coverage for JSON crew project features, ensuring robust validation and error handling.

* feat(cli): enhance activity log navigation and focus management
- Added functionality to focus on the activity log when navigating through log entries.
- Implemented refresh logic for the log panel to ensure updates are displayed correctly during navigation.
- Improved keyboard navigation for log entries, allowing users to expand and scroll through logs seamlessly.
- Added tests to verify the correct behavior of log navigation and focus management in the TUI.

* feat(cli): enhance JSON crew project interaction and input handling
- Introduced a new function to enable prompt line editing for better user experience during input prompts.
- Updated the JSON crew project wizards to show interpolation hints for dynamic values, improving user guidance.
- Enhanced the handling of missing input placeholders by prompting users for required values during crew setup.
- Refactored the crew run logic to ensure proper loading and preparation of JSON-defined crews, including runtime input management.
- Added tests to verify the correct behavior of new input handling features and JSON crew project interactions.

* feat(cli): improve crew project input prompts and event handling
- Enhanced the `_prompt_text` function to allow for configurable spacing before prompts, improving user experience during input collection.
- Updated the wizards for agent and task creation to utilize the new prompt configuration, ensuring a more compact and streamlined interaction.
- Introduced new plan step lifecycle events (`PlanStepStartedEvent`, `PlanStepCompletedEvent`) to better track the execution status of plan steps.
- Refactored the step executor to emit these events during the execution of tasks, improving observability and debugging capabilities.
- Added tests to verify the correct behavior of new prompt handling and event emissions during crew project execution.

* fix: refine json-first crew interactions

* fix: prioritize common json crew tools

* fix: make json crew more tools expandable

* fix: show json crew tools by category

* feat(memory): update default embedder to OpenAI text-embedding-3-large and enhance memory compatibility
- Changed the default embedding model for Memory to OpenAI text-embedding-3-large, which uses 3072-dimensional vectors.
- Added warnings regarding compatibility issues with existing local memory stores created with 1536-dimensional embeddings.
- Updated documentation to reflect the new default embedder and its configuration options.
- Enhanced the CLI and codebase to support the new embedding model across various components, ensuring a seamless transition for users.

* fix: address PR review feedback for JSON-first crews
Review blockers:
- Forward trained_agents_file to JSON crews: crewai run -f now exports CREWAI_TRAINED_AGENTS_FILE for the in-process JSON crew path
- Wizard agent picker: Esc/cancel now reprompts instead of silently assigning the first agent
- JSON tool resolution hard-fails: unknown tool names, missing custom tool files, and invalid custom tool modules raise JSONProjectError with actionable messages instead of warn-and-continue
- Embedding dimension mismatch: LanceDB and Qdrant Edge storages raise EmbeddingDimensionMismatchError with reset/pin guidance instead of silently zero-filling vectors or returning empty search results
- Custom tool code execution documented in loader docstring and the scaffolded project README
CI fixes:
- ruff format across lib/
- All 133 PR-introduced mypy errors fixed (llm.py lazy-litellm and cli.py lazy command shims now use TYPE_CHECKING imports; textual is_mounted misuse fixed; pick_many overloads; misc annotations)
Bot review comments:
- Empty except blocks now have explanatory comments or debug logging
- Removed unused _C_BG/_C_PANEL/_C_BORDER globals and redundant import re; tests use a single import style for create_json_crew
Tests: trained-agents propagation, wizard cancel, tool resolution failures, and dimension mismatch guidance.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: address second round of PR review comments
Cursor Bugbot:
- Wizard agent slugs: strip to [a-z0-9_] and fall back to agent_<n> so symbol-only roles can't produce an empty agents/.jsonc filename
- Wizard task names: dedupe against prior task names and fall back to task_<n> for symbol-only descriptions
CodeRabbit:
- Agent.message(): import Task explicitly at runtime instead of relying on the namespace injection done by crewai/__init__
- Async executor: move the native-tools-unsupported fallback from _ainvoke_loop_react (self-recursion) to _ainvoke_loop_native_tools, mirroring the sync implementation
- StepExecutor downgrade: keep the in-step conversation and append the text-tooling instructions instead of rebuilding messages, so completed native tool calls are not re-executed
- crewai-files: extension-based MIME lookup now runs before byte sniffing so csv/xml types are not degraded to text/plain
- Memory storages: validate every record in a save() batch against a consistent embedding dimension (LanceDB previously checked only the first record); added mixed-batch tests
- _print_post_tui_summary now typed against CrewRunApp
- Docs: Azure OpenAI default embedder change called out in the memory migration warning and provider table
Code quality bots:
- Removed unused _C_YELLOW/_C_CYAN (crew_run_tui) and _GREEN (tui_picker)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* feat(cli): accordion tool picker in JSON crew wizard
The flat tool list had grown to ~90 rows. The picker now shows:
- Common tools always visible at the top
- Every other category as a single expandable row with tool and selection counts (e.g. "Search & Research (27 tools, 2 selected)")
- Expanding a category collapses the previously expanded one
- Selections persist across expand/collapse via new preselected support in pick_many; cursor follows the toggled category row
tui_picker gains preselected + initial_cursor options on pick_many, and Esc in multi-select now confirms the current selection instead of discarding it (required so collapsing can't silently drop choices).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor(cli): remove --daemon flag from crewai run
The flag only affected JSON crew projects; classic and flow projects ignored it entirely, which made the behavior inconsistent. Removed the option, the daemon code path (_run_json_crew_daemon), and its helper (_load_json_crew_with_inputs).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: update run command tests after --daemon removal
lib/crewai/tests/cli/test_run_crew.py still asserted the old run_crew(trained_agents_file=..., daemon=False) call signature.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): exit codes, mid-run quit, async statuses, hyphen placeholders
Addresses the latest Bugbot review round:
- Failed JSON crew runs now exit non-zero (SystemExit(1)) so scripts and CI don't treat failures as success, mirroring the classic path
- Quitting the TUI mid-run now ends the process (os._exit(130)); kickoff runs in a thread worker that cannot be force-cancelled, so letting the CLI return would leave LLM/tool work burning tokens in the background
- Sidebar task statuses are now async-safe: completion/failure events resolve the task's own row via identity instead of assuming the most recently started task, and starting a task no longer blanket-marks earlier active rows as done
- The runtime-input prompt regex now accepts hyphenated placeholder names ({my-topic}), matching kickoff's interpolation pattern
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: validation safety, custom tool sandboxing, TUI log integrity, memory error surfacing
- Deploy validation no longer executes project code: validation mode checks tool declarations structurally (well-formed entries, custom tool file exists) without importing or instantiating anything. custom:<name> resolution only happens on the actual run path.
- custom:<name> is constrained to [A-Za-z_][A-Za-z0-9_]* and the resolved path must stay inside the project's tools/ directory, so custom:../foo or absolute-path names cannot execute code outside it. Tool paths resolve relative to the crew project root, not cwd.
- TUI task logs are built from per-task state captured at task start (idx, description, agent, start time); an out-of-order completion takes its output from the event and no longer steals or resets the current task's streamed steps/output.
- EmbeddingDimensionMismatchError now inherits ValueError instead of RuntimeError so background saves surface it through MemorySaveFailedEvent instead of silently dropping the save; the shutdown catch in _background_encode_batch is narrowed to the "cannot schedule new futures" case.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): declared project type wins over crew.json presence
A flow project that also contains a crew.json(c) file now runs and validates as the flow it declares in pyproject.toml instead of being hijacked by the JSON crew path. Both crewai run (_has_json_crew) and deploy validation (_is_json_crew) check tool.crewai.type; a missing or unreadable pyproject still means a bare JSON crew project.
Also documents why StepObservationFailedEvent intentionally marks the plan step "done": the event signals an observer failure, not a step failure, and the executor continues past it.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix(cli): type the declared_type locals so mypy stays clean
Comparing an Any-typed .get() chain returns Any, which tripped no-any-return on the previous commit.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
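The custom tool sandboxing rule in the commit notes (names limited to `[A-Za-z_][A-Za-z0-9_]*`, resolved path kept inside the project's `tools/` directory) might look roughly like the sketch below. This is not crewAI's code: the function name `resolve_custom_tool_path`, the `.py` suffix, and the use of `ValueError` (the notes mention `JSONProjectError` for tool resolution failures) are assumptions made only for illustration.

```python
import re
from pathlib import Path

# Hypothetical constant: the pattern itself is quoted from the commit notes.
_CUSTOM_TOOL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def resolve_custom_tool_path(project_root: Path, name: str) -> Path:
    """Map a `custom:<name>` reference to a file under `<project_root>/tools/`.

    Assumes tools live in `<name>.py`; the real loader may use another layout.
    """
    # Step 1: reject anything that is not a plain identifier, which already
    # rules out "../foo", absolute paths, and names containing separators.
    if not _CUSTOM_TOOL_NAME.fullmatch(name):
        raise ValueError(f"invalid custom tool name: {name!r}")

    # Step 2: resolve relative to the project root (not the cwd) and check
    # containment after symlinks are followed, as a second line of defense.
    tools_dir = (project_root / "tools").resolve()
    candidate = (tools_dir / f"{name}.py").resolve()
    if not candidate.is_relative_to(tools_dir):
        raise ValueError(f"custom tool {name!r} resolves outside {tools_dir}")
    return candidate
```

Checking the name first keeps error messages specific, while the containment check still catches a symlink inside `tools/` that points elsewhere. `Path.is_relative_to` requires Python 3.9 or newer.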
1295 lines
43 KiB
Python
"""Integration tests for native tool calling functionality.

These tests verify that agents can use native function calling
when the LLM supports it, across multiple providers.
"""

from __future__ import annotations

from collections.abc import Generator
import os
import threading
import time
from collections import Counter
from unittest.mock import Mock, patch

import pytest
from pydantic import BaseModel, Field

from crewai import Agent, Crew, Task
from crewai.agents.parser import AgentFinish
from crewai.events import crewai_event_bus
from crewai.hooks import register_after_tool_call_hook, register_before_tool_call_hook
from crewai.hooks.tool_hooks import ToolCallHookContext
from crewai.llm import LLM
from crewai.tools.base_tool import BaseTool


class CalculatorInput(BaseModel):
    """Input schema for calculator tool."""

    expression: str = Field(description="Mathematical expression to evaluate")


class CalculatorTool(BaseTool):
    """A calculator tool that performs mathematical calculations."""

    name: str = "calculator"
    description: str = "Perform mathematical calculations. Use this for any math operations."
    args_schema: type[BaseModel] = CalculatorInput

    def _run(self, expression: str) -> str:
        """Execute the calculation."""
        try:
            # eval() is NOT safe for untrusted input; it is tolerated here only
            # because this tool exists solely for tests with fixed prompts.
            result = eval(expression)  # noqa: S307
            return f"The result of {expression} is {result}"
        except Exception as e:
            return f"Error calculating {expression}: {e}"


class WeatherInput(BaseModel):
    """Input schema for weather tool."""

    location: str = Field(description="City name to get weather for")


class WeatherTool(BaseTool):
    """A mock weather tool for testing."""

    name: str = "get_weather"
    description: str = "Get the current weather for a location"
    args_schema: type[BaseModel] = WeatherInput

    def _run(self, location: str) -> str:
        """Get weather (mock implementation)."""
        return f"The weather in {location} is sunny with a temperature of 72°F"


class FailingTool(BaseTool):
    """A tool that always fails."""

    name: str = "failing_tool"
    description: str = "This tool always fails"

    def _run(self) -> str:
        raise Exception("This tool always fails")


class LocalSearchInput(BaseModel):
    query: str = Field(description="Search query")


class ParallelProbe:
    """Thread-safe in-memory recorder for tool execution windows."""

    _lock = threading.Lock()
    _windows: list[tuple[str, float, float]] = []

    @classmethod
    def reset(cls) -> None:
        with cls._lock:
            cls._windows = []

    @classmethod
    def record(cls, tool_name: str, start: float, end: float) -> None:
        with cls._lock:
            cls._windows.append((tool_name, start, end))

    @classmethod
    def windows(cls) -> list[tuple[str, float, float]]:
        with cls._lock:
            return list(cls._windows)


def _parallel_prompt() -> str:
    return (
        "This is a tool-calling compliance test. "
        "In your next assistant turn, emit exactly 3 tool calls in the same response (parallel tool calls), in this order: "
        "1) parallel_local_search_one(query='latest OpenAI model release notes'), "
        "2) parallel_local_search_two(query='latest Anthropic model release notes'), "
        "3) parallel_local_search_three(query='latest Gemini model release notes'). "
        "Do not call any other tools and do not answer before those 3 tool calls are emitted. "
        "After the tool results return, provide a one paragraph summary."
    )


def _max_concurrency(windows: list[tuple[str, float, float]]) -> int:
    points: list[tuple[float, int]] = []
    for _, start, end in windows:
        points.append((start, 1))
        points.append((end, -1))
    points.sort(key=lambda p: (p[0], p[1]))

    current = 0
    maximum = 0
    for _, delta in points:
        current += delta
        if current > maximum:
            maximum = current
    return maximum


def _assert_tools_overlapped() -> None:
    windows = ParallelProbe.windows()
    local_windows = [
        w
        for w in windows
        if w[0].startswith("parallel_local_search_")
    ]

    assert len(local_windows) >= 3, f"Expected at least 3 local tool calls, got {len(local_windows)}"
    assert _max_concurrency(local_windows) >= 2, "Expected overlapping local tool executions"


@pytest.fixture
def calculator_tool() -> CalculatorTool:
    """Create a calculator tool for testing."""
    return CalculatorTool()


@pytest.fixture
def weather_tool() -> WeatherTool:
    """Create a weather tool for testing."""
    return WeatherTool()


@pytest.fixture
def failing_tool() -> BaseTool:
    """Create a tool that always fails, for testing error handling."""
    return FailingTool()


@pytest.fixture
def parallel_tools() -> list[BaseTool]:
    """Create local tools used to verify native parallel execution deterministically."""
    # Start each test from an empty probe so windows recorded by an earlier
    # test cannot satisfy this test's overlap assertion.
    ParallelProbe.reset()

    class ParallelLocalSearchOne(BaseTool):
        name: str = "parallel_local_search_one"
        description: str = "Local search tool #1 for concurrency testing."
        args_schema: type[BaseModel] = LocalSearchInput

        def _run(self, query: str) -> str:
            start = time.perf_counter()
            time.sleep(1.0)
            end = time.perf_counter()
            ParallelProbe.record(self.name, start, end)
            return f"[one] {query}"

    class ParallelLocalSearchTwo(BaseTool):
        name: str = "parallel_local_search_two"
        description: str = "Local search tool #2 for concurrency testing."
        args_schema: type[BaseModel] = LocalSearchInput

        def _run(self, query: str) -> str:
            start = time.perf_counter()
            time.sleep(1.0)
            end = time.perf_counter()
            ParallelProbe.record(self.name, start, end)
            return f"[two] {query}"

    class ParallelLocalSearchThree(BaseTool):
        name: str = "parallel_local_search_three"
        description: str = "Local search tool #3 for concurrency testing."
        args_schema: type[BaseModel] = LocalSearchInput

        def _run(self, query: str) -> str:
            start = time.perf_counter()
            time.sleep(1.0)
            end = time.perf_counter()
            ParallelProbe.record(self.name, start, end)
            return f"[three] {query}"

    return [
        ParallelLocalSearchOne(),
        ParallelLocalSearchTwo(),
        ParallelLocalSearchThree(),
    ]


def _attach_parallel_probe_handler() -> None:
    # Imported locally because the module-level imports do not bring in the
    # event class; without this the decorator below raises NameError.
    from crewai.events.types.tool_usage_events import ToolUsageFinishedEvent

    @crewai_event_bus.on(ToolUsageFinishedEvent)
    def _capture_tool_window(_source, event: ToolUsageFinishedEvent):
        if not event.tool_name.startswith("parallel_local_search_"):
            return
        ParallelProbe.record(
            event.tool_name,
            event.started_at.timestamp(),
            event.finished_at.timestamp(),
        )


# OpenAI Provider Tests


class TestOpenAINativeToolCalling:
    """Tests for native tool calling with OpenAI models."""

    @pytest.mark.vcr()
    def test_openai_agent_with_native_tool_calling(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test OpenAI agent can use native tool calling."""
        agent = Agent(
            role="Math Assistant",
            goal="Help users with mathematical calculations",
            backstory="You are a helpful math assistant.",
            tools=[calculator_tool],
            llm=LLM(model="gpt-4o-mini"),
            verbose=False,
            max_iter=3,
        )

        task = Task(
            description="Calculate what is 15 * 8",
            expected_output="The result of the calculation",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.raw is not None
        assert "120" in str(result.raw)

    def test_openai_agent_kickoff_with_tools_mocked(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test OpenAI agent kickoff with mocked LLM call."""
        llm = LLM(model="gpt-5-nano")

        with patch.object(llm, "call", return_value="The answer is 120.") as mock_call:
            agent = Agent(
                role="Math Assistant",
                goal="Calculate math",
                backstory="You calculate.",
                tools=[calculator_tool],
                llm=llm,
                verbose=False,
            )

            task = Task(
                description="Calculate 15 * 8",
                expected_output="Result",
                agent=agent,
            )

            crew = Crew(agents=[agent], tasks=[task])
            result = crew.kickoff()

            assert mock_call.called
            assert result is not None

    @pytest.mark.vcr()
    @pytest.mark.timeout(180)
    def test_openai_parallel_native_tool_calling_test_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="gpt-5-nano", temperature=1),
            verbose=False,
            max_iter=3,
        )
        task = Task(
            description=_parallel_prompt(),
            expected_output="A one sentence summary of both tool outputs",
            agent=agent,
        )
        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    @pytest.mark.timeout(180)
    def test_openai_parallel_native_tool_calling_test_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="gpt-4o-mini"),
            verbose=False,
            max_iter=3,
        )
        result = agent.kickoff(_parallel_prompt())
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    @pytest.mark.timeout(180)
    def test_openai_parallel_native_tool_calling_tool_hook_parity_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        hook_calls: dict[str, list[dict[str, str]]] = {"before": [], "after": []}

        def before_hook(context: ToolCallHookContext) -> bool | None:
            if context.tool_name.startswith("parallel_local_search_"):
                hook_calls["before"].append(
                    {
                        "tool_name": context.tool_name,
                        "query": str(context.tool_input.get("query", "")),
                    }
                )
            return None

        def after_hook(context: ToolCallHookContext) -> str | None:
            if context.tool_name.startswith("parallel_local_search_"):
                hook_calls["after"].append(
                    {
                        "tool_name": context.tool_name,
                        "query": str(context.tool_input.get("query", "")),
                    }
                )
            return None

        register_before_tool_call_hook(before_hook)
        register_after_tool_call_hook(after_hook)

        try:
            agent = Agent(
                role="Parallel Tool Agent",
                goal="Use both tools exactly as instructed",
                backstory="You follow tool instructions precisely.",
                tools=parallel_tools,
                llm=LLM(model="gpt-5-nano", temperature=1),
                verbose=False,
                max_iter=3,
            )
            task = Task(
                description=_parallel_prompt(),
                expected_output="A one sentence summary of both tool outputs",
                agent=agent,
            )
            crew = Crew(agents=[agent], tasks=[task])
            result = crew.kickoff()

            assert result is not None
            _assert_tools_overlapped()

            before_names = [call["tool_name"] for call in hook_calls["before"]]
            after_names = [call["tool_name"] for call in hook_calls["after"]]
            assert len(before_names) >= 3, "Expected before hooks for all parallel calls"
            assert Counter(before_names) == Counter(after_names)
            assert all(call["query"] for call in hook_calls["before"])
            assert all(call["query"] for call in hook_calls["after"])
        finally:
            from crewai.hooks import (
                unregister_after_tool_call_hook,
                unregister_before_tool_call_hook,
            )

            unregister_before_tool_call_hook(before_hook)
            unregister_after_tool_call_hook(after_hook)

    @pytest.mark.vcr()
    @pytest.mark.timeout(180)
    def test_openai_parallel_native_tool_calling_tool_hook_parity_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        hook_calls: dict[str, list[dict[str, str]]] = {"before": [], "after": []}

        def before_hook(context: ToolCallHookContext) -> bool | None:
            if context.tool_name.startswith("parallel_local_search_"):
                hook_calls["before"].append(
                    {
                        "tool_name": context.tool_name,
                        "query": str(context.tool_input.get("query", "")),
                    }
                )
            return None

        def after_hook(context: ToolCallHookContext) -> str | None:
            if context.tool_name.startswith("parallel_local_search_"):
                hook_calls["after"].append(
                    {
                        "tool_name": context.tool_name,
                        "query": str(context.tool_input.get("query", "")),
                    }
                )
            return None

        register_before_tool_call_hook(before_hook)
        register_after_tool_call_hook(after_hook)

        try:
            agent = Agent(
                role="Parallel Tool Agent",
                goal="Use both tools exactly as instructed",
                backstory="You follow tool instructions precisely.",
                tools=parallel_tools,
                llm=LLM(model="gpt-5-nano", temperature=1),
                verbose=False,
                max_iter=3,
            )
            result = agent.kickoff(_parallel_prompt())

            assert result is not None
            _assert_tools_overlapped()

            before_names = [call["tool_name"] for call in hook_calls["before"]]
            after_names = [call["tool_name"] for call in hook_calls["after"]]
            assert len(before_names) >= 3, "Expected before hooks for all parallel calls"
            assert Counter(before_names) == Counter(after_names)
            assert all(call["query"] for call in hook_calls["before"])
            assert all(call["query"] for call in hook_calls["after"])
        finally:
            from crewai.hooks import (
                unregister_after_tool_call_hook,
                unregister_before_tool_call_hook,
            )

            unregister_before_tool_call_hook(before_hook)
            unregister_after_tool_call_hook(after_hook)


# Anthropic Provider Tests


class TestAnthropicNativeToolCalling:
    """Tests for native tool calling with Anthropic models."""

    @pytest.fixture(autouse=True)
    def mock_anthropic_api_key(self):
        """Mock ANTHROPIC_API_KEY for tests."""
        if "ANTHROPIC_API_KEY" not in os.environ:
            with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
                yield
        else:
            yield

    @pytest.mark.vcr()
    def test_anthropic_agent_with_native_tool_calling(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Anthropic agent can use native tool calling."""
        agent = Agent(
            role="Math Assistant",
            goal="Help users with mathematical calculations",
            backstory="You are a helpful math assistant.",
            tools=[calculator_tool],
            llm=LLM(model="anthropic/claude-3-5-haiku-20241022"),
            verbose=False,
            max_iter=3,
        )

        task = Task(
            description="Calculate what is 15 * 8",
            expected_output="The result of the calculation",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.raw is not None

    def test_anthropic_agent_kickoff_with_tools_mocked(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Anthropic agent kickoff with mocked LLM call."""
        llm = LLM(model="anthropic/claude-3-5-haiku-20241022")

        with patch.object(llm, "call", return_value="The answer is 120.") as mock_call:
            agent = Agent(
                role="Math Assistant",
                goal="Calculate math",
                backstory="You calculate.",
                tools=[calculator_tool],
                llm=llm,
                verbose=False,
            )

            task = Task(
                description="Calculate 15 * 8",
                expected_output="Result",
                agent=agent,
            )

            crew = Crew(agents=[agent], tasks=[task])
            result = crew.kickoff()

            assert mock_call.called
            assert result is not None

    @pytest.mark.vcr()
    def test_anthropic_parallel_native_tool_calling_test_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="anthropic/claude-sonnet-4-6"),
            verbose=False,
            max_iter=3,
        )
        task = Task(
            description=_parallel_prompt(),
            expected_output="A one sentence summary of both tool outputs",
            agent=agent,
        )
        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    def test_anthropic_parallel_native_tool_calling_test_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="anthropic/claude-sonnet-4-6"),
            verbose=False,
            max_iter=3,
        )
        result = agent.kickoff(_parallel_prompt())
        assert result is not None
        _assert_tools_overlapped()


# Google/Gemini Provider Tests


class TestGeminiNativeToolCalling:
    """Tests for native tool calling with Gemini models."""

    @pytest.fixture(autouse=True)
    def mock_google_api_key(self):
        """Mock GOOGLE_API_KEY for tests."""
        if "GOOGLE_API_KEY" not in os.environ and "GEMINI_API_KEY" not in os.environ:
            with patch.dict(os.environ, {"GOOGLE_API_KEY": "test-key"}):
                yield
        else:
            yield

    @pytest.mark.vcr()
    def test_gemini_agent_with_native_tool_calling(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Gemini agent can use native tool calling."""

        agent = Agent(
            role="Math Assistant",
            goal="Help users with mathematical calculations",
            backstory="You are a helpful math assistant.",
            tools=[calculator_tool],
            llm=LLM(model="gemini/gemini-2.5-flash"),
        )

        task = Task(
            description="Calculate what is 15 * 8",
            expected_output="The result of the calculation",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.raw is not None

    def test_gemini_agent_kickoff_with_tools_mocked(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Gemini agent kickoff with mocked LLM call."""
        llm = LLM(model="gemini/gemini-2.5-flash")

        with patch.object(llm, "call", return_value="The answer is 120.") as mock_call:
            agent = Agent(
                role="Math Assistant",
                goal="Calculate math",
                backstory="You calculate.",
                tools=[calculator_tool],
                llm=llm,
                verbose=False,
            )

            task = Task(
                description="Calculate 15 * 8",
                expected_output="Result",
                agent=agent,
            )

            crew = Crew(agents=[agent], tasks=[task])
            result = crew.kickoff()

            assert mock_call.called
            assert result is not None

    @pytest.mark.vcr()
    def test_gemini_parallel_native_tool_calling_test_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="gemini/gemini-2.5-flash"),
            verbose=False,
            max_iter=3,
        )
        task = Task(
            description=_parallel_prompt(),
            expected_output="A one sentence summary of both tool outputs",
            agent=agent,
        )
        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    def test_gemini_parallel_native_tool_calling_test_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="gemini/gemini-2.5-flash"),
            verbose=False,
            max_iter=3,
        )
        result = agent.kickoff(_parallel_prompt())
        assert result is not None
        _assert_tools_overlapped()


# Azure Provider Tests


class TestAzureNativeToolCalling:
    """Tests for native tool calling with Azure OpenAI models."""

    @pytest.fixture(autouse=True)
    def mock_azure_env(self):
        """Mock Azure environment variables for tests."""
        env_vars = {
            "AZURE_API_KEY": "test-key",
            "AZURE_API_BASE": "https://test.openai.azure.com",
            "AZURE_API_VERSION": "2024-02-15-preview",
        }
        if "AZURE_API_KEY" not in os.environ:
            with patch.dict(os.environ, env_vars):
                yield
        else:
            yield

    @pytest.mark.vcr()
    def test_azure_agent_with_native_tool_calling(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Azure agent can use native tool calling."""
        agent = Agent(
            role="Math Assistant",
            goal="Help users with mathematical calculations",
            backstory="You are a helpful math assistant.",
            tools=[calculator_tool],
            llm=LLM(model="azure/gpt-5-nano"),
            verbose=False,
            max_iter=3,
        )

        task = Task(
            description="Calculate what is 15 * 8",
            expected_output="The result of the calculation",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.raw is not None
        assert "120" in str(result.raw)

    def test_azure_agent_kickoff_with_tools_mocked(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Azure agent kickoff with mocked LLM call."""
        llm = LLM(
            model="azure/gpt-5-nano",
            api_key="test-key",
            base_url="https://test.openai.azure.com",
        )

        with patch.object(llm, "call", return_value="The answer is 120.") as mock_call:
            agent = Agent(
                role="Math Assistant",
                goal="Calculate math",
                backstory="You calculate.",
                tools=[calculator_tool],
                llm=llm,
                verbose=False,
            )

            task = Task(
                description="Calculate 15 * 8",
                expected_output="Result",
                agent=agent,
            )

            crew = Crew(agents=[agent], tasks=[task])
            result = crew.kickoff()

            assert mock_call.called
            assert result is not None

    @pytest.mark.vcr()
    def test_azure_parallel_native_tool_calling_test_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="azure/gpt-5-nano"),
            verbose=False,
            max_iter=3,
        )
        task = Task(
            description=_parallel_prompt(),
            expected_output="A one sentence summary of both tool outputs",
            agent=agent,
        )
        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    def test_azure_parallel_native_tool_calling_test_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="azure/gpt-5-nano"),
            verbose=False,
            max_iter=3,
        )
        result = agent.kickoff(_parallel_prompt())
        assert result is not None
        _assert_tools_overlapped()


# Bedrock Provider Tests


class TestBedrockNativeToolCalling:
    """Tests for native tool calling with AWS Bedrock models."""

    @pytest.fixture(autouse=True)
    def validate_bedrock_credentials_for_live_recording(self):
        """Run Bedrock tests only when explicitly enabled."""
        run_live_bedrock = os.getenv("RUN_BEDROCK_LIVE_TESTS", "false").lower() == "true"

        if not run_live_bedrock:
            pytest.skip(
                "Skipping Bedrock tests by default. "
                "Set RUN_BEDROCK_LIVE_TESTS=true with valid AWS credentials to enable."
            )

        access_key = os.getenv("AWS_ACCESS_KEY_ID", "")
        secret_key = os.getenv("AWS_SECRET_ACCESS_KEY", "")
        if (
            not access_key
            or not secret_key
            or access_key.startswith(("fake-", "test-"))
            or secret_key.startswith(("fake-", "test-"))
        ):
            pytest.skip(
                "Skipping Bedrock tests: valid AWS credentials are required when "
                "RUN_BEDROCK_LIVE_TESTS=true."
            )

        yield

    @pytest.mark.vcr()
    def test_bedrock_agent_kickoff_with_tools_mocked(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test Bedrock agent kickoff with a calculator tool.

        Despite the test name, the LLM call is not patched here; the test
        relies on the VCR-marked request instead.
        """
        llm = LLM(model="bedrock/us.anthropic.claude-sonnet-4-6")

        agent = Agent(
            role="Math Assistant",
            goal="Calculate math",
            backstory="You calculate.",
            tools=[calculator_tool],
            llm=llm,
            verbose=False,
            max_iter=5,
        )

        task = Task(
            description="Calculate 15 * 8",
            expected_output="Result",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.raw is not None
        assert "120" in str(result.raw)

    @pytest.mark.vcr()
    def test_bedrock_parallel_native_tool_calling_test_crew(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="bedrock/us.anthropic.claude-sonnet-4-6"),
            verbose=False,
            max_iter=3,
        )
        task = Task(
            description=_parallel_prompt(),
            expected_output="A one sentence summary of both tool outputs",
            agent=agent,
        )
        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()
        assert result is not None
        _assert_tools_overlapped()

    @pytest.mark.vcr()
    def test_bedrock_parallel_native_tool_calling_test_agent_kickoff(
        self, parallel_tools: list[BaseTool]
    ) -> None:
        agent = Agent(
            role="Parallel Tool Agent",
            goal="Use both tools exactly as instructed",
            backstory="You follow tool instructions precisely.",
            tools=parallel_tools,
            llm=LLM(model="bedrock/us.anthropic.claude-sonnet-4-6"),
            verbose=False,
            max_iter=3,
        )
        result = agent.kickoff(_parallel_prompt())
        assert result is not None
        _assert_tools_overlapped()


# Cross-Provider Native Tool Calling Behavior Tests


class TestNativeToolCallingBehavior:
    """Tests for native tool calling behavior across providers."""

    def test_supports_function_calling_check(self) -> None:
        """Test that supports_function_calling() is properly checked."""
        # OpenAI should support function calling
        openai_llm = LLM(model="gpt-5-nano")
        assert hasattr(openai_llm, "supports_function_calling")
        assert openai_llm.supports_function_calling() is True

    def test_anthropic_supports_function_calling(self) -> None:
        """Test that Anthropic models support function calling."""
        with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
            llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
            assert hasattr(llm, "supports_function_calling")
            assert llm.supports_function_calling() is True

    def test_gemini_supports_function_calling(self) -> None:
        """Test that Gemini models support function calling."""
        llm = LLM(model="gemini/gemini-2.5-flash")
        assert hasattr(llm, "supports_function_calling")
        assert llm.supports_function_calling() is True


# Token Usage Tests


class TestNativeToolCallingTokenUsage:
    """Tests for token usage with native tool calling."""

    @pytest.mark.vcr()
    def test_openai_native_tool_calling_token_usage(
        self, calculator_tool: CalculatorTool
    ) -> None:
        """Test token usage tracking with OpenAI native tool calling."""
        agent = Agent(
            role="Calculator",
            goal="Perform calculations efficiently",
            backstory="You calculate things.",
            tools=[calculator_tool],
            llm=LLM(model="gpt-5-nano"),
            verbose=False,
            max_iter=3,
        )

        task = Task(
            description="What is 100 / 4?",
            expected_output="The result",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert result.token_usage is not None
        assert result.token_usage.total_tokens > 0
        assert result.token_usage.successful_requests >= 1

        # Plain string: no placeholders, so no f-prefix is needed.
        print("\n[OPENAI NATIVE TOOL CALLING TOKEN USAGE]")
        print(f"  Prompt tokens: {result.token_usage.prompt_tokens}")
        print(f"  Completion tokens: {result.token_usage.completion_tokens}")
        print(f"  Total tokens: {result.token_usage.total_tokens}")


@pytest.mark.vcr()
def test_native_tool_calling_error_handling(failing_tool: FailingTool):
    """Test that native tool calling handles errors properly and emits error events."""
    import threading
    from crewai.events import crewai_event_bus
    from crewai.events.types.tool_usage_events import ToolUsageErrorEvent

    received_events = []
    event_received = threading.Event()

    @crewai_event_bus.on(ToolUsageErrorEvent)
    def handle_tool_error(source, event):
        received_events.append(event)
        event_received.set()

    agent = Agent(
        role="Calculator",
        goal="Perform calculations efficiently",
        backstory="You calculate things.",
        tools=[failing_tool],
        llm=LLM(model="gpt-5-nano"),
        verbose=False,
        max_iter=3,
    )

    result = agent.kickoff("Use the failing_tool to do something.")
    assert result is not None

    assert event_received.wait(timeout=10), "ToolUsageErrorEvent was not emitted"
    assert len(received_events) >= 1

    error_event = received_events[0]
    assert error_event.tool_name == "failing_tool"
    assert error_event.agent_role == agent.role
    assert "This tool always fails" in str(error_event.error)


# Max Usage Count Tests for Native Tool Calling


class CountingInput(BaseModel):
    """Input schema for counting tool."""

    value: str = Field(description="Value to count")


class CountingTool(BaseTool):
    """A tool that counts its usage."""

    name: str = "counting_tool"
    description: str = "A tool that counts how many times it's been called"
    args_schema: type[BaseModel] = CountingInput

    def _run(self, value: str) -> str:
        """Return the value with a count prefix."""
        return f"Counted: {value}"


class TestMaxUsageCountWithNativeToolCalling:
    """Tests for max_usage_count with native tool calling."""

    @pytest.mark.vcr()
    def test_max_usage_count_tracked_in_native_tool_calling(self) -> None:
        """Test that max_usage_count is properly tracked when using native tool calling."""
        tool = CountingTool(max_usage_count=3)

        assert tool.max_usage_count == 3
        assert tool.current_usage_count == 0

        agent = Agent(
            role="Counting Agent",
            goal="Call the counting tool multiple times",
            backstory="You are an agent that counts things.",
            tools=[tool],
            llm=LLM(model="gpt-5-nano"),
            verbose=False,
            max_iter=5,
        )

        task = Task(
            description="Call the counting_tool 3 times with values 'first', 'second', and 'third'",
            expected_output="The results of the counting operations",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        crew.kickoff()

        assert tool.max_usage_count == 3
        assert tool.current_usage_count <= tool.max_usage_count

    @pytest.mark.vcr()
    def test_max_usage_count_limit_enforced_in_native_tool_calling(self) -> None:
        """Test that when max_usage_count is reached, tool returns error message."""
        tool = CountingTool(max_usage_count=2)

        agent = Agent(
            role="Counting Agent",
            goal="Use the counting tool as many times as requested",
            backstory="You are an agent that counts things. You must try to use the tool for each value requested.",
            tools=[tool],
            llm=LLM(model="gpt-5-nano"),
            verbose=False,
            max_iter=5,
        )

        # Request more tool calls than the max_usage_count allows
        task = Task(
            description="Call the counting_tool 4 times with values 'one', 'two', 'three', and 'four'",
            expected_output="The results of the counting operations, noting any failures",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert tool.current_usage_count == tool.max_usage_count
        # After hitting the limit, further calls should have been rejected

    @pytest.mark.vcr()
    def test_tool_usage_increments_after_successful_execution(self) -> None:
        """Test that usage count increments after each successful native tool call."""
        tool = CountingTool(max_usage_count=10)

        assert tool.current_usage_count == 0

        agent = Agent(
            role="Counting Agent",
            goal="Use the counting tool exactly as requested",
            backstory="You are an agent that counts things precisely.",
            tools=[tool],
            llm=LLM(model="gpt-5-nano"),
            verbose=False,
            max_iter=5,
        )

        task = Task(
            description="Call the counting_tool exactly 2 times: first with value 'alpha', then with value 'beta'",
            expected_output="The results showing both 'Counted: alpha' and 'Counted: beta'",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result is not None
        assert tool.current_usage_count >= 2
        assert tool.current_usage_count <= tool.max_usage_count


# JSON Parse Error Handling Tests


class TestNativeToolCallingJsonParseError:
    """Tests that malformed JSON tool arguments produce clear errors
    instead of silently dropping all arguments."""

    def _make_executor(self, tools: list[BaseTool]) -> "CrewAgentExecutor":
        """Create a minimal CrewAgentExecutor with mocked dependencies."""
        from crewai.agents.crew_agent_executor import CrewAgentExecutor
        from crewai.tools.base_tool import to_langchain

        structured_tools = to_langchain(tools)
        mock_agent = Mock()
        mock_agent.key = "test_agent"
        mock_agent.role = "tester"
        mock_agent.verbose = False
        mock_agent.fingerprint = None
        mock_agent.tools_results = []

        mock_task = Mock()
        mock_task.name = "test"
        mock_task.description = "test"
        mock_task.id = "test-id"

        executor = CrewAgentExecutor(
            tools=structured_tools,
            original_tools=tools,
        )
        executor.agent = mock_agent
        executor.task = mock_task
        return executor

    def test_malformed_json_returns_parse_error(self) -> None:
        """Malformed JSON args must return a descriptive error, not silently become {}."""

        class CodeTool(BaseTool):
            name: str = "execute_code"
            description: str = "Run code"

            def _run(self, code: str) -> str:
                return f"ran: {code}"

        tool = CodeTool()
        executor = self._make_executor([tool])

        from crewai.utilities.agent_utils import convert_tools_to_openai_schema
        _, available_functions, _ = convert_tools_to_openai_schema([tool])

        malformed_json = '{"code": "print("hello")"}'

        result = executor._execute_single_native_tool_call(
            call_id="call_123",
            func_name="execute_code",
            func_args=malformed_json,
            available_functions=available_functions,
        )

        assert "Failed to parse tool arguments as JSON" in result["result"]
        assert tool.current_usage_count == 0

    def test_valid_json_still_executes_normally(self) -> None:
        """Valid JSON args should execute the tool as before."""

        class CodeTool(BaseTool):
            name: str = "execute_code"
            description: str = "Run code"

            def _run(self, code: str) -> str:
                return f"ran: {code}"

        tool = CodeTool()
        executor = self._make_executor([tool])

        from crewai.utilities.agent_utils import convert_tools_to_openai_schema
        _, available_functions, _ = convert_tools_to_openai_schema([tool])

        valid_json = '{"code": "print(1)"}'

        result = executor._execute_single_native_tool_call(
            call_id="call_456",
            func_name="execute_code",
            func_args=valid_json,
            available_functions=available_functions,
        )

        assert result["result"] == "ran: print(1)"

    def test_native_tool_loop_falls_back_when_provider_rejects_tools(self) -> None:
        """Unsupported native tools errors should continue through ReAct."""

        class SearchTool(BaseTool):
            name: str = "search"
            description: str = "Search for information"

            def _run(self, query: str) -> str:
                return f"result for {query}"

        executor = self._make_executor([SearchTool()])
        executor.llm = Mock()
        executor.messages = [{"role": "user", "content": "Search for CrewAI"}]
        executor.callbacks = []
        executor.iterations = 0
        executor.max_iter = 3
        executor.request_within_rpm_limit = None
        executor.respect_context_window = False

        fallback_finish = AgentFinish(
            thought="done",
            output="final",
            text="Final Answer: final",
        )
        with (
            patch(
                "crewai.agents.crew_agent_executor.get_llm_response",
                side_effect=RuntimeError(
                    "registry.ollama.ai/library/mariner:latest does not support tools"
                ),
            ),
            patch.object(
                executor,
                "_invoke_loop_react",
                return_value=fallback_finish,
            ) as react_loop,
        ):
            result = executor._invoke_loop_native_tools()

        assert result is fallback_finish
        react_loop.assert_called_once()
        assert "Native tool calling is unavailable" in executor.messages[-1]["content"]
        assert "Action Input" in executor.messages[-1]["content"]

    def test_dict_args_bypass_json_parsing(self) -> None:
        """When func_args is already a dict, no JSON parsing occurs."""

        class CodeTool(BaseTool):
            name: str = "execute_code"
            description: str = "Run code"

            def _run(self, code: str) -> str:
                return f"ran: {code}"

        tool = CodeTool()
        executor = self._make_executor([tool])

        from crewai.utilities.agent_utils import convert_tools_to_openai_schema
        _, available_functions, _ = convert_tools_to_openai_schema([tool])

        result = executor._execute_single_native_tool_call(
            call_id="call_789",
            func_name="execute_code",
            func_args={"code": "x = 42"},
            available_functions=available_functions,
        )

        assert result["result"] == "ran: x = 42"

    def test_schema_validation_catches_missing_args_on_native_path(self) -> None:
        """The native function calling path should now enforce args_schema,
        catching missing required fields before _run is called."""

        class StrictTool(BaseTool):
            name: str = "strict_tool"
            description: str = "A tool with required args"

            def _run(self, code: str, language: str) -> str:
                return f"{language}: {code}"

        tool = StrictTool()
        executor = self._make_executor([tool])

        from crewai.utilities.agent_utils import convert_tools_to_openai_schema
        _, available_functions, _ = convert_tools_to_openai_schema([tool])

        result = executor._execute_single_native_tool_call(
            call_id="call_schema",
            func_name="strict_tool",
            func_args={"code": "print(1)"},
            available_functions=available_functions,
        )

        assert "Error" in result["result"]
        assert "validation failed" in result["result"].lower() or "missing" in result["result"].lower()