Compare commits

..

2 Commits

Author SHA1 Message Date
Devin AI
f46d19e193 fix: address PR feedback with improved validation, documentation, and tests
Co-Authored-By: Joe Moura <joao@crewai.com>
2025-04-03 11:09:30 +00:00
Devin AI
d8571dc196 feat: add ToolWithInstruction wrapper for tool-specific usage instructions (issue #2515)
Co-Authored-By: Joe Moura <joao@crewai.com>
2025-04-03 11:04:12 +00:00
9 changed files with 392 additions and 129 deletions

View File

@@ -267,6 +267,7 @@ In addition to the sequential process, you can use the hierarchical process, whi
- **Role-Based Agent Design**: Customize agents with specific roles, goals, and tools.
- **Autonomous Inter-Agent Delegation**: Agents can autonomously delegate tasks and inquire amongst themselves, enhancing problem-solving efficiency.
- **Flexible Task Management**: Define tasks with customizable tools and assign them to agents dynamically.
- **Tool Instructions**: Attach specific usage instructions to tools for better control over when and how agents use them.
- **Processes Driven**: Currently only supports `sequential` task execution and `hierarchical` processes, but more complex processes like consensual and autonomous are being worked on.
- **Save output as file**: Save the output of individual tasks as a file, so you can use it later.
- **Parse output as Pydantic or Json**: Parse the output of individual tasks as a Pydantic model or as a Json if you want to.

View File

@@ -0,0 +1,153 @@
# Tool Instructions
CrewAI allows you to provide specific instructions for when and how to use tools. This is useful when you want to guide agents on proper tool usage without cluttering their backstory.
## Basic Usage
```python
from crewai import Agent
from crewai_tools import ScrapeWebsiteTool
from crewai.tools import ToolWithInstruction

# Create a tool with instructions
scrape_tool = ScrapeWebsiteTool()
scrape_with_instructions = ToolWithInstruction(
    tool=scrape_tool,
    instructions="""
    ALWAYS use this tool when making a joke.
    NEVER use this tool when making a joke about someone's mom.
    """
)

# Use the tool with an agent
agent = Agent(
    role="Comedian",
    goal="Create hilarious and engaging jokes",
    backstory="""
    You are a professional stand-up comedian with years of experience in crafting jokes.
    You have a great sense of humor and can create jokes about any topic
    while keeping them appropriate and entertaining.
    """,
    tools=[scrape_with_instructions],
)
```
## Real-World Examples
### Example 1: Research Assistant with Web Search Tool
```python
from crewai import Agent
from crewai_tools import SearchTool
from crewai.tools import ToolWithInstruction

search_tool = SearchTool()
search_with_instructions = ToolWithInstruction(
    tool=search_tool,
    instructions="""
    Use this tool ONLY for factual information that requires up-to-date data.
    ALWAYS verify information by searching multiple sources.
    DO NOT use this tool for speculative questions or opinions.
    """
)

research_agent = Agent(
    role="Research Analyst",
    goal="Provide accurate and well-sourced information",
    backstory="You are a meticulous research analyst with attention to detail and fact-checking.",
    tools=[search_with_instructions],
)
```
### Example 2: Data Scientist with Multiple Analysis Tools
```python
from crewai import Agent
from crewai_tools import PythonTool, DataVisualizationTool
from crewai.tools import ToolWithInstruction

# Python tool for data processing
python_tool = PythonTool()
python_with_instructions = ToolWithInstruction(
    tool=python_tool,
    instructions="""
    Use this tool for data cleaning, transformation, and statistical analysis.
    ALWAYS include comments in your code.
    DO NOT use this tool for creating visualizations.
    """
)

# Visualization tool
viz_tool = DataVisualizationTool()
viz_with_instructions = ToolWithInstruction(
    tool=viz_tool,
    instructions="""
    Use this tool ONLY for creating data visualizations.
    ALWAYS label axes and include titles in your charts.
    PREFER simple visualizations that clearly communicate the main insight.
    """
)

data_scientist = Agent(
    role="Data Scientist",
    goal="Analyze data and create insightful visualizations",
    backstory="You are an experienced data scientist who excels at finding patterns in data.",
    tools=[python_with_instructions, viz_with_instructions],
)
```
## How Instructions Are Presented to Agents
When an agent considers using a tool, the instructions are included in the tool's description. For example, a tool with instructions might appear to the agent like this:
```
Tool: search_web
Description: Search the web for information on a given topic.
Instructions: Use this tool ONLY for factual information that requires up-to-date data.
ALWAYS verify information by searching multiple sources.
DO NOT use this tool for speculative questions or opinions.
```
This clear presentation helps the agent understand when and how to use the tool appropriately.
## Dynamically Updating Instructions
You can update tool instructions dynamically during execution:
```python
# Create a tool with initial instructions
search_with_instructions = ToolWithInstruction(
    tool=search_tool,
    instructions="Initial instructions for tool usage"
)

# Later, update the instructions based on new requirements
search_with_instructions.update_instructions("Updated instructions for tool usage")
```
## Error Handling and Best Practices
### Validation
The `ToolWithInstruction` class includes validation to ensure instructions are not empty and don't exceed a maximum length. If you provide invalid instructions, a `ValueError` will be raised.
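Below is a minimal sketch of that behavior (reusing the `ScrapeWebsiteTool` instance from the basic example above; the printed messages are illustrative of the validation errors, which the tests in this PR assert behave as `ValueError`s):

```python
from crewai_tools import ScrapeWebsiteTool
from crewai.tools import ToolWithInstruction

scrape_tool = ScrapeWebsiteTool()

# Empty or whitespace-only instructions are rejected at construction time
try:
    ToolWithInstruction(tool=scrape_tool, instructions="   ")
except ValueError as error:
    print(error)  # message includes "Instructions cannot be empty"

# Instructions longer than MAX_INSTRUCTION_LENGTH (2000 characters) are rejected
try:
    ToolWithInstruction(tool=scrape_tool, instructions="x" * 2001)
except ValueError as error:
    print(error)  # message includes "Instructions exceed maximum length of 2000 characters"
```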
### Best Practices for Writing Instructions
1. **Be specific and clear** about when to use and when not to use the tool
2. **Use imperative language** like "ALWAYS", "NEVER", "USE", "DO NOT USE"
3. **Keep instructions concise** but comprehensive
4. **Include examples** of good and bad usage scenarios when possible
5. **Format instructions** with line breaks for readability (see the sketch after this list)
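For instance, a hypothetical instruction string that follows these practices (the tool it describes is purely illustrative, not part of the API) might look like:

```python
instructions = """
USE this tool ONLY to look up current currency exchange rates.
DO NOT use it for historical rates older than one week.
ALWAYS cite the source URL in your final answer.

Good: "Convert 100 USD to EUR at today's rate."
Bad: "What was the USD/EUR rate in 1999?"
"""
```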
## When to Use Tool Instructions
Tool instructions are useful when:
1. You want to specify precise conditions for tool usage
2. You have multiple similar tools that should be used in different situations
3. You want to keep the agent's backstory focused on its role and personality, not technical details about tools
4. You need to provide technical guidance on how to format inputs or interpret outputs
5. You want to enforce consistent tool usage across multiple agents
Attaching usage guidelines to the tool itself is also semantically more appropriate than embedding them in the agent's backstory.
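To make the contrast concrete, here is a minimal sketch (reusing the illustrative `SearchTool` from Example 1) of the same guidance expressed both ways; the first version mixes tool mechanics into the persona, while the second keeps the backstory about the role and lets the rules travel with the tool:

```python
from crewai import Agent
from crewai_tools import SearchTool
from crewai.tools import ToolWithInstruction

search_tool = SearchTool()

# Without tool instructions: usage rules leak into the backstory
cluttered_agent = Agent(
    role="Research Analyst",
    goal="Provide accurate and well-sourced information",
    backstory=(
        "You are a meticulous research analyst. "
        "Only use the search tool for up-to-date facts and never for opinions."
    ),
    tools=[search_tool],
)

# With tool instructions: the backstory stays focused on the role
focused_agent = Agent(
    role="Research Analyst",
    goal="Provide accurate and well-sourced information",
    backstory="You are a meticulous research analyst with attention to detail and fact-checking.",
    tools=[
        ToolWithInstruction(
            tool=search_tool,
            instructions="Use this tool ONLY for up-to-date facts. DO NOT use it for opinions.",
        )
    ],
)
```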

View File

@@ -4,7 +4,6 @@ import uuid
import warnings
from concurrent.futures import Future
from hashlib import md5
from crewai.llm import LLM
from typing import Any, Callable, Dict, List, Optional, Tuple, Union
from pydantic import (
@@ -1076,36 +1075,19 @@ class Crew(BaseModel):
    def test(
        self,
        n_iterations: int,
        llm: Union[str, LLM],
        openai_model_name: Optional[str] = None,
        inputs: Optional[Dict[str, Any]] = None,
    ) -> None:
        """Test and evaluate the Crew with the given inputs for n iterations concurrently using concurrent.futures.

        Args:
            n_iterations: Number of test iterations to run
            llm: Language model to use for evaluation. Can be either a model name string (e.g. "gpt-4")
                or an LLM instance for custom implementations
            inputs: Optional dictionary of input values to use for task execution

        Example:
            ```python
            # Using model name string
            crew.test(n_iterations=3, llm="gpt-4")

            # Using custom LLM implementation
            custom_llm = LLM(model="custom-model")
            crew.test(n_iterations=3, llm=custom_llm)
            ```
        """
        """Test and evaluate the Crew with the given inputs for n iterations concurrently using concurrent.futures."""
        test_crew = self.copy()

        self._test_execution_span = test_crew._telemetry.test_execution_span(
            test_crew,
            n_iterations,
            inputs,
            str(llm) if isinstance(llm, LLM) else llm,
        )
        evaluator = CrewEvaluator(test_crew, llm)
            openai_model_name, # type: ignore[arg-type]
        ) # type: ignore[arg-type]
        evaluator = CrewEvaluator(test_crew, openai_model_name) # type: ignore[arg-type]

        for i in range(1, n_iterations + 1):
            evaluator.set_iteration(i)

View File

@@ -1 +1,2 @@
from .base_tool import BaseTool, tool
from .tool_with_instruction import ToolWithInstruction

View File

@@ -0,0 +1,110 @@
from typing import Any, List, Optional, Dict, Callable, Union, ClassVar

from pydantic import Field, model_validator, field_validator, ConfigDict

from crewai.tools.base_tool import BaseTool
from crewai.tools.structured_tool import CrewStructuredTool


class ToolWithInstruction(BaseTool):
    """A wrapper for tools that adds specific usage instructions.

    This allows users to provide specific instructions on when and how to use a tool,
    without having to include these instructions in the agent's backstory.

    Attributes:
        tool: The tool to wrap
        instructions: Specific instructions about when and how to use this tool
        name: Name of the tool (inherited from the wrapped tool)
        description: Description of the tool (inherited from the wrapped tool with instructions)
    """

    MAX_INSTRUCTION_LENGTH: ClassVar[int] = 2000

    name: str = Field(default="", description="Name of the tool")
    description: str = Field(default="", description="Description of the tool")
    tool: BaseTool = Field(description="The tool to wrap")
    instructions: str = Field(description="Instructions about when and how to use this tool")

    model_config = ConfigDict(arbitrary_types_allowed=True)

    @field_validator("instructions")
    @classmethod
    def validate_instructions(cls, value: str) -> str:
        """Validate that instructions are not empty and not too long.

        Args:
            value: The instructions string to validate

        Returns:
            str: The validated and sanitized instructions

        Raises:
            ValueError: If instructions are empty or exceed maximum length
        """
        if not value or not value.strip():
            raise ValueError("Instructions cannot be empty")
        if len(value) > cls.MAX_INSTRUCTION_LENGTH:
            raise ValueError(
                f"Instructions exceed maximum length of {cls.MAX_INSTRUCTION_LENGTH} characters"
            )
        return value.strip()

    @model_validator(mode="after")
    def set_tool_attributes(self) -> "ToolWithInstruction":
        """Sets name, description, and args_schema from the wrapped tool.

        Returns:
            ToolWithInstruction: The validated instance with updated attributes.
        """
        self.name = self.tool.name
        self.description = f"{self.tool.description}\nInstructions: {self.instructions}"
        self.args_schema = self.tool.args_schema
        return self

    def update_instructions(self, new_instructions: str) -> None:
        """Updates the tool's usage instructions.

        Args:
            new_instructions (str): New instructions for tool usage.

        Raises:
            ValueError: If new instructions are empty or exceed maximum length
        """
        if not new_instructions or not new_instructions.strip():
            raise ValueError("Instructions cannot be empty")
        if len(new_instructions) > self.MAX_INSTRUCTION_LENGTH:
            raise ValueError(
                f"Instructions exceed maximum length of {self.MAX_INSTRUCTION_LENGTH} characters"
            )
        self.instructions = new_instructions.strip()
        self.description = f"{self.tool.description}\nInstructions: {self.instructions}"

    def _run(self, *args: Any, **kwargs: Any) -> Any:
        """Run the wrapped tool.

        Args:
            *args: Positional arguments to pass to the wrapped tool
            **kwargs: Keyword arguments to pass to the wrapped tool

        Returns:
            Any: The result from the wrapped tool's _run method
        """
        return self.tool._run(*args, **kwargs)

    def to_structured_tool(self) -> CrewStructuredTool:
        """Convert this tool to a CrewStructuredTool instance.

        Returns:
            CrewStructuredTool: A structured tool with instructions included in the description
        """
        structured_tool = self.tool.to_structured_tool()
        structured_tool.description = f"{structured_tool.description}\nInstructions: {self.instructions}"
        return structured_tool

View File

@@ -1,16 +1,10 @@
from collections import defaultdict
from typing import Any, Dict, List, Optional, TypeVar, Union
from typing import DefaultDict # Separate import to avoid circular imports
from pydantic import BaseModel, Field
from rich.box import HEAVY_EDGE
from rich.console import Console
from rich.table import Table
from crewai.llm import LLM
T = TypeVar('T', bound=LLM)
from crewai.agent import Agent
from crewai.task import Task
from crewai.tasks.task_output import TaskOutput
@@ -34,47 +28,14 @@ class CrewEvaluator:
        iteration (int): The current iteration of the evaluation.
    """

    _tasks_scores: DefaultDict[int, List[float]] = Field(
        default_factory=lambda: defaultdict(list))
    _run_execution_times: DefaultDict[int, List[float]] = Field(
        default_factory=lambda: defaultdict(list))
    tasks_scores: defaultdict = defaultdict(list)
    run_execution_times: defaultdict = defaultdict(list)
    iteration: int = 0

    @property
    def tasks_scores(self) -> DefaultDict[int, List[float]]:
        return self._tasks_scores

    @tasks_scores.setter
    def tasks_scores(self, value: Dict[int, List[float]]) -> None:
        self._tasks_scores = defaultdict(list, value)

    @property
    def run_execution_times(self) -> DefaultDict[int, List[float]]:
        return self._run_execution_times

    @run_execution_times.setter
    def run_execution_times(self, value: Dict[int, List[float]]) -> None:
        self._run_execution_times = defaultdict(list, value)

    def __init__(self, crew, llm: Union[str, T]):
        """Initialize the CrewEvaluator.

        Args:
            crew: The Crew instance to evaluate
            llm: Language model to use for evaluation. Can be either a model name string
                or an LLM instance for custom implementations

        Raises:
            ValueError: If llm is None or invalid
        """
        if not llm:
            raise ValueError("Invalid LLM configuration")
    def __init__(self, crew, openai_model_name: str):
        self.crew = crew
        self.llm = LLM(model=llm) if isinstance(llm, str) else llm
        self.openai_model_name = openai_model_name
        self._telemetry = Telemetry()
        self._tasks_scores = defaultdict(list)
        self._run_execution_times = defaultdict(list)
        self._setup_for_evaluating()

    def _setup_for_evaluating(self) -> None:
@@ -90,7 +51,7 @@ class CrewEvaluator:
            ),
            backstory="Evaluator agent for crew evaluation with precise capabilities to evaluate the performance of the agents in the crew based on the tasks they have performed",
            verbose=False,
            llm=self.llm,
            llm=self.openai_model_name,
        )

    def _evaluation_task(
@@ -220,19 +181,11 @@ class CrewEvaluator:
                self.crew,
                evaluation_result.pydantic.quality,
                current_task._execution_time,
                self._get_llm_identifier(),
                self.openai_model_name,
            )
            self._tasks_scores[self.iteration].append(evaluation_result.pydantic.quality)
            self._run_execution_times[self.iteration].append(
            self.tasks_scores[self.iteration].append(evaluation_result.pydantic.quality)
            self.run_execution_times[self.iteration].append(
                current_task._execution_time
            )
        else:
            raise ValueError("Evaluation result is not in the expected format")

    def _get_llm_identifier(self) -> str:
        """Get a string identifier for the LLM instance.

        Returns:
            String representation of the LLM for telemetry
        """
        return str(self.llm) if isinstance(self.llm, LLM) else self.llm

View File

@@ -10,7 +10,6 @@ import instructor
import pydantic_core
import pytest
from crewai.llm import LLM
from crewai.agent import Agent
from crewai.agents.cache import CacheHandler
from crewai.crew import Crew
@@ -1124,7 +1123,7 @@ def test_kickoff_for_each_empty_input():
    assert results == []
@pytest.mark.vcr(filter_headeruvs=["authorization"])
@pytest.mark.vcr(filter_headers=["authorization"])
def test_kickoff_for_each_invalid_input():
    """Tests if kickoff_for_each raises TypeError for invalid input types."""
@@ -2829,7 +2828,7 @@ def test_crew_testing_function(kickoff_mock, copy_mock, crew_evaluator):
    copy_mock.return_value = crew

    n_iterations = 2
    crew.test(n_iterations, llm="gpt-4o-mini", inputs={"topic": "AI"})
    crew.test(n_iterations, openai_model_name="gpt-4o-mini", inputs={"topic": "AI"})

    # Ensure kickoff is called on the copied crew
    kickoff_mock.assert_has_calls(
@@ -2845,32 +2844,6 @@ def test_crew_testing_function(kickoff_mock, copy_mock, crew_evaluator):
        ]
    )


@mock.patch("crewai.crew.CrewEvaluator")
@mock.patch("crewai.crew.Crew.copy")
@mock.patch("crewai.crew.Crew.kickoff")
def test_crew_testing_with_custom_llm(kickoff_mock, copy_mock, crew_evaluator):
    task = Task(
        description="Test task",
        expected_output="Test output",
        agent=researcher,
    )
    crew = Crew(agents=[researcher], tasks=[task])
    copy_mock.return_value = crew

    custom_llm = LLM(model="gpt-4")
    crew.test(2, llm=custom_llm, inputs={"topic": "AI"})

    kickoff_mock.assert_has_calls([
        mock.call(inputs={"topic": "AI"}),
        mock.call(inputs={"topic": "AI"})
    ])

    crew_evaluator.assert_has_calls([
        mock.call(crew, custom_llm),
        mock.call().set_iteration(1),
        mock.call().set_iteration(2),
        mock.call().print_crew_evaluation_result(),
    ])
@pytest.mark.vcr(filter_headers=["authorization"])
def test_hierarchical_verbose_manager_agent():
@@ -3152,4 +3125,4 @@ def test_multimodal_agent_live_image_analysis():
    # Verify we got a meaningful response
    assert isinstance(result.raw, str)
    assert len(result.raw) > 100  # Expecting a detailed analysis
    assert "error" not in result.raw.lower()  # No error messages in response
    assert "error" not in result.raw.lower()  # No error messages in response

View File

@@ -0,0 +1,110 @@
import pytest
from unittest.mock import MagicMock, patch
from typing import Any, Dict, Optional

from crewai.tools.base_tool import BaseTool, Tool
from crewai.tools.tool_with_instruction import ToolWithInstruction


class MockTool(BaseTool):
    """Mock tool for testing."""

    name: str = "mock_tool"
    description: str = "A mock tool for testing"

    def _run(self, *args: Any, **kwargs: Any) -> str:
        return "mock result"


class TestToolWithInstruction:
    """Test suite for ToolWithInstruction."""

    def test_initialization(self):
        """Test tool initialization with instructions."""
        tool = MockTool()
        instructions = "Only use this tool for XYZ"
        wrapped_tool = ToolWithInstruction(tool=tool, instructions=instructions)

        assert wrapped_tool.name == tool.name
        assert "Instructions: Only use this tool for XYZ" in wrapped_tool.description
        assert wrapped_tool.args_schema == tool.args_schema

    def test_run_method(self):
        """Test that the run method delegates to the original tool."""
        tool = MockTool()
        instructions = "Only use this tool for XYZ"
        wrapped_tool = ToolWithInstruction(tool=tool, instructions=instructions)

        result = wrapped_tool.run()
        assert result == "mock result"

    def test_to_structured_tool(self):
        """Test that to_structured_tool includes instructions."""
        tool = MockTool()
        instructions = "Only use this tool for XYZ"
        wrapped_tool = ToolWithInstruction(tool=tool, instructions=instructions)

        structured_tool = wrapped_tool.to_structured_tool()
        assert "Instructions: Only use this tool for XYZ" in structured_tool.description

    def test_with_function_tool(self):
        """Test tool wrapping with a function tool."""

        def sample_func():
            return "sample result"

        tool = Tool(
            name="sample_tool",
            description="A sample tool",
            func=sample_func
        )
        instructions = "Only use this tool for XYZ"
        wrapped_tool = ToolWithInstruction(tool=tool, instructions=instructions)

        assert wrapped_tool.name == tool.name
        assert "Instructions: Only use this tool for XYZ" in wrapped_tool.description

    def test_empty_instructions(self):
        """Test that empty instructions raise ValueError."""
        tool = MockTool()

        with pytest.raises(ValueError, match="Instructions cannot be empty"):
            ToolWithInstruction(tool=tool, instructions="")

        with pytest.raises(ValueError, match="Instructions cannot be empty"):
            ToolWithInstruction(tool=tool, instructions=" ")

    def test_too_long_instructions(self):
        """Test that instructions exceeding maximum length raise ValueError."""
        tool = MockTool()
        long_instructions = "x" * (ToolWithInstruction.MAX_INSTRUCTION_LENGTH + 1)

        with pytest.raises(ValueError, match="Instructions exceed maximum length"):
            ToolWithInstruction(tool=tool, instructions=long_instructions)

    def test_update_instructions(self):
        """Test updating instructions dynamically."""
        tool = MockTool()
        initial_instructions = "Initial instructions"
        new_instructions = "Updated instructions"

        wrapped_tool = ToolWithInstruction(tool=tool, instructions=initial_instructions)
        assert "Instructions: Initial instructions" in wrapped_tool.description

        wrapped_tool.update_instructions(new_instructions)
        assert "Instructions: Updated instructions" in wrapped_tool.description
        assert wrapped_tool.instructions == new_instructions

    def test_update_instructions_validation(self):
        """Test validation when updating instructions."""
        tool = MockTool()
        wrapped_tool = ToolWithInstruction(tool=tool, instructions="Valid instructions")

        with pytest.raises(ValueError, match="Instructions cannot be empty"):
            wrapped_tool.update_instructions("")

        long_instructions = "x" * (ToolWithInstruction.MAX_INSTRUCTION_LENGTH + 1)
        with pytest.raises(ValueError, match="Instructions exceed maximum length"):
            wrapped_tool.update_instructions(long_instructions)

View File

@@ -2,7 +2,6 @@ from unittest import mock
import pytest
from crewai.llm import LLM
from crewai.agent import Agent
from crewai.crew import Crew
from crewai.task import Task
@@ -24,7 +23,7 @@ class TestCrewEvaluator:
        )
        crew = Crew(agents=[agent], tasks=[task])

        return CrewEvaluator(crew, llm="gpt-4o-mini")
        return CrewEvaluator(crew, openai_model_name="gpt-4o-mini")

    def test_setup_for_evaluating(self, crew_planner):
        crew_planner._setup_for_evaluating()
@@ -48,25 +47,6 @@ class TestCrewEvaluator:
        assert agent.verbose is False
        assert agent.llm.model == "gpt-4o-mini"

    @pytest.mark.parametrize("llm_input,expected_model", [
        (LLM(model="gpt-4"), "gpt-4"),
        ("gpt-4", "gpt-4"),
    ])
    def test_evaluator_with_llm_types(self, crew_planner, llm_input, expected_model):
        evaluator = CrewEvaluator(crew_planner.crew, llm_input)
        agent = evaluator._evaluator_agent()
        assert agent.llm.model == expected_model

    def test_evaluator_with_invalid_llm(self, crew_planner):
        with pytest.raises(ValueError, match="Invalid LLM configuration"):
            CrewEvaluator(crew_planner.crew, None)

    def test_evaluator_with_string_llm(self, crew_planner):
        evaluator = CrewEvaluator(crew_planner.crew, "gpt-4")
        agent = evaluator._evaluator_agent()
        assert isinstance(agent.llm, LLM)
        assert agent.llm.model == "gpt-4"

    def test_evaluation_task(self, crew_planner):
        evaluator_agent = Agent(
            role="Evaluator Agent",