docs: Add Tavily Search & Extractor tools to Search-Research suite (#3146)

* docs: Add Tavily Search and Extractor tools documentation

* docs: Add Tavily Search and Extractor tools to the documentation

---------

Co-authored-by: Tony Kipkemboi <iamtonykipkemboi@gmail.com>
Ranuga Disansa
2025-07-21 21:31:29 +05:30
committed by GitHub
parent 2fd99503ed
commit 424433ff58
4 changed files with 276 additions and 3 deletions


@@ -166,7 +166,9 @@
 "en/tools/search-research/websitesearchtool",
 "en/tools/search-research/codedocssearchtool",
 "en/tools/search-research/youtubechannelsearchtool",
-"en/tools/search-research/youtubevideosearchtool"
+"en/tools/search-research/youtubevideosearchtool",
+"en/tools/search-research/tavilysearchtool",
+"en/tools/search-research/tavilyextractortool"
 ]
 },
 {


@@ -44,6 +44,14 @@ These tools enable your agents to search the web, research topics, and find info
 <Card title="YouTube Video Search" icon="play" href="/en/tools/search-research/youtubevideosearchtool">
 Find and analyze YouTube videos by topic, keyword, or criteria.
 </Card>
+<Card title="Tavily Search Tool" icon="magnifying-glass" href="/en/tools/search-research/tavilysearchtool">
+Comprehensive web search using Tavily's AI-powered search API.
+</Card>
+<Card title="Tavily Extractor Tool" icon="file-text" href="/en/tools/search-research/tavilyextractortool">
+Extract structured content from web pages using the Tavily API.
+</Card>
 </CardGroup>
 ## **Common Use Cases**
@@ -55,17 +63,19 @@ These tools enable your agents to search the web, research topics, and find info
 - **Academic Research**: Find scholarly articles and technical papers
 ```python
-from crewai_tools import SerperDevTool, GitHubSearchTool, YoutubeVideoSearchTool
+from crewai_tools import SerperDevTool, GitHubSearchTool, YoutubeVideoSearchTool, TavilySearchTool, TavilyExtractorTool
 # Create research tools
 web_search = SerperDevTool()
 code_search = GitHubSearchTool()
 video_research = YoutubeVideoSearchTool()
+tavily_search = TavilySearchTool()
+content_extractor = TavilyExtractorTool()
 # Add to your agent
 agent = Agent(
 role="Research Analyst",
-tools=[web_search, code_search, video_research],
+tools=[web_search, code_search, video_research, tavily_search, content_extractor],
 goal="Gather comprehensive information on any topic"
 )
 ```


@@ -0,0 +1,139 @@
---
title: "Tavily Extractor Tool"
description: "Extract structured content from web pages using the Tavily API"
icon: "file-text"
---
The `TavilyExtractorTool` allows CrewAI agents to extract structured content from web pages using the Tavily API. It can process single URLs or lists of URLs and provides options for controlling the extraction depth and including images.
## Installation
To use the `TavilyExtractorTool`, install CrewAI's tools extra together with the `tavily-python` library:
```shell
pip install 'crewai[tools]' tavily-python
```
You also need to set your Tavily API key as an environment variable:
```bash
export TAVILY_API_KEY='your-tavily-api-key'
```
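A small pre-flight check can surface a missing key before any agent runs. The sketch below uses only the standard library; the helper name `require_tavily_key` is hypothetical and is not part of CrewAI or the Tavily SDK.

```python
import os


def require_tavily_key(env=None):
    """Return the Tavily API key, or raise if it is missing or blank.

    Hypothetical helper for illustration; not part of crewai_tools.
    """
    env = os.environ if env is None else env
    key = env.get("TAVILY_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TAVILY_API_KEY is not set; export it before creating the tool."
        )
    return key


if __name__ == "__main__":
    try:
        require_tavily_key()
        print("TAVILY_API_KEY found")
    except RuntimeError as exc:
        print(f"Configuration problem: {exc}")
```

Calling this once at startup turns a late request failure into an immediate, readable configuration error.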
## Example Usage
Here's how to initialize and use the `TavilyExtractorTool` within a CrewAI agent:
```python
import os
from crewai import Agent, Task, Crew
from crewai_tools import TavilyExtractorTool

# Ensure TAVILY_API_KEY is set in your environment
# os.environ["TAVILY_API_KEY"] = "YOUR_API_KEY"

# Initialize the tool
tavily_tool = TavilyExtractorTool()

# Create an agent that uses the tool
extractor_agent = Agent(
    role='Web Content Extractor',
    goal='Extract key information from specified web pages',
    backstory='You are an expert at extracting relevant content from websites using the Tavily API.',
    tools=[tavily_tool],
    verbose=True
)

# Define a task for the agent
extract_task = Task(
    description='Extract the main content from the URL https://example.com using basic extraction depth.',
    expected_output='A JSON string containing the extracted content from the URL.',
    agent=extractor_agent
)

# Create and run the crew
crew = Crew(
    agents=[extractor_agent],
    tasks=[extract_task],
    verbose=True  # Crew expects a boolean here in recent CrewAI releases
)
result = crew.kickoff()
print(result)
```
## Configuration Options
The `TavilyExtractorTool` accepts the following arguments:
- `urls` (Union[List[str], str]): **Required**. A single URL string or a list of URL strings to extract data from.
- `include_images` (Optional[bool]): Whether to include images in the extraction results. Defaults to `False`.
- `extract_depth` (Literal["basic", "advanced"]): The depth of extraction. Use `"basic"` for faster, surface-level extraction or `"advanced"` for more comprehensive extraction. Defaults to `"basic"`.
- `timeout` (int): The maximum time in seconds to wait for the extraction request to complete. Defaults to `60`.
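The argument rules above can be checked before they reach the tool. The following is a minimal sketch in plain Python, assuming only the types and defaults listed above; `normalize_extract_args` and `ALLOWED_DEPTHS` are hypothetical names introduced for illustration and are not part of `crewai_tools`.

```python
from typing import List, Union

# Values taken from the documented Literal["basic", "advanced"] type.
ALLOWED_DEPTHS = ("basic", "advanced")


def normalize_extract_args(
    urls: Union[List[str], str],
    extract_depth: str = "basic",
    include_images: bool = False,
    timeout: int = 60,
) -> dict:
    """Validate extractor arguments and return them as a plain dict.

    Hypothetical helper: it mirrors the documented options but does not
    call the Tavily API.
    """
    # A single URL string is wrapped so callers always get a list back.
    url_list = [urls] if isinstance(urls, str) else list(urls)
    if not url_list:
        raise ValueError("urls must contain at least one URL")
    if extract_depth not in ALLOWED_DEPTHS:
        raise ValueError(f"extract_depth must be one of {ALLOWED_DEPTHS}")
    if timeout <= 0:
        raise ValueError("timeout must be a positive number of seconds")
    return {
        "urls": url_list,
        "extract_depth": extract_depth,
        "include_images": include_images,
        "timeout": timeout,
    }


if __name__ == "__main__":
    print(normalize_extract_args("https://example.com"))
```

Normalizing `urls` to a list up front means downstream code does not need to branch on whether one URL or several were supplied.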
## Advanced Usage
### Multiple URLs with Advanced Extraction
```python
# Assumes Agent, Task, TavilyExtractorTool, and extractor_agent
# are defined as in the example above.

# Example with multiple URLs and advanced extraction
multi_extract_task = Task(
    description='Extract content from https://example.com and https://anotherexample.org using advanced extraction.',
    expected_output='A JSON string containing the extracted content from both URLs.',
    agent=extractor_agent
)

# Configure the tool with custom parameters
custom_extractor = TavilyExtractorTool(
    extract_depth='advanced',
    include_images=True,
    timeout=120
)

agent_with_custom_tool = Agent(
    role="Advanced Content Extractor",
    goal="Extract comprehensive content with images",
    backstory="A specialist in thorough web content extraction.",  # Agent requires a backstory
    tools=[custom_extractor]
)
```
### Tool Parameters
You can customize the tool's behavior by setting parameters during initialization:
```python
# Initialize with custom configuration
extractor_tool = TavilyExtractorTool(
    extract_depth='advanced',  # More comprehensive extraction
    include_images=True,       # Include image results
    timeout=90                 # Custom timeout
)
```
## Features
- **Single or Multiple URLs**: Extract content from one URL or process multiple URLs in a single request
- **Configurable Depth**: Choose between basic (fast) and advanced (comprehensive) extraction modes
- **Image Support**: Optionally include images in the extraction results
- **Structured Output**: Returns well-formatted JSON containing the extracted content
- **Error Handling**: Robust handling of network timeouts and extraction errors
## Response Format
The tool returns a JSON string representing the structured data extracted from the provided URL(s). The exact structure depends on the content of the pages and the `extract_depth` used.
Common response elements include:
- **Title**: The page title
- **Content**: Main text content of the page
- **Images**: Image URLs and metadata (when `include_images=True`)
- **Metadata**: Additional page information like author, description, etc.
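Because the tool returns a JSON string, downstream code usually has to parse it defensively. The sketch below shows one way to do that with the standard library. The sample payload and its field names (`results`, `url`, `raw_content`, `images`, `failed_results`) are assumptions made for illustration; check the Tavily documentation for the actual response structure.

```python
import json

# Hypothetical payload for illustration only; real field names may differ.
sample_output = json.dumps(
    {
        "results": [
            {
                "url": "https://example.com",
                "raw_content": "Example Domain. This domain is for use in examples.",
                "images": [],
            }
        ],
        "failed_results": [],
    }
)


def summarize_extraction(raw: str, preview_chars: int = 80) -> list:
    """Parse the tool's JSON string and return a short summary per URL.

    Returns an empty list when the input is not a JSON object.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, dict):
        return []
    summaries = []
    for item in data.get("results", []):
        text = item.get("raw_content") or ""
        summaries.append(
            {
                "url": item.get("url"),
                "preview": text[:preview_chars],
                "image_count": len(item.get("images") or []),
            }
        )
    return summaries


if __name__ == "__main__":
    for entry in summarize_extraction(sample_output):
        print(entry)
```

Using `.get()` with fallbacks keeps the parser tolerant of fields that are absent for a given page or extraction depth.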
## Use Cases
- **Content Analysis**: Extract and analyze content from competitor websites
- **Research**: Gather structured data from multiple sources for analysis
- **Content Migration**: Extract content from existing websites for migration
- **Monitoring**: Regular extraction of content for change detection
- **Data Collection**: Systematic extraction of information from web sources
Refer to the [Tavily API documentation](https://docs.tavily.com/docs/tavily-api/python-sdk#extract) for detailed information about the response structure and available options.


@@ -0,0 +1,122 @@
---
title: "Tavily Search Tool"
description: "Perform comprehensive web searches using the Tavily Search API"
icon: "magnifying-glass"
---
The `TavilySearchTool` provides an interface to the Tavily Search API, enabling CrewAI agents to perform comprehensive web searches. It allows for specifying search depth, topics, time ranges, included/excluded domains, and whether to include direct answers, raw content, or images in the results.
## Installation
To use the `TavilySearchTool`, install CrewAI's tools extra together with the `tavily-python` library:
```shell
pip install 'crewai[tools]' tavily-python
```
## Environment Variables
Ensure your Tavily API key is set as an environment variable:
```bash
export TAVILY_API_KEY='your_tavily_api_key'
```
## Example Usage
Here's how to initialize and use the `TavilySearchTool` within a CrewAI agent:
```python
import os
from crewai import Agent, Task, Crew
from crewai_tools import TavilySearchTool

# Ensure the TAVILY_API_KEY environment variable is set
# os.environ["TAVILY_API_KEY"] = "YOUR_TAVILY_API_KEY"

# Initialize the tool
tavily_tool = TavilySearchTool()

# Create an agent that uses the tool
researcher = Agent(
    role='Market Researcher',
    goal='Find information about the latest AI trends',
    backstory='An expert market researcher specializing in technology.',
    tools=[tavily_tool],
    verbose=True
)

# Create a task for the agent
research_task = Task(
    description='Search for the top 3 AI trends in 2024.',
    expected_output='A JSON report summarizing the top 3 AI trends found.',
    agent=researcher
)

# Form the crew and kick it off
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True  # Crew expects a boolean here in recent CrewAI releases
)
result = crew.kickoff()
print(result)
```
## Configuration Options
The `TavilySearchTool` accepts the following arguments during initialization or when calling the `run` method:
- `query` (str): **Required**. The search query string.
- `search_depth` (Literal["basic", "advanced"], optional): The depth of the search. Defaults to `"basic"`.
- `topic` (Literal["general", "news", "finance"], optional): The topic to focus the search on. Defaults to `"general"`.
- `time_range` (Literal["day", "week", "month", "year"], optional): The time range for the search. Defaults to `None`.
- `days` (int, optional): The number of days to search back. Relevant if `time_range` is not set. Defaults to `7`.
- `max_results` (int, optional): The maximum number of search results to return. Defaults to `5`.
- `include_domains` (Sequence[str], optional): A list of domains to prioritize in the search. Defaults to `None`.
- `exclude_domains` (Sequence[str], optional): A list of domains to exclude from the search. Defaults to `None`.
- `include_answer` (Union[bool, Literal["basic", "advanced"]], optional): Whether to include a direct answer synthesized from the search results. Defaults to `False`.
- `include_raw_content` (bool, optional): Whether to include the raw HTML content of the searched pages. Defaults to `False`.
- `include_images` (bool, optional): Whether to include image results. Defaults to `False`.
- `timeout` (int, optional): The request timeout in seconds. Defaults to `60`.
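The constraints in the list above can be expressed as a small validation step. This is a minimal sketch in plain Python that assumes only the allowed values and defaults documented here; `build_search_kwargs` and the constant names are hypothetical and not part of `crewai_tools`. It also encodes the documented rule that `days` is only relevant when `time_range` is not set.

```python
from typing import Optional

# Allowed values copied from the documented Literal types.
SEARCH_DEPTHS = ("basic", "advanced")
TOPICS = ("general", "news", "finance")
TIME_RANGES = ("day", "week", "month", "year")


def build_search_kwargs(
    query: str,
    search_depth: str = "basic",
    topic: str = "general",
    time_range: Optional[str] = None,
    days: int = 7,
    max_results: int = 5,
) -> dict:
    """Validate search options and return them as keyword arguments.

    Hypothetical helper: it does not call the Tavily API.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    if search_depth not in SEARCH_DEPTHS:
        raise ValueError(f"search_depth must be one of {SEARCH_DEPTHS}")
    if topic not in TOPICS:
        raise ValueError(f"topic must be one of {TOPICS}")
    if time_range is not None and time_range not in TIME_RANGES:
        raise ValueError(f"time_range must be one of {TIME_RANGES}")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")

    kwargs = {
        "query": query.strip(),
        "search_depth": search_depth,
        "topic": topic,
        "max_results": max_results,
    }
    # `days` only applies when no explicit time_range is given.
    if time_range is not None:
        kwargs["time_range"] = time_range
    else:
        kwargs["days"] = days
    return kwargs


if __name__ == "__main__":
    print(build_search_kwargs("latest AI trends", topic="news", time_range="week"))
```

Rejecting an unsupported `topic` or `search_depth` locally avoids spending a network round trip on a request that cannot succeed.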
## Advanced Usage
You can configure the tool with custom parameters:
```python
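# Assumes Agent and TavilySearchTool are imported as in the example above.

# Example: Initialize with specific parameters
custom_tavily_tool = TavilySearchTool(
    search_depth='advanced',
    max_results=10,
    include_answer=True
)

# The agent will use these defaults
agent_with_custom_tool = Agent(
    role="Advanced Researcher",
    goal="Conduct detailed research with comprehensive results",
    backstory="A meticulous researcher who cross-checks multiple sources.",  # Agent requires a backstory
    tools=[custom_tavily_tool]
)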
```
## Features
- **Comprehensive Search**: Access to Tavily's powerful search index
- **Configurable Depth**: Choose between basic and advanced search modes
- **Topic Filtering**: Focus searches on general, news, or finance topics
- **Time Range Control**: Limit results to specific time periods
- **Domain Control**: Include or exclude specific domains
- **Direct Answers**: Get synthesized answers from search results
- **Content Filtering**: Prevent context window issues with automatic content truncation
## Response Format
The tool returns search results as a JSON string containing:
- Search results with titles, URLs, and content snippets
- Optional direct answers to queries
- Optional image results
- Optional raw HTML content (when enabled)
Content for each result is automatically truncated to prevent context window issues while maintaining the most relevant information.
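To show how a caller might consume that JSON string, here is a minimal standard-library sketch. The sample payload and its keys (`answer`, `results`, `title`, `url`, `content`) are assumptions that mirror the list above, not a guaranteed schema, and `format_results` is a hypothetical helper.

```python
import json

# Hypothetical payload for illustration only; real field names may differ.
sample_output = json.dumps(
    {
        "answer": "Three trends stand out in this made-up sample.",
        "results": [
            {
                "title": "Sample article one",
                "url": "https://example.com/one",
                "content": "A short snippet describing the first sample result.",
            },
            {
                "title": "Sample article two",
                "url": "https://example.com/two",
                "content": "A short snippet describing the second sample result.",
            },
        ],
    }
)


def format_results(raw: str, snippet_chars: int = 40) -> list:
    """Turn the tool's JSON string into readable lines of text."""
    data = json.loads(raw)
    lines = []
    answer = data.get("answer")
    if answer:
        lines.append(f"Answer: {answer}")
    for index, item in enumerate(data.get("results", []), start=1):
        title = item.get("title", "(untitled)")
        url = item.get("url", "")
        snippet = (item.get("content") or "")[:snippet_chars]
        lines.append(f"{index}. {title} ({url})")
        lines.append(f"   {snippet}")
    return lines


if __name__ == "__main__":
    print("\n".join(format_results(sample_output)))
```

The extra slicing in `format_results` is independent of the tool's own truncation; it only keeps the printed summary compact.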