Add pt-BR docs translation (#3039)

* docs: add pt-br translations

Powered by a CrewAI Flow https://github.com/danielfsbarreto/docs_translator

* Update mcp/overview.mdx brazilian docs

Its en-US counterpart was updated after I did a pass,
so now it includes the new section about @CrewBase
This commit is contained in:
Daniel Barreto
2025-06-25 12:52:33 -03:00
committed by GitHub
parent f6dfec61d6
commit a50fae3a4b
339 changed files with 33822 additions and 517 deletions

@@ -0,0 +1,118 @@
---
title: AI Mind Tool
description: The `AIMindTool` is designed to query data sources in natural language.
icon: brain
---
# `AIMindTool`
## Description
The `AIMindTool` is a wrapper around [AI-Minds](https://mindsdb.com/minds) provided by [MindsDB](https://mindsdb.com/). It allows you to query data sources in natural language by simply configuring their connection parameters. This tool is useful when you need answers to questions from your data stored in various data sources including PostgreSQL, MySQL, MariaDB, ClickHouse, Snowflake, and Google BigQuery.
Minds are AI systems that work similarly to large language models (LLMs) but go beyond by answering any question from any data. This is accomplished by:
- Selecting the most relevant data for an answer using parametric search
- Understanding the meaning and providing responses within the correct context through semantic search
- Delivering precise answers by analyzing data and using machine learning (ML) models
## Installation
To incorporate this tool into your project, you need to install the Minds SDK:
```shell
uv add minds-sdk
```
## Steps to Get Started
To effectively use the `AIMindTool`, follow these steps:
1. **Package Installation**: Confirm that the `crewai[tools]` and `minds-sdk` packages are installed in your Python environment.
2. **API Key Acquisition**: Sign up for a Minds account [here](https://mdb.ai/register), and obtain an API key.
3. **Environment Configuration**: Store your obtained API key in an environment variable named `MINDS_API_KEY` to facilitate its use by the tool.
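The environment configuration in step 3 can be checked before the tool is created. The snippet below is a minimal sketch and not part of `crewai_tools`: `require_env` is a hypothetical helper that fails early with a clear message when `MINDS_API_KEY` is missing.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration; it is not part of crewai_tools.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Report the state of the key without printing its value.
try:
    require_env("MINDS_API_KEY")
    print("MINDS_API_KEY is set")
except RuntimeError as exc:
    print(exc)
```

Calling such a check at startup surfaces a configuration problem immediately, rather than when the agent first uses the tool.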
## Example
The following example demonstrates how to initialize the tool and execute a query:
```python Code
from crewai_tools import AIMindTool

# Initialize the AIMindTool
aimind_tool = AIMindTool(
    datasources=[
        {
            "description": "house sales data",
            "engine": "postgres",
            "connection_data": {
                "user": "demo_user",
                "password": "demo_password",
                "host": "samples.mindsdb.com",
                "port": 5432,
                "database": "demo",
                "schema": "demo_data"
            },
            "tables": ["house_sales"]
        }
    ]
)

# Run a natural language query
result = aimind_tool.run("How many 3 bedroom houses were sold in 2008?")
print(result)
```
## Parameters
The `AIMindTool` accepts the following parameters:
- **api_key**: Optional. Your Minds API key. If not provided, it will be read from the `MINDS_API_KEY` environment variable.
- **datasources**: A list of dictionaries, each containing the following keys:
  - **description**: A description of the data contained in the datasource.
  - **engine**: The engine (or type) of the datasource.
  - **connection_data**: A dictionary containing the connection parameters for the datasource.
  - **tables**: A list of tables that the data source will use. This is optional and can be omitted if all tables in the data source are to be used.
A list of supported data sources and their connection parameters can be found [here](https://docs.mdb.ai/docs/data_sources).
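The datasource structure described above can be assembled and checked with plain Python before it is passed to the tool. This is a sketch under stated assumptions: `make_datasource` and `missing_keys` are hypothetical helpers, and the rule that only `tables` is optional is taken from the parameter list above.

```python
from typing import Any, Dict, List, Optional

# Keys listed as non-optional in the parameter description above.
REQUIRED_KEYS = ("description", "engine", "connection_data")


def make_datasource(
    description: str,
    engine: str,
    connection_data: Dict[str, Any],
    tables: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build one entry for the `datasources` list (hypothetical helper)."""
    datasource: Dict[str, Any] = {
        "description": description,
        "engine": engine,
        "connection_data": dict(connection_data),
    }
    # `tables` is optional; leaving it out means all tables are used.
    if tables:
        datasource["tables"] = list(tables)
    return datasource


def missing_keys(datasource: Dict[str, Any]) -> List[str]:
    """Return the required keys that are absent from a datasource dict."""
    return [key for key in REQUIRED_KEYS if key not in datasource]


house_sales = make_datasource(
    description="house sales data",
    engine="postgres",
    connection_data={"host": "samples.mindsdb.com", "port": 5432},
    tables=["house_sales"],
)
print(missing_keys(house_sales))
```

The resulting dictionary has the same shape as the entries in the example above, so it can be placed directly in the `datasources` list.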
## Agent Integration Example
Here's how to integrate the `AIMindTool` with a CrewAI agent:
```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import AIMindTool

# Initialize the tool
aimind_tool = AIMindTool(
    datasources=[
        {
            "description": "sales data",
            "engine": "postgres",
            "connection_data": {
                "user": "your_user",
                "password": "your_password",
                "host": "your_host",
                "port": 5432,
                "database": "your_db",
                "schema": "your_schema"
            },
            "tables": ["sales"]
        }
    ]
)

# Define an agent with the AIMindTool
# (this method belongs inside your crew class, which provides `self.agents_config`)
@agent
def data_analyst(self) -> Agent:
    return Agent(
        config=self.agents_config["data_analyst"],
        allow_delegation=False,
        tools=[aimind_tool]
    )
```
## Conclusion
The `AIMindTool` provides a powerful way to query your data sources using natural language, making it easier to extract insights without writing complex SQL queries. By connecting to various data sources and leveraging AI-Minds technology, this tool enables agents to access and analyze data efficiently.

@@ -0,0 +1,209 @@
---
title: Code Interpreter
description: The `CodeInterpreterTool` is a powerful tool designed for executing Python 3 code within a secure, isolated environment.
icon: code-simple
---
# `CodeInterpreterTool`
## Description
The `CodeInterpreterTool` enables CrewAI agents to execute Python 3 code that they generate autonomously. This functionality is particularly valuable as it allows agents to create code, execute it, obtain the results, and utilize that information to inform subsequent decisions and actions.
There are several ways to use this tool:
### Docker Container (Recommended)
This is the primary option. The code runs in a secure, isolated Docker container, ensuring safety regardless of its content.
Make sure Docker is installed and running on your system. If you don't have it, you can install it from [here](https://docs.docker.com/get-docker/).
### Sandbox environment
If Docker is unavailable (either not installed or not accessible for any reason), the code is executed in a restricted Python environment called a sandbox.
This environment is very limited, with strict restrictions on many modules and built-in functions.
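To give a feel for what restricting modules and built-in functions means, here is a minimal sketch. It is not the tool's actual sandbox: the blocked lists and the `run_restricted` helper are assumptions chosen for illustration, and this approach is not a real security boundary on its own.

```python
import builtins

# Assumed block lists for illustration; the tool's real lists may differ.
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil"}
BLOCKED_BUILTINS = {"open", "exec", "eval", "compile", "input"}


def _guarded_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Refuse to import blocked modules, delegate everything else."""
    root = name.split(".")[0]
    if root in BLOCKED_MODULES:
        raise ImportError(f"Importing '{root}' is not allowed in the sandbox")
    return builtins.__import__(name, globals, locals, fromlist, level)


def run_restricted(code: str) -> dict:
    """Execute code with a reduced set of built-ins and return its namespace."""
    safe_builtins = {
        key: value
        for key, value in vars(builtins).items()
        if key not in BLOCKED_BUILTINS
    }
    safe_builtins["__import__"] = _guarded_import
    namespace = {"__builtins__": safe_builtins}
    exec(code, namespace)
    return namespace


namespace = run_restricted("import math\nresult = math.sqrt(16)")
print(namespace["result"])  # 4.0
```

In this sketch, `import os` raises `ImportError` and a call to `open(...)` raises `NameError`, while ordinary computation still works.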
### Unsafe Execution
**NOT RECOMMENDED FOR PRODUCTION**
This mode allows execution of any Python code, including dangerous calls to modules such as `sys` and `os`. See [Enabling `unsafe_mode`](/en/tools/ai-ml/codeinterpretertool#enabling-unsafe-mode) for how to enable it.
## Logging
The `CodeInterpreterTool` logs the selected execution strategy to STDOUT.
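The order in which the three strategies are considered (unsafe mode only if explicitly enabled, otherwise Docker, otherwise the sandbox) can be sketched as a small function. This is an illustration, not the tool's code: `select_strategy` and `docker_available` are hypothetical names, and checking for a `docker` executable on `PATH` is only a rough stand-in for verifying that the Docker daemon is reachable.

```python
import shutil


def docker_available() -> bool:
    """Rough stand-in check: is a `docker` executable on PATH?"""
    return shutil.which("docker") is not None


def select_strategy(unsafe_mode: bool, has_docker: bool) -> str:
    """Pick an execution strategy in the order described in this document."""
    if unsafe_mode:
        return "unsafe"
    if has_docker:
        return "docker"
    return "sandbox"


strategy = select_strategy(unsafe_mode=False, has_docker=docker_available())
print(f"Selected execution strategy: {strategy}")
```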
## Installation
To use this tool, you need to install the CrewAI tools package:
```shell
pip install 'crewai[tools]'
```
## Example
The following example demonstrates how to use the `CodeInterpreterTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

# Initialize the tool
code_interpreter = CodeInterpreterTool()

# Define an agent that uses the tool
programmer_agent = Agent(
    role="Python Programmer",
    goal="Write and execute Python code to solve problems",
    backstory="An expert Python programmer who can write efficient code to solve complex problems.",
    tools=[code_interpreter],
    verbose=True,
)

# Example task to generate and execute code
coding_task = Task(
    description="Write a Python function to calculate the Fibonacci sequence up to the 10th number and print the result.",
    expected_output="The Fibonacci sequence up to the 10th number.",
    agent=programmer_agent,
)

# Create and run the crew
crew = Crew(
    agents=[programmer_agent],
    tasks=[coding_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff()
```
You can also enable code execution directly when creating an agent:
```python Code
from crewai import Agent

# Create an agent with code execution enabled
programmer_agent = Agent(
    role="Python Programmer",
    goal="Write and execute Python code to solve problems",
    backstory="An expert Python programmer who can write efficient code to solve complex problems.",
    allow_code_execution=True,  # This automatically adds the CodeInterpreterTool
    verbose=True,
)
```
### Enabling `unsafe_mode`
```python Code
from crewai_tools import CodeInterpreterTool
code = """
import os
os.system("ls -la")
"""
CodeInterpreterTool(unsafe_mode=True).run(code=code)
```
## Parameters
The `CodeInterpreterTool` accepts the following parameters during initialization:
- **user_dockerfile_path**: Optional. Path to a custom Dockerfile to use for the code interpreter container.
- **user_docker_base_url**: Optional. URL to the Docker daemon to use for running the container.
- **unsafe_mode**: Optional. Whether to run code directly on the host machine instead of in a Docker container or sandbox. Default is `False`. Use with caution!
- **default_image_tag**: Optional. Default Docker image tag. Default is `code-interpreter:latest`.
When using the tool with an agent, the agent will need to provide:
- **code**: Required. The Python 3 code to execute.
- **libraries_used**: Optional. A list of libraries used in the code that need to be installed. Default is `[]`.
## Agent Integration Example
Here's a more detailed example of how to integrate the `CodeInterpreterTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

# Initialize the tool
code_interpreter = CodeInterpreterTool()

# Define an agent that uses the tool
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze data using Python code",
    backstory="""You are an expert data analyst who specializes in using Python
    to analyze and visualize data. You can write efficient code to process
    large datasets and extract meaningful insights.""",
    tools=[code_interpreter],
    verbose=True,
)

# Create a task for the agent
analysis_task = Task(
    description="""
    Write Python code to:
    1. Generate a random dataset of 100 points with x and y coordinates
    2. Calculate the correlation coefficient between x and y
    3. Create a scatter plot of the data
    4. Print the correlation coefficient and save the plot as 'scatter.png'
    Make sure to handle any necessary imports and print the results.
    """,
    expected_output="The correlation coefficient and confirmation that the scatter plot has been saved.",
    agent=data_analyst,
)

# Run the task
crew = Crew(
    agents=[data_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff()
```
## Implementation Details
The `CodeInterpreterTool` uses Docker to create a secure environment for code execution:
```python Code
class CodeInterpreterTool(BaseTool):
    name: str = "Code Interpreter"
    description: str = "Interprets Python3 code strings with a final print statement."
    args_schema: Type[BaseModel] = CodeInterpreterSchema
    default_image_tag: str = "code-interpreter:latest"

    def _run(self, **kwargs) -> str:
        code = kwargs.get("code", self.code)
        libraries_used = kwargs.get("libraries_used", [])

        if self.unsafe_mode:
            return self.run_code_unsafe(code, libraries_used)
        else:
            return self.run_code_safety(code, libraries_used)
```
The tool performs the following steps:
1. Verifies that the Docker image exists or builds it if necessary
2. Creates a Docker container with the current working directory mounted
3. Installs any required libraries specified by the agent
4. Executes the Python code in the container
5. Returns the output of the code execution
6. Cleans up by stopping and removing the container
## Security Considerations
By default, the `CodeInterpreterTool` runs code in an isolated Docker container, which provides a layer of security. However, there are still some security considerations to keep in mind:
1. The Docker container has access to the current working directory, so sensitive files could potentially be accessed.
2. If Docker is unavailable and the code must still run safely, it is executed in the sandbox environment, where installing arbitrary libraries is not allowed for security reasons.
3. The `unsafe_mode` parameter allows code to be executed directly on the host machine, which should only be used in trusted environments.
4. Be cautious when allowing agents to install arbitrary libraries, as they could potentially include malicious code.
## Conclusion
The `CodeInterpreterTool` provides a powerful way for CrewAI agents to execute Python code in a relatively secure environment. By enabling agents to write and run code, it significantly expands their problem-solving capabilities, especially for tasks involving data analysis, calculations, or other computational work. This tool is particularly useful for agents that need to perform complex operations that are more efficiently expressed in code than in natural language.

@@ -0,0 +1,51 @@
---
title: DALL-E Tool
description: The `DallETool` is a powerful tool designed for generating images from textual descriptions.
icon: image
---
# `DallETool`
## Description
This tool is used to give the Agent the ability to generate images using the DALL-E model. It is a transformer-based model that generates images from textual descriptions.
This tool allows the Agent to generate images based on the text input provided by the user.
## Installation
Install the crewai_tools package:
```shell
pip install 'crewai[tools]'
```
## Example
Remember that when using this tool, the text must be generated by the Agent itself. The text must be a description of the image you want to generate.
```python Code
from crewai import Agent
from crewai_tools import DallETool

Agent(
    ...
    tools=[DallETool()],
)
```
If needed you can also tweak the parameters of the DALL-E model by passing them as arguments to the `DallETool` class. For example:
```python Code
from crewai import Agent
from crewai_tools import DallETool

dalle_tool = DallETool(
    model="dall-e-3",
    size="1024x1024",
    quality="standard",
    n=1,
)

Agent(
    ...
    tools=[dalle_tool]
)
```
The parameters are based on the `client.images.generate` method from the OpenAI API. For more information on the parameters,
please refer to the [OpenAI API documentation](https://platform.openai.com/docs/guides/images/introduction?lang=python).
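Because invalid parameter combinations are only rejected once the image request is sent, it can help to validate them locally first. The sketch below is an illustration under stated assumptions: `dalle3_kwargs` is a hypothetical helper, and the allowed sizes, qualities, and the `n=1` restriction for `dall-e-3` reflect the OpenAI documentation as commonly described, so verify them against the current API reference.

```python
from typing import Any, Dict

# Assumed allowed values for "dall-e-3"; check the OpenAI docs before relying on them.
DALLE3_SIZES = {"1024x1024", "1792x1024", "1024x1792"}
DALLE3_QUALITIES = {"standard", "hd"}


def dalle3_kwargs(
    size: str = "1024x1024", quality: str = "standard", n: int = 1
) -> Dict[str, Any]:
    """Validate and return keyword arguments for a dall-e-3 configuration."""
    if size not in DALLE3_SIZES:
        raise ValueError(f"Unsupported size for dall-e-3: {size}")
    if quality not in DALLE3_QUALITIES:
        raise ValueError(f"Unsupported quality for dall-e-3: {quality}")
    if n != 1:
        raise ValueError("dall-e-3 is assumed to generate one image per request")
    return {"model": "dall-e-3", "size": size, "quality": quality, "n": n}


print(dalle3_kwargs(size="1792x1024", quality="hd"))
```

The returned dictionary uses the same parameter names as the example above, so it could be unpacked into the tool constructor with `DallETool(**dalle3_kwargs(...))`.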

@@ -0,0 +1,58 @@
---
title: LangChain Tool
description: The `LangChainTool` is a wrapper for LangChain tools and query engines.
icon: link
---
## `LangChainTool`
<Info>
CrewAI seamlessly integrates with LangChain's comprehensive [list of tools](https://python.langchain.com/docs/integrations/tools/), all of which can be used with CrewAI.
</Info>
```python Code
import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
from pydantic import Field
from langchain_community.utilities import GoogleSerperAPIWrapper

# Set up your SERPER_API_KEY key in an .env file, e.g.:
# SERPER_API_KEY=<your api key>
load_dotenv()

search = GoogleSerperAPIWrapper()

class SearchTool(BaseTool):
    name: str = "Search"
    description: str = "Useful for search-based queries. Use this to find current information about markets, companies, and trends."
    search: GoogleSerperAPIWrapper = Field(default_factory=GoogleSerperAPIWrapper)

    def _run(self, query: str) -> str:
        """Execute the search query and return results"""
        try:
            return self.search.run(query)
        except Exception as e:
            return f"Error performing search: {str(e)}"

# Create Agents
researcher = Agent(
    role='Research Analyst',
    goal='Gather current market data and trends',
    backstory="""You are an expert research analyst with years of experience in
    gathering market intelligence. You're known for your ability to find
    relevant and up-to-date market information and present it in a clear,
    actionable format.""",
    tools=[SearchTool()],
    verbose=True
)

# rest of the code ...
```
## Conclusion
Tools are pivotal in extending the capabilities of CrewAI agents, enabling them to undertake a broad spectrum of tasks and collaborate effectively.
When building solutions with CrewAI, leverage both custom and existing tools to empower your agents and enhance the AI ecosystem. Consider utilizing error handling, caching mechanisms,
and the flexibility of tool arguments to optimize your agents' performance and capabilities.
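As an illustration of the error handling and caching mentioned above, the sketch below wraps an arbitrary search function so that repeated queries are served from memory and failures are returned as messages. It is independent of any caching CrewAI itself provides; `make_cached_search` and `fake_backend` are hypothetical names used only for this example.

```python
from functools import lru_cache
from typing import Callable


def make_cached_search(
    backend: Callable[[str], str], maxsize: int = 128
) -> Callable[[str], str]:
    """Wrap a search function with an in-memory cache and error handling."""

    @lru_cache(maxsize=maxsize)
    def _cached(query: str) -> str:
        return backend(query)

    def search(query: str) -> str:
        try:
            # Exceptions are not cached, so a failed query is retried next time.
            return _cached(query.strip())
        except Exception as exc:
            return f"Error performing search: {exc}"

    return search


def fake_backend(query: str) -> str:
    """Stand-in for a real search call such as the wrapper used above."""
    return f"Results for: {query}"


search = make_cached_search(fake_backend)
print(search("market trends"))
print(search("market trends"))  # second call is served from the cache
```

A function built this way could be called from a tool's `_run` method in place of the direct backend call.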

@@ -0,0 +1,146 @@
---
title: LlamaIndex Tool
description: The `LlamaIndexTool` is a wrapper for LlamaIndex tools and query engines.
icon: address-book
---
# `LlamaIndexTool`
## Description
The `LlamaIndexTool` is designed to be a general wrapper around LlamaIndex tools and query engines, enabling you to leverage LlamaIndex resources in terms of RAG/agentic pipelines as tools to plug into CrewAI agents. This tool allows you to seamlessly integrate LlamaIndex's powerful data processing and retrieval capabilities into your CrewAI workflows.
## Installation
To use this tool, you need to install LlamaIndex:
```shell
uv add llama-index
```
## Steps to Get Started
To effectively use the `LlamaIndexTool`, follow these steps:
1. **Install LlamaIndex**: Install the LlamaIndex package using the command above.
2. **Set Up LlamaIndex**: Follow the [LlamaIndex documentation](https://docs.llamaindex.ai/) to set up a RAG/agent pipeline.
3. **Create a Tool or Query Engine**: Create a LlamaIndex tool or query engine that you want to use with CrewAI.
## Example
The following examples demonstrate how to initialize the tool from different LlamaIndex components:
### From a LlamaIndex Tool
```python Code
from crewai_tools import LlamaIndexTool
from crewai import Agent
from crewai.project import agent
from llama_index.core.tools import FunctionTool

# Example 1: Initialize from FunctionTool
def search_data(query: str) -> str:
    """Search for information in the data."""
    # Your implementation here
    return f"Results for: {query}"

# Create a LlamaIndex FunctionTool
og_tool = FunctionTool.from_defaults(
    search_data,
    name="DataSearchTool",
    description="Search for information in the data"
)

# Wrap it with LlamaIndexTool
tool = LlamaIndexTool.from_tool(og_tool)

# Define an agent that uses the tool
# (this method belongs inside your crew class, which provides `self.agents_config`)
@agent
def researcher(self) -> Agent:
    '''
    This agent uses the LlamaIndexTool to search for information.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        tools=[tool]
    )
```
### From LlamaHub Tools
```python Code
from crewai_tools import LlamaIndexTool
from llama_index.tools.wolfram_alpha import WolframAlphaToolSpec
# Initialize from LlamaHub Tools
wolfram_spec = WolframAlphaToolSpec(app_id="your_app_id")
wolfram_tools = wolfram_spec.to_tool_list()
tools = [LlamaIndexTool.from_tool(t) for t in wolfram_tools]
```
### From a LlamaIndex Query Engine
```python Code
from crewai_tools import LlamaIndexTool
from llama_index.core import VectorStoreIndex
from llama_index.core.readers import SimpleDirectoryReader
# Load documents
documents = SimpleDirectoryReader("./data").load_data()
# Create an index
index = VectorStoreIndex.from_documents(documents)
# Create a query engine
query_engine = index.as_query_engine()
# Create a LlamaIndexTool from the query engine
query_tool = LlamaIndexTool.from_query_engine(
query_engine,
name="Company Data Query Tool",
description="Use this tool to lookup information in company documents"
)
```
## Class Methods
The `LlamaIndexTool` provides two main class methods for creating instances:
### from_tool
Creates a `LlamaIndexTool` from a LlamaIndex tool.
```python Code
@classmethod
def from_tool(cls, tool: Any, **kwargs: Any) -> "LlamaIndexTool":
    # Implementation details
    ...
```
### from_query_engine
Creates a `LlamaIndexTool` from a LlamaIndex query engine.
```python Code
@classmethod
def from_query_engine(
    cls,
    query_engine: Any,
    name: Optional[str] = None,
    description: Optional[str] = None,
    return_direct: bool = False,
    **kwargs: Any,
) -> "LlamaIndexTool":
    # Implementation details
    ...
```
## Parameters
The `from_query_engine` method accepts the following parameters:
- **query_engine**: Required. The LlamaIndex query engine to wrap.
- **name**: Optional. The name of the tool.
- **description**: Optional. The description of the tool.
- **return_direct**: Optional. Whether to return the response directly. Default is `False`.
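The class methods above follow the alternative-constructor pattern: each one adapts a different kind of object into the same wrapper type. The toy class below sketches that pattern in plain Python. It is not the real `LlamaIndexTool`; `WrappedTool`, `from_callable`, and the fallback rules for `name` and `description` are assumptions made for illustration only.

```python
from typing import Any, Callable, Optional


class WrappedTool:
    """Toy wrapper illustrating alternative constructors (not LlamaIndexTool)."""

    def __init__(
        self,
        name: str,
        description: str,
        func: Callable[[str], Any],
        return_direct: bool = False,
    ) -> None:
        self.name = name
        self.description = description
        self.func = func
        self.return_direct = return_direct

    @classmethod
    def from_callable(
        cls,
        func: Callable[[str], Any],
        name: Optional[str] = None,
        description: Optional[str] = None,
        return_direct: bool = False,
    ) -> "WrappedTool":
        """Build a wrapper, falling back to the function's own metadata."""
        return cls(
            name=name or func.__name__,
            description=description or (func.__doc__ or "").strip(),
            func=func,
            return_direct=return_direct,
        )

    def run(self, query: str) -> str:
        return str(self.func(query))


def lookup(query: str) -> str:
    """Look up information in company documents."""
    return f"Results for: {query}"


tool = WrappedTool.from_callable(lookup, name="Company Data Query Tool")
print(tool.name, "|", tool.description, "|", tool.run("revenue"))
```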
## Conclusion
The `LlamaIndexTool` provides a powerful way to integrate LlamaIndex's capabilities into CrewAI agents. By wrapping LlamaIndex tools and query engines, it enables agents to leverage sophisticated data retrieval and processing functionalities, enhancing their ability to work with complex information sources.

@@ -0,0 +1,64 @@
---
title: "Overview"
description: "Leverage AI services, generate images, process vision, and build intelligent systems"
icon: "face-smile"
---
These tools integrate with AI and machine learning services to enhance your agents with advanced capabilities like image generation, vision processing, and intelligent code execution.
## **Available Tools**
<CardGroup cols={2}>
<Card title="DALL-E Tool" icon="image" href="/en/tools/ai-ml/dalletool">
Generate AI images using OpenAI's DALL-E model.
</Card>
<Card title="Vision Tool" icon="eye" href="/en/tools/ai-ml/visiontool">
Process and analyze images with computer vision capabilities.
</Card>
<Card title="AI Mind Tool" icon="brain" href="/en/tools/ai-ml/aimindtool">
Query your data sources in natural language with MindsDB's AI-Minds.
</Card>
<Card title="LlamaIndex Tool" icon="llama" href="/en/tools/ai-ml/llamaindextool">
Build knowledge bases and retrieval systems with LlamaIndex.
</Card>
<Card title="LangChain Tool" icon="link" href="/en/tools/ai-ml/langchaintool">
Integrate with LangChain for complex AI workflows.
</Card>
<Card title="RAG Tool" icon="database" href="/en/tools/ai-ml/ragtool">
Implement Retrieval-Augmented Generation systems.
</Card>
<Card title="Code Interpreter Tool" icon="code" href="/en/tools/ai-ml/codeinterpretertool">
Execute Python code and perform data analysis.
</Card>
</CardGroup>
## **Common Use Cases**
- **Content Generation**: Create images, text, and multimedia content
- **Data Analysis**: Execute code and analyze complex datasets
- **Knowledge Systems**: Build RAG systems and intelligent databases
- **Computer Vision**: Process and understand visual content
- **AI Safety**: Implement content moderation and safety checks
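```python
from crewai import Agent
from crewai_tools import DallETool, VisionTool, CodeInterpreterTool

# Create AI tools
image_generator = DallETool()
vision_processor = VisionTool()
code_executor = CodeInterpreterTool()

# Add to your agent
agent = Agent(
    role="AI Specialist",
    goal="Create and analyze content using AI capabilities",
    # Placeholder backstory; replace it with one that fits your agent
    backstory="An AI specialist who combines image generation, vision, and code execution.",
    tools=[image_generator, vision_processor, code_executor],
)
```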

@@ -0,0 +1,172 @@
---
title: RAG Tool
description: The `RagTool` is a dynamic knowledge base tool for answering questions using Retrieval-Augmented Generation.
icon: vector-square
---
# `RagTool`
## Description
The `RagTool` is designed to answer questions by leveraging the power of Retrieval-Augmented Generation (RAG) through Embedchain.
It provides a dynamic knowledge base that can be queried to retrieve relevant information from various data sources.
This tool is particularly useful for applications that require access to a vast array of information and need to provide contextually relevant answers.
## Example
The following example demonstrates how to initialize the tool and use it with different data sources:
```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import RagTool

# Create a RAG tool with default settings
rag_tool = RagTool()

# Add content from a file
rag_tool.add(data_type="file", path="path/to/your/document.pdf")

# Add content from a web page
rag_tool.add(data_type="web_page", url="https://example.com")

# Define an agent with the RagTool
# (this method belongs inside your crew class, which provides `self.agents_config`)
@agent
def knowledge_expert(self) -> Agent:
    '''
    This agent uses the RagTool to answer questions about the knowledge base.
    '''
    return Agent(
        config=self.agents_config["knowledge_expert"],
        allow_delegation=False,
        tools=[rag_tool]
    )
```
## Supported Data Sources
The `RagTool` can be used with a wide variety of data sources, including:
- 📰 PDF files
- 📊 CSV files
- 📃 JSON files
- 📝 Text
- 📁 Directories/Folders
- 🌐 HTML Web pages
- 📽️ YouTube Channels
- 📺 YouTube Videos
- 📚 Documentation websites
- 📝 MDX files
- 📄 DOCX files
- 🧾 XML files
- 📬 Gmail
- 📝 GitHub repositories
- 🐘 PostgreSQL databases
- 🐬 MySQL databases
- 🤖 Slack conversations
- 💬 Discord messages
- 🗨️ Discourse forums
- 📝 Substack newsletters
- 🐝 Beehiiv content
- 💾 Dropbox files
- 🖼️ Images
- ⚙️ Custom data sources
## Parameters
The `RagTool` accepts the following parameters:
- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.
- **adapter**: Optional. A custom adapter for the knowledge base. If not provided, an EmbedchainAdapter will be used.
- **config**: Optional. Configuration for the underlying Embedchain App.
## Adding Content
You can add content to the knowledge base using the `add` method:
```python Code
# Add a PDF file
rag_tool.add(data_type="file", path="path/to/your/document.pdf")
# Add a web page
rag_tool.add(data_type="web_page", url="https://example.com")
# Add a YouTube video
rag_tool.add(data_type="youtube_video", url="https://www.youtube.com/watch?v=VIDEO_ID")
# Add a directory of files
rag_tool.add(data_type="directory", path="path/to/your/directory")
```
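When sources arrive as plain strings, the choice of `data_type` can be automated. The sketch below is a hypothetical helper, `add_kwargs_for`, that maps a source string to the keyword arguments used in the calls above. It only covers the four `data_type` values shown in this section, and the YouTube check is a simple assumption based on the URL shape in the example.

```python
import os
from typing import Dict
from urllib.parse import urlparse


def add_kwargs_for(source: str) -> Dict[str, str]:
    """Map a source string to keyword arguments for `rag_tool.add`."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = parsed.netloc.lower()
        # Assumption: video links look like https://www.youtube.com/watch?v=...
        if host.endswith("youtube.com") and parsed.path == "/watch":
            return {"data_type": "youtube_video", "url": source}
        return {"data_type": "web_page", "url": source}
    if os.path.isdir(source):
        return {"data_type": "directory", "path": source}
    # Anything else is treated as a path to a single file.
    return {"data_type": "file", "path": source}


for src in ("https://example.com", "path/to/your/document.pdf"):
    print(add_kwargs_for(src))
```

With a `RagTool` instance at hand, the result could be passed on as `rag_tool.add(**add_kwargs_for(src))`.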
## Agent Integration Example
Here's how to integrate the `RagTool` with a CrewAI agent:
```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import RagTool

# Initialize the tool and add content
rag_tool = RagTool()
rag_tool.add(data_type="web_page", url="https://docs.crewai.com")
rag_tool.add(data_type="file", path="company_data.pdf")

# Define an agent with the RagTool
@agent
def knowledge_expert(self) -> Agent:
    return Agent(
        config=self.agents_config["knowledge_expert"],
        allow_delegation=False,
        tools=[rag_tool]
    )
```
## Advanced Configuration
You can customize the behavior of the `RagTool` by providing a configuration dictionary:
```python Code
from crewai_tools import RagTool

# Create a RAG tool with custom configuration
config = {
    "app": {
        "name": "custom_app",
    },
    "llm": {
        "provider": "openai",
        "config": {
            "model": "gpt-4",
        }
    },
    "embedding_model": {
        "provider": "openai",
        "config": {
            "model": "text-embedding-ada-002"
        }
    },
    "vectordb": {
        "provider": "elasticsearch",
        "config": {
            "collection_name": "my-collection",
            "cloud_id": "deployment-name:xxxx",
            "api_key": "your-key",
            "verify_certs": False
        }
    },
    "chunker": {
        "chunk_size": 400,
        "chunk_overlap": 100,
        "length_function": "len",
        "min_chunk_size": 0
    }
}

rag_tool = RagTool(config=config, summarize=True)
```
The internal RAG tool utilizes the Embedchain adapter, allowing you to pass any configuration options that are supported by Embedchain.
You can refer to the [Embedchain documentation](https://docs.embedchain.ai/components/introduction) for details.
Make sure to review the configuration options available in the .yaml file.
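The `chunk_size` and `chunk_overlap` settings in the `chunker` section are easier to reason about with a small example. The function below is a simplified sketch of fixed-size chunking with overlap, measured in characters as `len` would. It is not Embedchain's implementation, which may split on separators and behave differently; `chunk_text` is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 400, chunk_overlap: int = 100) -> List[str]:
    """Split text into fixed-size chunks where neighbours share an overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(sample, chunk_size=400, chunk_overlap=100)
print(len(chunks))  # 3
```

With these values, each chunk starts 300 characters after the previous one, so the last 100 characters of one chunk reappear at the start of the next.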
## Conclusion
The `RagTool` provides a powerful way to create and query knowledge bases from various data sources. By leveraging Retrieval-Augmented Generation, it enables agents to access and retrieve relevant information efficiently, enhancing their ability to provide accurate and contextually appropriate responses.

@@ -0,0 +1,49 @@
---
title: Vision Tool
description: The `VisionTool` is designed to extract text from images.
icon: eye
---
# `VisionTool`
## Description
This tool is used to extract text from images. When passed to the agent it will extract the text from the image and then use it to generate a response, report or any other output.
The URL or the PATH of the image should be passed to the Agent.
## Installation
Install the crewai_tools package:
```shell
pip install 'crewai[tools]'
```
## Usage
In order to use the VisionTool, the OpenAI API key should be set in the environment variable `OPENAI_API_KEY`.
```python Code
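from crewai import Agent
from crewai.project import agent
from crewai_tools import VisionTool

vision_tool = VisionTool()

# This method belongs inside your crew class, which provides `self.agents_config`
@agent
def researcher(self) -> Agent:
    '''
    This agent uses the VisionTool to extract text from images.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[vision_tool]
    )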
```
## Arguments
The VisionTool requires the following arguments:
| Argument | Type | Description |
| :----------------- | :------- | :------------------------------------------------------------------------------- |
| **image_path_url** | `string` | **Mandatory**. The path or URL of the image from which text needs to be extracted. |

@@ -0,0 +1,99 @@
---
title: Apify Actors
description: "`ApifyActorsTool` lets you call Apify Actors to provide your CrewAI workflows with web scraping, crawling, data extraction, and web automation capabilities."
# hack to use custom Apify icon
icon: "); -webkit-mask-image: url('https://upload.wikimedia.org/wikipedia/commons/a/ae/Apify.svg');/*"
---
# `ApifyActorsTool`
Integrate [Apify Actors](https://apify.com/actors) into your CrewAI workflows.
## Description
The `ApifyActorsTool` connects [Apify Actors](https://apify.com/actors), cloud-based programs for web scraping and automation, to your CrewAI workflows.
Use any of the 4,000+ Actors on [Apify Store](https://apify.com/store) for use cases such as extracting data from social media, search engines, online maps, e-commerce sites, travel portals, or general websites.
For details, see the [Apify CrewAI integration](https://docs.apify.com/platform/integrations/crewai) in Apify documentation.
## Steps to get started
<Steps>
<Step title="Install dependencies">
Install `crewai[tools]` and `langchain-apify` using pip: `pip install 'crewai[tools]' langchain-apify`.
</Step>
<Step title="Obtain an Apify API token">
Sign up to [Apify Console](https://console.apify.com/) and get your [Apify API token](https://console.apify.com/settings/integrations).
</Step>
<Step title="Configure environment">
Set your Apify API token as the `APIFY_API_TOKEN` environment variable to enable the tool's functionality.
</Step>
</Steps>
## Usage example
Use the `ApifyActorsTool` manually to run the [RAG Web Browser Actor](https://apify.com/apify/rag-web-browser) to perform a web search:
```python
from crewai_tools import ApifyActorsTool

# Initialize the tool with an Apify Actor
tool = ApifyActorsTool(actor_name="apify/rag-web-browser")

# Run the tool with input parameters
results = tool.run(run_input={"query": "What is CrewAI?", "maxResults": 5})

# Process the results
for result in results:
    print(f"URL: {result['metadata']['url']}")
    print(f"Content: {result.get('markdown', 'N/A')[:100]}...")
```
### Expected output
Here is the output from running the code above:
```text
URL: https://www.example.com/crewai-intro
Content: CrewAI is a framework for building AI-powered workflows...
URL: https://docs.crewai.com/
Content: Official documentation for CrewAI...
```
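Result items may not always contain every field, so the processing loop can be made more defensive. The sketch below formats results without raising on missing keys. It assumes the item shape used in the example above (a `metadata` dictionary with a `url`, plus a `markdown` field); `summarize_results` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterable, List


def summarize_results(
    results: Iterable[Dict[str, Any]], preview_chars: int = 100
) -> List[str]:
    """Format result items as text lines, tolerating missing fields."""
    lines: List[str] = []
    for result in results:
        metadata = result.get("metadata") or {}
        url = metadata.get("url", "N/A")
        content = result.get("markdown") or "N/A"
        lines.append(f"URL: {url}")
        lines.append(f"Content: {content[:preview_chars]}...")
    return lines


# Made-up sample data in the assumed shape, including an incomplete item.
sample_results = [
    {"metadata": {"url": "https://docs.crewai.com/"}, "markdown": "Official documentation for CrewAI"},
    {"markdown": None},
]
for line in summarize_results(sample_results):
    print(line)
```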
The `ApifyActorsTool` automatically fetches the Actor definition and input schema from Apify using the provided `actor_name` and then constructs the tool description and argument schema. This means you need to specify only a valid `actor_name`, and the tool handles the rest when used with agents—no need to specify the `run_input`. Here's how it works:
```python
from crewai import Agent
from crewai_tools import ApifyActorsTool

rag_browser = ApifyActorsTool(actor_name="apify/rag-web-browser")

agent = Agent(
    role="Research Analyst",
    goal="Find and summarize information about specific topics",
    backstory="You are an experienced researcher with attention to detail",
    tools=[rag_browser],
)
```
You can run other Actors from [Apify Store](https://apify.com/store) simply by changing the `actor_name` and, when using it manually, adjusting the `run_input` based on the Actor input schema.
For an example of usage with agents, see the [CrewAI Actor template](https://apify.com/templates/python-crewai).
## Configuration
The `ApifyActorsTool` requires these inputs to work:
- **`actor_name`**
  The ID of the Apify Actor to run, e.g., `"apify/rag-web-browser"`. Browse all Actors on [Apify Store](https://apify.com/store).
- **`run_input`**
  A dictionary of input parameters for the Actor when running the tool manually.
  - For example, for the `apify/rag-web-browser` Actor: `{"query": "search term", "maxResults": 5}`
  - See the Actor's [input schema](https://apify.com/apify/rag-web-browser/input-schema) for the list of input parameters.
## Resources
- **[Apify](https://apify.com/)**: Explore the Apify platform.
- **[How to build an AI agent on Apify](https://blog.apify.com/how-to-build-an-ai-agent/)**: A complete step-by-step guide to creating, publishing, and monetizing AI agents on the Apify platform.
- **[RAG Web Browser Actor](https://apify.com/apify/rag-web-browser)**: A popular Actor for web search for LLMs.
- **[CrewAI Integration Guide](https://docs.apify.com/platform/integrations/crewai)**: Follow the official guide for integrating Apify and CrewAI.

@@ -0,0 +1,118 @@
---
title: Composio Tool
description: Composio provides 250+ production-ready tools for AI agents with flexible authentication management.
icon: gear-code
---
# `ComposioToolSet`
## Description
Composio is an integration platform that allows you to connect your AI agents to 250+ tools. Key features include:
- **Enterprise-Grade Authentication**: Built-in support for OAuth, API Keys, JWT with automatic token refresh
- **Full Observability**: Detailed tool usage logs, execution timestamps, and more
## Installation
To incorporate Composio tools into your project, follow the instructions below:
```shell
pip install composio-crewai
pip install crewai
```
After the installation is complete, either run `composio login` or export your Composio API key as `COMPOSIO_API_KEY`. You can get your Composio API key [here](https://app.composio.dev).
## Example
The following example demonstrates how to initialize the tool and execute a GitHub action:
1. Initialize Composio toolset
```python Code
from composio_crewai import ComposioToolSet, App, Action
from crewai import Agent, Task, Crew
toolset = ComposioToolSet()
```
2. Connect your GitHub account
<CodeGroup>
```shell CLI
composio add github
```
```python Code
request = toolset.initiate_connection(app=App.GITHUB)
print(f"Open this URL to authenticate: {request.redirectUrl}")
```
</CodeGroup>
3. Get Tools
- Retrieving all the tools from an app (not recommended for production):
```python Code
tools = toolset.get_tools(apps=[App.GITHUB])
```
- Filtering tools based on tags:
```python Code
tag = "users"
filtered_action_enums = toolset.find_actions_by_tags(
App.GITHUB,
tags=[tag],
)
tools = toolset.get_tools(actions=filtered_action_enums)
```
- Filtering tools based on use case:
```python Code
use_case = "Star a repository on GitHub"
filtered_action_enums = toolset.find_actions_by_use_case(
App.GITHUB, use_case=use_case, advanced=False
)
tools = toolset.get_tools(actions=filtered_action_enums)
```
<Tip>Set `advanced` to True to get actions for complex use cases</Tip>
- Using specific tools:
In this demo, we will use the `GITHUB_STAR_A_REPOSITORY_FOR_THE_AUTHENTICATED_USER` action from the GitHub app.
```python Code
tools = toolset.get_tools(
actions=[Action.GITHUB_STAR_A_REPOSITORY_FOR_THE_AUTHENTICATED_USER]
)
```
Learn more about filtering actions [here](https://docs.composio.dev/patterns/tools/use-tools/use-specific-actions)
4. Define agent
```python Code
crewai_agent = Agent(
    role="GitHub Agent",
    goal="You take action on GitHub using GitHub APIs",
    backstory="You are an AI agent that is responsible for taking actions on GitHub on behalf of users using GitHub APIs",
    verbose=True,
    tools=tools,
    # llm=...,  # optionally pass an LLM here
)
```
5. Execute task
```python Code
task = Task(
description="Star a repo composiohq/composio on GitHub",
agent=crewai_agent,
expected_output="Status of the operation",
)
crew = Crew(agents=[crewai_agent], tasks=[task])
crew.kickoff()
```
A more detailed list of tools can be found [here](https://app.composio.dev).


@@ -0,0 +1,126 @@
---
title: MultiOn Tool
description: The `MultiOnTool` empowers CrewAI agents with the capability to navigate and interact with the web through natural language instructions.
icon: globe
---
## Overview
The `MultiOnTool` is designed to wrap [MultiOn's](https://docs.multion.ai/welcome) web browsing capabilities, enabling CrewAI agents to control web browsers using natural language instructions. This tool facilitates seamless web browsing, making it an essential asset for projects requiring dynamic web data interaction and automation of web-based tasks.
## Installation
To use this tool, you need to install the MultiOn package:
```shell
uv add multion
```
You'll also need to install the MultiOn browser extension and enable API usage.
## Steps to Get Started
To effectively use the `MultiOnTool`, follow these steps:
1. **Install CrewAI**: Ensure that the `crewai[tools]` package is installed in your Python environment.
2. **Install and use MultiOn**: Follow [MultiOn documentation](https://docs.multion.ai/learn/browser-extension) for installing the MultiOn Browser Extension.
3. **Enable API Usage**: Click on the MultiOn extension in the extensions folder of your browser (not the hovering MultiOn icon on the web page) to open the extension configurations. Click the API Enabled toggle to enable the API.
## Example
The following example demonstrates how to initialize the tool and execute a web browsing task:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import MultiOnTool
# Initialize the tool
multion_tool = MultiOnTool(api_key="YOUR_MULTION_API_KEY", local=False)
# Define an agent that uses the tool
browser_agent = Agent(
role="Browser Agent",
goal="Control web browsers using natural language",
backstory="An expert browsing agent.",
tools=[multion_tool],
verbose=True,
)
# Example task to search and summarize news
browse_task = Task(
description="Summarize the top 3 trending AI News headlines",
expected_output="A summary of the top 3 trending AI News headlines",
agent=browser_agent,
)
# Create and run the crew
crew = Crew(agents=[browser_agent], tasks=[browse_task])
result = crew.kickoff()
```
## Parameters
The `MultiOnTool` accepts the following parameters during initialization:
- **api_key**: Optional. Specifies the MultiOn API key. If not provided, it will look for the `MULTION_API_KEY` environment variable.
- **local**: Optional. Set to `True` to run the agent locally on your browser. Make sure the MultiOn browser extension is installed and API Enabled is checked. Default is `False`.
- **max_steps**: Optional. Sets the maximum number of steps the MultiOn agent can take for a command. Default is `3`.
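
The lookup order described for `api_key` (explicit argument first, then the `MULTION_API_KEY` environment variable) can be sketched as follows. This is an illustration of the documented behavior, not the tool's actual code; the helper name `resolve_multion_api_key` is hypothetical.

```python
import os
from typing import Optional


def resolve_multion_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key if given, else fall back to MULTION_API_KEY."""
    key = api_key or os.environ.get("MULTION_API_KEY")
    if not key:
        raise RuntimeError(
            "No MultiOn API key: pass api_key or set MULTION_API_KEY"
        )
    return key
```

Failing early with a clear message like this can make a missing key easier to diagnose than an authentication error raised mid-task.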
## Usage
When using the `MultiOnTool`, the agent will provide natural language instructions that the tool translates into web browsing actions. The tool returns the results of the browsing session along with a status.
```python Code
# Example of using the tool with an agent
browser_agent = Agent(
role="Web Browser Agent",
goal="Search for and summarize information from the web",
backstory="An expert at finding and extracting information from websites.",
tools=[multion_tool],
verbose=True,
)
# Create a task for the agent
search_task = Task(
description="Search for the latest AI news on TechCrunch and summarize the top 3 headlines",
expected_output="A summary of the top 3 AI news headlines from TechCrunch",
agent=browser_agent,
)
# Run the task
crew = Crew(agents=[browser_agent], tasks=[search_task])
result = crew.kickoff()
```
If the status returned is `CONTINUE`, the agent should be instructed to reissue the same instruction to continue execution.
## Implementation Details
The `MultiOnTool` is implemented as a subclass of `BaseTool` from CrewAI. It wraps the MultiOn client to provide web browsing capabilities:
```python Code
from typing import Any

from crewai.tools import BaseTool


class MultiOnTool(BaseTool):
    """Tool to wrap MultiOn Browse Capabilities."""

    name: str = "Multion Browse Tool"
    description: str = """Multion gives the ability for LLMs to control web browsers using natural language instructions.
        If the status is 'CONTINUE', reissue the same instruction to continue execution
    """
    # Implementation details...

    def _run(self, cmd: str, *args: Any, **kwargs: Any) -> str:
        """
        Run the Multion client with the given command.

        Args:
            cmd (str): The detailed and specific natural language instruction for web browsing
            *args (Any): Additional arguments to pass to the Multion client
            **kwargs (Any): Additional keyword arguments to pass to the Multion client
        """
        # Implementation details...
```
## Conclusion
The `MultiOnTool` provides a powerful way to integrate web browsing capabilities into CrewAI agents. By enabling agents to interact with websites through natural language instructions, it opens up a wide range of possibilities for web-based tasks, from data collection and research to automated interactions with web services.


@@ -0,0 +1,55 @@
---
title: "Overview"
description: "Automate workflows and integrate with external platforms and services"
icon: "face-smile"
---
These tools enable your agents to automate workflows, integrate with external platforms, and connect with various third-party services for enhanced functionality.
## **Available Tools**
<CardGroup cols={2}>
<Card title="Apify Actor Tool" icon="spider" href="/en/tools/automation/apifyactorstool">
Run Apify actors for web scraping and automation tasks.
</Card>
<Card title="Composio Tool" icon="puzzle-piece" href="/en/tools/automation/composiotool">
Integrate with hundreds of apps and services through Composio.
</Card>
<Card title="MultiOn Tool" icon="window-restore" href="/en/tools/automation/multiontool">
Automate browser interactions and web-based workflows.
</Card>
</CardGroup>
## **Common Use Cases**
- **Workflow Automation**: Automate repetitive tasks and processes
- **API Integration**: Connect with external APIs and services
- **Data Synchronization**: Sync data between different platforms
- **Process Orchestration**: Coordinate complex multi-step workflows
- **Third-party Services**: Leverage external tools and platforms
```python
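from crewai import Agent
from crewai_tools import ApifyActorsTool, ComposioTool, MultiOnTool
# Create automation tools
apify_automation = ApifyActorsTool(actor_name="apify/rag-web-browser")
# Each tool may need further configuration; see its own page for details
platform_integration = ComposioTool()
browser_automation = MultiOnTool()
# Add to your agent
agent = Agent(
    role="Automation Specialist",
    goal="Automate workflows and integrate systems",
    backstory="An automation engineer who connects tools and services.",
    tools=[apify_automation, platform_integration, browser_automation],
)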
```
## **Integration Benefits**
- **Efficiency**: Reduce manual work through automation
- **Scalability**: Handle increased workloads automatically
- **Reliability**: Consistent execution of workflows
- **Connectivity**: Bridge different systems and platforms
- **Productivity**: Focus on high-value tasks while automation handles routine work


@@ -0,0 +1,187 @@
---
title: Bedrock Invoke Agent Tool
description: Enables CrewAI agents to invoke Amazon Bedrock Agents and leverage their capabilities within your workflows
icon: aws
---
# `BedrockInvokeAgentTool`
The `BedrockInvokeAgentTool` enables CrewAI agents to invoke Amazon Bedrock Agents and leverage their capabilities within your workflows.
## Installation
```bash
uv pip install 'crewai[tools]'
```
## Requirements
- AWS credentials configured (either through environment variables or AWS CLI)
- `boto3` and `python-dotenv` packages
- Access to Amazon Bedrock Agents
## Usage
Here's how to use the tool with a CrewAI agent:
```python {2, 4-8}
from crewai import Agent, Task, Crew
from crewai_tools.aws.bedrock.agents.invoke_agent_tool import BedrockInvokeAgentTool
# Initialize the tool
agent_tool = BedrockInvokeAgentTool(
agent_id="your-agent-id",
agent_alias_id="your-agent-alias-id"
)
# Create a CrewAI agent that uses the tool
aws_expert = Agent(
role='AWS Service Expert',
goal='Help users understand AWS services and quotas',
backstory='I am an expert in AWS services and can provide detailed information about them.',
tools=[agent_tool],
verbose=True
)
# Create a task for the agent
quota_task = Task(
    description="Find out the current service quotas for EC2 in us-west-2 and explain any recent changes.",
    expected_output="A summary of the current EC2 service quotas in us-west-2 and any recent changes.",
    agent=aws_expert
)
# Create a crew with the agent
crew = Crew(
    agents=[aws_expert],
    tasks=[quota_task],
    verbose=True
)
# Run the crew
result = crew.kickoff()
print(result)
```
## Tool Arguments
| Argument | Type | Required | Default | Description |
|:---------|:-----|:---------|:--------|:------------|
| **agent_id** | `str` | Yes | None | The unique identifier of the Bedrock agent |
| **agent_alias_id** | `str` | Yes | None | The unique identifier of the agent alias |
| **session_id** | `str` | No | timestamp | The unique identifier of the session |
| **enable_trace** | `bool` | No | False | Whether to enable trace for debugging |
| **end_session** | `bool` | No | False | Whether to end the session after invocation |
| **description** | `str` | No | None | Custom description for the tool |
## Environment Variables
```bash
BEDROCK_AGENT_ID=your-agent-id # Alternative to passing agent_id
BEDROCK_AGENT_ALIAS_ID=your-agent-alias-id # Alternative to passing agent_alias_id
AWS_REGION=your-aws-region # Defaults to us-west-2
AWS_ACCESS_KEY_ID=your-access-key # Required for AWS authentication
AWS_SECRET_ACCESS_KEY=your-secret-key # Required for AWS authentication
```
## Advanced Usage
### Multi-Agent Workflow with Session Management
```python {2, 4-22}
from crewai import Agent, Task, Crew, Process
from crewai_tools.aws.bedrock.agents.invoke_agent_tool import BedrockInvokeAgentTool
# Initialize tools with session management
initial_tool = BedrockInvokeAgentTool(
agent_id="your-agent-id",
agent_alias_id="your-agent-alias-id",
session_id="custom-session-id"
)
followup_tool = BedrockInvokeAgentTool(
agent_id="your-agent-id",
agent_alias_id="your-agent-alias-id",
session_id="custom-session-id"
)
final_tool = BedrockInvokeAgentTool(
agent_id="your-agent-id",
agent_alias_id="your-agent-alias-id",
session_id="custom-session-id",
end_session=True
)
# Create agents for different stages
researcher = Agent(
role='AWS Service Researcher',
goal='Gather information about AWS services',
backstory='I am specialized in finding detailed AWS service information.',
tools=[initial_tool]
)
analyst = Agent(
role='Service Compatibility Analyst',
goal='Analyze service compatibility and requirements',
backstory='I analyze AWS services for compatibility and integration possibilities.',
tools=[followup_tool]
)
summarizer = Agent(
role='Technical Documentation Writer',
goal='Create clear technical summaries',
backstory='I specialize in creating clear, concise technical documentation.',
tools=[final_tool]
)
# Create tasks
research_task = Task(
    description="Find all available AWS services in us-west-2 region.",
    expected_output="A list of AWS services available in us-west-2.",
    agent=researcher
)
analysis_task = Task(
    description="Analyze which services support IPv6 and their implementation requirements.",
    expected_output="An analysis of IPv6 support and implementation requirements per service.",
    agent=analyst
)
summary_task = Task(
    description="Create a summary of IPv6-compatible services and their key features.",
    expected_output="A concise summary of IPv6-compatible services and their key features.",
    agent=summarizer
)
# Create a crew with the agents and tasks
crew = Crew(
    agents=[researcher, analyst, summarizer],
    tasks=[research_task, analysis_task, summary_task],
    process=Process.sequential,
    verbose=True
)
# Run the crew
result = crew.kickoff()
```
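
The pattern in the example above (every tool shares one `session_id`, and only the final tool sets `end_session=True`) can be generated rather than written out by hand. The following is a minimal sketch; the helper name `session_tool_kwargs` is hypothetical, while the keyword names come from the Tool Arguments table.

```python
from typing import Dict, List


def session_tool_kwargs(
    agent_id: str, agent_alias_id: str, session_id: str, stages: int
) -> List[Dict[str, object]]:
    """Build keyword arguments for `stages` tools sharing one session.

    Only the last stage sets end_session=True, mirroring the example above.
    """
    if stages < 1:
        raise ValueError("stages must be at least 1")
    return [
        {
            "agent_id": agent_id,
            "agent_alias_id": agent_alias_id,
            "session_id": session_id,
            "end_session": index == stages - 1,
        }
        for index in range(stages)
    ]


kwargs_list = session_tool_kwargs(
    "your-agent-id", "your-agent-alias-id", "custom-session-id", 3
)
print([kw["end_session"] for kw in kwargs_list])
```

Each dictionary could then be unpacked into a tool instance, for example `BedrockInvokeAgentTool(**kwargs_list[0])`.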
## Use Cases
### Hybrid Multi-Agent Collaborations
- Create workflows where CrewAI agents collaborate with managed Bedrock agents running as services in AWS
- Enable scenarios where sensitive data processing happens within your AWS environment while other agents operate externally
- Bridge on-premises CrewAI agents with cloud-based Bedrock agents for distributed intelligence workflows
### Data Sovereignty and Compliance
- Keep data-sensitive agentic workflows within your AWS environment while allowing external CrewAI agents to orchestrate tasks
- Maintain compliance with data residency requirements by processing sensitive information only within your AWS account
- Enable secure multi-agent collaborations where some agents cannot access your organization's private data
### Seamless AWS Service Integration
- Access any AWS service through Amazon Bedrock Actions without writing complex integration code
- Enable CrewAI agents to interact with AWS services through natural language requests
- Leverage pre-built Bedrock agent capabilities to interact with AWS services like Bedrock Knowledge Bases, Lambda, and more
### Scalable Hybrid Agent Architectures
- Offload computationally intensive tasks to managed Bedrock agents while lightweight tasks run in CrewAI
- Scale agent processing by distributing workloads between local CrewAI agents and cloud-based Bedrock agents
### Cross-Organizational Agent Collaboration
- Enable secure collaboration between your organization's CrewAI agents and partner organizations' Bedrock agents
- Create workflows where external expertise from Bedrock agents can be incorporated without exposing sensitive data
- Build agent ecosystems that span organizational boundaries while maintaining security and data control


@@ -0,0 +1,165 @@
---
title: 'Bedrock Knowledge Base Retriever'
description: 'Retrieve information from Amazon Bedrock Knowledge Bases using natural language queries'
icon: aws
---
# `BedrockKBRetrieverTool`
The `BedrockKBRetrieverTool` enables CrewAI agents to retrieve information from Amazon Bedrock Knowledge Bases using natural language queries.
## Installation
```bash
uv pip install 'crewai[tools]'
```
## Requirements
- AWS credentials configured (either through environment variables or AWS CLI)
- `boto3` and `python-dotenv` packages
- Access to Amazon Bedrock Knowledge Base
## Usage
Here's how to use the tool with a CrewAI agent:
```python {2, 4-17}
from crewai import Agent, Task, Crew
from crewai_tools.aws.bedrock.knowledge_base.retriever_tool import BedrockKBRetrieverTool
# Initialize the tool
kb_tool = BedrockKBRetrieverTool(
knowledge_base_id="your-kb-id",
number_of_results=5
)
# Create a CrewAI agent that uses the tool
researcher = Agent(
role='Knowledge Base Researcher',
goal='Find information about company policies',
backstory='I am a researcher specialized in retrieving and analyzing company documentation.',
tools=[kb_tool],
verbose=True
)
# Create a task for the agent
research_task = Task(
    description="Find our company's remote work policy and summarize the key points.",
    expected_output="A summary of the key points of the remote work policy.",
    agent=researcher
)
# Create a crew with the agent
crew = Crew(
    agents=[researcher],
    tasks=[research_task],
    verbose=True
)
# Run the crew
result = crew.kickoff()
print(result)
```
## Tool Arguments
| Argument | Type | Required | Default | Description |
|:---------|:-----|:---------|:---------|:-------------|
| **knowledge_base_id** | `str` | Yes | None | The unique identifier of the knowledge base (0-10 alphanumeric characters) |
| **number_of_results** | `int` | No | 5 | Maximum number of results to return |
| **retrieval_configuration** | `dict` | No | None | Custom configurations for the knowledge base query |
| **guardrail_configuration** | `dict` | No | None | Content filtering settings |
| **next_token** | `str` | No | None | Token for pagination |
## Environment Variables
```bash
BEDROCK_KB_ID=your-knowledge-base-id # Alternative to passing knowledge_base_id
AWS_REGION=your-aws-region # Defaults to us-east-1
AWS_ACCESS_KEY_ID=your-access-key # Required for AWS authentication
AWS_SECRET_ACCESS_KEY=your-secret-key # Required for AWS authentication
```
## Response Format
The tool returns results in JSON format:
```json
{
"results": [
{
"content": "Retrieved text content",
"content_type": "text",
"source_type": "S3",
"source_uri": "s3://bucket/document.pdf",
"score": 0.95,
"metadata": {
"additional": "metadata"
}
}
],
"nextToken": "pagination-token",
"guardrailAction": "NONE"
}
```
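
Because the response is JSON, downstream code can parse it with the standard library. The following is a minimal sketch that assumes the field names shown above (`results`, `content`, `source_uri`, `score`); the helper name `top_passages` and the sample payload are illustrative only.

```python
import json
from typing import List, Tuple


def top_passages(raw: str, min_score: float = 0.0) -> List[Tuple[float, str, str]]:
    """Return (score, source_uri, content) tuples, best score first."""
    payload = json.loads(raw)
    rows = [
        (item.get("score", 0.0), item.get("source_uri", ""), item.get("content", ""))
        for item in payload.get("results", [])
        if item.get("score", 0.0) >= min_score
    ]
    return sorted(rows, key=lambda row: row[0], reverse=True)


# Made-up sample in the documented shape.
sample = json.dumps({
    "results": [
        {"content": "Unrelated note.", "source_uri": "s3://bucket/misc.pdf", "score": 0.40},
        {"content": "Remote work is allowed.", "source_uri": "s3://bucket/policy.pdf", "score": 0.95},
    ],
    "nextToken": "pagination-token",
    "guardrailAction": "NONE",
})
for score, uri, text in top_passages(sample, min_score=0.5):
    print(f"{score:.2f} {uri}: {text}")
```

Filtering by score is a design choice for the sketch; a suitable threshold depends on the knowledge base and search type.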
## Advanced Usage
### Custom Retrieval Configuration
```python
kb_tool = BedrockKBRetrieverTool(
knowledge_base_id="your-kb-id",
retrieval_configuration={
"vectorSearchConfiguration": {
"numberOfResults": 10,
"overrideSearchType": "HYBRID"
}
}
)
policy_expert = Agent(
role='Policy Expert',
goal='Analyze company policies in detail',
backstory='I am an expert in corporate policy analysis with deep knowledge of regulatory requirements.',
tools=[kb_tool]
)
```
## Supported Data Sources
- Amazon S3
- Confluence
- Salesforce
- SharePoint
- Web pages
- Custom document locations
- Amazon Kendra
- SQL databases
## Use Cases
### Enterprise Knowledge Integration
- Enable CrewAI agents to access your organization's proprietary knowledge without exposing sensitive data
- Allow agents to make decisions based on your company's specific policies, procedures, and documentation
- Create agents that can answer questions based on your internal documentation while maintaining data security
### Specialized Domain Knowledge
- Connect CrewAI agents to domain-specific knowledge bases (legal, medical, technical) without retraining models
- Leverage existing knowledge repositories that are already maintained in your AWS environment
- Combine CrewAI's reasoning with domain-specific information from your knowledge bases
### Data-Driven Decision Making
- Ground CrewAI agent responses in your actual company data rather than general knowledge
- Ensure agents provide recommendations based on your specific business context and documentation
- Reduce hallucinations by retrieving factual information from your knowledge bases
### Scalable Information Access
- Access terabytes of organizational knowledge without embedding it all into your models
- Dynamically query only the relevant information needed for specific tasks
- Leverage AWS's scalable infrastructure to handle large knowledge bases efficiently
### Compliance and Governance
- Ensure CrewAI agents provide responses that align with your company's approved documentation
- Create auditable trails of information sources used by your agents
- Maintain control over what information sources your agents can access


@@ -0,0 +1,50 @@
---
title: "Overview"
description: "Interact with cloud services, storage systems, and cloud-based AI platforms"
icon: "face-smile"
---
These tools enable your agents to interact with cloud services, access cloud storage, and leverage cloud-based AI platforms for scalable operations.
## **Available Tools**
<CardGroup cols={2}>
<Card title="S3 Reader Tool" icon="cloud" href="/en/tools/cloud-storage/s3readertool">
Read files and data from Amazon S3 buckets.
</Card>
<Card title="S3 Writer Tool" icon="cloud-arrow-up" href="/en/tools/cloud-storage/s3writertool">
Write and upload files to Amazon S3 storage.
</Card>
<Card title="Bedrock Invoke Agent" icon="aws" href="/en/tools/cloud-storage/bedrockinvokeagenttool">
Invoke Amazon Bedrock agents for AI-powered tasks.
</Card>
<Card title="Bedrock KB Retriever" icon="database" href="/en/tools/cloud-storage/bedrockkbretriever">
Retrieve information from Amazon Bedrock knowledge bases.
</Card>
</CardGroup>
## **Common Use Cases**
- **File Storage**: Store and retrieve files from cloud storage systems
- **Data Backup**: Backup important data to cloud storage
- **AI Services**: Access cloud-based AI models and services
- **Knowledge Retrieval**: Query cloud-hosted knowledge bases
- **Scalable Operations**: Leverage cloud infrastructure for processing
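```python
from crewai import Agent
from crewai_tools import S3ReaderTool, S3WriterTool, BedrockInvokeAgentTool
# Create cloud tools
s3_reader = S3ReaderTool()
s3_writer = S3WriterTool()
# Agent IDs can also come from BEDROCK_AGENT_ID / BEDROCK_AGENT_ALIAS_ID
bedrock_agent = BedrockInvokeAgentTool()
# Add to your agent
agent = Agent(
    role="Cloud Operations Specialist",
    goal="Manage cloud resources and AI services",
    backstory="An operations engineer experienced with AWS storage and AI services.",
    tools=[s3_reader, s3_writer, bedrock_agent],
)
```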


@@ -0,0 +1,144 @@
---
title: S3 Reader Tool
description: The `S3ReaderTool` enables CrewAI agents to read files from Amazon S3 buckets.
icon: aws
---
# `S3ReaderTool`
## Description
The `S3ReaderTool` is designed to read files from Amazon S3 buckets. This tool allows CrewAI agents to access and retrieve content stored in S3, making it ideal for workflows that require reading data, configuration files, or any other content stored in AWS S3 storage.
## Installation
To use this tool, you need to install the required dependencies:
```shell
uv add boto3
```
## Steps to Get Started
To effectively use the `S3ReaderTool`, follow these steps:
1. **Install Dependencies**: Install the required packages using the command above.
2. **Configure AWS Credentials**: Set up your AWS credentials as environment variables.
3. **Initialize the Tool**: Create an instance of the tool.
4. **Specify S3 Path**: Provide the S3 path to the file you want to read.
## Example
The following example demonstrates how to use the `S3ReaderTool` to read a file from an S3 bucket:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools.aws.s3 import S3ReaderTool
# Initialize the tool
s3_reader_tool = S3ReaderTool()
# Define an agent that uses the tool
file_reader_agent = Agent(
role="File Reader",
goal="Read files from S3 buckets",
backstory="An expert in retrieving and processing files from cloud storage.",
tools=[s3_reader_tool],
verbose=True,
)
# Example task to read a configuration file
read_task = Task(
description="Read the configuration file from {my_bucket} and summarize its contents.",
expected_output="A summary of the configuration file contents.",
agent=file_reader_agent,
)
# Create and run the crew
crew = Crew(agents=[file_reader_agent], tasks=[read_task])
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/app-config.json"})
```
## Parameters
The `S3ReaderTool` accepts the following parameter when used by an agent:
- **file_path**: Required. The S3 file path in the format `s3://bucket-name/file-name`.
## AWS Credentials
The tool requires AWS credentials to access S3 buckets. You can configure these credentials using environment variables:
- **CREW_AWS_REGION**: The AWS region where your S3 bucket is located. Default is `us-east-1`.
- **CREW_AWS_ACCESS_KEY_ID**: Your AWS access key ID.
- **CREW_AWS_SEC_ACCESS_KEY**: Your AWS secret access key.
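
Before running a crew, it can help to check that these variables resolve as expected. The sketch below collects them into the keyword names boto3's client constructor accepts and reports which credentials are missing; both helper names are hypothetical and not part of `crewai_tools`.

```python
import os
from typing import Dict, List, Mapping, Optional


def s3_client_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Collect boto3 client settings from the CREW_AWS_* variables."""
    env = os.environ if env is None else env
    return {
        "region_name": env.get("CREW_AWS_REGION", "us-east-1"),
        "aws_access_key_id": env.get("CREW_AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("CREW_AWS_SEC_ACCESS_KEY"),
    }


def missing_credentials(settings: Mapping[str, Optional[str]]) -> List[str]:
    """Names of credential settings that are unset or empty."""
    return [
        name
        for name in ("aws_access_key_id", "aws_secret_access_key")
        if not settings.get(name)
    ]


# Example with an explicit mapping instead of the real environment.
settings = s3_client_kwargs({"CREW_AWS_ACCESS_KEY_ID": "example-id"})
print(settings["region_name"], missing_credentials(settings))
```

The same dictionary could be passed on as `boto3.client("s3", **settings)` once both credentials are present.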
## Usage
When using the `S3ReaderTool` with an agent, the agent will need to provide the S3 file path:
```python Code
# Example of using the tool with an agent
file_reader_agent = Agent(
role="File Reader",
goal="Read files from S3 buckets",
backstory="An expert in retrieving and processing files from cloud storage.",
tools=[s3_reader_tool],
verbose=True,
)
# Create a task for the agent to read a specific file
read_config_task = Task(
description="Read the application configuration file from {my_bucket} and extract the database connection settings.",
expected_output="The database connection settings from the configuration file.",
agent=file_reader_agent,
)
# Run the task
crew = Crew(agents=[file_reader_agent], tasks=[read_config_task])
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/app-config.json"})
```
## Error Handling
The `S3ReaderTool` includes error handling for common S3 issues:
- Invalid S3 path format
- Missing or inaccessible files
- Permission issues
- AWS credential problems
When an error occurs, the tool will return an error message that includes details about the issue.
## Implementation Details
The `S3ReaderTool` uses the AWS SDK for Python (boto3) to interact with S3:
```python Code
import os

import boto3
from botocore.exceptions import ClientError
from crewai.tools import BaseTool


class S3ReaderTool(BaseTool):
    name: str = "S3 Reader Tool"
    description: str = "Reads a file from Amazon S3 given an S3 file path"

    def _run(self, file_path: str) -> str:
        try:
            # _parse_s3_path (not shown in this excerpt) splits the path into bucket and key
            bucket_name, object_key = self._parse_s3_path(file_path)
            s3 = boto3.client(
                's3',
                region_name=os.getenv('CREW_AWS_REGION', 'us-east-1'),
                aws_access_key_id=os.getenv('CREW_AWS_ACCESS_KEY_ID'),
                aws_secret_access_key=os.getenv('CREW_AWS_SEC_ACCESS_KEY')
            )
            # Read file content from S3
            response = s3.get_object(Bucket=bucket_name, Key=object_key)
            file_content = response['Body'].read().decode('utf-8')
            return file_content
        except ClientError as e:
            return f"Error reading file from S3: {str(e)}"
```
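
The excerpt above calls a `_parse_s3_path` helper that is not shown. One plausible way to split an `s3://bucket-name/file-name` path is sketched below; this is an assumption about what such a helper does, written as a standalone function, not the tool's actual implementation.

```python
from typing import Tuple


def parse_s3_path(file_path: str) -> Tuple[str, str]:
    """Split 's3://bucket-name/key' into (bucket_name, object_key)."""
    prefix = "s3://"
    if not file_path.startswith(prefix):
        raise ValueError(f"Expected a path starting with {prefix!r}: {file_path!r}")
    # Everything up to the first "/" is the bucket; the rest is the object key.
    bucket_name, _, object_key = file_path[len(prefix):].partition("/")
    if not bucket_name or not object_key:
        raise ValueError(f"Path must include both a bucket and a key: {file_path!r}")
    return bucket_name, object_key


print(parse_s3_path("s3://my-bucket/config/app-config.json"))
```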
## Conclusion
The `S3ReaderTool` provides a straightforward way to read files from Amazon S3 buckets. By enabling agents to access content stored in S3, it facilitates workflows that require cloud-based file access. This tool is particularly useful for data processing, configuration management, and any task that involves retrieving information from AWS S3 storage.


@@ -0,0 +1,150 @@
---
title: S3 Writer Tool
description: The `S3WriterTool` enables CrewAI agents to write content to files in Amazon S3 buckets.
icon: aws
---
# `S3WriterTool`
## Description
The `S3WriterTool` is designed to write content to files in Amazon S3 buckets. This tool allows CrewAI agents to create or update files in S3, making it ideal for workflows that require storing data, saving configuration files, or persisting any other content to AWS S3 storage.
## Installation
To use this tool, you need to install the required dependencies:
```shell
uv add boto3
```
## Steps to Get Started
To effectively use the `S3WriterTool`, follow these steps:
1. **Install Dependencies**: Install the required packages using the command above.
2. **Configure AWS Credentials**: Set up your AWS credentials as environment variables.
3. **Initialize the Tool**: Create an instance of the tool.
4. **Specify S3 Path and Content**: Provide the S3 path where you want to write the file and the content to be written.
## Example
The following example demonstrates how to use the `S3WriterTool` to write content to a file in an S3 bucket:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools.aws.s3 import S3WriterTool
# Initialize the tool
s3_writer_tool = S3WriterTool()
# Define an agent that uses the tool
file_writer_agent = Agent(
role="File Writer",
goal="Write content to files in S3 buckets",
backstory="An expert in storing and managing files in cloud storage.",
tools=[s3_writer_tool],
verbose=True,
)
# Example task to write a report
write_task = Task(
description="Generate a summary report of the quarterly sales data and save it to {my_bucket}.",
expected_output="Confirmation that the report was successfully saved to S3.",
agent=file_writer_agent,
)
# Create and run the crew
crew = Crew(agents=[file_writer_agent], tasks=[write_task])
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/reports/quarterly-summary.txt"})
```
## Parameters
The `S3WriterTool` accepts the following parameters when used by an agent:
- **file_path**: Required. The S3 file path in the format `s3://bucket-name/file-name`.
- **content**: Required. The content to write to the file.
## AWS Credentials
The tool requires AWS credentials to access S3 buckets. You can configure these credentials using environment variables:
- **CREW_AWS_REGION**: The AWS region where your S3 bucket is located. Default is `us-east-1`.
- **CREW_AWS_ACCESS_KEY_ID**: Your AWS access key ID.
- **CREW_AWS_SEC_ACCESS_KEY**: Your AWS secret access key.
## Usage
When using the `S3WriterTool` with an agent, the agent will need to provide both the S3 file path and the content to write:
```python Code
# Example of using the tool with an agent
file_writer_agent = Agent(
role="File Writer",
goal="Write content to files in S3 buckets",
backstory="An expert in storing and managing files in cloud storage.",
tools=[s3_writer_tool],
verbose=True,
)
# Create a task for the agent to write a specific file
write_config_task = Task(
description="""
Create a configuration file with the following database settings:
- host: db.example.com
- port: 5432
- username: app_user
- password: secure_password
Save this configuration as JSON to {my_bucket}.
""",
expected_output="Confirmation that the configuration file was successfully saved to S3.",
agent=file_writer_agent,
)
# Run the task
crew = Crew(agents=[file_writer_agent], tasks=[write_config_task])
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/db-config.json"})
```
## Error Handling
The `S3WriterTool` includes error handling for common S3 issues:
- Invalid S3 path format
- Permission issues (e.g., no write access to the bucket)
- AWS credential problems
- Bucket does not exist
When an error occurs, the tool will return an error message that includes details about the issue.
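
Since failures are reported as a returned message rather than a raised exception, calling code may want to inspect the output. The sketch below assumes the error prefix used in the implementation excerpt on this page; the prefix may differ between versions, and the helper name `write_succeeded` is hypothetical.

```python
# Prefix assumed from the implementation excerpt on this page.
ERROR_PREFIX = "Error writing file to S3:"


def write_succeeded(tool_output: str) -> bool:
    """True unless the output starts with the tool's error prefix."""
    return not tool_output.startswith(ERROR_PREFIX)


print(write_succeeded("Successfully wrote content to s3://my-bucket/reports/q1.txt"))
print(write_succeeded("Error writing file to S3: An error occurred (AccessDenied)"))
```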
## Implementation Details
The `S3WriterTool` uses the AWS SDK for Python (boto3) to interact with S3:
```python Code
import os

import boto3
from botocore.exceptions import ClientError
from crewai.tools import BaseTool


class S3WriterTool(BaseTool):
    name: str = "S3 Writer Tool"
    description: str = "Writes content to a file in Amazon S3 given an S3 file path"

    def _run(self, file_path: str, content: str) -> str:
        try:
            # _parse_s3_path (not shown in this excerpt) splits the path into bucket and key
            bucket_name, object_key = self._parse_s3_path(file_path)
            s3 = boto3.client(
                's3',
                region_name=os.getenv('CREW_AWS_REGION', 'us-east-1'),
                aws_access_key_id=os.getenv('CREW_AWS_ACCESS_KEY_ID'),
                aws_secret_access_key=os.getenv('CREW_AWS_SEC_ACCESS_KEY')
            )
            # Upload the content as UTF-8 bytes
            s3.put_object(Bucket=bucket_name, Key=object_key, Body=content.encode('utf-8'))
            return f"Successfully wrote content to {file_path}"
        except ClientError as e:
            return f"Error writing file to S3: {str(e)}"
```
## Conclusion
The `S3WriterTool` provides a straightforward way to write content to files in Amazon S3 buckets. By enabling agents to create and update files in S3, it facilitates workflows that require cloud-based file storage. This tool is particularly useful for data persistence, configuration management, report generation, and any task that involves storing information in AWS S3 storage.

View File

@@ -0,0 +1,69 @@
---
title: MySQL RAG Search
description: The `MySQLSearchTool` is designed to search MySQL databases and return the most relevant results.
icon: database
---
## Overview
This tool performs semantic searches within MySQL database tables. Using RAG (Retrieval-Augmented Generation),
the MySQLSearchTool gives users an efficient way to query table content, specifically tailored for MySQL databases.
It simplifies finding relevant data through semantic search queries, which makes it useful for advanced queries on extensive datasets within a MySQL database.
## Installation
To install the `crewai_tools` package and utilize the MySQLSearchTool, execute the following command in your terminal:
```shell
pip install 'crewai[tools]'
```
## Example
Below is an example showcasing how to use the MySQLSearchTool to conduct a semantic search on a table within a MySQL database:
```python Code
from crewai_tools import MySQLSearchTool
# Initialize the tool with the database URI and the target table name
tool = MySQLSearchTool(
    db_uri='mysql://user:password@localhost:3306/mydatabase',
    table_name='employees'
)
```
## Arguments
The MySQLSearchTool requires the following arguments for its operation:
- `db_uri`: A string representing the URI of the MySQL database to be queried. This argument is mandatory and must include the necessary authentication details and the location of the database.
- `table_name`: A string specifying the name of the table within the database on which the semantic search will be performed. This argument is mandatory.
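
Since `db_uri` packs credentials, host, port, and database name into one string, it can help to inspect a URI before handing it to the tool. A small sketch using only the standard library; the helper name `describe_db_uri` is hypothetical.

```python
from urllib.parse import urlparse


def describe_db_uri(db_uri: str) -> dict:
    """Break a database URI into its parts (password deliberately omitted)."""
    parts = urlparse(db_uri)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "host": parts.hostname,
        "port": parts.port,
        "database": parts.path.lstrip("/"),
    }


print(describe_db_uri("mysql://user:password@localhost:3306/mydatabase"))
```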
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = MySQLSearchTool(
    db_uri='mysql://user:password@localhost:3306/mydatabase',
    table_name='employees',
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,83 @@
---
title: NL2SQL Tool
description: The `NL2SQLTool` is designed to convert natural language to SQL queries.
icon: language
---
## Overview
This tool converts natural language to SQL queries. When passed to an agent, it generates queries and then uses them to interact with the database.
This enables workflows in which an Agent accesses the database, fetches information based on its goal, and then uses that information to generate a response, report, or any other output.
It also allows the Agent to update the database based on its goal.
**Attention**: Make sure that the Agent has access to a read replica, or that it is okay for the Agent to run insert/update queries on the database.
## Requirements
- SQLAlchemy
- Any DB compatible library (e.g. psycopg2, mysql-connector-python)
## Installation
Install the crewai_tools package
```shell
pip install 'crewai[tools]'
```
## Usage
In order to use the NL2SQLTool, you need to pass the database URI to the tool. The URI should be in the format `dialect+driver://username:password@host:port/database`.
```python Code
from crewai_tools import NL2SQLTool
# psycopg2 was installed to run this example with PostgreSQL
nl2sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")
@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[nl2sql]
    )
```
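
The URI format above can be assembled from its parts. This is a minimal sketch with a hypothetical helper, `build_db_uri`; it percent-encodes the username and password so that characters such as `@` or `/` do not break the URI.

```python
from urllib.parse import quote_plus


def build_db_uri(dialect, username, password, host, port, database, driver=None):
    """Assemble dialect+driver://username:password@host:port/database."""
    scheme = f"{dialect}+{driver}" if driver else dialect
    credentials = f"{quote_plus(username)}:{quote_plus(password)}"
    return f"{scheme}://{credentials}@{host}:{port}/{database}"


# Placeholder credentials for illustration only.
print(build_db_uri("postgresql", "app_user", "p@ss/word", "localhost", 5432, "test_db", driver="psycopg2"))
```

The resulting string can be passed as `db_uri` when constructing the tool.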
## Example
The primary task goal was:
"Retrieve the average, maximum, and minimum monthly revenue for each city, but only include cities that have more than one user. Also, count the number of users in each city and
sort the results by the average monthly revenue in descending order"
The Agent first tried to get the information from the DB. Its first query was wrong, so the Agent tried again, got the correct information, and passed it to the next agent.
![alt text](https://github.com/crewAIInc/crewAI-tools/blob/main/crewai_tools/tools/nl2sql/images/image-2.png?raw=true)
![alt text](https://github.com/crewAIInc/crewAI-tools/raw/main/crewai_tools/tools/nl2sql/images/image-3.png)
The second task goal was:
"Review the data and create a detailed report, and then create the table on the database with the fields based on the data provided.
Include information on the average, maximum, and minimum monthly revenue for each city, but only include cities that have more than one user. Also, count the number of users in each city and sort the results by the average monthly revenue in descending order."
Now things get more interesting: the Agent generates SQL not only to create the table but also to insert the data into it. In the end, the Agent still returns the final report, which matches exactly what was stored in the database.
![alt text](https://github.com/crewAIInc/crewAI-tools/raw/main/crewai_tools/tools/nl2sql/images/image-4.png)
![alt text](https://github.com/crewAIInc/crewAI-tools/raw/main/crewai_tools/tools/nl2sql/images/image-5.png)
![alt text](https://github.com/crewAIInc/crewAI-tools/raw/main/crewai_tools/tools/nl2sql/images/image-9.png)
![alt text](https://github.com/crewAIInc/crewAI-tools/raw/main/crewai_tools/tools/nl2sql/images/image-7.png)
This is a simple example of how the NL2SQLTool can be used to interact with a database and generate reports from its data.
The tool opens up many possibilities for the Agent's logic and for how it interacts with the database.
```md
DB -> Agent -> ... -> Agent -> DB
```

View File

@@ -0,0 +1,57 @@
---
title: "Overview"
description: "Connect to databases, vector stores, and data warehouses for comprehensive data access"
icon: "face-smile"
---
These tools enable your agents to interact with various database systems, from traditional SQL databases to modern vector stores and data warehouses.
## **Available Tools**
<CardGroup cols={2}>
<Card title="MySQL Tool" icon="database" href="/en/tools/database-data/mysqltool">
Connect to and query MySQL databases with SQL operations.
</Card>
<Card title="PostgreSQL Search" icon="elephant" href="/en/tools/database-data/pgsearchtool">
Search and query PostgreSQL databases efficiently.
</Card>
<Card title="Snowflake Search" icon="snowflake" href="/en/tools/database-data/snowflakesearchtool">
Access Snowflake data warehouse for analytics and reporting.
</Card>
<Card title="NL2SQL Tool" icon="language" href="/en/tools/database-data/nl2sqltool">
Convert natural language queries to SQL statements automatically.
</Card>
<Card title="Qdrant Vector Search" icon="vector-square" href="/en/tools/database-data/qdrantvectorsearchtool">
Search vector embeddings using Qdrant vector database.
</Card>
<Card title="Weaviate Vector Search" icon="network-wired" href="/en/tools/database-data/weaviatevectorsearchtool">
Perform semantic search with Weaviate vector database.
</Card>
</CardGroup>
## **Common Use Cases**
- **Data Analysis**: Query databases for business intelligence and reporting
- **Vector Search**: Find similar content using semantic embeddings
- **ETL Operations**: Extract, transform, and load data between systems
- **Real-time Analytics**: Access live data for decision making
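```python
from crewai import Agent
from crewai_tools import MySQLSearchTool, QdrantVectorSearchTool, NL2SQLTool
# Create database tools (connection details are placeholders)
mysql_db = MySQLSearchTool(
    db_uri="mysql://user:password@localhost:3306/mydatabase",
    table_name="employees",
)
vector_search = QdrantVectorSearchTool(
    qdrant_url="your_qdrant_url",
    qdrant_api_key="your_qdrant_api_key",
    collection_name="your_collection",
)
nl_to_sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")
# Add to your agent
agent = Agent(
    role="Data Analyst",
    goal="Extract insights from various data sources",
    backstory="An analyst who works across SQL databases and vector stores.",
    tools=[mysql_db, vector_search, nl_to_sql],
)
```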

View File

@@ -0,0 +1,82 @@
---
title: PG RAG Search
description: The `PGSearchTool` is designed to search PostgreSQL databases and return the most relevant results.
icon: elephant
---
## Overview
<Note>
The PGSearchTool is currently under development. This document outlines the intended functionality and interface.
As development progresses, please be aware that some features may not be available or could change.
</Note>
## Description
The PGSearchTool is envisioned as a tool for semantic searches within PostgreSQL database tables. By leveraging Retrieval-Augmented Generation (RAG),
it aims to provide an efficient means of querying database table content, specifically tailored for PostgreSQL databases.
The tool's goal is to simplify finding relevant data through semantic search queries, offering a valuable resource for users who need to run advanced queries on
extensive datasets within a PostgreSQL environment.
## Installation
The `crewai_tools` package, which will include the PGSearchTool upon its release, can be installed using the following command:
```shell
pip install 'crewai[tools]'
```
<Note>
The PGSearchTool is not yet available in the current version of the `crewai_tools` package. This installation command will be updated once the tool is released.
</Note>
## Example Usage
Below is a proposed example showcasing how to use the PGSearchTool for conducting a semantic search on a table within a PostgreSQL database:
```python Code
from crewai_tools import PGSearchTool
# Initialize the tool with the database URI and the target table name
tool = PGSearchTool(
    db_uri='postgresql://user:password@localhost:5432/mydatabase',
    table_name='employees'
)
```
## Arguments
The PGSearchTool is designed to require the following arguments for its operation:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **db_uri** | `string` | **Mandatory**. The URI of the PostgreSQL database to be queried. It must include the necessary authentication details and the location of the database. |
| **table_name** | `string` | **Mandatory**. The name of the table within the database on which the semantic search will be performed. |
## Custom Model and Embeddings
The tool intends to use OpenAI for both embeddings and summarization by default. Users will have the option to customize the model using a config dictionary as follows:
```python Code
tool = PGSearchTool(
    db_uri='postgresql://user:password@localhost:5432/mydatabase',
    table_name='employees',
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,271 @@
---
title: 'Qdrant Vector Search Tool'
description: 'Semantic search capabilities for CrewAI agents using Qdrant vector database'
icon: vector-square
---
## Overview
The Qdrant Vector Search Tool enables semantic search capabilities in your CrewAI agents by leveraging [Qdrant](https://qdrant.tech/), a vector similarity search engine. This tool allows your agents to search through documents stored in a Qdrant collection using semantic similarity.
## Installation
Install the required packages:
```bash
uv add qdrant-client
```
## Basic Usage
Here's a minimal example of how to use the tool:
```python
from crewai import Agent
from crewai_tools import QdrantVectorSearchTool
# Initialize the tool
qdrant_tool = QdrantVectorSearchTool(
    qdrant_url="your_qdrant_url",
    qdrant_api_key="your_qdrant_api_key",
    collection_name="your_collection"
)
# Create an agent that uses the tool
agent = Agent(
    role="Research Assistant",
    goal="Find relevant information in documents",
    backstory="A research assistant who searches documents stored in Qdrant.",
    tools=[qdrant_tool]
)
# The tool will automatically use OpenAI embeddings
# and return the 3 most relevant results with scores > 0.35
```
## Complete Working Example
Here's a complete example showing how to:
1. Extract text from a PDF
2. Generate embeddings using OpenAI
3. Store in Qdrant
4. Create a CrewAI agentic RAG workflow for semantic search
```python
import os
import uuid
import pdfplumber
from openai import OpenAI
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process, LLM
from crewai_tools import QdrantVectorSearchTool
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Distance, VectorParams
# Load environment variables
load_dotenv()
# Initialize OpenAI client
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
# Extract text from PDF
def extract_text_from_pdf(pdf_path):
    text = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text.append(page_text.strip())
    return text
# Generate OpenAI embeddings
def get_openai_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding
# Store text and embeddings in Qdrant
def load_pdf_to_qdrant(pdf_path, qdrant, collection_name):
    # Extract text from PDF
    text_chunks = extract_text_from_pdf(pdf_path)
    # Create Qdrant collection
    if qdrant.collection_exists(collection_name):
        qdrant.delete_collection(collection_name)
    qdrant.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )
    # Store embeddings
    points = []
    for chunk in text_chunks:
        embedding = get_openai_embedding(chunk)
        points.append(PointStruct(
            id=str(uuid.uuid4()),
            vector=embedding,
            payload={"text": chunk}
        ))
    qdrant.upsert(collection_name=collection_name, points=points)
# Initialize Qdrant client and load data
qdrant = QdrantClient(
    url=os.getenv("QDRANT_URL"),
    api_key=os.getenv("QDRANT_API_KEY")
)
collection_name = "example_collection"
pdf_path = "path/to/your/document.pdf"
load_pdf_to_qdrant(pdf_path, qdrant, collection_name)
# Initialize Qdrant search tool
qdrant_tool = QdrantVectorSearchTool(
    qdrant_url=os.getenv("QDRANT_URL"),
    qdrant_api_key=os.getenv("QDRANT_API_KEY"),
    collection_name=collection_name,
    limit=3,
    score_threshold=0.35
)
# Create CrewAI agents
search_agent = Agent(
    role="Senior Semantic Search Agent",
    goal="Find and analyze documents based on semantic search",
    backstory="""You are an expert research assistant who can find relevant
    information using semantic search in a Qdrant database.""",
    tools=[qdrant_tool],
    verbose=True
)
answer_agent = Agent(
    role="Senior Answer Assistant",
    goal="Generate answers to questions based on the context provided",
    backstory="""You are an expert answer assistant who can generate
    answers to questions based on the context provided.""",
    tools=[qdrant_tool],
    verbose=True
)
# Define tasks
search_task = Task(
    description="""Search for relevant documents about the {query}.
    Your final answer should include:
    - The relevant information found
    - The similarity scores of the results
    - The metadata of the relevant documents""",
    expected_output="The relevant passages found, with their similarity scores and metadata.",
    agent=search_agent
)
answer_task = Task(
    description="""Given the context and metadata of relevant documents,
    generate a final answer based on the context.""",
    expected_output="A final answer to the query, grounded in the retrieved context.",
    agent=answer_agent
)
# Run CrewAI workflow
crew = Crew(
    agents=[search_agent, answer_agent],
    tasks=[search_task, answer_task],
    process=Process.sequential,
    verbose=True
)
result = crew.kickoff(
    inputs={"query": "What is the role of X in the document?"}
)
print(result)
```
## Tool Parameters
### Required Parameters
- `qdrant_url` (str): The URL of your Qdrant server
- `qdrant_api_key` (str): API key for authentication with Qdrant
- `collection_name` (str): Name of the Qdrant collection to search
### Optional Parameters
- `limit` (int): Maximum number of results to return (default: 3)
- `score_threshold` (float): Minimum similarity score threshold (default: 0.35)
- `custom_embedding_fn` (Callable[[str], list[float]]): Custom function for text vectorization
## Search Parameters
The tool accepts these parameters in its schema:
- `query` (str): The search query to find similar documents
- `filter_by` (str, optional): Metadata field to filter on
- `filter_value` (str, optional): Value to filter by
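
To make the `filter_by` / `filter_value` pair concrete: in Qdrant's JSON filter syntax, an exact match on a payload field is expressed as a `must` clause with a `match` condition. The sketch below builds that structure as a plain dictionary. Whether the tool constructs exactly this filter internally is an assumption, and `build_match_filter` is a hypothetical helper.

```python
def build_match_filter(filter_by=None, filter_value=None):
    """Return a Qdrant-style exact-match filter dict, or None when no filter is requested."""
    if filter_by is None or filter_value is None:
        return None
    return {"must": [{"key": filter_by, "match": {"value": filter_value}}]}


# "source" and "handbook.pdf" are made-up metadata values.
print(build_match_filter("source", "handbook.pdf"))
```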
## Return Format
The tool returns results in JSON format:
```json
[
  {
    "metadata": {
      // Any metadata stored with the document
    },
    "context": "The actual text content of the document",
    "distance": 0.95 // Similarity score
  }
]
```
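
A short sketch of consuming this output downstream, assuming the tool returns a JSON string with the `metadata`, `context`, and `distance` keys shown above (the `//` comments in the sample are annotations, not part of real output). The sample data and the helper name `best_contexts` are invented for illustration.

```python
import json

# Made-up sample in the documented shape.
raw_output = """
[
  {"metadata": {"page": 3}, "context": "Qdrant stores vectors.", "distance": 0.95},
  {"metadata": {"page": 7}, "context": "Unrelated passage.", "distance": 0.41}
]
"""


def best_contexts(tool_output: str, min_score: float = 0.5) -> list[str]:
    """Keep results scoring at least min_score, highest score first."""
    results = json.loads(tool_output)
    kept = [r for r in results if r["distance"] >= min_score]
    kept.sort(key=lambda r: r["distance"], reverse=True)
    return [r["context"] for r in kept]


print(best_contexts(raw_output))
```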
## Default Embedding
By default, the tool uses OpenAI's `text-embedding-3-small` model for vectorization. This requires:
- OpenAI API key set in environment: `OPENAI_API_KEY`
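
A quick preflight check can surface a missing key before an agent run starts. This is a minimal sketch; `missing_env_vars` is a hypothetical helper, and the variable names come from this page.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY", "QDRANT_URL", "QDRANT_API_KEY"])
if missing:
    print("Missing environment variables:", ", ".join(missing))
```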
## Custom Embeddings
Instead of using the default embedding model, you might want to use your own embedding function in cases where you:
1. Want to use a different embedding model (e.g., Cohere, HuggingFace, Ollama models)
2. Need to reduce costs by using open-source embedding models
3. Have specific requirements for vector dimensions or embedding quality
4. Want to use domain-specific embeddings (e.g., for medical or legal text)
Here's an example using a HuggingFace model:
```python
from transformers import AutoTokenizer, AutoModel
import torch
# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
def custom_embeddings(text: str) -> list[float]:
    # Tokenize and get model outputs
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    outputs = model(**inputs)
    # Use mean pooling to get text embedding
    embeddings = outputs.last_hidden_state.mean(dim=1)
    # Convert to list of floats and return
    return embeddings[0].tolist()
# Use custom embeddings with the tool
tool = QdrantVectorSearchTool(
    qdrant_url="your_url",
    qdrant_api_key="your_key",
    collection_name="your_collection",
    custom_embedding_fn=custom_embeddings # Pass your custom function
)
```
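
Any function with the signature `Callable[[str], list[float]]` fits the `custom_embedding_fn` parameter. For offline tests where no model should be downloaded, a deterministic stand-in can be enough. The sketch below is a toy, not a semantically meaningful embedding; `toy_embedding` and its `dim` parameter are hypothetical, and the Qdrant collection's vector size would have to equal `dim`.

```python
import hashlib
import math


def toy_embedding(text: str, dim: int = 8) -> list[float]:
    """Hash each token into one of `dim` buckets, then L2-normalize the counts."""
    vector = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vector[digest[0] % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    # An empty input yields the zero vector, which cannot be normalized.
    return [v / norm for v in vector] if norm else vector


print(len(toy_embedding("semantic search with qdrant")))
```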
## Error Handling
The tool handles these specific errors:
- Raises `ImportError` if `qdrant-client` is not installed, and offers to install it with `uv add qdrant-client`
- Raises `ValueError` if `QDRANT_URL` is not set
## Environment Variables
Required environment variables:
```bash
export QDRANT_URL="your_qdrant_url" # If not provided in constructor
export QDRANT_API_KEY="your_api_key" # If not provided in constructor
export OPENAI_API_KEY="your_openai_key" # If using default embeddings
```

View File

@@ -0,0 +1,202 @@
---
title: Snowflake Search Tool
description: The `SnowflakeSearchTool` enables CrewAI agents to execute SQL queries and perform semantic search on Snowflake data warehouses.
icon: snowflake
---
# `SnowflakeSearchTool`
## Description
The `SnowflakeSearchTool` is designed to connect to Snowflake data warehouses and execute SQL queries with advanced features like connection pooling, retry logic, and asynchronous execution. This tool allows CrewAI agents to interact with Snowflake databases, making it ideal for data analysis, reporting, and business intelligence tasks that require access to enterprise data stored in Snowflake.
## Installation
To use this tool, you need to install the required dependencies:
```shell
uv add cryptography snowflake-connector-python snowflake-sqlalchemy
```
Or alternatively:
```shell
uv sync --extra snowflake
```
## Steps to Get Started
To effectively use the `SnowflakeSearchTool`, follow these steps:
1. **Install Dependencies**: Install the required packages using one of the commands above.
2. **Configure Snowflake Connection**: Create a `SnowflakeConfig` object with your Snowflake credentials.
3. **Initialize the Tool**: Create an instance of the tool with the necessary configuration.
4. **Execute Queries**: Use the tool to run SQL queries against your Snowflake database.
## Example
The following example demonstrates how to use the `SnowflakeSearchTool` to query data from a Snowflake database:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import SnowflakeSearchTool, SnowflakeConfig
# Create Snowflake configuration
config = SnowflakeConfig(
    account="your_account",
    user="your_username",
    password="your_password",
    warehouse="COMPUTE_WH",
    database="your_database",
    snowflake_schema="your_schema"
)
# Initialize the tool
snowflake_tool = SnowflakeSearchTool(config=config)
# Define an agent that uses the tool
data_analyst_agent = Agent(
    role="Data Analyst",
    goal="Analyze data from Snowflake database",
    backstory="An expert data analyst who can extract insights from enterprise data.",
    tools=[snowflake_tool],
    verbose=True,
)
# Example task to query sales data
query_task = Task(
    description="Query the sales data for the last quarter and summarize the top 5 products by revenue.",
    expected_output="A summary of the top 5 products by revenue for the last quarter.",
    agent=data_analyst_agent,
)
# Create and run the crew
crew = Crew(agents=[data_analyst_agent],
            tasks=[query_task])
result = crew.kickoff()
```
You can also customize the tool with additional parameters:
```python Code
# Initialize the tool with custom parameters
snowflake_tool = SnowflakeSearchTool(
    config=config,
    pool_size=10,
    max_retries=5,
    retry_delay=2.0,
    enable_caching=True
)
```
## Parameters
### SnowflakeConfig Parameters
The `SnowflakeConfig` class accepts the following parameters:
- **account**: Required. Snowflake account identifier.
- **user**: Required. Snowflake username.
- **password**: Optional*. Snowflake password.
- **private_key_path**: Optional*. Path to private key file (alternative to password).
- **warehouse**: Required. Snowflake warehouse name.
- **database**: Required. Default database.
- **snowflake_schema**: Required. Default schema.
- **role**: Optional. Snowflake role.
- **session_parameters**: Optional. Custom session parameters as a dictionary.
*Either `password` or `private_key_path` must be provided.
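
The "either `password` or `private_key_path`" rule can be pictured as a small validation step. The class below is a hypothetical stand-in written for illustration; it is not the real `SnowflakeConfig` and only models the fields needed to show the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:  # hypothetical stand-in, not the real SnowflakeConfig
    account: str
    user: str
    password: Optional[str] = None
    private_key_path: Optional[str] = None

    def __post_init__(self):
        # At least one authentication method must be supplied.
        if not (self.password or self.private_key_path):
            raise ValueError("Either password or private_key_path must be provided")


settings = ConnectionSettings(account="your_account", user="your_username", password="your_password")
print(settings.account)
```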
### SnowflakeSearchTool Parameters
The `SnowflakeSearchTool` accepts the following parameters during initialization:
- **config**: Required. A `SnowflakeConfig` object containing connection details.
- **pool_size**: Optional. Number of connections in the pool. Default is 5.
- **max_retries**: Optional. Maximum retry attempts for failed queries. Default is 3.
- **retry_delay**: Optional. Delay between retries in seconds. Default is 1.0.
- **enable_caching**: Optional. Whether to enable query result caching. Default is True.
## Usage
When the `SnowflakeSearchTool` is invoked, it accepts the following parameters; only `query` is required:
- **query**: Required. The SQL query to execute.
- **database**: Optional. Override the default database specified in the config.
- **snowflake_schema**: Optional. Override the default schema specified in the config.
- **timeout**: Optional. Query timeout in seconds. Default is 300.
The tool will return the query results as a list of dictionaries, where each dictionary represents a row with column names as keys.
```python Code
# Example of using the tool with an agent
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze sales data from Snowflake",
    backstory="An expert data analyst with experience in SQL and data visualization.",
    tools=[snowflake_tool],
    verbose=True
)
# The agent will use the tool with parameters like:
# query="SELECT product_name, SUM(revenue) as total_revenue FROM sales GROUP BY product_name ORDER BY total_revenue DESC LIMIT 5"
# timeout=600
# Create a task for the agent
analysis_task = Task(
    description="Query the sales database and identify the top 5 products by revenue for the last quarter.",
    expected_output="A detailed analysis of the top 5 products by revenue.",
    agent=data_analyst
)
# Run the task
crew = Crew(
    agents=[data_analyst],
    tasks=[analysis_task]
)
result = crew.kickoff()
```
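
Because results come back as a list of dictionaries keyed by column name, ordinary Python is enough for post-processing. A minimal sketch with made-up rows; the column names and the helper `top_products` are assumptions, and the exact key casing returned for a given query may differ, so the keys are passed in explicitly.

```python
# Made-up rows in the documented shape: one dict per row, column names as keys.
rows = [
    {"PRODUCT_NAME": "Widget", "TOTAL_REVENUE": 1200.0},
    {"PRODUCT_NAME": "Gadget", "TOTAL_REVENUE": 3400.0},
    {"PRODUCT_NAME": "Doohickey", "TOTAL_REVENUE": 560.0},
]


def top_products(rows, name_key, revenue_key, n=5):
    """Return (name, revenue) pairs for the n highest-revenue rows."""
    ranked = sorted(rows, key=lambda row: row[revenue_key], reverse=True)
    return [(row[name_key], row[revenue_key]) for row in ranked[:n]]


print(top_products(rows, "PRODUCT_NAME", "TOTAL_REVENUE", n=2))
```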
## Advanced Features
### Connection Pooling
The `SnowflakeSearchTool` implements connection pooling to improve performance by reusing database connections. You can control the pool size with the `pool_size` parameter.
### Automatic Retries
The tool automatically retries failed queries with exponential backoff. You can configure the retry behavior with the `max_retries` and `retry_delay` parameters.
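
The general pattern behind retries with exponential backoff can be sketched as follows. This is an illustration of the technique, not the tool's code: `run_with_retries` is hypothetical, it treats `max_retries` as the total number of attempts, and it doubles the delay after each failure. How the tool itself counts attempts is not stated on this page.

```python
import time


def run_with_retries(operation, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call `operation`; after each failure wait retry_delay * 2**attempt, then retry."""
    for attempt in range(max_retries):
        try:
            return operation()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            sleep(retry_delay * (2 ** attempt))


# Demo: an operation that fails twice before succeeding, with sleeping replaced by a recorder.
attempts = {"count": 0}
recorded_delays = []


def flaky_query():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(run_with_retries(flaky_query, sleep=recorded_delays.append), recorded_delays)
```

Injecting `sleep` keeps the sketch testable without real waiting.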
### Query Result Caching
To improve performance for repeated queries, the tool can cache query results. This feature is enabled by default but can be disabled by setting `enable_caching=False`.
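
Conceptually, result caching means remembering the rows returned for a given query so that a repeat does not reach the warehouse. The sketch below shows that idea with an in-memory dictionary. `QueryCache` is hypothetical; the tool's actual cache key, eviction, and expiry behavior are not described here.

```python
class QueryCache:
    """Memoize query results by (query, database, schema)."""

    def __init__(self, execute):
        self._execute = execute  # callable that actually runs the query
        self._results = {}

    def run(self, query, database=None, schema=None):
        key = (query, database, schema)
        if key not in self._results:
            self._results[key] = self._execute(query)
        return self._results[key]


executed = []


def fake_execute(query):
    executed.append(query)
    return [{"ANSWER": 42}]


cache = QueryCache(fake_execute)
cache.run("SELECT 42 AS answer")
cache.run("SELECT 42 AS answer")  # served from the cache
print(len(executed))
```

A cache like this returns stale rows if the underlying data changes, which is one reason to set `enable_caching=False` for frequently updated tables.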
### Key-Pair Authentication
In addition to password authentication, the tool supports key-pair authentication for enhanced security:
```python Code
config = SnowflakeConfig(
    account="your_account",
    user="your_username",
    private_key_path="/path/to/your/private/key.p8",
    warehouse="COMPUTE_WH",
    database="your_database",
    snowflake_schema="your_schema"
)
```
## Error Handling
The `SnowflakeSearchTool` includes comprehensive error handling for common Snowflake issues:
- Connection failures
- Query timeouts
- Authentication errors
- Database and schema errors
When an error occurs, the tool will attempt to retry the operation (if configured) and provide detailed error information.
## Conclusion
The `SnowflakeSearchTool` provides a powerful way to integrate Snowflake data warehouses with CrewAI agents. With features like connection pooling, automatic retries, and query caching, it enables efficient and reliable access to enterprise data. This tool is particularly useful for data analysis, reporting, and business intelligence tasks that require access to structured data stored in Snowflake.

View File

@@ -0,0 +1,163 @@
---
title: Weaviate Vector Search
description: The `WeaviateVectorSearchTool` is designed to search a Weaviate vector database for semantically similar documents.
icon: network-wired
---
## Overview
The `WeaviateVectorSearchTool` is specifically crafted for conducting semantic searches within documents stored in a Weaviate vector database. This tool allows you to find semantically similar documents to a given query, leveraging the power of vector embeddings for more accurate and contextually relevant search results.
[Weaviate](https://weaviate.io/) is a vector database that stores and queries vector embeddings, enabling semantic search capabilities.
## Installation
To incorporate this tool into your project, you need to install the Weaviate client:
```shell
uv add weaviate-client
```
## Steps to Get Started
To effectively use the `WeaviateVectorSearchTool`, follow these steps:
1. **Package Installation**: Confirm that the `crewai[tools]` and `weaviate-client` packages are installed in your Python environment.
2. **Weaviate Setup**: Set up a Weaviate cluster. You can follow the [Weaviate documentation](https://weaviate.io/developers/wcs/manage-clusters/connect) for instructions.
3. **API Keys**: Obtain your Weaviate cluster URL and API key.
4. **OpenAI API Key**: Ensure you have an OpenAI API key set in your environment variables as `OPENAI_API_KEY`.
## Example
The following example demonstrates how to initialize the tool and execute a search:
```python Code
from crewai_tools import WeaviateVectorSearchTool
# Initialize the tool
tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)
@agent
def search_agent(self) -> Agent:
    '''
    This agent uses the WeaviateVectorSearchTool to search for
    semantically similar documents in a Weaviate vector database.
    '''
    return Agent(
        config=self.agents_config["search_agent"],
        tools=[tool]
    )
```
## Parameters
The `WeaviateVectorSearchTool` accepts the following parameters:
- **collection_name**: Required. The name of the collection to search within.
- **weaviate_cluster_url**: Required. The URL of the Weaviate cluster.
- **weaviate_api_key**: Required. The API key for the Weaviate cluster.
- **limit**: Optional. The number of results to return. Default is `3`.
- **vectorizer**: Optional. The vectorizer to use. If not provided, it will use `text2vec_openai` with the `nomic-embed-text` model.
- **generative_model**: Optional. The generative model to use. If not provided, it will use OpenAI's `gpt-4o`.
## Advanced Configuration
You can customize the vectorizer and generative model used by the tool:
```python Code
from crewai_tools import WeaviateVectorSearchTool
from weaviate.classes.config import Configure
# Setup custom model for vectorizer and generative model
tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    vectorizer=Configure.Vectorizer.text2vec_openai(model="nomic-embed-text"),
    generative_model=Configure.Generative.openai(model="gpt-4o-mini"),
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)
```
## Preloading Documents
You can preload your Weaviate database with documents before using the tool:
```python Code
import os
from crewai_tools import WeaviateVectorSearchTool
import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth
# Connect to Weaviate
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-weaviate-cluster-url.com",
    auth_credentials=Auth.api_key("your-weaviate-api-key"),
    headers={"X-OpenAI-Api-Key": "your-openai-api-key"}
)
# Get or create collection (collections.get() returns a handle even when the collection is missing)
if client.collections.exists("example_collections"):
    test_docs = client.collections.get("example_collections")
else:
    test_docs = client.collections.create(
        name="example_collections",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(model="nomic-embed-text"),
        generative_config=Configure.Generative.openai(model="gpt-4o"),
    )
# Load documents
docs_to_load = os.listdir("knowledge")
with test_docs.batch.dynamic() as batch:
    for d in docs_to_load:
        with open(os.path.join("knowledge", d), "r") as f:
            content = f.read()
        batch.add_object(
            {
                "content": content,
                "year": d.split("_")[0],
            }
        )
# Initialize the tool
tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)
```
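
The preloading loop derives each document's `year` property from the part of the filename before the first underscore, so it relies on files being named like `2023_report.txt`. That convention is implied by the code rather than stated, and the helper `document_properties` below is a hypothetical extraction of that one step.

```python
def document_properties(filename: str, content: str) -> dict:
    """Build the object properties used in the preloading loop."""
    # Everything before the first underscore is treated as the year.
    return {"content": content, "year": filename.split("_")[0]}


# Made-up filename and content for illustration.
print(document_properties("2023_annual_report.txt", "Revenue grew in Q4."))
```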
## Agent Integration Example
Here's how to integrate the `WeaviateVectorSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent
from crewai_tools import WeaviateVectorSearchTool
# Initialize the tool
weaviate_tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)
# Create an agent with the tool
rag_agent = Agent(
    name="rag_agent",
    role="You are a helpful assistant that can answer questions with the help of the WeaviateVectorSearchTool.",
    goal="Answer questions using documents retrieved from Weaviate.",
    backstory="An assistant with access to a Weaviate vector database.",
    llm="gpt-4o-mini",
    tools=[weaviate_tool],
)
```
## Conclusion
The `WeaviateVectorSearchTool` provides a powerful way to search for semantically similar documents in a Weaviate vector database. By leveraging vector embeddings, it enables more accurate and contextually relevant search results compared to traditional keyword-based searches. This tool is particularly useful for applications that require finding information based on meaning rather than exact matches.

View File

@@ -0,0 +1,77 @@
---
title: CSV RAG Search
description: The `CSVSearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a CSV file's content.
icon: file-csv
---
# `CSVSearchTool`
<Note>
**Experimental**: We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
This tool is used to perform a RAG (Retrieval-Augmented Generation) search within a CSV file's content. It allows users to semantically search for queries in the content of a specified CSV file.
This feature is particularly useful for extracting information from large CSV datasets where traditional search methods might be inefficient. All tools with "Search" in their name, including CSVSearchTool,
are RAG tools designed for searching different sources of data.
## Installation
Install the crewai_tools package
```shell
pip install 'crewai[tools]'
```
## Example
```python Code
from crewai_tools import CSVSearchTool
# Initialize the tool with a specific CSV file.
# This setup allows the agent to only search the given CSV file.
tool = CSVSearchTool(csv='path/to/your/csvfile.csv')
# OR
# Initialize the tool without a specific CSV file.
# Agent will need to provide the CSV path at runtime.
tool = CSVSearchTool()
```
## Arguments
The following parameters can be used to customize the `CSVSearchTool`'s behavior:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **csv** | `string` | _Optional_. The path to the CSV file you want to search. This is a mandatory argument if the tool was initialized without a specific CSV file; otherwise, it is optional. |
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = CSVSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
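When several search tools share the same providers, the nested dictionary above can be assembled once and reused. The following is a minimal sketch using only plain Python; `build_rag_config` is a hypothetical helper for illustration and is not part of `crewai_tools`.

```python
from typing import Any


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
    **llm_options: Any,
) -> dict[str, Any]:
    """Assemble a config dict in the nested shape shown in the example above.

    Hypothetical helper: it only builds plain dictionaries and does not
    validate provider or model names.
    """
    return {
        "llm": {
            "provider": llm_provider,
            # Extra keyword arguments (e.g. temperature) go into the LLM config.
            "config": {"model": llm_model, **llm_options},
        },
        "embedder": {
            "provider": embedder_provider,
            "config": {"model": embedder_model},
        },
    }


config = build_rag_config(
    "ollama", "llama2", "google", "models/embedding-001", temperature=0.5
)
```

The resulting dictionary could then be passed to the tool as `CSVSearchTool(config=config)`, assuming the providers you name are available in your environment.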

View File

@@ -0,0 +1,53 @@
---
title: Directory Read
description: The `DirectoryReadTool` is a powerful utility designed to provide a comprehensive listing of directory contents.
icon: folder-tree
---
# `DirectoryReadTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The DirectoryReadTool is a powerful utility designed to provide a comprehensive listing of directory contents.
It can recursively navigate through the specified directory, offering users a detailed enumeration of all files, including those within subdirectories.
This tool is crucial for tasks that require a thorough inventory of directory structures or for validating the organization of files within directories.
## Installation
To utilize the DirectoryReadTool in your project, install the `crewai_tools` package. If this package is not yet part of your environment, you can install it using pip with the command below:
```shell
pip install 'crewai[tools]'
```
This command installs the latest version of the `crewai_tools` package, granting access to the DirectoryReadTool among other utilities.
## Example
Employing the DirectoryReadTool is straightforward. The following code snippet demonstrates how to set it up and use the tool to list the contents of a specified directory:
```python Code
from crewai_tools import DirectoryReadTool
# Initialize the tool so the agent can read any directory's content
# it learns about during execution
tool = DirectoryReadTool()
# OR
# Initialize the tool with a specific directory,
# so the agent can only read the content of the specified directory
tool = DirectoryReadTool(directory='/path/to/your/directory')
```
## Arguments
The following parameters can be used to customize the `DirectoryReadTool`'s behavior:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **directory** | `string` | _Optional_. An argument that specifies the path to the directory whose contents you wish to list. It accepts both absolute and relative paths, guiding the tool to the desired directory for content listing. |
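
To make the idea of a recursive listing concrete, here is a small standard-library sketch of the kind of enumeration described above. It is an illustration only and not the tool's actual implementation; the function name `list_files` and the sample file names are assumptions made for the example.

```python
import os
import tempfile


def list_files(directory: str) -> list[str]:
    """Return every file under `directory`, including subdirectories,
    as sorted paths relative to `directory`."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            found.append(os.path.relpath(full_path, directory))
    return sorted(found)


# Build a throwaway directory tree to demonstrate the listing.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "reports", "2024"))
    for rel in ("notes.txt", os.path.join("reports", "2024", "q1.csv")):
        with open(os.path.join(tmp, rel), "w", encoding="utf-8") as fh:
            fh.write("sample")
    listing = list_files(tmp)

print(listing)
```

Both absolute and relative paths work as input here, which mirrors the `directory` argument described in the table.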

View File

@@ -0,0 +1,67 @@
---
title: Directory RAG Search
description: The `DirectorySearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a directory's content.
icon: address-book
---
# `DirectorySearchTool`
<Note>
**Experimental**: The DirectorySearchTool is under continuous development. Features and functionalities might evolve, and unexpected behavior may occur as we refine the tool.
</Note>
## Description
The DirectorySearchTool enables semantic search within the content of specified directories, leveraging the Retrieval-Augmented Generation (RAG) methodology for efficient navigation through files. Designed for flexibility, it allows users to dynamically specify search directories at runtime or set a fixed directory during initial setup.
## Installation
To use the DirectorySearchTool, begin by installing the crewai_tools package. Execute the following command in your terminal:
```shell
pip install 'crewai[tools]'
```
## Initialization and Usage
Import the DirectorySearchTool from the `crewai_tools` package to start. You can initialize the tool without specifying a directory, so that the search directory can be set at runtime. Alternatively, the tool can be initialized with a predefined directory.
```python Code
from crewai_tools import DirectorySearchTool
# For dynamic directory specification at runtime
tool = DirectorySearchTool()
# For fixed directory searches
tool = DirectorySearchTool(directory='/path/to/directory')
```
## Arguments
- `directory`: A string argument that specifies the search directory. This is optional during initialization but required for searches if not set initially.
## Custom Model and Embeddings
The DirectorySearchTool uses OpenAI for embeddings and summarization by default. To customize these settings, change the model provider and its configuration as follows:
```python Code
tool = DirectorySearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # Options include ollama, google, anthropic, llama2, and more
            config=dict(
                model="llama2",
                # Additional configurations here
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,79 @@
---
title: DOCX RAG Search
description: The `DOCXSearchTool` is a RAG tool designed for semantic searching within DOCX documents.
icon: file-word
---
# `DOCXSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The `DOCXSearchTool` is a RAG tool designed for semantic searching within DOCX documents.
It enables users to effectively search and extract relevant information from DOCX files using query-based searches.
This tool is invaluable for data analysis, information management, and research tasks,
streamlining the process of finding specific information within large document collections.
## Installation
Install the crewai_tools package by running the following command in your terminal:
```shell
uv pip install docx2txt 'crewai[tools]'
```
## Example
The following example demonstrates initializing the DOCXSearchTool to search within any DOCX file's content or with a specific DOCX file path.
```python Code
from crewai_tools import DOCXSearchTool
# Initialize the tool to search within any DOCX file's content
tool = DOCXSearchTool()
# OR
# Initialize the tool with a specific DOCX file,
# so the agent can only search the content of the specified DOCX file
tool = DOCXSearchTool(docx='path/to/your/document.docx')
```
## Arguments
The following parameters can be used to customize the `DOCXSearchTool`'s behavior:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **docx** | `string` | _Optional_. An argument that specifies the path to the DOCX file you want to search. If not provided during initialization, the tool allows for later specification of any DOCX file's content path for searching. |
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = DOCXSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,44 @@
---
title: File Read
description: The `FileReadTool` is designed to read files from the local file system.
icon: folders
---
## Overview
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
The `FileReadTool` is part of the `crewai_tools` package and handles file reading and content retrieval.
Typical uses include processing batches of text files, reading runtime configuration files, and importing data for analytics.
It supports a variety of text-based file formats such as `.txt`, `.csv`, `.json`, and more. Depending on the file type, it offers specialized functionality,
such as converting JSON content into a Python dictionary for ease of use.
## Installation
To use the `FileReadTool`, install the `crewai_tools` package:
```shell
pip install 'crewai[tools]'
```
## Usage Example
To get started with the FileReadTool:
```python Code
from crewai_tools import FileReadTool
# Initialize the tool to read any file whose path the agent knows or learns during execution
file_read_tool = FileReadTool()
# OR
# Initialize the tool with a specific file path, so the agent can only read the content of the specified file
file_read_tool = FileReadTool(file_path='path/to/your/file.txt')
```
## Arguments
- `file_path`: The path to the file you want to read. It accepts both absolute and relative paths. Ensure the file exists and you have the necessary permissions to access it.
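
The overview mentions that JSON content can be turned into a Python dictionary. As a rough standard-library illustration of that idea (not the tool's actual implementation), the sketch below checks that a file exists, reads it as UTF-8, and parses it when the extension is `.json`. The function name `read_text_file` and the sample file are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def read_text_file(file_path: str):
    """Read a UTF-8 text file; parse `.json` files into Python objects."""
    path = Path(file_path)
    if not path.is_file():
        # Fail early with a clear message if the path is wrong.
        raise FileNotFoundError(f"No such file: {path}")
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        return json.loads(text)
    return text


# Demonstrate with a temporary JSON file.
with tempfile.TemporaryDirectory() as tmp:
    settings_path = Path(tmp) / "settings.json"
    settings_path.write_text('{"retries": 3}', encoding="utf-8")
    settings = read_text_file(str(settings_path))

print(settings)
```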

View File

@@ -0,0 +1,50 @@
---
title: File Write
description: The `FileWriterTool` is designed to write content to files.
icon: file-pen
---
# `FileWriterTool`
## Description
The `FileWriterTool` is a component of the crewai_tools package, designed to simplify the process of writing content to files with cross-platform compatibility (Windows, Linux, macOS).
It is particularly useful in scenarios such as generating reports, saving logs, creating configuration files, and more.
This tool handles path differences across operating systems, supports UTF-8 encoding, and automatically creates directories if they don't exist, making it easier to organize your output reliably across different platforms.
## Installation
Install the crewai_tools package to use the `FileWriterTool` in your projects:
```shell
pip install 'crewai[tools]'
```
## Example
To get started with the `FileWriterTool`:
```python Code
from crewai_tools import FileWriterTool
# Initialize the tool
file_writer_tool = FileWriterTool()
# Write content to a file in a specified directory
result = file_writer_tool._run('example.txt', 'This is a test content.', 'test_directory')
print(result)
```
## Arguments
- `filename`: The name of the file you want to create or overwrite.
- `content`: The content to write into the file.
- `directory` (optional): The path to the directory where the file will be created. Defaults to the current directory (`.`). If the directory does not exist, it will be created.
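
The behavior described by these arguments (create the directory if needed, then write UTF-8 content) can be pictured with a short standard-library sketch. This is an illustration under stated assumptions, not the tool's source code; `write_file` is a hypothetical function whose parameters simply mirror the argument names above.

```python
import tempfile
from pathlib import Path


def write_file(filename: str, content: str, directory: str = ".") -> Path:
    """Write `content` to `directory/filename`, creating the directory if missing."""
    target_dir = Path(directory)
    # parents=True creates intermediate directories; exist_ok=True avoids
    # an error when the directory is already there.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(content, encoding="utf-8")
    return target


# Demonstrate inside a temporary location so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "test_directory"
    written = write_file("example.txt", "This is a test content.", str(out_dir))
    saved_text = written.read_text(encoding="utf-8")

print(saved_text)
```

Using `pathlib` keeps path handling consistent across Windows, Linux, and macOS, which is the same concern the tool is described as addressing.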
## Conclusion
By integrating the `FileWriterTool` into your crews, the agents can reliably write content to files across different operating systems.
This tool is essential for tasks that require saving output data, creating structured file systems, and handling cross-platform file operations.
It's particularly recommended for Windows users who may encounter file writing issues with standard Python file operations.
By adhering to the setup and usage guidelines provided, incorporating this tool into projects is straightforward and ensures consistent file writing behavior across all platforms.

View File

@@ -0,0 +1,75 @@
---
title: JSON RAG Search
description: The `JSONSearchTool` is designed to search JSON files and return the most relevant results.
icon: file-code
---
# `JSONSearchTool`
<Note>
The JSONSearchTool is currently in an experimental phase. This means the tool
is under active development, and users might encounter unexpected behavior or
changes. We highly encourage feedback on any issues or suggestions for
improvements.
</Note>
## Description
The JSONSearchTool is designed to facilitate efficient and precise searches within JSON file contents. It utilizes a RAG (Retrieval-Augmented Generation) search mechanism, allowing users to specify a JSON path for targeted searches within a particular JSON file. This capability significantly improves the accuracy and relevance of search results.
## Installation
To install the JSONSearchTool, use the following pip command:
```shell
pip install 'crewai[tools]'
```
## Usage Examples
Here are updated examples on how to utilize the JSONSearchTool effectively for searching within JSON files. These examples take into account the current implementation and usage patterns identified in the codebase.
```python Code
from crewai_tools import JSONSearchTool
# General JSON content search
# This approach is suitable when the JSON path is either known beforehand or can be dynamically identified.
tool = JSONSearchTool()
# Restricting search to a specific JSON file
# Use this initialization method when you want to limit the search scope to a specific JSON file.
tool = JSONSearchTool(json_path='./path/to/your/file.json')
```
## Arguments
- `json_path` (str, optional): Specifies the path to the JSON file to be searched. This argument is not required if the tool is initialized for a general search. When provided, it confines the search to the specified JSON file.
## Configuration Options
The JSONSearchTool supports extensive customization through a configuration dictionary. This allows users to select different models for embeddings and summarization based on their requirements.
```python Code
tool = JSONSearchTool(
    config={
        "llm": {
            "provider": "ollama", # Other options include google, openai, anthropic, llama2, etc.
            "config": {
                "model": "llama2",
                # Additional optional configurations can be specified here.
                # "temperature": 0.5,
                # "top_p": 1,
                # "stream": True,
            },
        },
        "embedding_model": {
            "provider": "google", # or openai, ollama, ...
            "config": {
                "model": "models/embedding-001",
                "task_type": "retrieval_document",
                # Further customization options can be added here.
            },
        },
    }
)
```

View File

@@ -0,0 +1,73 @@
---
title: MDX RAG Search
description: The `MDXSearchTool` is designed to search MDX files and return the most relevant results.
icon: markdown
---
# `MDXSearchTool`
<Note>
The MDXSearchTool is in continuous development. Features may be added or removed, and functionality could change unpredictably as we refine the tool.
</Note>
## Description
The MDX Search Tool is a component of the `crewai_tools` package aimed at facilitating advanced markdown language extraction. It enables users to effectively search and extract relevant information from MDX files using query-based searches. This tool is invaluable for data analysis, information management, and research tasks, streamlining the process of finding specific information within large document collections.
## Installation
Before using the MDX Search Tool, ensure the `crewai_tools` package is installed. If it is not, you can install it with the following command:
```shell
pip install 'crewai[tools]'
```
## Usage Example
To use the MDX Search Tool, first set up the necessary environment variables. Then integrate the tool into your crewAI project. Below is a basic example of how to do this:
```python Code
from crewai_tools import MDXSearchTool
# Initialize the tool to search any MDX content it learns about during execution
tool = MDXSearchTool()
# OR
# Initialize the tool with a specific MDX file path for an exclusive search within that document
tool = MDXSearchTool(mdx='path/to/your/document.mdx')
```
## Parameters
- `mdx`: **Optional**. Specifies the MDX file path for the search. It can be provided during initialization.
## Customization of Model and Embeddings
The tool defaults to using OpenAI for embeddings and summarization. For customization, utilize a configuration dictionary as shown below:
```python Code
tool = MDXSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # Options include google, openai, anthropic, llama2, etc.
            config=dict(
                model="llama2",
                # Optional parameters can be included here.
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # Optional title for the embeddings can be added here.
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,88 @@
---
title: "Overview"
description: "Read, write, and search through various file formats with CrewAI's document processing tools"
icon: "face-smile"
---
These tools enable your agents to work with various file formats and document types. From reading PDFs to processing JSON data, these tools handle all your document processing needs.
## **Available Tools**
<CardGroup cols={2}>
<Card title="File Read Tool" icon="folders" href="/en/tools/file-document/filereadtool">
Read content from any file type including text, markdown, and more.
</Card>
<Card title="File Write Tool" icon="file-pen" href="/en/tools/file-document/filewritetool">
Write content to files, create new documents, and save processed data.
</Card>
<Card title="PDF Search Tool" icon="file-pdf" href="/en/tools/file-document/pdfsearchtool">
Search and extract text content from PDF documents efficiently.
</Card>
<Card title="DOCX Search Tool" icon="file-word" href="/en/tools/file-document/docxsearchtool">
Search through Microsoft Word documents and extract relevant content.
</Card>
<Card title="JSON Search Tool" icon="brackets-curly" href="/en/tools/file-document/jsonsearchtool">
Parse and search through JSON files with advanced query capabilities.
</Card>
<Card title="CSV Search Tool" icon="table" href="/en/tools/file-document/csvsearchtool">
Process and search through CSV files, extract specific rows and columns.
</Card>
<Card title="XML Search Tool" icon="code" href="/en/tools/file-document/xmlsearchtool">
Parse XML files and search for specific elements and attributes.
</Card>
<Card title="MDX Search Tool" icon="markdown" href="/en/tools/file-document/mdxsearchtool">
Search through MDX files and extract content from documentation.
</Card>
<Card title="TXT Search Tool" icon="file-lines" href="/en/tools/file-document/txtsearchtool">
Search through plain text files using semantic (RAG) search.
</Card>
<Card title="Directory Search Tool" icon="folder-open" href="/en/tools/file-document/directorysearchtool">
Semantically search the content of files within a directory.
</Card>
<Card title="Directory Read Tool" icon="folder" href="/en/tools/file-document/directoryreadtool">
Read and list directory contents, file structures, and metadata.
</Card>
</CardGroup>
## **Common Use Cases**
- **Document Processing**: Extract and analyze content from various file formats
- **Data Import**: Read structured data from CSV, JSON, and XML files
- **Content Search**: Find specific information within large document collections
- **File Management**: Organize and manipulate files and directories
- **Data Export**: Save processed results to various file formats
## **Quick Start Example**
```python
from crewai import Agent
from crewai_tools import FileReadTool, PDFSearchTool, JSONSearchTool
# Create tools
file_reader = FileReadTool()
pdf_searcher = PDFSearchTool()
json_processor = JSONSearchTool()
# Add to your agent
agent = Agent(
    role="Document Analyst",
    goal="Process and analyze various document types",
    backstory="An analyst experienced with many document formats.",  # example value
    tools=[file_reader, pdf_searcher, json_processor],
)
```
## **Tips for Document Processing**
- **File Permissions**: Ensure your agent has proper read/write permissions
- **Large Files**: Consider chunking for very large documents
- **Format Support**: Check tool documentation for supported file formats
- **Error Handling**: Implement proper error handling for corrupted or inaccessible files
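
As one way to picture the chunking tip above, the sketch below splits a long string into fixed-size, overlapping pieces using plain Python. The function name `chunk_text`, the default sizes, and the overlap strategy are assumptions for illustration; they are not taken from CrewAI and may differ from how any given tool splits documents internally.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split `text` into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that content cut at a
    boundary still appears whole in one of the chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
print(chunks)  # → ['abcd', 'defg', 'ghij', 'j']
```

Character-based splitting is the simplest option; splitting on paragraphs or sentences usually preserves meaning better when the document structure allows it.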

View File

@@ -0,0 +1,71 @@
---
title: PDF RAG Search
description: The `PDFSearchTool` is designed to search PDF files and return the most relevant results.
icon: file-pdf
---
# `PDFSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The PDFSearchTool is a RAG tool designed for semantic searches within PDF content. It allows for inputting a search query and a PDF document, leveraging advanced search techniques to find relevant content efficiently.
This capability makes it especially useful for extracting specific information from large PDF files quickly.
## Installation
To get started with the PDFSearchTool, first, ensure the crewai_tools package is installed with the following command:
```shell
pip install 'crewai[tools]'
```
## Example
Here's how to use the PDFSearchTool to search within a PDF document:
```python Code
from crewai_tools import PDFSearchTool
# Initialize the tool allowing for any PDF content search if the path is provided during execution
tool = PDFSearchTool()
# OR
# Initialize the tool with a specific PDF path for exclusive search within that document
tool = PDFSearchTool(pdf='path/to/your/document.pdf')
```
## Arguments
- `pdf`: **Optional**. The path to the PDF file to search. Can be provided at initialization or within the `run` method's arguments. If provided at initialization, the tool confines its search to the specified document.
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,80 @@
---
title: TXT RAG Search
description: The `TXTSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a text file.
icon: file-lines
---
## Overview
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
This tool is used to perform a RAG (Retrieval-Augmented Generation) search within the content of a text file.
It allows for semantic searching of a query within a specified text file's content,
making it an invaluable resource for quickly extracting information or finding specific sections of text based on the query provided.
## Installation
To use the `TXTSearchTool`, you first need to install the `crewai_tools` package.
This can be done using pip, a package manager for Python.
Open your terminal or command prompt and enter the following command:
```shell
pip install 'crewai[tools]'
```
This command will download and install the TXTSearchTool along with any necessary dependencies.
## Example
The following example demonstrates how to use the TXTSearchTool to search within a text file.
It shows two ways to initialize the tool: without a file, so the agent can search any text file it learns about during execution, or with a specific text file.
```python Code
from crewai_tools import TXTSearchTool
# Initialize the tool to search within any text file's content
# the agent learns about during its execution
tool = TXTSearchTool()
# OR
# Initialize the tool with a specific text file,
# so the agent can search within the given text file's content
tool = TXTSearchTool(txt='path/to/text/file.txt')
```
## Arguments
- `txt` (str): **Optional**. The path to the text file you want to search.
This argument is only required if the tool was not initialized with a specific text file;
otherwise, the search will be conducted within the initially provided text file.
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization.
To customize the model, you can use a config dictionary as follows:
```python Code
tool = TXTSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,77 @@
---
title: XML RAG Search
description: The `XMLSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of an XML file.
icon: file-xml
---
# `XMLSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The XMLSearchTool is a cutting-edge RAG tool engineered for conducting semantic searches within XML files.
Ideal for users needing to parse and extract information from XML content efficiently, this tool supports inputting a search query and an optional XML file path.
By specifying an XML path, users can target their search more precisely to the content of that file, thereby obtaining more relevant search outcomes.
## Installation
To start using the XMLSearchTool, you must first install the crewai_tools package. This can be easily done with the following command:
```shell
pip install 'crewai[tools]'
```
## Example
Here are two examples demonstrating how to use the XMLSearchTool.
The first example initializes the tool without a predefined XML path, which keeps the search scope flexible, while the second restricts the search to a specific XML file.
```python Code
from crewai_tools import XMLSearchTool
# Allow agents to search within any XML file's content
# as it learns about their paths during execution
tool = XMLSearchTool()
# OR
# Initialize the tool with a specific XML file path
# for exclusive search within that document
tool = XMLSearchTool(xml='path/to/your/xmlfile.xml')
```
## Arguments
- `xml`: This is the path to the XML file you wish to search.
It is an optional parameter during the tool's initialization but must be provided either at initialization or as part of the `run` method's arguments to execute a search.
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = XMLSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

120
docs/en/tools/overview.mdx Normal file
View File

@@ -0,0 +1,120 @@
---
title: "Tools Overview"
description: "Discover CrewAI's extensive library of 40+ tools to supercharge your AI agents"
icon: "toolbox"
---
CrewAI provides an extensive library of pre-built tools to enhance your agents' capabilities. From file processing to web scraping, database queries to AI services - we've got you covered.
## **Tool Categories**
<CardGroup cols={2}>
<Card
title="File & Document"
icon="file-check"
href="/en/tools/file-document/overview"
color="#3B82F6"
>
Read, write, and search through various file formats including PDF, DOCX, JSON, CSV, and more. Perfect for document processing workflows.
</Card>
<Card
title="Web Scraping & Browsing"
icon="globe"
href="/en/tools/web-scraping/overview"
color="#10B981"
>
Extract data from websites, automate browser interactions, and scrape content at scale with tools like Firecrawl, Selenium, and more.
</Card>
<Card
title="Search & Research"
icon="magnifying-glass"
href="/en/tools/search-research/overview"
color="#F59E0B"
>
Perform web searches, find code repositories, research YouTube content, and discover information across the internet.
</Card>
<Card
title="Database & Data"
icon="database"
href="/en/tools/database-data/overview"
color="#8B5CF6"
>
Connect to SQL databases, vector stores, and data warehouses. Query MySQL, PostgreSQL, Snowflake, Qdrant, and Weaviate.
</Card>
<Card
title="AI & Machine Learning"
icon="brain"
href="/en/tools/ai-ml/overview"
color="#EF4444"
>
Generate images with DALL-E, process vision tasks, integrate with LangChain, build RAG systems, and leverage code interpreters.
</Card>
<Card
title="Cloud & Storage"
icon="cloud"
href="/en/tools/cloud-storage/overview"
color="#06B6D4"
>
Interact with cloud services including AWS S3, Amazon Bedrock, and other cloud storage and AI services.
</Card>
<Card
title="Automation & Integration"
icon="bolt"
href="/en/tools/automation/overview"
color="#84CC16"
>
Automate workflows with Apify, Composio, and other integration platforms to connect your agents with external services.
</Card>
</CardGroup>
## **Quick Access**
Need a specific tool? Here are some popular choices:
<CardGroup cols={3}>
<Card title="RAG Tool" icon="image" href="/en/tools/ai-ml/ragtool">
Implement Retrieval-Augmented Generation
</Card>
<Card title="Serper Dev" icon="book-atlas" href="/en/tools/search-research/serperdevtool">
Google search API
</Card>
<Card title="File Read" icon="file" href="/en/tools/file-document/filereadtool">
Read any file type
</Card>
<Card title="Scrape Website" icon="globe" href="/en/tools/web-scraping/scrapewebsitetool">
Extract web content
</Card>
<Card title="Code Interpreter" icon="code" href="/en/tools/ai-ml/codeinterpretertool">
Execute Python code
</Card>
<Card title="S3 Reader" icon="cloud" href="/en/tools/cloud-storage/s3readertool">
Access AWS S3 files
</Card>
</CardGroup>
## **Getting Started**
To use any tool in your CrewAI project:
1. **Import** the tool in your crew configuration
2. **Add** it to your agent's tools list
3. **Configure** any required API keys or settings
```python
from crewai import Agent
from crewai_tools import FileReadTool, SerperDevTool
# Add tools to your agent
agent = Agent(
    role="Research Analyst",
    tools=[FileReadTool(), SerperDevTool()],
    # ... other configuration
)
```
Ready to explore? Pick a category above to discover tools that fit your use case!

View File

@@ -0,0 +1,96 @@
---
title: Brave Search
description: The `BraveSearchTool` is designed to search the internet using the Brave Search API.
icon: searchengin
---
# `BraveSearchTool`
## Description
This tool is designed to perform web searches using the Brave Search API. It allows you to search the internet with a specified query and retrieve relevant results. The tool supports customizable result counts and country-specific searches.
## Installation
To incorporate this tool into your project, follow the installation instructions below:
```shell
pip install 'crewai[tools]'
```
## Steps to Get Started
To effectively use the `BraveSearchTool`, follow these steps:
1. **Package Installation**: Confirm that the `crewai[tools]` package is installed in your Python environment.
2. **API Key Acquisition**: Acquire a Brave Search API key by registering at [Brave Search API](https://api.search.brave.com/app/keys).
3. **Environment Configuration**: Store your obtained API key in an environment variable named `BRAVE_API_KEY` to facilitate its use by the tool.
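
Step 3 can be checked early in your own code so that a missing key fails with a clear message before any search runs. The snippet below is a minimal sketch using only the standard library; `require_env` is a hypothetical helper, and the placeholder value is set only so the example runs on its own.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Placeholder for demonstration only; in practice, export your real key
# in the shell or load it from a secrets manager instead.
os.environ.setdefault("BRAVE_API_KEY", "your-brave-api-key")

api_key = require_env("BRAVE_API_KEY")
```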
## Example
The following example demonstrates how to initialize the tool and execute a search with a given query:
```python Code
from crewai_tools import BraveSearchTool
# Initialize the tool for internet searching capabilities
tool = BraveSearchTool()
# Execute a search
results = tool.run(search_query="CrewAI agent framework")
print(results)
```
## Parameters
The `BraveSearchTool` accepts the following parameters:
- **search_query**: Mandatory. The search query you want to use to search the internet.
- **country**: Optional. Specify the country for the search results. Default is an empty string.
- **n_results**: Optional. Number of search results to return. Default is `10`.
- **save_file**: Optional. Whether to save the search results to a file. Default is `False`.
## Example with Parameters
Here is an example demonstrating how to use the tool with additional parameters:
```python Code
from crewai_tools import BraveSearchTool
# Initialize the tool with custom parameters
tool = BraveSearchTool(
    country="US",
    n_results=5,
    save_file=True
)
# Execute a search
results = tool.run(search_query="Latest AI developments")
print(results)
```
## Agent Integration Example
Here's how to integrate the `BraveSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import BraveSearchTool
# Initialize the tool
brave_search_tool = BraveSearchTool()
# Define an agent with the BraveSearchTool
@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[brave_search_tool]
    )
```
## Conclusion
By integrating the `BraveSearchTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications. The tool provides a simple interface to the powerful Brave Search API, making it easy to retrieve and process search results programmatically. By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.

View File

@@ -0,0 +1,84 @@
---
title: Code Docs RAG Search
description: The `CodeDocsSearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within code documentation.
icon: code
---
# `CodeDocsSearchTool`
<Note>
**Experimental**: We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The CodeDocsSearchTool is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within code documentation.
It enables users to efficiently find specific information or topics within code documentation. By providing a `docs_url` during initialization,
the tool narrows down the search to that particular documentation site. Alternatively, without a specific `docs_url`,
it searches across a wide array of code documentation known or discovered throughout its execution, making it versatile for various documentation search needs.
## Installation
To start using the CodeDocsSearchTool, first, install the crewai_tools package via pip:
```shell
pip install 'crewai[tools]'
```
## Example
Utilize the CodeDocsSearchTool as follows to conduct searches within code documentation:
```python Code
from crewai_tools import CodeDocsSearchTool
# To search any code documentation whose URL the agent
# already knows or discovers during its execution:
tool = CodeDocsSearchTool()
# OR
# To specifically focus your search on a given documentation site
# by providing its URL:
tool = CodeDocsSearchTool(docs_url='https://docs.example.com/reference')
```
<Note>
Substitute 'https://docs.example.com/reference' with the URL of the documentation you want to search.
</Note>
## Arguments
The following parameters can be used to customize the `CodeDocsSearchTool`'s behavior:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **docs_url** | `string` | _Optional_. Specifies the URL of the code documentation to be searched. |
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = CodeDocsSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
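
Because the same nested `config` layout holds both the LLM and the embedder settings, it can be handy to assemble it with a small helper. This is only a sketch: `build_rag_config` and its parameter names are hypothetical, not part of `crewai_tools`, and it assumes the tool accepts the dictionary layout shown above.

```python
from typing import Any, Dict, Optional


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
    embedder_options: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble the nested config dictionary in the layout shown above."""
    embedder_config: Dict[str, Any] = {"model": embedder_model}
    # Extra embedder settings (for example task_type) are merged in unchanged.
    embedder_config.update(embedder_options or {})
    return {
        "llm": {"provider": llm_provider, "config": {"model": llm_model}},
        "embedder": {"provider": embedder_provider, "config": embedder_config},
    }


# Same values as the example above.
config = build_rag_config(
    "ollama",
    "llama2",
    "google",
    "models/embedding-001",
    embedder_options={"task_type": "retrieval_document"},
)
# Hypothetical usage: tool = CodeDocsSearchTool(config=config)
```

Keeping the provider and model names as arguments makes it easier to switch providers in one place if several RAG tools share the same settings.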

View File

@@ -0,0 +1,52 @@
---
title: EXA Search Web Loader
description: The `EXASearchTool` is designed to perform a semantic search for a specified query from a text's content across the internet.
icon: globe-pointer
---
# `EXASearchTool`
## Description
The `EXASearchTool` performs a semantic search across the internet for a given query.
It uses the [exa.ai](https://exa.ai/) API to fetch and display the most relevant search results for that query.
## Installation
To incorporate this tool into your project, follow the installation instructions below:
```shell
pip install 'crewai[tools]'
```
## Example
The following example demonstrates how to initialize the tool:
```python Code
from crewai_tools import EXASearchTool
# Initialize the tool for internet searching capabilities
tool = EXASearchTool()
```
## Steps to Get Started
To effectively use the EXASearchTool, follow these steps:
<Steps>
<Step title="Package Installation">
Confirm that the `crewai[tools]` package is installed in your Python environment.
</Step>
<Step title="API Key Acquisition">
Acquire an [exa.ai](https://exa.ai/) API key by registering for a free account at [exa.ai](https://exa.ai/).
</Step>
<Step title="Environment Configuration">
Store your obtained API key in an environment variable named `EXA_API_KEY` to facilitate its use by the tool.
</Step>
</Steps>
## Conclusion
By integrating the `EXASearchTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications.
By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.

View File

@@ -0,0 +1,83 @@
---
title: Github Search
description: The `GithubSearchTool` is designed to perform semantic searches within GitHub repositories.
icon: github
---
# `GithubSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
The GithubSearchTool is a Retrieval-Augmented Generation (RAG) tool specifically designed for conducting semantic searches within GitHub repositories. Utilizing advanced semantic search capabilities, it sifts through code, pull requests, issues, and repositories, making it an essential tool for developers, researchers, or anyone in need of precise information from GitHub.
## Installation
To use the GithubSearchTool, first ensure the crewai_tools package is installed in your Python environment:
```shell
pip install 'crewai[tools]'
```
This command installs the necessary package to run the GithubSearchTool along with any other tools included in the crewai_tools package.
## Example
Here's how you can use the GithubSearchTool to perform semantic searches within a GitHub repository:
```python Code
from crewai_tools import GithubSearchTool
# Initialize the tool for semantic searches within a specific GitHub repository
tool = GithubSearchTool(
    github_repo='https://github.com/example/repo',
    gh_token='your_github_personal_access_token',
    content_types=['code', 'issue'] # Options: code, repo, pr, issue
)
# OR
# Initialize the tool without a fixed repository, so the agent can search any repository it learns about during its execution
tool = GithubSearchTool(
    gh_token='your_github_personal_access_token',
    content_types=['code', 'issue'] # Options: code, repo, pr, issue
)
```
## Arguments
- `github_repo` : The URL of the GitHub repository where the search will be conducted. Provide it at initialization to pin the tool to one repository; if you omit it, as in the second example above, the agent must supply the repository when it uses the tool.
- `gh_token` : Your GitHub Personal Access Token (PAT) required for authentication. You can create one in your GitHub account settings under Developer Settings > Personal Access Tokens.
- `content_types` : Specifies the types of content to include in your search. You must provide a list of content types from the following options: `code` for searching within the code,
`repo` for searching within the repository's general information, `pr` for searching within pull requests, and `issue` for searching within issues.
This field is mandatory and allows tailoring the search to specific content types within the GitHub repository.
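
Since `content_types` only accepts the four options listed above, a quick check before building the tool can catch typos early. The following is a minimal sketch: `validate_content_types` and `ALLOWED_CONTENT_TYPES` are hypothetical names, not part of `crewai_tools`, and the allowed values are taken from the list above.

```python
from typing import Iterable, List

# Options documented above for the content_types argument.
ALLOWED_CONTENT_TYPES = ("code", "repo", "pr", "issue")


def validate_content_types(content_types: Iterable[str]) -> List[str]:
    """Return the content types as a list, rejecting empty or unknown values."""
    selected = list(content_types)
    if not selected:
        raise ValueError("content_types must contain at least one entry")
    unknown = [ct for ct in selected if ct not in ALLOWED_CONTENT_TYPES]
    if unknown:
        raise ValueError(f"Unsupported content types: {unknown}")
    return selected
```

The validated list can then be passed as `content_types=` when creating the `GithubSearchTool`.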
## Custom model and embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = GithubSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,112 @@
---
title: Linkup Search Tool
description: The `LinkupSearchTool` enables querying the Linkup API for contextual information.
icon: link
---
# `LinkupSearchTool`
## Description
The `LinkupSearchTool` provides the ability to query the Linkup API for contextual information and retrieve structured results. This tool is ideal for enriching workflows with up-to-date and reliable information from Linkup, allowing agents to access relevant data during their tasks.
## Installation
To use this tool, you need to install the Linkup SDK:
```shell
uv add linkup-sdk
```
## Steps to Get Started
To effectively use the `LinkupSearchTool`, follow these steps:
1. **API Key**: Obtain a Linkup API key.
2. **Environment Setup**: Set up your environment with the API key.
3. **Install SDK**: Install the Linkup SDK using the command above.
## Example
The following example demonstrates how to initialize the tool and use it in an agent:
```python Code
from crewai_tools import LinkupSearchTool
from crewai import Agent
from crewai.project import agent
import os
# Initialize the tool with your API key
linkup_tool = LinkupSearchTool(api_key=os.getenv("LINKUP_API_KEY"))
# Define an agent that uses the tool
@agent
def researcher(self) -> Agent:
    '''
    This agent uses the LinkupSearchTool to retrieve contextual information
    from the Linkup API.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        tools=[linkup_tool]
    )
```
## Parameters
The `LinkupSearchTool` accepts the following parameters:
### Constructor Parameters
- **api_key**: Required. Your Linkup API key.
### Run Parameters
- **query**: Required. The search term or phrase.
- **depth**: Optional. The search depth. Default is "standard".
- **output_type**: Optional. The type of output. Default is "searchResults".
## Advanced Usage
You can customize the search parameters for more specific results:
```python Code
# Perform a search with custom parameters
results = linkup_tool.run(
    query="Women Nobel Prize Physics",
    depth="deep",
    output_type="searchResults"
)
```
## Return Format
The tool returns results in the following format:
```json
{
  "success": true,
  "results": [
    {
      "name": "Result Title",
      "url": "https://example.com/result",
      "content": "Content of the result..."
    },
    // Additional results...
  ]
}
```
If an error occurs, the response will be:
```json
{
  "success": false,
  "error": "Error message"
}
```
## Error Handling
The tool gracefully handles API errors and provides structured feedback. If the API request fails, the tool will return a dictionary with `success: false` and an error message.
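
Calling code can branch on the `success` flag rather than catching exceptions. The sketch below illustrates that pattern. `extract_results` is a hypothetical helper, and it assumes the tool returns a plain dictionary shaped like the two JSON examples above.

```python
from typing import Any, Dict, List


def extract_results(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Return the result entries, or raise if the tool reported a failure."""
    if not response.get("success"):
        # Surface the tool's own error message when one is provided.
        raise RuntimeError(response.get("error", "Unknown Linkup error"))
    return [
        {
            "name": item.get("name", ""),
            "url": item.get("url", ""),
            "content": item.get("content", ""),
        }
        for item in response.get("results", [])
    ]
```

Raising on failure is one design choice; returning an empty list instead may suit workflows where a failed search should not stop the agent.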
## Conclusion
The `LinkupSearchTool` provides a seamless way to integrate Linkup's contextual information retrieval capabilities into your CrewAI agents. By leveraging this tool, agents can access relevant and up-to-date information to enhance their decision-making and task execution.

View File

@@ -0,0 +1,71 @@
---
title: "Overview"
description: "Perform web searches, find repositories, and research information across the internet"
icon: "face-smile"
---
These tools enable your agents to search the web, research topics, and find information across various platforms including search engines, GitHub, and YouTube.
## **Available Tools**
<CardGroup cols={2}>
<Card title="Serper Dev Tool" icon="google" href="/en/tools/search-research/serperdevtool">
Google search API integration for comprehensive web search capabilities.
</Card>
<Card title="Brave Search Tool" icon="shield" href="/en/tools/search-research/bravesearchtool">
Privacy-focused search with Brave's independent search index.
</Card>
<Card title="Exa Search Tool" icon="magnifying-glass" href="/en/tools/search-research/exasearchtool">
AI-powered search for finding specific and relevant content.
</Card>
<Card title="LinkUp Search Tool" icon="link" href="/en/tools/search-research/linkupsearchtool">
Real-time web search with fresh content indexing.
</Card>
<Card title="GitHub Search Tool" icon="github" href="/en/tools/search-research/githubsearchtool">
Search GitHub repositories, code, issues, and documentation.
</Card>
<Card title="Website Search Tool" icon="globe" href="/en/tools/search-research/websitesearchtool">
Search within specific websites and domains.
</Card>
<Card title="Code Docs Search Tool" icon="code" href="/en/tools/search-research/codedocssearchtool">
Search through code documentation and technical resources.
</Card>
<Card title="YouTube Channel Search" icon="youtube" href="/en/tools/search-research/youtubechannelsearchtool">
Search YouTube channels for specific content and creators.
</Card>
<Card title="YouTube Video Search" icon="play" href="/en/tools/search-research/youtubevideosearchtool">
Find and analyze YouTube videos by topic, keyword, or criteria.
</Card>
</CardGroup>
## **Common Use Cases**
- **Market Research**: Search for industry trends and competitor analysis
- **Content Discovery**: Find relevant articles, videos, and resources
- **Code Research**: Search repositories and documentation for solutions
- **Lead Generation**: Research companies and individuals
- **Academic Research**: Find scholarly articles and technical papers
```python
from crewai import Agent
from crewai_tools import SerperDevTool, GithubSearchTool, YoutubeVideoSearchTool
# Create research tools
web_search = SerperDevTool()
code_search = GithubSearchTool(
    gh_token="your_github_personal_access_token",
    content_types=["code", "issue"],
)
video_research = YoutubeVideoSearchTool()
# Add to your agent
agent = Agent(
    role="Research Analyst",
    tools=[web_search, code_search, video_research],
    goal="Gather comprehensive information on any topic",
    # ... other configuration
)
```

View File

@@ -0,0 +1,112 @@
---
title: Google Serper Search
description: The `SerperDevTool` is designed to search the internet and return the most relevant results.
icon: google
---
# `SerperDevTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
This tool searches the internet for a given query. It uses the [serper.dev](https://serper.dev) API
to fetch and display the most relevant search results for that query.
## Installation
To incorporate this tool into your project, follow the installation instructions below:
```shell
pip install 'crewai[tools]'
```
## Example
The following example demonstrates how to initialize the tool:
```python Code
from crewai_tools import SerperDevTool
# Initialize the tool for internet searching capabilities
tool = SerperDevTool()
```
## Steps to Get Started
To effectively use the `SerperDevTool`, follow these steps:
1. **Package Installation**: Confirm that the `crewai[tools]` package is installed in your Python environment.
2. **API Key Acquisition**: Acquire a `serper.dev` API key by registering for a free account at `serper.dev`.
3. **Environment Configuration**: Store your obtained API key in an environment variable named `SERPER_API_KEY` to facilitate its use by the tool.
## Parameters
The `SerperDevTool` comes with several parameters that are passed to the API:
- **search_url**: The URL endpoint for the search API. (Default is `https://google.serper.dev/search`)
- **country**: Optional. Specify the country for the search results.
- **location**: Optional. Specify the location for the search results.
- **locale**: Optional. Specify the locale for the search results.
- **n_results**: Number of search results to return. Default is `10`.
The values for `country`, `location`, `locale` and `search_url` can be found on the [Serper Playground](https://serper.dev/playground).
## Example with Parameters
Here is an example demonstrating how to use the tool with additional parameters:
```python Code
from crewai_tools import SerperDevTool
tool = SerperDevTool(
    search_url="https://google.serper.dev/scholar",
    n_results=2,
)
print(tool.run(search_query="ChatGPT"))
# Using Tool: Search the internet
# Search results: Title: Role of chat gpt in public health
# Link: https://link.springer.com/article/10.1007/s10439-023-03172-7
# Snippet: … ChatGPT in public health. In this overview, we will examine the potential uses of ChatGPT in
# ---
# Title: Potential use of chat gpt in global warming
# Link: https://link.springer.com/article/10.1007/s10439-023-03171-8
# Snippet: … as ChatGPT, have the potential to play a critical role in advancing our understanding of climate
# ---
```
```python Code
from crewai_tools import SerperDevTool
tool = SerperDevTool(
    country="fr",
    locale="fr",
    location="Paris, Paris, Ile-de-France, France",
    n_results=2,
)
print(tool.run(search_query="Jeux Olympiques"))
# Using Tool: Search the internet
# Search results: Title: Jeux Olympiques de Paris 2024 - Actualités, calendriers, résultats
# Link: https://olympics.com/fr/paris-2024
# Snippet: Quels sont les sports présents aux Jeux Olympiques de Paris 2024 ? · Athlétisme · Aviron · Badminton · Basketball · Basketball 3x3 · Boxe · Breaking · Canoë ...
# ---
# Title: Billetterie Officielle de Paris 2024 - Jeux Olympiques et Paralympiques
# Link: https://tickets.paris2024.org/
# Snippet: Achetez vos billets exclusivement sur le site officiel de la billetterie de Paris 2024 pour participer au plus grand événement sportif au monde.
# ---
```
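
The sample output in the comments above follows a simple `Title` / `Link` / `Snippet` layout with `---` between results. If you need those fields separately, a small parser along these lines might do. It is a sketch that assumes the tool returns text in exactly that layout, which may differ between versions of `crewai_tools`, and `parse_search_output` is a hypothetical helper.

```python
from typing import Dict, List


def parse_search_output(text: str) -> List[Dict[str, str]]:
    """Split Title/Link/Snippet text into one dictionary per search result."""
    results: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "---":
            # A separator line closes the current result.
            if current:
                results.append(current)
                current = {}
            continue
        if line.startswith("Search results:"):
            # The first result shares its line with this prefix.
            line = line[len("Search results:"):].strip()
        for field in ("Title", "Link", "Snippet"):
            prefix = field + ":"
            if line.startswith(prefix):
                current[field.lower()] = line[len(prefix):].strip()
                break
    if current:
        results.append(current)
    return results
```

Lines that do not start with one of the three field names are ignored, so the parser tolerates extra log lines such as `Using Tool: Search the internet`.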
## Conclusion
By integrating the `SerperDevTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications.
The updated parameters allow for more customized and localized search results. By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.

View File

@@ -0,0 +1,77 @@
---
title: Website RAG Search
description: The `WebsiteSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a website.
icon: globe-stand
---
# `WebsiteSearchTool`
<Note>
The WebsiteSearchTool is currently in an experimental phase. We are actively working on incorporating this tool into our suite of offerings and will update the documentation accordingly.
</Note>
## Description
The WebsiteSearchTool is designed as a concept for conducting semantic searches within the content of websites.
It aims to leverage advanced machine learning models like Retrieval-Augmented Generation (RAG) to navigate and extract information from specified URLs efficiently.
This tool intends to offer flexibility, allowing users to perform searches across any website or focus on specific websites of interest.
Please note, the current implementation details of the WebsiteSearchTool are under development, and its functionalities as described may not yet be accessible.
## Installation
To prepare your environment for when the WebsiteSearchTool becomes available, you can install the foundational package with:
```shell
pip install 'crewai[tools]'
```
This command installs the necessary dependencies to ensure that once the tool is fully integrated, users can start using it immediately.
## Example Usage
Below are examples of how the WebsiteSearchTool could be utilized in different scenarios. Please note, these examples are illustrative and represent planned functionality:
```python Code
from crewai_tools import WebsiteSearchTool
# Example of initiating tool that agents can use
# to search across any discovered websites
tool = WebsiteSearchTool()
# Example of limiting the search to the content of a specific website,
# so now agents can only search within that website
tool = WebsiteSearchTool(website='https://example.com')
```
## Arguments
- `website`: An optional argument intended to specify the website URL for focused searches. This argument is designed to enhance the tool's flexibility by allowing targeted searches when necessary.
## Customization Options
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
tool = WebsiteSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

View File

@@ -0,0 +1,194 @@
---
title: YouTube Channel RAG Search
description: The `YoutubeChannelSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a YouTube channel.
icon: youtube
---
# `YoutubeChannelSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
This tool is designed to perform semantic searches within a specific YouTube channel's content.
Leveraging the RAG (Retrieval-Augmented Generation) methodology, it provides relevant search results,
making it invaluable for extracting information or finding specific content without the need to manually sift through videos.
It streamlines the search process within YouTube channels, catering to researchers, content creators, and viewers seeking specific information or topics.
## Installation
To utilize the YoutubeChannelSearchTool, the `crewai_tools` package must be installed. Execute the following command in your shell to install:
```shell
pip install 'crewai[tools]'
```
## Example
The following example demonstrates how to use the `YoutubeChannelSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeChannelSearchTool
# Initialize the tool for general YouTube channel searches
youtube_channel_tool = YoutubeChannelSearchTool()
# Define an agent that uses the tool
channel_researcher = Agent(
    role="Channel Researcher",
    goal="Extract relevant information from YouTube channels",
    backstory="An expert researcher who specializes in analyzing YouTube channel content.",
    tools=[youtube_channel_tool],
    verbose=True,
)
# Example task to search for information in a specific channel
research_task = Task(
    description="Search for information about machine learning tutorials in the YouTube channel {youtube_channel_handle}",
    expected_output="A summary of the key machine learning tutorials available on the channel.",
    agent=channel_researcher,
)
# Create and run the crew
crew = Crew(agents=[channel_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_channel_handle": "@exampleChannel"})
```
You can also initialize the tool with a specific YouTube channel handle:
```python Code
# Initialize the tool with a specific YouTube channel handle
youtube_channel_tool = YoutubeChannelSearchTool(
    youtube_channel_handle='@exampleChannel'
)
# Define an agent that uses the tool
channel_researcher = Agent(
    role="Channel Researcher",
    goal="Extract relevant information from a specific YouTube channel",
    backstory="An expert researcher who specializes in analyzing YouTube channel content.",
    tools=[youtube_channel_tool],
    verbose=True,
)
```
## Parameters
The `YoutubeChannelSearchTool` accepts the following parameters:
- **youtube_channel_handle**: Optional. The handle of the YouTube channel to search within. If provided during initialization, the agent won't need to specify it when using the tool. If the handle doesn't start with '@', it will be automatically added.
- **config**: Optional. Configuration for the underlying RAG system, including LLM and embedder settings.
- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.
When using the tool with an agent, the agent will need to provide:
- **search_query**: Required. The search query to find relevant information in the channel content.
- **youtube_channel_handle**: Required only if not provided during initialization. The handle of the YouTube channel to search within.
## Custom Model and Embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
youtube_channel_tool = YoutubeChannelSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
## Agent Integration Example
Here's a more detailed example of how to integrate the `YoutubeChannelSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeChannelSearchTool
# Initialize the tool
youtube_channel_tool = YoutubeChannelSearchTool()
# Define an agent that uses the tool
channel_researcher = Agent(
    role="Channel Researcher",
    goal="Extract and analyze information from YouTube channels",
    backstory="""You are an expert channel researcher who specializes in extracting
    and analyzing information from YouTube channels. You have a keen eye for detail
    and can quickly identify key points and insights from video content across an entire channel.""",
    tools=[youtube_channel_tool],
    verbose=True,
)
# Create a task for the agent
research_task = Task(
    description="""
    Search for information about data science projects and tutorials
    in the YouTube channel {youtube_channel_handle}.
    Focus on:
    1. Key data science techniques covered
    2. Popular tutorial series
    3. Most viewed or recommended videos
    Provide a comprehensive summary of these points.
    """,
    expected_output="A detailed summary of data science content available on the channel.",
    agent=channel_researcher,
)
# Run the task
crew = Crew(agents=[channel_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_channel_handle": "@exampleDataScienceChannel"})
```
## Implementation Details
The `YoutubeChannelSearchTool` is implemented as a subclass of `RagTool`, which provides the base functionality for Retrieval-Augmented Generation:
```python Code
class YoutubeChannelSearchTool(RagTool):
    name: str = "Search a Youtube Channels content"
    description: str = "A tool that can be used to semantic search a query from a Youtube Channels content."
    args_schema: Type[BaseModel] = YoutubeChannelSearchToolSchema

    def __init__(self, youtube_channel_handle: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        if youtube_channel_handle is not None:
            kwargs["data_type"] = DataType.YOUTUBE_CHANNEL
            self.add(youtube_channel_handle)
            self.description = f"A tool that can be used to semantic search a query the {youtube_channel_handle} Youtube Channels content."
            self.args_schema = FixedYoutubeChannelSearchToolSchema
            self._generate_description()

    def add(
        self,
        youtube_channel_handle: str,
        **kwargs: Any,
    ) -> None:
        if not youtube_channel_handle.startswith("@"):
            youtube_channel_handle = f"@{youtube_channel_handle}"
        super().add(youtube_channel_handle, **kwargs)
```
## Conclusion
The `YoutubeChannelSearchTool` provides a powerful way to search and extract information from YouTube channel content using RAG techniques. By enabling agents to search across an entire channel's videos, it facilitates information extraction and analysis tasks that would otherwise be difficult to perform. This tool is particularly useful for research, content analysis, and knowledge extraction from YouTube channels.

View File

@@ -0,0 +1,187 @@
---
title: YouTube Video RAG Search
description: The `YoutubeVideoSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a YouTube video.
icon: youtube
---
# `YoutubeVideoSearchTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
This tool is part of the `crewai_tools` package and is designed to perform semantic searches within YouTube video content, utilizing Retrieval-Augmented Generation (RAG) techniques.
It is one of several "Search" tools in the package that leverage RAG for different sources.
The `YoutubeVideoSearchTool` allows for flexibility in searches; users can search across any YouTube video content without specifying a video URL,
or they can target their search to a specific YouTube video by providing its URL.
## Installation
To utilize the `YoutubeVideoSearchTool`, you must first install the `crewai_tools` package.
This package contains the `YoutubeVideoSearchTool` among other utilities designed to enhance your data analysis and processing tasks.
Install the package by executing the following command in your terminal:
```shell
pip install 'crewai[tools]'
```
## Example
The following example demonstrates how to use the `YoutubeVideoSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeVideoSearchTool
# Initialize the tool for general YouTube video searches
youtube_search_tool = YoutubeVideoSearchTool()
# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract relevant information from YouTube videos",
    backstory="An expert researcher who specializes in analyzing video content.",
    tools=[youtube_search_tool],
    verbose=True,
)
# Example task to search for information in a specific video
research_task = Task(
    description="Search for information about machine learning frameworks in the YouTube video at {youtube_video_url}",
    expected_output="A summary of the key machine learning frameworks mentioned in the video.",
    agent=video_researcher,
)
# Create and run the crew
crew = Crew(agents=[video_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_video_url": "https://youtube.com/watch?v=example"})
```
You can also initialize the tool with a specific YouTube video URL:
```python Code
# Initialize the tool with a specific YouTube video URL
youtube_search_tool = YoutubeVideoSearchTool(
    youtube_video_url='https://youtube.com/watch?v=example'
)
# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract relevant information from a specific YouTube video",
    backstory="An expert researcher who specializes in analyzing video content.",
    tools=[youtube_search_tool],
    verbose=True,
)
```
## Parameters
The `YoutubeVideoSearchTool` accepts the following parameters:
- **youtube_video_url**: Optional. The URL of the YouTube video to search within. If provided during initialization, the agent won't need to specify it when using the tool.
- **config**: Optional. Configuration for the underlying RAG system, including LLM and embedder settings.
- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.
When using the tool with an agent, the agent will need to provide:
- **search_query**: Required. The search query to find relevant information in the video content.
- **youtube_video_url**: Required only if not provided during initialization. The URL of the YouTube video to search within.
## Custom Model and Embeddings
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
```python Code
youtube_search_tool = YoutubeVideoSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
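When several tools share the same providers, the nested dictionary can be built once and reused. The following is a minimal sketch; `build_rag_config` is a hypothetical helper that only mirrors the `llm`/`embedder` shape shown above and does not validate provider or model names:

```python
from typing import Any, Dict


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
    **llm_options: Any,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the nested config dictionary."""
    return {
        "llm": {
            "provider": llm_provider,
            # Extra keyword arguments (e.g. temperature) go into the LLM config.
            "config": {"model": llm_model, **llm_options},
        },
        "embedder": {
            "provider": embedder_provider,
            "config": {"model": embedder_model},
        },
    }


config = build_rag_config(
    "ollama", "llama2", "google", "models/embedding-001", temperature=0.5
)
print(config)
```

The resulting dictionary would then be passed as `YoutubeVideoSearchTool(config=config)`.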
## Agent Integration Example
Here's a more detailed example of how to integrate the `YoutubeVideoSearchTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeVideoSearchTool
# Initialize the tool
youtube_search_tool = YoutubeVideoSearchTool()
# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract and analyze information from YouTube videos",
    backstory="""You are an expert video researcher who specializes in extracting
    and analyzing information from YouTube videos. You have a keen eye for detail
    and can quickly identify key points and insights from video content.""",
    tools=[youtube_search_tool],
    verbose=True,
)
# Create a task for the agent
research_task = Task(
    description="""
    Search for information about recent advancements in artificial intelligence
    in the YouTube video at {youtube_video_url}.
    Focus on:
    1. Key AI technologies mentioned
    2. Real-world applications discussed
    3. Future predictions made by the speaker
    Provide a comprehensive summary of these points.
    """,
    expected_output="A detailed summary of AI advancements, applications, and future predictions from the video.",
    agent=video_researcher,
)
# Run the task
crew = Crew(agents=[video_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_video_url": "https://youtube.com/watch?v=example"})
```
## Implementation Details
The `YoutubeVideoSearchTool` is implemented as a subclass of `RagTool`, which provides the base functionality for Retrieval-Augmented Generation:
```python Code
class YoutubeVideoSearchTool(RagTool):
    name: str = "Search a Youtube Video content"
    description: str = "A tool that can be used to semantic search a query from a Youtube Video content."
    args_schema: Type[BaseModel] = YoutubeVideoSearchToolSchema

    def __init__(self, youtube_video_url: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        if youtube_video_url is not None:
            kwargs["data_type"] = DataType.YOUTUBE_VIDEO
            self.add(youtube_video_url)
            self.description = f"A tool that can be used to semantic search a query the {youtube_video_url} Youtube Video content."
            self.args_schema = FixedYoutubeVideoSearchToolSchema
            self._generate_description()
```
## Conclusion
The `YoutubeVideoSearchTool` provides a powerful way to search and extract information from YouTube video content using RAG techniques. By enabling agents to search within video content, it facilitates information extraction and analysis tasks that would otherwise be difficult to perform. This tool is particularly useful for research, content analysis, and knowledge extraction from video sources.

@@ -0,0 +1,50 @@
---
title: Browserbase Web Loader
description: Browserbase is a developer platform to reliably run, manage, and monitor headless browsers.
icon: browser
---
# `BrowserbaseLoadTool`
## Description
[Browserbase](https://browserbase.com) is a developer platform to reliably run, manage, and monitor headless browsers.
Power your AI data retrievals with:
- [Serverless Infrastructure](https://docs.browserbase.com/under-the-hood) providing reliable browsers to extract data from complex UIs
- [Stealth Mode](https://docs.browserbase.com/features/stealth-mode) with included fingerprinting tactics and automatic captcha solving
- [Session Debugger](https://docs.browserbase.com/features/sessions) to inspect your Browser Session with a network timeline and logs
- [Live Debug](https://docs.browserbase.com/guides/session-debug-connection/browser-remote-control) to quickly debug your automation
## Installation
- Get an API key and Project ID from [browserbase.com](https://browserbase.com) and set it in environment variables (`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`).
- Install the [Browserbase SDK](http://github.com/browserbase/python-sdk) along with the `crewai[tools]` package:
```shell
pip install browserbase 'crewai[tools]'
```
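Because the tool falls back to these environment variables, a quick check before building the crew can surface a missing credential early. This is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper, not part of the Browserbase SDK:

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_VARS = ["BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID"]


def missing_env_vars(names: Iterable[str], env: Mapping[str, str] = os.environ) -> List[str]:
    """Hypothetical helper: return the names that are unset or empty."""
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("Browserbase credentials found")
```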
## Example
Utilize the BrowserbaseLoadTool as follows to allow your agent to load websites:
```python Code
from crewai_tools import BrowserbaseLoadTool
# Initialize the tool; the API key and Project ID are read from the
# BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables
tool = BrowserbaseLoadTool()
```
## Arguments
The following parameters can be used to customize the `BrowserbaseLoadTool`'s behavior:
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
| **api_key** | `string` | _Optional_. Browserbase API key. Default is `BROWSERBASE_API_KEY` env variable. |
| **project_id** | `string` | _Optional_. Browserbase Project ID. Default is `BROWSERBASE_PROJECT_ID` env variable. |
| **text_content** | `bool` | _Optional_. Retrieve only text content. Default is `False`. |
| **session_id** | `string` | _Optional_. Provide an existing Session ID. |
| **proxy** | `bool` | _Optional_. Enable/Disable Proxies. Default is `False`. |

@@ -0,0 +1,47 @@
---
title: Firecrawl Crawl Website
description: The `FirecrawlCrawlWebsiteTool` is designed to crawl and convert websites into clean markdown or structured data.
icon: fire-flame
---
# `FirecrawlCrawlWebsiteTool`
## Description
[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.
## Installation
- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it in environment variables (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:
```shell
pip install firecrawl-py 'crewai[tools]'
```
## Example
Utilize the `FirecrawlCrawlWebsiteTool` as follows to allow your agent to crawl websites:
```python Code
from crewai_tools import FirecrawlCrawlWebsiteTool
tool = FirecrawlCrawlWebsiteTool(url='firecrawl.dev')
```
## Arguments
- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `url`: The base URL to start crawling from.
- `page_options`: Optional.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
- `crawler_options`: Optional. Options for controlling the crawling behavior.
  - `includes`: Optional. URL patterns to include in the crawl.
  - `exclude`: Optional. URL patterns to exclude from the crawl.
  - `generateImgAltText`: Optional. Generate alt text for images using LLMs (requires a paid plan).
  - `returnOnlyUrls`: Optional. If true, returns only the URLs as a list in the crawl status. Note: the response will be a list of URLs inside the data, not a list of documents.
  - `maxDepth`: Optional. Maximum depth to crawl. Depth 1 is the base URL, depth 2 includes the base URL and its direct children, and so on.
  - `mode`: Optional. The crawling mode to use. Fast mode crawls 4x faster on websites without a sitemap but may not be as accurate and shouldn't be used on heavily JavaScript-rendered websites.
  - `limit`: Optional. Maximum number of pages to crawl.
  - `timeout`: Optional. Timeout in milliseconds for the crawling operation.
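The `maxDepth` and `limit` semantics described above can be pictured with a breadth-first traversal over a toy link graph. This is only an illustration of the documented behavior, not Firecrawl's implementation; `crawl_order` and the `LINKS` mapping are hypothetical:

```python
from collections import deque
from typing import Dict, List

BASE = "https://example.com/"
# Hypothetical site structure: page URL -> URLs it links to.
LINKS: Dict[str, List[str]] = {
    BASE: [BASE + "docs", BASE + "blog"],
    BASE + "docs": [BASE + "docs/api"],
}


def crawl_order(links: Dict[str, List[str]], base_url: str, max_depth: int, limit: int) -> List[str]:
    """Return the pages a depth- and count-limited crawl would visit."""
    if limit <= 0 or max_depth <= 0:
        return []
    visited = [base_url]  # depth 1 is the base URL itself
    seen = {base_url}
    queue = deque([(base_url, 1)])
    while queue and len(visited) < limit:
        url, depth = queue.popleft()
        if depth >= max_depth:
            continue  # children of this page would exceed max_depth
        for child in links.get(url, []):
            if child not in seen and len(visited) < limit:
                seen.add(child)
                visited.append(child)
                queue.append((child, depth + 1))
    return visited


print(crawl_order(LINKS, BASE, max_depth=2, limit=10))
```

With `max_depth=2` the traversal stops at the base URL's direct children, and a smaller `limit` cuts the list short regardless of depth.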

@@ -0,0 +1,43 @@
---
title: Firecrawl Scrape Website
description: The `FirecrawlScrapeWebsiteTool` is designed to scrape websites and convert them into clean markdown or structured data.
icon: fire-flame
---
# `FirecrawlScrapeWebsiteTool`
## Description
[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.
## Installation
- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it in environment variables (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:
```shell
pip install firecrawl-py 'crewai[tools]'
```
## Example
Utilize the FirecrawlScrapeWebsiteTool as follows to allow your agent to load websites:
```python Code
from crewai_tools import FirecrawlScrapeWebsiteTool
tool = FirecrawlScrapeWebsiteTool(url='firecrawl.dev')
```
## Arguments
- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `url`: The URL to scrape.
- `page_options`: Optional.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
- `extractor_options`: Optional. Options for LLM-based extraction of structured information from the page content.
  - `mode`: The extraction mode to use. Currently supports `'llm-extraction'`.
  - `extractionPrompt`: Optional. A prompt describing what information to extract from the page.
  - `extractionSchema`: Optional. The schema for the data to be extracted.
- `timeout`: Optional. Timeout in milliseconds for the request.
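To make the `extractor_options` fields more concrete, here is a sketch of what such a dictionary might look like. It assumes that `extractionSchema` takes a JSON-Schema-style object; the prompt and the `name`/`price` fields are made up for illustration, and how the dictionary is handed to the tool is not shown here:

```python
import json

# Hypothetical options for extracting a product name and price from a page.
extractor_options = {
    "mode": "llm-extraction",
    "extractionPrompt": "Extract the product name and price from the page.",
    # Assumption: the schema follows JSON Schema conventions.
    "extractionSchema": {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "price": {"type": "number"},
        },
        "required": ["name", "price"],
    },
}

print(json.dumps(extractor_options, indent=2))
```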

@@ -0,0 +1,41 @@
---
title: Firecrawl Search
description: The `FirecrawlSearchTool` is designed to search websites and convert them into clean markdown or structured data.
icon: fire-flame
---
# `FirecrawlSearchTool`
## Description
[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.
## Installation
- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it in environment variables (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:
```shell
pip install firecrawl-py 'crewai[tools]'
```
## Example
Utilize the FirecrawlSearchTool as follows to allow your agent to load websites:
```python Code
from crewai_tools import FirecrawlSearchTool
tool = FirecrawlSearchTool(query='what is firecrawl?')
```
## Arguments
- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `query`: The search query string to be used for searching.
- `page_options`: Optional. Options for result formatting.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
  - `fetchPageContent`: Optional. Fetch the full content of the page.
- `search_options`: Optional. Options for controlling the crawling behavior.
  - `limit`: Optional. Maximum number of pages to crawl.

@@ -0,0 +1,86 @@
---
title: Hyperbrowser Load Tool
description: The `HyperbrowserLoadTool` enables web scraping and crawling using Hyperbrowser.
icon: globe
---
# `HyperbrowserLoadTool`
## Description
The `HyperbrowserLoadTool` enables web scraping and crawling using [Hyperbrowser](https://hyperbrowser.ai), a platform for running and scaling headless browsers. This tool allows you to scrape a single page or crawl an entire site, returning the content in properly formatted markdown or HTML.
Key Features:
- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
- Powerful APIs - Easy to use APIs for scraping/crawling any site
- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies
## Installation
To use this tool, you need to install the Hyperbrowser SDK:
```shell
uv add hyperbrowser
```
## Steps to Get Started
To effectively use the `HyperbrowserLoadTool`, follow these steps:
1. **Sign Up**: Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key.
2. **API Key**: Set the `HYPERBROWSER_API_KEY` environment variable or pass it directly to the tool constructor.
3. **Install SDK**: Install the Hyperbrowser SDK using the command above.
## Example
The following example demonstrates how to initialize the tool and use it to scrape a website:
```python Code
from crewai_tools import HyperbrowserLoadTool
from crewai import Agent
from crewai.project import agent
# Initialize the tool with your API key
tool = HyperbrowserLoadTool(api_key="your_api_key")  # Or use environment variable
# Define an agent that uses the tool
# (this method belongs inside a class decorated with @CrewBase)
@agent
def web_researcher(self) -> Agent:
    '''
    This agent uses the HyperbrowserLoadTool to scrape websites
    and extract information.
    '''
    return Agent(
        config=self.agents_config["web_researcher"],
        tools=[tool]
    )
```
## Parameters
The `HyperbrowserLoadTool` accepts the following parameters:
### Constructor Parameters
- **api_key**: Optional. Your Hyperbrowser API key. If not provided, it will be read from the `HYPERBROWSER_API_KEY` environment variable.
### Run Parameters
- **url**: Required. The website URL to scrape or crawl.
- **operation**: Optional. The operation to perform on the website. Either 'scrape' or 'crawl'. Default is 'scrape'.
- **params**: Optional. Additional parameters for the scrape or crawl operation.
## Supported Parameters
For detailed information on all supported parameters, visit:
- [Scrape Parameters](https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait)
- [Crawl Parameters](https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait)
## Return Format
The tool returns content in the following format:
- For **scrape** operations: The content of the page in markdown or HTML format.
- For **crawl** operations: The content of each page separated by dividers, including the URL of each page.
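For crawl operations, the combined output can be imagined as one section per page. The sketch below is an assumption about the layout, not the tool's actual code: the divider string, the `Url:` and `Content:` labels, and the `format_crawl_output` helper are all hypothetical.

```python
from typing import List, Tuple

DIVIDER = "-" * 50  # hypothetical divider; the real tool may use a different one


def format_crawl_output(pages: List[Tuple[str, str]]) -> str:
    """Join (url, content) pairs into a single divider-separated string."""
    sections = [
        f"{DIVIDER}\nUrl: {url}\nContent:\n{content}"
        for url, content in pages
    ]
    return "\n".join(sections)


output = format_crawl_output(
    [
        ("https://example.com", "# Home"),
        ("https://example.com/about", "# About"),
    ]
)
print(output)
```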
## Conclusion
The `HyperbrowserLoadTool` provides a powerful way to scrape and crawl websites, handling complex scenarios like anti-bot measures, CAPTCHAs, and more. By leveraging Hyperbrowser's platform, this tool enables agents to access and extract web content efficiently.

@@ -0,0 +1,107 @@
---
title: "Overview"
description: "Extract data from websites and automate browser interactions with powerful scraping tools"
icon: "face-smile"
---
These tools enable your agents to interact with the web, extract data from websites, and automate browser-based tasks. From simple web scraping to complex browser automation, these tools cover all your web interaction needs.
## **Available Tools**
<CardGroup cols={2}>
<Card title="Scrape Website Tool" icon="globe" href="/en/tools/web-scraping/scrapewebsitetool">
General-purpose web scraping tool for extracting content from any website.
</Card>
<Card title="Scrape Element Tool" icon="crosshairs" href="/en/tools/web-scraping/scrapeelementfromwebsitetool">
Target specific elements on web pages with precision scraping capabilities.
</Card>
<Card title="Firecrawl Crawl Tool" icon="spider" href="/en/tools/web-scraping/firecrawlcrawlwebsitetool">
Crawl entire websites systematically with Firecrawl's powerful engine.
</Card>
<Card title="Firecrawl Scrape Tool" icon="fire" href="/en/tools/web-scraping/firecrawlscrapewebsitetool">
High-performance web scraping with Firecrawl's advanced capabilities.
</Card>
<Card title="Firecrawl Search Tool" icon="magnifying-glass" href="/en/tools/web-scraping/firecrawlsearchtool">
Search and extract specific content using Firecrawl's search features.
</Card>
<Card title="Selenium Scraping Tool" icon="robot" href="/en/tools/web-scraping/seleniumscrapingtool">
Browser automation and scraping with Selenium WebDriver capabilities.
</Card>
<Card title="ScrapFly Tool" icon="plane" href="/en/tools/web-scraping/scrapflyscrapetool">
Professional web scraping with ScrapFly's premium scraping service.
</Card>
<Card title="Scrapegraph Tool" icon="network-wired" href="/en/tools/web-scraping/scrapegraphscrapetool">
Graph-based web scraping for complex data relationships.
</Card>
<Card title="Spider Tool" icon="spider" href="/en/tools/web-scraping/spidertool">
Comprehensive web crawling and data extraction capabilities.
</Card>
<Card title="BrowserBase Tool" icon="browser" href="/en/tools/web-scraping/browserbaseloadtool">
Cloud-based browser automation with BrowserBase infrastructure.
</Card>
<Card title="HyperBrowser Tool" icon="window-maximize" href="/en/tools/web-scraping/hyperbrowserloadtool">
Fast browser interactions with HyperBrowser's optimized engine.
</Card>
<Card title="Stagehand Tool" icon="hand" href="/en/tools/web-scraping/stagehandtool">
Intelligent browser automation with natural language commands.
</Card>
<Card title="Oxylabs Scraper Tool" icon="globe" href="/en/tools/web-scraping/oxylabsscraperstool">
Access web data at scale with Oxylabs.
</Card>
</CardGroup>
## **Common Use Cases**
- **Data Extraction**: Scrape product information, prices, and reviews
- **Content Monitoring**: Track changes on websites and news sources
- **Lead Generation**: Extract contact information and business data
- **Market Research**: Gather competitive intelligence and market data
- **Testing & QA**: Automate browser testing and validation workflows
- **Social Media**: Extract posts, comments, and social media analytics
## **Quick Start Example**
```python
from crewai import Agent
from crewai_tools import ScrapeWebsiteTool, FirecrawlScrapeWebsiteTool, SeleniumScrapingTool
# Create scraping tools
simple_scraper = ScrapeWebsiteTool()
advanced_scraper = FirecrawlScrapeWebsiteTool()
browser_automation = SeleniumScrapingTool()
# Add to your agent
agent = Agent(
    role="Web Research Specialist",
    tools=[simple_scraper, advanced_scraper, browser_automation],
    goal="Extract and analyze web data efficiently"
)
```
## **Scraping Best Practices**
- **Respect robots.txt**: Always check and follow website scraping policies
- **Rate Limiting**: Implement delays between requests to avoid overwhelming servers
- **User Agents**: Use appropriate user agent strings to identify your bot
- **Legal Compliance**: Ensure your scraping activities comply with terms of service
- **Error Handling**: Implement robust error handling for network issues and blocked requests
- **Data Quality**: Validate and clean extracted data before processing
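The first two practices can be sketched with Python's standard library alone. The example below parses an inline `robots.txt` instead of fetching one, and `fetch_politely` is a hypothetical helper; the user agent name `my-crawler` and the URLs are placeholders:

```python
import time
from typing import Callable, List
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

# Parse the rules offline; a real crawler would download the site's robots.txt.
parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

urls = [
    "https://example.com/",
    "https://example.com/private/data",
    "https://example.com/blog",
]
to_fetch = [url for url in urls if parser.can_fetch("my-crawler", url)]
# Fall back to one second when the site does not declare a crawl delay.
delay = parser.crawl_delay("my-crawler") or 1


def fetch_politely(
    urls: List[str],
    fetch: Callable[[str], str],
    delay: float,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Call fetch() for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)
        results.append(fetch(url))
    return results


print(to_fetch, delay)
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting.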
## **Tool Selection Guide**
- **Simple Tasks**: Use `ScrapeWebsiteTool` for basic content extraction
- **JavaScript-Heavy Sites**: Use `SeleniumScrapingTool` for dynamic content
- **Scale & Performance**: Use `FirecrawlScrapeWebsiteTool` for high-volume scraping
- **Cloud Infrastructure**: Use `BrowserbaseLoadTool` for scalable browser automation
- **Complex Workflows**: Use `StagehandTool` for intelligent browser interactions

@@ -0,0 +1,236 @@
---
title: Oxylabs Scrapers
description: >
  Oxylabs Scrapers make it easy to access information from the respective sources. The available sources are:
    - `Amazon Product`
    - `Amazon Search`
    - `Google Search`
    - `Universal`
icon: globe
---
## Installation
Get the credentials by creating an Oxylabs Account [here](https://oxylabs.io).
```shell
pip install 'crewai[tools]' oxylabs
```
Check [Oxylabs Documentation](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets) to get more information about API parameters.
# `OxylabsAmazonProductScraperTool`
### Example
```python
from crewai_tools import OxylabsAmazonProductScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonProductScraperTool()
result = tool.run(query="AAAAABBBBCC")
print(result)
```
### Parameters
- `query` - 10-symbol ASIN code.
- `domain` - domain localization for Amazon.
- `geo_location` - the _Deliver to_ location.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - Additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to true.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
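Since `query` expects a 10-symbol ASIN, checking the value before calling the tool can save a wasted request. This is a minimal sketch; `is_valid_asin` is a hypothetical helper, and the pattern assumes ASINs consist of uppercase letters and digits only:

```python
import re

# Assumption: an ASIN is exactly 10 uppercase letters or digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def is_valid_asin(value: str) -> bool:
    """Return True when the value has the assumed ASIN shape."""
    return ASIN_PATTERN.fullmatch(value) is not None


print(is_valid_asin("B07FZ8S74R"))
```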
### Advanced example
```python
from crewai_tools import OxylabsAmazonProductScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonProductScraperTool(
    config={
        "domain": "com",
        "parse": True,
        "context": [
            {
                "key": "autoselect_variant",
                "value": True
            }
        ]
    }
)
result = tool.run(query="AAAAABBBBCC")
print(result)
```
# `OxylabsAmazonSearchScraperTool`
### Example
```python
from crewai_tools import OxylabsAmazonSearchScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonSearchScraperTool()
result = tool.run(query="headsets")
print(result)
```
### Parameters
- `query` - Amazon search term.
- `domain` - domain localization for Amazon.
- `start_page` - starting page number.
- `pages` - number of pages to retrieve.
- `geo_location` - the _Deliver to_ location.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - Additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to true.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
### Advanced example
```python
from crewai_tools import OxylabsAmazonSearchScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonSearchScraperTool(
    config={
        "domain": 'nl',
        "start_page": 2,
        "pages": 2,
        "parse": True,
        "context": [
            {'key': 'category_id', 'value': 16391693031}
        ],
    }
)
result = tool.run(query='nirvana tshirt')
print(result)
```
# `OxylabsGoogleSearchScraperTool`
### Example
```python
from crewai_tools import OxylabsGoogleSearchScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsGoogleSearchScraperTool()
result = tool.run(query="iPhone 16")
print(result)
```
### Parameters
- `query` - search keyword.
- `domain` - domain localization for Google.
- `start_page` - starting page number.
- `pages` - number of pages to retrieve.
- `limit` - number of results to retrieve in each page.
- `locale` - `Accept-Language` header value which changes your Google search page web interface language.
- `geo_location` - the geographical location that the result should be adapted for. Using this parameter correctly is extremely important to get the right data.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - Additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to true.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
### Advanced example
```python
from crewai_tools import OxylabsGoogleSearchScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsGoogleSearchScraperTool(
    config={
        "parse": True,
        "geo_location": "Paris, France",
        "user_agent_type": "tablet",
    }
)
result = tool.run(query="iPhone 16")
print(result)
```
# `OxylabsUniversalScraperTool`
### Example
```python
from crewai_tools import OxylabsUniversalScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsUniversalScraperTool()
result = tool.run(url="https://ip.oxylabs.io")
print(result)
```
### Parameters
- `url` - website url to scrape.
- `user_agent_type` - device type and browser.
- `geo_location` - sets the proxy's geolocation to retrieve data.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - Additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to `true`, as long as a dedicated parser exists for the submitted URL's page type.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
### Advanced example
```python
from crewai_tools import OxylabsUniversalScraperTool
# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsUniversalScraperTool(
    config={
        "render": "html",
        "user_agent_type": "mobile",
        "context": [
            {"key": "force_headers", "value": True},
            {"key": "force_cookies", "value": True},
            {
                "key": "headers",
                "value": {
                    "Custom-Header-Name": "custom header content",
                },
            },
            {
                "key": "cookies",
                "value": [
                    {"key": "NID", "value": "1234567890"},
                    {"key": "1P JAR", "value": "0987654321"},
                ],
            },
            {"key": "http_method", "value": "get"},
            {"key": "follow_redirects", "value": True},
            {"key": "successful_status_codes", "value": [808, 909]},
        ],
    }
)
result = tool.run(url="https://ip.oxylabs.io")
print(result)
```
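The `context` entries in the advanced examples all follow the same `{"key": ..., "value": ...}` shape, so they can be generated from a plain dictionary. A minimal sketch; `to_context` is a hypothetical convenience helper and not part of the Oxylabs SDK:

```python
from typing import Any, Dict, List


def to_context(settings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"name": value} pairs into the key/value list used by `context`."""
    return [{"key": key, "value": value} for key, value in settings.items()]


config = {
    "render": "html",
    "context": to_context(
        {
            "force_headers": True,
            "http_method": "get",
            "follow_redirects": True,
        }
    ),
}
print(config)
```

The resulting `config` has the same structure as the dictionaries passed to the tools above.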

@@ -0,0 +1,139 @@
---
title: Scrape Element From Website Tool
description: The `ScrapeElementFromWebsiteTool` enables CrewAI agents to extract specific elements from websites using CSS selectors.
icon: code
---
# `ScrapeElementFromWebsiteTool`
## Description
The `ScrapeElementFromWebsiteTool` is designed to extract specific elements from websites using CSS selectors. This tool allows CrewAI agents to scrape targeted content from web pages, making it useful for data extraction tasks where only specific parts of a webpage are needed.
## Installation
To use this tool, you need to install the required dependencies:
```shell
uv add requests beautifulsoup4
```
## Steps to Get Started
To effectively use the `ScrapeElementFromWebsiteTool`, follow these steps:
1. **Install Dependencies**: Install the required packages using the command above.
2. **Identify CSS Selectors**: Determine the CSS selectors for the elements you want to extract from the website.
3. **Initialize the Tool**: Create an instance of the tool with the necessary parameters.
## Example
The following example demonstrates how to use the `ScrapeElementFromWebsiteTool` to extract specific elements from a website:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool
# Initialize the tool
scrape_tool = ScrapeElementFromWebsiteTool()
# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)
# Example task to extract headlines from a news website
scrape_task = Task(
    description="Extract the main headlines from the CNN homepage. Use the CSS selector '.headline' to target the headline elements.",
    expected_output="A list of the main headlines from CNN.",
    agent=web_scraper_agent,
)
# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
You can also initialize the tool with predefined parameters:
```python Code
# Initialize the tool with predefined parameters
scrape_tool = ScrapeElementFromWebsiteTool(
    website_url="https://www.example.com",
    css_element=".main-content"
)
```
## Parameters
The `ScrapeElementFromWebsiteTool` accepts the following parameters during initialization:
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
- **css_element**: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent won't need to specify it when using the tool.
- **cookies**: Optional. A dictionary containing cookies to be sent with the request. This can be useful for websites that require authentication.
## Usage
When using the `ScrapeElementFromWebsiteTool` with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
- **website_url**: The URL of the website to scrape.
- **css_element**: The CSS selector for the elements to extract.
The tool will return the text content of all elements matching the CSS selector, joined by newlines.
```python Code
# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific elements from websites",
    backstory="An expert in web scraping who can extract targeted content using CSS selectors.",
    tools=[scrape_tool],
    verbose=True,
)
# Create a task for the agent to extract specific elements
extract_task = Task(
    description="""
    Extract all product titles from the featured products section on example.com.
    Use the CSS selector '.product-title' to target the title elements.
    """,
    expected_output="A list of product titles from the website",
    agent=web_scraper_agent,
)
# Run the task through a crew
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```
## Implementation Details
The `ScrapeElementFromWebsiteTool` uses the `requests` library to fetch the web page and `BeautifulSoup` to parse the HTML and extract the specified elements:
```python Code
class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."
    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        page = requests.get(
            website_url,
            headers=self.headers,
            cookies=self.cookies if self.cookies else {},
        )
        parsed = BeautifulSoup(page.content, "html.parser")
        elements = parsed.select(css_element)
        return "\n".join([element.get_text() for element in elements])
```
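To see the "text of every matching element, joined by newlines" behavior without any network access or third-party packages, the idea can be approximated with the standard library. This sketch is a simplification, not the tool's code: `ClassTextExtractor` and `extract_text` are hypothetical, only plain class selectors are handled, and a matching element nested inside another match is folded into the outer one.

```python
from html.parser import HTMLParser
from typing import List

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # greater than 0 while inside a matching element
        self.current: List[str] = []
        self.results: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void tags never get a closing tag
        if self.depth > 0:
            self.depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self.depth == 0:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current))

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def extract_text(html: str, css_class: str) -> str:
    parser = ClassTextExtractor(css_class)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.results)


SAMPLE = (
    '<div><h2 class="headline">First <b>story</b></h2>'
    '<p>skip</p><h2 class="headline lead">Second story</h2></div>'
)
print(extract_text(SAMPLE, "headline"))
```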
## Conclusion
The `ScrapeElementFromWebsiteTool` provides a powerful way to extract specific elements from websites using CSS selectors. By enabling agents to target only the content they need, it makes web scraping tasks more efficient and focused. This tool is particularly useful for data extraction, content monitoring, and research tasks where specific information needs to be extracted from web pages.

@@ -0,0 +1,196 @@
---
title: Scrapegraph Scrape Tool
description: The `ScrapegraphScrapeTool` leverages Scrapegraph AI's SmartScraper API to intelligently extract content from websites.
icon: chart-area
---
# `ScrapegraphScrapeTool`
## Description
The `ScrapegraphScrapeTool` is designed to leverage Scrapegraph AI's SmartScraper API to intelligently extract content from websites. This tool provides advanced web scraping capabilities with AI-powered content extraction, making it ideal for targeted data collection and content analysis tasks. Unlike traditional web scrapers, it can understand the context and structure of web pages to extract the most relevant information based on natural language prompts.
## Installation
To use this tool, you need to install the Scrapegraph Python client:
```shell
uv add scrapegraph-py
```
You'll also need to set up your Scrapegraph API key as an environment variable:
```shell
export SCRAPEGRAPH_API_KEY="your_api_key"
```
You can obtain an API key from [Scrapegraph AI](https://scrapegraphai.com).
## Steps to Get Started
To effectively use the `ScrapegraphScrapeTool`, follow these steps:
1. **Install Dependencies**: Install the required package using the command above.
2. **Set Up API Key**: Set your Scrapegraph API key as an environment variable or provide it during initialization.
3. **Initialize the Tool**: Create an instance of the tool with the necessary parameters.
4. **Define Extraction Prompts**: Create natural language prompts to guide the extraction of specific content.
## Example
The following example demonstrates how to use the `ScrapegraphScrapeTool` to extract content from a website:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapegraphScrapeTool
# Initialize the tool
scrape_tool = ScrapegraphScrapeTool(api_key="your_api_key")
# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract product information from an e-commerce site
scrape_task = Task(
    description="Extract product names, prices, and descriptions from the featured products section of example.com.",
    expected_output="A structured list of product information including names, prices, and descriptions.",
    agent=web_scraper_agent,
)
# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
You can also initialize the tool with predefined parameters:
```python Code
# Initialize the tool with predefined parameters
scrape_tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    user_prompt="Extract all product prices and descriptions",
    api_key="your_api_key",
)
```
## Parameters
The `ScrapegraphScrapeTool` accepts the following parameters during initialization:
- **api_key**: Optional. Your Scrapegraph API key. If not provided, it will look for the `SCRAPEGRAPH_API_KEY` environment variable.
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
- **user_prompt**: Optional. Custom instructions for content extraction. If provided during initialization, the agent won't need to specify it when using the tool.
- **enable_logging**: Optional. Whether to enable logging for the Scrapegraph client. Default is `False`.
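The `api_key` fallback described above can be pictured with a minimal, stdlib-only sketch. The helper name `resolve_api_key` is hypothetical and not part of `crewai_tools`; it only illustrates the precedence of an explicit argument over the `SCRAPEGRAPH_API_KEY` environment variable.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None,
                    env_var: str = "SCRAPEGRAPH_API_KEY") -> str:
    """Hypothetical helper: prefer the explicit key, else read the environment."""
    key = explicit_key or os.environ.get(env_var)
    if not key:
        # A missing key is treated as a configuration error.
        raise ValueError(f"No API key given and {env_var} is not set")
    return key
```

Keeping the key in the environment rather than in source code also avoids committing secrets by accident.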
## Usage
When using the `ScrapegraphScrapeTool` with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
- **website_url**: The URL of the website to scrape.
- **user_prompt**: Optional. Custom instructions for content extraction. Default is "Extract the main content of the webpage".
The tool will return the extracted content based on the provided prompt.
```python Code
# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific content
extract_task = Task(
    description="Extract the main heading and summary from example.com",
    expected_output="The main heading and summary from the website",
    agent=web_scraper_agent,
)
# Run the task
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```
## Error Handling
The `ScrapegraphScrapeTool` may raise the following exceptions:
- **ValueError**: When API key is missing or URL format is invalid.
- **RateLimitError**: When API rate limits are exceeded.
- **RuntimeError**: When scraping operation fails (network issues, API errors).
It's recommended to instruct agents to handle potential errors gracefully:
```python Code
# Create a task that includes error handling instructions
robust_extract_task = Task(
    description="""
    Extract the main heading from example.com.
    Be aware that you might encounter errors such as:
    - Invalid URL format
    - Missing API key
    - Rate limit exceeded
    - Network or API errors

    If you encounter any errors, provide a clear explanation of what went wrong
    and suggest possible solutions.
    """,
    expected_output="Either the extracted heading or a clear error explanation",
    agent=web_scraper_agent,
)
```
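To make the "invalid URL format" case concrete, here is a minimal sketch of the kind of check that could raise such a `ValueError`. The function `validate_url` is a hypothetical stand-in; the tool's own validation may apply different rules.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> str:
    """Hypothetical check: require an http(s) scheme and a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL format: {url!r}")
    return url
```

A bare `example.com` would fail a check like this one, so it may help to give agents fully qualified URLs such as `https://example.com` in task descriptions.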
## Rate Limiting
The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:
- Implement appropriate delays between requests when processing multiple URLs.
- Handle rate limit errors gracefully in your application.
- Check your API plan limits on the Scrapegraph dashboard.
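One common way to follow the first two practices is exponential backoff. The sketch below is generic and makes assumptions: `RateLimitError` is a local stand-in for whatever exception the client raises, and `call_with_backoff` is a hypothetical helper, not part of the Scrapegraph client or `crewai_tools`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Stand-in for the rate-limit exception raised by the client."""


def call_with_backoff(func: Callable[[], T], max_attempts: int = 4,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry func on RateLimitError, doubling the delay after each failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the delay schedule can be tested without actually waiting.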
## Implementation Details
The `ScrapegraphScrapeTool` uses the Scrapegraph Python client to interact with the SmartScraper API:
```python Code
class ScrapegraphScrapeTool(BaseTool):
    """
    A tool that uses Scrapegraph AI to intelligently scrape website content.
    """

    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        user_prompt = (
            kwargs.get("user_prompt", self.user_prompt)
            or "Extract the main content of the webpage"
        )

        if not website_url:
            raise ValueError("website_url is required")

        # Validate URL format
        self._validate_url(website_url)

        try:
            # Make the SmartScraper request
            response = self._client.smartscraper(
                website_url=website_url,
                user_prompt=user_prompt,
            )

            return response
        # Error handling...
```
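The prompt lookup in the excerpt combines a call-time value, an initialization value, and a default. The following standalone sketch reproduces that one expression so the precedence can be tried in isolation; `resolve_prompt` is a hypothetical name, with `init_prompt` standing in for `self.user_prompt`.

```python
from typing import Any, Optional

DEFAULT_PROMPT = "Extract the main content of the webpage"


def resolve_prompt(init_prompt: Optional[str], **kwargs: Any) -> str:
    """Same expression as in the excerpt, with self.user_prompt passed in."""
    return kwargs.get("user_prompt", init_prompt) or DEFAULT_PROMPT
```

A call-time `user_prompt` wins over the initialization value, and the default applies when neither is set. One subtlety of this expression: passing `user_prompt=None` explicitly skips the initialization value and falls through to the default.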
## Conclusion
The `ScrapegraphScrapeTool` provides a powerful way to extract content from websites using AI-powered understanding of web page structure. By enabling agents to target specific information using natural language prompts, it makes web scraping tasks more efficient and focused. This tool is particularly useful for data extraction, content monitoring, and research tasks where specific information needs to be extracted from web pages.

View File

@@ -0,0 +1,47 @@
---
title: Scrape Website
description: The `ScrapeWebsiteTool` is designed to extract and read the content of a specified website.
icon: magnifying-glass-location
---
# `ScrapeWebsiteTool`
<Note>
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>
## Description
A tool designed to extract and read the content of a specified website. It is capable of handling various types of web pages by making HTTP requests and parsing the received HTML content.
This tool can be particularly useful for web scraping tasks, data collection, or extracting specific information from websites.
## Installation
Install the `crewai_tools` package:
```shell
pip install 'crewai[tools]'
```
## Example
```python
from crewai_tools import ScrapeWebsiteTool
# To enable scraping any website it finds during its execution
tool = ScrapeWebsiteTool()

# Initialize the tool with the website URL,
# so the agent can only scrape the content of the specified website
tool = ScrapeWebsiteTool(website_url='https://www.example.com')
# Extract the text from the site
text = tool.run()
print(text)
```
## Arguments
| Argument | Type | Description |
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
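| **website_url** | `string` | **Mandatory** URL of the website to read. This is the primary input for the tool, specifying which website's content should be scraped. It can be fixed at initialization or, as in the example above, left unset so the agent supplies it at run time. |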

View File

@@ -0,0 +1,220 @@
---
title: Scrapfly Scrape Website Tool
description: The `ScrapflyScrapeWebsiteTool` leverages Scrapfly's web scraping API to extract content from websites in various formats.
icon: spider
---
# `ScrapflyScrapeWebsiteTool`
## Description
The `ScrapflyScrapeWebsiteTool` is designed to leverage [Scrapfly](https://scrapfly.io/)'s web scraping API to extract content from websites. This tool provides advanced web scraping capabilities with headless browser support, proxies, and anti-bot bypass features. It allows for extracting web page data in various formats, including raw HTML, markdown, and plain text, making it ideal for a wide range of web scraping tasks.
## Installation
To use this tool, you need to install the Scrapfly SDK:
```shell
uv add scrapfly-sdk
```
You'll also need to obtain a Scrapfly API key by registering at [scrapfly.io/register](https://www.scrapfly.io/register/).
## Steps to Get Started
To effectively use the `ScrapflyScrapeWebsiteTool`, follow these steps:
1. **Install Dependencies**: Install the Scrapfly SDK using the command above.
2. **Obtain API Key**: Register at Scrapfly to get your API key.
3. **Initialize the Tool**: Create an instance of the tool with your API key.
4. **Configure Scraping Parameters**: Customize the scraping parameters based on your needs.
## Example
The following example demonstrates how to use the `ScrapflyScrapeWebsiteTool` to extract content from a website:
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapflyScrapeWebsiteTool
# Initialize the tool
scrape_tool = ScrapflyScrapeWebsiteTool(api_key="your_scrapfly_api_key")
# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract content from a website
scrape_task = Task(
    description="Extract the main content from the product page at https://web-scraping.dev/products and summarize the available products.",
    expected_output="A summary of the products available on the website.",
    agent=web_scraper_agent,
)
# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
You can also customize the scraping parameters:
```python Code
# Example with custom scraping parameters
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites with custom parameters",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# The agent will use the tool with parameters like:
# url="https://web-scraping.dev/products"
# scrape_format="markdown"
# ignore_scrape_failures=True
# scrape_config={
#     "asp": True,  # Bypass scraping blocking solutions, like Cloudflare
#     "render_js": True,  # Enable JavaScript rendering with a cloud headless browser
#     "proxy_pool": "public_residential_pool",  # Select a proxy pool
#     "country": "us",  # Select a proxy location
#     "auto_scroll": True,  # Auto scroll the page
# }

scrape_task = Task(
    description="Extract the main content from the product page at https://web-scraping.dev/products using advanced scraping options including JavaScript rendering and proxy settings.",
    expected_output="A detailed summary of the products with all available information.",
    agent=web_scraper_agent,
)
```
## Parameters
The `ScrapflyScrapeWebsiteTool` accepts the following parameters:
### Initialization Parameters
- **api_key**: Required. Your Scrapfly API key.
### Run Parameters
- **url**: Required. The URL of the website to scrape.
- **scrape_format**: Optional. The format in which to extract the web page content. Options are "raw" (HTML), "markdown", or "text". Default is "markdown".
- **scrape_config**: Optional. A dictionary containing additional Scrapfly scraping configuration options.
- **ignore_scrape_failures**: Optional. Whether to ignore failures during scraping. If set to `True`, the tool will return `None` instead of raising an exception when scraping fails.
## Scrapfly Configuration Options
The `scrape_config` parameter allows you to customize the scraping behavior with the following options:
- **asp**: Enable anti-scraping protection bypass.
- **render_js**: Enable JavaScript rendering with a cloud headless browser.
- **proxy_pool**: Select a proxy pool (e.g., "public_residential_pool", "datacenter").
- **country**: Select a proxy location (e.g., "us", "uk").
- **auto_scroll**: Automatically scroll the page to load lazy-loaded content.
- **js**: Execute custom JavaScript code by the headless browser.
For a complete list of configuration options, refer to the [Scrapfly API documentation](https://scrapfly.io/docs/scrape-api/getting-started).
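As a rough illustration of how the run parameters and `scrape_config` options fit together, the sketch below assembles them into one dictionary and rejects an unknown format. `build_run_args` is a hypothetical helper, not part of the Scrapfly SDK or `crewai_tools`; only the parameter names and the three format values come from the lists above.

```python
from typing import Any, Dict, Optional

ALLOWED_FORMATS = {"raw", "markdown", "text"}


def build_run_args(url: str, scrape_format: str = "markdown",
                   scrape_config: Optional[Dict[str, Any]] = None,
                   ignore_scrape_failures: bool = False) -> Dict[str, Any]:
    """Hypothetical helper that assembles the documented run parameters."""
    if scrape_format not in ALLOWED_FORMATS:
        raise ValueError(f"scrape_format must be one of {sorted(ALLOWED_FORMATS)}")
    return {
        "url": url,
        "scrape_format": scrape_format,
        # Copy so later changes to the caller's dict do not leak in.
        "scrape_config": dict(scrape_config or {}),
        "ignore_scrape_failures": ignore_scrape_failures,
    }


args = build_run_args(
    "https://web-scraping.dev/products",
    scrape_config={"asp": True, "render_js": True, "country": "us"},
)
```

Validating the format up front gives an immediate, readable error instead of a failure from the remote API.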
## Usage
When using the `ScrapflyScrapeWebsiteTool` with an agent, the agent will need to provide the URL of the website to scrape and can optionally specify the format and additional configuration options:
```python Code
# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent
scrape_task = Task(
    description="Extract the main content from example.com in markdown format.",
    expected_output="The main content of example.com in markdown format.",
    agent=web_scraper_agent,
)
# Run the task
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
For more advanced usage with custom configuration:
```python Code
# Create a task with more specific instructions
advanced_scrape_task = Task(
    description="""
    Extract content from example.com with the following requirements:
    - Convert the content to plain text format
    - Enable JavaScript rendering
    - Use a US-based proxy
    - Handle any scraping failures gracefully
    """,
    expected_output="The extracted content from example.com",
    agent=web_scraper_agent,
)
```
## Error Handling
By default, the `ScrapflyScrapeWebsiteTool` will raise an exception if scraping fails. Agents can be instructed to handle failures gracefully by specifying the `ignore_scrape_failures` parameter:
```python Code
# Create a task that instructs the agent to handle errors
error_handling_task = Task(
    description="""
    Extract content from a potentially problematic website and make sure to handle any
    scraping failures gracefully by setting ignore_scrape_failures to True.
    """,
    expected_output="Either the extracted content or a graceful error message",
    agent=web_scraper_agent,
)
```
## Implementation Details
The `ScrapflyScrapeWebsiteTool` uses the Scrapfly SDK to interact with the Scrapfly API:
```python Code
class ScrapflyScrapeWebsiteTool(BaseTool):
    name: str = "Scrapfly web scraping API tool"
    description: str = (
        "Scrape a webpage url using Scrapfly and return its content as markdown or text"
    )

    # Implementation details...

    def _run(
        self,
        url: str,
        scrape_format: str = "markdown",
        scrape_config: Optional[Dict[str, Any]] = None,
        ignore_scrape_failures: Optional[bool] = None,
    ):
        from scrapfly import ScrapeApiResponse, ScrapeConfig

        scrape_config = scrape_config if scrape_config is not None else {}
        try:
            response: ScrapeApiResponse = self.scrapfly.scrape(
                ScrapeConfig(url, format=scrape_format, **scrape_config)
            )
            return response.scrape_result["content"]
        except Exception as e:
            if ignore_scrape_failures:
                logger.error(f"Error fetching data from {url}, exception: {e}")
                return None
            else:
                raise e
```
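The `ignore_scrape_failures` branch in the excerpt follows a general "log and return `None`, or re-raise" pattern. The standalone sketch below isolates that pattern; `fetch_or_none` and its `fetch` callable are hypothetical stand-ins and do not call Scrapfly.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def fetch_or_none(fetch: Callable[[str], str], url: str,
                  ignore_failures: bool = False) -> Optional[str]:
    """Call fetch(url); on error either log and return None, or re-raise."""
    try:
        return fetch(url)
    except Exception as exc:
        if ignore_failures:
            logger.error("Error fetching data from %s, exception: %s", url, exc)
            return None
        raise
```

Because a failure is reported as `None` when failures are ignored, callers need to check for `None` before using the result.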
## Conclusion
The `ScrapflyScrapeWebsiteTool` provides a powerful way to extract content from websites using Scrapfly's advanced web scraping capabilities. With features like headless browser support, proxies, and anti-bot bypass, it can handle complex websites and extract content in various formats. This tool is particularly useful for data extraction, content monitoring, and research tasks where reliable web scraping is required.

View File

@@ -0,0 +1,195 @@
---
title: Selenium Scraper
description: The `SeleniumScrapingTool` is designed to extract and read the content of a specified website using Selenium.
icon: clipboard-user
---
# `SeleniumScrapingTool`
<Note>
This tool is currently in development. As we refine its capabilities, users may encounter unexpected behavior.
Your feedback is invaluable to us for making improvements.
</Note>
## Description
The `SeleniumScrapingTool` is crafted for high-efficiency web scraping tasks.
It allows for precise extraction of content from web pages by using CSS selectors to target specific elements.
Its design caters to a wide range of scraping needs, offering flexibility to work with any provided website URL.
## Installation
To use this tool, you need to install the CrewAI tools package and Selenium:
```shell
pip install 'crewai[tools]'
uv add selenium webdriver-manager
```
You'll also need to have Chrome installed on your system, as the tool uses Chrome WebDriver for browser automation.
## Example
The following example demonstrates how to use the `SeleniumScrapingTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import SeleniumScrapingTool
# Initialize the tool
selenium_tool = SeleniumScrapingTool()
# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites using Selenium",
    backstory="An expert web scraper who can extract content from dynamic websites.",
    tools=[selenium_tool],
    verbose=True,
)

# Example task to scrape content from a website
scrape_task = Task(
    description="Extract the main content from the homepage of example.com. Use the CSS selector 'main' to target the main content area.",
    expected_output="The main content from example.com's homepage.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(
    agents=[web_scraper_agent],
    tasks=[scrape_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff()
```
You can also initialize the tool with predefined parameters:
```python Code
# Initialize the tool with predefined parameters
selenium_tool = SeleniumScrapingTool(
    website_url='https://example.com',
    css_element='.main-content',
    wait_time=5,
)

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites using Selenium",
    backstory="An expert web scraper who can extract content from dynamic websites.",
    tools=[selenium_tool],
    verbose=True,
)
```
## Parameters
The `SeleniumScrapingTool` accepts the following parameters during initialization:
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
- **css_element**: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent won't need to specify it when using the tool.
- **cookie**: Optional. A dictionary containing cookie information, useful for simulating a logged-in session to access restricted content.
- **wait_time**: Optional. Specifies the delay (in seconds) before scraping, allowing the website and any dynamic content to fully load. Default is `3` seconds.
- **return_html**: Optional. Whether to return the HTML content instead of just the text. Default is `False`.
When using the tool with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
- **website_url**: Required. The URL of the website to scrape.
- **css_element**: Required. The CSS selector for the elements to extract.
## Agent Integration Example
Here's a more detailed example of how to integrate the `SeleniumScrapingTool` with a CrewAI agent:
```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import SeleniumScrapingTool
# Initialize the tool
selenium_tool = SeleniumScrapingTool()
# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract and analyze information from dynamic websites",
    backstory="""You are an expert web scraper who specializes in extracting
    content from dynamic websites that require browser automation. You have
    extensive knowledge of CSS selectors and can identify the right selectors
    to target specific content on any website.""",
    tools=[selenium_tool],
    verbose=True,
)

# Create a task for the agent
scrape_task = Task(
    description="""
    Extract the following information from the news website at {website_url}:

    1. The headlines of all featured articles (CSS selector: '.headline')
    2. The publication dates of these articles (CSS selector: '.pub-date')
    3. The author names where available (CSS selector: '.author')

    Compile this information into a structured format with each article's details grouped together.
    """,
    expected_output="A structured list of articles with their headlines, publication dates, and authors.",
    agent=web_scraper_agent,
)

# Run the task
crew = Crew(
    agents=[web_scraper_agent],
    tasks=[scrape_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff(inputs={"website_url": "https://news-example.com"})
```
## Implementation Details
The `SeleniumScrapingTool` uses Selenium WebDriver to automate browser interactions:
```python Code
class SeleniumScrapingTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."
    args_schema: Type[BaseModel] = SeleniumScrapingToolSchema

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        return_html = kwargs.get("return_html", self.return_html)
        driver = self._create_driver(website_url, self.cookie, self.wait_time)

        content = self._get_content(driver, css_element, return_html)
        driver.close()

        return "\n".join(content)
```
The tool performs the following steps:
1. Creates a headless Chrome browser instance
2. Navigates to the specified URL
3. Waits for the specified time to allow the page to load
4. Adds any cookies if provided
5. Extracts content based on the CSS selector
6. Returns the extracted content as text or HTML
7. Closes the browser instance
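In the excerpt as shown, `driver.close()` runs only when content extraction succeeds. If you write your own scraping code along the same lines, one way to guarantee the last step is `contextlib.closing`, sketched below with a `FakeDriver` stand-in. This is an illustrative variation under that assumption, not the tool's actual implementation.

```python
from contextlib import closing
from typing import List


class FakeDriver:
    """Stand-in for a WebDriver; only records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


def scrape(driver: FakeDriver, fail: bool = False) -> List[str]:
    # closing() calls driver.close() on exit, even if the body raises.
    with closing(driver):
        if fail:
            raise RuntimeError("element not found")
        return ["extracted text"]
```

With this shape, a failed extraction still releases the browser window instead of leaving it open.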
## Handling Dynamic Content
The `SeleniumScrapingTool` is particularly useful for scraping websites with dynamic content that is loaded via JavaScript. By using a real browser instance, it can:
1. Execute JavaScript on the page
2. Wait for dynamic content to load
3. Interact with elements if needed
4. Extract content that would not be available with simple HTTP requests
You can adjust the `wait_time` parameter to ensure that all dynamic content has loaded before extraction.
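A fixed `wait_time` has to be long enough for the slowest page. For comparison, the sketch below shows the alternative idea of polling until a condition holds or a timeout expires. `wait_until` is a hypothetical, generic helper and is not something the `SeleniumScrapingTool` exposes; Selenium itself offers `WebDriverWait` for this purpose in hand-written scripts.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 10.0,
               interval: float = 0.5,
               clock: Callable[[], float] = time.monotonic,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll condition() until it is true or timeout seconds have passed."""
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Polling returns as soon as the content is ready, so fast pages do not pay the full delay, while slow pages still get up to `timeout` seconds.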
## Conclusion
The `SeleniumScrapingTool` provides a powerful way to extract content from websites using browser automation. By enabling agents to interact with websites as a real user would, it facilitates scraping of dynamic content that would be difficult or impossible to extract using simpler methods. This tool is particularly useful for research, data collection, and monitoring tasks that involve modern web applications with JavaScript-rendered content.

View File

@@ -0,0 +1,92 @@
---
title: Spider Scraper
description: The `SpiderTool` is designed to extract and read the content of a specified website using Spider.
icon: spider-web
---
# `SpiderTool`
## Description
[Spider](https://spider.cloud/?ref=crewai) is the [fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md#benchmark-results)
open source scraper and crawler that returns LLM-ready data.
It converts any website into pure HTML, markdown, metadata or text while enabling you to crawl with custom actions using AI.
## Installation
To use the `SpiderTool`, install the [Spider SDK](https://pypi.org/project/spider-client/)
together with the `crewai[tools]` package:
```shell
pip install spider-client 'crewai[tools]'
```
## Example
This example shows you how you can use the `SpiderTool` to enable your agent to scrape and crawl websites.
The data returned from the Spider API is already LLM-ready, so no further cleaning is needed.
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import SpiderTool


def main():
    spider_tool = SpiderTool()

    searcher = Agent(
        role="Web Research Expert",
        goal="Find related information from specific URLs",
        backstory="An expert web researcher that uses the web extremely well",
        tools=[spider_tool],
        verbose=True,
    )

    return_metadata = Task(
        description="Scrape https://spider.cloud with a limit of 1 and enable metadata",
        expected_output="Metadata and 10 word summary of spider.cloud",
        agent=searcher,
    )

    crew = Crew(
        agents=[searcher],
        tasks=[
            return_metadata,
        ],
        verbose=True,  # boolean flag in current CrewAI releases
    )

    crew.kickoff()


if __name__ == "__main__":
    main()
```
## Arguments
| Argument | Type | Description |
|:------------------|:---------|:-----------------------------------------------------------------------------------------------------------------------------------------------------|
| **api_key** | `string` | Specifies Spider API key. If not specified, it looks for `SPIDER_API_KEY` in environment variables. |
| **params** | `object` | Optional parameters for the request. Defaults to `{"return_format": "markdown"}` to optimize content for LLMs. |
| **request** | `string` | Type of request to perform (`http`, `chrome`, `smart`). `smart` defaults to HTTP, switching to JavaScript rendering if needed. |
| **limit** | `int` | Max pages to crawl per website. Set to `0` or omit for unlimited. |
| **depth** | `int` | Max crawl depth. Set to `0` for no limit. |
| **cache** | `bool` | Enables HTTP caching to speed up repeated runs. Default is `true`. |
| **budget** | `object` | Sets path-based limits for crawled pages, e.g., `{"*":1}` for root page only. |
| **locale** | `string` | Locale for the request, e.g., `en-US`. |
| **cookies** | `string` | HTTP cookies for the request. |
| **stealth** | `bool` | Enables stealth mode for Chrome requests to avoid detection. Default is `true`. |
| **headers** | `object` | HTTP headers as a map of key-value pairs for all requests. |
| **metadata** | `bool` | Stores metadata about pages and content, aiding AI interoperability. Defaults to `false`. |
| **viewport** | `object` | Sets Chrome viewport dimensions. Default is `800x600`. |
| **encoding** | `string` | Specifies encoding type, e.g., `UTF-8`, `SHIFT_JIS`. |
| **subdomains** | `bool` | Includes subdomains in the crawl. Default is `false`. |
| **user_agent** | `string` | Custom HTTP user agent. Defaults to a random agent. |
| **store_data** | `bool` | Enables data storage for the request. Overrides `storageless` when set. Default is `false`. |
| **gpt_config** | `object` | Allows AI to generate crawl actions, with optional chaining steps via an array for `"prompt"`. |
| **fingerprint** | `bool` | Enables advanced fingerprinting for Chrome. |
| **storageless** | `bool` | Prevents all data storage, including AI embeddings. Default is `false`. |
| **readability** | `bool` | Pre-processes content for reading via [Mozilla's Readability](https://github.com/mozilla/readability). Improves content for LLMs. |
| **return_format** | `string` | Format to return data: `markdown`, `raw`, `text`, `html2text`. Use `raw` for default page format. |
| **proxy_enabled** | `bool` | Enables high-performance proxies to avoid network-level blocking. |
| **query_selector** | `string` | CSS query selector for content extraction from markup. |
| **full_resources** | `bool` | Downloads all resources linked to the website. |
| **request_timeout** | `int` | Timeout in seconds for requests (5-60). Default is `30`. |
| **run_in_background** | `bool` | Runs the request in the background, useful for data storage and triggering dashboard crawls. No effect if `storageless` is set. |
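To show how a few of these arguments combine, here is a minimal sketch that starts from the documented default `{"return_format": "markdown"}` and checks two of the constraints listed in the table. `build_spider_params` is a hypothetical helper and not part of the Spider SDK; how the resulting dictionary is handed to the tool is not shown here.

```python
from typing import Any, Dict

RETURN_FORMATS = {"markdown", "raw", "text", "html2text"}


def build_spider_params(**overrides: Any) -> Dict[str, Any]:
    """Hypothetical helper: start from the documented default and validate."""
    params: Dict[str, Any] = {"return_format": "markdown"}
    params.update(overrides)
    if params["return_format"] not in RETURN_FORMATS:
        raise ValueError(f"Unsupported return_format: {params['return_format']!r}")
    timeout = params.get("request_timeout")
    if timeout is not None and not 5 <= timeout <= 60:
        raise ValueError("request_timeout must be between 5 and 60 seconds")
    return params


params = build_spider_params(limit=1, metadata=True, request_timeout=30)
```

The example mirrors the task in the code above: a crawl limited to one page with metadata enabled.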

View File

@@ -0,0 +1,244 @@
---
title: Stagehand Tool
description: Web automation tool that integrates Stagehand with CrewAI for browser interaction and automation
icon: hand
---
# `StagehandTool`
The `StagehandTool` integrates the [Stagehand](https://docs.stagehand.dev/get_started/introduction) framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.
## Overview
Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:
- Navigate to websites
- Click buttons, links, and other elements
- Fill in forms
- Extract data from web pages
- Observe and identify elements
- Perform complex workflows
The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:
1. **Act**: Perform actions like clicking, typing, or navigating
2. **Extract**: Extract structured data from web pages
3. **Observe**: Identify and analyze elements on the page
## Prerequisites
Before using this tool, ensure you have:
1. A [Browserbase](https://www.browserbase.com/) account with API key and project ID
2. An API key for an LLM (OpenAI or Anthropic Claude)
3. The Stagehand Python SDK installed
Install the required dependency:
```bash
pip install stagehand-py
```
## Usage
### Basic Implementation
The StagehandTool can be implemented in two ways:
#### 1. Using Context Manager (Recommended)
<Tip>
The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.
</Tip>
```python
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys using a context manager
with StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
) as stagehand_tool:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        expected_output="A short summary of what is on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)
```
#### 2. Manual Resource Management
```python
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
)

try:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        expected_output="A short summary of what is on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)
finally:
    # Explicitly clean up resources
    stagehand_tool.close()
```
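The two approaches above are equivalent in effect because a `with` block calls the object's `__exit__` method on the way out, whether or not an exception occurred. The sketch below demonstrates that protocol with a `FakeBrowserTool` stand-in. It is not the real `StagehandTool`, whose internals are not shown in this document.

```python
class FakeBrowserTool:
    """Stand-in for a tool that holds a browser session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def __enter__(self) -> "FakeBrowserTool":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # do not swallow exceptions from the with-body


def run_with_tool(tool: FakeBrowserTool, fail: bool = False) -> str:
    with tool:
        if fail:
            raise RuntimeError("crew failed")
        return "done"
```

Returning `False` from `__exit__` lets the original exception propagate after cleanup, which matches what the `try`/`finally` version does.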
## Command Types
The StagehandTool supports three different command types for specific web automation tasks:
### 1. Act Command
The `act` command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.
```python
# Perform an action (default behavior)
result = stagehand_tool.run(
    instruction="Click the login button",
    url="https://example.com",
    command_type="act",  # Default, so can be omitted
)

# Fill out a form
result = stagehand_tool.run(
    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'",
    url="https://example.com/contact",
)
```
### 2. Extract Command
The `extract` command type retrieves structured data from webpages.
```python
# Extract all product information
result = stagehand_tool.run(
    instruction="Extract all product names, prices, and descriptions",
    url="https://example.com/products",
    command_type="extract",
)

# Extract specific information with a selector
result = stagehand_tool.run(
    instruction="Extract the main article title and content",
    url="https://example.com/blog/article",
    command_type="extract",
    selector=".article-container",  # Optional CSS selector
)
```
### 3. Observe Command
The `observe` command type identifies and analyzes webpage elements.
```python
# Find interactive elements
result = stagehand_tool.run(
    instruction="Find all interactive elements in the navigation menu",
    url="https://example.com",
    command_type="observe",
)

# Identify form fields
result = stagehand_tool.run(
    instruction="Identify all the input fields in the registration form",
    url="https://example.com/register",
    command_type="observe",
    selector="#registration-form",
)
```
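Since all three command types share the same arguments, it can be convenient to build and validate them in one place before calling the tool. The `StagehandCommand` dataclass below is a hypothetical convenience wrapper, not part of `crewai_tools` or the Stagehand SDK; only the argument names and the three command types come from the examples above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

COMMAND_TYPES = ("act", "extract", "observe")


@dataclass(frozen=True)
class StagehandCommand:
    """Hypothetical container for the arguments used in the examples above."""

    instruction: str
    url: str
    command_type: str = "act"  # documented default
    selector: Optional[str] = None

    def __post_init__(self) -> None:
        if self.command_type not in COMMAND_TYPES:
            raise ValueError(f"command_type must be one of {COMMAND_TYPES}")

    def as_kwargs(self) -> Dict[str, Any]:
        """Return keyword arguments, omitting the selector when unset."""
        kwargs: Dict[str, Any] = {
            "instruction": self.instruction,
            "url": self.url,
            "command_type": self.command_type,
        }
        if self.selector is not None:
            kwargs["selector"] = self.selector
        return kwargs
```

A command built this way could then be passed along as `stagehand_tool.run(**command.as_kwargs())`, so a mistyped command type fails before a browser session is used.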
## Configuration Options
Customize the StagehandTool behavior with these parameters:
```python
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
    headless=True,  # Run browser in headless mode
    self_heal=True,  # Attempt to recover from errors
    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
    verbose=1,  # Control logging verbosity (0-3)
)
```
## Best Practices
1. **Be Specific**: Provide detailed instructions for better results
2. **Choose Appropriate Command Type**: Select the right command type for your task
3. **Use Selectors**: Leverage CSS selectors to improve accuracy
4. **Break Down Complex Tasks**: Split complex workflows into multiple tool calls
5. **Implement Error Handling**: Add error handling for potential issues
## Troubleshooting
Common issues and solutions:
- **Session Issues**: Verify API keys for both Browserbase and LLM provider
- **Element Not Found**: Increase `dom_settle_timeout_ms` for slower pages
- **Action Failures**: Use `observe` to identify correct elements first
- **Incomplete Data**: Refine instructions or provide specific selectors
## Additional Resources
For questions about the CrewAI integration:
- Join Stagehand's [Slack community](https://stagehand.dev/slack)
- Open an issue in the [Stagehand repository](https://github.com/browserbase/stagehand)
- Visit [Stagehand documentation](https://docs.stagehand.dev/)