mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-01-09 08:08:32 +00:00
Add pt-BR docs translation (#3039)
* docs: add pt-br translations

  Powered by a CrewAI Flow https://github.com/danielfsbarreto/docs_translator

* Update mcp/overview.mdx brazilian docs

  Its en-US counterpart was updated after I did a pass, so now it includes the new section about @CrewBase
118
docs/en/tools/ai-ml/aimindtool.mdx
Normal file
@@ -0,0 +1,118 @@
---
title: AI Mind Tool
description: The `AIMindTool` is designed to query data sources in natural language.
icon: brain
---

# `AIMindTool`

## Description

The `AIMindTool` is a wrapper around [AI-Minds](https://mindsdb.com/minds) provided by [MindsDB](https://mindsdb.com/). It allows you to query data sources in natural language by simply configuring their connection parameters. This tool is useful when you need answers to questions from your data stored in various data sources including PostgreSQL, MySQL, MariaDB, ClickHouse, Snowflake, and Google BigQuery.

Minds are AI systems that work similarly to large language models (LLMs) but go beyond by answering any question from any data. This is accomplished by:
- Selecting the most relevant data for an answer using parametric search
- Understanding the meaning and providing responses within the correct context through semantic search
- Delivering precise answers by analyzing data and using machine learning (ML) models

## Installation

To incorporate this tool into your project, you need to install the Minds SDK:

```shell
uv add minds-sdk
```

## Steps to Get Started

To effectively use the `AIMindTool`, follow these steps:

1. **Package Installation**: Confirm that the `crewai[tools]` and `minds-sdk` packages are installed in your Python environment.
2. **API Key Acquisition**: Sign up for a Minds account [here](https://mdb.ai/register), and obtain an API key.
3. **Environment Configuration**: Store your obtained API key in an environment variable named `MINDS_API_KEY` to facilitate its use by the tool.

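As a minimal sketch of step 3, assuming you want to fail early when the key is missing, the check below reads `MINDS_API_KEY` from the environment. The helper name `get_minds_api_key` is hypothetical and is not part of `crewai_tools` or the Minds SDK.

```python
import os


def get_minds_api_key(env=None):
    """Return the Minds API key from the environment, or raise if it is unset.

    Hypothetical helper for illustration; not part of crewai_tools.
    """
    # Allow a plain dict to be passed in so the check is easy to try out.
    env = os.environ if env is None else env
    key = env.get("MINDS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the MINDS_API_KEY environment variable first.")
    return key
```

Calling `get_minds_api_key()` before constructing the tool gives a clear error message instead of a failure later in the run.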
## Example

The following example demonstrates how to initialize the tool and execute a query:

```python Code
from crewai_tools import AIMindTool

# Initialize the AIMindTool
aimind_tool = AIMindTool(
    datasources=[
        {
            "description": "house sales data",
            "engine": "postgres",
            "connection_data": {
                "user": "demo_user",
                "password": "demo_password",
                "host": "samples.mindsdb.com",
                "port": 5432,
                "database": "demo",
                "schema": "demo_data"
            },
            "tables": ["house_sales"]
        }
    ]
)

# Run a natural language query
result = aimind_tool.run("How many 3 bedroom houses were sold in 2008?")
print(result)
```

## Parameters

The `AIMindTool` accepts the following parameters:

- **api_key**: Optional. Your Minds API key. If not provided, it will be read from the `MINDS_API_KEY` environment variable.
- **datasources**: A list of dictionaries, each containing the following keys:
  - **description**: A description of the data contained in the datasource.
  - **engine**: The engine (or type) of the datasource.
  - **connection_data**: A dictionary containing the connection parameters for the datasource.
  - **tables**: A list of tables that the data source will use. This is optional and can be omitted if all tables in the data source are to be used.

A list of supported data sources and their connection parameters can be found [here](https://docs.mdb.ai/docs/data_sources).

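The shape of one `datasources` entry can be sketched with a small builder function. This is only an illustration of the keys listed above; `make_datasource` is a hypothetical helper, not something `crewai_tools` provides, and the connection values are placeholders.

```python
def make_datasource(description, engine, connection_data, tables=None):
    """Build one entry for the `datasources` list described above.

    Hypothetical helper for illustration; not part of crewai_tools.
    """
    datasource = {
        "description": description,
        "engine": engine,
        "connection_data": dict(connection_data),
    }
    # `tables` is optional: leave it out to use every table in the data source.
    if tables:
        datasource["tables"] = list(tables)
    return datasource


# One entry restricted to a single table, and one that uses all tables.
sales = make_datasource(
    description="house sales data",
    engine="postgres",
    connection_data={"host": "samples.mindsdb.com", "port": 5432},
    tables=["house_sales"],
)
all_tables = make_datasource(
    "house sales data", "postgres", {"host": "samples.mindsdb.com"}
)
```

The resulting dictionaries can be passed as `datasources=[sales]` when constructing the tool.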
## Agent Integration Example

Here's how to integrate the `AIMindTool` with a CrewAI agent:

```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import AIMindTool

# Initialize the tool
aimind_tool = AIMindTool(
    datasources=[
        {
            "description": "sales data",
            "engine": "postgres",
            "connection_data": {
                "user": "your_user",
                "password": "your_password",
                "host": "your_host",
                "port": 5432,
                "database": "your_db",
                "schema": "your_schema"
            },
            "tables": ["sales"]
        }
    ]
)

# Define an agent with the AIMindTool
@agent
def data_analyst(self) -> Agent:
    return Agent(
        config=self.agents_config["data_analyst"],
        allow_delegation=False,
        tools=[aimind_tool]
    )
```

## Conclusion

The `AIMindTool` provides a powerful way to query your data sources using natural language, making it easier to extract insights without writing complex SQL queries. By connecting to various data sources and leveraging AI-Minds technology, this tool enables agents to access and analyze data efficiently.

209
docs/en/tools/ai-ml/codeinterpretertool.mdx
Normal file
@@ -0,0 +1,209 @@
---
title: Code Interpreter
description: The `CodeInterpreterTool` is a powerful tool designed for executing Python 3 code within a secure, isolated environment.
icon: code-simple
---

# `CodeInterpreterTool`

## Description

The `CodeInterpreterTool` enables CrewAI agents to execute Python 3 code that they generate autonomously. This functionality is particularly valuable as it allows agents to create code, execute it, obtain the results, and utilize that information to inform subsequent decisions and actions.

There are several ways to use this tool:

### Docker Container (Recommended)

This is the primary option. The code runs in an isolated Docker container, which limits what the code can affect on the host regardless of its content.
Make sure Docker is installed and running on your system. If you don’t have it, you can install it from [here](https://docs.docker.com/get-docker/).

### Sandbox environment

If Docker is unavailable (either not installed or not accessible for any reason), the code will be executed in a restricted Python environment called a sandbox.
This environment is very limited, with strict restrictions on many modules and built-in functions.

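To give a feel for what a restricted environment means, here is a toy sketch that runs code with a small allow-list of built-ins. It is an illustration only: it is not the sandbox that `CodeInterpreterTool` ships, and restricting `__builtins__` this way is not a real security boundary. The names `ALLOWED_BUILTINS` and `run_restricted` are made up for this example.

```python
import contextlib
import io

# Hypothetical allow-list; the real tool's restrictions may differ.
ALLOWED_BUILTINS = {"print": print, "range": range, "len": len, "sum": sum}


def run_restricted(code):
    """Execute `code` with only ALLOWED_BUILTINS available and return its stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        try:
            # Without `__import__` in the builtins, `import` statements fail.
            exec(code, {"__builtins__": dict(ALLOWED_BUILTINS)})
        except Exception as exc:  # report the failure instead of crashing
            print(f"Blocked or failed: {type(exc).__name__}: {exc}")
    return buffer.getvalue()
```

With this sketch, simple arithmetic and printing work, while `import os` or `open(...)` are reported as blocked because the names they rely on are missing from the allow-list.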
### Unsafe Execution

**NOT RECOMMENDED FOR PRODUCTION**
This mode allows execution of any Python code, including dangerous calls to `sys`, `os`, and similar modules. [Check out](/en/tools/ai-ml/codeinterpretertool#enabling-unsafe-mode) how to enable this mode.

## Logging

The `CodeInterpreterTool` logs the selected execution strategy to STDOUT.

## Installation

To use this tool, you need to install the CrewAI tools package:

```shell
pip install 'crewai[tools]'
```

## Example

The following example demonstrates how to use the `CodeInterpreterTool` with a CrewAI agent:

```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

# Initialize the tool
code_interpreter = CodeInterpreterTool()

# Define an agent that uses the tool
programmer_agent = Agent(
    role="Python Programmer",
    goal="Write and execute Python code to solve problems",
    backstory="An expert Python programmer who can write efficient code to solve complex problems.",
    tools=[code_interpreter],
    verbose=True,
)

# Example task to generate and execute code
coding_task = Task(
    description="Write a Python function to calculate the Fibonacci sequence up to the 10th number and print the result.",
    expected_output="The Fibonacci sequence up to the 10th number.",
    agent=programmer_agent,
)

# Create and run the crew
crew = Crew(
    agents=[programmer_agent],
    tasks=[coding_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff()
```

You can also enable code execution directly when creating an agent:

```python Code
from crewai import Agent

# Create an agent with code execution enabled
programmer_agent = Agent(
    role="Python Programmer",
    goal="Write and execute Python code to solve problems",
    backstory="An expert Python programmer who can write efficient code to solve complex problems.",
    allow_code_execution=True,  # This automatically adds the CodeInterpreterTool
    verbose=True,
)
```

### Enabling `unsafe_mode`

```python Code
from crewai_tools import CodeInterpreterTool

code = """
import os
os.system("ls -la")
"""

CodeInterpreterTool(unsafe_mode=True).run(code=code)
```

## Parameters

The `CodeInterpreterTool` accepts the following parameters during initialization:

- **user_dockerfile_path**: Optional. Path to a custom Dockerfile to use for the code interpreter container.
- **user_docker_base_url**: Optional. URL to the Docker daemon to use for running the container.
- **unsafe_mode**: Optional. Whether to run code directly on the host machine instead of in a Docker container or sandbox. Default is `False`. Use with caution!
- **default_image_tag**: Optional. Default Docker image tag. Default is `code-interpreter:latest`.

When using the tool with an agent, the agent will need to provide:

- **code**: Required. The Python 3 code to execute.
- **libraries_used**: Optional. A list of libraries used in the code that need to be installed. Default is `[]`.

## Agent Integration Example

Here's a more detailed example of how to integrate the `CodeInterpreterTool` with a CrewAI agent:

```python Code
from crewai import Agent, Task, Crew, Process
from crewai_tools import CodeInterpreterTool

# Initialize the tool
code_interpreter = CodeInterpreterTool()

# Define an agent that uses the tool
data_analyst = Agent(
    role="Data Analyst",
    goal="Analyze data using Python code",
    backstory="""You are an expert data analyst who specializes in using Python
    to analyze and visualize data. You can write efficient code to process
    large datasets and extract meaningful insights.""",
    tools=[code_interpreter],
    verbose=True,
)

# Create a task for the agent
analysis_task = Task(
    description="""
    Write Python code to:
    1. Generate a random dataset of 100 points with x and y coordinates
    2. Calculate the correlation coefficient between x and y
    3. Create a scatter plot of the data
    4. Print the correlation coefficient and save the plot as 'scatter.png'

    Make sure to handle any necessary imports and print the results.
    """,
    expected_output="The correlation coefficient and confirmation that the scatter plot has been saved.",
    agent=data_analyst,
)

# Run the task
crew = Crew(
    agents=[data_analyst],
    tasks=[analysis_task],
    verbose=True,
    process=Process.sequential,
)
result = crew.kickoff()
```

## Implementation Details

The `CodeInterpreterTool` uses Docker to create a secure environment for code execution:

```python Code
# Simplified excerpt: imports and helper methods are omitted
class CodeInterpreterTool(BaseTool):
    name: str = "Code Interpreter"
    description: str = "Interprets Python3 code strings with a final print statement."
    args_schema: Type[BaseModel] = CodeInterpreterSchema
    default_image_tag: str = "code-interpreter:latest"

    def _run(self, **kwargs) -> str:
        code = kwargs.get("code", self.code)
        libraries_used = kwargs.get("libraries_used", [])

        if self.unsafe_mode:
            return self.run_code_unsafe(code, libraries_used)
        else:
            return self.run_code_safety(code, libraries_used)
```

When running in Docker, the tool performs the following steps:
1. Verifies that the Docker image exists or builds it if necessary
2. Creates a Docker container with the current working directory mounted
3. Installs any required libraries specified by the agent
4. Executes the Python code in the container
5. Returns the output of the code execution
6. Cleans up by stopping and removing the container

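The choice between the three execution modes described on this page can be summarised with a small decision function. This is a sketch of the documented behaviour, not the tool's actual code; `choose_strategy` and the labels it returns are hypothetical.

```python
def choose_strategy(unsafe_mode, docker_available):
    """Pick an execution mode following the rules described on this page."""
    if unsafe_mode:
        return "unsafe"   # runs directly on the host
    if docker_available:
        return "docker"   # preferred: isolated container
    return "sandbox"      # fallback: restricted Python environment


# Walk through a few combinations to show the order of the checks.
for unsafe, docker in [(False, True), (False, False), (True, True)]:
    print(f"unsafe_mode={unsafe}, docker={docker} -> {choose_strategy(unsafe, docker)}")
```

The ordering mirrors the `_run` excerpt above: `unsafe_mode` is checked first, and only the safe path distinguishes between Docker and the sandbox.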
## Security Considerations

By default, the `CodeInterpreterTool` runs code in an isolated Docker container, which provides a layer of security. However, there are still some security considerations to keep in mind:

1. The Docker container has access to the current working directory, so sensitive files could potentially be accessed.
2. If Docker is unavailable and `unsafe_mode` is not enabled, the code is executed in a sandbox environment. For security reasons, installing arbitrary libraries is not allowed there.
3. The `unsafe_mode` parameter allows code to be executed directly on the host machine, which should only be used in trusted environments.
4. Be cautious when allowing agents to install arbitrary libraries, as they could potentially include malicious code.

## Conclusion

The `CodeInterpreterTool` provides a powerful way for CrewAI agents to execute Python code in a relatively secure environment. By enabling agents to write and run code, it significantly expands their problem-solving capabilities, especially for tasks involving data analysis, calculations, or other computational work. This tool is particularly useful for agents that need to perform complex operations that are more efficiently expressed in code than in natural language.

51
docs/en/tools/ai-ml/dalletool.mdx
Normal file
@@ -0,0 +1,51 @@
---
title: DALL-E Tool
description: The `DallETool` is a powerful tool designed for generating images from textual descriptions.
icon: image
---

# `DallETool`

## Description

This tool gives the Agent the ability to generate images using the DALL-E model, a transformer-based model that generates images from textual descriptions.
The Agent generates images based on the text input provided by the user.

## Installation

Install the crewai_tools package:
```shell
pip install 'crewai[tools]'
```

## Example

Remember that when using this tool, the text must be generated by the Agent itself. The text must be a description of the image you want to generate.

```python Code
from crewai_tools import DallETool

Agent(
    ...
    tools=[DallETool()],
)
```

If needed, you can also tweak the parameters of the DALL-E model by passing them as arguments to the `DallETool` class. For example:

```python Code
from crewai_tools import DallETool

dalle_tool = DallETool(model="dall-e-3",
                       size="1024x1024",
                       quality="standard",
                       n=1)

Agent(
    ...
    tools=[dalle_tool]
)
```

The parameters are based on the `client.images.generate` method from the OpenAI API. For more information on the parameters,
please refer to the [OpenAI API documentation](https://platform.openai.com/docs/guides/images/introduction?lang=python).

58
docs/en/tools/ai-ml/langchaintool.mdx
Normal file
@@ -0,0 +1,58 @@
---
title: LangChain Tool
description: The `LangChainTool` is a wrapper for LangChain tools and query engines.
icon: link
---

## `LangChainTool`

<Info>
  CrewAI seamlessly integrates with LangChain's comprehensive [list of tools](https://python.langchain.com/docs/integrations/tools/), all of which can be used with CrewAI.
</Info>

```python Code
import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew
from crewai.tools import BaseTool
from pydantic import Field
from langchain_community.utilities import GoogleSerperAPIWrapper

# Set up your SERPER_API_KEY in an .env file, e.g.:
# SERPER_API_KEY=<your api key>
load_dotenv()

search = GoogleSerperAPIWrapper()

class SearchTool(BaseTool):
    name: str = "Search"
    description: str = "Useful for search-based queries. Use this to find current information about markets, companies, and trends."
    search: GoogleSerperAPIWrapper = Field(default_factory=GoogleSerperAPIWrapper)

    def _run(self, query: str) -> str:
        """Execute the search query and return results"""
        try:
            return self.search.run(query)
        except Exception as e:
            return f"Error performing search: {str(e)}"

# Create Agents
researcher = Agent(
    role='Research Analyst',
    goal='Gather current market data and trends',
    backstory="""You are an expert research analyst with years of experience in
    gathering market intelligence. You're known for your ability to find
    relevant and up-to-date market information and present it in a clear,
    actionable format.""",
    tools=[SearchTool()],
    verbose=True
)

# rest of the code ...
```

## Conclusion

Tools are pivotal in extending the capabilities of CrewAI agents, enabling them to undertake a broad spectrum of tasks and collaborate effectively.
When building solutions with CrewAI, leverage both custom and existing tools to empower your agents and enhance the AI ecosystem. Consider utilizing error handling, caching mechanisms,
and the flexibility of tool arguments to optimize your agents' performance and capabilities.

146
docs/en/tools/ai-ml/llamaindextool.mdx
Normal file
@@ -0,0 +1,146 @@
---
title: LlamaIndex Tool
description: The `LlamaIndexTool` is a wrapper for LlamaIndex tools and query engines.
icon: address-book
---

# `LlamaIndexTool`

## Description

The `LlamaIndexTool` is designed to be a general wrapper around LlamaIndex tools and query engines, enabling you to plug LlamaIndex resources, such as RAG and agentic pipelines, into CrewAI agents as tools. This tool allows you to seamlessly integrate LlamaIndex's powerful data processing and retrieval capabilities into your CrewAI workflows.

## Installation

To use this tool, you need to install LlamaIndex:

```shell
uv add llama-index
```

## Steps to Get Started

To effectively use the `LlamaIndexTool`, follow these steps:

1. **Install LlamaIndex**: Install the LlamaIndex package using the command above.
2. **Set Up LlamaIndex**: Follow the [LlamaIndex documentation](https://docs.llamaindex.ai/) to set up a RAG/agent pipeline.
3. **Create a Tool or Query Engine**: Create a LlamaIndex tool or query engine that you want to use with CrewAI.

## Example

The following examples demonstrate how to initialize the tool from different LlamaIndex components:

### From a LlamaIndex Tool

```python Code
from crewai_tools import LlamaIndexTool
from crewai import Agent
from crewai.project import agent
from llama_index.core.tools import FunctionTool

# Example 1: Initialize from FunctionTool
def search_data(query: str) -> str:
    """Search for information in the data."""
    # Your implementation here
    return f"Results for: {query}"

# Create a LlamaIndex FunctionTool
og_tool = FunctionTool.from_defaults(
    search_data,
    name="DataSearchTool",
    description="Search for information in the data"
)

# Wrap it with LlamaIndexTool
tool = LlamaIndexTool.from_tool(og_tool)

# Define an agent that uses the tool
@agent
def researcher(self) -> Agent:
    '''
    This agent uses the LlamaIndexTool to search for information.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        tools=[tool]
    )
```

### From LlamaHub Tools

```python Code
from crewai_tools import LlamaIndexTool
from llama_index.tools.wolfram_alpha import WolframAlphaToolSpec

# Initialize from LlamaHub Tools
wolfram_spec = WolframAlphaToolSpec(app_id="your_app_id")
wolfram_tools = wolfram_spec.to_tool_list()
tools = [LlamaIndexTool.from_tool(t) for t in wolfram_tools]
```

### From a LlamaIndex Query Engine

```python Code
from crewai_tools import LlamaIndexTool
from llama_index.core import VectorStoreIndex
from llama_index.core.readers import SimpleDirectoryReader

# Load documents
documents = SimpleDirectoryReader("./data").load_data()

# Create an index
index = VectorStoreIndex.from_documents(documents)

# Create a query engine
query_engine = index.as_query_engine()

# Create a LlamaIndexTool from the query engine
query_tool = LlamaIndexTool.from_query_engine(
    query_engine,
    name="Company Data Query Tool",
    description="Use this tool to lookup information in company documents"
)
```

## Class Methods

The `LlamaIndexTool` provides two main class methods for creating instances:

### from_tool

Creates a `LlamaIndexTool` from a LlamaIndex tool.

```python Code
@classmethod
def from_tool(cls, tool: Any, **kwargs: Any) -> "LlamaIndexTool":
    # Implementation details
    ...
```

### from_query_engine

Creates a `LlamaIndexTool` from a LlamaIndex query engine.

```python Code
@classmethod
def from_query_engine(
    cls,
    query_engine: Any,
    name: Optional[str] = None,
    description: Optional[str] = None,
    return_direct: bool = False,
    **kwargs: Any,
) -> "LlamaIndexTool":
    # Implementation details
    ...
```

## Parameters

The `from_query_engine` method accepts the following parameters:

- **query_engine**: Required. The LlamaIndex query engine to wrap.
- **name**: Optional. The name of the tool.
- **description**: Optional. The description of the tool.
- **return_direct**: Optional. Whether to return the response directly. Default is `False`.

## Conclusion

The `LlamaIndexTool` provides a powerful way to integrate LlamaIndex's capabilities into CrewAI agents. By wrapping LlamaIndex tools and query engines, it enables agents to leverage sophisticated data retrieval and processing functionalities, enhancing their ability to work with complex information sources.

64
docs/en/tools/ai-ml/overview.mdx
Normal file
@@ -0,0 +1,64 @@
---
title: "Overview"
description: "Leverage AI services, generate images, process vision, and build intelligent systems"
icon: "face-smile"
---

These tools integrate with AI and machine learning services to enhance your agents with advanced capabilities like image generation, vision processing, and intelligent code execution.

## **Available Tools**

<CardGroup cols={2}>
  <Card title="DALL-E Tool" icon="image" href="/en/tools/ai-ml/dalletool">
    Generate AI images using OpenAI's DALL-E model.
  </Card>

  <Card title="Vision Tool" icon="eye" href="/en/tools/ai-ml/visiontool">
    Process and analyze images with computer vision capabilities.
  </Card>

  <Card title="AI Mind Tool" icon="brain" href="/en/tools/ai-ml/aimindtool">
    Query data sources in natural language through MindsDB Minds.
  </Card>

  <Card title="LlamaIndex Tool" icon="llama" href="/en/tools/ai-ml/llamaindextool">
    Build knowledge bases and retrieval systems with LlamaIndex.
  </Card>

  <Card title="LangChain Tool" icon="link" href="/en/tools/ai-ml/langchaintool">
    Integrate with LangChain for complex AI workflows.
  </Card>

  <Card title="RAG Tool" icon="database" href="/en/tools/ai-ml/ragtool">
    Implement Retrieval-Augmented Generation systems.
  </Card>

  <Card title="Code Interpreter Tool" icon="code" href="/en/tools/ai-ml/codeinterpretertool">
    Execute Python code and perform data analysis.
  </Card>

</CardGroup>

## **Common Use Cases**

- **Content Generation**: Create images, text, and multimedia content
- **Data Analysis**: Execute code and analyze complex datasets
- **Knowledge Systems**: Build RAG systems and intelligent databases
- **Computer Vision**: Process and understand visual content
- **AI Safety**: Implement content moderation and safety checks

```python
from crewai import Agent
from crewai_tools import DallETool, VisionTool, CodeInterpreterTool

# Create AI tools
image_generator = DallETool()
vision_processor = VisionTool()
code_executor = CodeInterpreterTool()

# Add to your agent
agent = Agent(
    role="AI Specialist",
    tools=[image_generator, vision_processor, code_executor],
    goal="Create and analyze content using AI capabilities"
)
```

172
docs/en/tools/ai-ml/ragtool.mdx
Normal file
@@ -0,0 +1,172 @@
---
title: RAG Tool
description: The `RagTool` is a dynamic knowledge base tool for answering questions using Retrieval-Augmented Generation.
icon: vector-square
---

# `RagTool`

## Description

The `RagTool` is designed to answer questions by leveraging the power of Retrieval-Augmented Generation (RAG) through EmbedChain.
It provides a dynamic knowledge base that can be queried to retrieve relevant information from various data sources.
This tool is particularly useful for applications that require access to a vast array of information and need to provide contextually relevant answers.

## Example

The following example demonstrates how to initialize the tool and use it with different data sources:

```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import RagTool

# Create a RAG tool with default settings
rag_tool = RagTool()

# Add content from a file
rag_tool.add(data_type="file", path="path/to/your/document.pdf")

# Add content from a web page
rag_tool.add(data_type="web_page", url="https://example.com")

# Define an agent with the RagTool
@agent
def knowledge_expert(self) -> Agent:
    '''
    This agent uses the RagTool to answer questions about the knowledge base.
    '''
    return Agent(
        config=self.agents_config["knowledge_expert"],
        allow_delegation=False,
        tools=[rag_tool]
    )
```

## Supported Data Sources

The `RagTool` can be used with a wide variety of data sources, including:

- 📰 PDF files
- 📊 CSV files
- 📃 JSON files
- 📝 Text
- 📁 Directories/Folders
- 🌐 HTML Web pages
- 📽️ YouTube Channels
- 📺 YouTube Videos
- 📚 Documentation websites
- 📝 MDX files
- 📄 DOCX files
- 🧾 XML files
- 📬 Gmail
- 📝 GitHub repositories
- 🐘 PostgreSQL databases
- 🐬 MySQL databases
- 🤖 Slack conversations
- 💬 Discord messages
- 🗨️ Discourse forums
- 📝 Substack newsletters
- 🐝 Beehiiv content
- 💾 Dropbox files
- 🖼️ Images
- ⚙️ Custom data sources

## Parameters

The `RagTool` accepts the following parameters:

- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.
- **adapter**: Optional. A custom adapter for the knowledge base. If not provided, an EmbedchainAdapter will be used.
- **config**: Optional. Configuration for the underlying EmbedChain App.

## Adding Content

You can add content to the knowledge base using the `add` method:

```python Code
# Add a PDF file
rag_tool.add(data_type="file", path="path/to/your/document.pdf")

# Add a web page
rag_tool.add(data_type="web_page", url="https://example.com")

# Add a YouTube video
rag_tool.add(data_type="youtube_video", url="https://www.youtube.com/watch?v=VIDEO_ID")

# Add a directory of files
rag_tool.add(data_type="directory", path="path/to/your/directory")
```

|
||||
## Agent Integration Example
|
||||
|
||||
Here's how to integrate the `RagTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent
|
||||
from crewai.project import agent
|
||||
from crewai_tools import RagTool
|
||||
|
||||
# Initialize the tool and add content
|
||||
rag_tool = RagTool()
|
||||
rag_tool.add(data_type="web_page", url="https://docs.crewai.com")
|
||||
rag_tool.add(data_type="file", path="company_data.pdf")
|
||||
|
||||
# Define an agent with the RagTool
|
||||
@agent
|
||||
def knowledge_expert(self) -> Agent:
|
||||
return Agent(
|
||||
config=self.agents_config["knowledge_expert"],
|
||||
allow_delegation=False,
|
||||
tools=[rag_tool]
|
||||
)
|
||||
```
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
You can customize the behavior of the `RagTool` by providing a configuration dictionary:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import RagTool
|
||||
|
||||
# Create a RAG tool with custom configuration
|
||||
config = {
|
||||
"app": {
|
||||
"name": "custom_app",
|
||||
},
|
||||
"llm": {
|
||||
"provider": "openai",
|
||||
"config": {
|
||||
"model": "gpt-4",
|
||||
}
|
||||
},
|
||||
"embedding_model": {
|
||||
"provider": "openai",
|
||||
"config": {
|
||||
"model": "text-embedding-ada-002"
|
||||
}
|
||||
},
|
||||
"vectordb": {
|
||||
"provider": "elasticsearch",
|
||||
"config": {
|
||||
"collection_name": "my-collection",
|
||||
"cloud_id": "deployment-name:xxxx",
|
||||
"api_key": "your-key",
|
||||
"verify_certs": False
|
||||
}
|
||||
},
|
||||
"chunker": {
|
||||
"chunk_size": 400,
|
||||
"chunk_overlap": 100,
|
||||
"length_function": "len",
|
||||
"min_chunk_size": 0
|
||||
}
|
||||
}
|
||||
|
||||
rag_tool = RagTool(config=config, summarize=True)
|
||||
```
|
||||
|
||||
The internal RAG tool utilizes the Embedchain adapter, allowing you to pass any configuration options that are supported by Embedchain.
|
||||
You can refer to the [Embedchain documentation](https://docs.embedchain.ai/components/introduction) for details.
|
||||
Make sure to review the configuration options available in the .yaml file.
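To make the `chunker` settings above more concrete, here is a minimal sketch of what `chunk_size` and `chunk_overlap` mean. It is a simplified character-based sliding window written for illustration only; the helper name `chunk_text` is hypothetical, and this is not Embedchain's actual splitter.

```python
def chunk_text(text: str, chunk_size: int = 400, chunk_overlap: int = 100) -> list[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap.

    Hypothetical helper for illustration; real splitters may also respect
    sentence or token boundaries.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# With the values from the configuration above, each window starts 300
# characters after the previous one and shares 100 characters with it.
chunks = chunk_text("x" * 1000, chunk_size=400, chunk_overlap=100)
print(len(chunks), [len(c) for c in chunks])
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of storing more text in total.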
|
||||
|
||||
## Conclusion
|
||||
The `RagTool` provides a powerful way to create and query knowledge bases from various data sources. By leveraging Retrieval-Augmented Generation, it enables agents to access and retrieve relevant information efficiently, enhancing their ability to provide accurate and contextually appropriate responses.
|
||||
49
docs/en/tools/ai-ml/visiontool.mdx
Normal file
@@ -0,0 +1,49 @@
|
||||
---
|
||||
title: Vision Tool
|
||||
description: The `VisionTool` is designed to extract text from images.
|
||||
icon: eye
|
||||
---
|
||||
|
||||
# `VisionTool`
|
||||
|
||||
## Description
|
||||
|
||||
This tool extracts text from images. When passed to an agent, it extracts the text from the image and uses it to generate a response, report, or other output.
|
||||
Pass the URL or the file path of the image to the agent.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the `crewai_tools` package:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
To use the `VisionTool`, set your OpenAI API key in the `OPENAI_API_KEY` environment variable.
|
||||
|
||||
```python Code
|
||||
from crewai import Agent
from crewai.project import agent
from crewai_tools import VisionTool
|
||||
|
||||
vision_tool = VisionTool()
|
||||
|
||||
@agent
|
||||
def researcher(self) -> Agent:
|
||||
'''
|
||||
This agent uses the VisionTool to extract text from images.
|
||||
'''
|
||||
return Agent(
|
||||
config=self.agents_config["researcher"],
|
||||
allow_delegation=False,
|
||||
tools=[vision_tool]
|
||||
)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The VisionTool requires the following arguments:
|
||||
|
||||
| Argument | Type | Description |
|
||||
| :----------------- | :------- | :------------------------------------------------------------------------------- |
|
||||
| **image_path_url** | `string` | **Mandatory**. The path or URL of the image from which text needs to be extracted. |
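Because `image_path_url` may hold either a URL or a local path, a caller might want to tell the two apart before handing the value to an agent. The following is a minimal sketch of one way to do that with the standard library; the helper name `classify_image_source` is hypothetical and is not part of the `VisionTool` API.

```python
from urllib.parse import urlparse


def classify_image_source(image_path_url: str) -> str:
    """Return "url" for http(s) addresses and "path" for everything else.

    Hypothetical helper for illustration only.
    """
    scheme = urlparse(image_path_url).scheme.lower()
    if scheme in ("http", "https"):
        return "url"
    # Anything without an http(s) scheme is treated as a local file path.
    return "path"


print(classify_image_source("https://example.com/receipt.png"))
print(classify_image_source("images/receipt.png"))
```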
|
||||
99
docs/en/tools/automation/apifyactorstool.mdx
Normal file
@@ -0,0 +1,99 @@
|
||||
---
|
||||
title: Apify Actors
|
||||
description: "`ApifyActorsTool` lets you call Apify Actors to provide your CrewAI workflows with web scraping, crawling, data extraction, and web automation capabilities."
|
||||
# hack to use custom Apify icon
|
||||
icon: "); -webkit-mask-image: url('https://upload.wikimedia.org/wikipedia/commons/a/ae/Apify.svg');/*"
|
||||
---
|
||||
|
||||
# `ApifyActorsTool`
|
||||
|
||||
Integrate [Apify Actors](https://apify.com/actors) into your CrewAI workflows.
|
||||
|
||||
## Description
|
||||
|
||||
The `ApifyActorsTool` connects [Apify Actors](https://apify.com/actors), cloud-based programs for web scraping and automation, to your CrewAI workflows.
|
||||
Use any of the 4,000+ Actors on [Apify Store](https://apify.com/store) for use cases such as extracting data from social media, search engines, online maps, e-commerce sites, travel portals, or general websites.
|
||||
|
||||
For details, see the [Apify CrewAI integration](https://docs.apify.com/platform/integrations/crewai) in Apify documentation.
|
||||
|
||||
## Steps to get started
|
||||
|
||||
<Steps>
|
||||
<Step title="Install dependencies">
|
||||
Install `crewai[tools]` and `langchain-apify` using pip: `pip install 'crewai[tools]' langchain-apify`.
|
||||
</Step>
|
||||
<Step title="Obtain an Apify API token">
|
||||
Sign up to [Apify Console](https://console.apify.com/) and get your [Apify API token](https://console.apify.com/settings/integrations).
|
||||
</Step>
|
||||
<Step title="Configure environment">
|
||||
Set your Apify API token as the `APIFY_API_TOKEN` environment variable to enable the tool's functionality.
|
||||
</Step>
|
||||
</Steps>
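Since the tool depends on the `APIFY_API_TOKEN` environment variable, it can help to fail early with a clear message when the variable is missing. This is a minimal sketch using only the standard library; the helper name `require_env` is hypothetical and not part of `crewai_tools`.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper for illustration only.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Example: call this once at startup, before constructing the tool.
# token = require_env("APIFY_API_TOKEN")
```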
|
||||
|
||||
## Usage example
|
||||
|
||||
Use the `ApifyActorsTool` manually to run the [RAG Web Browser Actor](https://apify.com/apify/rag-web-browser) to perform a web search:
|
||||
|
||||
```python
|
||||
from crewai_tools import ApifyActorsTool
|
||||
|
||||
# Initialize the tool with an Apify Actor
|
||||
tool = ApifyActorsTool(actor_name="apify/rag-web-browser")
|
||||
|
||||
# Run the tool with input parameters
|
||||
results = tool.run(run_input={"query": "What is CrewAI?", "maxResults": 5})
|
||||
|
||||
# Process the results
|
||||
for result in results:
|
||||
print(f"URL: {result['metadata']['url']}")
|
||||
print(f"Content: {result.get('markdown', 'N/A')[:100]}...")
|
||||
```
|
||||
|
||||
### Expected output
|
||||
|
||||
Running the code above produces output similar to the following:
|
||||
|
||||
```text
|
||||
URL: https://www.example.com/crewai-intro
|
||||
Content: CrewAI is a framework for building AI-powered workflows...
|
||||
URL: https://docs.crewai.com/
|
||||
Content: Official documentation for CrewAI...
|
||||
```
|
||||
|
||||
The `ApifyActorsTool` automatically fetches the Actor definition and input schema from Apify using the provided `actor_name` and then constructs the tool description and argument schema. This means you need to specify only a valid `actor_name`, and the tool handles the rest when used with agents—no need to specify the `run_input`. Here's how it works:
|
||||
|
||||
```python
|
||||
from crewai import Agent
|
||||
from crewai_tools import ApifyActorsTool
|
||||
|
||||
rag_browser = ApifyActorsTool(actor_name="apify/rag-web-browser")
|
||||
|
||||
agent = Agent(
|
||||
role="Research Analyst",
|
||||
goal="Find and summarize information about specific topics",
|
||||
backstory="You are an experienced researcher with attention to detail",
|
||||
tools=[rag_browser],
|
||||
)
|
||||
```
|
||||
|
||||
You can run other Actors from [Apify Store](https://apify.com/store) simply by changing the `actor_name` and, when using it manually, adjusting the `run_input` based on the Actor input schema.
|
||||
|
||||
For an example of usage with agents, see the [CrewAI Actor template](https://apify.com/templates/python-crewai).
|
||||
|
||||
## Configuration
|
||||
|
||||
The `ApifyActorsTool` requires these inputs to work:
|
||||
|
||||
- **`actor_name`**
|
||||
The ID of the Apify Actor to run, e.g., `"apify/rag-web-browser"`. Browse all Actors on [Apify Store](https://apify.com/store).
|
||||
- **`run_input`**
|
||||
A dictionary of input parameters for the Actor when running the tool manually.
|
||||
- For example, for the `apify/rag-web-browser` Actor: `{"query": "search term", "maxResults": 5}`
|
||||
- See the Actor's [input schema](https://apify.com/apify/rag-web-browser/input-schema) for the list of input parameters.
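When running the tool manually, individual result items may lack some fields, so reading them defensively avoids a `KeyError`. The sketch below assumes only the item shape used in the usage example above (a `metadata` dictionary with a `url`, and an optional `markdown` field); the helper name `summarize_results` and the sample items are made up for illustration.

```python
def summarize_results(results, preview_chars=100):
    """Build one "url: preview" line per result, tolerating missing fields.

    Hypothetical helper for illustration only.
    """
    lines = []
    for item in results:
        url = item.get("metadata", {}).get("url", "N/A")
        content = item.get("markdown") or "N/A"
        lines.append(f"{url}: {content[:preview_chars]}")
    return lines


# Made-up sample items that only mimic the shape described above.
sample_results = [
    {"metadata": {"url": "https://docs.crewai.com/"}, "markdown": "Official documentation for CrewAI"},
    {"metadata": {}},  # an item with no URL and no markdown content
]

for line in summarize_results(sample_results):
    print(line)
```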
|
||||
|
||||
## Resources
|
||||
|
||||
- **[Apify](https://apify.com/)**: Explore the Apify platform.
|
||||
- **[How to build an AI agent on Apify](https://blog.apify.com/how-to-build-an-ai-agent/)**: A complete step-by-step guide to creating, publishing, and monetizing AI agents on the Apify platform.
|
||||
- **[RAG Web Browser Actor](https://apify.com/apify/rag-web-browser)**: A popular Actor for web search for LLMs.
|
||||
- **[CrewAI Integration Guide](https://docs.apify.com/platform/integrations/crewai)**: Follow the official guide for integrating Apify and CrewAI.
|
||||
118
docs/en/tools/automation/composiotool.mdx
Normal file
@@ -0,0 +1,118 @@
|
||||
---
|
||||
title: Composio Tool
|
||||
description: Composio provides 250+ production-ready tools for AI agents with flexible authentication management.
|
||||
icon: gear-code
|
||||
---
|
||||
|
||||
# `ComposioToolSet`
|
||||
|
||||
## Description
|
||||
Composio is an integration platform that allows you to connect your AI agents to 250+ tools. Key features include:
|
||||
|
||||
- **Enterprise-Grade Authentication**: Built-in support for OAuth, API Keys, JWT with automatic token refresh
|
||||
- **Full Observability**: Detailed tool usage logs, execution timestamps, and more
|
||||
|
||||
## Installation
|
||||
|
||||
To incorporate Composio tools into your project, follow the instructions below:
|
||||
|
||||
```shell
|
||||
pip install composio-crewai
|
||||
pip install crewai
|
||||
```
|
||||
|
||||
After the installation is complete, either run `composio login` or export your Composio API key as `COMPOSIO_API_KEY`. Get your Composio API key from [here](https://app.composio.dev).
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a GitHub action:
|
||||
|
||||
1. Initialize Composio toolset
|
||||
|
||||
```python Code
|
||||
from composio_crewai import ComposioToolSet, App, Action
|
||||
from crewai import Agent, Task, Crew
|
||||
|
||||
toolset = ComposioToolSet()
|
||||
```
|
||||
|
||||
2. Connect your GitHub account
|
||||
<CodeGroup>
|
||||
```shell CLI
|
||||
composio add github
|
||||
```
|
||||
```python Code
|
||||
request = toolset.initiate_connection(app=App.GITHUB)
|
||||
print(f"Open this URL to authenticate: {request.redirectUrl}")
|
||||
```
|
||||
</CodeGroup>
|
||||
|
||||
3. Get Tools
|
||||
|
||||
- Retrieving all the tools from an app (not recommended for production):
|
||||
```python Code
|
||||
tools = toolset.get_tools(apps=[App.GITHUB])
|
||||
```
|
||||
|
||||
- Filtering tools based on tags:
|
||||
```python Code
|
||||
tag = "users"
|
||||
|
||||
filtered_action_enums = toolset.find_actions_by_tags(
|
||||
App.GITHUB,
|
||||
tags=[tag],
|
||||
)
|
||||
|
||||
tools = toolset.get_tools(actions=filtered_action_enums)
|
||||
```
|
||||
|
||||
- Filtering tools based on use case:
|
||||
```python Code
|
||||
use_case = "Star a repository on GitHub"
|
||||
|
||||
filtered_action_enums = toolset.find_actions_by_use_case(
|
||||
App.GITHUB, use_case=use_case, advanced=False
|
||||
)
|
||||
|
||||
tools = toolset.get_tools(actions=filtered_action_enums)
|
||||
```
|
||||
<Tip>Set `advanced` to `True` to get actions for complex use cases.</Tip>
|
||||
|
||||
- Using specific tools:
|
||||
|
||||
In this demo, we will use the `GITHUB_STAR_A_REPOSITORY_FOR_THE_AUTHENTICATED_USER` action from the GitHub app.
|
||||
```python Code
|
||||
tools = toolset.get_tools(
|
||||
actions=[Action.GITHUB_STAR_A_REPOSITORY_FOR_THE_AUTHENTICATED_USER]
|
||||
)
|
||||
```
|
||||
Learn more about filtering actions [here](https://docs.composio.dev/patterns/tools/use-tools/use-specific-actions).
|
||||
|
||||
4. Define agent
|
||||
|
||||
```python Code
|
||||
crewai_agent = Agent(
|
||||
role="GitHub Agent",
|
||||
goal="You take action on GitHub using GitHub APIs",
|
||||
backstory="You are an AI agent responsible for taking actions on GitHub on behalf of users using GitHub APIs",
|
||||
verbose=True,
|
||||
tools=tools,
|
||||
# llm=my_llm,  # optionally pass an LLM instance here (my_llm is a placeholder)
|
||||
)
|
||||
```
|
||||
|
||||
5. Execute task
|
||||
|
||||
```python Code
|
||||
task = Task(
|
||||
description="Star a repo composiohq/composio on GitHub",
|
||||
agent=crewai_agent,
|
||||
expected_output="Status of the operation",
|
||||
)
|
||||
|
||||
crew = Crew(agents=[crewai_agent], tasks=[task])
|
||||
|
||||
crew.kickoff()
|
||||
```
|
||||
|
||||
A more detailed list of tools can be found [here](https://app.composio.dev).
|
||||
126
docs/en/tools/automation/multiontool.mdx
Normal file
@@ -0,0 +1,126 @@
|
||||
---
|
||||
title: MultiOn Tool
|
||||
description: The `MultiOnTool` empowers CrewAI agents with the capability to navigate and interact with the web through natural language instructions.
|
||||
icon: globe
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The `MultiOnTool` is designed to wrap [MultiOn's](https://docs.multion.ai/welcome) web browsing capabilities, enabling CrewAI agents to control web browsers using natural language instructions. This tool facilitates seamless web browsing, making it an essential asset for projects requiring dynamic web data interaction and automation of web-based tasks.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the MultiOn package:
|
||||
|
||||
```shell
|
||||
uv add multion
|
||||
```
|
||||
|
||||
You'll also need to install the MultiOn browser extension and enable API usage.
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `MultiOnTool`, follow these steps:
|
||||
|
||||
1. **Install CrewAI**: Ensure that the `crewai[tools]` package is installed in your Python environment.
|
||||
2. **Install and use MultiOn**: Follow [MultiOn documentation](https://docs.multion.ai/learn/browser-extension) for installing the MultiOn Browser Extension.
|
||||
3. **Enable API Usage**: Click on the MultiOn extension in the extensions folder of your browser (not the hovering MultiOn icon on the web page) to open the extension configurations. Click the API Enabled toggle to enable the API.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a web browsing task:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools import MultiOnTool
|
||||
|
||||
# Initialize the tool
|
||||
multion_tool = MultiOnTool(api_key="YOUR_MULTION_API_KEY", local=False)
|
||||
|
||||
# Define an agent that uses the tool
|
||||
browser_agent = Agent(
|
||||
role="Browser Agent",
|
||||
goal="Control web browsers using natural language",
|
||||
backstory="An expert browsing agent.",
|
||||
tools=[multion_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to search and summarize news
|
||||
browse_task = Task(
|
||||
description="Summarize the top 3 trending AI News headlines",
|
||||
expected_output="A summary of the top 3 trending AI News headlines",
|
||||
agent=browser_agent,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(agents=[browser_agent], tasks=[browse_task])
|
||||
result = crew.kickoff()
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `MultiOnTool` accepts the following parameters during initialization:
|
||||
|
||||
- **api_key**: Optional. Specifies the MultiOn API key. If not provided, it will look for the `MULTION_API_KEY` environment variable.
|
||||
- **local**: Optional. Set to `True` to run the agent locally on your browser. Make sure the MultiOn browser extension is installed and API Enabled is checked. Default is `False`.
|
||||
- **max_steps**: Optional. Sets the maximum number of steps the MultiOn agent can take for a command. Default is `3`.
|
||||
|
||||
## Usage
|
||||
|
||||
When using the `MultiOnTool`, the agent will provide natural language instructions that the tool translates into web browsing actions. The tool returns the results of the browsing session along with a status.
|
||||
|
||||
```python Code
|
||||
# Example of using the tool with an agent
|
||||
browser_agent = Agent(
|
||||
role="Web Browser Agent",
|
||||
goal="Search for and summarize information from the web",
|
||||
backstory="An expert at finding and extracting information from websites.",
|
||||
tools=[multion_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Create a task for the agent
|
||||
search_task = Task(
|
||||
description="Search for the latest AI news on TechCrunch and summarize the top 3 headlines",
|
||||
expected_output="A summary of the top 3 AI news headlines from TechCrunch",
|
||||
agent=browser_agent,
|
||||
)
|
||||
|
||||
# Run the task
|
||||
crew = Crew(agents=[browser_agent], tasks=[search_task])
|
||||
result = crew.kickoff()
|
||||
```
|
||||
|
||||
If the status returned is `CONTINUE`, the agent should be instructed to reissue the same instruction to continue execution.
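The reissue-on-`CONTINUE` behaviour can be sketched as a small loop. This is an illustration under stated assumptions, not the tool's implementation: the `browse` callable is a stand-in that returns an `(output, status)` tuple, the `"DONE"` status is assumed for the stub, and the names `run_until_done` and `make_fake_browse` are hypothetical.

```python
from typing import Callable, Tuple


def run_until_done(browse: Callable[[str], Tuple[str, str]], cmd: str, max_rounds: int = 5) -> str:
    """Reissue the same instruction while the status is CONTINUE.

    Hypothetical helper: `browse` stands in for whatever executes the command.
    """
    for _ in range(max_rounds):
        output, status = browse(cmd)
        if status != "CONTINUE":
            return output
    raise RuntimeError(f"Still CONTINUE after {max_rounds} rounds")


def make_fake_browse(rounds_needed: int) -> Callable[[str], Tuple[str, str]]:
    """Build a stub that reports CONTINUE until it has been called rounds_needed times."""
    calls = {"count": 0}

    def browse(cmd: str) -> Tuple[str, str]:
        calls["count"] += 1
        if calls["count"] < rounds_needed:
            return ("partial", "CONTINUE")
        return (f"finished: {cmd}", "DONE")  # "DONE" is an assumed final status

    return browse


print(run_until_done(make_fake_browse(3), "open example.com"))
```

Capping the number of rounds keeps a stuck session from looping indefinitely.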
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `MultiOnTool` is implemented as a subclass of `BaseTool` from CrewAI. It wraps the MultiOn client to provide web browsing capabilities:
|
||||
|
||||
```python Code
|
||||
class MultiOnTool(BaseTool):
|
||||
"""Tool to wrap MultiOn Browse Capabilities."""
|
||||
|
||||
name: str = "Multion Browse Tool"
|
||||
description: str = """Multion gives the ability for LLMs to control web browsers using natural language instructions.
|
||||
If the status is 'CONTINUE', reissue the same instruction to continue execution
|
||||
"""
|
||||
|
||||
# Implementation details...
|
||||
|
||||
def _run(self, cmd: str, *args: Any, **kwargs: Any) -> str:
|
||||
"""
|
||||
Run the Multion client with the given command.
|
||||
|
||||
Args:
|
||||
cmd (str): The detailed and specific natural language instruction for web browsing
|
||||
*args (Any): Additional arguments to pass to the Multion client
|
||||
**kwargs (Any): Additional keyword arguments to pass to the Multion client
|
||||
"""
|
||||
# Implementation details...
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `MultiOnTool` provides a powerful way to integrate web browsing capabilities into CrewAI agents. By enabling agents to interact with websites through natural language instructions, it opens up a wide range of possibilities for web-based tasks, from data collection and research to automated interactions with web services.
|
||||
55
docs/en/tools/automation/overview.mdx
Normal file
@@ -0,0 +1,55 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Automate workflows and integrate with external platforms and services"
|
||||
icon: "face-smile"
|
||||
---
|
||||
|
||||
These tools enable your agents to automate workflows, integrate with external platforms, and connect with various third-party services for enhanced functionality.
|
||||
|
||||
## **Available Tools**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Apify Actor Tool" icon="spider" href="/en/tools/automation/apifyactorstool">
|
||||
Run Apify Actors for web scraping and automation tasks.
|
||||
</Card>
|
||||
|
||||
<Card title="Composio Tool" icon="puzzle-piece" href="/en/tools/automation/composiotool">
|
||||
Integrate with hundreds of apps and services through Composio.
|
||||
</Card>
|
||||
|
||||
<Card title="MultiOn Tool" icon="window-restore" href="/en/tools/automation/multiontool">
|
||||
Automate browser interactions and web-based workflows.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**
|
||||
|
||||
- **Workflow Automation**: Automate repetitive tasks and processes
|
||||
- **API Integration**: Connect with external APIs and services
|
||||
- **Data Synchronization**: Sync data between different platforms
|
||||
- **Process Orchestration**: Coordinate complex multi-step workflows
|
||||
- **Third-party Services**: Leverage external tools and platforms
|
||||
|
||||
```python
|
||||
from crewai import Agent
from crewai_tools import ApifyActorsTool, ComposioTool, MultiOnTool

# Create automation tools
# ApifyActorsTool needs the name of the Actor to run
apify_automation = ApifyActorsTool(actor_name="apify/rag-web-browser")
platform_integration = ComposioTool()
browser_automation = MultiOnTool()

# Add to your agent
agent = Agent(
    role="Automation Specialist",
    goal="Automate workflows and integrate systems",
    backstory="An engineer experienced in connecting tools and services.",
    tools=[apify_automation, platform_integration, browser_automation],
)
|
||||
```
|
||||
|
||||
## **Integration Benefits**
|
||||
|
||||
- **Efficiency**: Reduce manual work through automation
|
||||
- **Scalability**: Handle increased workloads automatically
|
||||
- **Reliability**: Consistent execution of workflows
|
||||
- **Connectivity**: Bridge different systems and platforms
|
||||
- **Productivity**: Focus on high-value tasks while automation handles routine work
|
||||
187
docs/en/tools/cloud-storage/bedrockinvokeagenttool.mdx
Normal file
@@ -0,0 +1,187 @@
|
||||
---
|
||||
title: Bedrock Invoke Agent Tool
|
||||
description: Enables CrewAI agents to invoke Amazon Bedrock Agents and leverage their capabilities within your workflows
|
||||
icon: aws
|
||||
---
|
||||
|
||||
# `BedrockInvokeAgentTool`
|
||||
|
||||
The `BedrockInvokeAgentTool` enables CrewAI agents to invoke Amazon Bedrock Agents and leverage their capabilities within your workflows.
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
uv pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
- AWS credentials configured (either through environment variables or AWS CLI)
|
||||
- `boto3` and `python-dotenv` packages
|
||||
- Access to Amazon Bedrock Agents
|
||||
|
||||
## Usage
|
||||
|
||||
Here's how to use the tool with a CrewAI agent:
|
||||
|
||||
```python {2, 4-8}
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools.aws.bedrock.agents.invoke_agent_tool import BedrockInvokeAgentTool
|
||||
|
||||
# Initialize the tool
|
||||
agent_tool = BedrockInvokeAgentTool(
|
||||
agent_id="your-agent-id",
|
||||
agent_alias_id="your-agent-alias-id"
|
||||
)
|
||||
|
||||
# Create a CrewAI agent that uses the tool
|
||||
aws_expert = Agent(
|
||||
role='AWS Service Expert',
|
||||
goal='Help users understand AWS services and quotas',
|
||||
backstory='I am an expert in AWS services and can provide detailed information about them.',
|
||||
tools=[agent_tool],
|
||||
verbose=True
|
||||
)
|
||||
|
||||
# Create a task for the agent
|
||||
quota_task = Task(
|
||||
description="Find out the current service quotas for EC2 in us-west-2 and explain any recent changes.",
|
||||
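    expected_output="A summary of the current EC2 service quotas in us-west-2 and any recent changes.",
    agent=aws_expert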
|
||||
)
|
||||
|
||||
# Create a crew with the agent
|
||||
crew = Crew(
|
||||
agents=[aws_expert],
|
||||
tasks=[quota_task],
|
||||
    verbose=True
|
||||
)
|
||||
|
||||
# Run the crew
|
||||
result = crew.kickoff()
|
||||
print(result)
|
||||
```
|
||||
|
||||
## Tool Arguments
|
||||
|
||||
| Argument | Type | Required | Default | Description |
|
||||
|:---------|:-----|:---------|:--------|:------------|
|
||||
| **agent_id** | `str` | Yes | None | The unique identifier of the Bedrock agent |
|
||||
| **agent_alias_id** | `str` | Yes | None | The unique identifier of the agent alias |
|
||||
| **session_id** | `str` | No | timestamp | The unique identifier of the session |
|
||||
| **enable_trace** | `bool` | No | False | Whether to enable trace for debugging |
|
||||
| **end_session** | `bool` | No | False | Whether to end the session after invocation |
|
||||
| **description** | `str` | No | None | Custom description for the tool |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
BEDROCK_AGENT_ID=your-agent-id # Alternative to passing agent_id
|
||||
BEDROCK_AGENT_ALIAS_ID=your-agent-alias-id # Alternative to passing agent_alias_id
|
||||
AWS_REGION=your-aws-region # Defaults to us-west-2
|
||||
AWS_ACCESS_KEY_ID=your-access-key # Required for AWS authentication
|
||||
AWS_SECRET_ACCESS_KEY=your-secret-key # Required for AWS authentication
|
||||
```
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Multi-Agent Workflow with Session Management
|
||||
|
||||
```python {2, 4-22}
|
||||
from crewai import Agent, Task, Crew, Process
|
||||
from crewai_tools.aws.bedrock.agents.invoke_agent_tool import BedrockInvokeAgentTool
|
||||
|
||||
# Initialize tools with session management
|
||||
initial_tool = BedrockInvokeAgentTool(
|
||||
agent_id="your-agent-id",
|
||||
agent_alias_id="your-agent-alias-id",
|
||||
session_id="custom-session-id"
|
||||
)
|
||||
|
||||
followup_tool = BedrockInvokeAgentTool(
|
||||
agent_id="your-agent-id",
|
||||
agent_alias_id="your-agent-alias-id",
|
||||
session_id="custom-session-id"
|
||||
)
|
||||
|
||||
final_tool = BedrockInvokeAgentTool(
|
||||
agent_id="your-agent-id",
|
||||
agent_alias_id="your-agent-alias-id",
|
||||
session_id="custom-session-id",
|
||||
end_session=True
|
||||
)
|
||||
|
||||
# Create agents for different stages
|
||||
researcher = Agent(
|
||||
role='AWS Service Researcher',
|
||||
goal='Gather information about AWS services',
|
||||
backstory='I am specialized in finding detailed AWS service information.',
|
||||
tools=[initial_tool]
|
||||
)
|
||||
|
||||
analyst = Agent(
|
||||
role='Service Compatibility Analyst',
|
||||
goal='Analyze service compatibility and requirements',
|
||||
backstory='I analyze AWS services for compatibility and integration possibilities.',
|
||||
tools=[followup_tool]
|
||||
)
|
||||
|
||||
summarizer = Agent(
|
||||
role='Technical Documentation Writer',
|
||||
goal='Create clear technical summaries',
|
||||
backstory='I specialize in creating clear, concise technical documentation.',
|
||||
tools=[final_tool]
|
||||
)
|
||||
|
||||
# Create tasks
|
||||
research_task = Task(
|
||||
description="Find all available AWS services in us-west-2 region.",
|
||||
    expected_output="A list of the AWS services available in the us-west-2 region.",
    agent=researcher
|
||||
)
|
||||
|
||||
analysis_task = Task(
|
||||
description="Analyze which services support IPv6 and their implementation requirements.",
|
||||
    expected_output="An analysis of which services support IPv6 and what each requires.",
    agent=analyst
|
||||
)
|
||||
|
||||
summary_task = Task(
|
||||
description="Create a summary of IPv6-compatible services and their key features.",
|
||||
    expected_output="A concise summary of IPv6-compatible services and their key features.",
    agent=summarizer
|
||||
)
|
||||
|
||||
# Create a crew with the agents and tasks
|
||||
crew = Crew(
|
||||
agents=[researcher, analyst, summarizer],
|
||||
tasks=[research_task, analysis_task, summary_task],
|
||||
process=Process.sequential,
|
||||
    verbose=True
|
||||
)
|
||||
|
||||
# Run the crew
|
||||
result = crew.kickoff()
|
||||
```
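The three tools above share one `session_id`, and only the last one sets `end_session=True`. The sketch below shows one way to generate those keyword arguments so the session identifier is created once and reused. The helper names `make_session_id` and `build_stage_kwargs` are hypothetical, and the timestamp-based identifier is only an assumption modelled on the documented default.

```python
import time


def make_session_id(prefix: str = "crew-session") -> str:
    """Create a timestamp-based session identifier (hypothetical format)."""
    return f"{prefix}-{int(time.time())}"


def build_stage_kwargs(agent_id: str, agent_alias_id: str, stages: int) -> list[dict]:
    """Return one kwargs dict per stage, all sharing a single session_id.

    Only the final stage asks Bedrock to end the session.
    """
    session_id = make_session_id()
    kwargs_per_stage = []
    for index in range(stages):
        kwargs = {
            "agent_id": agent_id,
            "agent_alias_id": agent_alias_id,
            "session_id": session_id,
        }
        if index == stages - 1:
            kwargs["end_session"] = True
        kwargs_per_stage.append(kwargs)
    return kwargs_per_stage


stage_kwargs = build_stage_kwargs("your-agent-id", "your-agent-alias-id", stages=3)
# Each dict could then be unpacked into the tool constructor, for example:
# initial_tool = BedrockInvokeAgentTool(**stage_kwargs[0])
```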
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Hybrid Multi-Agent Collaborations
|
||||
- Create workflows where CrewAI agents collaborate with managed Bedrock agents running as services in AWS
|
||||
- Enable scenarios where sensitive data processing happens within your AWS environment while other agents operate externally
|
||||
- Bridge on-premises CrewAI agents with cloud-based Bedrock agents for distributed intelligence workflows
|
||||
|
||||
### Data Sovereignty and Compliance
|
||||
- Keep data-sensitive agentic workflows within your AWS environment while allowing external CrewAI agents to orchestrate tasks
|
||||
- Maintain compliance with data residency requirements by processing sensitive information only within your AWS account
|
||||
- Enable secure multi-agent collaborations where some agents cannot access your organization's private data
|
||||
|
||||
### Seamless AWS Service Integration
|
||||
- Access any AWS service through Amazon Bedrock Actions without writing complex integration code
|
||||
- Enable CrewAI agents to interact with AWS services through natural language requests
|
||||
- Leverage pre-built Bedrock agent capabilities to interact with AWS services like Bedrock Knowledge Bases, Lambda, and more
|
||||
|
||||
### Scalable Hybrid Agent Architectures
|
||||
- Offload computationally intensive tasks to managed Bedrock agents while lightweight tasks run in CrewAI
|
||||
- Scale agent processing by distributing workloads between local CrewAI agents and cloud-based Bedrock agents
|
||||
|
||||
### Cross-Organizational Agent Collaboration
|
||||
- Enable secure collaboration between your organization's CrewAI agents and partner organizations' Bedrock agents
|
||||
- Create workflows where external expertise from Bedrock agents can be incorporated without exposing sensitive data
|
||||
- Build agent ecosystems that span organizational boundaries while maintaining security and data control
|
||||
165
docs/en/tools/cloud-storage/bedrockkbretriever.mdx
Normal file
@@ -0,0 +1,165 @@
|
||||
---
|
||||
title: 'Bedrock Knowledge Base Retriever'
|
||||
description: 'Retrieve information from Amazon Bedrock Knowledge Bases using natural language queries'
|
||||
icon: aws
|
||||
---
|
||||
|
||||
# `BedrockKBRetrieverTool`
|
||||
|
||||
The `BedrockKBRetrieverTool` enables CrewAI agents to retrieve information from Amazon Bedrock Knowledge Bases using natural language queries.
|
||||
|
||||
## Installation
|
||||
|
||||
```bash
|
||||
uv pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Requirements
|
||||
|
||||
- AWS credentials configured (either through environment variables or AWS CLI)
|
||||
- `boto3` and `python-dotenv` packages
|
||||
- Access to Amazon Bedrock Knowledge Base
|
||||
|
||||
## Usage
|
||||
|
||||
Here's how to use the tool with a CrewAI agent:
|
||||
|
||||
```python {2, 4-17}
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools.aws.bedrock.knowledge_base.retriever_tool import BedrockKBRetrieverTool
|
||||
|
||||
# Initialize the tool
|
||||
kb_tool = BedrockKBRetrieverTool(
|
||||
knowledge_base_id="your-kb-id",
|
||||
number_of_results=5
|
||||
)
|
||||
|
||||
# Create a CrewAI agent that uses the tool
|
||||
researcher = Agent(
|
||||
role='Knowledge Base Researcher',
|
||||
goal='Find information about company policies',
|
||||
backstory='I am a researcher specialized in retrieving and analyzing company documentation.',
|
||||
tools=[kb_tool],
|
||||
verbose=True
|
||||
)
|
||||
|
||||
# Create a task for the agent
|
||||
research_task = Task(
|
||||
description="Find our company's remote work policy and summarize the key points.",
|
||||
    expected_output="A short summary of the key points of the remote work policy.",
    agent=researcher
|
||||
)
|
||||
|
||||
# Create a crew with the agent
|
||||
crew = Crew(
|
||||
agents=[researcher],
|
||||
tasks=[research_task],
|
||||
    verbose=True
|
||||
)
|
||||
|
||||
# Run the crew
|
||||
result = crew.kickoff()
|
||||
print(result)
|
||||
```
|
||||
|
||||
## Tool Arguments
|
||||
|
||||
| Argument | Type | Required | Default | Description |
|
||||
|:---------|:-----|:---------|:---------|:-------------|
|
||||
| **knowledge_base_id** | `str` | Yes | None | The unique identifier of the knowledge base (0-10 alphanumeric characters) |
|
||||
| **number_of_results** | `int` | No | 5 | Maximum number of results to return |
|
||||
| **retrieval_configuration** | `dict` | No | None | Custom configurations for the knowledge base query |
|
||||
| **guardrail_configuration** | `dict` | No | None | Content filtering settings |
|
||||
| **next_token** | `str` | No | None | Token for pagination |
|
||||
|
||||
## Environment Variables
|
||||
|
||||
```bash
|
||||
BEDROCK_KB_ID=your-knowledge-base-id # Alternative to passing knowledge_base_id
|
||||
AWS_REGION=your-aws-region # Defaults to us-east-1
|
||||
AWS_ACCESS_KEY_ID=your-access-key # Required for AWS authentication
|
||||
AWS_SECRET_ACCESS_KEY=your-secret-key # Required for AWS authentication
|
||||
```
|
||||
|
||||
## Response Format
|
||||
|
||||
The tool returns results in JSON format:
|
||||
|
||||
```json
|
||||
{
|
||||
"results": [
|
||||
{
|
||||
"content": "Retrieved text content",
|
||||
"content_type": "text",
|
||||
"source_type": "S3",
|
||||
"source_uri": "s3://bucket/document.pdf",
|
||||
"score": 0.95,
|
||||
"metadata": {
|
||||
"additional": "metadata"
|
||||
}
|
||||
}
|
||||
],
|
||||
"nextToken": "pagination-token",
|
||||
"guardrailAction": "NONE"
|
||||
}
|
||||
```
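A response in the format above can be post-processed with the standard `json` module. The following minimal sketch keeps only passages above a score threshold and orders them by relevance; the helper name `top_passages`, the threshold value, and the sample payload are illustrative assumptions rather than part of the tool.

```python
import json


def top_passages(raw_response: str, min_score: float = 0.5):
    """Return (source_uri, content) pairs with score >= min_score, best first.

    Hypothetical helper for illustration only.
    """
    data = json.loads(raw_response)
    hits = [r for r in data.get("results", []) if r.get("score", 0.0) >= min_score]
    hits.sort(key=lambda r: r.get("score", 0.0), reverse=True)
    return [(r.get("source_uri", "N/A"), r.get("content", "")) for r in hits]


# Made-up payload that follows the response format shown above.
sample_response = json.dumps({
    "results": [
        {"content": "Low relevance text", "source_uri": "s3://bucket/a.pdf", "score": 0.2},
        {"content": "Remote work policy", "source_uri": "s3://bucket/b.pdf", "score": 0.95},
        {"content": "Travel policy", "source_uri": "s3://bucket/c.pdf", "score": 0.6},
    ],
    "nextToken": None,
    "guardrailAction": "NONE",
})

for uri, text in top_passages(sample_response):
    print(uri, "->", text)
```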
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
### Custom Retrieval Configuration
|
||||
|
||||
```python
|
||||
kb_tool = BedrockKBRetrieverTool(
|
||||
knowledge_base_id="your-kb-id",
|
||||
retrieval_configuration={
|
||||
"vectorSearchConfiguration": {
|
||||
"numberOfResults": 10,
|
||||
"overrideSearchType": "HYBRID"
|
||||
}
|
||||
}
|
||||
)
|
||||
|
||||
policy_expert = Agent(
|
||||
role='Policy Expert',
|
||||
goal='Analyze company policies in detail',
|
||||
backstory='I am an expert in corporate policy analysis with deep knowledge of regulatory requirements.',
|
||||
tools=[kb_tool]
|
||||
)
|
||||
```
|
||||
|
||||
## Supported Data Sources
|
||||
|
||||
- Amazon S3
|
||||
- Confluence
|
||||
- Salesforce
|
||||
- SharePoint
|
||||
- Web pages
|
||||
- Custom document locations
|
||||
- Amazon Kendra
|
||||
- SQL databases
|
||||
|
||||
## Use Cases
|
||||
|
||||
### Enterprise Knowledge Integration
|
||||
- Enable CrewAI agents to access your organization's proprietary knowledge without exposing sensitive data
|
||||
- Allow agents to make decisions based on your company's specific policies, procedures, and documentation
|
||||
- Create agents that can answer questions based on your internal documentation while maintaining data security
|
||||
|
||||
### Specialized Domain Knowledge
|
||||
- Connect CrewAI agents to domain-specific knowledge bases (legal, medical, technical) without retraining models
|
||||
- Leverage existing knowledge repositories that are already maintained in your AWS environment
|
||||
- Combine CrewAI's reasoning with domain-specific information from your knowledge bases
|
||||
|
||||
### Data-Driven Decision Making
|
||||
- Ground CrewAI agent responses in your actual company data rather than general knowledge
|
||||
- Ensure agents provide recommendations based on your specific business context and documentation
|
||||
- Reduce hallucinations by retrieving factual information from your knowledge bases
|
||||
|
||||
### Scalable Information Access
|
||||
- Access terabytes of organizational knowledge without embedding it all into your models
|
||||
- Dynamically query only the relevant information needed for specific tasks
|
||||
- Leverage AWS's scalable infrastructure to handle large knowledge bases efficiently
|
||||
|
||||
### Compliance and Governance
|
||||
- Ensure CrewAI agents provide responses that align with your company's approved documentation
|
||||
- Create auditable trails of information sources used by your agents
|
||||
- Maintain control over what information sources your agents can access
|
||||
50
docs/en/tools/cloud-storage/overview.mdx
Normal file
@@ -0,0 +1,50 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Interact with cloud services, storage systems, and cloud-based AI platforms"
|
||||
icon: "face-smile"
|
||||
---
|
||||
|
||||
These tools enable your agents to interact with cloud services, access cloud storage, and leverage cloud-based AI platforms for scalable operations.
|
||||
|
||||
## **Available Tools**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="S3 Reader Tool" icon="cloud" href="/en/tools/cloud-storage/s3readertool">
|
||||
Read files and data from Amazon S3 buckets.
|
||||
</Card>
|
||||
|
||||
<Card title="S3 Writer Tool" icon="cloud-arrow-up" href="/en/tools/cloud-storage/s3writertool">
|
||||
Write and upload files to Amazon S3 storage.
|
||||
</Card>
|
||||
|
||||
<Card title="Bedrock Invoke Agent" icon="aws" href="/en/tools/cloud-storage/bedrockinvokeagenttool">
|
||||
Invoke Amazon Bedrock agents for AI-powered tasks.
|
||||
</Card>
|
||||
|
||||
<Card title="Bedrock KB Retriever" icon="database" href="/en/tools/cloud-storage/bedrockkbretriever">
|
||||
Retrieve information from Amazon Bedrock knowledge bases.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**
|
||||
|
||||
- **File Storage**: Store and retrieve files from cloud storage systems
|
||||
- **Data Backup**: Backup important data to cloud storage
|
||||
- **AI Services**: Access cloud-based AI models and services
|
||||
- **Knowledge Retrieval**: Query cloud-hosted knowledge bases
|
||||
- **Scalable Operations**: Leverage cloud infrastructure for processing
|
||||
|
||||
```python
|
||||
from crewai import Agent
from crewai_tools import S3ReaderTool, S3WriterTool, BedrockInvokeAgentTool

# Create cloud tools
s3_reader = S3ReaderTool()
s3_writer = S3WriterTool()
bedrock_agent = BedrockInvokeAgentTool()

# Add to your agent
agent = Agent(
    role="Cloud Operations Specialist",
    goal="Manage cloud resources and AI services",
    backstory="An experienced engineer who manages cloud storage and AI services.",
    tools=[s3_reader, s3_writer, bedrock_agent],
)
```
|
||||
144
docs/en/tools/cloud-storage/s3readertool.mdx
Normal file
@@ -0,0 +1,144 @@
|
||||
---
|
||||
title: S3 Reader Tool
|
||||
description: The `S3ReaderTool` enables CrewAI agents to read files from Amazon S3 buckets.
|
||||
icon: aws
|
||||
---
|
||||
|
||||
# `S3ReaderTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `S3ReaderTool` is designed to read files from Amazon S3 buckets. This tool allows CrewAI agents to access and retrieve content stored in S3, making it ideal for workflows that require reading data, configuration files, or any other content stored in AWS S3 storage.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the required dependencies:
|
||||
|
||||
```shell
|
||||
uv add boto3
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `S3ReaderTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the required packages using the command above.
|
||||
2. **Configure AWS Credentials**: Set up your AWS credentials as environment variables.
|
||||
3. **Initialize the Tool**: Create an instance of the tool.
|
||||
4. **Specify S3 Path**: Provide the S3 path to the file you want to read.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `S3ReaderTool` to read a file from an S3 bucket:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools.aws.s3 import S3ReaderTool
|
||||
|
||||
# Initialize the tool
|
||||
s3_reader_tool = S3ReaderTool()
|
||||
|
||||
# Define an agent that uses the tool
|
||||
file_reader_agent = Agent(
|
||||
role="File Reader",
|
||||
goal="Read files from S3 buckets",
|
||||
backstory="An expert in retrieving and processing files from cloud storage.",
|
||||
tools=[s3_reader_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to read a configuration file
|
||||
read_task = Task(
|
||||
description="Read the configuration file from {my_bucket} and summarize its contents.",
|
||||
expected_output="A summary of the configuration file contents.",
|
||||
agent=file_reader_agent,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(agents=[file_reader_agent], tasks=[read_task])
|
||||
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/app-config.json"})
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `S3ReaderTool` accepts the following parameter when used by an agent:
|
||||
|
||||
- **file_path**: Required. The S3 file path in the format `s3://bucket-name/file-name`.
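
To illustrate the expected path format, here is a minimal sketch of splitting an S3 path into bucket and key with the standard library. The function name `parse_s3_path` is illustrative and this is not necessarily how the tool parses paths internally; object keys containing `?` or `#` would need different handling.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_s3_path(file_path: str) -> Tuple[str, str]:
    """Split s3://bucket-name/key into (bucket, key), rejecting malformed input."""
    parsed = urlparse(file_path)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError(f"Expected s3://bucket-name/file-name, got: {file_path!r}")
    return bucket, key


bucket, key = parse_s3_path("s3://my-bucket/config/app-config.json")
```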
|
||||
|
||||
## AWS Credentials
|
||||
|
||||
The tool requires AWS credentials to access S3 buckets. You can configure these credentials using environment variables:
|
||||
|
||||
- **CREW_AWS_REGION**: The AWS region where your S3 bucket is located. Default is `us-east-1`.
|
||||
- **CREW_AWS_ACCESS_KEY_ID**: Your AWS access key ID.
|
||||
- **CREW_AWS_SEC_ACCESS_KEY**: Your AWS secret access key.
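
As a rough sketch of how these variables map onto client settings, the helper below collects them into keyword arguments of the kind a `boto3.client("s3", ...)` call accepts. The name `s3_client_kwargs` is hypothetical; only the variable names and the `us-east-1` default come from this page.

```python
import os
from typing import Dict, Mapping, Optional


def s3_client_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Collect the CREW_AWS_* variables, falling back to us-east-1 for the region."""
    return {
        "region_name": env.get("CREW_AWS_REGION", "us-east-1"),
        "aws_access_key_id": env.get("CREW_AWS_ACCESS_KEY_ID"),
        "aws_secret_access_key": env.get("CREW_AWS_SEC_ACCESS_KEY"),
    }


# Example with an explicit mapping instead of the real environment.
kwargs = s3_client_kwargs({"CREW_AWS_ACCESS_KEY_ID": "key", "CREW_AWS_SEC_ACCESS_KEY": "secret"})
```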
|
||||
|
||||
## Usage
|
||||
|
||||
When using the `S3ReaderTool` with an agent, the agent will need to provide the S3 file path:
|
||||
|
||||
```python Code
|
||||
# Example of using the tool with an agent
|
||||
file_reader_agent = Agent(
|
||||
role="File Reader",
|
||||
goal="Read files from S3 buckets",
|
||||
backstory="An expert in retrieving and processing files from cloud storage.",
|
||||
tools=[s3_reader_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Create a task for the agent to read a specific file
|
||||
read_config_task = Task(
|
||||
description="Read the application configuration file from {my_bucket} and extract the database connection settings.",
|
||||
expected_output="The database connection settings from the configuration file.",
|
||||
agent=file_reader_agent,
|
||||
)
|
||||
|
||||
# Run the task
|
||||
crew = Crew(agents=[file_reader_agent], tasks=[read_config_task])
|
||||
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/app-config.json"})
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The `S3ReaderTool` includes error handling for common S3 issues:
|
||||
|
||||
- Invalid S3 path format
|
||||
- Missing or inaccessible files
|
||||
- Permission issues
|
||||
- AWS credential problems
|
||||
|
||||
When an error occurs, the tool will return an error message that includes details about the issue.
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `S3ReaderTool` uses the AWS SDK for Python (boto3) to interact with S3:
|
||||
|
||||
```python Code
|
||||
import os

import boto3
from botocore.exceptions import ClientError
from crewai.tools import BaseTool


class S3ReaderTool(BaseTool):
    name: str = "S3 Reader Tool"
    description: str = "Reads a file from Amazon S3 given an S3 file path"

    def _run(self, file_path: str) -> str:
        try:
            # _parse_s3_path (not shown here) splits the path into bucket and key
            bucket_name, object_key = self._parse_s3_path(file_path)

            s3 = boto3.client(
                's3',
                region_name=os.getenv('CREW_AWS_REGION', 'us-east-1'),
                aws_access_key_id=os.getenv('CREW_AWS_ACCESS_KEY_ID'),
                aws_secret_access_key=os.getenv('CREW_AWS_SEC_ACCESS_KEY')
            )

            # Read file content from S3
            response = s3.get_object(Bucket=bucket_name, Key=object_key)
            file_content = response['Body'].read().decode('utf-8')

            return file_content
        except ClientError as e:
            return f"Error reading file from S3: {str(e)}"
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `S3ReaderTool` provides a straightforward way to read files from Amazon S3 buckets. By enabling agents to access content stored in S3, it facilitates workflows that require cloud-based file access. This tool is particularly useful for data processing, configuration management, and any task that involves retrieving information from AWS S3 storage.
|
||||
150
docs/en/tools/cloud-storage/s3writertool.mdx
Normal file
@@ -0,0 +1,150 @@
|
||||
---
|
||||
title: S3 Writer Tool
|
||||
description: The `S3WriterTool` enables CrewAI agents to write content to files in Amazon S3 buckets.
|
||||
icon: aws
|
||||
---
|
||||
|
||||
# `S3WriterTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `S3WriterTool` is designed to write content to files in Amazon S3 buckets. This tool allows CrewAI agents to create or update files in S3, making it ideal for workflows that require storing data, saving configuration files, or persisting any other content to AWS S3 storage.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the required dependencies:
|
||||
|
||||
```shell
|
||||
uv add boto3
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `S3WriterTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the required packages using the command above.
|
||||
2. **Configure AWS Credentials**: Set up your AWS credentials as environment variables.
|
||||
3. **Initialize the Tool**: Create an instance of the tool.
|
||||
4. **Specify S3 Path and Content**: Provide the S3 path where you want to write the file and the content to be written.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `S3WriterTool` to write content to a file in an S3 bucket:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools.aws.s3 import S3WriterTool
|
||||
|
||||
# Initialize the tool
|
||||
s3_writer_tool = S3WriterTool()
|
||||
|
||||
# Define an agent that uses the tool
|
||||
file_writer_agent = Agent(
|
||||
role="File Writer",
|
||||
goal="Write content to files in S3 buckets",
|
||||
backstory="An expert in storing and managing files in cloud storage.",
|
||||
tools=[s3_writer_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to write a report
|
||||
write_task = Task(
|
||||
description="Generate a summary report of the quarterly sales data and save it to {my_bucket}.",
|
||||
expected_output="Confirmation that the report was successfully saved to S3.",
|
||||
agent=file_writer_agent,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(agents=[file_writer_agent], tasks=[write_task])
|
||||
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/reports/quarterly-summary.txt"})
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `S3WriterTool` accepts the following parameters when used by an agent:
|
||||
|
||||
- **file_path**: Required. The S3 file path in the format `s3://bucket-name/file-name`.
|
||||
- **content**: Required. The content to write to the file.
|
||||
|
||||
## AWS Credentials
|
||||
|
||||
The tool requires AWS credentials to access S3 buckets. You can configure these credentials using environment variables:
|
||||
|
||||
- **CREW_AWS_REGION**: The AWS region where your S3 bucket is located. Default is `us-east-1`.
|
||||
- **CREW_AWS_ACCESS_KEY_ID**: Your AWS access key ID.
|
||||
- **CREW_AWS_SEC_ACCESS_KEY**: Your AWS secret access key.
|
||||
|
||||
## Usage
|
||||
|
||||
When using the `S3WriterTool` with an agent, the agent will need to provide both the S3 file path and the content to write:
|
||||
|
||||
```python Code
|
||||
# Example of using the tool with an agent
|
||||
file_writer_agent = Agent(
|
||||
role="File Writer",
|
||||
goal="Write content to files in S3 buckets",
|
||||
backstory="An expert in storing and managing files in cloud storage.",
|
||||
tools=[s3_writer_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Create a task for the agent to write a specific file
|
||||
write_config_task = Task(
|
||||
description="""
|
||||
Create a configuration file with the following database settings:
|
||||
- host: db.example.com
|
||||
- port: 5432
|
||||
- username: app_user
|
||||
- password: secure_password
|
||||
|
||||
Save this configuration as JSON to {my_bucket}.
|
||||
""",
|
||||
expected_output="Confirmation that the configuration file was successfully saved to S3.",
|
||||
agent=file_writer_agent,
|
||||
)
|
||||
|
||||
# Run the task
|
||||
crew = Crew(agents=[file_writer_agent], tasks=[write_config_task])
|
||||
result = crew.kickoff(inputs={"my_bucket": "s3://my-bucket/config/db-config.json"})
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The `S3WriterTool` includes error handling for common S3 issues:
|
||||
|
||||
- Invalid S3 path format
|
||||
- Permission issues (e.g., no write access to the bucket)
|
||||
- AWS credential problems
|
||||
- Bucket does not exist
|
||||
|
||||
When an error occurs, the tool will return an error message that includes details about the issue.
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `S3WriterTool` uses the AWS SDK for Python (boto3) to interact with S3:
|
||||
|
||||
```python Code
|
||||
import os

import boto3
from botocore.exceptions import ClientError
from crewai.tools import BaseTool


class S3WriterTool(BaseTool):
    name: str = "S3 Writer Tool"
    description: str = "Writes content to a file in Amazon S3 given an S3 file path"

    def _run(self, file_path: str, content: str) -> str:
        try:
            # _parse_s3_path (not shown here) splits the path into bucket and key
            bucket_name, object_key = self._parse_s3_path(file_path)

            s3 = boto3.client(
                's3',
                region_name=os.getenv('CREW_AWS_REGION', 'us-east-1'),
                aws_access_key_id=os.getenv('CREW_AWS_ACCESS_KEY_ID'),
                aws_secret_access_key=os.getenv('CREW_AWS_SEC_ACCESS_KEY')
            )

            # Upload the content as UTF-8 encoded bytes
            s3.put_object(Bucket=bucket_name, Key=object_key, Body=content.encode('utf-8'))
            return f"Successfully wrote content to {file_path}"
        except ClientError as e:
            return f"Error writing file to S3: {str(e)}"
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `S3WriterTool` provides a straightforward way to write content to files in Amazon S3 buckets. By enabling agents to create and update files in S3, it facilitates workflows that require cloud-based file storage. This tool is particularly useful for data persistence, configuration management, report generation, and any task that involves storing information in AWS S3 storage.
|
||||
69
docs/en/tools/database-data/mysqltool.mdx
Normal file
@@ -0,0 +1,69 @@
|
||||
---
|
||||
title: MySQL RAG Search
|
||||
description: The `MySQLSearchTool` is designed to search MySQL databases and return the most relevant results.
|
||||
icon: database
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
This tool performs semantic searches over the content of MySQL database tables. Using Retrieval-Augmented Generation (RAG), the `MySQLSearchTool` provides an efficient way to query table content through semantic search queries, which is useful when you need to find relevant data in large datasets within a MySQL database.
|
||||
|
||||
## Installation
|
||||
|
||||
To install the `crewai_tools` package and utilize the MySQLSearchTool, execute the following command in your terminal:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
Below is an example showcasing how to use the MySQLSearchTool to conduct a semantic search on a table within a MySQL database:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import MySQLSearchTool
|
||||
|
||||
# Initialize the tool with the database URI and the target table name
|
||||
tool = MySQLSearchTool(
|
||||
db_uri='mysql://user:password@localhost:3306/mydatabase',
|
||||
table_name='employees'
|
||||
)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The MySQLSearchTool requires the following arguments for its operation:
|
||||
|
||||
- `db_uri`: A string representing the URI of the MySQL database to be queried. This argument is mandatory and must include the necessary authentication details and the location of the database.
|
||||
- `table_name`: A string specifying the name of the table within the database on which the semantic search will be performed. This argument is mandatory.
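
A malformed `db_uri` is an easy mistake to make, so a quick structural check before creating the tool can save a failed run. The sketch below uses only the standard library; `describe_db_uri` is an illustrative helper, not part of `crewai_tools`, and it deliberately leaves the password out of its result.

```python
from typing import Any, Dict
from urllib.parse import urlparse


def describe_db_uri(db_uri: str, expected_scheme: str = "mysql") -> Dict[str, Any]:
    """Check that the URI names a scheme, host, and database, then return its parts."""
    parsed = urlparse(db_uri)
    if parsed.scheme != expected_scheme:
        raise ValueError(f"Expected a {expected_scheme}:// URI, got scheme {parsed.scheme!r}")
    database = parsed.path.lstrip("/")
    if not parsed.hostname or not database:
        raise ValueError("The URI must include a host and a database name")
    return {
        "user": parsed.username,
        "host": parsed.hostname,
        "port": parsed.port,
        "database": database,
    }


parts = describe_db_uri("mysql://user:password@localhost:3306/mydatabase")
```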
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
tool = MySQLSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google",
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
83
docs/en/tools/database-data/nl2sqltool.mdx
Normal file
@@ -0,0 +1,83 @@
|
||||
---
|
||||
title: NL2SQL Tool
|
||||
description: The `NL2SQLTool` is designed to convert natural language to SQL queries.
|
||||
icon: language
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
|
||||
This tool converts natural language to SQL queries. When passed to an agent, it generates queries and then uses them to interact with the database.

This enables workflows such as having an agent access the database, fetch information based on its goal, and then use that information to generate a response, report, or other output. It also lets the agent update the database based on its goal.

**Attention**: Make sure the agent only has access to a read replica, or that it is acceptable for the agent to run insert/update queries on the database.
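
When a read replica is not available, one partial safeguard is to screen generated statements before they run. The sketch below is a keyword heuristic only, with a hypothetical helper named `is_read_only`; it is not a feature of the `NL2SQLTool`, it can reject harmless queries that mention a write keyword, and database-level read-only credentials remain the more robust option.

```python
import re

# Keywords that indicate a statement may modify data or schema.
_WRITE_KEYWORDS = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|replace|grant|revoke|merge)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Heuristic: one statement, starting with SELECT or WITH, with no write keyword."""
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        # More than one statement: refuse rather than inspect each part.
        return False
    if not re.match(r"(?i)(select|with)\b", statement):
        return False
    return _WRITE_KEYWORDS.search(statement) is None
```

For example, `is_read_only("SELECT city FROM users;")` passes the check, while a statement that starts with `DROP` or chains a `DELETE` after a `SELECT` does not.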
|
||||
|
||||
## Requirements
|
||||
|
||||
- SQLAlchemy
|
||||
- Any DB compatible library (e.g. psycopg2, mysql-connector-python)
|
||||
|
||||
## Installation
|
||||
|
||||
Install the `crewai_tools` package:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
In order to use the NL2SQLTool, you need to pass the database URI to the tool. The URI should be in the format `dialect+driver://username:password@host:port/database`.
|
||||
|
||||
|
||||
```python Code
|
||||
from crewai import Agent
from crewai.project import agent
from crewai_tools import NL2SQLTool

# psycopg2 was installed to run this example with PostgreSQL
nl2sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")

# Method defined inside your @CrewBase crew class
@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[nl2sql]
    )
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
The primary task goal was:

"Retrieve the average, maximum, and minimum monthly revenue for each city, but only include cities that have more than one user. Also, count the number of users in each city and sort the results by the average monthly revenue in descending order."

The Agent queried the database. Its first query was wrong, so it tried again, got the correct information, and passed it to the next agent.
|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
The second task goal was:
|
||||
|
||||
"Review the data and create a detailed report, and then create the table on the database with the fields based on the data provided.
|
||||
Include information on the average, maximum, and minimum monthly revenue for each city, but only include cities that have more than one user. Also, count the number of users in each city and sort the results by the average monthly revenue in descending order."
|
||||
|
||||
Here things get more interesting: the Agent generates SQL not only to create the table but also to insert the data into it. At the end, the Agent still returns the final report, which matches exactly what was stored in the database.
|
||||
|
||||

|
||||

|
||||
|
||||

|
||||

|
||||
|
||||
|
||||
This is a simple example of how the NL2SQLTool can be used to interact with the database and generate reports based on the data in the database.
|
||||
|
||||
The tool opens up many options for how an Agent's logic can interact with the database.
|
||||
|
||||
```md
|
||||
DB -> Agent -> ... -> Agent -> DB
|
||||
```
|
||||
57
docs/en/tools/database-data/overview.mdx
Normal file
@@ -0,0 +1,57 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Connect to databases, vector stores, and data warehouses for comprehensive data access"
|
||||
icon: "face-smile"
|
||||
---
|
||||
|
||||
These tools enable your agents to interact with various database systems, from traditional SQL databases to modern vector stores and data warehouses.
|
||||
|
||||
## **Available Tools**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="MySQL Tool" icon="database" href="/en/tools/database-data/mysqltool">
|
||||
Run semantic (RAG) searches over the content of MySQL database tables.
|
||||
</Card>
|
||||
|
||||
<Card title="PostgreSQL Search" icon="elephant" href="/en/tools/database-data/pgsearchtool">
|
||||
Search and query PostgreSQL databases efficiently.
|
||||
</Card>
|
||||
|
||||
<Card title="Snowflake Search" icon="snowflake" href="/en/tools/database-data/snowflakesearchtool">
|
||||
Access Snowflake data warehouse for analytics and reporting.
|
||||
</Card>
|
||||
|
||||
<Card title="NL2SQL Tool" icon="language" href="/en/tools/database-data/nl2sqltool">
|
||||
Convert natural language queries to SQL statements automatically.
|
||||
</Card>
|
||||
|
||||
<Card title="Qdrant Vector Search" icon="vector-square" href="/en/tools/database-data/qdrantvectorsearchtool">
|
||||
Search vector embeddings using Qdrant vector database.
|
||||
</Card>
|
||||
|
||||
<Card title="Weaviate Vector Search" icon="network-wired" href="/en/tools/database-data/weaviatevectorsearchtool">
|
||||
Perform semantic search with Weaviate vector database.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**
|
||||
|
||||
- **Data Analysis**: Query databases for business intelligence and reporting
|
||||
- **Vector Search**: Find similar content using semantic embeddings
|
||||
- **ETL Operations**: Extract, transform, and load data between systems
|
||||
- **Real-time Analytics**: Access live data for decision making
|
||||
|
||||
```python
|
||||
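from crewai import Agent
from crewai_tools import MySQLSearchTool, QdrantVectorSearchTool, NL2SQLTool

# Create database tools (replace the placeholder connection details with your own)
mysql_db = MySQLSearchTool(
    db_uri="mysql://user:password@localhost:3306/mydatabase",
    table_name="employees",
)
vector_search = QdrantVectorSearchTool(
    qdrant_url="your_qdrant_url",
    qdrant_api_key="your_qdrant_api_key",
    collection_name="your_collection",
)
nl_to_sql = NL2SQLTool(db_uri="postgresql://example@localhost:5432/test_db")

# Add to your agent
agent = Agent(
    role="Data Analyst",
    goal="Extract insights from various data sources",
    backstory="An analyst experienced in querying relational and vector databases.",
    tools=[mysql_db, vector_search, nl_to_sql],
)
```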
|
||||
82
docs/en/tools/database-data/pgsearchtool.mdx
Normal file
@@ -0,0 +1,82 @@
|
||||
---
|
||||
title: PG RAG Search
|
||||
description: The `PGSearchTool` is designed to search PostgreSQL databases and return the most relevant results.
|
||||
icon: elephant
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
<Note>
|
||||
The PGSearchTool is currently under development. This document outlines the intended functionality and interface.
|
||||
As development progresses, please be aware that some features may not be available or could change.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The PGSearchTool is envisioned as a tool for semantic searches within PostgreSQL database tables. By leveraging Retrieval-Augmented Generation (RAG), it aims to provide an efficient means of querying table content, specifically tailored for PostgreSQL databases. The tool's goal is to simplify finding relevant data through semantic search queries, which is useful when you need to run advanced queries on large datasets within a PostgreSQL environment.
|
||||
|
||||
## Installation
|
||||
|
||||
The `crewai_tools` package, which will include the PGSearchTool upon its release, can be installed using the following command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
<Note>
|
||||
The PGSearchTool is not yet available in the current version of the `crewai_tools` package. This installation command will be updated once the tool is released.
|
||||
</Note>
|
||||
|
||||
## Example Usage
|
||||
|
||||
Below is a proposed example showcasing how to use the PGSearchTool for conducting a semantic search on a table within a PostgreSQL database:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import PGSearchTool
|
||||
|
||||
# Initialize the tool with the database URI and the target table name
|
||||
tool = PGSearchTool(
|
||||
db_uri='postgresql://user:password@localhost:5432/mydatabase',
|
||||
table_name='employees'
|
||||
)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The PGSearchTool is designed to require the following arguments for its operation:
|
||||
|
||||
| Argument       | Type     | Description |
|:---------------|:---------|:------------|
| **db_uri**     | `string` | **Mandatory**. The URI of the PostgreSQL database to query. It must include the necessary authentication details and the location of the database. |
| **table_name** | `string` | **Mandatory**. The name of the table within the database on which the semantic search will be performed. |
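
Since `db_uri` embeds the credentials, passwords containing characters such as `@` or `/` need percent-encoding or the URI will not parse correctly. A minimal sketch using the standard library follows; `build_postgres_uri` is an illustrative helper and not part of `crewai_tools`.

```python
from urllib.parse import quote


def build_postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a postgresql:// URI, percent-encoding the user, password, and database."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


db_uri = build_postgres_uri("user", "p@ss/word", "localhost", 5432, "mydatabase")
```

The resulting string can then be passed as the `db_uri` argument described in the table above.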
|
||||
|
||||
## Custom Model and Embeddings
|
||||
|
||||
The tool intends to use OpenAI for both embeddings and summarization by default. Users will have the option to customize the model using a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
tool = PGSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
271
docs/en/tools/database-data/qdrantvectorsearchtool.mdx
Normal file
@@ -0,0 +1,271 @@
|
||||
---
|
||||
title: 'Qdrant Vector Search Tool'
|
||||
description: 'Semantic search capabilities for CrewAI agents using Qdrant vector database'
|
||||
icon: vector-square
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
The Qdrant Vector Search Tool enables semantic search capabilities in your CrewAI agents by leveraging [Qdrant](https://qdrant.tech/), a vector similarity search engine. This tool allows your agents to search through documents stored in a Qdrant collection using semantic similarity.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the required packages:
|
||||
|
||||
```bash
|
||||
uv add qdrant-client
|
||||
```
|
||||
|
||||
## Basic Usage
|
||||
|
||||
Here's a minimal example of how to use the tool:
|
||||
|
||||
```python
|
||||
from crewai import Agent
|
||||
from crewai_tools import QdrantVectorSearchTool
|
||||
|
||||
# Initialize the tool
|
||||
qdrant_tool = QdrantVectorSearchTool(
|
||||
qdrant_url="your_qdrant_url",
|
||||
qdrant_api_key="your_qdrant_api_key",
|
||||
collection_name="your_collection"
|
||||
)
|
||||
|
||||
# Create an agent that uses the tool
|
||||
agent = Agent(
    role="Research Assistant",
    goal="Find relevant information in documents",
    backstory="A research assistant skilled at semantic document search.",
    tools=[qdrant_tool]
)
|
||||
|
||||
# The tool will automatically use OpenAI embeddings
|
||||
# and return the 3 most relevant results with scores > 0.35
|
||||
```
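
To make the effect of the result limit and score threshold concrete, the sketch below applies the same two rules to a plain list of scored hits. It is an illustration of the behavior described in the comment above (scores above 0.35, at most 3 results), not the tool's actual implementation; `select_hits` and the sample data are hypothetical.

```python
from typing import List, Tuple


def select_hits(
    scored: List[Tuple[str, float]],
    limit: int = 3,
    score_threshold: float = 0.35,
) -> List[Tuple[str, float]]:
    """Keep hits scoring above the threshold, then return the best `limit` of them."""
    kept = [hit for hit in scored if hit[1] > score_threshold]
    kept.sort(key=lambda hit: hit[1], reverse=True)
    return kept[:limit]


hits = select_hits([("a", 0.9), ("b", 0.2), ("c", 0.5), ("d", 0.4), ("e", 0.36)])
```

Raising the threshold trades recall for precision, while the limit caps how much retrieved text is handed to the agent.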
|
||||
|
||||
## Complete Working Example
|
||||
|
||||
Here's a complete example showing how to:
|
||||
1. Extract text from a PDF
|
||||
2. Generate embeddings using OpenAI
|
||||
3. Store in Qdrant
|
||||
4. Create a CrewAI agentic RAG workflow for semantic search
|
||||
|
||||
```python
|
||||
import os
|
||||
import uuid
|
||||
import pdfplumber
|
||||
from openai import OpenAI
|
||||
from dotenv import load_dotenv
|
||||
from crewai import Agent, Task, Crew, Process, LLM
|
||||
from crewai_tools import QdrantVectorSearchTool
|
||||
from qdrant_client import QdrantClient
|
||||
from qdrant_client.models import PointStruct, Distance, VectorParams
|
||||
|
||||
# Load environment variables
|
||||
load_dotenv()
|
||||
|
||||
# Initialize OpenAI client
|
||||
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
|
||||
|
||||
# Extract text from PDF
|
||||
def extract_text_from_pdf(pdf_path):
    text = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_text = page.extract_text()
            if page_text:
                text.append(page_text.strip())
    return text

# Generate OpenAI embeddings
def get_openai_embedding(text):
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small"
    )
    return response.data[0].embedding

# Store text and embeddings in Qdrant
def load_pdf_to_qdrant(pdf_path, qdrant, collection_name):
    # Extract text from PDF
    text_chunks = extract_text_from_pdf(pdf_path)

    # Create Qdrant collection
    if qdrant.collection_exists(collection_name):
        qdrant.delete_collection(collection_name)
    qdrant.create_collection(
        collection_name=collection_name,
        vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
    )

    # Store embeddings
    points = []
    for chunk in text_chunks:
        embedding = get_openai_embedding(chunk)
        points.append(PointStruct(
            id=str(uuid.uuid4()),
            vector=embedding,
            payload={"text": chunk}
        ))
    qdrant.upsert(collection_name=collection_name, points=points)
|
||||
|
||||
# Initialize Qdrant client and load data
|
||||
qdrant = QdrantClient(
|
||||
url=os.getenv("QDRANT_URL"),
|
||||
api_key=os.getenv("QDRANT_API_KEY")
|
||||
)
|
||||
collection_name = "example_collection"
|
||||
pdf_path = "path/to/your/document.pdf"
|
||||
load_pdf_to_qdrant(pdf_path, qdrant, collection_name)
|
||||
|
||||
# Initialize Qdrant search tool
|
||||
qdrant_tool = QdrantVectorSearchTool(
|
||||
qdrant_url=os.getenv("QDRANT_URL"),
|
||||
qdrant_api_key=os.getenv("QDRANT_API_KEY"),
|
||||
collection_name=collection_name,
|
||||
limit=3,
|
||||
score_threshold=0.35
|
||||
)
|
||||
|
||||
# Create CrewAI agents
|
||||
search_agent = Agent(
|
||||
role="Senior Semantic Search Agent",
|
||||
goal="Find and analyze documents based on semantic search",
|
||||
backstory="""You are an expert research assistant who can find relevant
|
||||
information using semantic search in a Qdrant database.""",
|
||||
tools=[qdrant_tool],
|
||||
verbose=True
|
||||
)
|
||||
|
||||
answer_agent = Agent(
|
||||
role="Senior Answer Assistant",
|
||||
goal="Generate answers to questions based on the context provided",
|
||||
backstory="""You are an expert answer assistant who can generate
|
||||
answers to questions based on the context provided.""",
|
||||
tools=[qdrant_tool],
|
||||
verbose=True
|
||||
)
|
||||
|
||||
# Define tasks
|
||||
search_task = Task(
|
||||
description="""Search for relevant documents about the {query}.
|
||||
Your final answer should include:
|
||||
- The relevant information found
|
||||
- The similarity scores of the results
|
||||
- The metadata of the relevant documents""",
|
||||
agent=search_agent
|
||||
)
|
||||
|
||||
answer_task = Task(
|
||||
description="""Given the context and metadata of relevant documents,
|
||||
generate a final answer based on the context.""",
|
||||
agent=answer_agent
|
||||
)
|
||||
|
||||
# Run CrewAI workflow
|
||||
crew = Crew(
|
||||
agents=[search_agent, answer_agent],
|
||||
tasks=[search_task, answer_task],
|
||||
process=Process.sequential,
|
||||
verbose=True
|
||||
)
|
||||
|
||||
result = crew.kickoff(
|
||||
inputs={"query": "What is the role of X in the document?"}
|
||||
)
|
||||
print(result)
|
||||
```
|
||||
|
||||
## Tool Parameters
|
||||
|
||||
### Required Parameters
|
||||
- `qdrant_url` (str): The URL of your Qdrant server
|
||||
- `qdrant_api_key` (str): API key for authentication with Qdrant
|
||||
- `collection_name` (str): Name of the Qdrant collection to search
|
||||
|
||||
### Optional Parameters
|
||||
- `limit` (int): Maximum number of results to return (default: 3)
|
||||
- `score_threshold` (float): Minimum similarity score threshold (default: 0.35)
|
||||
- `custom_embedding_fn` (Callable[[str], list[float]]): Custom function for text vectorization
|
||||
|
||||
## Search Parameters
|
||||
|
||||
The tool accepts these parameters in its schema:
|
||||
- `query` (str): The search query to find similar documents
|
||||
- `filter_by` (str, optional): Metadata field to filter on
|
||||
- `filter_value` (str, optional): Value to filter by
|
||||
|
||||
## Return Format
|
||||
|
||||
The tool returns results in JSON format:
|
||||
|
||||
```json
|
||||
[
|
||||
{
|
||||
"metadata": {
|
||||
// Any metadata stored with the document
|
||||
},
|
||||
"context": "The actual text content of the document",
|
||||
"distance": 0.95 // Similarity score
|
||||
}
|
||||
]
|
||||
```
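
As a rough illustration of consuming this format, the sketch below parses a result string and keeps only the stronger matches. The sample data and the `top_contexts` helper are hypothetical, and it assumes the `distance` field holds a similarity score where higher means more similar, as the comment in the format above suggests.

```python
import json

# Hypothetical tool output, shaped like the Return Format above.
raw_results = json.dumps([
    {"metadata": {"page": 1}, "context": "Qdrant stores vectors.", "distance": 0.95},
    {"metadata": {"page": 4}, "context": "Unrelated footer text.", "distance": 0.41},
])


def top_contexts(raw: str, min_score: float = 0.5) -> list[str]:
    """Return the `context` strings scoring at least `min_score`, best first."""
    results = json.loads(raw)
    kept = [r for r in results if r["distance"] >= min_score]
    kept.sort(key=lambda r: r["distance"], reverse=True)
    return [r["context"] for r in kept]


print(top_contexts(raw_results))
```

Filtering on the client side like this complements `score_threshold`, which is applied when the tool is configured.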
|
||||
|
||||
## Default Embedding
|
||||
|
||||
By default, the tool uses OpenAI's `text-embedding-3-small` model for vectorization. This requires:
|
||||
- OpenAI API key set in environment: `OPENAI_API_KEY`
|
||||
|
||||
## Custom Embeddings
|
||||
|
||||
Instead of using the default embedding model, you might want to use your own embedding function in cases where you:
|
||||
|
||||
1. Want to use a different embedding model (e.g., Cohere, HuggingFace, Ollama models)
|
||||
2. Need to reduce costs by using open-source embedding models
|
||||
3. Have specific requirements for vector dimensions or embedding quality
|
||||
4. Want to use domain-specific embeddings (e.g., for medical or legal text)
|
||||
|
||||
Here's an example using a HuggingFace model:
|
||||
|
||||
```python
from transformers import AutoTokenizer, AutoModel
import torch
from crewai_tools import QdrantVectorSearchTool

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')
model = AutoModel.from_pretrained('sentence-transformers/all-MiniLM-L6-v2')

def custom_embeddings(text: str) -> list[float]:
    # Tokenize and get model outputs
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # inference only, so gradients are not needed
        outputs = model(**inputs)

    # Use mean pooling to get text embedding
    embeddings = outputs.last_hidden_state.mean(dim=1)

    # Convert to list of floats and return
    return embeddings[0].tolist()

# Use custom embeddings with the tool
tool = QdrantVectorSearchTool(
    qdrant_url="your_url",
    qdrant_api_key="your_key",
    collection_name="your_collection",
    custom_embedding_fn=custom_embeddings  # Pass your custom function
)
```
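
For tests that should run without downloading a model or calling an API, a stand-in with the same `Callable[[str], list[float]]` shape can be useful. The sketch below is a hypothetical helper (`toy_embedding` is not part of `crewai_tools`) that derives a deterministic unit-length vector from a hash. It carries no semantic meaning, so it is only suitable for wiring checks, and its dimension would have to match the vector size of the target collection.

```python
import hashlib
import math


def toy_embedding(text: str, dim: int = 8) -> list[float]:
    """Deterministic, unit-length stand-in embedding (not semantically meaningful)."""
    if not 1 <= dim <= 32:
        raise ValueError("dim must be between 1 and 32 (SHA-256 yields 32 bytes)")
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dim` bytes to floats in [-1, 1].
    raw = [(byte - 127.5) / 127.5 for byte in digest[:dim]]
    norm = math.sqrt(sum(value * value for value in raw))
    return [value / norm for value in raw]


vector = toy_embedding("hello world")
print(len(vector))
```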
|
||||
|
||||
## Error Handling
|
||||
|
||||
The tool handles these specific errors:
|
||||
- If `qdrant-client` is not installed, prompts to install it with `uv add qdrant-client`, and raises `ImportError` if it remains unavailable
- Raises `ValueError` if `QDRANT_URL` is not set
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Required environment variables:
|
||||
```bash
export QDRANT_URL="your_qdrant_url"  # If not provided in constructor
export QDRANT_API_KEY="your_api_key"  # If not provided in constructor
export OPENAI_API_KEY="your_openai_key"  # If using default embeddings
```
|
||||
202
docs/en/tools/database-data/snowflakesearchtool.mdx
Normal file
@@ -0,0 +1,202 @@
|
||||
---
|
||||
title: Snowflake Search Tool
|
||||
description: The `SnowflakeSearchTool` enables CrewAI agents to execute SQL queries and perform semantic search on Snowflake data warehouses.
|
||||
icon: snowflake
|
||||
---
|
||||
|
||||
# `SnowflakeSearchTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `SnowflakeSearchTool` is designed to connect to Snowflake data warehouses and execute SQL queries with advanced features like connection pooling, retry logic, and asynchronous execution. This tool allows CrewAI agents to interact with Snowflake databases, making it ideal for data analysis, reporting, and business intelligence tasks that require access to enterprise data stored in Snowflake.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the required dependencies:
|
||||
|
||||
```shell
|
||||
uv add cryptography snowflake-connector-python snowflake-sqlalchemy
|
||||
```
|
||||
|
||||
Or alternatively:
|
||||
|
||||
```shell
|
||||
uv sync --extra snowflake
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `SnowflakeSearchTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the required packages using one of the commands above.
|
||||
2. **Configure Snowflake Connection**: Create a `SnowflakeConfig` object with your Snowflake credentials.
|
||||
3. **Initialize the Tool**: Create an instance of the tool with the necessary configuration.
|
||||
4. **Execute Queries**: Use the tool to run SQL queries against your Snowflake database.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `SnowflakeSearchTool` to query data from a Snowflake database:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools import SnowflakeSearchTool, SnowflakeConfig
|
||||
|
||||
# Create Snowflake configuration
|
||||
config = SnowflakeConfig(
|
||||
account="your_account",
|
||||
user="your_username",
|
||||
password="your_password",
|
||||
warehouse="COMPUTE_WH",
|
||||
database="your_database",
|
||||
snowflake_schema="your_schema"
|
||||
)
|
||||
|
||||
# Initialize the tool
|
||||
snowflake_tool = SnowflakeSearchTool(config=config)
|
||||
|
||||
# Define an agent that uses the tool
|
||||
data_analyst_agent = Agent(
|
||||
role="Data Analyst",
|
||||
goal="Analyze data from Snowflake database",
|
||||
backstory="An expert data analyst who can extract insights from enterprise data.",
|
||||
tools=[snowflake_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to query sales data
|
||||
query_task = Task(
|
||||
description="Query the sales data for the last quarter and summarize the top 5 products by revenue.",
|
||||
expected_output="A summary of the top 5 products by revenue for the last quarter.",
|
||||
agent=data_analyst_agent,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(agents=[data_analyst_agent],
|
||||
tasks=[query_task])
|
||||
result = crew.kickoff()
|
||||
```
|
||||
|
||||
You can also customize the tool with additional parameters:
|
||||
|
||||
```python Code
|
||||
# Initialize the tool with custom parameters
|
||||
snowflake_tool = SnowflakeSearchTool(
|
||||
config=config,
|
||||
pool_size=10,
|
||||
max_retries=5,
|
||||
retry_delay=2.0,
|
||||
enable_caching=True
|
||||
)
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
### SnowflakeConfig Parameters
|
||||
|
||||
The `SnowflakeConfig` class accepts the following parameters:
|
||||
|
||||
- **account**: Required. Snowflake account identifier.
|
||||
- **user**: Required. Snowflake username.
|
||||
- **password**: Optional*. Snowflake password.
|
||||
- **private_key_path**: Optional*. Path to private key file (alternative to password).
|
||||
- **warehouse**: Required. Snowflake warehouse name.
|
||||
- **database**: Required. Default database.
|
||||
- **snowflake_schema**: Required. Default schema.
|
||||
- **role**: Optional. Snowflake role.
|
||||
- **session_parameters**: Optional. Custom session parameters as a dictionary.
|
||||
|
||||
*Either `password` or `private_key_path` must be provided.
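
The "either/or" rule can be pictured with a small validation sketch. `ConnectionSettings` below is a hypothetical stand-in written for illustration, not the real `SnowflakeConfig`, and the way the real class enforces the rule may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionSettings:  # hypothetical stand-in, not the real SnowflakeConfig
    account: str
    user: str
    password: Optional[str] = None
    private_key_path: Optional[str] = None

    def __post_init__(self) -> None:
        # Mirror the documented rule: at least one credential is required.
        if not (self.password or self.private_key_path):
            raise ValueError("Either password or private_key_path must be provided")


settings = ConnectionSettings("your_account", "your_username", password="your_password")
print(settings.user)
```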
|
||||
|
||||
### SnowflakeSearchTool Parameters
|
||||
|
||||
The `SnowflakeSearchTool` accepts the following parameters during initialization:
|
||||
|
||||
- **config**: Required. A `SnowflakeConfig` object containing connection details.
|
||||
- **pool_size**: Optional. Number of connections in the pool. Default is 5.
|
||||
- **max_retries**: Optional. Maximum retry attempts for failed queries. Default is 3.
|
||||
- **retry_delay**: Optional. Delay between retries in seconds. Default is 1.0.
|
||||
- **enable_caching**: Optional. Whether to enable query result caching. Default is True.
|
||||
|
||||
## Usage
|
||||
|
||||
When an agent calls the `SnowflakeSearchTool`, it can pass the following parameters (only `query` is required):
|
||||
|
||||
- **query**: Required. The SQL query to execute.
|
||||
- **database**: Optional. Override the default database specified in the config.
|
||||
- **snowflake_schema**: Optional. Override the default schema specified in the config.
|
||||
- **timeout**: Optional. Query timeout in seconds. Default is 300.
|
||||
|
||||
The tool will return the query results as a list of dictionaries, where each dictionary represents a row with column names as keys.
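
Because each row is a plain dictionary, ordinary Python is enough to post-process a result. The rows and column names below are hypothetical sample data, and `top_n` is an illustrative helper rather than part of the tool.

```python
# Hypothetical rows, shaped like the documented return value.
rows = [
    {"PRODUCT_NAME": "Widget", "TOTAL_REVENUE": 1200.0},
    {"PRODUCT_NAME": "Gadget", "TOTAL_REVENUE": 3400.0},
    {"PRODUCT_NAME": "Gizmo", "TOTAL_REVENUE": 800.0},
]


def top_n(rows: list[dict], key: str, n: int = 2) -> list[dict]:
    """Return the `n` rows with the largest value in column `key`."""
    return sorted(rows, key=lambda row: row[key], reverse=True)[:n]


best = top_n(rows, "TOTAL_REVENUE")
print([row["PRODUCT_NAME"] for row in best])
```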
|
||||
|
||||
```python Code
|
||||
# Example of using the tool with an agent
|
||||
data_analyst = Agent(
|
||||
role="Data Analyst",
|
||||
goal="Analyze sales data from Snowflake",
|
||||
backstory="An expert data analyst with experience in SQL and data visualization.",
|
||||
tools=[snowflake_tool],
|
||||
verbose=True
|
||||
)
|
||||
|
||||
# The agent will use the tool with parameters like:
|
||||
# query="SELECT product_name, SUM(revenue) as total_revenue FROM sales GROUP BY product_name ORDER BY total_revenue DESC LIMIT 5"
|
||||
# timeout=600
|
||||
|
||||
# Create a task for the agent
|
||||
analysis_task = Task(
|
||||
description="Query the sales database and identify the top 5 products by revenue for the last quarter.",
|
||||
expected_output="A detailed analysis of the top 5 products by revenue.",
|
||||
agent=data_analyst
|
||||
)
|
||||
|
||||
# Run the task
|
||||
crew = Crew(
|
||||
agents=[data_analyst],
|
||||
tasks=[analysis_task]
|
||||
)
|
||||
result = crew.kickoff()
|
||||
```
|
||||
|
||||
## Advanced Features
|
||||
|
||||
### Connection Pooling
|
||||
|
||||
The `SnowflakeSearchTool` implements connection pooling to improve performance by reusing database connections. You can control the pool size with the `pool_size` parameter.
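
To make the idea concrete, here is a minimal sketch of a fixed-size pool built on the standard library. `SimplePool` and `fake_connect` are hypothetical names for illustration, and the tool's internal pooling may be implemented quite differently.

```python
import queue
from contextlib import contextmanager


class SimplePool:
    """Hands out up to `pool_size` reusable connections."""

    def __init__(self, connect, pool_size=5):
        self._idle = queue.Queue(maxsize=pool_size)
        for _ in range(pool_size):
            self._idle.put(connect())  # open every connection up front

    @contextmanager
    def connection(self, timeout=None):
        conn = self._idle.get(timeout=timeout)  # blocks while all are in use
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand the connection back for reuse


created = []


def fake_connect():
    """Stand-in for opening a real database connection."""
    created.append(object())
    return created[-1]


pool = SimplePool(fake_connect, pool_size=2)
with pool.connection() as first:
    with pool.connection() as second:
        pass
with pool.connection() as again:
    pass

print(len(created))
```

Three checkouts are served by only two underlying connections, which is the saving a pool is meant to provide.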
|
||||
|
||||
### Automatic Retries
|
||||
|
||||
The tool automatically retries failed queries with exponential backoff. You can configure the retry behavior with the `max_retries` and `retry_delay` parameters.
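
The general pattern can be sketched as follows. This is an illustration of exponential backoff under the assumption that the wait doubles after each failed attempt, starting from `retry_delay`; the exact schedule used by the tool is not specified here. `run_with_retries` and `flaky_query` are hypothetical names.

```python
import time


def run_with_retries(operation, max_retries=3, retry_delay=1.0, sleep=time.sleep):
    """Call `operation`; after a failure wait retry_delay * 2**attempt and retry."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(retry_delay * (2 ** attempt))


calls = {"count": 0}


def flaky_query():
    """Fails twice with a transient error, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return [{"ok": True}]


waits = []  # record the delays instead of actually sleeping
result = run_with_retries(flaky_query, max_retries=5, retry_delay=2.0, sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the sketch testable without real delays.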
|
||||
|
||||
### Query Result Caching
|
||||
|
||||
To improve performance for repeated queries, the tool can cache query results. This feature is enabled by default but can be disabled by setting `enable_caching=False`.
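
Conceptually, result caching stores each result under a key derived from the request and reuses it on a repeat. The sketch below is a simplified, hypothetical `QueryCache` that assumes the key consists of the query text, database, and schema; the tool's real cache key and eviction behaviour are not described in this document.

```python
class QueryCache:
    """Remembers results per (query, database, schema) combination."""

    def __init__(self, run_query):
        self._run_query = run_query
        self._results = {}

    def execute(self, query, database=None, schema=None):
        key = (query.strip(), database, schema)
        if key not in self._results:
            # Cache miss: run the query once and remember the rows.
            self._results[key] = self._run_query(query)
        return self._results[key]


executed = []


def fake_run(query):
    """Stand-in for sending SQL to the warehouse."""
    executed.append(query)
    return [{"N": len(executed)}]


cache = QueryCache(fake_run)
first = cache.execute("SELECT 1")
second = cache.execute("SELECT 1")
other = cache.execute("SELECT 1", database="ANALYTICS")
print(len(executed))
```

A cache like this returns stale rows if the underlying tables change, which is one reason to set `enable_caching=False` for frequently updated data.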
|
||||
|
||||
### Key-Pair Authentication
|
||||
|
||||
In addition to password authentication, the tool supports key-pair authentication for enhanced security:
|
||||
|
||||
```python Code
|
||||
config = SnowflakeConfig(
|
||||
account="your_account",
|
||||
user="your_username",
|
||||
private_key_path="/path/to/your/private/key.p8",
|
||||
warehouse="COMPUTE_WH",
|
||||
database="your_database",
|
||||
snowflake_schema="your_schema"
|
||||
)
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The `SnowflakeSearchTool` includes comprehensive error handling for common Snowflake issues:
|
||||
|
||||
- Connection failures
|
||||
- Query timeouts
|
||||
- Authentication errors
|
||||
- Database and schema errors
|
||||
|
||||
When an error occurs, the tool will attempt to retry the operation (if configured) and provide detailed error information.
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `SnowflakeSearchTool` provides a powerful way to integrate Snowflake data warehouses with CrewAI agents. With features like connection pooling, automatic retries, and query caching, it enables efficient and reliable access to enterprise data. This tool is particularly useful for data analysis, reporting, and business intelligence tasks that require access to structured data stored in Snowflake.
|
||||
163
docs/en/tools/database-data/weaviatevectorsearchtool.mdx
Normal file
@@ -0,0 +1,163 @@
|
||||
---
|
||||
title: Weaviate Vector Search
|
||||
description: The `WeaviateVectorSearchTool` is designed to search a Weaviate vector database for semantically similar documents.
|
||||
icon: network-wired
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
|
||||
The `WeaviateVectorSearchTool` is specifically crafted for conducting semantic searches within documents stored in a Weaviate vector database. This tool allows you to find semantically similar documents to a given query, leveraging the power of vector embeddings for more accurate and contextually relevant search results.
|
||||
|
||||
[Weaviate](https://weaviate.io/) is a vector database that stores and queries vector embeddings, enabling semantic search capabilities.
|
||||
|
||||
## Installation
|
||||
|
||||
To incorporate this tool into your project, you need to install the Weaviate client:
|
||||
|
||||
```shell
|
||||
uv add weaviate-client
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `WeaviateVectorSearchTool`, follow these steps:
|
||||
|
||||
1. **Package Installation**: Confirm that the `crewai[tools]` and `weaviate-client` packages are installed in your Python environment.
|
||||
2. **Weaviate Setup**: Set up a Weaviate cluster. You can follow the [Weaviate documentation](https://weaviate.io/developers/wcs/manage-clusters/connect) for instructions.
|
||||
3. **API Keys**: Obtain your Weaviate cluster URL and API key.
|
||||
4. **OpenAI API Key**: Ensure you have an OpenAI API key set in your environment variables as `OPENAI_API_KEY`.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a search:
|
||||
|
||||
```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import WeaviateVectorSearchTool

# Initialize the tool
tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)

# Define this as a method of your @CrewBase crew class
@agent
def search_agent(self) -> Agent:
    '''
    This agent uses the WeaviateVectorSearchTool to search for
    semantically similar documents in a Weaviate vector database.
    '''
    return Agent(
        config=self.agents_config["search_agent"],
        tools=[tool]
    )
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `WeaviateVectorSearchTool` accepts the following parameters:
|
||||
|
||||
- **collection_name**: Required. The name of the collection to search within.
|
||||
- **weaviate_cluster_url**: Required. The URL of the Weaviate cluster.
|
||||
- **weaviate_api_key**: Required. The API key for the Weaviate cluster.
|
||||
- **limit**: Optional. The number of results to return. Default is `3`.
|
||||
- **vectorizer**: Optional. The vectorizer to use. If not provided, it will use `text2vec_openai` with the `nomic-embed-text` model.
|
||||
- **generative_model**: Optional. The generative model to use. If not provided, it will use OpenAI's `gpt-4o`.
|
||||
|
||||
## Advanced Configuration
|
||||
|
||||
You can customize the vectorizer and generative model used by the tool:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import WeaviateVectorSearchTool
|
||||
from weaviate.classes.config import Configure
|
||||
|
||||
# Setup custom model for vectorizer and generative model
|
||||
tool = WeaviateVectorSearchTool(
|
||||
collection_name='example_collections',
|
||||
limit=3,
|
||||
vectorizer=Configure.Vectorizer.text2vec_openai(model="nomic-embed-text"),
|
||||
generative_model=Configure.Generative.openai(model="gpt-4o-mini"),
|
||||
weaviate_cluster_url="https://your-weaviate-cluster-url.com",
|
||||
weaviate_api_key="your-weaviate-api-key",
|
||||
)
|
||||
```
|
||||
|
||||
## Preloading Documents
|
||||
|
||||
You can preload your Weaviate database with documents before using the tool:
|
||||
|
||||
```python Code
import os
from crewai_tools import WeaviateVectorSearchTool
import weaviate
from weaviate.classes.config import Configure
from weaviate.classes.init import Auth

# Connect to Weaviate
client = weaviate.connect_to_weaviate_cloud(
    cluster_url="https://your-weaviate-cluster-url.com",
    auth_credentials=Auth.api_key("your-weaviate-api-key"),
    headers={"X-OpenAI-Api-Key": "your-openai-api-key"}
)

# Get or create collection
# (collections.get() returns a handle even for a missing collection,
# so check for existence explicitly)
if client.collections.exists("example_collections"):
    test_docs = client.collections.get("example_collections")
else:
    test_docs = client.collections.create(
        name="example_collections",
        vectorizer_config=Configure.Vectorizer.text2vec_openai(model="nomic-embed-text"),
        generative_config=Configure.Generative.openai(model="gpt-4o"),
    )

# Load documents
docs_to_load = os.listdir("knowledge")
with test_docs.batch.dynamic() as batch:
    for d in docs_to_load:
        with open(os.path.join("knowledge", d), "r") as f:
            content = f.read()
        batch.add_object(
            {
                "content": content,
                # File names are expected to start with the year, e.g. "2024_report.txt"
                "year": d.split("_")[0],
            }
        )

# Initialize the tool
tool = WeaviateVectorSearchTool(
    collection_name='example_collections',
    limit=3,
    weaviate_cluster_url="https://your-weaviate-cluster-url.com",
    weaviate_api_key="your-weaviate-api-key",
)
```
|
||||
|
||||
## Agent Integration Example
|
||||
|
||||
Here's how to integrate the `WeaviateVectorSearchTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent
|
||||
from crewai_tools import WeaviateVectorSearchTool
|
||||
|
||||
# Initialize the tool
|
||||
weaviate_tool = WeaviateVectorSearchTool(
|
||||
collection_name='example_collections',
|
||||
limit=3,
|
||||
weaviate_cluster_url="https://your-weaviate-cluster-url.com",
|
||||
weaviate_api_key="your-weaviate-api-key",
|
||||
)
|
||||
|
||||
# Create an agent with the tool
|
||||
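rag_agent = Agent(
    name="rag_agent",
    role="You are a helpful assistant that can answer questions with the help of the WeaviateVectorSearchTool.",
    # goal and backstory are required when the agent is not built from a config
    goal="Answer questions using documents retrieved from Weaviate.",
    backstory="An assistant that grounds its answers in the indexed documents.",
    llm="gpt-4o-mini",
    tools=[weaviate_tool],
)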
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `WeaviateVectorSearchTool` provides a powerful way to search for semantically similar documents in a Weaviate vector database. By leveraging vector embeddings, it enables more accurate and contextually relevant search results compared to traditional keyword-based searches. This tool is particularly useful for applications that require finding information based on meaning rather than exact matches.
|
||||
77
docs/en/tools/file-document/csvsearchtool.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: CSV RAG Search
|
||||
description: The `CSVSearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a CSV file's content.
|
||||
icon: file-csv
|
||||
---
|
||||
|
||||
# `CSVSearchTool`
|
||||
|
||||
<Note>
|
||||
**Experimental**: We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
This tool is used to perform a RAG (Retrieval-Augmented Generation) search within a CSV file's content. It allows users to semantically search for queries in the content of a specified CSV file. This feature is particularly useful for extracting information from large CSV datasets where traditional search methods might be inefficient. All tools with "Search" in their name, including `CSVSearchTool`, are RAG tools designed for searching different sources of data.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the `crewai_tools` package:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
```python Code
|
||||
from crewai_tools import CSVSearchTool
|
||||
|
||||
# Initialize the tool with a specific CSV file.
|
||||
# This setup allows the agent to only search the given CSV file.
|
||||
tool = CSVSearchTool(csv='path/to/your/csvfile.csv')
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool without a specific CSV file.
|
||||
# Agent will need to provide the CSV path at runtime.
|
||||
tool = CSVSearchTool()
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The following parameters can be used to customize the `CSVSearchTool`'s behavior:
|
||||
|
||||
| Argument | Type | Description |
|
||||
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| **csv** | `string` | _Optional_. The path to the CSV file you want to search. This is a mandatory argument if the tool was initialized without a specific CSV file; otherwise, it is optional. |
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
tool = CSVSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
53
docs/en/tools/file-document/directoryreadtool.mdx
Normal file
@@ -0,0 +1,53 @@
|
||||
---
|
||||
title: Directory Read
|
||||
description: The `DirectoryReadTool` is a powerful utility designed to provide a comprehensive listing of directory contents.
|
||||
icon: folder-tree
|
||||
---
|
||||
|
||||
# `DirectoryReadTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The DirectoryReadTool is a powerful utility designed to provide a comprehensive listing of directory contents.
|
||||
It can recursively navigate through the specified directory, offering users a detailed enumeration of all files, including those within subdirectories.
|
||||
This tool is crucial for tasks that require a thorough inventory of directory structures or for validating the organization of files within directories.
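
To show what a recursive listing amounts to, here is a standard-library sketch built on `os.walk`. The `list_files` helper and the sample file names are hypothetical; this illustrates the concept and is not the tool's own implementation or output format.

```python
import os
import tempfile


def list_files(directory: str) -> list[str]:
    """Return every file under `directory`, as sorted paths relative to it."""
    found = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            full_path = os.path.join(root, name)
            relative = os.path.relpath(full_path, directory)
            found.append(relative.replace(os.sep, "/"))
    return sorted(found)


# Build a small throwaway tree to demonstrate.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "reports", "2024"))
    for rel in ("README.md", "reports/summary.txt", "reports/2024/q1.csv"):
        with open(os.path.join(tmp, *rel.split("/")), "w") as handle:
            handle.write("sample")
    listing = list_files(tmp)

print(listing)
```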
|
||||
|
||||
## Installation
|
||||
|
||||
To utilize the DirectoryReadTool in your project, install the `crewai_tools` package. If this package is not yet part of your environment, you can install it using pip with the command below:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
This command installs the latest version of the `crewai_tools` package, granting access to the DirectoryReadTool among other utilities.
|
||||
|
||||
## Example
|
||||
|
||||
Employing the DirectoryReadTool is straightforward. The following code snippet demonstrates how to set it up and use the tool to list the contents of a specified directory:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import DirectoryReadTool
|
||||
|
||||
# Initialize the tool so the agent can read any directory's content
|
||||
# it learns about during execution
|
||||
tool = DirectoryReadTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific directory,
|
||||
# so the agent can only read the content of the specified directory
|
||||
tool = DirectoryReadTool(directory='/path/to/your/directory')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The following parameters can be used to customize the `DirectoryReadTool`'s behavior:
|
||||
|
||||
| Argument | Type | Description |
|
||||
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| **directory** | `string` | _Optional_. An argument that specifies the path to the directory whose contents you wish to list. It accepts both absolute and relative paths, guiding the tool to the desired directory for content listing. |
|
||||
67
docs/en/tools/file-document/directorysearchtool.mdx
Normal file
@@ -0,0 +1,67 @@
|
||||
---
|
||||
title: Directory RAG Search
|
||||
description: The `DirectorySearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a directory's content.
|
||||
icon: address-book
|
||||
---
|
||||
|
||||
# `DirectorySearchTool`
|
||||
|
||||
<Note>
|
||||
**Experimental**: The DirectorySearchTool is under continuous development. Features and functionalities might evolve, and unexpected behavior may occur as we refine the tool.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The DirectorySearchTool enables semantic search within the content of specified directories, leveraging the Retrieval-Augmented Generation (RAG) methodology for efficient navigation through files. Designed for flexibility, it allows users to dynamically specify search directories at runtime or set a fixed directory during initial setup.
|
||||
|
||||
## Installation
|
||||
|
||||
To use the DirectorySearchTool, begin by installing the crewai_tools package. Execute the following command in your terminal:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Initialization and Usage
|
||||
|
||||
Import the DirectorySearchTool from the `crewai_tools` package to start. You can initialize the tool without specifying a directory, enabling the setting of the search directory at runtime. Alternatively, the tool can be initialized with a predefined directory.
|
||||
|
||||
```python Code
|
||||
from crewai_tools import DirectorySearchTool
|
||||
|
||||
# For dynamic directory specification at runtime
|
||||
tool = DirectorySearchTool()
|
||||
|
||||
# For fixed directory searches
|
||||
tool = DirectorySearchTool(directory='/path/to/directory')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `directory`: A string argument that specifies the search directory. This is optional during initialization but required for searches if not set initially.
|
||||
|
||||
## Custom Model and Embeddings
|
||||
|
||||
The DirectorySearchTool uses OpenAI for embeddings and summarization by default. To use a different provider or model for either one, pass a `config` dictionary:
|
||||
|
||||
```python Code
|
||||
tool = DirectorySearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # Options include ollama, google, anthropic, llama2, and more
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# Additional configurations here
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
79
docs/en/tools/file-document/docxsearchtool.mdx
Normal file
@@ -0,0 +1,79 @@
|
||||
---
|
||||
title: DOCX RAG Search
|
||||
description: The `DOCXSearchTool` is a RAG tool designed for semantic searching within DOCX documents.
|
||||
icon: file-word
|
||||
---
|
||||
|
||||
# `DOCXSearchTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The `DOCXSearchTool` is a RAG tool designed for semantic searching within DOCX documents. It enables users to effectively search and extract relevant information from DOCX files using query-based searches. This tool is invaluable for data analysis, information management, and research tasks, streamlining the process of finding specific information within large document collections.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the crewai_tools package by running the following command in your terminal:
|
||||
|
||||
```shell
|
||||
uv pip install docx2txt 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates initializing the DOCXSearchTool to search within any DOCX file's content or with a specific DOCX file path.
|
||||
|
||||
```python Code
|
||||
from crewai_tools import DOCXSearchTool
|
||||
|
||||
# Initialize the tool to search within any DOCX file's content
|
||||
tool = DOCXSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific DOCX file,
|
||||
# so the agent can only search the content of the specified DOCX file
|
||||
tool = DOCXSearchTool(docx='path/to/your/document.docx')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
The following parameters can be used to customize the `DOCXSearchTool`'s behavior:
|
||||
|
||||
| Argument | Type | Description |
|
||||
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| **docx** | `string` | _Optional_. The path to the DOCX file you want to search. If omitted at initialization, a path to any DOCX file can be supplied later, when the search is run. |
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
from crewai_tools import DOCXSearchTool

tool = DOCXSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
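
When several tools share the same providers, the nested dictionary can be assembled once and reused. The following is a minimal sketch, not part of `crewai_tools`: `build_rag_config` is a hypothetical helper, it performs no validation, and the provider and model names are simply the ones from the example above.

```python
from typing import Any, Optional


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
    llm_options: Optional[dict] = None,
    embedder_options: Optional[dict] = None,
) -> dict:
    """Assemble a config dict shaped like the one in the example above.

    Hypothetical helper: it only builds a plain dictionary and does not
    check that the provider or model names are supported.
    """
    llm_config: dict = {"model": llm_model, **(llm_options or {})}
    embedder_config: dict = {"model": embedder_model, **(embedder_options or {})}
    return {
        "llm": {"provider": llm_provider, "config": llm_config},
        "embedder": {"provider": embedder_provider, "config": embedder_config},
    }


config: dict[str, Any] = build_rag_config(
    "ollama",
    "llama2",
    "google",
    "models/embedding-001",
    llm_options={"temperature": 0.5},
    embedder_options={"task_type": "retrieval_document"},
)
# The result is intended to be passed as DOCXSearchTool(config=config).
```

Keeping the optional settings in separate dictionaries mirrors the commented-out entries in the example, so they can be switched on without editing the nested structure by hand.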
|
||||
44
docs/en/tools/file-document/filereadtool.mdx
Normal file
@@ -0,0 +1,44 @@
|
||||
---
|
||||
title: File Read
|
||||
description: The `FileReadTool` is designed to read files from the local file system.
|
||||
icon: folders
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
The FileReadTool conceptually represents a suite of functionalities within the crewai_tools package aimed at facilitating file reading and content retrieval.
|
||||
This suite includes tools for processing batch text files, reading runtime configuration files, and importing data for analytics.
|
||||
It supports a variety of text-based file formats such as `.txt`, `.csv`, `.json`, and more. Depending on the file type, the suite offers specialized functionality,
|
||||
such as converting JSON content into a Python dictionary for ease of use.
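
As a rough illustration of the JSON case, the sketch below reads a file with the standard library and returns parsed data for `.json` files and raw text otherwise. It is not the tool's implementation: `read_text_or_json` is a hypothetical name, and dispatching on the file extension is an assumption made for this example.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_text_or_json(file_path: Union[str, Path]) -> Any:
    """Return parsed JSON for .json files, otherwise the raw text."""
    path = Path(file_path)
    text = path.read_text(encoding="utf-8")
    if path.suffix.lower() == ".json":
        # json.loads turns a JSON object into a Python dict
        # (and a JSON array into a list).
        return json.loads(text)
    return text
```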
|
||||
|
||||
## Installation
|
||||
|
||||
To use the `FileReadTool`, install the `crewai_tools` package:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Usage Example
|
||||
|
||||
To get started with the FileReadTool:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import FileReadTool
|
||||
|
||||
# Initialize the tool to read any file the agent knows or learns the path for
|
||||
file_read_tool = FileReadTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific file path, so the agent can only read the content of the specified file
|
||||
file_read_tool = FileReadTool(file_path='path/to/your/file.txt')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `file_path`: The path to the file you want to read. It accepts both absolute and relative paths. Ensure the file exists and you have the necessary permissions to access it.
|
||||
50
docs/en/tools/file-document/filewritetool.mdx
Normal file
@@ -0,0 +1,50 @@
|
||||
---
|
||||
title: File Write
|
||||
description: The `FileWriterTool` is designed to write content to files.
|
||||
icon: file-pen
|
||||
---
|
||||
|
||||
# `FileWriterTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `FileWriterTool` is a component of the crewai_tools package, designed to simplify the process of writing content to files with cross-platform compatibility (Windows, Linux, macOS).
|
||||
It is particularly useful in scenarios such as generating reports, saving logs, creating configuration files, and more.
|
||||
This tool handles path differences across operating systems, supports UTF-8 encoding, and automatically creates directories if they don't exist, making it easier to organize your output reliably across different platforms.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the crewai_tools package to use the `FileWriterTool` in your projects:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
To get started with the `FileWriterTool`:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import FileWriterTool
|
||||
|
||||
# Initialize the tool
|
||||
file_writer_tool = FileWriterTool()
|
||||
|
||||
# Write content to a file in a specified directory
|
||||
result = file_writer_tool._run('example.txt', 'This is a test content.', 'test_directory')
|
||||
print(result)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `filename`: The name of the file you want to create or overwrite.
|
||||
- `content`: The content to write into the file.
|
||||
- `directory` (optional): The path to the directory where the file will be created. Defaults to the current directory (`.`). If the directory does not exist, it will be created.
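
The directory handling described by these arguments can be approximated with the standard library. This is a sketch of the documented behavior, not the tool's source: `write_file` is a hypothetical helper, and its return value (the path written) is a choice made for this example.

```python
from pathlib import Path


def write_file(filename: str, content: str, directory: str = ".") -> Path:
    """Write content to directory/filename, creating the directory if needed."""
    target_dir = Path(directory)
    # parents=True creates intermediate directories; exist_ok=True avoids an
    # error when the directory is already there.
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    # An explicit encoding avoids depending on the platform's default.
    target.write_text(content, encoding="utf-8")
    return target
```

Using `pathlib` for the join is what keeps the path separator correct on Windows, Linux, and macOS without special cases.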
|
||||
|
||||
## Conclusion
|
||||
|
||||
By integrating the `FileWriterTool` into your crews, the agents can reliably write content to files across different operating systems.
|
||||
This tool is essential for tasks that require saving output data, creating structured file systems, and handling cross-platform file operations.
|
||||
It's particularly recommended for Windows users who may encounter file writing issues with standard Python file operations.
|
||||
|
||||
By adhering to the setup and usage guidelines provided, incorporating this tool into projects is straightforward and ensures consistent file writing behavior across all platforms.
|
||||
75
docs/en/tools/file-document/jsonsearchtool.mdx
Normal file
@@ -0,0 +1,75 @@
|
||||
---
|
||||
title: JSON RAG Search
|
||||
description: The `JSONSearchTool` is designed to search JSON files and return the most relevant results.
|
||||
icon: file-code
|
||||
---
|
||||
|
||||
# `JSONSearchTool`
|
||||
|
||||
<Note>
|
||||
The JSONSearchTool is currently in an experimental phase. This means the tool
|
||||
is under active development, and users might encounter unexpected behavior or
|
||||
changes. We highly encourage feedback on any issues or suggestions for
|
||||
improvements.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The JSONSearchTool is designed to facilitate efficient and precise searches within JSON file contents. It uses a RAG (Retrieval-Augmented Generation) search mechanism and lets users specify a JSON path to target a particular JSON file, which improves the accuracy and relevance of search results.
|
||||
|
||||
## Installation
|
||||
|
||||
To install the JSONSearchTool, use the following pip command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Usage Examples
|
||||
|
||||
The following examples show how to use the JSONSearchTool to search within JSON files, either without a fixed file or restricted to a specific one.
|
||||
|
||||
```python Code
|
||||
from crewai_tools import JSONSearchTool
|
||||
|
||||
# General JSON content search
|
||||
# Use this when the JSON path is not fixed at initialization and will be supplied at run time.
|
||||
tool = JSONSearchTool()
|
||||
|
||||
# Restricting search to a specific JSON file
|
||||
# Use this initialization method when you want to limit the search scope to a specific JSON file.
|
||||
tool = JSONSearchTool(json_path='./path/to/your/file.json')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `json_path` (str, optional): Specifies the path to the JSON file to be searched. This argument is not required if the tool is initialized for a general search. When provided, it confines the search to the specified JSON file.
|
||||
|
||||
## Configuration Options
|
||||
|
||||
The JSONSearchTool supports extensive customization through a configuration dictionary. This allows users to select different models for embeddings and summarization based on their requirements.
|
||||
|
||||
```python Code
from crewai_tools import JSONSearchTool

tool = JSONSearchTool(
    config={
        "llm": {
            "provider": "ollama",  # Other options include google, openai, anthropic, llama2, etc.
            "config": {
                "model": "llama2",
                # Additional optional configurations can be specified here.
                # "temperature": 0.5,
                # "top_p": 1,
                # "stream": True,
            },
        },
        "embedding_model": {
            "provider": "google",  # or openai, ollama, ...
            "config": {
                "model": "models/embedding-001",
                "task_type": "retrieval_document",
                # Further customization options can be added here.
            },
        },
    }
)
```
|
||||
73
docs/en/tools/file-document/mdxsearchtool.mdx
Normal file
@@ -0,0 +1,73 @@
|
||||
---
|
||||
title: MDX RAG Search
|
||||
description: The `MDXSearchTool` is designed to search MDX files and return the most relevant results.
|
||||
icon: markdown
|
||||
---
|
||||
|
||||
# `MDXSearchTool`
|
||||
|
||||
<Note>
|
||||
The MDXSearchTool is in continuous development. Features may be added or removed, and functionality could change unpredictably as we refine the tool.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The MDX Search Tool is a component of the `crewai_tools` package for searching MDX (Markdown with JSX) content. It enables users to search and extract relevant information from MDX files using query-based searches. This tool is useful for data analysis, information management, and research tasks, streamlining the process of finding specific information within large document collections.
|
||||
|
||||
## Installation
|
||||
|
||||
Before using the MDX Search Tool, ensure the `crewai_tools` package is installed. If it is not, you can install it with the following command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Usage Example
|
||||
|
||||
To use the MDX Search Tool, first set up the necessary environment variables, then add the tool to your crewAI project. Below is a basic example of how to do this:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import MDXSearchTool
|
||||
|
||||
# Initialize the tool to search any MDX content it learns about during execution
|
||||
tool = MDXSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific MDX file path for an exclusive search within that document
|
||||
tool = MDXSearchTool(mdx='path/to/your/document.mdx')
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
- `mdx`: **Optional**. Specifies the MDX file path for the search. It can be provided during initialization.
|
||||
|
||||
## Customization of Model and Embeddings
|
||||
|
||||
The tool defaults to using OpenAI for embeddings and summarization. For customization, utilize a configuration dictionary as shown below:
|
||||
|
||||
```python Code
from crewai_tools import MDXSearchTool

tool = MDXSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # Options include google, openai, anthropic, llama2, etc.
            config=dict(
                model="llama2",
                # Optional parameters can be included here.
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # Optional title for the embeddings can be added here.
                # title="Embeddings",
            ),
        ),
    )
)
```
|
||||
88
docs/en/tools/file-document/overview.mdx
Normal file
@@ -0,0 +1,88 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Read, write, and search through various file formats with CrewAI's document processing tools"
|
||||
icon: "face-smile"
|
||||
---
|
||||
|
||||
These tools enable your agents to work with various file formats and document types. From reading PDFs to processing JSON data, these tools handle all your document processing needs.
|
||||
|
||||
## **Available Tools**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="File Read Tool" icon="folders" href="/en/tools/file-document/filereadtool">
|
||||
Read content from any file type including text, markdown, and more.
|
||||
</Card>
|
||||
|
||||
<Card title="File Write Tool" icon="file-pen" href="/en/tools/file-document/filewritetool">
|
||||
Write content to files, create new documents, and save processed data.
|
||||
</Card>
|
||||
|
||||
<Card title="PDF Search Tool" icon="file-pdf" href="/en/tools/file-document/pdfsearchtool">
|
||||
Search and extract text content from PDF documents efficiently.
|
||||
</Card>
|
||||
|
||||
<Card title="DOCX Search Tool" icon="file-word" href="/en/tools/file-document/docxsearchtool">
|
||||
Search through Microsoft Word documents and extract relevant content.
|
||||
</Card>
|
||||
|
||||
<Card title="JSON Search Tool" icon="brackets-curly" href="/en/tools/file-document/jsonsearchtool">
|
||||
Parse and search through JSON files with advanced query capabilities.
|
||||
</Card>
|
||||
|
||||
<Card title="CSV Search Tool" icon="table" href="/en/tools/file-document/csvsearchtool">
|
||||
Process and search through CSV files, extract specific rows and columns.
|
||||
</Card>
|
||||
|
||||
<Card title="XML Search Tool" icon="code" href="/en/tools/file-document/xmlsearchtool">
|
||||
Parse XML files and search for specific elements and attributes.
|
||||
</Card>
|
||||
|
||||
<Card title="MDX Search Tool" icon="markdown" href="/en/tools/file-document/mdxsearchtool">
|
||||
Search through MDX files and extract content from documentation.
|
||||
</Card>
|
||||
|
||||
<Card title="TXT Search Tool" icon="file-lines" href="/en/tools/file-document/txtsearchtool">
|
||||
Search through plain text files with pattern matching capabilities.
|
||||
</Card>
|
||||
|
||||
<Card title="Directory Search Tool" icon="folder-open" href="/en/tools/file-document/directorysearchtool">
|
||||
Search for files and folders within directory structures.
|
||||
</Card>
|
||||
|
||||
<Card title="Directory Read Tool" icon="folder" href="/en/tools/file-document/directoryreadtool">
|
||||
Read and list directory contents, file structures, and metadata.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**
|
||||
|
||||
- **Document Processing**: Extract and analyze content from various file formats
|
||||
- **Data Import**: Read structured data from CSV, JSON, and XML files
|
||||
- **Content Search**: Find specific information within large document collections
|
||||
- **File Management**: Organize and manipulate files and directories
|
||||
- **Data Export**: Save processed results to various file formats
|
||||
|
||||
## **Quick Start Example**
|
||||
|
||||
```python
from crewai import Agent
from crewai_tools import FileReadTool, PDFSearchTool, JSONSearchTool

# Create tools
file_reader = FileReadTool()
pdf_searcher = PDFSearchTool()
json_processor = JSONSearchTool()

# Add to your agent
agent = Agent(
    role="Document Analyst",
    tools=[file_reader, pdf_searcher, json_processor],
    goal="Process and analyze various document types",
    backstory="An analyst who extracts information from documents",  # example value
)
```
|
||||
|
||||
## **Tips for Document Processing**
|
||||
|
||||
- **File Permissions**: Ensure your agent has proper read/write permissions
|
||||
- **Large Files**: Consider chunking for very large documents
|
||||
- **Format Support**: Check tool documentation for supported file formats
|
||||
- **Error Handling**: Implement proper error handling for corrupted or inaccessible files
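
The "Large Files" and "Error Handling" tips can be sketched with the standard library alone. The helper names, the 4096-character default, and the choice to return `None` on failure are assumptions made for illustration; none of them come from `crewai_tools`.

```python
from pathlib import Path
from typing import Iterator, Optional


def iter_chunks(text: str, chunk_size: int = 4096) -> Iterator[str]:
    """Yield consecutive slices of text, each at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


def safe_read(file_path: str) -> Optional[str]:
    """Return the file's text, or None if it is missing, unreadable, or not UTF-8."""
    try:
        return Path(file_path).read_text(encoding="utf-8")
    except (OSError, UnicodeDecodeError):
        # OSError covers missing files and permission problems;
        # UnicodeDecodeError covers binary or differently encoded content.
        return None
```

Returning `None` keeps the calling code simple; a pipeline that needs to distinguish a missing file from a corrupted one would re-raise or log the exception instead.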
|
||||
71
docs/en/tools/file-document/pdfsearchtool.mdx
Normal file
@@ -0,0 +1,71 @@
|
||||
---
|
||||
title: PDF RAG Search
|
||||
description: The `PDFSearchTool` is designed to search PDF files and return the most relevant results.
|
||||
icon: file-pdf
|
||||
---
|
||||
|
||||
# `PDFSearchTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The PDFSearchTool is a RAG tool designed for semantic searches within PDF content. It allows for inputting a search query and a PDF document, leveraging advanced search techniques to find relevant content efficiently.
|
||||
This capability makes it especially useful for extracting specific information from large PDF files quickly.
|
||||
|
||||
## Installation
|
||||
|
||||
To get started with the PDFSearchTool, first, ensure the crewai_tools package is installed with the following command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
Here's how to use the PDFSearchTool to search within a PDF document:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import PDFSearchTool
|
||||
|
||||
# Initialize the tool allowing for any PDF content search if the path is provided during execution
|
||||
tool = PDFSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific PDF path for exclusive search within that document
|
||||
tool = PDFSearchTool(pdf='path/to/your/document.pdf')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `pdf`: **Optional**. The PDF path for the search. Can be provided at initialization or within the `run` method's arguments. If provided at initialization, the tool confines its search to the specified document.
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
from crewai_tools import PDFSearchTool

tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
|
||||
80
docs/en/tools/file-document/txtsearchtool.mdx
Normal file
@@ -0,0 +1,80 @@
|
||||
---
|
||||
title: TXT RAG Search
|
||||
description: The `TXTSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a text file.
|
||||
icon: file-lines
|
||||
---
|
||||
|
||||
## Overview
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
This tool is used to perform a RAG (Retrieval-Augmented Generation) search within the content of a text file.
|
||||
It allows for semantic searching of a query within a specified text file's content,
|
||||
making it an invaluable resource for quickly extracting information or finding specific sections of text based on the query provided.
|
||||
|
||||
## Installation
|
||||
|
||||
To use the `TXTSearchTool`, you first need to install the `crewai_tools` package.
|
||||
This can be done using pip, a package manager for Python.
|
||||
Open your terminal or command prompt and enter the following command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
This command will download and install the TXTSearchTool along with any necessary dependencies.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the TXTSearchTool to search within a text file.
|
||||
It shows two ways to initialize the tool: without a file, so the agent can search any text file it learns about, and with a specific text file.
|
||||
|
||||
```python Code
|
||||
from crewai_tools import TXTSearchTool
|
||||
|
||||
# Initialize the tool to search within any text file's content
|
||||
# the agent learns about during its execution
|
||||
tool = TXTSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific text file,
|
||||
# so the agent can search within the given text file's content
|
||||
tool = TXTSearchTool(txt='path/to/text/file.txt')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
- `txt` (str): **Optional**. The path to the text file you want to search.
|
||||
This argument is only required if the tool was not initialized with a specific text file;
|
||||
otherwise, the search will be conducted within the initially provided text file.
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization.
|
||||
To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
from crewai_tools import TXTSearchTool

tool = TXTSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
|
||||
77
docs/en/tools/file-document/xmlsearchtool.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: XML RAG Search
|
||||
description: The `XMLSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of an XML file.
|
||||
icon: file-xml
|
||||
---
|
||||
|
||||
# `XMLSearchTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The XMLSearchTool is a RAG tool for conducting semantic searches within XML files.
|
||||
Ideal for users needing to parse and extract information from XML content efficiently, this tool supports inputting a search query and an optional XML file path.
|
||||
By specifying an XML path, users can target their search more precisely to the content of that file, thereby obtaining more relevant search outcomes.
|
||||
|
||||
## Installation
|
||||
|
||||
To start using the XMLSearchTool, you must first install the crewai_tools package. This can be easily done with the following command:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
Here are two examples demonstrating how to use the XMLSearchTool.
|
||||
The first example initializes the tool without predefining an XML path, providing flexibility in search scope, while the second restricts the search to a specific XML file.
|
||||
|
||||
```python Code
|
||||
from crewai_tools import XMLSearchTool
|
||||
|
||||
# Allow agents to search within any XML file's content
# as they learn about the file paths during execution
|
||||
tool = XMLSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool with a specific XML file path
|
||||
# for exclusive search within that document
|
||||
tool = XMLSearchTool(xml='path/to/your/xmlfile.xml')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `xml`: This is the path to the XML file you wish to search.
|
||||
It is an optional parameter during the tool's initialization but must be provided either at initialization or as part of the `run` method's arguments to execute a search.
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
from crewai_tools import XMLSearchTool

tool = XMLSearchTool(
    config=dict(
        llm=dict(
            provider="ollama",  # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google",  # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```
|
||||
120
docs/en/tools/overview.mdx
Normal file
@@ -0,0 +1,120 @@
|
||||
---
|
||||
title: "Tools Overview"
|
||||
description: "Discover CrewAI's extensive library of 40+ tools to supercharge your AI agents"
|
||||
icon: "toolbox"
|
||||
---
|
||||
|
||||
CrewAI provides an extensive library of pre-built tools to enhance your agents' capabilities. From file processing to web scraping, database queries to AI services - we've got you covered.
|
||||
|
||||
## **Tool Categories**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card
|
||||
title="File & Document"
|
||||
icon="file-check"
|
||||
href="/en/tools/file-document/overview"
|
||||
color="#3B82F6"
|
||||
>
|
||||
Read, write, and search through various file formats including PDF, DOCX, JSON, CSV, and more. Perfect for document processing workflows.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="Web Scraping & Browsing"
|
||||
icon="globe"
|
||||
href="/en/tools/web-scraping/overview"
|
||||
color="#10B981"
|
||||
>
|
||||
Extract data from websites, automate browser interactions, and scrape content at scale with tools like Firecrawl, Selenium, and more.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="Search & Research"
|
||||
icon="magnifying-glass"
|
||||
href="/en/tools/search-research/overview"
|
||||
color="#F59E0B"
|
||||
>
|
||||
Perform web searches, find code repositories, research YouTube content, and discover information across the internet.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="Database & Data"
|
||||
icon="database"
|
||||
href="/en/tools/database-data/overview"
|
||||
color="#8B5CF6"
|
||||
>
|
||||
Connect to SQL databases, vector stores, and data warehouses. Query MySQL, PostgreSQL, Snowflake, Qdrant, and Weaviate.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="AI & Machine Learning"
|
||||
icon="brain"
|
||||
href="/en/tools/ai-ml/overview"
|
||||
color="#EF4444"
|
||||
>
|
||||
Generate images with DALL-E, process vision tasks, integrate with LangChain, build RAG systems, and leverage code interpreters.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="Cloud & Storage"
|
||||
icon="cloud"
|
||||
href="/en/tools/cloud-storage/overview"
|
||||
color="#06B6D4"
|
||||
>
|
||||
Interact with cloud services including AWS S3, Amazon Bedrock, and other cloud storage and AI services.
|
||||
</Card>
|
||||
|
||||
<Card
|
||||
title="Automation & Integration"
|
||||
icon="bolt"
|
||||
href="/en/tools/automation/overview"
|
||||
color="#84CC16"
|
||||
>
|
||||
Automate workflows with Apify, Composio, and other integration platforms to connect your agents with external services.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Quick Access**
|
||||
|
||||
Need a specific tool? Here are some popular choices:
|
||||
|
||||
<CardGroup cols={3}>
|
||||
<Card title="RAG Tool" icon="image" href="/en/tools/ai-ml/ragtool">
|
||||
Implement Retrieval-Augmented Generation
|
||||
</Card>
|
||||
<Card title="Serper Dev" icon="book-atlas" href="/en/tools/search-research/serperdevtool">
|
||||
Google search API
|
||||
</Card>
|
||||
<Card title="File Read" icon="file" href="/en/tools/file-document/filereadtool">
|
||||
Read any file type
|
||||
</Card>
|
||||
<Card title="Scrape Website" icon="globe" href="/en/tools/web-scraping/scrapewebsitetool">
|
||||
Extract web content
|
||||
</Card>
|
||||
<Card title="Code Interpreter" icon="code" href="/en/tools/ai-ml/codeinterpretertool">
|
||||
Execute Python code
|
||||
</Card>
|
||||
<Card title="S3 Reader" icon="cloud" href="/en/tools/cloud-storage/s3readertool">
|
||||
Access AWS S3 files
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Getting Started**
|
||||
|
||||
To use any tool in your CrewAI project:
|
||||
|
||||
1. **Import** the tool in your crew configuration
|
||||
2. **Add** it to your agent's tools list
|
||||
3. **Configure** any required API keys or settings
|
||||
|
||||
```python
from crewai import Agent
from crewai_tools import FileReadTool, SerperDevTool

# Add tools to your agent
agent = Agent(
    role="Research Analyst",
    tools=[FileReadTool(), SerperDevTool()],
    # ... other configuration
)
```
|
||||
|
||||
Ready to explore? Pick a category above to discover tools that fit your use case!
|
||||
96
docs/en/tools/search-research/bravesearchtool.mdx
Normal file
@@ -0,0 +1,96 @@
|
||||
---
|
||||
title: Brave Search
|
||||
description: The `BraveSearchTool` is designed to search the internet using the Brave Search API.
|
||||
icon: searchengin
|
||||
---
|
||||
|
||||
# `BraveSearchTool`
|
||||
|
||||
## Description
|
||||
|
||||
This tool is designed to perform web searches using the Brave Search API. It allows you to search the internet with a specified query and retrieve relevant results. The tool supports customizable result counts and country-specific searches.
|
||||
|
||||
## Installation
|
||||
|
||||
To incorporate this tool into your project, follow the installation instructions below:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `BraveSearchTool`, follow these steps:
|
||||
|
||||
1. **Package Installation**: Confirm that the `crewai[tools]` package is installed in your Python environment.
|
||||
2. **API Key Acquisition**: Acquire a Brave Search API key by registering at [Brave Search API](https://api.search.brave.com/app/keys).
|
||||
3. **Environment Configuration**: Store your obtained API key in an environment variable named `BRAVE_API_KEY` to facilitate its use by the tool.
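
Step 3 can be checked before the tool is created. The sketch below is one way to fail early with a clear message; `require_env` is a hypothetical helper, and only the variable name `BRAVE_API_KEY` comes from the steps above.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

Calling `require_env("BRAVE_API_KEY")` at start-up, before constructing `BraveSearchTool()`, reports a missing key with an explicit message.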
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a search with a given query:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import BraveSearchTool
|
||||
|
||||
# Initialize the tool for internet searching capabilities
|
||||
tool = BraveSearchTool()
|
||||
|
||||
# Execute a search
|
||||
results = tool.run(search_query="CrewAI agent framework")
|
||||
print(results)
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `BraveSearchTool` accepts the following parameters:
|
||||
|
||||
- **search_query**: Mandatory. The search query you want to use to search the internet.
|
||||
- **country**: Optional. Specify the country for the search results. Default is an empty string.
|
||||
- **n_results**: Optional. Number of search results to return. Default is `10`.
|
||||
- **save_file**: Optional. Whether to save the search results to a file. Default is `False`.
|
||||
|
||||
## Example with Parameters
|
||||
|
||||
Here is an example demonstrating how to use the tool with additional parameters:
|
||||
|
||||
```python Code
from crewai_tools import BraveSearchTool

# Initialize the tool with custom parameters
tool = BraveSearchTool(
    country="US",
    n_results=5,
    save_file=True
)

# Execute a search
results = tool.run(search_query="Latest AI developments")
print(results)
```
|
||||
|
||||
## Agent Integration Example
|
||||
|
||||
Here's how to integrate the `BraveSearchTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent
|
||||
from crewai.project import agent
|
||||
from crewai_tools import BraveSearchTool
|
||||
|
||||
# Initialize the tool
|
||||
brave_search_tool = BraveSearchTool()
|
||||
|
||||
# Define an agent with the BraveSearchTool
|
||||
@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[brave_search_tool]
    )
|
||||
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
By integrating the `BraveSearchTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications. The tool provides a simple interface to the powerful Brave Search API, making it easy to retrieve and process search results programmatically. By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.
|
||||
84
docs/en/tools/search-research/codedocssearchtool.mdx
Normal file
@@ -0,0 +1,84 @@
|
||||
---
|
||||
title: Code Docs RAG Search
|
||||
description: The `CodeDocsSearchTool` is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within code documentation.
|
||||
icon: code
|
||||
---
|
||||
|
||||
# `CodeDocsSearchTool`
|
||||
|
||||
<Note>
|
||||
**Experimental**: We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The CodeDocsSearchTool is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within code documentation.
|
||||
It enables users to efficiently find specific information or topics within code documentation. By providing a `docs_url` during initialization,
|
||||
the tool narrows down the search to that particular documentation site. Alternatively, without a specific `docs_url`,
|
||||
it searches across a wide array of code documentation known or discovered throughout its execution, making it versatile for various documentation search needs.
|
||||
|
||||
## Installation
|
||||
|
||||
To start using the CodeDocsSearchTool, first, install the crewai_tools package via pip:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
Utilize the CodeDocsSearchTool as follows to conduct searches within code documentation:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import CodeDocsSearchTool
|
||||
|
||||
# To search any code documentation content
|
||||
# if the URL is known or discovered during its execution:
|
||||
tool = CodeDocsSearchTool()
|
||||
|
||||
# OR
|
||||
|
||||
# To specifically focus your search on a given documentation site
|
||||
# by providing its URL:
|
||||
tool = CodeDocsSearchTool(docs_url='https://docs.example.com/reference')
|
||||
```
|
||||
<Note>
|
||||
Substitute 'https://docs.example.com/reference' with your target documentation URL. The search query is supplied later, when the tool is run.
|
||||
</Note>
|
||||
|
||||
## Arguments
|
||||
|
||||
The following parameters can be used to customize the `CodeDocsSearchTool`'s behavior:
|
||||
|
||||
| Argument | Type | Description |
|
||||
|:---------------|:---------|:-------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| **docs_url** | `string` | _Optional_. Specifies the URL of the code documentation to be searched. |
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
tool = CodeDocsSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
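The nested dictionary above can also be assembled programmatically. The following is only a sketch: `build_rag_config` is a hypothetical helper, not part of `crewai_tools`, and it simply reproduces the `llm` / `embedder` shape shown in the example.

```python
from typing import Any


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
    **embedder_options: Any,
) -> dict[str, Any]:
    """Assemble a config dictionary in the nested shape shown above.

    Hypothetical helper for illustration only.
    """
    return {
        "llm": {"provider": llm_provider, "config": {"model": llm_model}},
        "embedder": {
            "provider": embedder_provider,
            "config": {"model": embedder_model, **embedder_options},
        },
    }


# Same values as the example above, built through the helper.
config = build_rag_config(
    "ollama",
    "llama2",
    "google",
    "models/embedding-001",
    task_type="retrieval_document",
)
```

The resulting dictionary could then be passed as `CodeDocsSearchTool(config=config)`.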
|
||||
52
docs/en/tools/search-research/exasearchtool.mdx
Normal file
@@ -0,0 +1,52 @@
|
||||
---
|
||||
title: EXA Search Web Loader
|
||||
description: The `EXASearchTool` is designed to perform a semantic search for a specified query from a text's content across the internet.
|
||||
icon: globe-pointer
|
||||
---
|
||||
|
||||
# `EXASearchTool`
|
||||
|
||||
## Description
|
||||
|
||||
The EXASearchTool performs a semantic search across the internet for a specified query.
|
||||
It utilizes the [exa.ai](https://exa.ai/) API to fetch and display the most relevant search results based on the query provided by the user.
|
||||
|
||||
## Installation
|
||||
|
||||
To incorporate this tool into your project, follow the installation instructions below:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a search with a given query:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import EXASearchTool
|
||||
|
||||
# Initialize the tool for internet searching capabilities
|
||||
tool = EXASearchTool()
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the EXASearchTool, follow these steps:
|
||||
|
||||
<Steps>
|
||||
<Step title="Package Installation">
|
||||
Confirm that the `crewai[tools]` package is installed in your Python environment.
|
||||
</Step>
|
||||
<Step title="API Key Acquisition">
|
||||
Acquire an [exa.ai](https://exa.ai/) API key by registering for a free account at [exa.ai](https://exa.ai/).
|
||||
</Step>
|
||||
<Step title="Environment Configuration">
|
||||
Store your obtained API key in an environment variable named `EXA_API_KEY` to facilitate its use by the tool.
|
||||
</Step>
|
||||
</Steps>
|
||||
|
||||
## Conclusion
|
||||
|
||||
By integrating the `EXASearchTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications.
|
||||
By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.
|
||||
83
docs/en/tools/search-research/githubsearchtool.mdx
Normal file
@@ -0,0 +1,83 @@
|
||||
---
|
||||
title: Github Search
|
||||
description: The `GithubSearchTool` is designed for semantic searches within GitHub repositories, covering code, pull requests, issues, and repository information.
|
||||
icon: github
|
||||
---
|
||||
|
||||
# `GithubSearchTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The GithubSearchTool is a Retrieval-Augmented Generation (RAG) tool specifically designed for conducting semantic searches within GitHub repositories. Utilizing advanced semantic search capabilities, it sifts through code, pull requests, issues, and repositories, making it an essential tool for developers, researchers, or anyone in need of precise information from GitHub.
|
||||
|
||||
## Installation
|
||||
|
||||
To use the GithubSearchTool, first ensure the crewai_tools package is installed in your Python environment:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
This command installs the necessary package to run the GithubSearchTool along with any other tools included in the crewai_tools package.
|
||||
|
||||
## Example
|
||||
|
||||
Here’s how you can use the GithubSearchTool to perform semantic searches within a GitHub repository:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import GithubSearchTool
|
||||
|
||||
# Initialize the tool for semantic searches within a specific GitHub repository
|
||||
tool = GithubSearchTool(
|
||||
github_repo='https://github.com/example/repo',
|
||||
gh_token='your_github_personal_access_token',
|
||||
content_types=['code', 'issue'] # Options: code, repo, pr, issue
|
||||
)
|
||||
|
||||
# OR
|
||||
|
||||
# Initialize the tool without a fixed repository, so the agent can search any repository it learns about during its execution
|
||||
tool = GithubSearchTool(
|
||||
gh_token='your_github_personal_access_token',
|
||||
content_types=['code', 'issue'] # Options: code, repo, pr, issue
|
||||
)
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
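- `github_repo` : The URL of the GitHub repository where the search will be conducted. Provide it at initialization to restrict the search to one repository; as the second example above shows, it can be omitted there, in which case the agent must supply the target repository when it uses the tool.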
|
||||
- `gh_token` : Your GitHub Personal Access Token (PAT) required for authentication. You can create one in your GitHub account settings under Developer Settings > Personal Access Tokens.
|
||||
- `content_types` : Specifies the types of content to include in your search. You must provide a list of content types from the following options: `code` for searching within the code,
|
||||
`repo` for searching within the repository's general information, `pr` for searching within pull requests, and `issue` for searching within issues.
|
||||
This field is mandatory and allows tailoring the search to specific content types within the GitHub repository.
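To make the allowed `content_types` values concrete, here is a small sketch that checks a list against the four options named above before it is handed to the tool. `validate_content_types` is a hypothetical helper; `GithubSearchTool` may perform its own validation.

```python
ALLOWED_CONTENT_TYPES = ("code", "repo", "pr", "issue")


def validate_content_types(content_types: list[str]) -> list[str]:
    """Check a content_types list against the options documented above.

    Hypothetical helper for illustration; not part of crewai_tools.
    """
    if not content_types:
        raise ValueError("content_types must contain at least one entry")
    unknown = [c for c in content_types if c not in ALLOWED_CONTENT_TYPES]
    if unknown:
        raise ValueError(f"Unsupported content types: {unknown}")
    # Drop duplicates while preserving the caller's order.
    return list(dict.fromkeys(content_types))


content_types = validate_content_types(["code", "issue", "code"])
```

The validated list could then be passed as the `content_types` argument shown in the example.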
|
||||
|
||||
## Custom model and embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
tool = GithubSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
```
|
||||
112
docs/en/tools/search-research/linkupsearchtool.mdx
Normal file
@@ -0,0 +1,112 @@
|
||||
---
|
||||
title: Linkup Search Tool
|
||||
description: The `LinkupSearchTool` enables querying the Linkup API for contextual information.
|
||||
icon: link
|
||||
---
|
||||
|
||||
# `LinkupSearchTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `LinkupSearchTool` provides the ability to query the Linkup API for contextual information and retrieve structured results. This tool is ideal for enriching workflows with up-to-date and reliable information from Linkup, allowing agents to access relevant data during their tasks.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the Linkup SDK:
|
||||
|
||||
```shell
|
||||
uv add linkup-sdk
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `LinkupSearchTool`, follow these steps:
|
||||
|
||||
1. **API Key**: Obtain a Linkup API key.
|
||||
2. **Environment Setup**: Set up your environment with the API key.
|
||||
3. **Install SDK**: Install the Linkup SDK using the command above.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and use it in an agent:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import LinkupSearchTool
|
||||
from crewai import Agent
from crewai.project import agent
|
||||
import os
|
||||
|
||||
# Initialize the tool with your API key
|
||||
linkup_tool = LinkupSearchTool(api_key=os.getenv("LINKUP_API_KEY"))
|
||||
|
||||
# Define an agent that uses the tool
|
||||
@agent
def researcher(self) -> Agent:
    '''
    This agent uses the LinkupSearchTool to retrieve contextual information
    from the Linkup API.
    '''
    return Agent(
        config=self.agents_config["researcher"],
        tools=[linkup_tool]
    )
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `LinkupSearchTool` accepts the following parameters:
|
||||
|
||||
### Constructor Parameters
|
||||
- **api_key**: Required. Your Linkup API key.
|
||||
|
||||
### Run Parameters
|
||||
- **query**: Required. The search term or phrase.
|
||||
- **depth**: Optional. The search depth. Default is "standard".
|
||||
- **output_type**: Optional. The type of output. Default is "searchResults".
|
||||
|
||||
## Advanced Usage
|
||||
|
||||
You can customize the search parameters for more specific results:
|
||||
|
||||
```python Code
|
||||
# Perform a search with custom parameters
|
||||
results = linkup_tool.run(
|
||||
query="Women Nobel Prize Physics",
|
||||
depth="deep",
|
||||
output_type="searchResults"
|
||||
)
|
||||
```
|
||||
|
||||
## Return Format
|
||||
|
||||
The tool returns results in the following format:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"results": [
|
||||
{
|
||||
"name": "Result Title",
|
||||
"url": "https://example.com/result",
|
||||
"content": "Content of the result..."
|
||||
},
|
||||
// Additional results...
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
If an error occurs, the response will be:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": false,
|
||||
"error": "Error message"
|
||||
}
|
||||
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The tool gracefully handles API errors and provides structured feedback. If the API request fails, the tool will return a dictionary with `success: false` and an error message.
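A caller can branch on the `success` flag before reading the results. The sketch below works on plain dictionaries in the two shapes shown under Return Format; `extract_results` is a hypothetical helper, and raising `RuntimeError` on failure is one possible choice rather than the tool's own behavior.

```python
from typing import Any


def extract_results(response: dict[str, Any]) -> list[dict[str, str]]:
    """Return the result entries, or raise if the tool reported a failure.

    Hypothetical helper; it assumes the response shapes documented above.
    """
    if not response.get("success"):
        raise RuntimeError(response.get("error", "Unknown error"))
    return [
        {
            "name": item.get("name", ""),
            "url": item.get("url", ""),
            "content": item.get("content", ""),
        }
        for item in response.get("results", [])
    ]


# Sample payloads copied from the Return Format section.
ok_response = {
    "success": True,
    "results": [
        {
            "name": "Result Title",
            "url": "https://example.com/result",
            "content": "Content of the result...",
        }
    ],
}
failed_response = {"success": False, "error": "Error message"}

for entry in extract_results(ok_response):
    print(entry["name"], entry["url"])
```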
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `LinkupSearchTool` provides a seamless way to integrate Linkup's contextual information retrieval capabilities into your CrewAI agents. By leveraging this tool, agents can access relevant and up-to-date information to enhance their decision-making and task execution.
|
||||
71
docs/en/tools/search-research/overview.mdx
Normal file
@@ -0,0 +1,71 @@
|
||||
---
|
||||
title: "Overview"
|
||||
description: "Perform web searches, find repositories, and research information across the internet"
|
||||
icon: "face-smile"
|
||||
---
|
||||
|
||||
These tools enable your agents to search the web, research topics, and find information across various platforms including search engines, GitHub, and YouTube.
|
||||
|
||||
## **Available Tools**
|
||||
|
||||
<CardGroup cols={2}>
|
||||
<Card title="Serper Dev Tool" icon="google" href="/en/tools/search-research/serperdevtool">
|
||||
Google search API integration for comprehensive web search capabilities.
|
||||
</Card>
|
||||
|
||||
<Card title="Brave Search Tool" icon="shield" href="/en/tools/search-research/bravesearchtool">
|
||||
Privacy-focused search with Brave's independent search index.
|
||||
</Card>
|
||||
|
||||
<Card title="Exa Search Tool" icon="magnifying-glass" href="/en/tools/search-research/exasearchtool">
|
||||
AI-powered search for finding specific and relevant content.
|
||||
</Card>
|
||||
|
||||
<Card title="LinkUp Search Tool" icon="link" href="/en/tools/search-research/linkupsearchtool">
|
||||
Real-time web search with fresh content indexing.
|
||||
</Card>
|
||||
|
||||
<Card title="GitHub Search Tool" icon="github" href="/en/tools/search-research/githubsearchtool">
|
||||
Search GitHub repositories, code, issues, and documentation.
|
||||
</Card>
|
||||
|
||||
<Card title="Website Search Tool" icon="globe" href="/en/tools/search-research/websitesearchtool">
|
||||
Search within specific websites and domains.
|
||||
</Card>
|
||||
|
||||
<Card title="Code Docs Search Tool" icon="code" href="/en/tools/search-research/codedocssearchtool">
|
||||
Search through code documentation and technical resources.
|
||||
</Card>
|
||||
|
||||
<Card title="YouTube Channel Search" icon="youtube" href="/en/tools/search-research/youtubechannelsearchtool">
|
||||
Search YouTube channels for specific content and creators.
|
||||
</Card>
|
||||
|
||||
<Card title="YouTube Video Search" icon="play" href="/en/tools/search-research/youtubevideosearchtool">
|
||||
Find and analyze YouTube videos by topic, keyword, or criteria.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**
|
||||
|
||||
- **Market Research**: Search for industry trends and competitor analysis
|
||||
- **Content Discovery**: Find relevant articles, videos, and resources
|
||||
- **Code Research**: Search repositories and documentation for solutions
|
||||
- **Lead Generation**: Research companies and individuals
|
||||
- **Academic Research**: Find scholarly articles and technical papers
|
||||
|
||||
```python
|
||||
from crewai import Agent
from crewai_tools import SerperDevTool, GithubSearchTool, YoutubeVideoSearchTool

# Create research tools
web_search = SerperDevTool()
# GithubSearchTool needs a GitHub Personal Access Token for authentication
code_search = GithubSearchTool(gh_token="your_github_personal_access_token")
video_research = YoutubeVideoSearchTool()

# Add to your agent
agent = Agent(
    role="Research Analyst",
    goal="Gather comprehensive information on any topic",
    backstory="An analyst who combines web, code, and video sources.",
    tools=[web_search, code_search, video_research],
)
|
||||
```
|
||||
112
docs/en/tools/search-research/serperdevtool.mdx
Normal file
@@ -0,0 +1,112 @@
|
||||
---
|
||||
title: Google Serper Search
|
||||
description: The `SerperDevTool` is designed to search the internet and return the most relevant results.
|
||||
icon: google
|
||||
---
|
||||
|
||||
# `SerperDevTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
This tool searches the internet for a given query. It uses the [serper.dev](https://serper.dev) API to fetch and display the most relevant search results for the query provided by the user.
|
||||
|
||||
## Installation
|
||||
|
||||
To incorporate this tool into your project, follow the installation instructions below:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to initialize the tool and execute a search with a given query:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import SerperDevTool
|
||||
|
||||
# Initialize the tool for internet searching capabilities
|
||||
tool = SerperDevTool()
|
||||
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `SerperDevTool`, follow these steps:
|
||||
|
||||
1. **Package Installation**: Confirm that the `crewai[tools]` package is installed in your Python environment.
|
||||
2. **API Key Acquisition**: Acquire a `serper.dev` API key by registering for a free account at `serper.dev`.
|
||||
3. **Environment Configuration**: Store your obtained API key in an environment variable named `SERPER_API_KEY` to facilitate its use by the tool.
|
||||
|
||||
## Parameters
|
||||
|
||||
The `SerperDevTool` comes with several parameters that will be passed to the API:
|
||||
|
||||
- **search_url**: The URL endpoint for the search API. (Default is `https://google.serper.dev/search`)
|
||||
|
||||
- **country**: Optional. Specify the country for the search results.
|
||||
- **location**: Optional. Specify the location for the search results.
|
||||
- **locale**: Optional. Specify the locale for the search results.
|
||||
- **n_results**: Number of search results to return. Default is `10`.
|
||||
|
||||
The values for `country`, `location`, `locale` and `search_url` can be found on the [Serper Playground](https://serper.dev/playground).
|
||||
|
||||
## Example with Parameters
|
||||
|
||||
Here is an example demonstrating how to use the tool with additional parameters:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import SerperDevTool
|
||||
|
||||
tool = SerperDevTool(
|
||||
search_url="https://google.serper.dev/scholar",
|
||||
n_results=2,
|
||||
)
|
||||
|
||||
print(tool.run(search_query="ChatGPT"))
|
||||
|
||||
# Using Tool: Search the internet
|
||||
|
||||
# Search results: Title: Role of chat gpt in public health
|
||||
# Link: https://link.springer.com/article/10.1007/s10439-023-03172-7
|
||||
# Snippet: … ChatGPT in public health. In this overview, we will examine the potential uses of ChatGPT in
|
||||
# ---
|
||||
# Title: Potential use of chat gpt in global warming
|
||||
# Link: https://link.springer.com/article/10.1007/s10439-023-03171-8
|
||||
# Snippet: … as ChatGPT, have the potential to play a critical role in advancing our understanding of climate
|
||||
# ---
|
||||
|
||||
```
|
||||
|
||||
```python Code
|
||||
from crewai_tools import SerperDevTool
|
||||
|
||||
tool = SerperDevTool(
|
||||
country="fr",
|
||||
locale="fr",
|
||||
location="Paris, Paris, Ile-de-France, France",
|
||||
n_results=2,
|
||||
)
|
||||
|
||||
print(tool.run(search_query="Jeux Olympiques"))
|
||||
|
||||
# Using Tool: Search the internet
|
||||
|
||||
# Search results: Title: Jeux Olympiques de Paris 2024 - Actualités, calendriers, résultats
|
||||
# Link: https://olympics.com/fr/paris-2024
|
||||
# Snippet: Quels sont les sports présents aux Jeux Olympiques de Paris 2024 ? · Athlétisme · Aviron · Badminton · Basketball · Basketball 3x3 · Boxe · Breaking · Canoë ...
|
||||
# ---
|
||||
# Title: Billetterie Officielle de Paris 2024 - Jeux Olympiques et Paralympiques
|
||||
# Link: https://tickets.paris2024.org/
|
||||
# Snippet: Achetez vos billets exclusivement sur le site officiel de la billetterie de Paris 2024 pour participer au plus grand événement sportif au monde.
|
||||
# ---
|
||||
```
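If the tool returns plain text in the `Title:` / `Link:` / `Snippet:` layout shown in the comments above, it can be split into structured entries. This is a sketch under that assumption only: the actual return type may differ between `crewai_tools` versions, `parse_search_text` is a hypothetical helper, and the sample text is made up for illustration.

```python
def parse_search_text(text: str) -> list[dict[str, str]]:
    """Split 'Title/Link/Snippet' text blocks separated by '---' lines.

    Hypothetical helper; assumes the text layout shown in the comments above.
    """
    entries: list[dict[str, str]] = []
    current: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line == "---":
            # A separator line closes the current entry.
            if current:
                entries.append(current)
            current = {}
            continue
        if line.startswith("Search results:"):
            line = line[len("Search results:"):].strip()
        for field in ("Title", "Link", "Snippet"):
            prefix = f"{field}:"
            if line.startswith(prefix):
                current[field.lower()] = line[len(prefix):].strip()
                break
    if current:
        entries.append(current)
    return entries


# Made-up sample in the same layout as the output shown above.
sample = """Search results: Title: First result
Link: https://example.com/first
Snippet: Text of the first snippet
---
Title: Second result
Link: https://example.com/second
Snippet: Text of the second snippet
---
"""

parsed = parse_search_text(sample)
```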
|
||||
|
||||
## Conclusion
|
||||
|
||||
By integrating the `SerperDevTool` into Python projects, users gain the ability to conduct real-time, relevant searches across the internet directly from their applications.
|
||||
The updated parameters allow for more customized and localized search results. By adhering to the setup and usage guidelines provided, incorporating this tool into projects is streamlined and straightforward.
|
||||
77
docs/en/tools/search-research/websitesearchtool.mdx
Normal file
@@ -0,0 +1,77 @@
|
||||
---
|
||||
title: Website RAG Search
|
||||
description: The `WebsiteSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a website.
|
||||
icon: globe-stand
|
||||
---
|
||||
|
||||
# `WebsiteSearchTool`
|
||||
|
||||
<Note>
|
||||
The WebsiteSearchTool is currently in an experimental phase. We are actively working on incorporating this tool into our suite of offerings and will update the documentation accordingly.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The WebsiteSearchTool is designed as a concept for conducting semantic searches within the content of websites.
|
||||
It aims to leverage advanced machine learning models like Retrieval-Augmented Generation (RAG) to navigate and extract information from specified URLs efficiently.
|
||||
This tool intends to offer flexibility, allowing users to perform searches across any website or focus on specific websites of interest.
|
||||
Please note, the current implementation details of the WebsiteSearchTool are under development, and its functionalities as described may not yet be accessible.
|
||||
|
||||
## Installation
|
||||
|
||||
To prepare your environment for when the WebsiteSearchTool becomes available, you can install the foundational package with:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
This command installs the necessary dependencies to ensure that once the tool is fully integrated, users can start using it immediately.
|
||||
|
||||
## Example Usage
|
||||
|
||||
Below are examples of how the WebsiteSearchTool could be utilized in different scenarios. Please note, these examples are illustrative and represent planned functionality:
|
||||
|
||||
```python Code
|
||||
from crewai_tools import WebsiteSearchTool
|
||||
|
||||
# Example of initiating tool that agents can use
|
||||
# to search across any discovered websites
|
||||
tool = WebsiteSearchTool()
|
||||
|
||||
# Example of limiting the search to the content of a specific website,
|
||||
# so now agents can only search within that website
|
||||
tool = WebsiteSearchTool(website='https://example.com')
|
||||
```
|
||||
|
||||
## Arguments
|
||||
|
||||
- `website`: An optional argument intended to specify the website URL for focused searches. This argument is designed to enhance the tool's flexibility by allowing targeted searches when necessary.
|
||||
|
||||
## Customization Options
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
|
||||
```python Code
|
||||
tool = WebsiteSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
194
docs/en/tools/search-research/youtubechannelsearchtool.mdx
Normal file
@@ -0,0 +1,194 @@
|
||||
---
|
||||
title: YouTube Channel RAG Search
|
||||
description: The `YoutubeChannelSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a Youtube channel.
|
||||
icon: youtube
|
||||
---
|
||||
|
||||
# `YoutubeChannelSearchTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
This tool is designed to perform semantic searches within a specific Youtube channel's content.
|
||||
Leveraging the RAG (Retrieval-Augmented Generation) methodology, it provides relevant search results,
|
||||
making it invaluable for extracting information or finding specific content without the need to manually sift through videos.
|
||||
It streamlines the search process within Youtube channels, catering to researchers, content creators, and viewers seeking specific information or topics.
|
||||
|
||||
## Installation
|
||||
|
||||
To utilize the YoutubeChannelSearchTool, the `crewai_tools` package must be installed. Execute the following command in your shell to install:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `YoutubeChannelSearchTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew
|
||||
from crewai_tools import YoutubeChannelSearchTool
|
||||
|
||||
# Initialize the tool for general YouTube channel searches
|
||||
youtube_channel_tool = YoutubeChannelSearchTool()
|
||||
|
||||
# Define an agent that uses the tool
|
||||
channel_researcher = Agent(
|
||||
role="Channel Researcher",
|
||||
goal="Extract relevant information from YouTube channels",
|
||||
backstory="An expert researcher who specializes in analyzing YouTube channel content.",
|
||||
tools=[youtube_channel_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to search for information in a specific channel
|
||||
research_task = Task(
|
||||
description="Search for information about machine learning tutorials in the YouTube channel {youtube_channel_handle}",
|
||||
expected_output="A summary of the key machine learning tutorials available on the channel.",
|
||||
agent=channel_researcher,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(agents=[channel_researcher], tasks=[research_task])
|
||||
result = crew.kickoff(inputs={"youtube_channel_handle": "@exampleChannel"})
|
||||
```
|
||||
|
||||
You can also initialize the tool with a specific YouTube channel handle:
|
||||
|
||||
```python Code
|
||||
# Initialize the tool with a specific YouTube channel handle
|
||||
youtube_channel_tool = YoutubeChannelSearchTool(
|
||||
youtube_channel_handle='@exampleChannel'
|
||||
)
|
||||
|
||||
# Define an agent that uses the tool
|
||||
channel_researcher = Agent(
|
||||
role="Channel Researcher",
|
||||
goal="Extract relevant information from a specific YouTube channel",
|
||||
backstory="An expert researcher who specializes in analyzing YouTube channel content.",
|
||||
tools=[youtube_channel_tool],
|
||||
verbose=True,
|
||||
)
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `YoutubeChannelSearchTool` accepts the following parameters:
|
||||
|
||||
- **youtube_channel_handle**: Optional. The handle of the YouTube channel to search within. If provided during initialization, the agent won't need to specify it when using the tool. If the handle doesn't start with '@', it will be automatically added.
|
||||
- **config**: Optional. Configuration for the underlying RAG system, including LLM and embedder settings.
|
||||
- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.
|
||||
|
||||
When using the tool with an agent, the agent will need to provide:
|
||||
|
||||
- **search_query**: Required. The search query to find relevant information in the channel content.
|
||||
- **youtube_channel_handle**: Required only if not provided during initialization. The handle of the YouTube channel to search within.
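The handle rule described above (an `@` is added when missing) can be illustrated with a stand-alone function. This is a sketch, not the tool's code: `normalize_handle` is a hypothetical name, and the whitespace stripping and empty-string check are added assumptions.

```python
def normalize_handle(handle: str) -> str:
    """Prefix a YouTube channel handle with '@' if it is missing.

    Hypothetical helper mirroring the behavior described in the parameter list.
    """
    handle = handle.strip()
    if not handle:
        raise ValueError("handle must not be empty")
    return handle if handle.startswith("@") else f"@{handle}"


handles = [normalize_handle(h) for h in ("exampleChannel", "@exampleChannel")]
```

Both inputs end up as the same handle, so either form can be used when configuring the tool.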
|
||||
|
||||
## Custom Model and Embeddings
|
||||
|
||||
By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:
|
||||
|
||||
```python Code
|
||||
youtube_channel_tool = YoutubeChannelSearchTool(
|
||||
config=dict(
|
||||
llm=dict(
|
||||
provider="ollama", # or google, openai, anthropic, llama2, ...
|
||||
config=dict(
|
||||
model="llama2",
|
||||
# temperature=0.5,
|
||||
# top_p=1,
|
||||
# stream=True,
|
||||
),
|
||||
),
|
||||
embedder=dict(
|
||||
provider="google", # or openai, ollama, ...
|
||||
config=dict(
|
||||
model="models/embedding-001",
|
||||
task_type="retrieval_document",
|
||||
# title="Embeddings",
|
||||
),
|
||||
),
|
||||
)
|
||||
)
|
||||
```
|
||||
|
||||
## Agent Integration Example

Here's a more detailed example of how to integrate the `YoutubeChannelSearchTool` with a CrewAI agent:

```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeChannelSearchTool

# Initialize the tool
youtube_channel_tool = YoutubeChannelSearchTool()

# Define an agent that uses the tool
channel_researcher = Agent(
    role="Channel Researcher",
    goal="Extract and analyze information from YouTube channels",
    backstory="""You are an expert channel researcher who specializes in extracting
    and analyzing information from YouTube channels. You have a keen eye for detail
    and can quickly identify key points and insights from video content across an entire channel.""",
    tools=[youtube_channel_tool],
    verbose=True,
)

# Create a task for the agent
research_task = Task(
    description="""
    Search for information about data science projects and tutorials
    in the YouTube channel {youtube_channel_handle}.

    Focus on:
    1. Key data science techniques covered
    2. Popular tutorial series
    3. Most viewed or recommended videos

    Provide a comprehensive summary of these points.
    """,
    expected_output="A detailed summary of data science content available on the channel.",
    agent=channel_researcher,
)

# Run the task
crew = Crew(agents=[channel_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_channel_handle": "@exampleDataScienceChannel"})
```

## Implementation Details

The `YoutubeChannelSearchTool` is implemented as a subclass of `RagTool`, which provides the base functionality for Retrieval-Augmented Generation:

```python Code
class YoutubeChannelSearchTool(RagTool):
    name: str = "Search a Youtube Channels content"
    description: str = "A tool that can be used to semantic search a query from a Youtube Channels content."
    args_schema: Type[BaseModel] = YoutubeChannelSearchToolSchema

    def __init__(self, youtube_channel_handle: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        if youtube_channel_handle is not None:
            kwargs["data_type"] = DataType.YOUTUBE_CHANNEL
            self.add(youtube_channel_handle)
            self.description = f"A tool that can be used to semantic search a query the {youtube_channel_handle} Youtube Channels content."
            self.args_schema = FixedYoutubeChannelSearchToolSchema
            self._generate_description()

    def add(
        self,
        youtube_channel_handle: str,
        **kwargs: Any,
    ) -> None:
        if not youtube_channel_handle.startswith("@"):
            youtube_channel_handle = f"@{youtube_channel_handle}"
        super().add(youtube_channel_handle, **kwargs)
```

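The handle normalization performed in `add` can be illustrated in isolation. The sketch below is a standalone helper (the name `normalize_channel_handle` is hypothetical and not part of `crewai_tools`) that mirrors the prefixing logic shown above:

```python
def normalize_channel_handle(handle: str) -> str:
    """Return the handle with a leading "@", adding one only if it is missing."""
    # Mirrors the check in `add`: handles that already start with "@" are unchanged.
    if not handle.startswith("@"):
        return f"@{handle}"
    return handle


if __name__ == "__main__":
    print(normalize_channel_handle("exampleDataScienceChannel"))
    print(normalize_channel_handle("@exampleDataScienceChannel"))
```

Because of this step, a handle should be accepted by the tool with or without the `@` prefix.
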
## Conclusion

The `YoutubeChannelSearchTool` provides a powerful way to search and extract information from YouTube channel content using RAG techniques. By enabling agents to search across an entire channel's videos, it facilitates information extraction and analysis tasks that would otherwise be difficult to perform. This tool is particularly useful for research, content analysis, and knowledge extraction from YouTube channels.

187
docs/en/tools/search-research/youtubevideosearchtool.mdx
Normal file
@@ -0,0 +1,187 @@
---
title: YouTube Video RAG Search
description: The `YoutubeVideoSearchTool` is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a YouTube video.
icon: youtube
---

# `YoutubeVideoSearchTool`

<Note>
  We are still working on improving tools, so there might be unexpected behavior or changes in the future.
</Note>

## Description

This tool is part of the `crewai_tools` package and is designed to perform semantic searches within YouTube video content, utilizing Retrieval-Augmented Generation (RAG) techniques.
It is one of several "Search" tools in the package that leverage RAG for different sources.
The `YoutubeVideoSearchTool` allows for flexibility in searches: users can search across any YouTube video content without specifying a video URL, or they can target their search to a specific YouTube video by providing its URL.

## Installation

To utilize the `YoutubeVideoSearchTool`, you must first install the `crewai_tools` package.
This package contains the `YoutubeVideoSearchTool` among other utilities designed to enhance your data analysis and processing tasks.
Install the package by executing the following command in your terminal:

```shell
pip install 'crewai[tools]'
```

## Example

The following example demonstrates how to use the `YoutubeVideoSearchTool` with a CrewAI agent:

```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeVideoSearchTool

# Initialize the tool for general YouTube video searches
youtube_search_tool = YoutubeVideoSearchTool()

# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract relevant information from YouTube videos",
    backstory="An expert researcher who specializes in analyzing video content.",
    tools=[youtube_search_tool],
    verbose=True,
)

# Example task to search for information in a specific video
research_task = Task(
    description="Search for information about machine learning frameworks in the YouTube video at {youtube_video_url}",
    expected_output="A summary of the key machine learning frameworks mentioned in the video.",
    agent=video_researcher,
)

# Create and run the crew
crew = Crew(agents=[video_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_video_url": "https://youtube.com/watch?v=example"})
```

You can also initialize the tool with a specific YouTube video URL:

```python Code
from crewai import Agent
from crewai_tools import YoutubeVideoSearchTool

# Initialize the tool with a specific YouTube video URL
youtube_search_tool = YoutubeVideoSearchTool(
    youtube_video_url='https://youtube.com/watch?v=example'
)

# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract relevant information from a specific YouTube video",
    backstory="An expert researcher who specializes in analyzing video content.",
    tools=[youtube_search_tool],
    verbose=True,
)
```

## Parameters

The `YoutubeVideoSearchTool` accepts the following parameters:

- **youtube_video_url**: Optional. The URL of the YouTube video to search within. If provided during initialization, the agent won't need to specify it when using the tool.
- **config**: Optional. Configuration for the underlying RAG system, including LLM and embedder settings.
- **summarize**: Optional. Whether to summarize the retrieved content. Default is `False`.

When using the tool with an agent, the agent will need to provide:

- **search_query**: Required. The search query to find relevant information in the video content.
- **youtube_video_url**: Required only if not provided during initialization. The URL of the YouTube video to search within.

## Custom Model and Embeddings

By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:

```python Code
from crewai_tools import YoutubeVideoSearchTool

youtube_search_tool = YoutubeVideoSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=True,
            ),
        ),
        embedder=dict(
            provider="google", # or openai, ollama, ...
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)
```

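When several tools share the same providers, the nested dictionary can be built once and reused. The following is a minimal sketch: `build_rag_config` is a hypothetical helper, and the key names simply mirror the example above rather than a verified schema. Optional keys such as `task_type` are left out for brevity.

```python
from typing import Any, Dict


def build_rag_config(
    llm_provider: str,
    llm_model: str,
    embedder_provider: str,
    embedder_model: str,
) -> Dict[str, Any]:
    """Assemble a config dict with the same shape as the example above."""
    return {
        "llm": {
            "provider": llm_provider,
            "config": {"model": llm_model},
        },
        "embedder": {
            "provider": embedder_provider,
            "config": {"model": embedder_model},
        },
    }


config = build_rag_config("ollama", "llama2", "google", "models/embedding-001")
# The result could then be passed as YoutubeVideoSearchTool(config=config).
```

Keeping the provider choices in one place makes it easier to switch models for every RAG-based tool at once.
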
## Agent Integration Example

Here's a more detailed example of how to integrate the `YoutubeVideoSearchTool` with a CrewAI agent:

```python Code
from crewai import Agent, Task, Crew
from crewai_tools import YoutubeVideoSearchTool

# Initialize the tool
youtube_search_tool = YoutubeVideoSearchTool()

# Define an agent that uses the tool
video_researcher = Agent(
    role="Video Researcher",
    goal="Extract and analyze information from YouTube videos",
    backstory="""You are an expert video researcher who specializes in extracting
    and analyzing information from YouTube videos. You have a keen eye for detail
    and can quickly identify key points and insights from video content.""",
    tools=[youtube_search_tool],
    verbose=True,
)

# Create a task for the agent
research_task = Task(
    description="""
    Search for information about recent advancements in artificial intelligence
    in the YouTube video at {youtube_video_url}.

    Focus on:
    1. Key AI technologies mentioned
    2. Real-world applications discussed
    3. Future predictions made by the speaker

    Provide a comprehensive summary of these points.
    """,
    expected_output="A detailed summary of AI advancements, applications, and future predictions from the video.",
    agent=video_researcher,
)

# Run the task
crew = Crew(agents=[video_researcher], tasks=[research_task])
result = crew.kickoff(inputs={"youtube_video_url": "https://youtube.com/watch?v=example"})
```

## Implementation Details

The `YoutubeVideoSearchTool` is implemented as a subclass of `RagTool`, which provides the base functionality for Retrieval-Augmented Generation:

```python Code
class YoutubeVideoSearchTool(RagTool):
    name: str = "Search a Youtube Video content"
    description: str = "A tool that can be used to semantic search a query from a Youtube Video content."
    args_schema: Type[BaseModel] = YoutubeVideoSearchToolSchema

    def __init__(self, youtube_video_url: Optional[str] = None, **kwargs):
        super().__init__(**kwargs)
        if youtube_video_url is not None:
            kwargs["data_type"] = DataType.YOUTUBE_VIDEO
            self.add(youtube_video_url)
            self.description = f"A tool that can be used to semantic search a query the {youtube_video_url} Youtube Video content."
            self.args_schema = FixedYoutubeVideoSearchToolSchema
            self._generate_description()
```

## Conclusion

The `YoutubeVideoSearchTool` provides a powerful way to search and extract information from YouTube video content using RAG techniques. By enabling agents to search within video content, it facilitates information extraction and analysis tasks that would otherwise be difficult to perform. This tool is particularly useful for research, content analysis, and knowledge extraction from video sources.

50
docs/en/tools/web-scraping/browserbaseloadtool.mdx
Normal file
@@ -0,0 +1,50 @@
---
title: Browserbase Web Loader
description: Browserbase is a developer platform to reliably run, manage, and monitor headless browsers.
icon: browser
---

# `BrowserbaseLoadTool`

## Description

[Browserbase](https://browserbase.com) is a developer platform to reliably run, manage, and monitor headless browsers.

Power your AI data retrievals with:

- [Serverless Infrastructure](https://docs.browserbase.com/under-the-hood) providing reliable browsers to extract data from complex UIs
- [Stealth Mode](https://docs.browserbase.com/features/stealth-mode) with included fingerprinting tactics and automatic captcha solving
- [Session Debugger](https://docs.browserbase.com/features/sessions) to inspect your Browser Session with a network timeline and logs
- [Live Debug](https://docs.browserbase.com/guides/session-debug-connection/browser-remote-control) to quickly debug your automation

## Installation

- Get an API key and Project ID from [browserbase.com](https://browserbase.com) and set them as environment variables (`BROWSERBASE_API_KEY`, `BROWSERBASE_PROJECT_ID`).
- Install the [Browserbase SDK](http://github.com/browserbase/python-sdk) along with the `crewai[tools]` package:

```shell
pip install browserbase 'crewai[tools]'
```

## Example

Utilize the `BrowserbaseLoadTool` as follows to allow your agent to load websites:

```python Code
from crewai_tools import BrowserbaseLoadTool

# Initialize the tool; with no arguments, the API key and Project ID are read
# from the BROWSERBASE_API_KEY and BROWSERBASE_PROJECT_ID environment variables
tool = BrowserbaseLoadTool()
```

## Arguments

The following parameters can be used to customize the `BrowserbaseLoadTool`'s behavior:

| Argument         | Type     | Description                                                                             |
|:-----------------|:---------|:----------------------------------------------------------------------------------------|
| **api_key**      | `string` | _Optional_. Browserbase API key. Default is `BROWSERBASE_API_KEY` env variable.         |
| **project_id**   | `string` | _Optional_. Browserbase Project ID. Default is `BROWSERBASE_PROJECT_ID` env variable.   |
| **text_content** | `bool`   | _Optional_. Retrieve only text content. Default is `False`.                             |
| **session_id**   | `string` | _Optional_. Provide an existing Session ID.                                             |
| **proxy**        | `bool`   | _Optional_. Enable/Disable Proxies. Default is `False`.                                 |

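Since `api_key` and `project_id` fall back to environment variables, it can help to check for them before constructing the tool. The following is a small standard-library sketch; the helper name `missing_browserbase_vars` is hypothetical and not part of `crewai_tools`:

```python
import os
from typing import List, Mapping, Optional

REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID")


def missing_browserbase_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]


missing = missing_browserbase_vars()
if missing:
    print("Set these variables or pass api_key/project_id explicitly:", ", ".join(missing))
```

Running the check at startup surfaces a configuration problem early instead of at the first tool call.
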
47
docs/en/tools/web-scraping/firecrawlcrawlwebsitetool.mdx
Normal file
@@ -0,0 +1,47 @@
---
title: Firecrawl Crawl Website
description: The `FirecrawlCrawlWebsiteTool` is designed to crawl and convert websites into clean markdown or structured data.
icon: fire-flame
---

# `FirecrawlCrawlWebsiteTool`

## Description

[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.

## Installation

- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it as an environment variable (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:

```shell
pip install firecrawl-py 'crewai[tools]'
```

## Example

Utilize the `FirecrawlCrawlWebsiteTool` as follows to allow your agent to crawl websites:

```python Code
from crewai_tools import FirecrawlCrawlWebsiteTool

tool = FirecrawlCrawlWebsiteTool(url='firecrawl.dev')
```

## Arguments

- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `url`: The base URL to start crawling from.
- `page_options`: Optional.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
- `crawler_options`: Optional. Options for controlling the crawling behavior.
  - `includes`: Optional. URL patterns to include in the crawl.
  - `exclude`: Optional. URL patterns to exclude from the crawl.
  - `generateImgAltText`: Optional. Generate alt text for images using LLMs (requires a paid plan).
  - `returnOnlyUrls`: Optional. If true, returns only the URLs as a list in the crawl status. Note: the response will be a list of URLs inside the data, not a list of documents.
  - `maxDepth`: Optional. Maximum depth to crawl. Depth 1 is the base URL, depth 2 includes the base URL and its direct children, and so on.
  - `mode`: Optional. The crawling mode to use. Fast mode crawls 4x faster on websites without a sitemap but may not be as accurate and shouldn't be used on heavily JavaScript-rendered websites.
  - `limit`: Optional. Maximum number of pages to crawl.
  - `timeout`: Optional. Timeout in milliseconds for the crawling operation.

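As a rough illustration of how the nested options fit together, the sketch below only builds plain dictionaries. The option names come from the list above; the values and URL patterns are arbitrary examples, and passing the dictionaries as `page_options=` and `crawler_options=` keyword arguments is an assumption based on that list, so check it against your installed version:

```python
page_options = {
    "onlyMainContent": True,   # drop headers, navs, and footers
    "includeHtml": False,      # omit the raw HTML
}

crawler_options = {
    "includes": ["/blog/*"],   # illustrative pattern, not a default
    "exclude": ["/admin/*"],   # illustrative pattern, not a default
    "maxDepth": 2,             # base URL plus its direct children
    "limit": 25,               # stop after 25 pages
}

# Assumed usage (not executed here):
# tool = FirecrawlCrawlWebsiteTool(
#     url="firecrawl.dev",
#     page_options=page_options,
#     crawler_options=crawler_options,
# )
```

Setting `maxDepth` and `limit` together is a simple way to bound both how far and how much a crawl can grow.
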
43
docs/en/tools/web-scraping/firecrawlscrapewebsitetool.mdx
Normal file
@@ -0,0 +1,43 @@
---
title: Firecrawl Scrape Website
description: The `FirecrawlScrapeWebsiteTool` is designed to scrape websites and convert them into clean markdown or structured data.
icon: fire-flame
---

# `FirecrawlScrapeWebsiteTool`

## Description

[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.

## Installation

- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it as an environment variable (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:

```shell
pip install firecrawl-py 'crewai[tools]'
```

## Example

Utilize the `FirecrawlScrapeWebsiteTool` as follows to allow your agent to scrape websites:

```python Code
from crewai_tools import FirecrawlScrapeWebsiteTool

tool = FirecrawlScrapeWebsiteTool(url='firecrawl.dev')
```

## Arguments

- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `url`: The URL to scrape.
- `page_options`: Optional.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
- `extractor_options`: Optional. Options for LLM-based extraction of structured information from the page content.
  - `mode`: The extraction mode to use; currently supports 'llm-extraction'.
  - `extractionPrompt`: Optional. A prompt describing what information to extract from the page.
  - `extractionSchema`: Optional. The schema for the data to be extracted.
- `timeout`: Optional. Timeout in milliseconds for the request.

41
docs/en/tools/web-scraping/firecrawlsearchtool.mdx
Normal file
@@ -0,0 +1,41 @@
---
title: Firecrawl Search
description: The `FirecrawlSearchTool` is designed to search websites and convert them into clean markdown or structured data.
icon: fire-flame
---

# `FirecrawlSearchTool`

## Description

[Firecrawl](https://firecrawl.dev) is a platform for crawling and converting any website into clean markdown or structured data.

## Installation

- Get an API key from [firecrawl.dev](https://firecrawl.dev) and set it as an environment variable (`FIRECRAWL_API_KEY`).
- Install the [Firecrawl SDK](https://github.com/mendableai/firecrawl) along with the `crewai[tools]` package:

```shell
pip install firecrawl-py 'crewai[tools]'
```

## Example

Utilize the `FirecrawlSearchTool` as follows to allow your agent to search websites:

```python Code
from crewai_tools import FirecrawlSearchTool

tool = FirecrawlSearchTool(query='what is firecrawl?')
```

## Arguments

- `api_key`: Optional. Specifies the Firecrawl API key. Defaults to the `FIRECRAWL_API_KEY` environment variable.
- `query`: The search query string to be used for searching.
- `page_options`: Optional. Options for result formatting.
  - `onlyMainContent`: Optional. Only return the main content of the page excluding headers, navs, footers, etc.
  - `includeHtml`: Optional. Include the raw HTML content of the page. Will output an `html` key in the response.
  - `fetchPageContent`: Optional. Fetch the full content of the page.
- `search_options`: Optional. Options for controlling the crawling behavior.
  - `limit`: Optional. Maximum number of pages to crawl.

86
docs/en/tools/web-scraping/hyperbrowserloadtool.mdx
Normal file
@@ -0,0 +1,86 @@
---
title: Hyperbrowser Load Tool
description: The `HyperbrowserLoadTool` enables web scraping and crawling using Hyperbrowser.
icon: globe
---

# `HyperbrowserLoadTool`

## Description

The `HyperbrowserLoadTool` enables web scraping and crawling using [Hyperbrowser](https://hyperbrowser.ai), a platform for running and scaling headless browsers. This tool allows you to scrape a single page or crawl an entire site, returning the content in properly formatted markdown or HTML.

Key Features:
- Instant Scalability - Spin up hundreds of browser sessions in seconds without infrastructure headaches
- Simple Integration - Works seamlessly with popular tools like Puppeteer and Playwright
- Powerful APIs - Easy to use APIs for scraping/crawling any site
- Bypass Anti-Bot Measures - Built-in stealth mode, ad blocking, automatic CAPTCHA solving, and rotating proxies

## Installation

To use this tool, you need to install the Hyperbrowser SDK:

```shell
uv add hyperbrowser
```

## Steps to Get Started

To effectively use the `HyperbrowserLoadTool`, follow these steps:

1. **Sign Up**: Head to [Hyperbrowser](https://app.hyperbrowser.ai/) to sign up and generate an API key.
2. **API Key**: Set the `HYPERBROWSER_API_KEY` environment variable or pass it directly to the tool constructor.
3. **Install SDK**: Install the Hyperbrowser SDK using the command above.

## Example

The following example demonstrates how to initialize the tool and use it to scrape a website:

```python Code
from crewai import Agent
from crewai.project import agent
from crewai_tools import HyperbrowserLoadTool

# Initialize the tool with your API key
tool = HyperbrowserLoadTool(api_key="your_api_key")  # Or use environment variable

# Define an agent that uses the tool.
# This method is meant to live inside a class decorated with `@CrewBase`.
@agent
def web_researcher(self) -> Agent:
    '''
    This agent uses the HyperbrowserLoadTool to scrape websites
    and extract information.
    '''
    return Agent(
        config=self.agents_config["web_researcher"],
        tools=[tool]
    )
```

## Parameters

The `HyperbrowserLoadTool` accepts the following parameters:

### Constructor Parameters
- **api_key**: Optional. Your Hyperbrowser API key. If not provided, it will be read from the `HYPERBROWSER_API_KEY` environment variable.

### Run Parameters
- **url**: Required. The website URL to scrape or crawl.
- **operation**: Optional. The operation to perform on the website. Either 'scrape' or 'crawl'. Default is 'scrape'.
- **params**: Optional. Additional parameters for the scrape or crawl operation.

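The run parameters above can be validated before the tool is called. The sketch below is illustrative only: `build_run_args` is a hypothetical helper, and the commented-out `tool.run(**run_args)` call assumes the tool accepts these names as keyword arguments.

```python
from typing import Any, Dict, Optional

ALLOWED_OPERATIONS = ("scrape", "crawl")


def build_run_args(
    url: str,
    operation: str = "scrape",
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Validate and collect run parameters before handing them to the tool."""
    if not url:
        raise ValueError("url is required")
    if operation not in ALLOWED_OPERATIONS:
        raise ValueError(f"operation must be one of {ALLOWED_OPERATIONS}, got {operation!r}")
    return {"url": url, "operation": operation, "params": params or {}}


run_args = build_run_args("https://example.com", operation="crawl")
# Assumed usage (not executed here): tool.run(**run_args)
```

Rejecting an unknown `operation` locally avoids spending a remote browser session on a request that cannot succeed.
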
## Supported Parameters

For detailed information on all supported parameters, visit:
- [Scrape Parameters](https://docs.hyperbrowser.ai/reference/sdks/python/scrape#start-scrape-job-and-wait)
- [Crawl Parameters](https://docs.hyperbrowser.ai/reference/sdks/python/crawl#start-crawl-job-and-wait)

## Return Format

The tool returns content in the following format:

- For **scrape** operations: The content of the page in markdown or HTML format.
- For **crawl** operations: The content of each page separated by dividers, including the URL of each page.

## Conclusion

The `HyperbrowserLoadTool` provides a powerful way to scrape and crawl websites, handling complex scenarios like anti-bot measures, CAPTCHAs, and more. By leveraging Hyperbrowser's platform, this tool enables agents to access and extract web content efficiently.

107
docs/en/tools/web-scraping/overview.mdx
Normal file
@@ -0,0 +1,107 @@
---
title: "Overview"
description: "Extract data from websites and automate browser interactions with powerful scraping tools"
icon: "face-smile"
---

These tools enable your agents to interact with the web, extract data from websites, and automate browser-based tasks. From simple web scraping to complex browser automation, these tools cover all your web interaction needs.

## **Available Tools**

<CardGroup cols={2}>
  <Card title="Scrape Website Tool" icon="globe" href="/en/tools/web-scraping/scrapewebsitetool">
    General-purpose web scraping tool for extracting content from any website.
  </Card>

  <Card title="Scrape Element Tool" icon="crosshairs" href="/en/tools/web-scraping/scrapeelementfromwebsitetool">
    Target specific elements on web pages with precision scraping capabilities.
  </Card>

  <Card title="Firecrawl Crawl Tool" icon="spider" href="/en/tools/web-scraping/firecrawlcrawlwebsitetool">
    Crawl entire websites systematically with Firecrawl's powerful engine.
  </Card>

  <Card title="Firecrawl Scrape Tool" icon="fire" href="/en/tools/web-scraping/firecrawlscrapewebsitetool">
    High-performance web scraping with Firecrawl's advanced capabilities.
  </Card>

  <Card title="Firecrawl Search Tool" icon="magnifying-glass" href="/en/tools/web-scraping/firecrawlsearchtool">
    Search and extract specific content using Firecrawl's search features.
  </Card>

  <Card title="Selenium Scraping Tool" icon="robot" href="/en/tools/web-scraping/seleniumscrapingtool">
    Browser automation and scraping with Selenium WebDriver capabilities.
  </Card>

  <Card title="Scrapfly Tool" icon="plane" href="/en/tools/web-scraping/scrapflyscrapetool">
    Professional web scraping with Scrapfly's premium scraping service.
  </Card>

  <Card title="ScrapeGraph Tool" icon="network-wired" href="/en/tools/web-scraping/scrapegraphscrapetool">
    Graph-based web scraping for complex data relationships.
  </Card>

  <Card title="Spider Tool" icon="spider" href="/en/tools/web-scraping/spidertool">
    Comprehensive web crawling and data extraction capabilities.
  </Card>

  <Card title="Browserbase Tool" icon="browser" href="/en/tools/web-scraping/browserbaseloadtool">
    Cloud-based browser automation with Browserbase infrastructure.
  </Card>

  <Card title="Hyperbrowser Tool" icon="window-maximize" href="/en/tools/web-scraping/hyperbrowserloadtool">
    Fast browser interactions with Hyperbrowser's optimized engine.
  </Card>

  <Card title="Stagehand Tool" icon="hand" href="/en/tools/web-scraping/stagehandtool">
    Intelligent browser automation with natural language commands.
  </Card>

  <Card title="Oxylabs Scraper Tool" icon="globe" href="/en/tools/web-scraping/oxylabsscraperstool">
    Access web data at scale with Oxylabs.
  </Card>
</CardGroup>

## **Common Use Cases**

- **Data Extraction**: Scrape product information, prices, and reviews
- **Content Monitoring**: Track changes on websites and news sources
- **Lead Generation**: Extract contact information and business data
- **Market Research**: Gather competitive intelligence and market data
- **Testing & QA**: Automate browser testing and validation workflows
- **Social Media**: Extract posts, comments, and social media analytics

## **Quick Start Example**

```python
from crewai import Agent
from crewai_tools import ScrapeWebsiteTool, FirecrawlScrapeWebsiteTool, SeleniumScrapingTool

# Create scraping tools
simple_scraper = ScrapeWebsiteTool()
advanced_scraper = FirecrawlScrapeWebsiteTool()
browser_automation = SeleniumScrapingTool()

# Add to your agent
agent = Agent(
    role="Web Research Specialist",
    tools=[simple_scraper, advanced_scraper, browser_automation],
    goal="Extract and analyze web data efficiently",
    backstory="An experienced researcher who gathers and cleans data from the web.",
)
```

## **Scraping Best Practices**

- **Respect robots.txt**: Always check and follow website scraping policies
- **Rate Limiting**: Implement delays between requests to avoid overwhelming servers
- **User Agents**: Use appropriate user agent strings to identify your bot
- **Legal Compliance**: Ensure your scraping activities comply with terms of service
- **Error Handling**: Implement robust error handling for network issues and blocked requests
- **Data Quality**: Validate and clean extracted data before processing

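The first two practices can be sketched with the Python standard library alone. In the example below, `polite_fetch`, the `example-research-bot` user agent, and the `fetch` callable are all illustrative assumptions; `fetch` stands in for whatever actually downloads a page, such as a thin wrapper around one of the scraping tools.

```python
import time
from typing import Callable, Dict, Iterable
from urllib.robotparser import RobotFileParser


def parse_robots(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def polite_fetch(
    urls: Iterable[str],
    parser: RobotFileParser,
    fetch: Callable[[str], str],
    user_agent: str = "example-research-bot",
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch only the URLs robots.txt allows, pausing between requests."""
    results: Dict[str, str] = {}
    for url in urls:
        if not parser.can_fetch(user_agent, url):
            continue  # skip paths the site disallows for this user agent
        if results:
            sleep(delay_seconds)  # fixed delay between consecutive requests
        results[url] = fetch(url)
    return results
```

A fixed delay is the simplest form of rate limiting; a production crawler would likely also honor any crawl delay the site declares and back off on errors.
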
## **Tool Selection Guide**

- **Simple Tasks**: Use `ScrapeWebsiteTool` for basic content extraction
- **JavaScript-Heavy Sites**: Use `SeleniumScrapingTool` for dynamic content
- **Scale & Performance**: Use `FirecrawlScrapeWebsiteTool` for high-volume scraping
- **Cloud Infrastructure**: Use `BrowserbaseLoadTool` for scalable browser automation
- **Complex Workflows**: Use `StagehandTool` for intelligent browser interactions

236
docs/en/tools/web-scraping/oxylabsscraperstool.mdx
Normal file
@@ -0,0 +1,236 @@
---
title: Oxylabs Scrapers
description: >
  Oxylabs Scrapers allow you to easily access information from the respective sources. The available sources are listed below:
  - `Amazon Product`
  - `Amazon Search`
  - `Google Search`
  - `Universal`
icon: globe
---

## Installation

Get the credentials by creating an Oxylabs Account [here](https://oxylabs.io).

```shell
pip install 'crewai[tools]' oxylabs
```

Check [Oxylabs Documentation](https://developers.oxylabs.io/scraping-solutions/web-scraper-api/targets) to get more information about API parameters.

# `OxylabsAmazonProductScraperTool`

### Example

```python
from crewai_tools import OxylabsAmazonProductScraperTool

# make sure the OXYLABS_USERNAME and OXYLABS_PASSWORD environment variables are set
tool = OxylabsAmazonProductScraperTool()

# placeholder 10-character ASIN
result = tool.run(query="AAAAABBBBC")

print(result)
```

### Parameters

- `query` - 10-symbol ASIN code.
- `domain` - domain localization for Amazon.
- `geo_location` - the _Deliver to_ location.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to true.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.

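Because `query` must be a 10-symbol ASIN, a quick shape check can catch malformed input before a request is sent. This is a minimal sketch: `looks_like_asin` is a hypothetical helper, and the pattern assumes ASINs use only uppercase letters and digits.

```python
import re

# Assumption: ASINs consist of exactly 10 uppercase letters or digits.
ASIN_PATTERN = re.compile(r"[A-Z0-9]{10}")


def looks_like_asin(value: str) -> bool:
    """Return True when the value has the 10-character shape described above."""
    return ASIN_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    print(looks_like_asin("AAAAABBBBC"))
    print(looks_like_asin("too-short"))
```

The check only validates the format; it cannot tell whether a product with that ASIN actually exists.
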
### Advanced example

```python
from crewai_tools import OxylabsAmazonProductScraperTool

# make sure the OXYLABS_USERNAME and OXYLABS_PASSWORD environment variables are set
tool = OxylabsAmazonProductScraperTool(
    config={
        "domain": "com",
        "parse": True,
        "context": [
            {
                "key": "autoselect_variant",
                "value": True
            }
        ]
    }
)

# placeholder 10-character ASIN
result = tool.run(query="AAAAABBBBC")

print(result)
```

# `OxylabsAmazonSearchScraperTool`
|
||||
|
||||
### Example
|
||||
|
||||
```python
from crewai_tools import OxylabsAmazonSearchScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonSearchScraperTool()

result = tool.run(query="headsets")

print(result)
```
|
||||
|
||||
### Parameters
|
||||
|
||||
- `query` - Amazon search term.
- `domain` - domain localization for Amazon.
- `start_page` - starting page number.
- `pages` - number of pages to retrieve.
- `geo_location` - the _Deliver to_ location.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to `True`.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
|
||||
|
||||
### Advanced example
|
||||
|
||||
```python
from crewai_tools import OxylabsAmazonSearchScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsAmazonSearchScraperTool(
    config={
        "domain": "nl",
        "start_page": 2,
        "pages": 2,
        "parse": True,
        "context": [
            {"key": "category_id", "value": 16391693031}
        ],
    }
)

result = tool.run(query="nirvana tshirt")

print(result)
```
|
||||
|
||||
# `OxylabsGoogleSearchScraperTool`
|
||||
|
||||
### Example
|
||||
|
||||
```python
from crewai_tools import OxylabsGoogleSearchScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsGoogleSearchScraperTool()

result = tool.run(query="iPhone 16")

print(result)
```
|
||||
|
||||
### Parameters
|
||||
|
||||
- `query` - search keyword.
- `domain` - domain localization for Google.
- `start_page` - starting page number.
- `pages` - number of pages to retrieve.
- `limit` - number of results to retrieve in each page.
- `locale` - `Accept-Language` header value, which changes the interface language of the Google search page.
- `geo_location` - the geographical location that the results should be adapted for. Results vary by location, so set this parameter carefully.
- `user_agent_type` - device type and browser.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to `True`.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
|
||||
|
||||
### Advanced example
|
||||
|
||||
```python
from crewai_tools import OxylabsGoogleSearchScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsGoogleSearchScraperTool(
    config={
        "parse": True,
        "geo_location": "Paris, France",
        "user_agent_type": "tablet",
    }
)

result = tool.run(query="iPhone 16")

print(result)
```
|
||||
|
||||
# `OxylabsUniversalScraperTool`
|
||||
|
||||
### Example
|
||||
|
||||
```python
from crewai_tools import OxylabsUniversalScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsUniversalScraperTool()

result = tool.run(url="https://ip.oxylabs.io")

print(result)
```
|
||||
|
||||
### Parameters
|
||||
|
||||
- `url` - website URL to scrape.
- `user_agent_type` - device type and browser.
- `geo_location` - sets the proxy's geolocation to retrieve data.
- `render` - enables JavaScript rendering when set to `html`.
- `callback_url` - URL to your callback endpoint.
- `context` - additional advanced settings and controls for specialized requirements.
- `parse` - returns parsed data when set to `True`, as long as a dedicated parser exists for the submitted URL's page type.
- `parsing_instructions` - define your own parsing and data transformation logic that will be executed on an HTML scraping result.
|
||||
|
||||
|
||||
### Advanced example
|
||||
|
||||
```python
from crewai_tools import OxylabsUniversalScraperTool

# make sure OXYLABS_USERNAME and OXYLABS_PASSWORD variables are set
tool = OxylabsUniversalScraperTool(
    config={
        "render": "html",
        "user_agent_type": "mobile",
        "context": [
            {"key": "force_headers", "value": True},
            {"key": "force_cookies", "value": True},
            {
                "key": "headers",
                "value": {
                    "Custom-Header-Name": "custom header content",
                },
            },
            {
                "key": "cookies",
                "value": [
                    {"key": "NID", "value": "1234567890"},
                    {"key": "1P JAR", "value": "0987654321"},
                ],
            },
            {"key": "http_method", "value": "get"},
            {"key": "follow_redirects", "value": True},
            {"key": "successful_status_codes", "value": [808, 909]},
        ],
    }
)

result = tool.run(url="https://ip.oxylabs.io")

print(result)
```
|
||||
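The `context` option used in the advanced examples is a list of `{"key": ..., "value": ...}` entries. When there are many settings, building that list from a plain dictionary can be easier to read. This is a minimal sketch; `to_context` is a hypothetical helper and is not part of `crewai_tools`.

```python
from typing import Any, Dict, List


def to_context(settings: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Convert {"name": value} pairs into key/value context entries."""
    return [{"key": key, "value": value} for key, value in settings.items()]


context = to_context(
    {
        "force_headers": True,
        "http_method": "get",
        "follow_redirects": True,
    }
)
print(context)
```

The resulting list could then be passed as `config={"context": context}` when constructing one of the Oxylabs tools.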
139
docs/en/tools/web-scraping/scrapeelementfromwebsitetool.mdx
Normal file
@@ -0,0 +1,139 @@
|
||||
---
|
||||
title: Scrape Element From Website Tool
|
||||
description: The `ScrapeElementFromWebsiteTool` enables CrewAI agents to extract specific elements from websites using CSS selectors.
|
||||
icon: code
|
||||
---
|
||||
|
||||
# `ScrapeElementFromWebsiteTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `ScrapeElementFromWebsiteTool` is designed to extract specific elements from websites using CSS selectors. This tool allows CrewAI agents to scrape targeted content from web pages, making it useful for data extraction tasks where only specific parts of a webpage are needed.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the required dependencies:
|
||||
|
||||
```shell
uv add requests beautifulsoup4
```
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `ScrapeElementFromWebsiteTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the required packages using the command above.
|
||||
2. **Identify CSS Selectors**: Determine the CSS selectors for the elements you want to extract from the website.
|
||||
3. **Initialize the Tool**: Create an instance of the tool with the necessary parameters.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `ScrapeElementFromWebsiteTool` to extract specific elements from a website:
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool

# Initialize the tool
scrape_tool = ScrapeElementFromWebsiteTool()

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract headlines from a news website
scrape_task = Task(
    description="Extract the main headlines from the CNN homepage. Use the CSS selector '.headline' to target the headline elements.",
    expected_output="A list of the main headlines from CNN.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
|
||||
|
||||
You can also initialize the tool with predefined parameters:
|
||||
|
||||
```python Code
from crewai_tools import ScrapeElementFromWebsiteTool

# Initialize the tool with predefined parameters
scrape_tool = ScrapeElementFromWebsiteTool(
    website_url="https://www.example.com",
    css_element=".main-content"
)
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `ScrapeElementFromWebsiteTool` accepts the following parameters during initialization:
|
||||
|
||||
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **css_element**: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **cookies**: Optional. A dictionary containing cookies to be sent with the request. This can be useful for websites that require authentication.
|
||||
|
||||
## Usage
|
||||
|
||||
When using the `ScrapeElementFromWebsiteTool` with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
|
||||
|
||||
- **website_url**: The URL of the website to scrape.
|
||||
- **css_element**: The CSS selector for the elements to extract.
|
||||
|
||||
The tool will return the text content of all elements matching the CSS selector, joined by newlines.
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapeElementFromWebsiteTool

scrape_tool = ScrapeElementFromWebsiteTool()

# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific elements from websites",
    backstory="An expert in web scraping who can extract targeted content using CSS selectors.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific elements
extract_task = Task(
    description="""
    Extract all product titles from the featured products section on example.com.
    Use the CSS selector '.product-title' to target the title elements.
    """,
    expected_output="A list of product titles from the website",
    agent=web_scraper_agent,
)

# Run the task through a crew
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `ScrapeElementFromWebsiteTool` uses the `requests` library to fetch the web page and `BeautifulSoup` to parse the HTML and extract the specified elements:
|
||||
|
||||
```python Code
from typing import Any

import requests
from bs4 import BeautifulSoup
from crewai.tools import BaseTool


class ScrapeElementFromWebsiteTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."

    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        # Run-time arguments take precedence over values set at initialization
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        page = requests.get(
            website_url,
            headers=self.headers,
            cookies=self.cookies if self.cookies else {},
        )
        parsed = BeautifulSoup(page.content, "html.parser")
        elements = parsed.select(css_element)
        return "\n".join([element.get_text() for element in elements])
```
|
||||
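The final `"\n".join(...)` step determines the shape of the tool's output: one line of text per matched element. The sketch below reproduces that behaviour with only the standard library so it can be run offline. It is an approximation under stated assumptions, not the tool's code: `ClassTextExtractor` and `extract_by_class` are hypothetical names, only simple `.class-name` selectors are supported, and a match nested inside another match is not reported separately, whereas the tool delegates full CSS selector support to BeautifulSoup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text of every element carrying a given CSS class."""

    def __init__(self, class_name: str) -> None:
        super().__init__()
        self.class_name = class_name
        self.depth = 0       # > 0 while inside a matching element
        self.current = []    # text fragments of the element being read
        self.results = []    # one string per matched element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested tag inside a match
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.results.append("".join(self.current))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_by_class(html: str, class_name: str) -> str:
    """Return the text of all elements with class_name, joined by newlines."""
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.results)


sample = """
<div class="product"><h2 class="product-title">Red <b>Mug</b></h2></div>
<div class="product"><h2 class="product-title">Blue Plate</h2></div>
"""
print(extract_by_class(sample, "product-title"))
```

For the sample markup, the result contains two lines, one per `product-title` element, with the text of nested tags such as `<b>` included.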
|
||||
## Conclusion
|
||||
|
||||
The `ScrapeElementFromWebsiteTool` provides a powerful way to extract specific elements from websites using CSS selectors. By enabling agents to target only the content they need, it makes web scraping tasks more efficient and focused. This tool is particularly useful for data extraction, content monitoring, and research tasks where specific information needs to be extracted from web pages.
|
||||
196
docs/en/tools/web-scraping/scrapegraphscrapetool.mdx
Normal file
@@ -0,0 +1,196 @@
|
||||
---
|
||||
title: Scrapegraph Scrape Tool
|
||||
description: The `ScrapegraphScrapeTool` leverages Scrapegraph AI's SmartScraper API to intelligently extract content from websites.
|
||||
icon: chart-area
|
||||
---
|
||||
|
||||
# `ScrapegraphScrapeTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `ScrapegraphScrapeTool` is designed to leverage Scrapegraph AI's SmartScraper API to intelligently extract content from websites. This tool provides advanced web scraping capabilities with AI-powered content extraction, making it ideal for targeted data collection and content analysis tasks. Unlike traditional web scrapers, it can understand the context and structure of web pages to extract the most relevant information based on natural language prompts.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the Scrapegraph Python client:
|
||||
|
||||
```shell
uv add scrapegraph-py
```
|
||||
|
||||
You'll also need to set up your Scrapegraph API key as an environment variable:
|
||||
|
||||
```shell
export SCRAPEGRAPH_API_KEY="your_api_key"
```
|
||||
|
||||
You can obtain an API key from [Scrapegraph AI](https://scrapegraphai.com).
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `ScrapegraphScrapeTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the required package using the command above.
|
||||
2. **Set Up API Key**: Set your Scrapegraph API key as an environment variable or provide it during initialization.
|
||||
3. **Initialize the Tool**: Create an instance of the tool with the necessary parameters.
|
||||
4. **Define Extraction Prompts**: Create natural language prompts to guide the extraction of specific content.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `ScrapegraphScrapeTool` to extract content from a website:
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapegraphScrapeTool

# Initialize the tool
scrape_tool = ScrapegraphScrapeTool(api_key="your_api_key")

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract product information from an e-commerce site
scrape_task = Task(
    description="Extract product names, prices, and descriptions from the featured products section of example.com.",
    expected_output="A structured list of product information including names, prices, and descriptions.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
|
||||
|
||||
You can also initialize the tool with predefined parameters:
|
||||
|
||||
```python Code
from crewai_tools import ScrapegraphScrapeTool

# Initialize the tool with predefined parameters
scrape_tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    user_prompt="Extract all product prices and descriptions",
    api_key="your_api_key"
)
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `ScrapegraphScrapeTool` accepts the following parameters during initialization:
|
||||
|
||||
- **api_key**: Optional. Your Scrapegraph API key. If not provided, it will look for the `SCRAPEGRAPH_API_KEY` environment variable.
|
||||
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **user_prompt**: Optional. Custom instructions for content extraction. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **enable_logging**: Optional. Whether to enable logging for the Scrapegraph client. Default is `False`.
|
||||
|
||||
## Usage
|
||||
|
||||
When using the `ScrapegraphScrapeTool` with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
|
||||
|
||||
- **website_url**: The URL of the website to scrape.
|
||||
- **user_prompt**: Optional. Custom instructions for content extraction. Default is "Extract the main content of the webpage".
|
||||
|
||||
The tool will return the extracted content based on the provided prompt.
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapegraphScrapeTool

scrape_tool = ScrapegraphScrapeTool(api_key="your_api_key")

# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific content
extract_task = Task(
    description="Extract the main heading and summary from example.com",
    expected_output="The main heading and summary from the website",
    agent=web_scraper_agent,
)

# Run the task
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
The `ScrapegraphScrapeTool` may raise the following exceptions:
|
||||
|
||||
- **ValueError**: Raised when the API key is missing or the URL format is invalid.
- **RateLimitError**: Raised when API rate limits are exceeded.
- **RuntimeError**: Raised when the scraping operation fails (network issues, API errors).
|
||||
|
||||
It's recommended to instruct agents to handle potential errors gracefully:
|
||||
|
||||
```python Code
from crewai import Task

# Create a task that includes error handling instructions
# (web_scraper_agent is the agent defined in the Usage example)
robust_extract_task = Task(
    description="""
    Extract the main heading from example.com.
    Be aware that you might encounter errors such as:
    - Invalid URL format
    - Missing API key
    - Rate limit exceeded
    - Network or API errors

    If you encounter any errors, provide a clear explanation of what went wrong
    and suggest possible solutions.
    """,
    expected_output="Either the extracted heading or a clear error explanation",
    agent=web_scraper_agent,
)
```
|
||||
|
||||
## Rate Limiting
|
||||
|
||||
The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:
|
||||
|
||||
- Implement appropriate delays between requests when processing multiple URLs.
|
||||
- Handle rate limit errors gracefully in your application.
|
||||
- Check your API plan limits on the Scrapegraph dashboard.
|
||||
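The first two practices can be combined into a retry helper that waits progressively longer after each rate-limit failure. The sketch below is illustrative only: `RateLimitError` is defined locally as a stand-in (in real code, import the exception from wherever your client exposes it), and `call_with_backoff` is a hypothetical helper that is not part of `crewai_tools` or `scrapegraph-py`.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for the client's rate-limit exception (assumption)."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on RateLimitError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            # Delay doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Assuming a configured `scrape_tool`, a call might look like `call_with_backoff(lambda: scrape_tool.run(website_url=url))`. The `sleep` parameter is injectable so the helper can be exercised without real waiting.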
|
||||
## Implementation Details
|
||||
|
||||
The `ScrapegraphScrapeTool` uses the Scrapegraph Python client to interact with the SmartScraper API:
|
||||
|
||||
```python Code
from typing import Any

from crewai.tools import BaseTool


class ScrapegraphScrapeTool(BaseTool):
    """
    A tool that uses Scrapegraph AI to intelligently scrape website content.
    """

    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        user_prompt = (
            kwargs.get("user_prompt", self.user_prompt)
            or "Extract the main content of the webpage"
        )

        if not website_url:
            raise ValueError("website_url is required")

        # Validate URL format
        self._validate_url(website_url)

        try:
            # Make the SmartScraper request
            response = self._client.smartscraper(
                website_url=website_url,
                user_prompt=user_prompt,
            )

            return response
        except Exception as e:
            # Error handling is elided in the original excerpt. As a stand-in
            # (an assumption, not the tool's code), failures are wrapped in
            # RuntimeError; the real tool also reports RateLimitError.
            raise RuntimeError(f"Scraping failed: {e}") from e
```
|
||||
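The excerpt calls `self._validate_url(website_url)` without showing its body. One plausible way to perform such a check with the standard library is sketched below. This is an assumption about the general approach, not the tool's actual implementation, and `validate_url` is a hypothetical standalone function.

```python
from urllib.parse import urlparse


def validate_url(url: str) -> None:
    """Raise ValueError unless url has an http(s) scheme and a host."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"Invalid URL format: {url!r}")


validate_url("https://www.example.com")  # passes silently

try:
    validate_url("example.com")  # no scheme, so this is rejected
except ValueError as error:
    print(error)
```

Rejecting bare host names such as `example.com` is a design choice of this sketch; a more lenient validator could prepend `https://` instead.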
|
||||
## Conclusion
|
||||
|
||||
The `ScrapegraphScrapeTool` provides a powerful way to extract content from websites using AI-powered understanding of web page structure. By enabling agents to target specific information using natural language prompts, it makes web scraping tasks more efficient and focused. This tool is particularly useful for data extraction, content monitoring, and research tasks where specific information needs to be extracted from web pages.
|
||||
47
docs/en/tools/web-scraping/scrapewebsitetool.mdx
Normal file
@@ -0,0 +1,47 @@
|
||||
---
|
||||
title: Scrape Website
|
||||
description: The `ScrapeWebsiteTool` is designed to extract and read the content of a specified website.
|
||||
icon: magnifying-glass-location
|
||||
---
|
||||
|
||||
# `ScrapeWebsiteTool`
|
||||
|
||||
<Note>
|
||||
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
A tool designed to extract and read the content of a specified website. It is capable of handling various types of web pages by making HTTP requests and parsing the received HTML content.
|
||||
This tool can be particularly useful for web scraping tasks, data collection, or extracting specific information from websites.
|
||||
|
||||
## Installation
|
||||
|
||||
Install the `crewai_tools` package:
|
||||
|
||||
```shell
pip install 'crewai[tools]'
```
|
||||
|
||||
## Example
|
||||
|
||||
```python
from crewai_tools import ScrapeWebsiteTool

# To enable scraping any website it finds during its execution
tool = ScrapeWebsiteTool()

# Initialize the tool with the website URL,
# so the agent can only scrape the content of the specified website
tool = ScrapeWebsiteTool(website_url='https://www.example.com')

# Extract the text from the site
text = tool.run()
print(text)
```
|
||||
|
||||
## Arguments
|
||||
|
||||
| Argument        | Type     | Description |
|:----------------|:---------|:------------|
| **website_url** | `string` | **Mandatory** website URL to read. This is the primary input for the tool, specifying which website's content should be scraped and read. |
|
||||
220
docs/en/tools/web-scraping/scrapflyscrapetool.mdx
Normal file
@@ -0,0 +1,220 @@
|
||||
---
|
||||
title: Scrapfly Scrape Website Tool
|
||||
description: The `ScrapflyScrapeWebsiteTool` leverages Scrapfly's web scraping API to extract content from websites in various formats.
|
||||
icon: spider
|
||||
---
|
||||
|
||||
# `ScrapflyScrapeWebsiteTool`
|
||||
|
||||
## Description
|
||||
|
||||
The `ScrapflyScrapeWebsiteTool` is designed to leverage [Scrapfly](https://scrapfly.io/)'s web scraping API to extract content from websites. This tool provides advanced web scraping capabilities with headless browser support, proxies, and anti-bot bypass features. It allows for extracting web page data in various formats, including raw HTML, markdown, and plain text, making it ideal for a wide range of web scraping tasks.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the Scrapfly SDK:
|
||||
|
||||
```shell
uv add scrapfly-sdk
```
|
||||
|
||||
You'll also need to obtain a Scrapfly API key by registering at [scrapfly.io/register](https://www.scrapfly.io/register/).
|
||||
|
||||
## Steps to Get Started
|
||||
|
||||
To effectively use the `ScrapflyScrapeWebsiteTool`, follow these steps:
|
||||
|
||||
1. **Install Dependencies**: Install the Scrapfly SDK using the command above.
|
||||
2. **Obtain API Key**: Register at Scrapfly to get your API key.
|
||||
3. **Initialize the Tool**: Create an instance of the tool with your API key.
|
||||
4. **Configure Scraping Parameters**: Customize the scraping parameters based on your needs.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `ScrapflyScrapeWebsiteTool` to extract content from a website:
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapflyScrapeWebsiteTool

# Initialize the tool
scrape_tool = ScrapflyScrapeWebsiteTool(api_key="your_scrapfly_api_key")

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract content from a website
scrape_task = Task(
    description="Extract the main content from the product page at https://web-scraping.dev/products and summarize the available products.",
    expected_output="A summary of the products available on the website.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
|
||||
|
||||
You can also customize the scraping parameters:
|
||||
|
||||
```python Code
from crewai import Agent, Task
from crewai_tools import ScrapflyScrapeWebsiteTool

scrape_tool = ScrapflyScrapeWebsiteTool(api_key="your_scrapfly_api_key")

# Example with custom scraping parameters
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites with custom parameters",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# The agent will use the tool with parameters like:
# url="https://web-scraping.dev/products"
# scrape_format="markdown"
# ignore_scrape_failures=True
# scrape_config={
#     "asp": True,  # Bypass scraping blocking solutions, like Cloudflare
#     "render_js": True,  # Enable JavaScript rendering with a cloud headless browser
#     "proxy_pool": "public_residential_pool",  # Select a proxy pool
#     "country": "us",  # Select a proxy location
#     "auto_scroll": True,  # Auto scroll the page
# }

scrape_task = Task(
    description="Extract the main content from the product page at https://web-scraping.dev/products using advanced scraping options including JavaScript rendering and proxy settings.",
    expected_output="A detailed summary of the products with all available information.",
    agent=web_scraper_agent,
)
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `ScrapflyScrapeWebsiteTool` accepts the following parameters:
|
||||
|
||||
### Initialization Parameters
|
||||
|
||||
- **api_key**: Required. Your Scrapfly API key.
|
||||
|
||||
### Run Parameters
|
||||
|
||||
- **url**: Required. The URL of the website to scrape.
|
||||
- **scrape_format**: Optional. The format in which to extract the web page content. Options are "raw" (HTML), "markdown", or "text". Default is "markdown".
|
||||
- **scrape_config**: Optional. A dictionary containing additional Scrapfly scraping configuration options.
|
||||
- **ignore_scrape_failures**: Optional. Whether to ignore failures during scraping. If set to `True`, the tool will return `None` instead of raising an exception when scraping fails.
|
||||
|
||||
## Scrapfly Configuration Options
|
||||
|
||||
The `scrape_config` parameter allows you to customize the scraping behavior with the following options:
|
||||
|
||||
- **asp**: Enable anti-scraping protection bypass.
|
||||
- **render_js**: Enable JavaScript rendering with a cloud headless browser.
|
||||
- **proxy_pool**: Select a proxy pool (e.g., "public_residential_pool", "datacenter").
|
||||
- **country**: Select a proxy location using a two-letter country code (e.g., "us", "gb").
|
||||
- **auto_scroll**: Automatically scroll the page to load lazy-loaded content.
|
||||
- **js**: Execute custom JavaScript code in the headless browser.
|
||||
|
||||
For a complete list of configuration options, refer to the [Scrapfly API documentation](https://scrapfly.io/docs/scrape-api/getting-started).
|
||||
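Putting the run parameters and configuration options together, the arguments an agent passes to the tool can be pictured as a plain dictionary. The sketch below only builds and checks that dictionary without calling Scrapfly; `build_scrapfly_args` is a hypothetical helper, and the allowed formats are the ones listed under Run Parameters.

```python
from typing import Any, Dict, Optional

# Formats listed in the Run Parameters section of this page.
ALLOWED_FORMATS = ("raw", "markdown", "text")


def build_scrapfly_args(
    url: str,
    scrape_format: str = "markdown",
    scrape_config: Optional[Dict[str, Any]] = None,
    ignore_scrape_failures: bool = False,
) -> Dict[str, Any]:
    """Assemble keyword arguments for a ScrapflyScrapeWebsiteTool run."""
    if scrape_format not in ALLOWED_FORMATS:
        raise ValueError(f"scrape_format must be one of {ALLOWED_FORMATS}")
    return {
        "url": url,
        "scrape_format": scrape_format,
        # Copy the config so later changes by the caller do not leak in.
        "scrape_config": dict(scrape_config or {}),
        "ignore_scrape_failures": ignore_scrape_failures,
    }


args = build_scrapfly_args(
    "https://web-scraping.dev/products",
    scrape_format="text",
    scrape_config={"render_js": True, "country": "us"},
    ignore_scrape_failures=True,
)
print(args)
```

With a configured tool, these arguments could be passed as `scrape_tool.run(**args)`; validating the format up front surfaces a typo before any request is made.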
|
||||
## Usage
|
||||
|
||||
When using the `ScrapflyScrapeWebsiteTool` with an agent, the agent will need to provide the URL of the website to scrape and can optionally specify the format and additional configuration options:
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapflyScrapeWebsiteTool

scrape_tool = ScrapflyScrapeWebsiteTool(api_key="your_scrapfly_api_key")

# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract information from websites",
    backstory="An expert in web scraping who can extract content from any website.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent
scrape_task = Task(
    description="Extract the main content from example.com in markdown format.",
    expected_output="The main content of example.com in markdown format.",
    agent=web_scraper_agent,
)

# Run the task
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```
|
||||
|
||||
For more advanced usage with custom configuration:
|
||||
|
||||
```python Code
from crewai import Task

# Create a task with more specific instructions
# (web_scraper_agent is the agent defined in the previous example)
advanced_scrape_task = Task(
    description="""
    Extract content from example.com with the following requirements:
    - Convert the content to plain text format
    - Enable JavaScript rendering
    - Use a US-based proxy
    - Handle any scraping failures gracefully
    """,
    expected_output="The extracted content from example.com",
    agent=web_scraper_agent,
)
```
|
||||
|
||||
## Error Handling
|
||||
|
||||
By default, the `ScrapflyScrapeWebsiteTool` will raise an exception if scraping fails. Agents can be instructed to handle failures gracefully by specifying the `ignore_scrape_failures` parameter:
|
||||
|
||||
```python Code
from crewai import Task

# Create a task that instructs the agent to handle errors
# (web_scraper_agent is the agent defined in the Usage example)
error_handling_task = Task(
    description="""
    Extract content from a potentially problematic website and make sure to handle any
    scraping failures gracefully by setting ignore_scrape_failures to True.
    """,
    expected_output="Either the extracted content or a graceful error message",
    agent=web_scraper_agent,
)
```
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `ScrapflyScrapeWebsiteTool` uses the Scrapfly SDK to interact with the Scrapfly API:
|
||||
|
||||
```python Code
import logging
from typing import Any, Dict, Optional

from crewai.tools import BaseTool

logger = logging.getLogger(__name__)


class ScrapflyScrapeWebsiteTool(BaseTool):
    name: str = "Scrapfly web scraping API tool"
    description: str = (
        "Scrape a webpage url using Scrapfly and return its content as markdown or text"
    )

    # Implementation details...

    def _run(
        self,
        url: str,
        scrape_format: str = "markdown",
        scrape_config: Optional[Dict[str, Any]] = None,
        ignore_scrape_failures: Optional[bool] = None,
    ):
        from scrapfly import ScrapeApiResponse, ScrapeConfig

        scrape_config = scrape_config if scrape_config is not None else {}
        try:
            response: ScrapeApiResponse = self.scrapfly.scrape(
                ScrapeConfig(url, format=scrape_format, **scrape_config)
            )
            return response.scrape_result["content"]
        except Exception as e:
            if ignore_scrape_failures:
                # Log the failure and return None instead of raising
                logger.error(f"Error fetching data from {url}, exception: {e}")
                return None
            else:
                raise e
```
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `ScrapflyScrapeWebsiteTool` provides a powerful way to extract content from websites using Scrapfly's advanced web scraping capabilities. With features like headless browser support, proxies, and anti-bot bypass, it can handle complex websites and extract content in various formats. This tool is particularly useful for data extraction, content monitoring, and research tasks where reliable web scraping is required.
|
||||
195
docs/en/tools/web-scraping/seleniumscrapingtool.mdx
Normal file
@@ -0,0 +1,195 @@
|
||||
---
|
||||
title: Selenium Scraper
|
||||
description: The `SeleniumScrapingTool` is designed to extract and read the content of a specified website using Selenium.
|
||||
icon: clipboard-user
|
||||
---
|
||||
|
||||
# `SeleniumScrapingTool`
|
||||
|
||||
<Note>
|
||||
This tool is currently in development. As we refine its capabilities, users may encounter unexpected behavior.
|
||||
Your feedback is invaluable to us for making improvements.
|
||||
</Note>
|
||||
|
||||
## Description
|
||||
|
||||
The `SeleniumScrapingTool` is crafted for high-efficiency web scraping tasks.
|
||||
It allows for precise extraction of content from web pages by using CSS selectors to target specific elements.
|
||||
Its design caters to a wide range of scraping needs, offering flexibility to work with any provided website URL.
|
||||
|
||||
## Installation
|
||||
|
||||
To use this tool, you need to install the CrewAI tools package and Selenium:
|
||||
|
||||
```shell
|
||||
pip install 'crewai[tools]'
|
||||
uv add selenium webdriver-manager
|
||||
```
|
||||
|
||||
You'll also need to have Chrome installed on your system, as the tool uses Chrome WebDriver for browser automation.
|
||||
|
||||
## Example
|
||||
|
||||
The following example demonstrates how to use the `SeleniumScrapingTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew, Process
|
||||
from crewai_tools import SeleniumScrapingTool
|
||||
|
||||
# Initialize the tool
|
||||
selenium_tool = SeleniumScrapingTool()
|
||||
|
||||
# Define an agent that uses the tool
|
||||
web_scraper_agent = Agent(
|
||||
role="Web Scraper",
|
||||
goal="Extract information from websites using Selenium",
|
||||
backstory="An expert web scraper who can extract content from dynamic websites.",
|
||||
tools=[selenium_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Example task to scrape content from a website
|
||||
scrape_task = Task(
|
||||
description="Extract the main content from the homepage of example.com. Use the CSS selector 'main' to target the main content area.",
|
||||
expected_output="The main content from example.com's homepage.",
|
||||
agent=web_scraper_agent,
|
||||
)
|
||||
|
||||
# Create and run the crew
|
||||
crew = Crew(
|
||||
agents=[web_scraper_agent],
|
||||
tasks=[scrape_task],
|
||||
verbose=True,
|
||||
process=Process.sequential,
|
||||
)
|
||||
result = crew.kickoff()
|
||||
```
|
||||
|
||||
You can also initialize the tool with predefined parameters:
|
||||
|
||||
```python Code
|
||||
# Initialize the tool with predefined parameters
|
||||
selenium_tool = SeleniumScrapingTool(
|
||||
website_url='https://example.com',
|
||||
css_element='.main-content',
|
||||
wait_time=5
|
||||
)
|
||||
|
||||
# Define an agent that uses the tool
|
||||
web_scraper_agent = Agent(
|
||||
role="Web Scraper",
|
||||
goal="Extract information from websites using Selenium",
|
||||
backstory="An expert web scraper who can extract content from dynamic websites.",
|
||||
tools=[selenium_tool],
|
||||
verbose=True,
|
||||
)
|
||||
```
|
||||
|
||||
## Parameters
|
||||
|
||||
The `SeleniumScrapingTool` accepts the following parameters during initialization:
|
||||
|
||||
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **css_element**: Optional. The CSS selector for the elements to extract. If provided during initialization, the agent won't need to specify it when using the tool.
|
||||
- **cookie**: Optional. A dictionary containing cookie information, useful for simulating a logged-in session to access restricted content.
|
||||
- **wait_time**: Optional. Specifies the delay (in seconds) before scraping, allowing the website and any dynamic content to fully load. Default is `3` seconds.
|
||||
- **return_html**: Optional. Whether to return the HTML content instead of just the text. Default is `False`.
|
||||
|
||||
When using the tool with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):
|
||||
|
||||
- **website_url**: Required. The URL of the website to scrape.
|
||||
- **css_element**: Required. The CSS selector for the elements to extract.
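
The relationship between initialization values and call-time values can be sketched in plain Python. This is an illustration only: `ScrapeDefaults` and `resolve_args` are hypothetical names, and the fallback mirrors the `kwargs.get(..., self....)` pattern shown in the Implementation Details excerpt. Whether the real tool lets an agent override initialization values may differ between versions.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ScrapeDefaults:
    """Hypothetical stand-in for the values passed at initialization."""

    website_url: Optional[str] = None
    css_element: Optional[str] = None


def resolve_args(defaults: ScrapeDefaults, **kwargs: Any) -> Dict[str, str]:
    """Use call-time values when given, otherwise the initialization values."""
    resolved = {
        "website_url": kwargs.get("website_url", defaults.website_url),
        "css_element": kwargs.get("css_element", defaults.css_element),
    }
    # Illustrative check only: both values must come from somewhere.
    missing = [name for name, value in resolved.items() if not value]
    if missing:
        raise ValueError("Missing required arguments: " + ", ".join(missing))
    return resolved


preset = ScrapeDefaults(
    website_url="https://example.com", css_element=".main-content"
)
from_init = resolve_args(preset)
overridden = resolve_args(preset, css_element="h1")
```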
|
||||
|
||||
## Agent Integration Example
|
||||
|
||||
Here's a more detailed example of how to integrate the `SeleniumScrapingTool` with a CrewAI agent:
|
||||
|
||||
```python Code
|
||||
from crewai import Agent, Task, Crew, Process
|
||||
from crewai_tools import SeleniumScrapingTool
|
||||
|
||||
# Initialize the tool
|
||||
selenium_tool = SeleniumScrapingTool()
|
||||
|
||||
# Define an agent that uses the tool
|
||||
web_scraper_agent = Agent(
|
||||
role="Web Scraper",
|
||||
goal="Extract and analyze information from dynamic websites",
|
||||
backstory="""You are an expert web scraper who specializes in extracting
|
||||
content from dynamic websites that require browser automation. You have
|
||||
extensive knowledge of CSS selectors and can identify the right selectors
|
||||
to target specific content on any website.""",
|
||||
tools=[selenium_tool],
|
||||
verbose=True,
|
||||
)
|
||||
|
||||
# Create a task for the agent
|
||||
scrape_task = Task(
|
||||
description="""
|
||||
Extract the following information from the news website at {website_url}:
|
||||
|
||||
1. The headlines of all featured articles (CSS selector: '.headline')
|
||||
2. The publication dates of these articles (CSS selector: '.pub-date')
|
||||
3. The author names where available (CSS selector: '.author')
|
||||
|
||||
Compile this information into a structured format with each article's details grouped together.
|
||||
""",
|
||||
expected_output="A structured list of articles with their headlines, publication dates, and authors.",
|
||||
agent=web_scraper_agent,
|
||||
)
|
||||
|
||||
# Run the task
|
||||
crew = Crew(
|
||||
agents=[web_scraper_agent],
|
||||
tasks=[scrape_task],
|
||||
verbose=True,
|
||||
process=Process.sequential,
|
||||
)
|
||||
result = crew.kickoff(inputs={"website_url": "https://news-example.com"})
|
||||
```
|
||||
|
||||
## Implementation Details
|
||||
|
||||
The `SeleniumScrapingTool` uses Selenium WebDriver to automate browser interactions:
|
||||
|
||||
```python Code
class SeleniumScrapingTool(BaseTool):
    name: str = "Read a website content"
    description: str = "A tool that can be used to read a website content."
    args_schema: Type[BaseModel] = SeleniumScrapingToolSchema

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        css_element = kwargs.get("css_element", self.css_element)
        return_html = kwargs.get("return_html", self.return_html)
        driver = self._create_driver(website_url, self.cookie, self.wait_time)

        content = self._get_content(driver, css_element, return_html)
        driver.close()

        return "\n".join(content)
```
|
||||
|
||||
The tool performs the following steps:
|
||||
1. Creates a headless Chrome browser instance
|
||||
2. Navigates to the specified URL
|
||||
3. Waits for the specified time to allow the page to load
|
||||
4. Adds any cookies if provided
|
||||
5. Extracts content based on the CSS selector
|
||||
6. Returns the extracted content as text or HTML
|
||||
7. Closes the browser instance
|
||||
|
||||
## Handling Dynamic Content
|
||||
|
||||
The `SeleniumScrapingTool` is particularly useful for scraping websites with dynamic content that is loaded via JavaScript. By using a real browser instance, it can:
|
||||
|
||||
1. Execute JavaScript on the page
|
||||
2. Wait for dynamic content to load
|
||||
3. Interact with elements if needed
|
||||
4. Extract content that would not be available with simple HTTP requests
|
||||
|
||||
You can adjust the `wait_time` parameter to ensure that all dynamic content has loaded before extraction.
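
A fixed `wait_time` has to be sized for the slowest page you expect. A common alternative, which is not what this tool is described as doing, is to poll until a condition holds or a timeout expires; Selenium itself offers `WebDriverWait` for that purpose. The sketch below shows the polling idea with the standard library only. `wait_until` and `content_ready` are hypothetical names, and the simulated check stands in for a real "is the element present?" test.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool], timeout: float = 3.0, interval: float = 0.05
) -> bool:
    """Poll predicate until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Simulate content that becomes available on the third check.
checks = {"count": 0}


def content_ready() -> bool:
    checks["count"] += 1
    return checks["count"] >= 3


loaded = wait_until(content_ready, timeout=1.0, interval=0.01)
```

Polling returns as soon as the content is ready, so fast pages are not penalized by a delay chosen for slow ones.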
|
||||
|
||||
## Conclusion
|
||||
|
||||
The `SeleniumScrapingTool` provides a powerful way to extract content from websites using browser automation. By enabling agents to interact with websites as a real user would, it facilitates scraping of dynamic content that would be difficult or impossible to extract using simpler methods. This tool is particularly useful for research, data collection, and monitoring tasks that involve modern web applications with JavaScript-rendered content.
|
||||
92
docs/en/tools/web-scraping/spidertool.mdx
Normal file
@@ -0,0 +1,92 @@
|
||||
---
|
||||
title: Spider Scraper
|
||||
description: The `SpiderTool` is designed to extract and read the content of a specified website using Spider.
|
||||
icon: spider-web
|
||||
---
|
||||
|
||||
# `SpiderTool`
|
||||
|
||||
## Description
|
||||
|
||||
[Spider](https://spider.cloud/?ref=crewai) is the [fastest](https://github.com/spider-rs/spider/blob/main/benches/BENCHMARKS.md#benchmark-results)
|
||||
open source scraper and crawler that returns LLM-ready data.
|
||||
It converts any website into pure HTML, markdown, metadata or text while enabling you to crawl with custom actions using AI.
|
||||
|
||||
## Installation
|
||||
|
||||
To use the `SpiderTool`, you need to install the [Spider SDK](https://pypi.org/project/spider-client/)
and the `crewai[tools]` package:
|
||||
|
||||
```shell
|
||||
pip install spider-client 'crewai[tools]'
|
||||
```
|
||||
|
||||
## Example
|
||||
|
||||
This example shows you how you can use the `SpiderTool` to enable your agent to scrape and crawl websites.
|
||||
The data returned from the Spider API is already LLM-ready, so no additional cleaning is needed.
|
||||
|
||||
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import SpiderTool

def main():
    spider_tool = SpiderTool()

    searcher = Agent(
        role="Web Research Expert",
        goal="Find related information from specific URLs",
        backstory="An expert web researcher that uses the web extremely well",
        tools=[spider_tool],
        verbose=True,
    )

    return_metadata = Task(
        description="Scrape https://spider.cloud with a limit of 1 and enable metadata",
        expected_output="Metadata and 10 word summary of spider.cloud",
        agent=searcher
    )

    crew = Crew(
        agents=[searcher],
        tasks=[
            return_metadata,
        ],
        verbose=True
    )

    crew.kickoff()

if __name__ == "__main__":
    main()
```
|
||||
|
||||
## Arguments
|
||||
| Argument | Type | Description |
|:------------------|:---------|:------------------------------------------------------------------|
| **api_key** | `string` | Specifies Spider API key. If not specified, it looks for `SPIDER_API_KEY` in environment variables. |
| **params** | `object` | Optional parameters for the request. Defaults to `{"return_format": "markdown"}` to optimize content for LLMs. |
| **request** | `string` | Type of request to perform (`http`, `chrome`, `smart`). `smart` defaults to HTTP, switching to JavaScript rendering if needed. |
| **limit** | `int` | Max pages to crawl per website. Set to `0` or omit for unlimited. |
| **depth** | `int` | Max crawl depth. Set to `0` for no limit. |
| **cache** | `bool` | Enables HTTP caching to speed up repeated runs. Default is `true`. |
| **budget** | `object` | Sets path-based limits for crawled pages, e.g., `{"*":1}` for root page only. |
| **locale** | `string` | Locale for the request, e.g., `en-US`. |
| **cookies** | `string` | HTTP cookies for the request. |
| **stealth** | `bool` | Enables stealth mode for Chrome requests to avoid detection. Default is `true`. |
| **headers** | `object` | HTTP headers as a map of key-value pairs for all requests. |
| **metadata** | `bool` | Stores metadata about pages and content, aiding AI interoperability. Defaults to `false`. |
| **viewport** | `object` | Sets Chrome viewport dimensions. Default is `800x600`. |
| **encoding** | `string` | Specifies encoding type, e.g., `UTF-8`, `SHIFT_JIS`. |
| **subdomains** | `bool` | Includes subdomains in the crawl. Default is `false`. |
| **user_agent** | `string` | Custom HTTP user agent. Defaults to a random agent. |
| **store_data** | `bool` | Enables data storage for the request. Overrides `storageless` when set. Default is `false`. |
| **gpt_config** | `object` | Allows AI to generate crawl actions, with optional chaining steps via an array for `"prompt"`. |
| **fingerprint** | `bool` | Enables advanced fingerprinting for Chrome. |
| **storageless** | `bool` | Prevents all data storage, including AI embeddings. Default is `false`. |
| **readability** | `bool` | Pre-processes content for reading via [Mozilla’s readability](https://github.com/mozilla/readability). Improves content for LLMs. |
| **return_format** | `string` | Format to return data: `markdown`, `raw`, `text`, `html2text`. Use `raw` for default page format. |
| **proxy_enabled** | `bool` | Enables high-performance proxies to avoid network-level blocking. |
| **query_selector** | `string` | CSS query selector for content extraction from markup. |
| **full_resources** | `bool` | Downloads all resources linked to the website. |
| **request_timeout** | `int` | Timeout in seconds for requests (5-60). Default is `30`. |
| **run_in_background** | `bool` | Runs the request in the background, useful for data storage and triggering dashboard crawls. No effect if `storageless` is set. |
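
As a rough illustration of how the request options in the table fit together, the sketch below assembles a `params` dictionary and checks a few of the documented constraints before use. `build_spider_params` is a hypothetical helper, not part of the Spider SDK or `crewai_tools`. The allowed values and the 5-60 second range are taken from the table above; how the dictionary is handed to the tool is not shown and should be confirmed against the tool's reference.

```python
from typing import Any, Dict

# Values documented in the arguments table.
ALLOWED_REQUEST_TYPES = {"http", "chrome", "smart"}
ALLOWED_RETURN_FORMATS = {"markdown", "raw", "text", "html2text"}


def build_spider_params(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented default and check documented limits."""
    params: Dict[str, Any] = {"return_format": "markdown"}
    params.update(overrides)

    if params["return_format"] not in ALLOWED_RETURN_FORMATS:
        raise ValueError(f"Unsupported return_format: {params['return_format']!r}")
    if "request" in params and params["request"] not in ALLOWED_REQUEST_TYPES:
        raise ValueError(f"Unsupported request type: {params['request']!r}")

    timeout = params.get("request_timeout")
    if timeout is not None and not 5 <= timeout <= 60:
        raise ValueError("request_timeout must be between 5 and 60 seconds")
    return params


# Scrape a single page and include metadata, as in the example task above.
params = build_spider_params(
    limit=1, metadata=True, request="smart", request_timeout=30
)
```

Validating locally keeps obviously invalid combinations from ever reaching the API.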
|
||||
244
docs/en/tools/web-scraping/stagehandtool.mdx
Normal file
@@ -0,0 +1,244 @@
|
||||
---
|
||||
title: Stagehand Tool
|
||||
description: Web automation tool that integrates Stagehand with CrewAI for browser interaction and automation
|
||||
icon: hand
|
||||
---
|
||||
|
||||
|
||||
# Overview
|
||||
|
||||
The `StagehandTool` integrates the [Stagehand](https://docs.stagehand.dev/get_started/introduction) framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.
|
||||
|
||||
## Overview
|
||||
|
||||
Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:
|
||||
|
||||
- Navigate to websites
|
||||
- Click buttons, links, and other elements
|
||||
- Fill in forms
|
||||
- Extract data from web pages
|
||||
- Observe and identify elements
|
||||
- Perform complex workflows
|
||||
|
||||
The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:
|
||||
|
||||
1. **Act**: Perform actions like clicking, typing, or navigating
|
||||
2. **Extract**: Extract structured data from web pages
|
||||
3. **Observe**: Identify and analyze elements on the page
|
||||
|
||||
## Prerequisites
|
||||
|
||||
Before using this tool, ensure you have:
|
||||
|
||||
1. A [Browserbase](https://www.browserbase.com/) account with API key and project ID
|
||||
2. An API key for an LLM (OpenAI or Anthropic Claude)
|
||||
3. The Stagehand Python SDK installed
|
||||
|
||||
Install the required dependency:
|
||||
|
||||
```bash
|
||||
pip install stagehand-py
|
||||
```
|
||||
|
||||
## Usage
|
||||
|
||||
### Basic Implementation
|
||||
|
||||
The StagehandTool can be implemented in two ways:
|
||||
|
||||
#### 1. Using Context Manager (Recommended)
|
||||
<Tip>
|
||||
The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.
|
||||
</Tip>
|
||||
|
||||
```python
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys using a context manager
with StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
) as stagehand_tool:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        expected_output="A short summary of what appears on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)
```
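
To see why the context manager form guarantees cleanup, it helps to look at the protocol in miniature. The sketch below uses `FakeBrowserTool`, a hypothetical stand-in that holds no real browser session; it only demonstrates that `__exit__` runs, and therefore `close()` is called, even when the body raises.

```python
class FakeBrowserTool:
    """Hypothetical stand-in for a tool that holds a browser session."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        # A real tool would release its browser session here.
        self.closed = True

    def __enter__(self) -> "FakeBrowserTool":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.close()
        return False  # Do not swallow the exception.


tool = FakeBrowserTool()
try:
    with tool:
        raise RuntimeError("simulated failure during a crew run")
except RuntimeError:
    pass

cleanup_ran = tool.closed
```

The exception still propagates to the caller, but by the time it does, the session has already been closed.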
|
||||
|
||||
#### 2. Manual Resource Management
|
||||
|
||||
```python
from crewai import Agent, Task, Crew
from crewai_tools import StagehandTool
from stagehand.schemas import AvailableModel

# Initialize the tool with your API keys
stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
)

try:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )

    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        expected_output="A short summary of what appears on the homepage.",
        agent=researcher,
    )

    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )

    result = crew.kickoff()
    print(result)
finally:
    # Explicitly clean up resources
    stagehand_tool.close()
```
|
||||
|
||||
## Command Types
|
||||
|
||||
The StagehandTool supports three different command types for specific web automation tasks:
|
||||
|
||||
### 1. Act Command
|
||||
|
||||
The `act` command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.
|
||||
|
||||
```python
|
||||
# Perform an action (default behavior)
|
||||
result = stagehand_tool.run(
|
||||
instruction="Click the login button",
|
||||
url="https://example.com",
|
||||
command_type="act" # Default, so can be omitted
|
||||
)
|
||||
|
||||
# Fill out a form
|
||||
result = stagehand_tool.run(
|
||||
instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'",
|
||||
url="https://example.com/contact"
|
||||
)
|
||||
```
|
||||
|
||||
### 2. Extract Command
|
||||
|
||||
The `extract` command type retrieves structured data from webpages.
|
||||
|
||||
```python
|
||||
# Extract all product information
|
||||
result = stagehand_tool.run(
|
||||
instruction="Extract all product names, prices, and descriptions",
|
||||
url="https://example.com/products",
|
||||
command_type="extract"
|
||||
)
|
||||
|
||||
# Extract specific information with a selector
|
||||
result = stagehand_tool.run(
|
||||
instruction="Extract the main article title and content",
|
||||
url="https://example.com/blog/article",
|
||||
command_type="extract",
|
||||
selector=".article-container" # Optional CSS selector
|
||||
)
|
||||
```
|
||||
|
||||
### 3. Observe Command
|
||||
|
||||
The `observe` command type identifies and analyzes webpage elements.
|
||||
|
||||
```python
|
||||
# Find interactive elements
|
||||
result = stagehand_tool.run(
|
||||
instruction="Find all interactive elements in the navigation menu",
|
||||
url="https://example.com",
|
||||
command_type="observe"
|
||||
)
|
||||
|
||||
# Identify form fields
|
||||
result = stagehand_tool.run(
|
||||
instruction="Identify all the input fields in the registration form",
|
||||
url="https://example.com/register",
|
||||
command_type="observe",
|
||||
selector="#registration-form"
|
||||
)
|
||||
```
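
When calls are assembled programmatically, it can help to validate the command type before invoking the tool. The sketch below is one possible approach under stated assumptions: `build_run_kwargs` is a hypothetical helper, and the keyword names (`instruction`, `url`, `command_type`, `selector`) are taken from the `run(...)` calls shown above.

```python
from typing import Any, Dict, Optional

# The three command types described in this section.
COMMAND_TYPES = ("act", "extract", "observe")


def build_run_kwargs(
    instruction: str,
    url: str,
    command_type: str = "act",
    selector: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for a run call after basic validation."""
    if command_type not in COMMAND_TYPES:
        raise ValueError(
            f"command_type must be one of {COMMAND_TYPES}, got {command_type!r}"
        )
    if not instruction.strip():
        raise ValueError("instruction must not be empty")

    kwargs: Dict[str, Any] = {
        "instruction": instruction,
        "url": url,
        "command_type": command_type,
    }
    # Only include the selector when one was given.
    if selector is not None:
        kwargs["selector"] = selector
    return kwargs


observe_kwargs = build_run_kwargs(
    "Identify all the input fields in the registration form",
    "https://example.com/register",
    command_type="observe",
    selector="#registration-form",
)
```

The resulting dictionary could then be unpacked into a call such as `stagehand_tool.run(**observe_kwargs)`.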
|
||||
|
||||
## Configuration Options
|
||||
|
||||
Customize the StagehandTool behavior with these parameters:
|
||||
|
||||
```python
|
||||
stagehand_tool = StagehandTool(
|
||||
api_key="your-browserbase-api-key",
|
||||
project_id="your-browserbase-project-id",
|
||||
model_api_key="your-llm-api-key",
|
||||
model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
|
||||
dom_settle_timeout_ms=5000, # Wait longer for DOM to settle
|
||||
headless=True, # Run browser in headless mode
|
||||
self_heal=True, # Attempt to recover from errors
|
||||
wait_for_captcha_solves=True, # Wait for CAPTCHA solving
|
||||
verbose=1, # Control logging verbosity (0-3)
|
||||
)
|
||||
```
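
The examples in this document pass API keys as string literals for brevity. In practice, credentials are often read from the environment instead. The sketch below shows one way to do that; the environment variable names and the `load_stagehand_credentials` helper are assumptions for illustration, so substitute whatever names your deployment uses.

```python
import os
from typing import Dict, Mapping

# Assumed variable names, mapped to the constructor arguments shown above.
REQUIRED_ENV = {
    "api_key": "BROWSERBASE_API_KEY",
    "project_id": "BROWSERBASE_PROJECT_ID",
    "model_api_key": "MODEL_API_KEY",
}


def load_stagehand_credentials(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Collect credentials from env, failing early if any are missing."""
    missing = [var for var in REQUIRED_ENV.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in REQUIRED_ENV.items()}


# A plain dict stands in for the real environment in this example.
example_env = {
    "BROWSERBASE_API_KEY": "bb-key",
    "BROWSERBASE_PROJECT_ID": "bb-project",
    "MODEL_API_KEY": "llm-key",
}
credentials = load_stagehand_credentials(example_env)
```

The returned dictionary uses the same keyword names as the constructor call above, so it could be unpacked with `StagehandTool(**credentials, ...)`.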
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Be Specific**: Provide detailed instructions for better results
|
||||
2. **Choose Appropriate Command Type**: Select the right command type for your task
|
||||
3. **Use Selectors**: Leverage CSS selectors to improve accuracy
|
||||
4. **Break Down Complex Tasks**: Split complex workflows into multiple tool calls
|
||||
5. **Implement Error Handling**: Add error handling for potential issues
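
For the error-handling practice, one simple option is to retry a failed step a bounded number of times. The sketch below is generic Python and not part of the Stagehand or CrewAI APIs: `run_with_retries` and `flaky_action` are hypothetical names, and the action stands in for any tool call that may fail transiently.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call action, retrying on the given exceptions up to attempts times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # Out of attempts: surface the last error.
            time.sleep(delay)
    raise AssertionError("unreachable")


# Simulated step that fails twice before succeeding.
calls = {"n": 0}


def flaky_action() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("element not found yet")
    return "done"


outcome = run_with_retries(flaky_action, attempts=3, retry_on=(TimeoutError,))
```

Restricting `retry_on` to errors that are likely transient avoids repeating steps that fail for permanent reasons, such as invalid credentials.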
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
|
||||
Common issues and solutions:
|
||||
|
||||
- **Session Issues**: Verify API keys for both Browserbase and LLM provider
|
||||
- **Element Not Found**: Increase `dom_settle_timeout_ms` for slower pages
|
||||
- **Action Failures**: Use `observe` to identify correct elements first
|
||||
- **Incomplete Data**: Refine instructions or provide specific selectors
|
||||
|
||||
|
||||
## Additional Resources
|
||||
|
||||
For questions about the CrewAI integration:
|
||||
- Join Stagehand's [Slack community](https://stagehand.dev/slack)
|
||||
- Open an issue in the [Stagehand repository](https://github.com/browserbase/stagehand)
|
||||
- Visit [Stagehand documentation](https://docs.stagehand.dev/)
|
||||