Compare commits

...

10 Commits

Author SHA1 Message Date
Brandon Hancock
21f4b60754 Include embedding type fix 2025-03-20 08:55:30 -04:00
Brandon Hancock (bhancock_ai)
216ff4aa6f Merge branch 'main' into embedding-config-typing 2025-03-20 08:47:33 -04:00
Vini Brasil
fe0813e831 Improve MethodExecutionFailedEvent.error typing (#2401) 2025-03-18 12:52:23 -04:00
Brandon Hancock (bhancock_ai)
33cebea15b spelling and tab fix (#2394) 2025-03-17 16:31:23 -04:00
João Moura
e723e5ca3f preparign new version 2025-03-17 09:13:21 -07:00
Jakub Kopecký
24f1a19310 feat: add docs for ApifyActorsTool (#2254)
* add docs for ApifyActorsTool

* improve readme, add link to template

* format

* improve tool docs

* improve readme

* Update apifyactorstool.mdx (#1)

* Update apifyactorstool.mdx

* Update apifyactorstool.mdx

* dans suggestions

* custom apify icon

* update descripton

* Update apifyactorstool.mdx

---------

Co-authored-by: Jan Čurn <jan.curn@gmail.com>
Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
2025-03-16 12:29:57 -04:00
Nick Fujita
6f849c0e6d 'added docs for config based on agent review' 2025-02-20 18:11:16 +09:00
Nick Fujita
276f661e6c 'add specific providers to provider type' 2025-02-20 18:02:36 +09:00
Nick Fujita
8f99caf61b 'type cleanup' 2025-02-20 17:58:46 +09:00
Nick Fujita
f4642f11cc 'add typings to embedding configurator input arg' 2025-02-20 17:52:13 +09:00
17 changed files with 252 additions and 65 deletions

View File

@@ -106,6 +106,7 @@ Here is a list of the available tools and their descriptions:
| Tool | Description |
| :------------------------------- | :--------------------------------------------------------------------------------------------- |
| **ApifyActorsTool** | A tool that integrates Apify Actors with your workflows for web scraping and automation tasks. |
| **BrowserbaseLoadTool** | A tool for interacting with and extracting data from web browsers. |
| **CodeDocsSearchTool** | A RAG tool optimized for searching through code documentation and related technical documents. |
| **CodeInterpreterTool** | A tool for interpreting python code. |

View File

@@ -115,6 +115,7 @@
"concepts/testing",
"concepts/cli",
"concepts/tools",
"concepts/event-listener",
"concepts/langchain-tools",
"concepts/llamaindex-tools"
]
@@ -154,6 +155,7 @@
"group": "Tools",
"pages": [
"tools/aimindtool",
"tools/apifyactorstool",
"tools/bravesearchtool",
"tools/browserbaseloadtool",
"tools/codedocssearchtool",
@@ -220,4 +222,4 @@
"linkedin": "https://www.linkedin.com/company/crewai-inc",
"youtube": "https://youtube.com/@crewAIInc"
}
}
}

View File

@@ -0,0 +1,99 @@
---
title: Apify Actors
description: "`ApifyActorsTool` lets you call Apify Actors to provide your CrewAI workflows with web scraping, crawling, data extraction, and web automation capabilities."
# hack to use custom Apify icon
icon: "); -webkit-mask-image: url('https://upload.wikimedia.org/wikipedia/commons/a/ae/Apify.svg');/*"
---
# `ApifyActorsTool`
Integrate [Apify Actors](https://apify.com/actors) into your CrewAI workflows.
## Description
The `ApifyActorsTool` connects [Apify Actors](https://apify.com/actors), cloud-based programs for web scraping and automation, to your CrewAI workflows.
Use any of the 4,000+ Actors on [Apify Store](https://apify.com/store) for use cases such as extracting data from social media, search engines, online maps, e-commerce sites, travel portals, or general websites.
For details, see the [Apify CrewAI integration](https://docs.apify.com/platform/integrations/crewai) in Apify documentation.
## Steps to get started
<Steps>
<Step title="Install dependencies">
Install `crewai[tools]` and `langchain-apify` using pip: `pip install 'crewai[tools]' langchain-apify`.
</Step>
<Step title="Obtain an Apify API token">
Sign up for [Apify Console](https://console.apify.com/) and get your [Apify API token](https://console.apify.com/settings/integrations).
</Step>
<Step title="Configure environment">
Set your Apify API token as the `APIFY_API_TOKEN` environment variable so the tool can authenticate with the Apify platform.
</Step>
</Steps>
## Usage example
Use the `ApifyActorsTool` manually to run the [RAG Web Browser Actor](https://apify.com/apify/rag-web-browser) to perform a web search:
```python
from crewai_tools import ApifyActorsTool

# Initialize the tool with an Apify Actor
tool = ApifyActorsTool(actor_name="apify/rag-web-browser")

# Run the tool with input parameters
results = tool.run(run_input={"query": "What is CrewAI?", "maxResults": 5})

# Process the results
for result in results:
    print(f"URL: {result['metadata']['url']}")
    print(f"Content: {result.get('markdown', 'N/A')[:100]}...")
```
### Expected output
Here is example output from running the code above (actual results will vary):
```text
URL: https://www.example.com/crewai-intro
Content: CrewAI is a framework for building AI-powered workflows...
URL: https://docs.crewai.com/
Content: Official documentation for CrewAI...
```
The `ApifyActorsTool` automatically fetches the Actor definition and input schema from Apify using the provided `actor_name`, then builds the tool description and argument schema from them. This means that when the tool is used with agents you only need to supply a valid `actor_name`; the tool handles the rest, and no `run_input` is needed. Here's how it works:
```python
from crewai import Agent
from crewai_tools import ApifyActorsTool

rag_browser = ApifyActorsTool(actor_name="apify/rag-web-browser")

agent = Agent(
    role="Research Analyst",
    goal="Find and summarize information about specific topics",
    backstory="You are an experienced researcher with attention to detail",
    tools=[rag_browser],
)
```
You can run other Actors from [Apify Store](https://apify.com/store) simply by changing the `actor_name` and, when using it manually, adjusting the `run_input` based on the Actor input schema.
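For instance, here is a sketch of running a different Actor manually. The Actor name and input keys below are illustrative assumptions; always check the target Actor's input schema on Apify Store for the exact fields:
```python
from crewai_tools import ApifyActorsTool

# Illustrative example: the Actor name and input keys are assumptions;
# consult the Actor's input schema for the actual fields it accepts.
crawler = ApifyActorsTool(actor_name="apify/website-content-crawler")

results = crawler.run(
    run_input={
        "startUrls": [{"url": "https://docs.crewai.com/"}],
        "maxCrawlPages": 5,
    }
)
```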
For an example of usage with agents, see the [CrewAI Actor template](https://apify.com/templates/python-crewai).
## Configuration
The `ApifyActorsTool` requires these inputs to work:
- **`actor_name`**
The ID of the Apify Actor to run, e.g., `"apify/rag-web-browser"`. Browse all Actors on [Apify Store](https://apify.com/store).
- **`run_input`**
A dictionary of input parameters for the Actor when running the tool manually.
- For example, for the `apify/rag-web-browser` Actor: `{"query": "search term", "maxResults": 5}`
- See the Actor's [input schema](https://apify.com/apify/rag-web-browser/input-schema) for the list of input parameters.
## Resources
- **[Apify](https://apify.com/)**: Explore the Apify platform.
- **[How to build an AI agent on Apify](https://blog.apify.com/how-to-build-an-ai-agent/)**: A complete step-by-step guide to creating, publishing, and monetizing AI agents on the Apify platform.
- **[RAG Web Browser Actor](https://apify.com/apify/rag-web-browser)**: A popular Actor that performs web searches and returns the results in an LLM-friendly format.
- **[CrewAI Integration Guide](https://docs.apify.com/platform/integrations/crewai)**: Follow the official guide for integrating Apify and CrewAI.

View File

@@ -1,6 +1,6 @@
[project]
name = "crewai"
version = "0.105.0"
version = "0.108.0"
description = "Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks."
readme = "README.md"
requires-python = ">=3.10,<3.13"

View File

@@ -14,7 +14,7 @@ warnings.filterwarnings(
category=UserWarning,
module="pydantic.main",
)
__version__ = "0.105.0"
__version__ = "0.108.0"
__all__ = [
"Agent",
"Crew",

View File

@@ -20,6 +20,7 @@ from crewai.tools.agent_tools.agent_tools import AgentTools
from crewai.utilities import Converter, Prompts
from crewai.utilities.constants import TRAINED_AGENTS_DATA_FILE, TRAINING_DATA_FILE
from crewai.utilities.converter import generate_model_description
from crewai.utilities.embedding_configurator import EmbeddingConfig
from crewai.utilities.events.agent_events import (
AgentExecutionCompletedEvent,
AgentExecutionErrorEvent,
@@ -108,7 +109,7 @@ class Agent(BaseAgent):
default="safe",
description="Mode for code execution: 'safe' (using Docker) or 'unsafe' (direct execution).",
)
embedder: Optional[Dict[str, Any]] = Field(
embedder: Optional[EmbeddingConfig] = Field(
default=None,
description="Embedder configuration for the agent.",
)
@@ -134,7 +135,7 @@ class Agent(BaseAgent):
self.cache_handler = CacheHandler()
self.set_cache_handler(self.cache_handler)
def set_knowledge(self, crew_embedder: Optional[Dict[str, Any]] = None):
def set_knowledge(self, crew_embedder: Optional[EmbeddingConfig] = None):
try:
if self.embedder is None and crew_embedder:
self.embedder = crew_embedder

View File

@@ -25,6 +25,7 @@ from crewai.tools.base_tool import BaseTool, Tool
from crewai.utilities import I18N, Logger, RPMController
from crewai.utilities.config import process_config
from crewai.utilities.converter import Converter
from crewai.utilities.embedding_configurator import EmbeddingConfig
T = TypeVar("T", bound="BaseAgent")
@@ -362,5 +363,5 @@ class BaseAgent(ABC, BaseModel):
self._rpm_controller = rpm_controller
self.create_agent_executor()
def set_knowledge(self, crew_embedder: Optional[Dict[str, Any]] = None):
def set_knowledge(self, crew_embedder: Optional[EmbeddingConfig] = None):
pass

View File

@@ -5,7 +5,7 @@ description = "{{name}} using crewAI"
authors = [{ name = "Your Name", email = "you@example.com" }]
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.105.0,<1.0.0"
"crewai[tools]>=0.108.0,<1.0.0"
]
[project.scripts]

View File

@@ -5,7 +5,7 @@ description = "{{name}} using crewAI"
authors = [{ name = "Your Name", email = "you@example.com" }]
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.105.0,<1.0.0",
"crewai[tools]>=0.108.0,<1.0.0",
]
[project.scripts]

View File

@@ -5,7 +5,7 @@ description = "Power up your crews with {{folder_name}}"
readme = "README.md"
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.105.0"
"crewai[tools]>=0.108.0"
]
[tool.crewai]

View File

@@ -41,6 +41,7 @@ from crewai.tools.base_tool import Tool
from crewai.types.usage_metrics import UsageMetrics
from crewai.utilities import I18N, FileHandler, Logger, RPMController
from crewai.utilities.constants import TRAINING_DATA_FILE
from crewai.utilities.embedding_configurator import EmbeddingConfig
from crewai.utilities.evaluators.crew_evaluator_handler import CrewEvaluator
from crewai.utilities.evaluators.task_evaluator import TaskEvaluator
from crewai.utilities.events.crew_events import (
@@ -145,7 +146,7 @@ class Crew(BaseModel):
default=None,
description="An instance of the UserMemory to be used by the Crew to store/fetch memories of a specific user.",
)
embedder: Optional[dict] = Field(
embedder: Optional[EmbeddingConfig] = Field(
default=None,
description="Configuration for the embedder to be used for the crew.",
)

View File

@@ -5,6 +5,7 @@ from pydantic import BaseModel, ConfigDict, Field
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.utilities.embedding_configurator import EmbeddingConfig
os.environ["TOKENIZERS_PARALLELISM"] = "false" # removes logging from fastembed
@@ -21,14 +22,14 @@ class Knowledge(BaseModel):
sources: List[BaseKnowledgeSource] = Field(default_factory=list)
model_config = ConfigDict(arbitrary_types_allowed=True)
storage: Optional[KnowledgeStorage] = Field(default=None)
embedder: Optional[Dict[str, Any]] = None
embedder: Optional[EmbeddingConfig] = None
collection_name: Optional[str] = None
def __init__(
self,
collection_name: str,
sources: List[BaseKnowledgeSource],
embedder: Optional[Dict[str, Any]] = None,
embedder: Optional[EmbeddingConfig] = None,
storage: Optional[KnowledgeStorage] = None,
**data,
):

View File

@@ -15,6 +15,7 @@ from chromadb.config import Settings
from crewai.knowledge.storage.base_knowledge_storage import BaseKnowledgeStorage
from crewai.utilities import EmbeddingConfigurator
from crewai.utilities.constants import KNOWLEDGE_DIRECTORY
from crewai.utilities.embedding_configurator import EmbeddingConfig
from crewai.utilities.logger import Logger
from crewai.utilities.paths import db_storage_path
@@ -48,7 +49,7 @@ class KnowledgeStorage(BaseKnowledgeStorage):
def __init__(
self,
embedder: Optional[Dict[str, Any]] = None,
embedder: Optional[EmbeddingConfig] = None,
collection_name: Optional[str] = None,
):
self.collection_name = collection_name
@@ -187,7 +188,7 @@ class KnowledgeStorage(BaseKnowledgeStorage):
api_key=os.getenv("OPENAI_API_KEY"), model_name="text-embedding-3-small"
)
def _set_embedder_config(self, embedder: Optional[Dict[str, Any]] = None) -> None:
def _set_embedder_config(self, embedder: Optional[EmbeddingConfig] = None) -> None:
"""Set the embedding configuration for the knowledge storage.
Args:

View File

@@ -1,8 +1,84 @@
import os
from typing import Any, Dict, Optional, cast
from typing import Any, Callable, Literal, cast
from chromadb import Documents, EmbeddingFunction, Embeddings
from chromadb.api.types import validate_embedding_function
from pydantic import BaseModel
class EmbeddingProviderConfig(BaseModel):
"""Configuration model for embedding providers.
Attributes:
# Core Model Configuration
model (str | None): The model identifier for embeddings, used across multiple providers
like OpenAI, Azure, Watson, etc.
embedder (str | Callable | None): Custom embedding function or callable for custom
embedding implementations.
# API Authentication & Configuration
api_key (str | None): Authentication key for various providers (OpenAI, VertexAI,
Google, Cohere, VoyageAI, Watson).
api_base (str | None): Base API URL override for OpenAI and Azure services.
api_type (str | None): API type specification, particularly used for Azure configuration.
api_version (str | None): API version for OpenAI and Azure services.
api_url (str | None): API endpoint URL, used by HuggingFace and Watson services.
url (str | None): Base URL for the embedding service, primarily used for Ollama and
HuggingFace endpoints.
# Service-Specific Configuration
project_id (str | None): Project identifier used by VertexAI and Watson services.
organization_id (str | None): Organization identifier for OpenAI and Azure services.
deployment_id (str | None): Deployment identifier for OpenAI and Azure services.
region (str | None): Geographic region for VertexAI services.
session (str | None): Session configuration for Amazon Bedrock embeddings.
# Request Configuration
task_type (str | None): Specifies the task type for Google Generative AI embeddings.
default_headers (str | None): Custom headers for OpenAI and Azure API requests.
dimensions (str | None): Output dimensions specification for OpenAI and Azure embeddings.
"""
# Core Model Configuration
model: str | None = None
embedder: str | Callable | None = None
# API Authentication & Configuration
api_key: str | None = None
api_base: str | None = None
api_type: str | None = None
api_version: str | None = None
api_url: str | None = None
url: str | None = None
# Service-Specific Configuration
project_id: str | None = None
organization_id: str | None = None
deployment_id: str | None = None
region: str | None = None
session: str | None = None
# Request Configuration
task_type: str | None = None
default_headers: str | None = None
dimensions: str | None = None
class EmbeddingConfig(BaseModel):
provider: Literal[
"openai",
"azure",
"ollama",
"vertexai",
"google",
"cohere",
"voyageai",
"bedrock",
"huggingface",
"watson",
"custom",
]
config: EmbeddingProviderConfig | None = None
class EmbeddingConfigurator:
@@ -23,15 +99,19 @@ class EmbeddingConfigurator:
def configure_embedder(
self,
embedder_config: Optional[Dict[str, Any]] = None,
embedder_config: EmbeddingConfig | None = None,
) -> EmbeddingFunction:
"""Configures and returns an embedding function based on the provided config."""
if embedder_config is None:
return self._create_default_embedding_function()
provider = embedder_config.get("provider")
config = embedder_config.get("config", {})
model_name = config.get("model") if provider != "custom" else None
provider = embedder_config.provider
config = (
embedder_config.config
if embedder_config.config
else EmbeddingProviderConfig()
)
model_name = config.model if provider != "custom" else None
if provider not in self.embedding_functions:
raise Exception(
@@ -56,123 +136,123 @@ class EmbeddingConfigurator:
)
@staticmethod
def _configure_openai(config, model_name):
def _configure_openai(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.openai_embedding_function import (
OpenAIEmbeddingFunction,
)
return OpenAIEmbeddingFunction(
api_key=config.get("api_key") or os.getenv("OPENAI_API_KEY"),
api_key=config.api_key or os.getenv("OPENAI_API_KEY"),
model_name=model_name,
api_base=config.get("api_base", None),
api_type=config.get("api_type", None),
api_version=config.get("api_version", None),
default_headers=config.get("default_headers", None),
dimensions=config.get("dimensions", None),
deployment_id=config.get("deployment_id", None),
organization_id=config.get("organization_id", None),
api_base=config.api_base,
api_type=config.api_type,
api_version=config.api_version,
default_headers=config.default_headers,
dimensions=config.dimensions,
deployment_id=config.deployment_id,
organization_id=config.organization_id,
)
@staticmethod
def _configure_azure(config, model_name):
def _configure_azure(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.openai_embedding_function import (
OpenAIEmbeddingFunction,
)
return OpenAIEmbeddingFunction(
api_key=config.get("api_key"),
api_base=config.get("api_base"),
api_type=config.get("api_type", "azure"),
api_version=config.get("api_version"),
api_key=config.api_key,
api_base=config.api_base,
api_type=config.api_type if config.api_type else "azure",
api_version=config.api_version,
model_name=model_name,
default_headers=config.get("default_headers"),
dimensions=config.get("dimensions"),
deployment_id=config.get("deployment_id"),
organization_id=config.get("organization_id"),
default_headers=config.default_headers,
dimensions=config.dimensions,
deployment_id=config.deployment_id,
organization_id=config.organization_id,
)
@staticmethod
def _configure_ollama(config, model_name):
def _configure_ollama(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.ollama_embedding_function import (
OllamaEmbeddingFunction,
)
return OllamaEmbeddingFunction(
url=config.get("url", "http://localhost:11434/api/embeddings"),
url=config.url if config.url else "http://localhost:11434/api/embeddings",
model_name=model_name,
)
@staticmethod
def _configure_vertexai(config, model_name):
def _configure_vertexai(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.google_embedding_function import (
GoogleVertexEmbeddingFunction,
)
return GoogleVertexEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
project_id=config.get("project_id"),
region=config.get("region"),
api_key=config.api_key,
project_id=config.project_id,
region=config.region,
)
@staticmethod
def _configure_google(config, model_name):
def _configure_google(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.google_embedding_function import (
GoogleGenerativeAiEmbeddingFunction,
)
return GoogleGenerativeAiEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
task_type=config.get("task_type"),
api_key=config.api_key,
task_type=config.task_type,
)
@staticmethod
def _configure_cohere(config, model_name):
def _configure_cohere(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.cohere_embedding_function import (
CohereEmbeddingFunction,
)
return CohereEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
api_key=config.api_key,
)
@staticmethod
def _configure_voyageai(config, model_name):
def _configure_voyageai(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.voyageai_embedding_function import (
VoyageAIEmbeddingFunction,
)
return VoyageAIEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
api_key=config.api_key,
)
@staticmethod
def _configure_bedrock(config, model_name):
def _configure_bedrock(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.amazon_bedrock_embedding_function import (
AmazonBedrockEmbeddingFunction,
)
# Allow custom model_name override with backwards compatibility
kwargs = {"session": config.get("session")}
kwargs = {"session": config.session}
if model_name is not None:
kwargs["model_name"] = model_name
return AmazonBedrockEmbeddingFunction(**kwargs)
@staticmethod
def _configure_huggingface(config, model_name):
def _configure_huggingface(config: EmbeddingProviderConfig, model_name: str):
from chromadb.utils.embedding_functions.huggingface_embedding_function import (
HuggingFaceEmbeddingServer,
)
return HuggingFaceEmbeddingServer(
url=config.get("api_url"),
url=config.api_url,
)
@staticmethod
def _configure_watson(config, model_name):
def _configure_watson(config: EmbeddingProviderConfig, model_name: str):
try:
import ibm_watsonx_ai.foundation_models as watson_models
from ibm_watsonx_ai import Credentials
@@ -193,12 +273,10 @@ class EmbeddingConfigurator:
}
embedding = watson_models.Embeddings(
model_id=config.get("model"),
model_id=config.model,
params=embed_params,
credentials=Credentials(
api_key=config.get("api_key"), url=config.get("api_url")
),
project_id=config.get("project_id"),
credentials=Credentials(api_key=config.api_key, url=config.api_url),
project_id=config.project_id,
)
try:
@@ -211,8 +289,8 @@ class EmbeddingConfigurator:
return WatsonEmbeddingFunction()
@staticmethod
def _configure_custom(config):
custom_embedder = config.get("embedder")
def _configure_custom(config: EmbeddingProviderConfig):
custom_embedder = config.embedder
if isinstance(custom_embedder, EmbeddingFunction):
try:
validate_embedding_function(custom_embedder)
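To illustrate the typing change above, here is a minimal sketch (not part of the diff) of how the new typed embedder configuration could be built and resolved, assuming `EmbeddingConfigurator` can be instantiated with no arguments as in the existing class:
```python
import os

from crewai.utilities.embedding_configurator import (
    EmbeddingConfig,
    EmbeddingConfigurator,
    EmbeddingProviderConfig,
)

# Typed replacement for the old Dict[str, Any] embedder config
embedder = EmbeddingConfig(
    provider="openai",
    config=EmbeddingProviderConfig(
        model="text-embedding-3-small",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
)

# The configurator resolves the typed config into a chromadb EmbeddingFunction
embedding_fn = EmbeddingConfigurator().configure_embedder(embedder)
```
The same `EmbeddingConfig` instance can also be passed as the `embedder` argument of `Agent`, `Crew`, or `Knowledge`, per the signature changes in this diff.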

View File

@@ -1,6 +1,6 @@
from typing import Any, Dict, Optional, Union
from pydantic import BaseModel
from pydantic import BaseModel, ConfigDict
from .base_events import CrewEvent
@@ -52,9 +52,11 @@ class MethodExecutionFailedEvent(FlowEvent):
flow_name: str
method_name: str
error: Any
error: Exception
type: str = "method_execution_failed"
model_config = ConfigDict(arbitrary_types_allowed=True)
class FlowFinishedEvent(FlowEvent):
"""Event emitted when a flow completes execution"""

uv.lock (generated)
View File

@@ -619,7 +619,7 @@ wheels = [
[[package]]
name = "crewai"
version = "0.105.0"
version = "0.108.0"
source = { editable = "." }
dependencies = [
{ name = "appdirs" },