mirror of https://github.com/crewAIInc/crewAI.git
synced 2026-01-22 14:48:13 +00:00

Compare commits
22 Commits
devin/1768... → gl/feat/na...

SHA1:
- 0a250a45ce
- 1353cb2a33
- 5550c6df7e
- 204a1cece7
- 4c0d99601c
- e2c517d0a2
- af4523b2a1
- 1fe020fa6f
- b035aa8947
- 4ed5e4ca0e
- 771eccfcdf
- 50728b10e8
- 42ca4eacff
- d8ebfe7ee0
- 8cf0cfa2b7
- 3ad0af4934
- 56946d309b
- 5200ed4372
- 301a1da047
- 22f1e21d69
- 741bf12bf4
- b267bb4054
@@ -375,10 +375,13 @@ In this section, you'll find detailed examples that help you select, configure,
 GOOGLE_API_KEY=<your-api-key>
 GEMINI_API_KEY=<your-api-key>

-# Optional - for Vertex AI
+# For Vertex AI Express mode (API key authentication)
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+
+# For Vertex AI with service account
 GOOGLE_CLOUD_PROJECT=<your-project-id>
 GOOGLE_CLOUD_LOCATION=<location> # Defaults to us-central1
-GOOGLE_GENAI_USE_VERTEXAI=true # Set to use Vertex AI
 ```

 **Basic Usage:**

@@ -412,7 +415,35 @@ In this section, you'll find detailed examples that help you select, configure,
 )
 ```

-**Vertex AI Configuration:**
+**Vertex AI Express Mode (API Key Authentication):**
+
+Vertex AI Express mode allows you to use Vertex AI with simple API key authentication instead of service account credentials. This is the quickest way to get started with Vertex AI.
+
+To enable Express mode, set both environment variables in your `.env` file:
+```toml .env
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+```
+
+Then use the LLM as usual:
+```python Code
+from crewai import LLM
+
+llm = LLM(
+    model="gemini/gemini-2.0-flash",
+    temperature=0.7
+)
+```
+
+<Info>
+To get an Express mode API key:
+- New Google Cloud users: Get an [express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
+- Existing Google Cloud users: Get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys)
+
+For more details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
+</Info>
+
+**Vertex AI Configuration (Service Account):**
 ```python Code
 from crewai import LLM

@@ -424,10 +455,10 @@ In this section, you'll find detailed examples that help you select, configure,
 ```

 **Supported Environment Variables:**
-- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API)
-- `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI)
+- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API and Vertex AI Express mode)
+- `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI (required for Express mode)
+- `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI with service account)
 - `GOOGLE_CLOUD_LOCATION`: GCP location (defaults to `us-central1`)
-- `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI

 **Features:**
 - Native function calling support for Gemini 1.5+ and 2.x models
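The three auth paths above differ only in environment variables; the Python side stays identical. A small sketch of switching to Express mode from code (setting env vars in-process is a convenience for experiments, not something these docs prescribe):

```python
import os

from crewai import LLM

# Vertex AI Express mode: the same LLM call as the Gemini API path;
# the environment alone selects the backend.
os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "true"
os.environ["GOOGLE_API_KEY"] = "<your-api-key>"

llm = LLM(model="gemini/gemini-2.0-flash", temperature=0.7)
```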
@@ -107,7 +107,7 @@ There are several places in CrewAI code where you can specify the model to use

 ## Provider Configuration Examples

 CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities.
 In this section, you'll find detailed examples that help you select, configure, and optimize the LLM that best fits your project's needs.

 <AccordionGroup>

@@ -153,8 +153,8 @@ CrewAI supports a wide range of LLM providers, each offering unique features, au
 </Accordion>

 <Accordion title="Meta-Llama">
 Meta's Llama API provides access to Meta's family of large language models.
 The API is available at [Meta Llama API](https://llama.developer.meta.com?utm_source=partner-crewai&utm_medium=website).
 Set the following environment variables in your `.env` file:

 ```toml Code

@@ -207,11 +207,20 @@ CrewAI supports a wide range of LLM providers, each offering unique features, au
 Set your API key in your `.env` file. If you need a key, or need to find an existing one, check [AI Studio](https://aistudio.google.com/apikey).

 ```toml .env
 # https://ai.google.dev/gemini-api/docs/api-key
+# For the Gemini API (one of the following)
 GOOGLE_API_KEY=<your-api-key>
 GEMINI_API_KEY=<your-api-key>
+
+# For Vertex AI Express mode (API key authentication)
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+
+# For Vertex AI with a service account
+GOOGLE_CLOUD_PROJECT=<your-project-id>
+GOOGLE_CLOUD_LOCATION=<location> # Default: us-central1
 ```

-Example usage in your CrewAI project:
+**Basic Usage:**
 ```python Code
 from crewai import LLM

@@ -221,6 +230,34 @@ CrewAI supports a wide range of LLM providers, each offering unique features, au
 )
 ```

+**Vertex AI Express Mode (API Key Authentication):**
+
+Vertex AI Express mode lets you use Vertex AI with simple API key authentication instead of service account credentials. It is the fastest way to get started with Vertex AI.
+
+To enable Express mode, set both environment variables in your `.env` file:
+```toml .env
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+```
+
+Then use the LLM as usual:
+```python Code
+from crewai import LLM
+
+llm = LLM(
+    model="gemini/gemini-2.0-flash",
+    temperature=0.7
+)
+```
+
+<Info>
+To get an Express mode API key:
+- New Google Cloud users: get an [Express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
+- Existing Google Cloud users: get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys)
+
+For more details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
+</Info>
+
 ### Gemini Models

 Google offers a range of powerful models optimized for different use cases.

@@ -476,7 +513,7 @@ CrewAI supports a wide range of LLM providers, each offering unique features, au

 <Accordion title="Local NVIDIA NIM Deployed using WSL2">

 NVIDIA NIM lets you run powerful LLMs locally on Windows machines through WSL2 (Windows Subsystem for Linux).
 This approach leverages your NVIDIA GPU for private, secure, and cost-effective AI inference without relying on cloud services.
 Ideal for development, testing, or production environments that need data privacy or offline capability.

@@ -954,4 +991,4 @@ Learn how to get the most out of your LLM configuration:
 llm = LLM(model="openai/gpt-4o")  # 128K tokens
 ```
 </Tab>
 </Tabs>
 </Tabs>
@@ -79,7 +79,7 @@ There are different places in CrewAI code where you can specify the model to use

 # Advanced configuration with detailed parameters
 llm = LLM(
     model="openai/gpt-4",
     temperature=0.8,
     max_tokens=150,
     top_p=0.9,

@@ -207,11 +207,20 @@ In this section, you'll find detailed examples that help you select, configure,
 Set your API key in your `.env` file. If you need a key, or need to find an existing one, check [AI Studio](https://aistudio.google.com/apikey).

 ```toml .env
 # https://ai.google.dev/gemini-api/docs/api-key
+# For the Gemini API (one of the following)
 GOOGLE_API_KEY=<your-api-key>
 GEMINI_API_KEY=<your-api-key>
+
+# For Vertex AI Express mode (API key authentication)
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+
+# For Vertex AI with a service account
+GOOGLE_CLOUD_PROJECT=<your-project-id>
+GOOGLE_CLOUD_LOCATION=<location> # Default: us-central1
 ```

-Example usage in your CrewAI project:
+**Basic Usage:**
 ```python Code
 from crewai import LLM

@@ -221,6 +230,34 @@ In this section, you'll find detailed examples that help you select, configure,
 )
 ```

+**Vertex AI Express Mode (API Key Authentication):**
+
+Vertex AI Express mode lets you use Vertex AI with simple API key authentication instead of service account credentials. It is the fastest way to get started with Vertex AI.
+
+To enable Express mode, set both environment variables in your `.env` file:
+```toml .env
+GOOGLE_GENAI_USE_VERTEXAI=true
+GOOGLE_API_KEY=<your-api-key>
+```
+
+Then use the LLM as usual:
+```python Code
+from crewai import LLM
+
+llm = LLM(
+    model="gemini/gemini-2.0-flash",
+    temperature=0.7
+)
+```
+
+<Info>
+To get an Express mode API key:
+- New Google Cloud users: get an [Express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
+- Existing Google Cloud users: get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys)
+
+For more details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
+</Info>
+
 ### Gemini Models

 Google offers a variety of powerful models optimized for different use cases.

@@ -823,7 +860,7 @@ Learn how to get the most out of your LLM configuration:
 Remember to monitor token usage regularly and adjust your settings to optimize costs and performance.
 </Info>
 </Accordion>


 <Accordion title="Drop Additional Parameters">
 CrewAI uses LiteLLM internally for LLM calls, which lets you drop additional parameters that are unnecessary for your use case. This can simplify your code and reduce the complexity of your LLM configuration.
 For example, if you don't need to send the <code>stop</code> parameter, simply omit it from your LLM call:

@@ -882,4 +919,4 @@ Learn how to get the most out of your LLM configuration:
 llm = LLM(model="openai/gpt-4o")  # 128K tokens
 ```
 </Tab>
 </Tabs>
 </Tabs>
@@ -98,6 +98,13 @@ a2a = [
     "httpx-sse~=0.4.0",
     "aiocache[redis,memcached]~=0.12.3",
 ]
+file-processing = [
+    "Pillow~=10.4.0",
+    "pypdf~=4.0.0",
+    "python-magic>=0.4.27",
+    "aiocache~=0.12.3",
+    "aiofiles~=24.1.0",
+]


 [project.scripts]
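The new optional dependency group means the file-processing stack installs only on request. A quick runtime probe for the extra, as a sketch (the error message is illustrative):

```python
# Probe for the optional file-processing dependencies declared above.
try:
    import aiofiles  # noqa: F401
    import pypdf  # noqa: F401
    from PIL import Image  # noqa: F401
except ImportError as exc:
    raise RuntimeError(
        'File processing support is missing; install the extra, '
        'e.g. pip install "crewai[file-processing]"'
    ) from exc
```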
@@ -6,6 +6,14 @@ import warnings
 from crewai.agent.core import Agent
 from crewai.crew import Crew
 from crewai.crews.crew_output import CrewOutput
+from crewai.files import (
+    AudioFile,
+    File,
+    ImageFile,
+    PDFFile,
+    TextFile,
+    VideoFile,
+)
 from crewai.flow.flow import Flow
 from crewai.knowledge.knowledge import Knowledge
 from crewai.llm import LLM

@@ -74,14 +82,20 @@ _track_install_async()
 __all__ = [
     "LLM",
     "Agent",
+    "AudioFile",
     "BaseLLM",
     "Crew",
     "CrewOutput",
+    "File",
     "Flow",
+    "ImageFile",
     "Knowledge",
     "LLMGuardrail",
+    "PDFFile",
     "Process",
     "Task",
     "TaskOutput",
+    "TextFile",
+    "VideoFile",
     "__version__",
 ]
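With these exports in place, the file wrappers can be imported straight from the package root. A minimal sketch, assuming the wrappers accept a path-like source (the constructor signature is not shown in this diff):

```python
# Hypothetical usage of the new top-level exports; the path-like
# constructor argument is an assumption, not confirmed by this diff.
from crewai import ImageFile, PDFFile

spec = PDFFile("docs/spec.pdf")        # assumed: path-like source
diagram = ImageFile("docs/arch.png")   # assumed: path-like source
```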
@@ -1,7 +1,7 @@
 from __future__ import annotations

 import asyncio
-from collections.abc import Callable, Sequence
+from collections.abc import Callable, Coroutine, Sequence
 import shutil
 import subprocess
 import time

@@ -34,6 +34,11 @@ from crewai.agents.agent_builder.base_agent import BaseAgent
 from crewai.agents.cache.cache_handler import CacheHandler
 from crewai.agents.crew_agent_executor import CrewAgentExecutor
 from crewai.events.event_bus import crewai_event_bus
+from crewai.events.types.agent_events import (
+    LiteAgentExecutionCompletedEvent,
+    LiteAgentExecutionErrorEvent,
+    LiteAgentExecutionStartedEvent,
+)
 from crewai.events.types.knowledge_events import (
     KnowledgeQueryCompletedEvent,
     KnowledgeQueryFailedEvent,

@@ -43,10 +48,10 @@ from crewai.events.types.memory_events import (
     MemoryRetrievalCompletedEvent,
     MemoryRetrievalStartedEvent,
 )
-from crewai.experimental.crew_agent_executor_flow import CrewAgentExecutorFlow
+from crewai.experimental.agent_executor import AgentExecutor
 from crewai.knowledge.knowledge import Knowledge
 from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
 from crewai.lite_agent import LiteAgent
 from crewai.lite_agent_output import LiteAgentOutput
 from crewai.llms.base_llm import BaseLLM
 from crewai.mcp import (
     MCPClient,

@@ -64,15 +69,18 @@ from crewai.security.fingerprint import Fingerprint
 from crewai.tools.agent_tools.agent_tools import AgentTools
 from crewai.utilities.agent_utils import (
     get_tool_names,
+    is_inside_event_loop,
     load_agent_from_repository,
     parse_tools,
     render_text_description_and_args,
 )
 from crewai.utilities.constants import TRAINED_AGENTS_DATA_FILE, TRAINING_DATA_FILE
-from crewai.utilities.converter import Converter
+from crewai.utilities.converter import Converter, ConverterError
+from crewai.utilities.guardrail import process_guardrail
+from crewai.utilities.guardrail_types import GuardrailType
 from crewai.utilities.llm_utils import create_llm
 from crewai.utilities.prompts import Prompts, StandardPromptResult, SystemPromptResult
+from crewai.utilities.pydantic_schema_utils import generate_model_description
 from crewai.utilities.token_counter_callback import TokenCalcHandler
 from crewai.utilities.training_handler import CrewTrainingHandler
@@ -80,18 +88,18 @@ from crewai.utilities.training_handler import CrewTrainingHandler
 try:
     from crewai.a2a.config import A2AClientConfig, A2AConfig, A2AServerConfig
 except ImportError:
-    A2AClientConfig = Any
-    A2AConfig = Any
-    A2AServerConfig = Any
+    A2AClientConfig = Any  # type: ignore[assignment,misc]
+    A2AConfig = Any  # type: ignore[assignment,misc]
+    A2AServerConfig = Any  # type: ignore[assignment,misc]


 if TYPE_CHECKING:
     from crewai_tools import CodeInterpreterTool

     from crewai.agents.agent_builder.base_agent import PlatformAppOrAction
     from crewai.lite_agent_output import LiteAgentOutput
     from crewai.task import Task
     from crewai.tools.base_tool import BaseTool
     from crewai.tools.structured_tool import CrewStructuredTool
     from crewai.utilities.types import LLMMessage
@@ -113,7 +121,7 @@ class Agent(BaseAgent):
     The agent can also have memory, can operate in verbose mode, and can delegate tasks to other agents.

     Attributes:
-        agent_executor: An instance of the CrewAgentExecutor or CrewAgentExecutorFlow class.
+        agent_executor: An instance of the CrewAgentExecutor or AgentExecutor class.
         role: The role of the agent.
         goal: The objective of the agent.
         backstory: The backstory of the agent.

@@ -176,7 +184,8 @@ class Agent(BaseAgent):
     )
     multimodal: bool = Field(
         default=False,
-        description="Whether the agent is multimodal.",
+        deprecated=True,
+        description="[DEPRECATED, will be removed in v2.0 - pass files natively.] Whether the agent is multimodal.",
     )
     inject_date: bool = Field(
         default=False,

@@ -238,9 +247,9 @@ class Agent(BaseAgent):
         Can be a single A2AConfig/A2AClientConfig/A2AServerConfig, or a list of any number of A2AConfig/A2AClientConfig with a single A2AServerConfig.
         """,
     )
-    executor_class: type[CrewAgentExecutor] | type[CrewAgentExecutorFlow] = Field(
+    executor_class: type[CrewAgentExecutor] | type[AgentExecutor] = Field(
         default=CrewAgentExecutor,
-        description="Class to use for the agent executor. Defaults to CrewAgentExecutor, can optionally use CrewAgentExecutorFlow.",
+        description="Class to use for the agent executor. Defaults to CrewAgentExecutor, can optionally use AgentExecutor.",
     )

     @model_validator(mode="before")
@@ -1583,26 +1592,25 @@ class Agent(BaseAgent):
             )
         return None

-    def kickoff(
+    def _prepare_kickoff(
         self,
         messages: str | list[LLMMessage],
         response_format: type[Any] | None = None,
-    ) -> LiteAgentOutput:
-        """
-        Execute the agent with the given messages using a LiteAgent instance.
+    ) -> tuple[AgentExecutor, dict[str, str], dict[str, Any], list[CrewStructuredTool]]:
+        """Prepare common setup for kickoff execution.

-        This method is useful when you want to use the Agent configuration but
-        with the simpler and more direct execution flow of LiteAgent.
+        This method handles all the common preparation logic shared between
+        kickoff() and kickoff_async(), including tool processing, prompt building,
+        executor creation, and input formatting.

         Args:
             messages: Either a string query or a list of message dictionaries.
                 If a string is provided, it will be converted to a user message.
                 If a list is provided, each dict should have 'role' and 'content' keys.
             response_format: Optional Pydantic model for structured output.

         Returns:
-            LiteAgentOutput: The result of the agent execution.
+            Tuple of (executor, inputs, agent_info, parsed_tools) ready for execution.
         """
         # Process platform apps and MCP tools
         if self.apps:
             platform_tools = self.get_platform_tools(self.apps)
             if platform_tools and self.tools is not None:

@@ -1612,25 +1620,359 @@ class Agent(BaseAgent):
             if mcps and self.tools is not None:
                 self.tools.extend(mcps)

-        lite_agent = LiteAgent(
-            id=self.id,
-            role=self.role,
-            goal=self.goal,
-            backstory=self.backstory,
-            llm=self.llm,
-            tools=self.tools or [],
-            max_iterations=self.max_iter,
-            max_execution_time=self.max_execution_time,
-            respect_context_window=self.respect_context_window,
-            verbose=self.verbose,
-            response_format=response_format,
-            i18n=self.i18n,
-            original_agent=self,
-            guardrail=self.guardrail,
-            guardrail_max_retries=self.guardrail_max_retries,
-        )
-
-        return lite_agent.kickoff(messages)
+        # Prepare tools
+        raw_tools: list[BaseTool] = self.tools or []
+        parsed_tools = parse_tools(raw_tools)
+
+        # Build agent_info for backward-compatible event emission
+        agent_info = {
+            "id": self.id,
+            "role": self.role,
+            "goal": self.goal,
+            "backstory": self.backstory,
+            "tools": raw_tools,
+            "verbose": self.verbose,
+        }
+
+        # Build prompt for standalone execution
+        prompt = Prompts(
+            agent=self,
+            has_tools=len(raw_tools) > 0,
+            i18n=self.i18n,
+            use_system_prompt=self.use_system_prompt,
+            system_template=self.system_template,
+            prompt_template=self.prompt_template,
+            response_template=self.response_template,
+        ).task_execution()
+
+        # Prepare stop words
+        stop_words = [self.i18n.slice("observation")]
+        if self.response_template:
+            stop_words.append(
+                self.response_template.split("{{ .Response }}")[1].strip()
+            )
+
+        # Get RPM limit function
+        rpm_limit_fn = (
+            self._rpm_controller.check_or_wait if self._rpm_controller else None
+        )
+
+        # Create the executor for standalone mode (no crew, no task)
+        executor = AgentExecutor(
+            task=None,
+            crew=None,
+            llm=cast(BaseLLM, self.llm),
+            agent=self,
+            prompt=prompt,
+            max_iter=self.max_iter,
+            tools=parsed_tools,
+            tools_names=get_tool_names(parsed_tools),
+            stop_words=stop_words,
+            tools_description=render_text_description_and_args(parsed_tools),
+            tools_handler=self.tools_handler,
+            original_tools=raw_tools,
+            step_callback=self.step_callback,
+            function_calling_llm=self.function_calling_llm,
+            respect_context_window=self.respect_context_window,
+            request_within_rpm_limit=rpm_limit_fn,
+            callbacks=[TokenCalcHandler(self._token_process)],
+            response_model=response_format,
+            i18n=self.i18n,
+        )
+
+        # Format messages
+        if isinstance(messages, str):
+            formatted_messages = messages
+        else:
+            formatted_messages = "\n".join(
+                str(msg.get("content", "")) for msg in messages if msg.get("content")
+            )
+
+        # Build the input dict for the executor
+        inputs = {
+            "input": formatted_messages,
+            "tool_names": get_tool_names(parsed_tools),
+            "tools": render_text_description_and_args(parsed_tools),
+        }
+
+        return executor, inputs, agent_info, parsed_tools
+    def kickoff(
+        self,
+        messages: str | list[LLMMessage],
+        response_format: type[Any] | None = None,
+    ) -> LiteAgentOutput | Coroutine[Any, Any, LiteAgentOutput]:
+        """
+        Execute the agent with the given messages using the AgentExecutor.
+
+        This method provides standalone agent execution without requiring a Crew.
+        It supports tools, response formatting, and guardrails.
+
+        When called from within a Flow (sync or async method), this automatically
+        detects the event loop and returns a coroutine that the Flow framework
+        awaits. Users don't need to handle async explicitly.
+
+        Args:
+            messages: Either a string query or a list of message dictionaries.
+                If a string is provided, it will be converted to a user message.
+                If a list is provided, each dict should have 'role' and 'content' keys.
+            response_format: Optional Pydantic model for structured output.
+
+        Returns:
+            LiteAgentOutput: The result of the agent execution.
+                When inside a Flow, returns a coroutine that resolves to LiteAgentOutput.
+
+        Note:
+            For explicit async usage outside of Flow, use kickoff_async() directly.
+        """
+        # Magic auto-async: if inside event loop (e.g., inside a Flow),
+        # return coroutine for Flow to await
+        if is_inside_event_loop():
+            return self.kickoff_async(messages, response_format)
+
+        executor, inputs, agent_info, parsed_tools = self._prepare_kickoff(
+            messages, response_format
+        )
+
+        try:
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionStartedEvent(
+                    agent_info=agent_info,
+                    tools=parsed_tools,
+                    messages=messages,
+                ),
+            )
+
+            output = self._execute_and_build_output(executor, inputs, response_format)
+
+            if self.guardrail is not None:
+                output = self._process_kickoff_guardrail(
+                    output=output,
+                    executor=executor,
+                    inputs=inputs,
+                    response_format=response_format,
+                )
+
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionCompletedEvent(
+                    agent_info=agent_info,
+                    output=output.raw,
+                ),
+            )
+
+            return output
+
+        except Exception as e:
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionErrorEvent(
+                    agent_info=agent_info,
+                    error=str(e),
+                ),
+            )
+            raise
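The net effect of this hunk is that a bare Agent can be executed directly, without a Crew. A minimal sketch of the intended call pattern (role/goal/backstory values are illustrative, not taken from this diff):

```python
from crewai import Agent

# Hypothetical standalone usage; field values are placeholders.
agent = Agent(
    role="Research analyst",
    goal="Summarize a topic",
    backstory="An analyst who writes tight summaries.",
)

result = agent.kickoff("Summarize the tradeoffs of SQLite vs Postgres.")
print(result.raw)  # LiteAgentOutput.raw holds the final text
```

Inside a Flow method the same call returns a coroutine (the auto-async branch above), so there it would be `result = await agent.kickoff(...)`.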
+    def _execute_and_build_output(
+        self,
+        executor: AgentExecutor,
+        inputs: dict[str, str],
+        response_format: type[Any] | None = None,
+    ) -> LiteAgentOutput:
+        """Execute the agent and build the output object.
+
+        Args:
+            executor: The executor instance.
+            inputs: Input dictionary for execution.
+            response_format: Optional response format.
+
+        Returns:
+            LiteAgentOutput with raw output, formatted result, and metrics.
+        """
+        import json
+
+        # Execute the agent (this is called from sync path, so invoke returns dict)
+        result = cast(dict[str, Any], executor.invoke(inputs))
+        raw_output = result.get("output", "")
+
+        # Handle response format conversion
+        formatted_result: BaseModel | None = None
+        if response_format:
+            try:
+                model_schema = generate_model_description(response_format)
+                schema = json.dumps(model_schema, indent=2)
+                instructions = self.i18n.slice("formatted_task_instructions").format(
+                    output_format=schema
+                )
+
+                converter = Converter(
+                    llm=self.llm,
+                    text=raw_output,
+                    model=response_format,
+                    instructions=instructions,
+                )
+
+                conversion_result = converter.to_pydantic()
+                if isinstance(conversion_result, BaseModel):
+                    formatted_result = conversion_result
+            except ConverterError:
+                pass  # Keep raw output if conversion fails
+
+        # Get token usage metrics
+        if isinstance(self.llm, BaseLLM):
+            usage_metrics = self.llm.get_token_usage_summary()
+        else:
+            usage_metrics = self._token_process.get_summary()
+
+        return LiteAgentOutput(
+            raw=raw_output,
+            pydantic=formatted_result,
+            agent_role=self.role,
+            usage_metrics=usage_metrics.model_dump() if usage_metrics else None,
+            messages=executor.messages,
+        )
+    async def _execute_and_build_output_async(
+        self,
+        executor: AgentExecutor,
+        inputs: dict[str, str],
+        response_format: type[Any] | None = None,
+    ) -> LiteAgentOutput:
+        """Execute the agent asynchronously and build the output object.
+
+        This is the async version of _execute_and_build_output that uses
+        invoke_async() for native async execution within event loops.
+
+        Args:
+            executor: The executor instance.
+            inputs: Input dictionary for execution.
+            response_format: Optional response format.
+
+        Returns:
+            LiteAgentOutput with raw output, formatted result, and metrics.
+        """
+        import json
+
+        # Execute the agent asynchronously
+        result = await executor.invoke_async(inputs)
+        raw_output = result.get("output", "")
+
+        # Handle response format conversion
+        formatted_result: BaseModel | None = None
+        if response_format:
+            try:
+                model_schema = generate_model_description(response_format)
+                schema = json.dumps(model_schema, indent=2)
+                instructions = self.i18n.slice("formatted_task_instructions").format(
+                    output_format=schema
+                )
+
+                converter = Converter(
+                    llm=self.llm,
+                    text=raw_output,
+                    model=response_format,
+                    instructions=instructions,
+                )
+
+                conversion_result = converter.to_pydantic()
+                if isinstance(conversion_result, BaseModel):
+                    formatted_result = conversion_result
+            except ConverterError:
+                pass  # Keep raw output if conversion fails
+
+        # Get token usage metrics
+        if isinstance(self.llm, BaseLLM):
+            usage_metrics = self.llm.get_token_usage_summary()
+        else:
+            usage_metrics = self._token_process.get_summary()
+
+        return LiteAgentOutput(
+            raw=raw_output,
+            pydantic=formatted_result,
+            agent_role=self.role,
+            usage_metrics=usage_metrics.model_dump() if usage_metrics else None,
+            messages=executor.messages,
+        )
+    def _process_kickoff_guardrail(
+        self,
+        output: LiteAgentOutput,
+        executor: AgentExecutor,
+        inputs: dict[str, str],
+        response_format: type[Any] | None = None,
+        retry_count: int = 0,
+    ) -> LiteAgentOutput:
+        """Process guardrail for kickoff execution with retry logic.
+
+        Args:
+            output: Current agent output.
+            executor: The executor instance.
+            inputs: Input dictionary for re-execution.
+            response_format: Optional response format.
+            retry_count: Current retry count.
+
+        Returns:
+            Validated/updated output.
+        """
+        from crewai.utilities.guardrail_types import GuardrailCallable
+
+        # Ensure guardrail is callable
+        guardrail_callable: GuardrailCallable
+        if isinstance(self.guardrail, str):
+            from crewai.tasks.llm_guardrail import LLMGuardrail
+
+            guardrail_callable = cast(
+                GuardrailCallable,
+                LLMGuardrail(description=self.guardrail, llm=cast(BaseLLM, self.llm)),
+            )
+        elif callable(self.guardrail):
+            guardrail_callable = self.guardrail
+        else:
+            # Should not happen if called from kickoff with guardrail check
+            return output
+
+        guardrail_result = process_guardrail(
+            output=output,
+            guardrail=guardrail_callable,
+            retry_count=retry_count,
+            event_source=self,
+            from_agent=self,
+        )
+
+        if not guardrail_result.success:
+            if retry_count >= self.guardrail_max_retries:
+                raise ValueError(
+                    f"Agent's guardrail failed validation after {self.guardrail_max_retries} retries. "
+                    f"Last error: {guardrail_result.error}"
+                )
+
+            # Add feedback and re-execute
+            executor._append_message_to_state(
+                guardrail_result.error or "Guardrail validation failed",
+                role="user",
+            )
+
+            # Re-execute and build new output
+            output = self._execute_and_build_output(executor, inputs, response_format)
+
+            # Recursively retry guardrail
+            return self._process_kickoff_guardrail(
+                output=output,
+                executor=executor,
+                inputs=inputs,
+                response_format=response_format,
+                retry_count=retry_count + 1,
+            )
+
+        # Apply guardrail result if available
+        if guardrail_result.result is not None:
+            if isinstance(guardrail_result.result, str):
+                output.raw = guardrail_result.result
+            elif isinstance(guardrail_result.result, BaseModel):
+                output.pydantic = guardrail_result.result
+
+        return output
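The retry loop above only needs the guardrail to report success, an optional error message, and an optional replacement result; the exact contract is inferred from `guardrail_result.success` / `.error` / `.result`, not from a published API. A sketch of a callable guardrail under that assumption:

```python
# Hypothetical guardrail; assumes the framework accepts a callable over the
# agent output that returns a (success, payload) pair, as task guardrails do.
def no_placeholders(output):
    text = output.raw or ""
    if "TODO" in text or "lorem ipsum" in text.lower():
        return (False, "Output contains placeholder text; rewrite it fully.")
    return (True, text)

agent = Agent(
    role="Writer",
    goal="Produce final copy",
    backstory="A careful editor.",
    guardrail=no_placeholders,      # a plain string would be wrapped in LLMGuardrail
    guardrail_max_retries=2,
)
```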
+    async def kickoff_async(
+        self,
@@ -1638,9 +1980,11 @@ class Agent(BaseAgent):
         response_format: type[Any] | None = None,
     ) -> LiteAgentOutput:
         """
-        Execute the agent asynchronously with the given messages using a LiteAgent instance.
+        Execute the agent asynchronously with the given messages.

-        This is the async version of the kickoff method.
+        This is the async version of the kickoff method that uses native async
+        execution. It is designed for use within async contexts, such as when
+        called from within an async Flow method.

         Args:
             messages: Either a string query or a list of message dictionaries.

@@ -1651,21 +1995,48 @@ class Agent(BaseAgent):
         Returns:
             LiteAgentOutput: The result of the agent execution.
         """
-        lite_agent = LiteAgent(
-            role=self.role,
-            goal=self.goal,
-            backstory=self.backstory,
-            llm=self.llm,
-            tools=self.tools or [],
-            max_iterations=self.max_iter,
-            max_execution_time=self.max_execution_time,
-            respect_context_window=self.respect_context_window,
-            verbose=self.verbose,
-            response_format=response_format,
-            i18n=self.i18n,
-            original_agent=self,
-            guardrail=self.guardrail,
-            guardrail_max_retries=self.guardrail_max_retries,
-        )
-
-        return await lite_agent.kickoff_async(messages)
+        executor, inputs, agent_info, parsed_tools = self._prepare_kickoff(
+            messages, response_format
+        )
+
+        try:
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionStartedEvent(
+                    agent_info=agent_info,
+                    tools=parsed_tools,
+                    messages=messages,
+                ),
+            )
+
+            output = await self._execute_and_build_output_async(
+                executor, inputs, response_format
+            )
+
+            if self.guardrail is not None:
+                output = self._process_kickoff_guardrail(
+                    output=output,
+                    executor=executor,
+                    inputs=inputs,
+                    response_format=response_format,
+                )
+
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionCompletedEvent(
+                    agent_info=agent_info,
+                    output=output.raw,
+                ),
+            )
+
+            return output
+
+        except Exception as e:
+            crewai_event_bus.emit(
+                self,
+                event=LiteAgentExecutionErrorEvent(
+                    agent_info=agent_info,
+                    error=str(e),
+                ),
+            )
+            raise
@@ -21,9 +21,9 @@ if TYPE_CHECKING:


 class CrewAgentExecutorMixin:
-    crew: Crew
+    crew: Crew | None
     agent: Agent
-    task: Task
+    task: Task | None
     iterations: int
     max_iter: int
     messages: list[LLMMessage]
@@ -24,6 +24,7 @@ from crewai.events.types.logging_events import (
     AgentLogsExecutionEvent,
     AgentLogsStartedEvent,
 )
+from crewai.files import FileProcessor
 from crewai.hooks.llm_hooks import (
     get_after_llm_call_hooks,
     get_before_llm_call_hooks,

@@ -43,6 +44,7 @@ from crewai.utilities.agent_utils import (
     process_llm_response,
 )
 from crewai.utilities.constants import TRAINING_DATA_FILE
+from crewai.utilities.file_store import get_all_files
 from crewai.utilities.i18n import I18N, get_i18n
 from crewai.utilities.printer import Printer
 from crewai.utilities.tool_utils import (

@@ -188,6 +190,8 @@ class CrewAgentExecutor(CrewAgentExecutorMixin):
             user_prompt = self._format_prompt(self.prompt.get("prompt", ""), inputs)
             self.messages.append(format_message_for_llm(user_prompt))

+        self._inject_multimodal_files()
+
         self._show_start_logs()

         self.ask_for_human_input = bool(inputs.get("ask_for_human_input", False))

@@ -212,6 +216,90 @@ class CrewAgentExecutor(CrewAgentExecutorMixin):
         self._create_external_memory(formatted_answer)
         return {"output": formatted_answer.output}

+    def _inject_multimodal_files(self) -> None:
+        """Inject files as multimodal content into messages.
+
+        For crews with input files and LLMs that support multimodal,
+        processes files according to provider constraints and file handling mode,
+        then delegates to the LLM's format_multimodal_content method to
+        generate provider-specific content blocks.
+        """
+        if not self.crew or not self.task:
+            return
+
+        if not self.llm.supports_multimodal():
+            return
+
+        files = get_all_files(self.crew.id, self.task.id)
+        if not files:
+            return
+
+        provider = getattr(self.llm, "provider", None) or getattr(self.llm, "model", "")
+        processor = FileProcessor(constraints=provider)
+        files = processor.process_files(files)
+
+        from crewai.files import get_upload_cache
+
+        upload_cache = get_upload_cache()
+        content_blocks = self.llm.format_multimodal_content(
+            files, upload_cache=upload_cache
+        )
+        if not content_blocks:
+            return
+
+        for i in range(len(self.messages) - 1, -1, -1):
+            msg = self.messages[i]
+            if msg.get("role") == "user":
+                existing_content = msg.get("content", "")
+                if isinstance(existing_content, str):
+                    msg["content"] = [
+                        self.llm.format_text_content(existing_content),
+                        *content_blocks,
+                    ]
+                break
+
+    async def _ainject_multimodal_files(self) -> None:
+        """Async inject files as multimodal content into messages.
+
+        For crews with input files and LLMs that support multimodal,
+        processes files according to provider constraints using parallel processing,
+        then delegates to the LLM's aformat_multimodal_content method to
+        generate provider-specific content blocks with parallel file resolution.
+        """
+        if not self.crew or not self.task:
+            return
+
+        if not self.llm.supports_multimodal():
+            return
+
+        files = get_all_files(self.crew.id, self.task.id)
+        if not files:
+            return
+
+        provider = getattr(self.llm, "provider", None) or getattr(self.llm, "model", "")
+        processor = FileProcessor(constraints=provider)
+        files = await processor.aprocess_files(files)
+
+        from crewai.files import get_upload_cache
+
+        upload_cache = get_upload_cache()
+        content_blocks = await self.llm.aformat_multimodal_content(
+            files, upload_cache=upload_cache
+        )
+        if not content_blocks:
+            return
+
+        for i in range(len(self.messages) - 1, -1, -1):
+            msg = self.messages[i]
+            if msg.get("role") == "user":
+                existing_content = msg.get("content", "")
+                if isinstance(existing_content, str):
+                    msg["content"] = [
+                        self.llm.format_text_content(existing_content),
+                        *content_blocks,
+                    ]
+                break
+
     def _invoke_loop(self) -> AgentFinish:
         """Execute agent loop until completion.

@@ -355,6 +443,8 @@ class CrewAgentExecutor(CrewAgentExecutorMixin):
             user_prompt = self._format_prompt(self.prompt.get("prompt", ""), inputs)
             self.messages.append(format_message_for_llm(user_prompt))

+        await self._ainject_multimodal_files()
+
         self._show_start_logs()

         self.ask_for_human_input = bool(inputs.get("ask_for_human_input", False))
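The injection rewrites the most recent user message from a plain string into a list of provider-formatted blocks. A rough before/after sketch (the block dict shapes are illustrative; the real shapes come from the LLM's format_text_content and format_multimodal_content methods):

```python
# Illustrative only: actual block dicts are produced by the LLM adapter.
before = {"role": "user", "content": "Describe the attached diagram."}

after = {
    "role": "user",
    "content": [
        {"type": "text", "text": "Describe the attached diagram."},  # original text
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}},
    ],
}
```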
@@ -80,6 +80,7 @@ from crewai.task import Task
 from crewai.tasks.conditional_task import ConditionalTask
 from crewai.tasks.task_output import TaskOutput
 from crewai.tools.agent_tools.agent_tools import AgentTools
+from crewai.tools.agent_tools.read_file_tool import ReadFileTool
 from crewai.tools.base_tool import BaseTool
 from crewai.types.streaming import CrewStreamingOutput
 from crewai.types.usage_metrics import UsageMetrics

@@ -88,6 +89,7 @@ from crewai.utilities.crew.models import CrewContext
 from crewai.utilities.evaluators.crew_evaluator_handler import CrewEvaluator
 from crewai.utilities.evaluators.task_evaluator import TaskEvaluator
 from crewai.utilities.file_handler import FileHandler
+from crewai.utilities.file_store import clear_files, get_all_files
 from crewai.utilities.formatter import (
     aggregate_raw_outputs_from_task_outputs,
     aggregate_raw_outputs_from_tasks,

@@ -106,6 +108,7 @@ from crewai.utilities.streaming import (
 )
 from crewai.utilities.task_output_storage_handler import TaskOutputStorageHandler
 from crewai.utilities.training_handler import CrewTrainingHandler
+from crewai.utilities.types import KickoffInputs


 warnings.filterwarnings("ignore", category=SyntaxWarning, module="pysbd")

@@ -675,7 +678,7 @@ class Crew(FlowTrackable, BaseModel):

     def kickoff(
         self,
-        inputs: dict[str, Any] | None = None,
+        inputs: KickoffInputs | dict[str, Any] | None = None,
     ) -> CrewOutput | CrewStreamingOutput:
         if self.stream:
             enable_agent_streaming(self.agents)

@@ -732,6 +735,7 @@ class Crew(FlowTrackable, BaseModel):
             )
             raise
         finally:
+            clear_files(self.id)
             detach(token)

     def kickoff_for_each(

@@ -762,7 +766,7 @@ class Crew(FlowTrackable, BaseModel):
         return results

     async def kickoff_async(
-        self, inputs: dict[str, Any] | None = None
+        self, inputs: KickoffInputs | dict[str, Any] | None = None
     ) -> CrewOutput | CrewStreamingOutput:
         """Asynchronous kickoff method to start the crew execution.

@@ -817,7 +821,7 @@ class Crew(FlowTrackable, BaseModel):
         return await run_for_each_async(self, inputs, kickoff_fn)

     async def akickoff(
-        self, inputs: dict[str, Any] | None = None
+        self, inputs: KickoffInputs | dict[str, Any] | None = None
     ) -> CrewOutput | CrewStreamingOutput:
         """Native async kickoff method using async task execution throughout.

@@ -880,6 +884,7 @@ class Crew(FlowTrackable, BaseModel):
             )
             raise
         finally:
+            clear_files(self.id)
             detach(token)

     async def akickoff_for_each(

@@ -1215,7 +1220,8 @@ class Crew(FlowTrackable, BaseModel):
             and hasattr(agent, "multimodal")
             and getattr(agent, "multimodal", False)
         ):
-            tools = self._add_multimodal_tools(agent, tools)
+            if not (agent.llm and agent.llm.supports_multimodal()):
+                tools = self._add_multimodal_tools(agent, tools)

         if agent and (hasattr(agent, "apps") and getattr(agent, "apps", None)):
             tools = self._add_platform_tools(task, tools)

@@ -1223,7 +1229,24 @@ class Crew(FlowTrackable, BaseModel):
         if agent and (hasattr(agent, "mcps") and getattr(agent, "mcps", None)):
             tools = self._add_mcp_tools(task, tools)

         # Return a list[BaseTool] compatible with Task.execute_sync and execute_async
+        files = get_all_files(self.id, task.id)
+        if files:
+            supported_types: list[str] = []
+            if agent and agent.llm and agent.llm.supports_multimodal():
+                supported_types = agent.llm.supported_multimodal_content_types()
+
+            def is_auto_injected(content_type: str) -> bool:
+                return any(content_type.startswith(t) for t in supported_types)
+
+            # Only add read_file tool if there are files that need it
+            files_needing_tool = {
+                name: f
+                for name, f in files.items()
+                if not is_auto_injected(f.content_type)
+            }
+            if files_needing_tool:
+                tools = self._add_file_tools(tools, files_needing_tool)
+
         return tools

     def _get_agent_to_use(self, task: Task) -> BaseAgent | None:

@@ -1303,6 +1326,22 @@ class Crew(FlowTrackable, BaseModel):
             return self._merge_tools(tools, cast(list[BaseTool], code_tools))
         return tools

+    def _add_file_tools(
+        self, tools: list[BaseTool], files: dict[str, Any]
+    ) -> list[BaseTool]:
+        """Add file reading tool when input files are available.
+
+        Args:
+            tools: Current list of tools.
+            files: Dictionary of input files.
+
+        Returns:
+            Updated list with file tool added.
+        """
+        read_file_tool = ReadFileTool()
+        read_file_tool.set_files(files)
+        return self._merge_tools(tools, [read_file_tool])
+
     def _add_delegation_tools(
         self, task: Task, tools: list[BaseTool]
     ) -> list[BaseTool]:
@@ -8,13 +8,22 @@ from typing import TYPE_CHECKING, Any

 from crewai.agents.agent_builder.base_agent import BaseAgent
 from crewai.crews.crew_output import CrewOutput
+from crewai.files import (
+    AudioFile,
+    ImageFile,
+    PDFFile,
+    TextFile,
+    VideoFile,
+)
 from crewai.rag.embeddings.types import EmbedderConfig
 from crewai.types.streaming import CrewStreamingOutput, FlowStreamingOutput
+from crewai.utilities.file_store import store_files
 from crewai.utilities.streaming import (
     StreamingState,
     TaskInfo,
     create_streaming_state,
 )
+from crewai.utilities.types import KickoffInputs


 if TYPE_CHECKING:

@@ -176,7 +185,36 @@ def check_conditional_skip(
     return None


-def prepare_kickoff(crew: Crew, inputs: dict[str, Any] | None) -> dict[str, Any] | None:
+def _extract_files_from_inputs(inputs: dict[str, Any]) -> dict[str, Any]:
+    """Extract file objects from inputs dict.
+
+    Scans inputs for FileInput objects (ImageFile, TextFile, etc.) and
+    extracts them into a separate dict.
+
+    Args:
+        inputs: The inputs dictionary to scan.
+
+    Returns:
+        Dictionary of extracted file objects.
+    """
+    file_types = (AudioFile, ImageFile, PDFFile, TextFile, VideoFile)
+    files: dict[str, Any] = {}
+    keys_to_remove: list[str] = []
+
+    for key, value in inputs.items():
+        if isinstance(value, file_types):
+            files[key] = value
+            keys_to_remove.append(key)
+
+    for key in keys_to_remove:
+        del inputs[key]
+
+    return files
+
+
+def prepare_kickoff(
+    crew: Crew, inputs: KickoffInputs | dict[str, Any] | None
+) -> dict[str, Any] | None:
     """Prepare crew for kickoff execution.

     Handles before callbacks, event emission, task handler reset, input

@@ -192,14 +230,17 @@ def prepare_kickoff(crew: Crew, inputs: dict[str, Any] | None) -> dict[str, Any]
     from crewai.events.event_bus import crewai_event_bus
     from crewai.events.types.crew_events import CrewKickoffStartedEvent

+    # Normalize inputs to dict[str, Any] for internal processing
+    normalized: dict[str, Any] | None = dict(inputs) if inputs is not None else None
+
     for before_callback in crew.before_kickoff_callbacks:
-        if inputs is None:
-            inputs = {}
-        inputs = before_callback(inputs)
+        if normalized is None:
+            normalized = {}
+        normalized = before_callback(normalized)

     future = crewai_event_bus.emit(
         crew,
-        CrewKickoffStartedEvent(crew_name=crew.name, inputs=inputs),
+        CrewKickoffStartedEvent(crew_name=crew.name, inputs=normalized),
     )
     if future is not None:
         try:

@@ -210,9 +251,20 @@ def prepare_kickoff(crew: Crew, inputs: dict[str, Any] | None) -> dict[str, Any]
     crew._task_output_handler.reset()
     crew._logging_color = "bold_purple"

-    if inputs is not None:
-        crew._inputs = inputs
-        crew._interpolate_inputs(inputs)
+    if normalized is not None:
+        # Extract files from dedicated "files" key
+        files = normalized.pop("files", None) or {}
+
+        # Extract file objects unpacked directly into inputs
+        unpacked_files = _extract_files_from_inputs(normalized)
+
+        # Merge files (unpacked files take precedence over explicit files dict)
+        all_files = {**files, **unpacked_files}
+        if all_files:
+            store_files(crew.id, all_files)
+
+        crew._inputs = normalized
+        crew._interpolate_inputs(normalized)
     crew._set_tasks_callbacks()
     crew._set_allow_crewai_trigger_context_for_first_task()

@@ -227,7 +279,7 @@ def prepare_kickoff(crew: Crew, inputs: dict[str, Any] | None) -> dict[str, Any]
     if crew.planning:
         crew._handle_crew_planning()

-    return inputs
+    return normalized


 class StreamingContext:
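Taken together, these changes let callers attach files either under an explicit "files" key or directly as input values. A sketch of both call shapes (the file-wrapper constructor is assumed to take a path-like source, and the crew definition is elided):

```python
from crewai import PDFFile

# Hypothetical: constructing a wrapper from a path is assumed, not shown here.
report = PDFFile("data/q3_report.pdf")

# Option 1: dedicated "files" key
crew.kickoff(inputs={"topic": "Q3 revenue", "files": {"report": report}})

# Option 2: file objects unpacked directly into inputs;
# prepare_kickoff() extracts them and persists them via store_files()
crew.kickoff(inputs={"topic": "Q3 revenue", "report": report})
```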
@@ -1,4 +1,4 @@
-from crewai.experimental.crew_agent_executor_flow import CrewAgentExecutorFlow
+from crewai.experimental.agent_executor import AgentExecutor, CrewAgentExecutorFlow
 from crewai.experimental.evaluation import (
     AgentEvaluationResult,
     AgentEvaluator,

@@ -23,8 +23,9 @@ from crewai.experimental.evaluation import (
 __all__ = [
     "AgentEvaluationResult",
     "AgentEvaluator",
+    "AgentExecutor",
     "BaseEvaluator",
-    "CrewAgentExecutorFlow",
+    "CrewAgentExecutorFlow",  # Deprecated alias for AgentExecutor
     "EvaluationScore",
     "EvaluationTraceCallback",
     "ExperimentResult",
@@ -1,6 +1,6 @@
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import Callable
|
||||
from collections.abc import Callable, Coroutine
|
||||
import threading
|
||||
from typing import TYPE_CHECKING, Any, Literal, cast
|
||||
from uuid import uuid4
|
||||
@@ -37,6 +37,7 @@ from crewai.utilities.agent_utils import (
|
||||
handle_unknown_error,
|
||||
has_reached_max_iterations,
|
||||
is_context_length_exceeded,
|
||||
is_inside_event_loop,
|
||||
process_llm_response,
|
||||
)
|
||||
from crewai.utilities.constants import TRAINING_DATA_FILE
|
||||
@@ -73,13 +74,17 @@ class AgentReActState(BaseModel):
|
||||
ask_for_human_input: bool = Field(default=False)
|
||||
|
||||
|
||||
class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
"""Flow-based executor matching CrewAgentExecutor interface.
|
||||
class AgentExecutor(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
"""Agent Executor for both standalone agents and crew-bound agents.
|
||||
|
||||
Inherits from:
|
||||
- Flow[AgentReActState]: Provides flow orchestration capabilities
|
||||
- CrewAgentExecutorMixin: Provides memory methods (short/long/external term)
|
||||
|
||||
This executor can operate in two modes:
|
||||
- Standalone mode: When crew and task are None (used by Agent.kickoff())
|
||||
- Crew mode: When crew and task are provided (used by Agent.execute_task())
|
||||
|
||||
Note: Multiple instances may be created during agent initialization
|
||||
(cache setup, RPM controller setup, etc.) but only the final instance
|
||||
should execute tasks via invoke().
|
||||
@@ -88,8 +93,6 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
def __init__(
|
||||
self,
|
||||
llm: BaseLLM,
|
||||
task: Task,
|
||||
crew: Crew,
|
||||
agent: Agent,
|
||||
prompt: SystemPromptResult | StandardPromptResult,
|
||||
max_iter: int,
|
||||
@@ -98,6 +101,8 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
stop_words: list[str],
|
||||
tools_description: str,
|
||||
tools_handler: ToolsHandler,
|
||||
task: Task | None = None,
|
||||
crew: Crew | None = None,
|
||||
step_callback: Any = None,
|
||||
original_tools: list[BaseTool] | None = None,
|
||||
function_calling_llm: BaseLLM | Any | None = None,
|
||||
@@ -111,8 +116,6 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
|
||||
Args:
|
||||
llm: Language model instance.
|
||||
task: Task to execute.
|
||||
crew: Crew instance.
|
||||
agent: Agent to execute.
|
||||
prompt: Prompt templates.
|
||||
max_iter: Maximum iterations.
|
||||
@@ -121,6 +124,8 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
stop_words: Stop word list.
|
||||
tools_description: Tool descriptions.
|
||||
tools_handler: Tool handler instance.
|
||||
task: Optional task to execute (None for standalone agent execution).
|
||||
crew: Optional crew instance (None for standalone agent execution).
|
||||
step_callback: Optional step callback.
|
||||
original_tools: Original tool list.
|
||||
function_calling_llm: Optional function calling LLM.
|
||||
@@ -131,9 +136,9 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
"""
|
||||
self._i18n: I18N = i18n or get_i18n()
|
||||
self.llm = llm
|
||||
self.task = task
|
||||
self.task: Task | None = task
|
||||
self.agent = agent
|
||||
self.crew = crew
|
||||
self.crew: Crew | None = crew
|
||||
self.prompt = prompt
|
||||
self.tools = tools
|
||||
self.tools_names = tools_names
|
||||
@@ -178,7 +183,6 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
else self.stop
|
||||
)
|
||||
)
|
||||
|
||||
self._state = AgentReActState()
|
||||
|
||||
def _ensure_flow_initialized(self) -> None:
|
||||
@@ -264,7 +268,7 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
printer=self._printer,
|
||||
from_task=self.task,
|
||||
from_agent=self.agent,
|
||||
response_model=self.response_model,
|
||||
response_model=None,
|
||||
executor_context=self,
|
||||
)
|
||||
|
||||
@@ -449,9 +453,99 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
|
||||
|
||||
return "initialized"
|
||||
|
||||
def invoke(self, inputs: dict[str, Any]) -> dict[str, Any]:
|
||||
def invoke(
|
||||
self, inputs: dict[str, Any]
|
||||
) -> dict[str, Any] | Coroutine[Any, Any, dict[str, Any]]:
|
||||
"""Execute agent with given inputs.
|
||||
|
||||
When called from within an existing event loop (e.g., inside a Flow),
|
||||
this method returns a coroutine that should be awaited. The Flow
|
||||
framework handles this automatically.
|
||||
|
||||
Args:
|
||||
inputs: Input dictionary containing prompt variables.
|
||||
|
||||
Returns:
|
||||
            Dictionary with agent output, or a coroutine if inside an event loop.
        """
        # Magic auto-async: if inside event loop, return coroutine for Flow to await
        if is_inside_event_loop():
            return self.invoke_async(inputs)

        self._ensure_flow_initialized()

        with self._execution_lock:
            if self._is_executing:
                raise RuntimeError(
                    "Executor is already running. "
                    "Cannot invoke the same executor instance concurrently."
                )
            self._is_executing = True
            self._has_been_invoked = True

        try:
            # Reset state for fresh execution
            self.state.messages.clear()
            self.state.iterations = 0
            self.state.current_answer = None
            self.state.is_finished = False

            if "system" in self.prompt:
                prompt = cast("SystemPromptResult", self.prompt)
                system_prompt = self._format_prompt(prompt["system"], inputs)
                user_prompt = self._format_prompt(prompt["user"], inputs)
                self.state.messages.append(
                    format_message_for_llm(system_prompt, role="system")
                )
                self.state.messages.append(format_message_for_llm(user_prompt))
            else:
                user_prompt = self._format_prompt(self.prompt["prompt"], inputs)
                self.state.messages.append(format_message_for_llm(user_prompt))

            self.state.ask_for_human_input = bool(
                inputs.get("ask_for_human_input", False)
            )

            self.kickoff()

            formatted_answer = self.state.current_answer

            if not isinstance(formatted_answer, AgentFinish):
                raise RuntimeError(
                    "Agent execution ended without reaching a final answer."
                )

            if self.state.ask_for_human_input:
                formatted_answer = self._handle_human_feedback(formatted_answer)

            self._create_short_term_memory(formatted_answer)
            self._create_long_term_memory(formatted_answer)
            self._create_external_memory(formatted_answer)

            return {"output": formatted_answer.output}

        except AssertionError:
            fail_text = Text()
            fail_text.append("❌ ", style="red bold")
            fail_text.append(
                "Agent failed to reach a final answer. This is likely a bug - please report it.",
                style="red",
            )
            self._console.print(fail_text)
            raise
        except Exception as e:
            handle_unknown_error(self._printer, e)
            raise
        finally:
            self._is_executing = False

    async def invoke_async(self, inputs: dict[str, Any]) -> dict[str, Any]:
        """Execute agent asynchronously with given inputs.

        This method is designed for use within async contexts, such as when
        the agent is called from within an async Flow method. It uses
        kickoff_async() directly instead of running in a separate thread.

        Args:
            inputs: Input dictionary containing prompt variables.

@@ -492,7 +586,8 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
            inputs.get("ask_for_human_input", False)
        )

-        self.kickoff()
+        # Use async kickoff directly since we're already in an async context
+        await self.kickoff_async()

        formatted_answer = self.state.current_answer

@@ -583,11 +678,14 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
        if self.agent is None:
            raise ValueError("Agent cannot be None")

+        if self.task is None:
+            return
+
        crewai_event_bus.emit(
            self.agent,
            AgentLogsStartedEvent(
                agent_role=self.agent.role,
-                task_description=(self.task.description if self.task else "Not Found"),
+                task_description=self.task.description,
                verbose=self.agent.verbose
                or (hasattr(self, "crew") and getattr(self.crew, "verbose", False)),
            ),
@@ -621,10 +719,12 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
            result: Agent's final output.
            human_feedback: Optional feedback from human.
        """
+        # Early return if no crew (standalone mode)
+        if self.crew is None:
+            return
+
        agent_id = str(self.agent.id)
-        train_iteration = (
-            getattr(self.crew, "_train_iteration", None) if self.crew else None
-        )
+        train_iteration = getattr(self.crew, "_train_iteration", None)

        if train_iteration is None or not isinstance(train_iteration, int):
            train_error = Text()
@@ -806,3 +906,7 @@ class CrewAgentExecutorFlow(Flow[AgentReActState], CrewAgentExecutorMixin):
        requiring arbitrary_types_allowed=True.
        """
        return core_schema.any_schema()
+
+
+# Backward compatibility alias (deprecated)
+CrewAgentExecutorFlow = AgentExecutor
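For illustration, a minimal sketch of how the auto-async `invoke` behaves from the two call sites; the executor instance is hypothetical and its construction is elided:

```python Code
import asyncio

# Hypothetical executor instance; construction elided.
# Sync call site: invoke() runs the flow to completion and returns a dict.
result = executor.invoke({"input": "Summarize the report"})
print(result["output"])

# Async call site (e.g. inside an async Flow method): invoke() detects the
# running event loop and returns a coroutine, so it must be awaited.
async def main() -> None:
    result = await executor.invoke({"input": "Summarize the report"})
    print(result["output"])

asyncio.run(main())
```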
lib/crewai/src/crewai/files/__init__.py (new file, 207 lines)
@@ -0,0 +1,207 @@
"""File handling utilities for crewAI tasks."""

from crewai.files.cleanup import (
    cleanup_expired_files,
    cleanup_provider_files,
    cleanup_uploaded_files,
)
from crewai.files.content_types import (
    AudioContentType,
    AudioExtension,
    AudioFile,
    BaseFile,
    File,
    FileMode,
    ImageContentType,
    ImageExtension,
    ImageFile,
    PDFContentType,
    PDFExtension,
    PDFFile,
    TextContentType,
    TextExtension,
    TextFile,
    VideoContentType,
    VideoExtension,
    VideoFile,
)
from crewai.files.file import (
    FileBytes,
    FilePath,
    FileSource,
    FileSourceInput,
    FileStream,
    RawFileInput,
)
from crewai.files.processing import (
    ANTHROPIC_CONSTRAINTS,
    BEDROCK_CONSTRAINTS,
    GEMINI_CONSTRAINTS,
    OPENAI_CONSTRAINTS,
    AudioConstraints,
    FileHandling,
    FileProcessingError,
    FileProcessor,
    FileTooLargeError,
    FileValidationError,
    ImageConstraints,
    PDFConstraints,
    ProcessingDependencyError,
    ProviderConstraints,
    UnsupportedFileTypeError,
    VideoConstraints,
    get_constraints_for_provider,
)
from crewai.files.resolved import (
    FileReference,
    InlineBase64,
    InlineBytes,
    ResolvedFile,
    ResolvedFileType,
    UrlReference,
)
from crewai.files.resolver import (
    FileResolver,
    FileResolverConfig,
    create_resolver,
)
from crewai.files.upload_cache import (
    CachedUpload,
    UploadCache,
    get_upload_cache,
    reset_upload_cache,
)
from crewai.files.uploaders import FileUploader, UploadResult, get_uploader


FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile


def wrap_file_source(source: FileSource) -> FileInput:
    """Wrap a FileSource in the appropriate typed FileInput wrapper.

    Args:
        source: The file source to wrap.

    Returns:
        Typed FileInput wrapper based on content type.
    """
    content_type = source.content_type

    if content_type.startswith("image/"):
        return ImageFile(source=source)
    if content_type.startswith("audio/"):
        return AudioFile(source=source)
    if content_type.startswith("video/"):
        return VideoFile(source=source)
    if content_type == "application/pdf":
        return PDFFile(source=source)
    return TextFile(source=source)


def normalize_input_files(
    input_files: list[FileSourceInput | FileInput],
) -> dict[str, FileInput]:
    """Convert a list of file sources to a named dictionary of FileInputs.

    Args:
        input_files: List of file source inputs or File objects.

    Returns:
        Dictionary mapping names to FileInput wrappers.
    """
    from pathlib import Path

    result: dict[str, FileInput] = {}

    for i, item in enumerate(input_files):
        if isinstance(item, BaseFile):
            name = item.filename or f"file_{i}"
            if "." in name:
                name = name.rsplit(".", 1)[0]
            result[name] = item
            continue

        file_source: FilePath | FileBytes | FileStream
        if isinstance(item, (FilePath, FileBytes, FileStream)):
            file_source = item
        elif isinstance(item, Path):
            file_source = FilePath(path=item)
        elif isinstance(item, str):
            file_source = FilePath(path=Path(item))
        elif isinstance(item, (bytes, memoryview)):
            file_source = FileBytes(data=bytes(item))
        else:
            continue

        name = file_source.filename or f"file_{i}"
        result[name] = wrap_file_source(file_source)

    return result


__all__ = [
    "ANTHROPIC_CONSTRAINTS",
    "BEDROCK_CONSTRAINTS",
    "GEMINI_CONSTRAINTS",
    "OPENAI_CONSTRAINTS",
    "AudioConstraints",
    "AudioContentType",
    "AudioExtension",
    "AudioFile",
    "BaseFile",
    "CachedUpload",
    "File",
    "FileBytes",
    "FileHandling",
    "FileInput",
    "FileMode",
    "FilePath",
    "FileProcessingError",
    "FileProcessor",
    "FileReference",
    "FileResolver",
    "FileResolverConfig",
    "FileSource",
    "FileSourceInput",
    "FileStream",
    "FileTooLargeError",
    "FileUploader",
    "FileValidationError",
    "ImageConstraints",
    "ImageContentType",
    "ImageExtension",
    "ImageFile",
    "InlineBase64",
    "InlineBytes",
    "PDFConstraints",
    "PDFContentType",
    "PDFExtension",
    "PDFFile",
    "ProcessingDependencyError",
    "ProviderConstraints",
    "RawFileInput",
    "ResolvedFile",
    "ResolvedFileType",
    "TextContentType",
    "TextExtension",
    "TextFile",
    "UnsupportedFileTypeError",
    "UploadCache",
    "UploadResult",
    "UrlReference",
    "VideoConstraints",
    "VideoContentType",
    "VideoExtension",
    "VideoFile",
    "cleanup_expired_files",
    "cleanup_provider_files",
    "cleanup_uploaded_files",
    "create_resolver",
    "get_constraints_for_provider",
    "get_upload_cache",
    "get_uploader",
    "normalize_input_files",
    "reset_upload_cache",
    "wrap_file_source",
]
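As a quick sanity check of the helpers above — a minimal sketch, assuming the referenced paths exist on disk (FilePath validates existence at construction):

```python Code
from pathlib import Path

from crewai.files import normalize_input_files

files = normalize_input_files([
    "./report.pdf",
    Path("./chart.png"),
    b"raw bytes without a filename",
])

# Keys are filename stems; anonymous bytes fall back to file_<index>.
print(sorted(files))  # ['chart', 'file_2', 'report']
```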
lib/crewai/src/crewai/files/cleanup.py (new file, 373 lines)
@@ -0,0 +1,373 @@
"""Cleanup utilities for uploaded files."""

from __future__ import annotations

import asyncio
import logging
from typing import TYPE_CHECKING

from crewai.files.upload_cache import CachedUpload, UploadCache
from crewai.files.uploaders import get_uploader


if TYPE_CHECKING:
    from crewai.files.uploaders.base import FileUploader

logger = logging.getLogger(__name__)


def _safe_delete(
    uploader: FileUploader,
    file_id: str,
    provider: str,
) -> bool:
    """Safely delete a file, logging any errors.

    Args:
        uploader: The file uploader to use.
        file_id: The file ID to delete.
        provider: Provider name for logging.

    Returns:
        True if deleted successfully, False otherwise.
    """
    try:
        if uploader.delete(file_id):
            logger.debug(f"Deleted {file_id} from {provider}")
            return True
        logger.warning(f"Failed to delete {file_id} from {provider}")
        return False
    except Exception as e:
        logger.warning(f"Error deleting {file_id} from {provider}: {e}")
        return False


def cleanup_uploaded_files(
    cache: UploadCache,
    *,
    delete_from_provider: bool = True,
    providers: list[str] | None = None,
) -> int:
    """Clean up uploaded files from the cache and optionally from providers.

    Args:
        cache: The upload cache to clean up.
        delete_from_provider: If True, delete files from the provider as well.
        providers: Optional list of providers to clean up. If None, cleans all.

    Returns:
        Number of files cleaned up.
    """
    cleaned = 0

    provider_uploads: dict[str, list[CachedUpload]] = {}

    for provider in _get_providers_from_cache(cache):
        if providers is not None and provider not in providers:
            continue
        provider_uploads[provider] = cache.get_all_for_provider(provider)

    if delete_from_provider:
        for provider, uploads in provider_uploads.items():
            uploader = get_uploader(provider)
            if uploader is None:
                logger.warning(
                    f"No uploader available for {provider}, skipping cleanup"
                )
                continue

            for upload in uploads:
                if _safe_delete(uploader, upload.file_id, provider):
                    cleaned += 1

    cache.clear()

    logger.info(f"Cleaned up {cleaned} uploaded files")
    return cleaned


def cleanup_expired_files(
    cache: UploadCache,
    *,
    delete_from_provider: bool = False,
) -> int:
    """Clean up expired files from the cache.

    Args:
        cache: The upload cache to clean up.
        delete_from_provider: If True, attempt to delete from provider as well.
            Note: Expired files may already be deleted by the provider.

    Returns:
        Number of expired entries removed from cache.
    """
    expired_entries: list[CachedUpload] = []

    if delete_from_provider:
        for provider in _get_providers_from_cache(cache):
            expired_entries.extend(
                upload
                for upload in cache.get_all_for_provider(provider)
                if upload.is_expired()
            )

    removed = cache.clear_expired()

    if delete_from_provider:
        for upload in expired_entries:
            uploader = get_uploader(upload.provider)
            if uploader is not None:
                try:
                    uploader.delete(upload.file_id)
                except Exception as e:
                    logger.debug(f"Could not delete expired file {upload.file_id}: {e}")

    return removed


def cleanup_provider_files(
    provider: str,
    *,
    cache: UploadCache | None = None,
    delete_all_from_provider: bool = False,
) -> int:
    """Clean up all files for a specific provider.

    Args:
        provider: Provider name to clean up.
        cache: Optional upload cache to clear entries from.
        delete_all_from_provider: If True, delete all files from the provider,
            not just cached ones.

    Returns:
        Number of files deleted.
    """
    deleted = 0
    uploader = get_uploader(provider)

    if uploader is None:
        logger.warning(f"No uploader available for {provider}")
        return 0

    if delete_all_from_provider:
        try:
            files = uploader.list_files()
            for file_info in files:
                file_id = file_info.get("id") or file_info.get("name")
                if file_id and uploader.delete(file_id):
                    deleted += 1
        except Exception as e:
            logger.warning(f"Error listing/deleting files from {provider}: {e}")
    elif cache is not None:
        uploads = cache.get_all_for_provider(provider)
        for upload in uploads:
            if _safe_delete(uploader, upload.file_id, provider):
                deleted += 1
                cache.remove_by_file_id(upload.file_id, provider)

    logger.info(f"Deleted {deleted} files from {provider}")
    return deleted


def _get_providers_from_cache(cache: UploadCache) -> set[str]:
    """Get unique provider names from cache entries.

    Args:
        cache: The upload cache.

    Returns:
        Set of provider names.
    """
    return cache.get_providers()


async def _asafe_delete(
    uploader: FileUploader,
    file_id: str,
    provider: str,
) -> bool:
    """Async safely delete a file, logging any errors.

    Args:
        uploader: The file uploader to use.
        file_id: The file ID to delete.
        provider: Provider name for logging.

    Returns:
        True if deleted successfully, False otherwise.
    """
    try:
        if await uploader.adelete(file_id):
            logger.debug(f"Deleted {file_id} from {provider}")
            return True
        logger.warning(f"Failed to delete {file_id} from {provider}")
        return False
    except Exception as e:
        logger.warning(f"Error deleting {file_id} from {provider}: {e}")
        return False


async def acleanup_uploaded_files(
    cache: UploadCache,
    *,
    delete_from_provider: bool = True,
    providers: list[str] | None = None,
    max_concurrency: int = 10,
) -> int:
    """Async clean up uploaded files from the cache and optionally from providers.

    Args:
        cache: The upload cache to clean up.
        delete_from_provider: If True, delete files from the provider as well.
        providers: Optional list of providers to clean up. If None, cleans all.
        max_concurrency: Maximum number of concurrent delete operations.

    Returns:
        Number of files cleaned up.
    """
    cleaned = 0

    provider_uploads: dict[str, list[CachedUpload]] = {}

    for provider in _get_providers_from_cache(cache):
        if providers is not None and provider not in providers:
            continue
        provider_uploads[provider] = await cache.aget_all_for_provider(provider)

    if delete_from_provider:
        semaphore = asyncio.Semaphore(max_concurrency)

        async def delete_one(file_uploader: FileUploader, cached: CachedUpload) -> bool:
            """Delete a single file with semaphore limiting."""
            async with semaphore:
                return await _asafe_delete(
                    file_uploader, cached.file_id, cached.provider
                )

        tasks: list[asyncio.Task[bool]] = []
        for provider, uploads in provider_uploads.items():
            uploader = get_uploader(provider)
            if uploader is None:
                logger.warning(
                    f"No uploader available for {provider}, skipping cleanup"
                )
                continue

            tasks.extend(
                asyncio.create_task(delete_one(uploader, cached)) for cached in uploads
            )

        results = await asyncio.gather(*tasks, return_exceptions=True)
        cleaned = sum(1 for r in results if r is True)

    await cache.aclear()

    logger.info(f"Cleaned up {cleaned} uploaded files")
    return cleaned


async def acleanup_expired_files(
    cache: UploadCache,
    *,
    delete_from_provider: bool = False,
    max_concurrency: int = 10,
) -> int:
    """Async clean up expired files from the cache.

    Args:
        cache: The upload cache to clean up.
        delete_from_provider: If True, attempt to delete from provider as well.
        max_concurrency: Maximum number of concurrent delete operations.

    Returns:
        Number of expired entries removed from cache.
    """
    expired_entries: list[CachedUpload] = []

    if delete_from_provider:
        for provider in _get_providers_from_cache(cache):
            uploads = await cache.aget_all_for_provider(provider)
            expired_entries.extend(upload for upload in uploads if upload.is_expired())

    removed = await cache.aclear_expired()

    if delete_from_provider and expired_entries:
        semaphore = asyncio.Semaphore(max_concurrency)

        async def delete_expired(cached: CachedUpload) -> None:
            """Delete an expired file with semaphore limiting."""
            async with semaphore:
                file_uploader = get_uploader(cached.provider)
                if file_uploader is not None:
                    try:
                        await file_uploader.adelete(cached.file_id)
                    except Exception as e:
                        logger.debug(
                            f"Could not delete expired file {cached.file_id}: {e}"
                        )

        await asyncio.gather(
            *[delete_expired(cached) for cached in expired_entries],
            return_exceptions=True,
        )

    return removed


async def acleanup_provider_files(
    provider: str,
    *,
    cache: UploadCache | None = None,
    delete_all_from_provider: bool = False,
    max_concurrency: int = 10,
) -> int:
    """Async clean up all files for a specific provider.

    Args:
        provider: Provider name to clean up.
        cache: Optional upload cache to clear entries from.
        delete_all_from_provider: If True, delete all files from the provider.
        max_concurrency: Maximum number of concurrent delete operations.

    Returns:
        Number of files deleted.
    """
    deleted = 0
    uploader = get_uploader(provider)

    if uploader is None:
        logger.warning(f"No uploader available for {provider}")
        return 0

    semaphore = asyncio.Semaphore(max_concurrency)

    async def delete_single(target_file_id: str) -> bool:
        """Delete a single file with semaphore limiting."""
        async with semaphore:
            return await uploader.adelete(target_file_id)

    if delete_all_from_provider:
        try:
            files = uploader.list_files()
            tasks = []
            for file_info in files:
                fid = file_info.get("id") or file_info.get("name")
                if fid:
                    tasks.append(delete_single(fid))
            results = await asyncio.gather(*tasks, return_exceptions=True)
            deleted = sum(1 for r in results if r is True)
        except Exception as e:
            logger.warning(f"Error listing/deleting files from {provider}: {e}")
    elif cache is not None:
        uploads = await cache.aget_all_for_provider(provider)
        tasks = []
        for upload in uploads:
            tasks.append(delete_single(upload.file_id))
        results = await asyncio.gather(*tasks, return_exceptions=True)
        for upload, result in zip(uploads, results, strict=False):
            if result is True:
                deleted += 1
                await cache.aremove_by_file_id(upload.file_id, provider)

    logger.info(f"Deleted {deleted} files from {provider}")
    return deleted
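A hedged usage sketch of the synchronous cleanup entry points; the cache contents and a registered uploader for the named provider are assumed:

```python Code
from crewai.files import cleanup_expired_files, cleanup_uploaded_files, get_upload_cache

cache = get_upload_cache()

# Drop expired cache entries; provider copies are left alone by default.
expired = cleanup_expired_files(cache)

# Delete every cached upload from its provider, then clear the cache,
# restricted here to OpenAI uploads.
removed = cleanup_uploaded_files(cache, providers=["openai"])
print(expired, removed)
```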
lib/crewai/src/crewai/files/content_types.py (new file, 267 lines)
@@ -0,0 +1,267 @@
"""Content-type specific file classes."""

from __future__ import annotations

from abc import ABC
from io import IOBase
from pathlib import Path
from typing import Annotated, Any, Literal, Self

from pydantic import BaseModel, Field, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema
from typing_extensions import TypeIs

from crewai.files.file import (
    AsyncFileStream,
    FileBytes,
    FilePath,
    FileSource,
    FileStream,
)


FileSourceInput = str | Path | bytes | IOBase | FileSource


class _FileSourceCoercer:
    """Pydantic-compatible type that coerces various inputs to FileSource."""

    @classmethod
    def _coerce(cls, v: Any) -> FileSource:
        """Convert raw input to appropriate FileSource type."""
        if isinstance(v, (FilePath, FileBytes, FileStream)):
            return v
        if isinstance(v, Path):
            return FilePath(path=v)
        if isinstance(v, str):
            return FilePath(path=Path(v))
        if isinstance(v, bytes):
            return FileBytes(data=v)
        if isinstance(v, IOBase):
            return FileStream(stream=v)
        raise ValueError(f"Cannot convert {type(v).__name__} to file source")

    @classmethod
    def __get_pydantic_core_schema__(
        cls,
        source_type: Any,
        handler: GetCoreSchemaHandler,
    ) -> CoreSchema:
        """Generate Pydantic core schema for FileSource coercion."""
        return core_schema.no_info_plain_validator_function(
            cls._coerce,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda v: v,
                info_arg=False,
                return_schema=core_schema.any_schema(),
            ),
        )


CoercedFileSource = Annotated[FileSourceInput, _FileSourceCoercer]


def _is_file_source(v: FileSourceInput) -> TypeIs[FileSource]:
    """Type guard to narrow FileSourceInput to FileSource."""
    return isinstance(v, (FilePath, FileBytes, FileStream))


FileMode = Literal["strict", "auto", "warn", "chunk"]


ImageExtension = Literal[
    ".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp", ".tiff", ".tif", ".svg"
]
ImageContentType = Literal[
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
    "image/bmp",
    "image/tiff",
    "image/svg+xml",
]

PDFExtension = Literal[".pdf"]
PDFContentType = Literal["application/pdf"]

TextExtension = Literal[
    ".txt",
    ".md",
    ".rst",
    ".csv",
    ".json",
    ".xml",
    ".yaml",
    ".yml",
    ".html",
    ".htm",
    ".log",
    ".ini",
    ".cfg",
    ".conf",
]
TextContentType = Literal[
    "text/plain",
    "text/markdown",
    "text/csv",
    "application/json",
    "application/xml",
    "text/xml",
    "application/x-yaml",
    "text/yaml",
    "text/html",
]

AudioExtension = Literal[
    ".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a", ".wma", ".aiff", ".opus"
]
AudioContentType = Literal[
    "audio/mpeg",
    "audio/wav",
    "audio/x-wav",
    "audio/ogg",
    "audio/flac",
    "audio/aac",
    "audio/mp4",
    "audio/x-ms-wma",
    "audio/aiff",
    "audio/opus",
]

VideoExtension = Literal[
    ".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v", ".mpeg", ".mpg"
]
VideoContentType = Literal[
    "video/mp4",
    "video/x-msvideo",
    "video/x-matroska",
    "video/quicktime",
    "video/webm",
    "video/x-flv",
    "video/x-ms-wmv",
    "video/mpeg",
]


class BaseFile(ABC, BaseModel):
    """Abstract base class for typed file wrappers.

    Provides common functionality for all file types including:
    - File source management
    - Content reading
    - Dict unpacking support (`**` syntax)
    - Per-file handling mode

    Can be unpacked with ** syntax: `{**ImageFile(source="./chart.png")}`
    which unpacks to: `{"chart": <ImageFile instance>}` using filename stem as key.

    Attributes:
        source: The underlying file source (path, bytes, or stream).
        mode: How to handle this file if it exceeds provider limits.
    """

    source: CoercedFileSource = Field(description="The underlying file source.")
    mode: FileMode = Field(
        default="auto",
        description="How to handle if file exceeds limits: strict, auto, warn, chunk.",
    )

    @property
    def _file_source(self) -> FileSource:
        """Get source with narrowed type (always FileSource after validation)."""
        if _is_file_source(self.source):
            return self.source
        raise TypeError("source must be a FileSource after validation")

    @property
    def filename(self) -> str | None:
        """Get the filename from the source."""
        return self._file_source.filename

    @property
    def content_type(self) -> str:
        """Get the content type from the source."""
        return self._file_source.content_type

    def read(self) -> bytes:
        """Read the file content as bytes."""
        return self._file_source.read()  # type: ignore[union-attr]

    async def aread(self) -> bytes:
        """Async read the file content as bytes.

        Raises:
            TypeError: If the underlying source doesn't support async read.
        """
        source = self._file_source
        if isinstance(source, (FilePath, FileBytes, AsyncFileStream)):
            return await source.aread()
        raise TypeError(f"{type(source).__name__} does not support async read")

    def read_text(self, encoding: str = "utf-8") -> str:
        """Read the file content as string."""
        return self.read().decode(encoding)

    @property
    def _unpack_key(self) -> str:
        """Get the key to use when unpacking (filename stem)."""
        filename = self._file_source.filename
        if filename:
            return Path(filename).stem
        return "file"

    def keys(self) -> list[str]:
        """Return keys for dict unpacking."""
        return [self._unpack_key]

    def __getitem__(self, key: str) -> Self:
        """Return self for dict unpacking."""
        if key == self._unpack_key:
            return self
        raise KeyError(key)


class ImageFile(BaseFile):
    """File representing an image.

    Supports common image formats: PNG, JPEG, GIF, WebP, BMP, TIFF, SVG.
    """


class PDFFile(BaseFile):
    """File representing a PDF document."""


class TextFile(BaseFile):
    """File representing a text document.

    Supports common text formats: TXT, MD, RST, CSV, JSON, XML, YAML, HTML.
    """


class AudioFile(BaseFile):
    """File representing an audio file.

    Supports common audio formats: MP3, WAV, OGG, FLAC, AAC, M4A, WMA.
    """


class VideoFile(BaseFile):
    """File representing a video file.

    Supports common video formats: MP4, AVI, MKV, MOV, WebM, FLV, WMV.
    """


class File(BaseFile):
    """Generic file that auto-detects the appropriate type.

    Use this when you don't want to specify the exact file type.
    The content type is automatically detected from the file contents.

    Example:
        >>> file = File(source="./document.pdf")
        >>> file = File(source="./image.png")
        >>> file = File(source=some_bytes)
    """
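A short sketch of the wrapper classes and the `**`-unpacking behavior described in `BaseFile`; the image path is assumed to exist:

```python Code
from crewai.files import File, ImageFile

# Typed wrapper with an explicit per-file mode.
chart = ImageFile(source="./chart.png", mode="strict")

# keys()/__getitem__ make ** unpacking use the filename stem as the key.
named = {**chart}  # {"chart": <ImageFile instance>}

# Generic wrapper: the content type is detected from the bytes themselves.
doc = File(source=b"%PDF-1.4 minimal")
print(doc.content_type)  # "application/pdf" when python-magic is installed
```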
lib/crewai/src/crewai/files/file.py (new file, 390 lines)
@@ -0,0 +1,390 @@
"""Base file class for handling file inputs in tasks."""

from __future__ import annotations

from collections.abc import AsyncIterator, Iterator
import mimetypes
from pathlib import Path
from typing import Annotated, Any, BinaryIO, Protocol, cast, runtime_checkable

import aiofiles
from pydantic import (
    BaseModel,
    BeforeValidator,
    Field,
    GetCoreSchemaHandler,
    PrivateAttr,
    model_validator,
)
from pydantic_core import CoreSchema, core_schema


@runtime_checkable
class AsyncReadable(Protocol):
    """Protocol for async readable streams."""

    async def read(self, size: int = -1) -> bytes: ...


class _AsyncReadableValidator:
    """Pydantic validator for AsyncReadable types."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, _source_type: Any, _handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        return core_schema.no_info_plain_validator_function(
            cls._validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda x: None, info_arg=False
            ),
        )

    @staticmethod
    def _validate(value: Any) -> AsyncReadable:
        if isinstance(value, AsyncReadable):
            return value
        raise ValueError("Expected an async readable object with async read() method")


ValidatedAsyncReadable = Annotated[AsyncReadable, _AsyncReadableValidator()]

DEFAULT_MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # 500MB


def detect_content_type(data: bytes, filename: str | None = None) -> str:
    """Detect MIME type from file content.

    Uses python-magic if available for accurate content-based detection,
    falls back to mimetypes module using filename extension.

    Args:
        data: Raw bytes to analyze.
        filename: Optional filename for extension-based fallback.

    Returns:
        The detected MIME type.
    """
    try:
        import magic

        result: str = magic.from_buffer(data, mime=True)
        return result
    except ImportError:
        if filename:
            mime_type, _ = mimetypes.guess_type(filename)
            if mime_type:
                return mime_type
        return "application/octet-stream"


class _BinaryIOValidator:
    """Pydantic validator for BinaryIO types."""

    @classmethod
    def __get_pydantic_core_schema__(
        cls, _source_type: Any, _handler: GetCoreSchemaHandler
    ) -> CoreSchema:
        return core_schema.no_info_plain_validator_function(
            cls._validate,
            serialization=core_schema.plain_serializer_function_ser_schema(
                lambda x: None, info_arg=False
            ),
        )

    @staticmethod
    def _validate(value: Any) -> BinaryIO:
        if hasattr(value, "read") and hasattr(value, "seek"):
            return cast(BinaryIO, value)
        raise ValueError("Expected a binary file-like object with read() and seek()")


ValidatedBinaryIO = Annotated[BinaryIO, _BinaryIOValidator()]


class FilePath(BaseModel):
    """File loaded from a filesystem path."""

    path: Path = Field(description="Path to the file on the filesystem.")
    max_size_bytes: int = Field(
        default=DEFAULT_MAX_FILE_SIZE_BYTES,
        exclude=True,
        description="Maximum file size in bytes.",
    )
    _content: bytes | None = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _validate_file_exists(self) -> FilePath:
        """Validate that the file exists, is secure, and within size limits."""
        from crewai.files.processing.exceptions import FileTooLargeError

        path_str = str(self.path)
        if ".." in path_str:
            raise ValueError(f"Path traversal not allowed: {self.path}")

        if self.path.is_symlink():
            resolved = self.path.resolve()
            cwd = Path.cwd().resolve()
            if not str(resolved).startswith(str(cwd)):
                raise ValueError(f"Symlink escapes allowed directory: {self.path}")

        if not self.path.exists():
            raise ValueError(f"File not found: {self.path}")
        if not self.path.is_file():
            raise ValueError(f"Path is not a file: {self.path}")

        actual_size = self.path.stat().st_size
        if actual_size > self.max_size_bytes:
            raise FileTooLargeError(
                f"File exceeds max size ({actual_size} > {self.max_size_bytes})",
                file_name=str(self.path),
                actual_size=actual_size,
                max_size=self.max_size_bytes,
            )

        return self

    @property
    def filename(self) -> str:
        """Get the filename from the path."""
        return self.path.name

    @property
    def content_type(self) -> str:
        """Get the content type by reading file content."""
        return detect_content_type(self.read(), self.filename)

    def read(self) -> bytes:
        """Read the file content from disk."""
        if self._content is None:
            self._content = self.path.read_bytes()
        return self._content

    async def aread(self) -> bytes:
        """Async read the file content from disk."""
        if self._content is None:
            async with aiofiles.open(self.path, "rb") as f:
                self._content = await f.read()
        return self._content

    def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
        """Stream file content in chunks without loading entirely into memory.

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of file content.
        """
        with open(self.path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk

    async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
        """Async streaming for non-blocking I/O.

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of file content.
        """
        async with aiofiles.open(self.path, "rb") as f:
            while chunk := await f.read(chunk_size):
                yield chunk


class FileBytes(BaseModel):
    """File created from raw bytes content."""

    data: bytes = Field(description="Raw bytes content of the file.")
    filename: str | None = Field(default=None, description="Optional filename.")

    @property
    def content_type(self) -> str:
        """Get the content type from the data."""
        return detect_content_type(self.data, self.filename)

    def read(self) -> bytes:
        """Return the bytes content."""
        return self.data

    async def aread(self) -> bytes:
        """Async return the bytes content (immediate, already in memory)."""
        return self.data

    def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
        """Stream bytes content in chunks.

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of bytes content.
        """
        for i in range(0, len(self.data), chunk_size):
            yield self.data[i : i + chunk_size]

    async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
        """Async streaming (immediate yield since already in memory).

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of bytes content.
        """
        for chunk in self.read_chunks(chunk_size):
            yield chunk


class FileStream(BaseModel):
    """File loaded from a file-like stream."""

    stream: ValidatedBinaryIO = Field(description="Binary file stream.")
    filename: str | None = Field(default=None, description="Optional filename.")
    _content: bytes | None = PrivateAttr(default=None)

    def model_post_init(self, __context: object) -> None:
        """Extract filename from stream if not provided."""
        if self.filename is None:
            name = getattr(self.stream, "name", None)
            if name is not None:
                object.__setattr__(self, "filename", Path(name).name)

    @property
    def content_type(self) -> str:
        """Get the content type from stream content."""
        return detect_content_type(self.read(), self.filename)

    def read(self) -> bytes:
        """Read the stream content. Content is cached after first read."""
        if self._content is None:
            position = self.stream.tell()
            self.stream.seek(0)
            self._content = self.stream.read()
            self.stream.seek(position)
        return self._content

    def close(self) -> None:
        """Close the underlying stream."""
        self.stream.close()

    def __enter__(self) -> FileStream:
        """Enter context manager."""
        return self

    def __exit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: Any,
    ) -> None:
        """Exit context manager and close stream."""
        self.close()

    def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
        """Stream from underlying stream in chunks.

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of stream content.
        """
        position = self.stream.tell()
        self.stream.seek(0)
        try:
            while chunk := self.stream.read(chunk_size):
                yield chunk
        finally:
            self.stream.seek(position)


class AsyncFileStream(BaseModel):
    """File loaded from an async stream.

    Use for async file handles like aiofiles objects or aiohttp response bodies.
    This is an async-only type - use aread() instead of read().

    Attributes:
        stream: Async file-like object with async read() method.
        filename: Optional filename for the stream.
    """

    stream: ValidatedAsyncReadable = Field(
        description="Async file stream with async read() method."
    )
    filename: str | None = Field(default=None, description="Optional filename.")
    _content: bytes | None = PrivateAttr(default=None)

    @property
    def content_type(self) -> str:
        """Get the content type from stream content. Requires aread() first."""
        if self._content is None:
            raise RuntimeError("Call aread() first to load content")
        return detect_content_type(self._content, self.filename)

    async def aread(self) -> bytes:
        """Async read the stream content. Content is cached after first read."""
        if self._content is None:
            self._content = await self.stream.read()
        return self._content

    async def aclose(self) -> None:
        """Async close the underlying stream."""
        if hasattr(self.stream, "close"):
            result = self.stream.close()
            if hasattr(result, "__await__"):
                await result

    async def __aenter__(self) -> AsyncFileStream:
        """Async enter context manager."""
        return self

    async def __aexit__(
        self,
        exc_type: type[BaseException] | None,
        exc_val: BaseException | None,
        exc_tb: Any,
    ) -> None:
        """Async exit context manager and close stream."""
        await self.aclose()

    async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
        """Async stream content in chunks.

        Args:
            chunk_size: Size of each chunk in bytes.

        Yields:
            Chunks of stream content.
        """
        while chunk := await self.stream.read(chunk_size):
            yield chunk


FileSource = FilePath | FileBytes | FileStream | AsyncFileStream


def _normalize_source(value: Any) -> FileSource:
    """Convert raw input to appropriate source type."""
    if isinstance(value, (FilePath, FileBytes, FileStream, AsyncFileStream)):
        return value
    if isinstance(value, Path):
        return FilePath(path=value)
    if isinstance(value, str):
        return FilePath(path=Path(value))
    if isinstance(value, bytes):
        return FileBytes(data=value)
    if isinstance(value, AsyncReadable):
        return AsyncFileStream(stream=value)
    if hasattr(value, "read") and hasattr(value, "seek"):
        return FileStream(stream=value)
    raise ValueError(f"Cannot convert {type(value).__name__} to file source")


RawFileInput = str | Path | bytes
FileSourceInput = Annotated[
    RawFileInput | FileSource, BeforeValidator(_normalize_source)
]
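A minimal sketch of the source types above; the path is assumed to exist and be under the size limit:

```python Code
import asyncio

from crewai.files import FileBytes, FilePath

# Path-backed source: existence, symlink, and size checks run at construction;
# content is cached after the first read.
src = FilePath(path="./notes.txt")
head = src.read()[:16]

# Stream a large file in 8 KiB chunks without loading it all into memory.
total = sum(len(chunk) for chunk in src.read_chunks(chunk_size=8192))

# Bytes-backed source: aread() returns immediately since data is in memory.
raw = FileBytes(data=b"hello", filename="hello.txt")
print(asyncio.run(raw.aread()), head, total)
```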
lib/crewai/src/crewai/files/metrics.py (new file, 184 lines)
@@ -0,0 +1,184 @@
"""Performance metrics and structured logging for file operations."""

from __future__ import annotations

from collections.abc import Generator
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime, timezone
import logging
import time
from typing import Any


logger = logging.getLogger(__name__)


@dataclass
class FileOperationMetrics:
    """Metrics for a file operation.

    Attributes:
        operation: Name of the operation (e.g., "upload", "resolve", "process").
        filename: Name of the file being operated on.
        provider: Provider name if applicable.
        duration_ms: Duration of the operation in milliseconds.
        size_bytes: Size of the file in bytes.
        success: Whether the operation succeeded.
        error: Error message if operation failed.
        timestamp: When the operation occurred.
        metadata: Additional operation-specific metadata.
    """

    operation: str
    filename: str | None = None
    provider: str | None = None
    duration_ms: float = 0.0
    size_bytes: int | None = None
    success: bool = True
    error: str | None = None
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    metadata: dict[str, Any] = field(default_factory=dict)

    def to_dict(self) -> dict[str, Any]:
        """Convert metrics to dictionary for logging.

        Returns:
            Dictionary representation of metrics.
        """
        result: dict[str, Any] = {
            "operation": self.operation,
            "duration_ms": round(self.duration_ms, 2),
            "success": self.success,
            "timestamp": self.timestamp.isoformat(),
        }

        if self.filename:
            result["filename"] = self.filename
        if self.provider:
            result["provider"] = self.provider
        if self.size_bytes is not None:
            result["size_bytes"] = self.size_bytes
        if self.error:
            result["error"] = self.error
        if self.metadata:
            result.update(self.metadata)

        return result


@contextmanager
def measure_operation(
    operation: str,
    *,
    filename: str | None = None,
    provider: str | None = None,
    size_bytes: int | None = None,
    log_level: int = logging.DEBUG,
    **extra_metadata: Any,
) -> Generator[FileOperationMetrics, None, None]:
    """Context manager to measure and log operation performance.

    Args:
        operation: Name of the operation.
        filename: Optional filename being operated on.
        provider: Optional provider name.
        size_bytes: Optional file size in bytes.
        log_level: Log level for the result message.
        **extra_metadata: Additional metadata to include.

    Yields:
        FileOperationMetrics object that will be populated with results.

    Example:
        with measure_operation("upload", filename="test.pdf", provider="openai") as metrics:
            result = upload_file(file)
            metrics.metadata["file_id"] = result.file_id
    """
    metrics = FileOperationMetrics(
        operation=operation,
        filename=filename,
        provider=provider,
        size_bytes=size_bytes,
        metadata=dict(extra_metadata),
    )

    start_time = time.perf_counter()

    try:
        yield metrics
        metrics.success = True
    except Exception as e:
        metrics.success = False
        metrics.error = str(e)
        raise
    finally:
        metrics.duration_ms = (time.perf_counter() - start_time) * 1000

        log_message = f"{operation}"
        if filename:
            log_message += f" [{filename}]"
        if provider:
            log_message += f" ({provider})"

        if metrics.success:
            log_message += f" completed in {metrics.duration_ms:.2f}ms"
        else:
            log_message += f" failed after {metrics.duration_ms:.2f}ms: {metrics.error}"

        logger.log(log_level, log_message, extra=metrics.to_dict())


def log_file_operation(
    operation: str,
    *,
    filename: str | None = None,
    provider: str | None = None,
    size_bytes: int | None = None,
    duration_ms: float | None = None,
    success: bool = True,
    error: str | None = None,
    level: int = logging.INFO,
    **extra: Any,
) -> None:
    """Log a file operation with structured data.

    Args:
        operation: Name of the operation.
        filename: Optional filename being operated on.
        provider: Optional provider name.
        size_bytes: Optional file size in bytes.
        duration_ms: Optional duration in milliseconds.
        success: Whether the operation succeeded.
        error: Optional error message.
        level: Log level to use.
        **extra: Additional metadata to include.
    """
    metrics = FileOperationMetrics(
        operation=operation,
        filename=filename,
        provider=provider,
        size_bytes=size_bytes,
        duration_ms=duration_ms or 0.0,
        success=success,
        error=error,
        metadata=dict(extra),
    )

    message = f"{operation}"
    if filename:
        message += f" [{filename}]"
    if provider:
        message += f" ({provider})"

    if success:
        if duration_ms:
            message += f" completed in {duration_ms:.2f}ms"
        else:
            message += " completed"
    else:
        message += " failed"
        if error:
            message += f": {error}"

    logger.log(level, message, extra=metrics.to_dict())
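A brief sketch of both logging helpers in use:

```python Code
import logging

from crewai.files.metrics import log_file_operation, measure_operation

logging.basicConfig(level=logging.DEBUG)

# Duration, success, and error fields are filled in by the context manager.
with measure_operation("resolve", filename="chart.png", provider="gemini") as m:
    m.metadata["resolved_as"] = "inline_base64"

# One-shot structured log entry with an explicit duration.
log_file_operation(
    "upload", filename="chart.png", provider="gemini",
    size_bytes=2048, duration_ms=12.5,
)
```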
lib/crewai/src/crewai/files/processing/__init__.py (new file, 62 lines)
@@ -0,0 +1,62 @@
"""File processing module for multimodal content handling.

This module provides validation, transformation, and processing utilities
for files used in multimodal LLM interactions.
"""

from crewai.files.processing.constraints import (
    ANTHROPIC_CONSTRAINTS,
    BEDROCK_CONSTRAINTS,
    GEMINI_CONSTRAINTS,
    OPENAI_CONSTRAINTS,
    AudioConstraints,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
    VideoConstraints,
    get_constraints_for_provider,
)
from crewai.files.processing.enums import FileHandling
from crewai.files.processing.exceptions import (
    FileProcessingError,
    FileTooLargeError,
    FileValidationError,
    ProcessingDependencyError,
    UnsupportedFileTypeError,
)
from crewai.files.processing.processor import FileProcessor
from crewai.files.processing.validators import (
    validate_audio,
    validate_file,
    validate_image,
    validate_pdf,
    validate_text,
    validate_video,
)


__all__ = [
    "ANTHROPIC_CONSTRAINTS",
    "BEDROCK_CONSTRAINTS",
    "GEMINI_CONSTRAINTS",
    "OPENAI_CONSTRAINTS",
    "AudioConstraints",
    "FileHandling",
    "FileProcessingError",
    "FileProcessor",
    "FileTooLargeError",
    "FileValidationError",
    "ImageConstraints",
    "PDFConstraints",
    "ProcessingDependencyError",
    "ProviderConstraints",
    "UnsupportedFileTypeError",
    "VideoConstraints",
    "get_constraints_for_provider",
    "validate_audio",
    "validate_file",
    "validate_image",
    "validate_pdf",
    "validate_text",
    "validate_video",
]
lib/crewai/src/crewai/files/processing/constraints.py (new file, 290 lines)
@@ -0,0 +1,290 @@
"""Provider-specific file constraints for multimodal content."""

from dataclasses import dataclass
from typing import Literal


ImageFormat = Literal[
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
    "image/heic",
    "image/heif",
]

AudioFormat = Literal[
    "audio/mp3",
    "audio/mpeg",
    "audio/wav",
    "audio/ogg",
    "audio/flac",
    "audio/aac",
    "audio/m4a",
    "audio/opus",
]

VideoFormat = Literal[
    "video/mp4",
    "video/mpeg",
    "video/webm",
    "video/quicktime",
    "video/x-msvideo",
    "video/x-flv",
]

ProviderName = Literal[
    "anthropic",
    "openai",
    "gemini",
    "bedrock",
    "azure",
]

# Pre-typed format tuples for common combinations
DEFAULT_IMAGE_FORMATS: tuple[ImageFormat, ...] = (
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
)

GEMINI_IMAGE_FORMATS: tuple[ImageFormat, ...] = (
    "image/png",
    "image/jpeg",
    "image/gif",
    "image/webp",
    "image/heic",
    "image/heif",
)

DEFAULT_AUDIO_FORMATS: tuple[AudioFormat, ...] = (
    "audio/mp3",
    "audio/mpeg",
    "audio/wav",
    "audio/ogg",
    "audio/flac",
    "audio/aac",
    "audio/m4a",
)

GEMINI_AUDIO_FORMATS: tuple[AudioFormat, ...] = (
    "audio/mp3",
    "audio/mpeg",
    "audio/wav",
    "audio/ogg",
    "audio/flac",
    "audio/aac",
    "audio/m4a",
    "audio/opus",
)

DEFAULT_VIDEO_FORMATS: tuple[VideoFormat, ...] = (
    "video/mp4",
    "video/mpeg",
    "video/webm",
    "video/quicktime",
)

GEMINI_VIDEO_FORMATS: tuple[VideoFormat, ...] = (
    "video/mp4",
    "video/mpeg",
    "video/webm",
    "video/quicktime",
    "video/x-msvideo",
    "video/x-flv",
)


@dataclass(frozen=True)
class ImageConstraints:
    """Constraints for image files.

    Attributes:
        max_size_bytes: Maximum file size in bytes.
        max_width: Maximum image width in pixels.
        max_height: Maximum image height in pixels.
        max_images_per_request: Maximum number of images per request.
        supported_formats: Supported image MIME types.
    """

    max_size_bytes: int
    max_width: int | None = None
    max_height: int | None = None
    max_images_per_request: int | None = None
    supported_formats: tuple[ImageFormat, ...] = DEFAULT_IMAGE_FORMATS


@dataclass(frozen=True)
class PDFConstraints:
    """Constraints for PDF files.

    Attributes:
        max_size_bytes: Maximum file size in bytes.
        max_pages: Maximum number of pages.
    """

    max_size_bytes: int
    max_pages: int | None = None


@dataclass(frozen=True)
class AudioConstraints:
    """Constraints for audio files.

    Attributes:
        max_size_bytes: Maximum file size in bytes.
        max_duration_seconds: Maximum audio duration in seconds.
        supported_formats: Supported audio MIME types.
    """

    max_size_bytes: int
    max_duration_seconds: int | None = None
    supported_formats: tuple[AudioFormat, ...] = DEFAULT_AUDIO_FORMATS


@dataclass(frozen=True)
class VideoConstraints:
    """Constraints for video files.

    Attributes:
        max_size_bytes: Maximum file size in bytes.
        max_duration_seconds: Maximum video duration in seconds.
        supported_formats: Supported video MIME types.
    """

    max_size_bytes: int
    max_duration_seconds: int | None = None
    supported_formats: tuple[VideoFormat, ...] = DEFAULT_VIDEO_FORMATS


@dataclass(frozen=True)
class ProviderConstraints:
    """Complete set of constraints for a provider.

    Attributes:
        name: Provider name identifier.
        image: Image file constraints.
        pdf: PDF file constraints.
        audio: Audio file constraints.
        video: Video file constraints.
        general_max_size_bytes: Maximum size for any file type.
        supports_file_upload: Whether the provider supports file upload APIs.
        file_upload_threshold_bytes: Size threshold above which to use file upload.
    """

    name: ProviderName
    image: ImageConstraints | None = None
    pdf: PDFConstraints | None = None
    audio: AudioConstraints | None = None
    video: VideoConstraints | None = None
    general_max_size_bytes: int | None = None
    supports_file_upload: bool = False
    file_upload_threshold_bytes: int | None = None


ANTHROPIC_CONSTRAINTS = ProviderConstraints(
    name="anthropic",
    image=ImageConstraints(
        max_size_bytes=5 * 1024 * 1024,
        max_width=8000,
        max_height=8000,
    ),
    pdf=PDFConstraints(
        max_size_bytes=30 * 1024 * 1024,
        max_pages=100,
    ),
    supports_file_upload=True,
    file_upload_threshold_bytes=5 * 1024 * 1024,
)

OPENAI_CONSTRAINTS = ProviderConstraints(
    name="openai",
    image=ImageConstraints(
        max_size_bytes=20 * 1024 * 1024,
        max_images_per_request=10,
    ),
    supports_file_upload=True,
    file_upload_threshold_bytes=5 * 1024 * 1024,
)

GEMINI_CONSTRAINTS = ProviderConstraints(
    name="gemini",
    image=ImageConstraints(
        max_size_bytes=100 * 1024 * 1024,
        supported_formats=GEMINI_IMAGE_FORMATS,
    ),
    pdf=PDFConstraints(
        max_size_bytes=50 * 1024 * 1024,
    ),
    audio=AudioConstraints(
        max_size_bytes=100 * 1024 * 1024,
        supported_formats=GEMINI_AUDIO_FORMATS,
    ),
    video=VideoConstraints(
        max_size_bytes=2 * 1024 * 1024 * 1024,
        supported_formats=GEMINI_VIDEO_FORMATS,
    ),
    supports_file_upload=True,
    file_upload_threshold_bytes=20 * 1024 * 1024,
)

BEDROCK_CONSTRAINTS = ProviderConstraints(
    name="bedrock",
    image=ImageConstraints(
        max_size_bytes=4_608_000,
        max_width=8000,
        max_height=8000,
    ),
    pdf=PDFConstraints(
        max_size_bytes=3_840_000,
        max_pages=100,
    ),
)

AZURE_CONSTRAINTS = ProviderConstraints(
    name="azure",
    image=ImageConstraints(
        max_size_bytes=20 * 1024 * 1024,
        max_images_per_request=10,
    ),
)


_PROVIDER_CONSTRAINTS_MAP: dict[str, ProviderConstraints] = {
    "anthropic": ANTHROPIC_CONSTRAINTS,
    "openai": OPENAI_CONSTRAINTS,
    "gemini": GEMINI_CONSTRAINTS,
    "bedrock": BEDROCK_CONSTRAINTS,
    "azure": AZURE_CONSTRAINTS,
    "claude": ANTHROPIC_CONSTRAINTS,
    "gpt": OPENAI_CONSTRAINTS,
    "google": GEMINI_CONSTRAINTS,
    "aws": BEDROCK_CONSTRAINTS,
}


def get_constraints_for_provider(
    provider: str | ProviderConstraints,
) -> ProviderConstraints | None:
    """Get constraints for a provider by name or return if already ProviderConstraints.

    Args:
        provider: Provider name string or ProviderConstraints instance.

    Returns:
        ProviderConstraints for the provider, or None if not found.
    """
    if isinstance(provider, ProviderConstraints):
        return provider

    provider_lower = provider.lower()

    if provider_lower in _PROVIDER_CONSTRAINTS_MAP:
        return _PROVIDER_CONSTRAINTS_MAP[provider_lower]

    for key, constraints in _PROVIDER_CONSTRAINTS_MAP.items():
        if key in provider_lower:
            return constraints

    return None
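The lookup also falls back to substring matching, so a full model string resolves to its provider's constraints — a small sketch:

```python Code
from crewai.files.processing import get_constraints_for_provider

constraints = get_constraints_for_provider("gemini/gemini-2.0-flash")
assert constraints is not None and constraints.image is not None
print(constraints.image.max_size_bytes)         # 104857600 (100 MB)
print(constraints.supports_file_upload)         # True
print(constraints.file_upload_threshold_bytes)  # 20971520 (20 MB)
```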
lib/crewai/src/crewai/files/processing/enums.py (new file, 19 lines)
@@ -0,0 +1,19 @@
"""Enums for file processing configuration."""

from enum import Enum


class FileHandling(Enum):
    """Defines how files exceeding provider limits should be handled.

    Attributes:
        STRICT: Fail with an error if file exceeds limits.
        AUTO: Automatically resize, compress, or optimize to fit limits.
        WARN: Log a warning but attempt to process anyway.
        CHUNK: Split large files into smaller pieces.
    """

    STRICT = "strict"
    AUTO = "auto"
    WARN = "warn"
    CHUNK = "chunk"
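A brief usage sketch for the enum above: because the members are string-valued, a mode can be built from either the member itself or its string form, which is what `FileProcessor._get_mode` relies on later in this diff.

```python
from crewai.files.processing.enums import FileHandling

assert FileHandling("strict") is FileHandling.STRICT
assert FileHandling.AUTO.value == "auto"
```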
lib/crewai/src/crewai/files/processing/exceptions.py (new file, 103 lines)
@@ -0,0 +1,103 @@
"""Exceptions for file processing operations."""


class FileProcessingError(Exception):
    """Base exception for file processing errors."""

    def __init__(self, message: str, file_name: str | None = None) -> None:
        """Initialize the exception.

        Args:
            message: Error message describing the issue.
            file_name: Optional name of the file that caused the error.
        """
        self.file_name = file_name
        super().__init__(message)


class FileValidationError(FileProcessingError):
    """Raised when file validation fails."""


class FileTooLargeError(FileValidationError):
    """Raised when a file exceeds the maximum allowed size."""

    def __init__(
        self,
        message: str,
        file_name: str | None = None,
        actual_size: int | None = None,
        max_size: int | None = None,
    ) -> None:
        """Initialize the exception.

        Args:
            message: Error message describing the issue.
            file_name: Optional name of the file that caused the error.
            actual_size: The actual size of the file in bytes.
            max_size: The maximum allowed size in bytes.
        """
        self.actual_size = actual_size
        self.max_size = max_size
        super().__init__(message, file_name)


class UnsupportedFileTypeError(FileValidationError):
    """Raised when a file type is not supported by the provider."""

    def __init__(
        self,
        message: str,
        file_name: str | None = None,
        content_type: str | None = None,
    ) -> None:
        """Initialize the exception.

        Args:
            message: Error message describing the issue.
            file_name: Optional name of the file that caused the error.
            content_type: The content type that is not supported.
        """
        self.content_type = content_type
        super().__init__(message, file_name)


class ProcessingDependencyError(FileProcessingError):
    """Raised when a required processing dependency is not installed."""

    def __init__(
        self,
        message: str,
        dependency: str,
        install_command: str | None = None,
    ) -> None:
        """Initialize the exception.

        Args:
            message: Error message describing the issue.
            dependency: Name of the missing dependency.
            install_command: Optional command to install the dependency.
        """
        self.dependency = dependency
        self.install_command = install_command
        super().__init__(message)


class TransientFileError(FileProcessingError):
    """Transient error that may succeed on retry (network, timeout)."""


class PermanentFileError(FileProcessingError):
    """Permanent error that will not succeed on retry (auth, format)."""


class UploadError(FileProcessingError):
    """Base exception for upload errors."""


class TransientUploadError(UploadError, TransientFileError):
    """Upload failed but may succeed on retry (network issues, rate limits)."""


class PermanentUploadError(UploadError, PermanentFileError):
    """Upload failed permanently (auth failure, invalid file, unsupported type)."""
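A hedged sketch of how this hierarchy is meant to be consumed: `FileValidationError` covers both size and format failures, while the transient/permanent split is what the resolver's retry loop later in this diff keys on.

```python
from crewai.files.processing.exceptions import (
    FileTooLargeError,
    FileValidationError,
    TransientUploadError,
)


def describe(error: Exception) -> str:
    """Map processing errors to a coarse action; purely illustrative."""
    if isinstance(error, FileTooLargeError):
        return f"reject: {error.actual_size}B over the {error.max_size}B limit"
    if isinstance(error, FileValidationError):
        return f"reject: {error}"
    if isinstance(error, TransientUploadError):
        return "retry with backoff"
    return "fail permanently"
```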
lib/crewai/src/crewai/files/processing/processor.py (new file, 347 lines)
@@ -0,0 +1,347 @@
"""FileProcessor for validating and transforming files based on provider constraints."""

import asyncio
from collections.abc import Sequence
import logging

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)
from crewai.files.processing.constraints import (
    ProviderConstraints,
    get_constraints_for_provider,
)
from crewai.files.processing.enums import FileHandling
from crewai.files.processing.exceptions import (
    FileProcessingError,
    FileTooLargeError,
    FileValidationError,
    UnsupportedFileTypeError,
)
from crewai.files.processing.transformers import (
    chunk_pdf,
    chunk_text,
    get_image_dimensions,
    get_pdf_page_count,
    optimize_image,
    resize_image,
)
from crewai.files.processing.validators import validate_file


logger = logging.getLogger(__name__)

FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile


class FileProcessor:
    """Processes files according to provider constraints and per-file handling mode.

    Validates files against provider-specific limits and optionally transforms
    them (resize, compress, chunk) to meet those limits. Each file specifies
    its own handling mode via `file.mode`.

    Attributes:
        constraints: Provider constraints for validation.
    """

    def __init__(
        self,
        constraints: ProviderConstraints | str | None = None,
    ) -> None:
        """Initialize the FileProcessor.

        Args:
            constraints: Provider constraints or provider name string.
                If None, validation is skipped.
        """
        if isinstance(constraints, str):
            resolved = get_constraints_for_provider(constraints)
            if resolved is None:
                logger.warning(
                    f"Unknown provider '{constraints}' - validation disabled"
                )
            self.constraints = resolved
        else:
            self.constraints = constraints

    def validate(self, file: FileInput) -> Sequence[str]:
        """Validate a file against provider constraints.

        Args:
            file: The file to validate.

        Returns:
            List of validation error messages (empty if valid).

        Raises:
            FileValidationError: If file.mode is STRICT and validation fails.
        """
        if self.constraints is None:
            return []

        mode = self._get_mode(file)
        raise_on_error = mode == FileHandling.STRICT
        return validate_file(file, self.constraints, raise_on_error=raise_on_error)

    @staticmethod
    def _get_mode(file: FileInput) -> FileHandling:
        """Get the handling mode for a file.

        Args:
            file: The file to get the mode for.

        Returns:
            The file's handling mode, defaulting to AUTO.
        """
        mode = getattr(file, "mode", None)
        if mode is None:
            return FileHandling.AUTO
        if isinstance(mode, str):
            return FileHandling(mode)
        if isinstance(mode, FileHandling):
            return mode
        return FileHandling.AUTO

    def process(self, file: FileInput) -> FileInput | Sequence[FileInput]:
        """Process a single file according to constraints and its handling mode.

        Args:
            file: The file to process.

        Returns:
            The processed file (possibly transformed) or a sequence of files
            if the file was chunked.

        Raises:
            FileProcessingError: If file.mode is STRICT and processing fails.
        """
        if self.constraints is None:
            return file

        mode = self._get_mode(file)

        try:
            errors = self.validate(file)

            if not errors:
                return file

            if mode == FileHandling.STRICT:
                raise FileValidationError("; ".join(errors), file_name=file.filename)

            if mode == FileHandling.WARN:
                for error in errors:
                    logger.warning(error)
                return file

            if mode == FileHandling.AUTO:
                return self._auto_process(file)

            if mode == FileHandling.CHUNK:
                return self._chunk_process(file)

            return file

        except (FileValidationError, FileTooLargeError, UnsupportedFileTypeError):
            raise
        except Exception as e:
            logger.error(f"Error processing file '{file.filename}': {e}")
            if mode == FileHandling.STRICT:
                raise FileProcessingError(str(e), file_name=file.filename) from e
            return file

    def process_files(
        self,
        files: dict[str, FileInput],
    ) -> dict[str, FileInput]:
        """Process multiple files according to constraints.

        Args:
            files: Dictionary mapping names to file inputs.

        Returns:
            Dictionary mapping names to processed files. If a file is chunked,
            multiple entries are created with indexed names.
        """
        result: dict[str, FileInput] = {}

        for name, file in files.items():
            processed = self.process(file)

            if isinstance(processed, Sequence) and not isinstance(
                processed, (str, bytes)
            ):
                for i, chunk in enumerate(processed):
                    chunk_name = f"{name}_chunk_{i}"
                    result[chunk_name] = chunk
            else:
                result[name] = processed

        return result

    async def aprocess_files(
        self,
        files: dict[str, FileInput],
        max_concurrency: int = 10,
    ) -> dict[str, FileInput]:
        """Async process multiple files in parallel.

        Args:
            files: Dictionary mapping names to file inputs.
            max_concurrency: Maximum number of concurrent processing tasks.

        Returns:
            Dictionary mapping names to processed files. If a file is chunked,
            multiple entries are created with indexed names.
        """
        semaphore = asyncio.Semaphore(max_concurrency)

        async def process_single(
            key: str, input_file: FileInput
        ) -> tuple[str, FileInput | Sequence[FileInput]]:
            """Process a single file with semaphore limiting."""
            async with semaphore:
                loop = asyncio.get_running_loop()
                result = await loop.run_in_executor(None, self.process, input_file)
                return key, result

        tasks = [process_single(n, f) for n, f in files.items()]
        gather_results = await asyncio.gather(*tasks, return_exceptions=True)

        output: dict[str, FileInput] = {}
        for item in gather_results:
            if isinstance(item, BaseException):
                logger.error(f"Processing failed: {item}")
                continue
            entry_name, processed = item
            if isinstance(processed, Sequence) and not isinstance(
                processed, (str, bytes)
            ):
                for i, chunk in enumerate(processed):
                    output[f"{entry_name}_chunk_{i}"] = chunk
            elif isinstance(
                processed, (AudioFile, File, ImageFile, PDFFile, TextFile, VideoFile)
            ):
                output[entry_name] = processed

        return output

    def _auto_process(self, file: FileInput) -> FileInput:
        """Automatically resize/compress file to meet constraints.

        Args:
            file: The file to process.

        Returns:
            The processed file.
        """
        if self.constraints is None:
            return file

        if isinstance(file, ImageFile) and self.constraints.image is not None:
            return self._auto_process_image(file)

        if isinstance(file, PDFFile) and self.constraints.pdf is not None:
            logger.warning(
                f"Cannot auto-compress PDF '{file.filename}'. "
                "Consider using CHUNK mode for large PDFs."
            )
            return file

        if isinstance(file, (AudioFile, VideoFile)):
            logger.warning(
                f"Auto-processing not supported for {type(file).__name__}. "
                "File will be used as-is."
            )
            return file

        return file

    def _auto_process_image(self, file: ImageFile) -> ImageFile:
        """Auto-process an image file.

        Args:
            file: The image file to process.

        Returns:
            The processed image file.
        """
        if self.constraints is None or self.constraints.image is None:
            return file

        image_constraints = self.constraints.image
        processed = file
        content = file.read()
        current_size = len(content)

        if image_constraints.max_width or image_constraints.max_height:
            dimensions = get_image_dimensions(file)
            if dimensions:
                width, height = dimensions
                max_w = image_constraints.max_width or width
                max_h = image_constraints.max_height or height

                if width > max_w or height > max_h:
                    try:
                        processed = resize_image(file, max_w, max_h)
                        content = processed.read()
                        current_size = len(content)
                    except Exception as e:
                        logger.warning(f"Failed to resize image: {e}")

        if current_size > image_constraints.max_size_bytes:
            try:
                processed = optimize_image(processed, image_constraints.max_size_bytes)
            except Exception as e:
                logger.warning(f"Failed to optimize image: {e}")

        return processed

    def _chunk_process(self, file: FileInput) -> FileInput | Sequence[FileInput]:
        """Split file into chunks to meet constraints.

        Args:
            file: The file to chunk.

        Returns:
            Original file if chunking is not needed, or a sequence of chunked files.
        """
        if self.constraints is None:
            return file

        if isinstance(file, PDFFile) and self.constraints.pdf is not None:
            max_pages = self.constraints.pdf.max_pages
            if max_pages is not None:
                page_count = get_pdf_page_count(file)
                if page_count is not None and page_count > max_pages:
                    try:
                        return list(chunk_pdf(file, max_pages))
                    except Exception as e:
                        logger.warning(f"Failed to chunk PDF: {e}")
            return file

        if isinstance(file, TextFile):
            # Use general max size as character limit approximation
            max_size = self.constraints.general_max_size_bytes
            if max_size is not None:
                content = file.read()
                if len(content) > max_size:
                    try:
                        return list(chunk_text(file, max_size))
                    except Exception as e:
                        logger.warning(f"Failed to chunk text file: {e}")
            return file

        if isinstance(file, (ImageFile, AudioFile, VideoFile)):
            logger.warning(
                f"Chunking not supported for {type(file).__name__}. "
                "Consider using AUTO mode for images."
            )

        return file
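A minimal usage sketch of the processor above. The `ImageFile(source=FileBytes(...))` construction mirrors what the transformers in this diff return; the PNG payload is a hypothetical stand-in for real image bytes.

```python
from crewai.files.content_types import ImageFile
from crewai.files.file import FileBytes
from crewai.files.processing.processor import FileProcessor

processor = FileProcessor(constraints="anthropic")
image = ImageFile(source=FileBytes(data=b"\x89PNG...", filename="photo.png"))  # hypothetical bytes

errors = processor.validate(image)    # [] when the file fits Anthropic's limits
processed = processor.process(image)  # AUTO mode resizes/compresses if needed
```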
lib/crewai/src/crewai/files/processing/transformers.py (new file, 336 lines)
@@ -0,0 +1,336 @@
"""File transformation functions for resizing, optimizing, and chunking."""

from collections.abc import Iterator
import io
import logging

from crewai.files.content_types import ImageFile, PDFFile, TextFile
from crewai.files.file import FileBytes
from crewai.files.processing.exceptions import ProcessingDependencyError


logger = logging.getLogger(__name__)


def resize_image(
    file: ImageFile,
    max_width: int,
    max_height: int,
    *,
    preserve_aspect_ratio: bool = True,
) -> ImageFile:
    """Resize an image to fit within the specified dimensions.

    Args:
        file: The image file to resize.
        max_width: Maximum width in pixels.
        max_height: Maximum height in pixels.
        preserve_aspect_ratio: If True, maintain aspect ratio while fitting within bounds.

    Returns:
        A new ImageFile with the resized image data.

    Raises:
        ProcessingDependencyError: If Pillow is not installed.
    """
    try:
        from PIL import Image
    except ImportError as e:
        raise ProcessingDependencyError(
            "Pillow is required for image resizing",
            dependency="Pillow",
            install_command="pip install Pillow",
        ) from e

    content = file.read()

    with Image.open(io.BytesIO(content)) as img:
        original_width, original_height = img.size

        if original_width <= max_width and original_height <= max_height:
            return file

        if preserve_aspect_ratio:
            width_ratio = max_width / original_width
            height_ratio = max_height / original_height
            scale_factor = min(width_ratio, height_ratio)

            new_width = int(original_width * scale_factor)
            new_height = int(original_height * scale_factor)
        else:
            new_width = min(original_width, max_width)
            new_height = min(original_height, max_height)

        resized_img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)

        output_format = img.format or "PNG"
        if output_format.upper() == "JPEG":
            if resized_img.mode in ("RGBA", "LA", "P"):
                resized_img = resized_img.convert("RGB")

        output_buffer = io.BytesIO()
        resized_img.save(output_buffer, format=output_format)
        output_bytes = output_buffer.getvalue()

    logger.info(
        f"Resized image '{file.filename}' from {original_width}x{original_height} "
        f"to {new_width}x{new_height}"
    )

    return ImageFile(source=FileBytes(data=output_bytes, filename=file.filename))


def optimize_image(
    file: ImageFile,
    target_size_bytes: int,
    *,
    min_quality: int = 20,
    initial_quality: int = 85,
) -> ImageFile:
    """Optimize an image to fit within a target file size.

    Uses iterative quality reduction to achieve target size.

    Args:
        file: The image file to optimize.
        target_size_bytes: Target maximum file size in bytes.
        min_quality: Minimum quality to use (prevents excessive degradation).
        initial_quality: Starting quality for optimization.

    Returns:
        A new ImageFile with the optimized image data.

    Raises:
        ProcessingDependencyError: If Pillow is not installed.
    """
    try:
        from PIL import Image
    except ImportError as e:
        raise ProcessingDependencyError(
            "Pillow is required for image optimization",
            dependency="Pillow",
            install_command="pip install Pillow",
        ) from e

    content = file.read()
    current_size = len(content)

    if current_size <= target_size_bytes:
        return file

    with Image.open(io.BytesIO(content)) as img:
        if img.mode in ("RGBA", "LA", "P"):
            img = img.convert("RGB")
            output_format = "JPEG"
        else:
            output_format = img.format or "JPEG"
            if output_format.upper() not in ("JPEG", "JPG"):
                output_format = "JPEG"

        quality = initial_quality
        output_bytes = content

        while len(output_bytes) > target_size_bytes and quality >= min_quality:
            output_buffer = io.BytesIO()
            img.save(
                output_buffer, format=output_format, quality=quality, optimize=True
            )
            output_bytes = output_buffer.getvalue()

            if len(output_bytes) > target_size_bytes:
                quality -= 5

    logger.info(
        f"Optimized image '{file.filename}' from {current_size} bytes to "
        f"{len(output_bytes)} bytes (quality={quality})"
    )

    filename = file.filename
    if (
        filename
        and output_format.upper() == "JPEG"
        and not filename.lower().endswith((".jpg", ".jpeg"))
    ):
        filename = filename.rsplit(".", 1)[0] + ".jpg"

    return ImageFile(source=FileBytes(data=output_bytes, filename=filename))


def chunk_pdf(
    file: PDFFile,
    max_pages: int,
    *,
    overlap_pages: int = 0,
) -> Iterator[PDFFile]:
    """Split a PDF into chunks of maximum page count.

    Yields chunks one at a time to minimize memory usage.

    Args:
        file: The PDF file to chunk.
        max_pages: Maximum pages per chunk.
        overlap_pages: Number of overlapping pages between chunks (for context).

    Yields:
        PDFFile objects, one per chunk.

    Raises:
        ProcessingDependencyError: If pypdf is not installed.
    """
    try:
        from pypdf import PdfReader, PdfWriter
    except ImportError as e:
        raise ProcessingDependencyError(
            "pypdf is required for PDF chunking",
            dependency="pypdf",
            install_command="pip install pypdf",
        ) from e

    content = file.read()
    reader = PdfReader(io.BytesIO(content))
    total_pages = len(reader.pages)

    if total_pages <= max_pages:
        yield file
        return

    filename = file.filename or "document.pdf"
    base_filename = filename.rsplit(".", 1)[0]
    step = max_pages - overlap_pages

    chunk_num = 0
    start_page = 0

    while start_page < total_pages:
        end_page = min(start_page + max_pages, total_pages)

        writer = PdfWriter()
        for page_num in range(start_page, end_page):
            writer.add_page(reader.pages[page_num])

        output_buffer = io.BytesIO()
        writer.write(output_buffer)
        output_bytes = output_buffer.getvalue()

        chunk_filename = f"{base_filename}_chunk_{chunk_num}.pdf"

        logger.info(
            f"Created PDF chunk '{chunk_filename}' with pages {start_page + 1}-{end_page}"
        )

        yield PDFFile(source=FileBytes(data=output_bytes, filename=chunk_filename))

        start_page += step
        chunk_num += 1


def chunk_text(
    file: TextFile,
    max_chars: int,
    *,
    overlap_chars: int = 200,
    split_on_newlines: bool = True,
) -> Iterator[TextFile]:
    """Split a text file into chunks of maximum character count.

    Yields chunks one at a time to minimize memory usage.

    Args:
        file: The text file to chunk.
        max_chars: Maximum characters per chunk.
        overlap_chars: Number of overlapping characters between chunks.
        split_on_newlines: If True, prefer splitting at newline boundaries.

    Yields:
        TextFile objects, one per chunk.
    """
    content = file.read()
    text = content.decode("utf-8", errors="replace")
    total_chars = len(text)

    if total_chars <= max_chars:
        yield file
        return

    filename = file.filename or "text.txt"
    base_filename = filename.rsplit(".", 1)[0]
    extension = filename.rsplit(".", 1)[-1] if "." in filename else "txt"

    chunk_num = 0
    start_pos = 0

    while start_pos < total_chars:
        end_pos = min(start_pos + max_chars, total_chars)

        if end_pos < total_chars and split_on_newlines:
            last_newline = text.rfind("\n", start_pos, end_pos)
            if last_newline > start_pos + max_chars // 2:
                end_pos = last_newline + 1

        chunk_content = text[start_pos:end_pos]
        chunk_bytes = chunk_content.encode("utf-8")

        chunk_filename = f"{base_filename}_chunk_{chunk_num}.{extension}"

        logger.info(
            f"Created text chunk '{chunk_filename}' with {len(chunk_content)} characters"
        )

        yield TextFile(source=FileBytes(data=chunk_bytes, filename=chunk_filename))

        if end_pos < total_chars:
            start_pos = max(start_pos + 1, end_pos - overlap_chars)
        else:
            start_pos = total_chars
        chunk_num += 1


def get_image_dimensions(file: ImageFile) -> tuple[int, int] | None:
    """Get the dimensions of an image file.

    Args:
        file: The image file to measure.

    Returns:
        Tuple of (width, height) in pixels, or None if dimensions cannot be determined.
    """
    try:
        from PIL import Image
    except ImportError:
        logger.warning("Pillow not installed - cannot get image dimensions")
        return None

    content = file.read()

    try:
        with Image.open(io.BytesIO(content)) as img:
            width, height = img.size
            return width, height
    except Exception as e:
        logger.warning(f"Failed to get image dimensions: {e}")
        return None


def get_pdf_page_count(file: PDFFile) -> int | None:
    """Get the page count of a PDF file.

    Args:
        file: The PDF file to measure.

    Returns:
        Number of pages, or None if page count cannot be determined.
    """
    try:
        from pypdf import PdfReader
    except ImportError:
        logger.warning("pypdf not installed - cannot get PDF page count")
        return None

    content = file.read()

    try:
        reader = PdfReader(io.BytesIO(content))
        return len(reader.pages)
    except Exception as e:
        logger.warning(f"Failed to get PDF page count: {e}")
        return None
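A short sketch of the chunking helper above: `chunk_text` is a generator, so large inputs can be consumed lazily without materializing every chunk at once.

```python
from crewai.files.content_types import TextFile
from crewai.files.file import FileBytes
from crewai.files.processing.transformers import chunk_text

log = TextFile(source=FileBytes(data=b"event\n" * 50_000, filename="run.log"))

# Yields ~20_000-character chunks with 200 characters of default overlap,
# preferring newline boundaries for the split points.
for chunk in chunk_text(log, max_chars=20_000):
    print(chunk.filename, len(chunk.read()))
```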
lib/crewai/src/crewai/files/processing/validators.py (new file, 417 lines)
@@ -0,0 +1,417 @@
"""File validation functions for checking against provider constraints."""

from collections.abc import Sequence
import logging

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)
from crewai.files.processing.constraints import (
    AudioConstraints,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
    VideoConstraints,
)
from crewai.files.processing.exceptions import (
    FileTooLargeError,
    FileValidationError,
    UnsupportedFileTypeError,
)


logger = logging.getLogger(__name__)

FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile


def _format_size(size_bytes: int) -> str:
    """Format byte size to human-readable string."""
    if size_bytes >= 1024 * 1024 * 1024:
        return f"{size_bytes / (1024 * 1024 * 1024):.1f}GB"
    if size_bytes >= 1024 * 1024:
        return f"{size_bytes / (1024 * 1024):.1f}MB"
    if size_bytes >= 1024:
        return f"{size_bytes / 1024:.1f}KB"
    return f"{size_bytes}B"


def validate_image(
    file: ImageFile,
    constraints: ImageConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate an image file against constraints.

    Args:
        file: The image file to validate.
        constraints: Image constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
        FileValidationError: If the file exceeds dimension limits.
        UnsupportedFileTypeError: If the format is not supported.
    """
    errors: list[str] = []
    content = file.read()
    file_size = len(content)
    filename = file.filename

    if file_size > constraints.max_size_bytes:
        msg = (
            f"Image '{filename}' size ({_format_size(file_size)}) exceeds "
            f"maximum ({_format_size(constraints.max_size_bytes)})"
        )
        errors.append(msg)
        if raise_on_error:
            raise FileTooLargeError(
                msg,
                file_name=filename,
                actual_size=file_size,
                max_size=constraints.max_size_bytes,
            )

    content_type = file.content_type
    if content_type not in constraints.supported_formats:
        msg = (
            f"Image format '{content_type}' is not supported. "
            f"Supported: {', '.join(constraints.supported_formats)}"
        )
        errors.append(msg)
        if raise_on_error:
            raise UnsupportedFileTypeError(
                msg, file_name=filename, content_type=content_type
            )

    if constraints.max_width is not None or constraints.max_height is not None:
        try:
            import io

            from PIL import Image

            with Image.open(io.BytesIO(content)) as img:
                width, height = img.size

                if constraints.max_width and width > constraints.max_width:
                    msg = (
                        f"Image '{filename}' width ({width}px) exceeds "
                        f"maximum ({constraints.max_width}px)"
                    )
                    errors.append(msg)
                    if raise_on_error:
                        raise FileValidationError(msg, file_name=filename)

                if constraints.max_height and height > constraints.max_height:
                    msg = (
                        f"Image '{filename}' height ({height}px) exceeds "
                        f"maximum ({constraints.max_height}px)"
                    )
                    errors.append(msg)
                    if raise_on_error:
                        raise FileValidationError(msg, file_name=filename)

        except ImportError:
            logger.warning(
                "Pillow not installed - cannot validate image dimensions. "
                "Install with: pip install Pillow"
            )

    return errors


def validate_pdf(
    file: PDFFile,
    constraints: PDFConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate a PDF file against constraints.

    Args:
        file: The PDF file to validate.
        constraints: PDF constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
        FileValidationError: If the file exceeds page limits.
    """
    errors: list[str] = []
    content = file.read()
    file_size = len(content)
    filename = file.filename

    if file_size > constraints.max_size_bytes:
        msg = (
            f"PDF '{filename}' size ({_format_size(file_size)}) exceeds "
            f"maximum ({_format_size(constraints.max_size_bytes)})"
        )
        errors.append(msg)
        if raise_on_error:
            raise FileTooLargeError(
                msg,
                file_name=filename,
                actual_size=file_size,
                max_size=constraints.max_size_bytes,
            )

    if constraints.max_pages is not None:
        try:
            import io

            from pypdf import PdfReader

            reader = PdfReader(io.BytesIO(content))
            page_count = len(reader.pages)

            if page_count > constraints.max_pages:
                msg = (
                    f"PDF '{filename}' page count ({page_count}) exceeds "
                    f"maximum ({constraints.max_pages})"
                )
                errors.append(msg)
                if raise_on_error:
                    raise FileValidationError(msg, file_name=filename)

        except ImportError:
            logger.warning(
                "pypdf not installed - cannot validate PDF page count. "
                "Install with: pip install pypdf"
            )

    return errors


def validate_audio(
    file: AudioFile,
    constraints: AudioConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate an audio file against constraints.

    Args:
        file: The audio file to validate.
        constraints: Audio constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
        UnsupportedFileTypeError: If the format is not supported.
    """
    errors: list[str] = []
    content = file.read()
    file_size = len(content)
    filename = file.filename

    if file_size > constraints.max_size_bytes:
        msg = (
            f"Audio '{filename}' size ({_format_size(file_size)}) exceeds "
            f"maximum ({_format_size(constraints.max_size_bytes)})"
        )
        errors.append(msg)
        if raise_on_error:
            raise FileTooLargeError(
                msg,
                file_name=filename,
                actual_size=file_size,
                max_size=constraints.max_size_bytes,
            )

    content_type = file.content_type
    if content_type not in constraints.supported_formats:
        msg = (
            f"Audio format '{content_type}' is not supported. "
            f"Supported: {', '.join(constraints.supported_formats)}"
        )
        errors.append(msg)
        if raise_on_error:
            raise UnsupportedFileTypeError(
                msg, file_name=filename, content_type=content_type
            )

    return errors


def validate_video(
    file: VideoFile,
    constraints: VideoConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate a video file against constraints.

    Args:
        file: The video file to validate.
        constraints: Video constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
        UnsupportedFileTypeError: If the format is not supported.
    """
    errors: list[str] = []
    content = file.read()
    file_size = len(content)
    filename = file.filename

    if file_size > constraints.max_size_bytes:
        msg = (
            f"Video '{filename}' size ({_format_size(file_size)}) exceeds "
            f"maximum ({_format_size(constraints.max_size_bytes)})"
        )
        errors.append(msg)
        if raise_on_error:
            raise FileTooLargeError(
                msg,
                file_name=filename,
                actual_size=file_size,
                max_size=constraints.max_size_bytes,
            )

    content_type = file.content_type
    if content_type not in constraints.supported_formats:
        msg = (
            f"Video format '{content_type}' is not supported. "
            f"Supported: {', '.join(constraints.supported_formats)}"
        )
        errors.append(msg)
        if raise_on_error:
            raise UnsupportedFileTypeError(
                msg, file_name=filename, content_type=content_type
            )

    return errors


def validate_text(
    file: TextFile,
    constraints: ProviderConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate a text file against general constraints.

    Args:
        file: The text file to validate.
        constraints: Provider constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
    """
    errors: list[str] = []

    if constraints.general_max_size_bytes is None:
        return errors

    content = file.read()
    file_size = len(content)
    filename = file.filename

    if file_size > constraints.general_max_size_bytes:
        msg = (
            f"Text file '{filename}' size ({_format_size(file_size)}) exceeds "
            f"maximum ({_format_size(constraints.general_max_size_bytes)})"
        )
        errors.append(msg)
        if raise_on_error:
            raise FileTooLargeError(
                msg,
                file_name=filename,
                actual_size=file_size,
                max_size=constraints.general_max_size_bytes,
            )

    return errors


def validate_file(
    file: FileInput,
    constraints: ProviderConstraints,
    *,
    raise_on_error: bool = True,
) -> Sequence[str]:
    """Validate a file against provider constraints.

    Dispatches to the appropriate validator based on file type.

    Args:
        file: The file to validate.
        constraints: Provider constraints to validate against.
        raise_on_error: If True, raise exceptions on validation failure.

    Returns:
        List of validation error messages (empty if valid).

    Raises:
        FileTooLargeError: If the file exceeds size limits.
        FileValidationError: If the file fails other validation checks.
        UnsupportedFileTypeError: If the file type is not supported.
    """
    if isinstance(file, ImageFile):
        if constraints.image is None:
            msg = f"Provider '{constraints.name}' does not support images"
            if raise_on_error:
                raise UnsupportedFileTypeError(
                    msg, file_name=file.filename, content_type=file.content_type
                )
            return [msg]
        return validate_image(file, constraints.image, raise_on_error=raise_on_error)

    if isinstance(file, PDFFile):
        if constraints.pdf is None:
            msg = f"Provider '{constraints.name}' does not support PDFs"
            if raise_on_error:
                raise UnsupportedFileTypeError(
                    msg, file_name=file.filename, content_type=file.content_type
                )
            return [msg]
        return validate_pdf(file, constraints.pdf, raise_on_error=raise_on_error)

    if isinstance(file, AudioFile):
        if constraints.audio is None:
            msg = f"Provider '{constraints.name}' does not support audio"
            if raise_on_error:
                raise UnsupportedFileTypeError(
                    msg, file_name=file.filename, content_type=file.content_type
                )
            return [msg]
        return validate_audio(file, constraints.audio, raise_on_error=raise_on_error)

    if isinstance(file, VideoFile):
        if constraints.video is None:
            msg = f"Provider '{constraints.name}' does not support video"
            if raise_on_error:
                raise UnsupportedFileTypeError(
                    msg, file_name=file.filename, content_type=file.content_type
                )
            return [msg]
        return validate_video(file, constraints.video, raise_on_error=raise_on_error)

    if isinstance(file, TextFile):
        return validate_text(file, constraints, raise_on_error=raise_on_error)

    return []
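A sketch of the dispatcher above with raising disabled, so all failures are collected as strings instead of surfacing the first one as an exception. The PNG payload is again a hypothetical stand-in.

```python
from crewai.files.content_types import ImageFile
from crewai.files.file import FileBytes
from crewai.files.processing.constraints import get_constraints_for_provider
from crewai.files.processing.validators import validate_file

image = ImageFile(source=FileBytes(data=b"\x89PNG...", filename="big.png"))
constraints = get_constraints_for_provider("bedrock")
assert constraints is not None

errors = validate_file(image, constraints, raise_on_error=False)
print(errors)  # e.g. ["Image 'big.png' size (10.0MB) exceeds maximum (4.4MB)"]
```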
lib/crewai/src/crewai/files/resolved.py (new file, 84 lines)
@@ -0,0 +1,84 @@
"""Resolved file types representing different delivery methods for file content."""

from abc import ABC
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ResolvedFile(ABC):
    """Base class for resolved file representations.

    A ResolvedFile represents the final form of a file ready for delivery
    to an LLM provider, whether inline or via reference.

    Attributes:
        content_type: MIME type of the file content.
    """

    content_type: str


@dataclass(frozen=True)
class InlineBase64(ResolvedFile):
    """File content encoded as base64 string.

    Used by most providers for inline file content in messages.

    Attributes:
        content_type: MIME type of the file content.
        data: Base64-encoded file content.
    """

    data: str


@dataclass(frozen=True)
class InlineBytes(ResolvedFile):
    """File content as raw bytes.

    Used by providers like Bedrock that accept raw bytes instead of base64.

    Attributes:
        content_type: MIME type of the file content.
        data: Raw file bytes.
    """

    data: bytes


@dataclass(frozen=True)
class FileReference(ResolvedFile):
    """Reference to an uploaded file.

    Used when files are uploaded via provider File APIs.

    Attributes:
        content_type: MIME type of the file content.
        file_id: Provider-specific file identifier.
        provider: Name of the provider the file was uploaded to.
        expires_at: When the uploaded file expires (if applicable).
        file_uri: Optional URI for accessing the file (used by Gemini).
    """

    file_id: str
    provider: str
    expires_at: datetime | None = None
    file_uri: str | None = None


@dataclass(frozen=True)
class UrlReference(ResolvedFile):
    """Reference to a file accessible via URL.

    Used by providers that support fetching files from URLs.

    Attributes:
        content_type: MIME type of the file content.
        url: URL where the file can be accessed.
    """

    url: str


ResolvedFileType = InlineBase64 | InlineBytes | FileReference | UrlReference
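A tiny construction sketch for the frozen dataclasses above; the optional `FileReference` fields default to `None` for providers without expiry or URI semantics.

```python
from crewai.files.resolved import FileReference, InlineBase64

inline = InlineBase64(content_type="image/png", data="iVBORw0KGgo=")
ref = FileReference(
    content_type="application/pdf",
    file_id="files/abc123",  # hypothetical provider-issued ID
    provider="gemini",
)
assert ref.expires_at is None and ref.file_uri is None
```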
lib/crewai/src/crewai/files/resolver.py (new file, 634 lines)
@@ -0,0 +1,634 @@
"""FileResolver for deciding file delivery method and managing uploads."""

import asyncio
import base64
from dataclasses import dataclass, field
import hashlib
import logging

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)
from crewai.files.metrics import measure_operation
from crewai.files.processing.constraints import (
    AudioConstraints,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
    VideoConstraints,
    get_constraints_for_provider,
)
from crewai.files.resolved import (
    FileReference,
    InlineBase64,
    InlineBytes,
    ResolvedFile,
)
from crewai.files.upload_cache import CachedUpload, UploadCache
from crewai.files.uploaders import UploadResult, get_uploader
from crewai.files.uploaders.base import FileUploader


logger = logging.getLogger(__name__)

FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile

UPLOAD_MAX_RETRIES = 3
UPLOAD_RETRY_DELAY_BASE = 2


@dataclass
class FileContext:
    """Cached file metadata to avoid redundant reads.

    Attributes:
        content: Raw file bytes.
        size: Size of the file in bytes.
        content_hash: SHA-256 hash of the file content.
        content_type: MIME type of the file.
    """

    content: bytes
    size: int
    content_hash: str
    content_type: str


@dataclass
class FileResolverConfig:
    """Configuration for FileResolver.

    Attributes:
        prefer_upload: If True, prefer uploading over inline for supported providers.
        upload_threshold_bytes: Size threshold above which to use upload.
            If None, uses provider-specific threshold.
        use_bytes_for_bedrock: If True, use raw bytes instead of base64 for Bedrock.
    """

    prefer_upload: bool = False
    upload_threshold_bytes: int | None = None
    use_bytes_for_bedrock: bool = True


@dataclass
class FileResolver:
    """Resolves files to their delivery format based on provider capabilities.

    Decides whether to use inline base64, raw bytes, or file upload based on:
    - Provider constraints and capabilities
    - File size
    - Configuration preferences

    Caches uploaded files to avoid redundant uploads.

    Attributes:
        config: Resolver configuration.
        upload_cache: Cache for tracking uploaded files.
    """

    config: FileResolverConfig = field(default_factory=FileResolverConfig)
    upload_cache: UploadCache | None = None
    _uploaders: dict[str, FileUploader] = field(default_factory=dict)

    @staticmethod
    def _build_file_context(file: FileInput) -> FileContext:
        """Build context by reading file once.

        Args:
            file: The file to build context for.

        Returns:
            FileContext with cached metadata.
        """
        content = file.read()
        return FileContext(
            content=content,
            size=len(content),
            content_hash=hashlib.sha256(content).hexdigest(),
            content_type=file.content_type,
        )

    def resolve(self, file: FileInput, provider: str) -> ResolvedFile:
        """Resolve a file to its delivery format for a provider.

        Args:
            file: The file to resolve.
            provider: Provider name (e.g., "gemini", "anthropic", "openai").

        Returns:
            ResolvedFile representing the appropriate delivery format.
        """
        provider_lower = provider.lower()
        constraints = get_constraints_for_provider(provider)
        context = self._build_file_context(file)

        should_upload = self._should_upload(
            file, provider_lower, constraints, context.size
        )

        if should_upload:
            resolved = self._resolve_via_upload(file, provider_lower, context)
            if resolved is not None:
                return resolved

        return self._resolve_inline(file, provider_lower, context)

    def resolve_files(
        self,
        files: dict[str, FileInput],
        provider: str,
    ) -> dict[str, ResolvedFile]:
        """Resolve multiple files for a provider.

        Args:
            files: Dictionary mapping names to file inputs.
            provider: Provider name.

        Returns:
            Dictionary mapping names to resolved files.
        """
        return {name: self.resolve(file, provider) for name, file in files.items()}

    @staticmethod
    def _get_type_constraint(
        content_type: str,
        constraints: ProviderConstraints,
    ) -> ImageConstraints | PDFConstraints | AudioConstraints | VideoConstraints | None:
        """Get type-specific constraint based on content type.

        Args:
            content_type: MIME type of the file.
            constraints: Provider constraints.

        Returns:
            Type-specific constraint or None if not found.
        """
        if content_type.startswith("image/"):
            return constraints.image
        if content_type == "application/pdf":
            return constraints.pdf
        if content_type.startswith("audio/"):
            return constraints.audio
        if content_type.startswith("video/"):
            return constraints.video
        return None

    def _should_upload(
        self,
        file: FileInput,
        provider: str,
        constraints: ProviderConstraints | None,
        file_size: int,
    ) -> bool:
        """Determine if a file should be uploaded rather than inlined.

        Uses type-specific constraints to make smarter decisions:
        - Checks if file exceeds type-specific inline size limits
        - Falls back to general threshold if no type-specific constraint

        Args:
            file: The file to check.
            provider: Provider name.
            constraints: Provider constraints.
            file_size: Size of the file in bytes.

        Returns:
            True if the file should be uploaded, False otherwise.
        """
        if constraints is None or not constraints.supports_file_upload:
            return False

        if self.config.prefer_upload:
            return True

        content_type = file.content_type
        type_constraint = self._get_type_constraint(content_type, constraints)

        if type_constraint is not None:
            # Check if file exceeds type-specific inline limit
            if file_size > type_constraint.max_size_bytes:
                logger.debug(
                    f"File {file.filename} ({file_size}B) exceeds {content_type} "
                    f"inline limit ({type_constraint.max_size_bytes}B) for {provider}"
                )
                return True

        # Fall back to general threshold
        threshold = self.config.upload_threshold_bytes
        if threshold is None:
            threshold = constraints.file_upload_threshold_bytes

        if threshold is not None and file_size > threshold:
            return True

        return False

    def _resolve_via_upload(
        self,
        file: FileInput,
        provider: str,
        context: FileContext,
    ) -> ResolvedFile | None:
        """Resolve a file by uploading it.

        Args:
            file: The file to upload.
            provider: Provider name.
            context: Pre-computed file context.

        Returns:
            FileReference if upload succeeds, None otherwise.
        """
        if self.upload_cache is not None:
            cached = self.upload_cache.get_by_hash(context.content_hash, provider)
            if cached is not None:
                logger.debug(
                    f"Using cached upload for {file.filename}: {cached.file_id}"
                )
                return FileReference(
                    content_type=cached.content_type,
                    file_id=cached.file_id,
                    provider=cached.provider,
                    expires_at=cached.expires_at,
                    file_uri=cached.file_uri,
                )

        uploader = self._get_uploader(provider)
        if uploader is None:
            logger.debug(f"No uploader available for {provider}")
            return None

        result = self._upload_with_retry(uploader, file, provider, context.size)
        if result is None:
            return None

        if self.upload_cache is not None:
            self.upload_cache.set_by_hash(
                file_hash=context.content_hash,
                content_type=context.content_type,
                provider=provider,
                file_id=result.file_id,
                file_uri=result.file_uri,
                expires_at=result.expires_at,
            )

        return FileReference(
            content_type=result.content_type,
            file_id=result.file_id,
            provider=result.provider,
            expires_at=result.expires_at,
            file_uri=result.file_uri,
        )

    @staticmethod
    def _upload_with_retry(
        uploader: FileUploader,
        file: FileInput,
        provider: str,
        file_size: int,
    ) -> UploadResult | None:
        """Upload with exponential backoff retry.

        Args:
            uploader: The uploader to use.
            file: The file to upload.
            provider: Provider name for logging.
            file_size: Size of the file in bytes.

        Returns:
            UploadResult if successful, None otherwise.
        """
        import time

        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        last_error: Exception | None = None

        for attempt in range(UPLOAD_MAX_RETRIES):
            with measure_operation(
                "upload",
                filename=file.filename,
                provider=provider,
                size_bytes=file_size,
                attempt=attempt + 1,
            ) as metrics:
                try:
                    result = uploader.upload(file)
                    metrics.metadata["file_id"] = result.file_id
                    return result
                except PermanentUploadError as e:
                    metrics.metadata["error_type"] = "permanent"
                    logger.warning(
                        f"Non-retryable upload error for {file.filename}: {e}"
                    )
                    return None
                except TransientUploadError as e:
                    metrics.metadata["error_type"] = "transient"
                    last_error = e
                except Exception as e:
                    metrics.metadata["error_type"] = "unknown"
                    last_error = e

            if attempt < UPLOAD_MAX_RETRIES - 1:
                delay = UPLOAD_RETRY_DELAY_BASE**attempt
                logger.debug(
                    f"Retrying upload for {file.filename} in {delay}s (attempt {attempt + 1})"
                )
                time.sleep(delay)

        logger.warning(
            f"Upload failed for {file.filename} to {provider} after {UPLOAD_MAX_RETRIES} attempts: {last_error}"
        )
        return None

    def _resolve_inline(
        self,
        file: FileInput,
        provider: str,
        context: FileContext,
    ) -> ResolvedFile:
        """Resolve a file as inline content.

        Args:
            file: The file to resolve (used for logging).
            provider: Provider name.
            context: Pre-computed file context.

        Returns:
            InlineBase64 or InlineBytes depending on provider.
        """
        logger.debug(f"Resolving {file.filename} as inline for {provider}")
        if self.config.use_bytes_for_bedrock and "bedrock" in provider:
            return InlineBytes(
                content_type=context.content_type,
                data=context.content,
            )

        encoded = base64.b64encode(context.content).decode("ascii")
        return InlineBase64(
            content_type=context.content_type,
            data=encoded,
        )

    async def aresolve(self, file: FileInput, provider: str) -> ResolvedFile:
        """Async resolve a file to its delivery format for a provider.

        Args:
            file: The file to resolve.
            provider: Provider name (e.g., "gemini", "anthropic", "openai").

        Returns:
            ResolvedFile representing the appropriate delivery format.
        """
        provider_lower = provider.lower()
        constraints = get_constraints_for_provider(provider)
        context = self._build_file_context(file)

        should_upload = self._should_upload(
            file, provider_lower, constraints, context.size
        )

        if should_upload:
            resolved = await self._aresolve_via_upload(file, provider_lower, context)
            if resolved is not None:
                return resolved

        return self._resolve_inline(file, provider_lower, context)

    async def aresolve_files(
        self,
        files: dict[str, FileInput],
        provider: str,
        max_concurrency: int = 10,
    ) -> dict[str, ResolvedFile]:
        """Async resolve multiple files in parallel.

        Args:
            files: Dictionary mapping names to file inputs.
            provider: Provider name.
            max_concurrency: Maximum number of concurrent resolutions.

        Returns:
            Dictionary mapping names to resolved files.
        """
        semaphore = asyncio.Semaphore(max_concurrency)

        async def resolve_single(
            entry_key: str, input_file: FileInput
        ) -> tuple[str, ResolvedFile]:
            """Resolve a single file with semaphore limiting."""
            async with semaphore:
                entry_resolved = await self.aresolve(input_file, provider)
                return entry_key, entry_resolved

        tasks = [resolve_single(n, f) for n, f in files.items()]
        gather_results = await asyncio.gather(*tasks, return_exceptions=True)

        output: dict[str, ResolvedFile] = {}
        for item in gather_results:
            if isinstance(item, BaseException):
                logger.error(f"Resolution failed: {item}")
                continue
            key, resolved = item
            output[key] = resolved

        return output

    async def _aresolve_via_upload(
        self,
        file: FileInput,
        provider: str,
        context: FileContext,
    ) -> ResolvedFile | None:
        """Async resolve a file by uploading it.

        Args:
            file: The file to upload.
            provider: Provider name.
            context: Pre-computed file context.

        Returns:
            FileReference if upload succeeds, None otherwise.
        """
        if self.upload_cache is not None:
            cached = await self.upload_cache.aget_by_hash(
                context.content_hash, provider
            )
            if cached is not None:
                logger.debug(
                    f"Using cached upload for {file.filename}: {cached.file_id}"
                )
                return FileReference(
                    content_type=cached.content_type,
                    file_id=cached.file_id,
                    provider=cached.provider,
                    expires_at=cached.expires_at,
                    file_uri=cached.file_uri,
                )

        uploader = self._get_uploader(provider)
        if uploader is None:
            logger.debug(f"No uploader available for {provider}")
            return None

        result = await self._aupload_with_retry(uploader, file, provider, context.size)
        if result is None:
            return None

        if self.upload_cache is not None:
            await self.upload_cache.aset_by_hash(
                file_hash=context.content_hash,
                content_type=context.content_type,
                provider=provider,
                file_id=result.file_id,
                file_uri=result.file_uri,
                expires_at=result.expires_at,
            )

        return FileReference(
            content_type=result.content_type,
            file_id=result.file_id,
            provider=result.provider,
            expires_at=result.expires_at,
            file_uri=result.file_uri,
        )

    @staticmethod
    async def _aupload_with_retry(
        uploader: FileUploader,
        file: FileInput,
        provider: str,
        file_size: int,
    ) -> UploadResult | None:
        """Async upload with exponential backoff retry.

        Args:
            uploader: The uploader to use.
            file: The file to upload.
            provider: Provider name for logging.
            file_size: Size of the file in bytes.

        Returns:
            UploadResult if successful, None otherwise.
        """
        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        last_error: Exception | None = None

        for attempt in range(UPLOAD_MAX_RETRIES):
            with measure_operation(
                "upload",
                filename=file.filename,
                provider=provider,
                size_bytes=file_size,
                attempt=attempt + 1,
            ) as metrics:
                try:
                    result = await uploader.aupload(file)
                    metrics.metadata["file_id"] = result.file_id
                    return result
                except PermanentUploadError as e:
                    metrics.metadata["error_type"] = "permanent"
                    logger.warning(
                        f"Non-retryable upload error for {file.filename}: {e}"
                    )
                    return None
                except TransientUploadError as e:
                    metrics.metadata["error_type"] = "transient"
                    last_error = e
                except Exception as e:
                    metrics.metadata["error_type"] = "unknown"
                    last_error = e

            if attempt < UPLOAD_MAX_RETRIES - 1:
                delay = UPLOAD_RETRY_DELAY_BASE**attempt
                logger.debug(
                    f"Retrying upload for {file.filename} in {delay}s (attempt {attempt + 1})"
                )
                await asyncio.sleep(delay)

        logger.warning(
            f"Upload failed for {file.filename} to {provider} after {UPLOAD_MAX_RETRIES} attempts: {last_error}"
        )
        return None

    def _get_uploader(self, provider: str) -> FileUploader | None:
        """Get or create an uploader for a provider.

        Args:
            provider: Provider name.

        Returns:
            FileUploader instance or None if not available.
        """
        if provider not in self._uploaders:
            uploader = get_uploader(provider)
            if uploader is not None:
                self._uploaders[provider] = uploader
            else:
                return None

        return self._uploaders.get(provider)

    def get_cached_uploads(self, provider: str) -> list[CachedUpload]:
        """Get all cached uploads for a provider.

        Args:
            provider: Provider name.

        Returns:
            List of cached uploads.
        """
        if self.upload_cache is None:
            return []
        return self.upload_cache.get_all_for_provider(provider)

    def clear_cache(self) -> None:
        """Clear the upload cache."""
        if self.upload_cache is not None:
            self.upload_cache.clear()


def create_resolver(
    provider: str | None = None,
    prefer_upload: bool = False,
    upload_threshold_bytes: int | None = None,
    enable_cache: bool = True,
) -> FileResolver:
    """Create a configured FileResolver.

    Args:
        provider: Optional provider name to load default threshold from constraints.
        prefer_upload: Whether to prefer upload over inline.
        upload_threshold_bytes: Size threshold for using upload. If None and
            provider is specified, uses provider's default threshold.
        enable_cache: Whether to enable upload caching.

    Returns:
        Configured FileResolver instance.
    """
    threshold = upload_threshold_bytes
    if threshold is None and provider is not None:
|
||||
constraints = get_constraints_for_provider(provider)
|
||||
if constraints is not None:
|
||||
threshold = constraints.file_upload_threshold_bytes
|
||||
|
||||
config = FileResolverConfig(
|
||||
prefer_upload=prefer_upload,
|
||||
upload_threshold_bytes=threshold,
|
||||
)
|
||||
|
||||
cache = UploadCache() if enable_cache else None
|
||||
|
||||
return FileResolver(config=config, upload_cache=cache)
|
||||
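`create_resolver` plus `aresolve_files` form the entry point of this hunk. A minimal driving sketch, assuming the module path `crewai.files.file_resolver` (the file header for this hunk is not shown here) and leaving the construction of file inputs, which happens elsewhere in this PR, as a labeled placeholder:

```python
import asyncio

# Module path assumed from the diff context; file-input constructors are
# defined elsewhere in this PR, so `files` stays an empty placeholder dict.
from crewai.files.file_resolver import create_resolver


async def main() -> None:
    resolver = create_resolver(provider="gemini", prefer_upload=True)
    files = {}  # e.g. {"report": PDFFile(...)} -- hypothetical constructor
    resolved = await resolver.aresolve_files(files, provider="gemini", max_concurrency=5)
    for name, ref in resolved.items():
        print(name, ref)


asyncio.run(main())
```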
lib/crewai/src/crewai/files/upload_cache.py (new file, 556 lines)
@@ -0,0 +1,556 @@
"""Cache for tracking uploaded files using aiocache."""

from __future__ import annotations

import asyncio
import atexit
import builtins
from collections.abc import Iterator
import concurrent.futures
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import logging
from typing import TYPE_CHECKING, Any

from aiocache import Cache  # type: ignore[import-untyped]
from aiocache.serializers import PickleSerializer  # type: ignore[import-untyped]


if TYPE_CHECKING:
    from crewai.files.content_types import (
        AudioFile,
        File,
        ImageFile,
        PDFFile,
        TextFile,
        VideoFile,
    )

    FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile

logger = logging.getLogger(__name__)

DEFAULT_TTL_SECONDS = 24 * 60 * 60  # 24 hours
DEFAULT_MAX_CACHE_ENTRIES = 1000


@dataclass
class CachedUpload:
    """Represents a cached file upload.

    Attributes:
        file_id: Provider-specific file identifier.
        provider: Name of the provider.
        file_uri: Optional URI for accessing the file.
        content_type: MIME type of the uploaded file.
        uploaded_at: When the file was uploaded.
        expires_at: When the upload expires (if applicable).
    """

    file_id: str
    provider: str
    file_uri: str | None
    content_type: str
    uploaded_at: datetime
    expires_at: datetime | None = None

    def is_expired(self) -> bool:
        """Check if this cached upload has expired."""
        if self.expires_at is None:
            return False
        return datetime.now(timezone.utc) >= self.expires_at


def _make_key(file_hash: str, provider: str) -> str:
    """Create a cache key from file hash and provider."""
    return f"upload:{provider}:{file_hash}"


def _compute_file_hash_streaming(chunks: Iterator[bytes]) -> str:
    """Compute SHA-256 hash from streaming chunks.

    Args:
        chunks: Iterator of byte chunks.

    Returns:
        Hexadecimal hash string.
    """
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def _compute_file_hash(file: FileInput) -> str:
    """Compute SHA-256 hash of file content.

    Uses streaming for FilePath sources to avoid loading large files into memory.
    """
    from crewai.files.file import FilePath

    source = file._file_source
    if isinstance(source, FilePath):
        return _compute_file_hash_streaming(source.read_chunks(chunk_size=1024 * 1024))
    content = file.read()
    return hashlib.sha256(content).hexdigest()


class UploadCache:
    """Async cache for tracking uploaded files using aiocache.

    Supports in-memory caching by default, with an optional Redis backend
    for distributed setups.

    Attributes:
        ttl: Default time-to-live in seconds for cached entries.
        namespace: Cache namespace for isolation.
    """

    def __init__(
        self,
        ttl: int = DEFAULT_TTL_SECONDS,
        namespace: str = "crewai_uploads",
        cache_type: str = "memory",
        max_entries: int | None = DEFAULT_MAX_CACHE_ENTRIES,
        **cache_kwargs: Any,
    ) -> None:
        """Initialize the upload cache.

        Args:
            ttl: Default TTL in seconds.
            namespace: Cache namespace.
            cache_type: Backend type ("memory" or "redis").
            max_entries: Maximum cache entries (None for unlimited).
            **cache_kwargs: Additional args for the cache backend.
        """
        self.ttl = ttl
        self.namespace = namespace
        self.max_entries = max_entries
        self._provider_keys: dict[str, set[str]] = {}
        self._key_access_order: list[str] = []

        if cache_type == "redis":
            self._cache = Cache(
                Cache.REDIS,
                serializer=PickleSerializer(),
                namespace=namespace,
                **cache_kwargs,
            )
        else:
            self._cache = Cache(
                serializer=PickleSerializer(),
                namespace=namespace,
            )

    def _track_key(self, provider: str, key: str) -> None:
        """Track a key for a provider (for cleanup) and access order."""
        if provider not in self._provider_keys:
            self._provider_keys[provider] = set()
        self._provider_keys[provider].add(key)
        if key in self._key_access_order:
            self._key_access_order.remove(key)
        self._key_access_order.append(key)

    def _untrack_key(self, provider: str, key: str) -> None:
        """Remove key tracking for a provider."""
        if provider in self._provider_keys:
            self._provider_keys[provider].discard(key)
        if key in self._key_access_order:
            self._key_access_order.remove(key)

    async def _evict_if_needed(self) -> int:
        """Evict oldest entries if the size limit is exceeded.

        Returns:
            Number of entries evicted.
        """
        if self.max_entries is None:
            return 0

        current_count = len(self)
        if current_count < self.max_entries:
            return 0

        to_evict = max(1, self.max_entries // 10)
        return await self._evict_oldest(to_evict)

    async def _evict_oldest(self, count: int) -> int:
        """Evict the oldest entries from the cache.

        Args:
            count: Number of entries to evict.

        Returns:
            Number of entries actually evicted.
        """
        evicted = 0
        keys_to_evict = self._key_access_order[:count]

        for key in keys_to_evict:
            await self._cache.delete(key)
            self._key_access_order.remove(key)
            for provider_keys in self._provider_keys.values():
                provider_keys.discard(key)
            evicted += 1

        if evicted > 0:
            logger.debug(f"Evicted {evicted} oldest cache entries")

        return evicted

    async def aget(self, file: FileInput, provider: str) -> CachedUpload | None:
        """Get a cached upload for a file.

        Args:
            file: The file to look up.
            provider: The provider name.

        Returns:
            Cached upload if found and not expired, None otherwise.
        """
        file_hash = _compute_file_hash(file)
        return await self.aget_by_hash(file_hash, provider)

    async def aget_by_hash(self, file_hash: str, provider: str) -> CachedUpload | None:
        """Get a cached upload by file hash.

        Args:
            file_hash: Hash of the file content.
            provider: The provider name.

        Returns:
            Cached upload if found and not expired, None otherwise.
        """
        key = _make_key(file_hash, provider)
        result = await self._cache.get(key)

        if result is None:
            return None
        if isinstance(result, CachedUpload):
            if result.is_expired():
                await self._cache.delete(key)
                self._untrack_key(provider, key)
                return None
            return result
        return None

    async def aset(
        self,
        file: FileInput,
        provider: str,
        file_id: str,
        file_uri: str | None = None,
        expires_at: datetime | None = None,
    ) -> CachedUpload:
        """Cache an uploaded file.

        Args:
            file: The file that was uploaded.
            provider: The provider name.
            file_id: Provider-specific file identifier.
            file_uri: Optional URI for accessing the file.
            expires_at: When the upload expires.

        Returns:
            The created cache entry.
        """
        file_hash = _compute_file_hash(file)
        return await self.aset_by_hash(
            file_hash=file_hash,
            content_type=file.content_type,
            provider=provider,
            file_id=file_id,
            file_uri=file_uri,
            expires_at=expires_at,
        )

    async def aset_by_hash(
        self,
        file_hash: str,
        content_type: str,
        provider: str,
        file_id: str,
        file_uri: str | None = None,
        expires_at: datetime | None = None,
    ) -> CachedUpload:
        """Cache an uploaded file by hash.

        Args:
            file_hash: Hash of the file content.
            content_type: MIME type of the file.
            provider: The provider name.
            file_id: Provider-specific file identifier.
            file_uri: Optional URI for accessing the file.
            expires_at: When the upload expires.

        Returns:
            The created cache entry.
        """
        await self._evict_if_needed()

        key = _make_key(file_hash, provider)
        now = datetime.now(timezone.utc)

        cached = CachedUpload(
            file_id=file_id,
            provider=provider,
            file_uri=file_uri,
            content_type=content_type,
            uploaded_at=now,
            expires_at=expires_at,
        )

        ttl = self.ttl
        if expires_at is not None:
            ttl = max(0, int((expires_at - now).total_seconds()))

        await self._cache.set(key, cached, ttl=ttl)
        self._track_key(provider, key)
        logger.debug(f"Cached upload: {file_id} for provider {provider}")
        return cached

    async def aremove(self, file: FileInput, provider: str) -> bool:
        """Remove a cached upload.

        Args:
            file: The file to remove.
            provider: The provider name.

        Returns:
            True if the entry was removed, False if not found.
        """
        file_hash = _compute_file_hash(file)
        key = _make_key(file_hash, provider)

        result = await self._cache.delete(key)
        removed = bool(result > 0 if isinstance(result, int) else result)
        if removed:
            self._untrack_key(provider, key)
        return removed

    async def aremove_by_file_id(self, file_id: str, provider: str) -> bool:
        """Remove a cached upload by file ID.

        Args:
            file_id: The file ID to remove.
            provider: The provider name.

        Returns:
            True if the entry was removed, False if not found.
        """
        if provider not in self._provider_keys:
            return False

        for key in list(self._provider_keys[provider]):
            cached = await self._cache.get(key)
            if isinstance(cached, CachedUpload) and cached.file_id == file_id:
                await self._cache.delete(key)
                self._untrack_key(provider, key)
                return True
        return False

    async def aclear_expired(self) -> int:
        """Remove all expired entries from the cache.

        Returns:
            Number of entries removed.
        """
        removed = 0

        for provider, keys in list(self._provider_keys.items()):
            for key in list(keys):
                cached = await self._cache.get(key)
                if cached is None or (
                    isinstance(cached, CachedUpload) and cached.is_expired()
                ):
                    await self._cache.delete(key)
                    self._untrack_key(provider, key)
                    removed += 1

        if removed > 0:
            logger.debug(f"Cleared {removed} expired cache entries")
        return removed

    async def aclear(self) -> int:
        """Clear all entries from the cache.

        Returns:
            Number of entries cleared.
        """
        count = sum(len(keys) for keys in self._provider_keys.values())
        await self._cache.clear(namespace=self.namespace)
        self._provider_keys.clear()
        # Drop the LRU bookkeeping as well so eviction order does not retain
        # stale keys after a full clear.
        self._key_access_order.clear()

        if count > 0:
            logger.debug(f"Cleared {count} cache entries")
        return count

    async def aget_all_for_provider(self, provider: str) -> list[CachedUpload]:
        """Get all cached uploads for a provider.

        Args:
            provider: The provider name.

        Returns:
            List of cached uploads for the provider.
        """
        if provider not in self._provider_keys:
            return []

        results: list[CachedUpload] = []
        for key in list(self._provider_keys[provider]):
            cached = await self._cache.get(key)
            if isinstance(cached, CachedUpload) and not cached.is_expired():
                results.append(cached)
        return results

    @staticmethod
    def _run_sync(coro: Any) -> Any:
        """Run an async coroutine from a sync context without blocking the event loop."""
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            loop = None

        if loop is not None and loop.is_running():
            # We are on the running loop's own thread; blocking on a future
            # scheduled onto that same loop would deadlock, so run the
            # coroutine on a separate thread with its own event loop.
            with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
                return pool.submit(asyncio.run, coro).result(timeout=30)
        return asyncio.run(coro)

    def get(self, file: FileInput, provider: str) -> CachedUpload | None:
        """Sync wrapper for aget."""
        result: CachedUpload | None = self._run_sync(self.aget(file, provider))
        return result

    def get_by_hash(self, file_hash: str, provider: str) -> CachedUpload | None:
        """Sync wrapper for aget_by_hash."""
        result: CachedUpload | None = self._run_sync(
            self.aget_by_hash(file_hash, provider)
        )
        return result

    def set(
        self,
        file: FileInput,
        provider: str,
        file_id: str,
        file_uri: str | None = None,
        expires_at: datetime | None = None,
    ) -> CachedUpload:
        """Sync wrapper for aset."""
        result: CachedUpload = self._run_sync(
            self.aset(file, provider, file_id, file_uri, expires_at)
        )
        return result

    def set_by_hash(
        self,
        file_hash: str,
        content_type: str,
        provider: str,
        file_id: str,
        file_uri: str | None = None,
        expires_at: datetime | None = None,
    ) -> CachedUpload:
        """Sync wrapper for aset_by_hash."""
        result: CachedUpload = self._run_sync(
            self.aset_by_hash(
                file_hash, content_type, provider, file_id, file_uri, expires_at
            )
        )
        return result

    def remove(self, file: FileInput, provider: str) -> bool:
        """Sync wrapper for aremove."""
        result: bool = self._run_sync(self.aremove(file, provider))
        return result

    def remove_by_file_id(self, file_id: str, provider: str) -> bool:
        """Sync wrapper for aremove_by_file_id."""
        result: bool = self._run_sync(self.aremove_by_file_id(file_id, provider))
        return result

    def clear_expired(self) -> int:
        """Sync wrapper for aclear_expired."""
        result: int = self._run_sync(self.aclear_expired())
        return result

    def clear(self) -> int:
        """Sync wrapper for aclear."""
        result: int = self._run_sync(self.aclear())
        return result

    def get_all_for_provider(self, provider: str) -> list[CachedUpload]:
        """Sync wrapper for aget_all_for_provider."""
        result: list[CachedUpload] = self._run_sync(
            self.aget_all_for_provider(provider)
        )
        return result

    def __len__(self) -> int:
        """Return the number of cached entries."""
        return sum(len(keys) for keys in self._provider_keys.values())

    def get_providers(self) -> builtins.set[str]:
        """Get all provider names that have cached entries.

        Returns:
            Set of provider names.
        """
        return builtins.set(self._provider_keys.keys())


_default_cache: UploadCache | None = None


def get_upload_cache(
    ttl: int = DEFAULT_TTL_SECONDS,
    namespace: str = "crewai_uploads",
    cache_type: str = "memory",
    **cache_kwargs: Any,
) -> UploadCache:
    """Get or create the default upload cache.

    Args:
        ttl: Default TTL in seconds.
        namespace: Cache namespace.
        cache_type: Backend type ("memory" or "redis").
        **cache_kwargs: Additional args for the cache backend.

    Returns:
        The upload cache instance.
    """
    global _default_cache
    if _default_cache is None:
        _default_cache = UploadCache(
            ttl=ttl,
            namespace=namespace,
            cache_type=cache_type,
            **cache_kwargs,
        )
    return _default_cache


def reset_upload_cache() -> None:
    """Reset the default upload cache (useful for testing)."""
    global _default_cache
    if _default_cache is not None:
        _default_cache.clear()
    _default_cache = None


def _cleanup_on_exit() -> None:
    """Clean up uploaded files on process exit."""
    global _default_cache
    if _default_cache is None or len(_default_cache) == 0:
        return

    from crewai.files.cleanup import cleanup_uploaded_files

    try:
        cleanup_uploaded_files(_default_cache)
    except Exception as e:
        logger.debug(f"Error during exit cleanup: {e}")


atexit.register(_cleanup_on_exit)
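A short, self-contained sketch of the hash-keyed API above (no provider SDKs needed, only aiocache), showing a set/get round trip with an explicit expiry; the hash, file ID, and URI values are illustrative:

```python
import asyncio
from datetime import datetime, timedelta, timezone

from crewai.files.upload_cache import UploadCache


async def demo() -> None:
    cache = UploadCache(ttl=3600, max_entries=100)
    expires = datetime.now(timezone.utc) + timedelta(hours=1)
    await cache.aset_by_hash(
        file_hash="deadbeef",  # illustrative hash value
        content_type="application/pdf",
        provider="gemini",
        file_id="files/abc123",
        file_uri="https://example.invalid/files/abc123",
        expires_at=expires,
    )
    hit = await cache.aget_by_hash("deadbeef", "gemini")
    assert hit is not None and hit.file_id == "files/abc123"
    await cache.aclear()


asyncio.run(demo())
```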
lib/crewai/src/crewai/files/uploaders/__init__.py (new file, 84 lines)
@@ -0,0 +1,84 @@
"""File uploader implementations for provider File APIs."""

from __future__ import annotations

import logging
from typing import Any

from crewai.files.uploaders.base import FileUploader, UploadResult


logger = logging.getLogger(__name__)

__all__ = [
    "FileUploader",
    "UploadResult",
    "get_uploader",
]


def get_uploader(provider: str, **kwargs: Any) -> FileUploader | None:
    """Get a file uploader for a specific provider.

    Args:
        provider: Provider name (e.g., "gemini", "anthropic").
        **kwargs: Additional arguments passed to the uploader constructor.

    Returns:
        FileUploader instance for the provider, or None if not supported.
    """
    provider_lower = provider.lower()

    if "gemini" in provider_lower or "google" in provider_lower:
        try:
            from crewai.files.uploaders.gemini import GeminiFileUploader

            return GeminiFileUploader(**kwargs)
        except ImportError:
            logger.warning(
                "google-genai not installed. Install with: pip install google-genai"
            )
            return None

    if "anthropic" in provider_lower or "claude" in provider_lower:
        try:
            from crewai.files.uploaders.anthropic import AnthropicFileUploader

            return AnthropicFileUploader(**kwargs)
        except ImportError:
            logger.warning(
                "anthropic not installed. Install with: pip install anthropic"
            )
            return None

    if "openai" in provider_lower or "gpt" in provider_lower:
        try:
            from crewai.files.uploaders.openai import OpenAIFileUploader

            return OpenAIFileUploader(**kwargs)
        except ImportError:
            logger.warning("openai not installed. Install with: pip install openai")
            return None

    if "bedrock" in provider_lower or "aws" in provider_lower:
        import os

        if (
            not os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
            and "bucket_name" not in kwargs
        ):
            logger.debug(
                "Bedrock S3 uploader not configured. "
                "Set CREWAI_BEDROCK_S3_BUCKET environment variable to enable."
            )
            return None
        try:
            from crewai.files.uploaders.bedrock import BedrockFileUploader

            return BedrockFileUploader(**kwargs)
        except ImportError:
            logger.warning("boto3 not installed. Install with: pip install boto3")
            return None

    logger.debug(f"No file uploader available for provider: {provider}")
    return None
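Since `get_uploader` degrades to `None` rather than raising, callers can probe for provider support cheaply. A usage sketch:

```python
from crewai.files.uploaders import get_uploader

# Returns None when the SDK is missing or (for Bedrock) no bucket is configured.
uploader = get_uploader("gemini")
if uploader is None:
    print("gemini uploads unavailable; falling back to inline content")
else:
    print(f"using {uploader.provider_name} file API")
```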
lib/crewai/src/crewai/files/uploaders/anthropic.py (new file, 320 lines)
@@ -0,0 +1,320 @@
"""Anthropic Files API uploader implementation."""

from __future__ import annotations

import io
import logging
import os
from typing import Any

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)
from crewai.files.uploaders.base import FileUploader, UploadResult


logger = logging.getLogger(__name__)

FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile


class AnthropicFileUploader(FileUploader):
    """Uploader for the Anthropic Files API.

    Uses the anthropic SDK to upload files. Files are stored persistently
    until explicitly deleted.

    Attributes:
        api_key: Optional API key (uses ANTHROPIC_API_KEY env var if not provided).
    """

    def __init__(self, api_key: str | None = None) -> None:
        """Initialize the Anthropic uploader.

        Args:
            api_key: Optional Anthropic API key. If not provided, uses the
                ANTHROPIC_API_KEY environment variable.
        """
        self._api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
        self._client: Any = None
        self._async_client: Any = None

    @property
    def provider_name(self) -> str:
        """Return the provider name."""
        return "anthropic"

    def _get_client(self) -> Any:
        """Get or create the Anthropic client."""
        if self._client is None:
            try:
                import anthropic

                self._client = anthropic.Anthropic(api_key=self._api_key)
            except ImportError as e:
                raise ImportError(
                    "anthropic is required for Anthropic file uploads. "
                    "Install with: pip install anthropic"
                ) from e
        return self._client

    def _get_async_client(self) -> Any:
        """Get or create the async Anthropic client."""
        if self._async_client is None:
            try:
                import anthropic

                self._async_client = anthropic.AsyncAnthropic(api_key=self._api_key)
            except ImportError as e:
                raise ImportError(
                    "anthropic is required for Anthropic file uploads. "
                    "Install with: pip install anthropic"
                ) from e
        return self._async_client

    def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
        """Upload a file to Anthropic.

        Args:
            file: The file to upload.
            purpose: Optional purpose for the file (default: "user_upload").

        Returns:
            UploadResult with the file ID and metadata.

        Raises:
            TransientUploadError: For retryable errors (network, rate limits).
            PermanentUploadError: For non-retryable errors (auth, validation).
        """
        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        try:
            client = self._get_client()

            content = file.read()
            file_purpose = purpose or "user_upload"

            file_data = io.BytesIO(content)

            logger.info(
                f"Uploading file '{file.filename}' to Anthropic ({len(content)} bytes)"
            )

            uploaded_file = client.files.create(
                file=(file.filename, file_data, file.content_type),
                purpose=file_purpose,
            )

            logger.info(f"Uploaded to Anthropic: {uploaded_file.id}")

            return UploadResult(
                file_id=uploaded_file.id,
                file_uri=None,
                content_type=file.content_type,
                expires_at=None,
                provider=self.provider_name,
            )
        except ImportError:
            raise
        except Exception as e:
            error_type = type(e).__name__
            if "RateLimit" in error_type or "APIConnection" in error_type:
                raise TransientUploadError(
                    f"Transient upload error: {e}", file_name=file.filename
                ) from e
            if "Authentication" in error_type or "Permission" in error_type:
                raise PermanentUploadError(
                    f"Authentication/permission error: {e}", file_name=file.filename
                ) from e
            if "BadRequest" in error_type or "InvalidRequest" in error_type:
                raise PermanentUploadError(
                    f"Invalid request: {e}", file_name=file.filename
                ) from e
            status_code = getattr(e, "status_code", None)
            if status_code is not None:
                if status_code >= 500 or status_code == 429:
                    raise TransientUploadError(
                        f"Server error ({status_code}): {e}", file_name=file.filename
                    ) from e
                if status_code in (401, 403):
                    raise PermanentUploadError(
                        f"Auth error ({status_code}): {e}", file_name=file.filename
                    ) from e
                if status_code == 400:
                    raise PermanentUploadError(
                        f"Bad request ({status_code}): {e}", file_name=file.filename
                    ) from e
            raise TransientUploadError(
                f"Upload failed: {e}", file_name=file.filename
            ) from e

    def delete(self, file_id: str) -> bool:
        """Delete an uploaded file from Anthropic.

        Args:
            file_id: The file ID to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """
        try:
            client = self._get_client()
            client.files.delete(file_id=file_id)
            logger.info(f"Deleted Anthropic file: {file_id}")
            return True
        except Exception as e:
            logger.warning(f"Failed to delete Anthropic file {file_id}: {e}")
            return False

    def get_file_info(self, file_id: str) -> dict[str, Any] | None:
        """Get information about an uploaded file.

        Args:
            file_id: The file ID.

        Returns:
            Dictionary with file information, or None if not found.
        """
        try:
            client = self._get_client()
            file_info = client.files.retrieve(file_id=file_id)
            return {
                "id": file_info.id,
                "filename": file_info.filename,
                "purpose": file_info.purpose,
                "size_bytes": file_info.size_bytes,
                "created_at": file_info.created_at,
            }
        except Exception as e:
            logger.debug(f"Failed to get Anthropic file info for {file_id}: {e}")
            return None

    def list_files(self) -> list[dict[str, Any]]:
        """List all uploaded files.

        Returns:
            List of dictionaries with file information.
        """
        try:
            client = self._get_client()
            files = client.files.list()
            return [
                {
                    "id": f.id,
                    "filename": f.filename,
                    "purpose": f.purpose,
                    "size_bytes": f.size_bytes,
                    "created_at": f.created_at,
                }
                for f in files.data
            ]
        except Exception as e:
            logger.warning(f"Failed to list Anthropic files: {e}")
            return []

    async def aupload(
        self, file: FileInput, purpose: str | None = None
    ) -> UploadResult:
        """Async upload a file to Anthropic using the native async client.

        Args:
            file: The file to upload.
            purpose: Optional purpose for the file (default: "user_upload").

        Returns:
            UploadResult with the file ID and metadata.

        Raises:
            TransientUploadError: For retryable errors (network, rate limits).
            PermanentUploadError: For non-retryable errors (auth, validation).
        """
        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        try:
            client = self._get_async_client()

            content = await file.aread()
            file_purpose = purpose or "user_upload"

            file_data = io.BytesIO(content)

            logger.info(
                f"Uploading file '{file.filename}' to Anthropic ({len(content)} bytes)"
            )

            uploaded_file = await client.files.create(
                file=(file.filename, file_data, file.content_type),
                purpose=file_purpose,
            )

            logger.info(f"Uploaded to Anthropic: {uploaded_file.id}")

            return UploadResult(
                file_id=uploaded_file.id,
                file_uri=None,
                content_type=file.content_type,
                expires_at=None,
                provider=self.provider_name,
            )
        except ImportError:
            raise
        except Exception as e:
            error_type = type(e).__name__
            if "RateLimit" in error_type or "APIConnection" in error_type:
                raise TransientUploadError(
                    f"Transient upload error: {e}", file_name=file.filename
                ) from e
            if "Authentication" in error_type or "Permission" in error_type:
                raise PermanentUploadError(
                    f"Authentication/permission error: {e}", file_name=file.filename
                ) from e
            if "BadRequest" in error_type or "InvalidRequest" in error_type:
                raise PermanentUploadError(
                    f"Invalid request: {e}", file_name=file.filename
                ) from e
            status_code = getattr(e, "status_code", None)
            if status_code is not None:
                if status_code >= 500 or status_code == 429:
                    raise TransientUploadError(
                        f"Server error ({status_code}): {e}", file_name=file.filename
                    ) from e
                if status_code in (401, 403):
                    raise PermanentUploadError(
                        f"Auth error ({status_code}): {e}", file_name=file.filename
                    ) from e
                if status_code == 400:
                    raise PermanentUploadError(
                        f"Bad request ({status_code}): {e}", file_name=file.filename
                    ) from e
            raise TransientUploadError(
                f"Upload failed: {e}", file_name=file.filename
            ) from e

    async def adelete(self, file_id: str) -> bool:
        """Async delete an uploaded file from Anthropic.

        Args:
            file_id: The file ID to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """
        try:
            client = self._get_async_client()
            await client.files.delete(file_id=file_id)
            logger.info(f"Deleted Anthropic file: {file_id}")
            return True
        except Exception as e:
            logger.warning(f"Failed to delete Anthropic file {file_id}: {e}")
            return False
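The error mapping above is what drives `_aupload_with_retry` in the resolver: transient failures get retried, permanent ones abort immediately. A hedged sketch of calling the uploader directly; the file object itself is constructed elsewhere in this PR, so it stays a placeholder parameter:

```python
from crewai.files.processing.exceptions import (
    PermanentUploadError,
    TransientUploadError,
)
from crewai.files.uploaders.anthropic import AnthropicFileUploader

uploader = AnthropicFileUploader()  # reads ANTHROPIC_API_KEY from the environment


def try_upload(file):  # `file`: any FileInput; constructors are not in this hunk
    try:
        return uploader.upload(file)
    except TransientUploadError:
        return None  # rate limit / network: safe to retry later
    except PermanentUploadError:
        raise  # auth / validation: retrying cannot help
```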
lib/crewai/src/crewai/files/uploaders/base.py (new file, 128 lines)
@@ -0,0 +1,128 @@
"""Base class for file uploaders."""

from abc import ABC, abstractmethod
import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import Any

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)


FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile


@dataclass
class UploadResult:
    """Result of a file upload operation.

    Attributes:
        file_id: Provider-specific file identifier.
        file_uri: Optional URI for accessing the file.
        content_type: MIME type of the uploaded file.
        expires_at: When the upload expires (if applicable).
        provider: Name of the provider.
    """

    file_id: str
    provider: str
    content_type: str
    file_uri: str | None = None
    expires_at: datetime | None = None


class FileUploader(ABC):
    """Abstract base class for provider file uploaders.

    Implementations handle uploading files to provider-specific File APIs.
    """

    @property
    @abstractmethod
    def provider_name(self) -> str:
        """Return the provider name."""

    @abstractmethod
    def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
        """Upload a file to the provider.

        Args:
            file: The file to upload.
            purpose: Optional purpose/description for the upload.

        Returns:
            UploadResult with the file identifier and metadata.

        Raises:
            Exception: If the upload fails.
        """

    async def aupload(
        self, file: FileInput, purpose: str | None = None
    ) -> UploadResult:
        """Async upload a file to the provider.

        The default implementation runs the sync upload in an executor.
        Override in subclasses for native async support.

        Args:
            file: The file to upload.
            purpose: Optional purpose/description for the upload.

        Returns:
            UploadResult with the file identifier and metadata.
        """
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.upload, file, purpose)

    @abstractmethod
    def delete(self, file_id: str) -> bool:
        """Delete an uploaded file.

        Args:
            file_id: The file identifier to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """

    async def adelete(self, file_id: str) -> bool:
        """Async delete an uploaded file.

        The default implementation runs the sync delete in an executor.
        Override in subclasses for native async support.

        Args:
            file_id: The file identifier to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, self.delete, file_id)

    def get_file_info(self, file_id: str) -> dict[str, Any] | None:
        """Get information about an uploaded file.

        Args:
            file_id: The file identifier.

        Returns:
            Dictionary with file information, or None if not found.
        """
        return None

    def list_files(self) -> list[dict[str, Any]]:
        """List all uploaded files.

        Returns:
            List of dictionaries with file information.
        """
        return []
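Only `provider_name`, `upload`, and `delete` are abstract; the async variants fall back to an executor. A minimal in-memory stub, useful for tests, assuming only the types defined above:

```python
from __future__ import annotations

from crewai.files.uploaders.base import FileInput, FileUploader, UploadResult


class DummyUploader(FileUploader):
    """Toy uploader: records uploads in memory and always succeeds."""

    def __init__(self) -> None:
        self._store: dict[str, bytes] = {}

    @property
    def provider_name(self) -> str:
        return "dummy"

    def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
        file_id = f"dummy/{file.filename}"
        self._store[file_id] = file.read()
        return UploadResult(
            file_id=file_id,
            provider=self.provider_name,
            content_type=file.content_type,
        )

    def delete(self, file_id: str) -> bool:
        return self._store.pop(file_id, None) is not None
```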
lib/crewai/src/crewai/files/uploaders/bedrock.py (new file, 520 lines)
@@ -0,0 +1,520 @@
"""AWS Bedrock S3 file uploader implementation."""

from __future__ import annotations

import hashlib
import logging
import os
from pathlib import Path
from typing import Any

from crewai.files.content_types import (
    AudioFile,
    File,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
)
from crewai.files.file import FileBytes, FilePath
from crewai.files.uploaders.base import FileUploader, UploadResult


logger = logging.getLogger(__name__)

FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile

MULTIPART_THRESHOLD = 8 * 1024 * 1024
MULTIPART_CHUNKSIZE = 8 * 1024 * 1024
MAX_CONCURRENCY = 10


def _get_file_path(file: FileInput) -> Path | None:
    """Get the filesystem path if the file source is FilePath.

    Args:
        file: The file input to check.

    Returns:
        Path if the source is FilePath, None otherwise.
    """
    source = file._file_source
    if isinstance(source, FilePath):
        return source.path
    return None


def _get_file_size(file: FileInput) -> int | None:
    """Get file size without reading content if possible.

    Args:
        file: The file input.

    Returns:
        Size in bytes if determinable without reading, None otherwise.
    """
    source = file._file_source
    if isinstance(source, FilePath):
        return source.path.stat().st_size
    if isinstance(source, FileBytes):
        return len(source.data)
    return None


def _compute_hash_streaming(file_path: Path) -> str:
    """Compute SHA-256 hash by streaming file content.

    Args:
        file_path: Path to the file.

    Returns:
        First 16 characters of the hex digest.
    """
    hasher = hashlib.sha256()
    with open(file_path, "rb") as f:
        while chunk := f.read(1024 * 1024):
            hasher.update(chunk)
    return hasher.hexdigest()[:16]


class BedrockFileUploader(FileUploader):
    """Uploader for AWS Bedrock via S3.

    Uploads files to S3 and returns S3 URIs that can be used with Bedrock's
    Converse API s3Location source format.

    Attributes:
        bucket_name: S3 bucket name for file uploads.
        bucket_owner: Optional bucket owner account ID for cross-account access.
        prefix: Optional S3 key prefix for uploaded files.
        region: AWS region for the S3 bucket.
    """

    def __init__(
        self,
        bucket_name: str | None = None,
        bucket_owner: str | None = None,
        prefix: str = "crewai-files",
        region: str | None = None,
    ) -> None:
        """Initialize the Bedrock S3 uploader.

        Args:
            bucket_name: S3 bucket name. If not provided, uses the
                CREWAI_BEDROCK_S3_BUCKET environment variable.
            bucket_owner: Optional bucket owner account ID for cross-account access.
                Uses the CREWAI_BEDROCK_S3_BUCKET_OWNER environment variable if not provided.
            prefix: S3 key prefix for uploaded files (default: "crewai-files").
            region: AWS region. Uses AWS_REGION or AWS_DEFAULT_REGION if not provided.
        """
        self._bucket_name = bucket_name or os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
        self._bucket_owner = bucket_owner or os.environ.get(
            "CREWAI_BEDROCK_S3_BUCKET_OWNER"
        )
        self._prefix = prefix
        self._region = region or os.environ.get(
            "AWS_REGION", os.environ.get("AWS_DEFAULT_REGION")
        )
        self._client: Any = None
        self._async_client: Any = None

    @property
    def provider_name(self) -> str:
        """Return the provider name."""
        return "bedrock"

    @property
    def bucket_name(self) -> str:
        """Return the configured bucket name."""
        if not self._bucket_name:
            raise ValueError(
                "S3 bucket name not configured. Set CREWAI_BEDROCK_S3_BUCKET "
                "environment variable or pass bucket_name parameter."
            )
        return self._bucket_name

    @property
    def bucket_owner(self) -> str | None:
        """Return the configured bucket owner."""
        return self._bucket_owner

    def _get_client(self) -> Any:
        """Get or create the S3 client."""
        if self._client is None:
            try:
                import boto3

                self._client = boto3.client("s3", region_name=self._region)
            except ImportError as e:
                raise ImportError(
                    "boto3 is required for Bedrock S3 file uploads. "
                    "Install with: pip install boto3"
                ) from e
        return self._client

    def _get_async_client(self) -> Any:
        """Get or create the async S3 session."""
        if self._async_client is None:
            try:
                import aioboto3  # type: ignore[import-not-found]

                # Cache the session in the attribute checked above so it is
                # created only once rather than on every call.
                self._async_client = aioboto3.Session()
            except ImportError as e:
                raise ImportError(
                    "aioboto3 is required for async Bedrock S3 file uploads. "
                    "Install with: pip install aioboto3"
                ) from e
        return self._async_client

    def _generate_s3_key(self, file: FileInput, content: bytes | None = None) -> str:
        """Generate a unique S3 key for the file.

        For FilePath sources with no content provided, computes the hash via streaming.

        Args:
            file: The file being uploaded.
            content: The file content bytes (optional for FilePath sources).

        Returns:
            S3 key string.
        """
        if content is not None:
            content_hash = hashlib.sha256(content).hexdigest()[:16]
        else:
            file_path = _get_file_path(file)
            if file_path is not None:
                content_hash = _compute_hash_streaming(file_path)
            else:
                content_hash = hashlib.sha256(file.read()).hexdigest()[:16]

        filename = file.filename or "file"
        safe_filename = "".join(
            c if c.isalnum() or c in ".-_" else "_" for c in filename
        )
        return f"{self._prefix}/{content_hash}_{safe_filename}"

    def _build_s3_uri(self, key: str) -> str:
        """Build an S3 URI from a key.

        Args:
            key: The S3 object key.

        Returns:
            S3 URI string.
        """
        return f"s3://{self.bucket_name}/{key}"

    @staticmethod
    def _get_transfer_config() -> Any:
        """Get boto3 TransferConfig for multipart uploads."""
        from boto3.s3.transfer import TransferConfig

        return TransferConfig(
            multipart_threshold=MULTIPART_THRESHOLD,
            multipart_chunksize=MULTIPART_CHUNKSIZE,
            max_concurrency=MAX_CONCURRENCY,
            use_threads=True,
        )

    def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
        """Upload a file to S3 for use with Bedrock.

        Uses streaming upload with automatic multipart for large files.
        For FilePath sources, streams directly from disk without loading into memory.

        Args:
            file: The file to upload.
            purpose: Optional purpose (unused, kept for interface consistency).

        Returns:
            UploadResult with the S3 URI and metadata.

        Raises:
            TransientUploadError: For retryable errors (network, throttling).
            PermanentUploadError: For non-retryable errors (auth, validation).
        """
        import io

        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        try:
            client = self._get_client()
            transfer_config = self._get_transfer_config()
            file_path = _get_file_path(file)

            if file_path is not None:
                file_size = file_path.stat().st_size
                s3_key = self._generate_s3_key(file)

                logger.info(
                    f"Uploading file '{file.filename}' to S3 bucket "
                    f"'{self.bucket_name}' ({file_size} bytes, streaming)"
                )

                with open(file_path, "rb") as f:
                    client.upload_fileobj(
                        f,
                        self.bucket_name,
                        s3_key,
                        ExtraArgs={"ContentType": file.content_type},
                        Config=transfer_config,
                    )
            else:
                content = file.read()
                s3_key = self._generate_s3_key(file, content)

                logger.info(
                    f"Uploading file '{file.filename}' to S3 bucket "
                    f"'{self.bucket_name}' ({len(content)} bytes)"
                )

                client.upload_fileobj(
                    io.BytesIO(content),
                    self.bucket_name,
                    s3_key,
                    ExtraArgs={"ContentType": file.content_type},
                    Config=transfer_config,
                )

            s3_uri = self._build_s3_uri(s3_key)
            logger.info(f"Uploaded to S3: {s3_uri}")

            return UploadResult(
                file_id=s3_key,
                file_uri=s3_uri,
                content_type=file.content_type,
                expires_at=None,
                provider=self.provider_name,
            )
        except ImportError:
            raise
        except Exception as e:
            error_type = type(e).__name__
            error_code = getattr(e, "response", {}).get("Error", {}).get("Code", "")

            if error_code in ("SlowDown", "ServiceUnavailable", "InternalError"):
                raise TransientUploadError(
                    f"Transient S3 error: {e}", file_name=file.filename
                ) from e
            if error_code in (
                "AccessDenied",
                "InvalidAccessKeyId",
                "SignatureDoesNotMatch",
            ):
                raise PermanentUploadError(
                    f"S3 authentication error: {e}", file_name=file.filename
                ) from e
            if error_code in ("NoSuchBucket", "InvalidBucketName"):
                raise PermanentUploadError(
                    f"S3 bucket error: {e}", file_name=file.filename
                ) from e
            if "Throttl" in error_type or "Throttl" in str(e):
                raise TransientUploadError(
                    f"S3 throttling: {e}", file_name=file.filename
                ) from e
            raise TransientUploadError(
                f"S3 upload failed: {e}", file_name=file.filename
            ) from e

    def delete(self, file_id: str) -> bool:
        """Delete an uploaded file from S3.

        Args:
            file_id: The S3 key to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """
        try:
            client = self._get_client()
            client.delete_object(Bucket=self.bucket_name, Key=file_id)
            logger.info(f"Deleted S3 object: s3://{self.bucket_name}/{file_id}")
            return True
        except Exception as e:
            logger.warning(
                f"Failed to delete S3 object s3://{self.bucket_name}/{file_id}: {e}"
            )
            return False

    def get_file_info(self, file_id: str) -> dict[str, Any] | None:
        """Get information about an uploaded file.

        Args:
            file_id: The S3 key.

        Returns:
            Dictionary with file information, or None if not found.
        """
        try:
            client = self._get_client()
            response = client.head_object(Bucket=self.bucket_name, Key=file_id)
            return {
                "id": file_id,
                "uri": self._build_s3_uri(file_id),
                "content_type": response.get("ContentType"),
                "size": response.get("ContentLength"),
                "last_modified": response.get("LastModified"),
                "etag": response.get("ETag"),
            }
        except Exception as e:
            logger.debug(f"Failed to get S3 object info for {file_id}: {e}")
            return None

    def list_files(self) -> list[dict[str, Any]]:
        """List all uploaded files in the configured prefix.

        Returns:
            List of dictionaries with file information.
        """
        try:
            client = self._get_client()
            response = client.list_objects_v2(
                Bucket=self.bucket_name,
                Prefix=self._prefix,
            )
            return [
                {
                    "id": obj["Key"],
                    "uri": self._build_s3_uri(obj["Key"]),
                    "size": obj.get("Size"),
                    "last_modified": obj.get("LastModified"),
                    "etag": obj.get("ETag"),
                }
                for obj in response.get("Contents", [])
            ]
        except Exception as e:
            logger.warning(f"Failed to list S3 objects: {e}")
            return []

    async def aupload(
        self, file: FileInput, purpose: str | None = None
    ) -> UploadResult:
        """Async upload a file to S3 for use with Bedrock.

        Uses streaming upload with automatic multipart for large files.
        For FilePath sources, streams directly from disk without loading into memory.

        Args:
            file: The file to upload.
            purpose: Optional purpose (unused, kept for interface consistency).

        Returns:
            UploadResult with the S3 URI and metadata.

        Raises:
            TransientUploadError: For retryable errors (network, throttling).
            PermanentUploadError: For non-retryable errors (auth, validation).
        """
        import io

        import aiofiles

        from crewai.files.processing.exceptions import (
            PermanentUploadError,
            TransientUploadError,
        )

        try:
            session = self._get_async_client()
            transfer_config = self._get_transfer_config()
            file_path = _get_file_path(file)

            if file_path is not None:
                file_size = file_path.stat().st_size
                s3_key = self._generate_s3_key(file)

                logger.info(
                    f"Uploading file '{file.filename}' to S3 bucket "
                    f"'{self.bucket_name}' ({file_size} bytes, streaming)"
                )

                async with session.client("s3", region_name=self._region) as client:
                    async with aiofiles.open(file_path, "rb") as f:
                        await client.upload_fileobj(
                            f,
                            self.bucket_name,
                            s3_key,
                            ExtraArgs={"ContentType": file.content_type},
                            Config=transfer_config,
                        )
            else:
                content = await file.aread()
                s3_key = self._generate_s3_key(file, content)

                logger.info(
                    f"Uploading file '{file.filename}' to S3 bucket "
                    f"'{self.bucket_name}' ({len(content)} bytes)"
                )

                async with session.client("s3", region_name=self._region) as client:
                    await client.upload_fileobj(
                        io.BytesIO(content),
                        self.bucket_name,
                        s3_key,
                        ExtraArgs={"ContentType": file.content_type},
                        Config=transfer_config,
                    )

            s3_uri = self._build_s3_uri(s3_key)
            logger.info(f"Uploaded to S3: {s3_uri}")

            return UploadResult(
                file_id=s3_key,
                file_uri=s3_uri,
                content_type=file.content_type,
                expires_at=None,
                provider=self.provider_name,
            )
        except ImportError:
            raise
        except Exception as e:
            error_type = type(e).__name__
            error_code = getattr(e, "response", {}).get("Error", {}).get("Code", "")

            if error_code in ("SlowDown", "ServiceUnavailable", "InternalError"):
                raise TransientUploadError(
                    f"Transient S3 error: {e}", file_name=file.filename
                ) from e
            if error_code in (
                "AccessDenied",
                "InvalidAccessKeyId",
                "SignatureDoesNotMatch",
            ):
                raise PermanentUploadError(
                    f"S3 authentication error: {e}", file_name=file.filename
                ) from e
            if error_code in ("NoSuchBucket", "InvalidBucketName"):
                raise PermanentUploadError(
                    f"S3 bucket error: {e}", file_name=file.filename
                ) from e
            if "Throttl" in error_type or "Throttl" in str(e):
                raise TransientUploadError(
                    f"S3 throttling: {e}", file_name=file.filename
                ) from e
            raise TransientUploadError(
                f"S3 upload failed: {e}", file_name=file.filename
            ) from e

    async def adelete(self, file_id: str) -> bool:
        """Async delete an uploaded file from S3.

        Args:
            file_id: The S3 key to delete.

        Returns:
            True if deletion was successful, False otherwise.
        """
        try:
            session = self._get_async_client()
            async with session.client("s3", region_name=self._region) as client:
                await client.delete_object(Bucket=self.bucket_name, Key=file_id)
            logger.info(f"Deleted S3 object: s3://{self.bucket_name}/{file_id}")
            return True
        except Exception as e:
            logger.warning(
                f"Failed to delete S3 object s3://{self.bucket_name}/{file_id}: {e}"
            )
            return False
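Unlike the other uploaders, Bedrock needs an S3 bucket before it will activate (see the guard in `get_uploader`). A configuration sketch; the bucket and region values are placeholders:

```python
import os

os.environ["CREWAI_BEDROCK_S3_BUCKET"] = "my-bedrock-files"  # placeholder bucket

from crewai.files.uploaders.bedrock import BedrockFileUploader

uploader = BedrockFileUploader(prefix="crewai-files", region="us-east-1")
print(uploader.bucket_name)    # -> "my-bedrock-files"
print(uploader.provider_name)  # -> "bedrock"
```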
lib/crewai/src/crewai/files/uploaders/gemini.py (new file, 508 lines)
@@ -0,0 +1,508 @@
|
||||
"""Gemini File API uploader implementation."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
from datetime import datetime, timedelta, timezone
|
||||
import io
|
||||
import logging
|
||||
import os
|
||||
from pathlib import Path
|
||||
import random
|
||||
import time
|
||||
from typing import Any
|
||||
|
||||
from crewai.files.content_types import (
|
||||
AudioFile,
|
||||
File,
|
||||
ImageFile,
|
||||
PDFFile,
|
||||
TextFile,
|
||||
VideoFile,
|
||||
)
|
||||
from crewai.files.file import FilePath
|
||||
from crewai.files.uploaders.base import FileUploader, UploadResult
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile
|
||||
|
||||
GEMINI_FILE_TTL = timedelta(hours=48)
|
||||
|
||||
|
||||
def _get_file_path(file: FileInput) -> Path | None:
|
||||
"""Get the filesystem path if file source is FilePath.
|
||||
|
||||
Args:
|
||||
file: The file input to check.
|
||||
|
||||
Returns:
|
||||
Path if source is FilePath, None otherwise.
|
||||
"""
|
||||
source = file._file_source
|
||||
if isinstance(source, FilePath):
|
||||
return source.path
|
||||
return None
|
||||
|
||||
|
||||
def _get_file_size(file: FileInput) -> int | None:
|
||||
"""Get file size without reading content if possible.
|
||||
|
||||
Args:
|
||||
file: The file input.
|
||||
|
||||
Returns:
|
||||
Size in bytes if determinable without reading, None otherwise.
|
||||
"""
|
||||
source = file._file_source
|
||||
if isinstance(source, FilePath):
|
||||
return source.path.stat().st_size
|
||||
return None
|
||||
|
||||
|
||||
class GeminiFileUploader(FileUploader):
|
||||
"""Uploader for Google Gemini File API.
|
||||
|
||||
Uses the google-genai SDK to upload files. Files are stored for 48 hours.
|
||||
|
||||
Attributes:
|
||||
api_key: Optional API key (uses GOOGLE_API_KEY env var if not provided).
|
||||
"""
|
||||
|
||||
def __init__(self, api_key: str | None = None) -> None:
|
||||
"""Initialize the Gemini uploader.
|
||||
|
||||
Args:
|
||||
api_key: Optional Google API key. If not provided, uses
|
||||
GOOGLE_API_KEY environment variable.
|
||||
"""
|
||||
self._api_key = api_key or os.environ.get("GOOGLE_API_KEY")
|
||||
self._client: Any = None
|
||||
|
||||
@property
|
||||
def provider_name(self) -> str:
|
||||
"""Return the provider name."""
|
||||
return "gemini"
|
||||
|
||||
def _get_client(self) -> Any:
|
||||
"""Get or create the Gemini client."""
|
||||
if self._client is None:
|
||||
try:
|
||||
from google import genai
|
||||
|
||||
self._client = genai.Client(api_key=self._api_key)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"google-genai is required for Gemini file uploads. "
|
||||
"Install with: pip install google-genai"
|
||||
) from e
|
||||
return self._client
|
||||
|
||||
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
|
||||
"""Upload a file to Gemini.
|
||||
|
||||
For FilePath sources, passes the path directly to the SDK which handles
|
||||
streaming internally via resumable uploads, avoiding memory overhead.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
purpose: Optional purpose/description (used as display name).
|
||||
|
||||
Returns:
|
||||
UploadResult with the file URI and metadata.
|
||||
|
||||
Raises:
|
||||
TransientUploadError: For retryable errors (network, rate limits).
|
||||
PermanentUploadError: For non-retryable errors (auth, validation).
|
||||
"""
|
||||
from crewai.files.processing.exceptions import (
|
||||
PermanentUploadError,
|
||||
TransientUploadError,
|
||||
)
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
display_name = purpose or file.filename
|
||||
|
||||
file_path = _get_file_path(file)
|
||||
if file_path is not None:
|
||||
file_size = file_path.stat().st_size
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to Gemini via path "
|
||||
f"({file_size} bytes, streaming)"
|
||||
)
|
||||
uploaded_file = client.files.upload(
|
||||
file=file_path,
|
||||
config={
|
||||
"display_name": display_name,
|
||||
"mime_type": file.content_type,
|
||||
},
|
||||
)
|
||||
else:
|
||||
content = file.read()
|
||||
file_data = io.BytesIO(content)
|
||||
file_data.name = file.filename
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to Gemini ({len(content)} bytes)"
|
||||
)
|
||||
|
||||
uploaded_file = client.files.upload(
|
||||
file=file_data,
|
||||
config={
|
||||
"display_name": display_name,
|
||||
"mime_type": file.content_type,
|
||||
},
|
||||
)
|
||||
|
||||
if file.content_type.startswith("video/"):
|
||||
if not self.wait_for_processing(uploaded_file.name):
|
||||
raise PermanentUploadError(
|
||||
f"Video processing failed for {file.filename}",
|
||||
file_name=file.filename,
|
||||
)
|
||||
|
||||
expires_at = datetime.now(timezone.utc) + GEMINI_FILE_TTL
|
||||
|
||||
logger.info(
|
||||
f"Uploaded to Gemini: {uploaded_file.name} (URI: {uploaded_file.uri})"
|
||||
)
|
||||
|
||||
return UploadResult(
|
||||
file_id=uploaded_file.name,
|
||||
file_uri=uploaded_file.uri,
|
||||
content_type=file.content_type,
|
||||
expires_at=expires_at,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except ImportError:
|
||||
raise
|
||||
except (TransientUploadError, PermanentUploadError):
|
||||
raise
|
||||
except Exception as e:
|
||||
error_msg = str(e).lower()
|
||||
if "quota" in error_msg or "rate" in error_msg or "limit" in error_msg:
|
||||
raise TransientUploadError(
|
||||
f"Rate limit error: {e}", file_name=file.filename
|
||||
) from e
|
||||
if (
|
||||
"auth" in error_msg
|
||||
or "permission" in error_msg
|
||||
or "denied" in error_msg
|
||||
):
|
||||
raise PermanentUploadError(
|
||||
f"Authentication/permission error: {e}", file_name=file.filename
|
||||
) from e
|
||||
if "invalid" in error_msg or "unsupported" in error_msg:
|
||||
raise PermanentUploadError(
|
||||
f"Invalid request: {e}", file_name=file.filename
|
||||
) from e
|
||||
status_code = getattr(e, "code", None) or getattr(e, "status_code", None)
|
||||
if status_code is not None:
|
||||
if isinstance(status_code, int):
|
||||
if status_code >= 500 or status_code == 429:
|
||||
raise TransientUploadError(
|
||||
f"Server error ({status_code}): {e}",
|
||||
file_name=file.filename,
|
||||
) from e
|
||||
if status_code in (401, 403):
|
||||
raise PermanentUploadError(
|
||||
f"Auth error ({status_code}): {e}", file_name=file.filename
|
||||
) from e
|
||||
if status_code == 400:
|
||||
raise PermanentUploadError(
|
||||
f"Bad request ({status_code}): {e}", file_name=file.filename
|
||||
) from e
|
||||
raise TransientUploadError(
|
||||
f"Upload failed: {e}", file_name=file.filename
|
||||
) from e
|
||||
|
||||
async def aupload(
|
||||
self, file: FileInput, purpose: str | None = None
|
||||
) -> UploadResult:
|
||||
"""Async upload a file to Gemini using native async client.
|
||||
|
||||
For FilePath sources, passes the path directly to the SDK which handles
|
||||
streaming internally via resumable uploads, avoiding memory overhead.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
purpose: Optional purpose/description (used as display name).
|
||||
|
||||
Returns:
|
||||
UploadResult with the file URI and metadata.
|
||||
|
||||
Raises:
|
||||
TransientUploadError: For retryable errors (network, rate limits).
|
||||
PermanentUploadError: For non-retryable errors (auth, validation).
|
||||
"""
|
||||
from crewai.files.processing.exceptions import (
|
||||
PermanentUploadError,
|
||||
TransientUploadError,
|
||||
)
|
||||
|
||||
try:
|
||||
client = self._get_client()
|
||||
display_name = purpose or file.filename
|
||||
|
||||
file_path = _get_file_path(file)
|
||||
if file_path is not None:
|
||||
file_size = file_path.stat().st_size
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to Gemini via path "
|
||||
f"({file_size} bytes, streaming)"
|
||||
)
|
||||
uploaded_file = await client.aio.files.upload(
|
||||
file=file_path,
|
||||
config={
|
||||
"display_name": display_name,
|
||||
"mime_type": file.content_type,
|
||||
},
|
||||
)
|
||||
else:
|
||||
content = await file.aread()
|
||||
file_data = io.BytesIO(content)
|
||||
file_data.name = file.filename
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to Gemini ({len(content)} bytes)"
|
||||
)
|
||||
|
||||
uploaded_file = await client.aio.files.upload(
|
||||
file=file_data,
|
||||
config={
|
||||
"display_name": display_name,
|
||||
"mime_type": file.content_type,
|
||||
},
|
||||
)
|
||||
|
||||
if file.content_type.startswith("video/"):
|
||||
if not await self.await_for_processing(uploaded_file.name):
|
||||
raise PermanentUploadError(
|
||||
f"Video processing failed for {file.filename}",
|
||||
file_name=file.filename,
|
||||
)
|
||||
|
||||
expires_at = datetime.now(timezone.utc) + GEMINI_FILE_TTL
|
||||
|
||||
logger.info(
|
||||
f"Uploaded to Gemini: {uploaded_file.name} (URI: {uploaded_file.uri})"
|
||||
)
|
||||
|
||||
return UploadResult(
|
||||
file_id=uploaded_file.name,
|
||||
file_uri=uploaded_file.uri,
|
||||
content_type=file.content_type,
|
||||
expires_at=expires_at,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except ImportError:
|
||||
raise
|
||||
except (TransientUploadError, PermanentUploadError):
|
||||
raise
|
||||
except Exception as e:
|
||||
error_msg = str(e).lower()
|
||||
if "quota" in error_msg or "rate" in error_msg or "limit" in error_msg:
|
||||
raise TransientUploadError(
|
||||
f"Rate limit error: {e}", file_name=file.filename
|
||||
) from e
|
||||
if (
|
||||
"auth" in error_msg
|
||||
or "permission" in error_msg
|
||||
or "denied" in error_msg
|
||||
):
|
||||
raise PermanentUploadError(
|
||||
f"Authentication/permission error: {e}", file_name=file.filename
|
||||
) from e
|
||||
if "invalid" in error_msg or "unsupported" in error_msg:
|
||||
raise PermanentUploadError(
|
||||
f"Invalid request: {e}", file_name=file.filename
|
||||
) from e
|
||||
status_code = getattr(e, "code", None) or getattr(e, "status_code", None)
|
||||
if status_code is not None and isinstance(status_code, int):
|
||||
if status_code >= 500 or status_code == 429:
|
||||
raise TransientUploadError(
|
||||
f"Server error ({status_code}): {e}", file_name=file.filename
|
||||
) from e
|
||||
if status_code in (401, 403):
|
||||
raise PermanentUploadError(
|
||||
f"Auth error ({status_code}): {e}", file_name=file.filename
|
||||
) from e
|
||||
if status_code == 400:
|
||||
raise PermanentUploadError(
|
||||
f"Bad request ({status_code}): {e}", file_name=file.filename
|
||||
) from e
|
||||
raise TransientUploadError(
|
||||
f"Upload failed: {e}", file_name=file.filename
|
||||
) from e
|
||||
|
||||
def delete(self, file_id: str) -> bool:
|
||||
"""Delete an uploaded file from Gemini.
|
||||
|
||||
Args:
|
||||
file_id: The file name/ID to delete.
|
||||
|
||||
Returns:
|
||||
True if deletion was successful, False otherwise.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
client.files.delete(name=file_id)
|
||||
logger.info(f"Deleted Gemini file: {file_id}")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to delete Gemini file {file_id}: {e}")
|
||||
return False
|
||||
|
||||
async def adelete(self, file_id: str) -> bool:
|
||||
"""Async delete an uploaded file from Gemini.
|
||||
|
||||
Args:
|
||||
file_id: The file name/ID to delete.
|
||||
|
||||
Returns:
|
||||
True if deletion was successful, False otherwise.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
await client.aio.files.delete(name=file_id)
|
||||
logger.info(f"Deleted Gemini file: {file_id}")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to delete Gemini file {file_id}: {e}")
|
||||
return False
|
||||
|
||||
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
|
||||
"""Get information about an uploaded file.
|
||||
|
||||
Args:
|
||||
file_id: The file name/ID.
|
||||
|
||||
Returns:
|
||||
Dictionary with file information, or None if not found.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
file_info = client.files.get(name=file_id)
|
||||
return {
|
||||
"name": file_info.name,
|
||||
"uri": file_info.uri,
|
||||
"display_name": file_info.display_name,
|
||||
"mime_type": file_info.mime_type,
|
||||
"size_bytes": file_info.size_bytes,
|
||||
"state": str(file_info.state),
|
||||
"create_time": file_info.create_time,
|
||||
"expiration_time": file_info.expiration_time,
|
||||
}
|
||||
except Exception as e:
|
||||
logger.debug(f"Failed to get Gemini file info for {file_id}: {e}")
|
||||
return None
|
||||
|
||||
def list_files(self) -> list[dict[str, Any]]:
|
||||
"""List all uploaded files.
|
||||
|
||||
Returns:
|
||||
List of dictionaries with file information.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
files = client.files.list()
|
||||
return [
|
||||
{
|
||||
"name": f.name,
|
||||
"uri": f.uri,
|
||||
"display_name": f.display_name,
|
||||
"mime_type": f.mime_type,
|
||||
"size_bytes": f.size_bytes,
|
||||
"state": str(f.state),
|
||||
}
|
||||
for f in files
|
||||
]
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to list Gemini files: {e}")
|
||||
return []
|
||||
|
||||
def wait_for_processing(self, file_id: str, timeout_seconds: int = 300) -> bool:
|
||||
"""Wait for a file to finish processing with exponential backoff.
|
||||
|
||||
Some files (especially videos) need time to process after upload.
|
||||
|
||||
Args:
|
||||
file_id: The file name/ID.
|
||||
timeout_seconds: Maximum time to wait.
|
||||
|
||||
Returns:
|
||||
True if processing completed, False if timed out or failed.
|
||||
"""
|
||||
try:
|
||||
from google.genai.types import FileState
|
||||
except ImportError:
|
||||
return True
|
||||
|
||||
client = self._get_client()
|
||||
start_time = time.time()
|
||||
base_delay = 1.0
|
||||
max_delay = 30.0
|
||||
attempt = 0
|
||||
|
||||
while time.time() - start_time < timeout_seconds:
|
||||
file_info = client.files.get(name=file_id)
|
||||
|
||||
if file_info.state == FileState.ACTIVE:
|
||||
return True
|
||||
|
||||
if file_info.state == FileState.FAILED:
|
||||
logger.error(f"Gemini file processing failed: {file_id}")
|
||||
return False
|
||||
|
||||
delay = min(base_delay * (2**attempt), max_delay)
|
||||
jitter = random.uniform(0, delay * 0.1) # noqa: S311
|
||||
time.sleep(delay + jitter)
|
||||
attempt += 1
|
||||
|
||||
logger.warning(f"Timed out waiting for Gemini file processing: {file_id}")
|
||||
return False
|
||||
|
||||
async def await_for_processing(
|
||||
self, file_id: str, timeout_seconds: int = 300
|
||||
) -> bool:
|
||||
"""Async wait for a file to finish processing with exponential backoff.
|
||||
|
||||
Some files (especially videos) need time to process after upload.
|
||||
|
||||
Args:
|
||||
file_id: The file name/ID.
|
||||
timeout_seconds: Maximum time to wait.
|
||||
|
||||
Returns:
|
||||
True if processing completed, False if timed out or failed.
|
||||
"""
|
||||
try:
|
||||
from google.genai.types import FileState
|
||||
except ImportError:
|
||||
return True
|
||||
|
||||
client = self._get_client()
|
||||
start_time = time.time()
|
||||
base_delay = 1.0
|
||||
max_delay = 30.0
|
||||
attempt = 0
|
||||
|
||||
while time.time() - start_time < timeout_seconds:
|
||||
file_info = await client.aio.files.get(name=file_id)
|
||||
|
||||
if file_info.state == FileState.ACTIVE:
|
||||
return True
|
||||
|
||||
if file_info.state == FileState.FAILED:
|
||||
logger.error(f"Gemini file processing failed: {file_id}")
|
||||
return False
|
||||
|
||||
delay = min(base_delay * (2**attempt), max_delay)
|
||||
jitter = random.uniform(0, delay * 0.1) # noqa: S311
|
||||
await asyncio.sleep(delay + jitter)
|
||||
attempt += 1
|
||||
|
||||
logger.warning(f"Timed out waiting for Gemini file processing: {file_id}")
|
||||
return False
|
||||
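A typical call sequence against this uploader looks like the sketch below. The `ImageFile` constructor details are assumed from the `crewai.files` API and may differ; the uploader calls themselves match the class above.

```python Code
import asyncio

from crewai.files.content_types import ImageFile
from crewai.files.uploaders.gemini import GeminiFileUploader

uploader = GeminiFileUploader()  # falls back to GOOGLE_API_KEY env var

# Hypothetical file object; the exact constructor signature is an assumption.
image = ImageFile(source="diagram.png")

result = uploader.upload(image, purpose="architecture diagram")
print(result.file_uri)    # URI usable in subsequent generate_content calls
print(result.expires_at)  # upload time + 48h (GEMINI_FILE_TTL)


# Async variant, useful when uploading many files concurrently:
async def upload_many(files):
    return await asyncio.gather(*(uploader.aupload(f) for f in files))


uploader.delete(result.file_id)
```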
752  lib/crewai/src/crewai/files/uploaders/openai.py  Normal file
@@ -0,0 +1,752 @@
"""OpenAI Files API uploader implementation."""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
from collections.abc import AsyncIterator, Iterator
|
||||
import io
|
||||
import logging
|
||||
import os
|
||||
from typing import Any
|
||||
|
||||
from crewai.files.content_types import (
|
||||
AudioFile,
|
||||
File,
|
||||
ImageFile,
|
||||
PDFFile,
|
||||
TextFile,
|
||||
VideoFile,
|
||||
)
|
||||
from crewai.files.file import FileBytes, FilePath, FileStream
|
||||
from crewai.files.uploaders.base import FileUploader, UploadResult
|
||||
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile
|
||||
|
||||
FILES_API_MAX_SIZE = 512 * 1024 * 1024
|
||||
DEFAULT_UPLOAD_CHUNK_SIZE = 64 * 1024 * 1024
|
||||
|
||||
|
||||
def _get_file_size(file: FileInput) -> int | None:
|
||||
"""Get file size without reading content if possible.
|
||||
|
||||
Args:
|
||||
file: The file to get size for.
|
||||
|
||||
Returns:
|
||||
File size in bytes, or None if size cannot be determined without reading.
|
||||
"""
|
||||
source = file._file_source
|
||||
if isinstance(source, FilePath):
|
||||
return source.path.stat().st_size
|
||||
if isinstance(source, FileBytes):
|
||||
return len(source.data)
|
||||
return None
|
||||
|
||||
|
||||
def _iter_file_chunks(file: FileInput, chunk_size: int) -> Iterator[bytes]:
|
||||
"""Iterate over file content in chunks.
|
||||
|
||||
Args:
|
||||
file: The file to read.
|
||||
chunk_size: Size of each chunk in bytes.
|
||||
|
||||
Yields:
|
||||
Chunks of file content.
|
||||
"""
|
||||
source = file._file_source
|
||||
if isinstance(source, (FilePath, FileBytes, FileStream)):
|
||||
yield from source.read_chunks(chunk_size)
|
||||
else:
|
||||
content = file.read()
|
||||
for i in range(0, len(content), chunk_size):
|
||||
yield content[i : i + chunk_size]
|
||||
|
||||
|
||||
async def _aiter_file_chunks(
|
||||
file: FileInput, chunk_size: int, content: bytes | None = None
|
||||
) -> AsyncIterator[bytes]:
|
||||
"""Async iterate over file content in chunks.
|
||||
|
||||
Args:
|
||||
file: The file to read.
|
||||
chunk_size: Size of each chunk in bytes.
|
||||
content: Optional pre-loaded content to chunk.
|
||||
|
||||
Yields:
|
||||
Chunks of file content.
|
||||
"""
|
||||
if content is not None:
|
||||
for i in range(0, len(content), chunk_size):
|
||||
yield content[i : i + chunk_size]
|
||||
return
|
||||
|
||||
source = file._file_source
|
||||
if isinstance(source, FilePath):
|
||||
async for chunk in source.aread_chunks(chunk_size):
|
||||
yield chunk
|
||||
elif isinstance(source, (FileBytes, FileStream)):
|
||||
for chunk in source.read_chunks(chunk_size):
|
||||
yield chunk
|
||||
else:
|
||||
data = await file.aread()
|
||||
for i in range(0, len(data), chunk_size):
|
||||
yield data[i : i + chunk_size]
|
||||
|
||||
|
||||
class OpenAIFileUploader(FileUploader):
|
||||
"""Uploader for OpenAI Files and Uploads APIs.
|
||||
|
||||
Uses the Files API for files up to 512MB (single request).
|
||||
Uses the Uploads API for files larger than 512MB (multipart chunked).
|
||||
|
||||
Attributes:
|
||||
api_key: Optional API key (uses OPENAI_API_KEY env var if not provided).
|
||||
chunk_size: Chunk size for multipart uploads (default 64MB).
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
api_key: str | None = None,
|
||||
chunk_size: int = DEFAULT_UPLOAD_CHUNK_SIZE,
|
||||
) -> None:
|
||||
"""Initialize the OpenAI uploader.
|
||||
|
||||
Args:
|
||||
api_key: Optional OpenAI API key. If not provided, uses
|
||||
OPENAI_API_KEY environment variable.
|
||||
chunk_size: Chunk size in bytes for multipart uploads (default 64MB).
|
||||
"""
|
||||
self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
|
||||
self._chunk_size = chunk_size
|
||||
self._client: Any = None
|
||||
self._async_client: Any = None
|
||||
|
||||
@property
|
||||
def provider_name(self) -> str:
|
||||
"""Return the provider name."""
|
||||
return "openai"
|
||||
|
||||
def _get_client(self) -> Any:
|
||||
"""Get or create the OpenAI client."""
|
||||
if self._client is None:
|
||||
try:
|
||||
from openai import OpenAI
|
||||
|
||||
self._client = OpenAI(api_key=self._api_key)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"openai is required for OpenAI file uploads. "
|
||||
"Install with: pip install openai"
|
||||
) from e
|
||||
return self._client
|
||||
|
||||
def _get_async_client(self) -> Any:
|
||||
"""Get or create the async OpenAI client."""
|
||||
if self._async_client is None:
|
||||
try:
|
||||
from openai import AsyncOpenAI
|
||||
|
||||
self._async_client = AsyncOpenAI(api_key=self._api_key)
|
||||
except ImportError as e:
|
||||
raise ImportError(
|
||||
"openai is required for OpenAI file uploads. "
|
||||
"Install with: pip install openai"
|
||||
) from e
|
||||
return self._async_client
|
||||
|
||||
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
|
||||
"""Upload a file to OpenAI.
|
||||
|
||||
Uses Files API for files <= 512MB, Uploads API for larger files.
|
||||
For large files, streams chunks to avoid loading entire file in memory.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
purpose: Optional purpose for the file (default: "user_data").
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
|
||||
Raises:
|
||||
TransientUploadError: For retryable errors (network, rate limits).
|
||||
PermanentUploadError: For non-retryable errors (auth, validation).
|
||||
"""
|
||||
from crewai.files.processing.exceptions import (
|
||||
PermanentUploadError,
|
||||
TransientUploadError,
|
||||
)
|
||||
|
||||
try:
|
||||
file_size = _get_file_size(file)
|
||||
|
||||
if file_size is not None and file_size > FILES_API_MAX_SIZE:
|
||||
return self._upload_multipart_streaming(file, file_size, purpose)
|
||||
|
||||
content = file.read()
|
||||
if len(content) > FILES_API_MAX_SIZE:
|
||||
return self._upload_multipart(file, content, purpose)
|
||||
return self._upload_simple(file, content, purpose)
|
||||
except ImportError:
|
||||
raise
|
||||
except (TransientUploadError, PermanentUploadError):
|
||||
raise
|
||||
except Exception as e:
|
||||
raise self._classify_error(e, file.filename) from e
|
||||
|
||||
def _upload_simple(
|
||||
self,
|
||||
file: FileInput,
|
||||
content: bytes,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Upload using the Files API (single request, up to 512MB).
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
content: File content bytes.
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
|
||||
file_data = io.BytesIO(content)
|
||||
file_data.name = file.filename or "file"
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to OpenAI Files API ({len(content)} bytes)"
|
||||
)
|
||||
|
||||
uploaded_file = client.files.create(
|
||||
file=file_data,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
logger.info(f"Uploaded to OpenAI: {uploaded_file.id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=uploaded_file.id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
|
||||
def _upload_multipart(
|
||||
self,
|
||||
file: FileInput,
|
||||
content: bytes,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Upload using the Uploads API with content already in memory.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
content: File content bytes (already loaded).
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
filename = file.filename or "file"
|
||||
file_size = len(content)
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{filename}' to OpenAI Uploads API "
|
||||
f"({file_size} bytes, {self._chunk_size} byte chunks)"
|
||||
)
|
||||
|
||||
upload = client.uploads.create(
|
||||
bytes=file_size,
|
||||
filename=filename,
|
||||
mime_type=file.content_type,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
part_ids: list[str] = []
|
||||
offset = 0
|
||||
part_num = 1
|
||||
|
||||
try:
|
||||
while offset < file_size:
|
||||
chunk = content[offset : offset + self._chunk_size]
|
||||
chunk_io = io.BytesIO(chunk)
|
||||
|
||||
logger.debug(
|
||||
f"Uploading part {part_num} ({len(chunk)} bytes, offset {offset})"
|
||||
)
|
||||
|
||||
part = client.uploads.parts.create(
|
||||
upload_id=upload.id,
|
||||
data=chunk_io,
|
||||
)
|
||||
part_ids.append(part.id)
|
||||
|
||||
offset += self._chunk_size
|
||||
part_num += 1
|
||||
|
||||
completed = client.uploads.complete(
|
||||
upload_id=upload.id,
|
||||
part_ids=part_ids,
|
||||
)
|
||||
|
||||
file_id = completed.file.id if completed.file else upload.id
|
||||
logger.info(f"Completed multipart upload to OpenAI: {file_id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=file_id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except Exception:
|
||||
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
|
||||
try:
|
||||
client.uploads.cancel(upload_id=upload.id)
|
||||
except Exception as cancel_err:
|
||||
logger.debug(f"Failed to cancel upload: {cancel_err}")
|
||||
raise
|
||||
|
||||
def _upload_multipart_streaming(
|
||||
self,
|
||||
file: FileInput,
|
||||
file_size: int,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Upload using the Uploads API with streaming chunks.
|
||||
|
||||
Streams chunks directly from the file source without loading
|
||||
the entire file into memory. Used for large files.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
file_size: Total file size in bytes.
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
filename = file.filename or "file"
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{filename}' to OpenAI Uploads API (streaming) "
|
||||
f"({file_size} bytes, {self._chunk_size} byte chunks)"
|
||||
)
|
||||
|
||||
upload = client.uploads.create(
|
||||
bytes=file_size,
|
||||
filename=filename,
|
||||
mime_type=file.content_type,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
part_ids: list[str] = []
|
||||
part_num = 1
|
||||
|
||||
try:
|
||||
for chunk in _iter_file_chunks(file, self._chunk_size):
|
||||
chunk_io = io.BytesIO(chunk)
|
||||
|
||||
logger.debug(f"Uploading part {part_num} ({len(chunk)} bytes)")
|
||||
|
||||
part = client.uploads.parts.create(
|
||||
upload_id=upload.id,
|
||||
data=chunk_io,
|
||||
)
|
||||
part_ids.append(part.id)
|
||||
part_num += 1
|
||||
|
||||
completed = client.uploads.complete(
|
||||
upload_id=upload.id,
|
||||
part_ids=part_ids,
|
||||
)
|
||||
|
||||
file_id = completed.file.id if completed.file else upload.id
|
||||
logger.info(f"Completed streaming multipart upload to OpenAI: {file_id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=file_id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except Exception:
|
||||
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
|
||||
try:
|
||||
client.uploads.cancel(upload_id=upload.id)
|
||||
except Exception as cancel_err:
|
||||
logger.debug(f"Failed to cancel upload: {cancel_err}")
|
||||
raise
|
||||
|
||||
@staticmethod
|
||||
def _classify_error(e: Exception, filename: str | None) -> Exception:
|
||||
"""Classify an exception as transient or permanent.
|
||||
|
||||
Args:
|
||||
e: The exception to classify.
|
||||
filename: The filename for error context.
|
||||
|
||||
Returns:
|
||||
TransientUploadError or PermanentUploadError.
|
||||
"""
|
||||
from crewai.files.processing.exceptions import (
|
||||
PermanentUploadError,
|
||||
TransientUploadError,
|
||||
)
|
||||
|
||||
error_type = type(e).__name__
|
||||
if "RateLimit" in error_type or "APIConnection" in error_type:
|
||||
return TransientUploadError(
|
||||
f"Transient upload error: {e}", file_name=filename
|
||||
)
|
||||
if "Authentication" in error_type or "Permission" in error_type:
|
||||
return PermanentUploadError(
|
||||
f"Authentication/permission error: {e}", file_name=filename
|
||||
)
|
||||
if "BadRequest" in error_type or "InvalidRequest" in error_type:
|
||||
return PermanentUploadError(f"Invalid request: {e}", file_name=filename)
|
||||
|
||||
status_code = getattr(e, "status_code", None)
|
||||
if status_code is not None:
|
||||
if status_code >= 500 or status_code == 429:
|
||||
return TransientUploadError(
|
||||
f"Server error ({status_code}): {e}", file_name=filename
|
||||
)
|
||||
if status_code in (401, 403):
|
||||
return PermanentUploadError(
|
||||
f"Auth error ({status_code}): {e}", file_name=filename
|
||||
)
|
||||
if status_code == 400:
|
||||
return PermanentUploadError(
|
||||
f"Bad request ({status_code}): {e}", file_name=filename
|
||||
)
|
||||
|
||||
return TransientUploadError(f"Upload failed: {e}", file_name=filename)
|
||||
|
||||
def delete(self, file_id: str) -> bool:
|
||||
"""Delete an uploaded file from OpenAI.
|
||||
|
||||
Args:
|
||||
file_id: The file ID to delete.
|
||||
|
||||
Returns:
|
||||
True if deletion was successful, False otherwise.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
client.files.delete(file_id)
|
||||
logger.info(f"Deleted OpenAI file: {file_id}")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to delete OpenAI file {file_id}: {e}")
|
||||
return False
|
||||
|
||||
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
|
||||
"""Get information about an uploaded file.
|
||||
|
||||
Args:
|
||||
file_id: The file ID.
|
||||
|
||||
Returns:
|
||||
Dictionary with file information, or None if not found.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
file_info = client.files.retrieve(file_id)
|
||||
return {
|
||||
"id": file_info.id,
|
||||
"filename": file_info.filename,
|
||||
"purpose": file_info.purpose,
|
||||
"bytes": file_info.bytes,
|
||||
"created_at": file_info.created_at,
|
||||
"status": file_info.status,
|
||||
}
|
||||
except Exception as e:
|
||||
logger.debug(f"Failed to get OpenAI file info for {file_id}: {e}")
|
||||
return None
|
||||
|
||||
def list_files(self) -> list[dict[str, Any]]:
|
||||
"""List all uploaded files.
|
||||
|
||||
Returns:
|
||||
List of dictionaries with file information.
|
||||
"""
|
||||
try:
|
||||
client = self._get_client()
|
||||
files = client.files.list()
|
||||
return [
|
||||
{
|
||||
"id": f.id,
|
||||
"filename": f.filename,
|
||||
"purpose": f.purpose,
|
||||
"bytes": f.bytes,
|
||||
"created_at": f.created_at,
|
||||
"status": f.status,
|
||||
}
|
||||
for f in files.data
|
||||
]
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to list OpenAI files: {e}")
|
||||
return []
|
||||
|
||||
async def aupload(
|
||||
self, file: FileInput, purpose: str | None = None
|
||||
) -> UploadResult:
|
||||
"""Async upload a file to OpenAI using native async client.
|
||||
|
||||
Uses Files API for files <= 512MB, Uploads API for larger files.
|
||||
For large files, streams chunks to avoid loading entire file in memory.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
purpose: Optional purpose for the file (default: "user_data").
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
|
||||
Raises:
|
||||
TransientUploadError: For retryable errors (network, rate limits).
|
||||
PermanentUploadError: For non-retryable errors (auth, validation).
|
||||
"""
|
||||
from crewai.files.processing.exceptions import (
|
||||
PermanentUploadError,
|
||||
TransientUploadError,
|
||||
)
|
||||
|
||||
try:
|
||||
file_size = _get_file_size(file)
|
||||
|
||||
if file_size is not None and file_size > FILES_API_MAX_SIZE:
|
||||
return await self._aupload_multipart_streaming(file, file_size, purpose)
|
||||
|
||||
content = await file.aread()
|
||||
if len(content) > FILES_API_MAX_SIZE:
|
||||
return await self._aupload_multipart(file, content, purpose)
|
||||
return await self._aupload_simple(file, content, purpose)
|
||||
except ImportError:
|
||||
raise
|
||||
except (TransientUploadError, PermanentUploadError):
|
||||
raise
|
||||
except Exception as e:
|
||||
raise self._classify_error(e, file.filename) from e
|
||||
|
||||
async def _aupload_simple(
|
||||
self,
|
||||
file: FileInput,
|
||||
content: bytes,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Async upload using the Files API (single request, up to 512MB).
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
content: File content bytes.
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_async_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
|
||||
file_data = io.BytesIO(content)
|
||||
file_data.name = file.filename or "file"
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{file.filename}' to OpenAI Files API ({len(content)} bytes)"
|
||||
)
|
||||
|
||||
uploaded_file = await client.files.create(
|
||||
file=file_data,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
logger.info(f"Uploaded to OpenAI: {uploaded_file.id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=uploaded_file.id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
|
||||
async def _aupload_multipart(
|
||||
self,
|
||||
file: FileInput,
|
||||
content: bytes,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Async upload using the Uploads API (multipart chunked, up to 8GB).
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
content: File content bytes.
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_async_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
filename = file.filename or "file"
|
||||
file_size = len(content)
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{filename}' to OpenAI Uploads API "
|
||||
f"({file_size} bytes, {self._chunk_size} byte chunks)"
|
||||
)
|
||||
|
||||
upload = await client.uploads.create(
|
||||
bytes=file_size,
|
||||
filename=filename,
|
||||
mime_type=file.content_type,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
part_ids: list[str] = []
|
||||
offset = 0
|
||||
part_num = 1
|
||||
|
||||
try:
|
||||
while offset < file_size:
|
||||
chunk = content[offset : offset + self._chunk_size]
|
||||
chunk_io = io.BytesIO(chunk)
|
||||
|
||||
logger.debug(
|
||||
f"Uploading part {part_num} ({len(chunk)} bytes, offset {offset})"
|
||||
)
|
||||
|
||||
part = await client.uploads.parts.create(
|
||||
upload_id=upload.id,
|
||||
data=chunk_io,
|
||||
)
|
||||
part_ids.append(part.id)
|
||||
|
||||
offset += self._chunk_size
|
||||
part_num += 1
|
||||
|
||||
completed = await client.uploads.complete(
|
||||
upload_id=upload.id,
|
||||
part_ids=part_ids,
|
||||
)
|
||||
|
||||
file_id = completed.file.id if completed.file else upload.id
|
||||
logger.info(f"Completed multipart upload to OpenAI: {file_id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=file_id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except Exception:
|
||||
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
|
||||
try:
|
||||
await client.uploads.cancel(upload_id=upload.id)
|
||||
except Exception as cancel_err:
|
||||
logger.debug(f"Failed to cancel upload: {cancel_err}")
|
||||
raise
|
||||
|
||||
async def _aupload_multipart_streaming(
|
||||
self,
|
||||
file: FileInput,
|
||||
file_size: int,
|
||||
purpose: str | None,
|
||||
) -> UploadResult:
|
||||
"""Async upload using the Uploads API with streaming chunks.
|
||||
|
||||
Streams chunks directly from the file source without loading
|
||||
the entire file into memory. Used for large files.
|
||||
|
||||
Args:
|
||||
file: The file to upload.
|
||||
file_size: Total file size in bytes.
|
||||
purpose: Optional purpose for the file.
|
||||
|
||||
Returns:
|
||||
UploadResult with the file ID and metadata.
|
||||
"""
|
||||
client = self._get_async_client()
|
||||
file_purpose = purpose or "user_data"
|
||||
filename = file.filename or "file"
|
||||
|
||||
logger.info(
|
||||
f"Uploading file '{filename}' to OpenAI Uploads API (streaming) "
|
||||
f"({file_size} bytes, {self._chunk_size} byte chunks)"
|
||||
)
|
||||
|
||||
upload = await client.uploads.create(
|
||||
bytes=file_size,
|
||||
filename=filename,
|
||||
mime_type=file.content_type,
|
||||
purpose=file_purpose,
|
||||
)
|
||||
|
||||
part_ids: list[str] = []
|
||||
part_num = 1
|
||||
|
||||
try:
|
||||
async for chunk in _aiter_file_chunks(file, self._chunk_size):
|
||||
chunk_io = io.BytesIO(chunk)
|
||||
|
||||
logger.debug(f"Uploading part {part_num} ({len(chunk)} bytes)")
|
||||
|
||||
part = await client.uploads.parts.create(
|
||||
upload_id=upload.id,
|
||||
data=chunk_io,
|
||||
)
|
||||
part_ids.append(part.id)
|
||||
part_num += 1
|
||||
|
||||
completed = await client.uploads.complete(
|
||||
upload_id=upload.id,
|
||||
part_ids=part_ids,
|
||||
)
|
||||
|
||||
file_id = completed.file.id if completed.file else upload.id
|
||||
logger.info(f"Completed streaming multipart upload to OpenAI: {file_id}")
|
||||
|
||||
return UploadResult(
|
||||
file_id=file_id,
|
||||
file_uri=None,
|
||||
content_type=file.content_type,
|
||||
expires_at=None,
|
||||
provider=self.provider_name,
|
||||
)
|
||||
except Exception:
|
||||
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
|
||||
try:
|
||||
await client.uploads.cancel(upload_id=upload.id)
|
||||
except Exception as cancel_err:
|
||||
logger.debug(f"Failed to cancel upload: {cancel_err}")
|
||||
raise
|
||||
|
||||
async def adelete(self, file_id: str) -> bool:
|
||||
"""Async delete an uploaded file from OpenAI.
|
||||
|
||||
Args:
|
||||
file_id: The file ID to delete.
|
||||
|
||||
Returns:
|
||||
True if deletion was successful, False otherwise.
|
||||
"""
|
||||
try:
|
||||
client = self._get_async_client()
|
||||
await client.files.delete(file_id)
|
||||
logger.info(f"Deleted OpenAI file: {file_id}")
|
||||
return True
|
||||
except Exception as e:
|
||||
logger.warning(f"Failed to delete OpenAI file {file_id}: {e}")
|
||||
return False
|
||||
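The size-based routing in `upload` means callers never pick an API themselves: anything at or under `FILES_API_MAX_SIZE` (512MB) goes through one `files.create` call, anything larger is chunked through the Uploads API. A hedged usage sketch (the `PDFFile` constructor is assumed from the `crewai.files` API):

```python Code
from crewai.files.content_types import PDFFile
from crewai.files.uploaders.openai import OpenAIFileUploader

# 64MB chunks is the default; smaller chunks trade memory for more requests.
uploader = OpenAIFileUploader(chunk_size=32 * 1024 * 1024)

# Hypothetical file objects; exact constructor signatures are assumptions.
small = PDFFile(source="report.pdf")   # <= 512MB -> single Files API request
large = PDFFile(source="archive.pdf")  # >  512MB -> chunked Uploads API

for f in (small, large):
    result = uploader.upload(f, purpose="user_data")
    print(result.file_id, result.provider)  # e.g. "file-abc123" "openai"
```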
@@ -12,6 +12,7 @@ from concurrent.futures import Future
import copy
import inspect
import logging
import threading
from typing import (
    TYPE_CHECKING,
    Any,
@@ -64,6 +65,7 @@ from crewai.flow.persistence.base import FlowPersistence
from crewai.flow.types import FlowExecutionData, FlowMethodName, PendingListenerKey
from crewai.flow.utils import (
    _extract_all_methods,
    _extract_all_methods_recursive,
    _normalize_condition,
    get_possible_return_constants,
    is_flow_condition_dict,
@@ -73,6 +75,7 @@ from crewai.flow.utils import (
    is_simple_flow_condition,
)


if TYPE_CHECKING:
    from crewai.flow.async_feedback.types import PendingFeedbackContext
    from crewai.flow.human_feedback import HumanFeedbackResult
@@ -396,6 +399,62 @@ def and_(*conditions: str | FlowCondition | Callable[..., Any]) -> FlowCondition
    return {"type": AND_CONDITION, "conditions": processed_conditions}


class StateProxy(Generic[T]):
    """Proxy that provides thread-safe access to flow state.

    Wraps state objects (dict or BaseModel) and uses a lock for all write
    operations to prevent race conditions when parallel listeners modify state.
    """

    __slots__ = ("_proxy_lock", "_proxy_state")

    def __init__(self, state: T, lock: threading.Lock) -> None:
        object.__setattr__(self, "_proxy_state", state)
        object.__setattr__(self, "_proxy_lock", lock)

    def __getattr__(self, name: str) -> Any:
        return getattr(object.__getattribute__(self, "_proxy_state"), name)

    def __setattr__(self, name: str, value: Any) -> None:
        if name in ("_proxy_state", "_proxy_lock"):
            object.__setattr__(self, name, value)
        else:
            with object.__getattribute__(self, "_proxy_lock"):
                setattr(object.__getattribute__(self, "_proxy_state"), name, value)

    def __getitem__(self, key: str) -> Any:
        return object.__getattribute__(self, "_proxy_state")[key]

    def __setitem__(self, key: str, value: Any) -> None:
        with object.__getattribute__(self, "_proxy_lock"):
            object.__getattribute__(self, "_proxy_state")[key] = value

    def __delitem__(self, key: str) -> None:
        with object.__getattribute__(self, "_proxy_lock"):
            del object.__getattribute__(self, "_proxy_state")[key]

    def __contains__(self, key: str) -> bool:
        return key in object.__getattribute__(self, "_proxy_state")

    def __repr__(self) -> str:
        return repr(object.__getattribute__(self, "_proxy_state"))

    def _unwrap(self) -> T:
        """Return the underlying state object."""
        return cast(T, object.__getattribute__(self, "_proxy_state"))

    def model_dump(self) -> dict[str, Any]:
        """Return state as a dictionary.

        Works for both dict and BaseModel underlying states.
        """
        state = object.__getattribute__(self, "_proxy_state")
        if isinstance(state, dict):
            return state
        result: dict[str, Any] = state.model_dump()
        return result

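To make the locking discipline concrete: the proxy takes the shared lock for every write path (`__setattr__`, `__setitem__`, `__delitem__`), so parallel listeners cannot interleave partial writes. A standalone sketch of the same discipline using a bare dict and lock rather than a real Flow (note that a compound read-modify-write like `+=` through the proxy is still a plain read followed by a locked write, so this sketch holds the lock across the whole sequence):

```python Code
import threading

state: dict[str, int] = {"hits": 0}
lock = threading.Lock()


def worker() -> None:
    for _ in range(50_000):
        # Mirrors the proxy's write path: every mutation goes through the lock.
        with lock:
            state["hits"] += 1


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(state["hits"])  # 200000 with the lock; typically lower without it
```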
class FlowMeta(type):
    def __new__(
        mcs,
@@ -519,7 +578,12 @@ class Flow(Generic[T], metaclass=FlowMeta):
        self._methods: dict[FlowMethodName, FlowMethod[Any, Any]] = {}
        self._method_execution_counts: dict[FlowMethodName, int] = {}
        self._pending_and_listeners: dict[PendingListenerKey, set[FlowMethodName]] = {}
        self._fired_or_listeners: set[FlowMethodName] = (
            set()
        )  # Track OR listeners that already fired
        self._method_outputs: list[Any] = []  # list to store all method outputs
        self._state_lock = threading.Lock()
        self._or_listeners_lock = threading.Lock()
        self._completed_methods: set[FlowMethodName] = (
            set()
        )  # Track completed methods for reload
@@ -564,13 +628,182 @@ class Flow(Generic[T], metaclass=FlowMeta):
            method = method.__get__(self, self.__class__)
            self._methods[method.__name__] = method

    def _mark_or_listener_fired(self, listener_name: FlowMethodName) -> bool:
        """Mark an OR listener as fired atomically.

        Args:
            listener_name: The name of the OR listener to mark.

        Returns:
            True if this call was the first to fire the listener.
            False if the listener was already fired.
        """
        with self._or_listeners_lock:
            if listener_name in self._fired_or_listeners:
                return False
            self._fired_or_listeners.add(listener_name)
            return True

    def _clear_or_listeners(self) -> None:
        """Clear fired OR listeners for cyclic flows."""
        with self._or_listeners_lock:
            self._fired_or_listeners.clear()

    def _discard_or_listener(self, listener_name: FlowMethodName) -> None:
        """Discard a single OR listener from the fired set."""
        with self._or_listeners_lock:
            self._fired_or_listeners.discard(listener_name)

    def _build_racing_groups(self) -> dict[frozenset[FlowMethodName], FlowMethodName]:
        """Identify groups of methods that race for the same OR listener.

        Analyzes the flow graph to find listeners with OR conditions that have
        multiple trigger methods. These trigger methods form a "racing group"
        where only the first to complete should trigger the OR listener.

        Only methods that are EXCLUSIVELY sources for the OR listener are included
        in the racing group. Methods that are also triggers for other listeners
        (e.g., AND conditions) are not cancelled when another racing source wins.

        Returns:
            Dictionary mapping frozensets of racing method names to their
            shared OR listener name.

        Example:
            If we have `@listen(or_(method_a, method_b))` on `handler`,
            and method_a/method_b aren't used elsewhere,
            this returns: {frozenset({'method_a', 'method_b'}): 'handler'}
        """
        racing_groups: dict[frozenset[FlowMethodName], FlowMethodName] = {}

        method_to_listeners: dict[FlowMethodName, set[FlowMethodName]] = {}
        for listener_name, condition_data in self._listeners.items():
            if is_simple_flow_condition(condition_data):
                _, methods = condition_data
                for m in methods:
                    method_to_listeners.setdefault(m, set()).add(listener_name)
            elif is_flow_condition_dict(condition_data):
                all_methods = _extract_all_methods_recursive(condition_data)
                for m in all_methods:
                    method_name = FlowMethodName(m) if isinstance(m, str) else m
                    method_to_listeners.setdefault(method_name, set()).add(
                        listener_name
                    )

        for listener_name, condition_data in self._listeners.items():
            if listener_name in self._routers:
                continue

            trigger_methods: set[FlowMethodName] = set()

            if is_simple_flow_condition(condition_data):
                condition_type, methods = condition_data
                if condition_type == OR_CONDITION and len(methods) > 1:
                    trigger_methods = set(methods)

            elif is_flow_condition_dict(condition_data):
                top_level_type = condition_data.get("type", OR_CONDITION)
                if top_level_type == OR_CONDITION:
                    all_methods = _extract_all_methods_recursive(condition_data)
                    if len(all_methods) > 1:
                        trigger_methods = set(
                            FlowMethodName(m) if isinstance(m, str) else m
                            for m in all_methods
                        )

            if trigger_methods:
                exclusive_methods = {
                    m
                    for m in trigger_methods
                    if method_to_listeners.get(m, set()) == {listener_name}
                }
                if len(exclusive_methods) > 1:
                    racing_groups[frozenset(exclusive_methods)] = listener_name

        return racing_groups

    def _get_racing_group_for_listeners(
        self,
        listener_names: list[FlowMethodName],
    ) -> tuple[frozenset[FlowMethodName], FlowMethodName] | None:
        """Check if the given listeners form a racing group.

        Args:
            listener_names: List of listener method names being executed.

        Returns:
            Tuple of (racing_members, or_listener_name) if these listeners race,
            None otherwise.
        """
        if not hasattr(self, "_racing_groups_cache"):
            self._racing_groups_cache = self._build_racing_groups()

        listener_set = set(listener_names)

        for racing_members, or_listener in self._racing_groups_cache.items():
            if racing_members & listener_set:
                racing_subset = racing_members & listener_set
                if len(racing_subset) > 1:
                    return (frozenset(racing_subset), or_listener)

        return None

    async def _execute_racing_listeners(
        self,
        racing_listeners: frozenset[FlowMethodName],
        other_listeners: list[FlowMethodName],
        result: Any,
    ) -> None:
        """Execute racing listeners with first-wins semantics.

        Racing listeners are executed in parallel, but once the first one
        completes, the others are cancelled. Non-racing listeners in the
        same batch are executed normally in parallel.

        Args:
            racing_listeners: Set of listener names that race for an OR condition.
            other_listeners: Other listeners to execute in parallel (not racing).
            result: The result from the triggering method.
        """
        racing_tasks = [
            asyncio.create_task(
                self._execute_single_listener(name, result),
                name=str(name),
            )
            for name in racing_listeners
        ]

        other_tasks = [
            asyncio.create_task(
                self._execute_single_listener(name, result),
                name=str(name),
            )
            for name in other_listeners
        ]

        if racing_tasks:
            for coro in asyncio.as_completed(racing_tasks):
                try:
                    await coro
                except Exception as e:
                    logger.debug(f"Racing listener failed: {e}")
                    continue
                break

            for task in racing_tasks:
                if not task.done():
                    task.cancel()

        if other_tasks:
            await asyncio.gather(*other_tasks, return_exceptions=True)

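In user terms, the racing machinery above means that for `@listen(or_(a, b))`, whichever source finishes first fires the listener and the loser is cancelled, provided neither source feeds any other listener. A hedged sketch of the flow shape this targets (the `crewai.flow` import path and decorator spellings are assumed from the public Flow API):

```python Code
import asyncio

from crewai.flow import Flow, listen, or_, start


class RaceFlow(Flow):
    @start()
    def kickoff(self):
        return "go"

    @listen(kickoff)
    async def fast_path(self, _):
        await asyncio.sleep(0.1)
        return "fast"

    @listen(kickoff)
    async def slow_path(self, _):
        # Exclusively feeds `handle`, so once fast_path wins, this task is
        # cancelled instead of firing the OR listener a second time.
        await asyncio.sleep(5)
        return "slow"

    @listen(or_(fast_path, slow_path))
    def handle(self, winner):
        print(f"first result: {winner}")
```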
    @classmethod
    def from_pending(
        cls,
        flow_id: str,
        persistence: FlowPersistence | None = None,
        **kwargs: Any,
    ) -> Flow[Any]:
        """Create a Flow instance from a pending feedback state.

        This classmethod is used to restore a flow that was paused waiting
@@ -631,7 +864,7 @@ class Flow(Generic[T], metaclass=FlowMeta):
        return instance

    @property
    def pending_feedback(self) -> PendingFeedbackContext | None:
        """Get the pending feedback context if this flow is waiting for feedback.

        Returns:
@@ -716,9 +949,10 @@ class Flow(Generic[T], metaclass=FlowMeta):
        Raises:
            ValueError: If no pending feedback context exists
        """
        from datetime import datetime

        from crewai.flow.human_feedback import HumanFeedbackResult

        if self._pending_feedback_context is None:
            raise ValueError(
                "No pending feedback context. Use from_pending() to restore a paused flow."
@@ -740,12 +974,14 @@ class Flow(Generic[T], metaclass=FlowMeta):
                # No default and no feedback - use first outcome
                collapsed_outcome = emit[0]
            elif emit:
                if llm is not None:
                    # Collapse feedback to outcome using LLM
                    collapsed_outcome = self._collapse_to_outcome(
                        feedback=feedback,
                        outcomes=emit,
                        llm=llm,
                    )
                else:
                    collapsed_outcome = emit[0]

        # Create result
        result = HumanFeedbackResult(
@@ -784,21 +1020,16 @@ class Flow(Generic[T], metaclass=FlowMeta):
        # This allows methods to re-execute in loops (e.g., implement_changes → suggest_changes → implement_changes)
        self._is_execution_resuming = False

        try:
            if emit and collapsed_outcome:
                # Router behavior - the outcome itself triggers listeners
                # First, add the outcome to method outputs as a router would
                self._method_outputs.append(collapsed_outcome)

                # Then trigger listeners for the outcome (e.g., "approved" triggers @listen("approved"))
                await self._execute_listeners(
                    FlowMethodName(collapsed_outcome),
                    result,
                )
            else:
                # Normal behavior - pass the HumanFeedbackResult
                await self._execute_listeners(
                    FlowMethodName(context.method_name),
                    result,
                )
@@ -894,18 +1125,17 @@ class Flow(Generic[T], metaclass=FlowMeta):

        # Handle case where initial_state is a type (class)
        if isinstance(self.initial_state, type):
            state_class: type[T] = self.initial_state
            if issubclass(state_class, FlowState):
                return state_class()  # Uses model defaults
            if issubclass(state_class, BaseModel):
                # Validate that the model has an id field
                model_fields = getattr(state_class, "model_fields", None)
                if not model_fields or "id" not in model_fields:
                    raise ValueError("Flow state model must have an 'id' field")
                model_instance = state_class()
                # Ensure id is set - generate UUID if empty
                if not getattr(model_instance, "id", None):
                    object.__setattr__(model_instance, "id", str(uuid4()))
                return model_instance
        if self.initial_state is dict:
            return cast(T, {"id": str(uuid4())})

@@ -970,7 +1200,7 @@ class Flow(Generic[T], metaclass=FlowMeta):

    @property
    def state(self) -> T:
        return StateProxy(self._state, self._state_lock)  # type: ignore[return-value]

    @property
    def method_outputs(self) -> list[Any]:
@@ -1295,6 +1525,7 @@ class Flow(Generic[T], metaclass=FlowMeta):
            self._completed_methods.clear()
            self._method_outputs.clear()
            self._pending_and_listeners.clear()
            self._clear_or_listeners()
        else:
            # We're restoring from persistence, set the flag
            self._is_execution_resuming = True
@@ -1346,9 +1577,26 @@ class Flow(Generic[T], metaclass=FlowMeta):
            self._initialize_state(inputs)

        try:
            # Determine which start methods to execute at kickoff
            # Conditional start methods (with __trigger_methods__) are only triggered by their conditions
            # UNLESS there are no unconditional starts (then all starts run as entry points)
            unconditional_starts = [
                start_method
                for start_method in self._start_methods
                if not getattr(
                    self._methods.get(start_method), "__trigger_methods__", None
                )
            ]
            # If there are unconditional starts, only run those at kickoff
            # If there are NO unconditional starts, run all starts (including conditional ones)
            starts_to_execute = (
                unconditional_starts
                if unconditional_starts
                else self._start_methods
            )
            tasks = [
                self._execute_start_method(start_method)
                for start_method in starts_to_execute
            ]
            await asyncio.gather(*tasks)
        except Exception as e:
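This hunk only changes behavior for flows that mix plain `@start()` methods with conditional ones. A hedged sketch of the distinction (the decorator spellings, including passing a router outcome to `@start`, are assumed from the public Flow API and may differ):

```python Code
from crewai.flow import Flow, router, start


class Pipeline(Flow):
    @start()
    def ingest(self):
        # Unconditional start: runs at kickoff under the new selection logic.
        return "raw"

    @router(ingest)
    def triage(self, data):
        return "retry" if data == "raw" else "done"

    @start("retry")
    def ingest_again(self):
        # Conditional start: skipped at kickoff, only runs when triage
        # emits "retry". If the flow had no unconditional starts at all,
        # it would run as an entry point instead.
        return "raw-2"
```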
@@ -1431,13 +1679,14 @@ class Flow(Generic[T], metaclass=FlowMeta):
            )
            self._event_futures.clear()

            if not self.suppress_flow_events:
                trace_listener = TraceCollectionListener()
                if trace_listener.batch_manager.batch_owner_type == "flow":
                    if trace_listener.first_time_handler.is_first_time:
                        trace_listener.first_time_handler.mark_events_collected()
                        trace_listener.first_time_handler.handle_execution_completion()
                    else:
                        trace_listener.batch_manager.finalize_batch()

            return final_output
        finally:
@@ -1481,6 +1730,8 @@ class Flow(Generic[T], metaclass=FlowMeta):
                return
            # For cyclic flows, clear from completed to allow re-execution
            self._completed_methods.discard(start_method_name)
            # Also clear fired OR listeners to allow them to fire again in new cycle
            self._clear_or_listeners()

        method = self._methods[start_method_name]
        enhanced_method = self._inject_trigger_payload_for_start_method(method)
@@ -1503,11 +1754,25 @@ class Flow(Generic[T], metaclass=FlowMeta):
                if self.last_human_feedback is not None
                else result
            )
            racing_group = self._get_racing_group_for_listeners(
                listeners_for_result
            )
            if racing_group:
                racing_members, _ = racing_group
                other_listeners = [
                    name
                    for name in listeners_for_result
                    if name not in racing_members
                ]
                await self._execute_racing_listeners(
                    racing_members, other_listeners, listener_result
                )
            else:
                tasks = [
                    self._execute_single_listener(listener_name, listener_result)
                    for listener_name in listeners_for_result
                ]
                await asyncio.gather(*tasks)
        else:
            await self._execute_listeners(start_method_name, result)

@@ -1573,11 +1838,19 @@ class Flow(Generic[T], metaclass=FlowMeta):
|
||||
if future:
|
||||
self._event_futures.append(future)
|
||||
|
||||
result = (
|
||||
await method(*args, **kwargs)
|
||||
if asyncio.iscoroutinefunction(method)
|
||||
else method(*args, **kwargs)
|
||||
)
|
||||
if asyncio.iscoroutinefunction(method):
|
||||
result = await method(*args, **kwargs)
|
||||
else:
|
||||
# Run sync methods in thread pool for isolation
|
||||
# This allows Agent.kickoff() to work synchronously inside Flow methods
|
||||
import contextvars
|
||||
|
||||
ctx = contextvars.copy_context()
|
||||
result = await asyncio.to_thread(ctx.run, method, *args, **kwargs)
|
||||
|
||||
# Auto-await coroutines returned from sync methods (enables AgentExecutor pattern)
|
||||
if asyncio.iscoroutine(result):
|
||||
result = await result
|
||||
|
||||
self._method_outputs.append(result)
|
||||
self._method_execution_counts[method_name] = (
|
||||
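The hunk above moves sync flow methods onto a worker thread while keeping the caller's `contextvars` visible there. A minimal standalone sketch of the same pattern (names here are illustrative, not the Flow internals):

```python
import asyncio
import contextvars

request_id = contextvars.ContextVar("request_id", default=None)

def sync_work() -> str:
    # Runs on a worker thread, but still sees the caller's context values.
    return f"handled {request_id.get()}"

async def main() -> None:
    request_id.set("req-42")
    # Copy the current context so ContextVar values survive the thread hop.
    ctx = contextvars.copy_context()
    result = await asyncio.to_thread(ctx.run, sync_work)
    # Auto-await a coroutine that a sync method may have returned.
    if asyncio.iscoroutine(result):
        result = await result
    print(result)  # -> handled req-42

asyncio.run(main())
```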
@@ -1724,11 +1997,27 @@ class Flow(Generic[T], metaclass=FlowMeta):
listener_result = router_result_to_feedback.get(
    str(current_trigger), result
)
tasks = [
    self._execute_single_listener(listener_name, listener_result)
    for listener_name in listeners_triggered
]
await asyncio.gather(*tasks)
racing_group = self._get_racing_group_for_listeners(
    listeners_triggered
)
if racing_group:
    racing_members, _ = racing_group
    other_listeners = [
        name
        for name in listeners_triggered
        if name not in racing_members
    ]
    await self._execute_racing_listeners(
        racing_members, other_listeners, listener_result
    )
else:
    tasks = [
        self._execute_single_listener(
            listener_name, listener_result
        )
        for listener_name in listeners_triggered
    ]
    await asyncio.gather(*tasks)

if current_trigger in router_results:
    # Find start methods triggered by this router result
@@ -1745,14 +2034,16 @@ class Flow(Generic[T], metaclass=FlowMeta):
should_trigger = current_trigger in all_methods

if should_trigger:
    # Only execute if this is a cycle (method was already completed)
    # Execute conditional start method triggered by router result
    if method_name in self._completed_methods:
        # For router-triggered start methods in cycles, temporarily clear resumption flag
        # to allow cyclic execution
        # For cyclic re-execution, temporarily clear resumption flag
        was_resuming = self._is_execution_resuming
        self._is_execution_resuming = False
        await self._execute_start_method(method_name)
        self._is_execution_resuming = was_resuming
    else:
        # First-time execution of conditional start
        await self._execute_start_method(method_name)

def _evaluate_condition(
    self,
@@ -1850,8 +2141,21 @@ class Flow(Generic[T], metaclass=FlowMeta):
condition_type, methods = condition_data

if condition_type == OR_CONDITION:
    if trigger_method in methods:
        triggered.append(listener_name)
    # Only trigger multi-source OR listeners (or_(A, B, C)) once - skip if already fired
    # Simple single-method listeners fire every time their trigger occurs
    # Routers also fire every time - they're decision points
    has_multiple_triggers = len(methods) > 1
    should_check_fired = has_multiple_triggers and not is_router

    if (
        not should_check_fired
        or listener_name not in self._fired_or_listeners
    ):
        if trigger_method in methods:
            triggered.append(listener_name)
            # Only track multi-source OR listeners (not single-method or routers)
            if should_check_fired:
                self._fired_or_listeners.add(listener_name)
elif condition_type == AND_CONDITION:
    pending_key = PendingListenerKey(listener_name)
    if pending_key not in self._pending_and_listeners:
@@ -1864,10 +2168,26 @@ class Flow(Generic[T], metaclass=FlowMeta):
    self._pending_and_listeners.pop(pending_key, None)

elif is_flow_condition_dict(condition_data):
    # For complex conditions, check if top-level is OR and track accordingly
    top_level_type = condition_data.get("type", OR_CONDITION)
    is_or_based = top_level_type == OR_CONDITION

    # Only track multi-source OR conditions (multiple sub-conditions), not routers
    sub_conditions = condition_data.get("conditions", [])
    has_multiple_triggers = is_or_based and len(sub_conditions) > 1
    should_check_fired = has_multiple_triggers and not is_router

    # Skip compound OR-based listeners that have already fired
    if should_check_fired and listener_name in self._fired_or_listeners:
        continue

    if self._evaluate_condition(
        condition_data, trigger_method, listener_name
    ):
        triggered.append(listener_name)
        # Track compound OR-based listeners so they only fire once
        if should_check_fired:
            self._fired_or_listeners.add(listener_name)

return triggered

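The fire-once rule above can be hard to read inline with the diff. A small self-contained sketch of the same semantics (the names are illustrative, not the Flow internals themselves):

```python
# Multi-source OR listeners (or_(A, B, C)) fire once; single-source
# listeners and routers fire on every matching trigger.
fired_or_listeners: set[str] = set()

def should_fire(listener: str, triggers: list[str], trigger: str, is_router: bool) -> bool:
    multi_source = len(triggers) > 1
    check_fired = multi_source and not is_router
    if check_fired and listener in fired_or_listeners:
        return False  # already consumed its one firing for this cycle
    if trigger in triggers:
        if check_fired:
            fired_or_listeners.add(listener)
        return True
    return False

assert should_fire("merge", ["a", "b"], "a", is_router=False) is True
assert should_fire("merge", ["a", "b"], "b", is_router=False) is False  # fired already
assert should_fire("route", ["a", "b"], "b", is_router=True) is True    # routers always fire
```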
@@ -1896,9 +2216,22 @@ class Flow(Generic[T], metaclass=FlowMeta):
if self._is_execution_resuming:
    # During resumption, skip execution but continue listeners
    await self._execute_listeners(listener_name, None)

    # For routers, also check if any conditional starts they triggered are completed
    # If so, continue their chains
    if listener_name in self._routers:
        for start_method_name in self._start_methods:
            if (
                start_method_name in self._listeners
                and start_method_name in self._completed_methods
            ):
                # This conditional start was executed, continue its chain
                await self._execute_start_method(start_method_name)
    return
# For cyclic flows, clear from completed to allow re-execution
self._completed_methods.discard(listener_name)
# Also clear from fired OR listeners for cyclic flows
self._discard_or_listener(listener_name)

try:
    method = self._methods[listener_name]
@@ -1931,11 +2264,25 @@ class Flow(Generic[T], metaclass=FlowMeta):
    if self.last_human_feedback is not None
    else listener_result
)
tasks = [
    self._execute_single_listener(name, feedback_result)
    for name in listeners_for_result
]
await asyncio.gather(*tasks)
racing_group = self._get_racing_group_for_listeners(
    listeners_for_result
)
if racing_group:
    racing_members, _ = racing_group
    other_listeners = [
        name
        for name in listeners_for_result
        if name not in racing_members
    ]
    await self._execute_racing_listeners(
        racing_members, other_listeners, feedback_result
    )
else:
    tasks = [
        self._execute_single_listener(name, feedback_result)
        for name in listeners_for_result
    ]
    await asyncio.gather(*tasks)

except Exception as e:
    # Don't log HumanFeedbackPending as an error - it's expected control flow
@@ -2049,7 +2396,7 @@ class Flow(Generic[T], metaclass=FlowMeta):
from crewai.llms.base_llm import BaseLLM as BaseLLMClass
from crewai.utilities.i18n import get_i18n

# Get or create LLM instance
llm_instance: BaseLLMClass
if isinstance(llm, str):
    llm_instance = LLM(model=llm)
elif isinstance(llm, BaseLLMClass):
@@ -2084,26 +2431,23 @@ class Flow(Generic[T], metaclass=FlowMeta):
    response_model=FeedbackOutcome,
)

# Parse the response - LLM returns JSON string when using response_model
if isinstance(response, str):
    import json

    try:
        parsed = json.loads(response)
        return parsed.get("outcome", outcomes[0])
        return str(parsed.get("outcome", outcomes[0]))
    except json.JSONDecodeError:
        # Not valid JSON, might be raw outcome string
        response_clean = response.strip()
        for outcome in outcomes:
            if outcome.lower() == response_clean.lower():
                return outcome
        return outcomes[0]
elif isinstance(response, FeedbackOutcome):
    return response.outcome
    return str(response.outcome)
elif hasattr(response, "outcome"):
    return response.outcome
    return str(response.outcome)
else:
    # Unexpected type, fall back to first outcome
    logger.warning(f"Unexpected response type: {type(response)}")
    return outcomes[0]


@@ -61,7 +61,7 @@ class PersistenceDecorator:
@classmethod
def persist_state(
    cls,
    flow_instance: Flow,
    flow_instance: Flow[Any],
    method_name: str,
    persistence_instance: FlowPersistence,
    verbose: bool = False,
@@ -90,7 +97,13 @@ class PersistenceDecorator:
flow_uuid: str | None = None
if isinstance(state, dict):
    flow_uuid = state.get("id")
elif isinstance(state, BaseModel):
elif hasattr(state, "_unwrap"):
    unwrapped = state._unwrap()
    if isinstance(unwrapped, dict):
        flow_uuid = unwrapped.get("id")
    else:
        flow_uuid = getattr(unwrapped, "id", None)
elif isinstance(state, BaseModel) or hasattr(state, "id"):
    flow_uuid = getattr(state, "id", None)

if not flow_uuid:
@@ -104,10 +110,11 @@ class PersistenceDecorator:
logger.info(LOG_MESSAGES["save_state"].format(flow_uuid))

try:
    state_data = state._unwrap() if hasattr(state, "_unwrap") else state
    persistence_instance.save_state(
        flow_uuid=flow_uuid,
        method_name=method_name,
        state_data=state,
        state_data=state_data,
    )
except Exception as e:
    error_msg = LOG_MESSAGES["save_error"].format(method_name, str(e))
@@ -126,7 +133,9 @@ class PersistenceDecorator:
    raise ValueError(error_msg) from e


def persist(persistence: FlowPersistence | None = None, verbose: bool = False):
def persist(
    persistence: FlowPersistence | None = None, verbose: bool = False
) -> Callable[[type | Callable[..., T]], type | Callable[..., T]]:
    """Decorator to persist flow state.

    This decorator can be applied at either the class level or method level.
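For orientation, a minimal usage sketch of the decorator this hunk retypes; the import paths are assumed from this diff's module layout and the state shape is illustrative:

```python
from crewai.flow.flow import Flow, start
from crewai.flow.persistence import persist


@persist()  # class level: state is saved after every flow method
class PollFlow(Flow):
    @start()
    def begin(self):
        # Unstructured dict state; the "id" key is what persist_state looks up.
        self.state["count"] = self.state.get("count", 0) + 1
        return self.state["count"]
```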
@@ -189,8 +198,8 @@ def persist(persistence: FlowPersistence | None = None, verbose: bool = False):
if asyncio.iscoroutinefunction(method):
    # Create a closure to capture the current name and method
    def create_async_wrapper(
        method_name: str, original_method: Callable
    ):
        method_name: str, original_method: Callable[..., Any]
    ) -> Callable[..., Any]:
        @functools.wraps(original_method)
        async def method_wrapper(
            self: Any, *args: Any, **kwargs: Any
@@ -221,8 +230,8 @@ def persist(persistence: FlowPersistence | None = None, verbose: bool = False):
else:
    # Create a closure to capture the current name and method
    def create_sync_wrapper(
        method_name: str, original_method: Callable
    ):
        method_name: str, original_method: Callable[..., Any]
    ) -> Callable[..., Any]:
        @functools.wraps(original_method)
        def method_wrapper(self: Any, *args: Any, **kwargs: Any) -> Any:
            result = original_method(self, *args, **kwargs)
@@ -268,7 +277,7 @@ def persist(persistence: FlowPersistence | None = None, verbose: bool = False):
PersistenceDecorator.persist_state(
    flow_instance, method.__name__, actual_persistence, verbose
)
return result
return cast(T, result)

for attr in [
    "__is_start_method__",

@@ -10,6 +10,7 @@ from typing import (
    get_origin,
)
import uuid
import warnings

from pydantic import (
    UUID4,
@@ -80,6 +81,11 @@ class LiteAgent(FlowTrackable, BaseModel):
"""
A lightweight agent that can process messages and use tools.

.. deprecated::
    LiteAgent is deprecated and will be removed in a future version.
    Use ``Agent().kickoff(messages)`` instead, which provides the same
    functionality with additional features like memory and knowledge support.

This agent is simpler than the full Agent class, focusing on direct execution
rather than task delegation. It's designed to be used for simple interactions
where a full crew is not needed.
@@ -164,6 +170,18 @@ class LiteAgent(FlowTrackable, BaseModel):
    default_factory=get_after_llm_call_hooks
)

@model_validator(mode="after")
def emit_deprecation_warning(self) -> Self:
    """Emit deprecation warning for LiteAgent usage."""
    warnings.warn(
        "LiteAgent is deprecated and will be removed in a future version. "
        "Use Agent().kickoff(messages) instead, which provides the same "
        "functionality with additional features like memory and knowledge support.",
        DeprecationWarning,
        stacklevel=2,
    )
    return self

@model_validator(mode="after")
def setup_llm(self) -> Self:
    """Set up the LLM and other components after initialization."""
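A short migration sketch for the deprecation the hunk above introduces; agent fields and the message argument shape are illustrative, taken only from the warning text:

```python
from crewai import Agent

agent = Agent(
    role="Researcher",
    goal="Answer questions concisely",
    backstory="A focused research assistant.",
)

# Replaces LiteAgent: direct message execution, no crew required.
result = agent.kickoff("What is prompt caching?")
print(result)
```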
@@ -66,6 +66,7 @@ if TYPE_CHECKING:
    from litellm.utils import supports_response_schema

    from crewai.agent.core import Agent
    from crewai.files import FileInput, UploadCache
    from crewai.llms.hooks.base import BaseInterceptor
    from crewai.llms.providers.anthropic.completion import AnthropicThinkingConfig
    from crewai.task import Task
@@ -683,7 +684,7 @@ class LLM(BaseLLM):
    "temperature": self.temperature,
    "top_p": self.top_p,
    "n": self.n,
    "stop": self.stop,
    "stop": self.stop or None,
    "max_tokens": self.max_tokens or self.max_completion_tokens,
    "presence_penalty": self.presence_penalty,
    "frequency_penalty": self.frequency_penalty,
@@ -931,7 +932,6 @@ class LLM(BaseLLM):
self._handle_streaming_callbacks(callbacks, usage_info, last_chunk)

if not tool_calls or not available_functions:

    if response_model and self.is_litellm:
        instructor_instance = InternalInstructor(
            content=full_response,
@@ -1144,8 +1144,12 @@ class LLM(BaseLLM):
if response_model:
    params["response_model"] = response_model
response = litellm.completion(**params)

if hasattr(response,"usage") and not isinstance(response.usage, type) and response.usage:

if (
    hasattr(response, "usage")
    and not isinstance(response.usage, type)
    and response.usage
):
    usage_info = response.usage
    self._track_token_usage_internal(usage_info)

@@ -1273,7 +1277,11 @@ class LLM(BaseLLM):
params["response_model"] = response_model
response = await litellm.acompletion(**params)

if hasattr(response,"usage") and not isinstance(response.usage, type) and response.usage:
if (
    hasattr(response, "usage")
    and not isinstance(response.usage, type)
    and response.usage
):
    usage_info = response.usage
    self._track_token_usage_internal(usage_info)

@@ -1363,7 +1371,7 @@ class LLM(BaseLLM):
"""
full_response = ""
chunk_count = 0


usage_info = None

accumulated_tool_args: defaultdict[int, AccumulatedToolArgs] = defaultdict(
@@ -2205,3 +2213,107 @@ class LLM(BaseLLM):
    stop=copy.deepcopy(self.stop, memo) if self.stop else None,
    **filtered_params,
)

def supports_multimodal(self) -> bool:
    """Check if the model supports multimodal inputs.

    For litellm, check common vision-enabled model prefixes.

    Returns:
        True if the model likely supports images.
    """
    vision_prefixes = (
        "gpt-4o",
        "gpt-4-turbo",
        "gpt-4-vision",
        "gpt-4.1",
        "claude-3",
        "claude-4",
        "gemini",
    )
    model_lower = self.model.lower()
    return any(
        model_lower.startswith(p) or f"/{p}" in model_lower for p in vision_prefixes
    )

def supported_multimodal_content_types(self) -> list[str]:
    """Get content types supported for multimodal input.

    Determines supported types based on the underlying model.

    Returns:
        List of supported MIME type prefixes.
    """
    if not self.supports_multimodal():
        return []

    model_lower = self.model.lower()

    if "gemini" in model_lower:
        return ["image/", "audio/", "video/", "application/pdf", "text/"]
    if "claude-3" in model_lower or "claude-4" in model_lower:
        return ["image/", "application/pdf"]
    return ["image/"]

def format_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Format files as multimodal content blocks for litellm.

    Uses OpenAI-compatible format which litellm translates to provider format.
    Uses FileResolver for consistent base64 encoding.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache (not used by litellm but kept for interface consistency).

    Returns:
        List of content blocks in OpenAI's expected format.
    """
    import base64

    from crewai.files import (
        FileResolver,
        FileResolverConfig,
        InlineBase64,
    )

    if not self.supports_multimodal():
        return []

    content_blocks: list[dict[str, Any]] = []
    supported_types = self.supported_multimodal_content_types()

    # LiteLLM uses OpenAI-compatible format
    config = FileResolverConfig(prefer_upload=False)
    resolver = FileResolver(config=config, upload_cache=upload_cache)

    for file_input in files.values():
        content_type = file_input.content_type
        if not any(content_type.startswith(t) for t in supported_types):
            continue

        resolved = resolver.resolve(file_input, "openai")

        if isinstance(resolved, InlineBase64):
            content_blocks.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{resolved.content_type};base64,{resolved.data}"
                    },
                }
            )
        else:
            # Fallback to direct base64 encoding
            data = base64.b64encode(file_input.read()).decode("ascii")
            content_blocks.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{content_type};base64,{data}"},
                }
            )

    return content_blocks

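A quick capability-check sketch against the litellm path added above. FileInput construction is deliberately omitted (its API is not shown in this hunk); only the detection calls and the output shape come from the diff:

```python
from crewai import LLM

llm = LLM(model="gpt-4o")
assert llm.supports_multimodal()
assert llm.supported_multimodal_content_types() == ["image/"]

# For an image file, format_multimodal_content yields OpenAI-style blocks:
# [{"type": "image_url", "image_url": {"url": "data:image/png;base64,..."}}]
```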
@@ -33,6 +33,7 @@ from crewai.types.usage_metrics import UsageMetrics

if TYPE_CHECKING:
    from crewai.agent.core import Agent
    from crewai.files import FileInput, UploadCache
    from crewai.task import Task
    from crewai.tools.base_tool import BaseTool
    from crewai.utilities.types import LLMMessage
@@ -280,6 +281,73 @@ class BaseLLM(ABC):
    # Default implementation - subclasses should override with model-specific values
    return DEFAULT_CONTEXT_WINDOW_SIZE

def supports_multimodal(self) -> bool:
    """Check if the LLM supports multimodal inputs.

    Returns:
        True if the LLM supports images, PDFs, audio, or video.
    """
    return False

def supported_multimodal_content_types(self) -> list[str]:
    """Get the content types supported by this LLM for multimodal input.

    Returns:
        List of supported MIME type prefixes (e.g., ["image/", "application/pdf"]).
    """
    return []

def format_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Format files as multimodal content blocks for the LLM.

    Subclasses should override this to provide provider-specific formatting.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for tracking uploaded files.

    Returns:
        List of content blocks in the provider's expected format.
    """
    return []

async def aformat_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Async format files as multimodal content blocks for the LLM.

    Default implementation calls the sync version. Subclasses should
    override to use async file resolution for parallel processing.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for tracking uploaded files.

    Returns:
        List of content blocks in the provider's expected format.
    """
    return self.format_multimodal_content(files, upload_cache)

def format_text_content(self, text: str) -> dict[str, Any]:
    """Format text as a content block for the LLM.

    Default implementation uses OpenAI/Anthropic format.
    Subclasses should override for provider-specific formatting.

    Args:
        text: The text content to format.

    Returns:
        A content block in the provider's expected format.
    """
    return {"type": "text", "text": text}

# Common helper methods for native SDK implementations

def _emit_call_started_event(
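A sketch of a custom provider opting into the multimodal hooks defined above. The class name and block shape are illustrative, and the other abstract members of BaseLLM (such as the completion call itself) are omitted for brevity:

```python
from typing import Any

from crewai.llms.base_llm import BaseLLM


class MyVisionLLM(BaseLLM):
    # NOTE: abstract completion methods omitted; this is a shape sketch only.
    def supports_multimodal(self) -> bool:
        return True

    def supported_multimodal_content_types(self) -> list[str]:
        return ["image/"]

    def format_multimodal_content(self, files, upload_cache=None) -> list[dict[str, Any]]:
        # Return whatever block shape your provider's API expects.
        return [{"type": "image", "name": name} for name in files]
```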
@@ -1,5 +1,6 @@
from __future__ import annotations

import base64
import json
import logging
import os
@@ -19,8 +20,11 @@ from crewai.utilities.types import LLMMessage


if TYPE_CHECKING:
    from crewai.files import FileInput, UploadCache
    from crewai.llms.hooks.base import BaseInterceptor

DEFAULT_CACHE_TTL = "ephemeral"

try:
    from anthropic import Anthropic, AsyncAnthropic
    from anthropic.types import Message, TextBlock, ThinkingBlock, ToolUseBlock
@@ -1231,3 +1235,242 @@ class AnthropicCompletion(BaseLLM):
        "total_tokens": input_tokens + output_tokens,
    }
    return {"total_tokens": 0}

def supports_multimodal(self) -> bool:
    """Check if the model supports multimodal inputs.

    All Claude 3+ models support vision and PDFs.

    Returns:
        True if the model supports images and PDFs.
    """
    return "claude-3" in self.model.lower() or "claude-4" in self.model.lower()

def supported_multimodal_content_types(self) -> list[str]:
    """Get content types supported by Anthropic for multimodal input.

    Returns:
        List of supported MIME type prefixes.
    """
    if not self.supports_multimodal():
        return []
    return ["image/", "application/pdf"]

def format_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
    enable_caching: bool = True,
    cache_ttl: str | None = None,
) -> list[dict[str, Any]]:
    """Format files as Anthropic multimodal content blocks.

    Anthropic supports both base64 inline format and file references via Files API.
    Uses FileResolver to determine the best delivery method based on file size.
    Supports prompt caching to reduce costs and latency for repeated file usage.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for tracking uploaded files.
        enable_caching: Whether to add cache_control markers (default: True).
        cache_ttl: Cache TTL - "ephemeral" (5min) or "1h" (1hr for supported models).

    Returns:
        List of content blocks in Anthropic's expected format.
    """
    if not self.supports_multimodal():
        return []

    from crewai.files import (
        FileReference,
        FileResolver,
        FileResolverConfig,
        InlineBase64,
    )

    content_blocks: list[dict[str, Any]] = []
    supported_types = self.supported_multimodal_content_types()

    config = FileResolverConfig(prefer_upload=False)
    resolver = FileResolver(config=config, upload_cache=upload_cache)

    file_list = list(files.values())
    num_files = len(file_list)

    for i, file_input in enumerate(file_list):
        content_type = file_input.content_type
        if not any(content_type.startswith(t) for t in supported_types):
            continue

        resolved = resolver.resolve(file_input, "anthropic")
        block: dict[str, Any] = {}

        if isinstance(resolved, FileReference):
            if content_type.startswith("image/"):
                block = {
                    "type": "image",
                    "source": {
                        "type": "file",
                        "file_id": resolved.file_id,
                    },
                }
            elif content_type == "application/pdf":
                block = {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": resolved.file_id,
                    },
                }
        elif isinstance(resolved, InlineBase64):
            if content_type.startswith("image/"):
                block = {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": resolved.content_type,
                        "data": resolved.data,
                    },
                }
            elif content_type == "application/pdf":
                block = {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": resolved.content_type,
                        "data": resolved.data,
                    },
                }
        else:
            data = base64.b64encode(file_input.read()).decode("ascii")
            if content_type.startswith("image/"):
                block = {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": content_type,
                        "data": data,
                    },
                }
            elif content_type == "application/pdf":
                block = {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": content_type,
                        "data": data,
                    },
                }

        if block and enable_caching and i == num_files - 1:
            cache_control: dict[str, str] = {"type": cache_ttl or DEFAULT_CACHE_TTL}
            block["cache_control"] = cache_control

        if block:
            content_blocks.append(block)

    return content_blocks

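For reference, the shape the loop above produces for the last supported file, with the prompt-caching marker attached (values illustrative):

```python
block = {
    "type": "document",
    "source": {
        "type": "base64",
        "media_type": "application/pdf",
        "data": "<base64 bytes>",
    },
    # Only the final block gets this marker; "ephemeral" is the default TTL,
    # "1h" is available on supported models.
    "cache_control": {"type": "ephemeral"},
}
```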
async def aformat_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
    enable_caching: bool = True,
    cache_ttl: str | None = None,
) -> list[dict[str, Any]]:
    """Async format files as Anthropic multimodal content blocks.

    Uses parallel file resolution for improved performance with multiple files.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for tracking uploaded files.
        enable_caching: Whether to add cache_control markers (default: True).
        cache_ttl: Cache TTL - "ephemeral" (5min) or "1h" (1hr for supported models).

    Returns:
        List of content blocks in Anthropic's expected format.
    """
    if not self.supports_multimodal():
        return []

    from crewai.files import (
        FileReference,
        FileResolver,
        FileResolverConfig,
        InlineBase64,
    )

    supported_types = self.supported_multimodal_content_types()

    supported_files = {
        name: f
        for name, f in files.items()
        if any(f.content_type.startswith(t) for t in supported_types)
    }

    if not supported_files:
        return []

    config = FileResolverConfig(prefer_upload=False)
    resolver = FileResolver(config=config, upload_cache=upload_cache)
    resolved_files = await resolver.aresolve_files(supported_files, "anthropic")

    content_blocks: list[dict[str, Any]] = []
    num_files = len(resolved_files)
    file_names = list(supported_files.keys())

    for i, name in enumerate(file_names):
        if name not in resolved_files:
            continue

        resolved = resolved_files[name]
        file_input = supported_files[name]
        content_type = file_input.content_type
        block: dict[str, Any] = {}

        if isinstance(resolved, FileReference):
            if content_type.startswith("image/"):
                block = {
                    "type": "image",
                    "source": {
                        "type": "file",
                        "file_id": resolved.file_id,
                    },
                }
            elif content_type == "application/pdf":
                block = {
                    "type": "document",
                    "source": {
                        "type": "file",
                        "file_id": resolved.file_id,
                    },
                }
        elif isinstance(resolved, InlineBase64):
            if content_type.startswith("image/"):
                block = {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": resolved.content_type,
                        "data": resolved.data,
                    },
                }
            elif content_type == "application/pdf":
                block = {
                    "type": "document",
                    "source": {
                        "type": "base64",
                        "media_type": resolved.content_type,
                        "data": resolved.data,
                    },
                }

        if block and enable_caching and i == num_files - 1:
            cache_control: dict[str, str] = {"type": cache_ttl or DEFAULT_CACHE_TTL}
            block["cache_control"] = cache_control

        if block:
            content_blocks.append(block)

    return content_blocks

@@ -1,5 +1,6 @@
from __future__ import annotations

import base64
import json
import logging
import os
@@ -17,6 +18,7 @@ from crewai.utilities.types import LLMMessage


if TYPE_CHECKING:
    from crewai.files import FileInput, UploadCache
    from crewai.llms.hooks.base import BaseInterceptor


@@ -1016,3 +1018,136 @@ class AzureCompletion(BaseLLM):
async def __aexit__(self, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
    """Async context manager exit."""
    await self.aclose()

def supports_multimodal(self) -> bool:
    """Check if the model supports multimodal inputs.

    Azure OpenAI vision-enabled models include GPT-4o and GPT-4 Turbo with Vision.

    Returns:
        True if the model supports images.
    """
    vision_models = ("gpt-4o", "gpt-4-turbo", "gpt-4-vision", "gpt-4v")
    return any(self.model.lower().startswith(m) for m in vision_models)

def supported_multimodal_content_types(self) -> list[str]:
    """Get content types supported by Azure for multimodal input.

    Returns:
        List of supported MIME type prefixes.
    """
    if not self.supports_multimodal():
        return []
    return ["image/"]

def format_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Format files as Azure OpenAI multimodal content blocks.

    Azure OpenAI uses the same image_url format as OpenAI.
    Uses FileResolver for consistent base64 encoding.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache (not used by Azure but kept for interface consistency).

    Returns:
        List of content blocks in Azure OpenAI's expected format.
    """
    if not self.supports_multimodal():
        return []

    from crewai.files import (
        FileResolver,
        FileResolverConfig,
        InlineBase64,
    )

    content_blocks: list[dict[str, Any]] = []
    supported_types = self.supported_multimodal_content_types()

    # Azure doesn't support file uploads for images, so just use inline
    config = FileResolverConfig(prefer_upload=False)
    resolver = FileResolver(config=config, upload_cache=upload_cache)

    for file_input in files.values():
        content_type = file_input.content_type
        if not any(content_type.startswith(t) for t in supported_types):
            continue

        resolved = resolver.resolve(file_input, "azure")

        if isinstance(resolved, InlineBase64):
            content_blocks.append(
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:{resolved.content_type};base64,{resolved.data}"
                    },
                }
            )
        else:
            # Fallback to direct base64 encoding
            data = base64.b64encode(file_input.read()).decode("ascii")
            content_blocks.append(
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{content_type};base64,{data}"},
                }
            )

    return content_blocks

async def aformat_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Async format files as Azure OpenAI multimodal content blocks.

    Uses parallel file resolution for improved performance with multiple files.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache (not used by Azure but kept for interface consistency).

    Returns:
        List of content blocks in Azure OpenAI's expected format.
    """
    if not self.supports_multimodal():
        return []

    from crewai.files import (
        FileResolver,
        FileResolverConfig,
        InlineBase64,
    )

    supported_types = self.supported_multimodal_content_types()

    supported_files = {
        name: f
        for name, f in files.items()
        if any(f.content_type.startswith(t) for t in supported_types)
    }

    if not supported_files:
        return []

    config = FileResolverConfig(prefer_upload=False)
    resolver = FileResolver(config=config, upload_cache=upload_cache)
    resolved_files = await resolver.aresolve_files(supported_files, "azure")

    return [
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:{resolved.content_type};base64,{resolved.data}"
            },
        }
        for resolved in resolved_files.values()
        if isinstance(resolved, InlineBase64)
    ]

@@ -32,6 +32,7 @@ if TYPE_CHECKING:
        ToolTypeDef,
    )

    from crewai.files import FileInput, UploadCache
    from crewai.llms.hooks.base import BaseInterceptor


@@ -1450,3 +1451,372 @@ class BedrockCompletion(BaseLLM):

    # Default context window size
    return int(8192 * CONTEXT_WINDOW_USAGE_RATIO)

def supports_multimodal(self) -> bool:
    """Check if the model supports multimodal inputs.

    Claude 3+ and Nova Lite/Pro/Premier on Bedrock support vision.

    Returns:
        True if the model supports images.
    """
    model_lower = self.model.lower()
    vision_models = (
        "anthropic.claude-3",
        "amazon.nova-lite",
        "amazon.nova-pro",
        "amazon.nova-premier",
        "us.amazon.nova-lite",
        "us.amazon.nova-pro",
        "us.amazon.nova-premier",
    )
    return any(model_lower.startswith(m) for m in vision_models)

def _is_nova_model(self) -> bool:
    """Check if the model is an Amazon Nova model.

    Only Nova models support S3 links for multimedia.

    Returns:
        True if the model is a Nova model.
    """
    model_lower = self.model.lower()
    return "amazon.nova-" in model_lower

def supported_multimodal_content_types(self) -> list[str]:
    """Get content types supported by Bedrock for multimodal input.

    Returns:
        List of supported MIME type prefixes.
    """
    if not self.supports_multimodal():
        return []

    types = ["image/png", "image/jpeg", "image/gif", "image/webp"]

    if self._is_nova_model():
        types.extend(
            [
                "application/pdf",
                "text/csv",
                "text/plain",
                "text/markdown",
                "text/html",
                "application/msword",
                "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
                "application/vnd.ms-excel",
                "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
                "video/mp4",
                "video/quicktime",
                "video/x-matroska",
                "video/webm",
                "video/x-flv",
                "video/mpeg",
                "video/x-ms-wmv",
                "video/3gpp",
            ]
        )
    else:
        types.append("application/pdf")

    return types

def _get_document_format(self, content_type: str) -> str | None:
    """Map content type to Bedrock document format.

    Args:
        content_type: MIME type of the document.

    Returns:
        Bedrock format string or None if unsupported.
    """
    format_map = {
        "application/pdf": "pdf",
        "text/csv": "csv",
        "text/plain": "txt",
        "text/markdown": "md",
        "text/html": "html",
        "application/msword": "doc",
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
        "application/vnd.ms-excel": "xls",
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    }
    return format_map.get(content_type)

def _get_video_format(self, content_type: str) -> str | None:
    """Map content type to Bedrock video format.

    Args:
        content_type: MIME type of the video.

    Returns:
        Bedrock format string or None if unsupported.
    """
    format_map = {
        "video/mp4": "mp4",
        "video/quicktime": "mov",
        "video/x-matroska": "mkv",
        "video/webm": "webm",
        "video/x-flv": "flv",
        "video/mpeg": "mpeg",
        "video/x-ms-wmv": "wmv",
        "video/3gpp": "three_gp",
    }
    return format_map.get(content_type)

def format_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Format files as Bedrock Converse API multimodal content blocks.

    Bedrock Converse API supports both raw bytes and S3 URI references.
    S3 uploads are only supported by Amazon Nova models.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for S3 uploads.

    Returns:
        List of content blocks in Bedrock's expected format.
    """
    if not self.supports_multimodal():
        return []

    import os

    from crewai.files import (
        FileReference,
        FileResolver,
        FileResolverConfig,
        InlineBytes,
    )

    content_blocks: list[dict[str, Any]] = []
    is_nova = self._is_nova_model()

    s3_bucket = os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
    s3_bucket_owner = os.environ.get("CREWAI_BEDROCK_S3_BUCKET_OWNER")
    prefer_upload = bool(s3_bucket) and is_nova

    config = FileResolverConfig(
        prefer_upload=prefer_upload, use_bytes_for_bedrock=True
    )
    resolver = FileResolver(config=config, upload_cache=upload_cache)

    for name, file_input in files.items():
        content_type = file_input.content_type
        resolved = resolver.resolve(file_input, "bedrock")

        if isinstance(resolved, FileReference) and resolved.file_uri:
            s3_location: dict[str, Any] = {"uri": resolved.file_uri}
            if s3_bucket_owner:
                s3_location["bucketOwner"] = s3_bucket_owner

            if content_type.startswith("image/"):
                media_type = content_type.split("/")[-1]
                if media_type == "jpg":
                    media_type = "jpeg"
                content_blocks.append(
                    {
                        "image": {
                            "format": media_type,
                            "source": {"s3Location": s3_location},
                        }
                    }
                )
            elif content_type.startswith("video/"):
                video_format = self._get_video_format(content_type)
                if video_format:
                    content_blocks.append(
                        {
                            "video": {
                                "format": video_format,
                                "source": {"s3Location": s3_location},
                            }
                        }
                    )
            else:
                doc_format = self._get_document_format(content_type)
                if doc_format:
                    content_blocks.append(
                        {
                            "document": {
                                "name": name,
                                "format": doc_format,
                                "source": {"s3Location": s3_location},
                            }
                        }
                    )
        else:
            if isinstance(resolved, InlineBytes):
                file_bytes = resolved.data
            else:
                file_bytes = file_input.read()

            if content_type.startswith("image/"):
                media_type = content_type.split("/")[-1]
                if media_type == "jpg":
                    media_type = "jpeg"
                content_blocks.append(
                    {
                        "image": {
                            "format": media_type,
                            "source": {"bytes": file_bytes},
                        }
                    }
                )
            elif content_type.startswith("video/"):
                video_format = self._get_video_format(content_type)
                if video_format:
                    content_blocks.append(
                        {
                            "video": {
                                "format": video_format,
                                "source": {"bytes": file_bytes},
                            }
                        }
                    )
            else:
                doc_format = self._get_document_format(content_type)
                if doc_format:
                    content_blocks.append(
                        {
                            "document": {
                                "name": name,
                                "format": doc_format,
                                "source": {"bytes": file_bytes},
                            }
                        }
                    )

    return content_blocks

async def aformat_multimodal_content(
    self,
    files: dict[str, FileInput],
    upload_cache: UploadCache | None = None,
) -> list[dict[str, Any]]:
    """Async format files as Bedrock Converse API multimodal content blocks.

    Uses parallel file resolution. S3 uploads are only supported by Nova models.

    Args:
        files: Dictionary mapping file names to FileInput objects.
        upload_cache: Optional cache for S3 uploads.

    Returns:
        List of content blocks in Bedrock's expected format.
    """
    if not self.supports_multimodal():
        return []

    import os

    from crewai.files import (
        FileReference,
        FileResolver,
        FileResolverConfig,
        InlineBytes,
    )

    is_nova = self._is_nova_model()
    s3_bucket = os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
    s3_bucket_owner = os.environ.get("CREWAI_BEDROCK_S3_BUCKET_OWNER")
    prefer_upload = bool(s3_bucket) and is_nova

    config = FileResolverConfig(
        prefer_upload=prefer_upload, use_bytes_for_bedrock=True
    )
    resolver = FileResolver(config=config, upload_cache=upload_cache)
    resolved_files = await resolver.aresolve_files(files, "bedrock")

    content_blocks: list[dict[str, Any]] = []
    for name, resolved in resolved_files.items():
        file_input = files[name]
        content_type = file_input.content_type

        if isinstance(resolved, FileReference) and resolved.file_uri:
            s3_location: dict[str, Any] = {"uri": resolved.file_uri}
            if s3_bucket_owner:
                s3_location["bucketOwner"] = s3_bucket_owner

            if content_type.startswith("image/"):
                media_type = content_type.split("/")[-1]
                if media_type == "jpg":
                    media_type = "jpeg"
                content_blocks.append(
                    {
                        "image": {
                            "format": media_type,
                            "source": {"s3Location": s3_location},
                        }
                    }
                )
            elif content_type.startswith("video/"):
                video_format = self._get_video_format(content_type)
                if video_format:
                    content_blocks.append(
                        {
                            "video": {
                                "format": video_format,
                                "source": {"s3Location": s3_location},
                            }
                        }
                    )
            else:
                doc_format = self._get_document_format(content_type)
                if doc_format:
                    content_blocks.append(
                        {
                            "document": {
                                "name": name,
                                "format": doc_format,
                                "source": {"s3Location": s3_location},
                            }
                        }
                    )
        else:
            if isinstance(resolved, InlineBytes):
                file_bytes = resolved.data
            else:
                file_bytes = await file_input.aread()

            if content_type.startswith("image/"):
                media_type = content_type.split("/")[-1]
                if media_type == "jpg":
                    media_type = "jpeg"
                content_blocks.append(
                    {
                        "image": {
                            "format": media_type,
                            "source": {"bytes": file_bytes},
                        }
                    }
                )
            elif content_type.startswith("video/"):
                video_format = self._get_video_format(content_type)
                if video_format:
                    content_blocks.append(
                        {
                            "video": {
                                "format": video_format,
                                "source": {"bytes": file_bytes},
                            }
                        }
                    )
            else:
                doc_format = self._get_document_format(content_type)
                if doc_format:
                    content_blocks.append(
                        {
                            "document": {
                                "name": name,
                                "format": doc_format,
                                "source": {"bytes": file_bytes},
                            }
                        }
                    )

    return content_blocks

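A sketch of opting a Nova model into S3 delivery via the environment variables the code above reads; the bucket name and account id are placeholders:

```python
import os

os.environ["CREWAI_BEDROCK_S3_BUCKET"] = "my-media-bucket"      # assumed bucket name
os.environ["CREWAI_BEDROCK_S3_BUCKET_OWNER"] = "123456789012"   # optional account id

# With a Nova model (e.g. "amazon.nova-pro-*"), format_multimodal_content
# then emits {"source": {"s3Location": {...}}} blocks instead of raw bytes;
# non-Nova models always fall back to the bytes path.
```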
@@ -1,5 +1,6 @@
from __future__ import annotations

import base64
import json
import logging
import os
@@ -18,6 +19,10 @@ from crewai.utilities.types import LLMMessage


if TYPE_CHECKING:
    from crewai.files import (
        FileInput,
        UploadCache,
    )
    from crewai.llms.hooks.base import BaseInterceptor


@@ -54,15 +59,21 @@ class GeminiCompletion(BaseLLM):
    safety_settings: dict[str, Any] | None = None,
    client_params: dict[str, Any] | None = None,
    interceptor: BaseInterceptor[Any, Any] | None = None,
    use_vertexai: bool | None = None,
    **kwargs: Any,
):
    """Initialize Google Gemini chat completion client.

    Args:
        model: Gemini model name (e.g., 'gemini-2.0-flash-001', 'gemini-1.5-pro')
        api_key: Google API key (defaults to GOOGLE_API_KEY or GEMINI_API_KEY env var)
        project: Google Cloud project ID (for Vertex AI)
        location: Google Cloud location (for Vertex AI, defaults to 'us-central1')
        api_key: Google API key for Gemini API authentication.
            Defaults to GOOGLE_API_KEY or GEMINI_API_KEY env var.
            NOTE: Cannot be used with Vertex AI (project parameter). Use Gemini API instead.
        project: Google Cloud project ID for Vertex AI with ADC authentication.
            Requires Application Default Credentials (gcloud auth application-default login).
            NOTE: Vertex AI does NOT support API keys, only OAuth2/ADC.
            If both api_key and project are set, api_key takes precedence.
        location: Google Cloud location (for Vertex AI with ADC, defaults to 'us-central1')
        temperature: Sampling temperature (0-2)
        top_p: Nucleus sampling parameter
        top_k: Top-k sampling parameter
@@ -73,6 +84,12 @@ class GeminiCompletion(BaseLLM):
        client_params: Additional parameters to pass to the Google Gen AI Client constructor.
            Supports parameters like http_options, credentials, debug_config, etc.
        interceptor: HTTP interceptor (not yet supported for Gemini).
        use_vertexai: Whether to use Vertex AI instead of Gemini API.
            - True: Use Vertex AI (with ADC or Express mode with API key)
            - False: Use Gemini API (explicitly override env var)
            - None (default): Check GOOGLE_GENAI_USE_VERTEXAI env var
            When using Vertex AI with API key (Express mode), http_options with
            api_version="v1" is automatically configured.
        **kwargs: Additional parameters
    """
    if interceptor is not None:
@@ -95,7 +112,8 @@ class GeminiCompletion(BaseLLM):
self.project = project or os.getenv("GOOGLE_CLOUD_PROJECT")
self.location = location or os.getenv("GOOGLE_CLOUD_LOCATION") or "us-central1"

use_vertexai = os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true"
if use_vertexai is None:
    use_vertexai = os.getenv("GOOGLE_GENAI_USE_VERTEXAI", "").lower() == "true"

self.client = self._initialize_client(use_vertexai)

@@ -146,13 +164,34 @@ class GeminiCompletion(BaseLLM):

Returns:
    Initialized Google Gen AI Client

Note:
    Google Gen AI SDK has two distinct endpoints with different auth requirements:
    - Gemini API (generativelanguage.googleapis.com): Supports API key authentication
    - Vertex AI (aiplatform.googleapis.com): Only supports OAuth2/ADC, NO API keys

    When vertexai=True is set, it routes to aiplatform.googleapis.com which rejects
    API keys. Use Gemini API endpoint for API key authentication instead.
"""
client_params = {}

if self.client_params:
    client_params.update(self.client_params)

if use_vertexai or self.project:
# Determine authentication mode based on available credentials
has_api_key = bool(self.api_key)
has_project = bool(self.project)

if has_api_key and has_project:
    logging.warning(
        "Both API key and project provided. Using API key authentication. "
        "Project/location parameters are ignored when using API keys. "
        "To use Vertex AI with ADC, remove the api_key parameter."
    )
    has_project = False

# Vertex AI with ADC (project without API key)
if (use_vertexai or has_project) and not has_api_key:
    client_params.update(
        {
            "vertexai": True,
@@ -161,12 +200,20 @@ class GeminiCompletion(BaseLLM):
        }
    )

    client_params.pop("api_key", None)

elif self.api_key:
# API key authentication (works with both Gemini API and Vertex AI Express)
elif has_api_key:
    client_params["api_key"] = self.api_key

    client_params.pop("vertexai", None)
    # Vertex AI Express mode: API key + vertexai=True + http_options with api_version="v1"
    # See: https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey
    if use_vertexai:
        client_params["vertexai"] = True
        client_params["http_options"] = types.HttpOptions(api_version="v1")
    else:
        # This ensures we use the Gemini API (generativelanguage.googleapis.com)
        client_params["vertexai"] = False

    # Clean up project/location (not allowed with API key)
    client_params.pop("project", None)
    client_params.pop("location", None)

@@ -175,10 +222,13 @@ class GeminiCompletion(BaseLLM):
    return genai.Client(**client_params)
except Exception as e:
    raise ValueError(
        "Either GOOGLE_API_KEY/GEMINI_API_KEY (for Gemini API) or "
        "GOOGLE_CLOUD_PROJECT (for Vertex AI) must be set"
        "Authentication required. Provide one of:\n"
        " 1. API key via GOOGLE_API_KEY or GEMINI_API_KEY environment variable\n"
        " (use_vertexai=True is optional for Vertex AI with API key)\n"
        " 2. For Vertex AI with ADC: Set GOOGLE_CLOUD_PROJECT and run:\n"
        " gcloud auth application-default login\n"
        " 3. Pass api_key parameter directly to LLM constructor\n"
    ) from e

return genai.Client(**client_params)

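A sketch of the three authentication modes this initializer distinguishes. The module path is assumed by analogy with the Anthropic provider import seen earlier in this diff, and the key/project values are placeholders:

```python
from crewai.llms.providers.gemini.completion import GeminiCompletion

# 1. Gemini API with an API key (client gets vertexai=False under the hood)
llm = GeminiCompletion(model="gemini-2.0-flash", api_key="AIza...")

# 2. Vertex AI Express mode: API key plus use_vertexai=True
#    (http_options with api_version="v1" is configured automatically)
llm = GeminiCompletion(model="gemini-2.0-flash", api_key="AIza...", use_vertexai=True)

# 3. Vertex AI with ADC: project/location, no api_key
#    (requires `gcloud auth application-default login`)
llm = GeminiCompletion(model="gemini-2.0-flash", project="my-gcp-project", location="us-central1")
```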
def _get_client_params(self) -> dict[str, Any]:
|
||||
@@ -202,6 +252,8 @@ class GeminiCompletion(BaseLLM):
|
||||
"location": self.location,
|
||||
}
|
||||
)
|
||||
if self.api_key:
|
||||
params["api_key"] = self.api_key
|
||||
elif self.api_key:
|
||||
params["api_key"] = self.api_key
|
||||
|
||||
@@ -469,17 +521,31 @@ class GeminiCompletion(BaseLLM):
|
||||
role = message["role"]
|
||||
content = message["content"]
|
||||
|
||||
# Convert content to string if it's a list
|
||||
# Build parts list from content
|
||||
parts: list[types.Part] = []
|
||||
if isinstance(content, list):
|
||||
text_content = " ".join(
|
||||
str(item.get("text", "")) if isinstance(item, dict) else str(item)
|
||||
for item in content
|
||||
)
|
||||
for item in content:
|
||||
if isinstance(item, dict):
|
||||
if "text" in item:
|
||||
parts.append(types.Part.from_text(text=str(item["text"])))
|
||||
elif "inlineData" in item:
|
||||
inline = item["inlineData"]
|
||||
parts.append(
|
||||
types.Part.from_bytes(
|
||||
data=base64.b64decode(inline["data"]),
|
||||
mime_type=inline["mimeType"],
|
||||
)
|
||||
)
|
||||
else:
|
||||
parts.append(types.Part.from_text(text=str(item)))
|
||||
else:
|
||||
text_content = str(content) if content else ""
|
||||
parts.append(types.Part.from_text(text=str(content) if content else ""))
|
||||
|
||||
if role == "system":
|
||||
# Extract system instruction - Gemini handles it separately
|
||||
text_content = " ".join(
|
||||
p.text for p in parts if hasattr(p, "text") and p.text
|
||||
)
|
||||
if system_instruction:
|
||||
system_instruction += f"\n\n{text_content}"
|
||||
else:
|
||||
@@ -489,9 +555,7 @@ class GeminiCompletion(BaseLLM):
|
||||
gemini_role = "model" if role == "assistant" else "user"
|
||||
|
||||
# Create Content object
|
||||
gemini_content = types.Content(
|
||||
role=gemini_role, parts=[types.Part.from_text(text=text_content)]
|
||||
)
|
||||
gemini_content = types.Content(role=gemini_role, parts=parts)
|
||||
contents.append(gemini_content)
|
||||
|
||||
return contents, system_instruction
|
||||
@@ -1013,3 +1077,166 @@ class GeminiCompletion(BaseLLM):
            )
        )
        return result

    def supports_multimodal(self) -> bool:
        """Check if the model supports multimodal inputs.

        Gemini models support images, audio, video, and PDFs.

        Returns:
            True if the model supports multimodal inputs.
        """
        return True

    def supported_multimodal_content_types(self) -> list[str]:
        """Get content types supported by Gemini for multimodal input.

        Returns:
            List of supported MIME type prefixes.
        """
        return ["image/", "audio/", "video/", "application/pdf", "text/"]

    def format_multimodal_content(
        self,
        files: dict[str, FileInput],
        upload_cache: UploadCache | None = None,
    ) -> list[dict[str, Any]]:
        """Format files as Gemini multimodal content blocks.

        Gemini supports both inlineData format and file references via File API.
        Uses FileResolver to determine the best delivery method based on file size.

        Args:
            files: Dictionary mapping file names to FileInput objects.
            upload_cache: Optional cache for tracking uploaded files.

        Returns:
            List of content blocks in Gemini's expected format.
        """
        from crewai.files import (
            FileReference,
            FileResolver,
            FileResolverConfig,
            InlineBase64,
        )

        content_blocks: list[dict[str, Any]] = []
        supported_types = self.supported_multimodal_content_types()

        config = FileResolverConfig(prefer_upload=False)
        resolver = FileResolver(config=config, upload_cache=upload_cache)

        for file_input in files.values():
            content_type = file_input.content_type
            if not any(content_type.startswith(t) for t in supported_types):
                continue

            resolved = resolver.resolve(file_input, "gemini")

            if isinstance(resolved, FileReference) and resolved.file_uri:
                # Use file reference format for uploaded files
                content_blocks.append(
                    {
                        "fileData": {
                            "mimeType": resolved.content_type,
                            "fileUri": resolved.file_uri,
                        }
                    }
                )
            elif isinstance(resolved, InlineBase64):
                # Use inline format for smaller files
                content_blocks.append(
                    {
                        "inlineData": {
                            "mimeType": resolved.content_type,
                            "data": resolved.data,
                        }
                    }
                )
            else:
                # Fallback to base64 encoding
                data = base64.b64encode(file_input.read()).decode("ascii")
                content_blocks.append(
                    {
                        "inlineData": {
                            "mimeType": content_type,
                            "data": data,
                        }
                    }
                )

        return content_blocks

    async def aformat_multimodal_content(
        self,
        files: dict[str, FileInput],
        upload_cache: UploadCache | None = None,
    ) -> list[dict[str, Any]]:
        """Async format files as Gemini multimodal content blocks.

        Uses parallel file resolution for improved performance with multiple files.

        Args:
            files: Dictionary mapping file names to FileInput objects.
            upload_cache: Optional cache for tracking uploaded files.

        Returns:
            List of content blocks in Gemini's expected format.
        """
        from crewai.files import (
            FileReference,
            FileResolver,
            FileResolverConfig,
            InlineBase64,
        )

        supported_types = self.supported_multimodal_content_types()

        supported_files = {
            name: f
            for name, f in files.items()
            if any(f.content_type.startswith(t) for t in supported_types)
        }

        if not supported_files:
            return []

        config = FileResolverConfig(prefer_upload=False)
        resolver = FileResolver(config=config, upload_cache=upload_cache)
        resolved_files = await resolver.aresolve_files(supported_files, "gemini")

        content_blocks: list[dict[str, Any]] = []
        for resolved in resolved_files.values():
            if isinstance(resolved, FileReference) and resolved.file_uri:
                content_blocks.append(
                    {
                        "fileData": {
                            "mimeType": resolved.content_type,
                            "fileUri": resolved.file_uri,
                        }
                    }
                )
            elif isinstance(resolved, InlineBase64):
                content_blocks.append(
                    {
                        "inlineData": {
                            "mimeType": resolved.content_type,
                            "data": resolved.data,
                        }
                    }
                )

        return content_blocks

    def format_text_content(self, text: str) -> dict[str, Any]:
        """Format text as a Gemini content block.

        Gemini uses {"text": "..."} format instead of {"type": "text", "text": "..."}.

        Args:
            text: The text content to format.

        Returns:
            A content block in Gemini's expected format.
        """
        return {"text": text}
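To make the shape of these blocks concrete, here is a minimal sketch of calling the new hooks directly. The instance construction and the on-disk file are assumptions for illustration, not part of this diff:

```python
from pathlib import Path

from crewai.files import FilePath

# `llm` is assumed to be a GeminiCompletion instance, obtained however the
# application normally constructs its LLM.
files = {"chart": FilePath(path=Path("chart.png"))}  # hypothetical local file

if llm.supports_multimodal():
    blocks = llm.format_multimodal_content(files)
    # A small PNG resolves inline (prefer_upload=False), e.g.:
    # [{"inlineData": {"mimeType": "image/png", "data": "<base64...>"}}]
```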
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import base64
 from collections.abc import AsyncIterator
 import json
 import logging
@@ -27,6 +28,7 @@ from crewai.utilities.types import LLMMessage

 if TYPE_CHECKING:
     from crewai.agent.core import Agent
+    from crewai.files import FileInput, UploadCache
     from crewai.llms.hooks.base import BaseInterceptor
     from crewai.task import Task
     from crewai.tools.base_tool import BaseTool
@@ -1048,3 +1050,165 @@ class OpenAICompletion(BaseLLM):
            formatted_messages.append(message)

        return formatted_messages

    def supports_multimodal(self) -> bool:
        """Check if the model supports multimodal inputs.

        OpenAI vision-enabled models include GPT-4o, GPT-4.1, and o-series.

        Returns:
            True if the model supports images.
        """
        vision_models = (
            "gpt-4o",
            "gpt-4.1",
            "gpt-4-turbo",
            "gpt-4-vision",
            "o1",
            "o3",
            "o4",
        )
        return any(self.model.lower().startswith(m) for m in vision_models)

    def supported_multimodal_content_types(self) -> list[str]:
        """Get content types supported by OpenAI for multimodal input.

        Returns:
            List of supported MIME type prefixes.
        """
        if not self.supports_multimodal():
            return []
        return ["image/"]

    def format_multimodal_content(
        self,
        files: dict[str, FileInput],
        upload_cache: UploadCache | None = None,
    ) -> list[dict[str, Any]]:
        """Format files as OpenAI multimodal content blocks.

        OpenAI supports both base64 data URLs and file_id references via Files API.
        Uses FileResolver to determine the best delivery method based on file size.

        Args:
            files: Dictionary mapping file names to FileInput objects.
            upload_cache: Optional cache for tracking uploaded files.

        Returns:
            List of content blocks in OpenAI's expected format.
        """
        if not self.supports_multimodal():
            return []

        from crewai.files import (
            FileReference,
            FileResolver,
            FileResolverConfig,
            InlineBase64,
        )

        content_blocks: list[dict[str, Any]] = []
        supported_types = self.supported_multimodal_content_types()

        config = FileResolverConfig(prefer_upload=False)
        resolver = FileResolver(config=config, upload_cache=upload_cache)

        for file_input in files.values():
            content_type = file_input.content_type
            if not any(content_type.startswith(t) for t in supported_types):
                continue

            resolved = resolver.resolve(file_input, "openai")

            if isinstance(resolved, FileReference):
                content_blocks.append(
                    {
                        "type": "file",
                        "file": {
                            "file_id": resolved.file_id,
                        },
                    }
                )
            elif isinstance(resolved, InlineBase64):
                content_blocks.append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:{resolved.content_type};base64,{resolved.data}"
                        },
                    }
                )
            else:
                data = base64.b64encode(file_input.read()).decode("ascii")
                content_blocks.append(
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:{content_type};base64,{data}"},
                    }
                )

        return content_blocks

    async def aformat_multimodal_content(
        self,
        files: dict[str, FileInput],
        upload_cache: UploadCache | None = None,
    ) -> list[dict[str, Any]]:
        """Async format files as OpenAI multimodal content blocks.

        Uses parallel file resolution for improved performance with multiple files.

        Args:
            files: Dictionary mapping file names to FileInput objects.
            upload_cache: Optional cache for tracking uploaded files.

        Returns:
            List of content blocks in OpenAI's expected format.
        """
        if not self.supports_multimodal():
            return []

        from crewai.files import (
            FileReference,
            FileResolver,
            FileResolverConfig,
            InlineBase64,
        )

        supported_types = self.supported_multimodal_content_types()

        supported_files = {
            name: f
            for name, f in files.items()
            if any(f.content_type.startswith(t) for t in supported_types)
        }

        if not supported_files:
            return []

        config = FileResolverConfig(prefer_upload=False)
        resolver = FileResolver(config=config, upload_cache=upload_cache)
        resolved_files = await resolver.aresolve_files(supported_files, "openai")

        content_blocks: list[dict[str, Any]] = []
        for resolved in resolved_files.values():
            if isinstance(resolved, FileReference):
                content_blocks.append(
                    {
                        "type": "file",
                        "file": {
                            "file_id": resolved.file_id,
                        },
                    }
                )
            elif isinstance(resolved, InlineBase64):
                content_blocks.append(
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:{resolved.content_type};base64,{resolved.data}"
                        },
                    }
                )

        return content_blocks
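The inline branch above boils down to a standard base64 data URL. A self-contained sketch of the equivalent block (the file name is hypothetical):

```python
import base64

with open("diagram.png", "rb") as fh:  # hypothetical local file
    data = base64.b64encode(fh.read()).decode("ascii")

# Same shape as the InlineBase64 branch in format_multimodal_content above.
block = {
    "type": "image_url",
    "image_url": {"url": f"data:image/png;base64,{data}"},
}
```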
@@ -37,6 +37,12 @@ from crewai.events.types.task_events import (
     TaskFailedEvent,
     TaskStartedEvent,
 )
+from crewai.files import (
+    FileInput,
+    FilePath,
+    FileSourceInput,
+    normalize_input_files,
+)
 from crewai.security import Fingerprint, SecurityConfig
 from crewai.tasks.output_format import OutputFormat
 from crewai.tasks.task_output import TaskOutput
@@ -44,6 +50,11 @@ from crewai.tools.base_tool import BaseTool
 from crewai.utilities.config import process_config
 from crewai.utilities.constants import NOT_SPECIFIED, _NotSpecified
 from crewai.utilities.converter import Converter, convert_to_model
+from crewai.utilities.file_store import (
+    clear_task_files,
+    get_all_files,
+    store_task_files,
+)
 from crewai.utilities.guardrail import (
     process_guardrail,
 )
@@ -142,6 +153,10 @@ class Task(BaseModel):
         default_factory=list,
         description="Tools the agent is limited to use for this task.",
     )
+    input_files: list[FileSourceInput | FileInput] = Field(
+        default_factory=list,
+        description="List of input files for this task. Accepts paths, bytes, or File objects.",
+    )
     security_config: SecurityConfig = Field(
         default_factory=SecurityConfig,
         description="Security configuration for the task.",
@@ -357,6 +372,21 @@ class Task(BaseModel):
             "may_not_set_field", "This field is not to be set by the user.", {}
         )

+    @field_validator("input_files", mode="before")
+    @classmethod
+    def _normalize_input_files(cls, v: list[Any]) -> list[Any]:
+        """Convert string paths to FilePath objects."""
+        if not v:
+            return v
+
+        result = []
+        for item in v:
+            if isinstance(item, str):
+                result.append(FilePath(path=Path(item)))
+            else:
+                result.append(item)
+        return result
+
     @field_validator("output_file")
     @classmethod
     def output_file_validation(cls, value: str | None) -> str | None:
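A minimal sketch of the field in use; the validator above coerces plain strings into `FilePath` objects. The path and agent details here are hypothetical:

```python
from crewai import Agent, Task

analyst = Agent(
    role="Analyst",
    goal="Summarize documents",
    backstory="A careful reader of reports",
)

task = Task(
    description="Summarize the attached report.",
    expected_output="A one-paragraph summary.",
    agent=analyst,
    # Becomes FilePath(path=Path("reports/q3.pdf")) via the validator above.
    input_files=["reports/q3.pdf"],  # hypothetical path
)
```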
@@ -495,10 +525,10 @@ class Task(BaseModel):
     ) -> None:
         """Execute the task asynchronously with context handling."""
         try:
             result = self._execute_core(agent, context, tools)
             future.set_result(result)
         except Exception as e:
             future.set_exception(e)

     async def aexecute_sync(
         self,
@@ -516,6 +546,7 @@ class Task(BaseModel):
         tools: list[Any] | None,
     ) -> TaskOutput:
         """Run the core execution logic of the task asynchronously."""
+        self._store_input_files()
         try:
             agent = agent or self.agent
             self.agent = agent
@@ -600,6 +631,8 @@ class Task(BaseModel):
             self.end_time = datetime.datetime.now()
             crewai_event_bus.emit(self, TaskFailedEvent(error=str(e), task=self))  # type: ignore[no-untyped-call]
             raise e  # Re-raise the exception after emitting the event
+        finally:
+            clear_task_files(self.id)

     def _execute_core(
         self,
@@ -608,6 +641,7 @@ class Task(BaseModel):
         tools: list[Any] | None,
     ) -> TaskOutput:
         """Run the core execution logic of the task."""
+        self._store_input_files()
         try:
             agent = agent or self.agent
             self.agent = agent
@@ -693,6 +727,8 @@ class Task(BaseModel):
             self.end_time = datetime.datetime.now()
             crewai_event_bus.emit(self, TaskFailedEvent(error=str(e), task=self))  # type: ignore[no-untyped-call]
             raise e  # Re-raise the exception after emitting the event
+        finally:
+            clear_task_files(self.id)

     def prompt(self) -> str:
         """Generates the task prompt with optional markdown formatting.
@@ -715,6 +751,51 @@ class Task(BaseModel):
         if trigger_payload is not None:
             description += f"\n\nTrigger Payload: {trigger_payload}"

+        if self.agent and self.agent.crew:
+            files = get_all_files(self.agent.crew.id, self.id)
+            if files:
+                supported_types: list[str] = []
+                if self.agent.llm and self.agent.llm.supports_multimodal():
+                    supported_types = (
+                        self.agent.llm.supported_multimodal_content_types()
+                    )
+
+                def is_auto_injected(content_type: str) -> bool:
+                    return any(content_type.startswith(t) for t in supported_types)
+
+                auto_injected_files = {
+                    name: f_input
+                    for name, f_input in files.items()
+                    if is_auto_injected(f_input.content_type)
+                }
+                tool_files = {
+                    name: f_input
+                    for name, f_input in files.items()
+                    if not is_auto_injected(f_input.content_type)
+                }
+
+                file_lines: list[str] = []
+
+                if auto_injected_files:
+                    file_lines.append(
+                        "Input files (content already loaded in conversation):"
+                    )
+                    for name, file_input in auto_injected_files.items():
+                        filename = file_input.filename or name
+                        file_lines.append(f'  - "{name}" ({filename})')
+
+                if tool_files:
+                    file_lines.append(
+                        "Available input files (use the name in quotes with read_file tool):"
+                    )
+                    for name, file_input in tool_files.items():
+                        filename = file_input.filename or name
+                        content_type = file_input.content_type
+                        file_lines.append(f'  - "{name}" ({filename}, {content_type})')
+
+                if file_lines:
+                    description += "\n\n" + "\n".join(file_lines)
+
         tasks_slices = [description]

         output = self.i18n.slice("expected_output").format(
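Assuming one auto-injected image and one tool-accessible text file, the lines appended to the task description would look roughly like this (names hypothetical):

```text
Input files (content already loaded in conversation):
  - "chart" (chart.png)

Available input files (use the name in quotes with read_file tool):
  - "notes" (notes.txt, text/plain)
```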
@@ -948,6 +1029,18 @@ Follow these guidelines:
             ) from e
         return

+    def _store_input_files(self) -> None:
+        """Store task input files in the file store.
+
+        Converts input_files list to a named dict and stores under task ID.
+        """
+        if not self.input_files:
+            return
+
+        files_dict = normalize_input_files(self.input_files)
+        if files_dict:
+            store_task_files(self.id, files_dict)
+
     def __repr__(self) -> str:
         return f"Task(description={self.description}, expected_output={self.expected_output})"
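For reference, a sketch of what `_store_input_files` does with a populated field. The key-derivation rule lives in `crewai.files` and is not shown in this diff:

```python
from pathlib import Path

from crewai.files import FilePath, normalize_input_files

files_dict = normalize_input_files([FilePath(path=Path("a.txt"))])
# -> {"<derived name>": FilePath(...)}; the derived keys are whatever
# crewai.files chooses (e.g. based on the filename).
```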
lib/crewai/src/crewai/tools/agent_tools/read_file_tool.py (new file, 78 lines)
@@ -0,0 +1,78 @@
"""Tool for reading input files provided to the crew."""

from __future__ import annotations

import base64
from typing import TYPE_CHECKING

from pydantic import BaseModel, Field, PrivateAttr

from crewai.tools.base_tool import BaseTool


if TYPE_CHECKING:
    from crewai.files import FileInput


class ReadFileToolSchema(BaseModel):
    """Schema for read file tool arguments."""

    file_name: str = Field(..., description="The name of the input file to read")


class ReadFileTool(BaseTool):
    """Tool for reading input files provided to the crew kickoff.

    Provides agents access to files passed via the `files` key in inputs.
    """

    name: str = "read_file"
    description: str = (
        "Read content from an input file by name. "
        "Returns file content as text for text files, or base64 for binary files."
    )
    args_schema: type[BaseModel] = ReadFileToolSchema

    _files: dict[str, FileInput] | None = PrivateAttr(default=None)

    def set_files(self, files: dict[str, FileInput] | None) -> None:
        """Set available input files.

        Args:
            files: Dictionary mapping file names to file inputs.
        """
        self._files = files

    def _run(self, file_name: str, **kwargs: object) -> str:
        """Read an input file by name.

        Args:
            file_name: The name of the file to read.

        Returns:
            File content as text for text files, or base64 encoded for binary.
        """
        if not self._files:
            return "No input files available."

        if file_name not in self._files:
            available = ", ".join(self._files.keys())
            return f"File '{file_name}' not found. Available files: {available}"

        file_input = self._files[file_name]
        content = file_input.read()
        content_type = file_input.content_type
        filename = file_input.filename or file_name

        text_types = (
            "text/",
            "application/json",
            "application/xml",
            "application/x-yaml",
        )

        if any(content_type.startswith(t) for t in text_types):
            return content.decode("utf-8")

        encoded = base64.b64encode(content).decode("ascii")
        return f"[Binary file: {filename} ({content_type})]\nBase64: {encoded}"
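A quick usage sketch, assuming a text file on disk (hypothetical path) and that `FilePath` exposes `read()`/`content_type` as the `FileInput` protocol implies:

```python
from pathlib import Path

from crewai.files import FilePath
from crewai.tools.agent_tools.read_file_tool import ReadFileTool

tool = ReadFileTool()
tool.set_files({"notes": FilePath(path=Path("notes.txt"))})  # hypothetical file

print(tool._run("notes"))    # text/plain -> decoded UTF-8 content
print(tool._run("missing"))  # "File 'missing' not found. Available files: notes"
```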
@@ -1,5 +1,6 @@
 from __future__ import annotations

+import asyncio
 from collections.abc import Callable, Sequence
 import json
 import re
@@ -54,6 +55,23 @@ console = Console()
 _MULTIPLE_NEWLINES: Final[re.Pattern[str]] = re.compile(r"\n+")


+def is_inside_event_loop() -> bool:
+    """Check if code is currently running inside an asyncio event loop.
+
+    This is used to detect when code is being called from within an async context
+    (e.g., inside a Flow). In such cases, callers should return a coroutine
+    instead of executing synchronously to avoid nested event loop errors.
+
+    Returns:
+        True if inside a running event loop, False otherwise.
+    """
+    try:
+        asyncio.get_running_loop()
+        return True
+    except RuntimeError:
+        return False
+
+
 def parse_tools(tools: list[BaseTool]) -> list[CrewStructuredTool]:
     """Parse tools to be used for the task.
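A sketch of the caller-side pattern the docstring describes; the `run_async` name is illustrative, not part of this diff:

```python
import asyncio

def kickoff(run_async):
    # `run_async` stands in for the object's async implementation.
    if is_inside_event_loop():
        # Inside a Flow: hand the coroutine back for the framework to await.
        return run_async()
    # Plain synchronous caller: drive the coroutine to completion ourselves.
    return asyncio.run(run_async())
```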
lib/crewai/src/crewai/utilities/file_store.py (new file, 239 lines)
@@ -0,0 +1,239 @@
"""Global file store for crew and task execution."""

from __future__ import annotations

import asyncio
from collections.abc import Coroutine
import concurrent.futures
from typing import TYPE_CHECKING, TypeVar
from uuid import UUID

from aiocache import Cache  # type: ignore[import-untyped]
from aiocache.serializers import PickleSerializer  # type: ignore[import-untyped]


if TYPE_CHECKING:
    from crewai.files import FileInput

_file_store = Cache(Cache.MEMORY, serializer=PickleSerializer())

T = TypeVar("T")


def _run_sync(coro: Coroutine[None, None, T]) -> T:
    """Run a coroutine synchronously, handling nested event loops.

    If called from within a running event loop, runs the coroutine in a
    separate thread to avoid "cannot run event loop while another is running".

    Args:
        coro: The coroutine to run.

    Returns:
        The result of the coroutine.
    """
    try:
        asyncio.get_running_loop()
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
            future = executor.submit(asyncio.run, coro)
            return future.result()
    except RuntimeError:
        return asyncio.run(coro)


DEFAULT_TTL = 3600

_CREW_PREFIX = "crew:"
_TASK_PREFIX = "task:"


async def astore_files(
    execution_id: UUID,
    files: dict[str, FileInput],
    ttl: int = DEFAULT_TTL,
) -> None:
    """Store files for a crew execution asynchronously.

    Args:
        execution_id: Unique identifier for the crew execution.
        files: Dictionary mapping names to file inputs.
        ttl: Time-to-live in seconds.
    """
    await _file_store.set(f"{_CREW_PREFIX}{execution_id}", files, ttl=ttl)


async def aget_files(execution_id: UUID) -> dict[str, FileInput] | None:
    """Retrieve files for a crew execution asynchronously.

    Args:
        execution_id: Unique identifier for the crew execution.

    Returns:
        Dictionary of files or None if not found.
    """
    result: dict[str, FileInput] | None = await _file_store.get(
        f"{_CREW_PREFIX}{execution_id}"
    )
    return result


async def aclear_files(execution_id: UUID) -> None:
    """Clear files for a crew execution asynchronously.

    Args:
        execution_id: Unique identifier for the crew execution.
    """
    await _file_store.delete(f"{_CREW_PREFIX}{execution_id}")


async def astore_task_files(
    task_id: UUID,
    files: dict[str, FileInput],
    ttl: int = DEFAULT_TTL,
) -> None:
    """Store files for a task execution asynchronously.

    Args:
        task_id: Unique identifier for the task.
        files: Dictionary mapping names to file inputs.
        ttl: Time-to-live in seconds.
    """
    await _file_store.set(f"{_TASK_PREFIX}{task_id}", files, ttl=ttl)


async def aget_task_files(task_id: UUID) -> dict[str, FileInput] | None:
    """Retrieve files for a task execution asynchronously.

    Args:
        task_id: Unique identifier for the task.

    Returns:
        Dictionary of files or None if not found.
    """
    result: dict[str, FileInput] | None = await _file_store.get(
        f"{_TASK_PREFIX}{task_id}"
    )
    return result


async def aclear_task_files(task_id: UUID) -> None:
    """Clear files for a task execution asynchronously.

    Args:
        task_id: Unique identifier for the task.
    """
    await _file_store.delete(f"{_TASK_PREFIX}{task_id}")


async def aget_all_files(
    crew_id: UUID,
    task_id: UUID | None = None,
) -> dict[str, FileInput] | None:
    """Get merged crew and task files asynchronously.

    Task files override crew files with the same name.

    Args:
        crew_id: Unique identifier for the crew execution.
        task_id: Optional task identifier for task-scoped files.

    Returns:
        Merged dictionary of files or None if none found.
    """
    crew_files = await aget_files(crew_id) or {}
    task_files = await aget_task_files(task_id) if task_id else {}

    if not crew_files and not task_files:
        return None

    return {**crew_files, **(task_files or {})}


def store_files(
    execution_id: UUID,
    files: dict[str, FileInput],
    ttl: int = DEFAULT_TTL,
) -> None:
    """Store files for a crew execution.

    Args:
        execution_id: Unique identifier for the crew execution.
        files: Dictionary mapping names to file inputs.
        ttl: Time-to-live in seconds.
    """
    _run_sync(astore_files(execution_id, files, ttl))


def get_files(execution_id: UUID) -> dict[str, FileInput] | None:
    """Retrieve files for a crew execution.

    Args:
        execution_id: Unique identifier for the crew execution.

    Returns:
        Dictionary of files or None if not found.
    """
    return _run_sync(aget_files(execution_id))


def clear_files(execution_id: UUID) -> None:
    """Clear files for a crew execution.

    Args:
        execution_id: Unique identifier for the crew execution.
    """
    _run_sync(aclear_files(execution_id))


def store_task_files(
    task_id: UUID,
    files: dict[str, FileInput],
    ttl: int = DEFAULT_TTL,
) -> None:
    """Store files for a task execution.

    Args:
        task_id: Unique identifier for the task.
        files: Dictionary mapping names to file inputs.
        ttl: Time-to-live in seconds.
    """
    _run_sync(astore_task_files(task_id, files, ttl))


def get_task_files(task_id: UUID) -> dict[str, FileInput] | None:
    """Retrieve files for a task execution.

    Args:
        task_id: Unique identifier for the task.

    Returns:
        Dictionary of files or None if not found.
    """
    return _run_sync(aget_task_files(task_id))


def clear_task_files(task_id: UUID) -> None:
    """Clear files for a task execution.

    Args:
        task_id: Unique identifier for the task.
    """
    _run_sync(aclear_task_files(task_id))


def get_all_files(
    crew_id: UUID,
    task_id: UUID | None = None,
) -> dict[str, FileInput] | None:
    """Get merged crew and task files.

    Task files override crew files with the same name.

    Args:
        crew_id: Unique identifier for the crew execution.
        task_id: Optional task identifier for task-scoped files.

    Returns:
        Merged dictionary of files or None if none found.
    """
    return _run_sync(aget_all_files(crew_id, task_id))
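End to end, the synchronous facade works like this (the stored file value is hypothetical):

```python
from pathlib import Path
from uuid import uuid4

from crewai.files import FilePath
from crewai.utilities.file_store import clear_files, get_all_files, store_files

crew_id = uuid4()
store_files(crew_id, {"report": FilePath(path=Path("reports/q3.pdf"))})

# Task-scoped files with the same name would override the crew-level entry.
merged = get_all_files(crew_id)

clear_files(crew_id)  # entries also expire on their own after DEFAULT_TTL seconds
```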
lib/crewai/src/crewai/utilities/files/__init__.py (new file, 25 lines)
@@ -0,0 +1,25 @@
"""Backwards compatibility re-exports from crewai.files.

Deprecated: Import from crewai.files instead.
"""

import sys
from typing import Any

from typing_extensions import deprecated

import crewai.files as _files


@deprecated("crewai.utilities.files is deprecated. Import from crewai.files instead.")
class _DeprecatedModule:
    """Deprecated module wrapper."""

    def __getattr__(self, name: str) -> Any:
        return getattr(_files, name)

    def __dir__(self) -> list[str]:
        return list(_files.__all__)


sys.modules[__name__] = _DeprecatedModule()  # type: ignore[assignment]
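The effect of the shim, sketched from the caller's side:

```python
# The old path keeps working, routed through _DeprecatedModule.__getattr__,
# and is flagged as deprecated by type checkers via the stubs below:
from crewai.utilities.files import FileInput

# The supported import going forward:
from crewai.files import FileInput
```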
lib/crewai/src/crewai/utilities/files/__init__.pyi (new file, 258 lines)
@@ -0,0 +1,258 @@
"""Type stubs for backwards compatibility re-exports from crewai.files.

.. deprecated::
    Import from crewai.files instead.
"""

from collections.abc import Callable
from datetime import datetime
from pathlib import Path
from typing import Any, Literal

from typing_extensions import deprecated

import crewai.files as _files

FileMode = Literal["strict", "auto", "warn", "chunk"]
ImageExtension = _files.ImageExtension
ImageContentType = _files.ImageContentType
PDFExtension = _files.PDFExtension
PDFContentType = _files.PDFContentType
TextExtension = _files.TextExtension
TextContentType = _files.TextContentType
AudioExtension = _files.AudioExtension
AudioContentType = _files.AudioContentType
VideoExtension = _files.VideoExtension
VideoContentType = _files.VideoContentType
FileInput = _files.FileInput
FileSource = _files.FileSource
FileSourceInput = _files.FileSourceInput
RawFileInput = _files.RawFileInput
ResolvedFileType = _files.ResolvedFileType
FileHandling = _files.FileHandling

# Deprecated classes
@deprecated("Import from crewai.files instead")
class BaseFile(_files.BaseFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class ImageFile(_files.ImageFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class PDFFile(_files.PDFFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class TextFile(_files.TextFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class AudioFile(_files.AudioFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class VideoFile(_files.VideoFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class File(_files.File):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FilePath(_files.FilePath):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileBytes(_files.FileBytes):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileStream(_files.FileStream):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileResolver(_files.FileResolver):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileResolverConfig(_files.FileResolverConfig):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileProcessor(_files.FileProcessor):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileUploader(_files.FileUploader):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class UploadCache(_files.UploadCache):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class CachedUpload(_files.CachedUpload):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class UploadResult(_files.UploadResult):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class ResolvedFile(_files.ResolvedFile):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileReference(_files.FileReference):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class UrlReference(_files.UrlReference):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class InlineBase64(_files.InlineBase64):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class InlineBytes(_files.InlineBytes):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class ProviderConstraints(_files.ProviderConstraints):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class ImageConstraints(_files.ImageConstraints):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class AudioConstraints(_files.AudioConstraints):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class VideoConstraints(_files.VideoConstraints):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class PDFConstraints(_files.PDFConstraints):
    """.. deprecated:: Import from crewai.files instead."""
    ...

# Exceptions
@deprecated("Import from crewai.files instead")
class FileProcessingError(_files.FileProcessingError):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileValidationError(_files.FileValidationError):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class FileTooLargeError(_files.FileTooLargeError):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class UnsupportedFileTypeError(_files.UnsupportedFileTypeError):
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
class ProcessingDependencyError(_files.ProcessingDependencyError):
    """.. deprecated:: Import from crewai.files instead."""
    ...

# Constants
OPENAI_CONSTRAINTS: _files.ProviderConstraints
ANTHROPIC_CONSTRAINTS: _files.ProviderConstraints
GEMINI_CONSTRAINTS: _files.ProviderConstraints
BEDROCK_CONSTRAINTS: _files.ProviderConstraints

# Deprecated functions
@deprecated("Import from crewai.files instead")
def create_resolver(
    provider: str,
    config: FileResolverConfig | None = None,
) -> FileResolver:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def get_uploader(provider: str, **kwargs: Any) -> FileUploader | None:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def get_upload_cache() -> UploadCache:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def reset_upload_cache() -> None:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def get_constraints_for_provider(provider: str) -> ProviderConstraints:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def cleanup_uploaded_files(provider: str | None = None) -> int:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def cleanup_expired_files() -> int:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def cleanup_provider_files(provider: str) -> int:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def normalize_input_files(
    input_files: list[FileSourceInput | FileInput],
) -> dict[str, FileInput]:
    """.. deprecated:: Import from crewai.files instead."""
    ...

@deprecated("Import from crewai.files instead")
def wrap_file_source(source: FileSource) -> FileInput:
    """.. deprecated:: Import from crewai.files instead."""
    ...

__all__: list[str]
@@ -1,8 +1,8 @@
 """Types for CrewAI utilities."""

-from typing import Any, Literal
+from typing import Any, Literal, TypedDict

-from typing_extensions import TypedDict
+from crewai.files import FileInput


 class LLMMessage(TypedDict):
@@ -15,3 +15,13 @@ class LLMMessage(TypedDict):

     role: Literal["user", "assistant", "system"]
     content: str | list[dict[str, Any]]
+
+
+class KickoffInputs(TypedDict, total=False):
+    """Type for crew kickoff inputs.
+
+    Attributes:
+        files: Named file inputs accessible to tasks during execution.
+    """
+
+    files: dict[str, FileInput]
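A sketch of the intended call shape; the crew object and file path are hypothetical, and the `inputs=` keyword follows the existing kickoff signature:

```python
from pathlib import Path

from crewai.files import FilePath
from crewai.utilities.types import KickoffInputs

inputs: KickoffInputs = {
    "files": {"report": FilePath(path=Path("reports/q3.pdf"))},  # hypothetical file
}
crew.kickoff(inputs=inputs)  # `crew` assumed to be an assembled Crew
```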
@@ -1,4 +1,4 @@
-"""Unit tests for CrewAgentExecutorFlow.
+"""Unit tests for AgentExecutor.

 Tests the Flow-based agent executor implementation including state management,
 flow methods, routing logic, and error handling.
@@ -8,9 +8,9 @@ from unittest.mock import Mock, patch

 import pytest

-from crewai.experimental.crew_agent_executor_flow import (
+from crewai.experimental.agent_executor import (
     AgentReActState,
-    CrewAgentExecutorFlow,
+    AgentExecutor,
 )
 from crewai.agents.parser import AgentAction, AgentFinish

@@ -43,8 +43,8 @@ class TestAgentReActState:
         assert state.ask_for_human_input is True


-class TestCrewAgentExecutorFlow:
-    """Test CrewAgentExecutorFlow class."""
+class TestAgentExecutor:
+    """Test AgentExecutor class."""

     @pytest.fixture
     def mock_dependencies(self):
@@ -87,8 +87,8 @@ class TestCrewAgentExecutorFlow:
         }

     def test_executor_initialization(self, mock_dependencies):
-        """Test CrewAgentExecutorFlow initialization."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        """Test AgentExecutor initialization."""
+        executor = AgentExecutor(**mock_dependencies)

         assert executor.llm == mock_dependencies["llm"]
         assert executor.task == mock_dependencies["task"]
@@ -100,9 +100,9 @@
     def test_initialize_reasoning(self, mock_dependencies):
         """Test flow entry point."""
         with patch.object(
-            CrewAgentExecutorFlow, "_show_start_logs"
+            AgentExecutor, "_show_start_logs"
         ) as mock_show_start:
-            executor = CrewAgentExecutorFlow(**mock_dependencies)
+            executor = AgentExecutor(**mock_dependencies)
             result = executor.initialize_reasoning()

             assert result == "initialized"
@@ -110,7 +110,7 @@

     def test_check_max_iterations_not_reached(self, mock_dependencies):
         """Test routing when iterations < max."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.iterations = 5

         result = executor.check_max_iterations()
@@ -118,7 +118,7 @@

     def test_check_max_iterations_reached(self, mock_dependencies):
         """Test routing when iterations >= max."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.iterations = 10

         result = executor.check_max_iterations()
@@ -126,7 +126,7 @@

     def test_route_by_answer_type_action(self, mock_dependencies):
         """Test routing for AgentAction."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.current_answer = AgentAction(
             thought="thinking", tool="search", tool_input="query", text="action text"
         )
@@ -136,7 +136,7 @@

     def test_route_by_answer_type_finish(self, mock_dependencies):
         """Test routing for AgentFinish."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.current_answer = AgentFinish(
             thought="final thoughts", output="Final answer", text="complete"
         )
@@ -146,7 +146,7 @@

     def test_continue_iteration(self, mock_dependencies):
         """Test iteration continuation."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)

         result = executor.continue_iteration()

@@ -154,8 +154,8 @@

     def test_finalize_success(self, mock_dependencies):
         """Test finalize with valid AgentFinish."""
-        with patch.object(CrewAgentExecutorFlow, "_show_logs") as mock_show_logs:
-            executor = CrewAgentExecutorFlow(**mock_dependencies)
+        with patch.object(AgentExecutor, "_show_logs") as mock_show_logs:
+            executor = AgentExecutor(**mock_dependencies)
             executor.state.current_answer = AgentFinish(
                 thought="final thinking", output="Done", text="complete"
             )
@@ -168,7 +168,7 @@

     def test_finalize_failure(self, mock_dependencies):
         """Test finalize skips when given AgentAction instead of AgentFinish."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.current_answer = AgentAction(
             thought="thinking", tool="search", tool_input="query", text="action text"
         )
@@ -181,7 +181,7 @@

     def test_format_prompt(self, mock_dependencies):
         """Test prompt formatting."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         inputs = {"input": "test input", "tool_names": "tool1, tool2", "tools": "desc"}

         result = executor._format_prompt("Prompt {input} {tool_names} {tools}", inputs)
@@ -192,18 +192,18 @@

     def test_is_training_mode_false(self, mock_dependencies):
         """Test training mode detection when not in training."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         assert executor._is_training_mode() is False

     def test_is_training_mode_true(self, mock_dependencies):
         """Test training mode detection when in training."""
         mock_dependencies["crew"]._train = True
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         assert executor._is_training_mode() is True

     def test_append_message_to_state(self, mock_dependencies):
         """Test message appending to state."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         initial_count = len(executor.state.messages)

         executor._append_message_to_state("test message")
@@ -216,7 +216,7 @@
         callback = Mock()
         mock_dependencies["step_callback"] = callback

-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         answer = AgentFinish(thought="thinking", output="test", text="final")

         executor._invoke_step_callback(answer)
@@ -226,14 +226,14 @@
     def test_invoke_step_callback_none(self, mock_dependencies):
         """Test step callback when none provided."""
         mock_dependencies["step_callback"] = None
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)

         # Should not raise error
         executor._invoke_step_callback(
             AgentFinish(thought="thinking", output="test", text="final")
         )

-    @patch("crewai.experimental.crew_agent_executor_flow.handle_output_parser_exception")
+    @patch("crewai.experimental.agent_executor.handle_output_parser_exception")
     def test_recover_from_parser_error(
         self, mock_handle_exception, mock_dependencies
     ):
@@ -242,7 +242,7 @@

         mock_handle_exception.return_value = None

-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor._last_parser_error = OutputParserError("test error")
         initial_iterations = executor.state.iterations

@@ -252,12 +252,12 @@
         assert executor.state.iterations == initial_iterations + 1
         mock_handle_exception.assert_called_once()

-    @patch("crewai.experimental.crew_agent_executor_flow.handle_context_length")
+    @patch("crewai.experimental.agent_executor.handle_context_length")
     def test_recover_from_context_length(
         self, mock_handle_context, mock_dependencies
     ):
         """Test recovery from context length error."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor._last_context_error = Exception("context too long")
         initial_iterations = executor.state.iterations

@@ -270,16 +270,16 @@
     def test_use_stop_words_property(self, mock_dependencies):
         """Test use_stop_words property."""
         mock_dependencies["llm"].supports_stop_words.return_value = True
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         assert executor.use_stop_words is True

         mock_dependencies["llm"].supports_stop_words.return_value = False
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         assert executor.use_stop_words is False

     def test_compatibility_properties(self, mock_dependencies):
         """Test compatibility properties for mixin."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.messages = [{"role": "user", "content": "test"}]
         executor.state.iterations = 5

@@ -321,8 +321,8 @@ class TestFlowErrorHandling:
         "tools_handler": Mock(),
     }

-    @patch("crewai.experimental.crew_agent_executor_flow.get_llm_response")
-    @patch("crewai.experimental.crew_agent_executor_flow.enforce_rpm_limit")
+    @patch("crewai.experimental.agent_executor.get_llm_response")
+    @patch("crewai.experimental.agent_executor.enforce_rpm_limit")
     def test_call_llm_parser_error(
         self, mock_enforce_rpm, mock_get_llm, mock_dependencies
     ):
@@ -332,15 +332,15 @@
         mock_enforce_rpm.return_value = None
         mock_get_llm.side_effect = OutputParserError("parse failed")

-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         result = executor.call_llm_and_parse()

         assert result == "parser_error"
         assert executor._last_parser_error is not None

-    @patch("crewai.experimental.crew_agent_executor_flow.get_llm_response")
-    @patch("crewai.experimental.crew_agent_executor_flow.enforce_rpm_limit")
-    @patch("crewai.experimental.crew_agent_executor_flow.is_context_length_exceeded")
+    @patch("crewai.experimental.agent_executor.get_llm_response")
+    @patch("crewai.experimental.agent_executor.enforce_rpm_limit")
+    @patch("crewai.experimental.agent_executor.is_context_length_exceeded")
     def test_call_llm_context_error(
         self,
         mock_is_context_exceeded,
@@ -353,7 +353,7 @@
         mock_get_llm.side_effect = Exception("context length")
         mock_is_context_exceeded.return_value = True

-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         result = executor.call_llm_and_parse()

         assert result == "context_error"
@@ -397,10 +397,10 @@ class TestFlowInvoke:
         "tools_handler": Mock(),
     }

-    @patch.object(CrewAgentExecutorFlow, "kickoff")
-    @patch.object(CrewAgentExecutorFlow, "_create_short_term_memory")
-    @patch.object(CrewAgentExecutorFlow, "_create_long_term_memory")
-    @patch.object(CrewAgentExecutorFlow, "_create_external_memory")
+    @patch.object(AgentExecutor, "kickoff")
+    @patch.object(AgentExecutor, "_create_short_term_memory")
+    @patch.object(AgentExecutor, "_create_long_term_memory")
+    @patch.object(AgentExecutor, "_create_external_memory")
     def test_invoke_success(
         self,
         mock_external_memory,
@@ -410,7 +410,7 @@
         mock_dependencies,
     ):
         """Test successful invoke without human feedback."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)

         # Mock kickoff to set the final answer in state
         def mock_kickoff_side_effect():
@@ -429,10 +429,10 @@
         mock_long_term_memory.assert_called_once()
         mock_external_memory.assert_called_once()

-    @patch.object(CrewAgentExecutorFlow, "kickoff")
+    @patch.object(AgentExecutor, "kickoff")
     def test_invoke_failure_no_agent_finish(self, mock_kickoff, mock_dependencies):
         """Test invoke fails without AgentFinish."""
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)
         executor.state.current_answer = AgentAction(
             thought="thinking", tool="test", tool_input="test", text="action text"
         )
@@ -442,10 +442,10 @@
         with pytest.raises(RuntimeError, match="without reaching a final answer"):
             executor.invoke(inputs)

-    @patch.object(CrewAgentExecutorFlow, "kickoff")
-    @patch.object(CrewAgentExecutorFlow, "_create_short_term_memory")
-    @patch.object(CrewAgentExecutorFlow, "_create_long_term_memory")
-    @patch.object(CrewAgentExecutorFlow, "_create_external_memory")
+    @patch.object(AgentExecutor, "kickoff")
+    @patch.object(AgentExecutor, "_create_short_term_memory")
+    @patch.object(AgentExecutor, "_create_long_term_memory")
+    @patch.object(AgentExecutor, "_create_external_memory")
     def test_invoke_with_system_prompt(
         self,
         mock_external_memory,
@@ -459,7 +459,7 @@
             "system": "System: {input}",
             "user": "User: {input} {tool_names} {tools}",
         }
-        executor = CrewAgentExecutorFlow(**mock_dependencies)
+        executor = AgentExecutor(**mock_dependencies)

         def mock_kickoff_side_effect():
             executor.state.current_answer = AgentFinish(
@@ -72,62 +72,53 @@ class ResearchResult(BaseModel):
|
||||
|
||||
@pytest.mark.vcr()
|
||||
@pytest.mark.parametrize("verbose", [True, False])
|
||||
def test_lite_agent_created_with_correct_parameters(monkeypatch, verbose):
|
||||
"""Test that LiteAgent is created with the correct parameters when Agent.kickoff() is called."""
|
||||
def test_agent_kickoff_preserves_parameters(verbose):
|
||||
"""Test that Agent.kickoff() uses the correct parameters from the Agent."""
|
||||
# Create a test agent with specific parameters
|
||||
llm = LLM(model="gpt-4o-mini")
|
||||
mock_llm = Mock(spec=LLM)
|
||||
mock_llm.call.return_value = "Final Answer: Test response"
|
||||
mock_llm.stop = []
|
||||
|
||||
from crewai.types.usage_metrics import UsageMetrics
|
||||
|
||||
mock_usage_metrics = UsageMetrics(
|
||||
total_tokens=100,
|
||||
prompt_tokens=50,
|
||||
completion_tokens=50,
|
||||
cached_prompt_tokens=0,
|
||||
successful_requests=1,
|
||||
)
|
||||
mock_llm.get_token_usage_summary.return_value = mock_usage_metrics
|
||||
|
||||
custom_tools = [WebSearchTool(), CalculatorTool()]
|
||||
max_iter = 10
|
||||
max_execution_time = 300
|
||||
|
||||
agent = Agent(
|
||||
role="Test Agent",
|
||||
goal="Test Goal",
|
||||
backstory="Test Backstory",
|
||||
llm=llm,
|
||||
llm=mock_llm,
|
||||
tools=custom_tools,
|
||||
max_iter=max_iter,
|
||||
max_execution_time=max_execution_time,
|
||||
verbose=verbose,
|
||||
)
|
||||
|
||||
# Create a mock to capture the created LiteAgent
|
||||
created_lite_agent = None
|
||||
original_lite_agent = LiteAgent
|
||||
# Call kickoff and verify it works
|
||||
result = agent.kickoff("Test query")
|
||||
|
||||
# Define a mock LiteAgent class that captures its arguments
|
||||
class MockLiteAgent(original_lite_agent):
|
||||
def __init__(self, **kwargs):
|
||||
nonlocal created_lite_agent
|
||||
created_lite_agent = kwargs
|
||||
super().__init__(**kwargs)
|
||||
# Verify the agent was configured correctly
|
||||
assert agent.role == "Test Agent"
|
||||
assert agent.goal == "Test Goal"
|
||||
assert agent.backstory == "Test Backstory"
|
||||
assert len(agent.tools) == 2
|
||||
assert isinstance(agent.tools[0], WebSearchTool)
|
||||
assert isinstance(agent.tools[1], CalculatorTool)
|
||||
assert agent.max_iter == max_iter
|
||||
assert agent.verbose == verbose
|
||||
|
||||
# Patch the LiteAgent class
|
||||
monkeypatch.setattr("crewai.agent.core.LiteAgent", MockLiteAgent)
|
||||
|
||||
# Call kickoff to create the LiteAgent
|
||||
agent.kickoff("Test query")
|
||||
|
||||
# Verify all parameters were passed correctly
|
||||
assert created_lite_agent is not None
|
||||
assert created_lite_agent["role"] == "Test Agent"
|
||||
assert created_lite_agent["goal"] == "Test Goal"
|
||||
assert created_lite_agent["backstory"] == "Test Backstory"
|
||||
assert created_lite_agent["llm"] == llm
|
||||
assert len(created_lite_agent["tools"]) == 2
|
||||
assert isinstance(created_lite_agent["tools"][0], WebSearchTool)
|
||||
assert isinstance(created_lite_agent["tools"][1], CalculatorTool)
|
||||
assert created_lite_agent["max_iterations"] == max_iter
|
||||
assert created_lite_agent["max_execution_time"] == max_execution_time
|
||||
assert created_lite_agent["verbose"] == verbose
|
||||
assert created_lite_agent["response_format"] is None
|
||||
|
||||
# Test with a response_format
|
||||
class TestResponse(BaseModel):
|
||||
test_field: str
|
||||
|
||||
agent.kickoff("Test query", response_format=TestResponse)
|
||||
assert created_lite_agent["response_format"] == TestResponse
|
||||
# Verify kickoff returned a result
|
||||
assert result is not None
|
||||
assert result.raw is not None
|
||||
|
||||
|
||||
@pytest.mark.vcr()
|
||||
@@ -310,7 +301,8 @@ def verify_agent_parent_flow(result, agent, flow):
|
||||
|
||||
|
||||
def test_sets_parent_flow_when_inside_flow():
|
||||
captured_agent = None
|
||||
"""Test that an Agent can be created and executed inside a Flow context."""
|
||||
captured_event = None
|
||||
|
||||
mock_llm = Mock(spec=LLM)
|
||||
mock_llm.call.return_value = "Test response"
|
||||
@@ -343,15 +335,17 @@ def test_sets_parent_flow_when_inside_flow():
|
||||
event_received = threading.Event()
|
||||
|
||||
@crewai_event_bus.on(LiteAgentExecutionStartedEvent)
|
||||
def capture_agent(source, event):
|
||||
nonlocal captured_agent
|
||||
captured_agent = source
|
||||
def capture_event(source, event):
|
||||
nonlocal captured_event
|
||||
captured_event = event
|
||||
event_received.set()
|
||||
|
||||
flow.kickoff()
|
||||
result = flow.kickoff()
|
||||
|
||||
assert event_received.wait(timeout=5), "Timeout waiting for agent execution event"
|
||||
assert captured_agent.parent_flow is flow
|
||||
assert captured_event is not None
|
||||
assert captured_event.agent_info["role"] == "Test Agent"
|
||||
assert result is not None
|
||||
|
||||
|
||||
@pytest.mark.vcr()
|
||||
@@ -373,16 +367,14 @@ def test_guardrail_is_called_using_string():
|
||||
|
||||
@crewai_event_bus.on(LLMGuardrailStartedEvent)
|
||||
def capture_guardrail_started(source, event):
|
||||
assert isinstance(source, LiteAgent)
|
||||
assert source.original_agent == agent
|
||||
assert isinstance(source, Agent)
|
||||
with condition:
|
||||
guardrail_events["started"].append(event)
|
||||
condition.notify()
|
||||
|
||||
@crewai_event_bus.on(LLMGuardrailCompletedEvent)
|
||||
def capture_guardrail_completed(source, event):
|
||||
assert isinstance(source, LiteAgent)
|
||||
assert source.original_agent == agent
|
||||
assert isinstance(source, Agent)
|
||||
with condition:
|
||||
guardrail_events["completed"].append(event)
|
||||
condition.notify()
|
||||
@@ -683,3 +675,151 @@ def test_agent_kickoff_with_mcp_tools(mock_get_mcp_tools):

    # Verify MCP tools were retrieved
    mock_get_mcp_tools.assert_called_once_with("https://mcp.exa.ai/mcp?api_key=test_exa_key&profile=research")


# ============================================================================
# Tests for LiteAgent inside Flow (magic auto-async pattern)
# ============================================================================

from crewai.flow.flow import listen


@pytest.mark.vcr()
def test_lite_agent_inside_flow_sync():
    """Test that LiteAgent.kickoff() works magically inside a Flow.

    This tests the "magic auto-async" pattern where calling agent.kickoff()
    from within a Flow automatically detects the event loop and returns a
    coroutine that the Flow framework awaits. Users don't need to use async/await.
    """
    # Track execution
    execution_log = []

    class TestFlow(Flow):
        @start()
        def run_agent(self):
            execution_log.append("flow_started")
            agent = Agent(
                role="Test Agent",
                goal="Answer questions",
                backstory="A helpful test assistant",
                llm=LLM(model="gpt-4o-mini"),
                verbose=False,
            )
            # Magic: just call kickoff() normally - it auto-detects Flow context
            result = agent.kickoff(messages="What is 2+2? Reply with just the number.")
            execution_log.append("agent_completed")
            return result

    flow = TestFlow()
    result = flow.kickoff()

    # Verify the flow executed successfully
    assert "flow_started" in execution_log
    assert "agent_completed" in execution_log
    assert result is not None
    assert isinstance(result, LiteAgentOutput)


@pytest.mark.vcr()
def test_lite_agent_inside_flow_with_tools():
    """Test that LiteAgent with tools works correctly inside a Flow."""
    class TestFlow(Flow):
        @start()
        def run_agent_with_tools(self):
            agent = Agent(
                role="Calculator Agent",
                goal="Perform calculations",
                backstory="A math expert",
                llm=LLM(model="gpt-4o-mini"),
                tools=[CalculatorTool()],
                verbose=False,
            )
            result = agent.kickoff(messages="Calculate 10 * 5")
            return result

    flow = TestFlow()
    result = flow.kickoff()

    assert result is not None
    assert isinstance(result, LiteAgentOutput)
    assert result.raw is not None


@pytest.mark.vcr()
def test_multiple_agents_in_same_flow():
    """Test that multiple LiteAgents can run sequentially in the same Flow."""
    class MultiAgentFlow(Flow):
        @start()
        def first_step(self):
            agent1 = Agent(
                role="First Agent",
                goal="Greet users",
                backstory="A friendly greeter",
                llm=LLM(model="gpt-4o-mini"),
                verbose=False,
            )
            return agent1.kickoff(messages="Say hello")

        @listen(first_step)
        def second_step(self, first_result):
            agent2 = Agent(
                role="Second Agent",
                goal="Say goodbye",
                backstory="A polite farewell agent",
                llm=LLM(model="gpt-4o-mini"),
                verbose=False,
            )
            return agent2.kickoff(messages="Say goodbye")

    flow = MultiAgentFlow()
    result = flow.kickoff()

    assert result is not None
    assert isinstance(result, LiteAgentOutput)


@pytest.mark.vcr()
def test_lite_agent_kickoff_async_inside_flow():
    """Test that Agent.kickoff_async() works correctly from async Flow methods."""
    class AsyncAgentFlow(Flow):
        @start()
        async def async_agent_step(self):
            agent = Agent(
                role="Async Test Agent",
                goal="Answer questions asynchronously",
                backstory="An async helper",
                llm=LLM(model="gpt-4o-mini"),
                verbose=False,
            )
            result = await agent.kickoff_async(messages="What is 3+3?")
            return result

    flow = AsyncAgentFlow()
    result = flow.kickoff()

    assert result is not None
    assert isinstance(result, LiteAgentOutput)


@pytest.mark.vcr()
def test_lite_agent_standalone_still_works():
    """Test that LiteAgent.kickoff() still works normally outside of a Flow.

    This verifies that the magic auto-async pattern doesn't break standalone usage
    where there's no event loop running.
    """
    agent = Agent(
        role="Standalone Agent",
        goal="Answer questions",
        backstory="A helpful assistant",
        llm=LLM(model="gpt-4o-mini"),
        verbose=False,
    )

    # This should work normally - no Flow, no event loop
    result = agent.kickoff(messages="What is 5+5? Reply with just the number.")

    assert result is not None
    assert isinstance(result, LiteAgentOutput)
    assert result.raw is not None
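The docstrings above describe the "magic auto-async" behavior, but the diff never shows the detection itself. A minimal sketch of how such dispatch could work, branching on whether an event loop is already running; this is an assumption, not the actual crewAI implementation, and the `_run_sync`/`_run_async` helpers are hypothetical names:

```python
import asyncio


def kickoff(self, messages):
    # Sketch only: pick sync vs. async execution based on the caller's context.
    try:
        asyncio.get_running_loop()  # raises RuntimeError when no loop is running
    except RuntimeError:
        # Standalone call, no event loop: execute synchronously and block.
        return self._run_sync(messages)  # hypothetical helper
    # A loop is already running (e.g. inside a Flow): return a coroutine
    # for the framework to await instead of blocking the loop.
    return self._run_async(messages)  # hypothetical helper returning a coroutine
```

Under that reading, `test_lite_agent_inside_flow_sync` exercises the loop-running branch and `test_lite_agent_standalone_still_works` exercises the no-loop branch.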
@@ -0,0 +1,119 @@
interactions:
- request:
    body: '{"messages":[{"role":"system","content":"You are Test Agent. A helpful
      test assistant\nYour personal goal is: Answer questions\nTo give my best complete
      final answer to the task respond using the exact following format:\n\nThought:
      I now can give a great answer\nFinal Answer: Your final answer must be the great
      and the most complete as possible, it must be outcome described.\n\nI MUST use
      these formats, my job depends on it!"},{"role":"user","content":"\nCurrent Task:
      What is 2+2? Reply with just the number.\n\nBegin! This is VERY important to
      you, use the tools available and give your best Final Answer, your job depends
      on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '673'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-Cy7b0HjL79y39EkUcMLrRhPFe3XGj\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768444914,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
        Answer: 4\",\n \"refusal\": null,\n \"annotations\": []\n },\n
        \ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
        \ \"usage\": {\n \"prompt_tokens\": 136,\n \"completion_tokens\": 13,\n
        \ \"total_tokens\": 149,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
        0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_8bbc38b4db\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 02:41:55 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '857'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '341'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '358'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1
@@ -0,0 +1,255 @@
interactions:
- request:
    body: '{"messages":[{"role":"system","content":"You are Calculator Agent. A math
      expert\nYour personal goal is: Perform calculations\nYou ONLY have access to
      the following tools, and should NEVER make up tools that are not listed here:\n\nTool
      Name: calculate\nTool Arguments: {\n \"properties\": {\n \"expression\":
      {\n \"title\": \"Expression\",\n \"type\": \"string\"\n }\n },\n \"required\":
      [\n \"expression\"\n ],\n \"title\": \"CalculatorToolSchema\",\n \"type\":
      \"object\",\n \"additionalProperties\": false\n}\nTool Description: Calculate
      the result of a mathematical expression.\n\nIMPORTANT: Use the following format
      in your response:\n\n```\nThought: you should always think about what to do\nAction:
      the action to take, only one name of [calculate], just the name, exactly as
      it''s written.\nAction Input: the input to the action, just a simple JSON object,
      enclosed in curly braces, using \" to wrap keys and values.\nObservation: the
      result of the action\n```\n\nOnce all necessary information is gathered, return
      the following format:\n\n```\nThought: I now know the final answer\nFinal Answer:
      the final answer to the original input question\n```"},{"role":"user","content":"\nCurrent
      Task: Calculate 10 * 5\n\nBegin! This is VERY important to you, use the tools
      available and give your best Final Answer, your job depends on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '1403'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-Cy7avghVPSpszLmlbHpwDQlWDoD6O\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768444909,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"Thought: I need to calculate the expression
        10 * 5.\\nAction: calculate\\nAction Input: {\\\"expression\\\":\\\"10 * 5\\\"}\\nObservation:
        50\",\n \"refusal\": null,\n \"annotations\": []\n },\n
        \ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
        \ \"usage\": {\n \"prompt_tokens\": 291,\n \"completion_tokens\": 33,\n
        \ \"total_tokens\": 324,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
        0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_c4585b5b9c\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 02:41:49 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '939'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '579'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '598'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
- request:
    body: '{"messages":[{"role":"system","content":"You are Calculator Agent. A math
      expert\nYour personal goal is: Perform calculations\nYou ONLY have access to
      the following tools, and should NEVER make up tools that are not listed here:\n\nTool
      Name: calculate\nTool Arguments: {\n \"properties\": {\n \"expression\":
      {\n \"title\": \"Expression\",\n \"type\": \"string\"\n }\n },\n \"required\":
      [\n \"expression\"\n ],\n \"title\": \"CalculatorToolSchema\",\n \"type\":
      \"object\",\n \"additionalProperties\": false\n}\nTool Description: Calculate
      the result of a mathematical expression.\n\nIMPORTANT: Use the following format
      in your response:\n\n```\nThought: you should always think about what to do\nAction:
      the action to take, only one name of [calculate], just the name, exactly as
      it''s written.\nAction Input: the input to the action, just a simple JSON object,
      enclosed in curly braces, using \" to wrap keys and values.\nObservation: the
      result of the action\n```\n\nOnce all necessary information is gathered, return
      the following format:\n\n```\nThought: I now know the final answer\nFinal Answer:
      the final answer to the original input question\n```"},{"role":"user","content":"\nCurrent
      Task: Calculate 10 * 5\n\nBegin! This is VERY important to you, use the tools
      available and give your best Final Answer, your job depends on it!\n\nThought:"},{"role":"assistant","content":"Thought:
      I need to calculate the expression 10 * 5.\nAction: calculate\nAction Input:
      {\"expression\":\"10 * 5\"}\nObservation: The result of 10 * 5 is 50"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '1591'
      content-type:
      - application/json
      cookie:
      - COOKIE-XXX
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-Cy7avDhDZCLvv8v2dh8ZQRrLdci6A\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768444909,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"Thought: I now know the final answer.\\nFinal
        Answer: 50\",\n \"refusal\": null,\n \"annotations\": []\n },\n
        \ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
        \ \"usage\": {\n \"prompt_tokens\": 337,\n \"completion_tokens\": 14,\n
        \ \"total_tokens\": 351,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
        0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_c4585b5b9c\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 02:41:50 GMT
      Server:
      - cloudflare
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '864'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '429'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '457'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1
@@ -0,0 +1,119 @@
interactions:
- request:
    body: '{"messages":[{"role":"system","content":"You are Async Test Agent. An async
      helper\nYour personal goal is: Answer questions asynchronously\nTo give my best
      complete final answer to the task respond using the exact following format:\n\nThought:
      I now can give a great answer\nFinal Answer: Your final answer must be the great
      and the most complete as possible, it must be outcome described.\n\nI MUST use
      these formats, my job depends on it!"},{"role":"user","content":"\nCurrent Task:
      What is 3+3?\n\nBegin! This is VERY important to you, use the tools available
      and give your best Final Answer, your job depends on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '657'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-Cy7atOGxtc4y3oYNI62WiQ0Vogsdv\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768444907,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
        Answer: The sum of 3 + 3 is 6. Therefore, the outcome is that if you add three
        and three together, you will arrive at the total of six.\",\n \"refusal\":
        null,\n \"annotations\": []\n },\n \"logprobs\": null,\n
        \ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        131,\n \"completion_tokens\": 46,\n \"total_tokens\": 177,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_29330a9688\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 02:41:48 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '983'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '944'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '1192'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1
@@ -0,0 +1,119 @@
interactions:
- request:
    body: '{"messages":[{"role":"system","content":"You are Standalone Agent. A helpful
      assistant\nYour personal goal is: Answer questions\nTo give my best complete
      final answer to the task respond using the exact following format:\n\nThought:
      I now can give a great answer\nFinal Answer: Your final answer must be the great
      and the most complete as possible, it must be outcome described.\n\nI MUST use
      these formats, my job depends on it!"},{"role":"user","content":"\nCurrent Task:
      What is 5+5? Reply with just the number.\n\nBegin! This is VERY important to
      you, use the tools available and give your best Final Answer, your job depends
      on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '674'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-Cy7azhPwUHQ0p5tdhxSAmLPoE8UgC\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768444913,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
        Answer: 10\",\n \"refusal\": null,\n \"annotations\": []\n },\n
        \ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
        \ \"usage\": {\n \"prompt_tokens\": 136,\n \"completion_tokens\": 13,\n
        \ \"total_tokens\": 149,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
        0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_29330a9688\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 02:41:54 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '858'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '455'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '583'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1
@@ -0,0 +1,239 @@
interactions:
- request:
    body: '{"messages":[{"role":"system","content":"You are First Agent. A friendly
      greeter\nYour personal goal is: Greet users\nTo give my best complete final
      answer to the task respond using the exact following format:\n\nThought: I now
      can give a great answer\nFinal Answer: Your final answer must be the great and
      the most complete as possible, it must be outcome described.\n\nI MUST use these
      formats, my job depends on it!"},{"role":"user","content":"\nCurrent Task: Say
      hello\n\nBegin! This is VERY important to you, use the tools available and give
      your best Final Answer, your job depends on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '632'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CyRKzgODZ9yn3F9OkaXsscLk2Ln3N\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768520801,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
        Answer: Hello! Welcome! I'm so glad to see you here. If you need any assistance
        or have any questions, feel free to ask. Have a wonderful day!\",\n \"refusal\":
        null,\n \"annotations\": []\n },\n \"logprobs\": null,\n
        \ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        127,\n \"completion_tokens\": 43,\n \"total_tokens\": 170,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_c4585b5b9c\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 23:46:42 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '990'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '880'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '1160'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
- request:
    body: '{"messages":[{"role":"system","content":"You are Second Agent. A polite
      farewell agent\nYour personal goal is: Say goodbye\nTo give my best complete
      final answer to the task respond using the exact following format:\n\nThought:
      I now can give a great answer\nFinal Answer: Your final answer must be the great
      and the most complete as possible, it must be outcome described.\n\nI MUST use
      these formats, my job depends on it!"},{"role":"user","content":"\nCurrent Task:
      Say goodbye\n\nBegin! This is VERY important to you, use the tools available
      and give your best Final Answer, your job depends on it!\n\nThought:"}],"model":"gpt-4o-mini"}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '640'
      content-type:
      - application/json
      host:
      - api.openai.com
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.83.0
      x-stainless-read-timeout:
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CyRL1Ua2PkK5xXPp3KeF0AnGAk3JP\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768520803,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
        Answer: As we reach the end of our conversation, I want to express my gratitude
        for the time we've shared. It's been a pleasure assisting you, and I hope
        you found our interaction helpful and enjoyable. Remember, whenever you need
        assistance, I'm just a message away. Wishing you all the best in your future
        endeavors. Goodbye and take care!\",\n \"refusal\": null,\n \"annotations\":
        []\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n
        \ }\n ],\n \"usage\": {\n \"prompt_tokens\": 126,\n \"completion_tokens\":
        79,\n \"total_tokens\": 205,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
        0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_29330a9688\"\n}\n"
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 15 Jan 2026 23:46:44 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '1189'
      openai-organization:
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '1363'
      openai-project:
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '1605'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
version: 1
@@ -0,0 +1,104 @@
interactions:
- request:
    body: '{"max_tokens":4096,"messages":[{"role":"user","content":[{"type":"text","text":"What
      type of document is this? Answer in one word."},{"type":"document","source":{"type":"base64","media_type":"application/pdf","data":"JVBERi0xLjQKMSAwIG9iaiA8PCAvVHlwZSAvQ2F0YWxvZyAvUGFnZXMgMiAwIFIgPj4gZW5kb2JqCjIgMCBvYmogPDwgL1R5cGUgL1BhZ2VzIC9LaWRzIFszIDAgUl0gL0NvdW50IDEgPj4gZW5kb2JqCjMgMCBvYmogPDwgL1R5cGUgL1BhZ2UgL1BhcmVudCAyIDAgUiAvTWVkaWFCb3ggWzAgMCA2MTIgNzkyXSA+PiBlbmRvYmoKeHJlZgowIDQKMDAwMDAwMDAwMCA2NTUzNSBmCjAwMDAwMDAwMDkgMDAwMDAgbgowMDAwMDAwMDU4IDAwMDAwIG4KMDAwMDAwMDExNSAwMDAwMCBuCnRyYWlsZXIgPDwgL1NpemUgNCAvUm9vdCAxIDAgUiA+PgpzdGFydHhyZWYKMTk2CiUlRU9GCg=="},"cache_control":{"type":"ephemeral"}}]}],"model":"claude-3-5-haiku-20241022","stream":false}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      anthropic-version:
      - '2023-06-01'
      connection:
      - keep-alive
      content-length:
      - '748'
      content-type:
      - application/json
      host:
      - api.anthropic.com
      x-api-key:
      - X-API-KEY-XXX
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 0.71.1
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.10
      x-stainless-timeout:
      - NOT_GIVEN
    method: POST
    uri: https://api.anthropic.com/v1/messages
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAA/3WQTUvEMBCG/8ucW2jr7rL25sKCKHrQiyASYjJsw6ZJzUxEKf3vTheLX3hKeJ8n
        8zIZoY8WPbRgvM4Wy7NyXXbaHXPZVM2qrpoGCnBWhJ4Oqqovd/nBnt92tF1dX+z3u6t7ffO8FYff
        B5wtJNIHlCBFPweayBHrwBKZGBjl1j6Oi8/4NpPT0cIdUu4RpqcCiOOgEmqKQQAGqzinAJ+A8CVj
        MDIhZO8LyKfSdgQXhsyK4xEDQVtvmo3UatOhMjKMXQzqp1ItXLD9jy1v5wYcOuwxaa/W/V//i9bd
        bzoVEDN/j1ayDqZXZ1CxwySLzl9ldbIwTR/rySkqnAEAAA==
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 22 Jan 2026 00:18:50 GMT
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      X-Robots-Tag:
      - none
      anthropic-organization-id:
      - ANTHROPIC-ORGANIZATION-ID-XXX
      anthropic-ratelimit-input-tokens-limit:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-input-tokens-remaining:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-input-tokens-reset:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-RESET-XXX
      anthropic-ratelimit-output-tokens-limit:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-output-tokens-remaining:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-output-tokens-reset:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-RESET-XXX
      anthropic-ratelimit-requests-limit:
      - '4000'
      anthropic-ratelimit-requests-remaining:
      - '3999'
      anthropic-ratelimit-requests-reset:
      - '2026-01-22T00:18:50Z'
      anthropic-ratelimit-tokens-limit:
      - ANTHROPIC-RATELIMIT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-tokens-remaining:
      - ANTHROPIC-RATELIMIT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-tokens-reset:
      - ANTHROPIC-RATELIMIT-TOKENS-RESET-XXX
      cf-cache-status:
      - DYNAMIC
      request-id:
      - REQUEST-ID-XXX
      strict-transport-security:
      - STS-XXX
      x-envoy-upstream-service-time:
      - '750'
    status:
      code: 200
      message: OK
version: 1
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,104 @@
interactions:
- request:
    body: '{"max_tokens":4096,"messages":[{"role":"user","content":[{"type":"text","text":"What
      type of document is this? Answer in one word."},{"type":"document","source":{"type":"base64","media_type":"application/pdf","data":"JVBERi0xLjQKMSAwIG9iaiA8PCAvVHlwZSAvQ2F0YWxvZyAvUGFnZXMgMiAwIFIgPj4gZW5kb2JqCjIgMCBvYmogPDwgL1R5cGUgL1BhZ2VzIC9LaWRzIFszIDAgUl0gL0NvdW50IDEgPj4gZW5kb2JqCjMgMCBvYmogPDwgL1R5cGUgL1BhZ2UgL1BhcmVudCAyIDAgUiAvTWVkaWFCb3ggWzAgMCA2MTIgNzkyXSA+PiBlbmRvYmoKeHJlZgowIDQKMDAwMDAwMDAwMCA2NTUzNSBmCjAwMDAwMDAwMDkgMDAwMDAgbgowMDAwMDAwMDU4IDAwMDAwIG4KMDAwMDAwMDExNSAwMDAwMCBuCnRyYWlsZXIgPDwgL1NpemUgNCAvUm9vdCAxIDAgUiA+PgpzdGFydHhyZWYKMTk2CiUlRU9GCg=="},"cache_control":{"type":"ephemeral"}}]}],"model":"claude-3-5-haiku-20241022","stream":false}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      anthropic-version:
      - '2023-06-01'
      connection:
      - keep-alive
      content-length:
      - '748'
      content-type:
      - application/json
      host:
      - api.anthropic.com
      x-api-key:
      - X-API-KEY-XXX
      x-stainless-arch:
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 0.71.1
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.10
      x-stainless-timeout:
      - NOT_GIVEN
    method: POST
    uri: https://api.anthropic.com/v1/messages
  response:
    body:
      string: !!binary |
        H4sIAAAAAAAA/3WQTUvEMBCG/8ucW2hju4eeRUU97EFRFAkhGbZh06Qmk1Ql/e9OF4tf7CnhfZ7J
        y2SCIRh00IF2Khssz8q27JXd51JUoqkrIaAAa1gY0k5W9bbptXo7PD60l/V1f/V0J+5vxQ079DHi
        YmFKaoccxOCWQKVkEylPHOngCfnWPU+rT/i+kOPRwfb8AuaXAhKFUUZUKXhO0RtJOXr4AglfM3rN
        4z47V0A+NnYTWD9mkhT26BN09UZsuFPpHqXmx8gGL38r1coZm1NsnV0acOxxwKicbIf//jet+790
        LiBk+hk1vA7Gg9UoyWLkRZd/MioamOdP24g1JZkBAAA=
    headers:
      CF-RAY:
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Thu, 22 Jan 2026 00:18:56 GMT
      Server:
      - cloudflare
      Transfer-Encoding:
      - chunked
      X-Robots-Tag:
      - none
      anthropic-organization-id:
      - ANTHROPIC-ORGANIZATION-ID-XXX
      anthropic-ratelimit-input-tokens-limit:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-input-tokens-remaining:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-input-tokens-reset:
      - ANTHROPIC-RATELIMIT-INPUT-TOKENS-RESET-XXX
      anthropic-ratelimit-output-tokens-limit:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-output-tokens-remaining:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-output-tokens-reset:
      - ANTHROPIC-RATELIMIT-OUTPUT-TOKENS-RESET-XXX
      anthropic-ratelimit-requests-limit:
      - '4000'
      anthropic-ratelimit-requests-remaining:
      - '3999'
      anthropic-ratelimit-requests-reset:
      - '2026-01-22T00:18:55Z'
      anthropic-ratelimit-tokens-limit:
      - ANTHROPIC-RATELIMIT-TOKENS-LIMIT-XXX
      anthropic-ratelimit-tokens-remaining:
      - ANTHROPIC-RATELIMIT-TOKENS-REMAINING-XXX
      anthropic-ratelimit-tokens-reset:
      - ANTHROPIC-RATELIMIT-TOKENS-RESET-XXX
      cf-cache-status:
      - DYNAMIC
      request-id:
      - REQUEST-ID-XXX
      strict-transport-security:
      - STS-XXX
      x-envoy-upstream-service-time:
      - '648'
    status:
      code: 200
      message: OK
version: 1
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
File diff suppressed because one or more lines are too long
@@ -0,0 +1,75 @@
interactions:
- request:
    body: '{"contents": [{"parts": [{"text": "\nCurrent Task: What is the capital
      of Japan?\n\nThis is the expected criteria for your final answer: The capital
      of Japan\nyou MUST return the actual complete content as the final answer, not
      a summary.\n\nBegin! This is VERY important to you, use the tools available
      and give your best Final Answer, your job depends on it!\n\nThought:"}], "role":
      "user"}], "systemInstruction": {"parts": [{"text": "You are Research Assistant.
      You are a helpful research assistant.\nYour personal goal is: Find information
      about the capital of Japan\nTo give my best complete final answer to the task
      respond using the exact following format:\n\nThought: I now can give a great
      answer\nFinal Answer: Your final answer must be the great and the most complete
      as possible, it must be outcome described.\n\nI MUST use these formats, my job
      depends on it!"}], "role": "user"}, "generationConfig": {"stopSequences": ["\nObservation:"]}}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - '*/*'
      accept-encoding:
      - ACCEPT-ENCODING-XXX
      connection:
      - keep-alive
      content-length:
      - '952'
      content-type:
      - application/json
      host:
      - aiplatform.googleapis.com
      x-goog-api-client:
      - google-genai-sdk/1.59.0 gl-python/3.13.3
      x-goog-api-key:
      - X-GOOG-API-KEY-XXX
    method: POST
    uri: https://aiplatform.googleapis.com/v1/publishers/google/models/gemini-2.0-flash-exp:generateContent
  response:
    body:
      string: "{\n \"candidates\": [\n {\n \"content\": {\n \"role\":
        \"model\",\n \"parts\": [\n {\n \"text\": \"The
        capital of Japan is Tokyo.\\nFinal Answer: Tokyo\\n\"\n }\n ]\n
        \ },\n \"finishReason\": \"STOP\",\n \"avgLogprobs\": -0.017845841554495003\n
        \ }\n ],\n \"usageMetadata\": {\n \"promptTokenCount\": 163,\n \"candidatesTokenCount\":
        13,\n \"totalTokenCount\": 176,\n \"trafficType\": \"ON_DEMAND\",\n
        \ \"promptTokensDetails\": [\n {\n \"modality\": \"TEXT\",\n
        \ \"tokenCount\": 163\n }\n ],\n \"candidatesTokensDetails\":
        [\n {\n \"modality\": \"TEXT\",\n \"tokenCount\": 13\n
        \ }\n ]\n },\n \"modelVersion\": \"gemini-2.0-flash-exp\",\n \"createTime\":
        \"2026-01-15T22:27:38.066749Z\",\n \"responseId\": \"2mlpab2JBNOFidsPh5GigQs\"\n}\n"
    headers:
      Alt-Svc:
      - h3=":443"; ma=2592000,h3-29=":443"; ma=2592000
      Content-Type:
      - application/json; charset=UTF-8
      Date:
      - Thu, 15 Jan 2026 22:27:38 GMT
      Server:
      - scaffolding on HTTPServer2
      Transfer-Encoding:
      - chunked
      Vary:
      - Origin
      - X-Origin
      - Referer
      X-Content-Type-Options:
      - X-CONTENT-TYPE-XXX
      X-Frame-Options:
      - X-FRAME-OPTIONS-XXX
      X-XSS-Protection:
      - '0'
      content-length:
      - '786'
    status:
      code: 200
      message: OK
version: 1
File diff suppressed because one or more lines are too long
@@ -1,456 +1,528 @@
interactions:
- request:
    body: '{"trace_id": "00000000-0000-0000-0000-000000000000", "execution_type": "crew", "user_identifier": null, "execution_context": {"crew_fingerprint": null, "crew_name": "Unknown Crew", "flow_name": null, "crewai_version": "1.3.0", "privacy_level": "standard"}, "execution_metadata": {"expected_duration_estimate": 300, "agent_count": 0, "task_count": 0, "flow_method_count": 0, "execution_started_at": "2025-11-05T22:19:56.074812+00:00"}}'
    body: "{\"messages\":[{\"role\":\"system\",\"content\":\"You are Guardrail Agent.
      You are a expert at validating the output of a task. By providing effective
      feedback if the output is not valid.\\nYour personal goal is: Validate the output
      of the task\\nTo give my best complete final answer to the task respond using
      the exact following format:\\n\\nThought: I now can give a great answer\\nFinal
      Answer: Your final answer must be the great and the most complete as possible,
      it must be outcome described.\\n\\nI MUST use these formats, my job depends
      on it!\"},{\"role\":\"user\",\"content\":\"\\nCurrent Task: \\n Ensure
      the following task result complies with the given guardrail.\\n\\n Task
      result:\\n \\n Lorem Ipsum is simply dummy text of the printing
      and typesetting industry. Lorem Ipsum has been the industry's standard dummy
      text ever\\n \\n\\n Guardrail:\\n Ensure the result has
      less than 10 words\\n\\n Your task:\\n - Confirm if the Task result
      complies with the guardrail.\\n - If not, provide clear feedback explaining
      what is wrong (e.g., by how much it violates the rule, or what specific part
      fails).\\n - Focus only on identifying issues \u2014 do not propose corrections.\\n
      \ - If the Task result complies with the guardrail, saying that is valid\\n
      \ \\n\\nBegin! This is VERY important to you, use the tools available
      and give your best Final Answer, your job depends on it!\\n\\nThought:\"}],\"model\":\"gpt-4o\"}"
    headers:
      Accept:
      - '*/*'
      Accept-Encoding:
      - gzip, deflate, zstd
      Connection:
      - keep-alive
      Content-Length:
      - '434'
      Content-Type:
      - application/json
      User-Agent:
      - CrewAI-CLI/1.3.0
      X-Crewai-Version:
      - 1.3.0
    method: POST
    uri: https://app.crewai.com/crewai_plus/api/v1/tracing/batches
  response:
    body:
      string: '{"error":"bad_credentials","message":"Bad credentials"}'
    headers:
      Connection:
      - keep-alive
      Content-Length:
      - '55'
      Content-Type:
      - application/json; charset=utf-8
      Date:
      - Wed, 05 Nov 2025 22:19:56 GMT
      cache-control:
      - no-store
      content-security-policy:
      - 'default-src ''self'' *.app.crewai.com app.crewai.com; script-src ''self'' ''unsafe-inline'' *.app.crewai.com app.crewai.com https://cdn.jsdelivr.net/npm/apexcharts https://www.gstatic.com https://run.pstmn.io https://apis.google.com https://apis.google.com/js/api.js https://accounts.google.com https://accounts.google.com/gsi/client https://cdnjs.cloudflare.com/ajax/libs/normalize/8.0.1/normalize.min.css.map https://*.google.com https://docs.google.com https://slides.google.com https://js.hs-scripts.com https://js.sentry-cdn.com https://browser.sentry-cdn.com https://www.googletagmanager.com https://js-na1.hs-scripts.com https://js.hubspot.com http://js-na1.hs-scripts.com https://bat.bing.com https://cdn.amplitude.com https://cdn.segment.com https://d1d3n03t5zntha.cloudfront.net/ https://descriptusercontent.com https://edge.fullstory.com https://googleads.g.doubleclick.net https://js.hs-analytics.net https://js.hs-banner.com https://js.hsadspixel.net https://js.hscollectedforms.net
        https://js.usemessages.com https://snap.licdn.com https://static.cloudflareinsights.com https://static.reo.dev https://www.google-analytics.com https://share.descript.com/; style-src ''self'' ''unsafe-inline'' *.app.crewai.com app.crewai.com https://cdn.jsdelivr.net/npm/apexcharts; img-src ''self'' data: *.app.crewai.com app.crewai.com https://zeus.tools.crewai.com https://dashboard.tools.crewai.com https://cdn.jsdelivr.net https://forms.hsforms.com https://track.hubspot.com https://px.ads.linkedin.com https://px4.ads.linkedin.com https://www.google.com https://www.google.com.br; font-src ''self'' data: *.app.crewai.com app.crewai.com; connect-src ''self'' *.app.crewai.com app.crewai.com https://zeus.tools.crewai.com https://connect.useparagon.com/ https://zeus.useparagon.com/* https://*.useparagon.com/* https://run.pstmn.io https://connect.tools.crewai.com/ https://*.sentry.io https://www.google-analytics.com https://edge.fullstory.com https://rs.fullstory.com https://api.hubspot.com
        https://forms.hscollectedforms.net https://api.hubapi.com https://px.ads.linkedin.com https://px4.ads.linkedin.com https://google.com/pagead/form-data/16713662509 https://google.com/ccm/form-data/16713662509 https://www.google.com/ccm/collect https://worker-actionkit.tools.crewai.com https://api.reo.dev; frame-src ''self'' *.app.crewai.com app.crewai.com https://connect.useparagon.com/ https://zeus.tools.crewai.com https://zeus.useparagon.com/* https://connect.tools.crewai.com/ https://docs.google.com https://drive.google.com https://slides.google.com https://accounts.google.com https://*.google.com https://app.hubspot.com/ https://td.doubleclick.net https://www.googletagmanager.com/ https://www.youtube.com https://share.descript.com'
      expires:
      - '0'
      permissions-policy:
      - camera=(), microphone=(self), geolocation=()
      pragma:
      - no-cache
      referrer-policy:
      - strict-origin-when-cross-origin
      strict-transport-security:
      - max-age=63072000; includeSubDomains
      vary:
      - Accept
      x-content-type-options:
      - nosniff
      x-frame-options:
      - SAMEORIGIN
      x-permitted-cross-domain-policies:
      - none
      x-request-id:
      - 230c6cb5-92c7-448d-8c94-e5548a9f4259
      x-runtime:
      - '0.073220'
      x-xss-protection:
      - 1; mode=block
    status:
      code: 401
      message: Unauthorized
- request:
    body: '{"messages":[{"role":"system","content":"You are Guardrail Agent. You are a expert at validating the output of a task. By providing effective feedback if the output is not valid.\nYour personal goal is: Validate the output of the task\n\nTo give my best complete final answer to the task respond using the exact following format:\n\nThought: I now can give a great answer\nFinal Answer: Your final answer must be the great and the most complete as possible, it must be outcome described.\n\nI MUST use these formats, my job depends on it!Ensure your final answer strictly adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\": {\n \"properties\": {\n \"valid\": {\n \"description\": \"Whether the task output complies with the guardrail\",\n \"title\": \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\": {\n \"anyOf\":
      [\n {\n \"type\": \"string\"\n },\n {\n \"type\": \"null\"\n }\n ],\n \"default\": null,\n \"description\": \"A feedback about the task output if it is not valid\",\n \"title\": \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\": \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\": false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output. Ensure the final output does not include any code block markers like ```json or ```python."},{"role":"user","content":"\n Ensure the following task result complies with the given guardrail.\n\n Task result:\n \n Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever\n \n\n Guardrail:\n Ensure
      the result has less than 10 words\n\n Your task:\n - Confirm if the Task result complies with the guardrail.\n - If not, provide clear feedback explaining what is wrong (e.g., by how much it violates the rule, or what specific part fails).\n - Focus only on identifying issues — do not propose corrections.\n - If the Task result complies with the guardrail, saying that is valid\n "}],"model":"gpt-4o"}'
    headers:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - gzip, deflate, zstd
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '2452'
      - '1467'
      content-type:
      - application/json
      host:
      - api.openai.com
      user-agent:
      - OpenAI/Python 1.109.1
      x-stainless-arch:
      - arm64
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - MacOS
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.109.1
      - 1.83.0
      x-stainless-read-timeout:
      - '600'
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.9
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CYg96Riy2RJRxnBHvoROukymP9wvs\",\n \"object\": \"chat.completion\",\n \"created\": 1762381196,\n \"model\": \"gpt-4o-2024-08-06\",\n \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\": \"assistant\",\n \"content\": \"Thought: I need to check if the task result meets the requirement of having less than 10 words.\\n\\nFinal Answer: {\\n \\\"valid\\\": false,\\n \\\"feedback\\\": \\\"The task result contains more than 10 words, violating the guardrail. The text provided contains about 21 words.\\\"\\n}\",\n \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\": 489,\n \"completion_tokens\": 61,\n \"total_tokens\": 550,\n \"prompt_tokens_details\": {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n \"reasoning_tokens\"\
        : 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\": 0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\": \"default\",\n \"system_fingerprint\": \"fp_cbf1785567\"\n}\n"
      string: "{\n \"id\": \"chatcmpl-Cy7yHRYTZi8yzRbcODnKr92keLKCb\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768446357,\n \"model\": \"gpt-4o-2024-08-06\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"The task result provided has more than
        10 words. I will count the words to verify this.\\n\\nThe task result is the
        following text:\\n\\\"Lorem Ipsum is simply dummy text of the printing and
        typesetting industry. Lorem Ipsum has been the industry's standard dummy text
        ever\\\"\\n\\nCounting the words:\\n\\n1. Lorem \\n2. Ipsum \\n3. is \\n4.
        simply \\n5. dummy \\n6. text \\n7. of \\n8. the \\n9. printing \\n10. and
        \\n11. typesetting \\n12. industry. \\n13. Lorem \\n14. Ipsum \\n15. has \\n16.
        been \\n17. the \\n18. industry's \\n19. standard \\n20. dummy \\n21. text
        \\n22. ever\\n\\nThe total word count is 22.\\n\\nThought: I now can give
        a great answer\\nFinal Answer: The task result does not comply with the guardrail.
        It contains 22 words, which exceeds the limit of 10 words.\",\n \"refusal\":
        null,\n \"annotations\": []\n },\n \"logprobs\": null,\n
        \ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        285,\n \"completion_tokens\": 195,\n \"total_tokens\": 480,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_deacdd5f6f\"\n}\n"
    headers:
      CF-RAY:
      - REDACTED-RAY
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Wed, 05 Nov 2025 22:19:58 GMT
      - Thu, 15 Jan 2026 03:05:59 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - __cf_bm=REDACTED; path=/; expires=Wed, 05-Nov-25 22:49:58 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
      - _cfuvid=REDACTED; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - nosniff
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - X-Request-ID
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '1557'
      openai-organization:
      - user-hortuttj2f3qtmxyik2zxf4q
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '2201'
      - '2130'
      openai-project:
      - proj_fL4UBWR1CMpAAdgzaSKqsVvA
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '2401'
      - '2147'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - '500'
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - '30000'
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - '499'
|
||||
- X-RATELIMIT-REMAINING-REQUESTS-XXX
|
||||
x-ratelimit-remaining-tokens:
|
||||
- '29439'
|
||||
- X-RATELIMIT-REMAINING-TOKENS-XXX
|
||||
x-ratelimit-reset-requests:
|
||||
- 120ms
|
||||
- X-RATELIMIT-RESET-REQUESTS-XXX
|
||||
x-ratelimit-reset-tokens:
|
||||
- 1.122s
|
||||
- X-RATELIMIT-RESET-TOKENS-XXX
|
||||
x-request-id:
|
||||
- req_REDACTED
|
||||
- X-REQUEST-ID-XXX
|
||||
status:
|
||||
code: 200
|
||||
message: OK
|
||||
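The interaction above is one entry in a VCR-style cassette: the recorded request body, sanitized headers (the `-XXX` placeholders), and the stored completion are replayed in place of a live OpenAI call. Because this is a compare view, two recordings appear side by side (a Nov 2025 run on openai 1.109.1 and a re-recorded Jan 2026 run on 1.83.0), which is why header values and response strings show up in pairs. A minimal sketch of how such a cassette is typically replayed, assuming vcrpy is the recorder; the cassette filename and test body are illustrative, not taken from this diff:

```python
# Sketch only: replaying a recorded cassette with vcrpy (assumed tooling;
# the cassette name and test are hypothetical).
import vcr

recorder = vcr.VCR(
    cassette_library_dir="tests/cassettes",
    filter_headers=["authorization"],  # mirrors the AUTHORIZATION-XXX scrubbing above
)

def test_guardrail_word_limit_replays_recording():
    with recorder.use_cassette("guardrail_word_limit.yaml"):
        # HTTP calls to api.openai.com inside this block are answered from
        # the cassette, so the test runs offline and deterministically.
        ...
```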
- request:
    body: '{"messages":[{"role":"system","content":"Ensure your final answer strictly adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\": {\n \"properties\": {\n \"valid\": {\n \"description\": \"Whether the task output complies with the guardrail\",\n \"title\": \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\": {\n \"anyOf\": [\n {\n \"type\": \"string\"\n },\n {\n \"type\": \"null\"\n }\n ],\n \"default\": null,\n \"description\": \"A feedback about the task output if it is not valid\",\n \"title\": \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\": \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\":
      false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output. Ensure the final output does not include any code block markers like ```json or ```python."},{"role":"user","content":"{\n \"valid\": false,\n \"feedback\": \"The task result contains more than 10 words, violating the guardrail. The text provided contains about 21 words.\"\n}"}],"model":"gpt-4o","response_format":{"type":"json_schema","json_schema":{"schema":{"properties":{"valid":{"description":"Whether the task output complies with the guardrail","title":"Valid","type":"boolean"},"feedback":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"A feedback about the task output if it is not valid","title":"Feedback"}},"required":["valid","feedback"],"title":"LLMGuardrailResult","type":"object","additionalProperties":false},"name":"LLMGuardrailResult","strict":true}},"stream":false}'
    body: '{"messages":[{"role":"system","content":"Ensure your final answer strictly
      adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\":
      {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\":
      {\n \"properties\": {\n \"valid\": {\n \"description\":
      \"Whether the task output complies with the guardrail\",\n \"title\":
      \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\":
      {\n \"anyOf\": [\n {\n \"type\": \"string\"\n },\n {\n \"type\":
      \"null\"\n }\n ],\n \"default\": null,\n \"description\":
      \"A feedback about the task output if it is not valid\",\n \"title\":
      \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\":
      \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\":
      false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output.
      Ensure the final output does not include any code block markers like ```json
      or ```python."},{"role":"user","content":"The task result does not comply with
      the guardrail. It contains 22 words, which exceeds the limit of 10 words."}],"model":"gpt-4o","response_format":{"type":"json_schema","json_schema":{"schema":{"properties":{"valid":{"description":"Whether
      the task output complies with the guardrail","title":"Valid","type":"boolean"},"feedback":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"A
      feedback about the task output if it is not valid","title":"Feedback"}},"required":["valid","feedback"],"title":"LLMGuardrailResult","type":"object","additionalProperties":false},"name":"LLMGuardrailResult","strict":true}},"stream":false}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - gzip, deflate, zstd
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '1884'
      - '1835'
      content-type:
      - application/json
      cookie:
      - __cf_bm=REDACTED; _cfuvid=REDACTED
      - COOKIE-XXX
      host:
      - api.openai.com
      user-agent:
      - OpenAI/Python 1.109.1
      x-stainless-arch:
      - arm64
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-helper-method:
      - chat.completions.parse
      - beta.chat.completions.parse
      x-stainless-lang:
      - python
      x-stainless-os:
      - MacOS
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.109.1
      - 1.83.0
      x-stainless-read-timeout:
      - '600'
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.9
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CYg98QlZ8NTrQ69676MpXXyCoZJT8\",\n \"object\": \"chat.completion\",\n \"created\": 1762381198,\n \"model\": \"gpt-4o-2024-08-06\",\n \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\": \"assistant\",\n \"content\": \"{\\\"valid\\\":false,\\\"feedback\\\":\\\"The task result contains more than 10 words, violating the guardrail. The text provided contains about 21 words.\\\"}\",\n \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\": 374,\n \"completion_tokens\": 32,\n \"total_tokens\": 406,\n \"prompt_tokens_details\": {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\": 0,\n \"rejected_prediction_tokens\": 0\n }\n },\n\
        \ \"service_tier\": \"default\",\n \"system_fingerprint\": \"fp_cbf1785567\"\n}\n"
      string: "{\n \"id\": \"chatcmpl-Cy7yJiPCk4fXuogyT5e8XeGRLCSf8\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768446359,\n \"model\": \"gpt-4o-2024-08-06\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"{\\\"valid\\\":false,\\\"feedback\\\":\\\"The
        task output exceeds the word limit of 10 words by containing 22 words.\\\"}\",\n
        \ \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\":
        null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        363,\n \"completion_tokens\": 25,\n \"total_tokens\": 388,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_a0e9480a2f\"\n}\n"
    headers:
      CF-RAY:
      - REDACTED-RAY
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Wed, 05 Nov 2025 22:19:59 GMT
      - Thu, 15 Jan 2026 03:05:59 GMT
      Server:
      - cloudflare
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - nosniff
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - X-Request-ID
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '913'
      openai-organization:
      - user-hortuttj2f3qtmxyik2zxf4q
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '419'
      - '488'
      openai-project:
      - proj_fL4UBWR1CMpAAdgzaSKqsVvA
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '432'
      - '507'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - '500'
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - '30000'
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - '499'
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - '29702'
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - 120ms
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - 596ms
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - req_REDACTED
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
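This second interaction records the structured-output step: the request carries a `response_format` of type `json_schema` built from `LLMGuardrailResult`, and the `x-stainless-helper-method` header shows it was issued through `chat.completions.parse` (`beta.chat.completions.parse` on the older client). A hedged sketch of an equivalent call, with the schema dict copied from the recorded body and the message content abbreviated:

```python
# Sketch of the structured-output request recorded above, assuming the
# openai Python client; message content is abbreviated, not verbatim.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "The task result does not comply ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "LLMGuardrailResult",
            "strict": True,
            "schema": {
                "type": "object",
                "additionalProperties": False,
                "required": ["valid", "feedback"],
                "properties": {
                    "valid": {"type": "boolean"},
                    "feedback": {"anyOf": [{"type": "string"}, {"type": "null"}]},
                },
            },
        },
    },
)
# The model must answer with JSON matching the schema, e.g.
# {"valid": false, "feedback": "..."} as seen in the recorded response.
print(response.choices[0].message.content)
```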
- request:
    body: '{"messages":[{"role":"system","content":"You are Guardrail Agent. You are a expert at validating the output of a task. By providing effective feedback if the output is not valid.\nYour personal goal is: Validate the output of the task\n\nTo give my best complete final answer to the task respond using the exact following format:\n\nThought: I now can give a great answer\nFinal Answer: Your final answer must be the great and the most complete as possible, it must be outcome described.\n\nI MUST use these formats, my job depends on it!Ensure your final answer strictly adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\": {\n \"properties\": {\n \"valid\": {\n \"description\": \"Whether the task output complies with the guardrail\",\n \"title\": \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\": {\n \"anyOf\":
      [\n {\n \"type\": \"string\"\n },\n {\n \"type\": \"null\"\n }\n ],\n \"default\": null,\n \"description\": \"A feedback about the task output if it is not valid\",\n \"title\": \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\": \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\": false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output. Ensure the final output does not include any code block markers like ```json or ```python."},{"role":"user","content":"\n Ensure the following task result complies with the given guardrail.\n\n Task result:\n \n Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry''s standard dummy text ever\n \n\n Guardrail:\n Ensure
      the result has less than 500 words\n\n Your task:\n - Confirm if the Task result complies with the guardrail.\n - If not, provide clear feedback explaining what is wrong (e.g., by how much it violates the rule, or what specific part fails).\n - Focus only on identifying issues — do not propose corrections.\n - If the Task result complies with the guardrail, saying that is valid\n "}],"model":"gpt-4o"}'
    body: "{\"messages\":[{\"role\":\"system\",\"content\":\"You are Guardrail Agent.
      You are a expert at validating the output of a task. By providing effective
      feedback if the output is not valid.\\nYour personal goal is: Validate the output
      of the task\\nTo give my best complete final answer to the task respond using
      the exact following format:\\n\\nThought: I now can give a great answer\\nFinal
      Answer: Your final answer must be the great and the most complete as possible,
      it must be outcome described.\\n\\nI MUST use these formats, my job depends
      on it!\"},{\"role\":\"user\",\"content\":\"\\nCurrent Task: \\n Ensure
      the following task result complies with the given guardrail.\\n\\n Task
      result:\\n \\n Lorem Ipsum is simply dummy text of the printing
      and typesetting industry. Lorem Ipsum has been the industry's standard dummy
      text ever\\n \\n\\n Guardrail:\\n Ensure the result has
      less than 500 words\\n\\n Your task:\\n - Confirm if the Task
      result complies with the guardrail.\\n - If not, provide clear feedback
      explaining what is wrong (e.g., by how much it violates the rule, or what specific
      part fails).\\n - Focus only on identifying issues \u2014 do not propose
      corrections.\\n - If the Task result complies with the guardrail, saying
      that is valid\\n \\n\\nBegin! This is VERY important to you, use the
      tools available and give your best Final Answer, your job depends on it!\\n\\nThought:\"}],\"model\":\"gpt-4o\"}"
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - gzip, deflate, zstd
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '2453'
      - '1468'
      content-type:
      - application/json
      host:
      - api.openai.com
      user-agent:
      - OpenAI/Python 1.109.1
      x-stainless-arch:
      - arm64
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-lang:
      - python
      x-stainless-os:
      - MacOS
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.109.1
      - 1.83.0
      x-stainless-read-timeout:
      - '600'
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.9
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CYgBMV6fu7EvV2BqzMdJaKyLAg1WW\",\n \"object\": \"chat.completion\",\n \"created\": 1762381336,\n \"model\": \"gpt-4o-2024-08-06\",\n \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\": \"assistant\",\n \"content\": \"Thought: I now can give a great answer\\nFinal Answer: {\\\"valid\\\": true, \\\"feedback\\\": null}\",\n \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\": 489,\n \"completion_tokens\": 23,\n \"total_tokens\": 512,\n \"prompt_tokens_details\": {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\": 0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\": \"default\",\n \"system_fingerprint\"\
        : \"fp_cbf1785567\"\n}\n"
      string: "{\n \"id\": \"chatcmpl-Cy7yKa0rmi2YoTLpyXt9hjeLt2rTI\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768446360,\n \"model\": \"gpt-4o-2024-08-06\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"First, I'll count the number of words
        in the Task result to ensure it complies with the guardrail. \\n\\nThe Task
        result is: \\\"Lorem Ipsum is simply dummy text of the printing and typesetting
        industry. Lorem Ipsum has been the industry's standard dummy text ever.\\\"\\n\\nBy
        counting the words: \\n1. Lorem\\n2. Ipsum\\n3. is\\n4. simply\\n5. dummy\\n6.
        text\\n7. of\\n8. the\\n9. printing\\n10. and\\n11. typesetting\\n12. industry\\n13.
        Lorem\\n14. Ipsum\\n15. has\\n16. been\\n17. the\\n18. industry's\\n19. standard\\n20.
        dummy\\n21. text\\n22. ever\\n\\nThere are 22 words total in the Task result.\\n\\nI
        need to verify if the count of 22 words is less than the guardrail limit of
        500 words.\\n\\nThought: I now can give a great answer\\nFinal Answer: The
        Task result complies with the guardrail as it contains 22 words, which is
        less than the 500-word limit. Therefore, the output is valid.\",\n \"refusal\":
        null,\n \"annotations\": []\n },\n \"logprobs\": null,\n
        \ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        285,\n \"completion_tokens\": 227,\n \"total_tokens\": 512,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_deacdd5f6f\"\n}\n"
    headers:
      CF-RAY:
      - REDACTED-RAY
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Wed, 05 Nov 2025 22:22:16 GMT
      - Thu, 15 Jan 2026 03:06:02 GMT
      Server:
      - cloudflare
      Set-Cookie:
      - __cf_bm=REDACTED; path=/; expires=Wed, 05-Nov-25 22:52:16 GMT; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
      - _cfuvid=REDACTED; path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
      - SET-COOKIE-XXX
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - nosniff
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - X-Request-ID
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '1668'
      openai-organization:
      - user-hortuttj2f3qtmxyik2zxf4q
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '327'
      - '2502'
      openai-project:
      - proj_fL4UBWR1CMpAAdgzaSKqsVvA
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '372'
      - '2522'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - '500'
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - '30000'
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - '499'
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - '29438'
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - 120ms
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - 1.124s
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - req_REDACTED
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
- request:
    body: '{"messages":[{"role":"system","content":"Ensure your final answer strictly adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\": {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\": {\n \"properties\": {\n \"valid\": {\n \"description\": \"Whether the task output complies with the guardrail\",\n \"title\": \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\": {\n \"anyOf\": [\n {\n \"type\": \"string\"\n },\n {\n \"type\": \"null\"\n }\n ],\n \"default\": null,\n \"description\": \"A feedback about the task output if it is not valid\",\n \"title\": \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\": \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\":
      false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output. Ensure the final output does not include any code block markers like ```json or ```python."},{"role":"user","content":"{\"valid\": true, \"feedback\": null}"}],"model":"gpt-4o","response_format":{"type":"json_schema","json_schema":{"schema":{"properties":{"valid":{"description":"Whether the task output complies with the guardrail","title":"Valid","type":"boolean"},"feedback":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"A feedback about the task output if it is not valid","title":"Feedback"}},"required":["valid","feedback"],"title":"LLMGuardrailResult","type":"object","additionalProperties":false},"name":"LLMGuardrailResult","strict":true}},"stream":false}'
    body: '{"messages":[{"role":"system","content":"Ensure your final answer strictly
      adheres to the following OpenAPI schema: {\n \"type\": \"json_schema\",\n \"json_schema\":
      {\n \"name\": \"LLMGuardrailResult\",\n \"strict\": true,\n \"schema\":
      {\n \"properties\": {\n \"valid\": {\n \"description\":
      \"Whether the task output complies with the guardrail\",\n \"title\":
      \"Valid\",\n \"type\": \"boolean\"\n },\n \"feedback\":
      {\n \"anyOf\": [\n {\n \"type\": \"string\"\n },\n {\n \"type\":
      \"null\"\n }\n ],\n \"default\": null,\n \"description\":
      \"A feedback about the task output if it is not valid\",\n \"title\":
      \"Feedback\"\n }\n },\n \"required\": [\n \"valid\",\n \"feedback\"\n ],\n \"title\":
      \"LLMGuardrailResult\",\n \"type\": \"object\",\n \"additionalProperties\":
      false\n }\n }\n}\n\nDo not include the OpenAPI schema in the final output.
      Ensure the final output does not include any code block markers like ```json
      or ```python."},{"role":"user","content":"The Task result complies with the
      guardrail as it contains 22 words, which is less than the 500-word limit. Therefore,
      the output is valid."}],"model":"gpt-4o","response_format":{"type":"json_schema","json_schema":{"schema":{"properties":{"valid":{"description":"Whether
      the task output complies with the guardrail","title":"Valid","type":"boolean"},"feedback":{"anyOf":[{"type":"string"},{"type":"null"}],"description":"A
      feedback about the task output if it is not valid","title":"Feedback"}},"required":["valid","feedback"],"title":"LLMGuardrailResult","type":"object","additionalProperties":false},"name":"LLMGuardrailResult","strict":true}},"stream":false}'
    headers:
      User-Agent:
      - X-USER-AGENT-XXX
      accept:
      - application/json
      accept-encoding:
      - gzip, deflate, zstd
      - ACCEPT-ENCODING-XXX
      authorization:
      - AUTHORIZATION-XXX
      connection:
      - keep-alive
      content-length:
      - '1762'
      - '1864'
      content-type:
      - application/json
      cookie:
      - __cf_bm=REDACTED; _cfuvid=REDACTED
      - COOKIE-XXX
      host:
      - api.openai.com
      user-agent:
      - OpenAI/Python 1.109.1
      x-stainless-arch:
      - arm64
      - X-STAINLESS-ARCH-XXX
      x-stainless-async:
      - 'false'
      x-stainless-helper-method:
      - chat.completions.parse
      - beta.chat.completions.parse
      x-stainless-lang:
      - python
      x-stainless-os:
      - MacOS
      - X-STAINLESS-OS-XXX
      x-stainless-package-version:
      - 1.109.1
      - 1.83.0
      x-stainless-read-timeout:
      - '600'
      - X-STAINLESS-READ-TIMEOUT-XXX
      x-stainless-retry-count:
      - '0'
      x-stainless-runtime:
      - CPython
      x-stainless-runtime-version:
      - 3.12.9
      - 3.13.3
    method: POST
    uri: https://api.openai.com/v1/chat/completions
  response:
    body:
      string: "{\n \"id\": \"chatcmpl-CYgBMU20R45qGGaLN6vNAmW1NR4R6\",\n \"object\": \"chat.completion\",\n \"created\": 1762381336,\n \"model\": \"gpt-4o-2024-08-06\",\n \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\": \"assistant\",\n \"content\": \"{\\\"valid\\\":true,\\\"feedback\\\":null}\",\n \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\": 347,\n \"completion_tokens\": 9,\n \"total_tokens\": 356,\n \"prompt_tokens_details\": {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\": 0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\": \"default\",\n \"system_fingerprint\": \"fp_cbf1785567\"\n}\n"
      string: "{\n \"id\": \"chatcmpl-Cy7yMAjNYSCz2foZPEcSVCuapzF8y\",\n \"object\":
        \"chat.completion\",\n \"created\": 1768446362,\n \"model\": \"gpt-4o-2024-08-06\",\n
        \ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
        \"assistant\",\n \"content\": \"{\\\"valid\\\":true,\\\"feedback\\\":null}\",\n
        \ \"refusal\": null,\n \"annotations\": []\n },\n \"logprobs\":
        null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
        369,\n \"completion_tokens\": 9,\n \"total_tokens\": 378,\n \"prompt_tokens_details\":
        {\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
        {\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
        0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
        \"default\",\n \"system_fingerprint\": \"fp_a0e9480a2f\"\n}\n"
    headers:
      CF-RAY:
      - REDACTED-RAY
      - CF-RAY-XXX
      Connection:
      - keep-alive
      Content-Type:
      - application/json
      Date:
      - Wed, 05 Nov 2025 22:22:17 GMT
      - Thu, 15 Jan 2026 03:06:03 GMT
      Server:
      - cloudflare
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains; preload
      - STS-XXX
      Transfer-Encoding:
      - chunked
      X-Content-Type-Options:
      - nosniff
      - X-CONTENT-TYPE-XXX
      access-control-expose-headers:
      - X-Request-ID
      - ACCESS-CONTROL-XXX
      alt-svc:
      - h3=":443"; ma=86400
      cf-cache-status:
      - DYNAMIC
      content-length:
      - '837'
      openai-organization:
      - user-hortuttj2f3qtmxyik2zxf4q
      - OPENAI-ORG-XXX
      openai-processing-ms:
      - '1081'
      - '413'
      openai-project:
      - proj_fL4UBWR1CMpAAdgzaSKqsVvA
      - OPENAI-PROJECT-XXX
      openai-version:
      - '2020-10-01'
      x-envoy-upstream-service-time:
      - '1241'
      - '650'
      x-openai-proxy-wasm:
      - v0.1
      x-ratelimit-limit-requests:
      - '500'
      - X-RATELIMIT-LIMIT-REQUESTS-XXX
      x-ratelimit-limit-tokens:
      - '30000'
      - X-RATELIMIT-LIMIT-TOKENS-XXX
      x-ratelimit-remaining-requests:
      - '499'
      - X-RATELIMIT-REMAINING-REQUESTS-XXX
      x-ratelimit-remaining-tokens:
      - '29478'
      - X-RATELIMIT-REMAINING-TOKENS-XXX
      x-ratelimit-reset-requests:
      - 120ms
      - X-RATELIMIT-RESET-REQUESTS-XXX
      x-ratelimit-reset-tokens:
      - 1.042s
      - X-RATELIMIT-RESET-TOKENS-XXX
      x-request-id:
      - req_REDACTED
      - X-REQUEST-ID-XXX
    status:
      code: 200
      message: OK
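All four recorded prompts embed the same `LLMGuardrailResult` JSON schema. It corresponds to a Pydantic model roughly like the following; this is a reconstruction from the schema text in the cassettes, not the actual crewAI source:

```python
from pydantic import BaseModel, Field

class LLMGuardrailResult(BaseModel):
    """Reconstructed from the JSON schema embedded in the recorded prompts."""

    valid: bool = Field(description="Whether the task output complies with the guardrail")
    feedback: str | None = Field(
        default=None,
        description="A feedback about the task output if it is not valid",
    )

# model_json_schema() yields essentially the "schema" object seen in the
# prompts: required ["valid", "feedback"], additionalProperties false.
print(LLMGuardrailResult.model_json_schema())
```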
1 lib/crewai/tests/files/__init__.py (Normal file)
@@ -0,0 +1 @@
"""Tests for file processing utilities."""
1 lib/crewai/tests/files/processing/__init__.py (Normal file)
@@ -0,0 +1 @@
"""Tests for file processing module."""
226 lib/crewai/tests/files/processing/test_constraints.py (Normal file)
@@ -0,0 +1,226 @@
"""Tests for provider constraints."""

import pytest

from crewai.files.processing.constraints import (
    ANTHROPIC_CONSTRAINTS,
    BEDROCK_CONSTRAINTS,
    GEMINI_CONSTRAINTS,
    OPENAI_CONSTRAINTS,
    AudioConstraints,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
    VideoConstraints,
    get_constraints_for_provider,
)


class TestImageConstraints:
    """Tests for ImageConstraints dataclass."""

    def test_image_constraints_creation(self):
        """Test creating image constraints with all fields."""
        constraints = ImageConstraints(
            max_size_bytes=5 * 1024 * 1024,
            max_width=8000,
            max_height=8000,
            max_images_per_request=10,
        )

        assert constraints.max_size_bytes == 5 * 1024 * 1024
        assert constraints.max_width == 8000
        assert constraints.max_height == 8000
        assert constraints.max_images_per_request == 10

    def test_image_constraints_defaults(self):
        """Test image constraints with default values."""
        constraints = ImageConstraints(max_size_bytes=1000)

        assert constraints.max_size_bytes == 1000
        assert constraints.max_width is None
        assert constraints.max_height is None
        assert constraints.max_images_per_request is None
        assert "image/png" in constraints.supported_formats

    def test_image_constraints_frozen(self):
        """Test that image constraints are immutable."""
        constraints = ImageConstraints(max_size_bytes=1000)

        with pytest.raises(Exception):
            constraints.max_size_bytes = 2000


class TestPDFConstraints:
    """Tests for PDFConstraints dataclass."""

    def test_pdf_constraints_creation(self):
        """Test creating PDF constraints."""
        constraints = PDFConstraints(
            max_size_bytes=30 * 1024 * 1024,
            max_pages=100,
        )

        assert constraints.max_size_bytes == 30 * 1024 * 1024
        assert constraints.max_pages == 100

    def test_pdf_constraints_defaults(self):
        """Test PDF constraints with default values."""
        constraints = PDFConstraints(max_size_bytes=1000)

        assert constraints.max_size_bytes == 1000
        assert constraints.max_pages is None


class TestAudioConstraints:
    """Tests for AudioConstraints dataclass."""

    def test_audio_constraints_creation(self):
        """Test creating audio constraints."""
        constraints = AudioConstraints(
            max_size_bytes=100 * 1024 * 1024,
            max_duration_seconds=3600,
        )

        assert constraints.max_size_bytes == 100 * 1024 * 1024
        assert constraints.max_duration_seconds == 3600
        assert "audio/mp3" in constraints.supported_formats


class TestVideoConstraints:
    """Tests for VideoConstraints dataclass."""

    def test_video_constraints_creation(self):
        """Test creating video constraints."""
        constraints = VideoConstraints(
            max_size_bytes=2 * 1024 * 1024 * 1024,
            max_duration_seconds=7200,
        )

        assert constraints.max_size_bytes == 2 * 1024 * 1024 * 1024
        assert constraints.max_duration_seconds == 7200
        assert "video/mp4" in constraints.supported_formats


class TestProviderConstraints:
    """Tests for ProviderConstraints dataclass."""

    def test_provider_constraints_creation(self):
        """Test creating full provider constraints."""
        constraints = ProviderConstraints(
            name="test-provider",
            image=ImageConstraints(max_size_bytes=5 * 1024 * 1024),
            pdf=PDFConstraints(max_size_bytes=30 * 1024 * 1024),
            supports_file_upload=True,
            file_upload_threshold_bytes=10 * 1024 * 1024,
        )

        assert constraints.name == "test-provider"
        assert constraints.image is not None
        assert constraints.pdf is not None
        assert constraints.supports_file_upload is True

    def test_provider_constraints_defaults(self):
        """Test provider constraints with default values."""
        constraints = ProviderConstraints(name="test")

        assert constraints.name == "test"
        assert constraints.image is None
        assert constraints.pdf is None
        assert constraints.audio is None
        assert constraints.video is None
        assert constraints.supports_file_upload is False


class TestPredefinedConstraints:
    """Tests for predefined provider constraints."""

    def test_anthropic_constraints(self):
        """Test Anthropic constraints are properly defined."""
        assert ANTHROPIC_CONSTRAINTS.name == "anthropic"
        assert ANTHROPIC_CONSTRAINTS.image is not None
        assert ANTHROPIC_CONSTRAINTS.image.max_size_bytes == 5 * 1024 * 1024
        assert ANTHROPIC_CONSTRAINTS.image.max_width == 8000
        assert ANTHROPIC_CONSTRAINTS.pdf is not None
        assert ANTHROPIC_CONSTRAINTS.pdf.max_pages == 100
        assert ANTHROPIC_CONSTRAINTS.supports_file_upload is True

    def test_openai_constraints(self):
        """Test OpenAI constraints are properly defined."""
        assert OPENAI_CONSTRAINTS.name == "openai"
        assert OPENAI_CONSTRAINTS.image is not None
        assert OPENAI_CONSTRAINTS.image.max_size_bytes == 20 * 1024 * 1024
        assert OPENAI_CONSTRAINTS.pdf is None  # OpenAI doesn't support PDFs

    def test_gemini_constraints(self):
        """Test Gemini constraints are properly defined."""
        assert GEMINI_CONSTRAINTS.name == "gemini"
        assert GEMINI_CONSTRAINTS.image is not None
        assert GEMINI_CONSTRAINTS.pdf is not None
        assert GEMINI_CONSTRAINTS.audio is not None
        assert GEMINI_CONSTRAINTS.video is not None
        assert GEMINI_CONSTRAINTS.supports_file_upload is True

    def test_bedrock_constraints(self):
        """Test Bedrock constraints are properly defined."""
        assert BEDROCK_CONSTRAINTS.name == "bedrock"
        assert BEDROCK_CONSTRAINTS.image is not None
        assert BEDROCK_CONSTRAINTS.image.max_size_bytes == 4_608_000
        assert BEDROCK_CONSTRAINTS.pdf is not None
        assert BEDROCK_CONSTRAINTS.supports_file_upload is False


class TestGetConstraintsForProvider:
    """Tests for get_constraints_for_provider function."""

    def test_get_by_exact_name(self):
        """Test getting constraints by exact provider name."""
        result = get_constraints_for_provider("anthropic")
        assert result == ANTHROPIC_CONSTRAINTS

        result = get_constraints_for_provider("openai")
        assert result == OPENAI_CONSTRAINTS

        result = get_constraints_for_provider("gemini")
        assert result == GEMINI_CONSTRAINTS

    def test_get_by_alias(self):
        """Test getting constraints by alias name."""
        result = get_constraints_for_provider("claude")
        assert result == ANTHROPIC_CONSTRAINTS

        result = get_constraints_for_provider("gpt")
        assert result == OPENAI_CONSTRAINTS

        result = get_constraints_for_provider("google")
        assert result == GEMINI_CONSTRAINTS

    def test_get_case_insensitive(self):
        """Test case-insensitive lookup."""
        result = get_constraints_for_provider("ANTHROPIC")
        assert result == ANTHROPIC_CONSTRAINTS

        result = get_constraints_for_provider("OpenAI")
        assert result == OPENAI_CONSTRAINTS

    def test_get_with_provider_constraints_object(self):
        """Test passing ProviderConstraints object returns it unchanged."""
        custom = ProviderConstraints(name="custom")
        result = get_constraints_for_provider(custom)
        assert result is custom

    def test_get_unknown_provider(self):
        """Test unknown provider returns None."""
        result = get_constraints_for_provider("unknown-provider")
        assert result is None

    def test_get_by_partial_match(self):
        """Test partial match in provider string."""
        result = get_constraints_for_provider("claude-3-sonnet")
        assert result == ANTHROPIC_CONSTRAINTS

        result = get_constraints_for_provider("gpt-4o")
        assert result == OPENAI_CONSTRAINTS

        result = get_constraints_for_provider("gemini-pro")
        assert result == GEMINI_CONSTRAINTS
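Taken together, these tests pin down the lookup behavior: exact names, aliases, case-insensitive matches, and partial matches inside a full model string all resolve to the same predefined constraint sets, while a `ProviderConstraints` instance passes through untouched. A short usage sketch based only on what the tests assert (the custom provider name here is illustrative):

```python
# Sketch of the constraints API exercised by the tests above; names and
# behavior are taken from the test assertions, not from the module source.
from crewai.files.processing.constraints import (
    ImageConstraints,
    ProviderConstraints,
    get_constraints_for_provider,
)

constraints = get_constraints_for_provider("claude-3-sonnet")  # partial match -> Anthropic
if constraints is not None and constraints.image is not None:
    print(constraints.image.max_size_bytes)  # 5 MiB per the Anthropic test

custom = ProviderConstraints(
    name="my-provider",  # hypothetical provider for illustration
    image=ImageConstraints(max_size_bytes=2 * 1024 * 1024, max_width=4096, max_height=4096),
)
assert get_constraints_for_provider(custom) is custom   # objects pass through unchanged
assert get_constraints_for_provider("unknown-provider") is None
```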
220 lib/crewai/tests/files/processing/test_processor.py (Normal file)
@@ -0,0 +1,220 @@
"""Tests for FileProcessor class."""

import pytest

from crewai.files import FileBytes, ImageFile, PDFFile, TextFile
from crewai.files.processing.constraints import (
    ANTHROPIC_CONSTRAINTS,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
)
from crewai.files.processing.enums import FileHandling
from crewai.files.processing.exceptions import (
    FileTooLargeError,
    FileValidationError,
)
from crewai.files.processing.processor import FileProcessor


# Minimal valid PNG: 8x8 pixel RGB image (valid for PIL)
MINIMAL_PNG = bytes([
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d,
    0x49, 0x48, 0x44, 0x52, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x08,
    0x08, 0x02, 0x00, 0x00, 0x00, 0x4b, 0x6d, 0x29, 0xdc, 0x00, 0x00, 0x00,
    0x12, 0x49, 0x44, 0x41, 0x54, 0x78, 0x9c, 0x63, 0xfc, 0xcf, 0x80, 0x1d,
    0x30, 0xe1, 0x10, 0x1f, 0xa4, 0x12, 0x00, 0xcd, 0x41, 0x01, 0x0f, 0xe8,
    0x41, 0xe2, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae,
    0x42, 0x60, 0x82,
])

# Minimal valid PDF
MINIMAL_PDF = (
    b"%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj "
    b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj "
    b"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R>>endobj "
    b"xref\n0 4\n0000000000 65535 f \n0000000009 00000 n \n"
    b"0000000052 00000 n \n0000000101 00000 n \n"
    b"trailer<</Size 4/Root 1 0 R>>\nstartxref\n178\n%%EOF"
)


class TestFileProcessorInit:
    """Tests for FileProcessor initialization."""

    def test_init_with_constraints(self):
        """Test initialization with ProviderConstraints."""
        processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)

        assert processor.constraints == ANTHROPIC_CONSTRAINTS

    def test_init_with_provider_string(self):
        """Test initialization with provider name string."""
        processor = FileProcessor(constraints="anthropic")

        assert processor.constraints == ANTHROPIC_CONSTRAINTS

    def test_init_with_unknown_provider(self):
        """Test initialization with unknown provider sets constraints to None."""
        processor = FileProcessor(constraints="unknown")

        assert processor.constraints is None

    def test_init_with_none_constraints(self):
        """Test initialization with None constraints."""
        processor = FileProcessor(constraints=None)

        assert processor.constraints is None


class TestFileProcessorValidate:
    """Tests for FileProcessor.validate method."""

    def test_validate_valid_file(self):
        """Test validating a valid file returns no errors."""
        processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        errors = processor.validate(file)

        assert len(errors) == 0

    def test_validate_without_constraints(self):
        """Test validating without constraints returns empty list."""
        processor = FileProcessor(constraints=None)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        errors = processor.validate(file)

        assert len(errors) == 0

    def test_validate_strict_raises_on_error(self):
        """Test STRICT mode raises on validation error."""
        constraints = ProviderConstraints(
            name="test",
            image=ImageConstraints(max_size_bytes=10),
        )
        processor = FileProcessor(constraints=constraints)
        # Set mode to strict on the file
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict")

        with pytest.raises(FileTooLargeError):
            processor.validate(file)


class TestFileProcessorProcess:
    """Tests for FileProcessor.process method."""

    def test_process_valid_file(self):
        """Test processing a valid file returns it unchanged."""
        processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        result = processor.process(file)

        assert result == file

    def test_process_without_constraints(self):
        """Test processing without constraints returns file unchanged."""
        processor = FileProcessor(constraints=None)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        result = processor.process(file)

        assert result == file

    def test_process_strict_raises_on_error(self):
        """Test STRICT mode raises on processing error."""
        constraints = ProviderConstraints(
            name="test",
            image=ImageConstraints(max_size_bytes=10),
        )
        processor = FileProcessor(constraints=constraints)
        # Set mode to strict on the file
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict")

        with pytest.raises(FileTooLargeError):
            processor.process(file)

    def test_process_warn_returns_file(self):
        """Test WARN mode returns file with warning."""
        constraints = ProviderConstraints(
            name="test",
            image=ImageConstraints(max_size_bytes=10),
        )
        processor = FileProcessor(constraints=constraints)
        # Set mode to warn on the file
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="warn")

        result = processor.process(file)

        assert result == file


class TestFileProcessorProcessFiles:
    """Tests for FileProcessor.process_files method."""

    def test_process_files_multiple(self):
        """Test processing multiple files."""
        processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
        files = {
            "image1": ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test1.png")),
            "image2": ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test2.png")),
        }

        result = processor.process_files(files)

        assert len(result) == 2
        assert "image1" in result
        assert "image2" in result

    def test_process_files_empty(self):
        """Test processing empty files dict."""
        processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)

        result = processor.process_files({})

        assert result == {}


class TestFileHandlingEnum:
    """Tests for FileHandling enum."""

    def test_enum_values(self):
        """Test all enum values are accessible."""
        assert FileHandling.STRICT.value == "strict"
        assert FileHandling.AUTO.value == "auto"
        assert FileHandling.WARN.value == "warn"
        assert FileHandling.CHUNK.value == "chunk"


class TestFileProcessorPerFileMode:
    """Tests for per-file mode handling."""

    def test_file_default_mode_is_auto(self):
        """Test that files default to auto mode."""
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
        assert file.mode == "auto"

    def test_file_custom_mode(self):
        """Test setting custom mode on file."""
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict")
        assert file.mode == "strict"

    def test_processor_respects_file_mode(self):
        """Test processor uses each file's mode setting."""
        constraints = ProviderConstraints(
            name="test",
            image=ImageConstraints(max_size_bytes=10),
        )
        processor = FileProcessor(constraints=constraints)

        # File with strict mode should raise
        strict_file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict")
        with pytest.raises(FileTooLargeError):
            processor.process(strict_file)

        # File with warn mode should not raise
        warn_file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="warn")
        result = processor.process(warn_file)
        assert result == warn_file
lib/crewai/tests/files/processing/test_transformers.py
Normal file
359
lib/crewai/tests/files/processing/test_transformers.py
Normal file
@@ -0,0 +1,359 @@
|
||||
"""Unit tests for file transformers."""
|
||||
|
||||
import io
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
from crewai.files import ImageFile, PDFFile, TextFile
|
||||
from crewai.files.file import FileBytes
|
||||
from crewai.files.processing.exceptions import ProcessingDependencyError
|
||||
from crewai.files.processing.transformers import (
|
||||
chunk_pdf,
|
||||
chunk_text,
|
||||
get_image_dimensions,
|
||||
get_pdf_page_count,
|
||||
optimize_image,
|
||||
resize_image,
|
||||
)
|
||||
|
||||
|
||||
def create_test_png(width: int = 100, height: int = 100) -> bytes:
|
||||
"""Create a minimal valid PNG for testing."""
|
||||
from PIL import Image
|
||||
|
||||
img = Image.new("RGB", (width, height), color="red")
|
||||
buffer = io.BytesIO()
|
||||
img.save(buffer, format="PNG")
|
||||
return buffer.getvalue()
|
||||
|
||||
|
||||
def create_test_pdf(num_pages: int = 1) -> bytes:
|
||||
"""Create a minimal valid PDF for testing."""
|
||||
from pypdf import PdfWriter
|
||||
|
||||
writer = PdfWriter()
|
||||
for _ in range(num_pages):
|
||||
writer.add_blank_page(width=612, height=792)
|
||||
|
||||
buffer = io.BytesIO()
|
||||
writer.write(buffer)
|
||||
return buffer.getvalue()
|
||||
|
||||
|
||||
class TestResizeImage:
|
||||
"""Tests for resize_image function."""
|
||||
|
||||
def test_resize_larger_image(self) -> None:
|
||||
"""Test resizing an image larger than max dimensions."""
|
||||
png_bytes = create_test_png(200, 150)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
result = resize_image(img, max_width=100, max_height=100)
|
||||
|
||||
dims = get_image_dimensions(result)
|
||||
assert dims is not None
|
||||
width, height = dims
|
||||
assert width <= 100
|
||||
assert height <= 100
|
||||
|
||||
def test_no_resize_if_within_bounds(self) -> None:
|
||||
"""Test that small images are returned unchanged."""
|
||||
png_bytes = create_test_png(50, 50)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="small.png"))
|
||||
|
||||
result = resize_image(img, max_width=100, max_height=100)
|
||||
|
||||
assert result is img
|
||||
|
||||
def test_preserve_aspect_ratio(self) -> None:
|
||||
"""Test that aspect ratio is preserved during resize."""
|
||||
png_bytes = create_test_png(200, 100)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="wide.png"))
|
||||
|
||||
result = resize_image(img, max_width=100, max_height=100)
|
||||
|
||||
dims = get_image_dimensions(result)
|
||||
assert dims is not None
|
||||
width, height = dims
|
||||
assert width == 100
|
||||
assert height == 50
|
||||
|
||||
def test_resize_without_aspect_ratio(self) -> None:
|
||||
"""Test resizing without preserving aspect ratio."""
|
||||
png_bytes = create_test_png(200, 100)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="wide.png"))
|
||||
|
||||
result = resize_image(
|
||||
img, max_width=50, max_height=50, preserve_aspect_ratio=False
|
||||
)
|
||||
|
||||
dims = get_image_dimensions(result)
|
||||
assert dims is not None
|
||||
width, height = dims
|
||||
assert width == 50
|
||||
assert height == 50
|
||||
|
||||
def test_resize_returns_image_file(self) -> None:
|
||||
"""Test that resize returns an ImageFile instance."""
|
||||
png_bytes = create_test_png(200, 200)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
result = resize_image(img, max_width=100, max_height=100)
|
||||
|
||||
assert isinstance(result, ImageFile)
|
||||
|
||||
def test_raises_without_pillow(self) -> None:
|
||||
"""Test that ProcessingDependencyError is raised without Pillow."""
|
||||
img = ImageFile(source=FileBytes(data=b"fake", filename="test.png"))
|
||||
|
||||
with patch.dict("sys.modules", {"PIL": None, "PIL.Image": None}):
|
||||
with pytest.raises(ProcessingDependencyError) as exc_info:
|
||||
# Force reimport to trigger ImportError
|
||||
import importlib
|
||||
|
||||
import crewai.files.processing.transformers as t
|
||||
|
||||
importlib.reload(t)
|
||||
t.resize_image(img, 100, 100)
|
||||
|
||||
assert "Pillow" in str(exc_info.value)
|
||||
|
||||
|
||||
class TestOptimizeImage:
|
||||
"""Tests for optimize_image function."""
|
||||
|
||||
def test_optimize_reduces_size(self) -> None:
|
||||
"""Test that optimization reduces file size."""
|
||||
png_bytes = create_test_png(500, 500)
|
||||
original_size = len(png_bytes)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="large.png"))
|
||||
|
||||
result = optimize_image(img, target_size_bytes=original_size // 2)
|
||||
|
||||
result_size = len(result.read())
|
||||
assert result_size < original_size
|
||||
|
||||
def test_no_optimize_if_under_target(self) -> None:
|
||||
"""Test that small images are returned unchanged."""
|
||||
png_bytes = create_test_png(50, 50)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="small.png"))
|
||||
|
||||
result = optimize_image(img, target_size_bytes=1024 * 1024)
|
||||
|
||||
assert result is img
|
||||
|
||||
def test_optimize_returns_image_file(self) -> None:
|
||||
"""Test that optimize returns an ImageFile instance."""
|
||||
png_bytes = create_test_png(200, 200)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
result = optimize_image(img, target_size_bytes=100)
|
||||
|
||||
assert isinstance(result, ImageFile)
|
||||
|
||||
def test_optimize_respects_min_quality(self) -> None:
|
||||
"""Test that optimization stops at minimum quality."""
|
||||
png_bytes = create_test_png(100, 100)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
# Request impossibly small size - should stop at min quality
|
||||
result = optimize_image(img, target_size_bytes=10, min_quality=50)
|
||||
|
||||
assert isinstance(result, ImageFile)
|
||||
assert len(result.read()) > 10
|
||||
|
||||
|
||||
class TestChunkPdf:
|
||||
"""Tests for chunk_pdf function."""
|
||||
|
||||
def test_chunk_splits_large_pdf(self) -> None:
|
||||
"""Test that large PDFs are split into chunks."""
|
||||
pdf_bytes = create_test_pdf(num_pages=10)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="large.pdf"))
|
||||
|
||||
result = list(chunk_pdf(pdf, max_pages=3))
|
||||
|
||||
assert len(result) == 4
|
||||
assert all(isinstance(chunk, PDFFile) for chunk in result)
|
||||
|
||||
def test_no_chunk_if_within_limit(self) -> None:
|
||||
"""Test that small PDFs are returned unchanged."""
|
||||
pdf_bytes = create_test_pdf(num_pages=3)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="small.pdf"))
|
||||
|
||||
result = list(chunk_pdf(pdf, max_pages=5))
|
||||
|
||||
assert len(result) == 1
|
||||
assert result[0] is pdf
|
||||
|
||||
def test_chunk_filenames(self) -> None:
|
||||
"""Test that chunked files have indexed filenames."""
|
||||
pdf_bytes = create_test_pdf(num_pages=6)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="document.pdf"))
|
||||
|
||||
result = list(chunk_pdf(pdf, max_pages=2))
|
||||
|
||||
assert result[0].filename == "document_chunk_0.pdf"
|
||||
assert result[1].filename == "document_chunk_1.pdf"
|
||||
assert result[2].filename == "document_chunk_2.pdf"
|
||||
|
||||
def test_chunk_with_overlap(self) -> None:
|
||||
"""Test chunking with overlapping pages."""
|
||||
pdf_bytes = create_test_pdf(num_pages=10)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="doc.pdf"))
|
||||
|
||||
result = list(chunk_pdf(pdf, max_pages=4, overlap_pages=1))
|
||||
|
||||
# With overlap, we get more chunks
|
||||
assert len(result) >= 3
|
||||
|
||||
def test_chunk_page_counts(self) -> None:
|
||||
"""Test that each chunk has correct page count."""
|
||||
pdf_bytes = create_test_pdf(num_pages=7)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="doc.pdf"))
|
||||
|
||||
result = list(chunk_pdf(pdf, max_pages=3))
|
||||
|
||||
page_counts = [get_pdf_page_count(chunk) for chunk in result]
|
||||
assert page_counts == [3, 3, 1]
|
||||
|
||||
|
||||
class TestChunkText:
|
||||
"""Tests for chunk_text function."""
|
||||
|
||||
def test_chunk_splits_large_text(self) -> None:
|
||||
"""Test that large text files are split into chunks."""
|
||||
content = "Hello world. " * 100
|
||||
text = TextFile(source=content.encode(), filename="large.txt")
|
||||
|
||||
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
|
||||
|
||||
assert len(result) > 1
|
||||
assert all(isinstance(chunk, TextFile) for chunk in result)
|
||||
|
||||
def test_no_chunk_if_within_limit(self) -> None:
|
||||
"""Test that small text files are returned unchanged."""
|
||||
content = "Short text"
|
||||
text = TextFile(source=content.encode(), filename="small.txt")
|
||||
|
||||
result = list(chunk_text(text, max_chars=1000, overlap_chars=0))
|
||||
|
||||
assert len(result) == 1
|
||||
assert result[0] is text
|
||||
|
||||
def test_chunk_filenames(self) -> None:
|
||||
"""Test that chunked files have indexed filenames."""
|
||||
content = "A" * 500
|
||||
text = TextFile(source=FileBytes(data=content.encode(), filename="data.txt"))
|
||||
|
||||
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
|
||||
|
||||
assert result[0].filename == "data_chunk_0.txt"
|
||||
assert result[1].filename == "data_chunk_1.txt"
|
||||
assert len(result) == 3
|
||||
|
||||
def test_chunk_preserves_extension(self) -> None:
|
||||
"""Test that file extension is preserved in chunks."""
|
||||
content = "A" * 500
|
||||
text = TextFile(source=FileBytes(data=content.encode(), filename="script.py"))
|
||||
|
||||
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
|
||||
|
||||
assert all(chunk.filename.endswith(".py") for chunk in result)
|
||||
|
||||
def test_chunk_prefers_newline_boundaries(self) -> None:
|
||||
"""Test that chunking prefers to split at newlines."""
|
||||
content = "Line one\nLine two\nLine three\nLine four\nLine five"
|
||||
text = TextFile(source=content.encode(), filename="lines.txt")
|
||||
|
||||
result = list(chunk_text(text, max_chars=25, overlap_chars=0, split_on_newlines=True))
|
||||
|
||||
# Should split at newline boundaries
|
||||
for chunk in result:
|
||||
chunk_text_content = chunk.read().decode()
|
||||
# Chunks should end at newlines (except possibly the last)
|
||||
if chunk != result[-1]:
|
||||
assert chunk_text_content.endswith("\n") or len(chunk_text_content) <= 25
|
||||
|
||||
def test_chunk_with_overlap(self) -> None:
|
||||
"""Test chunking with overlapping characters."""
|
||||
content = "ABCDEFGHIJ" * 10
|
||||
text = TextFile(source=content.encode(), filename="data.txt")
|
||||
|
||||
result = list(chunk_text(text, max_chars=30, overlap_chars=5))
|
||||
|
||||
# With overlap, chunks should share some content
|
||||
assert len(result) >= 3
|
||||
|
||||
def test_chunk_overlap_larger_than_max_chars(self) -> None:
|
||||
"""Test that overlap > max_chars doesn't cause infinite loop."""
|
||||
content = "A" * 100
|
||||
text = TextFile(source=content.encode(), filename="data.txt")
|
||||
|
||||
# overlap_chars > max_chars should still work (just with max overlap)
|
||||
result = list(chunk_text(text, max_chars=20, overlap_chars=50))
|
||||
|
||||
assert len(result) > 1
|
||||
# Should still complete without hanging
|
||||
|
||||
|
||||
class TestGetImageDimensions:
|
||||
"""Tests for get_image_dimensions function."""
|
||||
|
||||
def test_get_dimensions(self) -> None:
|
||||
"""Test getting image dimensions."""
|
||||
png_bytes = create_test_png(150, 100)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
dims = get_image_dimensions(img)
|
||||
|
||||
assert dims == (150, 100)
|
||||
|
||||
def test_returns_none_for_invalid_image(self) -> None:
|
||||
"""Test that None is returned for invalid image data."""
|
||||
img = ImageFile(source=FileBytes(data=b"not an image", filename="bad.png"))
|
||||
|
||||
dims = get_image_dimensions(img)
|
||||
|
||||
assert dims is None
|
||||
|
||||
def test_returns_none_without_pillow(self) -> None:
|
||||
"""Test that None is returned when Pillow is not installed."""
|
||||
png_bytes = create_test_png(100, 100)
|
||||
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
|
||||
|
||||
with patch.dict("sys.modules", {"PIL": None}):
|
||||
# Can't easily test this without unloading module
|
||||
# Just verify the function handles the case gracefully
|
||||
pass
|
||||
|
||||
|
||||
class TestGetPdfPageCount:
|
||||
"""Tests for get_pdf_page_count function."""
|
||||
|
||||
def test_get_page_count(self) -> None:
|
||||
"""Test getting PDF page count."""
|
||||
pdf_bytes = create_test_pdf(num_pages=5)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="test.pdf"))
|
||||
|
||||
count = get_pdf_page_count(pdf)
|
||||
|
||||
assert count == 5
|
||||
|
||||
def test_single_page(self) -> None:
|
||||
"""Test page count for single page PDF."""
|
||||
pdf_bytes = create_test_pdf(num_pages=1)
|
||||
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="single.pdf"))
|
||||
|
||||
count = get_pdf_page_count(pdf)
|
||||
|
||||
assert count == 1
|
||||
|
||||
def test_returns_none_for_invalid_pdf(self) -> None:
|
||||
"""Test that None is returned for invalid PDF data."""
|
||||
pdf = PDFFile(source=FileBytes(data=b"not a pdf", filename="bad.pdf"))
|
||||
|
||||
count = get_pdf_page_count(pdf)
|
||||
|
||||
assert count is None
|
||||
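The chunking tests above pin down the contract precisely: a 10-page PDF with `max_pages=3` yields page counts of 3, 3, 3, 1, chunk filenames are indexed (`document_chunk_0.pdf`, ...), and a file already within the limit is returned as the very same object. Below is a minimal sketch of how a caller might lean on that contract; the import path for the helpers and the `report.pdf` input are assumptions, since the tests use the functions unqualified:

```python
from pathlib import Path

from crewai.files import FileBytes, PDFFile
# Assumed import path; the test module imports these helpers above this excerpt.
from crewai.files.processing import chunk_pdf, get_pdf_page_count

data = Path("report.pdf").read_bytes()  # "report.pdf" is an illustrative input
pdf = PDFFile(source=FileBytes(data=data, filename="report.pdf"))

for chunk in chunk_pdf(pdf, max_pages=3):
    # Each chunk is itself a PDFFile capped at max_pages pages, with an
    # indexed filename such as "report_chunk_0.pdf".
    print(chunk.filename, get_pdf_page_count(chunk))
```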
lib/crewai/tests/files/processing/test_validators.py (new file, 208 lines)
@@ -0,0 +1,208 @@
"""Tests for file validators."""

import pytest

from crewai.files import FileBytes, ImageFile, PDFFile, TextFile
from crewai.files.processing.constraints import (
    ANTHROPIC_CONSTRAINTS,
    ImageConstraints,
    PDFConstraints,
    ProviderConstraints,
)
from crewai.files.processing.exceptions import (
    FileTooLargeError,
    FileValidationError,
    UnsupportedFileTypeError,
)
from crewai.files.processing.validators import (
    validate_file,
    validate_image,
    validate_pdf,
    validate_text,
)


# Minimal valid PNG: 8x8 pixel RGB image (valid for PIL)
MINIMAL_PNG = bytes([
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0x00, 0x00, 0x00, 0x0d,
    0x49, 0x48, 0x44, 0x52, 0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x08,
    0x08, 0x02, 0x00, 0x00, 0x00, 0x4b, 0x6d, 0x29, 0xdc, 0x00, 0x00, 0x00,
    0x12, 0x49, 0x44, 0x41, 0x54, 0x78, 0x9c, 0x63, 0xfc, 0xcf, 0x80, 0x1d,
    0x30, 0xe1, 0x10, 0x1f, 0xa4, 0x12, 0x00, 0xcd, 0x41, 0x01, 0x0f, 0xe8,
    0x41, 0xe2, 0x6f, 0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4e, 0x44, 0xae,
    0x42, 0x60, 0x82,
])

# Minimal valid PDF
MINIMAL_PDF = (
    b"%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj "
    b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj "
    b"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R>>endobj "
    b"xref\n0 4\n0000000000 65535 f \n0000000009 00000 n \n"
    b"0000000052 00000 n \n0000000101 00000 n \n"
    b"trailer<</Size 4/Root 1 0 R>>\nstartxref\n178\n%%EOF"
)


class TestValidateImage:
    """Tests for validate_image function."""

    def test_validate_valid_image(self):
        """Test validating a valid image within constraints."""
        constraints = ImageConstraints(
            max_size_bytes=10 * 1024 * 1024,
            supported_formats=("image/png",),
        )
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        errors = validate_image(file, constraints, raise_on_error=False)

        assert len(errors) == 0

    def test_validate_image_too_large(self):
        """Test validating an image that exceeds size limit."""
        constraints = ImageConstraints(
            max_size_bytes=10,  # Very small limit
            supported_formats=("image/png",),
        )
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        with pytest.raises(FileTooLargeError) as exc_info:
            validate_image(file, constraints)

        assert "exceeds" in str(exc_info.value)
        assert exc_info.value.file_name == "test.png"

    def test_validate_image_unsupported_format(self):
        """Test validating an image with unsupported format."""
        constraints = ImageConstraints(
            max_size_bytes=10 * 1024 * 1024,
            supported_formats=("image/jpeg",),  # Only JPEG
        )
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        with pytest.raises(UnsupportedFileTypeError) as exc_info:
            validate_image(file, constraints)

        assert "not supported" in str(exc_info.value)

    def test_validate_image_no_raise(self):
        """Test validating with raise_on_error=False returns errors list."""
        constraints = ImageConstraints(
            max_size_bytes=10,
            supported_formats=("image/jpeg",),
        )
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        errors = validate_image(file, constraints, raise_on_error=False)

        assert len(errors) == 2  # Size error and format error


class TestValidatePDF:
    """Tests for validate_pdf function."""

    def test_validate_valid_pdf(self):
        """Test validating a valid PDF within constraints."""
        constraints = PDFConstraints(
            max_size_bytes=10 * 1024 * 1024,
        )
        file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))

        errors = validate_pdf(file, constraints, raise_on_error=False)

        assert len(errors) == 0

    def test_validate_pdf_too_large(self):
        """Test validating a PDF that exceeds size limit."""
        constraints = PDFConstraints(
            max_size_bytes=10,  # Very small limit
        )
        file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))

        with pytest.raises(FileTooLargeError) as exc_info:
            validate_pdf(file, constraints)

        assert "exceeds" in str(exc_info.value)


class TestValidateText:
    """Tests for validate_text function."""

    def test_validate_valid_text(self):
        """Test validating a valid text file."""
        constraints = ProviderConstraints(
            name="test",
            general_max_size_bytes=10 * 1024 * 1024,
        )
        file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))

        errors = validate_text(file, constraints, raise_on_error=False)

        assert len(errors) == 0

    def test_validate_text_too_large(self):
        """Test validating text that exceeds size limit."""
        constraints = ProviderConstraints(
            name="test",
            general_max_size_bytes=5,
        )
        file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))

        with pytest.raises(FileTooLargeError):
            validate_text(file, constraints)

    def test_validate_text_no_limit(self):
        """Test validating text with no size limit."""
        constraints = ProviderConstraints(name="test")
        file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))

        errors = validate_text(file, constraints, raise_on_error=False)

        assert len(errors) == 0


class TestValidateFile:
    """Tests for validate_file function."""

    def test_validate_file_dispatches_to_image(self):
        """Test validate_file dispatches to image validator."""
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        errors = validate_file(file, ANTHROPIC_CONSTRAINTS, raise_on_error=False)

        assert len(errors) == 0

    def test_validate_file_dispatches_to_pdf(self):
        """Test validate_file dispatches to PDF validator."""
        file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))

        errors = validate_file(file, ANTHROPIC_CONSTRAINTS, raise_on_error=False)

        assert len(errors) == 0

    def test_validate_file_unsupported_type(self):
        """Test validating a file type not supported by provider."""
        constraints = ProviderConstraints(
            name="test",
            image=None,  # No image support
        )
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        with pytest.raises(UnsupportedFileTypeError) as exc_info:
            validate_file(file, constraints)

        assert "does not support images" in str(exc_info.value)

    def test_validate_file_pdf_not_supported(self):
        """Test validating PDF when provider doesn't support it."""
        constraints = ProviderConstraints(
            name="test",
            pdf=None,  # No PDF support
        )
        file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))

        with pytest.raises(UnsupportedFileTypeError) as exc_info:
            validate_file(file, constraints)

        assert "does not support PDFs" in str(exc_info.value)
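The dispatch tests suggest `validate_file` is the single entry point callers need: it routes to the image, PDF, or text validator based on file type. A short sketch under that assumption; the exception hierarchy (specific errors presumably subclassing `FileValidationError`) is inferred from the imports above, and the PDF bytes are a placeholder:

```python
from crewai.files import FileBytes, PDFFile
from crewai.files.processing.constraints import ANTHROPIC_CONSTRAINTS
from crewai.files.processing.exceptions import FileValidationError
from crewai.files.processing.validators import validate_file

pdf = PDFFile(source=FileBytes(data=b"%PDF-1.4 ...", filename="doc.pdf"))  # placeholder bytes

try:
    # The default raise_on_error=True raises on violation (e.g. FileTooLargeError).
    validate_file(pdf, ANTHROPIC_CONSTRAINTS)
except FileValidationError as err:
    print(f"rejected: {err}")

# Or collect every violation without raising:
errors = validate_file(pdf, ANTHROPIC_CONSTRAINTS, raise_on_error=False)
```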
lib/crewai/tests/files/test_resolved.py (new file, 135 lines)
@@ -0,0 +1,135 @@
"""Tests for resolved file types."""

from datetime import datetime, timezone

import pytest

from crewai.files.resolved import (
    FileReference,
    InlineBase64,
    InlineBytes,
    ResolvedFile,
    UrlReference,
)


class TestInlineBase64:
    """Tests for InlineBase64 resolved type."""

    def test_create_inline_base64(self):
        """Test creating InlineBase64 instance."""
        resolved = InlineBase64(
            content_type="image/png",
            data="iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
        )

        assert resolved.content_type == "image/png"
        assert len(resolved.data) > 0

    def test_inline_base64_is_resolved_file(self):
        """Test InlineBase64 is a ResolvedFile."""
        resolved = InlineBase64(content_type="image/png", data="abc123")

        assert isinstance(resolved, ResolvedFile)

    def test_inline_base64_frozen(self):
        """Test InlineBase64 is immutable."""
        resolved = InlineBase64(content_type="image/png", data="abc123")

        with pytest.raises(Exception):
            resolved.data = "xyz789"


class TestInlineBytes:
    """Tests for InlineBytes resolved type."""

    def test_create_inline_bytes(self):
        """Test creating InlineBytes instance."""
        data = b"\x89PNG\r\n\x1a\n"
        resolved = InlineBytes(
            content_type="image/png",
            data=data,
        )

        assert resolved.content_type == "image/png"
        assert resolved.data == data

    def test_inline_bytes_is_resolved_file(self):
        """Test InlineBytes is a ResolvedFile."""
        resolved = InlineBytes(content_type="image/png", data=b"test")

        assert isinstance(resolved, ResolvedFile)


class TestFileReference:
    """Tests for FileReference resolved type."""

    def test_create_file_reference(self):
        """Test creating FileReference instance."""
        resolved = FileReference(
            content_type="image/png",
            file_id="file-abc123",
            provider="gemini",
        )

        assert resolved.content_type == "image/png"
        assert resolved.file_id == "file-abc123"
        assert resolved.provider == "gemini"
        assert resolved.expires_at is None
        assert resolved.file_uri is None

    def test_file_reference_with_expiry(self):
        """Test FileReference with expiry time."""
        expiry = datetime.now(timezone.utc)
        resolved = FileReference(
            content_type="application/pdf",
            file_id="file-xyz789",
            provider="gemini",
            expires_at=expiry,
        )

        assert resolved.expires_at == expiry

    def test_file_reference_with_uri(self):
        """Test FileReference with URI."""
        resolved = FileReference(
            content_type="video/mp4",
            file_id="file-video123",
            provider="gemini",
            file_uri="https://generativelanguage.googleapis.com/v1/files/file-video123",
        )

        assert resolved.file_uri is not None

    def test_file_reference_is_resolved_file(self):
        """Test FileReference is a ResolvedFile."""
        resolved = FileReference(
            content_type="image/png",
            file_id="file-123",
            provider="anthropic",
        )

        assert isinstance(resolved, ResolvedFile)


class TestUrlReference:
    """Tests for UrlReference resolved type."""

    def test_create_url_reference(self):
        """Test creating UrlReference instance."""
        resolved = UrlReference(
            content_type="image/png",
            url="https://storage.googleapis.com/bucket/image.png",
        )

        assert resolved.content_type == "image/png"
        assert resolved.url == "https://storage.googleapis.com/bucket/image.png"

    def test_url_reference_is_resolved_file(self):
        """Test UrlReference is a ResolvedFile."""
        resolved = UrlReference(
            content_type="image/jpeg",
            url="https://example.com/photo.jpg",
        )

        assert isinstance(resolved, ResolvedFile)
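These four concrete types share only the `ResolvedFile` base, so a provider adapter presumably discriminates on the subtype at the call site. A sketch of that branching; the isinstance-based dispatch is an assumption, not an API the tests show:

```python
from crewai.files.resolved import (
    FileReference,
    InlineBase64,
    InlineBytes,
    ResolvedFile,
    UrlReference,
)


def describe(resolved: ResolvedFile) -> str:
    # isinstance dispatch is assumed; the library may expose its own mechanism.
    if isinstance(resolved, InlineBase64):
        return f"inline base64 ({resolved.content_type})"
    if isinstance(resolved, InlineBytes):
        return f"inline bytes ({len(resolved.data)} bytes)"
    if isinstance(resolved, FileReference):
        return f"uploaded file {resolved.file_id} on {resolved.provider}"
    if isinstance(resolved, UrlReference):
        return f"remote url {resolved.url}"
    return f"unknown resolved type ({resolved.content_type})"
```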
lib/crewai/tests/files/test_resolver.py (new file, 174 lines)
@@ -0,0 +1,174 @@
"""Tests for FileResolver."""

import pytest

from crewai.files import FileBytes, ImageFile
from crewai.files.resolved import InlineBase64, InlineBytes
from crewai.files.resolver import (
    FileResolver,
    FileResolverConfig,
    create_resolver,
)
from crewai.files.upload_cache import UploadCache


# Minimal valid PNG
MINIMAL_PNG = (
    b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x08\x00\x00\x00\x08"
    b"\x01\x00\x00\x00\x00\xf9Y\xab\xcd\x00\x00\x00\nIDATx\x9cc`\x00\x00"
    b"\x00\x02\x00\x01\xe2!\xbc3\x00\x00\x00\x00IEND\xaeB`\x82"
)


class TestFileResolverConfig:
    """Tests for FileResolverConfig."""

    def test_default_config(self):
        """Test default configuration values."""
        config = FileResolverConfig()

        assert config.prefer_upload is False
        assert config.upload_threshold_bytes is None
        assert config.use_bytes_for_bedrock is True

    def test_custom_config(self):
        """Test custom configuration values."""
        config = FileResolverConfig(
            prefer_upload=True,
            upload_threshold_bytes=1024 * 1024,
            use_bytes_for_bedrock=False,
        )

        assert config.prefer_upload is True
        assert config.upload_threshold_bytes == 1024 * 1024
        assert config.use_bytes_for_bedrock is False


class TestFileResolver:
    """Tests for FileResolver class."""

    def test_resolve_inline_base64(self):
        """Test resolving file as inline base64."""
        resolver = FileResolver()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        resolved = resolver.resolve(file, "openai")

        assert isinstance(resolved, InlineBase64)
        assert resolved.content_type == "image/png"
        assert len(resolved.data) > 0

    def test_resolve_inline_bytes_for_bedrock(self):
        """Test resolving file as inline bytes for Bedrock."""
        config = FileResolverConfig(use_bytes_for_bedrock=True)
        resolver = FileResolver(config=config)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        resolved = resolver.resolve(file, "bedrock")

        assert isinstance(resolved, InlineBytes)
        assert resolved.content_type == "image/png"
        assert resolved.data == MINIMAL_PNG

    def test_resolve_files_multiple(self):
        """Test resolving multiple files."""
        resolver = FileResolver()
        files = {
            "image1": ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test1.png")),
            "image2": ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test2.png")),
        }

        resolved = resolver.resolve_files(files, "openai")

        assert len(resolved) == 2
        assert "image1" in resolved
        assert "image2" in resolved
        assert all(isinstance(r, InlineBase64) for r in resolved.values())

    def test_resolve_with_cache(self):
        """Test resolver uses cache."""
        cache = UploadCache()
        resolver = FileResolver(upload_cache=cache)
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        # First resolution
        resolved1 = resolver.resolve(file, "openai")
        # Second resolution (should use same base64 encoding)
        resolved2 = resolver.resolve(file, "openai")

        assert isinstance(resolved1, InlineBase64)
        assert isinstance(resolved2, InlineBase64)
        # Data should be identical
        assert resolved1.data == resolved2.data

    def test_clear_cache(self):
        """Test clearing resolver cache."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        # Add something to cache manually
        cache.set(file=file, provider="gemini", file_id="test")

        resolver = FileResolver(upload_cache=cache)
        resolver.clear_cache()

        assert len(cache) == 0

    def test_get_cached_uploads(self):
        """Test getting cached uploads from resolver."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cache.set(file=file, provider="gemini", file_id="test-1")
        cache.set(file=file, provider="anthropic", file_id="test-2")

        resolver = FileResolver(upload_cache=cache)

        gemini_uploads = resolver.get_cached_uploads("gemini")
        anthropic_uploads = resolver.get_cached_uploads("anthropic")

        assert len(gemini_uploads) == 1
        assert len(anthropic_uploads) == 1

    def test_get_cached_uploads_empty(self):
        """Test getting cached uploads when no cache."""
        resolver = FileResolver()  # No cache

        uploads = resolver.get_cached_uploads("gemini")

        assert uploads == []


class TestCreateResolver:
    """Tests for create_resolver factory function."""

    def test_create_default_resolver(self):
        """Test creating resolver with default settings."""
        resolver = create_resolver()

        assert resolver.config.prefer_upload is False
        assert resolver.upload_cache is not None

    def test_create_resolver_with_options(self):
        """Test creating resolver with custom options."""
        resolver = create_resolver(
            prefer_upload=True,
            upload_threshold_bytes=5 * 1024 * 1024,
            enable_cache=False,
        )

        assert resolver.config.prefer_upload is True
        assert resolver.config.upload_threshold_bytes == 5 * 1024 * 1024
        assert resolver.upload_cache is None

    def test_create_resolver_cache_enabled(self):
        """Test resolver has cache when enabled."""
        resolver = create_resolver(enable_cache=True)

        assert resolver.upload_cache is not None

    def test_create_resolver_cache_disabled(self):
        """Test resolver has no cache when disabled."""
        resolver = create_resolver(enable_cache=False)

        assert resolver.upload_cache is None
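Putting the factory and resolution tests together, a typical setup might look like the sketch below; the `prefer_upload` choice and the 5 MiB threshold are illustrative values rather than defaults, and the PNG bytes are a placeholder:

```python
from crewai.files import FileBytes, ImageFile
from crewai.files.resolver import create_resolver

# Illustrative configuration, not defaults: prefer uploads above 5 MiB, cache them.
resolver = create_resolver(prefer_upload=True, upload_threshold_bytes=5 * 1024 * 1024)

png_bytes = b"\x89PNG..."  # placeholder; real PNG bytes in practice
img = ImageFile(source=FileBytes(data=png_bytes, filename="chart.png"))

# Per the tests above, OpenAI resolution yields InlineBase64, while Bedrock
# (with use_bytes_for_bedrock=True) yields InlineBytes.
resolved = resolver.resolve(img, "openai")
```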
lib/crewai/tests/files/test_upload_cache.py (new file, 206 lines)
@@ -0,0 +1,206 @@
"""Tests for upload cache."""

from datetime import datetime, timedelta, timezone

import pytest

from crewai.files import FileBytes, ImageFile
from crewai.files.upload_cache import CachedUpload, UploadCache


# Minimal valid PNG
MINIMAL_PNG = (
    b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x08\x00\x00\x00\x08"
    b"\x01\x00\x00\x00\x00\xf9Y\xab\xcd\x00\x00\x00\nIDATx\x9cc`\x00\x00"
    b"\x00\x02\x00\x01\xe2!\xbc3\x00\x00\x00\x00IEND\xaeB`\x82"
)


class TestCachedUpload:
    """Tests for CachedUpload dataclass."""

    def test_cached_upload_creation(self):
        """Test creating a cached upload."""
        now = datetime.now(timezone.utc)
        cached = CachedUpload(
            file_id="file-123",
            provider="gemini",
            file_uri="files/file-123",
            content_type="image/png",
            uploaded_at=now,
            expires_at=now + timedelta(hours=48),
        )

        assert cached.file_id == "file-123"
        assert cached.provider == "gemini"
        assert cached.file_uri == "files/file-123"
        assert cached.content_type == "image/png"

    def test_is_expired_false(self):
        """Test is_expired returns False for non-expired upload."""
        future = datetime.now(timezone.utc) + timedelta(hours=24)
        cached = CachedUpload(
            file_id="file-123",
            provider="gemini",
            file_uri=None,
            content_type="image/png",
            uploaded_at=datetime.now(timezone.utc),
            expires_at=future,
        )

        assert cached.is_expired() is False

    def test_is_expired_true(self):
        """Test is_expired returns True for expired upload."""
        past = datetime.now(timezone.utc) - timedelta(hours=1)
        cached = CachedUpload(
            file_id="file-123",
            provider="gemini",
            file_uri=None,
            content_type="image/png",
            uploaded_at=datetime.now(timezone.utc) - timedelta(hours=2),
            expires_at=past,
        )

        assert cached.is_expired() is True

    def test_is_expired_no_expiry(self):
        """Test is_expired returns False when no expiry set."""
        cached = CachedUpload(
            file_id="file-123",
            provider="anthropic",
            file_uri=None,
            content_type="image/png",
            uploaded_at=datetime.now(timezone.utc),
            expires_at=None,
        )

        assert cached.is_expired() is False


class TestUploadCache:
    """Tests for UploadCache class."""

    def test_cache_creation(self):
        """Test creating an empty cache."""
        cache = UploadCache()

        assert len(cache) == 0

    def test_set_and_get(self):
        """Test setting and getting cached uploads."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cached = cache.set(
            file=file,
            provider="gemini",
            file_id="file-123",
            file_uri="files/file-123",
        )

        result = cache.get(file, "gemini")

        assert result is not None
        assert result.file_id == "file-123"
        assert result.provider == "gemini"

    def test_get_missing(self):
        """Test getting non-existent entry returns None."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        result = cache.get(file, "gemini")

        assert result is None

    def test_get_different_provider(self):
        """Test getting with different provider returns None."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cache.set(file=file, provider="gemini", file_id="file-123")

        result = cache.get(file, "anthropic")  # Different provider

        assert result is None

    def test_remove(self):
        """Test removing cached entry."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cache.set(file=file, provider="gemini", file_id="file-123")
        removed = cache.remove(file, "gemini")

        assert removed is True
        assert cache.get(file, "gemini") is None

    def test_remove_missing(self):
        """Test removing non-existent entry returns False."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        removed = cache.remove(file, "gemini")

        assert removed is False

    def test_remove_by_file_id(self):
        """Test removing by file ID."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cache.set(file=file, provider="gemini", file_id="file-123")
        removed = cache.remove_by_file_id("file-123", "gemini")

        assert removed is True
        assert len(cache) == 0

    def test_clear_expired(self):
        """Test clearing expired entries."""
        cache = UploadCache()
        file1 = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test1.png"))
        file2 = ImageFile(source=FileBytes(data=MINIMAL_PNG + b"x", filename="test2.png"))

        # Add one expired and one valid entry
        past = datetime.now(timezone.utc) - timedelta(hours=1)
        future = datetime.now(timezone.utc) + timedelta(hours=24)

        cache.set(file=file1, provider="gemini", file_id="expired", expires_at=past)
        cache.set(file=file2, provider="gemini", file_id="valid", expires_at=future)

        removed = cache.clear_expired()

        assert removed == 1
        assert len(cache) == 1
        assert cache.get(file2, "gemini") is not None

    def test_clear(self):
        """Test clearing all entries."""
        cache = UploadCache()
        file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))

        cache.set(file=file, provider="gemini", file_id="file-123")
        cache.set(file=file, provider="anthropic", file_id="file-456")

        cleared = cache.clear()

        assert cleared == 2
        assert len(cache) == 0

    def test_get_all_for_provider(self):
        """Test getting all cached uploads for a provider."""
        cache = UploadCache()
        file1 = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test1.png"))
        file2 = ImageFile(source=FileBytes(data=MINIMAL_PNG + b"x", filename="test2.png"))
        file3 = ImageFile(source=FileBytes(data=MINIMAL_PNG + b"xx", filename="test3.png"))

        cache.set(file=file1, provider="gemini", file_id="file-1")
        cache.set(file=file2, provider="gemini", file_id="file-2")
        cache.set(file=file3, provider="anthropic", file_id="file-3")

        gemini_uploads = cache.get_all_for_provider("gemini")
        anthropic_uploads = cache.get_all_for_provider("anthropic")

        assert len(gemini_uploads) == 2
        assert len(anthropic_uploads) == 1
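Note the keying these tests rely on: `test_get_all_for_provider` appends bytes to `MINIMAL_PNG` to create distinct entries, so the cache appears to key on file content plus provider, and the same bytes uploaded to two providers occupy two slots. A minimal sketch of that behavior (PNG bytes are a placeholder):

```python
from crewai.files import FileBytes, ImageFile
from crewai.files.upload_cache import UploadCache

cache = UploadCache()
png_bytes = b"\x89PNG..."  # placeholder; any stable bytes work for keying
img = ImageFile(source=FileBytes(data=png_bytes, filename="a.png"))

cache.set(file=img, provider="gemini", file_id="file-1")
cache.set(file=img, provider="anthropic", file_id="file-2")

assert len(cache) == 2  # one entry per (file, provider) pair
assert cache.get(img, "gemini").file_id == "file-1"
```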
lib/crewai/tests/fixtures/quarterly_report.csv (new vendored file, 5 lines)
@@ -0,0 +1,5 @@
Quarter,Revenue ($M),Expenses ($M),Profit ($M)
Q1 2024,70,40,30
Q2 2024,75,42,33
Q3 2024,80,45,35
Q4 2024,75,44,31
lib/crewai/tests/fixtures/revenue_chart.png (new vendored binary file, 27 KiB; not shown)
lib/crewai/tests/fixtures/review_guidelines.txt (new vendored file, 10 lines)
@@ -0,0 +1,10 @@
Review Guidelines

1. Be clear and concise: Write feedback that is easy to understand.
2. Focus on behavior and outcomes: Describe what happened and why it matters.
3. Be specific: Provide examples to support your points.
4. Balance positives and improvements: Highlight strengths and areas to grow.
5. Be respectful and constructive: Assume positive intent and offer solutions.
6. Use objective criteria: Reference goals, metrics, or expectations where possible.
7. Suggest next steps: Recommend actionable ways to improve.
8. Proofread: Check tone, grammar, and clarity before submitting.
@@ -728,3 +728,39 @@ def test_google_streaming_returns_usage_metrics():
    assert result.token_usage.prompt_tokens > 0
    assert result.token_usage.completion_tokens > 0
    assert result.token_usage.successful_requests >= 1


@pytest.mark.vcr()
def test_google_express_mode_works() -> None:
    """Test Google Vertex AI Express mode (aiplatform.googleapis.com) with API key authentication."""
    with patch.dict(os.environ, {"GOOGLE_GENAI_USE_VERTEXAI": "true"}):
        agent = Agent(
            role="Research Assistant",
            goal="Find information about the capital of Japan",
            backstory="You are a helpful research assistant.",
            llm=LLM(
                model="gemini/gemini-2.0-flash-exp",
            ),
            verbose=True,
        )

        task = Task(
            description="What is the capital of Japan?",
            expected_output="The capital of Japan",
            agent=agent,
        )

        crew = Crew(agents=[agent], tasks=[task])
        result = crew.kickoff()

        assert result.token_usage is not None
        assert result.token_usage.total_tokens > 0
        assert result.token_usage.prompt_tokens > 0
        assert result.token_usage.completion_tokens > 0
        assert result.token_usage.successful_requests >= 1
lib/crewai/tests/llms/test_multimodal.py (new file, 474 lines)
@@ -0,0 +1,474 @@
"""Unit tests for LLM multimodal functionality across all providers."""

import base64
import os
from unittest.mock import patch

import pytest

from crewai.llm import LLM
from crewai.files import ImageFile, PDFFile, TextFile

# Check for optional provider dependencies
try:
    from crewai.llms.providers.anthropic.completion import AnthropicCompletion
    HAS_ANTHROPIC = True
except ImportError:
    HAS_ANTHROPIC = False

try:
    from crewai.llms.providers.azure.completion import AzureCompletion
    HAS_AZURE = True
except ImportError:
    HAS_AZURE = False

try:
    from crewai.llms.providers.bedrock.completion import BedrockCompletion
    HAS_BEDROCK = True
except ImportError:
    HAS_BEDROCK = False


# Minimal valid PNG for testing
MINIMAL_PNG = (
    b"\x89PNG\r\n\x1a\n"
    b"\x00\x00\x00\rIHDR"
    b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
    b"\x90wS\xde"
    b"\x00\x00\x00\x00IEND\xaeB`\x82"
)

MINIMAL_PDF = b"%PDF-1.4 test content"


@pytest.fixture(autouse=True)
def mock_api_keys():
    """Mock API keys for all providers."""
    env_vars = {
        "ANTHROPIC_API_KEY": "test-key",
        "OPENAI_API_KEY": "test-key",
        "GOOGLE_API_KEY": "test-key",
        "AZURE_API_KEY": "test-key",
        "AWS_ACCESS_KEY_ID": "test-key",
        "AWS_SECRET_ACCESS_KEY": "test-key",
    }
    with patch.dict(os.environ, env_vars):
        yield


class TestLiteLLMMultimodal:
    """Tests for LLM class (litellm wrapper) multimodal functionality.

    These tests use `is_litellm=True` to ensure the litellm wrapper is used
    instead of native providers.
    """

    def test_supports_multimodal_gpt4o(self) -> None:
        """Test GPT-4o model supports multimodal."""
        llm = LLM(model="gpt-4o", is_litellm=True)
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_gpt4_turbo(self) -> None:
        """Test GPT-4 Turbo model supports multimodal."""
        llm = LLM(model="gpt-4-turbo", is_litellm=True)
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_claude3(self) -> None:
        """Test Claude 3 model supports multimodal via litellm."""
        # Use litellm/ prefix to avoid native provider import
        llm = LLM(model="litellm/claude-3-sonnet-20240229")
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_gemini(self) -> None:
        """Test Gemini model supports multimodal."""
        llm = LLM(model="gemini/gemini-pro", is_litellm=True)
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_gpt35_does_not(self) -> None:
        """Test GPT-3.5 model does not support multimodal."""
        llm = LLM(model="gpt-3.5-turbo", is_litellm=True)
        assert llm.supports_multimodal() is False

    def test_supported_content_types_openai(self) -> None:
        """Test OpenAI models support images only."""
        llm = LLM(model="gpt-4o", is_litellm=True)
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "application/pdf" not in types

    def test_supported_content_types_claude(self) -> None:
        """Test Claude models support images and PDFs via litellm."""
        # Use litellm/ prefix to avoid native provider import
        llm = LLM(model="litellm/claude-3-sonnet-20240229")
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "application/pdf" in types

    def test_supported_content_types_gemini(self) -> None:
        """Test Gemini models support wide range of content."""
        llm = LLM(model="gemini/gemini-pro", is_litellm=True)
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "audio/" in types
        assert "video/" in types
        assert "application/pdf" in types
        assert "text/" in types

    def test_supported_content_types_non_multimodal(self) -> None:
        """Test non-multimodal models return empty list."""
        llm = LLM(model="gpt-3.5-turbo", is_litellm=True)
        assert llm.supported_multimodal_content_types() == []

    def test_format_multimodal_content_image(self) -> None:
        """Test formatting image content."""
        llm = LLM(model="gpt-4o", is_litellm=True)
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "image_url"
        assert "data:image/png;base64," in result[0]["image_url"]["url"]

    def test_format_multimodal_content_non_multimodal(self) -> None:
        """Test non-multimodal model returns empty list."""
        llm = LLM(model="gpt-3.5-turbo", is_litellm=True)
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert result == []

    def test_format_multimodal_content_unsupported_type(self) -> None:
        """Test unsupported content type is skipped."""
        llm = LLM(model="gpt-4o", is_litellm=True)  # OpenAI doesn't support PDF
        files = {"doc": PDFFile(source=MINIMAL_PDF)}

        result = llm.format_multimodal_content(files)

        assert result == []


@pytest.mark.skipif(not HAS_ANTHROPIC, reason="Anthropic SDK not installed")
class TestAnthropicMultimodal:
    """Tests for Anthropic provider multimodal functionality."""

    def test_supports_multimodal_claude3(self) -> None:
        """Test Claude 3 supports multimodal."""
        llm = LLM(model="anthropic/claude-3-sonnet-20240229")
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_claude4(self) -> None:
        """Test Claude 4 supports multimodal."""
        llm = LLM(model="anthropic/claude-4-opus")
        assert llm.supports_multimodal() is True

    def test_supported_content_types(self) -> None:
        """Test Anthropic supports images and PDFs."""
        llm = LLM(model="anthropic/claude-3-sonnet-20240229")
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "application/pdf" in types

    def test_format_multimodal_content_image(self) -> None:
        """Test Anthropic image format uses source-based structure."""
        llm = LLM(model="anthropic/claude-3-sonnet-20240229")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "image"
        assert result[0]["source"]["type"] == "base64"
        assert result[0]["source"]["media_type"] == "image/png"
        assert "data" in result[0]["source"]

    def test_format_multimodal_content_pdf(self) -> None:
        """Test Anthropic PDF format uses document structure."""
        llm = LLM(model="anthropic/claude-3-sonnet-20240229")
        files = {"doc": PDFFile(source=MINIMAL_PDF)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "document"
        assert result[0]["source"]["type"] == "base64"
        assert result[0]["source"]["media_type"] == "application/pdf"


class TestOpenAIMultimodal:
    """Tests for OpenAI provider multimodal functionality."""

    def test_supports_multimodal_gpt4o(self) -> None:
        """Test GPT-4o supports multimodal."""
        llm = LLM(model="openai/gpt-4o")
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_gpt4_vision(self) -> None:
        """Test GPT-4 Vision supports multimodal."""
        llm = LLM(model="openai/gpt-4-vision-preview")
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_o1(self) -> None:
        """Test O1 model supports multimodal."""
        llm = LLM(model="openai/o1-preview")
        assert llm.supports_multimodal() is True

    def test_does_not_support_gpt35(self) -> None:
        """Test GPT-3.5 does not support multimodal."""
        llm = LLM(model="openai/gpt-3.5-turbo")
        assert llm.supports_multimodal() is False

    def test_supported_content_types(self) -> None:
        """Test OpenAI supports only images."""
        llm = LLM(model="openai/gpt-4o")
        types = llm.supported_multimodal_content_types()
        assert types == ["image/"]

    def test_format_multimodal_content_image(self) -> None:
        """Test OpenAI uses image_url format."""
        llm = LLM(model="openai/gpt-4o")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "image_url"
        url = result[0]["image_url"]["url"]
        assert url.startswith("data:image/png;base64,")
        # Verify base64 content
        b64_data = url.split(",")[1]
        assert base64.b64decode(b64_data) == MINIMAL_PNG


class TestGeminiMultimodal:
    """Tests for Gemini provider multimodal functionality."""

    def test_supports_multimodal_always_true(self) -> None:
        """Test Gemini always supports multimodal."""
        llm = LLM(model="gemini/gemini-pro")
        assert llm.supports_multimodal() is True

    def test_supported_content_types(self) -> None:
        """Test Gemini supports wide range of types."""
        llm = LLM(model="gemini/gemini-pro")
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "audio/" in types
        assert "video/" in types
        assert "application/pdf" in types
        assert "text/" in types

    def test_format_multimodal_content_image(self) -> None:
        """Test Gemini uses inlineData format."""
        llm = LLM(model="gemini/gemini-pro")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert "inlineData" in result[0]
        assert result[0]["inlineData"]["mimeType"] == "image/png"
        assert "data" in result[0]["inlineData"]

    def test_format_text_content(self) -> None:
        """Test Gemini text format uses simple text key."""
        llm = LLM(model="gemini/gemini-pro")

        result = llm.format_text_content("Hello world")

        assert result == {"text": "Hello world"}


@pytest.mark.skipif(not HAS_AZURE, reason="Azure AI Inference SDK not installed")
class TestAzureMultimodal:
    """Tests for Azure OpenAI provider multimodal functionality."""

    @pytest.fixture(autouse=True)
    def mock_azure_env(self):
        """Mock Azure-specific environment variables."""
        env_vars = {
            "AZURE_API_KEY": "test-key",
            "AZURE_API_BASE": "https://test.openai.azure.com",
            "AZURE_API_VERSION": "2024-02-01",
        }
        with patch.dict(os.environ, env_vars):
            yield

    def test_supports_multimodal_gpt4o(self) -> None:
        """Test Azure GPT-4o supports multimodal."""
        llm = LLM(model="azure/gpt-4o")
        assert llm.supports_multimodal() is True

    def test_supports_multimodal_gpt4_turbo(self) -> None:
        """Test Azure GPT-4 Turbo supports multimodal."""
        llm = LLM(model="azure/gpt-4-turbo")
        assert llm.supports_multimodal() is True

    def test_does_not_support_gpt35(self) -> None:
        """Test Azure GPT-3.5 does not support multimodal."""
        llm = LLM(model="azure/gpt-35-turbo")
        assert llm.supports_multimodal() is False

    def test_supported_content_types(self) -> None:
        """Test Azure supports only images."""
        llm = LLM(model="azure/gpt-4o")
        types = llm.supported_multimodal_content_types()
        assert types == ["image/"]

    def test_format_multimodal_content_image(self) -> None:
        """Test Azure uses same format as OpenAI."""
        llm = LLM(model="azure/gpt-4o")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "image_url"
        assert "data:image/png;base64," in result[0]["image_url"]["url"]


@pytest.mark.skipif(not HAS_BEDROCK, reason="AWS Bedrock SDK not installed")
class TestBedrockMultimodal:
    """Tests for AWS Bedrock provider multimodal functionality."""

    @pytest.fixture(autouse=True)
    def mock_bedrock_env(self):
        """Mock AWS-specific environment variables."""
        env_vars = {
            "AWS_ACCESS_KEY_ID": "test-key",
            "AWS_SECRET_ACCESS_KEY": "test-secret",
            "AWS_DEFAULT_REGION": "us-east-1",
        }
        with patch.dict(os.environ, env_vars):
            yield

    def test_supports_multimodal_claude3(self) -> None:
        """Test Bedrock Claude 3 supports multimodal."""
        llm = LLM(model="bedrock/anthropic.claude-3-sonnet")
        assert llm.supports_multimodal() is True

    def test_does_not_support_claude2(self) -> None:
        """Test Bedrock Claude 2 does not support multimodal."""
        llm = LLM(model="bedrock/anthropic.claude-v2")
        assert llm.supports_multimodal() is False

    def test_supported_content_types(self) -> None:
        """Test Bedrock supports images and PDFs."""
        llm = LLM(model="bedrock/anthropic.claude-3-sonnet")
        types = llm.supported_multimodal_content_types()
        assert "image/" in types
        assert "application/pdf" in types

    def test_format_multimodal_content_image(self) -> None:
        """Test Bedrock uses Converse API image format."""
        llm = LLM(model="bedrock/anthropic.claude-3-sonnet")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert "image" in result[0]
        assert result[0]["image"]["format"] == "png"
        assert "source" in result[0]["image"]
        assert "bytes" in result[0]["image"]["source"]

    def test_format_multimodal_content_pdf(self) -> None:
        """Test Bedrock uses Converse API document format."""
        llm = LLM(model="bedrock/anthropic.claude-3-sonnet")
        files = {"doc": PDFFile(source=MINIMAL_PDF)}

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert "document" in result[0]
        assert result[0]["document"]["format"] == "pdf"
        assert "source" in result[0]["document"]


class TestBaseLLMMultimodal:
    """Tests for BaseLLM default multimodal behavior."""

    def test_base_supports_multimodal_false(self) -> None:
        """Test base implementation returns False."""
        from crewai.llms.base_llm import BaseLLM

        class TestLLM(BaseLLM):
            def call(self, messages, tools=None, callbacks=None):
                return "test"

        llm = TestLLM(model="test")
        assert llm.supports_multimodal() is False

    def test_base_supported_content_types_empty(self) -> None:
        """Test base implementation returns empty list."""
        from crewai.llms.base_llm import BaseLLM

        class TestLLM(BaseLLM):
            def call(self, messages, tools=None, callbacks=None):
                return "test"

        llm = TestLLM(model="test")
        assert llm.supported_multimodal_content_types() == []

    def test_base_format_multimodal_content_empty(self) -> None:
        """Test base implementation returns empty list."""
        from crewai.llms.base_llm import BaseLLM

        class TestLLM(BaseLLM):
            def call(self, messages, tools=None, callbacks=None):
                return "test"

        llm = TestLLM(model="test")
        files = {"chart": ImageFile(source=MINIMAL_PNG)}
        assert llm.format_multimodal_content(files) == []

    def test_base_format_text_content(self) -> None:
        """Test base text formatting uses OpenAI/Anthropic style."""
        from crewai.llms.base_llm import BaseLLM

        class TestLLM(BaseLLM):
            def call(self, messages, tools=None, callbacks=None):
                return "test"

        llm = TestLLM(model="test")
        result = llm.format_text_content("Hello")
        assert result == {"type": "text", "text": "Hello"}


class TestMultipleFilesFormatting:
    """Tests for formatting multiple files at once."""

    def test_format_multiple_images(self) -> None:
        """Test formatting multiple images."""
        llm = LLM(model="gpt-4o")
        files = {
            "chart1": ImageFile(source=MINIMAL_PNG),
            "chart2": ImageFile(source=MINIMAL_PNG),
        }

        result = llm.format_multimodal_content(files)

        assert len(result) == 2

    def test_format_mixed_supported_and_unsupported(self) -> None:
        """Test only supported types are formatted."""
        llm = LLM(model="gpt-4o")  # OpenAI - images only
        files = {
            "chart": ImageFile(source=MINIMAL_PNG),
            "doc": PDFFile(source=MINIMAL_PDF),  # Not supported
            "text": TextFile(source=b"hello"),  # Not supported
        }

        result = llm.format_multimodal_content(files)

        assert len(result) == 1
        assert result[0]["type"] == "image_url"

    def test_format_empty_files_dict(self) -> None:
        """Test empty files dict returns empty list."""
        llm = LLM(model="gpt-4o")

        result = llm.format_multimodal_content({})

        assert result == []
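A compact way to read these formatting tests is to format the same image for several providers and compare the block shapes. The sketch below reuses the test file's minimal PNG; the printed keys are the ones asserted above, and it assumes the relevant provider SDKs and API keys are available:

```python
from crewai.llm import LLM
from crewai.files import ImageFile

png = (
    b"\x89PNG\r\n\x1a\n"
    b"\x00\x00\x00\rIHDR"
    b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
    b"\x90wS\xde"
    b"\x00\x00\x00\x00IEND\xaeB`\x82"
)

for model in ("openai/gpt-4o", "anthropic/claude-3-sonnet-20240229", "gemini/gemini-pro"):
    block = LLM(model=model).format_multimodal_content({"chart": ImageFile(source=png)})[0]
    # OpenAI/Azure: {"type": "image_url", ...}; Anthropic: {"type": "image", "source": ...};
    # Gemini: {"inlineData": {"mimeType": ..., "data": ...}}
    print(model, sorted(block))
```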
lib/crewai/tests/llms/test_multimodal_integration.py (new file, 329 lines)
@@ -0,0 +1,329 @@
"""Integration tests for LLM multimodal functionality with cassettes.

These tests make actual API calls (recorded via VCR cassettes) to verify
multimodal content is properly sent and processed by each provider.
"""

from pathlib import Path

import pytest

from crewai.llm import LLM
from crewai.files import File, ImageFile, PDFFile, TextFile


# Path to test data files
TEST_DATA_DIR = Path(__file__).parent.parent.parent.parent.parent / "data"
TEST_IMAGE_PATH = TEST_DATA_DIR / "revenue_chart.png"
TEST_TEXT_PATH = TEST_DATA_DIR / "review_guidelines.txt"


@pytest.fixture
def test_image_bytes() -> bytes:
    """Load test image bytes."""
    return TEST_IMAGE_PATH.read_bytes()


@pytest.fixture
def test_text_bytes() -> bytes:
    """Load test text bytes."""
    return TEST_TEXT_PATH.read_bytes()


# Minimal PDF for testing (real PDF structure)
MINIMAL_PDF = b"""%PDF-1.4
1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj
2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj
3 0 obj << /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] >> endobj
xref
0 4
0000000000 65535 f
0000000009 00000 n
0000000058 00000 n
0000000115 00000 n
trailer << /Size 4 /Root 1 0 R >>
startxref
196
%%EOF
"""


def _build_multimodal_message(llm: LLM, prompt: str, files: dict) -> list[dict]:
    """Build a multimodal message with text and file content."""
    content_blocks = llm.format_multimodal_content(files)
    return [
        {
            "role": "user",
            "content": [
                llm.format_text_content(prompt),
                *content_blocks,
            ],
        }
    ]


class TestOpenAIMultimodalIntegration:
    """Integration tests for OpenAI multimodal with real API calls."""

    @pytest.mark.vcr()
    def test_describe_image(self, test_image_bytes: bytes) -> None:
        """Test OpenAI can describe an image."""
        llm = LLM(model="openai/gpt-4o-mini")
        files = {"image": ImageFile(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0


class TestAnthropicMultimodalIntegration:
    """Integration tests for Anthropic multimodal with real API calls."""

    @pytest.mark.vcr()
    def test_describe_image(self, test_image_bytes: bytes) -> None:
        """Test Anthropic can describe an image."""
        llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
        files = {"image": ImageFile(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0

    @pytest.mark.vcr()
    def test_analyze_pdf(self) -> None:
        """Test Anthropic can analyze a PDF."""
        llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
        files = {"document": PDFFile(source=MINIMAL_PDF)}

        messages = _build_multimodal_message(
            llm,
            "What type of document is this? Answer in one word.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0


class TestGeminiMultimodalIntegration:
    """Integration tests for Gemini multimodal with real API calls."""

    @pytest.mark.vcr()
    def test_describe_image(self, test_image_bytes: bytes) -> None:
        """Test Gemini can describe an image."""
        llm = LLM(model="gemini/gemini-2.0-flash")
        files = {"image": ImageFile(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0

    @pytest.mark.vcr()
    def test_analyze_text_file(self, test_text_bytes: bytes) -> None:
        """Test Gemini can analyze a text file."""
        llm = LLM(model="gemini/gemini-2.0-flash")
        files = {"readme": TextFile(source=test_text_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Summarize what this text file says in one sentence.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0


class TestLiteLLMMultimodalIntegration:
    """Integration tests for LiteLLM wrapper multimodal with real API calls."""

    @pytest.mark.vcr()
    def test_describe_image_gpt4o(self, test_image_bytes: bytes) -> None:
        """Test LiteLLM with GPT-4o can describe an image."""
        llm = LLM(model="gpt-4o-mini", is_litellm=True)
        files = {"image": ImageFile(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0

    @pytest.mark.vcr()
    def test_describe_image_claude(self, test_image_bytes: bytes) -> None:
        """Test LiteLLM with Claude can describe an image."""
        llm = LLM(model="anthropic/claude-3-5-haiku-20241022", is_litellm=True)
        files = {"image": ImageFile(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0


class TestMultipleFilesIntegration:
    """Integration tests for multiple files in a single request."""

    @pytest.mark.vcr()
    def test_multiple_images_openai(self, test_image_bytes: bytes) -> None:
        """Test OpenAI can process multiple images."""
        llm = LLM(model="openai/gpt-4o-mini")
        files = {
            "image1": ImageFile(source=test_image_bytes),
            "image2": ImageFile(source=test_image_bytes),
        }

        messages = _build_multimodal_message(
            llm,
            "How many images do you see? Answer with just the number.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert "2" in response or "two" in response.lower()

    @pytest.mark.vcr()
    def test_mixed_content_anthropic(self, test_image_bytes: bytes) -> None:
        """Test Anthropic can process image and PDF together."""
        llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
        files = {
            "image": ImageFile(source=test_image_bytes),
            "document": PDFFile(source=MINIMAL_PDF),
        }

        messages = _build_multimodal_message(
            llm,
            "What types of files did I send you? List them briefly.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0


class TestGenericFileIntegration:
    """Integration tests for the generic File class with auto-detection."""

    @pytest.mark.vcr()
    def test_generic_file_image_openai(self, test_image_bytes: bytes) -> None:
        """Test generic File auto-detects image and sends correct content type."""
        llm = LLM(model="openai/gpt-4o-mini")
        files = {"image": File(source=test_image_bytes)}

        messages = _build_multimodal_message(
            llm,
            "Describe this image in one sentence. Be brief.",
            files,
        )

        response = llm.call(messages)

        assert response
        assert isinstance(response, str)
        assert len(response) > 0

    @pytest.mark.vcr()
    def test_generic_file_pdf_anthropic(self) -> None:
        """Test generic File auto-detects PDF and sends correct content type."""
|
||||
llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
|
||||
files = {"document": File(source=MINIMAL_PDF)}
|
||||
|
||||
messages = _build_multimodal_message(
|
||||
llm,
|
||||
"What type of document is this? Answer in one word.",
|
||||
files,
|
||||
)
|
||||
|
||||
response = llm.call(messages)
|
||||
|
||||
assert response
|
||||
assert isinstance(response, str)
|
||||
assert len(response) > 0
|
||||
|
||||
@pytest.mark.vcr()
|
||||
def test_generic_file_text_gemini(self, test_text_bytes: bytes) -> None:
|
||||
"""Test generic File auto-detects text and sends correct content type."""
|
||||
llm = LLM(model="gemini/gemini-2.0-flash")
|
||||
files = {"content": File(source=test_text_bytes)}
|
||||
|
||||
messages = _build_multimodal_message(
|
||||
llm,
|
||||
"Summarize what this text says in one sentence.",
|
||||
files,
|
||||
)
|
||||
|
||||
response = llm.call(messages)
|
||||
|
||||
assert response
|
||||
assert isinstance(response, str)
|
||||
assert len(response) > 0
|
||||
|
||||
@pytest.mark.vcr()
|
||||
def test_generic_file_mixed_types(self, test_image_bytes: bytes) -> None:
|
||||
"""Test generic File works with multiple auto-detected types."""
|
||||
llm = LLM(model="anthropic/claude-3-5-haiku-20241022")
|
||||
files = {
|
||||
"chart": File(source=test_image_bytes),
|
||||
"doc": File(source=MINIMAL_PDF),
|
||||
}
|
||||
|
||||
messages = _build_multimodal_message(
|
||||
llm,
|
||||
"What types of files did I send? List them briefly.",
|
||||
files,
|
||||
)
|
||||
|
||||
response = llm.call(messages)
|
||||
|
||||
assert response
|
||||
assert isinstance(response, str)
|
||||
assert len(response) > 0
|
||||
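As context for the helper above, here is a minimal usage sketch outside the test suite. It reuses only calls the tests exercise (`LLM.format_text_content`, `LLM.format_multimodal_content`, `LLM.call`); the model choice and file path are illustrative assumptions:

```python
from crewai import LLM
from crewai.files import ImageFile

llm = LLM(model="openai/gpt-4o-mini")
# "photo.png" is a hypothetical path; ImageFile also accepts raw bytes.
files = {"photo": ImageFile(source="photo.png")}

messages = [
    {
        "role": "user",
        "content": [
            llm.format_text_content("What is in this photo?"),
            *llm.format_multimodal_content(files),
        ],
    }
]
print(llm.call(messages))
```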
@@ -1202,7 +1202,8 @@ def test_complex_and_or_branching():
     )
     assert execution_order.index("branch_2b") > min_branch_1_index

-    # Final should be last and after both 2a and 2b
+    # Final should be after both 2a and 2b
     assert execution_order[-1] == "final"
+    assert execution_order.index("final") > execution_order.index("branch_2a")
+    assert execution_order.index("final") > execution_order.index("branch_2b")
@@ -1255,10 +1256,11 @@ def test_conditional_router_paths_exclusivity():


 def test_state_consistency_across_parallel_branches():
-    """Test that state remains consistent when branches execute sequentially.
+    """Test that state remains consistent when branches execute in parallel.

-    Note: Branches triggered by the same parent execute sequentially, not in parallel.
-    This ensures predictable state mutations and prevents race conditions.
+    Note: Branches triggered by the same parent execute in parallel for efficiency.
+    Thread-safe state access via StateProxy ensures no race conditions.
+    We check the execution order to ensure the branches execute in parallel.
     """
     execution_order = []

@@ -1295,12 +1297,14 @@ def test_state_consistency_across_parallel_branches():
     flow = StateConsistencyFlow()
     flow.kickoff()

-    # Branches execute sequentially, so branch_a runs first, then branch_b
-    assert flow.state["branch_a_value"] == 10  # Sees initial value
-    assert flow.state["branch_b_value"] == 11  # Sees value after branch_a increment
+    assert "branch_a" in execution_order
+    assert "branch_b" in execution_order
+    assert "verify_state" in execution_order

-    # Final counter should reflect both increments sequentially
-    assert flow.state["counter"] == 16  # 10 + 1 + 5
+    assert flow.state["branch_a_value"] is not None
+    assert flow.state["branch_b_value"] is not None
+
+    assert flow.state["counter"] == 16


 def test_deeply_nested_conditions():
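For reference, a hedged sketch of the behavior the updated test now asserts: two listeners on the same parent may run in parallel, and both state mutations land regardless of finish order. Class and method names here are illustrative, not taken from the diff:

```python
from crewai.flow.flow import Flow, listen, start


class TwoBranchFlow(Flow):
    # Unstructured (dict-backed) state; the flow runtime proxies access.
    @start()
    def begin(self):
        self.state["counter"] = 10

    @listen(begin)
    def branch_a(self):
        self.state["counter"] += 1

    @listen(begin)
    def branch_b(self):
        self.state["counter"] += 5


flow = TwoBranchFlow()
flow.kickoff()
# Both increments are applied whichever branch finishes first.
assert flow.state["counter"] == 16
```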
@@ -247,4 +247,4 @@ def test_persistence_with_base_model(tmp_path):
     assert message.role == "user"
     assert message.type == "text"
     assert message.content == "Hello, World!"
-    assert isinstance(flow.state, State)
+    assert isinstance(flow.state._unwrap(), State)
@@ -185,8 +185,8 @@ def test_task_guardrail_process_output(task_output):

     result = guardrail(task_output)
     assert result[0] is False

-    assert result[1] == "The task result contains more than 10 words, violating the guardrail. The text provided contains about 21 words."
+    # Check that feedback is provided (wording varies by LLM)
+    assert result[1] == "The task output exceeds the word limit of 10 words by containing 22 words."

     guardrail = LLMGuardrail(
         description="Ensure the result has less than 500 words", llm=LLM(model="gpt-4o")
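A hedged sketch of the guardrail call pattern the hunk above asserts: an `LLMGuardrail` is callable with a task output and returns a `(passed, feedback)` tuple. The import paths and `TaskOutput` fields are assumptions, not taken from the diff:

```python
from crewai import LLM
from crewai.tasks.llm_guardrail import LLMGuardrail  # import path assumed
from crewai.tasks.task_output import TaskOutput      # import path assumed

guardrail = LLMGuardrail(
    description="Ensure the result has less than 10 words",
    llm=LLM(model="gpt-4o"),
)
task_output = TaskOutput(
    description="summarize", raw="a deliberately long answer " * 5, agent="writer"
)
passed, feedback = guardrail(task_output)  # feedback explains any violation
```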
lib/crewai/tests/tools/agent_tools/test_read_file_tool.py (new file, 122 lines)
@@ -0,0 +1,122 @@
"""Unit tests for ReadFileTool."""

import base64

import pytest

from crewai.tools.agent_tools.read_file_tool import ReadFileTool
from crewai.files import ImageFile, PDFFile, TextFile


class TestReadFileTool:
    """Tests for ReadFileTool."""

    def setup_method(self) -> None:
        """Set up test fixtures."""
        self.tool = ReadFileTool()

    def test_tool_metadata(self) -> None:
        """Test tool has correct name and description."""
        assert self.tool.name == "read_file"
        assert "Read content from an input file" in self.tool.description

    def test_run_no_files_available(self) -> None:
        """Test _run returns message when no files are set."""
        result = self.tool._run(file_name="any.txt")
        assert result == "No input files available."

    def test_run_file_not_found(self) -> None:
        """Test _run returns message when file not found."""
        self.tool.set_files({"doc.txt": TextFile(source=b"content")})

        result = self.tool._run(file_name="missing.txt")

        assert "File 'missing.txt' not found" in result
        assert "doc.txt" in result  # Lists available files

    def test_run_text_file(self) -> None:
        """Test reading a text file returns decoded content."""
        text_content = "Hello, this is text content!"
        self.tool.set_files({"readme.txt": TextFile(source=text_content.encode())})

        result = self.tool._run(file_name="readme.txt")

        assert result == text_content

    def test_run_json_file(self) -> None:
        """Test reading a JSON file returns decoded content."""
        json_content = '{"key": "value"}'
        self.tool.set_files({"data.json": TextFile(source=json_content.encode())})

        result = self.tool._run(file_name="data.json")

        assert result == json_content

    def test_run_binary_file_returns_base64(self) -> None:
        """Test reading a binary file returns base64 encoded content."""
        # Minimal valid PNG structure for proper MIME detection
        png_bytes = (
            b"\x89PNG\r\n\x1a\n"
            b"\x00\x00\x00\rIHDR"
            b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
            b"\x90wS\xde"
            b"\x00\x00\x00\x00IEND\xaeB`\x82"
        )
        self.tool.set_files({"image.png": ImageFile(source=png_bytes)})

        result = self.tool._run(file_name="image.png")

        assert "[Binary file:" in result
        assert "image/png" in result
        assert "Base64:" in result

        # Verify base64 can be decoded
        b64_part = result.split("Base64: ")[1]
        decoded = base64.b64decode(b64_part)
        assert decoded == png_bytes

    def test_run_pdf_file_returns_base64(self) -> None:
        """Test reading a PDF file returns base64 encoded content."""
        pdf_bytes = b"%PDF-1.4 some content here"
        self.tool.set_files({"doc.pdf": PDFFile(source=pdf_bytes)})

        result = self.tool._run(file_name="doc.pdf")

        assert "[Binary file:" in result
        assert "application/pdf" in result

    def test_set_files_none(self) -> None:
        """Test setting files to None."""
        self.tool.set_files({"doc": TextFile(source=b"content")})
        self.tool.set_files(None)

        result = self.tool._run(file_name="doc")

        assert result == "No input files available."

    def test_run_multiple_files(self) -> None:
        """Test tool can access multiple files."""
        self.tool.set_files({
            "file1.txt": TextFile(source=b"content 1"),
            "file2.txt": TextFile(source=b"content 2"),
            "file3.txt": TextFile(source=b"content 3"),
        })

        assert self.tool._run(file_name="file1.txt") == "content 1"
        assert self.tool._run(file_name="file2.txt") == "content 2"
        assert self.tool._run(file_name="file3.txt") == "content 3"

    def test_run_with_kwargs(self) -> None:
        """Test _run ignores extra kwargs."""
        self.tool.set_files({"doc.txt": TextFile(source=b"content")})

        result = self.tool._run(file_name="doc.txt", extra_arg="ignored")

        assert result == "content"

    def test_args_schema(self) -> None:
        """Test that args_schema is properly defined."""
        schema = self.tool.args_schema

        assert "file_name" in schema.model_fields
        assert schema.model_fields["file_name"].is_required()
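A short direct-usage sketch mirroring the tests above; file names and contents are illustrative:

```python
from crewai.tools.agent_tools.read_file_tool import ReadFileTool
from crewai.files import TextFile

tool = ReadFileTool()
tool.set_files({"notes.txt": TextFile(source=b"meeting notes")})

print(tool._run(file_name="notes.txt"))   # -> "meeting notes"
print(tool._run(file_name="missing.md"))  # -> "File 'missing.md' not found ..." plus the available names
```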
@@ -348,11 +348,11 @@ def test_agent_emits_execution_error_event(base_agent, base_task):

     error_message = "Error happening while sending prompt to model."
     base_agent.max_retry_limit = 0
-    with patch.object(
-        CrewAgentExecutor, "invoke", wraps=base_agent.agent_executor.invoke
-    ) as invoke_mock:
-        invoke_mock.side_effect = Exception(error_message)

+    # Patch at the class level since agent_executor is created lazily
+    with patch.object(
+        CrewAgentExecutor, "invoke", side_effect=Exception(error_message)
+    ):
         with pytest.raises(Exception):  # noqa: B017
             base_agent.execute_task(
                 task=base_task,
lib/crewai/tests/utilities/test_file_store.py (new file, 171 lines)
@@ -0,0 +1,171 @@
"""Unit tests for file_store module."""

import uuid

import pytest

from crewai.utilities.file_store import (
    clear_files,
    clear_task_files,
    get_all_files,
    get_files,
    get_task_files,
    store_files,
    store_task_files,
)
from crewai.files import TextFile


class TestFileStore:
    """Tests for synchronous file store operations."""

    def setup_method(self) -> None:
        """Set up test fixtures."""
        self.crew_id = uuid.uuid4()
        self.task_id = uuid.uuid4()
        self.test_file = TextFile(source=b"test content")

    def teardown_method(self) -> None:
        """Clean up after tests."""
        clear_files(self.crew_id)
        clear_task_files(self.task_id)

    def test_store_and_get_files(self) -> None:
        """Test storing and retrieving crew files."""
        files = {"doc": self.test_file}
        store_files(self.crew_id, files)

        retrieved = get_files(self.crew_id)

        assert retrieved is not None
        assert "doc" in retrieved
        assert retrieved["doc"].read() == b"test content"

    def test_get_files_returns_none_when_empty(self) -> None:
        """Test that get_files returns None for non-existent keys."""
        new_id = uuid.uuid4()
        result = get_files(new_id)
        assert result is None

    def test_clear_files(self) -> None:
        """Test clearing crew files."""
        files = {"doc": self.test_file}
        store_files(self.crew_id, files)

        clear_files(self.crew_id)

        result = get_files(self.crew_id)
        assert result is None

    def test_store_and_get_task_files(self) -> None:
        """Test storing and retrieving task files."""
        files = {"task_doc": self.test_file}
        store_task_files(self.task_id, files)

        retrieved = get_task_files(self.task_id)

        assert retrieved is not None
        assert "task_doc" in retrieved

    def test_clear_task_files(self) -> None:
        """Test clearing task files."""
        files = {"task_doc": self.test_file}
        store_task_files(self.task_id, files)

        clear_task_files(self.task_id)

        result = get_task_files(self.task_id)
        assert result is None

    def test_get_all_files_merges_crew_and_task(self) -> None:
        """Test that get_all_files merges crew and task files."""
        crew_file = TextFile(source=b"crew content")
        task_file = TextFile(source=b"task content")

        store_files(self.crew_id, {"crew_doc": crew_file})
        store_task_files(self.task_id, {"task_doc": task_file})

        merged = get_all_files(self.crew_id, self.task_id)

        assert merged is not None
        assert "crew_doc" in merged
        assert "task_doc" in merged

    def test_get_all_files_task_overrides_crew(self) -> None:
        """Test that task files override crew files with same name."""
        crew_file = TextFile(source=b"crew version")
        task_file = TextFile(source=b"task version")

        store_files(self.crew_id, {"shared_doc": crew_file})
        store_task_files(self.task_id, {"shared_doc": task_file})

        merged = get_all_files(self.crew_id, self.task_id)

        assert merged is not None
        assert merged["shared_doc"].read() == b"task version"

    def test_get_all_files_crew_only(self) -> None:
        """Test get_all_files with only crew files."""
        store_files(self.crew_id, {"doc": self.test_file})

        result = get_all_files(self.crew_id)

        assert result is not None
        assert "doc" in result

    def test_get_all_files_returns_none_when_empty(self) -> None:
        """Test that get_all_files returns None when no files exist."""
        new_crew_id = uuid.uuid4()
        new_task_id = uuid.uuid4()

        result = get_all_files(new_crew_id, new_task_id)

        assert result is None


@pytest.mark.asyncio
class TestAsyncFileStore:
    """Tests for asynchronous file store operations."""

    async def test_astore_and_aget_files(self) -> None:
        """Test async storing and retrieving crew files."""
        from crewai.utilities.file_store import aclear_files, aget_files, astore_files

        crew_id = uuid.uuid4()
        test_file = TextFile(source=b"async content")

        try:
            await astore_files(crew_id, {"doc": test_file})
            retrieved = await aget_files(crew_id)

            assert retrieved is not None
            assert "doc" in retrieved
            assert retrieved["doc"].read() == b"async content"
        finally:
            await aclear_files(crew_id)

    async def test_aget_all_files(self) -> None:
        """Test async get_all_files merging."""
        from crewai.utilities.file_store import (
            aclear_files,
            aclear_task_files,
            aget_all_files,
            astore_files,
            astore_task_files,
        )

        crew_id = uuid.uuid4()
        task_id = uuid.uuid4()

        try:
            await astore_files(crew_id, {"crew": TextFile(source=b"crew")})
            await astore_task_files(task_id, {"task": TextFile(source=b"task")})

            merged = await aget_all_files(crew_id, task_id)

            assert merged is not None
            assert "crew" in merged
            assert "task" in merged
        finally:
            await aclear_files(crew_id)
            await aclear_task_files(task_id)
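The merge semantics these tests pin down can be summarized in a few lines; the IDs and contents below are illustrative:

```python
import uuid

from crewai.files import TextFile
from crewai.utilities.file_store import get_all_files, store_files, store_task_files

crew_id, task_id = uuid.uuid4(), uuid.uuid4()
store_files(crew_id, {"spec": TextFile(source=b"crew-wide spec")})
store_task_files(task_id, {"spec": TextFile(source=b"task-specific spec")})

merged = get_all_files(crew_id, task_id)
assert merged is not None
# On a name clash, the task-scoped file shadows the crew-scoped one.
assert merged["spec"].read() == b"task-specific spec"
```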
lib/crewai/tests/utilities/test_files.py (new file, 520 lines)
@@ -0,0 +1,520 @@
"""Unit tests for files module."""

import io
import tempfile
from pathlib import Path

import pytest

from crewai.files import (
    AudioFile,
    File,
    FileBytes,
    FilePath,
    FileSource,
    FileStream,
    ImageFile,
    PDFFile,
    TextFile,
    VideoFile,
    normalize_input_files,
    wrap_file_source,
)
from crewai.files.file import detect_content_type


class TestDetectContentType:
    """Tests for MIME type detection."""

    def test_detect_plain_text(self) -> None:
        """Test detection of plain text content."""
        result = detect_content_type(b"Hello, World!")
        assert result == "text/plain"

    def test_detect_json(self) -> None:
        """Test detection of JSON content."""
        result = detect_content_type(b'{"key": "value"}')
        assert result == "application/json"

    def test_detect_png(self) -> None:
        """Test detection of PNG content."""
        # Minimal valid PNG: header + IHDR chunk + IEND chunk
        png_data = (
            b"\x89PNG\r\n\x1a\n"  # PNG signature
            b"\x00\x00\x00\rIHDR"  # IHDR chunk length and type
            b"\x00\x00\x00\x01"  # width: 1
            b"\x00\x00\x00\x01"  # height: 1
            b"\x08\x02"  # bit depth: 8, color type: 2 (RGB)
            b"\x00\x00\x00"  # compression, filter, interlace
            b"\x90wS\xde"  # CRC
            b"\x00\x00\x00\x00IEND\xaeB`\x82"  # IEND chunk
        )
        result = detect_content_type(png_data)
        assert result == "image/png"

    def test_detect_jpeg(self) -> None:
        """Test detection of JPEG header."""
        jpeg_header = b"\xff\xd8\xff\xe0\x00\x10JFIF"
        result = detect_content_type(jpeg_header)
        assert result == "image/jpeg"

    def test_detect_pdf(self) -> None:
        """Test detection of PDF header."""
        pdf_header = b"%PDF-1.4"
        result = detect_content_type(pdf_header)
        assert result == "application/pdf"


class TestFilePath:
    """Tests for FilePath class."""

    def test_create_from_existing_file(self, tmp_path: Path) -> None:
        """Test creating FilePath from an existing file."""
        file_path = tmp_path / "test.txt"
        file_path.write_text("test content")

        fp = FilePath(path=file_path)

        assert fp.filename == "test.txt"
        assert fp.read() == b"test content"

    def test_content_is_cached(self, tmp_path: Path) -> None:
        """Test that file content is cached after first read."""
        file_path = tmp_path / "test.txt"
        file_path.write_text("original")

        fp = FilePath(path=file_path)
        first_read = fp.read()

        # Modify file after first read
        file_path.write_text("modified")
        second_read = fp.read()

        assert first_read == second_read == b"original"

    def test_raises_for_missing_file(self, tmp_path: Path) -> None:
        """Test that FilePath raises for non-existent files."""
        with pytest.raises(ValueError, match="File not found"):
            FilePath(path=tmp_path / "nonexistent.txt")

    def test_raises_for_directory(self, tmp_path: Path) -> None:
        """Test that FilePath raises for directories."""
        with pytest.raises(ValueError, match="Path is not a file"):
            FilePath(path=tmp_path)

    def test_content_type_detection(self, tmp_path: Path) -> None:
        """Test content type detection from file content."""
        file_path = tmp_path / "test.txt"
        file_path.write_text("plain text content")

        fp = FilePath(path=file_path)

        assert fp.content_type == "text/plain"


class TestFileBytes:
    """Tests for FileBytes class."""

    def test_create_from_bytes(self) -> None:
        """Test creating FileBytes from raw bytes."""
        fb = FileBytes(data=b"test data")

        assert fb.read() == b"test data"
        assert fb.filename is None

    def test_create_with_filename(self) -> None:
        """Test creating FileBytes with optional filename."""
        fb = FileBytes(data=b"test", filename="doc.txt")

        assert fb.filename == "doc.txt"

    def test_content_type_detection(self) -> None:
        """Test content type detection from bytes."""
        fb = FileBytes(data=b"text content")

        assert fb.content_type == "text/plain"


class TestFileStream:
    """Tests for FileStream class."""

    def test_create_from_stream(self) -> None:
        """Test creating FileStream from a file-like object."""
        stream = io.BytesIO(b"stream content")

        fs = FileStream(stream=stream)

        assert fs.read() == b"stream content"

    def test_content_is_cached(self) -> None:
        """Test that stream content is cached."""
        stream = io.BytesIO(b"original")

        fs = FileStream(stream=stream)
        first = fs.read()

        # Even after modifying stream, cached content is returned
        stream.seek(0)
        stream.write(b"modified")
        second = fs.read()

        assert first == second == b"original"

    def test_filename_from_stream(self, tmp_path: Path) -> None:
        """Test filename extraction from stream with name attribute."""
        file_path = tmp_path / "named.txt"
        file_path.write_text("content")

        with open(file_path, "rb") as f:
            fs = FileStream(stream=f)
            assert fs.filename == "named.txt"

    def test_close_stream(self) -> None:
        """Test closing the underlying stream."""
        stream = io.BytesIO(b"data")
        fs = FileStream(stream=stream)

        fs.close()

        assert stream.closed


class TestTypedFileWrappers:
    """Tests for typed file wrapper classes."""

    def test_image_file_from_bytes(self) -> None:
        """Test ImageFile creation from bytes."""
        # Minimal valid PNG structure
        png_bytes = (
            b"\x89PNG\r\n\x1a\n"
            b"\x00\x00\x00\rIHDR"
            b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
            b"\x90wS\xde"
            b"\x00\x00\x00\x00IEND\xaeB`\x82"
        )
        img = ImageFile(source=png_bytes)

        assert img.content_type == "image/png"

    def test_image_file_from_path(self, tmp_path: Path) -> None:
        """Test ImageFile creation from path string."""
        file_path = tmp_path / "test.png"
        file_path.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 100)

        img = ImageFile(source=str(file_path))

        assert img.filename == "test.png"

    def test_text_file_read_text(self) -> None:
        """Test TextFile.read_text method."""
        tf = TextFile(source=b"Hello, World!")

        assert tf.read_text() == "Hello, World!"

    def test_pdf_file_creation(self) -> None:
        """Test PDFFile creation."""
        pdf_bytes = b"%PDF-1.4 content"
        pdf = PDFFile(source=pdf_bytes)

        assert pdf.read() == pdf_bytes

    def test_audio_file_creation(self) -> None:
        """Test AudioFile creation."""
        audio = AudioFile(source=b"audio data")
        assert audio.read() == b"audio data"

    def test_video_file_creation(self) -> None:
        """Test VideoFile creation."""
        video = VideoFile(source=b"video data")
        assert video.read() == b"video data"

    def test_dict_unpacking(self, tmp_path: Path) -> None:
        """Test that files support ** unpacking syntax."""
        file_path = tmp_path / "document.txt"
        file_path.write_text("content")

        tf = TextFile(source=str(file_path))

        # Unpack into dict
        result = {**tf}

        assert "document" in result
        assert result["document"] is tf

    def test_dict_unpacking_no_filename(self) -> None:
        """Test dict unpacking with bytes (no filename)."""
        tf = TextFile(source=b"content")
        result = {**tf}

        assert "file" in result

    def test_keys_method(self, tmp_path: Path) -> None:
        """Test keys() method for dict unpacking."""
        file_path = tmp_path / "test.txt"
        file_path.write_text("content")

        tf = TextFile(source=str(file_path))

        assert tf.keys() == ["test"]

    def test_getitem_valid_key(self, tmp_path: Path) -> None:
        """Test __getitem__ with valid key."""
        file_path = tmp_path / "doc.txt"
        file_path.write_text("content")

        tf = TextFile(source=str(file_path))

        assert tf["doc"] is tf

    def test_getitem_invalid_key(self, tmp_path: Path) -> None:
        """Test __getitem__ with invalid key raises KeyError."""
        file_path = tmp_path / "doc.txt"
        file_path.write_text("content")

        tf = TextFile(source=str(file_path))

        with pytest.raises(KeyError):
            _ = tf["wrong_key"]


class TestWrapFileSource:
    """Tests for wrap_file_source function."""

    def test_wrap_image_source(self) -> None:
        """Test wrapping image source returns ImageFile."""
        # Minimal valid PNG structure
        png_bytes = (
            b"\x89PNG\r\n\x1a\n"
            b"\x00\x00\x00\rIHDR"
            b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
            b"\x90wS\xde"
            b"\x00\x00\x00\x00IEND\xaeB`\x82"
        )
        source = FileBytes(data=png_bytes)

        result = wrap_file_source(source)

        assert isinstance(result, ImageFile)

    def test_wrap_pdf_source(self) -> None:
        """Test wrapping PDF source returns PDFFile."""
        source = FileBytes(data=b"%PDF-1.4 content")

        result = wrap_file_source(source)

        assert isinstance(result, PDFFile)

    def test_wrap_text_source(self) -> None:
        """Test wrapping text source returns TextFile."""
        source = FileBytes(data=b"plain text")

        result = wrap_file_source(source)

        assert isinstance(result, TextFile)


class TestNormalizeInputFiles:
    """Tests for normalize_input_files function."""

    def test_normalize_path_strings(self, tmp_path: Path) -> None:
        """Test normalizing path strings."""
        file1 = tmp_path / "doc1.txt"
        file2 = tmp_path / "doc2.txt"
        file1.write_text("content1")
        file2.write_text("content2")

        result = normalize_input_files([str(file1), str(file2)])

        assert "doc1.txt" in result
        assert "doc2.txt" in result

    def test_normalize_path_objects(self, tmp_path: Path) -> None:
        """Test normalizing Path objects."""
        file_path = tmp_path / "document.txt"
        file_path.write_text("content")

        result = normalize_input_files([file_path])

        assert "document.txt" in result

    def test_normalize_bytes(self) -> None:
        """Test normalizing raw bytes."""
        result = normalize_input_files([b"content1", b"content2"])

        assert "file_0" in result
        assert "file_1" in result

    def test_normalize_file_source(self) -> None:
        """Test normalizing FileSource objects."""
        source = FileBytes(data=b"content", filename="named.txt")

        result = normalize_input_files([source])

        assert "named.txt" in result

    def test_normalize_mixed_inputs(self, tmp_path: Path) -> None:
        """Test normalizing mixed input types."""
        file_path = tmp_path / "path.txt"
        file_path.write_text("from path")

        inputs = [
            str(file_path),
            b"raw bytes",
            FileBytes(data=b"source", filename="source.txt"),
        ]

        result = normalize_input_files(inputs)

        assert len(result) == 3
        assert "path.txt" in result
        assert "file_1" in result
        assert "source.txt" in result

    def test_empty_input(self) -> None:
        """Test normalizing empty input list."""
        result = normalize_input_files([])
        assert result == {}


class TestGenericFile:
    """Tests for the generic File class with auto-detection."""

    def test_file_from_text_bytes(self) -> None:
        """Test File creation from text bytes auto-detects content type."""
        f = File(source=b"Hello, World!")

        assert f.content_type == "text/plain"
        assert f.read() == b"Hello, World!"

    def test_file_from_png_bytes(self) -> None:
        """Test File creation from PNG bytes auto-detects image type."""
        png_bytes = (
            b"\x89PNG\r\n\x1a\n"
            b"\x00\x00\x00\rIHDR"
            b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
            b"\x90wS\xde"
            b"\x00\x00\x00\x00IEND\xaeB`\x82"
        )
        f = File(source=png_bytes)

        assert f.content_type == "image/png"

    def test_file_from_pdf_bytes(self) -> None:
        """Test File creation from PDF bytes auto-detects PDF type."""
        f = File(source=b"%PDF-1.4 content")

        assert f.content_type == "application/pdf"

    def test_file_from_path(self, tmp_path: Path) -> None:
        """Test File creation from path string."""
        file_path = tmp_path / "document.txt"
        file_path.write_text("file content")

        f = File(source=str(file_path))

        assert f.filename == "document.txt"
        assert f.read() == b"file content"
        assert f.content_type == "text/plain"

    def test_file_from_path_object(self, tmp_path: Path) -> None:
        """Test File creation from Path object."""
        file_path = tmp_path / "data.txt"
        file_path.write_text("path object content")

        f = File(source=file_path)

        assert f.filename == "data.txt"
        assert f.read_text() == "path object content"

    def test_file_read_text(self) -> None:
        """Test File.read_text method."""
        f = File(source=b"Text content here")

        assert f.read_text() == "Text content here"

    def test_file_dict_unpacking(self, tmp_path: Path) -> None:
        """Test File supports ** unpacking syntax."""
        file_path = tmp_path / "report.txt"
        file_path.write_text("report content")

        f = File(source=str(file_path))
        result = {**f}

        assert "report" in result
        assert result["report"] is f

    def test_file_dict_unpacking_no_filename(self) -> None:
        """Test File dict unpacking with bytes (no filename)."""
        f = File(source=b"content")
        result = {**f}

        assert "file" in result

    def test_file_keys_method(self, tmp_path: Path) -> None:
        """Test File keys() method."""
        file_path = tmp_path / "chart.png"
        file_path.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 50)

        f = File(source=str(file_path))

        assert f.keys() == ["chart"]

    def test_file_getitem(self, tmp_path: Path) -> None:
        """Test File __getitem__ with valid key."""
        file_path = tmp_path / "image.png"
        file_path.write_bytes(b"\x89PNG\r\n\x1a\n" + b"\x00" * 50)

        f = File(source=str(file_path))

        assert f["image"] is f

    def test_file_getitem_invalid_key(self, tmp_path: Path) -> None:
        """Test File __getitem__ with invalid key raises KeyError."""
        file_path = tmp_path / "doc.txt"
        file_path.write_text("content")

        f = File(source=str(file_path))

        with pytest.raises(KeyError):
            _ = f["wrong"]

    def test_file_with_stream(self) -> None:
        """Test File creation from stream."""
        stream = io.BytesIO(b"stream content")

        f = File(source=stream)

        assert f.read() == b"stream content"
        assert f.content_type == "text/plain"

    def test_file_default_mode(self) -> None:
        """Test File has default mode of 'auto'."""
        f = File(source=b"content")

        assert f.mode == "auto"

    def test_file_custom_mode(self) -> None:
        """Test File with a custom mode."""
        f = File(source=b"content", mode="strict")

        assert f.mode == "strict"

    def test_file_chunk_mode(self) -> None:
        """Test File with chunk mode."""
        f = File(source=b"content", mode="chunk")

        assert f.mode == "chunk"

    def test_image_file_with_mode(self) -> None:
        """Test ImageFile with custom mode."""
        png_bytes = (
            b"\x89PNG\r\n\x1a\n"
            b"\x00\x00\x00\rIHDR"
            b"\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00"
            b"\x90wS\xde"
            b"\x00\x00\x00\x00IEND\xaeB`\x82"
        )
        img = ImageFile(source=png_bytes, mode="strict")

        assert img.mode == "strict"
        assert img.content_type == "image/png"
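Taken together, the tests pin down the File ergonomics; a compact hedged sketch of that surface (the temp-file path is illustrative):

```python
from pathlib import Path
from tempfile import TemporaryDirectory

from crewai.files import File

# Content type is detected from magic bytes; no explicit type is needed.
pdf = File(source=b"%PDF-1.4 minimal")
assert pdf.content_type == "application/pdf"

with TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("remember the milk")
    note = File(source=path)
    # ** unpacking keys the file by its filename stem.
    files = {**note}
    assert "notes" in files and files["notes"] is note
```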
@@ -28,6 +28,7 @@ dev = [
     "boto3-stubs[bedrock-runtime]==1.40.54",
     "types-psycopg2==2.9.21.20251012",
     "types-pymysql==1.1.0.20250916",
+    "types-aiofiles~=24.1.0",
 ]
uv.lock (generated, 47 lines changed)
@@ -50,6 +50,7 @@ dev = [
     { name = "pytest-timeout", specifier = "==2.4.0" },
     { name = "pytest-xdist", specifier = "==3.8.0" },
     { name = "ruff", specifier = "==0.14.7" },
+    { name = "types-aiofiles", specifier = "~=24.1.0" },
     { name = "types-appdirs", specifier = "==1.4.*" },
     { name = "types-psycopg2", specifier = "==2.9.21.20251012" },
     { name = "types-pymysql", specifier = "==1.1.0.20250916" },
@@ -131,11 +132,11 @@ redis = [

 [[package]]
 name = "aiofiles"
-version = "25.1.0"
+version = "24.1.0"
 source = { registry = "https://pypi.org/simple" }
-sdist = { url = "https://files.pythonhosted.org/packages/41/c3/534eac40372d8ee36ef40df62ec129bee4fdb5ad9706e58a29be53b2c970/aiofiles-25.1.0.tar.gz", hash = "sha256:a8d728f0a29de45dc521f18f07297428d56992a742f0cd2701ba86e44d23d5b2", size = 46354, upload-time = "2025-10-09T20:51:04.358Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/0b/03/a88171e277e8caa88a4c77808c20ebb04ba74cc4681bf1e9416c862de237/aiofiles-24.1.0.tar.gz", hash = "sha256:22a075c9e5a3810f0c2e48f3008c94d68c65d763b9b03857924c99e57355166c", size = 30247, upload-time = "2024-06-24T11:02:03.584Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/bc/8a/340a1555ae33d7354dbca4faa54948d76d89a27ceef032c8c3bc661d003e/aiofiles-25.1.0-py3-none-any.whl", hash = "sha256:abe311e527c862958650f9438e859c1fa7568a141b22abcd015e120e86a85695", size = 14668, upload-time = "2025-10-09T20:51:03.174Z" },
+    { url = "https://files.pythonhosted.org/packages/a5/45/30bb92d442636f570cb5651bc661f52b610e2eec3f891a5dc3a4c3667db0/aiofiles-24.1.0-py3-none-any.whl", hash = "sha256:b4ec55f4195e3eb5d7abd1bf7e061763e864dd4954231fb8539a0ef8bb8260e5", size = 15896, upload-time = "2024-06-24T11:02:01.529Z" },
 ]

 [[package]]
@@ -1199,6 +1200,13 @@ docling = [
 embeddings = [
     { name = "tiktoken" },
 ]
+file-processing = [
+    { name = "aiocache" },
+    { name = "aiofiles" },
+    { name = "pillow" },
+    { name = "pypdf" },
+    { name = "python-magic" },
+]
 google-genai = [
     { name = "google-genai" },
 ]
@@ -1231,7 +1239,9 @@ watson = [
 requires-dist = [
     { name = "a2a-sdk", marker = "extra == 'a2a'", specifier = "~=0.3.10" },
     { name = "aiobotocore", marker = "extra == 'aws'", specifier = "~=2.25.2" },
+    { name = "aiocache", marker = "extra == 'file-processing'", specifier = "~=0.12.3" },
     { name = "aiocache", extras = ["memcached", "redis"], marker = "extra == 'a2a'", specifier = "~=0.12.3" },
+    { name = "aiofiles", marker = "extra == 'file-processing'", specifier = "~=24.1.0" },
     { name = "aiosqlite", specifier = "~=0.21.0" },
     { name = "anthropic", marker = "extra == 'anthropic'", specifier = "~=0.71.0" },
     { name = "appdirs", specifier = "~=1.4.4" },
@@ -1261,11 +1271,14 @@ requires-dist = [
     { name = "opentelemetry-sdk", specifier = "~=1.34.0" },
     { name = "pandas", marker = "extra == 'pandas'", specifier = "~=2.2.3" },
     { name = "pdfplumber", specifier = "~=0.11.4" },
+    { name = "pillow", marker = "extra == 'file-processing'", specifier = "~=10.4.0" },
     { name = "portalocker", specifier = "~=2.7.0" },
     { name = "pydantic", specifier = "~=2.11.9" },
     { name = "pydantic-settings", specifier = "~=2.10.1" },
     { name = "pyjwt", specifier = "~=2.9.0" },
+    { name = "pypdf", marker = "extra == 'file-processing'", specifier = "~=4.0.0" },
     { name = "python-dotenv", specifier = "~=1.1.1" },
+    { name = "python-magic", marker = "extra == 'file-processing'", specifier = ">=0.4.27" },
     { name = "qdrant-client", extras = ["fastembed"], marker = "extra == 'qdrant'", specifier = "~=1.14.3" },
     { name = "regex", specifier = "~=2024.9.11" },
     { name = "tiktoken", marker = "extra == 'embeddings'", specifier = "~=0.8.0" },
@@ -1275,7 +1288,7 @@ requires-dist = [
     { name = "uv", specifier = "~=0.9.13" },
     { name = "voyageai", marker = "extra == 'voyageai'", specifier = "~=0.3.5" },
 ]
-provides-extras = ["a2a", "anthropic", "aws", "azure-ai-inference", "bedrock", "docling", "embeddings", "google-genai", "litellm", "mem0", "openpyxl", "pandas", "qdrant", "tools", "voyageai", "watson"]
+provides-extras = ["a2a", "anthropic", "aws", "azure-ai-inference", "bedrock", "docling", "embeddings", "file-processing", "google-genai", "litellm", "mem0", "openpyxl", "pandas", "qdrant", "tools", "voyageai", "watson"]

 [[package]]
 name = "crewai-devtools"
@@ -6211,14 +6224,11 @@ wheels = [

 [[package]]
 name = "pypdf"
-version = "6.4.1"
+version = "4.0.2"
 source = { registry = "https://pypi.org/simple" }
-dependencies = [
-    { name = "typing-extensions", marker = "python_full_version < '3.11'" },
-]
-sdist = { url = "https://files.pythonhosted.org/packages/0c/e0/57f914ae9fedbc91fe3ebe74b78c88903943ec9c232b6da15947bb3bf8ab/pypdf-6.4.1.tar.gz", hash = "sha256:36eb0b52730fc3077d2b8d4122751e696d46af9ef9e5383db492df1ab0cc4647", size = 5275322, upload-time = "2025-12-07T14:19:27.922Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/5f/de/5ee74158c3090ec99eae9f90c9e9c18f207fa5c722b0e95d6fa7faebcdf8/pypdf-4.0.2.tar.gz", hash = "sha256:3316d9ddfcff5df67ae3cdfe8b945c432aa43e7f970bae7c2a4ab4fe129cd937", size = 280173, upload-time = "2024-02-18T15:45:10.729Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/db/ef/68c0f473d8b8764b23f199450dfa035e6f2206e67e9bde5dd695bab9bdf0/pypdf-6.4.1-py3-none-any.whl", hash = "sha256:1782ee0766f0b77defc305f1eb2bafe738a2ef6313f3f3d2ee85b4542ba7e535", size = 328325, upload-time = "2025-12-07T14:19:26.286Z" },
+    { url = "https://files.pythonhosted.org/packages/d7/87/30f8a2963247fd7b1267e600379c5e3f51c9849a07d042398e4485b7415c/pypdf-4.0.2-py3-none-any.whl", hash = "sha256:a62daa2a24d5a608ba1b6284dde185317ce3644f89b9ebe5314d0c5d1c9f257d", size = 283953, upload-time = "2024-02-18T15:45:07.857Z" },
 ]

 [[package]]
@@ -7604,12 +7614,14 @@ dependencies = [
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/be/f9/5e4491e5ccf42f5d9cfc663741d261b3e6e1683ae7812114e7636409fcc6/sqlalchemy-2.0.45.tar.gz", hash = "sha256:1632a4bda8d2d25703fdad6363058d882541bdaaee0e5e3ddfa0cd3229efce88", size = 9869912, upload-time = "2025-12-09T21:05:16.737Z" }
 wheels = [
+    { url = "https://files.pythonhosted.org/packages/fe/70/75b1387d72e2847220441166c5eb4e9846dd753895208c13e6d66523b2d9/sqlalchemy-2.0.45-cp310-cp310-macosx_11_0_arm64.whl", hash = "sha256:c64772786d9eee72d4d3784c28f0a636af5b0a29f3fe26ff11f55efe90c0bd85", size = 2154148, upload-time = "2025-12-10T20:03:21.023Z" },
     { url = "https://files.pythonhosted.org/packages/d8/a4/7805e02323c49cb9d1ae5cd4913b28c97103079765f520043f914fca4cb3/sqlalchemy-2.0.45-cp310-cp310-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7ae64ebf7657395824a19bca98ab10eb9a3ecb026bf09524014f1bb81cb598d4", size = 3233051, upload-time = "2025-12-09T22:06:04.768Z" },
     { url = "https://files.pythonhosted.org/packages/d7/ec/32ae09139f61bef3de3142e85c47abdee8db9a55af2bb438da54a4549263/sqlalchemy-2.0.45-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:0f02325709d1b1a1489f23a39b318e175a171497374149eae74d612634b234c0", size = 3232781, upload-time = "2025-12-09T22:09:54.435Z" },
     { url = "https://files.pythonhosted.org/packages/ad/bd/bf7b869b6f5585eac34222e1cf4405f4ba8c3b85dd6b1af5d4ce8bca695f/sqlalchemy-2.0.45-cp310-cp310-musllinux_1_2_aarch64.whl", hash = "sha256:d2c3684fca8a05f0ac1d9a21c1f4a266983a7ea9180efb80ffeb03861ecd01a0", size = 3182096, upload-time = "2025-12-09T22:06:06.169Z" },
     { url = "https://files.pythonhosted.org/packages/21/6a/c219720a241bb8f35c88815ccc27761f5af7fdef04b987b0e8a2c1a6dcaa/sqlalchemy-2.0.45-cp310-cp310-musllinux_1_2_x86_64.whl", hash = "sha256:040f6f0545b3b7da6b9317fc3e922c9a98fc7243b2a1b39f78390fc0942f7826", size = 3205109, upload-time = "2025-12-09T22:09:55.969Z" },
     { url = "https://files.pythonhosted.org/packages/bd/c4/6ccf31b2bc925d5d95fab403ffd50d20d7c82b858cf1a4855664ca054dce/sqlalchemy-2.0.45-cp310-cp310-win32.whl", hash = "sha256:830d434d609fe7bfa47c425c445a8b37929f140a7a44cdaf77f6d34df3a7296a", size = 2114240, upload-time = "2025-12-09T21:29:54.007Z" },
     { url = "https://files.pythonhosted.org/packages/de/29/a27a31fca07316def418db6f7c70ab14010506616a2decef1906050a0587/sqlalchemy-2.0.45-cp310-cp310-win_amd64.whl", hash = "sha256:0209d9753671b0da74da2cfbb9ecf9c02f72a759e4b018b3ab35f244c91842c7", size = 2137615, upload-time = "2025-12-09T21:29:55.85Z" },
+    { url = "https://files.pythonhosted.org/packages/a2/1c/769552a9d840065137272ebe86ffbb0bc92b0f1e0a68ee5266a225f8cd7b/sqlalchemy-2.0.45-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:2e90a344c644a4fa871eb01809c32096487928bd2038bf10f3e4515cb688cc56", size = 2153860, upload-time = "2025-12-10T20:03:23.843Z" },
     { url = "https://files.pythonhosted.org/packages/f3/f8/9be54ff620e5b796ca7b44670ef58bc678095d51b0e89d6e3102ea468216/sqlalchemy-2.0.45-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:b8c8b41b97fba5f62349aa285654230296829672fc9939cd7f35aab246d1c08b", size = 3309379, upload-time = "2025-12-09T22:06:07.461Z" },
     { url = "https://files.pythonhosted.org/packages/f6/2b/60ce3ee7a5ae172bfcd419ce23259bb874d2cddd44f67c5df3760a1e22f9/sqlalchemy-2.0.45-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_28_x86_64.whl", hash = "sha256:12c694ed6468333a090d2f60950e4250b928f457e4962389553d6ba5fe9951ac", size = 3309948, upload-time = "2025-12-09T22:09:57.643Z" },
     { url = "https://files.pythonhosted.org/packages/a3/42/bac8d393f5db550e4e466d03d16daaafd2bad1f74e48c12673fb499a7fc1/sqlalchemy-2.0.45-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:f7d27a1d977a1cfef38a0e2e1ca86f09c4212666ce34e6ae542f3ed0a33bc606", size = 3261239, upload-time = "2025-12-09T22:06:08.879Z" },
@@ -8201,6 +8213,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/00/22/35617eee79080a5d071d0f14ad698d325ee6b3bf824fc0467c03b30e7fa8/typer-0.19.2-py3-none-any.whl", hash = "sha256:755e7e19670ffad8283db353267cb81ef252f595aa6834a0d1ca9312d9326cb9", size = 46748, upload-time = "2025-09-23T09:47:46.777Z" },
 ]

+[[package]]
+name = "types-aiofiles"
+version = "24.1.0.20250822"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/19/48/c64471adac9206cc844afb33ed311ac5a65d2f59df3d861e0f2d0cad7414/types_aiofiles-24.1.0.20250822.tar.gz", hash = "sha256:9ab90d8e0c307fe97a7cf09338301e3f01a163e39f3b529ace82466355c84a7b", size = 14484, upload-time = "2025-08-22T03:02:23.039Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/bc/8e/5e6d2215e1d8f7c2a94c6e9d0059ae8109ce0f5681956d11bb0a228cef04/types_aiofiles-24.1.0.20250822-py3-none-any.whl", hash = "sha256:0ec8f8909e1a85a5a79aed0573af7901f53120dd2a29771dd0b3ef48e12328b0", size = 14322, upload-time = "2025-08-22T03:02:21.918Z" },
+]
+
 [[package]]
 name = "types-appdirs"
 version = "1.4.3.5"
@@ -8451,7 +8472,7 @@ local-inference = [

 [[package]]
 name = "unstructured-client"
-version = "0.42.4"
+version = "0.42.3"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "aiofiles" },
@@ -8462,9 +8483,9 @@ dependencies = [
     { name = "pypdf" },
     { name = "requests-toolbelt" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/a4/8f/43c9a936a153e62f18e7629128698feebd81d2cfff2835febc85377b8eb8/unstructured_client-0.42.4.tar.gz", hash = "sha256:144ecd231a11d091cdc76acf50e79e57889269b8c9d8b9df60e74cf32ac1ba5e", size = 91404, upload-time = "2025-11-14T16:59:25.131Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/96/45/0d605c1c4ed6e38845e9e7d95758abddc7d66e1d096ef9acdf2ecdeaf009/unstructured_client-0.42.3.tar.gz", hash = "sha256:a568d8b281fafdf452647d874060cd0647e33e4a19e811b4db821eb1f3051163", size = 91379, upload-time = "2025-08-12T20:48:04.937Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/5e/6c/7c69e4353e5bdd05fc247c2ec1d840096eb928975697277b015c49405b0f/unstructured_client-0.42.4-py3-none-any.whl", hash = "sha256:fc6341344dd2f2e2aed793636b5f4e6204cad741ff2253d5a48ff2f2bccb8e9a", size = 207863, upload-time = "2025-11-14T16:59:23.674Z" },
+    { url = "https://files.pythonhosted.org/packages/47/1c/137993fff771efc3d5c31ea6b6d126c635c7b124ea641531bca1fd8ea815/unstructured_client-0.42.3-py3-none-any.whl", hash = "sha256:14e9a6a44ed58c64bacd32c62d71db19bf9c2f2b46a2401830a8dfff48249d39", size = 207814, upload-time = "2025-08-12T20:48:03.638Z" },
 ]

 [[package]]