Compare commits

...

173 Commits

Author SHA1 Message Date
Greyson LaLonde
8f1e4d5f5c fix(tests): rename files to input_files in _prepare_kickoff test 2026-01-23 10:16:41 -05:00
Greyson LaLonde
07b2abfe44 chore: update test assumption 2026-01-23 10:07:27 -05:00
Greyson LaLonde
f7bd6292db feat(files): add PDF constraints for OpenAI and URL references for Bedrock 2026-01-23 09:59:52 -05:00
Greyson LaLonde
eff3a7d115 fix(tests): update mock_kickoff_fn signatures to accept input_files 2026-01-23 09:58:34 -05:00
Greyson LaLonde
ec116b6d42 fix(files): make FileInput available at runtime for Pydantic models 2026-01-23 09:37:07 -05:00
Greyson LaLonde
343ad02c88 feat(files): use FileInput type for all input_files parameters 2026-01-23 09:21:22 -05:00
Greyson LaLonde
cd0a2c3900 feat(files): standardize input_files parameter across all kickoff methods 2026-01-23 09:08:19 -05:00
Greyson LaLonde
f21751ffb8 fix(tests): convert crewai-tools search tool cassettes to new VCR format
Convert 5 cassettes from old VCR format (content, http_version, status_code)
to new format (body.string, status.code, status.message) to fix test failures.
2026-01-23 07:03:16 -05:00
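A minimal sketch of that cassette conversion, assuming only the field names quoted in the commit message; the actual migration script is not part of this diff, and the "OK" reason phrase is a stand-in since the old format carried none.

```python
import yaml

def convert_cassette(path: str) -> None:
    """Rewrite old-format VCR responses into the new nested format."""
    with open(path) as f:
        cassette = yaml.safe_load(f)
    for interaction in cassette.get("interactions", []):
        resp = interaction["response"]
        if "content" in resp:  # marker of the old flat format
            resp["body"] = {"string": resp.pop("content")}
            resp["status"] = {
                "code": resp.pop("status_code"),
                "message": "OK",  # old format carried no reason phrase
            }
            resp.pop("http_version", None)
    with open(path, "w") as f:
        yaml.safe_dump(cassette, f)
```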
Greyson LaLonde
57769fd8ff fix(tests): add tracing API requests to tracing test cassettes
Add mock tracing batch initialization requests to cassettes that were
missing them. The tests expect requests to fake.crewai.com (from .env.test)
but the cassettes only had OpenAI API requests.
2026-01-23 06:50:07 -05:00
Greyson LaLonde
15fd1bf898 fix(tests): fix multimodal and before_kickoff_callback tests
- Mock supports_multimodal() to return False in image tool test since
  AddImageTool is only added when LLM doesn't natively support multimodal
- Remove incorrect assertion that expected original inputs dict to be
  modified (it's copied internally before modification)
2026-01-23 06:39:09 -05:00
Greyson LaLonde
f89c39a480 fix(tests): convert VCR cassettes to correct format and fix file tests
- Convert 78 cassettes from old VCR format (content/status_code/http_version)
  to new format (body.string/status.code/status.message)
- Replace MagicMock with real File objects in file-related agent tests
2026-01-23 06:29:28 -05:00
Greyson LaLonde
8eea0e45eb fix(tests): use real File objects instead of MagicMock in file tests
Replace MagicMock with crewai_files.File instances in file-related
tests to satisfy Pydantic validation requirements.
2026-01-23 06:13:04 -05:00
Greyson LaLonde
2f87d2c1b6 chore: regen cassettes; make linter happy 2026-01-23 02:34:26 -05:00
Greyson LaLonde
6145dfdbe7 fix(crews): validate inputs type before dict conversion
- Add explicit type check for Mapping in prepare_kickoff to raise
  TypeError with clear message instead of ValueError from dict()
- Update test_kickoff_for_each_invalid_input to expect TypeError
- Fix test_multimodal_flag_adds_multimodal_tools to mock LLM's
  supports_multimodal() since AddImageTool is only added when
  the LLM doesn't natively support multimodal content
2026-01-23 02:25:12 -05:00
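A sketch of the described check, directly following the commit text: reject non-mapping inputs with a clear TypeError before dict() gets a chance to fail confusingly. The surrounding function body is simplified.

```python
from collections.abc import Mapping
from typing import Any

def prepare_kickoff(inputs: Any) -> dict[str, Any]:
    if not isinstance(inputs, Mapping):
        # Explicit type check: raise TypeError with a clear message instead
        # of the less obvious error dict() would raise downstream.
        raise TypeError(
            f"inputs must be a mapping, got {type(inputs).__name__}"
        )
    return dict(inputs)
```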
Greyson LaLonde
ceb2bdc7fb feat(files): add prefer_upload parameter to format_multimodal_content
Allow callers to force file uploads via the high-level API instead of
only triggering uploads based on file size thresholds. Useful for
testing and when file_id references are preferred over inline base64.
2026-01-23 02:19:12 -05:00
Greyson LaLonde
dc4bbfb5b9 feat(files): add api parameter to format_multimodal_content
Allows selecting OpenAIResponsesFormatter via api="responses" parameter
instead of always using Chat Completions format.
2026-01-23 02:06:09 -05:00
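A hypothetical usage of the two parameters added in the commits above; the File constructor shape and model name are assumptions, only the api and prefer_upload names come from the commit messages.

```python
from crewai import LLM
from crewai_files import File  # optional package introduced in this branch

llm = LLM(model="gpt-4o-mini")
report_pdf = File(path="report.pdf")  # constructor shape is an assumption

content = llm.format_multimodal_content(
    [report_pdf],
    api="responses",     # route through the Responses API formatter
    prefer_upload=True,  # force a file_id upload instead of inline base64
)
```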
Greyson LaLonde
c208ace3da feat(files): add files param to agent.kickoff() and async aliases 2026-01-23 02:01:11 -05:00
Greyson LaLonde
4ab53c0726 feat(files): add file_id upload support and text file handling
- Add VCR patch for binary request bodies (base64 encoding fallback)
- Add generate_filename() utility for UUID-based filenames with extension
- Add OpenAIResponsesFormatter for Responses API (input_image, input_file)
- Fix OpenAI uploader to use 'vision' purpose for images
- Fix Anthropic uploader to use tuple format (filename, content, content_type)
- Add TextConstraints and text support for Gemini
- Add file_id upload integration tests for Anthropic and OpenAI Responses API
2026-01-23 01:57:29 -05:00
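A minimal sketch of the generate_filename() utility mentioned above, assuming it derives the extension from a MIME type; the exact signature is not shown in the commit.

```python
import mimetypes
import uuid

def generate_filename(content_type: str) -> str:
    """Return a UUID-based filename with an extension guessed from the MIME type."""
    ext = mimetypes.guess_extension(content_type) or ""
    return f"{uuid.uuid4().hex}{ext}"

generate_filename("application/pdf")  # e.g. "3f2a9c...d41.pdf"
```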
Greyson LaLonde
7c9ce9ccd8 feat(openai): add Responses API support with auto-chaining and ZDR compliance
- Add full OpenAI Responses API support alongside existing Chat Completions API
- Implement auto_chain parameter to automatically track and pass previous_response_id
- Add auto_chain_reasoning for encrypted reasoning in ZDR (Zero Data Retention) scenarios
- Parse built-in tool outputs: web_search, file_search, computer_use, code_interpreter
- Support all Responses API parameters: reasoning, include, tools, truncation, etc.
- Add streaming support for Responses API with proper event handling
- Include 67 tests covering all new functionality
2026-01-23 01:53:15 -05:00
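A conceptual sketch of what auto_chain automates, using the real OpenAI Responses API: track each response id and pass it as previous_response_id on the next call. The bookkeeping here is illustrative, not CrewAI's implementation.

```python
from openai import OpenAI

client = OpenAI()
previous_response_id = None

for prompt in ["Summarize the report.", "Now list three risks."]:
    kwargs = {}
    if previous_response_id:
        kwargs["previous_response_id"] = previous_response_id  # chain onto prior turn
    response = client.responses.create(
        model="gpt-4.1-mini",
        input=prompt,
        **kwargs,
    )
    previous_response_id = response.id  # what auto_chain tracks for you
```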
Greyson LaLonde
c0f7a24e94 test: add gpt-5 and gpt-5-nano vision integration tests 2026-01-22 23:56:49 -05:00
Greyson LaLonde
01527b74f5 test: add gpt-5-mini vision integration test 2026-01-22 23:51:31 -05:00
Greyson LaLonde
a4f387bbe5 test: add o4-mini and gpt-4.1-mini vision integration tests 2026-01-22 23:35:15 -05:00
Greyson LaLonde
772a311f9d fix: preserve files during message summarization 2026-01-22 23:23:35 -05:00
Greyson LaLonde
661d4d29b2 test: add Gemini video and audio integration tests 2026-01-22 23:16:08 -05:00
Greyson LaLonde
f3efd2946a fix: filter empty VCR responses from stainless client 2026-01-22 23:10:15 -05:00
Greyson LaLonde
8b6337627c test: add async LLM usage test with stop parameter 2026-01-22 23:02:02 -05:00
Greyson LaLonde
4921b71e8b fix: re-record VCR cassette for async LLM test 2026-01-22 22:56:15 -05:00
Greyson LaLonde
8be27da9ff docs: add README and description for crewai-files package 2026-01-22 22:50:10 -05:00
Greyson LaLonde
2c5e794ea3 feat: allow LLM providers to pass clients to file uploaders
- Add get_file_uploader() method to BaseLLM (returns None by default)
- Implement get_file_uploader() in Anthropic, OpenAI, Gemini, Bedrock
- Pass both sync and async clients where applicable
- Update uploaders to accept optional pre-instantiated clients
- Update factory to pass through client parameters

This allows reusing authenticated LLM clients for file uploads,
avoiding redundant connections.
2026-01-22 22:44:05 -05:00
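A sketch of the hook described above; only get_file_uploader() comes from the commit, the class and attribute names around it are illustrative stand-ins.

```python
from typing import Any

class OpenAIFileUploader:  # stand-in for the real uploader class
    def __init__(self, client: Any = None, async_client: Any = None) -> None:
        self.client = client
        self.async_client = async_client

class BaseLLM:
    def get_file_uploader(self) -> Any:
        """Default: provider does not support managed file uploads."""
        return None

class OpenAILLM(BaseLLM):
    def __init__(self, client: Any, async_client: Any) -> None:
        self._client, self._async_client = client, async_client

    def get_file_uploader(self) -> Any:
        # Reuse the already-authenticated SDK clients rather than
        # opening fresh connections just for uploads.
        return OpenAIFileUploader(self._client, self._async_client)
```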
Greyson LaLonde
9a2b610b21 fix: handle optional crewai_files import in types.py 2026-01-22 22:33:58 -05:00
Greyson LaLonde
19d6a47d0c fix: support multimodal content in Bedrock message formatting
- Add format_text_content override for Bedrock's {"text": ...} format
- Handle pre-formatted list content in _format_messages_for_converse
- Update Bedrock tests to use Claude 3 Haiku for on-demand availability
- Add VCR cassettes for Bedrock multimodal tests
2026-01-22 22:27:58 -05:00
Greyson LaLonde
83bab3531b test: add Gemini multimodal integration test cassettes
Record VCR cassettes for Gemini multimodal tests and add missing
TextFile import.
2026-01-22 22:14:41 -05:00
Greyson LaLonde
11b50abbec test: add multimodal integration test cassettes
Record VCR cassettes for OpenAI, Anthropic, Azure, and LiteLLM
multimodal tests. Gemini and Bedrock tests remain but cassettes
will be generated when credentials are available.
2026-01-22 22:12:20 -05:00
Greyson LaLonde
a1cbb2f4e2 refactor: improve multimodal file handling architecture
- Make crewai_files an optional dependency with graceful fallbacks
- Move file formatting from executor to LLM layer (_process_message_files)
- Add files field to LLMMessage type for cleaner message passing
- Add cache_control to Anthropic content blocks for prompt caching
- Clean up formatters: static methods for OpenAI/Gemini, proper error handling
- Remove unused ContentFormatter protocol
- Move test fixtures to lib/crewai-files/tests/fixtures
- Add Azure and Bedrock multimodal integration tests
- Fix mypy errors in crew_agent_executor.py
2026-01-22 21:55:10 -05:00
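One common shape for the "optional dependency with graceful fallbacks" this refactor describes; a sketch under that assumption, with the formatting delegation elided.

```python
try:
    import crewai_files
except ImportError:  # graceful fallback when the optional package is absent
    crewai_files = None

def _process_message_files(message: dict) -> dict:
    """Format attached files at the LLM layer, or pass the message through."""
    if crewai_files is None or not message.get("files"):
        return message
    # ... delegate to crewai_files formatters here ...
    return message
```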
Greyson LaLonde
dc015b14f9 Merge branch 'main' into gl/feat/native-multimodal-files 2026-01-22 20:47:35 -05:00
Lorenze Jay
bd4d039f63 Lorenze/imp/native tool calling (#4258)
* wip restructuring agent executor and liteagent

* fix: handle None task in AgentExecutor to prevent errors

Added a check to ensure that if the task is None, the method returns early without attempting to access task properties. This change improves the robustness of the AgentExecutor by preventing potential errors when the task is not set.

* refactor: streamline AgentExecutor initialization by removing redundant parameters

Updated the Agent class to simplify the initialization of the AgentExecutor by removing unnecessary task and crew parameters in standalone mode. This change enhances code clarity and maintains backward compatibility by ensuring that the executor is correctly configured without redundant assignments.

* wip: clean

* ensure executors work inside a flow due to flow in flow async structure

* refactor: enhance agent kickoff preparation by separating common logic

Updated the Agent class to introduce a new private method, _prepare_kickoff, that consolidates the common setup logic for both synchronous and asynchronous kickoff executions. This change improves code clarity and maintainability by reducing redundancy in the kickoff process, while ensuring that the agent can still execute effectively within both standalone and flow contexts.

* linting and tests

* fix test

* refactor: improve test for Agent kickoff parameters

Updated the test for the Agent class to ensure that the kickoff method correctly preserves parameters. The test now verifies the configuration of the agent after kickoff, enhancing clarity and maintainability. Additionally, the test for asynchronous kickoff within a flow context has been updated to reflect the Agent class instead of LiteAgent.

* refactor: update test task guardrail process output for improved validation

Refactored the test for task guardrail process output to enhance the validation of the output against the OpenAPI schema. The changes include a more structured request body and updated response handling to ensure compliance with the guardrail requirements. This update aims to improve the clarity and reliability of the test cases, ensuring that task outputs are correctly validated and feedback is appropriately provided.

* test fix cassette

* test fix cassette

* working

* working cassette

* refactor: streamline agent execution and enhance flow compatibility

Refactored the Agent class to simplify the execution method by removing the event loop check and clarifying the behavior when called from synchronous and asynchronous contexts. The changes ensure that the method operates seamlessly within flow methods, improving clarity in the documentation. Additionally, updated the AgentExecutor to set the response model to None, enhancing flexibility. New test cassettes were added to validate the functionality of agents within flow contexts, ensuring robust testing for both synchronous and asynchronous operations.

* fixed cassette

* Enhance Flow Execution Logic

- Introduced conditional execution for start methods in the Flow class.
- Unconditional start methods are prioritized during kickoff, while conditional starts are executed only if no unconditional starts are present.
- Improved handling of cyclic flows by allowing re-execution of conditional start methods triggered by routers.
- Added checks to continue execution chains for completed conditional starts.

These changes improve the flexibility and control of flow execution, ensuring that the correct methods are triggered based on the defined conditions.

* Enhance Agent and Flow Execution Logic

- Updated the Agent class to automatically detect the event loop and return a coroutine when called within a Flow, simplifying async handling for users.
- Modified Flow class to execute listeners sequentially, preventing race conditions on shared state during listener execution.
- Improved handling of coroutine results from synchronous methods, ensuring proper execution flow and state management.

These changes enhance the overall execution logic and user experience when working with agents and flows in CrewAI.

* Enhance Flow Listener Logic and Agent Imports

- Updated the Flow class to track fired OR listeners, ensuring that multi-source OR listeners only trigger once during execution. This prevents redundant executions and improves flow efficiency.
- Cleared fired OR listeners during cyclic flow resets to allow re-execution in new cycles.
- Modified the Agent class imports to include Coroutine from collections.abc, enhancing type handling for asynchronous operations.

These changes improve the control and performance of flow execution in CrewAI, ensuring more predictable behavior in complex scenarios.

* adjusted test due to new cassette

* ensure native tool calling works with liteagent

* ensure response model is respected

* Enhance Tool Name Handling for LLM Compatibility

- Added a new sanitize_tool_name function to replace invalid characters in function names with underscores, ensuring compatibility with LLM providers.
- Updated tool validation to sanitize tool names before validating.
- Updated tool registration to use the sanitized names.

These changes improve the robustness of tool name handling, preventing potential issues with invalid characters in function names.

* ensure we don't finalize batch on just a liteagent finishing

* max tools per turn wip and ensure we drop print times

* fix sync main issues

* fix llm_call_completed event serialization issue

* drop max_tools_iterations

* for fixing model dump with state

* Add extract_tool_call_info function to handle various tool call formats

- Introduced a new utility function, extract_tool_call_info, to extract tool call ID, name, and arguments from different provider formats (OpenAI, Gemini, Anthropic, and dictionary).
- This enhancement improves the flexibility and compatibility of tool calls across multiple LLM providers, ensuring consistent handling of tool call information.
- The function returns a tuple containing the call ID, function name, and function arguments, or None if the format is unrecognized.

* Refactor AgentExecutor to support batch execution of native tool calls

- Updated the executor to process all requested tool calls in a single batch, enhancing efficiency and reducing the number of interactions with the LLM.
- Introduced a new utility function, extract_tool_call_info, to streamline the extraction of tool call details, improving compatibility with various tool formats.
- Removed a redundant parameter, simplifying the initialization of the AgentExecutor.
- Enhanced logging and message handling to provide clearer insights during tool execution.
- This refactor improves the overall performance and usability of the agent execution flow.

* Update English translations for tool usage and reasoning instructions

- Revised the `post_tool_reasoning` message to clarify the analysis process after tool usage, emphasizing the need to provide only the final answer if requirements are met.
- Updated the `format` message to simplify the instructions for deciding between using a tool or providing a final answer, enhancing clarity for users.
- These changes improve the overall user experience by providing clearer guidance on task execution and response formatting.

* fix

* fixing azure tests

* organize imports

* dropped unused

* Remove debug print statements from AgentExecutor to clean up the code and improve readability. This change enhances the overall performance of the agent execution flow by eliminating unnecessary console output during LLM calls and iterations.

* linted

* updated cassette

* regen cassette

* revert crew agent executor

* adjust cassettes and dropped tests due to native tool implementation

* adjust

* ensure we properly fail tools and emit their events

* Enhance tool handling and delegation tracking in agent executors

- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.

This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.

* Enhance tool handling and delegation tracking in agent executors

- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.

This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.

* fix cassettes

* fix

* regen cassettes

* regen gemini

* ensure we support bedrock

* supporting bedrock

* regen azure cassettes

* Implement max usage count tracking for tools in agent executors

- Added functionality to check if a tool has reached its maximum usage count before execution in both crew_agent_executor.py and agent_executor.py.
- Enhanced error handling to return a message when a tool's usage limit is reached.
- Updated tool usage logic in tool_usage.py to increment usage counts and print current usage status.
- Introduced tests to validate max usage count behavior for native tool calling, ensuring proper enforcement and tracking.

This update improves tool management by preventing overuse and providing clear feedback when limits are reached.

* fix other test

* fix test

* drop logs

* better tests

* regen

* regen all azure cassettes

* regen again placeholder for cassette matching

* fix: unify tool name sanitization across codebase

* fix: include tool role messages in save_last_messages

* fix: update sanitize_tool_name test expectations

Align test expectations with unified sanitize_tool_name behavior
that lowercases and splits camelCase for LLM provider compatibility.

* fix: apply sanitize_tool_name consistently across codebase

Unify tool name sanitization to ensure consistency between tool names
shown to LLMs and tool name matching/lookup logic.

* regen

* fix: sanitize tool names in native tool call processing

- Update extract_tool_call_info to return sanitized tool names
- Fix delegation tool name matching to use sanitized names
- Add sanitization in crew_agent_executor tool call extraction
- Add sanitization in experimental agent_executor
- Add sanitization in LLM.call function lookup
- Update streaming utility to use sanitized names
- Update base_agent_executor_mixin delegation check

* Extract text content from parts directly to avoid warning about non-text parts

* Add test case for Gemini token usage tracking

- Introduced a new YAML cassette for tracking token usage in Gemini API responses.
- Updated the test for Gemini to validate token usage metrics and response content.
- Ensured proper integration with the Gemini model and API key handling.

---------

Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-22 17:44:03 -08:00
Greyson LaLonde
7d03758c83 fix: use typing_extensions.TypedDict for Python < 3.12 compatibility 2026-01-22 20:27:50 -05:00
Greyson LaLonde
bbdb383529 Merge branch 'lorenze/imp/native-tool-calling' into gl/feat/native-multimodal-files 2026-01-22 20:22:30 -05:00
Greyson LaLonde
4bd32f6626 fix: import Self from typing_extensions for Python 3.10 compatibility 2026-01-22 20:16:25 -05:00
lorenzejay
69bc9a5897 Extract text content from parts directly to avoid warning about non-text parts 2026-01-22 17:12:17 -08:00
Greyson LaLonde
4decb15c61 Merge branch 'lorenze/imp/native-tool-calling' into gl/feat/native-multimodal-files 2026-01-22 20:09:53 -05:00
Greyson LaLonde
80f7410683 Merge branch 'lorenze/imp/native-tool-calling' of https://github.com/crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-22 20:05:41 -05:00
Greyson LaLonde
b104d64b39 fix: sanitize tool names in native tool call processing
- Update extract_tool_call_info to return sanitized tool names
- Fix delegation tool name matching to use sanitized names
- Add sanitization in crew_agent_executor tool call extraction
- Add sanitization in experimental agent_executor
- Add sanitization in LLM.call function lookup
- Update streaming utility to use sanitized names
- Update base_agent_executor_mixin delegation check
2026-01-22 20:05:20 -05:00
lorenzejay
a54005459d Merge branch 'lorenze/imp/native-tool-calling' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-22 16:53:16 -08:00
lorenzejay
ec3a65b529 regen 2026-01-22 16:53:00 -08:00
Greyson LaLonde
242757f67b fix: apply sanitize_tool_name consistently across codebase
Unify tool name sanitization to ensure consistency between tool names
shown to LLMs and tool name matching/lookup logic.
2026-01-22 19:52:25 -05:00
Greyson LaLonde
8310ca1369 fix: update sanitize_tool_name test expectations
Align test expectations with unified sanitize_tool_name behavior
that lowercases and splits camelCase for LLM provider compatibility.
2026-01-22 19:17:42 -05:00
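A sketch matching the behavior this test commit describes (lowercase, split camelCase, replace invalid characters with underscores); the exact rules of the real sanitize_tool_name are assumptions beyond that description.

```python
import re

def sanitize_tool_name(name: str) -> str:
    # Split camelCase boundaries: "WebSearchTool" -> "Web_Search_Tool"
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Replace anything outside [a-zA-Z0-9_-] with underscores
    name = re.sub(r"[^a-zA-Z0-9_-]", "_", name)
    return name.lower()

assert sanitize_tool_name("WebSearch Tool!") == "web_search_tool_"
```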
Greyson LaLonde
1b006beedc Merge branch 'main' into gl/feat/native-multimodal-files 2026-01-22 19:14:55 -05:00
Greyson LaLonde
edae4e889c fix: include tool role messages in save_last_messages 2026-01-22 19:10:33 -05:00
Greyson LaLonde
846133310b fix: unify tool name sanitization across codebase 2026-01-22 19:01:14 -05:00
lorenzejay
e9ca6e89d8 regen again placeholder for cassette matching 2026-01-22 14:49:07 -08:00
lorenzejay
11c96d6e3c regen all azure cassettes 2026-01-22 14:14:21 -08:00
lorenzejay
f1bad9c748 regen 2026-01-22 14:09:16 -08:00
lorenzejay
73963b8e65 better tests 2026-01-22 14:08:02 -08:00
lorenzejay
249b118e9e drop logs 2026-01-22 13:54:27 -08:00
lorenzejay
51c5973033 fix test 2026-01-22 13:52:38 -08:00
lorenzejay
c7a83c8c36 fix other test 2026-01-22 13:47:52 -08:00
lorenzejay
ba15fbf8ea Implement max usage count tracking for tools in agent executors
- Added functionality to check if a tool has reached its maximum usage count before execution in both crew_agent_executor.py and agent_executor.py.
- Enhanced error handling to return a message when a tool's usage limit is reached.
- Updated tool usage logic in tool_usage.py to increment usage counts and print current usage status.
- Introduced tests to validate max usage count behavior for native tool calling, ensuring proper enforcement and tracking.

This update improves tool management by preventing overuse and providing clear feedback when limits are reached.
2026-01-22 13:47:36 -08:00
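An illustrative enforcement of the per-tool usage cap described above, assuming a max_usage_count attribute and a usage counter on the tool wrapper; the message text and attribute names are stand-ins.

```python
def execute_tool(tool, *args, **kwargs):
    limit = getattr(tool, "max_usage_count", None)
    if limit is not None and tool.current_usage_count >= limit:
        # Return a clear message instead of running the tool again.
        return f"Tool '{tool.name}' has reached its usage limit of {limit}."
    tool.current_usage_count += 1
    return tool.run(*args, **kwargs)
```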
lorenzejay
90ab4d2527 regen azure cassettes 2026-01-22 13:40:10 -08:00
lorenzejay
65746137fe supporting bedrock 2026-01-22 13:31:56 -08:00
lorenzejay
89e961e08e ensure we support bedrock 2026-01-22 13:29:49 -08:00
lorenzejay
a61cfb258f regen gemini 2026-01-22 13:02:14 -08:00
Greyson LaLonde
ca07114bcf refactor: centralize multimodal formatting in crewai_files 2026-01-22 15:59:55 -05:00
lorenzejay
2f300bf86e regen cassettes 2026-01-22 12:49:56 -08:00
lorenzejay
f3951cb09d fix 2026-01-22 12:36:05 -08:00
Greyson LaLonde
b95a3a9bc8 refactor: extract files module to standalone crewai-files package 2026-01-22 15:06:20 -05:00
lorenzejay
77697c3ad9 fix cassettes 2026-01-22 11:43:07 -08:00
lorenzejay
a0fad289c5 Enhance tool handling and delegation tracking in agent executors
- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.

This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.
2026-01-22 11:42:52 -08:00
lorenzejay
458f6867f0 Enhance tool handling and delegation tracking in agent executors
- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.

This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.
2026-01-22 11:35:27 -08:00
Greyson LaLonde
a064b84ead feat: add URL file source support for multimodal content 2026-01-22 14:18:16 -05:00
Greyson LaLonde
4d0b6d834c test: add real video file tests for duration detection 2026-01-22 14:08:40 -05:00
Greyson LaLonde
9be88e05ee feat: add format hints to audio/video duration detection 2026-01-22 14:02:55 -05:00
lorenzejay
d0af4c6331 ensure we properly fail tools and emit their events 2026-01-22 10:36:11 -08:00
lorenzejay
0d4ff5d80c adjust 2026-01-22 10:18:35 -08:00
lorenzejay
a6a0bf6412 adjust cassettes and dropped tests due to native tool implementation 2026-01-22 10:15:18 -08:00
lorenzejay
bffe5aa877 revert crew agent executor 2026-01-22 09:28:00 -08:00
Greyson LaLonde
9fec81f976 refactor: improve factory typing with specific provider and uploader types 2026-01-22 12:22:28 -05:00
lorenzejay
85096ca086 regen cassette 2026-01-22 09:17:21 -08:00
lorenzejay
0d62d8dc0c updated cassette 2026-01-22 09:07:37 -08:00
lorenzejay
21911d2de5 linted 2026-01-22 08:49:03 -08:00
Greyson LaLonde
6147d4eb2e refactor: reorganize files module with centralized constants and utilities 2026-01-22 11:46:17 -05:00
lorenzejay
b40780f220 Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-22 08:38:57 -08:00
Vini Brasil
06d953bf46 Add model field to LLM failed events (#4267)
Move the `model` field from `LLMCallStartedEvent` and
`LLMCallCompletedEvent` to the base `LLMEventBase` class.
2026-01-22 16:19:18 +01:00
Greyson LaLonde
e2a5177da2 refactor: consolidate FileInput and MIME type definitions 2026-01-22 10:15:32 -05:00
Greyson LaLonde
da930fa1df refactor: extract helper functions to reduce code duplication 2026-01-22 09:52:23 -05:00
Greyson LaLonde
0a250a45ce refactor: fix IDE warnings and add Literal types to constraints
- Add Literal types for ImageFormat, AudioFormat, VideoFormat, ProviderName
- Convert methods to @staticmethod where appropriate
- Remove redundant default parameter values
- Fix variable shadowing in nested functions
- Make magic import optional with mimetypes fallback
- Add docstrings to inner functions
2026-01-22 02:54:29 -05:00
Greyson LaLonde
1353cb2a33 feat: add streaming uploads for large files
- OpenAI: Use Uploads API for files > 512MB with chunked streaming
- Gemini: Pass file path directly to SDK for FilePath sources
- Bedrock: Use upload_fileobj with TransferConfig for automatic multipart
2026-01-22 02:10:15 -05:00
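The Bedrock path uses real boto3 APIs (upload_fileobj plus TransferConfig for automatic multipart); a sketch along those lines, with example bucket/key names and threshold values.

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
config = TransferConfig(
    multipart_threshold=8 * 1024 * 1024,  # switch to multipart above 8 MB
    multipart_chunksize=8 * 1024 * 1024,
)

with open("large_video.mp4", "rb") as f:
    # upload_fileobj streams the file and handles multipart chunking itself
    s3.upload_fileobj(f, "my-bucket", "uploads/large_video.mp4", Config=config)
```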
Greyson LaLonde
5550c6df7e feat: promote files to first-class crewai.files package 2026-01-22 01:39:04 -05:00
lorenzejay
d9e4a2345b Remove debug print statements from AgentExecutor to clean up the code and improve readability. This change enhances the overall performance of the agent execution flow by eliminating unnecessary console output during LLM calls and iterations. 2026-01-21 17:53:35 -08:00
Greyson LaLonde
204a1cece7 chore: move file processing deps to optional dependencies 2026-01-21 20:52:15 -05:00
Greyson LaLonde
4c0d99601c chore: remove unnecessary comments and fix type errors
- Remove unnecessary block and inline comments from file utilities
- Fix mypy errors by using file.read() instead of file.source.read()
2026-01-21 20:40:13 -05:00
Greyson LaLonde
e2c517d0a2 feat: export file types and deprecate agent multimodal flag
- Export File type classes from crewai package
- Mark Agent.multimodal field as deprecated (use input_files instead)
2026-01-21 20:14:43 -05:00
Greyson LaLonde
af4523b2a1 chore: add file processing dependencies
- Add python-magic and aiocache to core dependencies
- Add optional image-processing group (Pillow)
- Add optional pdf-processing group (pypdf)
- Add optional file-processing group (both)
2026-01-21 20:13:26 -05:00
Greyson LaLonde
1fe020fa6f test: add file utilities tests
- Add tests for file processing constraints and validators
- Add tests for FileProcessor and FileResolver
- Add tests for resolved file types
- Add tests for file store operations
- Add unit tests for multimodal LLM support
2026-01-21 20:12:57 -05:00
Greyson LaLonde
b035aa8947 feat: add ReadFileTool for agent file access
- Create read_file tool for agents to access attached files
- Support reading by file name from crew/task file store
- Add unit tests for ReadFileTool
2026-01-21 20:12:11 -05:00
Greyson LaLonde
4ed5e4ca0e feat: add input_files support to Task and Crew
- Add input_files parameter to Task for file attachments
- Add file_handling mode to Crew for processing behavior
- Integrate file injection in CrewAgentExecutor
- Update prepare_kickoff to handle KickoffInputs type
2026-01-21 20:11:05 -05:00
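A hypothetical usage of the parameters this commit introduces; input_files and file_handling come from the commit message, while the "auto" mode value and the report_pdf file instance (a crewai-files File, constructed elsewhere) are assumptions.

```python
from crewai import Agent, Crew, Task

analyst = Agent(role="Analyst", goal="Summarize documents", backstory="...")
task = Task(
    description="Summarize the attached report.",
    expected_output="A one-paragraph summary.",
    agent=analyst,
    input_files=[report_pdf],  # a crewai-files File instance, defined elsewhere
)
crew = Crew(agents=[analyst], tasks=[task], file_handling="auto")  # mode value assumed
result = crew.kickoff()
```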
Greyson LaLonde
771eccfcdf feat: add multimodal support to LLM providers
- Add format_multimodal_content() to all LLM providers
- Support inline base64 and file reference formats
- Add FileResolver integration for upload caching
- Add module exports for files package
2026-01-21 20:05:33 -05:00
Greyson LaLonde
50728b10e8 fix: resolve mypy type errors in file utilities 2026-01-21 19:43:46 -05:00
Greyson LaLonde
42ca4eacff feat: upgrade upload cache to aiocache with atexit cleanup 2026-01-21 19:35:56 -05:00
Greyson LaLonde
d8ebfe7ee0 feat: add module exports and file store 2026-01-21 19:28:40 -05:00
lorenzejay
422374a881 dropped unused 2026-01-21 16:06:33 -08:00
lorenzejay
659589e8ae organize imports 2026-01-21 16:00:54 -08:00
lorenzejay
97766b3c58 fixing azure tests 2026-01-21 16:00:32 -08:00
Greyson LaLonde
8cf0cfa2b7 feat: add prompt caching support for Anthropic 2026-01-21 18:46:06 -05:00
Greyson LaLonde
3ad0af4934 feat: add file resolver for inline vs upload decisions 2026-01-21 18:41:34 -05:00
Greyson LaLonde
56946d309b feat: add provider file uploaders 2026-01-21 18:38:04 -05:00
Greyson LaLonde
5200ed4372 feat: add file upload cache 2026-01-21 18:37:22 -05:00
Greyson LaLonde
301a1da047 feat: add file processing infrastructure 2026-01-21 18:30:14 -05:00
Greyson LaLonde
22f1e21d69 feat: add core file types and content detection 2026-01-21 18:23:36 -05:00
lorenzejay
87088171d4 fix 2026-01-21 14:59:47 -08:00
lorenzejay
1757559a3d Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-21 13:59:44 -08:00
lorenzejay
6c5d6fb70c Update English translations for tool usage and reasoning instructions
- Revised the `post_tool_reasoning` message to clarify the analysis process after tool usage, emphasizing the need to provide only the final answer if requirements are met.
- Updated the `format` message to simplify the instructions for deciding between using a tool or providing a final answer, enhancing clarity for users.
- These changes improve the overall user experience by providing clearer guidance on task execution and response formatting.
2026-01-21 13:03:35 -08:00
lorenzejay
56dd2f82a4 Refactor AgentExecutor to support batch execution of native tool calls
- Updated the executor to process all requested tool calls in a single batch, enhancing efficiency and reducing the number of interactions with the LLM.
- Introduced a new utility function, extract_tool_call_info, to streamline the extraction of tool call details, improving compatibility with various tool formats.
- Removed a redundant parameter, simplifying the initialization of the AgentExecutor.
- Enhanced logging and message handling to provide clearer insights during tool execution.
- This refactor improves the overall performance and usability of the agent execution flow.
2026-01-21 13:03:06 -08:00
lorenzejay
e562a06836 Add extract_tool_call_info function to handle various tool call formats
- Introduced a new utility function, extract_tool_call_info, to extract tool call ID, name, and arguments from different provider formats (OpenAI, Gemini, Anthropic, and dictionary).
- This enhancement improves the flexibility and compatibility of tool calls across multiple LLM providers, ensuring consistent handling of tool call information.
- The function returns a tuple containing the call ID, function name, and function arguments, or None if the format is unrecognized.
2026-01-21 13:02:22 -08:00
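A sketch of that normalizer: the dict branch follows the commit's stated return contract, while the provider-object branch is a simplified assumption about OpenAI-style tool call objects.

```python
from typing import Any

def extract_tool_call_info(tool_call: Any) -> tuple[str, str, str] | None:
    """Return (call_id, function_name, function_arguments), or None if unrecognized."""
    if isinstance(tool_call, dict):  # plain-dict format
        fn = tool_call.get("function", {})
        if "name" in fn:
            return tool_call.get("id", ""), fn["name"], fn.get("arguments", "{}")
        return None
    func = getattr(tool_call, "function", None)  # OpenAI-style object
    if func is not None:
        return getattr(tool_call, "id", ""), func.name, func.arguments
    return None
```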
lorenzejay
edd1fd73cd for fixing model dump with state 2026-01-21 13:01:42 -08:00
lorenzejay
b0abf169b0 drop max_tools_iterations 2026-01-21 13:00:23 -08:00
lorenzejay
7d5a64af0d fix llm_call_completed event serialization issue 2026-01-21 12:59:28 -08:00
Greyson LaLonde
f997b73577 fix: bump mcp to ~=1.23.1
- resolves [cve](https://nvd.nist.gov/vuln/detail/CVE-2025-66416)
2026-01-21 12:43:48 -05:00
Greyson LaLonde
7a65baeb9c feat: add event ordering and parent-child hierarchy
adds emission sequencing, parent-child event hierarchy with scope management, and integrates both into the event bus. introduces flush() for deterministic handling, resets emission counters for test isolation, and adds chain tracking via previous_event_id/triggered_by_event_id plus context variables populated during emit and listener execution. includes tracing listener typing/sorting improvements, safer tool event pairing with try/finally, additional stack checks and cache-hit formatting, context isolation fixes, cassette regen/decoding, and test updates to handle vcr race conditions and flaky behavior.
2026-01-21 11:12:10 -05:00
lorenzejay
d6e04ba24d fix sync main issues 2026-01-21 07:40:27 -08:00
lorenzejay
1b67629149 Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-20 22:05:33 -08:00
Lorenze Jay
741bf12bf4 Lorenze/enh decouple executor from crew (#4209)
* wip restructuring agent executor and liteagent

* fix: handle None task in AgentExecutor to prevent errors

Added a check to ensure that if the task is None, the method returns early without attempting to access task properties. This change improves the robustness of the AgentExecutor by preventing potential errors when the task is not set.

* refactor: streamline AgentExecutor initialization by removing redundant parameters

Updated the Agent class to simplify the initialization of the AgentExecutor by removing unnecessary task and crew parameters in standalone mode. This change enhances code clarity and maintains backward compatibility by ensuring that the executor is correctly configured without redundant assignments.

* ensure executors work inside a flow due to flow in flow async structure

* refactor: enhance agent kickoff preparation by separating common logic

Updated the Agent class to introduce a new private method, _prepare_kickoff, that consolidates the common setup logic for both synchronous and asynchronous kickoff executions. This change improves code clarity and maintainability by reducing redundancy in the kickoff process, while ensuring that the agent can still execute effectively within both standalone and flow contexts.

* linting and tests

* fix test

* refactor: improve test for Agent kickoff parameters

Updated the test for the Agent class to ensure that the kickoff method correctly preserves parameters. The test now verifies the configuration of the agent after kickoff, enhancing clarity and maintainability. Additionally, the test for asynchronous kickoff within a flow context has been updated to reflect the Agent class instead of LiteAgent.

* refactor: update test task guardrail process output for improved validation

Refactored the test for task guardrail process output to enhance the validation of the output against the OpenAPI schema. The changes include a more structured request body and updated response handling to ensure compliance with the guardrail requirements. This update aims to improve the clarity and reliability of the test cases, ensuring that task outputs are correctly validated and feedback is appropriately provided.

* test fix cassette

* test fix cassette

* working

* working cassette

* refactor: streamline agent execution and enhance flow compatibility

Refactored the Agent class to simplify the execution method by removing the event loop check and clarifying the behavior when called from synchronous and asynchronous contexts. The changes ensure that the method operates seamlessly within flow methods, improving clarity in the documentation. Additionally, updated the AgentExecutor to set the response model to None, enhancing flexibility. New test cassettes were added to validate the functionality of agents within flow contexts, ensuring robust testing for both synchronous and asynchronous operations.

* fixed cassette

* Enhance Flow Execution Logic

- Introduced conditional execution for start methods in the Flow class.
- Unconditional start methods are prioritized during kickoff, while conditional starts are executed only if no unconditional starts are present.
- Improved handling of cyclic flows by allowing re-execution of conditional start methods triggered by routers.
- Added checks to continue execution chains for completed conditional starts.

These changes improve the flexibility and control of flow execution, ensuring that the correct methods are triggered based on the defined conditions.

* Enhance Agent and Flow Execution Logic

- Updated the Agent class to automatically detect the event loop and return a coroutine when called within a Flow, simplifying async handling for users.
- Modified Flow class to execute listeners sequentially, preventing race conditions on shared state during listener execution.
- Improved handling of coroutine results from synchronous methods, ensuring proper execution flow and state management.

These changes enhance the overall execution logic and user experience when working with agents and flows in CrewAI.

* Enhance Flow Listener Logic and Agent Imports

- Updated the Flow class to track fired OR listeners, ensuring that multi-source OR listeners only trigger once during execution. This prevents redundant executions and improves flow efficiency.
- Cleared fired OR listeners during cyclic flow resets to allow re-execution in new cycles.
- Modified the Agent class imports to include Coroutine from collections.abc, enhancing type handling for asynchronous operations.

These changes improve the control and performance of flow execution in CrewAI, ensuring more predictable behavior in complex scenarios.

* adjusted test due to new cassette

* ensure we don't finalize batch on just a liteagent finishing

* feat: cancellable parallelized flow methods

* feat: allow methods to be cancelled & run parallelized

* feat: ensure state is thread safe through proxy

* fix: check for proxy state

* fix: mimic BaseModel method

* chore: update final attr checks; test

* better description

* fix test

* chore: update test assumptions

* extra

---------

Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-20 21:44:45 -08:00
lorenzejay
b49e42af05 max tools per turn wip and ensure we drop print times 2026-01-20 16:46:38 -08:00
lorenzejay
3472cb4f8a Merge branch 'lorenze/enh-decouple-executor-from-crew' into lorenze/imp/native-tool-calling 2026-01-20 13:26:08 -08:00
lorenzejay
63a33cf01c ensure we don't finalize batch on just a liteagent finishing 2026-01-20 13:23:27 -08:00
lorenzejay
9de0e7cb13 Enhance Tool Name Handling for LLM Compatibility
- Added a new sanitize_tool_name function to replace invalid characters in function names with underscores, ensuring compatibility with LLM providers.
- Updated tool validation to sanitize tool names before validating.
- Updated tool registration to use the sanitized names.

These changes improve the robustness of tool name handling, preventing potential issues with invalid characters in function names.
2026-01-20 13:17:25 -08:00
lorenzejay
4c1f86b32f ensure response model is respected 2026-01-20 11:11:56 -08:00
lorenzejay
822d1f9997 ensure native tool calling works with liteagent 2026-01-20 10:59:57 -08:00
lorenzejay
bfc15ef4bd merged lorenze/enh-decouple-executor-from-crew 2026-01-20 10:41:28 -08:00
lorenzejay
edcf3e3e36 Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-20 10:20:18 -08:00
lorenzejay
33d87bdf0f adjusted test due to new cassette 2026-01-20 10:16:00 -08:00
lorenzejay
c16f1dd801 Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/enh-decouple-executor-from-crew 2026-01-20 10:02:33 -08:00
Lorenze Jay
b267bb4054 Lorenze/fix google vertex api using api keys (#4243)
* supporting vertex through api key use - expo mode

* docs update here

* docs translations

---------

Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-20 09:34:36 -08:00
Greyson LaLonde
ceef062426 feat: add additional a2a events and enrich event metadata
2026-01-16 16:57:31 -05:00
lorenzejay
64052745b7 Enhance Flow Listener Logic and Agent Imports
- Updated the Flow class to track fired OR listeners, ensuring that multi-source OR listeners only trigger once during execution. This prevents redundant executions and improves flow efficiency.
- Cleared fired OR listeners during cyclic flow resets to allow re-execution in new cycles.
- Modified the Agent class imports to include Coroutine from collections.abc, enhancing type handling for asynchronous operations.

These changes improve the control and performance of flow execution in CrewAI, ensuring more predictable behavior in complex scenarios.
2026-01-15 16:12:13 -08:00
lorenzejay
7f7b5094cc Enhance Agent and Flow Execution Logic
- Updated the Agent class to automatically detect the event loop and return a coroutine when called within a Flow, simplifying async handling for users.
- Modified Flow class to execute listeners sequentially, preventing race conditions on shared state during listener execution.
- Improved handling of coroutine results from synchronous methods, ensuring proper execution flow and state management.

These changes enhance the overall execution logic and user experience when working with agents and flows in CrewAI.
2026-01-15 15:51:39 -08:00
lorenzejay
67d681bc6e Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/imp/native-tool-calling 2026-01-15 14:46:56 -08:00
lorenzejay
ad83e8a2bf Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/enh-decouple-executor-from-crew 2026-01-15 14:45:17 -08:00
Heitor Carvalho
e44d778e0e feat: keycloak sso provider support (#4241)
2026-01-15 15:38:40 -03:00
lorenzejay
601eda9095 Enhance Flow Execution Logic
- Introduced conditional execution for start methods in the Flow class.
- Unconditional start methods are prioritized during kickoff, while conditional starts are executed only if no unconditional starts are present.
- Improved handling of cyclic flows by allowing re-execution of conditional start methods triggered by routers.
- Added checks to continue execution chains for completed conditional starts.

These changes improve the flexibility and control of flow execution, ensuring that the correct methods are triggered based on the defined conditions.
2026-01-15 09:29:25 -08:00
lorenzejay
83c62a65dd Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/enh-decouple-executor-from-crew 2026-01-15 09:12:38 -08:00
nicoferdi96
5645cbb22e CrewAI AMP Deployment Guidelines (#4205)
* doc changes for better deployment guidelines and checklist

* chore: remove .claude folder from version control

The .claude folder contains local Claude Code skills and configuration
that should not be tracked in the repository. Already in .gitignore.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

* Better project structure for flows

* docs.json updated structure

* Ko and Pt translations for deploying guidelines to AMP

* fix broken links

---------

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-15 16:32:20 +01:00
Lorenze Jay
8f022be106 feat: bump versions to 1.8.1 (#4242)
* feat: bump versions to 1.8.1

* bump bump
2026-01-14 20:49:14 -08:00
Greyson LaLonde
6a19b0a279 feat: a2a task execution utilities 2026-01-14 22:56:17 -05:00
Greyson LaLonde
641c336b2c chore: a2a agent card docs, refine existing a2a docs 2026-01-14 22:46:53 -05:00
Greyson LaLonde
22f1812824 feat: add a2a server config; agent card generation 2026-01-14 22:09:11 -05:00
lorenzejay
3a1deb193a fixed cassette 2026-01-14 19:06:28 -08:00
lorenzejay
09185acc0d refactor: streamline agent execution and enhance flow compatibility
Refactored the Agent class to simplify the execution method by removing the event loop check and clarifying the behavior when called from synchronous and asynchronous contexts. The changes ensure that the method operates seamlessly within flow methods, improving clarity in the documentation. Additionally, updated the AgentExecutor to set the response model to None, enhancing flexibility. New test cassettes were added to validate the functionality of agents within flow contexts, ensuring robust testing for both synchronous and asynchronous operations.
2026-01-14 18:51:09 -08:00
lorenzejay
6541f01b1b working cassette 2026-01-14 16:40:35 -08:00
lorenzejay
3a6702e9c8 working 2026-01-14 16:27:50 -08:00
lorenzejay
e4bd7889fd test fix cassette 2026-01-14 16:23:36 -08:00
lorenzejay
842a1db16f test fix cassette 2026-01-14 16:23:19 -08:00
lorenzejay
e9b86100c7 refactor: update test task guardrail process output for improved validation
Refactored the test for task guardrail process output to enhance the validation of the output against the OpenAPI schema. The changes include a more structured request body and updated response handling to ensure compliance with the guardrail requirements. This update aims to improve the clarity and reliability of the test cases, ensuring that task outputs are correctly validated and feedback is appropriately provided.
2026-01-14 16:05:38 -08:00
lorenzejay
341812d58e refactor: improve test for Agent kickoff parameters
Updated the test for the Agent class to ensure that the kickoff method correctly preserves parameters. The test now verifies the configuration of the agent after kickoff, enhancing clarity and maintainability. Additionally, the test for asynchronous kickoff within a flow context has been updated to reflect the Agent class instead of LiteAgent.
2026-01-14 15:56:53 -08:00
lorenzejay
38db734561 fix test 2026-01-14 15:39:34 -08:00
lorenzejay
5048d54981 Merge branch 'main' of github.com:crewAIInc/crewAI into lorenze/enh-decouple-executor-from-crew 2026-01-14 14:28:33 -08:00
lorenzejay
ae17178e86 linting and tests 2026-01-14 14:28:09 -08:00
lorenzejay
b7a13e15ff refactor: enhance agent kickoff preparation by separating common logic
Updated the Agent class to introduce a new private method, _prepare_kickoff, that consolidates the common setup logic for both synchronous and asynchronous kickoff executions. This change improves code clarity and maintainability by reducing redundancy in the kickoff process, while ensuring that the agent can still execute effectively within both standalone and flow contexts.
2026-01-14 14:27:39 -08:00
lorenzejay
13dc7e25e0 ensure executors work inside a flow due to flow in flow async structure 2026-01-14 14:23:10 -08:00
lorenzejay
6c5e5056f3 wip: clean 2026-01-14 12:08:41 -08:00
Lorenze Jay
9edbf89b68 fix: enhance Azure model stop word support detection (#4227)
- Updated the `supports_stop_words` method to accurately reflect support for stop sequences based on model type, specifically excluding GPT-5 and O-series models.
- Added comprehensive tests to verify that GPT-5 family and O-series models do not support stop words, ensuring correct behavior in completion parameter preparation.
- Ensured that stop words are not included in parameters for unsupported models while maintaining expected behavior for supported models.
2026-01-13 10:23:59 -08:00
Vini Brasil
685f7b9af1 Increase frame inspection depth to detect parent_flow (#4231)
This commit fixes a bug where `parent_flow` was not being set because
the maximum depth was not sufficient to search for an instance of `Flow`
in the current call stack frame during Flow instantiation.
2026-01-13 18:40:22 +01:00
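A rough illustration of the frame walk whose depth limit this fix raised; the import path, helper name, and depth value are assumptions, only the "search the call stack for a Flow instance" mechanism comes from the commit text.

```python
import inspect

from crewai.flow.flow import Flow  # import path assumed

def find_parent_flow(max_depth: int = 10):
    """Walk up the call stack looking for a Flow instance bound to `self`."""
    frame = inspect.currentframe()
    for _ in range(max_depth):  # the tunable this fix increased
        if frame is None:
            return None
        candidate = frame.f_locals.get("self")
        if isinstance(candidate, Flow):
            return candidate
        frame = frame.f_back
    return None
```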
Anaisdg
595fdfb6e7 feat: add galileo to integrations page (#4130)
* feat: add galileo to integrations page

* fix: linting issues

* fix: clarification on handler

* fix: uv install, load_dotenv redundancy, spelling error

* add: translations; fix uv install and typo

* fix: broken links

---------

Co-authored-by: Anais <anais@Anaiss-MacBook-Pro.local>
Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
Co-authored-by: Anais <anais@Mac.lan>
2026-01-13 08:49:17 -08:00
Koushiv
8f99fa76ed feat: additional a2a transports
Co-authored-by: Koushiv Sadhukhan <koushiv.777@gmail.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-12 12:03:06 -05:00
GininDenis
17e3fcbe1f fix: unlink task in execution spans
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2026-01-12 02:58:42 -05:00
Joao Moura
b858d705a8 updating docs
2026-01-11 16:02:55 -08:00
lorenzejay
5cef85c643 refactor: streamline AgentExecutor initialization by removing redundant parameters
Updated the Agent class to simplify the initialization of the AgentExecutor by removing unnecessary task and crew parameters in standalone mode. This change enhances code clarity and maintains backward compatibility by ensuring that the executor is correctly configured without redundant assignments.
2026-01-09 18:27:07 -08:00
lorenzejay
dc3ae9396d fix: handle None task in AgentExecutor to prevent errors
Added a check to ensure that if the task is None, the method returns early without attempting to access task properties. This change improves the robustness of the AgentExecutor by preventing potential errors when the task is not set.
2026-01-09 18:07:37 -08:00
Lorenze Jay
d60f7b360d WIP docs for pii-redaction feat (#4189)
* WIP docs for pii-redaction feat

* fix

* updated image

* Update PII Redaction documentation to clarify Enterprise plan requirements and version constraints

* visual re-ordering

* dropping not useful info

* improve docs

* better wording

* Add PII Redaction feature documentation in Korean and Portuguese, including details on activation, supported entity types, and best practices for usage.
2026-01-09 17:53:05 -08:00
Lorenze Jay
6050a7b3e0 chore: update changelog for version 1.8.0 release (#4206)
- Added new features including native async chain for a2a, a2a update mechanisms, and global flow configuration for human-in-the-loop feedback.
- Improved event handling with enhancements to EventListener and TraceCollectionListener.
- Fixed bugs related to missing a2a dependencies and WorkOS login polling.
- Updated documentation for webhook-streaming and adjusted language in AOP to AMP documentation.
- Acknowledged contributors for this release.
2026-01-09 16:44:45 -08:00
lorenzejay
0029f8193c wip restructuring agent executor and liteagent 2026-01-09 14:42:50 -08:00
João Moura
46846bcace fix: improve error handling for HumanFeedbackPending in flow execution (#4203)
* fix: handle HumanFeedbackPending in flow error management

Updated the flow error handling to treat HumanFeedbackPending as expected control flow rather than an error. This change ensures that the flow can appropriately manage human feedback scenarios without signaling an error, improving the robustness of the flow execution.

* fix: improve error handling for HumanFeedbackPending in flow execution

Refined the flow error management to emit a paused event for HumanFeedbackPending exceptions instead of treating them as failures. This enhancement allows the flow to better manage human feedback scenarios, ensuring that the execution state is preserved and appropriately handled without signaling an error. Regular failure events are still emitted for other exceptions, maintaining robust error reporting.
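The described behavior amounts to roughly the following control flow (event and method names here are assumptions for illustration, not the actual implementation):
```python
# Hypothetical sketch: HumanFeedbackPending is a pause, not a failure (names assumed).
try:
    result = self._execute_method(method)
except HumanFeedbackPending:
    # Expected control flow: emit a paused event and preserve execution state.
    self._emit(FlowPausedEvent(flow=self, method_name=method.__name__))
    raise
except Exception as e:
    # Any other exception still surfaces as a regular failure event.
    self._emit(MethodExecutionFailedEvent(flow=self, error=e))
    raise
```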
2026-01-08 03:40:02 -03:00
João Moura
d71e91e8f2 fix: handle HumanFeedbackPending in flow error management (#4200)
Updated the flow error handling to treat HumanFeedbackPending as expected control flow rather than an error. This change ensures that the flow can appropriately manage human feedback scenarios without signaling an error, improving the robustness of the flow execution.
2026-01-08 00:52:38 -03:00
429 changed files with 70362 additions and 37305 deletions

.gitignore vendored
View File

@@ -26,3 +26,4 @@ plan.md
conceptual_plan.md
build_image
chromadb-*.lock
.claude

View File

@@ -19,7 +19,7 @@ repos:
        language: system
        pass_filenames: true
        types: [python]
        exclude: ^(lib/crewai/src/crewai/cli/templates/|lib/crewai/tests/|lib/crewai-tools/tests/)
        exclude: ^(lib/crewai/src/crewai/cli/templates/|lib/crewai/tests/|lib/crewai-tools/tests/|lib/crewai-files/tests/)
  - repo: https://github.com/astral-sh/uv-pre-commit
    rev: 0.9.3
    hooks:

View File

@@ -1,6 +1,8 @@
"""Pytest configuration for crewAI workspace."""
import base64
from collections.abc import Generator
import gzip
import os
from pathlib import Path
import tempfile
@@ -9,6 +11,7 @@ from typing import Any
from dotenv import load_dotenv
import pytest
from vcr.request import Request # type: ignore[import-untyped]
import vcr.stubs.httpx_stubs as httpx_stubs # type: ignore[import-untyped]
env_test_path = Path(__file__).parent / ".env.test"
@@ -16,6 +19,25 @@ load_dotenv(env_test_path, override=True)
load_dotenv(override=True)
def _patched_make_vcr_request(httpx_request: Any, **kwargs: Any) -> Any:
    """Patched version of VCR's _make_vcr_request that handles binary content.

    The original implementation fails on binary request bodies (like file uploads)
    because it assumes all content can be decoded as UTF-8.
    """
    raw_body = httpx_request.read()
    try:
        body = raw_body.decode("utf-8")
    except UnicodeDecodeError:
        body = base64.b64encode(raw_body).decode("ascii")
    uri = str(httpx_request.url)
    headers = dict(httpx_request.headers)
    return Request(httpx_request.method, uri, body, headers)

httpx_stubs._make_vcr_request = _patched_make_vcr_request
@pytest.fixture(autouse=True, scope="function")
def cleanup_event_handlers() -> Generator[None, Any, None]:
    """Clean up event bus handlers after each test to prevent test pollution."""
@@ -31,6 +53,21 @@ def cleanup_event_handlers() -> Generator[None, Any, None]:
pass
@pytest.fixture(autouse=True, scope="function")
def reset_event_state() -> None:
    """Reset event system state before each test for isolation."""
    from crewai.events.base_events import reset_emission_counter
    from crewai.events.event_context import (
        EventContextConfig,
        _event_context_config,
        _event_id_stack,
    )

    reset_emission_counter()
    _event_id_stack.set(())
    _event_context_config.set(EventContextConfig())

@pytest.fixture(autouse=True, scope="function")
def setup_test_environment() -> Generator[None, Any, None]:
    """Setup test environment for crewAI workspace."""
@@ -133,19 +170,42 @@ def _filter_request_headers(request: Request) -> Request: # type: ignore[no-any
            request.headers[variant] = [replacement]

    request.method = request.method.upper()

    # Normalize Azure OpenAI endpoints to a consistent placeholder for cassette matching.
    if request.host and request.host.endswith(".openai.azure.com"):
        original_host = request.host
        placeholder_host = "fake-azure-endpoint.openai.azure.com"
        request.uri = request.uri.replace(original_host, placeholder_host)

    return request
def _filter_response_headers(response: dict[str, Any]) -> dict[str, Any]:
    """Filter sensitive headers from response before recording."""
    # Remove Content-Encoding to prevent decompression issues on replay
def _filter_response_headers(response: dict[str, Any]) -> dict[str, Any] | None:
    """Filter sensitive headers from response before recording.

    Returns None to skip recording responses with empty bodies. This handles
    duplicate recordings caused by OpenAI's stainless client using
    with_raw_response which triggers httpx to re-read the consumed stream.
    """
    body = response.get("body", {}).get("string", "")
    headers = response.get("headers", {})
    content_length = headers.get("content-length", headers.get("Content-Length", []))
    if body == "" or body == b"" or content_length == ["0"]:
        return None
    for encoding_header in ["Content-Encoding", "content-encoding"]:
        response["headers"].pop(encoding_header, None)
        if encoding_header in headers:
            encoding = headers.pop(encoding_header)
            if encoding and encoding[0] == "gzip":
                body = response.get("body", {}).get("string", b"")
                if isinstance(body, bytes) and body.startswith(b"\x1f\x8b"):
                    response["body"]["string"] = gzip.decompress(body).decode("utf-8")
    for header_name, replacement in HEADERS_TO_FILTER.items():
        for variant in [header_name, header_name.upper(), header_name.title()]:
            if variant in response["headers"]:
                response["headers"][variant] = [replacement]
            if variant in headers:
                headers[variant] = [replacement]
    return response
@@ -160,7 +220,10 @@ def vcr_cassette_dir(request: Any) -> str:
    test_file = Path(request.fspath)
    for parent in test_file.parents:
        if parent.name in ("crewai", "crewai-tools") and parent.parent.name == "lib":
        if (
            parent.name in ("crewai", "crewai-tools", "crewai-files")
            and parent.parent.name == "lib"
        ):
            package_root = parent
            break
    else:

View File

@@ -61,7 +61,9 @@
"groups": [
{
"group": "Welcome",
"pages": ["index"]
"pages": [
"index"
]
}
]
},
@@ -71,7 +73,11 @@
"groups": [
{
"group": "Get Started",
"pages": ["en/introduction", "en/installation", "en/quickstart"]
"pages": [
"en/introduction",
"en/installation",
"en/quickstart"
]
},
{
"group": "Guides",
@@ -79,17 +85,23 @@
{
"group": "Strategy",
"icon": "compass",
"pages": ["en/guides/concepts/evaluating-use-cases"]
"pages": [
"en/guides/concepts/evaluating-use-cases"
]
},
{
"group": "Agents",
"icon": "user",
"pages": ["en/guides/agents/crafting-effective-agents"]
"pages": [
"en/guides/agents/crafting-effective-agents"
]
},
{
"group": "Crews",
"icon": "users",
"pages": ["en/guides/crews/first-crew"]
"pages": [
"en/guides/crews/first-crew"
]
},
{
"group": "Flows",
@@ -279,6 +291,7 @@
"en/observability/arize-phoenix",
"en/observability/braintrust",
"en/observability/datadog",
"en/observability/galileo",
"en/observability/langdb",
"en/observability/langfuse",
"en/observability/langtrace",
@@ -324,7 +337,9 @@
},
{
"group": "Telemetry",
"pages": ["en/telemetry"]
"pages": [
"en/telemetry"
]
}
]
},
@@ -334,7 +349,9 @@
"groups": [
{
"group": "Getting Started",
"pages": ["en/enterprise/introduction"]
"pages": [
"en/enterprise/introduction"
]
},
{
"group": "Build",
@@ -343,7 +360,8 @@
"en/enterprise/features/crew-studio",
"en/enterprise/features/marketplace",
"en/enterprise/features/agent-repositories",
"en/enterprise/features/tools-and-integrations"
"en/enterprise/features/tools-and-integrations",
"en/enterprise/features/pii-trace-redactions"
]
},
{
@@ -411,7 +429,8 @@
"group": "How-To Guides",
"pages": [
"en/enterprise/guides/build-crew",
"en/enterprise/guides/deploy-crew",
"en/enterprise/guides/prepare-for-deployment",
"en/enterprise/guides/deploy-to-amp",
"en/enterprise/guides/kickoff-crew",
"en/enterprise/guides/update-crew",
"en/enterprise/guides/enable-crew-studio",
@@ -426,7 +445,9 @@
},
{
"group": "Resources",
"pages": ["en/enterprise/resources/frequently-asked-questions"]
"pages": [
"en/enterprise/resources/frequently-asked-questions"
]
}
]
},
@@ -452,7 +473,10 @@
"groups": [
{
"group": "Examples",
"pages": ["en/examples/example", "en/examples/cookbooks"]
"pages": [
"en/examples/example",
"en/examples/cookbooks"
]
}
]
},
@@ -462,7 +486,9 @@
"groups": [
{
"group": "Release Notes",
"pages": ["en/changelog"]
"pages": [
"en/changelog"
]
}
]
}
@@ -501,7 +527,9 @@
"groups": [
{
"group": "Bem-vindo",
"pages": ["pt-BR/index"]
"pages": [
"pt-BR/index"
]
}
]
},
@@ -523,17 +551,23 @@
{
"group": "Estratégia",
"icon": "compass",
"pages": ["pt-BR/guides/concepts/evaluating-use-cases"]
"pages": [
"pt-BR/guides/concepts/evaluating-use-cases"
]
},
{
"group": "Agentes",
"icon": "user",
"pages": ["pt-BR/guides/agents/crafting-effective-agents"]
"pages": [
"pt-BR/guides/agents/crafting-effective-agents"
]
},
{
"group": "Crews",
"icon": "users",
"pages": ["pt-BR/guides/crews/first-crew"]
"pages": [
"pt-BR/guides/crews/first-crew"
]
},
{
"group": "Flows",
@@ -710,6 +744,7 @@
"pt-BR/observability/arize-phoenix",
"pt-BR/observability/braintrust",
"pt-BR/observability/datadog",
"pt-BR/observability/galileo",
"pt-BR/observability/langdb",
"pt-BR/observability/langfuse",
"pt-BR/observability/langtrace",
@@ -754,7 +789,9 @@
},
{
"group": "Telemetria",
"pages": ["pt-BR/telemetry"]
"pages": [
"pt-BR/telemetry"
]
}
]
},
@@ -764,7 +801,9 @@
"groups": [
{
"group": "Começando",
"pages": ["pt-BR/enterprise/introduction"]
"pages": [
"pt-BR/enterprise/introduction"
]
},
{
"group": "Construir",
@@ -773,7 +812,8 @@
"pt-BR/enterprise/features/crew-studio",
"pt-BR/enterprise/features/marketplace",
"pt-BR/enterprise/features/agent-repositories",
"pt-BR/enterprise/features/tools-and-integrations"
"pt-BR/enterprise/features/tools-and-integrations",
"pt-BR/enterprise/features/pii-trace-redactions"
]
},
{
@@ -825,7 +865,8 @@
"group": "Guias",
"pages": [
"pt-BR/enterprise/guides/build-crew",
"pt-BR/enterprise/guides/deploy-crew",
"pt-BR/enterprise/guides/prepare-for-deployment",
"pt-BR/enterprise/guides/deploy-to-amp",
"pt-BR/enterprise/guides/kickoff-crew",
"pt-BR/enterprise/guides/update-crew",
"pt-BR/enterprise/guides/enable-crew-studio",
@@ -883,7 +924,10 @@
"groups": [
{
"group": "Exemplos",
"pages": ["pt-BR/examples/example", "pt-BR/examples/cookbooks"]
"pages": [
"pt-BR/examples/example",
"pt-BR/examples/cookbooks"
]
}
]
},
@@ -893,7 +937,9 @@
"groups": [
{
"group": "Notas de Versão",
"pages": ["pt-BR/changelog"]
"pages": [
"pt-BR/changelog"
]
}
]
}
@@ -932,7 +978,9 @@
"groups": [
{
"group": "환영합니다",
"pages": ["ko/index"]
"pages": [
"ko/index"
]
}
]
},
@@ -942,7 +990,11 @@
"groups": [
{
"group": "시작 안내",
"pages": ["ko/introduction", "ko/installation", "ko/quickstart"]
"pages": [
"ko/introduction",
"ko/installation",
"ko/quickstart"
]
},
{
"group": "가이드",
@@ -950,17 +1002,23 @@
{
"group": "전략",
"icon": "compass",
"pages": ["ko/guides/concepts/evaluating-use-cases"]
"pages": [
"ko/guides/concepts/evaluating-use-cases"
]
},
{
"group": "에이전트 (Agents)",
"icon": "user",
"pages": ["ko/guides/agents/crafting-effective-agents"]
"pages": [
"ko/guides/agents/crafting-effective-agents"
]
},
{
"group": "크루 (Crews)",
"icon": "users",
"pages": ["ko/guides/crews/first-crew"]
"pages": [
"ko/guides/crews/first-crew"
]
},
{
"group": "플로우 (Flows)",
@@ -1149,6 +1207,7 @@
"ko/observability/arize-phoenix",
"ko/observability/braintrust",
"ko/observability/datadog",
"ko/observability/galileo",
"ko/observability/langdb",
"ko/observability/langfuse",
"ko/observability/langtrace",
@@ -1193,7 +1252,9 @@
},
{
"group": "Telemetry",
"pages": ["ko/telemetry"]
"pages": [
"ko/telemetry"
]
}
]
},
@@ -1203,7 +1264,9 @@
"groups": [
{
"group": "시작 안내",
"pages": ["ko/enterprise/introduction"]
"pages": [
"ko/enterprise/introduction"
]
},
{
"group": "빌드",
@@ -1212,7 +1275,8 @@
"ko/enterprise/features/crew-studio",
"ko/enterprise/features/marketplace",
"ko/enterprise/features/agent-repositories",
"ko/enterprise/features/tools-and-integrations"
"ko/enterprise/features/tools-and-integrations",
"ko/enterprise/features/pii-trace-redactions"
]
},
{
@@ -1264,7 +1328,8 @@
"group": "How-To Guides",
"pages": [
"ko/enterprise/guides/build-crew",
"ko/enterprise/guides/deploy-crew",
"ko/enterprise/guides/prepare-for-deployment",
"ko/enterprise/guides/deploy-to-amp",
"ko/enterprise/guides/kickoff-crew",
"ko/enterprise/guides/update-crew",
"ko/enterprise/guides/enable-crew-studio",
@@ -1294,7 +1359,9 @@
},
{
"group": "학습 자원",
"pages": ["ko/enterprise/resources/frequently-asked-questions"]
"pages": [
"ko/enterprise/resources/frequently-asked-questions"
]
}
]
},
@@ -1320,7 +1387,10 @@
"groups": [
{
"group": "예시",
"pages": ["ko/examples/example", "ko/examples/cookbooks"]
"pages": [
"ko/examples/example",
"ko/examples/cookbooks"
]
}
]
},
@@ -1330,7 +1400,9 @@
"groups": [
{
"group": "릴리스 노트",
"pages": ["ko/changelog"]
"pages": [
"ko/changelog"
]
}
]
}
@@ -1445,6 +1517,18 @@
"source": "/enterprise/:path*",
"destination": "/en/enterprise/:path*"
},
{
"source": "/en/enterprise/guides/deploy-crew",
"destination": "/en/enterprise/guides/deploy-to-amp"
},
{
"source": "/ko/enterprise/guides/deploy-crew",
"destination": "/ko/enterprise/guides/deploy-to-amp"
},
{
"source": "/pt-BR/enterprise/guides/deploy-crew",
"destination": "/pt-BR/enterprise/guides/deploy-to-amp"
},
{
"source": "/api-reference/:path*",
"destination": "/en/api-reference/:path*"

View File

@@ -4,6 +4,516 @@ description: "Product updates, improvements, and bug fixes for CrewAI"
icon: "clock"
mode: "wide"
---
<Update label="Jan 08, 2026">
## v1.8.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.8.0)
## What's Changed
### Features
- Add native async chain for a2a
- Add a2a update mechanisms (poll/stream/push) with handlers and config
- Introduce global flow configuration for human-in-the-loop feedback
- Add streaming tool call events and fix provider ID tracking
- Introduce production-ready Flows and Crews architecture
- Add HITL for Flows
- Improve EventListener and TraceCollectionListener for enhanced event handling
### Bug Fixes
- Handle missing a2a dependency as optional
- Correct error fetching for WorkOS login polling
- Fix wrong trigger name in sample documentation
### Documentation
- Update webhook-streaming documentation
- Adjust AOP to AMP documentation language
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta
</Update>
<Update label="Dec 19, 2025">
## v1.7.2
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.2)
## What's Changed
### Bug Fixes
- Resolve connection issues
### Documentation
- Update api-reference/status docs page
### Contributors
@greysonlalonde, @heitorado, @lorenzejay, @lucasgomide
</Update>
<Update label="Dec 16, 2025">
## v1.7.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.1)
## What's Changed
### Improvements
- Add `--no-commit` flag to bump command
- Use JSON schema for tool argument serialization
### Bug Fixes
- Fix error message display from response when tool repository login fails
- Fix graceful termination of future when executing a task asynchronously
- Fix task ordering by adding index
- Fix platform compatibility checks for Windows signals
- Fix RPM controller timer to prevent process hang
- Fix token usage recording and validate response model on stream
### Documentation
- Add translated documentation for async
- Add documentation for AOP Deploy API
- Add documentation for the agent handler connector
- Add documentation on native async
### Contributors
@Llamrei, @dragosmc, @gilfeig, @greysonlalonde, @heitorado, @lorenzejay, @mattatcha, @vinibrsl
</Update>
<Update label="Dec 09, 2025">
## v1.7.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.0)
## What's Changed
### Features
- Add async flow kickoff
- Add async crew support
- Add async task support
- Add async knowledge support
- Add async memory support
- Add async support for tools and agent executor; improve typing and docs
- Implement a2a extensions API and async agent card caching; fix task propagation & streaming
- Add native async tool support
- Add async llm support
- Create sys event types and handler
### Bug Fixes
- Fix issue to ensure nonetypes are not passed to otel
- Fix deadlock in token store file operations
- Fix to ensure otel span is closed
- Use HuggingFaceEmbeddingFunction for embeddings, update keys and add tests
- Fix to ensure supports_tools is true for all supported anthropic models
- Ensure hooks work with lite agents flows
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="Nov 29, 2025">
## v1.6.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.1)
## What's Changed
### Bug Fixes
- Fix ChatCompletionsClient call to ensure proper functionality
- Ensure async methods are executable for annotations
- Fix parameters in RagTool.add, add typing, and tests
- Remove invalid parameter from SSE client
- Erase 'oauth2_extra' setting on 'crewai config reset' command
### Refactoring
- Enhance model validation and provider inference in LLM class
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @lorenzejay
</Update>
<Update label="Nov 25, 2025">
## v1.6.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.0)
## What's Changed
### Features
- Add streaming result support to flows and crews
- Add gemini-3-pro-preview
- Support CLI login with Entra ID
- Add Merge Agent Handler tool
- Enhance flow event state management
### Bug Fixes
- Ensure custom rag store persist path is set if passed
- Ensure fuzzy returns are more strict and show type warning
- Re-add openai response_format parameter and add test
- Fix rag tool embeddings configuration
- Ensure flow execution start panel is not shown on plot
### Documentation
- Update references from AMP to AOP in documentation
- Update AMP to AOP
### Contributors
@Vidit-Ostwal, @gilfeig, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @markmcd
</Update>
<Update label="Nov 22, 2025">
## v0.203.2
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.2)
## What's Changed
- Hotfix version bump from 0.203.1 to 0.203.2
</Update>
<Update label="Nov 16, 2025">
## v1.5.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.5.0)
## What's Changed
### Features
- Add a2a trust remote completion status flag
- Fetch and store more data about Okta authorization server
- Implement before and after LLM call hooks in CrewAgentExecutor
- Expose messages to TaskOutput and LiteAgentOutputs
- Enhance schema description of QdrantVectorSearchTool
### Bug Fixes
- Ensure tracing instrumentation flags are correctly applied
- Fix custom tool documentation links and add Mintlify broken links action
### Documentation
- Enhance task guardrail documentation with LLM-based validation support
### Contributors
@danielfsbarreto, @greysonlalonde, @heitorado, @lorenzejay, @theCyberTech
</Update>
<Update label="Nov 07, 2025">
## v1.4.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.1)
## What's Changed
### Bug Fixes
- Fix handling of agent max iterations
- Resolve routing issues for LLM model syntax to respected providers
### Contributors
@greysonlalonde
</Update>
<Update label="Nov 07, 2025">
## v1.4.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.0)
## What's Changed
### Features
- Add support for non-AST plot routes
- Implement first-class support for MCP
- Add Pydantic validation dunder to BaseInterceptor
- Add support for LLM message interceptor hooks
- Cache i18n prompts for efficient use
- Enhance QdrantVectorSearchTool
### Bug Fixes
- Fix issues with keeping stopwords updated
- Resolve unpickleable values in flow state
- Ensure lite agents course-correct on validation errors
- Fix callback argument hashing to ensure caching works
- Allow adding RAG source content from valid URLs
- Make plot node selection smoother
- Fix duplicating document IDs for knowledge
### Refactoring
- Improve MCP tool execution handling with concurrent futures
- Simplify flow handling, typing, and logging; update UI and tests
- Refactor stop word management to a property
### Documentation
- Migrate embedder to embedding_model and require vectordb across tool docs; add provider examples (en/ko/pt-BR)
### Contributors
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="Nov 01, 2025">
## v1.3.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.3.0)
## What's Changed
### Features
- Refactor flow handling, typing, and logging
- Enhance QdrantVectorSearchTool
### Bug Fixes
- Fix Firecrawl tools and add tests
- Refactor use_stop_words to property and add check for stop words
### Documentation
- Migrate embedder to embedding_model and require vectordb across tool docs
- Add provider examples in English, Korean, and Portuguese
### Refactoring
- Improve flow handling and UI updates
### Contributors
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="Oct 27, 2025">
## v1.2.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.1)
## What's Changed
### Features
- Add support for Datadog integration
- Support apps and mcps in liteagent
### Documentation
- Describe mandatory environment variable for calling Platform tools for each integration
- Add Datadog integration documentation
### Contributors
@barieom, @lorenzejay, @lucasgomide, @sabrenner
</Update>
<Update label="Oct 24, 2025">
## v1.2.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.0)
## What's Changed
### Bug Fixes
- Update default LLM model and improve error logging in LLM utilities
- Change flow visualization directory and method inspection
### Dropping Unused
- Remove aisuite
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="Oct 21, 2025">
## v1.1.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.1.0)
## What's Changed
### Features
- Enhance InternalInstructor to support multiple LLM providers
- Implement mypy plugin base
- Improve QdrantVectorSearchTool
### Bug Fixes
- Correct broken integration documentation links
- Fix double trace call and add types
- Pin template versions to latest
### Documentation
- Update LLM integration details and examples
### Refactoring
- Improve CrewBase typing
### Contributors
@cwarre33, @danielfsbarreto, @greysonlalonde, @lorenzejay
</Update>
<Update label="Oct 20, 2025">
## v1.0.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0)
## What's Changed
### Features
- Bump versions to 1.0.0
- Enhance knowledge and guardrail event handling in Agent class
- Inject tool repository credentials in crewai run command
### Bug Fixes
- Preserve nested condition structure in Flow decorators
- Add standard print parameters to Printer.print method
- Fix errors when there is no input() available
- Add a leeway of 10s when decoding JWT
- Revert bad cron schedule
- Correct cron schedule to run every 5 days at specific dates
- Use system PATH for Docker binary instead of hardcoded path
- Add CodeQL configuration to properly exclude template directories
### Documentation
- Update security policy for vulnerability reporting
- Add guide for capturing telemetry logs in CrewAI AMP
- Add missing /resume files
- Clarify webhook URL parameter in HITL workflows
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta, @theCyberTech
</Update>
<Update label="Oct 18, 2025">
## v1.0.0b3 (Pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b3)
## What's Changed
### Features
- Enhance task guardrail functionality and validation
- Improve support for importing native SDK
- Add Azure native tests
- Enhance BedrockCompletion class with advanced features
- Enhance GeminiCompletion class with client parameter support
- Enhance AnthropicCompletion class with additional client parameters
### Bug Fixes
- Preserve nested condition structure in Flow decorators
- Add standard print parameters to Printer.print method
- Remove stdout prints and improve test determinism
### Refactoring
- Convert project module to metaclass with full typing
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="Oct 16, 2025">
## v1.0.0b2 (Pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b2)
## What's Changed
### Features
- Enhance OpenAICompletion class with additional client parameters
- Improve event bus thread safety and async support
- Inject tool repository credentials in crewai run command
### Bug Fixes
- Fix issue where it errors out if there is no input() available
- Add a leeway of 10s when decoding JWT
- Fix copying and adding NOT_SPECIFIED check in task.py
### Documentation
- Ensure CREWAI_PLATFORM_INTEGRATION_TOKEN is mentioned in documentation
- Update triggers documentation
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="Oct 14, 2025">
## v1.0.0b1 (Pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b1)
## What's Changed
### Features
- Enhance OpenAICompletion class with additional client parameters
- Improve event bus thread safety and async support
- Implement Bedrock LLM integration
### Bug Fixes
- Fix issue with missing input() availability
- Resolve JWT decoding error by adding a leeway of 10 seconds
- Inject tool repository credentials in crewai run command
- Fix copy and add NOT_SPECIFIED check in task.py
### Documentation
- Ensure CREWAI_PLATFORM_INTEGRATION_TOKEN is mentioned in documentation
- Update triggers documentation
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="Oct 13, 2025">
## v0.203.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.1)
## What's Changed
### Core Improvements & Fixes
- Fixed injection of tool repository credentials into the `crewai run` command
- Added a 10-second leeway when decoding JWTs to reduce token validation errors
- Corrected (then reverted) cron schedule fix intended to run jobs every 5 days at specific dates
### Documentation & Guides
- Updated security policy to clarify the process for vulnerability reporting
</Update>
<Update label="Oct 09, 2025">
## v1.0.0a4 (Pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a4)
## What's Changed
### Features
- Enhance knowledge and guardrail event handling in Agent class
- Introduce trigger listing and execution commands for local development
- Update documentation with new approach to consume Platform Actions
- Add guide for capturing telemetry logs in CrewAI AMP
### Bug Fixes
- Revert bad cron schedule
- Correct cron schedule to run every 5 days at specific dates
- Remove duplicate line and add explicit environment variable
- Resolve linting errors across the codebase
- Replace print statements with logger in agent and memory handling
- Use system PATH for Docker binary instead of hardcoded path
- Allow failed PyPI publish
- Match tag and release title, ignore devtools build for PyPI
### Documentation
- Update security policy for vulnerability reporting
- Add missing /resume files
- Clarify webhook URL parameter in HITL workflows
### Contributors
@Vidit-Ostwal, @greysonlalonde, @lorenzejay, @lucasgomide, @theCyberTech
</Update>
<Update label="Sep 30, 2025">
## v1.0.0a1

View File

@@ -574,6 +574,10 @@ When you run this Flow, the output will change based on the random boolean value
### Human in the Loop (human feedback)
<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**.
</Note>
The `@human_feedback` decorator enables human-in-the-loop workflows by pausing flow execution to collect feedback from a human. This is useful for approval gates, quality review, and decision points that require human judgment.
```python Code

View File

@@ -375,10 +375,13 @@ In this section, you'll find detailed examples that help you select, configure,
GOOGLE_API_KEY=<your-api-key>
GEMINI_API_KEY=<your-api-key>
# Optional - for Vertex AI
# For Vertex AI Express mode (API key authentication)
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
# For Vertex AI with service account
GOOGLE_CLOUD_PROJECT=<your-project-id>
GOOGLE_CLOUD_LOCATION=<location> # Defaults to us-central1
GOOGLE_GENAI_USE_VERTEXAI=true # Set to use Vertex AI
```
**Basic Usage:**
@@ -412,7 +415,35 @@ In this section, you'll find detailed examples that help you select, configure,
)
```
**Vertex AI Configuration:**
**Vertex AI Express Mode (API Key Authentication):**
Vertex AI Express mode allows you to use Vertex AI with simple API key authentication instead of service account credentials. This is the quickest way to get started with Vertex AI.
To enable Express mode, set both environment variables in your `.env` file:
```toml .env
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
```
Then use the LLM as usual:
```python Code
from crewai import LLM
llm = LLM(
model="gemini/gemini-2.0-flash",
temperature=0.7
)
```
<Info>
To get an Express mode API key:
- New Google Cloud users: Get an [express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
- Existing Google Cloud users: Get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys)
For more details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
</Info>
**Vertex AI Configuration (Service Account):**
```python Code
from crewai import LLM
@@ -424,10 +455,10 @@ In this section, you'll find detailed examples that help you select, configure,
```
**Supported Environment Variables:**
- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API)
- `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI)
- `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API and Vertex AI Express mode)
- `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI (required for Express mode)
- `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI with service account)
- `GOOGLE_CLOUD_LOCATION`: GCP location (defaults to `us-central1`)
- `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI
**Features:**
- Native function calling support for Gemini 1.5+ and 2.x models

View File

@@ -0,0 +1,342 @@
---
title: PII Redaction for Traces
description: "Automatically redact sensitive data from crew and flow execution traces"
icon: "lock"
mode: "wide"
---
## Overview
PII Redaction is a CrewAI AMP feature that automatically detects and masks Personally Identifiable Information (PII) in your crew and flow execution traces. This ensures sensitive data like credit card numbers, social security numbers, email addresses, and names are not exposed in your CrewAI AMP traces. You can also create custom recognizers to protect organization-specific data.
<Info>
PII Redaction is available on the Enterprise plan.
Deployment must be version 1.8.0 or higher.
</Info>
<Frame>
![PII Redaction Overview](/images/enterprise/pii_mask_recognizer_trace_example.png)
</Frame>
## Why PII Redaction Matters
When running AI agents in production, sensitive information often flows through your crews:
- Customer data from CRM integrations
- Financial information from payment processors
- Personal details from form submissions
- Internal employee data
Without proper redaction, this data appears in traces, making compliance with regulations like GDPR, HIPAA, and PCI-DSS challenging. PII Redaction solves this by automatically masking sensitive data before it's stored in traces.
## How It Works
1. **Detect** - Scan trace event data for known PII patterns
2. **Classify** - Identify the type of sensitive data (credit card, SSN, email, etc.)
3. **Mask/Redact** - Replace the sensitive data with masked values based on your configuration
```
Original: "Contact john.doe@company.com or call 555-123-4567"
Redacted: "Contact <EMAIL_ADDRESS> or call <PHONE_NUMBER>"
```
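As a rough mental model, the pipeline behaves like the toy sketch below. This is illustrative only, not the actual AMP redaction engine, and the patterns shown are deliberately simplified.
```python
import re

# Toy detect -> classify -> mask pipeline (illustrative only, not AMP internals).
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}

def redact(text: str) -> str:
    for entity, pattern in PATTERNS.items():
        # Each match is classified by its entity type and masked with that label.
        text = pattern.sub(f"<{entity}>", text)
    return text

print(redact("Contact john.doe@company.com or call 555-123-4567"))
# -> "Contact <EMAIL_ADDRESS> or call <PHONE_NUMBER>"
```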
## Enabling PII Redaction
<Info>
You must be on the Enterprise plan and your deployment must be version 1.8.0 or higher to use this feature.
</Info>
<Steps>
<Step title="Navigate to Crew Settings">
In the CrewAI AMP dashboard, open one of your deployments/automations and navigate to **Settings** → **PII Protection**.
</Step>
<Step title="Enable PII Protection">
Toggle on **PII Redaction for Traces**. This will enable automatic scanning and redaction of trace data.
<Info>
You need to manually enable PII Redaction for each deployment.
</Info>
<Frame>
![Enable PII Redaction](/images/enterprise/pii_mask_recognizer_enable.png)
</Frame>
</Step>
<Step title="Configure Entity Types">
Select which types of PII to detect and redact. Each entity can be individually enabled or disabled.
<Frame>
![Configure Entities](/images/enterprise/pii_mask_recognizer_supported_entities.png)
</Frame>
</Step>
<Step title="Save">
Save your configuration. PII redaction will be active on all subsequent crew executions; no redeployment is needed.
</Step>
</Steps>
## Supported Entity Types
CrewAI supports the following PII entity types, organized by category.
### Global Entities
| Entity | Description | Example |
|--------|-------------|---------|
| `CREDIT_CARD` | Credit/debit card numbers | "4111-1111-1111-1111" |
| `CRYPTO` | Cryptocurrency wallet addresses | "bc1qxy2kgd..." |
| `DATE_TIME` | Dates and times | "January 15, 2024" |
| `EMAIL_ADDRESS` | Email addresses | "john@example.com" |
| `IBAN_CODE` | International bank account numbers | "DE89 3704 0044 0532 0130 00" |
| `IP_ADDRESS` | IPv4 and IPv6 addresses | "192.168.1.1" |
| `LOCATION` | Geographic locations | "New York City" |
| `MEDICAL_LICENSE` | Medical license numbers | "MD12345" |
| `NRP` | Nationalities, religious, or political groups | - |
| `PERSON` | Personal names | "John Doe" |
| `PHONE_NUMBER` | Phone numbers in various formats | "+1 (555) 123-4567" |
| `URL` | Web URLs | "https://example.com" |
### US-Specific Entities
| Entity | Description | Example |
|--------|-------------|---------|
| `US_BANK_NUMBER` | US Bank account numbers | "1234567890" |
| `US_DRIVER_LICENSE` | US Driver's license numbers | "D1234567" |
| `US_ITIN` | Individual Taxpayer ID | "900-70-0000" |
| `US_PASSPORT` | US Passport numbers | "123456789" |
| `US_SSN` | Social Security Numbers | "123-45-6789" |
## Redaction Actions
For each enabled entity, you can configure how the data is redacted:
| Action | Description | Example Output |
|--------|-------------|----------------|
| `mask` | Replace with the entity type label | `<CREDIT_CARD>` |
| `redact` | Completely remove the text | *(empty)* |
## Custom Recognizers
In addition to built-in entities, you can create **custom recognizers** to detect organization-specific PII patterns.
<Frame>
![Custom Recognizers](/images/enterprise/pii_mask_recognizer.png)
</Frame>
### Recognizer Types
You have two options for custom recognizers:
| Type | Best For | Example Use Case |
|------|----------|------------------|
| **Pattern-based (Regex)** | Structured data with predictable formats | Salary amounts, employee IDs, project codes |
| **Deny-list** | Exact string matches | Company names, internal codenames, specific terms |
### Creating a Custom Recognizer
<Steps>
<Step title="Navigate to Custom Recognizers">
Go to your Organization **Settings** → **Organization** → **Add Recognizer**.
</Step>
<Step title="Configure the Recognizer">
<Frame>
![Configure Recognizer](/images/enterprise/pii_mask_recognizer_create.png)
</Frame>
Configure the following fields:
- **Name**: A descriptive name for the recognizer
- **Entity Type**: The entity label that will appear in redacted output (e.g., `EMPLOYEE_ID`, `SALARY`)
- **Type**: Choose between Regex Pattern or Deny List
- **Pattern/Values**: Regex pattern or list of strings to match
- **Confidence Threshold**: Minimum score (0.0-1.0) required for a match to trigger redaction. Higher values (e.g., 0.8) reduce false positives but may miss some matches. Lower values (e.g., 0.5) catch more matches but may over-redact. Default is 0.8.
- **Context Words** (optional): Words that increase detection confidence when found nearby
</Step>
<Step title="Save">
Save the recognizer. It will be available to enable on your deployments.
</Step>
</Steps>
### Understanding Entity Types
The **Entity Type** determines how matched content appears in redacted traces:
```
Entity Type: SALARY
Pattern: salary:\s*\$\s*\d+
Input: "Employee salary: $50,000"
Output: "Employee <SALARY>"
```
### Using Context Words
Context words improve accuracy by increasing confidence when specific terms appear near the matched pattern:
```
Context Words: "project", "code", "internal"
Entity Type: PROJECT_CODE
Pattern: PRJ-\d{4}
```
When "project" or "code" appears near "PRJ-1234", the recognizer has higher confidence it's a true match, reducing false positives.
## Viewing Redacted Traces
Once PII redaction is enabled, your traces will show redacted values in place of sensitive data:
```
Task Output: "Customer <PERSON> placed order #12345.
Contact email: <EMAIL_ADDRESS>, phone: <PHONE_NUMBER>.
Payment processed for card ending in <CREDIT_CARD>."
```
Redacted values are clearly marked with angle brackets and the entity type label (e.g., `<EMAIL_ADDRESS>`), making it easy to understand what data was protected while still allowing you to debug and monitor crew behavior.
## Best Practices
### Performance Considerations
<Steps>
<Step title="Enable Only Needed Entities">
Each enabled entity adds processing overhead. Only enable entities relevant to your data.
</Step>
<Step title="Use Specific Patterns">
For custom recognizers, use specific patterns to reduce false positives and improve performance. Regex patterns work best for structured values in traces, such as salaries, employee IDs, or project codes. Deny-list recognizers work best for exact strings, such as company names or internal codenames.
</Step>
<Step title="Leverage Context Words">
Context words improve accuracy by only triggering detection when surrounding text matches.
</Step>
</Steps>
## Troubleshooting
<Accordion title="PII Not Being Redacted">
**Possible Causes:**
- Entity type not enabled in configuration
- Pattern doesn't match the data format
- Custom recognizer has syntax errors
**Solutions:**
- Verify the entity is enabled in Settings → PII Protection
- Test regex patterns with sample data
- Check logs for configuration errors
</Accordion>
<Accordion title="Too Much Data Being Redacted">
**Possible Causes:**
- Overly broad entity types enabled (e.g., `DATE_TIME` catches dates everywhere)
- Custom recognizer patterns are too general
**Solutions:**
- Disable entities that cause false positives
- Make custom patterns more specific
- Add context words to improve accuracy
</Accordion>
<Accordion title="Performance Issues">
**Possible Causes:**
- Too many entities enabled
- NLP-based entities (`PERSON`, `LOCATION`, `NRP`) are computationally expensive as they use machine learning models
**Solutions:**
- Only enable entities you actually need
- Consider using pattern-based alternatives where possible
- Monitor trace processing times in the dashboard
</Accordion>
---
## Practical Example: Salary Pattern Matching
This example demonstrates how to create a custom recognizer to detect and mask salary information in your traces.
### Use Case
Your crew processes employee or financial data that includes salary information in formats like:
- `salary: $50,000`
- `salary: $125,000.00`
- `salary:$1,500.50`
You want to automatically mask these values to protect sensitive compensation data.
### Configuration
<Frame>
![Salary Recognizer Configuration](/images/enterprise/pii_mask_custom_recognizer_salary.png)
</Frame>
| Field | Value |
|-------|-------|
| **Name** | `SALARY` |
| **Entity Type** | `SALARY` |
| **Type** | Regex Pattern |
| **Regex Pattern** | `salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?` |
| **Action** | Mask |
| **Confidence Threshold** | `0.8` |
| **Context Words** | `salary, compensation, pay, wage, income` |
### Regex Pattern Breakdown
| Pattern Component | Meaning |
|-------------------|---------|
| `salary:` | Matches the literal text "salary:" |
| `\s*` | Matches zero or more whitespace characters |
| `\$` | Matches the dollar sign (escaped) |
| `\s*` | Matches zero or more whitespace characters after $ |
| `\d{1,3}` | Matches 1-3 digits (e.g., "1", "50", "125") |
| `(,\d{3})*` | Matches comma-separated thousands (e.g., ",000", ",500,000") |
| `(\.\d{2})?` | Optionally matches cents (e.g., ".00", ".50") |
### Example Results
```
Original: "Employee record shows salary: $125,000.00 annually"
Redacted: "Employee record shows <SALARY> annually"
Original: "Base salary:$50,000 with bonus potential"
Redacted: "Base <SALARY> with bonus potential"
```
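To sanity-check the pattern locally before saving the recognizer, you can reproduce these results with Python's `re` module. The `re.sub` call merely simulates the mask action; it is not the AMP engine:
```python
import re

# The documented salary pattern; re.sub simulates the "mask" action locally.
SALARY = re.compile(r"salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?", re.IGNORECASE)

for text in (
    "Employee record shows salary: $125,000.00 annually",
    "Base salary:$50,000 with bonus potential",
):
    print(SALARY.sub("<SALARY>", text))
# Employee record shows <SALARY> annually
# Base <SALARY> with bonus potential
```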
<Tip>
Adding context words like "salary", "compensation", "pay", "wage", and "income" helps increase detection confidence when these terms appear near the matched pattern, reducing false positives.
</Tip>
### Enable the Recognizer for Your Deployments
<Warning>
Creating a custom recognizer at the organization level does not automatically enable it for your deployments. You must manually enable each recognizer for every deployment where you want it applied.
</Warning>
After creating your custom recognizer, enable it for each deployment:
<Steps>
<Step title="Navigate to Your Deployment">
Go to your deployment/automation and open **Settings** → **PII Protection**.
</Step>
<Step title="Select Custom Recognizers">
Under **Mask Recognizers**, you'll see your organization-defined recognizers. Check the box next to the recognizers you want to enable.
<Frame>
![Enable Custom Recognizer](/images/enterprise/pii_mask_recognizers_options.png)
</Frame>
</Step>
<Step title="Save Configuration">
Save your changes. The recognizer will be active on all subsequent executions for this deployment.
</Step>
</Steps>
<Info>
Repeat this process for each deployment where you need the custom recognizer. This gives you granular control over which recognizers are active in different environments (e.g., development vs. production).
</Info>

View File

@@ -1,12 +1,12 @@
---
title: "Deploy Crew"
description: "Deploying a Crew on CrewAI AMP"
title: "Deploy to AMP"
description: "Deploy your Crew or Flow to CrewAI AMP"
icon: "rocket"
mode: "wide"
---
<Note>
After creating a crew locally or through Crew Studio, the next step is
After creating a Crew or Flow locally (or through Crew Studio), the next step is
deploying it to the CrewAI AMP platform. This guide covers multiple deployment
methods to help you choose the best approach for your workflow.
</Note>
@@ -14,19 +14,26 @@ mode: "wide"
## Prerequisites
<CardGroup cols={2}>
<Card title="Crew Ready for Deployment" icon="users">
You should have a working crew either built locally or created through Crew
Studio
<Card title="Project Ready for Deployment" icon="check-circle">
You should have a working Crew or Flow that runs successfully locally.
Follow our [preparation guide](/en/enterprise/guides/prepare-for-deployment) to verify your project structure.
</Card>
<Card title="GitHub Repository" icon="github">
Your crew code should be in a GitHub repository (for GitHub integration
Your code should be in a GitHub repository (for GitHub integration
method)
</Card>
</CardGroup>
<Info>
**Crews vs Flows**: Both project types can be deployed as "automations" on CrewAI AMP.
The deployment process is the same, but they have different project structures.
See [Prepare for Deployment](/en/enterprise/guides/prepare-for-deployment) for details.
</Info>
## Option 1: Deploy Using CrewAI CLI
The CLI provides the fastest way to deploy locally developed crews to the Enterprise platform.
The CLI provides the fastest way to deploy locally developed Crews or Flows to the AMP platform.
The CLI automatically detects your project type from `pyproject.toml` and builds accordingly.
<Steps>
<Step title="Install CrewAI CLI">
@@ -128,7 +135,7 @@ crewai deploy remove <deployment_id>
## Option 2: Deploy Directly via Web Interface
You can also deploy your crews directly through the CrewAI AMP web interface by connecting your GitHub account. This approach doesn't require using the CLI on your local machine.
You can also deploy your Crews or Flows directly through the CrewAI AMP web interface by connecting your GitHub account. This approach doesn't require using the CLI on your local machine. The platform automatically detects your project type and handles the build appropriately.
<Steps>
@@ -282,68 +289,7 @@ For automated deployments in CI/CD pipelines, you can use the CrewAI API to trig
</Steps>
## ⚠️ Environment Variable Security Requirements
<Warning>
**Important**: CrewAI AMP has security restrictions on environment variable
names that can cause deployment failures if not followed.
</Warning>
### Blocked Environment Variable Patterns
For security reasons, the following environment variable naming patterns are **automatically filtered** and will cause deployment issues:
**Blocked Patterns:**
- Variables ending with `_TOKEN` (e.g., `MY_API_TOKEN`)
- Variables ending with `_PASSWORD` (e.g., `DB_PASSWORD`)
- Variables ending with `_SECRET` (e.g., `API_SECRET`)
- Variables ending with `_KEY` in certain contexts
**Specific Blocked Variables:**
- `GITHUB_USER`, `GITHUB_TOKEN`
- `AWS_REGION`, `AWS_DEFAULT_REGION`
- Various internal CrewAI system variables
### Allowed Exceptions
Some variables are explicitly allowed despite matching blocked patterns:
- `AZURE_AD_TOKEN`
- `AZURE_OPENAI_AD_TOKEN`
- `ENTERPRISE_ACTION_TOKEN`
- `CREWAI_ENTEPRISE_TOOLS_TOKEN`
### How to Fix Naming Issues
If your deployment fails due to environment variable restrictions:
```bash
# ❌ These will cause deployment failures
OPENAI_TOKEN=sk-...
DATABASE_PASSWORD=mypassword
API_SECRET=secret123
# ✅ Use these naming patterns instead
OPENAI_API_KEY=sk-...
DATABASE_CREDENTIALS=mypassword
API_CONFIG=secret123
```
### Best Practices
1. **Use standard naming conventions**: `PROVIDER_API_KEY` instead of `PROVIDER_TOKEN`
2. **Test locally first**: Ensure your crew works with the renamed variables
3. **Update your code**: Change any references to the old variable names
4. **Document changes**: Keep track of renamed variables for your team
<Tip>
If you encounter deployment failures with cryptic environment variable errors,
check your variable names against these patterns first.
</Tip>
### Interact with Your Deployed Crew
## Interact with Your Deployed Automation
Once deployment is complete, you can access your crew through:
@@ -387,7 +333,108 @@ The Enterprise platform also offers:
- **Custom Tools Repository**: Create, share, and install tools
- **Crew Studio**: Build crews through a chat interface without writing code
## Troubleshooting Deployment Failures
If your deployment fails, check these common issues:
### Build Failures
#### Missing uv.lock File
**Symptom**: Build fails early with dependency resolution errors
**Solution**: Generate and commit the lock file:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
<Warning>
The `uv.lock` file is required for all deployments. Without it, the platform
cannot reliably install your dependencies.
</Warning>
#### Wrong Project Structure
**Symptom**: "Could not find entry point" or "Module not found" errors
**Solution**: Verify your project matches the expected structure:
- **Both Crews and Flows**: Must have entry point at `src/project_name/main.py`
- **Crews**: Use a `run()` function as entry point
- **Flows**: Use a `kickoff()` function as entry point
See [Prepare for Deployment](/en/enterprise/guides/prepare-for-deployment) for detailed structure diagrams.
#### Missing CrewBase Decorator
**Symptom**: "Crew not found", "Config not found", or agent/task configuration errors
**Solution**: Ensure **all** crew classes use the `@CrewBase` decorator:
```python
from crewai import Agent  # needed for the Agent return type below
from crewai.project import CrewBase, agent, crew, task

@CrewBase  # This decorator is REQUIRED
class YourCrew():
    """Your crew description"""

    @agent
    def my_agent(self) -> Agent:
        return Agent(
            config=self.agents_config['my_agent'],  # type: ignore[index]
            verbose=True
        )

    # ... rest of crew definition
```
<Info>
This applies to standalone Crews AND crews embedded inside Flow projects.
Every crew class needs the decorator.
</Info>
#### Incorrect pyproject.toml Type
**Symptom**: Build succeeds but runtime fails, or unexpected behavior
**Solution**: Verify the `[tool.crewai]` section matches your project type:
```toml
# For Crew projects:
[tool.crewai]
type = "crew"
# For Flow projects:
[tool.crewai]
type = "flow"
```
### Runtime Failures
#### LLM Connection Failures
**Symptom**: API key errors, "model not found", or authentication failures
**Solution**:
1. Verify your LLM provider's API key is correctly set in environment variables
2. Ensure the environment variable names match what your code expects
3. Test locally with the exact same environment variables before deploying
#### Crew Execution Errors
**Symptom**: Crew starts but fails during execution
**Solution**:
1. Check the execution logs in the AMP dashboard (Traces tab)
2. Verify all tools have required API keys configured
3. Ensure agent configurations in `agents.yaml` are valid
4. Check task configurations in `tasks.yaml` for syntax errors
<Card title="Need Help?" icon="headset" href="mailto:support@crewai.com">
Contact our support team for assistance with deployment issues or questions
about the Enterprise platform.
about the AMP platform.
</Card>

View File

@@ -0,0 +1,305 @@
---
title: "Prepare for Deployment"
description: "Ensure your Crew or Flow is ready for deployment to CrewAI AMP"
icon: "clipboard-check"
mode: "wide"
---
<Note>
Before deploying to CrewAI AMP, it's crucial to verify your project is correctly structured.
Both Crews and Flows can be deployed as "automations," but they have different project structures
and requirements that must be met for successful deployment.
</Note>
## Understanding Automations
In CrewAI AMP, **automations** is the umbrella term for deployable Agentic AI projects. An automation can be either:
- **A Crew**: A standalone team of AI agents working together on tasks
- **A Flow**: An orchestrated workflow that can combine multiple crews, direct LLM calls, and procedural logic
Understanding which type you're deploying is essential because they have different project structures and entry points.
## Crews vs Flows: Key Differences
<CardGroup cols={2}>
<Card title="Crew Projects" icon="users">
Standalone AI agent teams with `crew.py` defining agents and tasks. Best for focused, collaborative tasks.
</Card>
<Card title="Flow Projects" icon="diagram-project">
Orchestrated workflows with embedded crews in a `crews/` folder. Best for complex, multi-stage processes.
</Card>
</CardGroup>
| Aspect | Crew | Flow |
|--------|------|------|
| **Project structure** | `src/project_name/` with `crew.py` | `src/project_name/` with `crews/` folder |
| **Main logic location** | `src/project_name/crew.py` | `src/project_name/main.py` (Flow class) |
| **Entry point function** | `run()` in `main.py` | `kickoff()` in `main.py` |
| **pyproject.toml type** | `type = "crew"` | `type = "flow"` |
| **CLI create command** | `crewai create crew name` | `crewai create flow name` |
| **Config location** | `src/project_name/config/` | `src/project_name/crews/crew_name/config/` |
| **Can contain other crews** | No | Yes (in `crews/` folder) |
## Project Structure Reference
### Crew Project Structure
When you run `crewai create crew my_crew`, you get this structure:
```
my_crew/
├── .gitignore
├── pyproject.toml # Must have type = "crew"
├── README.md
├── .env
├── uv.lock # REQUIRED for deployment
└── src/
└── my_crew/
├── __init__.py
├── main.py # Entry point with run() function
├── crew.py # Crew class with @CrewBase decorator
├── tools/
│ ├── custom_tool.py
│ └── __init__.py
└── config/
├── agents.yaml # Agent definitions
└── tasks.yaml # Task definitions
```
<Warning>
The nested `src/project_name/` structure is critical for Crews.
Placing files at the wrong level will cause deployment failures.
</Warning>
### Flow Project Structure
When you run `crewai create flow my_flow`, you get this structure:
```
my_flow/
├── .gitignore
├── pyproject.toml # Must have type = "flow"
├── README.md
├── .env
├── uv.lock # REQUIRED for deployment
└── src/
└── my_flow/
├── __init__.py
├── main.py # Entry point with kickoff() function + Flow class
├── crews/ # Embedded crews folder
│ └── poem_crew/
│ ├── __init__.py
│ ├── poem_crew.py # Crew with @CrewBase decorator
│ └── config/
│ ├── agents.yaml
│ └── tasks.yaml
└── tools/
├── __init__.py
└── custom_tool.py
```
<Info>
Both Crews and Flows use the `src/project_name/` structure.
The key difference is that Flows have a `crews/` folder for embedded crews,
while Crews have `crew.py` directly in the project folder.
</Info>
## Pre-Deployment Checklist
Use this checklist to verify your project is ready for deployment.
### 1. Verify pyproject.toml Configuration
Your `pyproject.toml` must include the correct `[tool.crewai]` section:
<Tabs>
<Tab title="For Crews">
```toml
[tool.crewai]
type = "crew"
```
</Tab>
<Tab title="For Flows">
```toml
[tool.crewai]
type = "flow"
```
</Tab>
</Tabs>
<Warning>
If the `type` doesn't match your project structure, the build will fail or
the automation won't run correctly.
</Warning>
### 2. Ensure uv.lock File Exists
CrewAI uses `uv` for dependency management. The `uv.lock` file ensures reproducible builds and is **required** for deployment.
```bash
# Generate or update the lock file
uv lock
# Verify it exists
ls -la uv.lock
```
If the file doesn't exist, run `uv lock` and commit it to your repository:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
### 3. Validate CrewBase Decorator Usage
**Every crew class must use the `@CrewBase` decorator.** This applies to:
- Standalone crew projects
- Crews embedded inside Flow projects
```python
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent
from typing import List

@CrewBase  # This decorator is REQUIRED
class MyCrew():
    """My crew description"""

    agents: List[BaseAgent]
    tasks: List[Task]

    @agent
    def my_agent(self) -> Agent:
        return Agent(
            config=self.agents_config['my_agent'],  # type: ignore[index]
            verbose=True
        )

    @task
    def my_task(self) -> Task:
        return Task(
            config=self.tasks_config['my_task']  # type: ignore[index]
        )

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )
```
<Warning>
If you forget the `@CrewBase` decorator, your deployment will fail with
errors about missing agents or tasks configurations.
</Warning>
### 4. Check Project Entry Points
Both Crews and Flows have their entry point in `src/project_name/main.py`:
<Tabs>
<Tab title="For Crews">
The entry point uses a `run()` function:
```python
# src/my_crew/main.py
from my_crew.crew import MyCrew

def run():
    """Run the crew."""
    inputs = {'topic': 'AI in Healthcare'}
    result = MyCrew().crew().kickoff(inputs=inputs)
    return result

if __name__ == "__main__":
    run()
```
</Tab>
<Tab title="For Flows">
The entry point uses a `kickoff()` function with a Flow class:
```python
# src/my_flow/main.py
from crewai.flow import Flow, listen, start
from my_flow.crews.poem_crew.poem_crew import PoemCrew

class MyFlow(Flow):
    @start()
    def begin(self):
        # Flow logic here
        result = PoemCrew().crew().kickoff(inputs={...})
        return result

def kickoff():
    """Run the flow."""
    MyFlow().kickoff()

if __name__ == "__main__":
    kickoff()
```
</Tab>
</Tabs>
### 5. Prepare Environment Variables
Before deployment, ensure you have:
1. **LLM API keys** ready (OpenAI, Anthropic, Google, etc.)
2. **Tool API keys** if using external tools (Serper, etc.)
<Tip>
Test your project locally with the same environment variables before deploying
to catch configuration issues early.
</Tip>
## Quick Validation Commands
Run these commands from your project root to quickly verify your setup:
```bash
# 1. Check project type in pyproject.toml
grep -A2 "\[tool.crewai\]" pyproject.toml
# 2. Verify uv.lock exists
ls -la uv.lock || echo "ERROR: uv.lock missing! Run 'uv lock'"
# 3. Verify src/ structure exists
ls -la src/*/main.py 2>/dev/null || echo "No main.py found in src/"
# 4. For Crews - verify crew.py exists
ls -la src/*/crew.py 2>/dev/null || echo "No crew.py found (required for Crew projects)"
# 5. For Flows - verify crews/ folder exists
ls -la src/*/crews/ 2>/dev/null || echo "No crews/ folder found (required for Flow projects)"
# 6. Check for CrewBase usage
grep -r "@CrewBase" . --include="*.py"
```
## Common Setup Mistakes
| Mistake | Symptom | Fix |
|---------|---------|-----|
| Missing `uv.lock` | Build fails during dependency resolution | Run `uv lock` and commit |
| Wrong `type` in pyproject.toml | Build succeeds but runtime fails | Change to correct type |
| Missing `@CrewBase` decorator | "Config not found" errors | Add decorator to all crew classes |
| Files at root instead of `src/` | Entry point not found | Move to `src/project_name/` |
| Missing `run()` or `kickoff()` | Cannot start automation | Add correct entry function |
## Next Steps
Once your project passes all checklist items, you're ready to deploy:
<Card title="Deploy to AMP" icon="rocket" href="/en/enterprise/guides/deploy-to-amp">
Follow the deployment guide to deploy your Crew or Flow to CrewAI AMP using
the CLI, web interface, or CI/CD integration.
</Card>

View File

@@ -1,43 +1,48 @@
---
title: Agent-to-Agent (A2A) Protocol
description: Enable CrewAI agents to delegate tasks to remote A2A-compliant agents for specialized handling
description: Agents delegate tasks to remote A2A agents and/or operate as A2A-compliant server agents.
icon: network-wired
mode: "wide"
---
## A2A Agent Delegation
CrewAI supports the Agent-to-Agent (A2A) protocol, allowing agents to delegate tasks to remote specialized agents. The agent's LLM automatically decides whether to handle a task directly or delegate to an A2A agent based on the task requirements.
<Note>
A2A delegation requires the `a2a-sdk` package. Install with: `uv add 'crewai[a2a]'` or `pip install 'crewai[a2a]'`
</Note>
CrewAI treats the [A2A protocol](https://a2a-protocol.org/latest/) as a first-class delegation primitive, enabling agents to delegate tasks, request information, and collaborate with remote agents, as well as act as A2A-compliant server agents.
In client mode, agents autonomously choose between local execution and remote delegation based on task requirements.
## How It Works
When an agent is configured with A2A capabilities:
1. The LLM analyzes each task
1. The Agent analyzes each task
2. It decides to either:
- Handle the task directly using its own capabilities
- Delegate to a remote A2A agent for specialized handling
3. If delegating, the agent communicates with the remote A2A agent through the protocol
4. Results are returned to the CrewAI workflow
<Note>
A2A delegation requires the `a2a-sdk` package. Install with: `uv add 'crewai[a2a]'` or `pip install 'crewai[a2a]'`
</Note>
## Basic Configuration
<Warning>
`crewai.a2a.config.A2AConfig` is deprecated and will be removed in v2.0.0. Use `A2AClientConfig` for connecting to remote agents and/or `A2AServerConfig` for exposing agents as servers.
</Warning>
Configure an agent for A2A delegation by setting the `a2a` parameter:
```python Code
from crewai import Agent, Crew, Task
from crewai.a2a import A2AConfig
from crewai.a2a import A2AClientConfig
agent = Agent(
role="Research Coordinator",
goal="Coordinate research tasks efficiently",
backstory="Expert at delegating to specialized research agents",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://example.com/.well-known/agent-card.json",
timeout=120,
max_turns=10
@@ -54,9 +59,9 @@ crew = Crew(agents=[agent], tasks=[task], verbose=True)
result = crew.kickoff()
```
## Configuration Options
## Client Configuration Options
The `A2AConfig` class accepts the following parameters:
The `A2AClientConfig` class accepts the following parameters:
<ParamField path="endpoint" type="str" required>
The A2A agent endpoint URL (typically points to `.well-known/agent-card.json`)
@@ -91,14 +96,34 @@ The `A2AConfig` class accepts the following parameters:
Update mechanism for receiving task status. Options: `StreamingConfig`, `PollingConfig`, or `PushNotificationConfig`.
</ParamField>
<ParamField path="transport_protocol" type="Literal['JSONRPC', 'GRPC', 'HTTP+JSON']" default="JSONRPC">
Transport protocol for A2A communication. Options: `JSONRPC` (default), `GRPC`, or `HTTP+JSON`.
</ParamField>
<ParamField path="accepted_output_modes" type="list[str]" default='["application/json"]'>
Media types the client can accept in responses.
</ParamField>
<ParamField path="supported_transports" type="list[str]" default='["JSONRPC"]'>
Ordered list of transport protocols the client supports.
</ParamField>
<ParamField path="use_client_preference" type="bool" default="False">
Whether to prioritize client transport preferences over server.
</ParamField>
<ParamField path="extensions" type="list[str]" default="[]">
Extension URIs the client supports.
</ParamField>
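Several of these client options can be combined on a single config. A minimal sketch, assuming the same example endpoint as above:
```python
from crewai.a2a import A2AClientConfig

# Prefer gRPC, but allow fallback to JSON-RPC if the server doesn't support it
config = A2AClientConfig(
    endpoint="https://example.com/.well-known/agent-card.json",
    transport_protocol="GRPC",
    supported_transports=["GRPC", "JSONRPC"],
    use_client_preference=True,
    accepted_output_modes=["application/json", "text/plain"],
)
```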
## Authentication
For A2A agents that require authentication, use one of the provided auth schemes:
<Tabs>
<Tab title="Bearer Token">
```python Code
from crewai.a2a import A2AConfig
```python bearer_token_auth.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.auth import BearerTokenAuth
agent = Agent(
@@ -106,18 +131,18 @@ agent = Agent(
goal="Coordinate tasks with secured agents",
backstory="Manages secure agent communications",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://secure-agent.example.com/.well-known/agent-card.json",
auth=BearerTokenAuth(token="your-bearer-token"),
timeout=120
)
)
```
```
</Tab>
<Tab title="API Key">
```python Code
from crewai.a2a import A2AConfig
```python api_key_auth.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.auth import APIKeyAuth
agent = Agent(
@@ -125,7 +150,7 @@ agent = Agent(
goal="Coordinate with API-based agents",
backstory="Manages API-authenticated communications",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://api-agent.example.com/.well-known/agent-card.json",
auth=APIKeyAuth(
api_key="your-api-key",
@@ -135,12 +160,12 @@ agent = Agent(
timeout=120
)
)
```
```
</Tab>
<Tab title="OAuth2">
```python Code
from crewai.a2a import A2AConfig
```python oauth2_auth.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.auth import OAuth2ClientCredentials
agent = Agent(
@@ -148,7 +173,7 @@ agent = Agent(
goal="Coordinate with OAuth-secured agents",
backstory="Manages OAuth-authenticated communications",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://oauth-agent.example.com/.well-known/agent-card.json",
auth=OAuth2ClientCredentials(
token_url="https://auth.example.com/oauth/token",
@@ -159,12 +184,12 @@ agent = Agent(
timeout=120
)
)
```
```
</Tab>
<Tab title="HTTP Basic">
```python Code
from crewai.a2a import A2AConfig
```python http_basic_auth.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.auth import HTTPBasicAuth
agent = Agent(
@@ -172,7 +197,7 @@ agent = Agent(
goal="Coordinate with basic auth agents",
backstory="Manages basic authentication communications",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://basic-agent.example.com/.well-known/agent-card.json",
auth=HTTPBasicAuth(
username="your-username",
@@ -181,7 +206,7 @@ agent = Agent(
timeout=120
)
)
```
```
</Tab>
</Tabs>
@@ -190,7 +215,7 @@ agent = Agent(
Configure multiple A2A agents for delegation by passing a list:
```python Code
from crewai.a2a import A2AConfig
from crewai.a2a import A2AClientConfig
from crewai.a2a.auth import BearerTokenAuth
agent = Agent(
@@ -199,11 +224,11 @@ agent = Agent(
backstory="Expert at delegating to the right specialist",
llm="gpt-4o",
a2a=[
A2AConfig(
A2AClientConfig(
endpoint="https://research.example.com/.well-known/agent-card.json",
timeout=120
),
A2AConfig(
A2AClientConfig(
endpoint="https://data.example.com/.well-known/agent-card.json",
auth=BearerTokenAuth(token="data-token"),
timeout=90
@@ -219,7 +244,7 @@ The LLM will automatically choose which A2A agent to delegate to based on the ta
Control how agent connection failures are handled using the `fail_fast` parameter:
```python Code
from crewai.a2a import A2AConfig
from crewai.a2a import A2AClientConfig
# Fail immediately on connection errors (default)
agent = Agent(
@@ -227,7 +252,7 @@ agent = Agent(
goal="Coordinate research tasks",
backstory="Expert at delegation",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://research.example.com/.well-known/agent-card.json",
fail_fast=True
)
@@ -240,11 +265,11 @@ agent = Agent(
backstory="Expert at working with available resources",
llm="gpt-4o",
a2a=[
A2AConfig(
A2AClientConfig(
endpoint="https://primary.example.com/.well-known/agent-card.json",
fail_fast=False
),
A2AConfig(
A2AClientConfig(
endpoint="https://backup.example.com/.well-known/agent-card.json",
fail_fast=False
)
@@ -263,8 +288,8 @@ Control how your agent receives task status updates from remote A2A agents:
<Tabs>
<Tab title="Streaming (Default)">
```python Code
from crewai.a2a import A2AConfig
```python streaming_config.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.updates import StreamingConfig
agent = Agent(
@@ -272,17 +297,17 @@ agent = Agent(
goal="Coordinate research tasks",
backstory="Expert at delegation",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://research.example.com/.well-known/agent-card.json",
updates=StreamingConfig()
)
)
```
```
</Tab>
<Tab title="Polling">
```python Code
from crewai.a2a import A2AConfig
```python polling_config.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.updates import PollingConfig
agent = Agent(
@@ -290,7 +315,7 @@ agent = Agent(
goal="Coordinate research tasks",
backstory="Expert at delegation",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://research.example.com/.well-known/agent-card.json",
updates=PollingConfig(
interval=2.0,
@@ -299,12 +324,12 @@ agent = Agent(
)
)
)
```
```
</Tab>
<Tab title="Push Notifications">
```python Code
from crewai.a2a import A2AConfig
```python push_notifications_config.py lines
from crewai.a2a import A2AClientConfig
from crewai.a2a.updates import PushNotificationConfig
agent = Agent(
@@ -312,19 +337,137 @@ agent = Agent(
goal="Coordinate research tasks",
backstory="Expert at delegation",
llm="gpt-4o",
a2a=A2AConfig(
a2a=A2AClientConfig(
endpoint="https://research.example.com/.well-known/agent-card.json",
updates=PushNotificationConfig(
url={base_url}/a2a/callback",
url="{base_url}/a2a/callback",
token="your-validation-token",
timeout=300.0
)
)
)
```
```
</Tab>
</Tabs>
## Exposing Agents as A2A Servers
You can expose your CrewAI agents as A2A-compliant servers, allowing other A2A clients to delegate tasks to them.
### Server Configuration
Add an `A2AServerConfig` to your agent to enable server capabilities:
```python a2a_server_agent.py lines
from crewai import Agent
from crewai.a2a import A2AServerConfig
agent = Agent(
    role="Data Analyst",
    goal="Analyze datasets and provide insights",
    backstory="Expert data scientist with statistical analysis skills",
    llm="gpt-4o",
    a2a=A2AServerConfig(url="https://your-server.com")
)
```
### Server Configuration Options
<ParamField path="name" type="str" default="None">
Human-readable name for the agent. Defaults to the agent's role if not provided.
</ParamField>
<ParamField path="description" type="str" default="None">
Human-readable description. Defaults to the agent's goal and backstory if not provided.
</ParamField>
<ParamField path="version" type="str" default="1.0.0">
Version string for the agent card.
</ParamField>
<ParamField path="skills" type="list[AgentSkill]" default="[]">
List of agent skills. Auto-generated from agent tools if not provided.
</ParamField>
<ParamField path="capabilities" type="AgentCapabilities" default="AgentCapabilities(streaming=True, push_notifications=False)">
Declaration of optional capabilities supported by the agent.
</ParamField>
<ParamField path="default_input_modes" type="list[str]" default='["text/plain", "application/json"]'>
Supported input MIME types.
</ParamField>
<ParamField path="default_output_modes" type="list[str]" default='["text/plain", "application/json"]'>
Supported output MIME types.
</ParamField>
<ParamField path="url" type="str" default="None">
Preferred endpoint URL. If set, overrides the URL passed to `to_agent_card()`.
</ParamField>
<ParamField path="preferred_transport" type="Literal['JSONRPC', 'GRPC', 'HTTP+JSON']" default="JSONRPC">
Transport protocol for the preferred endpoint.
</ParamField>
<ParamField path="protocol_version" type="str" default="0.3">
A2A protocol version this agent supports.
</ParamField>
<ParamField path="provider" type="AgentProvider" default="None">
Information about the agent's service provider.
</ParamField>
<ParamField path="documentation_url" type="str" default="None">
URL to the agent's documentation.
</ParamField>
<ParamField path="icon_url" type="str" default="None">
URL to an icon for the agent.
</ParamField>
<ParamField path="additional_interfaces" type="list[AgentInterface]" default="[]">
Additional supported interfaces (transport and URL combinations).
</ParamField>
<ParamField path="security" type="list[dict[str, list[str]]]" default="[]">
Security requirement objects for all agent interactions.
</ParamField>
<ParamField path="security_schemes" type="dict[str, SecurityScheme]" default="{}">
Security schemes available to authorize requests.
</ParamField>
<ParamField path="supports_authenticated_extended_card" type="bool" default="False">
Whether agent provides extended card to authenticated users.
</ParamField>
<ParamField path="signatures" type="list[AgentCardSignature]" default="[]">
JSON Web Signatures for the AgentCard.
</ParamField>
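Most of these fields are optional metadata that ends up on the generated agent card. A minimal sketch combining a few of the string-valued options (the values are illustrative):
```python
from crewai import Agent
from crewai.a2a import A2AServerConfig

agent = Agent(
    role="Data Analyst",
    goal="Analyze datasets and provide insights",
    backstory="Expert data scientist with statistical analysis skills",
    llm="gpt-4o",
    a2a=A2AServerConfig(
        url="https://your-server.com",
        name="Data Analyst",
        description="Statistical analysis over tabular datasets",
        version="1.1.0",
        documentation_url="https://your-server.com/docs",
        preferred_transport="JSONRPC",
    ),
)
```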
### Combined Client and Server
An agent can act as both client and server by providing both configurations:
```python Code
from crewai import Agent
from crewai.a2a import A2AClientConfig, A2AServerConfig
agent = Agent(
    role="Research Coordinator",
    goal="Coordinate research and serve analysis requests",
    backstory="Expert at delegation and analysis",
    llm="gpt-4o",
    a2a=[
        A2AClientConfig(
            endpoint="https://specialist.example.com/.well-known/agent-card.json",
            timeout=120
        ),
        A2AServerConfig(url="https://your-server.com")
    ]
)
```
## Best Practices
<CardGroup cols={2}>

View File

@@ -7,6 +7,10 @@ mode: "wide"
## Overview
<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**. Make sure to update your installation before using this feature.
</Note>
The `@human_feedback` decorator enables human-in-the-loop (HITL) workflows directly within CrewAI Flows. It allows you to pause flow execution, present output to a human for review, collect their feedback, and optionally route to different listeners based on the feedback outcome.
This is particularly valuable for:

View File

@@ -11,10 +11,10 @@ Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelli
CrewAI offers two main approaches for implementing human-in-the-loop workflows:
| Approach | Best For | Integration |
|----------|----------|-------------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/en/learn/human-feedback-in-flows) |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide |
| Approach | Best For | Integration | Version |
|----------|----------|-------------|---------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/en/learn/human-feedback-in-flows) | **1.8.0+** |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide | - |
<Tip>
If you're building flows and want to add human review steps with routing based on feedback, check out the [Human Feedback in Flows](/en/learn/human-feedback-in-flows) guide for the `@human_feedback` decorator.

View File

@@ -0,0 +1,115 @@
---
title: Galileo
description: Galileo integration for CrewAI tracing and evaluation
icon: telescope
mode: "wide"
---
## Overview
This guide demonstrates how to integrate **Galileo** with **CrewAI**
for comprehensive tracing and Evaluation Engineering.
By the end of this guide, you will be able to trace your CrewAI agents,
monitor their performance, and evaluate their behaviour with
Galileo's powerful observability platform.
> **What is Galileo?** [Galileo](https://galileo.ai) is an AI evaluation and observability
platform that delivers end-to-end tracing, evaluation,
and monitoring for AI applications. It enables teams to capture ground truth,
create robust guardrails, and run systematic experiments with
built-in experiment tracking and performance analytics—ensuring reliability,
transparency, and continuous improvement across the AI lifecycle.
## Getting started
This tutorial follows the [CrewAI quickstart](/en/quickstart) and shows how to add
Galileo's [CrewAIEventListener](https://v2docs.galileo.ai/sdk-api/python/reference/handlers/crewai/handler),
an event handler.
For more information, see Galileo's
[Add Galileo to a CrewAI Application](https://v2docs.galileo.ai/how-to-guides/third-party-integrations/add-galileo-to-crewai/add-galileo-to-crewai)
how-to guide.
> **Note** This tutorial assumes you have completed the [CrewAI quickstart](/en/quickstart).
If you want a complete, end-to-end example, see the Galileo
[CrewAI sdk-example repo](https://github.com/rungalileo/sdk-examples/tree/main/python/agent/crew-ai).
### Step 1: Install dependencies
Create a virtual environment using your preferred method, then install the Galileo SDK inside it using your preferred tool:
```bash
uv add galileo
```
### Step 2: Add the following to the `.env` file from the [CrewAI quickstart](/en/quickstart)
```bash
# Your Galileo API key
GALILEO_API_KEY="your-galileo-api-key"
# Your Galileo project name
GALILEO_PROJECT="your-galileo-project-name"
# The name of the Log stream you want to use for logging
GALILEO_LOG_STREAM="your-galileo-log-stream"
```
### Step 3: Add the Galileo event listener
To enable logging with Galileo, you need to create an instance of the `CrewAIEventListener`.
Import the Galileo CrewAI handler by adding the following code at the top of your `main.py` file:
```python
from galileo.handlers.crewai.handler import CrewAIEventListener
```
At the start of your `run()` function, create the event listener:
```python
def run():
    # Create the event listener
    CrewAIEventListener()
    # The rest of your existing code goes here
```
When you create the listener instance, it is automatically
registered with CrewAI.
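Putting the pieces together, a minimal `main.py` might look like the following sketch, assuming a hypothetical `MyCrew` class from your quickstart project:
```python
# src/my_project/main.py
from galileo.handlers.crewai.handler import CrewAIEventListener

from my_project.crew import MyCrew  # hypothetical crew class from your quickstart project


def run():
    # Instantiating the listener registers it with CrewAI automatically
    CrewAIEventListener()

    inputs = {"topic": "AI LLMs"}
    MyCrew().crew().kickoff(inputs=inputs)


if __name__ == "__main__":
    run()
```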
### Step 4: Run your crew
Run your crew with the CrewAI CLI:
```bash
crewai run
```
### Step 5: View the traces in Galileo
Once your crew has finished, the traces will be flushed and appear in Galileo.
![Galileo trace view](/images/galileo-trace-veiw.png)
## Understanding the Galileo Integration
Galileo integrates with CrewAI by registering an event listener
that captures Crew execution events (e.g., agent actions, tool calls, model responses)
and forwards them to Galileo for observability and evaluation.
### Understanding the event listener
Creating a `CrewAIEventListener()` instance is all that's required to enable Galileo for a CrewAI run. When instantiated, the listener:
- Automatically registers itself with CrewAI
- Reads Galileo configuration from environment variables
- Logs all run data to the Galileo project and log stream specified by
`GALILEO_PROJECT` and `GALILEO_LOG_STREAM`
No additional configuration or code changes are required.

View File

@@ -4,6 +4,545 @@ description: "CrewAI product updates, improvements, and bug fixes"
icon: "clock"
mode: "wide"
---
<Update label="January 8, 2026">
## v1.8.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.8.0)
## What's Changed
### Features
- Add native async chains for a2a
- Add a2a update mechanisms (poll/stream/push) with handlers and configs
- Introduce global flow settings for human-in-the-loop feedback
- Add streaming tool call events and fix provider ID tracking
- Introduce production-ready Flows and Crews architecture
- Add HITL for Flows
- Improve EventListener and TraceCollectionListener for enhanced event handling
### Bug Fixes
- Handle missing a2a dependency as optional
- Fix error imports for WorkOS login polling
- Fix incorrect trigger names in sample docs
### Documentation
- Update webhook streaming docs
- Adjust docs language from AOP to AMP
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta
</Update>
<Update label="December 19, 2025">
## v1.7.2
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.2)
## What's Changed
### Bug Fixes
- Resolve connection issues
### Documentation
- Update the api-reference/status docs page
### Contributors
@greysonlalonde, @heitorado, @lorenzejay, @lucasgomide
</Update>
<Update label="December 16, 2025">
## v1.7.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.1)
## What's Changed
### Improvements
- Add a `--no-commit` flag to the bump command
- Use JSON schemas for tool argument serialization
### Bug Fixes
- Fix error message display in responses when tool repository login fails
- Fix graceful shutdown of futures during async task execution
- Fix task ordering by adding an index
- Fix platform compatibility checks for Windows signals
- Fix the RPM controller timer to prevent process hangs
- Fix token usage recording and validate response models from streams
### Documentation
- Add translated docs for async
- Add AOP Deploy API docs
- Add agent handler connector docs
- Add native async docs
### Contributors
@Llamrei, @dragosmc, @gilfeig, @greysonlalonde, @heitorado, @lorenzejay, @mattatcha, @vinibrsl
</Update>
<Update label="December 9, 2025">
## v1.7.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.0)
## What's Changed
### Features
- Add async flow kickoff
- Add async crew support
- Add async task support
- Add async knowledge support
- Add async memory support
- Add async support for tools and agent executors; improve types and docs
- Implement a2a extensions API and async agent card caching; fix task propagation and streaming
- Add native async tool support
- Add async llm support
- Create sys event types and handlers
### Bug Fixes
- Ensure NoneTypes are not passed to otel
- Fix a deadlock in token storage file operations
- Ensure otel spans are closed
- Use HuggingFaceEmbeddingFunction for embeddings, update keys, and add tests
- Ensure supports_tools is true for all supported Anthropic models
- Ensure hooks work in lite agent flows
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="November 29, 2025">
## v1.6.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.1)
## What's Changed
### Bug Fixes
- Fix ChatCompletionsClient calls to work properly
- Ensure async methods are runnable for annotations
- Fix RagTool.add parameters, add types and tests
- Remove an invalid parameter from the SSE client
- Delete the 'oauth2_extra' setting in the 'crewai config reset' command
### Refactors
- Improve model validation and provider inference in the LLM class
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @lorenzejay
</Update>
<Update label="November 25, 2025">
## v1.6.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.0)
## What's Changed
### Features
- Add streaming results support to flows and crews
- Add gemini-3-pro-preview
- Support CLI login with Entra ID
- Add Merge Agent Handler tools
- Improve flow event state management
### Bug Fixes
- Ensure a custom rag storage persistence path is set when passed
- Make fuzzy returns stricter and surface type warnings
- Re-add the openai response_format parameter and add tests
- Fix rag tool embedding configuration
- Ensure the flow execution start panel is not shown in plots
### Documentation
- Update references from AMP to AOP in docs
- Update AMP to AOP
### Contributors
@Vidit-Ostwal, @gilfeig, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @markmcd
</Update>
<Update label="November 22, 2025">
## v0.203.2
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.2)
## What's Changed
- Hotfix version bump from 0.203.1 to 0.203.2
</Update>
<Update label="November 16, 2025">
## v1.5.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.5.0)
## What's Changed
### Features
- Add an a2a trust-remote-completion-status flag
- Fetch and store more data for Okta authorization servers
- Implement before/after LLM call hooks in CrewAgentExecutor
- Expose messages on TaskOutput and LiteAgentOutputs
- Improve schema descriptions in QdrantVectorSearchTool
### Bug Fixes
- Ensure tracing instrumentation flags are applied correctly
- Fix custom tool docs links and add a Mintlify broken-links job
### Documentation
- Enhance task guardrail docs with LLM-based validation support
### Contributors
@danielfsbarreto, @greysonlalonde, @heitorado, @lorenzejay, @theCyberTech
</Update>
<Update label="November 7, 2025">
## v1.4.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.1)
## What's Changed
### Bug Fixes
- Fix agent max-iteration handling
- Resolve routing of LLM model syntax to the corresponding provider
### Contributors
@greysonlalonde
</Update>
<Update label="November 7, 2025">
## v1.4.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.0)
## What's Changed
### Features
- Add non-AST plot path support
- Implement first-class support for MCP
- Add Pydantic validation dunders to BaseInterceptor
- Add LLM message interceptor hook support
- Cache i18n prompts for efficient use
- Enhance QdrantVectorSearchTool
### Bug Fixes
- Fix issues with keeping stopwords updated
- Resolve unpicklable values in flow state
- Ensure lite agents recover on validation errors
- Fix callback argument hashing so caching works
- Allow adding RAG source content from valid URLs
- Make plot node selection smoother
- Fix duplicate document IDs for knowledge
### Refactors
- Improve MCP tool execution handling with concurrent futures
- Simplify flow handling, types, and logging; update UI and tests
- Refactor stop-word management into a property
### Documentation
- Migrate embedder to embedding_model and require vectordb across tool docs; add provider examples (en/ko/pt-BR)
### Contributors
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="November 1, 2025">
## v1.3.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.3.0)
## What's Changed
### Features
- Refactor flow handling, types, and logging
- Enhance QdrantVectorSearchTool
### Bug Fixes
- Fix Firecrawl tools and add tests
- Refactor use_stop_words into a property and add stop-word checks
### Documentation
- Migrate embedder to embedding_model and require vectordb across tool docs
- Add provider examples in English, Korean, and Portuguese
### Refactors
- Improve flow handling and UI updates
### Contributors
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="October 27, 2025">
## v1.2.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.1)
## What's Changed
### Features
- Add Datadog integration support
- Support apps and mcps in liteagent
### Documentation
- Describe the required environment variables for invoking Platform tools for each integration
- Add Datadog integration docs
### Contributors
@barieom, @lorenzejay, @lucasgomide, @sabrenner
</Update>
<Update label="October 24, 2025">
## v1.2.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.0)
## What's Changed
### Bug Fixes
- Update default LLM models and improve error logging in LLM utilities
- Change flow visualization directory and method inspection
### Removals
- Remove aisuite
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="October 21, 2025">
## v1.1.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.1.0)
## What's Changed
### Features
- Enhance InternalInstructor to support multiple LLM providers
- Implement a mypy plugin foundation
- Improve QdrantVectorSearchTool
### Bug Fixes
- Fix broken integration docs links
- Fix double tracing calls and add types
- Pin template versions to latest
### Documentation
- Update LLM integration details and examples
### Refactors
- Improve CrewBase typing
### Contributors
@cwarre33, @danielfsbarreto, @greysonlalonde, @lorenzejay
</Update>
<Update label="October 20, 2025">
## v1.0.0
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0)
## What's Changed
### Features
- Bump version to 1.0.0
- Enhance knowledge and guardrail event handling in the Agent class
- Inject tool repository credentials into the crewai run command
### Bug Fixes
- Preserve nested conditional structures in Flow decorators
- Add standard print parameters to the Printer.print method
- Fix errors when input() is unavailable
- Add a 10-second leeway to JWT decoding
- Revert an incorrect cron schedule
- Fix a cron schedule to run every 5 days on specific dates
- Use the system PATH for the Docker binary instead of a hardcoded path
- Add CodeQL configuration to correctly exclude template directories
### Documentation
- Update the security policy for vulnerability reporting
- Add a guide for capturing telemetry logs in CrewAI AMP
- Add the missing /resume file
- Clarify webhook URL parameters in HITL workflows
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta, @theCyberTech
</Update>
<Update label="October 18, 2025">
## v1.0.0b3 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b3)
## What's Changed
### Features
- Enhance task guardrail functionality and validation
- Improve native SDK import support
- Add Azure native tests
- Enhance the BedrockCompletion class with advanced features
- Enhance the GeminiCompletion class with client parameter support
- Enhance the AnthropicCompletion class with additional client parameters
### Bug Fixes
- Preserve nested conditional structures in Flow decorators
- Add standard print parameters to the Printer.print method
- Remove stdout printing and improve test determinism
### Refactors
- Convert the project module to metaclasses with full typing
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="October 16, 2025">
## v1.0.0b2 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b2)
## What's Changed
### Features
- Enhance the OpenAICompletion class with additional client parameters
- Improve event bus thread safety and async support
- Inject tool repository credentials into the crewai run command
### Bug Fixes
- Fix errors raised when input() is unavailable
- Add a 10-second leeway to JWT decoding
- Fix copy and NOT_SPECIFIED checks in task.py
### Documentation
- Ensure CREWAI_PLATFORM_INTEGRATION_TOKEN is mentioned in docs
- Update trigger docs
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="October 14, 2025">
## v1.0.0b1 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b1)
## What's Changed
### Features
- Enhance the OpenAICompletion class with additional client parameters
- Improve event bus thread safety and async support
- Implement Bedrock LLM integration
### Bug Fixes
- Fix missing input() availability issues
- Resolve JWT decoding errors by adding a 10-second leeway
- Inject tool repository credentials into the crewai run command
- Fix copy and NOT_SPECIFIED checks in task.py
### Documentation
- Ensure CREWAI_PLATFORM_INTEGRATION_TOKEN is mentioned in docs
- Update trigger docs
### Contributors
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="October 13, 2025">
## v0.203.1
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.1)
## What's Changed
### Core Improvements & Fixes
- Fix tool repository credential injection for the `crewai run` command
- Add a 10-second leeway to JWT decoding to reduce token validation errors
- Fix a cron schedule intended to run tasks every 5 days on specific dates (later reverted)
### Docs & Guides
- Update the security policy to clarify the vulnerability reporting process
</Update>
<Update label="October 9, 2025">
## v1.0.0a4 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a4)
## What's Changed
### Features
- Enhance knowledge and guardrail event handling in the Agent class
- Introduce trigger list and run commands for local development
- Update docs with the new approach to consuming Platform Actions
- Add a guide for capturing telemetry logs in CrewAI AMP
### Bug Fixes
- Revert an incorrect cron schedule
- Fix a cron schedule to run every 5 days on specific dates
- Remove duplicate rows and add explicit environment variables
### Contributors
@greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta, @theCyberTech
</Update>
<Update label="October 7, 2025">
## v1.0.0a3 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a3)
## What's Changed
### Features
- Add agent support for platform actions
- Add an interpreter argument to the code executor tool
- Direct support for running platform apps
### Documentation
- Add platform actions docs
- Add stdio and sse transport types to MCP docs
- Update the AWS model list
### Contributors
@greysonlalonde, @heitorado, @lorenzejay, @lucasgomide
</Update>
<Update label="October 3, 2025">
## v1.0.0a2 (pre-release)
[View release on GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a2)
## What's Changed
### Core Improvements & Fixes
- Update CI for the monorepo
- Update the default Anthropic model to claude-sonnet-4-20250514
- Fix tests for the model updates
### Contributors
@greysonlalonde, @lorenzejay
</Update>
<Update label="September 30, 2025">
## v1.0.0a1

View File

@@ -567,6 +567,10 @@ Fourth method running
### Human in the Loop (Human Feedback)
<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**.
</Note>
The `@human_feedback` decorator enables human-in-the-loop workflows that pause flow execution to collect human feedback. This is useful for approval gates, quality reviews, and decision points that require human judgment.
```python Code

View File

@@ -107,7 +107,7 @@ There are several places in your CrewAI code where you can specify the model to use
## Provider Configuration Examples
CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities.
This section provides detailed examples to help you select, configure, and optimize the LLM that best fits your project's needs.
<AccordionGroup>
@@ -153,8 +153,8 @@ CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities
</Accordion>
<Accordion title="Meta-Llama">
Meta's Llama API provides access to Meta's family of large language models.
The API is available at the [Meta Llama API](https://llama.developer.meta.com?utm_source=partner-crewai&utm_medium=website).
Set the following environment variables in your `.env` file:
```toml Code
@@ -207,11 +207,20 @@ CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities
Set your API key in your `.env` file. If you need a key, or want to find an existing one, check [AI Studio](https://aistudio.google.com/apikey).
```toml .env
# https://ai.google.dev/gemini-api/docs/api-key
# When using the Gemini API (either of the following)
GOOGLE_API_KEY=<your-api-key>
GEMINI_API_KEY=<your-api-key>
# When using Vertex AI Express mode (API key authentication)
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
# When using a Vertex AI service account
GOOGLE_CLOUD_PROJECT=<your-project-id>
GOOGLE_CLOUD_LOCATION=<location> # Default: us-central1
```
Example usage in your CrewAI project:
**Basic usage:**
```python Code
from crewai import LLM
@@ -221,6 +230,34 @@ CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities
)
```
**Vertex AI Express mode (API key authentication):**
Vertex AI Express mode lets you use Vertex AI with simple API key authentication instead of service account credentials. It's the fastest way to get started with Vertex AI.
To enable Express mode, set both environment variables in your `.env` file:
```toml .env
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
```
Then use the LLM as usual:
```python Code
from crewai import LLM

llm = LLM(
    model="gemini/gemini-2.0-flash",
    temperature=0.7
)
```
<Info>
To get an Express mode API key:
- New Google Cloud users: get an [Express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
- Existing Google Cloud users: get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys)
For details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
</Info>
### Gemini models
Google offers a range of powerful models optimized for different use cases.
@@ -476,7 +513,7 @@ CrewAI supports a wide range of LLM providers, each offering unique features, authentication methods, and model capabilities
<Accordion title="Local NVIDIA NIM Deployed using WSL2">
NVIDIA NIM lets you run powerful LLMs locally on Windows machines via WSL2 (Windows Subsystem for Linux).
This approach leverages your NVIDIA GPU for private, secure, cost-effective AI inference without relying on cloud services.
Ideal for development, testing, or production environments that require data privacy or offline capabilities.
@@ -954,4 +991,4 @@ Learn how to get the most out of your LLM configuration:
llm = LLM(model="openai/gpt-4o")  # 128K tokens
```
</Tab>
</Tabs>

View File

@@ -128,7 +128,7 @@ Consider the following when deploying a Flow:
### CrewAI Enterprise
The easiest way to deploy a Flow is with CrewAI Enterprise, which handles infrastructure, authentication, and monitoring for you.
To get started, check the [deployment guide](/ko/enterprise/guides/deploy-crew).
To get started, check the [deployment guide](/ko/enterprise/guides/deploy-to-amp).
```bash
crewai deploy create

View File

@@ -91,7 +91,7 @@ Deploy quickly without Git by uploading a ZIP package of your project
## Related Documentation
<CardGroup cols={3}>
<Card title="Deploy a Crew" href="/ko/enterprise/guides/deploy-crew" icon="rocket">
<Card title="Deploy a Crew" href="/ko/enterprise/guides/deploy-to-amp" icon="rocket">
Deploy a crew from GitHub or a ZIP file
</Card>
<Card title="Automation Triggers" href="/ko/enterprise/guides/automation-triggers" icon="trigger">

View File

@@ -79,7 +79,7 @@ Crew Studio lets you build automations from scratch with natural language and a visual workflow editor
<Card title="Build a Crew" href="/ko/enterprise/guides/build-crew" icon="paintbrush">
Build your crew.
</Card>
<Card title="Deploy a Crew" href="/ko/enterprise/guides/deploy-crew" icon="rocket">
<Card title="Deploy a Crew" href="/ko/enterprise/guides/deploy-to-amp" icon="rocket">
Deploy a crew from GitHub or a ZIP file.
</Card>
<Card title="Export React Component" href="/ko/enterprise/guides/react-component-export" icon="download">
<Card title="React 컴포넌트 내보내기" href="/ko/enterprise/guides/react-component-export" icon="download">

View File

@@ -0,0 +1,342 @@
---
title: "PII Redaction for Traces"
description: "Automatically redact sensitive data from crew and flow execution traces"
icon: "lock"
mode: "wide"
---
## Overview
PII redaction is a CrewAI AMP feature that automatically detects and masks personally identifiable information (PII) in crew and flow execution traces. It ensures sensitive data such as credit card numbers, social security numbers, email addresses, and names is never exposed in CrewAI AMP traces. You can also create custom recognizers to protect organization-specific data.
<Info>
PII redaction is available on the Enterprise plan.
Your deployment must be on version 1.8.0 or higher.
</Info>
<Frame>
![PII redaction overview](/images/enterprise/pii_mask_recognizer_trace_example.png)
</Frame>
## Why PII Redaction Matters
When you run AI agents in production, sensitive information often flows through your crews:
- Customer data from CRM integrations
- Financial information from payment processors
- Personal details from form submissions
- Internal employee data
Without proper redaction, this data appears in traces, making compliance with regulations like GDPR, HIPAA, and PCI-DSS difficult. PII redaction solves this by automatically masking sensitive data before it is stored in traces.
## How It Works
1. **Detection** - Scans trace event data for known PII patterns
2. **Classification** - Identifies the type of sensitive data (credit card, SSN, email, etc.)
3. **Masking/Redaction** - Replaces sensitive data with masked values based on your configuration
```
Original: "Contact john.doe@company.com or call 555-123-4567"
Redacted: "Contact <EMAIL_ADDRESS> or call <PHONE_NUMBER>"
```
## Enabling PII Redaction
<Info>
You must be on the Enterprise plan with a deployment on version 1.8.0 or higher to use this feature.
</Info>
<Steps>
<Step title="Go to your crew settings">
In the CrewAI AMP dashboard, select your deployed crew, open one of your deployments/automations, then go to **Settings** → **PII Protection**.
</Step>
<Step title="Enable PII protection">
Toggle **PII Redaction for Traces** on. This activates automatic scanning and redaction of trace data.
<Info>
PII redaction must be enabled manually for each deployment.
</Info>
<Frame>
![Enable PII redaction](/images/enterprise/pii_mask_recognizer_enable.png)
</Frame>
</Step>
<Step title="Configure entity types">
Select which PII types to detect and redact. Each entity can be enabled or disabled individually.
<Frame>
![Configure entities](/images/enterprise/pii_mask_recognizer_supported_entities.png)
</Frame>
</Step>
<Step title="Save">
Save your configuration. PII redaction takes effect on all subsequent crew executions; no redeployment is required.
</Step>
</Steps>
## Supported Entity Types
CrewAI supports the following PII entity types, organized by category.
### Global Entities
| Entity | Description | Example |
|--------|-------------|---------|
| `CREDIT_CARD` | Credit/debit card numbers | "4111-1111-1111-1111" |
| `CRYPTO` | Cryptocurrency wallet addresses | "bc1qxy2kgd..." |
| `DATE_TIME` | Dates and times | "January 15, 2024" |
| `EMAIL_ADDRESS` | Email addresses | "john@example.com" |
| `IBAN_CODE` | International bank account numbers | "DE89 3704 0044 0532 0130 00" |
| `IP_ADDRESS` | IPv4 and IPv6 addresses | "192.168.1.1" |
| `LOCATION` | Geographic locations | "New York City" |
| `MEDICAL_LICENSE` | Medical license numbers | "MD12345" |
| `NRP` | Nationality, religious, or political groups | - |
| `PERSON` | Person names | "John Doe" |
| `PHONE_NUMBER` | Phone numbers in various formats | "+82 (10) 1234-5678" |
| `URL` | Web URLs | "https://example.com" |
### US-Specific Entities
| Entity | Description | Example |
|--------|-------------|---------|
| `US_BANK_NUMBER` | US bank account numbers | "1234567890" |
| `US_DRIVER_LICENSE` | US driver's license numbers | "D1234567" |
| `US_ITIN` | Individual Taxpayer Identification Numbers | "900-70-0000" |
| `US_PASSPORT` | US passport numbers | "123456789" |
| `US_SSN` | Social Security Numbers | "123-45-6789" |
## Redaction Actions
For each enabled entity, you can configure how the data is redacted:
| Action | Description | Example Output |
|--------|-------------|----------------|
| `mask` | Replace with the entity type label | `<CREDIT_CARD>` |
| `redact` | Remove the text entirely | *(empty)* |
## Custom Recognizers
Beyond the built-in entities, you can create **custom recognizers** to detect organization-specific PII patterns.
<Frame>
![Custom recognizers](/images/enterprise/pii_mask_recognizer.png)
</Frame>
### Recognizer Types
There are two options for custom recognizers:
| Type | Best For | Example Use Cases |
|------|----------|-------------------|
| **Pattern-based (Regex)** | Structured data with predictable formats | Salary amounts, employee IDs, project codes |
| **Deny list** | Exact string matching | Company names, internal code names, specific terms |
### Creating a Custom Recognizer
<Steps>
<Step title="Go to custom recognizers">
Go to your organization **Settings** → **Organization** → **Add Recognizer**.
</Step>
<Step title="Configure the recognizer">
<Frame>
![Configure recognizer](/images/enterprise/pii_mask_recognizer_create.png)
</Frame>
Configure the following fields:
- **Name**: A descriptive name for the recognizer
- **Entity Type**: The entity label that appears in redacted output (e.g., `EMPLOYEE_ID`, `SALARY`)
- **Type**: Choose between a regex pattern and a deny list
- **Pattern/Values**: The regex pattern or list of strings to match
- **Confidence Threshold**: The minimum score required to trigger redaction (0.0-1.0). Higher values (e.g., 0.8) reduce false positives but may miss some matches. Lower values (e.g., 0.5) catch more matches but may over-redact. The default is 0.8.
- **Context Words** (optional): Words that increase detection confidence when found nearby
</Step>
<Step title="Save">
Save the recognizer. It becomes available to enable on your deployments.
</Step>
</Steps>
### Understanding Entity Types
The **Entity Type** determines how matched content appears in redacted traces:
```
Entity Type: SALARY
Pattern: salary:\s*\$\s*\d+
Input: "Employee salary: $50,000"
Output: "Employee <SALARY>"
```
### Using Context Words
Context words improve accuracy by increasing confidence when specific terms appear near a matched pattern:
```
Context Words: "project", "code", "internal"
Entity Type: PROJECT_CODE
Pattern: PRJ-\d{4}
```
When "project" or "code" appears near "PRJ-1234", the recognizer is more confident it's a true match, reducing false positives.
## Viewing Redacted Traces
With PII redaction enabled, traces show redacted values in place of sensitive data:
```
Task Output: "Customer <PERSON> placed order #12345.
Contact email: <EMAIL_ADDRESS>, phone: <PHONE_NUMBER>.
Payment was processed with a card ending in <CREDIT_CARD>."
```
Redacted values are clearly marked with angle brackets and the entity type label (e.g., `<EMAIL_ADDRESS>`), making it easy to see which data was protected while still letting you debug and monitor crew behavior.
## Best Practices
### Performance Considerations
<Steps>
<Step title="Enable only the entities you need">
Each enabled entity adds processing overhead. Enable only those relevant to your data.
</Step>
<Step title="Use specific patterns">
For custom recognizers, use specific patterns to reduce false positives and improve performance. Regex patterns work best for identifying structured values such as salaries, employee IDs, and project codes. Deny-list recognizers work best for matching exact strings such as company names and internal code names.
</Step>
<Step title="Leverage context words">
Context words improve accuracy by triggering detection only when the surrounding text also matches.
</Step>
</Steps>
## Troubleshooting
<Accordion title="PII is not being redacted">
**Possible causes:**
- The entity type is not enabled in your configuration
- The pattern doesn't match your data format
- The custom recognizer has a syntax error
**Solutions:**
- Verify the entity is enabled under Settings → Security
- Test your regex pattern against sample data
- Check the logs for configuration errors
</Accordion>
<Accordion title="Too much data is being redacted">
**Possible causes:**
- Overly broad entity types are enabled (e.g., `DATE_TIME` catches dates everywhere)
- A custom recognizer pattern is too generic
**Solutions:**
- Disable the entities causing false positives
- Make custom patterns more specific
- Add context words to improve accuracy
</Accordion>
<Accordion title="Performance issues">
**Possible causes:**
- Too many entities are enabled
- NLP-based entities (`PERSON`, `LOCATION`, `NRP`) use machine learning models and are computationally expensive
**Solutions:**
- Enable only the entities you actually need
- Consider pattern-based alternatives where possible
- Monitor trace processing times in the dashboard
</Accordion>
---
## Real-World Example: Salary Pattern Matching
This example shows how to create a custom recognizer that detects and masks salary information in traces.
### Use Case
Your crew processes employee or financial data that includes salary information in formats like:
- `salary: $50,000`
- `salary: $125,000.00`
- `salary:$1,500.50`
You want to automatically mask these values to protect sensitive compensation data.
### Configuration
<Frame>
![Salary recognizer configuration](/images/enterprise/pii_mask_custom_recognizer_salary.png)
</Frame>
| Field | Value |
|-------|-------|
| **Name** | `SALARY` |
| **Entity Type** | `SALARY` |
| **Type** | Regex Pattern |
| **Regex Pattern** | `salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?` |
| **Action** | Mask |
| **Confidence Threshold** | `0.8` |
| **Context Words** | `salary, compensation, pay, wage, income` |
### Regex Pattern Breakdown
| Pattern Component | Meaning |
|-------------------|---------|
| `salary:` | Matches the literal text "salary:" |
| `\s*` | Matches zero or more whitespace characters |
| `\$` | Matches the dollar sign (escaped) |
| `\s*` | Matches zero or more whitespace characters after the $ |
| `\d{1,3}` | Matches 1-3 digits (e.g., "1", "50", "125") |
| `(,\d{3})*` | Matches comma-separated thousands (e.g., ",000", ",500,000") |
| `(\.\d{2})?` | Optionally matches cents (e.g., ".00", ".50") |
### Example Results
```
Original: "The employee record shows salary: $125,000.00 per year"
Redacted: "The employee record shows <SALARY> per year"
Original: "Base salary:$50,000 with bonus potential"
Redacted: "Base <SALARY> with bonus potential"
```
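Before saving the recognizer, you can sanity-check the pattern locally. The sketch below uses Python's built-in `re` module to apply the same mask substitution to the sample strings above:
```python
import re

# The same pattern configured in the recognizer above
SALARY_PATTERN = re.compile(r"salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?")

samples = [
    "The employee record shows salary: $125,000.00 per year",
    "Base salary:$50,000 with bonus potential",
    "No sensitive data here",
]

for text in samples:
    # Substitute each match with the entity-type label, mimicking the mask action
    print(SALARY_PATTERN.sub("<SALARY>", text))
```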
<Tip>
Adding context words like "salary", "compensation", "pay", "wage", and "income" increases detection confidence when these terms appear near a matched pattern, reducing false positives.
</Tip>
### Enabling the Recognizer on a Deployment
<Warning>
Creating a custom recognizer at the organization level does not automatically enable it on your deployments. You must manually enable each recognizer on every deployment where you want it applied.
</Warning>
After creating a custom recognizer, enable it on each deployment:
<Steps>
<Step title="Go to your deployment">
Open your deployment/automation and go to **Settings** → **PII Protection**.
</Step>
<Step title="Select custom recognizers">
Under **Mask Recognizers**, you'll see the recognizers defined in your organization. Check the boxes next to the ones you want to enable.
<Frame>
![Enable custom recognizers](/images/enterprise/pii_mask_recognizers_options.png)
</Frame>
</Step>
<Step title="Save the configuration">
Save your changes. The recognizers take effect on all subsequent executions of this deployment.
</Step>
</Steps>
<Info>
Repeat this process for each deployment that needs the custom recognizers. This gives you fine-grained control over which recognizers are active in different environments (e.g., development vs. production).
</Info>

View File

@@ -1,305 +0,0 @@
---
title: "Crew 배포"
description: "CrewAI 엔터프라이즈에서 Crew 배포하기"
icon: "rocket"
mode: "wide"
---
<Note>
After creating a crew locally or through Crew Studio, the next step is to deploy it to the
CrewAI AMP platform. This guide covers the different deployment methods so you can
choose the one that best fits your workflow.
</Note>
## Prerequisites
<CardGroup cols={2}>
<Card title="A deployment-ready Crew" icon="users">
You need a working crew, built locally or created through Crew Studio.
</Card>
<Card title="GitHub repository" icon="github">
Your crew code must live in a GitHub repository (for the GitHub integration method).
</Card>
</CardGroup>
## Option 1: Deploy with the CrewAI CLI
The CLI provides the fastest way to deploy locally developed crews to the Enterprise platform.
<Steps>
<Step title="Install the CrewAI CLI">
If you haven't already, install the CrewAI CLI:
```bash
pip install crewai[tools]
```
<Tip>
The CLI is included in the base CrewAI package, but the `[tools]` extra installs all deployment dependencies alongside it.
</Tip>
</Step>
<Step title="Authenticate with the Enterprise platform">
First, authenticate the CLI with the CrewAI AMP platform:
```bash
# If you already have a CrewAI AMP account, or want to create one:
crewai login
```
When you run this command, the CLI:
1. Displays a URL and a unique device code
2. Opens your browser to the authentication page
3. Asks you to confirm the device
4. Completes the authentication process
Once authentication succeeds, a confirmation message appears in your terminal!
</Step>
<Step title="Create the deployment">
From your project directory, run:
```bash
crewai deploy create
```
This command:
1. Detects your GitHub repository information
2. Identifies environment variables in your local `.env` file
3. Securely transfers those variables to the Enterprise platform
4. Creates a new deployment with a unique identifier
On success, you'll see a message like:
```shell
Deployment created successfully!
Name: your_project_name
Deployment ID: 01234567-89ab-cdef-0123-456789abcdef
Current Status: Deploy Enqueued
```
</Step>
<Step title="Monitor deployment progress">
Track the deployment status with:
```bash
crewai deploy status
```
If you need detailed logs of the build process:
```bash
crewai deploy logs
```
<Tip>
The first deployment typically takes 10-15 minutes because it builds the container image. Subsequent deployments are much faster.
</Tip>
</Step>
</Steps>
## Additional CLI Commands
The CrewAI CLI provides several commands for managing your deployments:
```bash
# List all deployments
crewai deploy list
# Check deployment status
crewai deploy status
# View deployment logs
crewai deploy logs
# Push updates after code changes
crewai deploy push
# Remove a deployment
crewai deploy remove <deployment_id>
```
## Option 2: Deploy Directly Through the Web Interface
You can also deploy crews directly through the CrewAI AMP web interface by connecting your GitHub account. This method requires no CLI on your local machine.
<Steps>
<Step title="Push to GitHub">
Your crew needs to be pushed to a GitHub repository. If you haven't created a crew yet, you can follow [this tutorial](/ko/quickstart).
</Step>
<Step title="Connect GitHub to CrewAI AOP">
1. Log in to [CrewAI AMP](https://app.crewai.com).
2. Click the "Connect GitHub" button.
<Frame>
![Connect GitHub Button](/images/enterprise/connect-github.png)
</Frame>
</Step>
<Step title="Select a repository">
After connecting your GitHub account, you can select the repository to deploy:
<Frame>
![Select Repository](/images/enterprise/select-repo.png)
</Frame>
</Step>
<Step title="Set environment variables">
Before deploying, set the environment variables needed to connect to your LLM provider or other services:
1. You can add variables individually or in bulk.
2. Enter environment variables in `KEY=VALUE` format (one per line).
<Frame>
![Set Environment Variables](/images/enterprise/set-env-variables.png)
</Frame>
</Step>
<Step title="Deploy the Crew">
1. Click the "Deploy" button to start the deployment process.
2. Monitor progress via the progress bar.
3. The first deployment typically takes about 10-15 minutes; subsequent deployments are faster.
<Frame>
![Deploy Progress](/images/enterprise/deploy-progress.png)
</Frame>
When the deployment completes, you'll see:
- A unique URL for your crew
- A Bearer token securing your crew's API
- A "Delete" button in case you need to remove the deployment
</Step>
</Steps>
## ⚠️ Environment Variable Security Requirements
<Warning>
**Important**: CrewAI AOP enforces security restrictions on environment variable names;
deployments may fail if they aren't followed.
</Warning>
### Blocked Environment Variable Patterns
For security reasons, the following environment variable naming patterns are **automatically filtered** and can cause deployment problems:
**Blocked patterns:**
- Variables ending in `_TOKEN` (e.g., `MY_API_TOKEN`)
- Variables ending in `_PASSWORD` (e.g., `DB_PASSWORD`)
- Variables ending in `_SECRET` (e.g., `API_SECRET`)
- Variables ending in `_KEY` in certain situations
**Specifically blocked variables:**
- `GITHUB_USER`, `GITHUB_TOKEN`
- `AWS_REGION`, `AWS_DEFAULT_REGION`
- Various internal CrewAI system variables
### Allowed Exceptions
Some variables are explicitly allowed even though they match the blocked patterns:
- `AZURE_AD_TOKEN`
- `AZURE_OPENAI_AD_TOKEN`
- `ENTERPRISE_ACTION_TOKEN`
- `CREWAI_ENTEPRISE_TOOLS_TOKEN`
### How to Fix Naming Issues
If your deployment fails because of environment variable restrictions:
```bash
# ❌ These names will cause deployment failures
OPENAI_TOKEN=sk-...
DATABASE_PASSWORD=mypassword
API_SECRET=secret123
# ✅ Use naming patterns like these instead
OPENAI_API_KEY=sk-...
DATABASE_CREDENTIALS=mypassword
API_CONFIG=secret123
```
### Best Practices
1. **Use standard naming conventions**: `PROVIDER_API_KEY` instead of `PROVIDER_TOKEN`
2. **Test locally first**: verify your crew works with the renamed variables
3. **Update your code**: change every reference to the old variable names
4. **Document the changes**: record the renamed variables for your team
<Tip>
If a deployment fails with a cryptic environment variable error message, first check
whether your variable names follow these patterns.
</Tip>
### Interacting with Your Deployed Crew
Once deployed, you can access your crew through:
1. **REST API**: The platform generates a unique HTTPS endpoint with these key routes:
- `/inputs`: Lists the required input parameters
- `/kickoff`: Starts an execution with the provided inputs
- `/status/{kickoff_id}`: Checks execution status
2. **Web interface**: Visit [app.crewai.com](https://app.crewai.com) to see:
- **Status tab**: Deployment info, API endpoint details, and the authentication token
- **Run tab**: A visual representation of your crew's structure
- **Executions tab**: History of all executions
- **Metrics tab**: Performance analytics
- **Traces tab**: Detailed execution insights
### Triggering an Execution
From the Enterprise dashboard you can:
1. Click your crew's name to open its details
2. Select "Trigger Crew" in the management interface
3. Enter the required inputs in the modal that appears
4. Monitor the execution's progress through the pipeline
### Monitoring and Analytics
The Enterprise platform provides comprehensive observability features:
- **Execution management**: Track active and completed executions
- **Traces**: Detailed breakdowns of each execution
- **Metrics**: Token usage, execution time, and cost
- **Timeline view**: A visual representation of the task sequence
### Advanced Features
The Enterprise platform also offers:
- **Environment variable management**: Securely store and manage API keys
- **LLM connections**: Configure integrations with various LLM providers
- **Custom Tools Repository**: Create, share, and install tools
- **Crew Studio**: Build crews through a chat interface without writing code
<Card
title="Need help?"
icon="headset"
href="mailto:support@crewai.com"
>
Contact our support team with deployment issues or questions about the
Enterprise platform.
</Card>

View File

@@ -0,0 +1,438 @@
---
title: "AMP에 배포하기"
description: "Crew 또는 Flow를 CrewAI AMP에 배포하기"
icon: "rocket"
mode: "wide"
---
<Note>
After creating a Crew or Flow locally or through Crew Studio, the next step is to deploy it
to the CrewAI AMP platform. This guide covers the different deployment methods so you can
choose the one that best fits your workflow.
</Note>
## Prerequisites
<CardGroup cols={2}>
<Card title="A deployment-ready project" icon="check-circle">
You need a Crew or Flow that runs successfully locally.
Follow the [preparation guide](/ko/enterprise/guides/prepare-for-deployment) to verify your project structure.
</Card>
<Card title="GitHub repository" icon="github">
Your code must live in a GitHub repository (for the GitHub integration method).
</Card>
</CardGroup>
<Info>
**Crews vs Flows**: Both project types deploy to CrewAI AMP as "automations".
The deployment process is the same, but the project structures differ.
See [Prepare for Deployment](/ko/enterprise/guides/prepare-for-deployment) for details.
</Info>
## Option 1: Deploy with the CrewAI CLI
The CLI provides the fastest way to deploy a locally developed Crew or Flow to the AMP platform.
It automatically detects your project type from `pyproject.toml` and builds accordingly.
<Steps>
<Step title="Install the CrewAI CLI">
If you haven't already, install the CrewAI CLI:
```bash
pip install crewai[tools]
```
<Tip>
The CLI is included in the base CrewAI package, but the `[tools]` extra installs all deployment dependencies alongside it.
</Tip>
</Step>
<Step title="Authenticate with the Enterprise platform">
First, authenticate the CLI with the CrewAI AMP platform:
```bash
# If you already have a CrewAI AMP account, or want to create one:
crewai login
```
When you run this command, the CLI:
1. Displays a URL and a unique device code
2. Opens your browser to the authentication page
3. Asks you to confirm the device
4. Completes the authentication process
Once authentication succeeds, a confirmation message appears in your terminal!
</Step>
<Step title="Create the deployment">
From your project directory, run:
```bash
crewai deploy create
```
This command:
1. Detects your GitHub repository information
2. Identifies environment variables in your local `.env` file
3. Securely transfers those variables to the Enterprise platform
4. Creates a new deployment with a unique identifier
On success, you'll see a message like:
```shell
Deployment created successfully!
Name: your_project_name
Deployment ID: 01234567-89ab-cdef-0123-456789abcdef
Current Status: Deploy Enqueued
```
</Step>
<Step title="Monitor deployment progress">
Track the deployment status with:
```bash
crewai deploy status
```
If you need detailed logs of the build process:
```bash
crewai deploy logs
```
<Tip>
The first deployment typically takes 10-15 minutes because it builds the container image. Subsequent deployments are much faster.
</Tip>
</Step>
</Steps>
## Additional CLI Commands
The CrewAI CLI provides several commands for managing your deployments:
```bash
# List all deployments
crewai deploy list
# Check deployment status
crewai deploy status
# View deployment logs
crewai deploy logs
# Push updates after code changes
crewai deploy push
# Remove a deployment
crewai deploy remove <deployment_id>
```
## Option 2: Deploy Directly Through the Web Interface
You can also deploy a Crew or Flow directly through the CrewAI AMP web interface by connecting your GitHub account. This method requires no CLI on your local machine. The platform automatically detects the project type and handles the build accordingly.
<Steps>
<Step title="Push to GitHub">
Your Crew needs to be pushed to a GitHub repository. If you haven't created a Crew yet, you can follow [this tutorial](/ko/quickstart).
</Step>
<Step title="Connect GitHub to CrewAI AMP">
1. Log in to [CrewAI AMP](https://app.crewai.com).
2. Click the "Connect GitHub" button.
<Frame>
![Connect GitHub Button](/images/enterprise/connect-github.png)
</Frame>
</Step>
<Step title="Select a repository">
After connecting your GitHub account, you can select the repository to deploy:
<Frame>
![Select Repository](/images/enterprise/select-repo.png)
</Frame>
</Step>
<Step title="Set environment variables">
Before deploying, set the environment variables needed to connect to your LLM provider or other services:
1. You can add variables individually or in bulk.
2. Enter environment variables in `KEY=VALUE` format (one per line).
<Frame>
![Set Environment Variables](/images/enterprise/set-env-variables.png)
</Frame>
</Step>
<Step title="Deploy the Crew">
1. Click the "Deploy" button to start the deployment process.
2. Monitor progress via the progress bar.
3. The first deployment typically takes about 10-15 minutes; subsequent deployments are faster.
<Frame>
![Deploy Progress](/images/enterprise/deploy-progress.png)
</Frame>
When the deployment completes, you'll see:
- A unique URL for your Crew
- A Bearer token securing your Crew's API
- A "Delete" button in case you need to remove the deployment
</Step>
</Steps>
## Option 3: Redeploy via the API (CI/CD Integration)
For automated deployments in CI/CD pipelines, you can trigger a redeployment of an existing crew via the CrewAI API. This is especially useful with GitHub Actions, Jenkins, or other automation workflows.
<Steps>
<Step title="Issue a Personal Access Token">
Generate an API token in your CrewAI AMP account settings:
1. Go to [app.crewai.com](https://app.crewai.com)
2. Click **Settings** → **Account** → **Personal Access Token**
3. Generate a new token and copy it somewhere safe
4. Store the token as a secret in your CI/CD system
</Step>
<Step title="Find the Automation UUID">
Find the unique identifier of your deployed crew:
1. In the CrewAI AMP dashboard, go to **Automations**
2. Select the existing automation/crew
3. Click **Additional Details**
4. Copy the **UUID** - it identifies this specific crew deployment
</Step>
<Step title="Trigger a redeployment via the API">
Use the Deploy API endpoint to trigger a redeployment:
```bash
curl -i -X POST \
-H "Authorization: Bearer YOUR_PERSONAL_ACCESS_TOKEN" \
https://app.crewai.com/crewai_plus/api/v1/crews/YOUR-AUTOMATION-UUID/deploy
# HTTP/2 200
# content-type: application/json
#
# {
# "uuid": "your-automation-uuid",
# "status": "Deploy Enqueued",
# "public_url": "https://your-crew-deployment.crewai.com",
# "token": "your-bearer-token"
# }
```
<Info>
For automations originally created from a Git connection, the API automatically pulls the latest changes from your repository before redeploying.
</Info>
</Step>
<Step title="GitHub Actions integration example">
Here is an example GitHub Actions workflow with more elaborate deployment triggers:
```yaml
name: Deploy CrewAI Automation
on:
push:
branches: [ main ]
pull_request:
types: [ labeled ]
release:
types: [ published ]
jobs:
deploy:
runs-on: ubuntu-latest
if: |
(github.event_name == 'push' && github.ref == 'refs/heads/main') ||
(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'deploy')) ||
(github.event_name == 'release')
steps:
- name: Trigger CrewAI Redeployment
run: |
curl -X POST \
-H "Authorization: Bearer ${{ secrets.CREWAI_PAT }}" \
https://app.crewai.com/crewai_plus/api/v1/crews/${{ secrets.CREWAI_AUTOMATION_UUID }}/deploy
```
<Tip>
Add `CREWAI_PAT` and `CREWAI_AUTOMATION_UUID` as repository secrets. For PR deployments, add the "deploy" label to trigger the workflow.
</Tip>
</Step>
</Steps>
## Interacting with Your Deployed Automation
Once deployed, you can access your crew through:
1. **REST API**: The platform generates a unique HTTPS endpoint with these key routes (see the example request after this list):
- `/inputs`: Lists the required input parameters
- `/kickoff`: Starts an execution with the provided inputs
- `/status/{kickoff_id}`: Checks execution status
2. **Web interface**: Visit [app.crewai.com](https://app.crewai.com) to see:
- **Status tab**: Deployment info, API endpoint details, and the authentication token
- **Run tab**: A visual representation of the Crew structure
- **Executions tab**: History of all executions
- **Metrics tab**: Performance analytics
- **Traces tab**: Detailed execution insights
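For example, a kickoff request against the deployed endpoint might look like the sketch below; the URL, token, and input names are placeholders taken from the Status tab, and the exact payload shape is an assumption you should verify against the `/inputs` route:
```bash
# Start an execution (URL and token come from the Status tab; inputs are illustrative)
curl -X POST \
  -H "Authorization: Bearer YOUR_CREW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"inputs": {"topic": "AI in Healthcare"}}' \
  https://your-crew-deployment.crewai.com/kickoff
# Poll for status using the returned kickoff_id
curl -H "Authorization: Bearer YOUR_CREW_TOKEN" \
  https://your-crew-deployment.crewai.com/status/YOUR_KICKOFF_ID
```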
### Triggering an Execution
From the Enterprise dashboard you can:
1. Click the Crew's name to open its details
2. Select "Trigger Crew" in the management interface
3. Enter the required inputs in the modal that appears
4. Monitor the execution's progress through the pipeline
### Monitoring and Analytics
The Enterprise platform provides comprehensive observability features:
- **Execution management**: Track active and completed executions
- **Traces**: Detailed breakdowns of each execution
- **Metrics**: Token usage, execution time, and cost
- **Timeline view**: A visual representation of the task sequence
### Advanced Features
The Enterprise platform also offers:
- **Environment variable management**: Securely store and manage API keys
- **LLM connections**: Configure integrations with various LLM providers
- **Custom Tools Repository**: Create, share, and install tools
- **Crew Studio**: Build crews through a chat interface without writing code
## Troubleshooting Deployment Failures
If a deployment fails, check for these common issues:
### Build Failures
#### Missing uv.lock File
**Symptom**: The build fails early with dependency resolution errors
**Solution**: Generate and commit the lock file:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
<Warning>
The `uv.lock` file is required for every deployment. Without it, the platform
cannot install your dependencies reliably.
</Warning>
#### Incorrect Project Structure
**Symptom**: "Could not find entry point" or "Module not found" errors
**Solution**: Verify your project matches the expected structure:
- **Both Crews and Flows**: The entry point must live at `src/project_name/main.py`
- **Crews**: Use a `run()` function as the entry point
- **Flows**: Use a `kickoff()` function as the entry point
See [Prepare for Deployment](/ko/enterprise/guides/prepare-for-deployment) for detailed structure diagrams.
#### Missing CrewBase Decorator
**Symptom**: "Crew not found", "Config not found", or agent/task configuration errors
**Solution**: Ensure **every** crew class uses the `@CrewBase` decorator:
```python
from crewai import Agent
from crewai.project import CrewBase, agent, crew, task

@CrewBase  # This decorator is REQUIRED
class YourCrew():
    """Crew description"""

    @agent
    def my_agent(self) -> Agent:
        return Agent(
            config=self.agents_config['my_agent'],  # type: ignore[index]
            verbose=True
        )
    # ... rest of the crew definition
```
<Info>
This applies both to standalone Crews and to crews embedded inside Flow projects.
Every crew class needs the decorator.
</Info>
#### Wrong pyproject.toml Type
**Symptom**: The build succeeds but the automation fails at runtime or behaves unexpectedly
**Solution**: Make sure the `[tool.crewai]` section matches your project type:
```toml
# For Crew projects:
[tool.crewai]
type = "crew"
# For Flow projects:
[tool.crewai]
type = "flow"
```
### Runtime Failures
#### LLM Connection Failures
**Symptom**: API key errors, "model not found", or authentication failures
**Solution**:
1. Verify your LLM provider API keys are set correctly in the environment variables
2. Verify the environment variable names match what your code expects
3. Test locally with the same environment variables before deploying
#### Crew Execution Errors
**Symptom**: The Crew starts but fails during execution
**Solution**:
1. Check the execution logs in the AMP dashboard (Traces tab)
2. Verify all tools have the API keys they need configured
3. Verify the agent configuration in `agents.yaml` is valid
4. Verify the task configuration in `tasks.yaml` has no syntax errors
<Card title="Need help?" icon="headset" href="mailto:support@crewai.com">
Contact our support team with deployment issues or questions about the AMP platform.
</Card>

View File

@@ -0,0 +1,305 @@
---
title: "배포 준비하기"
description: "Crew 또는 Flow가 CrewAI AMP에 배포될 준비가 되었는지 확인하기"
icon: "clipboard-check"
mode: "wide"
---
<Note>
Before deploying to CrewAI AMP, it's important to verify that your project is configured correctly.
Both Crews and Flows can be deployed as "automations", but they have different project
structures and requirements that must be met for a successful deployment.
</Note>
## Understanding Automations
In CrewAI AMP, **automations** is the umbrella term for deployable agentic AI projects. An automation can be either:
- **Crew**: A standalone team of AI agents that work on tasks together
- **Flow**: An orchestrated workflow that can combine multiple crews, direct LLM calls, and procedural logic
Understanding which type you're deploying is essential, because the project structures and entry points differ.
## Crews vs Flows: Key Differences
<CardGroup cols={2}>
<Card title="Crew projects" icon="users">
A standalone AI agent team with a `crew.py` that defines agents and tasks. Best for focused, collaborative work.
</Card>
<Card title="Flow projects" icon="diagram-project">
An orchestrated workflow with embedded crews in a `crews/` folder. Best for complex, multi-stage processes.
</Card>
</CardGroup>
| Aspect | Crew | Flow |
|--------|------|------|
| **Project structure** | `src/project_name/` with `crew.py` | `src/project_name/` with a `crews/` folder |
| **Main logic location** | `src/project_name/crew.py` | `src/project_name/main.py` (Flow class) |
| **Entry point function** | `run()` in `main.py` | `kickoff()` in `main.py` |
| **pyproject.toml type** | `type = "crew"` | `type = "flow"` |
| **CLI creation command** | `crewai create crew name` | `crewai create flow name` |
| **Config location** | `src/project_name/config/` | `src/project_name/crews/crew_name/config/` |
| **Can contain other crews** | No | Yes (inside the `crews/` folder) |
## 프로젝트 구조 참조
### Crew 프로젝트 구조
`crewai create crew my_crew`를 실행하면 다음 구조를 얻습니다:
```
my_crew/
├── .gitignore
├── pyproject.toml        # type = "crew"여야 함
├── README.md
├── .env
├── uv.lock               # 배포에 필수
└── src/
    └── my_crew/
        ├── __init__.py
        ├── main.py       # run() 함수가 있는 진입점
        ├── crew.py       # @CrewBase 데코레이터가 있는 Crew 클래스
        ├── tools/
        │   ├── custom_tool.py
        │   └── __init__.py
        └── config/
            ├── agents.yaml   # 에이전트 정의
            └── tasks.yaml    # 작업 정의
```
<Warning>
중첩된 `src/project_name/` 구조는 Crews에 매우 중요합니다.
잘못된 레벨에 파일을 배치하면 배포 실패의 원인이 됩니다.
</Warning>
### Flow 프로젝트 구조
`crewai create flow my_flow`를 실행하면 다음 구조를 얻습니다:
```
my_flow/
├── .gitignore
├── pyproject.toml        # type = "flow"여야 함
├── README.md
├── .env
├── uv.lock               # 배포에 필수
└── src/
    └── my_flow/
        ├── __init__.py
        ├── main.py       # kickoff() 함수 + Flow 클래스가 있는 진입점
        ├── crews/        # 포함된 crews 폴더
        │   └── poem_crew/
        │       ├── __init__.py
        │       ├── poem_crew.py  # @CrewBase 데코레이터가 있는 Crew
        │       └── config/
        │           ├── agents.yaml
        │           └── tasks.yaml
        └── tools/
            ├── __init__.py
            └── custom_tool.py
```
<Info>
Crews와 Flows 모두 `src/project_name/` 구조를 사용합니다.
핵심 차이점은 Flows는 포함된 crews를 위한 `crews/` 폴더가 있고,
Crews는 프로젝트 폴더에 직접 `crew.py`가 있다는 것입니다.
</Info>
## 배포 전 체크리스트
이 체크리스트를 사용하여 프로젝트가 배포 준비가 되었는지 확인하세요.
### 1. pyproject.toml 설정 확인
`pyproject.toml`에 올바른 `[tool.crewai]` 섹션이 포함되어야 합니다:
<Tabs>
<Tab title="Crews의 경우">
```toml
[tool.crewai]
type = "crew"
```
</Tab>
<Tab title="Flows의 경우">
```toml
[tool.crewai]
type = "flow"
```
</Tab>
</Tabs>
<Warning>
`type`이 프로젝트 구조와 일치하지 않으면 빌드가 실패하거나
자동화가 올바르게 실행되지 않습니다.
</Warning>
### 2. uv.lock 파일 존재 확인
CrewAI는 의존성 관리를 위해 `uv`를 사용합니다. `uv.lock` 파일은 재현 가능한 빌드를 보장하며 배포에 **필수**입니다.
```bash
# lock 파일 생성 또는 업데이트
uv lock
# 존재 여부 확인
ls -la uv.lock
```
파일이 존재하지 않으면 `uv lock`을 실행하고 저장소에 커밋하세요:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
### 3. CrewBase 데코레이터 사용 확인
**모든 crew 클래스는 `@CrewBase` 데코레이터를 사용해야 합니다.** 이것은 다음에 적용됩니다:
- 독립 실행형 crew 프로젝트
- Flow 프로젝트 내에 포함된 crews
```python
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent
from typing import List

@CrewBase  # 이 데코레이터는 필수입니다
class MyCrew():
    """내 crew 설명"""

    agents: List[BaseAgent]
    tasks: List[Task]

    @agent
    def my_agent(self) -> Agent:
        return Agent(
            config=self.agents_config['my_agent'],  # type: ignore[index]
            verbose=True
        )

    @task
    def my_task(self) -> Task:
        return Task(
            config=self.tasks_config['my_task']  # type: ignore[index]
        )

    @crew
    def crew(self) -> Crew:
        return Crew(
            agents=self.agents,
            tasks=self.tasks,
            process=Process.sequential,
            verbose=True,
        )
```
<Warning>
`@CrewBase` 데코레이터를 잊으면 에이전트나 작업 구성이 누락되었다는
오류와 함께 배포가 실패합니다.
</Warning>
### 4. 프로젝트 진입점 확인
Crews와 Flows 모두 `src/project_name/main.py`에 진입점이 있습니다:
<Tabs>
<Tab title="Crews의 경우">
진입점은 `run()` 함수를 사용합니다:
```python
# src/my_crew/main.py
from my_crew.crew import MyCrew

def run():
    """crew를 실행합니다."""
    inputs = {'topic': 'AI in Healthcare'}
    result = MyCrew().crew().kickoff(inputs=inputs)
    return result

if __name__ == "__main__":
    run()
```
</Tab>
<Tab title="Flows의 경우">
진입점은 Flow 클래스와 함께 `kickoff()` 함수를 사용합니다:
```python
# src/my_flow/main.py
from crewai.flow import Flow, listen, start
from my_flow.crews.poem_crew.poem_crew import PoemCrew

class MyFlow(Flow):
    @start()
    def begin(self):
        # Flow 로직
        result = PoemCrew().crew().kickoff(inputs={...})
        return result

def kickoff():
    """flow를 실행합니다."""
    MyFlow().kickoff()

if __name__ == "__main__":
    kickoff()
```
</Tab>
</Tabs>
### 5. 환경 변수 준비
배포 전에 다음을 준비해야 합니다:
1. **LLM API 키** (OpenAI, Anthropic, Google 등)
2. **도구 API 키** - 외부 도구를 사용하는 경우 (Serper 등)
<Tip>
구성 문제를 조기에 발견하기 위해 배포 전에 동일한 환경 변수로
로컬에서 프로젝트를 테스트하세요.
</Tip>
## 빠른 검증 명령어
프로젝트 루트에서 다음 명령어를 실행하여 설정을 빠르게 확인하세요:
```bash
# 1. pyproject.toml에서 프로젝트 타입 확인
grep -A2 "\[tool.crewai\]" pyproject.toml
# 2. uv.lock 존재 확인
ls -la uv.lock || echo "오류: uv.lock이 없습니다! 'uv lock'을 실행하세요"
# 3. src/ 구조 존재 확인
ls -la src/*/main.py 2>/dev/null || echo "src/에서 main.py를 찾을 수 없습니다"
# 4. Crews의 경우 - crew.py 존재 확인
ls -la src/*/crew.py 2>/dev/null || echo "crew.py가 없습니다 (Crews에서 예상됨)"
# 5. Flows의 경우 - crews/ 폴더 존재 확인
ls -la src/*/crews/ 2>/dev/null || echo "crews/ 폴더가 없습니다 (Flows에서 예상됨)"
# 6. CrewBase 사용 확인
grep -r "@CrewBase" . --include="*.py"
```
## 일반적인 설정 실수
| 실수 | 증상 | 해결 방법 |
|------|------|----------|
| `uv.lock` 누락 | 의존성 해결 중 빌드 실패 | `uv lock` 실행 후 커밋 |
| pyproject.toml의 잘못된 `type` | 빌드 성공하지만 런타임 실패 | 올바른 타입으로 변경 |
| `@CrewBase` 데코레이터 누락 | "Config not found" 오류 | 모든 crew 클래스에 데코레이터 추가 |
| `src/` 대신 루트에 파일 배치 | 진입점을 찾을 수 없음 | `src/project_name/`으로 이동 |
| `run()` 또는 `kickoff()` 누락 | 자동화를 시작할 수 없음 | 올바른 진입 함수 추가 |
## 다음 단계
프로젝트가 모든 체크리스트 항목을 통과하면 배포할 준비가 된 것입니다:
<Card title="AMP에 배포하기" icon="rocket" href="/ko/enterprise/guides/deploy-to-amp">
CLI, 웹 인터페이스 또는 CI/CD 통합을 사용하여 Crew 또는 Flow를 CrewAI AMP에
배포하려면 배포 가이드를 따르세요.
</Card>

View File

@@ -79,7 +79,7 @@ CrewAI AOP는 오픈 소스 프레임워크의 강력함에 프로덕션 배포,
<Card
title="Crew 배포"
icon="rocket"
href="/ko/enterprise/guides/deploy-crew"
href="/ko/enterprise/guides/deploy-to-amp"
>
Crew 배포
</Card>
@@ -96,4 +96,4 @@ CrewAI AOP는 오픈 소스 프레임워크의 강력함에 프로덕션 배포,
</Step>
</Steps>
자세한 안내를 원하시면 [배포 가이드](/ko/enterprise/guides/deploy-crew)를 확인하거나 아래 버튼을 클릭해 시작하세요.
자세한 안내를 원하시면 [배포 가이드](/ko/enterprise/guides/deploy-to-amp)를 확인하거나 아래 버튼을 클릭해 시작하세요.

View File

@@ -7,6 +7,10 @@ mode: "wide"
## 개요
<Note>
`@human_feedback` 데코레이터는 **CrewAI 버전 1.8.0 이상**이 필요합니다. 이 기능을 사용하기 전에 설치를 업데이트하세요.
</Note>
`@human_feedback` 데코레이터는 CrewAI Flow 내에서 직접 human-in-the-loop(HITL) 워크플로우를 가능하게 합니다. Flow 실행을 일시 중지하고, 인간에게 검토를 위해 출력을 제시하고, 피드백을 수집하고, 선택적으로 피드백 결과에 따라 다른 리스너로 라우팅할 수 있습니다.
이는 특히 다음과 같은 경우에 유용합니다:

View File

@@ -5,9 +5,22 @@ icon: "user-check"
mode: "wide"
---
휴먼 인 더 루프(HITL, Human-in-the-Loop)는 인공지능과 인간의 전문 지식을 결합하여 의사결정을 강화하고 작업 결과를 향상시키는 강력한 접근 방식입니다. 이 가이드에서는 CrewAI 내에서 HITL을 구현하는 방법을 안내합니다.
휴먼 인 더 루프(HITL, Human-in-the-Loop)는 인공지능과 인간의 전문 지식을 결합하여 의사결정을 강화하고 작업 결과를 향상시키는 강력한 접근 방식입니다. CrewAI는 필요에 따라 HITL을 구현하는 여러 가지 방법을 제공합니다.
## HITL 워크플로우 설정
## HITL 접근 방식 선택
CrewAI는 human-in-the-loop 워크플로우를 구현하기 위한 두 가지 주요 접근 방식을 제공합니다:
| 접근 방식 | 적합한 용도 | 통합 | 버전 |
|----------|----------|-------------|---------|
| **Flow 기반** (`@human_feedback` 데코레이터) | 로컬 개발, 콘솔 기반 검토, 동기식 워크플로우 | [Flow에서 인간 피드백](/ko/learn/human-feedback-in-flows) | **1.8.0+** |
| **Webhook 기반** (Enterprise) | 프로덕션 배포, 비동기 워크플로우, 외부 통합 (Slack, Teams 등) | 이 가이드 | - |
<Tip>
Flow를 구축하면서 피드백을 기반으로 라우팅하는 인간 검토 단계를 추가하려면 `@human_feedback` 데코레이터에 대한 [Flow에서 인간 피드백](/ko/learn/human-feedback-in-flows) 가이드를 참조하세요.
</Tip>
## Webhook 기반 HITL 워크플로우 설정
<Steps>
<Step title="작업 구성">

View File

@@ -0,0 +1,115 @@
---
title: Galileo
description: CrewAI 추적 및 평가를 위한 Galileo 통합
icon: telescope
mode: "wide"
---
## 개요
이 가이드는 포괄적인 추적과 평가 엔지니어링을 위해 **Galileo**를 **CrewAI**와 통합하는 방법을 보여줍니다. 이 가이드를 마치면 Galileo의 강력한 관측 플랫폼으로 CrewAI 에이전트를 추적하고, 성능을 모니터링하고, 동작을 평가할 수 있게 됩니다.
> **갈릴레오(Galileo)란 무엇인가요?** [Galileo](https://galileo.ai/)는 AI 애플리케이션을 위한 엔드투엔드 추적, 평가, 모니터링을 제공하는 AI 평가·관측 플랫폼입니다. 팀은 이를 통해 실제 동작 데이터를 수집하고, 견고한 가드레일을 만들고, 체계적인 실험을 실행할 수 있으며, 내장된 실험 추적과 성능 분석이 AI 수명주기 전반의 신뢰성, 투명성, 지속적인 개선을 뒷받침합니다.
## 시작하기
이 튜토리얼은 [CrewAI 빠른 시작](/ko/quickstart.mdx)을 기반으로, Galileo의 이벤트 핸들러인 [CrewAIEventListener](https://v2docs.galileo.ai/sdk-api/python/reference/handlers/crewai/handler)를 추가하는 방법을 보여줍니다.
단계별 안내는 Galileo 문서의 [CrewAI 애플리케이션에 Galileo 추가](https://v2docs.galileo.ai/how-to-guides/third-party-integrations/add-galileo-to-crewai/add-galileo-to-crewai) 가이드를 참고하세요.
> **참고**: 이 튜토리얼은 [CrewAI 빠른 시작](/ko/quickstart.mdx)을 완료했다고 가정합니다. 완전한 예제가 필요하다면 Galileo의 [CrewAI SDK 예제 저장소](https://github.com/rungalileo/sdk-examples/tree/main/python/agent/crew-ai)를 확인하세요.
### 1단계: 종속성 설치
원하는 방법으로 가상 환경을 생성한 뒤, 선호하는 도구를 사용해 해당 환경에 앱에 필요한 종속성을 설치합니다:
```bash
uv add galileo
```
### 2단계: [CrewAI 빠른 시작](/ko/quickstart.mdx)의 `.env` 파일에 다음을 추가
```bash
# Your Galileo API key
GALILEO_API_KEY="your-galileo-api-key"
# Your Galileo project name
GALILEO_PROJECT="your-galileo-project-name"
# The name of the Log stream you want to use for logging
GALILEO_LOG_STREAM="your-galileo-log-stream"
```
### 3단계: Galileo 이벤트 리스너 추가
Galileo 로깅을 활성화하려면 `CrewAIEventListener` 인스턴스를 생성해야 합니다. 먼저 `main.py` 파일 상단에 다음 코드를 추가하여 Galileo CrewAI 핸들러 패키지를 가져옵니다:
```python
from galileo.handlers.crewai.handler import CrewAIEventListener
```
그런 다음 실행 함수의 시작 부분에서 이벤트 리스너를 생성합니다:
```python
def run():
    # Create the event listener
    CrewAIEventListener()

    # The rest of your existing code goes here
```
리스너 인스턴스를 생성하면 CrewAI에 자동으로 등록됩니다.
### 4단계: Crew Agent 실행
CrewAI CLI를 사용하여 Crew Agent를 실행하세요.
```bash
crewai run
```
### 5단계: Galileo에서 추적 보기
Crew 에이전트 실행이 완료되면 트레이스가 플러시되어 Galileo에 표시됩니다.
![Galileo trace view](/images/galileo-trace-veiw.png)
## 갈릴레오 통합 이해
Galileo는 이벤트 리스너를 등록하는 방식으로 CrewAI와 통합됩니다. 이 리스너는 crew 실행 이벤트(예: 에이전트 작업, 도구 호출, 모델 응답)를 캡처하여 관측과 평가를 위해 Galileo로 전달합니다.
### 이벤트 리스너 이해
CrewAI 실행에 Galileo를 활성화하는 데 필요한 것은 `CrewAIEventListener()` 인스턴스를 생성하는 것뿐입니다. 인스턴스화되면 리스너는 다음을 수행합니다:
- CrewAI에 자동으로 등록됩니다.
- 환경 변수에서 Galileo 구성을 읽습니다.
- `GALILEO_PROJECT`와 `GALILEO_LOG_STREAM`으로 지정한 Galileo 프로젝트와 로그 스트림에 모든 실행 데이터를 기록합니다.
추가 구성이나 코드 변경은 필요하지 않습니다.
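전체 흐름을 한 파일로 요약하면 아래와 같습니다. `latest_ai_development` 모듈과 `LatestAiDevelopment` crew 이름은 빠른 시작 템플릿을 따른다고 가정한 자리표시자입니다.
```python
# src/latest_ai_development/main.py: 빠른 시작 프로젝트 기준의 스케치
from galileo.handlers.crewai.handler import CrewAIEventListener
from latest_ai_development.crew import LatestAiDevelopment  # 가정: 빠른 시작의 crew

def run():
    # 리스너를 먼저 생성하면 이후 모든 실행 이벤트가 Galileo로 전송됩니다
    CrewAIEventListener()

    inputs = {"topic": "AI LLMs"}
    LatestAiDevelopment().crew().kickoff(inputs=inputs)

if __name__ == "__main__":
    run()
```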

View File

@@ -4,6 +4,545 @@ description: "Atualizações de produto, melhorias e correções do CrewAI"
icon: "clock"
mode: "wide"
---
<Update label="08 jan 2026">
## v1.8.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.8.0)
## O que Mudou
### Funcionalidades
- Adicionar cadeia async nativa para a2a
- Adicionar mecanismos de atualização a2a (poll/stream/push) com handlers e config
- Introduzir configuração global de fluxo para feedback human-in-the-loop
- Adicionar eventos de chamada de ferramenta em streaming e corrigir rastreamento de ID do provedor
- Introduzir arquitetura de Flows e Crews pronta para produção
- Adicionar HITL para Flows
- Melhorar EventListener e TraceCollectionListener para melhor tratamento de eventos
### Correções de Bugs
- Tratar dependência a2a ausente como opcional
- Corrigir busca de erro para polling de login WorkOS
- Corrigir nome de trigger errado na documentação de exemplo
### Documentação
- Atualizar documentação de webhook-streaming
- Ajustar linguagem da documentação de AOP para AMP
### Contribuidores
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta
</Update>
<Update label="19 dez 2025">
## v1.7.2
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.2)
## O que Mudou
### Correções de Bugs
- Resolver problemas de conexão
### Documentação
- Atualizar página de documentação api-reference/status
### Contribuidores
@greysonlalonde, @heitorado, @lorenzejay, @lucasgomide
</Update>
<Update label="16 dez 2025">
## v1.7.1
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.1)
## O que Mudou
### Melhorias
- Adicionar flag `--no-commit` ao comando bump
- Usar schema JSON para serialização de argumentos de ferramenta
### Correções de Bugs
- Corrigir exibição de mensagem de erro da resposta quando login do repositório de ferramentas falha
- Corrigir terminação graciosa de future ao executar tarefa assincronamente
- Corrigir ordenação de tarefas adicionando índice
- Corrigir verificações de compatibilidade de plataforma para sinais Windows
- Corrigir timer do controlador RPM para evitar travamento do processo
- Corrigir registro de uso de tokens e validar modelo de resposta em stream
### Documentação
- Adicionar documentação traduzida para async
- Adicionar documentação para API Deploy AOP
- Adicionar documentação para o conector agent handler
- Adicionar documentação sobre async nativo
### Contribuidores
@Llamrei, @dragosmc, @gilfeig, @greysonlalonde, @heitorado, @lorenzejay, @mattatcha, @vinibrsl
</Update>
<Update label="09 dez 2025">
## v1.7.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.7.0)
## O que Mudou
### Funcionalidades
- Adicionar kickoff de fluxo async
- Adicionar suporte a crew async
- Adicionar suporte a tarefa async
- Adicionar suporte a conhecimento async
- Adicionar suporte a memória async
- Adicionar suporte async para ferramentas e executor de agente; melhorar tipagem e docs
- Implementar API de extensões a2a e cache de cartão de agente async; corrigir propagação de tarefas e streaming
- Adicionar suporte a ferramenta async nativa
- Adicionar suporte a llm async
- Criar tipos de eventos sys e handler
### Correções de Bugs
- Corrigir problema para garantir que nonetypes não sejam passados para otel
- Corrigir deadlock em operações de arquivo do armazenamento de tokens
- Corrigir para garantir que span otel seja fechado
- Usar HuggingFaceEmbeddingFunction para embeddings, atualizar chaves e adicionar testes
- Corrigir para garantir que supports_tools seja true para todos os modelos anthropic suportados
- Garantir que hooks funcionem com fluxos de lite agents
### Contribuidores
@greysonlalonde, @lorenzejay
</Update>
<Update label="29 nov 2025">
## v1.6.1
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.1)
## O que Mudou
### Correções de Bugs
- Corrigir chamada ChatCompletionsClient para garantir funcionamento adequado
- Garantir que métodos async sejam executáveis para anotações
- Corrigir parâmetros em RagTool.add, adicionar tipagem e testes
- Remover parâmetro inválido do cliente SSE
- Apagar configuração 'oauth2_extra' no comando 'crewai config reset'
### Refatoração
- Aprimorar validação de modelo e inferência de provedor na classe LLM
### Contribuidores
@Vidit-Ostwal, @greysonlalonde, @heitorado, @lorenzejay
</Update>
<Update label="25 nov 2025">
## v1.6.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.6.0)
## O que Mudou
### Funcionalidades
- Adicionar suporte a resultado de streaming para fluxos e crews
- Adicionar gemini-3-pro-preview
- Suportar login CLI com Entra ID
- Adicionar ferramenta Merge Agent Handler
- Aprimorar gerenciamento de estado de eventos de fluxo
### Correções de Bugs
- Garantir que caminho de persistência de armazenamento rag personalizado seja definido se passado
- Garantir que retornos fuzzy sejam mais estritos e mostrem aviso de tipo
- Re-adicionar parâmetro response_format do openai e adicionar teste
- Corrigir configuração de embeddings da ferramenta rag
- Garantir que painel de início de execução de fluxo não seja mostrado no plot
### Documentação
- Atualizar referências de AMP para AOP na documentação
- Atualizar AMP para AOP
### Contribuidores
@Vidit-Ostwal, @gilfeig, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @markmcd
</Update>
<Update label="22 nov 2025">
## v0.203.2
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.2)
## O que Mudou
- Bump de versão hotfix de 0.203.1 para 0.203.2
</Update>
<Update label="16 nov 2025">
## v1.5.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.5.0)
## O que Mudou
### Funcionalidades
- Adicionar flag de status de conclusão remota de confiança a2a
- Buscar e armazenar mais dados sobre servidor de autorização Okta
- Implementar hooks antes e depois de chamadas LLM no CrewAgentExecutor
- Expor mensagens para TaskOutput e LiteAgentOutputs
- Aprimorar descrição de schema do QdrantVectorSearchTool
### Correções de Bugs
- Garantir que flags de instrumentação de rastreamento sejam aplicadas corretamente
- Corrigir links de documentação de ferramentas personalizadas e adicionar ação de links quebrados do Mintlify
### Documentação
- Aprimorar documentação de guardrail de tarefa com suporte a validação baseada em LLM
### Contribuidores
@danielfsbarreto, @greysonlalonde, @heitorado, @lorenzejay, @theCyberTech
</Update>
<Update label="07 nov 2025">
## v1.4.1
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.1)
## O que Mudou
### Correções de Bugs
- Corrigir tratamento de iterações máximas do agente
- Resolver problemas de roteamento para sintaxe de modelo LLM para provedores respeitados
### Contribuidores
@greysonlalonde
</Update>
<Update label="07 nov 2025">
## v1.4.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.4.0)
## O que Mudou
### Funcionalidades
- Adicionar suporte para rotas de plot não-AST
- Implementar suporte de primeira classe para MCP
- Adicionar dunder de validação Pydantic ao BaseInterceptor
- Adicionar suporte para hooks de interceptor de mensagem LLM
- Cache de prompts i18n para uso eficiente
- Aprimorar QdrantVectorSearchTool
### Correções de Bugs
- Corrigir problemas para manter stopwords atualizadas
- Resolver valores não pickleable no estado de fluxo
- Garantir que lite agents corrijam curso em erros de validação
- Corrigir hash de argumento de callback para garantir que cache funcione
- Permitir adicionar conteúdo de fonte RAG de URLs válidas
- Tornar seleção de nó de plot mais suave
- Corrigir IDs de documento duplicados para conhecimento
### Refatoração
- Melhorar tratamento de execução de ferramenta MCP com concurrent futures
- Simplificar tratamento de fluxo, tipagem e logging; atualizar UI e testes
- Refatorar gerenciamento de stop word para propriedade
### Documentação
- Migrar embedder para embedding_model e exigir vectordb em documentação de ferramentas; adicionar exemplos de provedor (en/ko/pt-BR)
### Contribuidores
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="01 nov 2025">
## v1.3.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.3.0)
## O que Mudou
### Funcionalidades
- Refatorar tratamento de fluxo, tipagem e logging
- Aprimorar QdrantVectorSearchTool
### Correções de Bugs
- Corrigir ferramentas Firecrawl e adicionar testes
- Refatorar use_stop_words para propriedade e adicionar verificação para stop words
### Documentação
- Migrar embedder para embedding_model e exigir vectordb em documentação de ferramentas
- Adicionar exemplos de provedor em Inglês, Coreano e Português
### Refatoração
- Melhorar tratamento de fluxo e atualizações de UI
### Contribuidores
@danielfsbarreto, @greysonlalonde, @lorenzejay, @lucasgomide, @tonykipkemboi
</Update>
<Update label="27 out 2025">
## v1.2.1
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.1)
## O que Mudou
### Funcionalidades
- Adicionar suporte para integração Datadog
- Suportar apps e mcps em liteagent
### Documentação
- Descrever variável de ambiente obrigatória para chamar ferramentas Platform para cada integração
- Adicionar documentação de integração Datadog
### Contribuidores
@barieom, @lorenzejay, @lucasgomide, @sabrenner
</Update>
<Update label="24 out 2025">
## v1.2.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.2.0)
## O que Mudou
### Correções de Bugs
- Atualizar modelo LLM padrão e melhorar logging de erros em utilitários LLM
- Alterar diretório de visualização de fluxo e inspeção de método
### Removendo Não Utilizados
- Remover aisuite
### Contribuidores
@greysonlalonde, @lorenzejay
</Update>
<Update label="21 out 2025">
## v1.1.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.1.0)
## O que Mudou
### Funcionalidades
- Aprimorar InternalInstructor para suportar múltiplos provedores LLM
- Implementar base de plugin mypy
- Melhorar QdrantVectorSearchTool
### Correções de Bugs
- Corrigir links de documentação de integração quebrados
- Corrigir chamada de trace dupla e adicionar tipos
- Fixar versões de template para mais recente
### Documentação
- Atualizar detalhes e exemplos de integração LLM
### Refatoração
- Melhorar tipagem do CrewBase
### Contribuidores
@cwarre33, @danielfsbarreto, @greysonlalonde, @lorenzejay
</Update>
<Update label="20 out 2025">
## v1.0.0
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0)
## O que Mudou
### Funcionalidades
- Bump de versões para 1.0.0
- Aprimorar tratamento de eventos de conhecimento e guardrail na classe Agent
- Injetar credenciais do repositório de ferramentas no comando crewai run
### Correções de Bugs
- Preservar estrutura de condição aninhada em decoradores Flow
- Adicionar parâmetros de print padrão ao método Printer.print
- Corrigir erros quando não há input() disponível
- Adicionar margem de 10s ao decodificar JWT
- Reverter agenda cron ruim
- Corrigir agenda cron para executar a cada 5 dias em datas específicas
- Usar PATH do sistema para binário Docker em vez de caminho hardcoded
- Adicionar configuração CodeQL para excluir corretamente diretórios de template
### Documentação
- Atualizar política de segurança para relatório de vulnerabilidade
- Adicionar guia para capturar logs de telemetria no CrewAI AMP
- Adicionar arquivos /resume ausentes
- Esclarecer parâmetro de URL de webhook em workflows HITL
### Contribuidores
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta, @theCyberTech
</Update>
<Update label="18 out 2025">
## v1.0.0b3 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b3)
## O que Mudou
### Funcionalidades
- Aprimorar funcionalidade e validação de guardrail de tarefa
- Melhorar suporte para importar SDK nativo
- Adicionar testes nativos Azure
- Aprimorar classe BedrockCompletion com funcionalidades avançadas
- Aprimorar classe GeminiCompletion com suporte a parâmetro de cliente
- Aprimorar classe AnthropicCompletion com parâmetros de cliente adicionais
### Correções de Bugs
- Preservar estrutura de condição aninhada em decoradores Flow
- Adicionar parâmetros de print padrão ao método Printer.print
- Remover prints stdout e melhorar determinismo de teste
### Refatoração
- Converter módulo de projeto para metaclasse com tipagem completa
### Contribuidores
@greysonlalonde, @lorenzejay
</Update>
<Update label="16 out 2025">
## v1.0.0b2 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b2)
## O que Mudou
### Funcionalidades
- Aprimorar classe OpenAICompletion com parâmetros de cliente adicionais
- Melhorar segurança de thread do event bus e suporte async
- Injetar credenciais do repositório de ferramentas no comando crewai run
### Correções de Bugs
- Corrigir problema onde ocorre erro se não houver input() disponível
- Adicionar margem de 10s ao decodificar JWT
- Corrigir cópia e adicionar verificação NOT_SPECIFIED em task.py
### Documentação
- Garantir que CREWAI_PLATFORM_INTEGRATION_TOKEN seja mencionado na documentação
- Atualizar documentação de triggers
### Contribuidores
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="14 out 2025">
## v1.0.0b1 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0b1)
## O que Mudou
### Funcionalidades
- Aprimorar classe OpenAICompletion com parâmetros de cliente adicionais
- Melhorar segurança de thread do event bus e suporte async
- Implementar integração Bedrock LLM
### Correções de Bugs
- Corrigir problema com disponibilidade de input() ausente
- Resolver erro de decodificação JWT adicionando margem de 10 segundos
- Injetar credenciais do repositório de ferramentas no comando crewai run
- Corrigir cópia e adicionar verificação NOT_SPECIFIED em task.py
### Documentação
- Garantir que CREWAI_PLATFORM_INTEGRATION_TOKEN seja mencionado na documentação
- Atualizar documentação de triggers
### Contribuidores
@Vidit-Ostwal, @greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide
</Update>
<Update label="13 out 2025">
## v0.203.1
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/0.203.1)
## O que Mudou
### Melhorias e Correções do Núcleo
- Corrigida injeção de credenciais do repositório de ferramentas no comando `crewai run`
- Adicionada margem de 10 segundos ao decodificar JWTs para reduzir erros de validação de token
- Corrigida (depois revertida) correção de agenda cron destinada a executar jobs a cada 5 dias em datas específicas
### Documentação e Guias
- Atualizada política de segurança para esclarecer o processo de relatório de vulnerabilidade
</Update>
<Update label="09 out 2025">
## v1.0.0a4 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a4)
## O que Mudou
### Funcionalidades
- Aprimorar tratamento de eventos de conhecimento e guardrail na classe Agent
- Introduzir comandos de listagem e execução de trigger para desenvolvimento local
- Atualizar documentação com nova abordagem para consumir Platform Actions
- Adicionar guia para capturar logs de telemetria no CrewAI AMP
### Correções de Bugs
- Reverter agenda cron ruim
- Corrigir agenda cron para executar a cada 5 dias em datas específicas
- Remover linha duplicada e adicionar variável de ambiente explícita
### Contribuidores
@greysonlalonde, @heitorado, @joaomdmoura, @lorenzejay, @lucasgomide, @mplachta, @theCyberTech
</Update>
<Update label="07 out 2025">
## v1.0.0a3 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a3)
## O que Mudou
### Funcionalidades
- Adicionar suporte a agente para ações de plataforma
- Adicionar argumento de interpretador para ferramenta de execução de código
- Suporte direto para execução de apps de plataforma
### Documentação
- Adicionar documentação de ações de plataforma
- Adicionar tipos de transporte stdio e sse à documentação MCP
- Atualizar lista de modelos AWS
### Contribuidores
@greysonlalonde, @heitorado, @lorenzejay, @lucasgomide
</Update>
<Update label="03 out 2025">
## v1.0.0a2 (Pré-lançamento)
[Ver release no GitHub](https://github.com/crewAIInc/crewAI/releases/tag/1.0.0a2)
## O que Mudou
### Melhorias e Correções do Núcleo
- Atualizações de CI para monorepo
- Atualizar modelo Anthropic padrão para claude-sonnet-4-20250514
- Corrigir testes para atualização de modelo
### Contribuidores
@greysonlalonde, @lorenzejay
</Update>
<Update label="30 set 2025">
## v1.0.0a1

View File

@@ -309,6 +309,10 @@ Ao executar esse Flow, a saída será diferente dependendo do valor booleano ale
### Human in the Loop (feedback humano)
<Note>
O decorador `@human_feedback` requer **CrewAI versão 1.8.0 ou superior**.
</Note>
O decorador `@human_feedback` permite fluxos de trabalho human-in-the-loop, pausando a execução do flow para coletar feedback de um humano. Isso é útil para portões de aprovação, revisão de qualidade e pontos de decisão que requerem julgamento humano.
```python Code

View File

@@ -79,7 +79,7 @@ Existem diferentes locais no código do CrewAI onde você pode especificar o mod
# Configuração avançada com parâmetros detalhados
llm = LLM(
    model="openai/gpt-4",
    temperature=0.8,
    max_tokens=150,
    top_p=0.9,
@@ -207,11 +207,20 @@ Nesta seção, você encontrará exemplos detalhados que ajudam a selecionar, co
Defina sua chave de API no seu arquivo `.env`. Se precisar de uma chave, ou encontrar uma existente, verifique o [AI Studio](https://aistudio.google.com/apikey).
```toml .env
# https://ai.google.dev/gemini-api/docs/api-key
# Para API Gemini (uma das seguintes)
GOOGLE_API_KEY=<your-api-key>
GEMINI_API_KEY=<your-api-key>
# Para Vertex AI Express mode (autenticação por chave de API)
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
# Para Vertex AI com conta de serviço
GOOGLE_CLOUD_PROJECT=<your-project-id>
GOOGLE_CLOUD_LOCATION=<location> # Padrão: us-central1
```
Exemplo de uso em seu projeto CrewAI:
**Uso Básico:**
```python Code
from crewai import LLM
@@ -221,6 +230,34 @@ Nesta seção, você encontrará exemplos detalhados que ajudam a selecionar, co
)
```
**Vertex AI Express Mode (Autenticação por Chave de API):**
O Vertex AI Express mode permite usar o Vertex AI com autenticação simples por chave de API, em vez de credenciais de conta de serviço. Esta é a maneira mais rápida de começar com o Vertex AI.
Para habilitar o Express mode, defina ambas as variáveis de ambiente no seu arquivo `.env`:
```toml .env
GOOGLE_GENAI_USE_VERTEXAI=true
GOOGLE_API_KEY=<your-api-key>
```
Em seguida, use o LLM normalmente:
```python Code
from crewai import LLM

llm = LLM(
    model="gemini/gemini-2.0-flash",
    temperature=0.7
)
```
<Info>
Para obter uma chave de API do Express mode:
- Novos usuários do Google Cloud: Obtenha uma [chave de API do Express mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey)
- Usuários existentes do Google Cloud: Obtenha uma [chave de API do Google Cloud vinculada a uma conta de serviço](https://cloud.google.com/docs/authentication/api-keys)
Para mais detalhes, consulte a [documentação do Vertex AI Express mode](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey).
</Info>
### Modelos Gemini
O Google oferece uma variedade de modelos poderosos otimizados para diferentes casos de uso.
@@ -823,7 +860,7 @@ Saiba como obter o máximo da configuração do seu LLM:
Lembre-se de monitorar regularmente o uso de tokens e ajustar suas configurações para otimizar custos e desempenho.
</Info>
</Accordion>
<Accordion title="Descartar Parâmetros Adicionais">
O CrewAI usa Litellm internamente para chamadas LLM, permitindo descartar parâmetros adicionais desnecessários para seu caso de uso. Isso pode simplificar seu código e reduzir a complexidade da configuração do LLM.
Por exemplo, se não precisar enviar o parâmetro <code>stop</code>, basta omiti-lo na chamada do LLM:
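Um esboço mínimo, assumindo os parâmetros `drop_params` e `additional_drop_params` da classe `LLM` (confirme os nomes na sua versão):
```python Code
from crewai import LLM

# Descarta o parâmetro "stop" antes da chamada ao provedor
llm = LLM(
    model="openai/gpt-4o",
    drop_params=True,
    additional_drop_params=["stop"]
)
```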
@@ -882,4 +919,4 @@ Saiba como obter o máximo da configuração do seu LLM:
llm = LLM(model="openai/gpt-4o") # 128K tokens
```
</Tab>
</Tabs>

View File

@@ -128,7 +128,7 @@ Ao implantar seu Flow, considere o seguinte:
### CrewAI Enterprise
A maneira mais fácil de implantar seu Flow é usando o CrewAI Enterprise. Ele lida com a infraestrutura, autenticação e monitoramento para você.
Confira o [Guia de Implantação](/pt-BR/enterprise/guides/deploy-crew) para começar.
Confira o [Guia de Implantação](/pt-BR/enterprise/guides/deploy-to-amp) para começar.
```bash
crewai deploy create

View File

@@ -91,7 +91,7 @@ Após implantar, você pode ver os detalhes da automação e usar o menu **Optio
## Relacionados
<CardGroup cols={3}>
<Card title="Implantar um Crew" href="/pt-BR/enterprise/guides/deploy-crew" icon="rocket">
<Card title="Implantar um Crew" href="/pt-BR/enterprise/guides/deploy-to-amp" icon="rocket">
Implante um Crew via GitHub ou arquivo ZIP.
</Card>
<Card title="Gatilhos de Automação" href="/pt-BR/enterprise/guides/automation-triggers" icon="trigger">

View File

@@ -79,7 +79,7 @@ Após publicar, você pode visualizar os detalhes da automação e usar o menu *
<Card title="Criar um Crew" href="/pt-BR/enterprise/guides/build-crew" icon="paintbrush">
Crie um Crew.
</Card>
<Card title="Implantar um Crew" href="/pt-BR/enterprise/guides/deploy-crew" icon="rocket">
<Card title="Implantar um Crew" href="/pt-BR/enterprise/guides/deploy-to-amp" icon="rocket">
Implante um Crew via GitHub ou ZIP.
</Card>
<Card title="Exportar um Componente React" href="/pt-BR/enterprise/guides/react-component-export" icon="download">

View File

@@ -0,0 +1,342 @@
---
title: Redação de PII para Traces
description: "Redija automaticamente dados sensíveis de traces de execução de crews e flows"
icon: "lock"
mode: "wide"
---
## Visão Geral
A Redação de PII é um recurso do CrewAI AMP que detecta e mascara automaticamente Informações de Identificação Pessoal (PII) nos traces de execução de crews e flows. Isso garante que dados sensíveis como números de cartão de crédito, CPF, endereços de e-mail e nomes não sejam expostos nos traces do CrewAI AMP. Você também pode criar reconhecedores personalizados para proteger dados específicos da sua organização.
<Info>
A Redação de PII está disponível no plano Enterprise.
A implantação deve ser versão 1.8.0 ou superior.
</Info>
<Frame>
![Visão Geral da Redação de PII](/images/enterprise/pii_mask_recognizer_trace_example.png)
</Frame>
## Por Que a Redação de PII é Importante
Ao executar agentes de IA em produção, informações sensíveis frequentemente fluem através das suas crews:
- Dados de clientes de integrações CRM
- Informações financeiras de processadores de pagamento
- Detalhes pessoais de envios de formulários
- Dados internos de funcionários
Sem a redação adequada, esses dados aparecem nos traces, tornando a conformidade com regulamentações como LGPD, HIPAA e PCI-DSS desafiadora. A Redação de PII resolve isso mascarando automaticamente dados sensíveis antes de serem armazenados nos traces.
## Como Funciona
1. **Detectar** - Escanear dados de eventos de trace para padrões de PII conhecidos
2. **Classificar** - Identificar o tipo de dado sensível (cartão de crédito, CPF, e-mail, etc.)
3. **Mascarar/Redigir** - Substituir os dados sensíveis por valores mascarados com base na sua configuração
```
Original: "Entre em contato com john.doe@company.com ou ligue para 555-123-4567"
Redigido: "Entre em contato com <EMAIL_ADDRESS> ou ligue para <PHONE_NUMBER>"
```
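Como ilustração conceitual (não a implementação real do AMP), um esboço mínimo do pipeline detectar → classificar → mascarar usando regex:
```python
import re

# Esboço ilustrativo: padrões simplificados, não os reconhecedores reais do AMP
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE_NUMBER": re.compile(r"\+?\d[\d\s()-]{7,}\d"),
}

def redact(text: str) -> str:
    # Substitui cada correspondência pelo rótulo do tipo de entidade
    for entity, pattern in PATTERNS.items():
        text = pattern.sub(f"<{entity}>", text)
    return text

print(redact("Entre em contato com john.doe@company.com ou ligue para 555-123-4567"))
```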
## Habilitando a Redação de PII
<Info>
Você deve estar no plano Enterprise e sua implantação deve ser versão 1.8.0 ou superior para usar este recurso.
</Info>
<Steps>
<Step title="Navegue até Configurações da Crew">
No painel do CrewAI AMP, selecione sua crew implantada e vá para uma de suas implantações/automações, depois navegue até **Settings** → **PII Protection**.
</Step>
<Step title="Habilitar Proteção PII">
Ative **PII Redaction for Traces**. Isso habilitará a varredura automática e redação de dados de trace.
<Info>
Você precisa habilitar manualmente a Redação de PII para cada implantação.
</Info>
<Frame>
![Habilitar Redação de PII](/images/enterprise/pii_mask_recognizer_enable.png)
</Frame>
</Step>
<Step title="Configurar Tipos de Entidade">
Selecione quais tipos de PII detectar e redigir. Cada entidade pode ser habilitada ou desabilitada individualmente.
<Frame>
![Configurar Entidades](/images/enterprise/pii_mask_recognizer_supported_entities.png)
</Frame>
</Step>
<Step title="Salvar">
Salve sua configuração. A redação de PII estará ativa em todas as execuções subsequentes da crew, sem necessidade de reimplantação.
</Step>
</Steps>
## Tipos de Entidade Suportados
O CrewAI suporta os seguintes tipos de entidade PII, organizados por categoria.
### Entidades Globais
| Entidade | Descrição | Exemplo |
|----------|-----------|---------|
| `CREDIT_CARD` | Números de cartão de crédito/débito | "4111-1111-1111-1111" |
| `CRYPTO` | Endereços de carteira de criptomoedas | "bc1qxy2kgd..." |
| `DATE_TIME` | Datas e horários | "15 de janeiro de 2024" |
| `EMAIL_ADDRESS` | Endereços de e-mail | "john@example.com" |
| `IBAN_CODE` | Números de conta bancária internacional | "DE89 3704 0044 0532 0130 00" |
| `IP_ADDRESS` | Endereços IPv4 e IPv6 | "192.168.1.1" |
| `LOCATION` | Localizações geográficas | "São Paulo" |
| `MEDICAL_LICENSE` | Números de licença médica | "CRM12345" |
| `NRP` | Nacionalidades, grupos religiosos ou políticos | - |
| `PERSON` | Nomes pessoais | "João Silva" |
| `PHONE_NUMBER` | Números de telefone em vários formatos | "+55 (11) 98765-4321" |
| `URL` | URLs da web | "https://example.com" |
### Entidades Específicas dos EUA
| Entidade | Descrição | Exemplo |
|----------|-----------|---------|
| `US_BANK_NUMBER` | Números de conta bancária dos EUA | "1234567890" |
| `US_DRIVER_LICENSE` | Números de carteira de motorista dos EUA | "D1234567" |
| `US_ITIN` | Número de Identificação de Contribuinte Individual | "900-70-0000" |
| `US_PASSPORT` | Números de passaporte dos EUA | "123456789" |
| `US_SSN` | Números de Seguro Social | "123-45-6789" |
## Ações de Redação
Para cada entidade habilitada, você pode configurar como os dados são redigidos:
| Ação | Descrição | Exemplo de Saída |
|------|-----------|------------------|
| `mask` | Substituir pelo rótulo do tipo de entidade | `<CREDIT_CARD>` |
| `redact` | Remover completamente o texto | *(vazio)* |
## Reconhecedores Personalizados
Além das entidades integradas, você pode criar **reconhecedores personalizados** para detectar padrões de PII específicos da sua organização.
<Frame>
![Reconhecedores Personalizados](/images/enterprise/pii_mask_recognizer.png)
</Frame>
### Tipos de Reconhecedores
Você tem duas opções para reconhecedores personalizados:
| Tipo | Melhor Para | Exemplo de Caso de Uso |
|------|-------------|------------------------|
| **Baseado em Padrão (Regex)** | Dados estruturados com formatos previsíveis | Valores de salário, IDs de funcionários, códigos de projeto |
| **Lista de Negação** | Correspondências exatas de strings | Nomes de empresas, codinomes internos, termos específicos |
### Criando um Reconhecedor Personalizado
<Steps>
<Step title="Navegue até Reconhecedores Personalizados">
Vá para **Settings** da Organização → **Organization** → **Add Recognizer**.
</Step>
<Step title="Configure o Reconhecedor">
<Frame>
![Configurar Reconhecedor](/images/enterprise/pii_mask_recognizer_create.png)
</Frame>
Configure os seguintes campos:
- **Name**: Um nome descritivo para o reconhecedor
- **Entity Type**: O rótulo da entidade que aparecerá na saída redigida (ex.: `EMPLOYEE_ID`, `SALARY`)
- **Type**: Escolha entre Padrão Regex ou Lista de Negação
- **Pattern/Values**: Padrão regex ou lista de strings para corresponder
- **Confidence Threshold**: Pontuação mínima (0.0-1.0) necessária para uma correspondência acionar a redação. Valores mais altos (ex.: 0.8) reduzem falsos positivos, mas podem perder algumas correspondências. Valores mais baixos (ex.: 0.5) capturam mais correspondências, mas podem redigir em excesso. O padrão é 0.8.
- **Context Words** (opcional): Palavras que aumentam a confiança de detecção quando encontradas próximas
</Step>
<Step title="Salvar">
Salve o reconhecedor. Ele estará disponível para habilitar em suas implantações.
</Step>
</Steps>
### Entendendo os Tipos de Entidade
O **Entity Type** determina como o conteúdo correspondido aparece nos traces redigidos:
```
Entity Type: SALARY
Pattern: salary:\s*\$\s*\d+
Entrada: "Registro do funcionário com salary: $50,000"
Saída: "Registro do funcionário com <SALARY>"
```
### Usando Palavras de Contexto
Palavras de contexto melhoram a precisão aumentando a confiança quando termos específicos aparecem próximos ao padrão correspondido:
```
Context Words: "project", "code", "internal"
Entity Type: PROJECT_CODE
Pattern: PRJ-\d{4}
```
Quando "project" ou "code" aparece próximo a "PRJ-1234", o reconhecedor tem maior confiança de que é uma correspondência verdadeira, reduzindo falsos positivos.
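Os tipos de entidade e o mecanismo de confiança correspondem aos do [Microsoft Presidio](https://microsoft.github.io/presidio/); assumindo essa equivalência (apenas como ilustração, não como detalhe confirmado da implementação do AMP), um reconhecedor análogo na biblioteca de código aberto ficaria assim:
```python
from presidio_analyzer import AnalyzerEngine, Pattern, PatternRecognizer

# Suposição: reconhecedor análogo ao exemplo acima, com Presidio open source
pattern = Pattern(name="project_code", regex=r"PRJ-\d{4}", score=0.5)
recognizer = PatternRecognizer(
    supported_entity="PROJECT_CODE",
    patterns=[pattern],
    context=["project", "code", "internal"],  # palavras de contexto elevam o score
)

analyzer = AnalyzerEngine()
analyzer.registry.add_recognizer(recognizer)

# "project" próximo de "PRJ-1234" aumenta a confiança da correspondência
results = analyzer.analyze(text="internal project PRJ-1234", language="en")
print(results)
```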
## Visualizando Traces Redigidos
Uma vez que a redação de PII está habilitada, seus traces mostrarão valores redigidos no lugar de dados sensíveis:
```
Task Output: "Cliente <PERSON> fez o pedido #12345.
E-mail de contato: <EMAIL_ADDRESS>, telefone: <PHONE_NUMBER>.
Pagamento processado para cartão terminando em <CREDIT_CARD>."
```
Os valores redigidos são claramente marcados com colchetes angulares e o rótulo do tipo de entidade (ex.: `<EMAIL_ADDRESS>`), facilitando entender quais dados foram protegidos enquanto ainda permite depurar e monitorar o comportamento da crew.
## Melhores Práticas
### Considerações de Desempenho
<Steps>
<Step title="Habilite Apenas Entidades Necessárias">
Cada entidade habilitada adiciona sobrecarga de processamento. Habilite apenas entidades relevantes para seus dados.
</Step>
<Step title="Use Padrões Específicos">
Para reconhecedores personalizados, use padrões específicos para reduzir falsos positivos e melhorar o desempenho. Padrões regex são melhores para identificar padrões específicos nos traces como salário, ID de funcionário, código de projeto, etc. Reconhecedores de lista de negação são melhores para identificar strings exatas nos traces como nomes de empresas, codinomes internos, etc.
</Step>
<Step title="Aproveite Palavras de Contexto">
Palavras de contexto melhoram a precisão acionando a detecção apenas quando o texto circundante corresponde.
</Step>
</Steps>
## Solução de Problemas
<Accordion title="PII Não Está Sendo Redigido">
**Possíveis Causas:**
- Tipo de entidade não habilitado na configuração
- Padrão não corresponde ao formato dos dados
- Reconhecedor personalizado tem erros de sintaxe
**Soluções:**
- Verifique se a entidade está habilitada em **Settings** → **PII Protection**
- Teste padrões regex com dados de amostra
- Verifique logs para erros de configuração
</Accordion>
<Accordion title="Muitos Dados Estão Sendo Redigidos">
**Possíveis Causas:**
- Tipos de entidade muito amplos habilitados (ex.: `DATE_TIME` captura datas em todos os lugares)
- Padrões de reconhecedor personalizado são muito gerais
**Soluções:**
- Desabilite entidades que causam falsos positivos
- Torne padrões personalizados mais específicos
- Adicione palavras de contexto para melhorar a precisão
</Accordion>
<Accordion title="Problemas de Desempenho">
**Possíveis Causas:**
- Muitas entidades habilitadas
- Entidades baseadas em NLP (`PERSON`, `LOCATION`, `NRP`) são computacionalmente caras pois usam modelos de machine learning
**Soluções:**
- Habilite apenas entidades que você realmente precisa
- Considere usar alternativas baseadas em padrão quando possível
- Monitore tempos de processamento de trace no painel
</Accordion>
---
## Exemplo Prático: Correspondência de Padrão de Salário
Este exemplo demonstra como criar um reconhecedor personalizado para detectar e mascarar informações de salário em seus traces.
### Caso de Uso
Sua crew processa dados de funcionários ou financeiros que incluem informações de salário em formatos como:
- `salary: $50,000`
- `salary: $125,000.00`
- `salary:$1,500.50`
Você deseja mascarar automaticamente esses valores para proteger dados sensíveis de remuneração.
### Configuração
<Frame>
![Configuração do Reconhecedor de Salário](/images/enterprise/pii_mask_custom_recognizer_salary.png)
</Frame>
| Campo | Valor |
|-------|-------|
| **Name** | `SALARY` |
| **Entity Type** | `SALARY` |
| **Type** | Regex Pattern |
| **Regex Pattern** | `salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?` |
| **Action** | Mask |
| **Confidence Threshold** | `0.8` |
| **Context Words** | `salary, compensation, pay, wage, income` |
### Análise do Padrão Regex
| Componente do Padrão | Significado |
|----------------------|-------------|
| `salary:` | Corresponde ao texto literal "salary:" |
| `\s*` | Corresponde a zero ou mais caracteres de espaço em branco |
| `\$` | Corresponde ao sinal de dólar (escapado) |
| `\s*` | Corresponde a zero ou mais caracteres de espaço em branco após $ |
| `\d{1,3}` | Corresponde a 1-3 dígitos (ex.: "1", "50", "125") |
| `(,\d{3})*` | Corresponde a milhares separados por vírgula (ex.: ",000", ",500,000") |
| `(\.\d{2})?` | Opcionalmente corresponde a centavos (ex.: ".00", ".50") |
### Resultados de Exemplo
```
Original: "Registro do funcionário mostra salary: $125,000.00 anualmente"
Redigido: "Registro do funcionário mostra <SALARY> anualmente"
Original: "Salário base salary:$50,000 com potencial de bônus"
Redigido: "Salário base <SALARY> com potencial de bônus"
```
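Para validar o padrão antes de salvá-lo, um teste rápido com o módulo `re` do Python, usando os mesmos exemplos acima:
```python
import re

# O mesmo padrão configurado no reconhecedor
SALARY = re.compile(r"salary:\s*\$\s*\d{1,3}(,\d{3})*(\.\d{2})?")

amostras = [
    "Registro do funcionário mostra salary: $125,000.00 anualmente",
    "Salário base salary:$50,000 com potencial de bônus",
]
for texto in amostras:
    print(SALARY.sub("<SALARY>", texto))
```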
<Tip>
Adicionar palavras de contexto como "salary", "compensation", "pay", "wage" e "income" ajuda a aumentar a confiança de detecção quando esses termos aparecem próximos ao padrão correspondido, reduzindo falsos positivos.
</Tip>
### Habilite o Reconhecedor para Suas Implantações
<Warning>
Criar um reconhecedor personalizado no nível da organização não o habilita automaticamente para suas implantações. Você deve habilitar manualmente cada reconhecedor para cada implantação onde deseja aplicá-lo.
</Warning>
Após criar seu reconhecedor personalizado, habilite-o para cada implantação:
<Steps>
<Step title="Navegue até Sua Implantação">
Vá para sua implantação/automação e abra **Settings** → **PII Protection**.
</Step>
<Step title="Selecione Reconhecedores Personalizados">
Em **Mask Recognizers**, você verá os reconhecedores definidos pela sua organização. Marque a caixa ao lado dos reconhecedores que deseja habilitar.
<Frame>
![Habilitar Reconhecedor Personalizado](/images/enterprise/pii_mask_recognizers_options.png)
</Frame>
</Step>
<Step title="Salvar Configuração">
Salve suas alterações. O reconhecedor estará ativo em todas as execuções subsequentes para esta implantação.
</Step>
</Steps>
<Info>
Repita este processo para cada implantação onde você precisa do reconhecedor personalizado. Isso oferece controle granular sobre quais reconhecedores estão ativos em diferentes ambientes (ex.: desenvolvimento vs. produção).
</Info>

View File

@@ -1,304 +0,0 @@
---
title: "Deploy Crew"
description: "Implantando um Crew na CrewAI AMP"
icon: "rocket"
mode: "wide"
---
<Note>
Depois de criar um crew localmente ou pelo Crew Studio, o próximo passo é
implantá-lo na plataforma CrewAI AMP. Este guia cobre múltiplos métodos de
implantação para ajudá-lo a escolher a melhor abordagem para o seu fluxo de
trabalho.
</Note>
## Pré-requisitos
<CardGroup cols={2}>
<Card title="Crew Pronto para Implantação" icon="users">
Você deve ter um crew funcional, criado localmente ou pelo Crew Studio
</Card>
<Card title="Repositório GitHub" icon="github">
O código do seu crew deve estar em um repositório do GitHub (para o método
de integração com GitHub)
</Card>
</CardGroup>
## Opção 1: Implantar Usando o CrewAI CLI
A CLI fornece a maneira mais rápida de implantar crews desenvolvidos localmente na plataforma Enterprise.
<Steps>
<Step title="Instale o CrewAI CLI">
Se ainda não tiver, instale o CrewAI CLI:
```bash
pip install crewai[tools]
```
<Tip>
A CLI vem com o pacote principal CrewAI, mas o extra `[tools]` garante todas as dependências de implantação.
</Tip>
</Step>
<Step title="Autentique-se na Plataforma Enterprise">
Primeiro, você precisa autenticar sua CLI com a plataforma CrewAI AMP:
```bash
# Se já possui uma conta CrewAI AMP, ou deseja criar uma:
crewai login
```
Ao executar qualquer um dos comandos, a CLI irá:
1. Exibir uma URL e um código de dispositivo único
2. Abrir seu navegador para a página de autenticação
3. Solicitar a confirmação do dispositivo
4. Completar o processo de autenticação
Após a autenticação bem-sucedida, você verá uma mensagem de confirmação no terminal!
</Step>
<Step title="Criar uma Implantação">
No diretório do seu projeto, execute:
```bash
crewai deploy create
```
Este comando irá:
1. Detectar informações do seu repositório GitHub
2. Identificar variáveis de ambiente no seu arquivo `.env` local
3. Transferir essas variáveis com segurança para a plataforma Enterprise
4. Criar uma nova implantação com um identificador único
Com a criação bem-sucedida, você verá uma mensagem como:
```shell
Deployment created successfully!
Name: your_project_name
Deployment ID: 01234567-89ab-cdef-0123-456789abcdef
Current Status: Deploy Enqueued
```
</Step>
<Step title="Acompanhe o Progresso da Implantação">
Acompanhe o status da implantação com:
```bash
crewai deploy status
```
Para ver logs detalhados do processo de build:
```bash
crewai deploy logs
```
<Tip>
A primeira implantação normalmente leva de 10 a 15 minutos, pois as imagens dos containers são construídas. As próximas implantações são bem mais rápidas.
</Tip>
</Step>
</Steps>
## Comandos Adicionais da CLI
O CrewAI CLI oferece vários comandos para gerenciar suas implantações:
```bash
# Liste todas as suas implantações
crewai deploy list
# Consulte o status de uma implantação
crewai deploy status
# Veja os logs da implantação
crewai deploy logs
# Envie atualizações após alterações no código
crewai deploy push
# Remova uma implantação
crewai deploy remove <deployment_id>
```
## Opção 2: Implantar Diretamente pela Interface Web
Você também pode implantar seus crews diretamente pela interface web da CrewAI AMP conectando sua conta do GitHub. Esta abordagem não requer utilizar a CLI na sua máquina local.
<Steps>
<Step title="Enviar no GitHub">
Você precisa subir seu crew para um repositório do GitHub. Caso ainda não tenha criado um crew, você pode [seguir este tutorial](/pt-BR/quickstart).
</Step>
<Step title="Conectando o GitHub ao CrewAI AMP">
1. Faça login em [CrewAI AMP](https://app.crewai.com)
2. Clique no botão "Connect GitHub"
<Frame>
![Botão Connect GitHub](/images/enterprise/connect-github.png)
</Frame>
</Step>
<Step title="Selecionar o Repositório">
Após conectar sua conta GitHub, você poderá selecionar qual repositório deseja implantar:
<Frame>
![Selecionar Repositório](/images/enterprise/select-repo.png)
</Frame>
</Step>
<Step title="Definir as Variáveis de Ambiente">
Antes de implantar, você precisará configurar as variáveis de ambiente para conectar ao seu provedor de LLM ou outros serviços:
1. Você pode adicionar variáveis individualmente ou em lote
2. Digite suas variáveis no formato `KEY=VALUE` (uma por linha)
<Frame>
![Definir Variáveis de Ambiente](/images/enterprise/set-env-variables.png)
</Frame>
</Step>
<Step title="Implante Seu Crew">
1. Clique no botão "Deploy" para iniciar o processo de implantação
2. Você pode monitorar o progresso pela barra de progresso
3. A primeira implantação geralmente demora de 10 a 15 minutos; as próximas serão mais rápidas
<Frame>
![Progresso da Implantação](/images/enterprise/deploy-progress.png)
</Frame>
Após a conclusão, você verá:
- A URL exclusiva do seu crew
- Um Bearer token para proteger sua API crew
- Um botão "Delete" caso precise remover a implantação
</Step>
</Steps>
## ⚠️ Requisitos de Segurança para Variáveis de Ambiente
<Warning>
**Importante**: A CrewAI AMP possui restrições de segurança sobre os nomes de
variáveis de ambiente que podem causar falha na implantação caso não sejam
seguidas.
</Warning>
### Padrões de Variáveis de Ambiente Bloqueados
Por motivos de segurança, os seguintes padrões de nome de variável de ambiente são **automaticamente filtrados** e causarão problemas de implantação:
**Padrões Bloqueados:**
- Variáveis terminando em `_TOKEN` (ex: `MY_API_TOKEN`)
- Variáveis terminando em `_PASSWORD` (ex: `DB_PASSWORD`)
- Variáveis terminando em `_SECRET` (ex: `API_SECRET`)
- Variáveis terminando em `_KEY` em certos contextos
**Variáveis Bloqueadas Específicas:**
- `GITHUB_USER`, `GITHUB_TOKEN`
- `AWS_REGION`, `AWS_DEFAULT_REGION`
- Diversas variáveis internas do sistema CrewAI
### Exceções Permitidas
Algumas variáveis são explicitamente permitidas mesmo coincidindo com os padrões bloqueados:
- `AZURE_AD_TOKEN`
- `AZURE_OPENAI_AD_TOKEN`
- `ENTERPRISE_ACTION_TOKEN`
- `CREWAI_ENTEPRISE_TOOLS_TOKEN`
### Como Corrigir Problemas de Nomeação
Se sua implantação falhar devido a restrições de variáveis de ambiente:
```bash
# ❌ Estas irão causar falhas na implantação
OPENAI_TOKEN=sk-...
DATABASE_PASSWORD=mysenha
API_SECRET=segredo123
# ✅ Utilize estes padrões de nomeação
OPENAI_API_KEY=sk-...
DATABASE_CREDENTIALS=mysenha
API_CONFIG=segredo123
```
### Melhores Práticas
1. **Use convenções padrão de nomenclatura**: `PROVIDER_API_KEY` em vez de `PROVIDER_TOKEN`
2. **Teste localmente primeiro**: Certifique-se de que seu crew funciona com as variáveis renomeadas
3. **Atualize seu código**: Altere todas as referências aos nomes antigos das variáveis
4. **Documente as mudanças**: Mantenha registro das variáveis renomeadas para seu time
<Tip>
Se você se deparar com falhas de implantação com erros enigmáticos de
variáveis de ambiente, confira primeiro os nomes das variáveis em relação a
esses padrões.
</Tip>
### Interaja com Seu Crew Implantado
Após a implantação, você pode acessar seu crew por meio de:
1. **REST API**: A plataforma gera um endpoint HTTPS exclusivo com estas rotas principais:
- `/inputs`: Lista os parâmetros de entrada requeridos
- `/kickoff`: Inicia uma execução com os inputs fornecidos
- `/status/{kickoff_id}`: Consulta o status da execução
2. **Interface Web**: Acesse [app.crewai.com](https://app.crewai.com) para visualizar:
- **Aba Status**: Informações da implantação, detalhes do endpoint da API e token de autenticação
- **Aba Run**: Visualização da estrutura do seu crew
- **Aba Executions**: Histórico de todas as execuções
- **Aba Metrics**: Análises de desempenho
- **Aba Traces**: Insights detalhados das execuções
### Dispare uma Execução
No dashboard Enterprise, você pode:
1. Clicar no nome do seu crew para abrir seus detalhes
2. Selecionar "Trigger Crew" na interface de gerenciamento
3. Inserir os inputs necessários no modal exibido
4. Monitorar o progresso à medida que a execução avança pelo pipeline
### Monitoramento e Análises
A plataforma Enterprise oferece recursos abrangentes de observabilidade:
- **Gestão das Execuções**: Acompanhe execuções ativas e concluídas
- **Traces**: Quebra detalhada de cada execução
- **Métricas**: Uso de tokens, tempos de execução e custos
- **Visualização em Linha do Tempo**: Representação visual das sequências de tarefas
### Funcionalidades Avançadas
A plataforma Enterprise também oferece:
- **Gerenciamento de Variáveis de Ambiente**: Armazene e gerencie com segurança as chaves de API
- **Conexões com LLM**: Configure integrações com diversos provedores de LLM
- **Repositório Custom Tools**: Crie, compartilhe e instale ferramentas
- **Crew Studio**: Monte crews via interface de chat sem escrever código
<Card title="Precisa de Ajuda?" icon="headset" href="mailto:support@crewai.com">
Entre em contato com nossa equipe de suporte para ajuda com questões de
implantação ou dúvidas sobre a plataforma Enterprise.
</Card>

View File

@@ -0,0 +1,439 @@
---
title: "Deploy para AMP"
description: "Implante seu Crew ou Flow no CrewAI AMP"
icon: "rocket"
mode: "wide"
---
<Note>
Depois de criar um Crew ou Flow localmente (ou pelo Crew Studio), o próximo passo é
implantá-lo na plataforma CrewAI AMP. Este guia cobre múltiplos métodos de
implantação para ajudá-lo a escolher a melhor abordagem para o seu fluxo de trabalho.
</Note>
## Pré-requisitos
<CardGroup cols={2}>
<Card title="Projeto Pronto para Implantação" icon="check-circle">
Você deve ter um Crew ou Flow funcionando localmente com sucesso.
Siga nosso [guia de preparação](/pt-BR/enterprise/guides/prepare-for-deployment) para verificar a estrutura do seu projeto.
</Card>
<Card title="Repositório GitHub" icon="github">
Seu código deve estar em um repositório do GitHub (para o método de integração com GitHub).
</Card>
</CardGroup>
<Info>
**Crews vs Flows**: Ambos os tipos de projeto podem ser implantados como "automações" no CrewAI AMP.
O processo de implantação é o mesmo, mas eles têm estruturas de projeto diferentes.
Veja [Preparar para Implantação](/pt-BR/enterprise/guides/prepare-for-deployment) para detalhes.
</Info>
## Opção 1: Implantar Usando o CrewAI CLI
A CLI fornece a maneira mais rápida de implantar Crews ou Flows desenvolvidos localmente na plataforma AMP.
A CLI detecta automaticamente o tipo do seu projeto a partir do `pyproject.toml` e faz o build adequadamente.
<Steps>
<Step title="Instale o CrewAI CLI">
Se ainda não tiver, instale o CrewAI CLI:
```bash
pip install crewai[tools]
```
<Tip>
A CLI vem com o pacote principal CrewAI, mas o extra `[tools]` garante todas as dependências de implantação.
</Tip>
</Step>
<Step title="Autentique-se na Plataforma Enterprise">
Primeiro, você precisa autenticar sua CLI com a plataforma CrewAI AMP:
```bash
# Se já possui uma conta CrewAI AMP, ou deseja criar uma:
crewai login
```
Ao executar o comando, a CLI irá:
1. Exibir uma URL e um código de dispositivo único
2. Abrir seu navegador para a página de autenticação
3. Solicitar a confirmação do dispositivo
4. Completar o processo de autenticação
Após a autenticação bem-sucedida, você verá uma mensagem de confirmação no terminal!
</Step>
<Step title="Criar uma Implantação">
No diretório do seu projeto, execute:
```bash
crewai deploy create
```
Este comando irá:
1. Detectar informações do seu repositório GitHub
2. Identificar variáveis de ambiente no seu arquivo `.env` local
3. Transferir essas variáveis com segurança para a plataforma Enterprise
4. Criar uma nova implantação com um identificador único
Com a criação bem-sucedida, você verá uma mensagem como:
```shell
Deployment created successfully!
Name: your_project_name
Deployment ID: 01234567-89ab-cdef-0123-456789abcdef
Current Status: Deploy Enqueued
```
</Step>
<Step title="Acompanhe o Progresso da Implantação">
Acompanhe o status da implantação com:
```bash
crewai deploy status
```
Para ver logs detalhados do processo de build:
```bash
crewai deploy logs
```
<Tip>
A primeira implantação normalmente leva de 10 a 15 minutos, pois as imagens dos containers são construídas. As próximas implantações são bem mais rápidas.
</Tip>
</Step>
</Steps>
## Comandos Adicionais da CLI
O CrewAI CLI oferece vários comandos para gerenciar suas implantações:
```bash
# Liste todas as suas implantações
crewai deploy list
# Consulte o status de uma implantação
crewai deploy status
# Veja os logs da implantação
crewai deploy logs
# Envie atualizações após alterações no código
crewai deploy push
# Remova uma implantação
crewai deploy remove <deployment_id>
```
## Opção 2: Implantar Diretamente pela Interface Web
Você também pode implantar seus Crews ou Flows diretamente pela interface web do CrewAI AMP conectando sua conta do GitHub. Esta abordagem não requer utilizar a CLI na sua máquina local. A plataforma detecta automaticamente o tipo do seu projeto e trata o build adequadamente.
<Steps>
<Step title="Enviar para o GitHub">
Você precisa enviar seu crew para um repositório do GitHub. Caso ainda não tenha criado um crew, você pode [seguir este tutorial](/pt-BR/quickstart).
</Step>
<Step title="Conectando o GitHub ao CrewAI AMP">
1. Faça login em [CrewAI AMP](https://app.crewai.com)
2. Clique no botão "Connect GitHub"
<Frame>
![Botão Connect GitHub](/images/enterprise/connect-github.png)
</Frame>
</Step>
<Step title="Selecionar o Repositório">
Após conectar sua conta GitHub, você poderá selecionar qual repositório deseja implantar:
<Frame>
![Selecionar Repositório](/images/enterprise/select-repo.png)
</Frame>
</Step>
<Step title="Definir as Variáveis de Ambiente">
Antes de implantar, você precisará configurar as variáveis de ambiente para conectar ao seu provedor de LLM ou outros serviços:
1. Você pode adicionar variáveis individualmente ou em lote
2. Digite suas variáveis no formato `KEY=VALUE` (uma por linha)
<Frame>
![Definir Variáveis de Ambiente](/images/enterprise/set-env-variables.png)
</Frame>
</Step>
<Step title="Implante Seu Crew">
1. Clique no botão "Deploy" para iniciar o processo de implantação
2. Você pode monitorar o progresso pela barra de progresso
3. A primeira implantação geralmente demora de 10 a 15 minutos; as próximas serão mais rápidas
<Frame>
![Progresso da Implantação](/images/enterprise/deploy-progress.png)
</Frame>
Após a conclusão, você verá:
- A URL exclusiva do seu crew
- Um Bearer token para proteger a API do seu crew
- Um botão "Delete" caso precise remover a implantação
</Step>
</Steps>
## Opção 3: Reimplantar Usando API (Integração CI/CD)
Para implantações automatizadas em pipelines CI/CD, você pode usar a API do CrewAI para acionar reimplantações de crews existentes. Isso é particularmente útil para GitHub Actions, Jenkins ou outros workflows de automação.
<Steps>
<Step title="Obtenha Seu Token de Acesso Pessoal">
Navegue até as configurações da sua conta CrewAI AMP para gerar um token de API:
1. Acesse [app.crewai.com](https://app.crewai.com)
2. Clique em **Settings** → **Account** → **Personal Access Token**
3. Gere um novo token e copie-o com segurança
4. Armazene este token como um secret no seu sistema CI/CD
</Step>
<Step title="Encontre o UUID da Sua Automação">
Localize o identificador único do seu crew implantado:
1. Acesse **Automations** no seu dashboard CrewAI AMP
2. Selecione sua automação/crew existente
3. Clique em **Additional Details**
4. Copie o **UUID** - este identifica sua implantação específica do crew
</Step>
<Step title="Acione a Reimplantação via API">
Use o endpoint da API de Deploy para acionar uma reimplantação:
```bash
curl -i -X POST \
-H "Authorization: Bearer YOUR_PERSONAL_ACCESS_TOKEN" \
https://app.crewai.com/crewai_plus/api/v1/crews/YOUR-AUTOMATION-UUID/deploy
# HTTP/2 200
# content-type: application/json
#
# {
# "uuid": "your-automation-uuid",
# "status": "Deploy Enqueued",
# "public_url": "https://your-crew-deployment.crewai.com",
# "token": "your-bearer-token"
# }
```
<Info>
Se sua automação foi criada originalmente conectada ao Git, a API automaticamente puxará as últimas alterações do seu repositório antes de reimplantar.
</Info>
</Step>
<Step title="Exemplo de Integração com GitHub Actions">
Aqui está um exemplo de workflow do GitHub Actions com múltiplos gatilhos de implantação:
```yaml
name: Deploy CrewAI Automation
on:
push:
branches: [ main ]
pull_request:
types: [ labeled ]
release:
types: [ published ]
jobs:
deploy:
runs-on: ubuntu-latest
if: |
(github.event_name == 'push' && github.ref == 'refs/heads/main') ||
(github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'deploy')) ||
(github.event_name == 'release')
steps:
- name: Trigger CrewAI Redeployment
run: |
curl -X POST \
-H "Authorization: Bearer ${{ secrets.CREWAI_PAT }}" \
https://app.crewai.com/crewai_plus/api/v1/crews/${{ secrets.CREWAI_AUTOMATION_UUID }}/deploy
```
<Tip>
Adicione `CREWAI_PAT` e `CREWAI_AUTOMATION_UUID` como secrets do repositório. Para implantações de PR, adicione um label "deploy" para acionar o workflow.
</Tip>
</Step>
</Steps>
## Interaja com Sua Automação Implantada
Após a implantação, você pode acessar seu crew através de:
1. **REST API**: A plataforma gera um endpoint HTTPS exclusivo com estas rotas principais (veja o exemplo de chamada logo após esta lista):
- `/inputs`: Lista os parâmetros de entrada requeridos
- `/kickoff`: Inicia uma execução com os inputs fornecidos
- `/status/{kickoff_id}`: Consulta o status da execução
2. **Interface Web**: Acesse [app.crewai.com](https://app.crewai.com) para visualizar:
- **Aba Status**: Informações da implantação, detalhes do endpoint da API e token de autenticação
- **Aba Run**: Visualização da estrutura do seu crew
- **Aba Executions**: Histórico de todas as execuções
- **Aba Metrics**: Análises de desempenho
- **Aba Traces**: Insights detalhados das execuções
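Um exemplo ilustrativo de chamada à API em Python (um esboço com nomes fictícios: a URL, o token e os inputs abaixo são apenas exemplos, e assume-se que a resposta do `/kickoff` inclui um campo `kickoff_id`):
```python
import requests

BASE_URL = "https://your-crew-deployment.crewai.com"  # URL fictícia da automação
HEADERS = {"Authorization": "Bearer YOUR_BEARER_TOKEN"}  # token exibido na aba Status

# Lista os parâmetros de entrada requeridos
print(requests.get(f"{BASE_URL}/inputs", headers=HEADERS).json())

# Inicia uma execução (o dicionário de inputs é hipotético)
resp = requests.post(
    f"{BASE_URL}/kickoff",
    headers=HEADERS,
    json={"inputs": {"topic": "AI in Healthcare"}},
)
kickoff_id = resp.json()["kickoff_id"]  # nome do campo assumido

# Consulta o status da execução
print(requests.get(f"{BASE_URL}/status/{kickoff_id}", headers=HEADERS).json())
```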
### Dispare uma Execução
No dashboard Enterprise, você pode:
1. Clicar no nome do seu crew para abrir seus detalhes
2. Selecionar "Trigger Crew" na interface de gerenciamento
3. Inserir os inputs necessários no modal exibido
4. Monitorar o progresso à medida que a execução avança pelo pipeline
### Monitoramento e Análises
A plataforma Enterprise oferece recursos abrangentes de observabilidade:
- **Gestão das Execuções**: Acompanhe execuções ativas e concluídas
- **Traces**: Quebra detalhada de cada execução
- **Métricas**: Uso de tokens, tempos de execução e custos
- **Visualização em Linha do Tempo**: Representação visual das sequências de tarefas
### Funcionalidades Avançadas
A plataforma Enterprise também oferece:
- **Gerenciamento de Variáveis de Ambiente**: Armazene e gerencie com segurança as chaves de API
- **Conexões com LLM**: Configure integrações com diversos provedores de LLM
- **Repositório Custom Tools**: Crie, compartilhe e instale ferramentas
- **Crew Studio**: Monte crews via interface de chat sem escrever código
## Solução de Problemas em Falhas de Implantação
Se sua implantação falhar, verifique estes problemas comuns:
### Falhas de Build
#### Arquivo uv.lock Ausente
**Sintoma**: Build falha no início com erros de resolução de dependências
**Solução**: Gere e faça commit do arquivo lock:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
<Warning>
O arquivo `uv.lock` é obrigatório para todas as implantações. Sem ele, a plataforma
não consegue instalar suas dependências de forma confiável.
</Warning>
#### Estrutura de Projeto Incorreta
**Sintoma**: Erros "Could not find entry point" ou "Module not found"
**Solução**: Verifique se seu projeto corresponde à estrutura esperada:
- **Tanto Crews quanto Flows**: Devem ter ponto de entrada em `src/project_name/main.py`
- **Crews**: Usam uma função `run()` como ponto de entrada
- **Flows**: Usam uma função `kickoff()` como ponto de entrada
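Para referência, um esqueleto mínimo de `main.py` para um projeto Crew (os nomes `my_project` e `MyProjectCrew` são apenas ilustrativos):
```python
# src/my_project/main.py
from my_project.crew import MyProjectCrew  # nome ilustrativo


def run():
    """Ponto de entrada esperado para Crews."""
    result = MyProjectCrew().crew().kickoff(inputs={"topic": "example"})
    return result


if __name__ == "__main__":
    run()
```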
Veja [Preparar para Implantação](/pt-BR/enterprise/guides/prepare-for-deployment) para diagramas de estrutura detalhados.
#### Decorador CrewBase Ausente
**Sintoma**: Erros "Crew not found", "Config not found" ou erros de configuração de agent/task
**Solução**: Certifique-se de que **todas** as classes crew usam o decorador `@CrewBase`:
```python
from crewai.project import CrewBase, agent, crew, task
@CrewBase # Este decorador é OBRIGATÓRIO
class YourCrew():
"""Descrição do seu crew"""
@agent
def my_agent(self) -> Agent:
return Agent(
config=self.agents_config['my_agent'], # type: ignore[index]
verbose=True
)
# ... resto da definição do crew
```
<Info>
Isso se aplica a Crews independentes E crews embutidos dentro de projetos Flow.
Toda classe crew precisa do decorador.
</Info>
#### Tipo Incorreto no pyproject.toml
**Sintoma**: Build tem sucesso mas falha em runtime, ou comportamento inesperado
**Solução**: Verifique se a seção `[tool.crewai]` corresponde ao tipo do seu projeto:
```toml
# Para projetos Crew:
[tool.crewai]
type = "crew"
# Para projetos Flow:
[tool.crewai]
type = "flow"
```
### Falhas de Runtime
#### Falhas de Conexão com LLM
**Sintoma**: Erros de chave API, "model not found" ou falhas de autenticação
**Solução**:
1. Verifique se a chave API do seu provedor LLM está corretamente definida nas variáveis de ambiente
2. Certifique-se de que os nomes das variáveis de ambiente correspondem ao que seu código espera
3. Teste localmente com exatamente as mesmas variáveis de ambiente antes de implantar
#### Erros de Execução do Crew
**Sintoma**: Crew inicia mas falha durante a execução
**Solução**:
1. Verifique os logs de execução no dashboard AMP (aba Traces)
2. Verifique se todas as ferramentas têm as chaves API necessárias configuradas
3. Certifique-se de que as configurações de agents em `agents.yaml` são válidas
4. Verifique se há erros de sintaxe nas configurações de tasks em `tasks.yaml`, como no exemplo de verificação abaixo
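Um esboço simples dessa verificação de sintaxe, assumindo a estrutura padrão de projeto e o pacote `pyyaml` instalado:
```python
from pathlib import Path

import yaml  # pip install pyyaml

# Os caminhos assumem a estrutura padrão src/project_name/config/
for config in Path("src").rglob("config/*.yaml"):
    try:
        yaml.safe_load(config.read_text())
        print(f"OK: {config}")
    except yaml.YAMLError as e:
        print(f"Erro de sintaxe em {config}: {e}")
```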
<Card title="Precisa de Ajuda?" icon="headset" href="mailto:support@crewai.com">
Entre em contato com nossa equipe de suporte para ajuda com questões de
implantação ou dúvidas sobre a plataforma AMP.
</Card>

View File

@@ -0,0 +1,305 @@
---
title: "Preparar para Implantação"
description: "Certifique-se de que seu Crew ou Flow está pronto para implantação no CrewAI AMP"
icon: "clipboard-check"
mode: "wide"
---
<Note>
Antes de implantar no CrewAI AMP, é crucial verificar se seu projeto está estruturado corretamente.
Tanto Crews quanto Flows podem ser implantados como "automações", mas eles têm estruturas de projeto
e requisitos diferentes que devem ser atendidos para uma implantação bem-sucedida.
</Note>
## Entendendo Automações
No CrewAI AMP, **automações** é o termo geral para projetos de IA Agêntica implantáveis. Uma automação pode ser:
- **Um Crew**: Uma equipe independente de agentes de IA trabalhando juntos em tarefas
- **Um Flow**: Um workflow orquestrado que pode combinar múltiplos crews, chamadas diretas de LLM e lógica procedural
Entender qual tipo você está implantando é essencial porque eles têm estruturas de projeto e pontos de entrada diferentes.
## Crews vs Flows: Principais Diferenças
<CardGroup cols={2}>
<Card title="Projetos Crew" icon="users">
Equipes de agentes de IA independentes com `crew.py` definindo agentes e tarefas. Ideal para tarefas focadas e colaborativas.
</Card>
<Card title="Projetos Flow" icon="diagram-project">
Workflows orquestrados com crews embutidos em uma pasta `crews/`. Ideal para processos complexos de múltiplas etapas.
</Card>
</CardGroup>
| Aspecto | Crew | Flow |
|---------|------|------|
| **Estrutura do projeto** | `src/project_name/` com `crew.py` | `src/project_name/` com pasta `crews/` |
| **Localização da lógica principal** | `src/project_name/crew.py` | `src/project_name/main.py` (classe Flow) |
| **Função de ponto de entrada** | `run()` em `main.py` | `kickoff()` em `main.py` |
| **Tipo no pyproject.toml** | `type = "crew"` | `type = "flow"` |
| **Comando CLI de criação** | `crewai create crew name` | `crewai create flow name` |
| **Localização da configuração** | `src/project_name/config/` | `src/project_name/crews/crew_name/config/` |
| **Pode conter outros crews** | Não | Sim (na pasta `crews/`) |
## Referência de Estrutura de Projeto
### Estrutura de Projeto Crew
Quando você executa `crewai create crew my_crew`, você obtém esta estrutura:
```
my_crew/
├── .gitignore
├── pyproject.toml # Deve ter type = "crew"
├── README.md
├── .env
├── uv.lock # OBRIGATÓRIO para implantação
└── src/
└── my_crew/
├── __init__.py
├── main.py # Ponto de entrada com função run()
├── crew.py # Classe Crew com decorador @CrewBase
├── tools/
│ ├── custom_tool.py
│ └── __init__.py
└── config/
├── agents.yaml # Definições de agentes
└── tasks.yaml # Definições de tarefas
```
<Warning>
A estrutura aninhada `src/project_name/` é crítica para Crews.
Colocar arquivos no nível errado causará falhas na implantação.
</Warning>
### Estrutura de Projeto Flow
Quando você executa `crewai create flow my_flow`, você obtém esta estrutura:
```
my_flow/
├── .gitignore
├── pyproject.toml # Deve ter type = "flow"
├── README.md
├── .env
├── uv.lock # OBRIGATÓRIO para implantação
└── src/
└── my_flow/
├── __init__.py
├── main.py # Ponto de entrada com função kickoff() + classe Flow
├── crews/ # Pasta de crews embutidos
│ └── poem_crew/
│ ├── __init__.py
│ ├── poem_crew.py # Crew com decorador @CrewBase
│ └── config/
│ ├── agents.yaml
│ └── tasks.yaml
└── tools/
├── __init__.py
└── custom_tool.py
```
<Info>
Tanto Crews quanto Flows usam a estrutura `src/project_name/`.
A diferença chave é que Flows têm uma pasta `crews/` para crews embutidos,
enquanto Crews têm `crew.py` diretamente na pasta do projeto.
</Info>
## Checklist Pré-Implantação
Use este checklist para verificar se seu projeto está pronto para implantação.
### 1. Verificar Configuração do pyproject.toml
Seu `pyproject.toml` deve incluir a seção `[tool.crewai]` correta:
<Tabs>
<Tab title="Para Crews">
```toml
[tool.crewai]
type = "crew"
```
</Tab>
<Tab title="Para Flows">
```toml
[tool.crewai]
type = "flow"
```
</Tab>
</Tabs>
<Warning>
Se o `type` não corresponder à estrutura do seu projeto, o build falhará ou
a automação não funcionará corretamente.
</Warning>
### 2. Garantir que o Arquivo uv.lock Existe
CrewAI usa `uv` para gerenciamento de dependências. O arquivo `uv.lock` garante builds reproduzíveis e é **obrigatório** para implantação.
```bash
# Gerar ou atualizar o arquivo lock
uv lock
# Verificar se existe
ls -la uv.lock
```
Se o arquivo não existir, execute `uv lock` e faça commit no seu repositório:
```bash
uv lock
git add uv.lock
git commit -m "Add uv.lock for deployment"
git push
```
### 3. Validar Uso do Decorador CrewBase
**Toda classe crew deve usar o decorador `@CrewBase`.** Isso se aplica a:
- Projetos crew independentes
- Crews embutidos dentro de projetos Flow
```python
from crewai import Agent, Crew, Process, Task
from crewai.project import CrewBase, agent, crew, task
from crewai.agents.agent_builder.base_agent import BaseAgent
from typing import List
@CrewBase # Este decorador é OBRIGATÓRIO
class MyCrew():
"""Descrição do meu crew"""
agents: List[BaseAgent]
tasks: List[Task]
@agent
def my_agent(self) -> Agent:
return Agent(
config=self.agents_config['my_agent'], # type: ignore[index]
verbose=True
)
@task
def my_task(self) -> Task:
return Task(
config=self.tasks_config['my_task'] # type: ignore[index]
)
@crew
def crew(self) -> Crew:
return Crew(
agents=self.agents,
tasks=self.tasks,
process=Process.sequential,
verbose=True,
)
```
<Warning>
Se você esquecer o decorador `@CrewBase`, sua implantação falhará com
erros sobre configurações de agents ou tasks ausentes.
</Warning>
### 4. Verificar Pontos de Entrada do Projeto
Tanto Crews quanto Flows têm seu ponto de entrada em `src/project_name/main.py`:
<Tabs>
<Tab title="Para Crews">
O ponto de entrada usa uma função `run()`:
```python
# src/my_crew/main.py
from my_crew.crew import MyCrew
def run():
"""Executa o crew."""
inputs = {'topic': 'AI in Healthcare'}
result = MyCrew().crew().kickoff(inputs=inputs)
return result
if __name__ == "__main__":
run()
```
</Tab>
<Tab title="Para Flows">
O ponto de entrada usa uma função `kickoff()` com uma classe Flow:
```python
# src/my_flow/main.py
from crewai.flow import Flow, listen, start
from my_flow.crews.poem_crew.poem_crew import PoemCrew
class MyFlow(Flow):
@start()
def begin(self):
# Lógica do Flow aqui
result = PoemCrew().crew().kickoff(inputs={...})
return result
def kickoff():
"""Executa o flow."""
MyFlow().kickoff()
if __name__ == "__main__":
kickoff()
```
</Tab>
</Tabs>
### 5. Preparar Variáveis de Ambiente
Antes da implantação, certifique-se de ter:
1. **Chaves de API de LLM** prontas (OpenAI, Anthropic, Google, etc.)
2. **Chaves de API de ferramentas** se estiver usando ferramentas externas (Serper, etc.)
<Tip>
Teste seu projeto localmente com as mesmas variáveis de ambiente antes de implantar
para detectar problemas de configuração antecipadamente.
</Tip>
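Um esboço simples de verificação pré-deploy (os nomes das variáveis são apenas exemplos; use os que seu código realmente lê):
```python
import os

# Nomes apenas ilustrativos: ajuste para o seu provedor e ferramentas
obrigatorias = ["OPENAI_API_KEY", "SERPER_API_KEY"]

faltando = [nome for nome in obrigatorias if not os.getenv(nome)]
if faltando:
    raise SystemExit(f"Variáveis de ambiente ausentes: {', '.join(faltando)}")
print("Todas as variáveis de ambiente necessárias estão definidas.")
```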
## Comandos de Validação Rápida
Execute estes comandos a partir da raiz do seu projeto para verificar rapidamente sua configuração:
```bash
# 1. Verificar tipo do projeto no pyproject.toml
grep -A2 "\[tool.crewai\]" pyproject.toml
# 2. Verificar se uv.lock existe
ls -la uv.lock || echo "ERRO: uv.lock ausente! Execute 'uv lock'"
# 3. Verificar se estrutura src/ existe
ls -la src/*/main.py 2>/dev/null || echo "Nenhum main.py encontrado em src/"
# 4. Para Crews - verificar se crew.py existe
ls -la src/*/crew.py 2>/dev/null || echo "Nenhum crew.py (esperado para Crews)"
# 5. Para Flows - verificar se pasta crews/ existe
ls -la src/*/crews/ 2>/dev/null || echo "Nenhuma pasta crews/ (esperado para Flows)"
# 6. Verificar uso do CrewBase
grep -r "@CrewBase" . --include="*.py"
```
## Erros Comuns de Configuração
| Erro | Sintoma | Correção |
|------|---------|----------|
| `uv.lock` ausente | Build falha durante resolução de dependências | Execute `uv lock` e faça commit |
| `type` errado no pyproject.toml | Build bem-sucedido mas falha em runtime | Altere para o tipo correto |
| Decorador `@CrewBase` ausente | Erros "Config not found" | Adicione decorador a todas as classes crew |
| Arquivos na raiz ao invés de `src/` | Ponto de entrada não encontrado | Mova para `src/project_name/` |
| `run()` ou `kickoff()` ausente | Não é possível iniciar automação | Adicione a função de entrada correta |
## Próximos Passos
Uma vez que seu projeto passar por todos os itens do checklist, você está pronto para implantar:
<Card title="Deploy para AMP" icon="rocket" href="/pt-BR/enterprise/guides/deploy-to-amp">
Siga o guia de implantação para implantar seu Crew ou Flow no CrewAI AMP usando
a CLI, interface web ou integração CI/CD.
</Card>

View File

@@ -82,7 +82,7 @@ CrewAI AMP expande o poder do framework open-source com funcionalidades projetad
<Card
title="Implantar Crew"
icon="rocket"
href="/pt-BR/enterprise/guides/deploy-crew"
href="/pt-BR/enterprise/guides/deploy-to-amp"
>
Implantar Crew
</Card>
@@ -92,11 +92,11 @@ CrewAI AMP expande o poder do framework open-source com funcionalidades projetad
<Card
title="Acesso via API"
icon="code"
href="/pt-BR/enterprise/guides/deploy-crew"
href="/pt-BR/enterprise/guides/kickoff-crew"
>
Usar a API do Crew
</Card>
</Step>
</Steps>
Para instruções detalhadas, consulte nosso [guia de implantação](/pt-BR/enterprise/guides/deploy-crew) ou clique no botão abaixo para começar.
Para instruções detalhadas, consulte nosso [guia de implantação](/pt-BR/enterprise/guides/deploy-to-amp) ou clique no botão abaixo para começar.

View File

@@ -7,6 +7,10 @@ mode: "wide"
## Visão Geral
<Note>
O decorador `@human_feedback` requer **CrewAI versão 1.8.0 ou superior**. Certifique-se de atualizar sua instalação antes de usar este recurso.
</Note>
O decorador `@human_feedback` permite fluxos de trabalho human-in-the-loop (HITL) diretamente nos CrewAI Flows. Ele permite pausar a execução do flow, apresentar a saída para um humano revisar, coletar seu feedback e, opcionalmente, rotear para diferentes listeners com base no resultado do feedback.
Isso é particularmente valioso para:

View File

@@ -5,9 +5,22 @@ icon: "user-check"
mode: "wide"
---
Human-in-the-Loop (HITL) é uma abordagem poderosa que combina a inteligência artificial com a experiência humana para aprimorar a tomada de decisões e melhorar os resultados das tarefas. Este guia mostra como implementar HITL dentro da CrewAI.
Human-in-the-Loop (HITL) é uma abordagem poderosa que combina a inteligência artificial com a experiência humana para aprimorar a tomada de decisões e melhorar os resultados das tarefas. CrewAI oferece várias maneiras de implementar HITL dependendo das suas necessidades.
## Configurando Workflows HITL
## Escolhendo Sua Abordagem HITL
CrewAI oferece duas abordagens principais para implementar workflows human-in-the-loop:
| Abordagem | Melhor Para | Guia | Versão |
|----------|----------|-------------|---------|
| **Baseada em Flow** (decorador `@human_feedback`) | Desenvolvimento local, revisão via console, workflows síncronos | [Feedback Humano em Flows](/pt-BR/learn/human-feedback-in-flows) | **1.8.0+** |
| **Baseada em Webhook** (Enterprise) | Deployments em produção, workflows assíncronos, integrações externas (Slack, Teams, etc.) | Este guia | - |
<Tip>
Se você está construindo flows e deseja adicionar etapas de revisão humana com roteamento baseado em feedback, confira o guia [Feedback Humano em Flows](/pt-BR/learn/human-feedback-in-flows) para o decorador `@human_feedback`.
</Tip>
## Configurando Workflows HITL Baseados em Webhook
<Steps>
<Step title="Configure sua Tarefa">

View File

@@ -0,0 +1,115 @@
---
title: Galileo
description: Integração Galileo para rastreamento e avaliação CrewAI
icon: telescope
mode: "wide"
---
## Visão Geral
Este guia demonstra como integrar o **Galileo** com o **CrewAI**
para rastreamento e avaliação abrangentes.
Ao final deste guia, você será capaz de rastrear seus agentes CrewAI,
monitorar seu desempenho e avaliar seu comportamento com
a poderosa plataforma de observabilidade do Galileo.
> **O que é Galileo?** [Galileo](https://galileo.ai/) é uma plataforma de avaliação
e observabilidade de IA que oferece rastreamento, avaliação e monitoramento
de aplicações de IA. Ela permite que as equipes capturem ground truth,
criem guardrails robustos e realizem experimentos sistemáticos, com
rastreamento de experimentos integrado e análise de desempenho, garantindo
confiabilidade, transparência e melhoria contínua em todo o ciclo de vida da IA.
## Primeiros passos
Este tutorial parte do [CrewAI Quickstart](/pt-BR/quickstart) e mostra como adicionar o
[`CrewAIEventListener`](https://v2docs.galileo.ai/sdk-api/python/reference/handlers/crewai/handler)
do Galileo, um manipulador de eventos.
Para mais informações, consulte o guia prático do Galileo:
[Adicionar Galileo a um aplicativo CrewAI](https://v2docs.galileo.ai/how-to-guides/third-party-integrations/add-galileo-to-crewai/add-galileo-to-crewai).
> **Observação**: Este tutorial pressupõe que você concluiu o [CrewAI Quickstart](/pt-BR/quickstart).
Se você quiser um exemplo completo e abrangente, consulte o
[repositório de exemplos do SDK com CrewAI](https://github.com/rungalileo/sdk-examples/tree/main/python/agent/crew-ai) do Galileo.
### Etapa 1: instalar dependências
Instale as dependências necessárias para seu aplicativo.
Crie um ambiente virtual usando seu método preferido e,
em seguida, instale as dependências dentro desse ambiente usando
sua ferramenta preferida:
```bash
uv add galileo
```
### Etapa 2: adicionar ao arquivo `.env` do [CrewAI Quickstart](/pt-BR/quickstart)
```bash
# Your Galileo API key
GALILEO_API_KEY="your-galileo-api-key"
# Your Galileo project name
GALILEO_PROJECT="your-galileo-project-name"
# The name of the Log stream you want to use for logging
GALILEO_LOG_STREAM="your-galileo-log-stream"
```
### Etapa 3: adicionar o ouvinte de eventos do Galileo
Para habilitar o registro no Galileo, você precisa criar uma instância do `CrewAIEventListener`.
Importe o manipulador do Galileo para CrewAI
adicionando o seguinte código no topo do seu arquivo `main.py`:
```python
from galileo.handlers.crewai.handler import CrewAIEventListener
```
No início da sua função `run`, crie o ouvinte de eventos:
```python
def run():
# Create the event listener
CrewAIEventListener()
# The rest of your existing code goes here
```
Quando você cria a instância do listener, ela é automaticamente
registrada no CrewAI.
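Juntando as etapas, um esboço mínimo de como o `main.py` fica (o crew `MyCrew` e o módulo `my_project` são nomes ilustrativos):
```python
# src/my_project/main.py - esboço ilustrativo
from galileo.handlers.crewai.handler import CrewAIEventListener

from my_project.crew import MyCrew  # nome ilustrativo


def run():
    # Cria o ouvinte; ele se registra automaticamente no CrewAI
    CrewAIEventListener()

    # O restante do seu código existente continua igual
    MyCrew().crew().kickoff(inputs={"topic": "example"})


if __name__ == "__main__":
    run()
```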
### Etapa 4: executar sua Crew
Execute sua Crew com o CrewAI CLI:
```bash
crewai run
```
### Etapa 5: visualizar os traces no Galileo
Assim que a execução da sua Crew terminar, os traces serão enviados e aparecerão no Galileo.
![Galileo trace view](/images/galileo-trace-veiw.png)
## Compreendendo a integração do Galileo
O Galileo se integra ao CrewAI registrando um ouvinte de eventos
que captura eventos de execução da crew (por exemplo, ações de agentes, chamadas de ferramentas, respostas do modelo)
e os encaminha ao Galileo para observabilidade e avaliação.
### Compreendendo o ouvinte de eventos
Criar uma instância de `CrewAIEventListener()` é tudo o que é necessário
para habilitar o Galileo em uma execução do CrewAI. Quando instanciado, o ouvinte:
- Registra-se automaticamente no CrewAI
- Lê a configuração do Galileo a partir de variáveis de ambiente
- Registra todos os dados de execução no projeto Galileo e no fluxo de log especificados por
`GALILEO_PROJECT` e `GALILEO_LOG_STREAM`
Nenhuma configuração adicional ou alteração de código é necessária.

View File

@@ -0,0 +1,43 @@
# crewai-files
File handling utilities for CrewAI multimodal inputs.
## Supported File Types
- `ImageFile` - PNG, JPEG, GIF, WebP
- `PDFFile` - PDF documents
- `TextFile` - Plain text files
- `AudioFile` - MP3, WAV, FLAC, OGG, M4A
- `VideoFile` - MP4, WebM, MOV, AVI
## Usage
```python
from crewai_files import File, ImageFile, PDFFile
# Auto-detect file type
file = File(source="document.pdf") # Resolves to PDFFile
# Or use specific types
image = ImageFile(source="chart.png")
pdf = PDFFile(source="report.pdf")
```
### Passing Files to Crews
```python
crew.kickoff(inputs={
"files": {"chart": ImageFile(source="chart.png")}
})
```
### Passing Files to Tasks
```python
task = Task(
description="Analyze the chart",
expected_output="Analysis",
agent=agent,
input_files=[ImageFile(source="chart.png")],
)
```
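### Other Sources
`source` can also be constructed from in-memory bytes, and URL-backed files are exposed through the package's source types (a sketch; the URL below is a placeholder, and the coercion of these inputs is an assumption based on the exported `FileBytes`/`FileUrl` types):
```python
from crewai_files import ImageFile, PDFFile

# From raw bytes already in memory
with open("chart.png", "rb") as f:
    image = ImageFile(source=f.read())

# From a URL (placeholder address)
remote_pdf = PDFFile(source="https://example.com/report.pdf")
```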

View File

@@ -0,0 +1,25 @@
[project]
name = "crewai-files"
dynamic = ["version"]
description = "File handling utilities for CrewAI multimodal inputs"
readme = "README.md"
authors = [
{ name = "Greyson LaLonde", email = "greyson@crewai.com" }
]
requires-python = ">=3.10, <3.14"
dependencies = [
"Pillow~=10.4.0",
"pypdf~=4.0.0",
"python-magic>=0.4.27",
"aiocache~=0.12.3",
"aiofiles~=24.1.0",
"tinytag~=1.10.0",
"av~=13.0.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.hatch.version]
path = "src/crewai_files/__init__.py"

View File

@@ -0,0 +1,153 @@
"""File handling utilities for crewAI tasks."""
from crewai_files.cache.cleanup import (
cleanup_expired_files,
cleanup_provider_files,
cleanup_uploaded_files,
)
from crewai_files.cache.upload_cache import (
CachedUpload,
UploadCache,
get_upload_cache,
reset_upload_cache,
)
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFile,
ResolvedFileType,
UrlReference,
)
from crewai_files.core.sources import (
FileBytes,
FilePath,
FileSource,
FileSourceInput,
FileStream,
FileUrl,
RawFileInput,
)
from crewai_files.core.types import (
AudioExtension,
AudioFile,
AudioMimeType,
BaseFile,
File,
FileInput,
FileMode,
ImageExtension,
ImageFile,
ImageMimeType,
PDFContentType,
PDFExtension,
PDFFile,
TextContentType,
TextExtension,
TextFile,
VideoExtension,
VideoFile,
VideoMimeType,
)
from crewai_files.formatting import (
aformat_multimodal_content,
format_multimodal_content,
)
from crewai_files.processing import (
ANTHROPIC_CONSTRAINTS,
BEDROCK_CONSTRAINTS,
GEMINI_CONSTRAINTS,
OPENAI_CONSTRAINTS,
AudioConstraints,
FileHandling,
FileProcessingError,
FileProcessor,
FileTooLargeError,
FileValidationError,
ImageConstraints,
PDFConstraints,
ProcessingDependencyError,
ProviderConstraints,
UnsupportedFileTypeError,
VideoConstraints,
get_constraints_for_provider,
)
from crewai_files.resolution.resolver import (
FileResolver,
FileResolverConfig,
create_resolver,
)
from crewai_files.resolution.utils import normalize_input_files, wrap_file_source
from crewai_files.uploaders import FileUploader, UploadResult, get_uploader
__all__ = [
"ANTHROPIC_CONSTRAINTS",
"BEDROCK_CONSTRAINTS",
"GEMINI_CONSTRAINTS",
"OPENAI_CONSTRAINTS",
"AudioConstraints",
"AudioExtension",
"AudioFile",
"AudioMimeType",
"BaseFile",
"CachedUpload",
"File",
"FileBytes",
"FileHandling",
"FileInput",
"FileMode",
"FilePath",
"FileProcessingError",
"FileProcessor",
"FileReference",
"FileResolver",
"FileResolverConfig",
"FileSource",
"FileSourceInput",
"FileStream",
"FileTooLargeError",
"FileUploader",
"FileUrl",
"FileValidationError",
"ImageConstraints",
"ImageExtension",
"ImageFile",
"ImageMimeType",
"InlineBase64",
"InlineBytes",
"PDFConstraints",
"PDFContentType",
"PDFExtension",
"PDFFile",
"ProcessingDependencyError",
"ProviderConstraints",
"RawFileInput",
"ResolvedFile",
"ResolvedFileType",
"TextContentType",
"TextExtension",
"TextFile",
"UnsupportedFileTypeError",
"UploadCache",
"UploadResult",
"UrlReference",
"VideoConstraints",
"VideoExtension",
"VideoFile",
"VideoMimeType",
"aformat_multimodal_content",
"cleanup_expired_files",
"cleanup_provider_files",
"cleanup_uploaded_files",
"create_resolver",
"format_multimodal_content",
"get_constraints_for_provider",
"get_upload_cache",
"get_uploader",
"normalize_input_files",
"reset_upload_cache",
"wrap_file_source",
]
__version__ = "1.8.1"

View File

@@ -0,0 +1,14 @@
"""Upload caching and cleanup."""
from crewai_files.cache.cleanup import cleanup_uploaded_files
from crewai_files.cache.metrics import FileOperationMetrics, measure_operation
from crewai_files.cache.upload_cache import UploadCache, get_upload_cache
__all__ = [
"FileOperationMetrics",
"UploadCache",
"cleanup_uploaded_files",
"get_upload_cache",
"measure_operation",
]

View File

@@ -0,0 +1,374 @@
"""Cleanup utilities for uploaded files."""
from __future__ import annotations
import asyncio
import logging
from typing import TYPE_CHECKING
from crewai_files.cache.upload_cache import CachedUpload, UploadCache
from crewai_files.uploaders import get_uploader
from crewai_files.uploaders.factory import ProviderType
if TYPE_CHECKING:
from crewai_files.uploaders.base import FileUploader
logger = logging.getLogger(__name__)
def _safe_delete(
uploader: FileUploader,
file_id: str,
provider: str,
) -> bool:
"""Safely delete a file, logging any errors.
Args:
uploader: The file uploader to use.
file_id: The file ID to delete.
provider: Provider name for logging.
Returns:
True if deleted successfully, False otherwise.
"""
try:
if uploader.delete(file_id):
logger.debug(f"Deleted {file_id} from {provider}")
return True
logger.warning(f"Failed to delete {file_id} from {provider}")
return False
except Exception as e:
logger.warning(f"Error deleting {file_id} from {provider}: {e}")
return False
def cleanup_uploaded_files(
cache: UploadCache,
*,
delete_from_provider: bool = True,
providers: list[ProviderType] | None = None,
) -> int:
"""Clean up uploaded files from the cache and optionally from providers.
Args:
cache: The upload cache to clean up.
delete_from_provider: If True, delete files from the provider as well.
providers: Optional list of providers to clean up. If None, cleans all.
Returns:
Number of files cleaned up.
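    Example:
        A sketch of a typical end-of-run cleanup ("openai" is an example
        provider name):

            from crewai_files import cleanup_uploaded_files, get_upload_cache

            cache = get_upload_cache()
            removed = cleanup_uploaded_files(cache, providers=["openai"])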
"""
cleaned = 0
provider_uploads: dict[ProviderType, list[CachedUpload]] = {}
for provider in _get_providers_from_cache(cache):
if providers is not None and provider not in providers:
continue
provider_uploads[provider] = cache.get_all_for_provider(provider)
if delete_from_provider:
for provider, uploads in provider_uploads.items():
uploader = get_uploader(provider)
if uploader is None:
logger.warning(
f"No uploader available for {provider}, skipping cleanup"
)
continue
for upload in uploads:
if _safe_delete(uploader, upload.file_id, provider):
cleaned += 1
cache.clear()
logger.info(f"Cleaned up {cleaned} uploaded files")
return cleaned
def cleanup_expired_files(
cache: UploadCache,
*,
delete_from_provider: bool = False,
) -> int:
"""Clean up expired files from the cache.
Args:
cache: The upload cache to clean up.
delete_from_provider: If True, attempt to delete from provider as well.
Note: Expired files may already be deleted by the provider.
Returns:
Number of expired entries removed from cache.
"""
expired_entries: list[CachedUpload] = []
if delete_from_provider:
for provider in _get_providers_from_cache(cache):
expired_entries.extend(
upload
for upload in cache.get_all_for_provider(provider)
if upload.is_expired()
)
removed = cache.clear_expired()
if delete_from_provider:
for upload in expired_entries:
uploader = get_uploader(upload.provider)
if uploader is not None:
try:
uploader.delete(upload.file_id)
except Exception as e:
logger.debug(f"Could not delete expired file {upload.file_id}: {e}")
return removed
def cleanup_provider_files(
provider: ProviderType,
*,
cache: UploadCache | None = None,
delete_all_from_provider: bool = False,
) -> int:
"""Clean up all files for a specific provider.
Args:
provider: Provider name to clean up.
cache: Optional upload cache to clear entries from.
delete_all_from_provider: If True, delete all files from the provider,
not just cached ones.
Returns:
Number of files deleted.
"""
deleted = 0
uploader = get_uploader(provider)
if uploader is None:
logger.warning(f"No uploader available for {provider}")
return 0
if delete_all_from_provider:
try:
files = uploader.list_files()
for file_info in files:
file_id = file_info.get("id") or file_info.get("name")
if file_id and uploader.delete(file_id):
deleted += 1
except Exception as e:
logger.warning(f"Error listing/deleting files from {provider}: {e}")
elif cache is not None:
uploads = cache.get_all_for_provider(provider)
for upload in uploads:
if _safe_delete(uploader, upload.file_id, provider):
deleted += 1
cache.remove_by_file_id(upload.file_id, provider)
logger.info(f"Deleted {deleted} files from {provider}")
return deleted
def _get_providers_from_cache(cache: UploadCache) -> set[ProviderType]:
"""Get unique provider names from cache entries.
Args:
cache: The upload cache.
Returns:
Set of provider names.
"""
return cache.get_providers()
async def _asafe_delete(
uploader: FileUploader,
file_id: str,
provider: str,
) -> bool:
"""Async safely delete a file, logging any errors.
Args:
uploader: The file uploader to use.
file_id: The file ID to delete.
provider: Provider name for logging.
Returns:
True if deleted successfully, False otherwise.
"""
try:
if await uploader.adelete(file_id):
logger.debug(f"Deleted {file_id} from {provider}")
return True
logger.warning(f"Failed to delete {file_id} from {provider}")
return False
except Exception as e:
logger.warning(f"Error deleting {file_id} from {provider}: {e}")
return False
async def acleanup_uploaded_files(
cache: UploadCache,
*,
delete_from_provider: bool = True,
providers: list[ProviderType] | None = None,
max_concurrency: int = 10,
) -> int:
"""Async clean up uploaded files from the cache and optionally from providers.
Args:
cache: The upload cache to clean up.
delete_from_provider: If True, delete files from the provider as well.
providers: Optional list of providers to clean up. If None, cleans all.
max_concurrency: Maximum number of concurrent delete operations.
Returns:
Number of files cleaned up.
"""
cleaned = 0
provider_uploads: dict[ProviderType, list[CachedUpload]] = {}
for provider in _get_providers_from_cache(cache):
if providers is not None and provider not in providers:
continue
provider_uploads[provider] = await cache.aget_all_for_provider(provider)
if delete_from_provider:
semaphore = asyncio.Semaphore(max_concurrency)
async def delete_one(file_uploader: FileUploader, cached: CachedUpload) -> bool:
"""Delete a single file with semaphore limiting."""
async with semaphore:
return await _asafe_delete(
file_uploader, cached.file_id, cached.provider
)
tasks: list[asyncio.Task[bool]] = []
for provider, uploads in provider_uploads.items():
uploader = get_uploader(provider)
if uploader is None:
logger.warning(
f"No uploader available for {provider}, skipping cleanup"
)
continue
tasks.extend(
asyncio.create_task(delete_one(uploader, cached)) for cached in uploads
)
results = await asyncio.gather(*tasks, return_exceptions=True)
cleaned = sum(1 for r in results if r is True)
await cache.aclear()
logger.info(f"Cleaned up {cleaned} uploaded files")
return cleaned
async def acleanup_expired_files(
cache: UploadCache,
*,
delete_from_provider: bool = False,
max_concurrency: int = 10,
) -> int:
"""Async clean up expired files from the cache.
Args:
cache: The upload cache to clean up.
delete_from_provider: If True, attempt to delete from provider as well.
max_concurrency: Maximum number of concurrent delete operations.
Returns:
Number of expired entries removed from cache.
"""
expired_entries: list[CachedUpload] = []
if delete_from_provider:
for provider in _get_providers_from_cache(cache):
uploads = await cache.aget_all_for_provider(provider)
expired_entries.extend(upload for upload in uploads if upload.is_expired())
removed = await cache.aclear_expired()
if delete_from_provider and expired_entries:
semaphore = asyncio.Semaphore(max_concurrency)
async def delete_expired(cached: CachedUpload) -> None:
"""Delete an expired file with semaphore limiting."""
async with semaphore:
file_uploader = get_uploader(cached.provider)
if file_uploader is not None:
try:
await file_uploader.adelete(cached.file_id)
except Exception as e:
logger.debug(
f"Could not delete expired file {cached.file_id}: {e}"
)
await asyncio.gather(
*[delete_expired(cached) for cached in expired_entries],
return_exceptions=True,
)
return removed
async def acleanup_provider_files(
provider: ProviderType,
*,
cache: UploadCache | None = None,
delete_all_from_provider: bool = False,
max_concurrency: int = 10,
) -> int:
"""Async clean up all files for a specific provider.
Args:
provider: Provider name to clean up.
cache: Optional upload cache to clear entries from.
delete_all_from_provider: If True, delete all files from the provider.
max_concurrency: Maximum number of concurrent delete operations.
Returns:
Number of files deleted.
"""
deleted = 0
uploader = get_uploader(provider)
if uploader is None:
logger.warning(f"No uploader available for {provider}")
return 0
semaphore = asyncio.Semaphore(max_concurrency)
async def delete_single(target_file_id: str) -> bool:
"""Delete a single file with semaphore limiting."""
async with semaphore:
return await uploader.adelete(target_file_id)
if delete_all_from_provider:
try:
files = uploader.list_files()
tasks = []
for file_info in files:
fid = file_info.get("id") or file_info.get("name")
if fid:
tasks.append(delete_single(fid))
results = await asyncio.gather(*tasks, return_exceptions=True)
deleted = sum(1 for r in results if r is True)
except Exception as e:
logger.warning(f"Error listing/deleting files from {provider}: {e}")
elif cache is not None:
uploads = await cache.aget_all_for_provider(provider)
tasks = []
for upload in uploads:
tasks.append(delete_single(upload.file_id))
results = await asyncio.gather(*tasks, return_exceptions=True)
for upload, result in zip(uploads, results, strict=False):
if result is True:
deleted += 1
await cache.aremove_by_file_id(upload.file_id, provider)
logger.info(f"Deleted {deleted} files from {provider}")
return deleted

View File

@@ -0,0 +1,184 @@
"""Performance metrics and structured logging for file operations."""
from __future__ import annotations
from collections.abc import Generator
from contextlib import contextmanager
from dataclasses import dataclass, field
from datetime import datetime, timezone
import logging
import time
from typing import Any
logger = logging.getLogger(__name__)
@dataclass
class FileOperationMetrics:
"""Metrics for a file operation.
Attributes:
operation: Name of the operation (e.g., "upload", "resolve", "process").
filename: Name of the file being operated on.
provider: Provider name if applicable.
duration_ms: Duration of the operation in milliseconds.
size_bytes: Size of the file in bytes.
success: Whether the operation succeeded.
error: Error message if operation failed.
timestamp: When the operation occurred.
metadata: Additional operation-specific metadata.
"""
operation: str
filename: str | None = None
provider: str | None = None
duration_ms: float = 0.0
size_bytes: int | None = None
success: bool = True
error: str | None = None
timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
metadata: dict[str, Any] = field(default_factory=dict)
def to_dict(self) -> dict[str, Any]:
"""Convert metrics to dictionary for logging.
Returns:
Dictionary representation of metrics.
"""
result: dict[str, Any] = {
"operation": self.operation,
"duration_ms": round(self.duration_ms, 2),
"success": self.success,
"timestamp": self.timestamp.isoformat(),
}
if self.filename:
result["file_name"] = self.filename
if self.provider:
result["provider"] = self.provider
if self.size_bytes is not None:
result["size_bytes"] = self.size_bytes
if self.error:
result["error"] = self.error
if self.metadata:
result.update(self.metadata)
return result
@contextmanager
def measure_operation(
operation: str,
*,
filename: str | None = None,
provider: str | None = None,
size_bytes: int | None = None,
log_level: int = logging.DEBUG,
**extra_metadata: Any,
) -> Generator[FileOperationMetrics, None, None]:
"""Context manager to measure and log operation performance.
Args:
operation: Name of the operation.
filename: Optional filename being operated on.
provider: Optional provider name.
size_bytes: Optional file size in bytes.
log_level: Log level for the result message.
**extra_metadata: Additional metadata to include.
Yields:
FileOperationMetrics object that will be populated with results.
Example:
with measure_operation("upload", filename="test.pdf", provider="openai") as metrics:
result = upload_file(file)
metrics.metadata["file_id"] = result.file_id
"""
metrics = FileOperationMetrics(
operation=operation,
filename=filename,
provider=provider,
size_bytes=size_bytes,
metadata=dict(extra_metadata),
)
start_time = time.perf_counter()
try:
yield metrics
metrics.success = True
except Exception as e:
metrics.success = False
metrics.error = str(e)
raise
finally:
metrics.duration_ms = (time.perf_counter() - start_time) * 1000
log_message = f"{operation}"
if filename:
log_message += f" [{filename}]"
if provider:
log_message += f" ({provider})"
if metrics.success:
log_message += f" completed in {metrics.duration_ms:.2f}ms"
else:
log_message += f" failed after {metrics.duration_ms:.2f}ms: {metrics.error}"
logger.log(log_level, log_message, extra=metrics.to_dict())
def log_file_operation(
operation: str,
*,
filename: str | None = None,
provider: str | None = None,
size_bytes: int | None = None,
duration_ms: float | None = None,
success: bool = True,
error: str | None = None,
level: int = logging.INFO,
**extra: Any,
) -> None:
"""Log a file operation with structured data.
Args:
operation: Name of the operation.
filename: Optional filename being operated on.
provider: Optional provider name.
size_bytes: Optional file size in bytes.
duration_ms: Optional duration in milliseconds.
success: Whether the operation succeeded.
error: Optional error message.
level: Log level to use.
**extra: Additional metadata to include.
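    Example:
        log_file_operation(
            "upload",
            filename="report.pdf",
            provider="openai",
            duration_ms=12.5,
            file_id="file-abc123",  # passed through **extra as metadata
        )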
"""
metrics = FileOperationMetrics(
operation=operation,
filename=filename,
provider=provider,
size_bytes=size_bytes,
duration_ms=duration_ms or 0.0,
success=success,
error=error,
metadata=dict(extra),
)
message = f"{operation}"
if filename:
message += f" [{filename}]"
if provider:
message += f" ({provider})"
if success:
if duration_ms:
message += f" completed in {duration_ms:.2f}ms"
else:
message += " completed"
else:
message += " failed"
if error:
message += f": {error}"
logger.log(level, message, extra=metrics.to_dict())

View File

@@ -0,0 +1,553 @@
"""Cache for tracking uploaded files using aiocache."""
from __future__ import annotations
import asyncio
import atexit
import builtins
from collections.abc import Iterator
from dataclasses import dataclass
from datetime import datetime, timezone
import hashlib
import logging
from typing import TYPE_CHECKING, Any
from aiocache import Cache # type: ignore[import-untyped]
from aiocache.serializers import PickleSerializer # type: ignore[import-untyped]
from crewai_files.core.constants import DEFAULT_MAX_CACHE_ENTRIES, DEFAULT_TTL_SECONDS
from crewai_files.uploaders.factory import ProviderType
if TYPE_CHECKING:
from crewai_files.core.types import FileInput
logger = logging.getLogger(__name__)
@dataclass
class CachedUpload:
"""Represents a cached file upload.
Attributes:
file_id: Provider-specific file identifier.
provider: Name of the provider.
file_uri: Optional URI for accessing the file.
content_type: MIME type of the uploaded file.
uploaded_at: When the file was uploaded.
expires_at: When the upload expires (if applicable).
"""
file_id: str
provider: ProviderType
file_uri: str | None
content_type: str
uploaded_at: datetime
expires_at: datetime | None = None
def is_expired(self) -> bool:
"""Check if this cached upload has expired."""
if self.expires_at is None:
return False
return datetime.now(timezone.utc) >= self.expires_at
def _make_key(file_hash: str, provider: str) -> str:
"""Create a cache key from file hash and provider."""
return f"upload:{provider}:{file_hash}"
def _compute_file_hash_streaming(chunks: Iterator[bytes]) -> str:
"""Compute SHA-256 hash from streaming chunks.
Args:
chunks: Iterator of byte chunks.
Returns:
Hexadecimal hash string.
"""
hasher = hashlib.sha256()
for chunk in chunks:
hasher.update(chunk)
return hasher.hexdigest()
def _compute_file_hash(file: FileInput) -> str:
"""Compute SHA-256 hash of file content.
Uses streaming for FilePath sources to avoid loading large files into memory.
"""
from crewai_files.core.sources import FilePath
source = file._file_source
if isinstance(source, FilePath):
return _compute_file_hash_streaming(source.read_chunks(chunk_size=1024 * 1024))
content = file.read()
return hashlib.sha256(content).hexdigest()
class UploadCache:
"""Async cache for tracking uploaded files using aiocache.
Supports in-memory caching by default, with optional Redis backend
for distributed setups.
Attributes:
ttl: Default time-to-live in seconds for cached entries.
namespace: Cache namespace for isolation.
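    Example:
        cache = UploadCache(ttl=3600)
        entry = cache.set_by_hash(
            file_hash="abc123",
            content_type="application/pdf",
            provider="openai",  # example provider name
            file_id="file-xyz",
        )
        hit = cache.get_by_hash("abc123", "openai")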
"""
def __init__(
self,
ttl: int = DEFAULT_TTL_SECONDS,
namespace: str = "crewai_uploads",
cache_type: str = "memory",
max_entries: int | None = DEFAULT_MAX_CACHE_ENTRIES,
**cache_kwargs: Any,
) -> None:
"""Initialize the upload cache.
Args:
ttl: Default TTL in seconds.
namespace: Cache namespace.
cache_type: Backend type ("memory" or "redis").
max_entries: Maximum cache entries (None for unlimited).
**cache_kwargs: Additional args for cache backend.
"""
self.ttl = ttl
self.namespace = namespace
self.max_entries = max_entries
self._provider_keys: dict[ProviderType, set[str]] = {}
self._key_access_order: list[str] = []
if cache_type == "redis":
self._cache = Cache(
Cache.REDIS,
serializer=PickleSerializer(),
namespace=namespace,
**cache_kwargs,
)
else:
self._cache = Cache(
serializer=PickleSerializer(),
namespace=namespace,
)
def _track_key(self, provider: ProviderType, key: str) -> None:
"""Track a key for a provider (for cleanup) and access order."""
if provider not in self._provider_keys:
self._provider_keys[provider] = set()
self._provider_keys[provider].add(key)
if key in self._key_access_order:
self._key_access_order.remove(key)
self._key_access_order.append(key)
def _untrack_key(self, provider: ProviderType, key: str) -> None:
"""Remove key tracking for a provider."""
if provider in self._provider_keys:
self._provider_keys[provider].discard(key)
if key in self._key_access_order:
self._key_access_order.remove(key)
async def _evict_if_needed(self) -> int:
"""Evict oldest entries if limit exceeded.
Returns:
Number of entries evicted.
"""
if self.max_entries is None:
return 0
current_count = len(self)
if current_count < self.max_entries:
return 0
to_evict = max(1, self.max_entries // 10)
return await self._evict_oldest(to_evict)
async def _evict_oldest(self, count: int) -> int:
"""Evict the oldest entries from the cache.
Args:
count: Number of entries to evict.
Returns:
Number of entries actually evicted.
"""
evicted = 0
keys_to_evict = self._key_access_order[:count]
for key in keys_to_evict:
await self._cache.delete(key)
self._key_access_order.remove(key)
for provider_keys in self._provider_keys.values():
provider_keys.discard(key)
evicted += 1
if evicted > 0:
logger.debug(f"Evicted {evicted} oldest cache entries")
return evicted
async def aget(
self, file: FileInput, provider: ProviderType
) -> CachedUpload | None:
"""Get a cached upload for a file.
Args:
file: The file to look up.
provider: The provider name.
Returns:
Cached upload if found and not expired, None otherwise.
"""
file_hash = _compute_file_hash(file)
return await self.aget_by_hash(file_hash, provider)
async def aget_by_hash(
self, file_hash: str, provider: ProviderType
) -> CachedUpload | None:
"""Get a cached upload by file hash.
Args:
file_hash: Hash of the file content.
provider: The provider name.
Returns:
Cached upload if found and not expired, None otherwise.
"""
key = _make_key(file_hash, provider)
result = await self._cache.get(key)
if result is None:
return None
if isinstance(result, CachedUpload):
if result.is_expired():
await self._cache.delete(key)
self._untrack_key(provider, key)
return None
return result
return None
async def aset(
self,
file: FileInput,
provider: ProviderType,
file_id: str,
file_uri: str | None = None,
expires_at: datetime | None = None,
) -> CachedUpload:
"""Cache an uploaded file.
Args:
file: The file that was uploaded.
provider: The provider name.
file_id: Provider-specific file identifier.
file_uri: Optional URI for accessing the file.
expires_at: When the upload expires.
Returns:
The created cache entry.
"""
file_hash = _compute_file_hash(file)
return await self.aset_by_hash(
file_hash=file_hash,
content_type=file.content_type,
provider=provider,
file_id=file_id,
file_uri=file_uri,
expires_at=expires_at,
)
async def aset_by_hash(
self,
file_hash: str,
content_type: str,
provider: ProviderType,
file_id: str,
file_uri: str | None = None,
expires_at: datetime | None = None,
) -> CachedUpload:
"""Cache an uploaded file by hash.
Args:
file_hash: Hash of the file content.
content_type: MIME type of the file.
provider: The provider name.
file_id: Provider-specific file identifier.
file_uri: Optional URI for accessing the file.
expires_at: When the upload expires.
Returns:
The created cache entry.
"""
await self._evict_if_needed()
key = _make_key(file_hash, provider)
now = datetime.now(timezone.utc)
cached = CachedUpload(
file_id=file_id,
provider=provider,
file_uri=file_uri,
content_type=content_type,
uploaded_at=now,
expires_at=expires_at,
)
ttl = self.ttl
if expires_at is not None:
ttl = max(0, int((expires_at - now).total_seconds()))
await self._cache.set(key, cached, ttl=ttl)
self._track_key(provider, key)
logger.debug(f"Cached upload: {file_id} for provider {provider}")
return cached
async def aremove(self, file: FileInput, provider: ProviderType) -> bool:
"""Remove a cached upload.
Args:
file: The file to remove.
provider: The provider name.
Returns:
True if entry was removed, False if not found.
"""
file_hash = _compute_file_hash(file)
key = _make_key(file_hash, provider)
result = await self._cache.delete(key)
removed = bool(result > 0 if isinstance(result, int) else result)
if removed:
self._untrack_key(provider, key)
return removed
async def aremove_by_file_id(self, file_id: str, provider: ProviderType) -> bool:
"""Remove a cached upload by file ID.
Args:
file_id: The file ID to remove.
provider: The provider name.
Returns:
True if entry was removed, False if not found.
"""
if provider not in self._provider_keys:
return False
for key in list(self._provider_keys[provider]):
cached = await self._cache.get(key)
if isinstance(cached, CachedUpload) and cached.file_id == file_id:
await self._cache.delete(key)
self._untrack_key(provider, key)
return True
return False
async def aclear_expired(self) -> int:
"""Remove all expired entries from the cache.
Returns:
Number of entries removed.
"""
removed = 0
for provider, keys in list(self._provider_keys.items()):
for key in list(keys):
cached = await self._cache.get(key)
if cached is None or (
isinstance(cached, CachedUpload) and cached.is_expired()
):
await self._cache.delete(key)
self._untrack_key(provider, key)
removed += 1
if removed > 0:
logger.debug(f"Cleared {removed} expired cache entries")
return removed
async def aclear(self) -> int:
"""Clear all entries from the cache.
Returns:
Number of entries cleared.
"""
count = sum(len(keys) for keys in self._provider_keys.values())
await self._cache.clear(namespace=self.namespace)
self._provider_keys.clear()
if count > 0:
logger.debug(f"Cleared {count} cache entries")
return count
async def aget_all_for_provider(self, provider: ProviderType) -> list[CachedUpload]:
"""Get all cached uploads for a provider.
Args:
provider: The provider name.
Returns:
List of cached uploads for the provider.
"""
if provider not in self._provider_keys:
return []
results: list[CachedUpload] = []
for key in list(self._provider_keys[provider]):
cached = await self._cache.get(key)
if isinstance(cached, CachedUpload) and not cached.is_expired():
results.append(cached)
return results
@staticmethod
def _run_sync(coro: Any) -> Any:
"""Run an async coroutine from sync context without blocking event loop."""
try:
loop = asyncio.get_running_loop()
except RuntimeError:
loop = None
if loop is not None and loop.is_running():
future = asyncio.run_coroutine_threadsafe(coro, loop)
return future.result(timeout=30)
return asyncio.run(coro)
def get(self, file: FileInput, provider: ProviderType) -> CachedUpload | None:
"""Sync wrapper for aget."""
result: CachedUpload | None = self._run_sync(self.aget(file, provider))
return result
def get_by_hash(
self, file_hash: str, provider: ProviderType
) -> CachedUpload | None:
"""Sync wrapper for aget_by_hash."""
result: CachedUpload | None = self._run_sync(
self.aget_by_hash(file_hash, provider)
)
return result
def set(
self,
file: FileInput,
provider: ProviderType,
file_id: str,
file_uri: str | None = None,
expires_at: datetime | None = None,
) -> CachedUpload:
"""Sync wrapper for aset."""
result: CachedUpload = self._run_sync(
self.aset(file, provider, file_id, file_uri, expires_at)
)
return result
def set_by_hash(
self,
file_hash: str,
content_type: str,
provider: ProviderType,
file_id: str,
file_uri: str | None = None,
expires_at: datetime | None = None,
) -> CachedUpload:
"""Sync wrapper for aset_by_hash."""
result: CachedUpload = self._run_sync(
self.aset_by_hash(
file_hash, content_type, provider, file_id, file_uri, expires_at
)
)
return result
def remove(self, file: FileInput, provider: ProviderType) -> bool:
"""Sync wrapper for aremove."""
result: bool = self._run_sync(self.aremove(file, provider))
return result
def remove_by_file_id(self, file_id: str, provider: ProviderType) -> bool:
"""Sync wrapper for aremove_by_file_id."""
result: bool = self._run_sync(self.aremove_by_file_id(file_id, provider))
return result
def clear_expired(self) -> int:
"""Sync wrapper for aclear_expired."""
result: int = self._run_sync(self.aclear_expired())
return result
def clear(self) -> int:
"""Sync wrapper for aclear."""
result: int = self._run_sync(self.aclear())
return result
def get_all_for_provider(self, provider: ProviderType) -> list[CachedUpload]:
"""Sync wrapper for aget_all_for_provider."""
result: list[CachedUpload] = self._run_sync(
self.aget_all_for_provider(provider)
)
return result
def __len__(self) -> int:
"""Return the number of cached entries."""
return sum(len(keys) for keys in self._provider_keys.values())
def get_providers(self) -> builtins.set[ProviderType]:
"""Get all provider names that have cached entries.
Returns:
Set of provider names.
"""
return builtins.set(self._provider_keys.keys())
_default_cache: UploadCache | None = None
def get_upload_cache(
ttl: int = DEFAULT_TTL_SECONDS,
namespace: str = "crewai_uploads",
cache_type: str = "memory",
**cache_kwargs: Any,
) -> UploadCache:
"""Get or create the default upload cache.
Args:
ttl: Default TTL in seconds.
namespace: Cache namespace.
cache_type: Backend type ("memory" or "redis").
**cache_kwargs: Additional args for cache backend.
Returns:
The upload cache instance.
"""
global _default_cache
if _default_cache is None:
_default_cache = UploadCache(
ttl=ttl,
namespace=namespace,
cache_type=cache_type,
**cache_kwargs,
)
return _default_cache
def reset_upload_cache() -> None:
"""Reset the default upload cache (useful for testing)."""
global _default_cache
if _default_cache is not None:
_default_cache.clear()
_default_cache = None
def _cleanup_on_exit() -> None:
"""Clean up uploaded files on process exit."""
if _default_cache is None or len(_default_cache) == 0:
return
from crewai_files.cache.cleanup import cleanup_uploaded_files
try:
cleanup_uploaded_files(_default_cache)
except Exception as e:
logger.debug(f"Error during exit cleanup: {e}")
atexit.register(_cleanup_on_exit)
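The sync wrappers above make the cache usable from plain scripts; `_run_sync` falls back to `asyncio.run` when no loop is running. A minimal usage sketch, assuming the default in-memory backend (the hash and file id values are hypothetical placeholders):

from crewai_files.cache.upload_cache import get_upload_cache

cache = get_upload_cache(ttl=3600)
# Record a completed upload keyed by content hash (values are made up).
cache.set_by_hash(
    file_hash="sha256:deadbeef",
    content_type="application/pdf",
    provider="openai",
    file_id="file-abc123",
)
# Subsequent lookups hit the cache until the TTL expires.
hit = cache.get_by_hash("sha256:deadbeef", "openai")
assert hit is not None and not hit.is_expired()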

View File

@@ -0,0 +1,92 @@
"""Core file types and sources."""
from crewai_files.core.constants import (
BACKOFF_BASE_DELAY,
BACKOFF_JITTER_FACTOR,
BACKOFF_MAX_DELAY,
DEFAULT_MAX_CACHE_ENTRIES,
DEFAULT_MAX_FILE_SIZE_BYTES,
DEFAULT_TTL_SECONDS,
DEFAULT_UPLOAD_CHUNK_SIZE,
FILES_API_MAX_SIZE,
GEMINI_FILE_TTL,
MAGIC_BUFFER_SIZE,
MAX_CONCURRENCY,
MULTIPART_CHUNKSIZE,
MULTIPART_THRESHOLD,
UPLOAD_MAX_RETRIES,
UPLOAD_RETRY_DELAY_BASE,
)
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFile,
UrlReference,
)
from crewai_files.core.sources import (
AsyncFileStream,
FileBytes,
FilePath,
FileSource,
FileStream,
FileUrl,
)
from crewai_files.core.types import (
AudioFile,
AudioMimeType,
BaseFile,
CoercedFileSource,
File,
FileInput,
FileMode,
ImageFile,
ImageMimeType,
PDFFile,
TextFile,
VideoFile,
VideoMimeType,
)
__all__ = [
"BACKOFF_BASE_DELAY",
"BACKOFF_JITTER_FACTOR",
"BACKOFF_MAX_DELAY",
"DEFAULT_MAX_CACHE_ENTRIES",
"DEFAULT_MAX_FILE_SIZE_BYTES",
"DEFAULT_TTL_SECONDS",
"DEFAULT_UPLOAD_CHUNK_SIZE",
"FILES_API_MAX_SIZE",
"GEMINI_FILE_TTL",
"MAGIC_BUFFER_SIZE",
"MAX_CONCURRENCY",
"MULTIPART_CHUNKSIZE",
"MULTIPART_THRESHOLD",
"UPLOAD_MAX_RETRIES",
"UPLOAD_RETRY_DELAY_BASE",
"AsyncFileStream",
"AudioFile",
"AudioMimeType",
"BaseFile",
"CoercedFileSource",
"File",
"FileBytes",
"FileInput",
"FileMode",
"FilePath",
"FileReference",
"FileSource",
"FileStream",
"FileUrl",
"ImageFile",
"ImageMimeType",
"InlineBase64",
"InlineBytes",
"PDFFile",
"ResolvedFile",
"TextFile",
"UrlReference",
"VideoFile",
"VideoMimeType",
]

View File

@@ -0,0 +1,26 @@
"""Constants for file handling utilities."""
from datetime import timedelta
from typing import Final, Literal
DEFAULT_MAX_FILE_SIZE_BYTES: Final[Literal[524_288_000]] = 524_288_000  # 500 MiB
MAGIC_BUFFER_SIZE: Final[Literal[2048]] = 2048  # bytes sniffed for MIME detection
UPLOAD_MAX_RETRIES: Final[Literal[3]] = 3
UPLOAD_RETRY_DELAY_BASE: Final[Literal[2]] = 2  # exponential backoff base
DEFAULT_TTL_SECONDS: Final[Literal[86_400]] = 86_400  # 24 hours
DEFAULT_MAX_CACHE_ENTRIES: Final[Literal[1000]] = 1000
GEMINI_FILE_TTL: Final[timedelta] = timedelta(hours=48)
BACKOFF_BASE_DELAY: Final[float] = 1.0  # seconds
BACKOFF_MAX_DELAY: Final[float] = 30.0  # seconds
BACKOFF_JITTER_FACTOR: Final[float] = 0.1
FILES_API_MAX_SIZE: Final[Literal[536_870_912]] = 536_870_912  # 512 MiB
DEFAULT_UPLOAD_CHUNK_SIZE: Final[Literal[67_108_864]] = 67_108_864  # 64 MiB
MULTIPART_THRESHOLD: Final[Literal[8_388_608]] = 8_388_608  # 8 MiB
MULTIPART_CHUNKSIZE: Final[Literal[8_388_608]] = 8_388_608  # 8 MiB
MAX_CONCURRENCY: Final[Literal[10]] = 10  # max concurrent transfers

View File

@@ -0,0 +1,84 @@
"""Resolved file types representing different delivery methods for file content."""
from abc import ABC
from dataclasses import dataclass
from datetime import datetime
@dataclass(frozen=True)
class ResolvedFile(ABC):
"""Base class for resolved file representations.
A ResolvedFile represents the final form of a file ready for delivery
to an LLM provider, whether inline or via reference.
Attributes:
content_type: MIME type of the file content.
"""
content_type: str
@dataclass(frozen=True)
class InlineBase64(ResolvedFile):
"""File content encoded as base64 string.
Used by most providers for inline file content in messages.
Attributes:
content_type: MIME type of the file content.
data: Base64-encoded file content.
"""
data: str
@dataclass(frozen=True)
class InlineBytes(ResolvedFile):
"""File content as raw bytes.
Used by providers like Bedrock that accept raw bytes instead of base64.
Attributes:
content_type: MIME type of the file content.
data: Raw file bytes.
"""
data: bytes
@dataclass(frozen=True)
class FileReference(ResolvedFile):
"""Reference to an uploaded file.
Used when files are uploaded via provider File APIs.
Attributes:
content_type: MIME type of the file content.
file_id: Provider-specific file identifier.
provider: Name of the provider the file was uploaded to.
expires_at: When the uploaded file expires (if applicable).
file_uri: Optional URI for accessing the file (used by Gemini).
"""
file_id: str
provider: str
expires_at: datetime | None = None
file_uri: str | None = None
@dataclass(frozen=True)
class UrlReference(ResolvedFile):
"""Reference to a file accessible via URL.
Used by providers that support fetching files from URLs.
Attributes:
content_type: MIME type of the file content.
url: URL where the file can be accessed.
"""
url: str
ResolvedFileType = InlineBase64 | InlineBytes | FileReference | UrlReference
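Each delivery method above is a small frozen dataclass; a quick sketch constructing the variants (ids and URLs are illustrative only):

import base64
from crewai_files.core.resolved import (
    FileReference,
    InlineBase64,
    InlineBytes,
    UrlReference,
)

inline_b64 = InlineBase64(content_type="image/png", data=base64.b64encode(b"\x89PNG").decode("ascii"))
inline_bytes = InlineBytes(content_type="application/pdf", data=b"%PDF-1.4")
ref = FileReference(content_type="application/pdf", file_id="file-abc123", provider="openai")
url = UrlReference(content_type="image/jpeg", url="https://example.com/photo.jpg")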

View File

@@ -0,0 +1,529 @@
"""Base file class for handling file inputs in tasks."""
from __future__ import annotations
from collections.abc import AsyncIterator, Iterator
import inspect
import mimetypes
from pathlib import Path
from typing import Annotated, Any, BinaryIO, Protocol, cast, runtime_checkable
import aiofiles
from pydantic import (
BaseModel,
BeforeValidator,
Field,
GetCoreSchemaHandler,
PrivateAttr,
model_validator,
)
from pydantic_core import CoreSchema, core_schema
from typing_extensions import TypeIs
from crewai_files.core.constants import DEFAULT_MAX_FILE_SIZE_BYTES, MAGIC_BUFFER_SIZE
@runtime_checkable
class AsyncReadable(Protocol):
"""Protocol for async readable streams."""
async def read(self, size: int = -1) -> bytes:
"""Read up to size bytes from the stream."""
...
class _AsyncReadableValidator:
"""Pydantic validator for AsyncReadable types."""
@classmethod
def __get_pydantic_core_schema__(
cls, _source_type: Any, _handler: GetCoreSchemaHandler
) -> CoreSchema:
return core_schema.no_info_plain_validator_function(
cls._validate,
serialization=core_schema.plain_serializer_function_ser_schema(
lambda x: None, info_arg=False
),
)
@staticmethod
def _validate(value: Any) -> AsyncReadable:
if isinstance(value, AsyncReadable):
return value
raise ValueError("Expected an async readable object with async read() method")
ValidatedAsyncReadable = Annotated[AsyncReadable, _AsyncReadableValidator()]
def _fallback_content_type(filename: str | None) -> str:
"""Get content type from filename extension or return default."""
if filename:
mime_type, _ = mimetypes.guess_type(filename)
if mime_type:
return mime_type
return "application/octet-stream"
def generate_filename(content_type: str) -> str:
"""Generate a UUID-based filename with extension from content type.
Args:
content_type: MIME type to derive extension from.
Returns:
Filename in format "{uuid}{ext}" where ext includes the dot.
"""
import uuid
ext = mimetypes.guess_extension(content_type) or ""
return f"{uuid.uuid4()}{ext}"
def detect_content_type(data: bytes, filename: str | None = None) -> str:
"""Detect MIME type from file content.
Uses python-magic if available for accurate content-based detection,
falls back to mimetypes module using filename extension.
Args:
data: Raw bytes to analyze (only first 2048 bytes are used).
filename: Optional filename for extension-based fallback.
Returns:
The detected MIME type.
"""
try:
import magic
result: str = magic.from_buffer(data[:MAGIC_BUFFER_SIZE], mime=True)
return result
except ImportError:
return _fallback_content_type(filename)
def detect_content_type_from_path(path: Path, filename: str | None = None) -> str:
"""Detect MIME type from file path.
Uses python-magic's from_file() for accurate detection without reading
the entire file into memory.
Args:
path: Path to the file.
filename: Optional filename for extension-based fallback.
Returns:
The detected MIME type.
"""
try:
import magic
result: str = magic.from_file(str(path), mime=True)
return result
except ImportError:
return _fallback_content_type(filename or path.name)
class _BinaryIOValidator:
"""Pydantic validator for BinaryIO types."""
@classmethod
def __get_pydantic_core_schema__(
cls, _source_type: Any, _handler: GetCoreSchemaHandler
) -> CoreSchema:
return core_schema.no_info_plain_validator_function(
cls._validate,
serialization=core_schema.plain_serializer_function_ser_schema(
lambda x: None, info_arg=False
),
)
@staticmethod
def _validate(value: Any) -> BinaryIO:
if hasattr(value, "read") and hasattr(value, "seek"):
return cast(BinaryIO, value)
raise ValueError("Expected a binary file-like object with read() and seek()")
ValidatedBinaryIO = Annotated[BinaryIO, _BinaryIOValidator()]
class FilePath(BaseModel):
"""File loaded from a filesystem path."""
path: Path = Field(description="Path to the file on the filesystem.")
max_size_bytes: int = Field(
default=DEFAULT_MAX_FILE_SIZE_BYTES,
exclude=True,
description="Maximum file size in bytes.",
)
_content: bytes | None = PrivateAttr(default=None)
_content_type: str = PrivateAttr()
@model_validator(mode="after")
def _validate_file_exists(self) -> FilePath:
"""Validate that the file exists, is secure, and within size limits."""
from crewai_files.processing.exceptions import FileTooLargeError
path_str = str(self.path)
if ".." in path_str:
raise ValueError(f"Path traversal not allowed: {self.path}")
if self.path.is_symlink():
resolved = self.path.resolve()
cwd = Path.cwd().resolve()
if not str(resolved).startswith(str(cwd)):
raise ValueError(f"Symlink escapes allowed directory: {self.path}")
if not self.path.exists():
raise ValueError(f"File not found: {self.path}")
if not self.path.is_file():
raise ValueError(f"Path is not a file: {self.path}")
actual_size = self.path.stat().st_size
if actual_size > self.max_size_bytes:
raise FileTooLargeError(
f"File exceeds max size ({actual_size} > {self.max_size_bytes})",
file_name=str(self.path),
actual_size=actual_size,
max_size=self.max_size_bytes,
)
self._content_type = detect_content_type_from_path(self.path, self.path.name)
return self
@property
def filename(self) -> str:
"""Get the filename from the path."""
return self.path.name
@property
def content_type(self) -> str:
"""Get the content type."""
return self._content_type
def read(self) -> bytes:
"""Read the file content from disk."""
if self._content is None:
self._content = self.path.read_bytes()
return self._content
async def aread(self) -> bytes:
"""Async read the file content from disk."""
if self._content is None:
async with aiofiles.open(self.path, "rb") as f:
self._content = await f.read()
return self._content
def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
"""Stream file content in chunks without loading entirely into memory.
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of file content.
"""
with open(self.path, "rb") as f:
while chunk := f.read(chunk_size):
yield chunk
async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
"""Async streaming for non-blocking I/O.
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of file content.
"""
async with aiofiles.open(self.path, "rb") as f:
while chunk := await f.read(chunk_size):
yield chunk
class FileBytes(BaseModel):
"""File created from raw bytes content."""
data: bytes = Field(description="Raw bytes content of the file.")
filename: str | None = Field(default=None, description="Optional filename.")
_content_type: str = PrivateAttr()
@model_validator(mode="after")
def _detect_content_type(self) -> FileBytes:
"""Detect and cache content type from data."""
self._content_type = detect_content_type(self.data, self.filename)
return self
@property
def content_type(self) -> str:
"""Get the content type."""
return self._content_type
def read(self) -> bytes:
"""Return the bytes content."""
return self.data
async def aread(self) -> bytes:
"""Async return the bytes content (immediate, already in memory)."""
return self.data
def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
"""Stream bytes content in chunks.
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of bytes content.
"""
for i in range(0, len(self.data), chunk_size):
yield self.data[i : i + chunk_size]
async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
"""Async streaming (immediate yield since already in memory).
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of bytes content.
"""
for chunk in self.read_chunks(chunk_size):
yield chunk
class FileStream(BaseModel):
"""File loaded from a file-like stream."""
stream: ValidatedBinaryIO = Field(description="Binary file stream.")
filename: str | None = Field(default=None, description="Optional filename.")
_content: bytes | None = PrivateAttr(default=None)
_content_type: str = PrivateAttr()
@model_validator(mode="after")
def _initialize(self) -> FileStream:
"""Extract filename and detect content type."""
if self.filename is None:
name = getattr(self.stream, "name", None)
if name is not None:
self.filename = Path(name).name
position = self.stream.tell()
self.stream.seek(0)
header = self.stream.read(MAGIC_BUFFER_SIZE)
self.stream.seek(position)
self._content_type = detect_content_type(header, self.filename)
return self
@property
def content_type(self) -> str:
"""Get the content type."""
return self._content_type
def read(self) -> bytes:
"""Read the stream content. Content is cached after first read."""
if self._content is None:
position = self.stream.tell()
self.stream.seek(0)
self._content = self.stream.read()
self.stream.seek(position)
return self._content
def close(self) -> None:
"""Close the underlying stream."""
self.stream.close()
def __enter__(self) -> FileStream:
"""Enter context manager."""
return self
def __exit__(
self,
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: Any,
) -> None:
"""Exit context manager and close stream."""
self.close()
def read_chunks(self, chunk_size: int = 65536) -> Iterator[bytes]:
"""Stream from underlying stream in chunks.
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of stream content.
"""
position = self.stream.tell()
self.stream.seek(0)
try:
while chunk := self.stream.read(chunk_size):
yield chunk
finally:
self.stream.seek(position)
class AsyncFileStream(BaseModel):
"""File loaded from an async stream.
Use for async file handles like aiofiles objects or aiohttp response bodies.
This is an async-only type - use aread() instead of read().
Attributes:
stream: Async file-like object with async read() method.
filename: Optional filename for the stream.
"""
stream: ValidatedAsyncReadable = Field(
description="Async file stream with async read() method."
)
filename: str | None = Field(default=None, description="Optional filename.")
_content: bytes | None = PrivateAttr(default=None)
_content_type: str | None = PrivateAttr(default=None)
@property
def content_type(self) -> str:
"""Get the content type from stream content (cached). Requires aread() first."""
if self._content is None:
raise RuntimeError("Call aread() first to load content")
if self._content_type is None:
self._content_type = detect_content_type(self._content, self.filename)
return self._content_type
async def aread(self) -> bytes:
"""Async read the stream content. Content is cached after first read."""
if self._content is None:
self._content = await self.stream.read()
return self._content
async def aclose(self) -> None:
"""Async close the underlying stream."""
if hasattr(self.stream, "close"):
result = self.stream.close()
if inspect.isawaitable(result):
await result
async def __aenter__(self) -> AsyncFileStream:
"""Async enter context manager."""
return self
async def __aexit__(
self,
exc_type: type[BaseException] | None,
exc_val: BaseException | None,
exc_tb: Any,
) -> None:
"""Async exit context manager and close stream."""
await self.aclose()
async def aread_chunks(self, chunk_size: int = 65536) -> AsyncIterator[bytes]:
"""Async stream content in chunks.
Args:
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of stream content.
"""
while chunk := await self.stream.read(chunk_size):
yield chunk
class FileUrl(BaseModel):
"""File referenced by URL.
For providers that support URL references, the URL is passed directly.
For providers that don't, content is fetched on demand.
Attributes:
url: URL where the file can be accessed.
filename: Optional filename (extracted from URL if not provided).
"""
url: str = Field(description="URL where the file can be accessed.")
filename: str | None = Field(default=None, description="Optional filename.")
_content_type: str | None = PrivateAttr(default=None)
_content: bytes | None = PrivateAttr(default=None)
@model_validator(mode="after")
def _validate_url(self) -> FileUrl:
"""Validate URL format."""
if not self.url.startswith(("http://", "https://")):
raise ValueError(f"Invalid URL scheme: {self.url}")
return self
@property
def content_type(self) -> str:
"""Get the content type, guessing from URL extension if not set."""
if self._content_type is None:
self._content_type = self._guess_content_type()
return self._content_type
def _guess_content_type(self) -> str:
"""Guess content type from URL extension."""
from urllib.parse import urlparse
parsed = urlparse(self.url)
path = parsed.path
guessed, _ = mimetypes.guess_type(path)
return guessed or "application/octet-stream"
def read(self) -> bytes:
"""Fetch content from URL (for providers that don't support URL references)."""
if self._content is None:
import httpx
response = httpx.get(self.url, follow_redirects=True)
response.raise_for_status()
self._content = response.content
if "content-type" in response.headers:
self._content_type = response.headers["content-type"].split(";")[0]
return self._content
async def aread(self) -> bytes:
"""Async fetch content from URL."""
if self._content is None:
import httpx
async with httpx.AsyncClient() as client:
response = await client.get(self.url, follow_redirects=True)
response.raise_for_status()
self._content = response.content
if "content-type" in response.headers:
self._content_type = response.headers["content-type"].split(";")[0]
return self._content
FileSource = FilePath | FileBytes | FileStream | AsyncFileStream | FileUrl
def is_file_source(v: object) -> TypeIs[FileSource]:
"""Type guard to narrow input to FileSource."""
return isinstance(v, (FilePath, FileBytes, FileStream, AsyncFileStream, FileUrl))
def _normalize_source(value: Any) -> FileSource:
"""Convert raw input to appropriate source type."""
if isinstance(value, (FilePath, FileBytes, FileStream, AsyncFileStream, FileUrl)):
return value
if isinstance(value, str):
if value.startswith(("http://", "https://")):
return FileUrl(url=value)
return FilePath(path=Path(value))
if isinstance(value, Path):
return FilePath(path=value)
if isinstance(value, bytes):
return FileBytes(data=value)
# runtime_checkable protocols only check attribute presence, so a sync stream
# (which also has read()) would match AsyncReadable; require a coroutine read().
read_attr = getattr(value, "read", None)
if read_attr is not None and inspect.iscoroutinefunction(read_attr):
return AsyncFileStream(stream=value)
if hasattr(value, "read") and hasattr(value, "seek"):
return FileStream(stream=value)
raise ValueError(f"Cannot convert {type(value).__name__} to file source")
RawFileInput = str | Path | bytes
FileSourceInput = Annotated[
RawFileInput | FileSource, BeforeValidator(_normalize_source)
]
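A short sketch of the concrete source types (nothing here touches the network; `FileUrl` only fetches on `read()`):

from crewai_files.core.sources import FileBytes, FileUrl

blob = FileBytes(data=b"hello, world", filename="hello.txt")
print(blob.content_type)  # text/plain, from content sniffing or the extension
print(list(blob.read_chunks(chunk_size=5)))  # [b'hello', b', wor', b'ld']

url = FileUrl(url="https://example.com/report.pdf")
print(url.content_type)  # application/pdf, guessed from the URL extension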

View File

@@ -0,0 +1,282 @@
"""Content-type specific file classes."""
from __future__ import annotations
from abc import ABC
from io import IOBase
from pathlib import Path
from typing import Annotated, Any, BinaryIO, Literal
from pydantic import BaseModel, Field, GetCoreSchemaHandler
from pydantic_core import CoreSchema, core_schema
from typing_extensions import Self
from crewai_files.core.sources import (
AsyncFileStream,
FileBytes,
FilePath,
FileSource,
FileStream,
FileUrl,
is_file_source,
)
FileSourceInput = str | Path | bytes | IOBase | FileSource
class _FileSourceCoercer:
"""Pydantic-compatible type that coerces various inputs to FileSource."""
@classmethod
def _coerce(cls, v: Any) -> FileSource:
"""Convert raw input to appropriate FileSource type."""
if isinstance(v, (FilePath, FileBytes, FileStream, AsyncFileStream, FileUrl)):
return v
if isinstance(v, str):
if v.startswith(("http://", "https://")):
return FileUrl(url=v)
return FilePath(path=Path(v))
if isinstance(v, Path):
return FilePath(path=v)
if isinstance(v, bytes):
return FileBytes(data=v)
if isinstance(v, (IOBase, BinaryIO)):
return FileStream(stream=v)
raise ValueError(f"Cannot convert {type(v).__name__} to file source")
@classmethod
def __get_pydantic_core_schema__(
cls,
_source_type: Any,
_handler: GetCoreSchemaHandler,
) -> CoreSchema:
"""Generate Pydantic core schema for FileSource coercion."""
return core_schema.no_info_plain_validator_function(
cls._coerce,
serialization=core_schema.plain_serializer_function_ser_schema(
lambda v: v,
info_arg=False,
return_schema=core_schema.any_schema(),
),
)
CoercedFileSource = Annotated[FileSourceInput, _FileSourceCoercer]
FileMode = Literal["strict", "auto", "warn", "chunk"]
ImageExtension = Literal[
".png",
".jpg",
".jpeg",
".gif",
".webp",
".bmp",
".tiff",
".tif",
".svg",
".heic",
".heif",
]
ImageMimeType = Literal[
"image/png",
"image/jpeg",
"image/gif",
"image/webp",
"image/bmp",
"image/tiff",
"image/svg+xml",
"image/heic",
"image/heif",
]
PDFExtension = Literal[".pdf"]
PDFContentType = Literal["application/pdf"]
TextExtension = Literal[
".txt",
".md",
".rst",
".csv",
".json",
".xml",
".yaml",
".yml",
".html",
".htm",
".log",
".ini",
".cfg",
".conf",
]
TextContentType = Literal[
"text/plain",
"text/markdown",
"text/csv",
"application/json",
"application/xml",
"text/xml",
"application/x-yaml",
"text/yaml",
"text/html",
]
AudioExtension = Literal[
".mp3", ".wav", ".ogg", ".flac", ".aac", ".m4a", ".wma", ".aiff", ".opus"
]
AudioMimeType = Literal[
"audio/mp3",
"audio/mpeg",
"audio/wav",
"audio/x-wav",
"audio/ogg",
"audio/flac",
"audio/aac",
"audio/m4a",
"audio/mp4",
"audio/x-ms-wma",
"audio/aiff",
"audio/opus",
]
VideoExtension = Literal[
".mp4", ".avi", ".mkv", ".mov", ".webm", ".flv", ".wmv", ".m4v", ".mpeg", ".mpg"
]
VideoMimeType = Literal[
"video/mp4",
"video/mpeg",
"video/webm",
"video/quicktime",
"video/x-msvideo",
"video/x-matroska",
"video/x-flv",
"video/x-ms-wmv",
]
class BaseFile(ABC, BaseModel):
"""Abstract base class for typed file wrappers.
Provides common functionality for all file types including:
- File source management
- Content reading
- Dict unpacking support (`**` syntax)
- Per-file handling mode
Can be unpacked with ** syntax: `{**ImageFile(source="./chart.png")}`
which unpacks to: `{"chart": <ImageFile instance>}` using filename stem as key.
Attributes:
source: The underlying file source (path, bytes, or stream).
mode: How to handle this file if it exceeds provider limits.
"""
source: CoercedFileSource = Field(description="The underlying file source.")
mode: FileMode = Field(
default="auto",
description="How to handle if file exceeds limits: strict, auto, warn, chunk.",
)
@property
def _file_source(self) -> FileSource:
"""Get source with narrowed type (always FileSource after validation)."""
if is_file_source(self.source):
return self.source
raise TypeError("source must be a FileSource after validation")
@property
def filename(self) -> str | None:
"""Get the filename from the source."""
return self._file_source.filename
@property
def content_type(self) -> str:
"""Get the content type from the source."""
return self._file_source.content_type
def read(self) -> bytes:
"""Read the file content as bytes.
Raises:
TypeError: If the underlying source is async-only (AsyncFileStream).
"""
source = self._file_source
if isinstance(source, AsyncFileStream):
raise TypeError("AsyncFileStream does not support sync read; use aread()")
return source.read()
async def aread(self) -> bytes:
"""Async read the file content as bytes.
Raises:
TypeError: If the underlying source doesn't support async read.
"""
source = self._file_source
if isinstance(source, (FilePath, FileBytes, AsyncFileStream, FileUrl)):
return await source.aread()
raise TypeError(f"{type(source).__name__} does not support async read")
def read_text(self, encoding: str = "utf-8") -> str:
"""Read the file content as string."""
return self.read().decode(encoding)
@property
def _unpack_key(self) -> str:
"""Get the key to use when unpacking (filename stem)."""
filename = self._file_source.filename
if filename:
return Path(filename).stem
return "file"
def keys(self) -> list[str]:
"""Return keys for dict unpacking."""
return [self._unpack_key]
def __getitem__(self, key: str) -> Self:
"""Return self for dict unpacking."""
if key == self._unpack_key:
return self
raise KeyError(key)
class ImageFile(BaseFile):
"""File representing an image.
Supports common image formats: PNG, JPEG, GIF, WebP, BMP, TIFF, SVG.
"""
class PDFFile(BaseFile):
"""File representing a PDF document."""
class TextFile(BaseFile):
"""File representing a text document.
Supports common text formats: TXT, MD, RST, CSV, JSON, XML, YAML, HTML.
"""
class AudioFile(BaseFile):
"""File representing an audio file.
Supports common audio formats: MP3, WAV, OGG, FLAC, AAC, M4A, WMA.
"""
class VideoFile(BaseFile):
"""File representing a video file.
Supports common video formats: MP4, AVI, MKV, MOV, WebM, FLV, WMV.
"""
class File(BaseFile):
"""Generic file that auto-detects the appropriate type.
Use this when you don't want to specify the exact file type.
The content type is automatically detected from the file contents.
Example:
>>> pdf_file = File(source="./document.pdf")
>>> image_file = File(source="./image.png")
>>> bytes_file = File(source=b"file content")
"""
FileInput = AudioFile | File | ImageFile | PDFFile | TextFile | VideoFile
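Because `keys()` and `__getitem__` are defined, a file can be splatted straight into an inputs dict; a minimal sketch:

from crewai_files.core.sources import FileBytes
from crewai_files.core.types import File

report = File(source=FileBytes(data=b"quarterly numbers", filename="q3_report.txt"))
inputs = {"topic": "finance", **report}
assert "q3_report" in inputs  # filename stem becomes the key
assert inputs["q3_report"] is report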

View File

@@ -0,0 +1,14 @@
"""High-level formatting API for multimodal content."""
from crewai_files.formatting.api import (
aformat_multimodal_content,
format_multimodal_content,
)
from crewai_files.formatting.openai import OpenAIResponsesFormatter
__all__ = [
"OpenAIResponsesFormatter",
"aformat_multimodal_content",
"format_multimodal_content",
]

View File

@@ -0,0 +1,98 @@
"""Anthropic content block formatter."""
from __future__ import annotations
import base64
from typing import Any
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFileType,
UrlReference,
)
from crewai_files.core.types import FileInput
class AnthropicFormatter:
"""Formats resolved files into Anthropic content blocks."""
def format_block(
self,
file: FileInput,
resolved: ResolvedFileType,
) -> dict[str, Any] | None:
"""Format a resolved file into an Anthropic content block.
Args:
file: Original file input with metadata.
resolved: Resolved file.
Returns:
Content block dict or None if not supported.
"""
content_type = file.content_type
block_type = self._get_block_type(content_type)
if block_type is None:
return None
if isinstance(resolved, FileReference):
return {
"type": block_type,
"source": {
"type": "file",
"file_id": resolved.file_id,
},
"cache_control": {"type": "ephemeral"},
}
if isinstance(resolved, UrlReference):
return {
"type": block_type,
"source": {
"type": "url",
"url": resolved.url,
},
"cache_control": {"type": "ephemeral"},
}
if isinstance(resolved, InlineBase64):
return {
"type": block_type,
"source": {
"type": "base64",
"media_type": resolved.content_type,
"data": resolved.data,
},
"cache_control": {"type": "ephemeral"},
}
if isinstance(resolved, InlineBytes):
return {
"type": block_type,
"source": {
"type": "base64",
"media_type": resolved.content_type,
"data": base64.b64encode(resolved.data).decode("ascii"),
},
"cache_control": {"type": "ephemeral"},
}
raise TypeError(f"Unexpected resolved type: {type(resolved).__name__}")
@staticmethod
def _get_block_type(content_type: str) -> str | None:
"""Get Anthropic block type for content type.
Args:
content_type: MIME type.
Returns:
Block type string or None if not supported.
"""
if content_type.startswith("image/"):
return "image"
if content_type == "application/pdf":
return "document"
return None
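A sketch of the resulting Anthropic block for an inline base64 image (the bytes are just a PNG signature, illustrative only):

import base64

from crewai_files.core.resolved import InlineBase64
from crewai_files.core.sources import FileBytes
from crewai_files.core.types import ImageFile
from crewai_files.formatting.anthropic import AnthropicFormatter

png_sig = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
image = ImageFile(source=FileBytes(data=png_sig, filename="photo.png"))
resolved = InlineBase64(content_type="image/png", data=base64.b64encode(png_sig).decode("ascii"))
block = AnthropicFormatter().format_block(image, resolved)
# {"type": "image", "source": {"type": "base64", "media_type": "image/png", "data": "..."},
#  "cache_control": {"type": "ephemeral"}}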

View File

@@ -0,0 +1,328 @@
"""High-level API for formatting multimodal content."""
from __future__ import annotations
import os
from typing import Any
from crewai_files.cache.upload_cache import get_upload_cache
from crewai_files.core.types import FileInput
from crewai_files.formatting.anthropic import AnthropicFormatter
from crewai_files.formatting.bedrock import BedrockFormatter
from crewai_files.formatting.gemini import GeminiFormatter
from crewai_files.formatting.openai import OpenAIFormatter, OpenAIResponsesFormatter
from crewai_files.processing.constraints import get_constraints_for_provider
from crewai_files.processing.processor import FileProcessor
from crewai_files.resolution.resolver import FileResolver, FileResolverConfig
from crewai_files.uploaders.factory import ProviderType
def _normalize_provider(provider: str | None) -> ProviderType:
"""Normalize provider string to ProviderType.
Args:
provider: Raw provider string.
Returns:
Normalized provider type.
Raises:
ValueError: If provider is None or empty.
"""
if not provider:
raise ValueError("provider is required")
provider_lower = provider.lower()
if "gemini" in provider_lower:
return "gemini"
if "google" in provider_lower:
return "google"
if "anthropic" in provider_lower:
return "anthropic"
if "claude" in provider_lower:
return "claude"
if "bedrock" in provider_lower:
return "bedrock"
if "aws" in provider_lower:
return "aws"
if "azure" in provider_lower:
return "azure"
if "gpt" in provider_lower:
return "gpt"
return "openai"
def format_multimodal_content(
files: dict[str, FileInput],
provider: str | None = None,
api: str | None = None,
prefer_upload: bool | None = None,
) -> list[dict[str, Any]]:
"""Format files as provider-specific multimodal content blocks.
This is the main high-level API for converting files to content blocks
suitable for sending to LLM providers. It handles:
- File processing according to provider constraints
- Resolution (upload vs inline) based on provider capabilities
- Formatting into provider-specific content block structures
Args:
files: Dictionary mapping file names to FileInput objects.
provider: Provider name (e.g., "openai", "anthropic", "bedrock", "gemini").
api: API variant (e.g., "responses" for OpenAI Responses API).
prefer_upload: Whether to prefer uploading files instead of inlining.
If None, uses provider-specific defaults.
Returns:
List of content blocks in the provider's expected format.
Example:
>>> from crewai_files import format_multimodal_content, ImageFile
>>> files = {"photo": ImageFile(source="image.jpg")}
>>> blocks = format_multimodal_content(files, "openai")
>>> # For OpenAI Responses API:
>>> blocks = format_multimodal_content(files, "openai", api="responses")
>>> # With file upload:
>>> blocks = format_multimodal_content(
... files, "openai", api="responses", prefer_upload=True
... )
"""
if not files:
return []
provider_type = _normalize_provider(provider)
processor = FileProcessor(constraints=provider_type)
processed_files = processor.process_files(files)
if not processed_files:
return []
constraints = get_constraints_for_provider(provider_type)
supported_types = _get_supported_types(constraints)
supported_files = _filter_supported_files(processed_files, supported_types)
if not supported_files:
return []
config = _get_resolver_config(provider_type, prefer_upload)
upload_cache = get_upload_cache()
resolver = FileResolver(config=config, upload_cache=upload_cache)
formatter = _get_formatter(provider_type, api)
content_blocks: list[dict[str, Any]] = []
for name, file_input in supported_files.items():
resolved = resolver.resolve(file_input, provider_type)
block = _format_block(formatter, file_input, resolved, name)
if block is not None:
content_blocks.append(block)
return content_blocks
async def aformat_multimodal_content(
files: dict[str, FileInput],
provider: str | None = None,
api: str | None = None,
prefer_upload: bool | None = None,
) -> list[dict[str, Any]]:
"""Async format files as provider-specific multimodal content blocks.
Async version of format_multimodal_content with parallel file resolution.
Args:
files: Dictionary mapping file names to FileInput objects.
provider: Provider name (e.g., "openai", "anthropic", "bedrock", "gemini").
api: API variant (e.g., "responses" for OpenAI Responses API).
prefer_upload: Whether to prefer uploading files instead of inlining.
If None, uses provider-specific defaults.
Returns:
List of content blocks in the provider's expected format.
"""
if not files:
return []
provider_type = _normalize_provider(provider)
processor = FileProcessor(constraints=provider_type)
processed_files = await processor.aprocess_files(files)
if not processed_files:
return []
constraints = get_constraints_for_provider(provider_type)
supported_types = _get_supported_types(constraints)
supported_files = _filter_supported_files(processed_files, supported_types)
if not supported_files:
return []
config = _get_resolver_config(provider_type, prefer_upload)
upload_cache = get_upload_cache()
resolver = FileResolver(config=config, upload_cache=upload_cache)
resolved_files = await resolver.aresolve_files(supported_files, provider_type)
formatter = _get_formatter(provider_type, api)
content_blocks: list[dict[str, Any]] = []
for name, resolved in resolved_files.items():
file_input = supported_files[name]
block = _format_block(formatter, file_input, resolved, name)
if block is not None:
content_blocks.append(block)
return content_blocks
def _get_supported_types(
constraints: Any | None,
) -> list[str]:
"""Get list of supported MIME type prefixes from constraints.
Args:
constraints: Provider constraints.
Returns:
List of MIME type prefixes (e.g., ["image/", "application/pdf"]).
"""
if constraints is None:
return []
supported: list[str] = []
if constraints.image is not None:
supported.append("image/")
if constraints.pdf is not None:
supported.append("application/pdf")
if constraints.audio is not None:
supported.append("audio/")
if constraints.video is not None:
supported.append("video/")
if constraints.text is not None:
supported.append("text/")
supported.append("application/json")
supported.append("application/xml")
supported.append("application/x-yaml")
return supported
def _filter_supported_files(
files: dict[str, FileInput],
supported_types: list[str],
) -> dict[str, FileInput]:
"""Filter files to those with supported content types.
Args:
files: All files.
supported_types: MIME type prefixes to allow.
Returns:
Filtered dictionary of supported files.
"""
return {
name: f
for name, f in files.items()
if any(f.content_type.startswith(t) for t in supported_types)
}
def _get_resolver_config(
provider_lower: str,
prefer_upload_override: bool | None = None,
) -> FileResolverConfig:
"""Get resolver config for provider.
Args:
provider_lower: Lowercase provider name.
prefer_upload_override: Override for prefer_upload setting.
If None, uses provider-specific defaults.
Returns:
Configured FileResolverConfig.
"""
if "bedrock" in provider_lower:
s3_bucket = os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
prefer_upload = (
prefer_upload_override
if prefer_upload_override is not None
else bool(s3_bucket)
)
return FileResolverConfig(
prefer_upload=prefer_upload, use_bytes_for_bedrock=True
)
prefer_upload = (
prefer_upload_override if prefer_upload_override is not None else False
)
return FileResolverConfig(prefer_upload=prefer_upload)
def _get_formatter(
provider_lower: str,
api: str | None = None,
) -> (
OpenAIFormatter
| OpenAIResponsesFormatter
| AnthropicFormatter
| BedrockFormatter
| GeminiFormatter
):
"""Get formatter for provider.
Args:
provider_lower: Lowercase provider name.
api: API variant (e.g., "responses" for OpenAI Responses API).
Returns:
Provider-specific formatter instance.
"""
if "anthropic" in provider_lower or "claude" in provider_lower:
return AnthropicFormatter()
if "bedrock" in provider_lower or "aws" in provider_lower:
s3_bucket_owner = os.environ.get("CREWAI_BEDROCK_S3_BUCKET_OWNER")
return BedrockFormatter(s3_bucket_owner=s3_bucket_owner)
if "gemini" in provider_lower or "google" in provider_lower:
return GeminiFormatter()
if api == "responses":
return OpenAIResponsesFormatter()
return OpenAIFormatter()
def _format_block(
formatter: OpenAIFormatter
| OpenAIResponsesFormatter
| AnthropicFormatter
| BedrockFormatter
| GeminiFormatter,
file_input: FileInput,
resolved: Any,
name: str,
) -> dict[str, Any] | None:
"""Format a single file block using the appropriate formatter.
Args:
formatter: Provider formatter.
file_input: Original file input.
resolved: Resolved file.
name: File name.
Returns:
Content block dict or None.
"""
if isinstance(formatter, BedrockFormatter):
return formatter.format_block(file_input, resolved, name=name)
if isinstance(formatter, AnthropicFormatter):
return formatter.format_block(file_input, resolved)
if isinstance(formatter, OpenAIResponsesFormatter):
return formatter.format_block(resolved, file_input.content_type)
if isinstance(formatter, (OpenAIFormatter, GeminiFormatter)):
return formatter.format_block(resolved)
raise TypeError(f"Unknown formatter type: {type(formatter).__name__}")

View File

@@ -0,0 +1,200 @@
"""Bedrock content block formatter."""
from __future__ import annotations
import base64
from typing import Any
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFileType,
UrlReference,
)
from crewai_files.core.types import FileInput
_DOCUMENT_FORMATS: dict[str, str] = {
"application/pdf": "pdf",
"text/csv": "csv",
"text/plain": "txt",
"text/markdown": "md",
"text/html": "html",
"application/msword": "doc",
"application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
"application/vnd.ms-excel": "xls",
"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
}
_VIDEO_FORMATS: dict[str, str] = {
"video/mp4": "mp4",
"video/quicktime": "mov",
"video/x-matroska": "mkv",
"video/webm": "webm",
"video/x-flv": "flv",
"video/mpeg": "mpeg",
"video/3gpp": "three_gp",
}
class BedrockFormatter:
"""Formats resolved files into Bedrock Converse API content blocks."""
def __init__(self, s3_bucket_owner: str | None = None) -> None:
"""Initialize formatter.
Args:
s3_bucket_owner: Optional S3 bucket owner for file references.
"""
self.s3_bucket_owner = s3_bucket_owner
def format_block(
self,
file: FileInput,
resolved: ResolvedFileType,
name: str | None = None,
) -> dict[str, Any] | None:
"""Format a resolved file into a Bedrock content block.
Args:
file: Original file input with metadata.
resolved: Resolved file.
name: File name (required for document blocks).
Returns:
Content block dict or None if not supported.
"""
content_type = file.content_type
if isinstance(resolved, FileReference):
if not resolved.file_uri:
raise ValueError("Bedrock requires file_uri for FileReference (S3 URI)")
return self._format_s3_block(content_type, resolved.file_uri, name)
if isinstance(resolved, InlineBytes):
return self._format_bytes_block(content_type, resolved.data, name)
if isinstance(resolved, InlineBase64):
file_bytes = base64.b64decode(resolved.data)
return self._format_bytes_block(content_type, file_bytes, name)
if isinstance(resolved, UrlReference):
raise ValueError(
"Bedrock does not support URL references - resolve to bytes first"
)
raise TypeError(f"Unexpected resolved type: {type(resolved).__name__}")
def _format_s3_block(
self,
content_type: str,
file_uri: str,
name: str | None,
) -> dict[str, Any] | None:
"""Format block with S3 location source.
Args:
content_type: MIME type.
file_uri: S3 URI.
name: File name for documents.
Returns:
Content block dict or None.
"""
s3_location: dict[str, Any] = {"uri": file_uri}
if self.s3_bucket_owner:
s3_location["bucketOwner"] = self.s3_bucket_owner
if content_type.startswith("image/"):
return {
"image": {
"format": self._get_image_format(content_type),
"source": {"s3Location": s3_location},
}
}
if content_type.startswith("video/"):
video_format = _VIDEO_FORMATS.get(content_type)
if video_format:
return {
"video": {
"format": video_format,
"source": {"s3Location": s3_location},
}
}
return None
doc_format = _DOCUMENT_FORMATS.get(content_type)
if doc_format:
return {
"document": {
"name": name or "document",
"format": doc_format,
"source": {"s3Location": s3_location},
}
}
return None
def _format_bytes_block(
self,
content_type: str,
file_bytes: bytes,
name: str | None,
) -> dict[str, Any] | None:
"""Format block with inline bytes source.
Args:
content_type: MIME type.
file_bytes: Raw file bytes.
name: File name for documents.
Returns:
Content block dict or None.
"""
if content_type.startswith("image/"):
return {
"image": {
"format": self._get_image_format(content_type),
"source": {"bytes": file_bytes},
}
}
if content_type.startswith("video/"):
video_format = _VIDEO_FORMATS.get(content_type)
if video_format:
return {
"video": {
"format": video_format,
"source": {"bytes": file_bytes},
}
}
return None
doc_format = _DOCUMENT_FORMATS.get(content_type)
if doc_format:
return {
"document": {
"name": name or "document",
"format": doc_format,
"source": {"bytes": file_bytes},
}
}
return None
@staticmethod
def _get_image_format(content_type: str) -> str:
"""Get Bedrock image format from content type.
Args:
content_type: MIME type.
Returns:
Format string for Bedrock.
"""
media_type = content_type.split("/")[-1]
if media_type == "jpg":
return "jpeg"
return media_type
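A sketch of a Bedrock document block built from inline bytes (the PDF content is a bare header, illustrative only):

from crewai_files.core.resolved import InlineBytes
from crewai_files.core.sources import FileBytes
from crewai_files.core.types import PDFFile
from crewai_files.formatting.bedrock import BedrockFormatter

pdf_bytes = b"%PDF-1.4\n"
doc = PDFFile(source=FileBytes(data=pdf_bytes, filename="report.pdf"))
resolved = InlineBytes(content_type="application/pdf", data=pdf_bytes)
block = BedrockFormatter().format_block(doc, resolved, name="report")
# {"document": {"name": "report", "format": "pdf", "source": {"bytes": b"%PDF-1.4\n"}}}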

View File

@@ -0,0 +1,67 @@
"""Gemini content block formatter."""
from __future__ import annotations
import base64
from typing import Any
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFileType,
UrlReference,
)
class GeminiFormatter:
"""Formats resolved files into Gemini content blocks."""
@staticmethod
def format_block(resolved: ResolvedFileType) -> dict[str, Any]:
"""Format a resolved file into a Gemini content block.
Args:
resolved: Resolved file.
Returns:
Content block dict.
Raises:
TypeError: If resolved type is not supported.
"""
if isinstance(resolved, FileReference):
if not resolved.file_uri:
raise ValueError("Gemini requires file_uri for FileReference")
return {
"fileData": {
"mimeType": resolved.content_type,
"fileUri": resolved.file_uri,
}
}
if isinstance(resolved, UrlReference):
return {
"fileData": {
"mimeType": resolved.content_type,
"fileUri": resolved.url,
}
}
if isinstance(resolved, InlineBase64):
return {
"inlineData": {
"mimeType": resolved.content_type,
"data": resolved.data,
}
}
if isinstance(resolved, InlineBytes):
return {
"inlineData": {
"mimeType": resolved.content_type,
"data": base64.b64encode(resolved.data).decode("ascii"),
}
}
raise TypeError(f"Unexpected resolved type: {type(resolved).__name__}")

View File

@@ -0,0 +1,149 @@
"""OpenAI content block formatter."""
from __future__ import annotations
import base64
from typing import Any
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFileType,
UrlReference,
)
class OpenAIResponsesFormatter:
"""Formats resolved files into OpenAI Responses API content blocks.
The Responses API uses a different format than Chat Completions:
- Images use `type: "input_image"` with `file_id` or `image_url`
- PDFs use `type: "input_file"` with `file_id`, `file_url`, or `file_data`
"""
@staticmethod
def format_block(resolved: ResolvedFileType, content_type: str) -> dict[str, Any]:
"""Format a resolved file into an OpenAI Responses API content block.
Args:
resolved: Resolved file.
content_type: MIME type of the file.
Returns:
Content block dict.
Raises:
TypeError: If resolved type is not supported.
"""
is_image = content_type.startswith("image/")
is_pdf = content_type == "application/pdf"
if isinstance(resolved, FileReference):
if is_image:
return {
"type": "input_image",
"file_id": resolved.file_id,
}
if is_pdf:
return {
"type": "input_file",
"file_id": resolved.file_id,
}
raise TypeError(
f"Unsupported content type for Responses API: {content_type}"
)
if isinstance(resolved, UrlReference):
if is_image:
return {
"type": "input_image",
"image_url": resolved.url,
}
if is_pdf:
return {
"type": "input_file",
"file_url": resolved.url,
}
raise TypeError(
f"Unsupported content type for Responses API: {content_type}"
)
if isinstance(resolved, InlineBase64):
if is_image:
return {
"type": "input_image",
"image_url": f"data:{resolved.content_type};base64,{resolved.data}",
}
if is_pdf:
return {
"type": "input_file",
"file_data": f"data:{resolved.content_type};base64,{resolved.data}",
}
raise TypeError(
f"Unsupported content type for Responses API: {content_type}"
)
if isinstance(resolved, InlineBytes):
data = base64.b64encode(resolved.data).decode("ascii")
if is_image:
return {
"type": "input_image",
"image_url": f"data:{resolved.content_type};base64,{data}",
}
if is_pdf:
return {
"type": "input_file",
"file_data": f"data:{resolved.content_type};base64,{data}",
}
raise TypeError(
f"Unsupported content type for Responses API: {content_type}"
)
raise TypeError(f"Unexpected resolved type: {type(resolved).__name__}")
class OpenAIFormatter:
"""Formats resolved files into OpenAI content blocks."""
@staticmethod
def format_block(resolved: ResolvedFileType) -> dict[str, Any]:
"""Format a resolved file into an OpenAI content block.
Args:
resolved: Resolved file.
Returns:
Content block dict.
Raises:
TypeError: If resolved type is not supported.
"""
if isinstance(resolved, FileReference):
return {
"type": "file",
"file": {"file_id": resolved.file_id},
}
if isinstance(resolved, UrlReference):
return {
"type": "image_url",
"image_url": {"url": resolved.url},
}
if isinstance(resolved, InlineBase64):
return {
"type": "image_url",
"image_url": {
"url": f"data:{resolved.content_type};base64,{resolved.data}"
},
}
if isinstance(resolved, InlineBytes):
data = base64.b64encode(resolved.data).decode("ascii")
return {
"type": "image_url",
"image_url": {"url": f"data:{resolved.content_type};base64,{data}"},
}
raise TypeError(f"Unexpected resolved type: {type(resolved).__name__}")

View File

@@ -0,0 +1,62 @@
"""File processing module for multimodal content handling.
This module provides validation, transformation, and processing utilities
for files used in multimodal LLM interactions.
"""
from crewai_files.processing.constraints import (
ANTHROPIC_CONSTRAINTS,
BEDROCK_CONSTRAINTS,
GEMINI_CONSTRAINTS,
OPENAI_CONSTRAINTS,
AudioConstraints,
ImageConstraints,
PDFConstraints,
ProviderConstraints,
VideoConstraints,
get_constraints_for_provider,
)
from crewai_files.processing.enums import FileHandling
from crewai_files.processing.exceptions import (
FileProcessingError,
FileTooLargeError,
FileValidationError,
ProcessingDependencyError,
UnsupportedFileTypeError,
)
from crewai_files.processing.processor import FileProcessor
from crewai_files.processing.validators import (
validate_audio,
validate_file,
validate_image,
validate_pdf,
validate_text,
validate_video,
)
__all__ = [
"ANTHROPIC_CONSTRAINTS",
"BEDROCK_CONSTRAINTS",
"GEMINI_CONSTRAINTS",
"OPENAI_CONSTRAINTS",
"AudioConstraints",
"FileHandling",
"FileProcessingError",
"FileProcessor",
"FileTooLargeError",
"FileValidationError",
"ImageConstraints",
"PDFConstraints",
"ProcessingDependencyError",
"ProviderConstraints",
"UnsupportedFileTypeError",
"VideoConstraints",
"get_constraints_for_provider",
"validate_audio",
"validate_file",
"validate_image",
"validate_pdf",
"validate_text",
"validate_video",
]

View File

@@ -0,0 +1,331 @@
"""Provider-specific file constraints for multimodal content."""
from dataclasses import dataclass
from functools import lru_cache
from typing import Literal
from crewai_files.core.types import (
AudioMimeType,
ImageMimeType,
TextContentType,
VideoMimeType,
)
ProviderName = Literal[
"anthropic",
"openai",
"gemini",
"bedrock",
"azure",
]
DEFAULT_IMAGE_FORMATS: tuple[ImageMimeType, ...] = (
"image/png",
"image/jpeg",
"image/gif",
"image/webp",
)
GEMINI_IMAGE_FORMATS: tuple[ImageMimeType, ...] = (
"image/png",
"image/jpeg",
"image/gif",
"image/webp",
"image/heic",
"image/heif",
)
DEFAULT_AUDIO_FORMATS: tuple[AudioMimeType, ...] = (
"audio/mp3",
"audio/mpeg",
"audio/wav",
"audio/ogg",
"audio/flac",
"audio/aac",
"audio/m4a",
)
GEMINI_AUDIO_FORMATS: tuple[AudioMimeType, ...] = (
"audio/mp3",
"audio/mpeg",
"audio/wav",
"audio/ogg",
"audio/flac",
"audio/aac",
"audio/m4a",
"audio/opus",
)
DEFAULT_VIDEO_FORMATS: tuple[VideoMimeType, ...] = (
"video/mp4",
"video/mpeg",
"video/webm",
"video/quicktime",
)
GEMINI_VIDEO_FORMATS: tuple[VideoMimeType, ...] = (
"video/mp4",
"video/mpeg",
"video/webm",
"video/quicktime",
"video/x-msvideo",
"video/x-flv",
)
DEFAULT_TEXT_FORMATS: tuple[TextContentType, ...] = (
"text/plain",
"text/markdown",
"text/csv",
"application/json",
"text/xml",
"text/html",
)
GEMINI_TEXT_FORMATS: tuple[TextContentType, ...] = (
"text/plain",
"text/markdown",
"text/csv",
"application/json",
"application/xml",
"text/xml",
"application/x-yaml",
"text/yaml",
"text/html",
)
@dataclass(frozen=True)
class ImageConstraints:
"""Constraints for image files.
Attributes:
max_size_bytes: Maximum file size in bytes.
max_width: Maximum image width in pixels.
max_height: Maximum image height in pixels.
max_images_per_request: Maximum number of images per request.
supported_formats: Supported image MIME types.
"""
max_size_bytes: int
max_width: int | None = None
max_height: int | None = None
max_images_per_request: int | None = None
supported_formats: tuple[ImageMimeType, ...] = DEFAULT_IMAGE_FORMATS
@dataclass(frozen=True)
class PDFConstraints:
"""Constraints for PDF files.
Attributes:
max_size_bytes: Maximum file size in bytes.
max_pages: Maximum number of pages.
"""
max_size_bytes: int
max_pages: int | None = None
@dataclass(frozen=True)
class AudioConstraints:
"""Constraints for audio files.
Attributes:
max_size_bytes: Maximum file size in bytes.
max_duration_seconds: Maximum audio duration in seconds.
supported_formats: Supported audio MIME types.
"""
max_size_bytes: int
max_duration_seconds: int | None = None
supported_formats: tuple[AudioMimeType, ...] = DEFAULT_AUDIO_FORMATS
@dataclass(frozen=True)
class VideoConstraints:
"""Constraints for video files.
Attributes:
max_size_bytes: Maximum file size in bytes.
max_duration_seconds: Maximum video duration in seconds.
supported_formats: Supported video MIME types.
"""
max_size_bytes: int
max_duration_seconds: int | None = None
supported_formats: tuple[VideoMimeType, ...] = DEFAULT_VIDEO_FORMATS
@dataclass(frozen=True)
class TextConstraints:
"""Constraints for text files.
Attributes:
max_size_bytes: Maximum file size in bytes.
supported_formats: Supported text MIME types.
"""
max_size_bytes: int
supported_formats: tuple[TextContentType, ...] = DEFAULT_TEXT_FORMATS
@dataclass(frozen=True)
class ProviderConstraints:
"""Complete set of constraints for a provider.
Attributes:
name: Provider name identifier.
image: Image file constraints.
pdf: PDF file constraints.
audio: Audio file constraints.
video: Video file constraints.
text: Text file constraints.
general_max_size_bytes: Maximum size for any file type.
supports_file_upload: Whether the provider supports file upload APIs.
file_upload_threshold_bytes: Size threshold above which to use file upload.
supports_url_references: Whether the provider supports URL-based file references.
"""
name: ProviderName
image: ImageConstraints | None = None
pdf: PDFConstraints | None = None
audio: AudioConstraints | None = None
video: VideoConstraints | None = None
text: TextConstraints | None = None
general_max_size_bytes: int | None = None
supports_file_upload: bool = False
file_upload_threshold_bytes: int | None = None
supports_url_references: bool = False
ANTHROPIC_CONSTRAINTS = ProviderConstraints(
name="anthropic",
image=ImageConstraints(
max_size_bytes=5_242_880, # 5 MB per image
max_width=8000,
max_height=8000,
max_images_per_request=100,
),
pdf=PDFConstraints(
max_size_bytes=33_554_432, # 32 MB request size limit
max_pages=100,
),
supports_file_upload=True,
file_upload_threshold_bytes=5_242_880,
supports_url_references=True,
)
OPENAI_CONSTRAINTS = ProviderConstraints(
name="openai",
image=ImageConstraints(
max_size_bytes=20_971_520,
max_images_per_request=10,
),
pdf=PDFConstraints(
max_size_bytes=33_554_432, # 32 MB total across all file inputs
max_pages=100,
),
audio=AudioConstraints(
max_size_bytes=26_214_400, # 25 MB - whisper limit
max_duration_seconds=1500, # 25 minutes, from the audio transcriptions limit
),
supports_file_upload=True,
file_upload_threshold_bytes=5_242_880,
supports_url_references=True,
)
GEMINI_CONSTRAINTS = ProviderConstraints(
name="gemini",
image=ImageConstraints(
max_size_bytes=104_857_600,
supported_formats=GEMINI_IMAGE_FORMATS,
),
pdf=PDFConstraints(
max_size_bytes=52_428_800,
),
audio=AudioConstraints(
max_size_bytes=104_857_600,
max_duration_seconds=34200, # 9.5 hours
supported_formats=GEMINI_AUDIO_FORMATS,
),
video=VideoConstraints(
max_size_bytes=2_147_483_648,
max_duration_seconds=3600, # 1 hour at default resolution
supported_formats=GEMINI_VIDEO_FORMATS,
),
text=TextConstraints(
max_size_bytes=104_857_600,
supported_formats=GEMINI_TEXT_FORMATS,
),
supports_file_upload=True,
file_upload_threshold_bytes=20_971_520,
supports_url_references=True,
)
BEDROCK_CONSTRAINTS = ProviderConstraints(
name="bedrock",
image=ImageConstraints(
max_size_bytes=4_608_000,
max_width=8000,
max_height=8000,
),
pdf=PDFConstraints(
max_size_bytes=3_840_000,
max_pages=100,
),
supports_url_references=True, # S3 URIs supported
)
AZURE_CONSTRAINTS = ProviderConstraints(
name="azure",
image=ImageConstraints(
max_size_bytes=20_971_520,
max_images_per_request=10,
),
audio=AudioConstraints(
max_size_bytes=26_214_400, # 25 MB - same as openai
max_duration_seconds=1500, # 25 minutes - same as openai
),
supports_url_references=True,
)
_PROVIDER_CONSTRAINTS_MAP: dict[str, ProviderConstraints] = {
"anthropic": ANTHROPIC_CONSTRAINTS,
"openai": OPENAI_CONSTRAINTS,
"gemini": GEMINI_CONSTRAINTS,
"bedrock": BEDROCK_CONSTRAINTS,
"azure": AZURE_CONSTRAINTS,
"claude": ANTHROPIC_CONSTRAINTS,
"gpt": OPENAI_CONSTRAINTS,
"google": GEMINI_CONSTRAINTS,
"aws": BEDROCK_CONSTRAINTS,
}
@lru_cache(maxsize=32)
def get_constraints_for_provider(
provider: str | ProviderConstraints,
) -> ProviderConstraints | None:
"""Get constraints for a provider by name or return if already ProviderConstraints.
Args:
provider: Provider name string or ProviderConstraints instance.
Returns:
ProviderConstraints for the provider, or None if not found.
"""
if isinstance(provider, ProviderConstraints):
return provider
provider_lower = provider.lower()
if provider_lower in _PROVIDER_CONSTRAINTS_MAP:
return _PROVIDER_CONSTRAINTS_MAP[provider_lower]
for key, constraints in _PROVIDER_CONSTRAINTS_MAP.items():
if key in provider_lower:
return constraints
return None
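Lookup tolerates full model identifiers via the substring fallback; a quick sketch:

from crewai_files.processing.constraints import get_constraints_for_provider

c = get_constraints_for_provider("claude-3-5-sonnet")  # "claude" substring maps to Anthropic
assert c is not None and c.name == "anthropic"
assert c.image is not None and c.image.max_size_bytes == 5_242_880  # 5 MB per image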

View File

@@ -0,0 +1,19 @@
"""Enums for file processing configuration."""
from enum import Enum
class FileHandling(Enum):
"""Defines how files exceeding provider limits should be handled.
Attributes:
STRICT: Fail with an error if file exceeds limits.
AUTO: Automatically resize, compress, or optimize to fit limits.
WARN: Log a warning but attempt to process anyway.
CHUNK: Split large files into smaller pieces.
"""
STRICT = "strict"
AUTO = "auto"
WARN = "warn"
CHUNK = "chunk"

View File

@@ -0,0 +1,145 @@
"""Exceptions for file processing operations."""
class FileProcessingError(Exception):
"""Base exception for file processing errors."""
def __init__(self, message: str, file_name: str | None = None) -> None:
"""Initialize the exception.
Args:
message: Error message describing the issue.
file_name: Optional name of the file that caused the error.
"""
self.file_name = file_name
super().__init__(message)
class FileValidationError(FileProcessingError):
"""Raised when file validation fails."""
class FileTooLargeError(FileValidationError):
"""Raised when a file exceeds the maximum allowed size."""
def __init__(
self,
message: str,
file_name: str | None = None,
actual_size: int | None = None,
max_size: int | None = None,
) -> None:
"""Initialize the exception.
Args:
message: Error message describing the issue.
file_name: Optional name of the file that caused the error.
actual_size: The actual size of the file in bytes.
max_size: The maximum allowed size in bytes.
"""
self.actual_size = actual_size
self.max_size = max_size
super().__init__(message, file_name)
class UnsupportedFileTypeError(FileValidationError):
"""Raised when a file type is not supported by the provider."""
def __init__(
self,
message: str,
file_name: str | None = None,
content_type: str | None = None,
) -> None:
"""Initialize the exception.
Args:
message: Error message describing the issue.
file_name: Optional name of the file that caused the error.
content_type: The content type that is not supported.
"""
self.content_type = content_type
super().__init__(message, file_name)
class ProcessingDependencyError(FileProcessingError):
"""Raised when a required processing dependency is not installed."""
def __init__(
self,
message: str,
dependency: str,
install_command: str | None = None,
) -> None:
"""Initialize the exception.
Args:
message: Error message describing the issue.
dependency: Name of the missing dependency.
install_command: Optional command to install the dependency.
"""
self.dependency = dependency
self.install_command = install_command
super().__init__(message)
class TransientFileError(FileProcessingError):
"""Transient error that may succeed on retry (network, timeout)."""
class PermanentFileError(FileProcessingError):
"""Permanent error that will not succeed on retry (auth, format)."""
class UploadError(FileProcessingError):
"""Base exception for upload errors."""
class TransientUploadError(UploadError, TransientFileError):
"""Upload failed but may succeed on retry (network issues, rate limits)."""
class PermanentUploadError(UploadError, PermanentFileError):
"""Upload failed permanently (auth failure, invalid file, unsupported type)."""
def classify_upload_error(e: Exception, filename: str | None = None) -> Exception:
"""Classify an exception as transient or permanent upload error.
Analyzes the exception type name and status code to determine if
the error is likely transient (retryable) or permanent.
Args:
e: The exception to classify.
filename: Optional filename for error context.
Returns:
A TransientUploadError or PermanentUploadError wrapping the original.
"""
error_type = type(e).__name__
if "RateLimit" in error_type or "APIConnection" in error_type:
return TransientUploadError(f"Transient upload error: {e}", file_name=filename)
if "Authentication" in error_type or "Permission" in error_type:
return PermanentUploadError(
f"Authentication/permission error: {e}", file_name=filename
)
if "BadRequest" in error_type or "InvalidRequest" in error_type:
return PermanentUploadError(f"Invalid request: {e}", file_name=filename)
status_code = getattr(e, "status_code", None)
if status_code is not None:
if status_code >= 500 or status_code == 429:
return TransientUploadError(
f"Server error ({status_code}): {e}", file_name=filename
)
if status_code in (401, 403):
return PermanentUploadError(
f"Auth error ({status_code}): {e}", file_name=filename
)
if status_code == 400:
return PermanentUploadError(
f"Bad request ({status_code}): {e}", file_name=filename
)
return TransientUploadError(f"Upload failed: {e}", file_name=filename)
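For example, classifying a made-up HTTP error purely by its status_code attribute (the exception class here is hypothetical):

from crewai_files.processing.exceptions import (
    PermanentUploadError,
    TransientUploadError,
    classify_upload_error,
)

class FakeHTTPError(Exception):
    def __init__(self, status_code: int) -> None:
        self.status_code = status_code
        super().__init__(f"HTTP {status_code}")

assert isinstance(classify_upload_error(FakeHTTPError(429), "a.pdf"), TransientUploadError)
assert isinstance(classify_upload_error(FakeHTTPError(403), "a.pdf"), PermanentUploadError)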

View File

@@ -0,0 +1,346 @@
"""FileProcessor for validating and transforming files based on provider constraints."""
import asyncio
from collections.abc import Sequence
import logging
from crewai_files.core.types import (
AudioFile,
File,
FileInput,
ImageFile,
PDFFile,
TextFile,
VideoFile,
)
from crewai_files.processing.constraints import (
ProviderConstraints,
get_constraints_for_provider,
)
from crewai_files.processing.enums import FileHandling
from crewai_files.processing.exceptions import (
FileProcessingError,
FileTooLargeError,
FileValidationError,
UnsupportedFileTypeError,
)
from crewai_files.processing.transformers import (
chunk_pdf,
chunk_text,
get_image_dimensions,
get_pdf_page_count,
optimize_image,
resize_image,
)
from crewai_files.processing.validators import validate_file
logger = logging.getLogger(__name__)
class FileProcessor:
"""Processes files according to provider constraints and per-file mode mode.
Validates files against provider-specific limits and optionally transforms
them (resize, compress, chunk) to meet those limits. Each file specifies
its own mode mode via `file.mode`.
Attributes:
constraints: Provider constraints for validation.
"""
def __init__(
self,
constraints: ProviderConstraints | str | None = None,
) -> None:
"""Initialize the FileProcessor.
Args:
constraints: Provider constraints or provider name string.
If None, validation is skipped.
"""
if isinstance(constraints, str):
resolved = get_constraints_for_provider(constraints)
if resolved is None:
logger.warning(
f"Unknown provider '{constraints}' - validation disabled"
)
self.constraints = resolved
else:
self.constraints = constraints
def validate(self, file: FileInput) -> Sequence[str]:
"""Validate a file against provider constraints.
Args:
file: The file to validate.
Returns:
List of validation error messages (empty if valid).
Raises:
FileValidationError: If file.mode is STRICT and validation fails.
"""
if self.constraints is None:
return []
mode = self._get_mode(file)
raise_on_error = mode == FileHandling.STRICT
return validate_file(file, self.constraints, raise_on_error=raise_on_error)
@staticmethod
def _get_mode(file: FileInput) -> FileHandling:
"""Get the mode mode for a file.
Args:
file: The file to get mode for.
Returns:
The file's mode mode, defaulting to AUTO.
"""
mode = getattr(file, "mode", None)
if mode is None:
return FileHandling.AUTO
if isinstance(mode, str):
return FileHandling(mode)
if isinstance(mode, FileHandling):
return mode
return FileHandling.AUTO
def process(self, file: FileInput) -> FileInput | Sequence[FileInput]:
"""Process a single file according to constraints and its mode mode.
Args:
file: The file to process.
Returns:
The processed file (possibly transformed) or a sequence of files
if the file was chunked.
Raises:
FileProcessingError: If file.mode is STRICT and processing fails.
"""
if self.constraints is None:
return file
mode = self._get_mode(file)
try:
errors = self.validate(file)
if not errors:
return file
if mode == FileHandling.STRICT:
raise FileValidationError("; ".join(errors), file_name=file.filename)
if mode == FileHandling.WARN:
for error in errors:
logger.warning(error)
return file
if mode == FileHandling.AUTO:
return self._auto_process(file)
if mode == FileHandling.CHUNK:
return self._chunk_process(file)
return file
except (FileValidationError, FileTooLargeError, UnsupportedFileTypeError):
raise
except Exception as e:
logger.error(f"Error processing file '{file.filename}': {e}")
if mode == FileHandling.STRICT:
raise FileProcessingError(str(e), file_name=file.filename) from e
return file
def process_files(
self,
files: dict[str, FileInput],
) -> dict[str, FileInput]:
"""Process multiple files according to constraints.
Args:
files: Dictionary mapping names to file inputs.
Returns:
Dictionary mapping names to processed files. If a file is chunked,
multiple entries are created with indexed names.
"""
result: dict[str, FileInput] = {}
for name, file in files.items():
processed = self.process(file)
if isinstance(processed, Sequence) and not isinstance(
processed, (str, bytes)
):
for i, chunk in enumerate(processed):
chunk_name = f"{name}_chunk_{i}"
result[chunk_name] = chunk
else:
result[name] = processed
return result
async def aprocess_files(
self,
files: dict[str, FileInput],
max_concurrency: int = 10,
) -> dict[str, FileInput]:
"""Async process multiple files in parallel.
Args:
files: Dictionary mapping names to file inputs.
max_concurrency: Maximum number of concurrent processing tasks.
Returns:
Dictionary mapping names to processed files. If a file is chunked,
multiple entries are created with indexed names.
"""
semaphore = asyncio.Semaphore(max_concurrency)
async def process_single(
key: str, input_file: FileInput
) -> tuple[str, FileInput | Sequence[FileInput]]:
"""Process a single file with semaphore limiting."""
async with semaphore:
loop = asyncio.get_running_loop()
result = await loop.run_in_executor(None, self.process, input_file)
return key, result
tasks = [process_single(n, f) for n, f in files.items()]
gather_results = await asyncio.gather(*tasks, return_exceptions=True)
output: dict[str, FileInput] = {}
for item in gather_results:
if isinstance(item, BaseException):
logger.error(f"Processing failed: {item}")
continue
entry_name, processed = item
if isinstance(processed, Sequence) and not isinstance(
processed, (str, bytes)
):
for i, chunk in enumerate(processed):
output[f"{entry_name}_chunk_{i}"] = chunk
elif isinstance(
processed, (AudioFile, File, ImageFile, PDFFile, TextFile, VideoFile)
):
output[entry_name] = processed
return output
def _auto_process(self, file: FileInput) -> FileInput:
"""Automatically resize/compress file to meet constraints.
Args:
file: The file to process.
Returns:
The processed file.
"""
if self.constraints is None:
return file
if isinstance(file, ImageFile) and self.constraints.image is not None:
return self._auto_process_image(file)
if isinstance(file, PDFFile) and self.constraints.pdf is not None:
logger.warning(
f"Cannot auto-compress PDF '{file.filename}'. "
"Consider using CHUNK mode for large PDFs."
)
return file
if isinstance(file, (AudioFile, VideoFile)):
logger.warning(
f"Auto-processing not supported for {type(file).__name__}. "
"File will be used as-is."
)
return file
return file
def _auto_process_image(self, file: ImageFile) -> ImageFile:
"""Auto-process an image file.
Args:
file: The image file to process.
Returns:
The processed image file.
"""
if self.constraints is None or self.constraints.image is None:
return file
image_constraints = self.constraints.image
processed = file
content = file.read()
current_size = len(content)
if image_constraints.max_width or image_constraints.max_height:
dimensions = get_image_dimensions(file)
if dimensions:
width, height = dimensions
max_w = image_constraints.max_width or width
max_h = image_constraints.max_height or height
if width > max_w or height > max_h:
try:
processed = resize_image(file, max_w, max_h)
content = processed.read()
current_size = len(content)
except Exception as e:
logger.warning(f"Failed to resize image: {e}")
if current_size > image_constraints.max_size_bytes:
try:
processed = optimize_image(processed, image_constraints.max_size_bytes)
except Exception as e:
logger.warning(f"Failed to optimize image: {e}")
return processed
def _chunk_process(self, file: FileInput) -> FileInput | Sequence[FileInput]:
"""Split file into chunks to meet constraints.
Args:
file: The file to chunk.
Returns:
Original file if chunking not needed, or sequence of chunked files.
"""
if self.constraints is None:
return file
if isinstance(file, PDFFile) and self.constraints.pdf is not None:
max_pages = self.constraints.pdf.max_pages
if max_pages is not None:
page_count = get_pdf_page_count(file)
if page_count is not None and page_count > max_pages:
try:
return list(chunk_pdf(file, max_pages))
except Exception as e:
logger.warning(f"Failed to chunk PDF: {e}")
return file
if isinstance(file, TextFile):
# Approximation: use the general byte-size limit as the character limit
max_size = self.constraints.general_max_size_bytes
if max_size is not None:
content = file.read()
if len(content) > max_size:
try:
return list(chunk_text(file, max_size))
except Exception as e:
logger.warning(f"Failed to chunk text file: {e}")
return file
if isinstance(file, (ImageFile, AudioFile, VideoFile)):
logger.warning(
f"Chunking not supported for {type(file).__name__}. "
"Consider using AUTO mode for images."
)
return file
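A usage sketch (the processor's module path is assumed from the package layout; file construction follows the FileBytes pattern used throughout this PR):

from crewai_files.core.sources import FileBytes
from crewai_files.core.types import TextFile
from crewai_files.processing.processor import FileProcessor  # module path assumed

processor = FileProcessor("openai")  # known provider name -> validation enabled
report = TextFile(source=FileBytes(data=b"quarterly numbers...", filename="report.txt"))
processed = processor.process_files({"report": report})
print(list(processed))  # ["report"], or ["report_chunk_0", ...] if chunked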

View File

@@ -0,0 +1,336 @@
"""File transformation functions for resizing, optimizing, and chunking."""
from collections.abc import Iterator
import io
import logging
from crewai_files.core.sources import FileBytes
from crewai_files.core.types import ImageFile, PDFFile, TextFile
from crewai_files.processing.exceptions import ProcessingDependencyError
logger = logging.getLogger(__name__)
def resize_image(
file: ImageFile,
max_width: int,
max_height: int,
*,
preserve_aspect_ratio: bool = True,
) -> ImageFile:
"""Resize an image to fit within the specified dimensions.
Args:
file: The image file to resize.
max_width: Maximum width in pixels.
max_height: Maximum height in pixels.
preserve_aspect_ratio: If True, maintain aspect ratio while fitting within bounds.
Returns:
A new ImageFile with the resized image data.
Raises:
ProcessingDependencyError: If Pillow is not installed.
"""
try:
from PIL import Image
except ImportError as e:
raise ProcessingDependencyError(
"Pillow is required for image resizing",
dependency="Pillow",
install_command="pip install Pillow",
) from e
content = file.read()
with Image.open(io.BytesIO(content)) as img:
original_width, original_height = img.size
if original_width <= max_width and original_height <= max_height:
return file
if preserve_aspect_ratio:
width_ratio = max_width / original_width
height_ratio = max_height / original_height
scale_factor = min(width_ratio, height_ratio)
new_width = int(original_width * scale_factor)
new_height = int(original_height * scale_factor)
else:
new_width = min(original_width, max_width)
new_height = min(original_height, max_height)
resized_img = img.resize((new_width, new_height), Image.Resampling.LANCZOS)
output_format = img.format or "PNG"
if output_format.upper() == "JPEG":
if resized_img.mode in ("RGBA", "LA", "P"):
resized_img = resized_img.convert("RGB")
output_buffer = io.BytesIO()
resized_img.save(output_buffer, format=output_format)
output_bytes = output_buffer.getvalue()
logger.info(
f"Resized image '{file.filename}' from {original_width}x{original_height} "
f"to {new_width}x{new_height}"
)
return ImageFile(source=FileBytes(data=output_bytes, filename=file.filename))
def optimize_image(
file: ImageFile,
target_size_bytes: int,
*,
min_quality: int = 20,
initial_quality: int = 85,
) -> ImageFile:
"""Optimize an image to fit within a target file size.
Uses iterative quality reduction to achieve target size.
Args:
file: The image file to optimize.
target_size_bytes: Target maximum file size in bytes.
min_quality: Minimum quality to use (prevents excessive degradation).
initial_quality: Starting quality for optimization.
Returns:
A new ImageFile with the optimized image data.
Raises:
ProcessingDependencyError: If Pillow is not installed.
"""
try:
from PIL import Image
except ImportError as e:
raise ProcessingDependencyError(
"Pillow is required for image optimization",
dependency="Pillow",
install_command="pip install Pillow",
) from e
content = file.read()
current_size = len(content)
if current_size <= target_size_bytes:
return file
with Image.open(io.BytesIO(content)) as img:
if img.mode in ("RGBA", "LA", "P"):
img = img.convert("RGB")
output_format = "JPEG"
else:
output_format = img.format or "JPEG"
if output_format.upper() not in ("JPEG", "JPG"):
output_format = "JPEG"
quality = initial_quality
output_bytes = content
while len(output_bytes) > target_size_bytes and quality >= min_quality:
output_buffer = io.BytesIO()
img.save(
output_buffer, format=output_format, quality=quality, optimize=True
)
output_bytes = output_buffer.getvalue()
if len(output_bytes) > target_size_bytes:
quality -= 5
logger.info(
f"Optimized image '{file.filename}' from {current_size} bytes to "
f"{len(output_bytes)} bytes (quality={quality})"
)
filename = file.filename
if (
filename
and output_format.upper() == "JPEG"
and not filename.lower().endswith((".jpg", ".jpeg"))
):
filename = filename.rsplit(".", 1)[0] + ".jpg"
return ImageFile(source=FileBytes(data=output_bytes, filename=filename))
def chunk_pdf(
file: PDFFile,
max_pages: int,
*,
overlap_pages: int = 0,
) -> Iterator[PDFFile]:
"""Split a PDF into chunks of maximum page count.
Yields chunks one at a time to minimize memory usage.
Args:
file: The PDF file to chunk.
max_pages: Maximum pages per chunk.
overlap_pages: Number of overlapping pages between chunks (for context).
Yields:
PDFFile objects, one per chunk.
Raises:
ProcessingDependencyError: If pypdf is not installed.
"""
try:
from pypdf import PdfReader, PdfWriter
except ImportError as e:
raise ProcessingDependencyError(
"pypdf is required for PDF chunking",
dependency="pypdf",
install_command="pip install pypdf",
) from e
content = file.read()
reader = PdfReader(io.BytesIO(content))
total_pages = len(reader.pages)
if total_pages <= max_pages:
yield file
return
filename = file.filename or "document.pdf"
base_filename = filename.rsplit(".", 1)[0]
step = max(1, max_pages - overlap_pages)  # guard: overlap_pages >= max_pages would stall the loop
chunk_num = 0
start_page = 0
while start_page < total_pages:
end_page = min(start_page + max_pages, total_pages)
writer = PdfWriter()
for page_num in range(start_page, end_page):
writer.add_page(reader.pages[page_num])
output_buffer = io.BytesIO()
writer.write(output_buffer)
output_bytes = output_buffer.getvalue()
chunk_filename = f"{base_filename}_chunk_{chunk_num}.pdf"
logger.info(
f"Created PDF chunk '{chunk_filename}' with pages {start_page + 1}-{end_page}"
)
yield PDFFile(source=FileBytes(data=output_bytes, filename=chunk_filename))
start_page += step
chunk_num += 1
def chunk_text(
file: TextFile,
max_chars: int,
*,
overlap_chars: int = 200,
split_on_newlines: bool = True,
) -> Iterator[TextFile]:
"""Split a text file into chunks of maximum character count.
Yields chunks one at a time to minimize memory usage.
Args:
file: The text file to chunk.
max_chars: Maximum characters per chunk.
overlap_chars: Number of overlapping characters between chunks.
split_on_newlines: If True, prefer splitting at newline boundaries.
Yields:
TextFile objects, one per chunk.
"""
content = file.read()
text = content.decode(errors="replace")
total_chars = len(text)
if total_chars <= max_chars:
yield file
return
filename = file.filename or "text.txt"
base_filename = filename.rsplit(".", 1)[0]
extension = filename.rsplit(".", 1)[-1] if "." in filename else "txt"
chunk_num = 0
start_pos = 0
while start_pos < total_chars:
end_pos = min(start_pos + max_chars, total_chars)
if end_pos < total_chars and split_on_newlines:
last_newline = text.rfind("\n", start_pos, end_pos)
if last_newline > start_pos + max_chars // 2:
end_pos = last_newline + 1
chunk_content = text[start_pos:end_pos]
chunk_bytes = chunk_content.encode()
chunk_filename = f"{base_filename}_chunk_{chunk_num}.{extension}"
logger.info(
f"Created text chunk '{chunk_filename}' with {len(chunk_content)} characters"
)
yield TextFile(source=FileBytes(data=chunk_bytes, filename=chunk_filename))
if end_pos < total_chars:
start_pos = max(start_pos + 1, end_pos - overlap_chars)
else:
start_pos = total_chars
chunk_num += 1
def get_image_dimensions(file: ImageFile) -> tuple[int, int] | None:
"""Get the dimensions of an image file.
Args:
file: The image file to measure.
Returns:
Tuple of (width, height) in pixels, or None if dimensions cannot be determined.
"""
try:
from PIL import Image
except ImportError:
logger.warning("Pillow not installed - cannot get image dimensions")
return None
content = file.read()
try:
with Image.open(io.BytesIO(content)) as img:
width, height = img.size
return width, height
except Exception as e:
logger.warning(f"Failed to get image dimensions: {e}")
return None
def get_pdf_page_count(file: PDFFile) -> int | None:
"""Get the page count of a PDF file.
Args:
file: The PDF file to measure.
Returns:
Number of pages, or None if page count cannot be determined.
"""
try:
from pypdf import PdfReader
except ImportError:
logger.warning("pypdf not installed - cannot get PDF page count")
return None
content = file.read()
try:
reader = PdfReader(io.BytesIO(content))
return len(reader.pages)
except Exception as e:
logger.warning(f"Failed to get PDF page count: {e}")
return None
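The chunkers are generators, so a large file can be split without materializing every chunk at once. A quick sketch:

from crewai_files.core.sources import FileBytes
from crewai_files.core.types import TextFile
from crewai_files.processing.transformers import chunk_text

log = TextFile(source=FileBytes(data=b"line\n" * 100_000, filename="run.log"))
for chunk in chunk_text(log, max_chars=50_000, overlap_chars=500):
    print(chunk.filename, len(chunk.read()))  # run_chunk_0.log, run_chunk_1.log, ...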

View File

@@ -0,0 +1,564 @@
"""File validation functions for checking against provider constraints."""
from collections.abc import Sequence
import io
import logging
from crewai_files.core.types import (
AudioFile,
FileInput,
ImageFile,
PDFFile,
TextFile,
VideoFile,
)
from crewai_files.processing.constraints import (
AudioConstraints,
ImageConstraints,
PDFConstraints,
ProviderConstraints,
VideoConstraints,
)
from crewai_files.processing.exceptions import (
FileTooLargeError,
FileValidationError,
UnsupportedFileTypeError,
)
logger = logging.getLogger(__name__)
def _get_image_dimensions(content: bytes) -> tuple[int, int] | None:
"""Get image dimensions using Pillow if available.
Args:
content: Raw image bytes.
Returns:
Tuple of (width, height) or None if Pillow unavailable.
"""
try:
from PIL import Image
with Image.open(io.BytesIO(content)) as img:
width, height = img.size
return int(width), int(height)
except ImportError:
logger.warning(
"Pillow not installed - cannot validate image dimensions. "
"Install with: pip install Pillow"
)
return None
def _get_pdf_page_count(content: bytes) -> int | None:
"""Get PDF page count using pypdf if available.
Args:
content: Raw PDF bytes.
Returns:
Page count or None if pypdf unavailable.
"""
try:
from pypdf import PdfReader
reader = PdfReader(io.BytesIO(content))
return len(reader.pages)
except ImportError:
logger.warning(
"pypdf not installed - cannot validate PDF page count. "
"Install with: pip install pypdf"
)
return None
def _get_audio_duration(content: bytes, filename: str | None = None) -> float | None:
"""Get audio duration in seconds using tinytag if available.
Args:
content: Raw audio bytes.
filename: Optional filename for format detection hint.
Returns:
Duration in seconds or None if tinytag unavailable.
"""
try:
from tinytag import TinyTag # type: ignore[import-untyped]
except ImportError:
logger.warning(
"tinytag not installed - cannot validate audio duration. "
"Install with: pip install tinytag"
)
return None
try:
tag = TinyTag.get(file_obj=io.BytesIO(content), filename=filename)
duration: float | None = tag.duration
return duration
except Exception as e:
logger.debug(f"Could not determine audio duration: {e}")
return None
_VIDEO_FORMAT_MAP: dict[str, str] = {
"video/mp4": "mp4",
"video/webm": "webm",
"video/x-matroska": "matroska",
"video/quicktime": "mov",
"video/x-msvideo": "avi",
"video/x-flv": "flv",
}
def _get_video_duration(
content: bytes, content_type: str | None = None
) -> float | None:
"""Get video duration in seconds using av if available.
Args:
content: Raw video bytes.
content_type: Optional MIME type for format detection hint.
Returns:
Duration in seconds or None if av unavailable.
"""
try:
import av
except ImportError:
logger.warning(
"av (PyAV) not installed - cannot validate video duration. "
"Install with: pip install av"
)
return None
format_hint = _VIDEO_FORMAT_MAP.get(content_type) if content_type else None
try:
with av.open(io.BytesIO(content), format=format_hint) as container: # type: ignore[attr-defined]
duration: int | None = container.duration # type: ignore[union-attr]
if duration is None:
return None
return float(duration) / 1_000_000
except Exception as e:
logger.debug(f"Could not determine video duration: {e}")
return None
def _format_size(size_bytes: int) -> str:
"""Format byte size to human-readable string."""
if size_bytes >= 1024 * 1024 * 1024:
return f"{size_bytes / (1024 * 1024 * 1024):.1f}GB"
if size_bytes >= 1024 * 1024:
return f"{size_bytes / (1024 * 1024):.1f}MB"
if size_bytes >= 1024:
return f"{size_bytes / 1024:.1f}KB"
return f"{size_bytes}B"
def _validate_size(
file_type: str,
filename: str | None,
file_size: int,
max_size: int,
errors: list[str],
raise_on_error: bool,
) -> None:
"""Validate file size against maximum.
Args:
file_type: Type label for error messages (e.g., "Image", "PDF").
filename: Name of the file being validated.
file_size: Actual file size in bytes.
max_size: Maximum allowed size in bytes.
errors: List to append error messages to.
raise_on_error: If True, raise FileTooLargeError on failure.
"""
if file_size > max_size:
msg = (
f"{file_type} '{filename}' size ({_format_size(file_size)}) exceeds "
f"maximum ({_format_size(max_size)})"
)
errors.append(msg)
if raise_on_error:
raise FileTooLargeError(
msg,
file_name=filename,
actual_size=file_size,
max_size=max_size,
)
def _validate_format(
file_type: str,
filename: str | None,
content_type: str,
supported_formats: tuple[str, ...],
errors: list[str],
raise_on_error: bool,
) -> None:
"""Validate content type against supported formats.
Args:
file_type: Type label for error messages (e.g., "Image", "Audio").
filename: Name of the file being validated.
content_type: MIME type of the file.
supported_formats: Tuple of supported MIME types.
errors: List to append error messages to.
raise_on_error: If True, raise UnsupportedFileTypeError on failure.
"""
if content_type not in supported_formats:
msg = (
f"{file_type} format '{content_type}' is not supported. "
f"Supported: {', '.join(supported_formats)}"
)
errors.append(msg)
if raise_on_error:
raise UnsupportedFileTypeError(
msg, file_name=filename, content_type=content_type
)
def validate_image(
file: ImageFile,
constraints: ImageConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate an image file against constraints.
Args:
file: The image file to validate.
constraints: Image constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
FileValidationError: If the file exceeds dimension limits.
UnsupportedFileTypeError: If the format is not supported.
"""
errors: list[str] = []
content = file.read()
file_size = len(content)
filename = file.filename
_validate_size(
"Image", filename, file_size, constraints.max_size_bytes, errors, raise_on_error
)
_validate_format(
"Image",
filename,
file.content_type,
constraints.supported_formats,
errors,
raise_on_error,
)
if constraints.max_width is not None or constraints.max_height is not None:
dimensions = _get_image_dimensions(content)
if dimensions is not None:
width, height = dimensions
if constraints.max_width and width > constraints.max_width:
msg = (
f"Image '{filename}' width ({width}px) exceeds "
f"maximum ({constraints.max_width}px)"
)
errors.append(msg)
if raise_on_error:
raise FileValidationError(msg, file_name=filename)
if constraints.max_height and height > constraints.max_height:
msg = (
f"Image '{filename}' height ({height}px) exceeds "
f"maximum ({constraints.max_height}px)"
)
errors.append(msg)
if raise_on_error:
raise FileValidationError(msg, file_name=filename)
return errors
def validate_pdf(
file: PDFFile,
constraints: PDFConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate a PDF file against constraints.
Args:
file: The PDF file to validate.
constraints: PDF constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
FileValidationError: If the file exceeds page limits.
"""
errors: list[str] = []
content = file.read()
file_size = len(content)
filename = file.filename
_validate_size(
"PDF", filename, file_size, constraints.max_size_bytes, errors, raise_on_error
)
if constraints.max_pages is not None:
page_count = _get_pdf_page_count(content)
if page_count is not None and page_count > constraints.max_pages:
msg = (
f"PDF '{filename}' page count ({page_count}) exceeds "
f"maximum ({constraints.max_pages})"
)
errors.append(msg)
if raise_on_error:
raise FileValidationError(msg, file_name=filename)
return errors
def validate_audio(
file: AudioFile,
constraints: AudioConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate an audio file against constraints.
Args:
file: The audio file to validate.
constraints: Audio constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
FileValidationError: If the file exceeds duration limits.
UnsupportedFileTypeError: If the format is not supported.
"""
errors: list[str] = []
content = file.read()
file_size = len(content)
filename = file.filename
_validate_size(
"Audio",
filename,
file_size,
constraints.max_size_bytes,
errors,
raise_on_error,
)
_validate_format(
"Audio",
filename,
file.content_type,
constraints.supported_formats,
errors,
raise_on_error,
)
if constraints.max_duration_seconds is not None:
duration = _get_audio_duration(content, filename)
if duration is not None and duration > constraints.max_duration_seconds:
msg = (
f"Audio '{filename}' duration ({duration:.1f}s) exceeds "
f"maximum ({constraints.max_duration_seconds}s)"
)
errors.append(msg)
if raise_on_error:
raise FileValidationError(msg, file_name=filename)
return errors
def validate_video(
file: VideoFile,
constraints: VideoConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate a video file against constraints.
Args:
file: The video file to validate.
constraints: Video constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
FileValidationError: If the file exceeds duration limits.
UnsupportedFileTypeError: If the format is not supported.
"""
errors: list[str] = []
content = file.read()
file_size = len(content)
filename = file.filename
_validate_size(
"Video",
filename,
file_size,
constraints.max_size_bytes,
errors,
raise_on_error,
)
_validate_format(
"Video",
filename,
file.content_type,
constraints.supported_formats,
errors,
raise_on_error,
)
if constraints.max_duration_seconds is not None:
duration = _get_video_duration(content)
if duration is not None and duration > constraints.max_duration_seconds:
msg = (
f"Video '{filename}' duration ({duration:.1f}s) exceeds "
f"maximum ({constraints.max_duration_seconds}s)"
)
errors.append(msg)
if raise_on_error:
raise FileValidationError(msg, file_name=filename)
return errors
def validate_text(
file: TextFile,
constraints: ProviderConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate a text file against general constraints.
Args:
file: The text file to validate.
constraints: Provider constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
"""
errors: list[str] = []
if constraints.general_max_size_bytes is None:
return errors
file_size = len(file.read())
_validate_size(
"Text file",
file.filename,
file_size,
constraints.general_max_size_bytes,
errors,
raise_on_error,
)
return errors
def _check_unsupported_type(
file: FileInput,
provider_name: str,
type_name: str,
raise_on_error: bool,
) -> Sequence[str]:
"""Check if file type is unsupported and handle error.
Args:
file: The file being validated.
provider_name: Name of the provider.
type_name: Name of the file type (e.g., "images", "PDFs").
raise_on_error: If True, raise exception instead of returning errors.
Returns:
List containing the error message (only returned when raise_on_error is False).
Raises:
UnsupportedFileTypeError: If raise_on_error is True.
"""
msg = f"Provider '{provider_name}' does not support {type_name}"
if raise_on_error:
raise UnsupportedFileTypeError(
msg, file_name=file.filename, content_type=file.content_type
)
return [msg]
def validate_file(
file: FileInput,
constraints: ProviderConstraints,
*,
raise_on_error: bool = True,
) -> Sequence[str]:
"""Validate a file against provider constraints.
Dispatches to the appropriate validator based on file type.
Args:
file: The file to validate.
constraints: Provider constraints to validate against.
raise_on_error: If True, raise exceptions on validation failure.
Returns:
List of validation error messages (empty if valid).
Raises:
FileTooLargeError: If the file exceeds size limits.
FileValidationError: If the file fails other validation checks.
UnsupportedFileTypeError: If the file type is not supported.
"""
if isinstance(file, ImageFile):
if constraints.image is None:
return _check_unsupported_type(
file, constraints.name, "images", raise_on_error
)
return validate_image(file, constraints.image, raise_on_error=raise_on_error)
if isinstance(file, PDFFile):
if constraints.pdf is None:
return _check_unsupported_type(
file, constraints.name, "PDFs", raise_on_error
)
return validate_pdf(file, constraints.pdf, raise_on_error=raise_on_error)
if isinstance(file, AudioFile):
if constraints.audio is None:
return _check_unsupported_type(
file, constraints.name, "audio", raise_on_error
)
return validate_audio(file, constraints.audio, raise_on_error=raise_on_error)
if isinstance(file, VideoFile):
if constraints.video is None:
return _check_unsupported_type(
file, constraints.name, "video", raise_on_error
)
return validate_video(file, constraints.video, raise_on_error=raise_on_error)
if isinstance(file, TextFile):
return validate_text(file, constraints, raise_on_error=raise_on_error)
return []
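With raise_on_error=False the validators collect messages instead of raising, which is what FileProcessor relies on for WARN and AUTO handling. A minimal sketch:

from crewai_files.core.sources import FileBytes
from crewai_files.core.types import TextFile
from crewai_files.processing.constraints import get_constraints_for_provider
from crewai_files.processing.validators import validate_file

notes = TextFile(source=FileBytes(data=b"hello", filename="notes.txt"))
constraints = get_constraints_for_provider("anthropic")
if constraints is not None:
    errors = validate_file(notes, constraints, raise_on_error=False)
    print(errors)  # [] when the file is within limits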

View File

@@ -0,0 +1,16 @@
"""File resolution logic."""
from crewai_files.resolution.resolver import FileResolver
from crewai_files.resolution.utils import (
is_file_source,
normalize_input_files,
wrap_file_source,
)
__all__ = [
"FileResolver",
"is_file_source",
"normalize_input_files",
"wrap_file_source",
]

View File

@@ -0,0 +1,670 @@
"""FileResolver for deciding file delivery method and managing uploads."""
import asyncio
import base64
from dataclasses import dataclass, field
import hashlib
import logging
from crewai_files.cache.metrics import measure_operation
from crewai_files.cache.upload_cache import CachedUpload, UploadCache
from crewai_files.core.constants import UPLOAD_MAX_RETRIES, UPLOAD_RETRY_DELAY_BASE
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFile,
UrlReference,
)
from crewai_files.core.sources import FileUrl
from crewai_files.core.types import FileInput
from crewai_files.processing.constraints import (
AudioConstraints,
ImageConstraints,
PDFConstraints,
ProviderConstraints,
VideoConstraints,
get_constraints_for_provider,
)
from crewai_files.uploaders import UploadResult, get_uploader
from crewai_files.uploaders.base import FileUploader
from crewai_files.uploaders.factory import ProviderType
logger = logging.getLogger(__name__)
@dataclass
class FileContext:
"""Cached file metadata to avoid redundant reads.
Attributes:
content: Raw file bytes.
size: Size of the file in bytes.
content_hash: SHA-256 hash of the file content.
content_type: MIME type of the file.
"""
content: bytes
size: int
content_hash: str
content_type: str
@dataclass
class FileResolverConfig:
"""Configuration for FileResolver.
Attributes:
prefer_upload: If True, prefer uploading over inline for supported providers.
upload_threshold_bytes: Size threshold above which to use upload.
If None, uses provider-specific threshold.
use_bytes_for_bedrock: If True, use raw bytes instead of base64 for Bedrock.
"""
prefer_upload: bool = False
upload_threshold_bytes: int | None = None
use_bytes_for_bedrock: bool = True
@dataclass
class FileResolver:
"""Resolves files to their delivery format based on provider capabilities.
Decides whether to use inline base64, raw bytes, or file upload based on:
- Provider constraints and capabilities
- File size
- Configuration preferences
Caches uploaded files to avoid redundant uploads.
Attributes:
config: Resolver configuration.
upload_cache: Cache for tracking uploaded files.
"""
config: FileResolverConfig = field(default_factory=FileResolverConfig)
upload_cache: UploadCache | None = None
_uploaders: dict[str, FileUploader] = field(default_factory=dict)
@staticmethod
def _build_file_context(file: FileInput) -> FileContext:
"""Build context by reading file once.
Args:
file: The file to build context for.
Returns:
FileContext with cached metadata.
"""
content = file.read()
return FileContext(
content=content,
size=len(content),
content_hash=hashlib.sha256(content).hexdigest(),
content_type=file.content_type,
)
@staticmethod
def _is_url_source(file: FileInput) -> bool:
"""Check if file source is a URL.
Args:
file: The file to check.
Returns:
True if the file source is a FileUrl, False otherwise.
"""
return isinstance(file._file_source, FileUrl)
@staticmethod
def _supports_url(constraints: ProviderConstraints | None) -> bool:
"""Check if provider supports URL references.
Args:
constraints: Provider constraints.
Returns:
True if the provider supports URL references, False otherwise.
"""
return constraints is not None and constraints.supports_url_references
@staticmethod
def _resolve_as_url(file: FileInput) -> UrlReference:
"""Resolve a URL source as UrlReference.
Args:
file: The file with URL source.
Returns:
UrlReference with the URL and content type.
"""
source = file._file_source
if not isinstance(source, FileUrl):
raise TypeError(f"Expected FileUrl source, got {type(source).__name__}")
return UrlReference(
content_type=file.content_type,
url=source.url,
)
def resolve(self, file: FileInput, provider: ProviderType) -> ResolvedFile:
"""Resolve a file to its delivery format for a provider.
Args:
file: The file to resolve.
provider: Provider name (e.g., "gemini", "anthropic", "openai").
Returns:
ResolvedFile representing the appropriate delivery format.
"""
constraints = get_constraints_for_provider(provider)
if self._is_url_source(file) and self._supports_url(constraints):
return self._resolve_as_url(file)
context = self._build_file_context(file)
should_upload = self._should_upload(file, provider, constraints, context.size)
if should_upload:
resolved = self._resolve_via_upload(file, provider, context)
if resolved is not None:
return resolved
return self._resolve_inline(file, provider, context)
def resolve_files(
self,
files: dict[str, FileInput],
provider: ProviderType,
) -> dict[str, ResolvedFile]:
"""Resolve multiple files for a provider.
Args:
files: Dictionary mapping names to file inputs.
provider: Provider name.
Returns:
Dictionary mapping names to resolved files.
"""
return {name: self.resolve(file, provider) for name, file in files.items()}
@staticmethod
def _get_type_constraint(
content_type: str,
constraints: ProviderConstraints,
) -> ImageConstraints | PDFConstraints | AudioConstraints | VideoConstraints | None:
"""Get type-specific constraint based on content type.
Args:
content_type: MIME type of the file.
constraints: Provider constraints.
Returns:
Type-specific constraint or None if not found.
"""
if content_type.startswith("image/"):
return constraints.image
if content_type == "application/pdf":
return constraints.pdf
if content_type.startswith("audio/"):
return constraints.audio
if content_type.startswith("video/"):
return constraints.video
return None
def _should_upload(
self,
file: FileInput,
provider: str,
constraints: ProviderConstraints | None,
file_size: int,
) -> bool:
"""Determine if a file should be uploaded rather than inlined.
Uses type-specific constraints to make smarter decisions:
- Checks if file exceeds type-specific inline size limits
- Falls back to general threshold if no type-specific constraint
Args:
file: The file to check.
provider: Provider name.
constraints: Provider constraints.
file_size: Size of the file in bytes.
Returns:
True if the file should be uploaded, False otherwise.
"""
if constraints is None or not constraints.supports_file_upload:
return False
if self.config.prefer_upload:
return True
content_type = file.content_type
type_constraint = self._get_type_constraint(content_type, constraints)
if type_constraint is not None:
# Check if file exceeds type-specific inline limit
if file_size > type_constraint.max_size_bytes:
logger.debug(
f"File {file.filename} ({file_size}B) exceeds {content_type} "
f"inline limit ({type_constraint.max_size_bytes}B) for {provider}"
)
return True
# Fall back to general threshold
threshold = self.config.upload_threshold_bytes
if threshold is None:
threshold = constraints.file_upload_threshold_bytes
if threshold is not None and file_size > threshold:
return True
return False
def _resolve_via_upload(
self,
file: FileInput,
provider: ProviderType,
context: FileContext,
) -> ResolvedFile | None:
"""Resolve a file by uploading it.
Args:
file: The file to upload.
provider: Provider name.
context: Pre-computed file context.
Returns:
FileReference if upload succeeds, None otherwise.
"""
if self.upload_cache is not None:
cached = self.upload_cache.get_by_hash(context.content_hash, provider)
if cached is not None:
logger.debug(
f"Using cached upload for {file.filename}: {cached.file_id}"
)
return FileReference(
content_type=cached.content_type,
file_id=cached.file_id,
provider=cached.provider,
expires_at=cached.expires_at,
file_uri=cached.file_uri,
)
uploader = self._get_uploader(provider)
if uploader is None:
logger.debug(f"No uploader available for {provider}")
return None
result = self._upload_with_retry(uploader, file, provider, context.size)
if result is None:
return None
if self.upload_cache is not None:
self.upload_cache.set_by_hash(
file_hash=context.content_hash,
content_type=context.content_type,
provider=provider,
file_id=result.file_id,
file_uri=result.file_uri,
expires_at=result.expires_at,
)
return FileReference(
content_type=result.content_type,
file_id=result.file_id,
provider=result.provider,
expires_at=result.expires_at,
file_uri=result.file_uri,
)
@staticmethod
def _upload_with_retry(
uploader: FileUploader,
file: FileInput,
provider: str,
file_size: int,
) -> UploadResult | None:
"""Upload with exponential backoff retry.
Args:
uploader: The uploader to use.
file: The file to upload.
provider: Provider name for logging.
file_size: Size of the file in bytes.
Returns:
UploadResult if successful, None otherwise.
"""
import time
from crewai_files.processing.exceptions import (
PermanentUploadError,
TransientUploadError,
)
last_error: Exception | None = None
for attempt in range(UPLOAD_MAX_RETRIES):
with measure_operation(
"upload",
filename=file.filename,
provider=provider,
size_bytes=file_size,
attempt=attempt + 1,
) as metrics:
try:
result = uploader.upload(file)
metrics.metadata["file_id"] = result.file_id
return result
except PermanentUploadError as e:
metrics.metadata["error_type"] = "permanent"
logger.warning(
f"Non-retryable upload error for {file.filename}: {e}"
)
return None
except TransientUploadError as e:
metrics.metadata["error_type"] = "transient"
last_error = e
except Exception as e:
metrics.metadata["error_type"] = "unknown"
last_error = e
if attempt < UPLOAD_MAX_RETRIES - 1:
delay = UPLOAD_RETRY_DELAY_BASE**attempt
logger.debug(
f"Retrying upload for {file.filename} in {delay}s (attempt {attempt + 1})"
)
time.sleep(delay)
logger.warning(
f"Upload failed for {file.filename} to {provider} after {UPLOAD_MAX_RETRIES} attempts: {last_error}"
)
return None
def _resolve_inline(
self,
file: FileInput,
provider: str,
context: FileContext,
) -> ResolvedFile:
"""Resolve a file as inline content.
Args:
file: The file to resolve (used for logging).
provider: Provider name.
context: Pre-computed file context.
Returns:
InlineBase64 or InlineBytes depending on provider.
"""
logger.debug(f"Resolving {file.filename} as inline for {provider}")
if self.config.use_bytes_for_bedrock and "bedrock" in provider:
return InlineBytes(
content_type=context.content_type,
data=context.content,
)
encoded = base64.b64encode(context.content).decode("ascii")
return InlineBase64(
content_type=context.content_type,
data=encoded,
)
async def aresolve(self, file: FileInput, provider: ProviderType) -> ResolvedFile:
"""Async resolve a file to its delivery format for a provider.
Args:
file: The file to resolve.
provider: Provider name (e.g., "gemini", "anthropic", "openai").
Returns:
ResolvedFile representing the appropriate delivery format.
"""
constraints = get_constraints_for_provider(provider)
if self._is_url_source(file) and self._supports_url(constraints):
return self._resolve_as_url(file)
context = self._build_file_context(file)
should_upload = self._should_upload(file, provider, constraints, context.size)
if should_upload:
resolved = await self._aresolve_via_upload(file, provider, context)
if resolved is not None:
return resolved
return self._resolve_inline(file, provider, context)
async def aresolve_files(
self,
files: dict[str, FileInput],
provider: ProviderType,
max_concurrency: int = 10,
) -> dict[str, ResolvedFile]:
"""Async resolve multiple files in parallel.
Args:
files: Dictionary mapping names to file inputs.
provider: Provider name.
max_concurrency: Maximum number of concurrent resolutions.
Returns:
Dictionary mapping names to resolved files.
"""
semaphore = asyncio.Semaphore(max_concurrency)
async def resolve_single(
entry_key: str, input_file: FileInput
) -> tuple[str, ResolvedFile]:
"""Resolve a single file with semaphore limiting."""
async with semaphore:
entry_resolved = await self.aresolve(input_file, provider)
return entry_key, entry_resolved
tasks = [resolve_single(n, f) for n, f in files.items()]
gather_results = await asyncio.gather(*tasks, return_exceptions=True)
output: dict[str, ResolvedFile] = {}
for item in gather_results:
if isinstance(item, BaseException):
logger.error(f"Resolution failed: {item}")
continue
key, resolved = item
output[key] = resolved
return output
async def _aresolve_via_upload(
self,
file: FileInput,
provider: ProviderType,
context: FileContext,
) -> ResolvedFile | None:
"""Async resolve a file by uploading it.
Args:
file: The file to upload.
provider: Provider name.
context: Pre-computed file context.
Returns:
FileReference if upload succeeds, None otherwise.
"""
if self.upload_cache is not None:
cached = await self.upload_cache.aget_by_hash(
context.content_hash, provider
)
if cached is not None:
logger.debug(
f"Using cached upload for {file.filename}: {cached.file_id}"
)
return FileReference(
content_type=cached.content_type,
file_id=cached.file_id,
provider=cached.provider,
expires_at=cached.expires_at,
file_uri=cached.file_uri,
)
uploader = self._get_uploader(provider)
if uploader is None:
logger.debug(f"No uploader available for {provider}")
return None
result = await self._aupload_with_retry(uploader, file, provider, context.size)
if result is None:
return None
if self.upload_cache is not None:
await self.upload_cache.aset_by_hash(
file_hash=context.content_hash,
content_type=context.content_type,
provider=provider,
file_id=result.file_id,
file_uri=result.file_uri,
expires_at=result.expires_at,
)
return FileReference(
content_type=result.content_type,
file_id=result.file_id,
provider=result.provider,
expires_at=result.expires_at,
file_uri=result.file_uri,
)
@staticmethod
async def _aupload_with_retry(
uploader: FileUploader,
file: FileInput,
provider: str,
file_size: int,
) -> UploadResult | None:
"""Async upload with exponential backoff retry.
Args:
uploader: The uploader to use.
file: The file to upload.
provider: Provider name for logging.
file_size: Size of the file in bytes.
Returns:
UploadResult if successful, None otherwise.
"""
from crewai_files.processing.exceptions import (
PermanentUploadError,
TransientUploadError,
)
last_error: Exception | None = None
for attempt in range(UPLOAD_MAX_RETRIES):
with measure_operation(
"upload",
filename=file.filename,
provider=provider,
size_bytes=file_size,
attempt=attempt + 1,
) as metrics:
try:
result = await uploader.aupload(file)
metrics.metadata["file_id"] = result.file_id
return result
except PermanentUploadError as e:
metrics.metadata["error_type"] = "permanent"
logger.warning(
f"Non-retryable upload error for {file.filename}: {e}"
)
return None
except TransientUploadError as e:
metrics.metadata["error_type"] = "transient"
last_error = e
except Exception as e:
metrics.metadata["error_type"] = "unknown"
last_error = e
if attempt < UPLOAD_MAX_RETRIES - 1:
delay = UPLOAD_RETRY_DELAY_BASE**attempt
logger.debug(
f"Retrying upload for {file.filename} in {delay}s (attempt {attempt + 1})"
)
await asyncio.sleep(delay)
logger.warning(
f"Upload failed for {file.filename} to {provider} after {UPLOAD_MAX_RETRIES} attempts: {last_error}"
)
return None
def _get_uploader(self, provider: ProviderType) -> FileUploader | None:
"""Get or create an uploader for a provider.
Args:
provider: Provider name.
Returns:
FileUploader instance or None if not available.
"""
if provider not in self._uploaders:
uploader = get_uploader(provider)
if uploader is not None:
self._uploaders[provider] = uploader
else:
return None
return self._uploaders.get(provider)
def get_cached_uploads(self, provider: ProviderType) -> list[CachedUpload]:
"""Get all cached uploads for a provider.
Args:
provider: Provider name.
Returns:
List of cached uploads.
"""
if self.upload_cache is None:
return []
return self.upload_cache.get_all_for_provider(provider)
def clear_cache(self) -> None:
"""Clear the upload cache."""
if self.upload_cache is not None:
self.upload_cache.clear()
def create_resolver(
provider: str | None = None,
prefer_upload: bool = False,
upload_threshold_bytes: int | None = None,
enable_cache: bool = True,
) -> FileResolver:
"""Create a configured FileResolver.
Args:
provider: Optional provider name to load default threshold from constraints.
prefer_upload: Whether to prefer upload over inline.
upload_threshold_bytes: Size threshold for using upload. If None and
provider is specified, uses provider's default threshold.
enable_cache: Whether to enable upload caching.
Returns:
Configured FileResolver instance.
"""
threshold = upload_threshold_bytes
if threshold is None and provider is not None:
constraints = get_constraints_for_provider(provider)
if constraints is not None:
threshold = constraints.file_upload_threshold_bytes
config = FileResolverConfig(
prefer_upload=prefer_upload,
upload_threshold_bytes=threshold,
)
cache = UploadCache() if enable_cache else None
return FileResolver(config=config, upload_cache=cache)
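Putting it together: a resolver is created once per provider and reused, and because cached uploads are keyed by content hash, identical bytes upload only once. A sketch (the PDF path is illustrative):

from pathlib import Path
from crewai_files.core.resolved import FileReference, InlineBase64
from crewai_files.core.sources import FilePath
from crewai_files.core.types import PDFFile
from crewai_files.resolution.resolver import create_resolver

resolver = create_resolver(provider="openai", prefer_upload=True)
report = PDFFile(source=FilePath(path=Path("report.pdf")))
resolved = resolver.resolve(report, "openai")
if isinstance(resolved, FileReference):
    print(resolved.file_id)    # uploaded via the provider's Files API
elif isinstance(resolved, InlineBase64):
    print(len(resolved.data))  # no uploader available -> inline base64 fallback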

View File

@@ -0,0 +1,91 @@
"""Utility functions for file handling."""
from __future__ import annotations
from pathlib import Path
from typing import TYPE_CHECKING
from crewai_files.core.sources import is_file_source
if TYPE_CHECKING:
from crewai_files.core.sources import FileSource, FileSourceInput
from crewai_files.core.types import FileInput
__all__ = ["is_file_source", "normalize_input_files", "wrap_file_source"]
def wrap_file_source(source: FileSource) -> FileInput:
"""Wrap a FileSource in the appropriate typed FileInput wrapper.
Args:
source: The file source to wrap.
Returns:
Typed FileInput wrapper based on content type.
"""
from crewai_files.core.types import (
AudioFile,
ImageFile,
PDFFile,
TextFile,
VideoFile,
)
content_type = source.content_type
if content_type.startswith("image/"):
return ImageFile(source=source)
if content_type.startswith("audio/"):
return AudioFile(source=source)
if content_type.startswith("video/"):
return VideoFile(source=source)
if content_type == "application/pdf":
return PDFFile(source=source)
return TextFile(source=source)
def normalize_input_files(
input_files: list[FileSourceInput | FileInput],
) -> dict[str, FileInput]:
"""Convert a list of file sources to a named dictionary of FileInputs.
Args:
input_files: List of file source inputs or File objects.
Returns:
Dictionary mapping names to FileInput wrappers.
"""
from crewai_files.core.sources import FileBytes, FilePath, FileStream, FileUrl
from crewai_files.core.types import BaseFile
result: dict[str, FileInput] = {}
for i, item in enumerate(input_files):
if isinstance(item, BaseFile):
name = item.filename or f"file_{i}"
if "." in name:
name = name.rsplit(".", 1)[0]
result[name] = item
continue
file_source: FilePath | FileBytes | FileStream | FileUrl
if isinstance(item, (FilePath, FileBytes, FileStream, FileUrl)):
file_source = item
elif isinstance(item, Path):
file_source = FilePath(path=item)
elif isinstance(item, str):
if item.startswith(("http://", "https://")):
file_source = FileUrl(url=item)
else:
file_source = FilePath(path=Path(item))
elif isinstance(item, (bytes, memoryview)):
file_source = FileBytes(data=bytes(item))
else:
continue
name = file_source.filename or f"file_{i}"
result[name] = wrap_file_source(file_source)
return result
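For example, mixed inputs normalize to named wrappers; the exact keys depend on each source's derived filename and content type:

from crewai_files.resolution.utils import normalize_input_files

files = normalize_input_files([
    "docs/guide.pdf",                 # path string -> typically PDFFile
    "https://example.com/chart.png",  # URL -> typically ImageFile
    b"raw notes",                     # bytes with no filename -> keyed "file_2"
])
print(sorted(files))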

View File

@@ -0,0 +1,11 @@
"""File uploader implementations for provider File APIs."""
from crewai_files.uploaders.base import FileUploader, UploadResult
from crewai_files.uploaders.factory import get_uploader
__all__ = [
"FileUploader",
"UploadResult",
"get_uploader",
]

View File

@@ -0,0 +1,242 @@
"""Anthropic Files API uploader implementation."""
from __future__ import annotations
import logging
import os
from typing import Any
from crewai_files.core.sources import generate_filename
from crewai_files.core.types import FileInput
from crewai_files.processing.exceptions import classify_upload_error
from crewai_files.uploaders.base import FileUploader, UploadResult
logger = logging.getLogger(__name__)
class AnthropicFileUploader(FileUploader):
"""Uploader for Anthropic Files API.
Uses the anthropic SDK to upload files. Files are stored persistently
until explicitly deleted.
"""
def __init__(
self,
api_key: str | None = None,
client: Any = None,
async_client: Any = None,
) -> None:
"""Initialize the Anthropic uploader.
Args:
api_key: Optional Anthropic API key. If not provided, uses
ANTHROPIC_API_KEY environment variable.
client: Optional pre-instantiated Anthropic client.
async_client: Optional pre-instantiated async Anthropic client.
"""
self._api_key = api_key or os.environ.get("ANTHROPIC_API_KEY")
self._client: Any = client
self._async_client: Any = async_client
@property
def provider_name(self) -> str:
"""Return the provider name."""
return "anthropic"
def _get_client(self) -> Any:
"""Get or create the Anthropic client."""
if self._client is None:
try:
import anthropic
self._client = anthropic.Anthropic(api_key=self._api_key)
except ImportError as e:
raise ImportError(
"anthropic is required for Anthropic file uploads. "
"Install with: pip install anthropic"
) from e
return self._client
def _get_async_client(self) -> Any:
"""Get or create the async Anthropic client."""
if self._async_client is None:
try:
import anthropic
self._async_client = anthropic.AsyncAnthropic(api_key=self._api_key)
except ImportError as e:
raise ImportError(
"anthropic is required for Anthropic file uploads. "
"Install with: pip install anthropic"
) from e
return self._async_client
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
"""Upload a file to Anthropic.
Args:
file: The file to upload.
purpose: Optional purpose for the file (default: "user_upload").
Returns:
UploadResult with the file ID and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
client = self._get_client()
content = file.read()
logger.info(
f"Uploading file '{file.filename}' to Anthropic ({len(content)} bytes)"
)
filename = file.filename or generate_filename(file.content_type)
uploaded_file = client.beta.files.upload(
file=(filename, content, file.content_type),
)
logger.info(f"Uploaded to Anthropic: {uploaded_file.id}")
return UploadResult(
file_id=uploaded_file.id,
file_uri=None,
content_type=file.content_type,
expires_at=None,
provider=self.provider_name,
)
except ImportError:
raise
except Exception as e:
raise classify_upload_error(e, file.filename) from e
def delete(self, file_id: str) -> bool:
"""Delete an uploaded file from Anthropic.
Args:
file_id: The file ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_client()
client.beta.files.delete(file_id=file_id)
logger.info(f"Deleted Anthropic file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete Anthropic file {file_id}: {e}")
return False
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
"""Get information about an uploaded file.
Args:
file_id: The file ID.
Returns:
Dictionary with file information, or None if not found.
"""
try:
client = self._get_client()
file_info = client.beta.files.retrieve(file_id=file_id)
return {
"id": file_info.id,
"filename": file_info.filename,
"purpose": file_info.purpose,
"size_bytes": file_info.size_bytes,
"created_at": file_info.created_at,
}
except Exception as e:
logger.debug(f"Failed to get Anthropic file info for {file_id}: {e}")
return None
def list_files(self) -> list[dict[str, Any]]:
"""List all uploaded files.
Returns:
List of dictionaries with file information.
"""
try:
client = self._get_client()
files = client.beta.files.list()
return [
{
"id": f.id,
"filename": f.filename,
"purpose": f.purpose,
"size_bytes": f.size_bytes,
"created_at": f.created_at,
}
for f in files.data
]
except Exception as e:
logger.warning(f"Failed to list Anthropic files: {e}")
return []
async def aupload(
self, file: FileInput, purpose: str | None = None
) -> UploadResult:
"""Async upload a file to Anthropic using native async client.
Args:
file: The file to upload.
purpose: Optional purpose for the file (default: "user_upload").
Returns:
UploadResult with the file ID and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
client = self._get_async_client()
content = await file.aread()
logger.info(
f"Uploading file '{file.filename}' to Anthropic ({len(content)} bytes)"
)
filename = file.filename or generate_filename(file.content_type)
uploaded_file = await client.beta.files.upload(
file=(filename, content, file.content_type),
)
logger.info(f"Uploaded to Anthropic: {uploaded_file.id}")
return UploadResult(
file_id=uploaded_file.id,
file_uri=None,
content_type=file.content_type,
expires_at=None,
provider=self.provider_name,
)
except ImportError:
raise
except Exception as e:
raise classify_upload_error(e, file.filename) from e
async def adelete(self, file_id: str) -> bool:
"""Async delete an uploaded file from Anthropic.
Args:
file_id: The file ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_async_client()
await client.beta.files.delete(file_id=file_id)
logger.info(f"Deleted Anthropic file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete Anthropic file {file_id}: {e}")
return False
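A minimal usage sketch for this uploader, assuming ANTHROPIC_API_KEY is available to the SDK; png_bytes is a placeholder, not real data:

    from crewai_files import FileBytes, ImageFile
    from crewai_files.uploaders.anthropic import AnthropicFileUploader

    png_bytes = b"..."  # placeholder: substitute real PNG bytes
    uploader = AnthropicFileUploader()  # assumes ANTHROPIC_API_KEY is set
    file = ImageFile(source=FileBytes(data=png_bytes, filename="chart.png"))
    result = uploader.upload(file)           # returns an UploadResult
    print(result.file_id, result.provider)   # provider == "anthropic"
    uploader.delete(result.file_id)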

View File

@@ -0,0 +1,118 @@
"""Base class for file uploaders."""
from abc import ABC, abstractmethod
import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import Any
from crewai_files.core.types import FileInput
@dataclass
class UploadResult:
"""Result of a file upload operation.
Attributes:
file_id: Provider-specific file identifier.
file_uri: Optional URI for accessing the file.
content_type: MIME type of the uploaded file.
expires_at: When the upload expires (if applicable).
provider: Name of the provider.
"""
file_id: str
provider: str
content_type: str
file_uri: str | None = None
expires_at: datetime | None = None
class FileUploader(ABC):
"""Abstract base class for provider file uploaders.
Implementations handle uploading files to provider-specific File APIs.
"""
@property
@abstractmethod
def provider_name(self) -> str:
"""Return the provider name."""
@abstractmethod
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
"""Upload a file to the provider.
Args:
file: The file to upload.
purpose: Optional purpose/description for the upload.
Returns:
UploadResult with the file identifier and metadata.
Raises:
Exception: If upload fails.
"""
async def aupload(
self, file: FileInput, purpose: str | None = None
) -> UploadResult:
"""Async upload a file to the provider.
Default implementation runs sync upload in executor.
Override in subclasses for native async support.
Args:
file: The file to upload.
purpose: Optional purpose/description for the upload.
Returns:
UploadResult with the file identifier and metadata.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, self.upload, file, purpose)
@abstractmethod
def delete(self, file_id: str) -> bool:
"""Delete an uploaded file.
Args:
file_id: The file identifier to delete.
Returns:
True if deletion was successful, False otherwise.
"""
async def adelete(self, file_id: str) -> bool:
"""Async delete an uploaded file.
Default implementation runs sync delete in executor.
Override in subclasses for native async support.
Args:
file_id: The file identifier to delete.
Returns:
True if deletion was successful, False otherwise.
"""
loop = asyncio.get_running_loop()
return await loop.run_in_executor(None, self.delete, file_id)
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
"""Get information about an uploaded file.
Args:
file_id: The file identifier.
Returns:
Dictionary with file information, or None if not found.
"""
return None
def list_files(self) -> list[dict[str, Any]]:
"""List all uploaded files.
Returns:
List of dictionaries with file information.
"""
return []
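To illustrate the contract (not part of the changeset), a hedged sketch of a minimal in-memory uploader built on this base class; the inherited aupload/adelete defaults run these sync methods in an executor:

    from crewai_files.core.types import FileInput
    from crewai_files.uploaders.base import FileUploader, UploadResult

    class InMemoryUploader(FileUploader):
        """Illustrative uploader that keeps file bytes in a dict."""

        def __init__(self) -> None:
            self._store: dict[str, bytes] = {}

        @property
        def provider_name(self) -> str:
            return "memory"

        def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
            # Use the filename as the ID, falling back to a counter-based name.
            file_id = file.filename or f"file-{len(self._store)}"
            self._store[file_id] = file.read()
            return UploadResult(
                file_id=file_id,
                provider=self.provider_name,
                content_type=file.content_type,
            )

        def delete(self, file_id: str) -> bool:
            return self._store.pop(file_id, None) is not None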

View File

@@ -0,0 +1,477 @@
"""AWS Bedrock S3 file uploader implementation."""
from __future__ import annotations
import hashlib
import logging
import os
from pathlib import Path
from typing import Any
from crewai_files.core.constants import (
MAX_CONCURRENCY,
MULTIPART_CHUNKSIZE,
MULTIPART_THRESHOLD,
)
from crewai_files.core.sources import FileBytes, FilePath
from crewai_files.core.types import FileInput
from crewai_files.processing.exceptions import (
PermanentUploadError,
TransientUploadError,
)
from crewai_files.uploaders.base import FileUploader, UploadResult
logger = logging.getLogger(__name__)
def _classify_s3_error(e: Exception, filename: str | None) -> Exception:
"""Classify an S3 exception as transient or permanent upload error.
Args:
e: The exception to classify.
filename: The filename for error context.
Returns:
A TransientUploadError or PermanentUploadError wrapping the original.
"""
error_type = type(e).__name__
error_code = getattr(e, "response", {}).get("Error", {}).get("Code", "")
if error_code in ("SlowDown", "ServiceUnavailable", "InternalError"):
return TransientUploadError(f"Transient S3 error: {e}", file_name=filename)
if error_code in ("AccessDenied", "InvalidAccessKeyId", "SignatureDoesNotMatch"):
return PermanentUploadError(f"S3 authentication error: {e}", file_name=filename)
if error_code in ("NoSuchBucket", "InvalidBucketName"):
return PermanentUploadError(f"S3 bucket error: {e}", file_name=filename)
if "Throttl" in error_type or "Throttl" in str(e):
return TransientUploadError(f"S3 throttling: {e}", file_name=filename)
return TransientUploadError(f"S3 upload failed: {e}", file_name=filename)
def _get_file_path(file: FileInput) -> Path | None:
"""Get the filesystem path if file source is FilePath.
Args:
file: The file input to check.
Returns:
Path if source is FilePath, None otherwise.
"""
source = file._file_source
if isinstance(source, FilePath):
return source.path
return None
def _get_file_size(file: FileInput) -> int | None:
"""Get file size without reading content if possible.
Args:
file: The file input.
Returns:
Size in bytes if determinable without reading, None otherwise.
"""
source = file._file_source
if isinstance(source, FilePath):
return source.path.stat().st_size
if isinstance(source, FileBytes):
return len(source.data)
return None
def _compute_hash_streaming(file_path: Path) -> str:
"""Compute SHA-256 hash by streaming file content.
Args:
file_path: Path to the file.
Returns:
First 16 characters of hex digest.
"""
hasher = hashlib.sha256()
with open(file_path, "rb") as f:
while chunk := f.read(1024 * 1024):
hasher.update(chunk)
return hasher.hexdigest()[:16]
class BedrockFileUploader(FileUploader):
"""Uploader for AWS Bedrock via S3.
Uploads files to S3 and returns S3 URIs that can be used with Bedrock's
Converse API s3Location source format.
"""
def __init__(
self,
bucket_name: str | None = None,
bucket_owner: str | None = None,
prefix: str = "crewai-files",
region: str | None = None,
client: Any = None,
async_client: Any = None,
) -> None:
"""Initialize the Bedrock S3 uploader.
Args:
bucket_name: S3 bucket name. If not provided, uses
CREWAI_BEDROCK_S3_BUCKET environment variable.
bucket_owner: Optional bucket owner account ID for cross-account access.
Uses CREWAI_BEDROCK_S3_BUCKET_OWNER environment variable if not provided.
prefix: S3 key prefix for uploaded files (default: "crewai-files").
region: AWS region. Uses AWS_REGION or AWS_DEFAULT_REGION if not provided.
client: Optional pre-instantiated boto3 S3 client.
async_client: Optional pre-instantiated aioboto3 Session used to create async S3 clients.
"""
self._bucket_name = bucket_name or os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
self._bucket_owner = bucket_owner or os.environ.get(
"CREWAI_BEDROCK_S3_BUCKET_OWNER"
)
self._prefix = prefix
self._region = region or os.environ.get(
"AWS_REGION", os.environ.get("AWS_DEFAULT_REGION")
)
self._client: Any = client
self._async_client: Any = async_client
@property
def provider_name(self) -> str:
"""Return the provider name."""
return "bedrock"
@property
def bucket_name(self) -> str:
"""Return the configured bucket name."""
if not self._bucket_name:
raise ValueError(
"S3 bucket name not configured. Set CREWAI_BEDROCK_S3_BUCKET "
"environment variable or pass bucket_name parameter."
)
return self._bucket_name
@property
def bucket_owner(self) -> str | None:
"""Return the configured bucket owner."""
return self._bucket_owner
def _get_client(self) -> Any:
"""Get or create the S3 client."""
if self._client is None:
try:
import boto3
self._client = boto3.client("s3", region_name=self._region)
except ImportError as e:
raise ImportError(
"boto3 is required for Bedrock S3 file uploads. "
"Install with: pip install boto3"
) from e
return self._client
def _get_async_client(self) -> Any:
"""Get or create the async S3 session."""
if self._async_client is None:
try:
import aioboto3  # type: ignore[import-not-found]
self._async_client = aioboto3.Session()
except ImportError as e:
raise ImportError(
"aioboto3 is required for async Bedrock S3 file uploads. "
"Install with: pip install aioboto3"
) from e
return self._async_client
def _generate_s3_key(self, file: FileInput, content: bytes | None = None) -> str:
"""Generate a unique S3 key for the file.
For FilePath sources with no content provided, computes hash via streaming.
Args:
file: The file being uploaded.
content: The file content bytes (optional for FilePath sources).
Returns:
S3 key string.
"""
if content is not None:
content_hash = hashlib.sha256(content).hexdigest()[:16]
else:
file_path = _get_file_path(file)
if file_path is not None:
content_hash = _compute_hash_streaming(file_path)
else:
content_hash = hashlib.sha256(file.read()).hexdigest()[:16]
filename = file.filename or "file"
safe_filename = "".join(
c if c.isalnum() or c in ".-_" else "_" for c in filename
)
return f"{self._prefix}/{content_hash}_{safe_filename}"
def _build_s3_uri(self, key: str) -> str:
"""Build an S3 URI from a key.
Args:
key: The S3 object key.
Returns:
S3 URI string.
"""
return f"s3://{self.bucket_name}/{key}"
@staticmethod
def _get_transfer_config() -> Any:
"""Get boto3 TransferConfig for multipart uploads."""
from boto3.s3.transfer import TransferConfig
return TransferConfig(
multipart_threshold=MULTIPART_THRESHOLD,
multipart_chunksize=MULTIPART_CHUNKSIZE,
max_concurrency=MAX_CONCURRENCY,
)
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
"""Upload a file to S3 for use with Bedrock.
Uses streaming upload with automatic multipart for large files.
For FilePath sources, streams directly from disk without loading into memory.
Args:
file: The file to upload.
purpose: Optional purpose (unused, kept for interface consistency).
Returns:
UploadResult with the S3 URI and metadata.
Raises:
TransientUploadError: For retryable errors (network, throttling).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
import io
try:
client = self._get_client()
transfer_config = self._get_transfer_config()
file_path = _get_file_path(file)
if file_path is not None:
file_size = file_path.stat().st_size
s3_key = self._generate_s3_key(file)
logger.info(
f"Uploading file '{file.filename}' to S3 bucket "
f"'{self.bucket_name}' ({file_size} bytes, streaming)"
)
with open(file_path, "rb") as f:
client.upload_fileobj(
f,
self.bucket_name,
s3_key,
ExtraArgs={"ContentType": file.content_type},
Config=transfer_config,
)
else:
content = file.read()
s3_key = self._generate_s3_key(file, content)
logger.info(
f"Uploading file '{file.filename}' to S3 bucket "
f"'{self.bucket_name}' ({len(content)} bytes)"
)
client.upload_fileobj(
io.BytesIO(content),
self.bucket_name,
s3_key,
ExtraArgs={"ContentType": file.content_type},
Config=transfer_config,
)
s3_uri = self._build_s3_uri(s3_key)
logger.info(f"Uploaded to S3: {s3_uri}")
return UploadResult(
file_id=s3_key,
file_uri=s3_uri,
content_type=file.content_type,
expires_at=None,
provider=self.provider_name,
)
except ImportError:
raise
except Exception as e:
raise _classify_s3_error(e, file.filename) from e
def delete(self, file_id: str) -> bool:
"""Delete an uploaded file from S3.
Args:
file_id: The S3 key to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_client()
client.delete_object(Bucket=self.bucket_name, Key=file_id)
logger.info(f"Deleted S3 object: s3://{self.bucket_name}/{file_id}")
return True
except Exception as e:
logger.warning(
f"Failed to delete S3 object s3://{self.bucket_name}/{file_id}: {e}"
)
return False
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
"""Get information about an uploaded file.
Args:
file_id: The S3 key.
Returns:
Dictionary with file information, or None if not found.
"""
try:
client = self._get_client()
response = client.head_object(Bucket=self.bucket_name, Key=file_id)
return {
"id": file_id,
"uri": self._build_s3_uri(file_id),
"content_type": response.get("ContentType"),
"size": response.get("ContentLength"),
"last_modified": response.get("LastModified"),
"etag": response.get("ETag"),
}
except Exception as e:
logger.debug(f"Failed to get S3 object info for {file_id}: {e}")
return None
def list_files(self) -> list[dict[str, Any]]:
"""List all uploaded files in the configured prefix.
Returns:
List of dictionaries with file information.
"""
try:
client = self._get_client()
response = client.list_objects_v2(
Bucket=self.bucket_name,
Prefix=self._prefix,
)
return [
{
"id": obj["Key"],
"uri": self._build_s3_uri(obj["Key"]),
"size": obj.get("Size"),
"last_modified": obj.get("LastModified"),
"etag": obj.get("ETag"),
}
for obj in response.get("Contents", [])
]
except Exception as e:
logger.warning(f"Failed to list S3 objects: {e}")
return []
async def aupload(
self, file: FileInput, purpose: str | None = None
) -> UploadResult:
"""Async upload a file to S3 for use with Bedrock.
Uses streaming upload with automatic multipart for large files.
For FilePath sources, streams directly from disk without loading into memory.
Args:
file: The file to upload.
purpose: Optional purpose (unused, kept for interface consistency).
Returns:
UploadResult with the S3 URI and metadata.
Raises:
TransientUploadError: For retryable errors (network, throttling).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
import io
import aiofiles
try:
session = self._get_async_client()
transfer_config = self._get_transfer_config()
file_path = _get_file_path(file)
if file_path is not None:
file_size = file_path.stat().st_size
s3_key = self._generate_s3_key(file)
logger.info(
f"Uploading file '{file.filename}' to S3 bucket "
f"'{self.bucket_name}' ({file_size} bytes, streaming)"
)
async with session.client("s3", region_name=self._region) as client:
async with aiofiles.open(file_path, "rb") as f:
await client.upload_fileobj(
f,
self.bucket_name,
s3_key,
ExtraArgs={"ContentType": file.content_type},
Config=transfer_config,
)
else:
content = await file.aread()
s3_key = self._generate_s3_key(file, content)
logger.info(
f"Uploading file '{file.filename}' to S3 bucket "
f"'{self.bucket_name}' ({len(content)} bytes)"
)
async with session.client("s3", region_name=self._region) as client:
await client.upload_fileobj(
io.BytesIO(content),
self.bucket_name,
s3_key,
ExtraArgs={"ContentType": file.content_type},
Config=transfer_config,
)
s3_uri = self._build_s3_uri(s3_key)
logger.info(f"Uploaded to S3: {s3_uri}")
return UploadResult(
file_id=s3_key,
file_uri=s3_uri,
content_type=file.content_type,
expires_at=None,
provider=self.provider_name,
)
except ImportError:
raise
except Exception as e:
raise _classify_s3_error(e, file.filename) from e
async def adelete(self, file_id: str) -> bool:
"""Async delete an uploaded file from S3.
Args:
file_id: The S3 key to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
session = self._get_async_client()
async with session.client("s3", region_name=self._region) as client:
await client.delete_object(Bucket=self.bucket_name, Key=file_id)
logger.info(f"Deleted S3 object: s3://{self.bucket_name}/{file_id}")
return True
except Exception as e:
logger.warning(
f"Failed to delete S3 object s3://{self.bucket_name}/{file_id}: {e}"
)
return False
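A hedged usage sketch, assuming AWS credentials are configured for boto3; the bucket name and png_bytes below are placeholders:

    from crewai_files import FileBytes, ImageFile
    from crewai_files.uploaders.bedrock import BedrockFileUploader

    png_bytes = b"..."  # placeholder: substitute real PNG bytes
    uploader = BedrockFileUploader(bucket_name="my-bedrock-files", region="us-east-1")
    file = ImageFile(source=FileBytes(data=png_bytes, filename="diagram.png"))
    result = uploader.upload(file)
    print(result.file_uri)  # s3://my-bedrock-files/crewai-files/<hash>_diagram.png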

View File

@@ -0,0 +1,216 @@
"""Factory for creating file uploaders."""
from __future__ import annotations
import logging
from typing import Any as AnyType, Literal, TypeAlias, TypedDict, overload
from typing_extensions import NotRequired, Unpack
from crewai_files.uploaders.anthropic import AnthropicFileUploader
from crewai_files.uploaders.bedrock import BedrockFileUploader
from crewai_files.uploaders.gemini import GeminiFileUploader
from crewai_files.uploaders.openai import OpenAIFileUploader
logger = logging.getLogger(__name__)
FileUploaderType: TypeAlias = (
GeminiFileUploader
| AnthropicFileUploader
| BedrockFileUploader
| OpenAIFileUploader
)
GeminiProviderType = Literal["gemini", "google"]
AnthropicProviderType = Literal["anthropic", "claude"]
OpenAIProviderType = Literal["openai", "gpt", "azure"]
BedrockProviderType = Literal["bedrock", "aws"]
ProviderType: TypeAlias = (
GeminiProviderType
| AnthropicProviderType
| OpenAIProviderType
| BedrockProviderType
)
class _BaseOpts(TypedDict):
"""Kwargs for uploader factory."""
api_key: NotRequired[str | None]
client: NotRequired[AnyType]
async_client: NotRequired[AnyType]
class OpenAIOpts(_BaseOpts):
"""Kwargs for openai uploader factory."""
chunk_size: NotRequired[int]
class GeminiOpts(TypedDict):
"""Kwargs for gemini uploader factory."""
api_key: NotRequired[str | None]
client: NotRequired[AnyType]
class AnthropicOpts(_BaseOpts):
"""Kwargs for anthropic uploader factory."""
class BedrockOpts(TypedDict):
"""Kwargs for bedrock uploader factory."""
bucket_name: NotRequired[str | None]
bucket_owner: NotRequired[str | None]
prefix: NotRequired[str]
region: NotRequired[str | None]
client: NotRequired[AnyType]
async_client: NotRequired[AnyType]
class AllOptions(TypedDict):
"""Kwargs for uploader factory."""
api_key: NotRequired[str | None]
chunk_size: NotRequired[int]
bucket_name: NotRequired[str | None]
bucket_owner: NotRequired[str | None]
prefix: NotRequired[str]
region: NotRequired[str | None]
client: NotRequired[AnyType]
async_client: NotRequired[AnyType]
@overload
def get_uploader(
provider: GeminiProviderType,
**kwargs: Unpack[GeminiOpts],
) -> GeminiFileUploader:
"""Get Gemini file uploader."""
@overload
def get_uploader(
provider: AnthropicProviderType,
**kwargs: Unpack[AnthropicOpts],
) -> AnthropicFileUploader:
"""Get Anthropic file uploader."""
@overload
def get_uploader(
provider: OpenAIProviderType,
**kwargs: Unpack[OpenAIOpts],
) -> OpenAIFileUploader:
"""Get OpenAI file uploader."""
@overload
def get_uploader(
provider: BedrockProviderType,
**kwargs: Unpack[BedrockOpts],
) -> BedrockFileUploader:
"""Get Bedrock file uploader."""
@overload
def get_uploader(
provider: ProviderType, **kwargs: Unpack[AllOptions]
) -> FileUploaderType:
"""Get any file uploader."""
def get_uploader(
provider: ProviderType, **kwargs: Unpack[AllOptions]
) -> FileUploaderType:
"""Get a file uploader for a specific provider.
Args:
provider: Provider name (e.g., "gemini", "anthropic").
**kwargs: Additional arguments passed to the uploader constructor.
Returns:
FileUploader instance for the provider.
Raises:
ImportError: If the required provider SDK is not installed.
ValueError: If no uploader is available for the provider.
"""
provider_lower = provider.lower()
if "gemini" in provider_lower or "google" in provider_lower:
try:
from crewai_files.uploaders.gemini import GeminiFileUploader
return GeminiFileUploader(
api_key=kwargs.get("api_key"),
client=kwargs.get("client"),
)
except ImportError:
logger.warning(
"google-genai not installed. Install with: pip install google-genai"
)
raise
if "anthropic" in provider_lower or "claude" in provider_lower:
try:
from crewai_files.uploaders.anthropic import AnthropicFileUploader
return AnthropicFileUploader(
api_key=kwargs.get("api_key"),
client=kwargs.get("client"),
async_client=kwargs.get("async_client"),
)
except ImportError:
logger.warning(
"anthropic not installed. Install with: pip install anthropic"
)
raise
if (
"openai" in provider_lower
or "gpt" in provider_lower
or "azure" in provider_lower
):
try:
from crewai_files.uploaders.openai import OpenAIFileUploader
return OpenAIFileUploader(
api_key=kwargs.get("api_key"),
chunk_size=kwargs.get("chunk_size", 67_108_864),
client=kwargs.get("client"),
async_client=kwargs.get("async_client"),
)
except ImportError:
logger.warning("openai not installed. Install with: pip install openai")
raise
if "bedrock" in provider_lower or "aws" in provider_lower:
import os
if (
not os.environ.get("CREWAI_BEDROCK_S3_BUCKET")
and "bucket_name" not in kwargs
):
raise ValueError(
"Bedrock S3 uploader not configured. Set the "
"CREWAI_BEDROCK_S3_BUCKET environment variable or pass "
"bucket_name to enable S3 uploads."
)
try:
from crewai_files.uploaders.bedrock import BedrockFileUploader
return BedrockFileUploader(
bucket_name=kwargs.get("bucket_name"),
bucket_owner=kwargs.get("bucket_owner"),
prefix=kwargs.get("prefix", "crewai-files"),
region=kwargs.get("region"),
client=kwargs.get("client"),
async_client=kwargs.get("async_client"),
)
except ImportError:
logger.warning("boto3 not installed. Install with: pip install boto3")
raise
logger.debug(f"No file uploader available for provider: {provider}")
raise
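Assuming the factory module lives at crewai_files.uploaders.factory (consistent with the imports above), a usage sketch:

    from crewai_files.uploaders.factory import get_uploader

    gemini = get_uploader("gemini")                         # GeminiFileUploader
    claude = get_uploader("claude")                         # AnthropicFileUploader
    bedrock = get_uploader("aws", bucket_name="my-bucket")  # BedrockFileUploader

The overloads narrow the return type per provider literal, so callers get the concrete uploader class without casts.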

View File

@@ -0,0 +1,448 @@
"""Gemini File API uploader implementation."""
from __future__ import annotations
import asyncio
from datetime import datetime, timezone
import io
import logging
import os
from pathlib import Path
import random
import time
from typing import Any
from crewai_files.core.constants import (
BACKOFF_BASE_DELAY,
BACKOFF_JITTER_FACTOR,
BACKOFF_MAX_DELAY,
GEMINI_FILE_TTL,
)
from crewai_files.core.sources import FilePath
from crewai_files.core.types import FileInput
from crewai_files.processing.exceptions import (
PermanentUploadError,
TransientUploadError,
classify_upload_error,
)
from crewai_files.uploaders.base import FileUploader, UploadResult
logger = logging.getLogger(__name__)
def _compute_backoff_delay(attempt: int) -> float:
"""Compute exponential backoff delay with jitter.
Args:
attempt: The current attempt number (0-indexed).
Returns:
Delay in seconds with jitter applied.
"""
delay: float = min(BACKOFF_BASE_DELAY * (2**attempt), BACKOFF_MAX_DELAY)
jitter: float = random.uniform(0, delay * BACKOFF_JITTER_FACTOR) # noqa: S311
return float(delay + jitter)
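# Illustrative schedule, assuming BACKOFF_BASE_DELAY=1.0, BACKOFF_MAX_DELAY=30.0,
# and BACKOFF_JITTER_FACTOR=0.1 (the actual values live in core.constants):
#   attempt 0 -> 1.0-1.1s, attempt 1 -> 2.0-2.2s, attempt 2 -> 4.0-4.4s, ...
#   once base * 2**attempt exceeds the max, delays stay at 30.0s plus jitter.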
def _classify_gemini_error(e: Exception, filename: str | None) -> Exception:
"""Classify a Gemini exception as transient or permanent upload error.
Checks Gemini-specific error message patterns first, then falls back
to generic status code classification.
Args:
e: The exception to classify.
filename: The filename for error context.
Returns:
A TransientUploadError or PermanentUploadError wrapping the original.
"""
error_msg = str(e).lower()
if "quota" in error_msg or "rate" in error_msg or "limit" in error_msg:
return TransientUploadError(f"Rate limit error: {e}", file_name=filename)
if "auth" in error_msg or "permission" in error_msg or "denied" in error_msg:
return PermanentUploadError(
f"Authentication/permission error: {e}", file_name=filename
)
if "invalid" in error_msg or "unsupported" in error_msg:
return PermanentUploadError(f"Invalid request: {e}", file_name=filename)
return classify_upload_error(e, filename)
def _get_file_path(file: FileInput) -> Path | None:
"""Get the filesystem path if file source is FilePath.
Args:
file: The file input to check.
Returns:
Path if source is FilePath, None otherwise.
"""
source = file._file_source
if isinstance(source, FilePath):
return source.path
return None
class GeminiFileUploader(FileUploader):
"""Uploader for Google Gemini File API.
Uses the google-genai SDK to upload files. Files are stored for 48 hours.
"""
def __init__(
self,
api_key: str | None = None,
client: Any = None,
) -> None:
"""Initialize the Gemini uploader.
Args:
api_key: Optional Google API key. If not provided, uses
GOOGLE_API_KEY environment variable.
client: Optional pre-instantiated Gemini client.
"""
self._api_key = api_key or os.environ.get("GOOGLE_API_KEY")
self._client: Any = client
@property
def provider_name(self) -> str:
"""Return the provider name."""
return "gemini"
def _get_client(self) -> Any:
"""Get or create the Gemini client."""
if self._client is None:
try:
from google import genai
self._client = genai.Client(api_key=self._api_key)
except ImportError as e:
raise ImportError(
"google-genai is required for Gemini file uploads. "
"Install with: pip install google-genai"
) from e
return self._client
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
"""Upload a file to Gemini.
For FilePath sources, passes the path directly to the SDK which handles
streaming internally via resumable uploads, avoiding memory overhead.
Args:
file: The file to upload.
purpose: Optional purpose/description (used as display name).
Returns:
UploadResult with the file URI and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
client = self._get_client()
display_name = purpose or file.filename
file_path = _get_file_path(file)
if file_path is not None:
file_size = file_path.stat().st_size
logger.info(
f"Uploading file '{file.filename}' to Gemini via path "
f"({file_size} bytes, streaming)"
)
uploaded_file = client.files.upload(
file=file_path,
config={
"display_name": display_name,
"mime_type": file.content_type,
},
)
else:
content = file.read()
file_data = io.BytesIO(content)
file_data.name = file.filename
logger.info(
f"Uploading file '{file.filename}' to Gemini ({len(content)} bytes)"
)
uploaded_file = client.files.upload(
file=file_data,
config={
"display_name": display_name,
"mime_type": file.content_type,
},
)
if file.content_type.startswith("video/"):
if not self.wait_for_processing(uploaded_file.name):
raise PermanentUploadError(
f"Video processing failed for {file.filename}",
file_name=file.filename,
)
expires_at = datetime.now(timezone.utc) + GEMINI_FILE_TTL
logger.info(
f"Uploaded to Gemini: {uploaded_file.name} (URI: {uploaded_file.uri})"
)
return UploadResult(
file_id=uploaded_file.name,
file_uri=uploaded_file.uri,
content_type=file.content_type,
expires_at=expires_at,
provider=self.provider_name,
)
except ImportError:
raise
except (TransientUploadError, PermanentUploadError):
raise
except Exception as e:
raise _classify_gemini_error(e, file.filename) from e
async def aupload(
self, file: FileInput, purpose: str | None = None
) -> UploadResult:
"""Async upload a file to Gemini using native async client.
For FilePath sources, passes the path directly to the SDK which handles
streaming internally via resumable uploads, avoiding memory overhead.
Args:
file: The file to upload.
purpose: Optional purpose/description (used as display name).
Returns:
UploadResult with the file URI and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
client = self._get_client()
display_name = purpose or file.filename
file_path = _get_file_path(file)
if file_path is not None:
file_size = file_path.stat().st_size
logger.info(
f"Uploading file '{file.filename}' to Gemini via path "
f"({file_size} bytes, streaming)"
)
uploaded_file = await client.aio.files.upload(
file=file_path,
config={
"display_name": display_name,
"mime_type": file.content_type,
},
)
else:
content = await file.aread()
file_data = io.BytesIO(content)
file_data.name = file.filename
logger.info(
f"Uploading file '{file.filename}' to Gemini ({len(content)} bytes)"
)
uploaded_file = await client.aio.files.upload(
file=file_data,
config={
"display_name": display_name,
"mime_type": file.content_type,
},
)
if file.content_type.startswith("video/"):
if not await self.await_for_processing(uploaded_file.name):
raise PermanentUploadError(
f"Video processing failed for {file.filename}",
file_name=file.filename,
)
expires_at = datetime.now(timezone.utc) + GEMINI_FILE_TTL
logger.info(
f"Uploaded to Gemini: {uploaded_file.name} (URI: {uploaded_file.uri})"
)
return UploadResult(
file_id=uploaded_file.name,
file_uri=uploaded_file.uri,
content_type=file.content_type,
expires_at=expires_at,
provider=self.provider_name,
)
except ImportError:
raise
except (TransientUploadError, PermanentUploadError):
raise
except Exception as e:
raise _classify_gemini_error(e, file.filename) from e
def delete(self, file_id: str) -> bool:
"""Delete an uploaded file from Gemini.
Args:
file_id: The file name/ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_client()
client.files.delete(name=file_id)
logger.info(f"Deleted Gemini file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete Gemini file {file_id}: {e}")
return False
async def adelete(self, file_id: str) -> bool:
"""Async delete an uploaded file from Gemini.
Args:
file_id: The file name/ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_client()
await client.aio.files.delete(name=file_id)
logger.info(f"Deleted Gemini file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete Gemini file {file_id}: {e}")
return False
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
"""Get information about an uploaded file.
Args:
file_id: The file name/ID.
Returns:
Dictionary with file information, or None if not found.
"""
try:
client = self._get_client()
file_info = client.files.get(name=file_id)
return {
"name": file_info.name,
"uri": file_info.uri,
"display_name": file_info.display_name,
"mime_type": file_info.mime_type,
"size_bytes": file_info.size_bytes,
"state": str(file_info.state),
"create_time": file_info.create_time,
"expiration_time": file_info.expiration_time,
}
except Exception as e:
logger.debug(f"Failed to get Gemini file info for {file_id}: {e}")
return None
def list_files(self) -> list[dict[str, Any]]:
"""List all uploaded files.
Returns:
List of dictionaries with file information.
"""
try:
client = self._get_client()
files = client.files.list()
return [
{
"name": f.name,
"uri": f.uri,
"display_name": f.display_name,
"mime_type": f.mime_type,
"size_bytes": f.size_bytes,
"state": str(f.state),
}
for f in files
]
except Exception as e:
logger.warning(f"Failed to list Gemini files: {e}")
return []
def wait_for_processing(self, file_id: str, timeout_seconds: int = 300) -> bool:
"""Wait for a file to finish processing with exponential backoff.
Some files (especially videos) need time to process after upload.
Args:
file_id: The file name/ID.
timeout_seconds: Maximum time to wait.
Returns:
True if processing completed, False if timed out or failed.
"""
try:
from google.genai.types import FileState
except ImportError:
return True
client = self._get_client()
start_time = time.time()
attempt = 0
while time.time() - start_time < timeout_seconds:
file_info = client.files.get(name=file_id)
if file_info.state == FileState.ACTIVE:
return True
if file_info.state == FileState.FAILED:
logger.error(f"Gemini file processing failed: {file_id}")
return False
time.sleep(_compute_backoff_delay(attempt))
attempt += 1
logger.warning(f"Timed out waiting for Gemini file processing: {file_id}")
return False
async def await_for_processing(
self, file_id: str, timeout_seconds: int = 300
) -> bool:
"""Async wait for a file to finish processing with exponential backoff.
Some files (especially videos) need time to process after upload.
Args:
file_id: The file name/ID.
timeout_seconds: Maximum time to wait.
Returns:
True if processing completed, False if timed out or failed.
"""
try:
from google.genai.types import FileState
except ImportError:
return True
client = self._get_client()
start_time = time.time()
attempt = 0
while time.time() - start_time < timeout_seconds:
file_info = await client.aio.files.get(name=file_id)
if file_info.state == FileState.ACTIVE:
return True
if file_info.state == FileState.FAILED:
logger.error(f"Gemini file processing failed: {file_id}")
return False
await asyncio.sleep(_compute_backoff_delay(attempt))
attempt += 1
logger.warning(f"Timed out waiting for Gemini file processing: {file_id}")
return False
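A hedged usage sketch, assuming GOOGLE_API_KEY is set; video uploads additionally block in wait_for_processing until the file reaches ACTIVE state:

    from crewai_files import FileBytes, ImageFile
    from crewai_files.uploaders.gemini import GeminiFileUploader

    png_bytes = b"..."  # placeholder: substitute real PNG bytes
    uploader = GeminiFileUploader()
    file = ImageFile(source=FileBytes(data=png_bytes, filename="frame.png"))
    result = uploader.upload(file)
    print(result.file_uri)    # File API URI to reference in generation requests
    print(result.expires_at)  # upload time + GEMINI_FILE_TTL (48 hours)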

View File

@@ -0,0 +1,695 @@
"""OpenAI Files API uploader implementation."""
from __future__ import annotations
from collections.abc import AsyncIterator, Iterator
import io
import logging
import os
from typing import Any
from crewai_files.core.constants import DEFAULT_UPLOAD_CHUNK_SIZE, FILES_API_MAX_SIZE
from crewai_files.core.sources import FileBytes, FilePath, FileStream, generate_filename
from crewai_files.core.types import FileInput
from crewai_files.processing.exceptions import (
PermanentUploadError,
TransientUploadError,
classify_upload_error,
)
from crewai_files.uploaders.base import FileUploader, UploadResult
logger = logging.getLogger(__name__)
def _get_purpose_for_content_type(content_type: str, purpose: str | None) -> str:
"""Get the appropriate purpose for a file based on content type.
OpenAI Files API requires different purposes for different file types:
- Images (for Responses API vision): "vision"
- PDFs and other documents: "user_data"
Args:
content_type: MIME type of the file.
purpose: Optional explicit purpose override.
Returns:
The purpose string to use for upload.
"""
if purpose is not None:
return purpose
if content_type.startswith("image/"):
return "vision"
return "user_data"
def _get_file_size(file: FileInput) -> int | None:
"""Get file size without reading content if possible.
Args:
file: The file to get size for.
Returns:
File size in bytes, or None if size cannot be determined without reading.
"""
source = file._file_source
if isinstance(source, FilePath):
return source.path.stat().st_size
if isinstance(source, FileBytes):
return len(source.data)
return None
def _iter_file_chunks(file: FileInput, chunk_size: int) -> Iterator[bytes]:
"""Iterate over file content in chunks.
Args:
file: The file to read.
chunk_size: Size of each chunk in bytes.
Yields:
Chunks of file content.
"""
source = file._file_source
if isinstance(source, (FilePath, FileBytes, FileStream)):
yield from source.read_chunks(chunk_size)
else:
content = file.read()
for i in range(0, len(content), chunk_size):
yield content[i : i + chunk_size]
async def _aiter_file_chunks(
file: FileInput, chunk_size: int, content: bytes | None = None
) -> AsyncIterator[bytes]:
"""Async iterate over file content in chunks.
Args:
file: The file to read.
chunk_size: Size of each chunk in bytes.
content: Optional pre-loaded content to chunk.
Yields:
Chunks of file content.
"""
if content is not None:
for i in range(0, len(content), chunk_size):
yield content[i : i + chunk_size]
return
source = file._file_source
if isinstance(source, FilePath):
async for chunk in source.aread_chunks(chunk_size):
yield chunk
elif isinstance(source, (FileBytes, FileStream)):
for chunk in source.read_chunks(chunk_size):
yield chunk
else:
data = await file.aread()
for i in range(0, len(data), chunk_size):
yield data[i : i + chunk_size]
class OpenAIFileUploader(FileUploader):
"""Uploader for OpenAI Files and Uploads APIs.
Uses the Files API for files up to 512MB (single request).
Uses the Uploads API for files larger than 512MB (multipart chunked).
"""
def __init__(
self,
api_key: str | None = None,
chunk_size: int = DEFAULT_UPLOAD_CHUNK_SIZE,
client: Any = None,
async_client: Any = None,
) -> None:
"""Initialize the OpenAI uploader.
Args:
api_key: Optional OpenAI API key. If not provided, uses
OPENAI_API_KEY environment variable.
chunk_size: Chunk size in bytes for multipart uploads (default 64MB).
client: Optional pre-instantiated OpenAI client.
async_client: Optional pre-instantiated async OpenAI client.
"""
self._api_key = api_key or os.environ.get("OPENAI_API_KEY")
self._chunk_size = chunk_size
self._client: Any = client
self._async_client: Any = async_client
@property
def provider_name(self) -> str:
"""Return the provider name."""
return "openai"
def _build_upload_result(self, file_id: str, content_type: str) -> UploadResult:
"""Build an UploadResult for a completed upload.
Args:
file_id: The uploaded file ID.
content_type: The file's content type.
Returns:
UploadResult with the file metadata.
"""
return UploadResult(
file_id=file_id,
file_uri=None,
content_type=content_type,
expires_at=None,
provider=self.provider_name,
)
def _get_client(self) -> Any:
"""Get or create the OpenAI client."""
if self._client is None:
try:
from openai import OpenAI
self._client = OpenAI(api_key=self._api_key)
except ImportError as e:
raise ImportError(
"openai is required for OpenAI file uploads. "
"Install with: pip install openai"
) from e
return self._client
def _get_async_client(self) -> Any:
"""Get or create the async OpenAI client."""
if self._async_client is None:
try:
from openai import AsyncOpenAI
self._async_client = AsyncOpenAI(api_key=self._api_key)
except ImportError as e:
raise ImportError(
"openai is required for OpenAI file uploads. "
"Install with: pip install openai"
) from e
return self._async_client
def upload(self, file: FileInput, purpose: str | None = None) -> UploadResult:
"""Upload a file to OpenAI.
Uses Files API for files <= 512MB, Uploads API for larger files.
For large files, streams chunks to avoid loading entire file in memory.
Args:
file: The file to upload.
purpose: Optional purpose for the file (default: "user_data").
Returns:
UploadResult with the file ID and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
file_size = _get_file_size(file)
if file_size is not None and file_size > FILES_API_MAX_SIZE:
return self._upload_multipart_streaming(file, file_size, purpose)
content = file.read()
if len(content) > FILES_API_MAX_SIZE:
return self._upload_multipart(file, content, purpose)
return self._upload_simple(file, content, purpose)
except ImportError:
raise
except (TransientUploadError, PermanentUploadError):
raise
except Exception as e:
raise classify_upload_error(e, file.filename) from e
def _upload_simple(
self,
file: FileInput,
content: bytes,
purpose: str | None,
) -> UploadResult:
"""Upload using the Files API (single request, up to 512MB).
Args:
file: The file to upload.
content: File content bytes.
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
filename = file.filename or generate_filename(file.content_type)
file_data = io.BytesIO(content)
file_data.name = filename
logger.info(
f"Uploading file '{filename}' to OpenAI Files API ({len(content)} bytes)"
)
uploaded_file = client.files.create(
file=file_data,
purpose=file_purpose,
)
logger.info(f"Uploaded to OpenAI: {uploaded_file.id}")
return self._build_upload_result(uploaded_file.id, file.content_type)
def _upload_multipart(
self,
file: FileInput,
content: bytes,
purpose: str | None,
) -> UploadResult:
"""Upload using the Uploads API with content already in memory.
Args:
file: The file to upload.
content: File content bytes (already loaded).
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
filename = file.filename or generate_filename(file.content_type)
file_size = len(content)
logger.info(
f"Uploading file '{filename}' to OpenAI Uploads API "
f"({file_size} bytes, {self._chunk_size} byte chunks)"
)
upload = client.uploads.create(
bytes=file_size,
filename=filename,
mime_type=file.content_type,
purpose=file_purpose,
)
part_ids: list[str] = []
offset = 0
part_num = 1
try:
while offset < file_size:
chunk = content[offset : offset + self._chunk_size]
chunk_io = io.BytesIO(chunk)
logger.debug(
f"Uploading part {part_num} ({len(chunk)} bytes, offset {offset})"
)
part = client.uploads.parts.create(
upload_id=upload.id,
data=chunk_io,
)
part_ids.append(part.id)
offset += self._chunk_size
part_num += 1
completed = client.uploads.complete(
upload_id=upload.id,
part_ids=part_ids,
)
file_id = completed.file.id if completed.file else upload.id
logger.info(f"Completed multipart upload to OpenAI: {file_id}")
return self._build_upload_result(file_id, file.content_type)
except Exception:
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
try:
client.uploads.cancel(upload_id=upload.id)
except Exception as cancel_err:
logger.debug(f"Failed to cancel upload: {cancel_err}")
raise
def _upload_multipart_streaming(
self,
file: FileInput,
file_size: int,
purpose: str | None,
) -> UploadResult:
"""Upload using the Uploads API with streaming chunks.
Streams chunks directly from the file source without loading
the entire file into memory. Used for large files.
Args:
file: The file to upload.
file_size: Total file size in bytes.
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
filename = file.filename or generate_filename(file.content_type)
logger.info(
f"Uploading file '{filename}' to OpenAI Uploads API (streaming) "
f"({file_size} bytes, {self._chunk_size} byte chunks)"
)
upload = client.uploads.create(
bytes=file_size,
filename=filename,
mime_type=file.content_type,
purpose=file_purpose,
)
part_ids: list[str] = []
part_num = 1
try:
for chunk in _iter_file_chunks(file, self._chunk_size):
chunk_io = io.BytesIO(chunk)
logger.debug(f"Uploading part {part_num} ({len(chunk)} bytes)")
part = client.uploads.parts.create(
upload_id=upload.id,
data=chunk_io,
)
part_ids.append(part.id)
part_num += 1
completed = client.uploads.complete(
upload_id=upload.id,
part_ids=part_ids,
)
file_id = completed.file.id if completed.file else upload.id
logger.info(f"Completed streaming multipart upload to OpenAI: {file_id}")
return self._build_upload_result(file_id, file.content_type)
except Exception:
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
try:
client.uploads.cancel(upload_id=upload.id)
except Exception as cancel_err:
logger.debug(f"Failed to cancel upload: {cancel_err}")
raise
def delete(self, file_id: str) -> bool:
"""Delete an uploaded file from OpenAI.
Args:
file_id: The file ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_client()
client.files.delete(file_id)
logger.info(f"Deleted OpenAI file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete OpenAI file {file_id}: {e}")
return False
def get_file_info(self, file_id: str) -> dict[str, Any] | None:
"""Get information about an uploaded file.
Args:
file_id: The file ID.
Returns:
Dictionary with file information, or None if not found.
"""
try:
client = self._get_client()
file_info = client.files.retrieve(file_id)
return {
"id": file_info.id,
"filename": file_info.filename,
"purpose": file_info.purpose,
"bytes": file_info.bytes,
"created_at": file_info.created_at,
"status": file_info.status,
}
except Exception as e:
logger.debug(f"Failed to get OpenAI file info for {file_id}: {e}")
return None
def list_files(self) -> list[dict[str, Any]]:
"""List all uploaded files.
Returns:
List of dictionaries with file information.
"""
try:
client = self._get_client()
files = client.files.list()
return [
{
"id": f.id,
"filename": f.filename,
"purpose": f.purpose,
"bytes": f.bytes,
"created_at": f.created_at,
"status": f.status,
}
for f in files.data
]
except Exception as e:
logger.warning(f"Failed to list OpenAI files: {e}")
return []
async def aupload(
self, file: FileInput, purpose: str | None = None
) -> UploadResult:
"""Async upload a file to OpenAI using native async client.
Uses Files API for files <= 512MB, Uploads API for larger files.
For large files, streams chunks to avoid loading entire file in memory.
Args:
file: The file to upload.
purpose: Optional purpose for the file (default: "user_data").
Returns:
UploadResult with the file ID and metadata.
Raises:
TransientUploadError: For retryable errors (network, rate limits).
PermanentUploadError: For non-retryable errors (auth, validation).
"""
try:
file_size = _get_file_size(file)
if file_size is not None and file_size > FILES_API_MAX_SIZE:
return await self._aupload_multipart_streaming(file, file_size, purpose)
content = await file.aread()
if len(content) > FILES_API_MAX_SIZE:
return await self._aupload_multipart(file, content, purpose)
return await self._aupload_simple(file, content, purpose)
except ImportError:
raise
except (TransientUploadError, PermanentUploadError):
raise
except Exception as e:
raise classify_upload_error(e, file.filename) from e
async def _aupload_simple(
self,
file: FileInput,
content: bytes,
purpose: str | None,
) -> UploadResult:
"""Async upload using the Files API (single request, up to 512MB).
Args:
file: The file to upload.
content: File content bytes.
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_async_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
file_data = io.BytesIO(content)
file_data.name = file.filename or generate_filename(file.content_type)
logger.info(
f"Uploading file '{file.filename}' to OpenAI Files API ({len(content)} bytes)"
)
uploaded_file = await client.files.create(
file=file_data,
purpose=file_purpose,
)
logger.info(f"Uploaded to OpenAI: {uploaded_file.id}")
return self._build_upload_result(uploaded_file.id, file.content_type)
async def _aupload_multipart(
self,
file: FileInput,
content: bytes,
purpose: str | None,
) -> UploadResult:
"""Async upload using the Uploads API (multipart chunked, up to 8GB).
Args:
file: The file to upload.
content: File content bytes.
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_async_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
filename = file.filename or generate_filename(file.content_type)
file_size = len(content)
logger.info(
f"Uploading file '{filename}' to OpenAI Uploads API "
f"({file_size} bytes, {self._chunk_size} byte chunks)"
)
upload = await client.uploads.create(
bytes=file_size,
filename=filename,
mime_type=file.content_type,
purpose=file_purpose,
)
part_ids: list[str] = []
offset = 0
part_num = 1
try:
while offset < file_size:
chunk = content[offset : offset + self._chunk_size]
chunk_io = io.BytesIO(chunk)
logger.debug(
f"Uploading part {part_num} ({len(chunk)} bytes, offset {offset})"
)
part = await client.uploads.parts.create(
upload_id=upload.id,
data=chunk_io,
)
part_ids.append(part.id)
offset += self._chunk_size
part_num += 1
completed = await client.uploads.complete(
upload_id=upload.id,
part_ids=part_ids,
)
file_id = completed.file.id if completed.file else upload.id
logger.info(f"Completed multipart upload to OpenAI: {file_id}")
return self._build_upload_result(file_id, file.content_type)
except Exception:
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
try:
await client.uploads.cancel(upload_id=upload.id)
except Exception as cancel_err:
logger.debug(f"Failed to cancel upload: {cancel_err}")
raise
async def _aupload_multipart_streaming(
self,
file: FileInput,
file_size: int,
purpose: str | None,
) -> UploadResult:
"""Async upload using the Uploads API with streaming chunks.
Streams chunks directly from the file source without loading
the entire file into memory. Used for large files.
Args:
file: The file to upload.
file_size: Total file size in bytes.
purpose: Optional purpose for the file.
Returns:
UploadResult with the file ID and metadata.
"""
client = self._get_async_client()
file_purpose = _get_purpose_for_content_type(file.content_type, purpose)
filename = file.filename or generate_filename(file.content_type)
logger.info(
f"Uploading file '{filename}' to OpenAI Uploads API (streaming) "
f"({file_size} bytes, {self._chunk_size} byte chunks)"
)
upload = await client.uploads.create(
bytes=file_size,
filename=filename,
mime_type=file.content_type,
purpose=file_purpose,
)
part_ids: list[str] = []
part_num = 1
try:
async for chunk in _aiter_file_chunks(file, self._chunk_size):
chunk_io = io.BytesIO(chunk)
logger.debug(f"Uploading part {part_num} ({len(chunk)} bytes)")
part = await client.uploads.parts.create(
upload_id=upload.id,
data=chunk_io,
)
part_ids.append(part.id)
part_num += 1
completed = await client.uploads.complete(
upload_id=upload.id,
part_ids=part_ids,
)
file_id = completed.file.id if completed.file else upload.id
logger.info(f"Completed streaming multipart upload to OpenAI: {file_id}")
return self._build_upload_result(file_id, file.content_type)
except Exception:
logger.warning(f"Multipart upload failed, cancelling upload {upload.id}")
try:
await client.uploads.cancel(upload_id=upload.id)
except Exception as cancel_err:
logger.debug(f"Failed to cancel upload: {cancel_err}")
raise
async def adelete(self, file_id: str) -> bool:
"""Async delete an uploaded file from OpenAI.
Args:
file_id: The file ID to delete.
Returns:
True if deletion was successful, False otherwise.
"""
try:
client = self._get_async_client()
await client.files.delete(file_id)
logger.info(f"Deleted OpenAI file: {file_id}")
return True
except Exception as e:
logger.warning(f"Failed to delete OpenAI file {file_id}: {e}")
return False
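A hedged usage sketch; files at or below FILES_API_MAX_SIZE go through a single files.create call, while larger inputs are chunked through the Uploads API and reassembled server-side:

    from crewai_files import FileBytes, ImageFile
    from crewai_files.uploaders.openai import OpenAIFileUploader

    png_bytes = b"..."  # placeholder: substitute real PNG bytes
    uploader = OpenAIFileUploader()  # assumes OPENAI_API_KEY is set
    file = ImageFile(source=FileBytes(data=png_bytes, filename="chart.png"))
    result = uploader.upload(file)   # small image -> Files API, purpose="vision"
    print(result.file_id)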

View File

@@ -0,0 +1,5 @@
Quarter,Revenue ($M),Expenses ($M),Profit ($M)
Q1 2024,70,40,30
Q2 2024,75,42,33
Q3 2024,80,45,35
Q4 2024,75,44,31

Binary file not shown.


View File

@@ -0,0 +1,10 @@
Review Guidelines
1. Be clear and concise: Write feedback that is easy to understand.
2. Focus on behavior and outcomes: Describe what happened and why it matters.
3. Be specific: Provide examples to support your points.
4. Balance positives and improvements: Highlight strengths and areas to grow.
5. Be respectful and constructive: Assume positive intent and offer solutions.
6. Use objective criteria: Reference goals, metrics, or expectations where possible.
7. Suggest next steps: Recommend actionable ways to improve.
8. Proofread: Check tone, grammar, and clarity before submitting.

Binary file not shown.

Binary file not shown.

View File

@@ -0,0 +1,225 @@
"""Tests for provider constraints."""
from crewai_files.processing.constraints import (
ANTHROPIC_CONSTRAINTS,
BEDROCK_CONSTRAINTS,
GEMINI_CONSTRAINTS,
OPENAI_CONSTRAINTS,
AudioConstraints,
ImageConstraints,
PDFConstraints,
ProviderConstraints,
VideoConstraints,
get_constraints_for_provider,
)
import pytest
class TestImageConstraints:
"""Tests for ImageConstraints dataclass."""
def test_image_constraints_creation(self):
"""Test creating image constraints with all fields."""
constraints = ImageConstraints(
max_size_bytes=5 * 1024 * 1024,
max_width=8000,
max_height=8000,
max_images_per_request=10,
)
assert constraints.max_size_bytes == 5 * 1024 * 1024
assert constraints.max_width == 8000
assert constraints.max_height == 8000
assert constraints.max_images_per_request == 10
def test_image_constraints_defaults(self):
"""Test image constraints with default values."""
constraints = ImageConstraints(max_size_bytes=1000)
assert constraints.max_size_bytes == 1000
assert constraints.max_width is None
assert constraints.max_height is None
assert constraints.max_images_per_request is None
assert "image/png" in constraints.supported_formats
def test_image_constraints_frozen(self):
"""Test that image constraints are immutable."""
constraints = ImageConstraints(max_size_bytes=1000)
with pytest.raises(Exception):
constraints.max_size_bytes = 2000
class TestPDFConstraints:
"""Tests for PDFConstraints dataclass."""
def test_pdf_constraints_creation(self):
"""Test creating PDF constraints."""
constraints = PDFConstraints(
max_size_bytes=30 * 1024 * 1024,
max_pages=100,
)
assert constraints.max_size_bytes == 30 * 1024 * 1024
assert constraints.max_pages == 100
def test_pdf_constraints_defaults(self):
"""Test PDF constraints with default values."""
constraints = PDFConstraints(max_size_bytes=1000)
assert constraints.max_size_bytes == 1000
assert constraints.max_pages is None
class TestAudioConstraints:
"""Tests for AudioConstraints dataclass."""
def test_audio_constraints_creation(self):
"""Test creating audio constraints."""
constraints = AudioConstraints(
max_size_bytes=100 * 1024 * 1024,
max_duration_seconds=3600,
)
assert constraints.max_size_bytes == 100 * 1024 * 1024
assert constraints.max_duration_seconds == 3600
assert "audio/mp3" in constraints.supported_formats
class TestVideoConstraints:
"""Tests for VideoConstraints dataclass."""
def test_video_constraints_creation(self):
"""Test creating video constraints."""
constraints = VideoConstraints(
max_size_bytes=2 * 1024 * 1024 * 1024,
max_duration_seconds=7200,
)
assert constraints.max_size_bytes == 2 * 1024 * 1024 * 1024
assert constraints.max_duration_seconds == 7200
assert "video/mp4" in constraints.supported_formats
class TestProviderConstraints:
"""Tests for ProviderConstraints dataclass."""
def test_provider_constraints_creation(self):
"""Test creating full provider constraints."""
constraints = ProviderConstraints(
name="test-provider",
image=ImageConstraints(max_size_bytes=5 * 1024 * 1024),
pdf=PDFConstraints(max_size_bytes=30 * 1024 * 1024),
supports_file_upload=True,
file_upload_threshold_bytes=10 * 1024 * 1024,
)
assert constraints.name == "test-provider"
assert constraints.image is not None
assert constraints.pdf is not None
assert constraints.supports_file_upload is True
def test_provider_constraints_defaults(self):
"""Test provider constraints with default values."""
constraints = ProviderConstraints(name="test")
assert constraints.name == "test"
assert constraints.image is None
assert constraints.pdf is None
assert constraints.audio is None
assert constraints.video is None
assert constraints.supports_file_upload is False
class TestPredefinedConstraints:
"""Tests for predefined provider constraints."""
def test_anthropic_constraints(self):
"""Test Anthropic constraints are properly defined."""
assert ANTHROPIC_CONSTRAINTS.name == "anthropic"
assert ANTHROPIC_CONSTRAINTS.image is not None
assert ANTHROPIC_CONSTRAINTS.image.max_size_bytes == 5 * 1024 * 1024
assert ANTHROPIC_CONSTRAINTS.image.max_width == 8000
assert ANTHROPIC_CONSTRAINTS.pdf is not None
assert ANTHROPIC_CONSTRAINTS.pdf.max_pages == 100
assert ANTHROPIC_CONSTRAINTS.supports_file_upload is True
def test_openai_constraints(self):
"""Test OpenAI constraints are properly defined."""
assert OPENAI_CONSTRAINTS.name == "openai"
assert OPENAI_CONSTRAINTS.image is not None
assert OPENAI_CONSTRAINTS.image.max_size_bytes == 20 * 1024 * 1024
assert OPENAI_CONSTRAINTS.pdf is None # OpenAI doesn't support PDFs
def test_gemini_constraints(self):
"""Test Gemini constraints are properly defined."""
assert GEMINI_CONSTRAINTS.name == "gemini"
assert GEMINI_CONSTRAINTS.image is not None
assert GEMINI_CONSTRAINTS.pdf is not None
assert GEMINI_CONSTRAINTS.audio is not None
assert GEMINI_CONSTRAINTS.video is not None
assert GEMINI_CONSTRAINTS.supports_file_upload is True
def test_bedrock_constraints(self):
"""Test Bedrock constraints are properly defined."""
assert BEDROCK_CONSTRAINTS.name == "bedrock"
assert BEDROCK_CONSTRAINTS.image is not None
assert BEDROCK_CONSTRAINTS.image.max_size_bytes == 4_608_000
assert BEDROCK_CONSTRAINTS.pdf is not None
assert BEDROCK_CONSTRAINTS.supports_file_upload is False
class TestGetConstraintsForProvider:
"""Tests for get_constraints_for_provider function."""
def test_get_by_exact_name(self):
"""Test getting constraints by exact provider name."""
result = get_constraints_for_provider("anthropic")
assert result == ANTHROPIC_CONSTRAINTS
result = get_constraints_for_provider("openai")
assert result == OPENAI_CONSTRAINTS
result = get_constraints_for_provider("gemini")
assert result == GEMINI_CONSTRAINTS
def test_get_by_alias(self):
"""Test getting constraints by alias name."""
result = get_constraints_for_provider("claude")
assert result == ANTHROPIC_CONSTRAINTS
result = get_constraints_for_provider("gpt")
assert result == OPENAI_CONSTRAINTS
result = get_constraints_for_provider("google")
assert result == GEMINI_CONSTRAINTS
def test_get_case_insensitive(self):
"""Test case-insensitive lookup."""
result = get_constraints_for_provider("ANTHROPIC")
assert result == ANTHROPIC_CONSTRAINTS
result = get_constraints_for_provider("OpenAI")
assert result == OPENAI_CONSTRAINTS
def test_get_with_provider_constraints_object(self):
"""Test passing ProviderConstraints object returns it unchanged."""
custom = ProviderConstraints(name="custom")
result = get_constraints_for_provider(custom)
assert result is custom
def test_get_unknown_provider(self):
"""Test unknown provider returns None."""
result = get_constraints_for_provider("unknown-provider")
assert result is None
def test_get_by_partial_match(self):
"""Test partial match in provider string."""
result = get_constraints_for_provider("claude-3-sonnet")
assert result == ANTHROPIC_CONSTRAINTS
result = get_constraints_for_provider("gpt-4o")
assert result == OPENAI_CONSTRAINTS
result = get_constraints_for_provider("gemini-pro")
assert result == GEMINI_CONSTRAINTS

View File

@@ -0,0 +1,303 @@
"""Tests for FileProcessor class."""
from crewai_files import FileBytes, ImageFile
from crewai_files.processing.constraints import (
ANTHROPIC_CONSTRAINTS,
ImageConstraints,
ProviderConstraints,
)
from crewai_files.processing.enums import FileHandling
from crewai_files.processing.exceptions import (
FileTooLargeError,
)
from crewai_files.processing.processor import FileProcessor
import pytest
# Minimal valid PNG: 8x8 pixel RGB image (valid for PIL)
MINIMAL_PNG = bytes(
    [
        # PNG signature
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
        # IHDR chunk: 8x8 pixels, bit depth 8, color type 2 (RGB)
        0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52,
        0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x08,
        0x08, 0x02, 0x00, 0x00, 0x00, 0x4B, 0x6D, 0x29, 0xDC,
        # IDAT chunk: zlib-compressed pixel data
        0x00, 0x00, 0x00, 0x12, 0x49, 0x44, 0x41, 0x54,
        0x78, 0x9C, 0x63, 0xFC, 0xCF, 0x80, 0x1D, 0x30, 0xE1,
        0x10, 0x1F, 0xA4, 0x12, 0x00, 0xCD, 0x41, 0x01, 0x0F,
        0xE8, 0x41, 0xE2, 0x6F,
        # IEND chunk
        0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82,
    ]
)
# Minimal valid PDF
MINIMAL_PDF = (
b"%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj "
b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj "
b"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R>>endobj "
b"xref\n0 4\n0000000000 65535 f \n0000000009 00000 n \n"
b"0000000052 00000 n \n0000000101 00000 n \n"
b"trailer<</Size 4/Root 1 0 R>>\nstartxref\n178\n%%EOF"
)
class TestFileProcessorInit:
"""Tests for FileProcessor initialization."""
def test_init_with_constraints(self):
"""Test initialization with ProviderConstraints."""
processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
assert processor.constraints == ANTHROPIC_CONSTRAINTS
def test_init_with_provider_string(self):
"""Test initialization with provider name string."""
processor = FileProcessor(constraints="anthropic")
assert processor.constraints == ANTHROPIC_CONSTRAINTS
def test_init_with_unknown_provider(self):
"""Test initialization with unknown provider sets constraints to None."""
processor = FileProcessor(constraints="unknown")
assert processor.constraints is None
def test_init_with_none_constraints(self):
"""Test initialization with None constraints."""
processor = FileProcessor(constraints=None)
assert processor.constraints is None
class TestFileProcessorValidate:
"""Tests for FileProcessor.validate method."""
def test_validate_valid_file(self):
"""Test validating a valid file returns no errors."""
processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
errors = processor.validate(file)
assert len(errors) == 0
def test_validate_without_constraints(self):
"""Test validating without constraints returns empty list."""
processor = FileProcessor(constraints=None)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
errors = processor.validate(file)
assert len(errors) == 0
def test_validate_strict_raises_on_error(self):
"""Test STRICT mode raises on validation error."""
constraints = ProviderConstraints(
name="test",
image=ImageConstraints(max_size_bytes=10),
)
processor = FileProcessor(constraints=constraints)
# Set mode to strict on the file
file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict"
)
with pytest.raises(FileTooLargeError):
processor.validate(file)
class TestFileProcessorProcess:
"""Tests for FileProcessor.process method."""
def test_process_valid_file(self):
"""Test processing a valid file returns it unchanged."""
processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
result = processor.process(file)
assert result == file
def test_process_without_constraints(self):
"""Test processing without constraints returns file unchanged."""
processor = FileProcessor(constraints=None)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
result = processor.process(file)
assert result == file
def test_process_strict_raises_on_error(self):
"""Test STRICT mode raises on processing error."""
constraints = ProviderConstraints(
name="test",
image=ImageConstraints(max_size_bytes=10),
)
processor = FileProcessor(constraints=constraints)
# Set mode to strict on the file
file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict"
)
with pytest.raises(FileTooLargeError):
processor.process(file)
def test_process_warn_returns_file(self):
"""Test WARN mode returns file with warning."""
constraints = ProviderConstraints(
name="test",
image=ImageConstraints(max_size_bytes=10),
)
processor = FileProcessor(constraints=constraints)
# Set mode to warn on the file
file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="warn"
)
result = processor.process(file)
assert result == file
class TestFileProcessorProcessFiles:
"""Tests for FileProcessor.process_files method."""
def test_process_files_multiple(self):
"""Test processing multiple files."""
processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
files = {
"image1": ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test1.png")
),
"image2": ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test2.png")
),
}
result = processor.process_files(files)
assert len(result) == 2
assert "image1" in result
assert "image2" in result
def test_process_files_empty(self):
"""Test processing empty files dict."""
processor = FileProcessor(constraints=ANTHROPIC_CONSTRAINTS)
result = processor.process_files({})
assert result == {}
class TestFileHandlingEnum:
"""Tests for FileHandling enum."""
def test_enum_values(self):
"""Test all enum values are accessible."""
assert FileHandling.STRICT.value == "strict"
assert FileHandling.AUTO.value == "auto"
assert FileHandling.WARN.value == "warn"
assert FileHandling.CHUNK.value == "chunk"
class TestFileProcessorPerFileMode:
"""Tests for per-file mode handling."""
def test_file_default_mode_is_auto(self):
"""Test that files default to auto mode."""
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
assert file.mode == "auto"
def test_file_custom_mode(self):
"""Test setting custom mode on file."""
file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict"
)
assert file.mode == "strict"
def test_processor_respects_file_mode(self):
"""Test processor uses each file's mode setting."""
constraints = ProviderConstraints(
name="test",
image=ImageConstraints(max_size_bytes=10),
)
processor = FileProcessor(constraints=constraints)
# File with strict mode should raise
strict_file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="strict"
)
with pytest.raises(FileTooLargeError):
processor.process(strict_file)
# File with warn mode should not raise
warn_file = ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test.png"), mode="warn"
)
result = processor.process(warn_file)
assert result == warn_file
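# Illustrative sketch (not part of the test suite): tying the pieces above
# together. A processor built from a provider name validates and processes
# according to each file's own mode; only the public API used in these tests
# is assumed.
def _example_processor_usage() -> None:
    processor = FileProcessor(constraints="anthropic")
    file = ImageFile(
        source=FileBytes(data=MINIMAL_PNG, filename="photo.png"), mode="warn"
    )
    errors = processor.validate(file)  # list of messages; empty when valid
    processed = processor.process(file)  # warn mode returns the file, never raises
    print(errors, processed.filename)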


@@ -0,0 +1,362 @@
"""Unit tests for file transformers."""
import io
from unittest.mock import patch
from crewai_files import ImageFile, PDFFile, TextFile
from crewai_files.core.sources import FileBytes
from crewai_files.processing.exceptions import ProcessingDependencyError
from crewai_files.processing.transformers import (
chunk_pdf,
chunk_text,
get_image_dimensions,
get_pdf_page_count,
optimize_image,
resize_image,
)
import pytest
def create_test_png(width: int = 100, height: int = 100) -> bytes:
"""Create a minimal valid PNG for testing."""
from PIL import Image
img = Image.new("RGB", (width, height), color="red")
buffer = io.BytesIO()
img.save(buffer, format="PNG")
return buffer.getvalue()
def create_test_pdf(num_pages: int = 1) -> bytes:
"""Create a minimal valid PDF for testing."""
from pypdf import PdfWriter
writer = PdfWriter()
for _ in range(num_pages):
writer.add_blank_page(width=612, height=792)
buffer = io.BytesIO()
writer.write(buffer)
return buffer.getvalue()
class TestResizeImage:
"""Tests for resize_image function."""
def test_resize_larger_image(self) -> None:
"""Test resizing an image larger than max dimensions."""
png_bytes = create_test_png(200, 150)
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
result = resize_image(img, max_width=100, max_height=100)
dims = get_image_dimensions(result)
assert dims is not None
width, height = dims
assert width <= 100
assert height <= 100
def test_no_resize_if_within_bounds(self) -> None:
"""Test that small images are returned unchanged."""
png_bytes = create_test_png(50, 50)
img = ImageFile(source=FileBytes(data=png_bytes, filename="small.png"))
result = resize_image(img, max_width=100, max_height=100)
assert result is img
def test_preserve_aspect_ratio(self) -> None:
"""Test that aspect ratio is preserved during resize."""
png_bytes = create_test_png(200, 100)
img = ImageFile(source=FileBytes(data=png_bytes, filename="wide.png"))
result = resize_image(img, max_width=100, max_height=100)
dims = get_image_dimensions(result)
assert dims is not None
width, height = dims
assert width == 100
assert height == 50
def test_resize_without_aspect_ratio(self) -> None:
"""Test resizing without preserving aspect ratio."""
png_bytes = create_test_png(200, 100)
img = ImageFile(source=FileBytes(data=png_bytes, filename="wide.png"))
result = resize_image(
img, max_width=50, max_height=50, preserve_aspect_ratio=False
)
dims = get_image_dimensions(result)
assert dims is not None
width, height = dims
assert width == 50
assert height == 50
def test_resize_returns_image_file(self) -> None:
"""Test that resize returns an ImageFile instance."""
png_bytes = create_test_png(200, 200)
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
result = resize_image(img, max_width=100, max_height=100)
assert isinstance(result, ImageFile)
def test_raises_without_pillow(self) -> None:
"""Test that ProcessingDependencyError is raised without Pillow."""
img = ImageFile(source=FileBytes(data=b"fake", filename="test.png"))
with patch.dict("sys.modules", {"PIL": None, "PIL.Image": None}):
with pytest.raises(ProcessingDependencyError) as exc_info:
# Force reimport to trigger ImportError
import importlib
import crewai_files.processing.transformers as t
importlib.reload(t)
t.resize_image(img, 100, 100)
assert "Pillow" in str(exc_info.value)
class TestOptimizeImage:
"""Tests for optimize_image function."""
def test_optimize_reduces_size(self) -> None:
"""Test that optimization reduces file size."""
png_bytes = create_test_png(500, 500)
original_size = len(png_bytes)
img = ImageFile(source=FileBytes(data=png_bytes, filename="large.png"))
result = optimize_image(img, target_size_bytes=original_size // 2)
result_size = len(result.read())
assert result_size < original_size
def test_no_optimize_if_under_target(self) -> None:
"""Test that small images are returned unchanged."""
png_bytes = create_test_png(50, 50)
img = ImageFile(source=FileBytes(data=png_bytes, filename="small.png"))
result = optimize_image(img, target_size_bytes=1024 * 1024)
assert result is img
def test_optimize_returns_image_file(self) -> None:
"""Test that optimize returns an ImageFile instance."""
png_bytes = create_test_png(200, 200)
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
result = optimize_image(img, target_size_bytes=100)
assert isinstance(result, ImageFile)
def test_optimize_respects_min_quality(self) -> None:
"""Test that optimization stops at minimum quality."""
png_bytes = create_test_png(100, 100)
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
# Request impossibly small size - should stop at min quality
result = optimize_image(img, target_size_bytes=10, min_quality=50)
assert isinstance(result, ImageFile)
assert len(result.read()) > 10
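# Illustrative sketch (not part of the test suite): resize_image and
# optimize_image can be chained to fit an image under a provider's limits,
# shrinking dimensions first and then the encoded size. The limits below are
# arbitrary example values.
def _example_fit_image() -> None:
    img = ImageFile(
        source=FileBytes(data=create_test_png(4000, 3000), filename="big.png")
    )
    img = resize_image(img, max_width=2048, max_height=2048)
    img = optimize_image(img, target_size_bytes=1024 * 1024)
    print(get_image_dimensions(img), len(img.read()))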
class TestChunkPdf:
"""Tests for chunk_pdf function."""
def test_chunk_splits_large_pdf(self) -> None:
"""Test that large PDFs are split into chunks."""
pdf_bytes = create_test_pdf(num_pages=10)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="large.pdf"))
result = list(chunk_pdf(pdf, max_pages=3))
assert len(result) == 4
assert all(isinstance(chunk, PDFFile) for chunk in result)
def test_no_chunk_if_within_limit(self) -> None:
"""Test that small PDFs are returned unchanged."""
pdf_bytes = create_test_pdf(num_pages=3)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="small.pdf"))
result = list(chunk_pdf(pdf, max_pages=5))
assert len(result) == 1
assert result[0] is pdf
def test_chunk_filenames(self) -> None:
"""Test that chunked files have indexed filenames."""
pdf_bytes = create_test_pdf(num_pages=6)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="document.pdf"))
result = list(chunk_pdf(pdf, max_pages=2))
assert result[0].filename == "document_chunk_0.pdf"
assert result[1].filename == "document_chunk_1.pdf"
assert result[2].filename == "document_chunk_2.pdf"
def test_chunk_with_overlap(self) -> None:
"""Test chunking with overlapping pages."""
pdf_bytes = create_test_pdf(num_pages=10)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="doc.pdf"))
result = list(chunk_pdf(pdf, max_pages=4, overlap_pages=1))
# With overlap, we get more chunks
assert len(result) >= 3
def test_chunk_page_counts(self) -> None:
"""Test that each chunk has correct page count."""
pdf_bytes = create_test_pdf(num_pages=7)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="doc.pdf"))
result = list(chunk_pdf(pdf, max_pages=3))
page_counts = [get_pdf_page_count(chunk) for chunk in result]
assert page_counts == [3, 3, 1]
class TestChunkText:
"""Tests for chunk_text function."""
def test_chunk_splits_large_text(self) -> None:
"""Test that large text files are split into chunks."""
content = "Hello world. " * 100
text = TextFile(source=content.encode(), filename="large.txt")
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
assert len(result) > 1
assert all(isinstance(chunk, TextFile) for chunk in result)
def test_no_chunk_if_within_limit(self) -> None:
"""Test that small text files are returned unchanged."""
content = "Short text"
text = TextFile(source=content.encode(), filename="small.txt")
result = list(chunk_text(text, max_chars=1000, overlap_chars=0))
assert len(result) == 1
assert result[0] is text
def test_chunk_filenames(self) -> None:
"""Test that chunked files have indexed filenames."""
content = "A" * 500
text = TextFile(source=FileBytes(data=content.encode(), filename="data.txt"))
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
assert result[0].filename == "data_chunk_0.txt"
assert result[1].filename == "data_chunk_1.txt"
assert len(result) == 3
def test_chunk_preserves_extension(self) -> None:
"""Test that file extension is preserved in chunks."""
content = "A" * 500
text = TextFile(source=FileBytes(data=content.encode(), filename="script.py"))
result = list(chunk_text(text, max_chars=200, overlap_chars=0))
assert all(chunk.filename.endswith(".py") for chunk in result)
def test_chunk_prefers_newline_boundaries(self) -> None:
"""Test that chunking prefers to split at newlines."""
content = "Line one\nLine two\nLine three\nLine four\nLine five"
text = TextFile(source=content.encode(), filename="lines.txt")
result = list(
chunk_text(text, max_chars=25, overlap_chars=0, split_on_newlines=True)
)
# Should split at newline boundaries
for chunk in result:
chunk_text_content = chunk.read().decode()
# Chunks should end at newlines (except possibly the last)
if chunk != result[-1]:
assert (
chunk_text_content.endswith("\n") or len(chunk_text_content) <= 25
)
def test_chunk_with_overlap(self) -> None:
"""Test chunking with overlapping characters."""
content = "ABCDEFGHIJ" * 10
text = TextFile(source=content.encode(), filename="data.txt")
result = list(chunk_text(text, max_chars=30, overlap_chars=5))
# With overlap, chunks should share some content
assert len(result) >= 3
def test_chunk_overlap_larger_than_max_chars(self) -> None:
"""Test that overlap > max_chars doesn't cause infinite loop."""
content = "A" * 100
text = TextFile(source=content.encode(), filename="data.txt")
# overlap_chars > max_chars should still work (just with max overlap)
result = list(chunk_text(text, max_chars=20, overlap_chars=50))
assert len(result) > 1
# Should still complete without hanging
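# Illustrative sketch (not part of the test suite): oversized documents can be
# fed to a model in pieces; chunk_pdf and chunk_text yield indexed chunks that
# can be consumed lazily. The limits below are arbitrary example values.
def _example_chunk_large_inputs() -> None:
    pdf = PDFFile(
        source=FileBytes(data=create_test_pdf(num_pages=25), filename="report.pdf")
    )
    for part in chunk_pdf(pdf, max_pages=10, overlap_pages=1):
        print(part.filename, get_pdf_page_count(part))
    notes = TextFile(source=FileBytes(data=b"word " * 10_000, filename="notes.txt"))
    for part in chunk_text(notes, max_chars=4_000, overlap_chars=200):
        print(part.filename, len(part.read()))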
class TestGetImageDimensions:
"""Tests for get_image_dimensions function."""
def test_get_dimensions(self) -> None:
"""Test getting image dimensions."""
png_bytes = create_test_png(150, 100)
img = ImageFile(source=FileBytes(data=png_bytes, filename="test.png"))
dims = get_image_dimensions(img)
assert dims == (150, 100)
def test_returns_none_for_invalid_image(self) -> None:
"""Test that None is returned for invalid image data."""
img = ImageFile(source=FileBytes(data=b"not an image", filename="bad.png"))
dims = get_image_dimensions(img)
assert dims is None
def test_returns_none_without_pillow(self) -> None:
    """Test that None is returned when Pillow is not installed."""
    # Patching sys.modules cannot undo an import of PIL that has already
    # happened inside the module under test, so skip explicitly rather
    # than passing vacuously.
    pytest.skip("Cannot simulate a missing Pillow install without unloading PIL")
class TestGetPdfPageCount:
"""Tests for get_pdf_page_count function."""
def test_get_page_count(self) -> None:
"""Test getting PDF page count."""
pdf_bytes = create_test_pdf(num_pages=5)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="test.pdf"))
count = get_pdf_page_count(pdf)
assert count == 5
def test_single_page(self) -> None:
"""Test page count for single page PDF."""
pdf_bytes = create_test_pdf(num_pages=1)
pdf = PDFFile(source=FileBytes(data=pdf_bytes, filename="single.pdf"))
count = get_pdf_page_count(pdf)
assert count == 1
def test_returns_none_for_invalid_pdf(self) -> None:
"""Test that None is returned for invalid PDF data."""
pdf = PDFFile(source=FileBytes(data=b"not a pdf", filename="bad.pdf"))
count = get_pdf_page_count(pdf)
assert count is None


@@ -0,0 +1,644 @@
"""Tests for file validators."""
from unittest.mock import patch
from crewai_files import AudioFile, FileBytes, ImageFile, PDFFile, TextFile, VideoFile
from crewai_files.processing.constraints import (
ANTHROPIC_CONSTRAINTS,
AudioConstraints,
ImageConstraints,
PDFConstraints,
ProviderConstraints,
VideoConstraints,
)
from crewai_files.processing.exceptions import (
FileTooLargeError,
FileValidationError,
UnsupportedFileTypeError,
)
from crewai_files.processing.validators import (
_get_audio_duration,
_get_video_duration,
validate_audio,
validate_file,
validate_image,
validate_pdf,
validate_text,
validate_video,
)
import pytest
# Minimal valid PNG: 8x8 pixel RGB image (valid for PIL)
MINIMAL_PNG = bytes(
    [
        # PNG signature
        0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A,
        # IHDR chunk: 8x8 pixels, bit depth 8, color type 2 (RGB)
        0x00, 0x00, 0x00, 0x0D, 0x49, 0x48, 0x44, 0x52,
        0x00, 0x00, 0x00, 0x08, 0x00, 0x00, 0x00, 0x08,
        0x08, 0x02, 0x00, 0x00, 0x00, 0x4B, 0x6D, 0x29, 0xDC,
        # IDAT chunk: zlib-compressed pixel data
        0x00, 0x00, 0x00, 0x12, 0x49, 0x44, 0x41, 0x54,
        0x78, 0x9C, 0x63, 0xFC, 0xCF, 0x80, 0x1D, 0x30, 0xE1,
        0x10, 0x1F, 0xA4, 0x12, 0x00, 0xCD, 0x41, 0x01, 0x0F,
        0xE8, 0x41, 0xE2, 0x6F,
        # IEND chunk
        0x00, 0x00, 0x00, 0x00, 0x49, 0x45, 0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82,
    ]
)
# Minimal valid PDF
MINIMAL_PDF = (
b"%PDF-1.4\n1 0 obj<</Type/Catalog/Pages 2 0 R>>endobj "
b"2 0 obj<</Type/Pages/Kids[3 0 R]/Count 1>>endobj "
b"3 0 obj<</Type/Page/MediaBox[0 0 612 792]/Parent 2 0 R>>endobj "
b"xref\n0 4\n0000000000 65535 f \n0000000009 00000 n \n"
b"0000000052 00000 n \n0000000101 00000 n \n"
b"trailer<</Size 4/Root 1 0 R>>\nstartxref\n178\n%%EOF"
)
class TestValidateImage:
"""Tests for validate_image function."""
def test_validate_valid_image(self):
"""Test validating a valid image within constraints."""
constraints = ImageConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("image/png",),
)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
errors = validate_image(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_image_too_large(self):
"""Test validating an image that exceeds size limit."""
constraints = ImageConstraints(
max_size_bytes=10, # Very small limit
supported_formats=("image/png",),
)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
with pytest.raises(FileTooLargeError) as exc_info:
validate_image(file, constraints)
assert "exceeds" in str(exc_info.value)
assert exc_info.value.file_name == "test.png"
def test_validate_image_unsupported_format(self):
"""Test validating an image with unsupported format."""
constraints = ImageConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("image/jpeg",), # Only JPEG
)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
with pytest.raises(UnsupportedFileTypeError) as exc_info:
validate_image(file, constraints)
assert "not supported" in str(exc_info.value)
def test_validate_image_no_raise(self):
"""Test validating with raise_on_error=False returns errors list."""
constraints = ImageConstraints(
max_size_bytes=10,
supported_formats=("image/jpeg",),
)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
errors = validate_image(file, constraints, raise_on_error=False)
assert len(errors) == 2 # Size error and format error
class TestValidatePDF:
"""Tests for validate_pdf function."""
def test_validate_valid_pdf(self):
"""Test validating a valid PDF within constraints."""
constraints = PDFConstraints(
max_size_bytes=10 * 1024 * 1024,
)
file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))
errors = validate_pdf(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_pdf_too_large(self):
"""Test validating a PDF that exceeds size limit."""
constraints = PDFConstraints(
max_size_bytes=10, # Very small limit
)
file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))
with pytest.raises(FileTooLargeError) as exc_info:
validate_pdf(file, constraints)
assert "exceeds" in str(exc_info.value)
class TestValidateText:
"""Tests for validate_text function."""
def test_validate_valid_text(self):
"""Test validating a valid text file."""
constraints = ProviderConstraints(
name="test",
general_max_size_bytes=10 * 1024 * 1024,
)
file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))
errors = validate_text(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_text_too_large(self):
"""Test validating text that exceeds size limit."""
constraints = ProviderConstraints(
name="test",
general_max_size_bytes=5,
)
file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))
with pytest.raises(FileTooLargeError):
validate_text(file, constraints)
def test_validate_text_no_limit(self):
"""Test validating text with no size limit."""
constraints = ProviderConstraints(name="test")
file = TextFile(source=FileBytes(data=b"Hello, World!", filename="test.txt"))
errors = validate_text(file, constraints, raise_on_error=False)
assert len(errors) == 0
class TestValidateFile:
"""Tests for validate_file function."""
def test_validate_file_dispatches_to_image(self):
"""Test validate_file dispatches to image validator."""
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
errors = validate_file(file, ANTHROPIC_CONSTRAINTS, raise_on_error=False)
assert len(errors) == 0
def test_validate_file_dispatches_to_pdf(self):
"""Test validate_file dispatches to PDF validator."""
file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))
errors = validate_file(file, ANTHROPIC_CONSTRAINTS, raise_on_error=False)
assert len(errors) == 0
def test_validate_file_unsupported_type(self):
"""Test validating a file type not supported by provider."""
constraints = ProviderConstraints(
name="test",
image=None, # No image support
)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
with pytest.raises(UnsupportedFileTypeError) as exc_info:
validate_file(file, constraints)
assert "does not support images" in str(exc_info.value)
def test_validate_file_pdf_not_supported(self):
"""Test validating PDF when provider doesn't support it."""
constraints = ProviderConstraints(
name="test",
pdf=None, # No PDF support
)
file = PDFFile(source=FileBytes(data=MINIMAL_PDF, filename="test.pdf"))
with pytest.raises(UnsupportedFileTypeError) as exc_info:
validate_file(file, constraints)
assert "does not support PDFs" in str(exc_info.value)
# Minimal audio bytes for testing (not a valid audio file, used for mocked tests)
MINIMAL_AUDIO = b"\x00" * 100
# Minimal video bytes for testing (not a valid video file, used for mocked tests)
MINIMAL_VIDEO = b"\x00" * 100
# Fallback content type when python-magic cannot detect
FALLBACK_CONTENT_TYPE = "application/octet-stream"
class TestValidateAudio:
"""Tests for validate_audio function and audio duration validation."""
def test_validate_valid_audio(self):
"""Test validating a valid audio file within constraints."""
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
errors = validate_audio(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_audio_too_large(self):
"""Test validating an audio file that exceeds size limit."""
constraints = AudioConstraints(
max_size_bytes=10, # Very small limit
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
with pytest.raises(FileTooLargeError) as exc_info:
validate_audio(file, constraints)
assert "exceeds" in str(exc_info.value)
assert exc_info.value.file_name == "test.mp3"
def test_validate_audio_unsupported_format(self):
"""Test validating an audio file with unsupported format."""
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("audio/wav",), # Only WAV
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
with pytest.raises(UnsupportedFileTypeError) as exc_info:
validate_audio(file, constraints)
assert "not supported" in str(exc_info.value)
@patch("crewai_files.processing.validators._get_audio_duration")
def test_validate_audio_duration_passes(self, mock_get_duration):
"""Test validating audio when duration is under limit."""
mock_get_duration.return_value = 30.0
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
errors = validate_audio(file, constraints, raise_on_error=False)
assert len(errors) == 0
mock_get_duration.assert_called_once()
@patch("crewai_files.processing.validators._get_audio_duration")
def test_validate_audio_duration_fails(self, mock_get_duration):
"""Test validating audio when duration exceeds limit."""
mock_get_duration.return_value = 120.5
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
with pytest.raises(FileValidationError) as exc_info:
validate_audio(file, constraints)
assert "duration" in str(exc_info.value).lower()
assert "120.5s" in str(exc_info.value)
assert "60s" in str(exc_info.value)
@patch("crewai_files.processing.validators._get_audio_duration")
def test_validate_audio_duration_no_raise(self, mock_get_duration):
"""Test audio duration validation with raise_on_error=False."""
mock_get_duration.return_value = 120.5
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
errors = validate_audio(file, constraints, raise_on_error=False)
assert len(errors) == 1
assert "duration" in errors[0].lower()
@patch("crewai_files.processing.validators._get_audio_duration")
def test_validate_audio_duration_none_skips(self, mock_get_duration):
"""Test that duration validation is skipped when max_duration_seconds is None."""
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=None,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
errors = validate_audio(file, constraints, raise_on_error=False)
assert len(errors) == 0
mock_get_duration.assert_not_called()
@patch("crewai_files.processing.validators._get_audio_duration")
def test_validate_audio_duration_detection_returns_none(self, mock_get_duration):
"""Test that validation passes when duration detection returns None."""
mock_get_duration.return_value = None
constraints = AudioConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("audio/mp3", "audio/mpeg", FALLBACK_CONTENT_TYPE),
)
file = AudioFile(source=FileBytes(data=MINIMAL_AUDIO, filename="test.mp3"))
errors = validate_audio(file, constraints, raise_on_error=False)
assert len(errors) == 0
class TestValidateVideo:
"""Tests for validate_video function and video duration validation."""
def test_validate_valid_video(self):
"""Test validating a valid video file within constraints."""
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_video_too_large(self):
"""Test validating a video file that exceeds size limit."""
constraints = VideoConstraints(
max_size_bytes=10, # Very small limit
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
with pytest.raises(FileTooLargeError) as exc_info:
validate_video(file, constraints)
assert "exceeds" in str(exc_info.value)
assert exc_info.value.file_name == "test.mp4"
def test_validate_video_unsupported_format(self):
"""Test validating a video file with unsupported format."""
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
supported_formats=("video/webm",), # Only WebM
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
with pytest.raises(UnsupportedFileTypeError) as exc_info:
validate_video(file, constraints)
assert "not supported" in str(exc_info.value)
@patch("crewai_files.processing.validators._get_video_duration")
def test_validate_video_duration_passes(self, mock_get_duration):
"""Test validating video when duration is under limit."""
mock_get_duration.return_value = 30.0
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 0
mock_get_duration.assert_called_once()
@patch("crewai_files.processing.validators._get_video_duration")
def test_validate_video_duration_fails(self, mock_get_duration):
"""Test validating video when duration exceeds limit."""
mock_get_duration.return_value = 180.0
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
with pytest.raises(FileValidationError) as exc_info:
validate_video(file, constraints)
assert "duration" in str(exc_info.value).lower()
assert "180.0s" in str(exc_info.value)
assert "60s" in str(exc_info.value)
@patch("crewai_files.processing.validators._get_video_duration")
def test_validate_video_duration_no_raise(self, mock_get_duration):
"""Test video duration validation with raise_on_error=False."""
mock_get_duration.return_value = 180.0
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 1
assert "duration" in errors[0].lower()
@patch("crewai_files.processing.validators._get_video_duration")
def test_validate_video_duration_none_skips(self, mock_get_duration):
"""Test that duration validation is skipped when max_duration_seconds is None."""
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=None,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 0
mock_get_duration.assert_not_called()
@patch("crewai_files.processing.validators._get_video_duration")
def test_validate_video_duration_detection_returns_none(self, mock_get_duration):
"""Test that validation passes when duration detection returns None."""
mock_get_duration.return_value = None
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("video/mp4", FALLBACK_CONTENT_TYPE),
)
file = VideoFile(source=FileBytes(data=MINIMAL_VIDEO, filename="test.mp4"))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 0
class TestGetAudioDuration:
"""Tests for _get_audio_duration helper function."""
def test_get_audio_duration_corrupt_file(self):
"""Test handling of corrupt audio data."""
corrupt_data = b"not valid audio data at all"
result = _get_audio_duration(corrupt_data)
assert result is None
class TestGetVideoDuration:
"""Tests for _get_video_duration helper function."""
def test_get_video_duration_corrupt_file(self):
"""Test handling of corrupt video data."""
corrupt_data = b"not valid video data at all"
result = _get_video_duration(corrupt_data)
assert result is None
class TestRealVideoFile:
"""Tests using real video fixture file."""
@pytest.fixture
def sample_video_path(self):
"""Path to sample video fixture."""
from pathlib import Path
path = Path(__file__).parent.parent.parent / "fixtures" / "sample_video.mp4"
if not path.exists():
pytest.skip("sample_video.mp4 fixture not found")
return path
@pytest.fixture
def sample_video_content(self, sample_video_path):
"""Read sample video content."""
return sample_video_path.read_bytes()
def test_get_video_duration_real_file(self, sample_video_content):
"""Test duration detection with real video file."""
try:
import av # noqa: F401
except ImportError:
pytest.skip("PyAV not installed")
duration = _get_video_duration(sample_video_content, "video/mp4")
assert duration is not None
assert 4.5 <= duration <= 5.5 # ~5 seconds with tolerance
def test_get_video_duration_real_file_no_format_hint(self, sample_video_content):
"""Test duration detection without format hint."""
try:
import av # noqa: F401
except ImportError:
pytest.skip("PyAV not installed")
duration = _get_video_duration(sample_video_content)
assert duration is not None
assert 4.5 <= duration <= 5.5
def test_validate_video_real_file_passes(self, sample_video_path):
"""Test validating real video file within constraints."""
try:
import av # noqa: F401
except ImportError:
pytest.skip("PyAV not installed")
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=60,
supported_formats=("video/mp4",),
)
file = VideoFile(source=str(sample_video_path))
errors = validate_video(file, constraints, raise_on_error=False)
assert len(errors) == 0
def test_validate_video_real_file_duration_exceeded(self, sample_video_path):
"""Test validating real video file that exceeds duration limit."""
try:
import av # noqa: F401
except ImportError:
pytest.skip("PyAV not installed")
constraints = VideoConstraints(
max_size_bytes=10 * 1024 * 1024,
max_duration_seconds=2, # Video is ~5 seconds
supported_formats=("video/mp4",),
)
file = VideoFile(source=str(sample_video_path))
with pytest.raises(FileValidationError) as exc_info:
validate_video(file, constraints)
assert "duration" in str(exc_info.value).lower()
assert "2s" in str(exc_info.value)


@@ -0,0 +1,311 @@
"""Tests for FileUrl source type and URL resolution."""
from unittest.mock import AsyncMock, MagicMock, patch
from crewai_files import FileBytes, FileUrl, ImageFile
from crewai_files.core.resolved import InlineBase64, UrlReference
from crewai_files.core.sources import FilePath, _normalize_source
from crewai_files.resolution.resolver import FileResolver
import pytest
class TestFileUrl:
"""Tests for FileUrl source type."""
def test_create_file_url(self):
"""Test creating FileUrl with valid URL."""
url = FileUrl(url="https://example.com/image.png")
assert url.url == "https://example.com/image.png"
assert url.filename is None
def test_create_file_url_with_filename(self):
"""Test creating FileUrl with custom filename."""
url = FileUrl(url="https://example.com/image.png", filename="custom.png")
assert url.url == "https://example.com/image.png"
assert url.filename == "custom.png"
def test_invalid_url_scheme_raises(self):
"""Test that non-http(s) URLs raise ValueError."""
with pytest.raises(ValueError, match="Invalid URL scheme"):
FileUrl(url="ftp://example.com/file.txt")
def test_invalid_url_scheme_file_raises(self):
"""Test that file:// URLs raise ValueError."""
with pytest.raises(ValueError, match="Invalid URL scheme"):
FileUrl(url="file:///path/to/file.txt")
def test_http_url_valid(self):
"""Test that HTTP URLs are valid."""
url = FileUrl(url="http://example.com/image.jpg")
assert url.url == "http://example.com/image.jpg"
def test_https_url_valid(self):
"""Test that HTTPS URLs are valid."""
url = FileUrl(url="https://example.com/image.jpg")
assert url.url == "https://example.com/image.jpg"
def test_content_type_guessing_png(self):
"""Test content type guessing for PNG files."""
url = FileUrl(url="https://example.com/image.png")
assert url.content_type == "image/png"
def test_content_type_guessing_jpeg(self):
"""Test content type guessing for JPEG files."""
url = FileUrl(url="https://example.com/photo.jpg")
assert url.content_type == "image/jpeg"
def test_content_type_guessing_pdf(self):
"""Test content type guessing for PDF files."""
url = FileUrl(url="https://example.com/document.pdf")
assert url.content_type == "application/pdf"
def test_content_type_guessing_with_query_params(self):
"""Test content type guessing with URL query parameters."""
url = FileUrl(url="https://example.com/image.png?v=123&token=abc")
assert url.content_type == "image/png"
def test_content_type_fallback_unknown(self):
"""Test content type falls back to octet-stream for unknown extensions."""
url = FileUrl(url="https://example.com/file.unknownext123")
assert url.content_type == "application/octet-stream"
def test_content_type_no_extension(self):
"""Test content type for URL without extension."""
url = FileUrl(url="https://example.com/file")
assert url.content_type == "application/octet-stream"
def test_read_fetches_content(self):
"""Test that read() fetches content from URL."""
url = FileUrl(url="https://example.com/image.png")
mock_response = MagicMock()
mock_response.content = b"fake image content"
mock_response.headers = {"content-type": "image/png"}
with patch("httpx.get", return_value=mock_response) as mock_get:
content = url.read()
mock_get.assert_called_once_with(
"https://example.com/image.png", follow_redirects=True
)
assert content == b"fake image content"
def test_read_caches_content(self):
"""Test that read() caches content."""
url = FileUrl(url="https://example.com/image.png")
mock_response = MagicMock()
mock_response.content = b"fake content"
mock_response.headers = {}
with patch("httpx.get", return_value=mock_response) as mock_get:
content1 = url.read()
content2 = url.read()
mock_get.assert_called_once()
assert content1 == content2
def test_read_updates_content_type_from_response(self):
"""Test that read() updates content type from response headers."""
url = FileUrl(url="https://example.com/file")
mock_response = MagicMock()
mock_response.content = b"fake content"
mock_response.headers = {"content-type": "image/webp; charset=utf-8"}
with patch("httpx.get", return_value=mock_response):
url.read()
assert url.content_type == "image/webp"
@pytest.mark.asyncio
async def test_aread_fetches_content(self):
"""Test that aread() fetches content from URL asynchronously."""
url = FileUrl(url="https://example.com/image.png")
mock_response = MagicMock()
mock_response.content = b"async fake content"
mock_response.headers = {"content-type": "image/png"}
mock_response.raise_for_status = MagicMock()
mock_client = MagicMock()
mock_client.get = AsyncMock(return_value=mock_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=None)
with patch("httpx.AsyncClient", return_value=mock_client):
content = await url.aread()
assert content == b"async fake content"
@pytest.mark.asyncio
async def test_aread_caches_content(self):
"""Test that aread() caches content."""
url = FileUrl(url="https://example.com/image.png")
mock_response = MagicMock()
mock_response.content = b"cached content"
mock_response.headers = {}
mock_response.raise_for_status = MagicMock()
mock_client = MagicMock()
mock_client.get = AsyncMock(return_value=mock_response)
mock_client.__aenter__ = AsyncMock(return_value=mock_client)
mock_client.__aexit__ = AsyncMock(return_value=None)
with patch("httpx.AsyncClient", return_value=mock_client):
content1 = await url.aread()
content2 = await url.aread()
mock_client.get.assert_called_once()
assert content1 == content2
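# Illustrative sketch (not part of the test suite): FileUrl guesses a content
# type from the extension up front and only downloads on first read(), caching
# the bytes afterwards. The URL is an arbitrary example.
def _example_lazy_url() -> None:
    remote = FileUrl(url="https://example.com/diagram.png")
    print(remote.content_type)  # "image/png", guessed before any request
    data = remote.read()  # first call performs the HTTP GET
    data_again = remote.read()  # served from the cache, no second request
    print(len(data), data == data_again)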
class TestNormalizeSource:
"""Tests for _normalize_source with URL detection."""
def test_normalize_url_string(self):
"""Test that URL strings are converted to FileUrl."""
result = _normalize_source("https://example.com/image.png")
assert isinstance(result, FileUrl)
assert result.url == "https://example.com/image.png"
def test_normalize_http_url_string(self):
"""Test that HTTP URL strings are converted to FileUrl."""
result = _normalize_source("http://example.com/file.pdf")
assert isinstance(result, FileUrl)
assert result.url == "http://example.com/file.pdf"
def test_normalize_file_path_string(self, tmp_path):
"""Test that file path strings are converted to FilePath."""
test_file = tmp_path / "test.png"
test_file.write_bytes(b"test content")
result = _normalize_source(str(test_file))
assert isinstance(result, FilePath)
def test_normalize_url_string_is_not_file_path(self):
    """Test that URL strings are treated as URLs, not file paths."""
    result = _normalize_source("https://example.com/file.png")
    assert isinstance(result, FileUrl)
    assert not isinstance(result, FilePath)
def test_normalize_file_url_passthrough(self):
"""Test that FileUrl instances pass through unchanged."""
original = FileUrl(url="https://example.com/image.png")
result = _normalize_source(original)
assert result is original
class TestResolverUrlHandling:
"""Tests for FileResolver URL handling."""
def test_resolve_url_source_for_supported_provider(self):
"""Test URL source resolves to UrlReference for supported providers."""
resolver = FileResolver()
file = ImageFile(source=FileUrl(url="https://example.com/image.png"))
resolved = resolver.resolve(file, "anthropic")
assert isinstance(resolved, UrlReference)
assert resolved.url == "https://example.com/image.png"
assert resolved.content_type == "image/png"
def test_resolve_url_source_openai(self):
"""Test URL source resolves to UrlReference for OpenAI."""
resolver = FileResolver()
file = ImageFile(source=FileUrl(url="https://example.com/photo.jpg"))
resolved = resolver.resolve(file, "openai")
assert isinstance(resolved, UrlReference)
assert resolved.url == "https://example.com/photo.jpg"
def test_resolve_url_source_gemini(self):
"""Test URL source resolves to UrlReference for Gemini."""
resolver = FileResolver()
file = ImageFile(source=FileUrl(url="https://example.com/image.webp"))
resolved = resolver.resolve(file, "gemini")
assert isinstance(resolved, UrlReference)
assert resolved.url == "https://example.com/image.webp"
def test_resolve_url_source_azure(self):
"""Test URL source resolves to UrlReference for Azure."""
resolver = FileResolver()
file = ImageFile(source=FileUrl(url="https://example.com/image.gif"))
resolved = resolver.resolve(file, "azure")
assert isinstance(resolved, UrlReference)
assert resolved.url == "https://example.com/image.gif"
def test_resolve_url_source_bedrock_fetches_content(self):
"""Test URL source fetches content for Bedrock (unsupported URLs)."""
resolver = FileResolver()
file_url = FileUrl(url="https://example.com/image.png")
file = ImageFile(source=file_url)
mock_response = MagicMock()
mock_response.content = b"\x89PNG\r\n\x1a\n" + b"\x00" * 50
mock_response.headers = {"content-type": "image/png"}
with patch("httpx.get", return_value=mock_response):
resolved = resolver.resolve(file, "bedrock")
assert not isinstance(resolved, UrlReference)
def test_resolve_bytes_source_still_works(self):
"""Test that bytes source still resolves normally."""
resolver = FileResolver()
minimal_png = (
b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x08\x00\x00\x00\x08"
b"\x01\x00\x00\x00\x00\xf9Y\xab\xcd\x00\x00\x00\nIDATx\x9cc`\x00\x00"
b"\x00\x02\x00\x01\xe2!\xbc3\x00\x00\x00\x00IEND\xaeB`\x82"
)
file = ImageFile(source=FileBytes(data=minimal_png, filename="test.png"))
resolved = resolver.resolve(file, "anthropic")
assert isinstance(resolved, InlineBase64)
@pytest.mark.asyncio
async def test_aresolve_url_source(self):
"""Test async URL resolution for supported provider."""
resolver = FileResolver()
file = ImageFile(source=FileUrl(url="https://example.com/image.png"))
resolved = await resolver.aresolve(file, "anthropic")
assert isinstance(resolved, UrlReference)
assert resolved.url == "https://example.com/image.png"
class TestImageFileWithUrl:
"""Tests for creating ImageFile with URL source."""
def test_image_file_from_url_string(self):
"""Test creating ImageFile from URL string."""
file = ImageFile(source="https://example.com/image.png")
assert isinstance(file.source, FileUrl)
assert file.source.url == "https://example.com/image.png"
def test_image_file_from_file_url(self):
"""Test creating ImageFile from FileUrl instance."""
url = FileUrl(url="https://example.com/photo.jpg")
file = ImageFile(source=url)
assert file.source is url
assert file.content_type == "image/jpeg"


@@ -0,0 +1,134 @@
"""Tests for resolved file types."""
from datetime import datetime, timezone
from crewai_files.core.resolved import (
FileReference,
InlineBase64,
InlineBytes,
ResolvedFile,
UrlReference,
)
import pytest
class TestInlineBase64:
"""Tests for InlineBase64 resolved type."""
def test_create_inline_base64(self):
"""Test creating InlineBase64 instance."""
resolved = InlineBase64(
content_type="image/png",
data="iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAYAAAAfFcSJAAAADUlEQVR42mNk+M9QDwADhgGAWjR9awAAAABJRU5ErkJggg==",
)
assert resolved.content_type == "image/png"
assert len(resolved.data) > 0
def test_inline_base64_is_resolved_file(self):
"""Test InlineBase64 is a ResolvedFile."""
resolved = InlineBase64(content_type="image/png", data="abc123")
assert isinstance(resolved, ResolvedFile)
def test_inline_base64_frozen(self):
"""Test InlineBase64 is immutable."""
resolved = InlineBase64(content_type="image/png", data="abc123")
with pytest.raises(Exception):
resolved.data = "xyz789"
class TestInlineBytes:
"""Tests for InlineBytes resolved type."""
def test_create_inline_bytes(self):
"""Test creating InlineBytes instance."""
data = b"\x89PNG\r\n\x1a\n"
resolved = InlineBytes(
content_type="image/png",
data=data,
)
assert resolved.content_type == "image/png"
assert resolved.data == data
def test_inline_bytes_is_resolved_file(self):
"""Test InlineBytes is a ResolvedFile."""
resolved = InlineBytes(content_type="image/png", data=b"test")
assert isinstance(resolved, ResolvedFile)
class TestFileReference:
"""Tests for FileReference resolved type."""
def test_create_file_reference(self):
"""Test creating FileReference instance."""
resolved = FileReference(
content_type="image/png",
file_id="file-abc123",
provider="gemini",
)
assert resolved.content_type == "image/png"
assert resolved.file_id == "file-abc123"
assert resolved.provider == "gemini"
assert resolved.expires_at is None
assert resolved.file_uri is None
def test_file_reference_with_expiry(self):
"""Test FileReference with expiry time."""
expiry = datetime.now(timezone.utc)
resolved = FileReference(
content_type="application/pdf",
file_id="file-xyz789",
provider="gemini",
expires_at=expiry,
)
assert resolved.expires_at == expiry
def test_file_reference_with_uri(self):
"""Test FileReference with URI."""
resolved = FileReference(
content_type="video/mp4",
file_id="file-video123",
provider="gemini",
file_uri="https://generativelanguage.googleapis.com/v1/files/file-video123",
)
assert resolved.file_uri is not None
def test_file_reference_is_resolved_file(self):
"""Test FileReference is a ResolvedFile."""
resolved = FileReference(
content_type="image/png",
file_id="file-123",
provider="anthropic",
)
assert isinstance(resolved, ResolvedFile)
class TestUrlReference:
"""Tests for UrlReference resolved type."""
def test_create_url_reference(self):
"""Test creating UrlReference instance."""
resolved = UrlReference(
content_type="image/png",
url="https://storage.googleapis.com/bucket/image.png",
)
assert resolved.content_type == "image/png"
assert resolved.url == "https://storage.googleapis.com/bucket/image.png"
def test_url_reference_is_resolved_file(self):
"""Test UrlReference is a ResolvedFile."""
resolved = UrlReference(
content_type="image/jpeg",
url="https://example.com/photo.jpg",
)
assert isinstance(resolved, ResolvedFile)
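# Illustrative sketch (not part of the test suite): downstream formatters can
# branch on the concrete ResolvedFile subtype when assembling a provider
# payload. The dict shapes below are invented for illustration, not a real
# provider wire format.
def _example_payload(resolved: ResolvedFile) -> dict:
    if isinstance(resolved, InlineBase64):
        return {"type": "base64", "media_type": resolved.content_type, "data": resolved.data}
    if isinstance(resolved, UrlReference):
        return {"type": "url", "url": resolved.url}
    if isinstance(resolved, FileReference):
        return {"type": "file", "file_id": resolved.file_id}
    if isinstance(resolved, InlineBytes):
        return {"type": "bytes", "media_type": resolved.content_type}
    raise TypeError(f"unhandled resolved type: {type(resolved).__name__}")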


@@ -0,0 +1,176 @@
"""Tests for FileResolver."""
from crewai_files import FileBytes, ImageFile
from crewai_files.cache.upload_cache import UploadCache
from crewai_files.core.resolved import InlineBase64, InlineBytes
from crewai_files.resolution.resolver import (
FileResolver,
FileResolverConfig,
create_resolver,
)
# Minimal valid PNG
MINIMAL_PNG = (
b"\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x08\x00\x00\x00\x08"
b"\x01\x00\x00\x00\x00\xf9Y\xab\xcd\x00\x00\x00\nIDATx\x9cc`\x00\x00"
b"\x00\x02\x00\x01\xe2!\xbc3\x00\x00\x00\x00IEND\xaeB`\x82"
)
class TestFileResolverConfig:
"""Tests for FileResolverConfig."""
def test_default_config(self):
"""Test default configuration values."""
config = FileResolverConfig()
assert config.prefer_upload is False
assert config.upload_threshold_bytes is None
assert config.use_bytes_for_bedrock is True
def test_custom_config(self):
"""Test custom configuration values."""
config = FileResolverConfig(
prefer_upload=True,
upload_threshold_bytes=1024 * 1024,
use_bytes_for_bedrock=False,
)
assert config.prefer_upload is True
assert config.upload_threshold_bytes == 1024 * 1024
assert config.use_bytes_for_bedrock is False
class TestFileResolver:
"""Tests for FileResolver class."""
def test_resolve_inline_base64(self):
"""Test resolving file as inline base64."""
resolver = FileResolver()
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
resolved = resolver.resolve(file, "openai")
assert isinstance(resolved, InlineBase64)
assert resolved.content_type == "image/png"
assert len(resolved.data) > 0
def test_resolve_inline_bytes_for_bedrock(self):
"""Test resolving file as inline bytes for Bedrock."""
config = FileResolverConfig(use_bytes_for_bedrock=True)
resolver = FileResolver(config=config)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
resolved = resolver.resolve(file, "bedrock")
assert isinstance(resolved, InlineBytes)
assert resolved.content_type == "image/png"
assert resolved.data == MINIMAL_PNG
def test_resolve_files_multiple(self):
"""Test resolving multiple files."""
resolver = FileResolver()
files = {
"image1": ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test1.png")
),
"image2": ImageFile(
source=FileBytes(data=MINIMAL_PNG, filename="test2.png")
),
}
resolved = resolver.resolve_files(files, "openai")
assert len(resolved) == 2
assert "image1" in resolved
assert "image2" in resolved
assert all(isinstance(r, InlineBase64) for r in resolved.values())
def test_resolve_with_cache(self):
"""Test resolver uses cache."""
cache = UploadCache()
resolver = FileResolver(upload_cache=cache)
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
# First resolution
resolved1 = resolver.resolve(file, "openai")
# Second resolution (should use same base64 encoding)
resolved2 = resolver.resolve(file, "openai")
assert isinstance(resolved1, InlineBase64)
assert isinstance(resolved2, InlineBase64)
# Data should be identical
assert resolved1.data == resolved2.data
def test_clear_cache(self):
"""Test clearing resolver cache."""
cache = UploadCache()
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
# Add something to cache manually
cache.set(file=file, provider="gemini", file_id="test")
resolver = FileResolver(upload_cache=cache)
resolver.clear_cache()
assert len(cache) == 0
def test_get_cached_uploads(self):
"""Test getting cached uploads from resolver."""
cache = UploadCache()
file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="test.png"))
cache.set(file=file, provider="gemini", file_id="test-1")
cache.set(file=file, provider="anthropic", file_id="test-2")
resolver = FileResolver(upload_cache=cache)
gemini_uploads = resolver.get_cached_uploads("gemini")
anthropic_uploads = resolver.get_cached_uploads("anthropic")
assert len(gemini_uploads) == 1
assert len(anthropic_uploads) == 1
def test_get_cached_uploads_empty(self):
"""Test getting cached uploads when no cache."""
resolver = FileResolver() # No cache
uploads = resolver.get_cached_uploads("gemini")
assert uploads == []
class TestCreateResolver:
"""Tests for create_resolver factory function."""
def test_create_default_resolver(self):
"""Test creating resolver with default settings."""
resolver = create_resolver()
assert resolver.config.prefer_upload is False
assert resolver.upload_cache is not None
def test_create_resolver_with_options(self):
"""Test creating resolver with custom options."""
resolver = create_resolver(
prefer_upload=True,
upload_threshold_bytes=5 * 1024 * 1024,
enable_cache=False,
)
assert resolver.config.prefer_upload is True
assert resolver.config.upload_threshold_bytes == 5 * 1024 * 1024
assert resolver.upload_cache is None
def test_create_resolver_cache_enabled(self):
"""Test resolver has cache when enabled."""
resolver = create_resolver(enable_cache=True)
assert resolver.upload_cache is not None
def test_create_resolver_cache_disabled(self):
"""Test resolver has no cache when disabled."""
resolver = create_resolver(enable_cache=False)
assert resolver.upload_cache is None
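# Illustrative sketch (not part of the test suite): a single resolver with the
# default cache can be shared across calls so repeated files resolve to the
# same payload. The threshold below is an arbitrary example value.
def _example_shared_resolver() -> None:
    resolver = create_resolver(upload_threshold_bytes=8 * 1024 * 1024)
    file = ImageFile(source=FileBytes(data=MINIMAL_PNG, filename="logo.png"))
    first = resolver.resolve(file, "openai")
    second = resolver.resolve(file, "openai")
    print(first.data == second.data)  # identical inline payloads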

Some files were not shown because too many files have changed in this diff.