* Handle Snowflake Claude stringified tool calls
* Fix Snowflake tool id type narrowing
* Extract Snowflake tool result text in summaries
* Bump PyJWT for vulnerability scan
---------
Co-authored-by: João Moura <joaomdmoura@gmail.com>
* Fix structured output leaks in tool-calling loops
* addressing comments
* drop scripts
* Update Gemini agent tests to include structured output with thoughts and bump model version to 2.5-flash
* merge
* Update Anthropic test cases to use new model and tool structure
- Changed the model from "claude-3-5-haiku-20241022" to "claude-sonnet-4-6" in the test setup.
- Updated the request and response formats in the YAML test cassette to reflect the new tool structure and improved content formatting.
- Adjusted the expected response body to match the new output format from the assistant, including changes in tool usage and response details.
- Increased rate limit values in the response headers for better testing scenarios.
* adjusted bedrock cassettes
* adjusting cassettes for bedrock
* fix test
* Update VCR configuration to use 'host' instead of 'bedrock_host' for request matching
* feat(azure): forward credential_scopes to Azure AI Inference client
Adds a credential_scopes field to the native Azure AI Inference
provider and a matching AZURE_CREDENTIAL_SCOPES env var
(comma-separated). The value is forwarded to ChatCompletionsClient /
AsyncChatCompletionsClient when set, letting keyless / Entra-based
callers target a specific Azure AD audience (e.g.
https://cognitiveservices.azure.com/.default) without subclassing the
provider. Matches the upstream azure.ai.inference SDK kwarg of the
same name.
Lazy build re-reads the env var so an LLM constructed at module
import (before deployment env vars are set) still picks up scopes —
same pattern as the existing AZURE_API_KEY / AZURE_ENDPOINT lazy
reads. to_config_dict round-trips the field.
* refactor(azure): tighten credential_scopes env handling
Address review feedback:
- Move os.getenv into the helper so AZURE_CREDENTIAL_SCOPES appears once
- Match the surrounding api_key/endpoint `or` style in the validator
- Drop the list() defensive copy in to_config_dict — every other field
in that method (and the base class's `stop`) is assigned by reference
Enables keyless Azure auth (OIDC Workload Identity Federation, Managed
Identity, Azure CLI, env-configured Service Principal) without any
crewAI-specific configuration. Customers whose deployment environment
already sets the standard azure-identity env vars get keyless auth for
free; the existing API-key path is unchanged.
Linear: FAC-40
Add crewai deploy validate to check project structure, dependencies, imports, and env usage before deploy
Run validation automatically in deploy create and deploy push with skip flag support
Return structured findings with stable codes and hints
Add test coverage for validation scenarios
refactor: defer LLM client construction to first use
Move SDK client creation out of model initialization into lazy getters
Add _get_sync_client and _get_async_client across providers
Route all provider calls through lazy getters
Surface credential errors at first real invocation
refactor: standardize provider client access
Align async paths to use _get_async_client
Avoid client construction in lightweight config accessors
Simplify provider lifecycle and improve consistency
test: update suite for new behavior
Update tests for lazy initialization contract
Update CLI tests for validation flow and skip flag
Expand coverage for provider initialization paths
* fix: bump litellm to >=1.83.0 to address CVE-2026-35030
Bump litellm from <=1.82.6 to >=1.83.0 to fix JWT auth bypass via
OIDC cache key collision (CVE-2026-35030). Also widen devtools openai
pin from ~=1.83.0 to >=1.83.0,<3 to resolve the version conflict
(litellm 1.83.0 requires openai>=2.8.0).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: resolve mypy errors from litellm bump
- Remove unused type: ignore[import-untyped] on instructor import
- Remove all unused type: ignore[union-attr] comments (litellm types fixed)
- Add hasattr guard for tool_call.function — new litellm adds
ChatCompletionMessageCustomToolCall to the union which lacks .function
* fix: tighten litellm pin to ~=1.83.0 (patch-only bumps)
>=1.83.0,<2 is too wide — litellm has had breaking changes between
minors. ~=1.83.0 means >=1.83.0,<1.84.0 — gets CVE patches but won't
pull in breaking minor releases.
* ci: bump uv from 0.8.4 to 0.11.3
* fix: resolve mypy errors in openai completion from 2.x type changes
Use isinstance checks with concrete openai response types instead of
string comparisons for proper type narrowing. Update code interpreter
handling for outputs/OutputImage API changes in openai 2.x.
* fix: pre-cache tiktoken encoding before VCR intercepts requests
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Alex <alex@crewai.com>
Co-authored-by: Greyson LaLonde <greyson@crewai.com>
GPT-5.x models reject the `stop` parameter at the API level with "Unsupported parameter: 'stop' is not supported with this model". This breaks CrewAI executions when routing through LiteLLM (e.g. via
OpenAI-compatible gateways like Asimov), because the LiteLLM fallback path always includes `stop` in the API request params.
The native OpenAI provider was unaffected because it never sends `stop` to the API — it applies stop words client-side via `_apply_stop_words()`. However, when the request goes through LiteLLM (custom endpoints, proxy gateways),
`stop` is sent as an API parameter and GPT-5.x rejects it.
Additionally, the existing retry logic that catches this error only matched the OpenAI API error format ("Unsupported parameter") but missed
LiteLLM's own pre-validation error format ("does not support parameters"), so the self-healing retry never triggered for LiteLLM-routed calls.
- Delegate supports_function_calling() to parent (handles o1 models via OpenRouter)
- Guard empty env vars in base_url resolution
- Fix misleading comment about model validation rules
- Remove unused MagicMock import
- Use 'is not None' for env var restoration in tests
Co-authored-by: Joao Moura <joao@crewai.com>
* feat: add native OpenAI-compatible providers (OpenRouter, DeepSeek, Ollama, vLLM, Cerebras, Dashscope)
Add a data-driven OpenAI-compatible provider system that enables
native support for multiple third-party APIs that implement the
OpenAI API specification.
New providers:
- OpenRouter: 500+ models via openrouter.ai
- DeepSeek: deepseek-chat, deepseek-coder, deepseek-reasoner
- Ollama: local models (llama3, mistral, codellama, etc.)
- hosted_vllm: self-hosted vLLM servers
- Cerebras: ultra-fast inference
- Dashscope: Alibaba Qwen models (qwen-turbo, qwen-max, etc.)
Architecture:
- Single OpenAICompatibleCompletion class extends OpenAICompletion
- ProviderConfig dataclass stores per-provider settings
- Registry dict makes adding new providers a single config entry
- Handles provider-specific quirks (OpenRouter headers, Ollama
base URL normalization, optional API keys)
Usage:
LLM(model="deepseek/deepseek-chat")
LLM(model="ollama/llama3")
LLM(model="openrouter/anthropic/claude-3-opus")
LLM(model="llama3", provider="ollama")
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
* fix: add is_litellm=True to tests that test litellm-specific methods
Tests for _get_custom_llm_provider and _validate_call_params used
openrouter/ model prefix which now routes to native provider.
Added is_litellm=True to force litellm path since these test
litellm-specific internals.
---------
Co-authored-by: Joao Moura <joao@crewai.com>
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
* feat: introduce PlanningConfig for enhanced agent planning capabilities (#4344)
* feat: introduce PlanningConfig for enhanced agent planning capabilities
This update adds a new PlanningConfig class to manage agent planning configurations, allowing for customizable planning behavior before task execution. The existing reasoning parameter is deprecated in favor of this new configuration, ensuring backward compatibility while enhancing the planning process. Additionally, the Agent class has been updated to utilize this new configuration, and relevant utility functions have been adjusted accordingly. Tests have been added to validate the new planning functionality and ensure proper integration with existing agent workflows.
* dropping redundancy
* fix test
* revert handle_reasoning here
* refactor: update reasoning handling in Agent class
This commit modifies the Agent class to conditionally call the handle_reasoning function based on the executor class being used. The legacy CrewAgentExecutor will continue to utilize handle_reasoning, while the new AgentExecutor will manage planning internally. Additionally, the PlanningConfig class has been referenced in the documentation to clarify its role in enabling or disabling planning. Tests have been updated to reflect these changes and ensure proper functionality.
* improve planning prompts
* matching
* refactor: remove default enabled flag from PlanningConfig in Agent class
* more cassettes
* fix test
* refactor: update planning prompt and remove deprecated methods in reasoning handler
* improve planning prompt
* Lorenze/feat planning pt 2 todo list gen (#4449)
* feat: introduce PlanningConfig for enhanced agent planning capabilities
This update adds a new PlanningConfig class to manage agent planning configurations, allowing for customizable planning behavior before task execution. The existing reasoning parameter is deprecated in favor of this new configuration, ensuring backward compatibility while enhancing the planning process. Additionally, the Agent class has been updated to utilize this new configuration, and relevant utility functions have been adjusted accordingly. Tests have been added to validate the new planning functionality and ensure proper integration with existing agent workflows.
* dropping redundancy
* fix test
* revert handle_reasoning here
* refactor: update reasoning handling in Agent class
This commit modifies the Agent class to conditionally call the handle_reasoning function based on the executor class being used. The legacy CrewAgentExecutor will continue to utilize handle_reasoning, while the new AgentExecutor will manage planning internally. Additionally, the PlanningConfig class has been referenced in the documentation to clarify its role in enabling or disabling planning. Tests have been updated to reflect these changes and ensure proper functionality.
* improve planning prompts
* matching
* refactor: remove default enabled flag from PlanningConfig in Agent class
* more cassettes
* fix test
* feat: enhance agent planning with structured todo management
This commit introduces a new planning system within the AgentExecutor class, allowing for the creation of structured todo items from planning steps. The TodoList and TodoItem models have been added to facilitate tracking of plan execution. The reasoning plan now includes a list of steps, improving the clarity and organization of agent tasks. Additionally, tests have been added to validate the new planning functionality and ensure proper integration with existing workflows.
* refactor: update planning prompt and remove deprecated methods in reasoning handler
* improve planning prompt
* improve handler
* linted
* linted
* Lorenze/feat/planning pt 3 todo list execution (#4450)
* feat: introduce PlanningConfig for enhanced agent planning capabilities
This update adds a new PlanningConfig class to manage agent planning configurations, allowing for customizable planning behavior before task execution. The existing reasoning parameter is deprecated in favor of this new configuration, ensuring backward compatibility while enhancing the planning process. Additionally, the Agent class has been updated to utilize this new configuration, and relevant utility functions have been adjusted accordingly. Tests have been added to validate the new planning functionality and ensure proper integration with existing agent workflows.
* dropping redundancy
* fix test
* revert handle_reasoning here
* refactor: update reasoning handling in Agent class
This commit modifies the Agent class to conditionally call the handle_reasoning function based on the executor class being used. The legacy CrewAgentExecutor will continue to utilize handle_reasoning, while the new AgentExecutor will manage planning internally. Additionally, the PlanningConfig class has been referenced in the documentation to clarify its role in enabling or disabling planning. Tests have been updated to reflect these changes and ensure proper functionality.
* improve planning prompts
* matching
* refactor: remove default enabled flag from PlanningConfig in Agent class
* more cassettes
* fix test
* feat: enhance agent planning with structured todo management
This commit introduces a new planning system within the AgentExecutor class, allowing for the creation of structured todo items from planning steps. The TodoList and TodoItem models have been added to facilitate tracking of plan execution. The reasoning plan now includes a list of steps, improving the clarity and organization of agent tasks. Additionally, tests have been added to validate the new planning functionality and ensure proper integration with existing workflows.
* refactor: update planning prompt and remove deprecated methods in reasoning handler
* improve planning prompt
* improve handler
* execute todos and be able to track them
* feat: introduce PlannerObserver and StepExecutor for enhanced plan execution
This commit adds the PlannerObserver and StepExecutor classes to the CrewAI framework, implementing the observation phase of the Plan-and-Execute architecture. The PlannerObserver analyzes step execution results, determines plan validity, and suggests refinements, while the StepExecutor executes individual todo items in isolation. These additions improve the overall planning and execution process, allowing for more dynamic and responsive agent behavior. Additionally, new observation events have been defined to facilitate monitoring and logging of the planning process.
* refactor: enhance final answer synthesis in AgentExecutor
This commit improves the synthesis of final answers in the AgentExecutor class by implementing a more coherent approach to combining results from multiple todo items. The method now utilizes a single LLM call to generate a polished response, falling back to concatenation if the synthesis fails. Additionally, the test cases have been updated to reflect the changes in planning and execution, ensuring that the results are properly validated and that the plan-and-execute architecture is functioning as intended.
* refactor: enhance final answer synthesis in AgentExecutor
This commit improves the synthesis of final answers in the AgentExecutor class by implementing a more coherent approach to combining results from multiple todo items. The method now utilizes a single LLM call to generate a polished response, falling back to concatenation if the synthesis fails. Additionally, the test cases have been updated to reflect the changes in planning and execution, ensuring that the results are properly validated and that the plan-and-execute architecture is functioning as intended.
* refactor: implement structured output handling in final answer synthesis
This commit enhances the final answer synthesis process in the AgentExecutor class by introducing support for structured outputs when a response model is specified. The synthesis method now utilizes the response model to produce outputs that conform to the expected schema, while still falling back to concatenation in case of synthesis failures. This change ensures that intermediate steps yield free-text results, but the final output can be structured, improving the overall coherence and usability of the synthesized answers.
* regen tests
* linted
* fix
* Enhance PlanningConfig and AgentExecutor with Reasoning Effort Levels
This update introduces a new attribute in the class, allowing users to customize the observation and replanning behavior during task execution. The class has been modified to utilize this new attribute, routing step observations based on the specified reasoning effort level: low, medium, or high.
Additionally, tests have been added to validate the functionality of the reasoning effort levels, ensuring that the agent behaves as expected under different configurations. This enhancement improves the adaptability and efficiency of the planning process in agent execution.
* regen cassettes for test and fix test
* cassette regen
* fixing tests
* dry
* Refactor PlannerObserver and StepExecutor to Utilize I18N for Prompts
This update enhances the PlannerObserver and StepExecutor classes by integrating the I18N utility for managing prompts and messages. The system and user prompts are now retrieved from the I18N module, allowing for better localization and maintainability. Additionally, the code has been cleaned up to remove hardcoded strings, improving readability and consistency across the planning and execution processes.
* Refactor PlannerObserver and StepExecutor to Utilize I18N for Prompts
This update enhances the PlannerObserver and StepExecutor classes by integrating the I18N utility for managing prompts and messages. The system and user prompts are now retrieved from the I18N module, allowing for better localization and maintainability. Additionally, the code has been cleaned up to remove hardcoded strings, improving readability and consistency across the planning and execution processes.
* consolidate agent logic
* fix datetime
* improving step executor
* refactor: streamline observation and refinement process in PlannerObserver
- Updated the PlannerObserver to apply structured refinements directly from observations without requiring a second LLM call.
- Renamed method to for clarity.
- Enhanced documentation to reflect changes in how refinements are handled.
- Removed unnecessary LLM message building and parsing logic, simplifying the refinement process.
- Updated event emissions to include summaries of refinements instead of raw data.
* enhance step executor with tool usage events and validation
- Added event emissions for tool usage, including started and finished events, to track tool execution.
- Implemented validation to ensure expected tools are called during step execution, raising errors when not.
- Refactored the method to handle tool execution with event logging.
- Introduced a new method for parsing tool input into a structured format.
- Updated tests to cover new functionality and ensure correct behavior of tool usage events.
* refactor: enhance final answer synthesis logic in AgentExecutor
- Updated the finalization process to conditionally skip synthesis when the last todo result is sufficient as a complete answer.
- Introduced a new method to determine if the last todo result can be used directly, improving efficiency.
- Added tests to verify the new behavior, ensuring synthesis is skipped when appropriate and maintained when a response model is set.
* fix: update observation handling in PlannerObserver for LLM errors
- Modified the error handling in the PlannerObserver to default to a conservative replan when an LLM call fails.
- Updated the return values to indicate that the step was not completed successfully and that a full replan is needed.
- Added a new test to verify the behavior of the observer when an LLM error occurs, ensuring the correct replan logic is triggered.
* refactor: enhance planning and execution flow in agents
- Updated the PlannerObserver to accept a kickoff input for standalone task execution, improving flexibility in task handling.
- Refined the step execution process in StepExecutor to support multi-turn action loops, allowing for iterative tool execution and observation.
- Introduced a method to extract relevant task sections from descriptions, ensuring clarity in task requirements.
- Enhanced the AgentExecutor to manage step failures more effectively, triggering replans only when necessary and preserving completed task history.
- Updated translations to reflect changes in planning principles and execution prompts, emphasizing concrete and executable steps.
* refactor: update setup_native_tools to include tool_name_mapping
- Modified the setup_native_tools function to return an additional mapping of tool names.
- Updated StepExecutor and AgentExecutor classes to accommodate the new return value from setup_native_tools.
* fix tests
* linted
* linted
* feat: enhance image block handling in Anthropic provider and update AgentExecutor logic
- Added a method to convert OpenAI-style image_url blocks to Anthropic's required format.
- Updated AgentExecutor to handle cases where no todos are ready, introducing a needs_replan return state.
- Improved fallback answer generation in AgentExecutor to prevent RuntimeErrors when no final output is produced.
* lint
* lint
* 1. Added failed to TodoStatus (planning_types.py)
- TodoStatus now includes failed as a valid state: Literal[pending, running, completed, failed]
- Added mark_failed(step_number, result) method to TodoList
- Added get_failed_todos() method to TodoList
- Updated is_complete to treat both completed and failed as terminal states
- Updated replace_pending_todos docstring to mention failed items are preserved
2. Mark running todos as failed before replan (agent_executor.py)
All three effort-level handlers now call mark_failed() on the current todo before routing to replan_now:
- Low effort (handle_step_observed_low): hard-failure branch
- Medium effort (handle_step_observed_medium): needs_full_replan branch
- High effort (decide_next_action): both needs_full_replan and step_completed_successfully=False branches
3. Updated _should_replan to use get_failed_todos()
Previously filtered on todo.status == failed which was dead code. Now uses the proper accessor method that will actually find failed items.
What this fixes: Before these changes, a step that triggered a replan would stay in running status permanently, causing is_complete to never
return True and next_pending to skip it — leading to stuck execution states. Now failed steps are properly tracked, replanning context correctly
reports them, and LiteAgentOutput.failed_todos will actually return results.
* fix test
* imp on failed states
* adjusted the var name from AgentReActState to AgentExecutorState
* addressed p0 bugs
* more improvements
* linted
* regen cassette
* addressing crictical comments
* ensure configurable timeouts, max_replans and max step iterations
* adjusted tools
* dropping debug statements
* addressed comment
* fix linter
* lints and test fixes
* fix: default observation parse fallback to failure and clean up plan-execute types
When _parse_observation_response fails all parse attempts, default to
step_completed_successfully=False instead of True to avoid silently
masking failures. Extract duplicate _extract_task_section into a shared
utility in agent_utils. Type PlanningConfig.llm as str | BaseLLM | None
instead of str | Any | None. Make StepResult a frozen dataclass for
immutability consistency with StepExecutionContext.
* fix: remove Any from function_calling_llm union type in step_executor
* fix: make BaseTool usage count thread-safe for parallel step execution
Add _usage_lock and _claim_usage() to BaseTool for atomic
check-and-increment of current_usage_count. This prevents race
conditions when parallel plan steps invoke the same tool concurrently
via execute_todos_parallel. Remove the racy pre-check from
execute_single_native_tool_call since the limit is now enforced
atomically inside tool.run().
---------
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
Co-authored-by: Greyson LaLonde <greyson@crewai.com>
* fix(bedrock): group parallel tool results in single user message
When an AWS Bedrock model makes multiple tool calls in a single
response, the Converse API requires all corresponding tool results
to be sent back in a single user message. Previously, each tool
result was emitted as a separate user message, causing:
ValidationException: Expected toolResult blocks at messages.2.content
Fix: When processing consecutive tool messages, append the toolResult
block to the preceding user message (if it already contains
toolResult blocks) instead of creating a new message. This groups
all parallel tool results together while keeping tool results from
different assistant turns separate.
Fixes#4749
Signed-off-by: Giulio Leone <6887247+giulio-leone@users.noreply.github.com>
* Update lib/crewai/tests/llms/bedrock/test_bedrock.py
* fix: group bedrock tool results
Co-authored-by: João Moura <joaomdmoura@gmail.com>
---------
Signed-off-by: Giulio Leone <6887247+giulio-leone@users.noreply.github.com>
Co-authored-by: Giulio Leone <6887247+giulio-leone@users.noreply.github.com>
Co-authored-by: João Moura <joaomdmoura@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* fix: map output_pydantic/output_json to native structured output
* test: add crew+tools+structured output integration test for Gemini
* fix: re-record stale cassette for test_crew_testing_function
* fix: re-record remaining stale cassettes for native structured output
* fix: enable native structured output for lite agent and fix mypy errors
* fix: bedrock region was always set to "us-east-1" not respecting the env
var.
code had AWS_REGION_NAME referenced, but not used, unified to
AWS_DEFAULT_REGION as per documentation
* DRY code improvement and fix caught by tests.
* Supporting litellm configuration
* feat: enhance AnthropicCompletion to support available functions in tool execution
- Updated the `_prepare_completion_params` method to accept `available_functions` for better tool handling.
- Modified tool execution logic to directly return results from tools when `available_functions` is provided, aligning behavior with OpenAI's model.
- Added new test cases to validate the execution of tools with available functions, ensuring correct argument passing and result formatting.
This change improves the flexibility and usability of the Anthropic LLM integration, allowing for more complex interactions with tools.
* refactor: remove redundant event emission in AnthropicCompletion
* fix test
* dry up
* fix: improve output handling and response model integration in agents
- Refactored output handling in the Agent class to ensure proper conversion and formatting of outputs, including support for BaseModel instances.
- Enhanced the AgentExecutor class to correctly utilize response models during execution, improving the handling of structured outputs.
- Updated the Gemini and Anthropic completion providers to ensure compatibility with new response model handling, including the addition of strict mode for function definitions.
- Improved the OpenAI completion provider to enforce strict adherence to function schemas.
- Adjusted translations to clarify instructions regarding output formatting and schema adherence.
* drop what was a print that didnt get deleted properly
* fixes gemini
* azure working
* bedrock works
* added tests
* adjust test
* fix tests and regen
* fix tests and regen
* refactor: ensure stop words are applied correctly in Azure, Gemini, and OpenAI completions; add tests to validate behavior with structured outputs
* linting
- add gemini 2.0 schema support using response_json_schema with propertyordering while retaining backward compatibility for earlier models
- refactor llm completions to return validated pydantic models when a response_model is provided, updating hooks, types, and tests for consistent structured outputs
- extend agentfinish and executors to support basemodel outputs, improve anthropic structured parsing, and clean up schema utilities, tests, and original_json handling
- Updated the GeminiCompletion class to handle non-dict values returned from tools, ensuring that floats are wrapped in a dictionary format for consistent response handling.
- Introduced a new YAML cassette to test the Gemini LLM's ability to process tools that return float values, verifying that the agent can correctly utilize the sum_numbers tool and return the expected results.
- Added a comprehensive test case to validate the integration of the sum_numbers tool within the Gemini LLM, ensuring accurate calculations and proper response formatting.
These changes improve the robustness of tool interactions within the Gemini LLM and enhance testing coverage for float return values.
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
- add input_files parameter to Crew.kickoff(), Flow.kickoff(), Task, and Agent.kickoff()
- add provider-specific file uploaders for OpenAI, Anthropic, Gemini, and Bedrock
- add file type detection, constraint validation, and automatic format conversion
- add URL file source support for multimodal content
- add streaming uploads for large files
- add prompt caching support for Anthropic
- add OpenAI Responses API support
* wip restrcuturing agent executor and liteagent
* fix: handle None task in AgentExecutor to prevent errors
Added a check to ensure that if the task is None, the method returns early without attempting to access task properties. This change improves the robustness of the AgentExecutor by preventing potential errors when the task is not set.
* refactor: streamline AgentExecutor initialization by removing redundant parameters
Updated the Agent class to simplify the initialization of the AgentExecutor by removing unnecessary task and crew parameters in standalone mode. This change enhances code clarity and maintains backward compatibility by ensuring that the executor is correctly configured without redundant assignments.
* wip: clean
* ensure executors work inside a flow due to flow in flow async structure
* refactor: enhance agent kickoff preparation by separating common logic
Updated the Agent class to introduce a new private method that consolidates the common setup logic for both synchronous and asynchronous kickoff executions. This change improves code clarity and maintainability by reducing redundancy in the kickoff process, while ensuring that the agent can still execute effectively within both standalone and flow contexts.
* linting and tests
* fix test
* refactor: improve test for Agent kickoff parameters
Updated the test for the Agent class to ensure that the kickoff method correctly preserves parameters. The test now verifies the configuration of the agent after kickoff, enhancing clarity and maintainability. Additionally, the test for asynchronous kickoff within a flow context has been updated to reflect the Agent class instead of LiteAgent.
* refactor: update test task guardrail process output for improved validation
Refactored the test for task guardrail process output to enhance the validation of the output against the OpenAPI schema. The changes include a more structured request body and updated response handling to ensure compliance with the guardrail requirements. This update aims to improve the clarity and reliability of the test cases, ensuring that task outputs are correctly validated and feedback is appropriately provided.
* test fix cassette
* test fix cassette
* working
* working cassette
* refactor: streamline agent execution and enhance flow compatibility
Refactored the Agent class to simplify the execution method by removing the event loop check and clarifying the behavior when called from synchronous and asynchronous contexts. The changes ensure that the method operates seamlessly within flow methods, improving clarity in the documentation. Additionally, updated the AgentExecutor to set the response model to None, enhancing flexibility. New test cassettes were added to validate the functionality of agents within flow contexts, ensuring robust testing for both synchronous and asynchronous operations.
* fixed cassette
* Enhance Flow Execution Logic
- Introduced conditional execution for start methods in the Flow class.
- Unconditional start methods are prioritized during kickoff, while conditional starts are executed only if no unconditional starts are present.
- Improved handling of cyclic flows by allowing re-execution of conditional start methods triggered by routers.
- Added checks to continue execution chains for completed conditional starts.
These changes improve the flexibility and control of flow execution, ensuring that the correct methods are triggered based on the defined conditions.
* Enhance Agent and Flow Execution Logic
- Updated the Agent class to automatically detect the event loop and return a coroutine when called within a Flow, simplifying async handling for users.
- Modified Flow class to execute listeners sequentially, preventing race conditions on shared state during listener execution.
- Improved handling of coroutine results from synchronous methods, ensuring proper execution flow and state management.
These changes enhance the overall execution logic and user experience when working with agents and flows in CrewAI.
* Enhance Flow Listener Logic and Agent Imports
- Updated the Flow class to track fired OR listeners, ensuring that multi-source OR listeners only trigger once during execution. This prevents redundant executions and improves flow efficiency.
- Cleared fired OR listeners during cyclic flow resets to allow re-execution in new cycles.
- Modified the Agent class imports to include Coroutine from collections.abc, enhancing type handling for asynchronous operations.
These changes improve the control and performance of flow execution in CrewAI, ensuring more predictable behavior in complex scenarios.
* adjusted test due to new cassette
* ensure native tool calling works with liteagent
* ensure response model is respected
* Enhance Tool Name Handling for LLM Compatibility
- Added a new function to replace invalid characters in function names with underscores, ensuring compatibility with LLM providers.
- Updated the function to sanitize tool names before validation.
- Modified the function to use sanitized names for tool registration.
These changes improve the robustness of tool name handling, preventing potential issues with invalid characters in function names.
* ensure we dont finalize batch on just a liteagent finishing
* max tools per turn wip and ensure we drop print times
* fix sync main issues
* fix llm_call_completed event serialization issue
* drop max_tools_iterations
* for fixing model dump with state
* Add extract_tool_call_info function to handle various tool call formats
- Introduced a new utility function to extract tool call ID, name, and arguments from different provider formats (OpenAI, Gemini, Anthropic, and dictionary).
- This enhancement improves the flexibility and compatibility of tool calls across multiple LLM providers, ensuring consistent handling of tool call information.
- The function returns a tuple containing the call ID, function name, and function arguments, or None if the format is unrecognized.
* Refactor AgentExecutor to support batch execution of native tool calls
- Updated the method to process all tools from in a single batch, enhancing efficiency and reducing the number of interactions with the LLM.
- Introduced a new utility function to streamline the extraction of tool call details, improving compatibility with various tool formats.
- Removed the parameter, simplifying the initialization of the .
- Enhanced logging and message handling to provide clearer insights during tool execution.
- This refactor improves the overall performance and usability of the agent execution flow.
* Update English translations for tool usage and reasoning instructions
- Revised the `post_tool_reasoning` message to clarify the analysis process after tool usage, emphasizing the need to provide only the final answer if requirements are met.
- Updated the `format` message to simplify the instructions for deciding between using a tool or providing a final answer, enhancing clarity for users.
- These changes improve the overall user experience by providing clearer guidance on task execution and response formatting.
* fix
* fixing azure tests
* organizae imports
* dropped unused
* Remove debug print statements from AgentExecutor to clean up the code and improve readability. This change enhances the overall performance of the agent execution flow by eliminating unnecessary console output during LLM calls and iterations.
* linted
* updated cassette
* regen cassette
* revert crew agent executor
* adjust cassettes and dropped tests due to native tool implementation
* adjust
* ensure we properly fail tools and emit their events
* Enhance tool handling and delegation tracking in agent executors
- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.
This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.
* Enhance tool handling and delegation tracking in agent executors
- Implemented immediate return for tools with result_as_answer=True in crew_agent_executor.py.
- Added delegation tracking functionality in agent_utils.py to increment delegations when specific tools are used.
- Updated tool usage logic to handle caching more effectively in tool_usage.py.
- Enhanced test cases to validate new delegation features and tool caching behavior.
This update improves the efficiency of tool execution and enhances the delegation capabilities of agents.
* fix cassettes
* fix
* regen cassettes
* regen gemini
* ensure we support bedrock
* supporting bedrock
* regen azure cassettes
* Implement max usage count tracking for tools in agent executors
- Added functionality to check if a tool has reached its maximum usage count before execution in both crew_agent_executor.py and agent_executor.py.
- Enhanced error handling to return a message when a tool's usage limit is reached.
- Updated tool usage logic in tool_usage.py to increment usage counts and print current usage status.
- Introduced tests to validate max usage count behavior for native tool calling, ensuring proper enforcement and tracking.
This update improves tool management by preventing overuse and providing clear feedback when limits are reached.
* fix other test
* fix test
* drop logs
* better tests
* regen
* regen all azure cassettes
* regen again placeholder for cassette matching
* fix: unify tool name sanitization across codebase
* fix: include tool role messages in save_last_messages
* fix: update sanitize_tool_name test expectations
Align test expectations with unified sanitize_tool_name behavior
that lowercases and splits camelCase for LLM provider compatibility.
* fix: apply sanitize_tool_name consistently across codebase
Unify tool name sanitization to ensure consistency between tool names
shown to LLMs and tool name matching/lookup logic.
* regen
* fix: sanitize tool names in native tool call processing
- Update extract_tool_call_info to return sanitized tool names
- Fix delegation tool name matching to use sanitized names
- Add sanitization in crew_agent_executor tool call extraction
- Add sanitization in experimental agent_executor
- Add sanitization in LLM.call function lookup
- Update streaming utility to use sanitized names
- Update base_agent_executor_mixin delegation check
* Extract text content from parts directly to avoid warning about non-text parts
* Add test case for Gemini token usage tracking
- Introduced a new YAML cassette for tracking token usage in Gemini API responses.
- Updated the test for Gemini to validate token usage metrics and response content.
- Ensured proper integration with the Gemini model and API key handling.
---------
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
- Updated the `supports_stop_words` method to accurately reflect support for stop sequences based on model type, specifically excluding GPT-5 and O-series models.
- Added comprehensive tests to verify that GPT-5 family and O-series models do not support stop words, ensuring correct behavior in completion parameter preparation.
- Ensured that stop words are not included in parameters for unsupported models while maintaining expected behavior for supported models.
* supporting thinking for anthropic models
* drop comments here
* thinking and tool calling support
* fix: properly mock tool use and text block types in Anthropic tests
- Updated the test for the Anthropic tool use conversation flow to include type attributes for mocked ToolUseBlock and text blocks, ensuring accurate simulation of tool interactions during testing.
* feat: add AnthropicThinkingConfig for enhanced thinking capabilities
This update introduces the AnthropicThinkingConfig class to manage thinking parameters for the Anthropic completion model. The LLM and AnthropicCompletion classes have been updated to utilize this new configuration. Additionally, new test cassettes have been added to validate the functionality of thinking blocks across interactions.
* Adding drop parameters
* Adding test case
* Just some spacing addition
* Adding drop params to maintain consistency
* Changing variable name
---------
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
* Add gemini-3-pro-preview
Also refactors the tool support check for better forward compatibility.
* Add cassette for Gemini 3 Pro
---------
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>