Release/v1.0.0 (#3618)

* feat: add `apps` & `actions` attributes to Agent (#3504)

* feat: add app attributes to Agent

* feat: add actions attribute to Agent

* chore: resolve linter issues

* refactor: merge the apps and actions parameters into a single one

* fix: remove unnecessary print

* feat: log error when CrewaiPlatformTools fails

* chore: export CrewaiPlatformTools directly from crewai_tools

* style: resolve linter issues

* test: fix broken tests

* style: solve linter issues

* fix: fix broken test

* feat: monorepo restructure and test/ci updates

- Add crewai workspace member
- Fix vcr cassette paths and restore test dirs
- Resolve ci failures and update linter/pytest rules

* chore: update python version to 3.13 and package metadata

* feat: add crewai-tools workspace and fix tests/dependencies

* feat: add crewai-tools workspace structure

* Squashed 'temp-crewai-tools/' content from commit 9bae5633

git-subtree-dir: temp-crewai-tools
git-subtree-split: 9bae56339096cb70f03873e600192bd2cd207ac9

* feat: configure crewai-tools workspace package with dependencies

* fix: apply ruff auto-formatting to crewai-tools code

* chore: update lockfile

* fix: don't allow tool tests yet

* fix: comment out extra pytest flags for now

* fix: remove conflicting conftest.py from crewai-tools tests

* fix: resolve dependency conflicts and test issues

- Pin vcrpy to 7.0.0 to fix pytest-recording compatibility
- Comment out types-requests to resolve urllib3 conflict
- Update requests requirement in crewai-tools to >=2.32.0

* chore: update CI workflows and docs for monorepo structure

* chore: update CI workflows and docs for monorepo structure

* fix: actions syntax

* chore: ci publish and pin versions

* fix: add permission to action

* chore: bump version to 1.0.0a1 across all packages

- Updated version to 1.0.0a1 in pyproject.toml for crewai and crewai-tools
- Adjusted version in __init__.py files for consistency

* WIP: v1 docs (#3626)

(cherry picked from commit d46e20fa09bcd2f5916282f5553ddeb7183bd92c)

* docs: parity for all translations

* docs: full name of acronym AMP

* docs: fix lingering unused code

* docs: expand contextual options in docs.json

* docs: add contextual action to request feature on GitHub (#3635)

* chore: apply linting fixes to crewai-tools

* feat: add required env var validation for brightdata

Co-authored-by: Greyson Lalonde <greyson.r.lalonde@gmail.com>
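
A minimal sketch of the kind of required-environment-variable validation described above; the variable name below is a placeholder, not necessarily the one the BrightData tool actually checks:

```python
import os


def validate_required_env_vars(*names: str) -> None:
    """Fail fast with a clear message when required configuration is missing."""
    missing = [name for name in names if not os.getenv(name)]
    if missing:
        raise EnvironmentError(
            f"Missing required environment variables: {', '.join(missing)}"
        )


if __name__ == "__main__":
    try:
        validate_required_env_vars("BRIGHT_DATA_API_KEY")  # placeholder variable name
    except EnvironmentError as exc:
        print(exc)
```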

* fix: properly handle anyOf/oneOf/allOf schema props

Co-authored-by: Greyson Lalonde <greyson.r.lalonde@gmail.com>

* feat: bump version to 1.0.0a2

* Lorenze/native inference sdks (#3619)

* ruff linted

* using native sdks with litellm fallback

* drop exa

* drop print on completion

* Refactor LLM and utility functions for type consistency

- Updated `max_tokens` parameter in `LLM` class to accept `float` in addition to `int`.
- Modified `create_llm` function to ensure consistent type hints and return types, now returning `LLM | BaseLLM | None`.
- Adjusted type hints for various parameters in `create_llm` and `_llm_via_environment_or_fallback` functions for improved clarity and type safety.
- Enhanced test cases to reflect changes in type handling and ensure proper instantiation of LLM instances.

* fix agent_tests

* fix litellm tests and usagemetrics fix

* drop print

* Refactor LLM event handling and improve test coverage

- Removed commented-out event emission for LLM call failures in `llm.py`.
- Added `from_agent` parameter to `CrewAgentExecutor` for better context in LLM responses.
- Enhanced test for LLM call failure to simulate OpenAI API failure and updated assertions for clarity.
- Updated agent and task ID assertions in tests to ensure they are consistently treated as strings.

* fix test_converter

* fixed tests/agents/test_agent.py

* Refactor LLM context length exception handling and improve provider integration

- Renamed `LLMContextLengthExceededException` to `LLMContextLengthExceededExceptionError` for clarity and consistency.
- Updated LLM class to pass the provider parameter correctly during initialization.
- Enhanced error handling in various LLM provider implementations to raise the new exception type.
- Adjusted tests to reflect the updated exception name and ensure proper error handling in context length scenarios.

* Enhance LLM context window handling across providers

- Introduced CONTEXT_WINDOW_USAGE_RATIO to adjust context window sizes dynamically for Anthropic, Azure, Gemini, and OpenAI LLMs.
- Added validation for context window sizes in Azure and Gemini providers to ensure they fall within acceptable limits.
- Updated context window size calculations to use the new ratio, improving consistency and adaptability across different models.
- Removed hardcoded context window sizes in favor of ratio-based calculations for better flexibility.
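
A rough sketch of the ratio-based sizing described above; the constant name comes from the entry, while the 0.85 value and the per-model limits are illustrative assumptions rather than crewai's actual numbers:

```python
CONTEXT_WINDOW_USAGE_RATIO = 0.85  # leave headroom below the provider's hard limit (assumed value)

MODEL_CONTEXT_WINDOWS = {  # illustrative raw limits
    "claude-3-5-sonnet-20241022": 200_000,
    "gpt-4o": 128_000,
}

DEFAULT_CONTEXT_WINDOW = 8_192


def get_context_window_size(model: str) -> int:
    """Return a usable context window, scaled down by the usage ratio."""
    raw_limit = MODEL_CONTEXT_WINDOWS.get(model, DEFAULT_CONTEXT_WINDOW)
    return int(raw_limit * CONTEXT_WINDOW_USAGE_RATIO)


print(get_context_window_size("claude-3-5-sonnet-20241022"))  # 170000 with these assumptions
```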

* fix test agent again

* fix test agent

* feat: add native LLM providers for Anthropic, Azure, and Gemini

- Introduced new completion implementations for Anthropic, Azure, and Gemini, integrating their respective SDKs.
- Added utility functions for tool validation and extraction to support function calling across LLM providers.
- Enhanced context window management and token usage extraction for each provider.
- Created a common utility module for shared functionality among LLM providers.

* chore: update dependencies and improve context management

- Removed direct dependency on `litellm` from the main dependencies and added it under extras for better modularity.
- Updated the `litellm` dependency specification to allow for greater flexibility in versioning.
- Refactored context length exception handling across various LLM providers to use a consistent error class.
- Enhanced platform-specific dependency markers for NVIDIA packages to ensure compatibility across different systems.

* refactor(tests): update LLM instantiation to include is_litellm flag in test cases

- Modified multiple test cases in test_llm.py to set the is_litellm parameter to True when instantiating the LLM class.
- This change ensures that the tests are aligned with the latest LLM configuration requirements and improves consistency across test scenarios.
- Adjusted relevant assertions and comments to reflect the updated LLM behavior.

* linter

* linted

* revert constants

* fix(tests): correct type hint in expected model description

- Updated the expected description in the test_generate_model_description_dict_field function to use 'Dict' instead of 'dict' for consistency with type hinting conventions.
- This change ensures that the test accurately reflects the expected output format for model descriptions.

* refactor(llm): enhance LLM instantiation and error handling

- Updated the LLM class to include validation for the model parameter, ensuring it is a non-empty string.
- Improved error handling by logging warnings when the native SDK fails, allowing for a fallback to LiteLLM.
- Adjusted the instantiation of LLM in test cases to consistently include the is_litellm flag, aligning with recent changes in LLM configuration.
- Modified relevant tests to reflect these updates, ensuring better coverage and accuracy in testing scenarios.
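
A hedged sketch of the validate-then-fall-back behavior described above; the helper functions are stand-ins, not crewai's internals:

```python
import logging

logger = logging.getLogger(__name__)


def _native_completion(model: str) -> str:
    """Stand-in for a native-SDK client factory; always fails in this sketch."""
    raise RuntimeError("native SDK unavailable")


def _litellm_completion(model: str) -> str:
    """Stand-in for the LiteLLM fallback path."""
    return f"litellm::{model}"


def resolve_llm(model: object) -> str:
    """Validate the model, try the native path, and fall back with a warning."""
    if not isinstance(model, str) or not model.strip():
        raise ValueError("model must be a non-empty string")
    try:
        return _native_completion(model)
    except Exception as exc:  # broad on purpose: any native failure triggers the fallback
        logger.warning("Native SDK failed for %s; falling back to LiteLLM: %s", model, exc)
        return _litellm_completion(model)
```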

* fixed test

* refactor(llm): enhance token usage tracking and add copy methods

- Updated the LLM class to track token usage and log callbacks in streaming mode, improving monitoring capabilities.
- Introduced shallow and deep copy methods for the LLM instance, allowing for better management of LLM configurations and parameters.
- Adjusted test cases to instantiate LLM with the is_litellm flag, ensuring alignment with recent changes in LLM configuration.

* refactor(tests): reorganize imports and enhance error messages in test cases

- Cleaned up import statements in test_crew.py for better organization and readability.
- Enhanced error messages in test cases to use `re.escape` for improved regex matching, ensuring more robust error handling.
- Adjusted comments for clarity and consistency across test scenarios.
- Ensured that all necessary modules are imported correctly to avoid potential runtime issues.

* feat: add base devtooling

* fix: ensure dep refs are updated for devtools

* fix: allow pre-release

* feat: allow release after tag

* feat: bump versions to 1.0.0a3 

Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>

* fix: match tag and release title, ignore devtools build for pypi

* fix: allow failed pypi publish

* feat: introduce trigger listing and execution commands for local development (#3643)

* chore: exclude tests from ruff linting

* chore: exclude tests from GitHub Actions linter

* fix: replace print statements with logger in agent and memory handling

* chore: add noqa for intentional print in printer utility

* fix: resolve linting errors across codebase

* feat: update docs with new approach to consume Platform Actions (#3675)

* fix: remove duplicate line and add explicit env var

* feat: bump versions to 1.0.0a4 (#3686)

* Update triggers docs (#3678)

* docs: introduce triggers list & triggers run command

* docs: add KO triggers docs

* docs: ensure CREWAI_PLATFORM_INTEGRATION_TOKEN is mentioned on docs (#3687)

* Lorenze/bedrock llm (#3693)

* feat: add AWS Bedrock support and update dependencies

- Introduced BedrockCompletion class for AWS Bedrock integration in LLM.
- Added boto3 as a new dependency in both pyproject.toml and uv.lock.
- Updated LLM class to support Bedrock provider.
- Created new files for Bedrock provider implementation.

* using converse api

* converse

* linted

* refactor: update BedrockCompletion class to improve parameter handling

- Changed max_tokens from a fixed integer to an optional integer.
- Simplified model ID assignment by removing the inference profile mapping method.
- Cleaned up comments and unnecessary code related to tool specifications and model-specific parameters.
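
For reference, a minimal standalone example of the Converse API the entries above are built on, calling boto3 directly rather than through crewai; the model ID and region are placeholders and valid AWS credentials are required to run it:

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    messages=[{"role": "user", "content": [{"text": "Say hello in one sentence."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.7, "topP": 0.9},
)

# The Converse API returns content blocks plus token usage.
text = response["output"]["message"]["content"][0]["text"]
usage = response["usage"]  # {'inputTokens': ..., 'outputTokens': ..., 'totalTokens': ...}
print(text, usage)
```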

* feat: improve event bus thread safety and async support

Add thread-safe, async-compatible event bus with read–write locking and
handler dependency ordering. Remove blinker dependency and implement
direct dispatch. Improve type safety, error handling, and deterministic
event synchronization.

Refactor tests to auto-wait for async handlers, ensure clean teardown,
and add comprehensive concurrency coverage. Replace thread-local state
in AgentEvaluator with instance-based locking for correct cross-thread
access. Enhance tracing reliability and event finalization.
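
A simplified sketch of lock-protected direct dispatch in the spirit of the change above; this is not the actual crewai event bus, and it uses a single re-entrant lock rather than a full read-write lock with handler dependency ordering:

```python
import threading
from collections import defaultdict
from typing import Any, Callable


class SimpleEventBus:
    """Minimal thread-safe publish/subscribe bus with direct dispatch."""

    def __init__(self) -> None:
        self._lock = threading.RLock()
        self._handlers: dict[type, list[Callable[[Any], None]]] = defaultdict(list)

    def on(self, event_type: type, handler: Callable[[Any], None]) -> None:
        with self._lock:
            self._handlers[event_type].append(handler)

    def emit(self, event: Any) -> None:
        # Snapshot the handler list under the lock, then dispatch outside it
        # so slow handlers cannot block registration from other threads.
        with self._lock:
            handlers = list(self._handlers[type(event)])
        for handler in handlers:
            handler(event)
```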

* feat: enhance OpenAICompletion class with additional client parameters (#3701)

* feat: enhance OpenAICompletion class with additional client parameters

- Added support for default_headers, default_query, and client_params in the OpenAICompletion class.
- Refactored client initialization to use a dedicated method for client parameter retrieval.
- Introduced new test cases to validate the correct usage of OpenAICompletion with various parameters.
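
A small example of the client configuration described above, passed straight to the official openai SDK; the parameter names are the SDK's own constructor arguments, while the values and the flat-dict merging are simplifications:

```python
from openai import OpenAI

client_params = {
    "api_key": "sk-...",  # placeholder
    "base_url": "https://api.openai.com/v1",
    "timeout": 30,
    "max_retries": 3,
    "default_headers": {"X-Custom-Header": "demo"},
    "default_query": {"user_id": "123"},
}

client = OpenAI(**client_params)
```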

* fix: correct test case for unsupported OpenAI model

- Updated the test_openai.py to ensure that the LLM instance is created before calling the method, maintaining proper error handling for unsupported models.
- This change ensures that the test accurately checks for the NotFoundError when an invalid model is specified.

* fix: enhance error handling in OpenAICompletion class

- Added specific exception handling for NotFoundError and APIConnectionError in the OpenAICompletion class to provide clearer error messages and improve logging.
- Updated the test case for unsupported models to ensure it raises a ValueError with the appropriate message when a non-existent model is specified.
- This change improves the robustness of the OpenAI API integration and enhances the clarity of error reporting.

* fix: improve test for unsupported OpenAI model handling

- Refactored the test case in test_openai.py to create the LLM instance after mocking the OpenAI client, ensuring proper error handling for unsupported models.
- This change enhances the clarity of the test by accurately checking for ValueError when a non-existent model is specified, aligning with recent improvements in error handling for the OpenAICompletion class.

* feat: bump versions to 1.0.0b1 (#3706)

* Lorenze/tools drop litellm (#3710)

* completely drop litellm and correctly pass config for qdrant

* feat: add support for additional embedding models in EmbeddingService

- Expanded the list of supported embedding models to include Google Vertex, Hugging Face, Jina, Ollama, OpenAI, Roboflow, Watson X, custom embeddings, Sentence Transformers, Text2Vec, OpenClip, and Instructor.
- This enhancement improves the versatility of the EmbeddingService by allowing integration with a wider range of embedding providers.

* fix: update collection parameter handling in CrewAIRagAdapter

- Changed the condition for setting vectors_config in the CrewAIRagAdapter to check for QdrantConfig instance instead of using hasattr. This improves type safety and ensures proper configuration handling for Qdrant integration.

* moved stagehand as optional dep (#3712)

* feat: bump versions to 1.0.0b2 (#3713)

* feat: enhance AnthropicCompletion class with additional client parame… (#3707)

* feat: enhance AnthropicCompletion class with additional client parameters and tool handling

- Added support for client_params in the AnthropicCompletion class to allow for additional client configuration.
- Refactored client initialization to use a dedicated method for retrieving client parameters.
- Implemented a new method to handle tool use conversation flow, ensuring proper execution and response handling.
- Introduced comprehensive test cases to validate the functionality of the AnthropicCompletion class, including tool use scenarios and parameter handling.
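
A condensed, standalone version of the tool-use round trip described above, calling the anthropic SDK directly; the model name and tool are placeholders and error handling is omitted:

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

tools = [{
    "name": "get_weather",
    "description": "Get the weather for a location",
    "input_schema": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}]

messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]
response = client.messages.create(
    model="claude-3-5-sonnet-20241022", max_tokens=1024, messages=messages, tools=tools
)

for block in response.content:
    if block.type == "tool_use":
        result = f"The weather in {block.input['location']} is sunny and 75°F"
        # Echo the assistant's tool_use turn, then answer it with a tool_result block.
        messages.append({"role": "assistant", "content": response.content})
        messages.append({
            "role": "user",
            "content": [{"type": "tool_result", "tool_use_id": block.id, "content": result}],
        })
        final = client.messages.create(
            model="claude-3-5-sonnet-20241022", max_tokens=1024, messages=messages, tools=tools
        )
        print(final.content[0].text)
        break
```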

* drop print statements

* test: add fixture to mock ANTHROPIC_API_KEY for tests

- Introduced a pytest fixture to automatically mock the ANTHROPIC_API_KEY environment variable for all tests in the test_anthropic.py module.
- This change ensures that tests can run without requiring a real API key, improving test isolation and reliability.

* refactor: streamline streaming message handling in AnthropicCompletion class

- Removed the 'stream' parameter from the API call as it is set internally by the SDK.
- Simplified the handling of tool use events and response construction by extracting token usage from the final message.
- Enhanced the flow for managing tool use conversation, ensuring proper integration with the streaming API response.

* fix streaming here too

* fix: improve error handling in tool conversion for AnthropicCompletion class

- Enhanced exception handling during tool conversion by catching KeyError and ValueError.
- Added logging for conversion errors to aid in debugging and maintain robustness in tool integration.

* feat: enhance GeminiCompletion class with client parameter support (#3717)

* feat: enhance GeminiCompletion class with client parameter support

- Added support for client_params in the GeminiCompletion class to allow for additional client configuration.
- Refactored client initialization into a dedicated method for improved parameter handling.
- Introduced a new method to retrieve client parameters, ensuring compatibility with the base class.
- Enhanced error handling during client initialization to provide clearer messages for missing configuration.
- Updated documentation to reflect the changes in client parameter usage.

* add optional dependencies

* refactor: update test fixture to mock GOOGLE_API_KEY

- Renamed the fixture from `mock_anthropic_api_key` to `mock_google_api_key` to reflect the change in the environment variable being mocked.
- This update ensures that all tests in the module can run with a mocked GOOGLE_API_KEY, improving test isolation and reliability.

* fix tests

* feat: enhance BedrockCompletion class with advanced features

* feat: enhance BedrockCompletion class with advanced features and error handling

- Added support for guardrail configuration, additional model request fields, and custom response field paths in the BedrockCompletion class.
- Improved error handling for AWS exceptions and added token usage tracking with stop reason logging.
- Enhanced streaming response handling with comprehensive event management, including tool use and content block processing.
- Updated documentation to reflect new features and initialization parameters.
- Introduced a new test suite for BedrockCompletion to validate functionality and ensure robust integration with AWS Bedrock APIs.

* chore: add boto typing

* fix: use typing_extensions.Required for Python 3.10 compatibility

---------

Co-authored-by: Greyson Lalonde <greyson.r.lalonde@gmail.com>

* feat: azure native tests

* feat: add Azure AI Inference support and related tests

- Introduced the `azure-ai-inference` package with version `1.0.0b9` and its dependencies in `uv.lock` and `pyproject.toml`.
- Added new test files for Azure LLM functionality, including tests for Azure completion and tool handling.
- Implemented comprehensive test cases to validate Azure-specific behavior and integration with the CrewAI framework.
- Enhanced the testing framework to mock Azure credentials and ensure proper isolation during tests.

* feat: enhance AzureCompletion class with Azure OpenAI support

- Added support for the Azure OpenAI endpoint in the AzureCompletion class, allowing for flexible endpoint configurations.
- Implemented endpoint validation and correction to ensure proper URL formats for Azure OpenAI deployments.
- Enhanced error handling to provide clearer messages for common HTTP errors, including authentication and rate limit issues.
- Updated tests to validate the new endpoint handling and error messaging, ensuring robust integration with Azure AI Inference.
- Refactored parameter preparation to conditionally include the model parameter based on the endpoint type.
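
An illustrative sketch of the endpoint validation and correction idea from the entry above; the exact rules in crewai may differ, but the URL shapes follow the standard Azure OpenAI deployment pattern:

```python
def normalize_azure_endpoint(endpoint: str, deployment: str) -> str:
    """Accept either a bare Azure OpenAI resource URL or a full deployment URL."""
    endpoint = endpoint.rstrip("/")
    if "openai.azure.com" in endpoint and "/openai/deployments/" not in endpoint:
        # Bare resource endpoint: append the standard deployment path.
        return f"{endpoint}/openai/deployments/{deployment}"
    return endpoint


print(normalize_azure_endpoint("https://my-resource.openai.azure.com", "gpt-4o"))
# -> https://my-resource.openai.azure.com/openai/deployments/gpt-4o
```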

* refactor: convert project module to metaclass with full typing

* Lorenze/OpenAI base url backwards support (#3723)

* fix: enhance OpenAICompletion class base URL handling

- Updated the base URL assignment in the OpenAICompletion class to prioritize the new `api_base` attribute and fallback to the environment variable `OPENAI_BASE_URL` if both are not set.
- Added `api_base` to the list of parameters in the OpenAICompletion class to ensure proper configuration and flexibility in API endpoint management.

* feat: enhance OpenAICompletion class with api_base support

- Added the `api_base` parameter to the OpenAICompletion class to allow for flexible API endpoint configuration.
- Updated the `_get_client_params` method to prioritize `base_url` over `api_base`, ensuring correct URL handling.
- Introduced comprehensive tests to validate the behavior of `api_base` and `base_url` in various scenarios, including environment variable fallback.
- Enhanced test coverage for client parameter retrieval, ensuring robust integration with the OpenAI API.
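
A minimal sketch of the precedence described above: an explicit base_url wins over api_base, which wins over the OPENAI_BASE_URL environment variable, otherwise None; the attribute names come from the entries, and the surrounding class is omitted:

```python
import os


def resolve_base_url(base_url: str | None = None, api_base: str | None = None) -> str | None:
    """Pick the effective OpenAI endpoint, preferring base_url over api_base."""
    return base_url or api_base or os.getenv("OPENAI_BASE_URL") or None


print(resolve_base_url(api_base="https://proxy.example.com/v1"))
```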

* fix: improve OpenAICompletion class configuration handling

- Added a debug print statement to log the client configuration parameters during initialization for better traceability.
- Updated the base URL assignment logic to ensure it defaults to None if no valid base URL is provided, enhancing robustness in API endpoint configuration.
- Refined the retrieval of the `api_base` environment variable to streamline the configuration process.

* drop print

* feat: improvements on import native sdk support (#3725)

* feat: add support for Anthropic provider and enhance logging

- Introduced the `anthropic` package with version `0.69.0` in `pyproject.toml` and `uv.lock`, allowing for integration with the Anthropic API.
- Updated logging in the LLM class to provide clearer error messages when importing native providers, enhancing debugging capabilities.
- Improved error handling in the AnthropicCompletion class to guide users on installation via the updated error message format.
- Refactored import error handling in other provider classes to maintain consistency in error messaging and installation instructions.

* feat: enhance LLM support with Bedrock provider and update dependencies

- Added support for the `bedrock` provider in the LLM class, allowing integration with AWS Bedrock APIs.
- Updated `uv.lock` to replace `boto3` with `bedrock` in the dependencies, reflecting the new provider structure.
- Introduced `SUPPORTED_NATIVE_PROVIDERS` to include `bedrock` and ensure proper error handling when instantiating native providers.
- Enhanced error handling in the LLM class to raise informative errors when native provider instantiation fails.
- Added tests to validate the behavior of the new Bedrock provider and ensure fallback mechanisms work correctly for unsupported providers.
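
A hedged sketch of the behavior the entries above describe: providers listed as natively supported raise a clear ImportError when instantiation fails instead of silently falling back. The lookup function is a stand-in, and the message text beyond "Error importing native provider" is an assumption:

```python
SUPPORTED_NATIVE_PROVIDERS = {"openai", "anthropic", "azure", "gemini", "bedrock"}


def _get_native_provider(provider: str):
    """Stand-in lookup that always fails, so the error path below is exercised."""
    raise ModuleNotFoundError(f"no native SDK installed for {provider!r}")


def load_native_provider(provider: str):
    """Instantiate a native provider, surfacing any failure as an ImportError."""
    if provider not in SUPPORTED_NATIVE_PROVIDERS:
        return None  # unknown providers fall back to LiteLLM instead
    try:
        completion_cls = _get_native_provider(provider)
        return completion_cls()
    except Exception as exc:
        raise ImportError(f"Error importing native provider '{provider}': {exc}") from exc
```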

* test: update native provider fallback tests to expect ImportError

* adjust the test to the expected behavior: raising ImportError

* this expects the litellm format; all gemini native tests are in test_google.py

---------

Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>

* fix: remove stdout prints, improve test determinism, and update trace handling

Removed `print` statements from the `LLMStreamChunkEvent` handler to prevent
LLM response chunks from being written directly to stdout. The listener now
only tracks chunks internally.

Fixes #3715

Added explicit return statements for trace-related tests.

Updated cassette for `test_failed_evaluation` to reflect new behavior where
an empty trace dict is used instead of returning early.

Ensured deterministic cleanup order in test fixtures by making
`clear_event_bus_handlers` depend on `setup_test_environment`. This guarantees
event bus shutdown and file handle cleanup occur before temporary directory
deletion, resolving intermittent “Directory not empty” errors in CI.
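
The fixture-ordering fix above relies on pytest finalizing dependent fixtures before the fixtures they depend on; a minimal sketch with the fixture names from the entry and placeholder bodies:

```python
import pytest


@pytest.fixture
def setup_test_environment(tmp_path):
    yield tmp_path
    # Teardown runs last: the temporary directory is only removed after every
    # fixture that depends on this one has been finalized.


@pytest.fixture
def clear_event_bus_handlers(setup_test_environment):
    yield
    # Teardown runs first, so event bus shutdown and file-handle cleanup happen
    # before the temporary directory above is deleted.
```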

* chore: remove lib/crewai exclusion from pre-commit hooks

* feat: enhance task guardrail functionality and validation

* feat: enhance task guardrail functionality and validation

- Introduced support for multiple guardrails in the Task class, allowing for sequential processing of guardrails.
- Added a new `guardrails` field to the Task model to accept a list of callable guardrails or string descriptions.
- Implemented validation to ensure guardrails are processed correctly, including handling of retries and error messages.
- Enhanced the `_invoke_guardrail_function` method to manage guardrail execution and integrate with existing task output processing.
- Updated tests to cover various scenarios involving multiple guardrails, including success, failure, and retry mechanisms.

This update improves the flexibility and robustness of task execution by allowing for more complex validation scenarios.

* refactor: enhance guardrail type handling in Task model

- Updated the Task class to improve guardrail type definitions, introducing GuardrailType and GuardrailsType for better clarity and type safety.
- Simplified the validation logic for guardrails, ensuring that both single and multiple guardrails are processed correctly.
- Enhanced error messages for guardrail validation to provide clearer feedback when incorrect types are provided.
- This refactor improves the maintainability and robustness of task execution by standardizing guardrail handling.

* feat: implement per-guardrail retry tracking in Task model

- Introduced a new private attribute `_guardrail_retry_counts` to the Task class for tracking retry attempts on a per-guardrail basis.
- Updated the guardrail processing logic to utilize the new retry tracking, allowing for independent retry counts for each guardrail.
- Enhanced error handling to provide clearer feedback when guardrails fail validation after exceeding retry limits.
- Modified existing tests to validate the new retry tracking behavior, ensuring accurate assertions on guardrail retries.

This update improves the robustness and flexibility of task execution by allowing for more granular control over guardrail validation and retry mechanisms.
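
A self-contained sketch of sequential guardrails with per-guardrail retry tracking, in the spirit of the entries above; the real Task integration differs, and here a guardrail is just a callable returning (ok, value_or_error):

```python
from typing import Callable

Guardrail = Callable[[str], tuple[bool, str]]


def run_guardrails(output: str, guardrails: list[Guardrail], max_retries: int = 2) -> str:
    """Run guardrails sequentially, retrying each one independently."""
    retry_counts: dict[int, int] = {}  # per-guardrail retry tracking
    for index, guardrail in enumerate(guardrails):
        while True:
            ok, result = guardrail(output)
            if ok:
                output = result  # feed the validated output to the next guardrail
                break
            retry_counts[index] = retry_counts.get(index, 0) + 1
            if retry_counts[index] > max_retries:
                raise ValueError(f"Guardrail {index} failed after {max_retries} retries: {result}")
            # In the real flow the agent would re-run the task here; this sketch simply retries.
    return output


def must_mention_tokyo(text: str) -> tuple[bool, str]:
    return ("Tokyo" in text, text if "Tokyo" in text else "missing 'Tokyo'")


print(run_guardrails("Tokyo has 14 million people.", [must_mention_tokyo]))
```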

* chore: 1.0.0b3 bump (#3734)

* chore: full ruff and mypy

Improved linting, pre-commit setup, and internal architecture:

- Configured Ruff to respect .gitignore, added stricter rules, and introduced a lock pre-commit hook with virtualenv activation.
- Fixed type shadowing in EXASearchTool using a type_ alias to avoid PEP 563 conflicts and resolved circular imports in the agent executor and guardrail modules.
- Removed agent-ops attributes, deprecated the watson alias, and dropped crewai-enterprise tools with corresponding test updates.
- Refactored cache and memoization for thread safety and cleaned up structured output adapters and related logic.

* New MCL DSL (#3738)

* Adding MCP implementation

* New tests for MCP implementation

* fix tests

* update docs

* Revert "New tests for MCP implementation"

This reverts commit 0bbe6dee90.

* linter

* linter

* fix

* verify mcp package exists

* adjust docs to be clear only remote servers are supported

* reverted

* ensure args schema generated properly

* properly close out

---------

Co-authored-by: lorenzejay <lorenzejaytech@gmail.com>
Co-authored-by: Greyson Lalonde <greyson.r.lalonde@gmail.com>

* feat: a2a experimental

experimental a2a support

---------

Co-authored-by: Lucas Gomide <lucaslg200@gmail.com>
Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
Co-authored-by: Tony Kipkemboi <iamtonykipkemboi@gmail.com>
Co-authored-by: Mike Plachta <mplachta@users.noreply.github.com>
Co-authored-by: João Moura <joaomdmoura@gmail.com>
Commit d1343b96ed (parent 42f2b4d551), authored by Lorenze Jay and committed by GitHub on 2025-10-20 14:10:19 -07:00.
1339 changed files with 111657 additions and 19564 deletions.


@@ -0,0 +1,666 @@
import os
import sys
import types
from unittest.mock import patch, MagicMock
import pytest
from crewai.llm import LLM
from crewai.crew import Crew
from crewai.agent import Agent
from crewai.task import Task
@pytest.fixture(autouse=True)
def mock_anthropic_api_key():
"""Automatically mock ANTHROPIC_API_KEY for all tests in this module."""
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
yield
def test_anthropic_completion_is_used_when_anthropic_provider():
"""
Test that AnthropicCompletion from completion.py is used when LLM uses provider 'anthropic'
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
assert llm.__class__.__name__ == "AnthropicCompletion"
assert llm.provider == "anthropic"
assert llm.model == "claude-3-5-sonnet-20241022"
def test_anthropic_completion_is_used_when_claude_provider():
"""
Test that AnthropicCompletion is used when provider is 'claude'
"""
llm = LLM(model="claude/claude-3-5-sonnet-20241022")
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.provider == "claude"
assert llm.model == "claude-3-5-sonnet-20241022"
def test_anthropic_tool_use_conversation_flow():
"""
Test that the Anthropic completion properly handles tool use conversation flow
"""
from unittest.mock import Mock, patch
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
from anthropic.types.tool_use_block import ToolUseBlock
# Create AnthropicCompletion instance
completion = AnthropicCompletion(model="claude-3-5-sonnet-20241022")
# Mock tool function
def mock_weather_tool(location: str) -> str:
return f"The weather in {location} is sunny and 75°F"
available_functions = {"get_weather": mock_weather_tool}
# Mock the Anthropic client responses
with patch.object(completion.client.messages, 'create') as mock_create:
# Mock initial response with tool use - need to properly mock ToolUseBlock
mock_tool_use = Mock(spec=ToolUseBlock)
mock_tool_use.id = "tool_123"
mock_tool_use.name = "get_weather"
mock_tool_use.input = {"location": "San Francisco"}
mock_initial_response = Mock()
mock_initial_response.content = [mock_tool_use]
mock_initial_response.usage = Mock()
mock_initial_response.usage.input_tokens = 100
mock_initial_response.usage.output_tokens = 50
# Mock final response after tool result - properly mock text content
mock_text_block = Mock()
# Set the text attribute as a string, not another Mock
mock_text_block.configure_mock(text="Based on the weather data, it's a beautiful day in San Francisco with sunny skies and 75°F temperature.")
mock_final_response = Mock()
mock_final_response.content = [mock_text_block]
mock_final_response.usage = Mock()
mock_final_response.usage.input_tokens = 150
mock_final_response.usage.output_tokens = 75
# Configure mock to return different responses on successive calls
mock_create.side_effect = [mock_initial_response, mock_final_response]
# Test the call
messages = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
result = completion.call(
messages=messages,
available_functions=available_functions
)
# Verify the result contains the final response
assert "beautiful day in San Francisco" in result
assert "sunny skies" in result
assert "75°F" in result
# Verify that two API calls were made (initial + follow-up)
assert mock_create.call_count == 2
# Verify the second call includes tool results
second_call_args = mock_create.call_args_list[1][1] # kwargs of second call
messages_in_second_call = second_call_args["messages"]
# Should have original user message + assistant tool use + user tool result
assert len(messages_in_second_call) == 3
assert messages_in_second_call[0]["role"] == "user"
assert messages_in_second_call[1]["role"] == "assistant"
assert messages_in_second_call[2]["role"] == "user"
# Verify tool result format
tool_result = messages_in_second_call[2]["content"][0]
assert tool_result["type"] == "tool_result"
assert tool_result["tool_use_id"] == "tool_123"
assert "sunny and 75°F" in tool_result["content"]
def test_anthropic_completion_module_is_imported():
"""
Test that the completion module is properly imported when using Anthropic provider
"""
module_name = "crewai.llms.providers.anthropic.completion"
# Remove module from cache if it exists
if module_name in sys.modules:
del sys.modules[module_name]
# Create LLM instance - this should trigger the import
LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Verify the module was imported
assert module_name in sys.modules
completion_mod = sys.modules[module_name]
assert isinstance(completion_mod, types.ModuleType)
# Verify the class exists in the module
assert hasattr(completion_mod, 'AnthropicCompletion')
def test_native_anthropic_raises_error_when_initialization_fails():
"""
Test that LLM raises ImportError when native Anthropic completion fails to initialize.
This ensures we don't silently fall back when there's a configuration issue.
"""
# Mock the _get_native_provider to return a failing class
with patch('crewai.llm.LLM._get_native_provider') as mock_get_provider:
class FailingCompletion:
def __init__(self, *args, **kwargs):
raise Exception("Native Anthropic SDK failed")
mock_get_provider.return_value = FailingCompletion
# This should raise ImportError, not fall back to LiteLLM
with pytest.raises(ImportError) as excinfo:
LLM(model="anthropic/claude-3-5-sonnet-20241022")
assert "Error importing native provider" in str(excinfo.value)
assert "Native Anthropic SDK failed" in str(excinfo.value)
def test_anthropic_completion_initialization_parameters():
"""
Test that AnthropicCompletion is initialized with correct parameters
"""
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
temperature=0.7,
max_tokens=2000,
top_p=0.9,
api_key="test-key"
)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.model == "claude-3-5-sonnet-20241022"
assert llm.temperature == 0.7
assert llm.max_tokens == 2000
assert llm.top_p == 0.9
def test_anthropic_specific_parameters():
"""
Test Anthropic-specific parameters like stop_sequences and streaming
"""
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
stop_sequences=["Human:", "Assistant:"],
stream=True,
max_retries=5,
timeout=60
)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.stop_sequences == ["Human:", "Assistant:"]
assert llm.stream == True
assert llm.client.max_retries == 5
assert llm.client.timeout == 60
def test_anthropic_completion_call():
"""
Test that AnthropicCompletion call method works
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the call method on the instance
with patch.object(llm, 'call', return_value="Hello! I'm Claude, ready to help.") as mock_call:
result = llm.call("Hello, how are you?")
assert result == "Hello! I'm Claude, ready to help."
mock_call.assert_called_once_with("Hello, how are you?")
def test_anthropic_completion_called_during_crew_execution():
"""
Test that AnthropicCompletion.call is actually invoked when running a crew
"""
# Create the LLM instance first
anthropic_llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the call method on the specific instance
with patch.object(anthropic_llm, 'call', return_value="Tokyo has 14 million people.") as mock_call:
# Create agent with explicit LLM configuration
agent = Agent(
role="Research Assistant",
goal="Find population info",
backstory="You research populations.",
llm=anthropic_llm,
)
task = Task(
description="Find Tokyo population",
expected_output="Population number",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
# Verify mock was called
assert mock_call.called
assert "14 million" in str(result)
def test_anthropic_completion_call_arguments():
"""
Test that AnthropicCompletion.call is invoked with correct arguments
"""
# Create LLM instance first
anthropic_llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the instance method
with patch.object(anthropic_llm, 'call') as mock_call:
mock_call.return_value = "Task completed successfully."
agent = Agent(
role="Test Agent",
goal="Complete a simple task",
backstory="You are a test agent.",
llm=anthropic_llm # Use same instance
)
task = Task(
description="Say hello world",
expected_output="Hello world",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
# Verify call was made
assert mock_call.called
# Check the arguments passed to the call method
call_args = mock_call.call_args
assert call_args is not None
# The first argument should be the messages
messages = call_args[0][0] # First positional argument
assert isinstance(messages, (str, list))
# Verify that the task description appears in the messages
if isinstance(messages, str):
assert "hello world" in messages.lower()
elif isinstance(messages, list):
message_content = str(messages).lower()
assert "hello world" in message_content
def test_multiple_anthropic_calls_in_crew():
"""
Test that AnthropicCompletion.call is invoked multiple times for multiple tasks
"""
# Create LLM instance first
anthropic_llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the instance method
with patch.object(anthropic_llm, 'call') as mock_call:
mock_call.return_value = "Task completed."
agent = Agent(
role="Multi-task Agent",
goal="Complete multiple tasks",
backstory="You can handle multiple tasks.",
llm=anthropic_llm # Use same instance
)
task1 = Task(
description="First task",
expected_output="First result",
agent=agent,
)
task2 = Task(
description="Second task",
expected_output="Second result",
agent=agent,
)
crew = Crew(
agents=[agent],
tasks=[task1, task2]
)
crew.kickoff()
# Verify multiple calls were made
assert mock_call.call_count >= 2 # At least one call per task
# Verify each call had proper arguments
for call in mock_call.call_args_list:
assert len(call[0]) > 0 # Has positional arguments
messages = call[0][0]
assert messages is not None
def test_anthropic_completion_with_tools():
"""
Test that AnthropicCompletion.call is invoked with tools when agent has tools
"""
from crewai.tools import tool
@tool
def sample_tool(query: str) -> str:
"""A sample tool for testing"""
return f"Tool result for: {query}"
# Create LLM instance first
anthropic_llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the instance method
with patch.object(anthropic_llm, 'call') as mock_call:
mock_call.return_value = "Task completed with tools."
agent = Agent(
role="Tool User",
goal="Use tools to complete tasks",
backstory="You can use tools.",
llm=anthropic_llm, # Use same instance
tools=[sample_tool]
)
task = Task(
description="Use the sample tool",
expected_output="Tool usage result",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
assert mock_call.called
call_args = mock_call.call_args
call_kwargs = call_args[1] if len(call_args) > 1 else {}
if 'tools' in call_kwargs:
assert call_kwargs['tools'] is not None
assert len(call_kwargs['tools']) > 0
def test_anthropic_raises_error_when_model_not_supported():
"""Test that AnthropicCompletion raises ValueError when model not supported"""
# Mock the Anthropic client to raise an error
with patch('crewai.llms.providers.anthropic.completion.Anthropic') as mock_anthropic_class:
mock_client = MagicMock()
mock_anthropic_class.return_value = mock_client
# Mock the error that Anthropic would raise for unsupported models
from anthropic import NotFoundError
mock_client.messages.create.side_effect = NotFoundError(
message="The model `model-doesnt-exist` does not exist",
response=MagicMock(),
body={}
)
llm = LLM(model="anthropic/model-doesnt-exist")
with pytest.raises(Exception): # Should raise some error for unsupported model
llm.call("Hello")
def test_anthropic_client_params_setup():
"""
Test that client_params are properly merged with default client parameters
"""
# Use only valid Anthropic client parameters
custom_client_params = {
"default_headers": {"X-Custom-Header": "test-value"},
}
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
api_key="test-key",
base_url="https://custom-api.com",
timeout=45,
max_retries=5,
client_params=custom_client_params
)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.client_params == custom_client_params
merged_params = llm._get_client_params()
assert merged_params["api_key"] == "test-key"
assert merged_params["base_url"] == "https://custom-api.com"
assert merged_params["timeout"] == 45
assert merged_params["max_retries"] == 5
assert merged_params["default_headers"] == {"X-Custom-Header": "test-value"}
def test_anthropic_client_params_override_defaults():
"""
Test that client_params can override default client parameters
"""
override_client_params = {
"timeout": 120, # Override the timeout parameter
"max_retries": 10, # Override the max_retries parameter
"default_headers": {"X-Override": "true"} # Valid custom parameter
}
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
api_key="test-key",
timeout=30,
max_retries=3,
client_params=override_client_params
)
# Verify this is actually AnthropicCompletion, not LiteLLM fallback
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
merged_params = llm._get_client_params()
# client_params should override the individual parameters
assert merged_params["timeout"] == 120
assert merged_params["max_retries"] == 10
assert merged_params["default_headers"] == {"X-Override": "true"}
def test_anthropic_client_params_none():
"""
Test that client_params=None works correctly (no additional parameters)
"""
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
api_key="test-key",
base_url="https://api.anthropic.com",
timeout=60,
max_retries=2,
client_params=None
)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.client_params is None
merged_params = llm._get_client_params()
expected_keys = {"api_key", "base_url", "timeout", "max_retries"}
assert set(merged_params.keys()) == expected_keys
# Fixed assertions - all should be inside the with block and use correct values
assert merged_params["api_key"] == "test-key" # Not "test-anthropic-key"
assert merged_params["base_url"] == "https://api.anthropic.com"
assert merged_params["timeout"] == 60
assert merged_params["max_retries"] == 2
def test_anthropic_client_params_empty_dict():
"""
Test that client_params={} works correctly (empty additional parameters)
"""
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-key"}):
llm = LLM(
model="anthropic/claude-3-5-sonnet-20241022",
api_key="test-key",
client_params={}
)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion)
assert llm.client_params == {}
merged_params = llm._get_client_params()
assert "api_key" in merged_params
assert merged_params["api_key"] == "test-key"
def test_anthropic_model_detection():
"""
Test that various Anthropic model formats are properly detected
"""
# Test Anthropic model naming patterns that actually work with provider detection
anthropic_test_cases = [
"anthropic/claude-3-5-sonnet-20241022",
"claude/claude-3-5-sonnet-20241022"
]
for model_name in anthropic_test_cases:
llm = LLM(model=model_name)
from crewai.llms.providers.anthropic.completion import AnthropicCompletion
assert isinstance(llm, AnthropicCompletion), f"Failed for model: {model_name}"
def test_anthropic_supports_stop_words():
"""
Test that Anthropic models support stop sequences
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
assert llm.supports_stop_words() == True
def test_anthropic_context_window_size():
"""
Test that Anthropic models return correct context window sizes
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
context_size = llm.get_context_window_size()
# Should return a reasonable context window size (Claude 3.5 has 200k tokens)
assert context_size > 100000 # Should be substantial
assert context_size <= 200000 # But not exceed the actual limit
def test_anthropic_message_formatting():
"""
Test that messages are properly formatted for Anthropic API
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Test message formatting
test_messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "How are you?"}
]
formatted_messages, system_message = llm._format_messages_for_anthropic(test_messages)
# System message should be extracted
assert system_message == "You are a helpful assistant."
# Remaining messages should start with user
assert formatted_messages[0]["role"] == "user"
assert len(formatted_messages) >= 3 # Should have user, assistant, user messages
def test_anthropic_streaming_parameter():
"""
Test that streaming parameter is properly handled
"""
# Test non-streaming
llm_no_stream = LLM(model="anthropic/claude-3-5-sonnet-20241022", stream=False)
assert llm_no_stream.stream == False
# Test streaming
llm_stream = LLM(model="anthropic/claude-3-5-sonnet-20241022", stream=True)
assert llm_stream.stream == True
def test_anthropic_tool_conversion():
"""
Test that tools are properly converted to Anthropic format
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock tool in CrewAI format
crewai_tools = [{
"type": "function",
"function": {
"name": "test_tool",
"description": "A test tool",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}]
# Test tool conversion
anthropic_tools = llm._convert_tools_for_interference(crewai_tools)
assert len(anthropic_tools) == 1
assert anthropic_tools[0]["name"] == "test_tool"
assert anthropic_tools[0]["description"] == "A test tool"
assert "input_schema" in anthropic_tools[0]
def test_anthropic_environment_variable_api_key():
"""
Test that Anthropic API key is properly loaded from environment
"""
with patch.dict(os.environ, {"ANTHROPIC_API_KEY": "test-anthropic-key"}):
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
assert llm.client is not None
assert hasattr(llm.client, 'messages')
def test_anthropic_token_usage_tracking():
"""
Test that token usage is properly tracked for Anthropic responses
"""
llm = LLM(model="anthropic/claude-3-5-sonnet-20241022")
# Mock the Anthropic response with usage information
with patch.object(llm.client.messages, 'create') as mock_create:
mock_response = MagicMock()
mock_response.content = [MagicMock(text="test response")]
mock_response.usage = MagicMock(input_tokens=50, output_tokens=25)
mock_create.return_value = mock_response
result = llm.call("Hello")
# Verify the response
assert result == "test response"
# Verify token usage was extracted
usage = llm._extract_anthropic_token_usage(mock_response)
assert usage["input_tokens"] == 50
assert usage["output_tokens"] == 25
assert usage["total_tokens"] == 75


@@ -0,0 +1,3 @@
# Azure LLM tests

File diff suppressed because it is too large.


@@ -0,0 +1,738 @@
import os
import sys
import types
from unittest.mock import patch, MagicMock
import pytest
from crewai.llm import LLM
from crewai.crew import Crew
from crewai.agent import Agent
from crewai.task import Task
@pytest.fixture(autouse=True)
def mock_aws_credentials():
"""Automatically mock AWS credentials and boto3 Session for all tests in this module."""
with patch.dict(os.environ, {
"AWS_ACCESS_KEY_ID": "test-access-key",
"AWS_SECRET_ACCESS_KEY": "test-secret-key",
"AWS_DEFAULT_REGION": "us-east-1"
}):
# Mock boto3 Session to prevent actual AWS connections
with patch('crewai.llms.providers.bedrock.completion.Session') as mock_session_class:
# Create mock session instance
mock_session_instance = MagicMock()
mock_client = MagicMock()
# Set up default mock responses to prevent hanging
default_response = {
'output': {
'message': {
'role': 'assistant',
'content': [
{'text': 'Test response'}
]
}
},
'usage': {
'inputTokens': 10,
'outputTokens': 5,
'totalTokens': 15
}
}
mock_client.converse.return_value = default_response
mock_client.converse_stream.return_value = {'stream': []}
# Configure the mock session instance to return the mock client
mock_session_instance.client.return_value = mock_client
# Configure the mock Session class to return the mock session instance
mock_session_class.return_value = mock_session_instance
yield mock_session_class, mock_client
def test_bedrock_completion_is_used_when_bedrock_provider():
"""
Test that BedrockCompletion from completion.py is used when LLM uses provider 'bedrock'
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
assert llm.__class__.__name__ == "BedrockCompletion"
assert llm.provider == "bedrock"
assert llm.model == "anthropic.claude-3-5-sonnet-20241022-v2:0"
def test_bedrock_completion_module_is_imported():
"""
Test that the completion module is properly imported when using Bedrock provider
"""
module_name = "crewai.llms.providers.bedrock.completion"
# Remove module from cache if it exists
if module_name in sys.modules:
del sys.modules[module_name]
# Create LLM instance - this should trigger the import
LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Verify the module was imported
assert module_name in sys.modules
completion_mod = sys.modules[module_name]
assert isinstance(completion_mod, types.ModuleType)
# Verify the class exists in the module
assert hasattr(completion_mod, 'BedrockCompletion')
def test_native_bedrock_raises_error_when_initialization_fails():
"""
Test that LLM raises ImportError when native Bedrock completion fails.
With the new behavior, when a native provider is in SUPPORTED_NATIVE_PROVIDERS
but fails to instantiate, we raise an ImportError instead of silently falling back.
This provides clearer error messages to users about missing dependencies.
"""
# Mock the _get_native_provider to return a failing class
with patch('crewai.llm.LLM._get_native_provider') as mock_get_provider:
class FailingCompletion:
def __init__(self, *args, **kwargs):
raise Exception("Native AWS Bedrock SDK failed")
mock_get_provider.return_value = FailingCompletion
# This should raise ImportError with clear message
with pytest.raises(ImportError) as excinfo:
LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Verify the error message is helpful
assert "Error importing native provider" in str(excinfo.value)
assert "Native AWS Bedrock SDK failed" in str(excinfo.value)
def test_bedrock_completion_initialization_parameters():
"""
Test that BedrockCompletion is initialized with correct parameters
"""
llm = LLM(
model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
temperature=0.7,
max_tokens=2000,
top_p=0.9,
top_k=40,
region_name="us-west-2"
)
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm, BedrockCompletion)
assert llm.model == "anthropic.claude-3-5-sonnet-20241022-v2:0"
assert llm.temperature == 0.7
assert llm.max_tokens == 2000
assert llm.top_p == 0.9
assert llm.top_k == 40
assert llm.region_name == "us-west-2"
def test_bedrock_specific_parameters():
"""
Test Bedrock-specific parameters like stop_sequences and streaming
"""
llm = LLM(
model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
stop_sequences=["Human:", "Assistant:"],
stream=True,
region_name="us-east-1"
)
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm, BedrockCompletion)
assert llm.stop_sequences == ["Human:", "Assistant:"]
assert llm.stream == True
assert llm.region_name == "us-east-1"
def test_bedrock_completion_call():
"""
Test that BedrockCompletion call method works
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the call method on the instance
with patch.object(llm, 'call', return_value="Hello! I'm Claude on Bedrock, ready to help.") as mock_call:
result = llm.call("Hello, how are you?")
assert result == "Hello! I'm Claude on Bedrock, ready to help."
mock_call.assert_called_once_with("Hello, how are you?")
def test_bedrock_completion_called_during_crew_execution():
"""
Test that BedrockCompletion.call is actually invoked when running a crew
"""
# Create the LLM instance first
bedrock_llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the call method on the specific instance
with patch.object(bedrock_llm, 'call', return_value="Tokyo has 14 million people.") as mock_call:
# Create agent with explicit LLM configuration
agent = Agent(
role="Research Assistant",
goal="Find population info",
backstory="You research populations.",
llm=bedrock_llm,
)
task = Task(
description="Find Tokyo population",
expected_output="Population number",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
# Verify mock was called
assert mock_call.called
assert "14 million" in str(result)
@pytest.mark.skip(reason="Crew execution test - may hang, needs investigation")
def test_bedrock_completion_call_arguments():
"""
Test that BedrockCompletion.call is invoked with correct arguments
"""
# Create LLM instance first
bedrock_llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the instance method
with patch.object(bedrock_llm, 'call') as mock_call:
mock_call.return_value = "Task completed successfully."
agent = Agent(
role="Test Agent",
goal="Complete a simple task",
backstory="You are a test agent.",
llm=bedrock_llm # Use same instance
)
task = Task(
description="Say hello world",
expected_output="Hello world",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
# Verify call was made
assert mock_call.called
# Check the arguments passed to the call method
call_args = mock_call.call_args
assert call_args is not None
# The first argument should be the messages
messages = call_args[0][0] # First positional argument
assert isinstance(messages, (str, list))
# Verify that the task description appears in the messages
if isinstance(messages, str):
assert "hello world" in messages.lower()
elif isinstance(messages, list):
message_content = str(messages).lower()
assert "hello world" in message_content
def test_multiple_bedrock_calls_in_crew():
"""
Test that BedrockCompletion.call is invoked multiple times for multiple tasks
"""
# Create LLM instance first
bedrock_llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the instance method
with patch.object(bedrock_llm, 'call') as mock_call:
mock_call.return_value = "Task completed."
agent = Agent(
role="Multi-task Agent",
goal="Complete multiple tasks",
backstory="You can handle multiple tasks.",
llm=bedrock_llm # Use same instance
)
task1 = Task(
description="First task",
expected_output="First result",
agent=agent,
)
task2 = Task(
description="Second task",
expected_output="Second result",
agent=agent,
)
crew = Crew(
agents=[agent],
tasks=[task1, task2]
)
crew.kickoff()
# Verify multiple calls were made
assert mock_call.call_count >= 2 # At least one call per task
# Verify each call had proper arguments
for call in mock_call.call_args_list:
assert len(call[0]) > 0 # Has positional arguments
messages = call[0][0]
assert messages is not None
def test_bedrock_completion_with_tools():
"""
Test that BedrockCompletion.call is invoked with tools when agent has tools
"""
from crewai.tools import tool
@tool
def sample_tool(query: str) -> str:
"""A sample tool for testing"""
return f"Tool result for: {query}"
# Create LLM instance first
bedrock_llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the instance method
with patch.object(bedrock_llm, 'call') as mock_call:
mock_call.return_value = "Task completed with tools."
agent = Agent(
role="Tool User",
goal="Use tools to complete tasks",
backstory="You can use tools.",
llm=bedrock_llm, # Use same instance
tools=[sample_tool]
)
task = Task(
description="Use the sample tool",
expected_output="Tool usage result",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
assert mock_call.called
call_args = mock_call.call_args
call_kwargs = call_args[1] if len(call_args) > 1 else {}
if 'tools' in call_kwargs:
assert call_kwargs['tools'] is not None
assert len(call_kwargs['tools']) > 0
def test_bedrock_raises_error_when_model_not_found(mock_aws_credentials):
"""Test that BedrockCompletion raises appropriate error when model not found"""
from botocore.exceptions import ClientError
# Get the mock client from the fixture
_, mock_client = mock_aws_credentials
error_response = {
'Error': {
'Code': 'ResourceNotFoundException',
'Message': 'Could not resolve the foundation model from the model identifier'
}
}
mock_client.converse.side_effect = ClientError(error_response, 'converse')
llm = LLM(model="bedrock/model-doesnt-exist")
with pytest.raises(Exception): # Should raise some error for unsupported model
llm.call("Hello")
def test_bedrock_aws_credentials_configuration():
"""
Test that AWS credentials configuration works properly
"""
# Test with environment variables
with patch.dict(os.environ, {
"AWS_ACCESS_KEY_ID": "test-access-key",
"AWS_SECRET_ACCESS_KEY": "test-secret-key",
"AWS_DEFAULT_REGION": "us-east-1"
}):
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm, BedrockCompletion)
assert llm.region_name == "us-east-1"
# Test with explicit credentials
llm_explicit = LLM(
model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
aws_access_key_id="explicit-key",
aws_secret_access_key="explicit-secret",
region_name="us-west-2"
)
assert isinstance(llm_explicit, BedrockCompletion)
assert llm_explicit.region_name == "us-west-2"
def test_bedrock_model_capabilities():
"""
Test that model capabilities are correctly identified
"""
# Test Claude model
llm_claude = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm_claude, BedrockCompletion)
assert llm_claude.is_claude_model == True
assert llm_claude.supports_tools == True
# Test other Bedrock model
llm_titan = LLM(model="bedrock/amazon.titan-text-express-v1")
assert isinstance(llm_titan, BedrockCompletion)
assert llm_titan.supports_tools == True
def test_bedrock_inference_config():
"""
Test that inference config is properly prepared
"""
llm = LLM(
model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
temperature=0.7,
top_p=0.9,
top_k=40,
max_tokens=1000
)
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm, BedrockCompletion)
# Test config preparation
config = llm._get_inference_config()
# Verify config has the expected parameters
assert 'temperature' in config
assert config['temperature'] == 0.7
assert 'topP' in config
assert config['topP'] == 0.9
assert 'maxTokens' in config
assert config['maxTokens'] == 1000
assert 'topK' in config
assert config['topK'] == 40
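# NOTE: the camelCase keys checked above (topP, maxTokens, topK) mirror the naming used by
# the Bedrock Converse API, so _get_inference_config presumably maps the snake_case LLM
# kwargs (top_p, max_tokens, top_k) onto that convention.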
def test_bedrock_model_detection():
"""
Test that various Bedrock model formats are properly detected
"""
# Test Bedrock model naming patterns
bedrock_test_cases = [
"bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
"bedrock/anthropic.claude-3-haiku-20240307-v1:0",
"bedrock/amazon.titan-text-express-v1",
"bedrock/meta.llama3-70b-instruct-v1:0"
]
for model_name in bedrock_test_cases:
llm = LLM(model=model_name)
from crewai.llms.providers.bedrock.completion import BedrockCompletion
assert isinstance(llm, BedrockCompletion), f"Failed for model: {model_name}"
def test_bedrock_supports_stop_words():
"""
Test that Bedrock models support stop sequences
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
assert llm.supports_stop_words()
def test_bedrock_context_window_size():
"""
Test that Bedrock models return correct context window sizes
"""
# Test Claude 3.5 Sonnet
llm_claude = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
context_size_claude = llm_claude.get_context_window_size()
assert context_size_claude > 150000 # Should be substantial (200K tokens with ratio)
# Test Titan
llm_titan = LLM(model="bedrock/amazon.titan-text-express-v1")
context_size_titan = llm_titan.get_context_window_size()
assert context_size_titan > 5000 # Should have 8K context window
def test_bedrock_message_formatting():
"""
Test that messages are properly formatted for Bedrock Converse API
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Test message formatting
test_messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "How are you?"}
]
formatted_messages, system_message = llm._format_messages_for_converse(test_messages)
# System message should be extracted
assert system_message == "You are a helpful assistant."
# Remaining messages should be in Converse format
assert len(formatted_messages) >= 3 # Should have user, assistant, user messages
# First message should be user role
assert formatted_messages[0]["role"] == "user"
# Second should be assistant
assert formatted_messages[1]["role"] == "assistant"
# Messages should have content array with text
assert isinstance(formatted_messages[0]["content"], list)
assert "text" in formatted_messages[0]["content"][0]
def test_bedrock_streaming_parameter():
"""
Test that streaming parameter is properly handled
"""
# Test non-streaming
llm_no_stream = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", stream=False)
assert not llm_no_stream.stream
# Test streaming
llm_stream = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", stream=True)
assert llm_stream.stream
def test_bedrock_tool_conversion():
"""
Test that tools are properly converted to Bedrock Converse format
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock tool in CrewAI format
crewai_tools = [{
"type": "function",
"function": {
"name": "test_tool",
"description": "A test tool",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}]
# Test tool conversion
bedrock_tools = llm._format_tools_for_converse(crewai_tools)
assert len(bedrock_tools) == 1
# Bedrock tools should have toolSpec structure
assert "toolSpec" in bedrock_tools[0]
assert bedrock_tools[0]["toolSpec"]["name"] == "test_tool"
assert bedrock_tools[0]["toolSpec"]["description"] == "A test tool"
assert "inputSchema" in bedrock_tools[0]["toolSpec"]
def test_bedrock_environment_variable_credentials(mock_aws_credentials):
"""
Test that AWS credentials are properly loaded from environment
"""
mock_session_class, _ = mock_aws_credentials
# Reset the mock to clear any previous calls
mock_session_class.reset_mock()
with patch.dict(os.environ, {
"AWS_ACCESS_KEY_ID": "test-access-key-123",
"AWS_SECRET_ACCESS_KEY": "test-secret-key-456"
}):
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Verify Session was called with environment credentials
assert mock_session_class.called
# Get the most recent call - Session is called as Session(...)
call_kwargs = mock_session_class.call_args[1] if mock_session_class.call_args else {}
assert call_kwargs.get('aws_access_key_id') == "test-access-key-123"
assert call_kwargs.get('aws_secret_access_key') == "test-secret-key-456"
def test_bedrock_token_usage_tracking():
"""
Test that token usage is properly tracked for Bedrock responses
"""
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock the Bedrock response with usage information
with patch.object(llm.client, 'converse') as mock_converse:
mock_response = {
'output': {
'message': {
'role': 'assistant',
'content': [
{'text': 'test response'}
]
}
},
'usage': {
'inputTokens': 50,
'outputTokens': 25,
'totalTokens': 75
}
}
mock_converse.return_value = mock_response
result = llm.call("Hello")
# Verify the response
assert result == "test response"
# Verify token usage was tracked
assert llm._token_usage['prompt_tokens'] == 50
assert llm._token_usage['completion_tokens'] == 25
assert llm._token_usage['total_tokens'] == 75
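# NOTE: the test above assumes the Converse usage keys (inputTokens/outputTokens/totalTokens)
# are mapped onto the prompt/completion/total counters held in _token_usage.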
def test_bedrock_tool_use_conversation_flow():
"""
Test that the Bedrock completion properly handles tool use conversation flow
"""
# Create BedrockCompletion instance
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Mock tool function
def mock_weather_tool(location: str) -> str:
return f"The weather in {location} is sunny and 75°F"
available_functions = {"get_weather": mock_weather_tool}
# Mock the Bedrock client responses
with patch.object(llm.client, 'converse') as mock_converse:
# First response: tool use request
tool_use_response = {
'output': {
'message': {
'role': 'assistant',
'content': [
{
'toolUse': {
'toolUseId': 'tool-123',
'name': 'get_weather',
'input': {'location': 'San Francisco'}
}
}
]
}
},
'usage': {
'inputTokens': 100,
'outputTokens': 50,
'totalTokens': 150
}
}
# Second response: final answer after tool execution
final_response = {
'output': {
'message': {
'role': 'assistant',
'content': [
{'text': 'Based on the weather data, it is sunny and 75°F in San Francisco.'}
]
}
},
'usage': {
'inputTokens': 120,
'outputTokens': 30,
'totalTokens': 150
}
}
# Configure mock to return different responses on successive calls
mock_converse.side_effect = [tool_use_response, final_response]
# Test the call
messages = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
result = llm.call(
messages=messages,
available_functions=available_functions
)
# Verify the final response contains the weather information
assert "sunny" in result.lower() or "75" in result
# Verify that the API was called twice (once for tool use, once for final answer)
assert mock_converse.call_count == 2
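# The two mocked converse() responses above model the expected loop: the first call returns a
# toolUse block, the completion runs the matching local function, and a second call
# (presumably carrying the tool result back to the model) yields the final text answer.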
def test_bedrock_handles_cohere_conversation_requirements():
"""
Test that Bedrock properly handles Cohere model's requirement for user message at end
"""
llm = LLM(model="bedrock/cohere.command-r-plus-v1:0")
# Test message formatting with conversation ending in assistant message
test_messages = [
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"}
]
formatted_messages, system_message = llm._format_messages_for_converse(test_messages)
# For Cohere models, should add a user message at the end
assert formatted_messages[-1]["role"] == "user"
assert "continue" in formatted_messages[-1]["content"][0]["text"].lower()
def test_bedrock_client_error_handling():
"""
Test that Bedrock properly handles various AWS client errors
"""
from botocore.exceptions import ClientError
llm = LLM(model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0")
# Test ValidationException
with patch.object(llm.client, 'converse') as mock_converse:
error_response = {
'Error': {
'Code': 'ValidationException',
'Message': 'Invalid request format'
}
}
mock_converse.side_effect = ClientError(error_response, 'converse')
with pytest.raises(ValueError) as exc_info:
llm.call("Hello")
assert "validation" in str(exc_info.value).lower()
# Test ThrottlingException
with patch.object(llm.client, 'converse') as mock_converse:
error_response = {
'Error': {
'Code': 'ThrottlingException',
'Message': 'Rate limit exceeded'
}
}
mock_converse.side_effect = ClientError(error_response, 'converse')
with pytest.raises(RuntimeError) as exc_info:
llm.call("Hello")
assert "throttled" in str(exc_info.value).lower()


@@ -0,0 +1,650 @@
import os
import sys
import types
from unittest.mock import patch, MagicMock
import pytest
from crewai.llm import LLM
from crewai.crew import Crew
from crewai.agent import Agent
from crewai.task import Task
@pytest.fixture(autouse=True)
def mock_google_api_key():
"""Automatically mock GOOGLE_API_KEY for all tests in this module."""
with patch.dict(os.environ, {"GOOGLE_API_KEY": "test-key"}):
yield
def test_gemini_completion_is_used_when_google_provider():
"""
Test that GeminiCompletion from completion.py is used when LLM uses provider 'google'
"""
llm = LLM(model="google/gemini-2.0-flash-001")
assert llm.__class__.__name__ == "GeminiCompletion"
assert llm.provider == "google"
assert llm.model == "gemini-2.0-flash-001"
def test_gemini_completion_is_used_when_gemini_provider():
"""
Test that GeminiCompletion is used when provider is 'gemini'
"""
llm = LLM(model="gemini/gemini-2.0-flash-001")
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
assert llm.provider == "gemini"
assert llm.model == "gemini-2.0-flash-001"
def test_gemini_tool_use_conversation_flow():
"""
Test that the Gemini completion properly handles tool use conversation flow
"""
from unittest.mock import Mock
from crewai.llms.providers.gemini.completion import GeminiCompletion
# Create GeminiCompletion instance
completion = GeminiCompletion(model="gemini-2.0-flash-001")
# Mock tool function
def mock_weather_tool(location: str) -> str:
return f"The weather in {location} is sunny and 75°F"
available_functions = {"get_weather": mock_weather_tool}
# Mock the Google Gemini client responses
with patch.object(completion.client.models, 'generate_content') as mock_generate:
# Mock function call in response
mock_function_call = Mock()
mock_function_call.name = "get_weather"
mock_function_call.args = {"location": "San Francisco"}
mock_part = Mock()
mock_part.function_call = mock_function_call
mock_content = Mock()
mock_content.parts = [mock_part]
mock_candidate = Mock()
mock_candidate.content = mock_content
mock_response = Mock()
mock_response.candidates = [mock_candidate]
mock_response.text = "Based on the weather data, it's a beautiful day in San Francisco with sunny skies and 75°F temperature."
mock_response.usage_metadata = Mock()
mock_response.usage_metadata.prompt_token_count = 100
mock_response.usage_metadata.candidates_token_count = 50
mock_response.usage_metadata.total_token_count = 150
mock_generate.return_value = mock_response
# Test the call
messages = [{"role": "user", "content": "What's the weather like in San Francisco?"}]
result = completion.call(
messages=messages,
available_functions=available_functions
)
# Verify the tool was executed and returned the result
assert result == "The weather in San Francisco is sunny and 75°F"
# Verify that the API was called
assert mock_generate.called
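# NOTE: because generate_content is mocked to return a single function_call part, the
# assertion above implies the completion executes the matching entry from available_functions
# and surfaces that tool output directly as the call result.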
def test_gemini_completion_module_is_imported():
"""
Test that the completion module is properly imported when using Google provider
"""
module_name = "crewai.llms.providers.gemini.completion"
# Remove module from cache if it exists
if module_name in sys.modules:
del sys.modules[module_name]
# Create LLM instance - this should trigger the import
LLM(model="google/gemini-2.0-flash-001")
# Verify the module was imported
assert module_name in sys.modules
completion_mod = sys.modules[module_name]
assert isinstance(completion_mod, types.ModuleType)
# Verify the class exists in the module
assert hasattr(completion_mod, 'GeminiCompletion')
def test_native_gemini_raises_error_when_initialization_fails():
"""
Test that LLM raises ImportError when native Gemini completion fails.
With the new behavior, when a native provider is in SUPPORTED_NATIVE_PROVIDERS
but fails to instantiate, we raise an ImportError instead of silently falling back.
This provides clearer error messages to users about missing dependencies.
"""
# Mock the _get_native_provider to return a failing class
with patch('crewai.llm.LLM._get_native_provider') as mock_get_provider:
class FailingCompletion:
def __init__(self, *args, **kwargs):
raise Exception("Native Google Gen AI SDK failed")
mock_get_provider.return_value = FailingCompletion
# This should raise ImportError with clear message
with pytest.raises(ImportError) as excinfo:
LLM(model="google/gemini-2.0-flash-001")
# Verify the error message is helpful
assert "Error importing native provider" in str(excinfo.value)
assert "Native Google Gen AI SDK failed" in str(excinfo.value)
def test_gemini_completion_initialization_parameters():
"""
Test that GeminiCompletion is initialized with correct parameters
"""
llm = LLM(
model="google/gemini-2.0-flash-001",
temperature=0.7,
max_output_tokens=2000,
top_p=0.9,
top_k=40,
api_key="test-key"
)
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
assert llm.model == "gemini-2.0-flash-001"
assert llm.temperature == 0.7
assert llm.max_output_tokens == 2000
assert llm.top_p == 0.9
assert llm.top_k == 40
def test_gemini_specific_parameters():
"""
Test Gemini-specific parameters like stop_sequences, streaming, and safety settings
"""
safety_settings = {
"HARM_CATEGORY_HARASSMENT": "BLOCK_MEDIUM_AND_ABOVE",
"HARM_CATEGORY_HATE_SPEECH": "BLOCK_MEDIUM_AND_ABOVE"
}
llm = LLM(
model="google/gemini-2.0-flash-001",
stop_sequences=["Human:", "Assistant:"],
stream=True,
safety_settings=safety_settings,
project="test-project",
location="us-central1"
)
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
assert llm.stop_sequences == ["Human:", "Assistant:"]
assert llm.stream
assert llm.safety_settings == safety_settings
assert llm.project == "test-project"
assert llm.location == "us-central1"
def test_gemini_completion_call():
"""
Test that GeminiCompletion call method works
"""
llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the call method on the instance
with patch.object(llm, 'call', return_value="Hello! I'm Gemini, ready to help.") as mock_call:
result = llm.call("Hello, how are you?")
assert result == "Hello! I'm Gemini, ready to help."
mock_call.assert_called_once_with("Hello, how are you?")
def test_gemini_completion_called_during_crew_execution():
"""
Test that GeminiCompletion.call is actually invoked when running a crew
"""
# Create the LLM instance first
gemini_llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the call method on the specific instance
with patch.object(gemini_llm, 'call', return_value="Tokyo has 14 million people.") as mock_call:
# Create agent with explicit LLM configuration
agent = Agent(
role="Research Assistant",
goal="Find population info",
backstory="You research populations.",
llm=gemini_llm,
)
task = Task(
description="Find Tokyo population",
expected_output="Population number",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
# Verify mock was called
assert mock_call.called
assert "14 million" in str(result)
def test_gemini_completion_call_arguments():
"""
Test that GeminiCompletion.call is invoked with correct arguments
"""
# Create LLM instance first
gemini_llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the instance method
with patch.object(gemini_llm, 'call') as mock_call:
mock_call.return_value = "Task completed successfully."
agent = Agent(
role="Test Agent",
goal="Complete a simple task",
backstory="You are a test agent.",
llm=gemini_llm # Use same instance
)
task = Task(
description="Say hello world",
expected_output="Hello world",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
# Verify call was made
assert mock_call.called
# Check the arguments passed to the call method
call_args = mock_call.call_args
assert call_args is not None
# The first argument should be the messages
messages = call_args[0][0] # First positional argument
assert isinstance(messages, (str, list))
# Verify that the task description appears in the messages
if isinstance(messages, str):
assert "hello world" in messages.lower()
elif isinstance(messages, list):
message_content = str(messages).lower()
assert "hello world" in message_content
def test_multiple_gemini_calls_in_crew():
"""
Test that GeminiCompletion.call is invoked multiple times for multiple tasks
"""
# Create LLM instance first
gemini_llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the instance method
with patch.object(gemini_llm, 'call') as mock_call:
mock_call.return_value = "Task completed."
agent = Agent(
role="Multi-task Agent",
goal="Complete multiple tasks",
backstory="You can handle multiple tasks.",
llm=gemini_llm # Use same instance
)
task1 = Task(
description="First task",
expected_output="First result",
agent=agent,
)
task2 = Task(
description="Second task",
expected_output="Second result",
agent=agent,
)
crew = Crew(
agents=[agent],
tasks=[task1, task2]
)
crew.kickoff()
# Verify multiple calls were made
assert mock_call.call_count >= 2 # At least one call per task
# Verify each call had proper arguments
for call in mock_call.call_args_list:
assert len(call[0]) > 0 # Has positional arguments
messages = call[0][0]
assert messages is not None
def test_gemini_completion_with_tools():
"""
Test that GeminiCompletion.call is invoked with tools when agent has tools
"""
from crewai.tools import tool
@tool
def sample_tool(query: str) -> str:
"""A sample tool for testing"""
return f"Tool result for: {query}"
# Create LLM instance first
gemini_llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the instance method
with patch.object(gemini_llm, 'call') as mock_call:
mock_call.return_value = "Task completed with tools."
agent = Agent(
role="Tool User",
goal="Use tools to complete tasks",
backstory="You can use tools.",
llm=gemini_llm, # Use same instance
tools=[sample_tool]
)
task = Task(
description="Use the sample tool",
expected_output="Tool usage result",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
assert mock_call.called
call_args = mock_call.call_args
call_kwargs = call_args[1] if len(call_args) > 1 else {}
if 'tools' in call_kwargs:
assert call_kwargs['tools'] is not None
assert len(call_kwargs['tools']) > 0
def test_gemini_raises_error_when_model_not_supported():
"""Test that GeminiCompletion raises ValueError when model not supported"""
# Mock the Google client to raise an error
with patch('crewai.llms.providers.gemini.completion.genai') as mock_genai:
mock_client = MagicMock()
mock_genai.Client.return_value = mock_client
from google.genai.errors import ClientError # type: ignore
mock_response = MagicMock()
mock_response.body_segments = [{
'error': {
'code': 404,
'message': 'models/model-doesnt-exist is not found for API version v1beta, or is not supported for generateContent.',
'status': 'NOT_FOUND'
}
}]
mock_response.status_code = 404
mock_client.models.generate_content.side_effect = ClientError(404, mock_response)
llm = LLM(model="google/model-doesnt-exist")
with pytest.raises(Exception): # Should raise some error for unsupported model
llm.call("Hello")
def test_gemini_vertex_ai_setup():
"""
Test that Vertex AI configuration is properly handled
"""
with patch.dict(os.environ, {
"GOOGLE_CLOUD_PROJECT": "test-project",
"GOOGLE_CLOUD_LOCATION": "us-west1"
}):
llm = LLM(
model="google/gemini-2.0-flash-001",
project="test-project",
location="us-west1"
)
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
assert llm.project == "test-project"
assert llm.location == "us-west1"
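# NOTE: the GEMINI_API_KEY branch of the next test uses patch.dict(..., clear=True) so the
# GOOGLE_API_KEY injected by the autouse mock_google_api_key fixture is removed and the
# fallback lookup is actually exercised.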
def test_gemini_api_key_configuration():
"""
Test that API key configuration works for both GOOGLE_API_KEY and GEMINI_API_KEY
"""
# Test with GOOGLE_API_KEY
with patch.dict(os.environ, {"GOOGLE_API_KEY": "test-google-key"}):
llm = LLM(model="google/gemini-2.0-flash-001")
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
assert llm.api_key == "test-google-key"
# Test with GEMINI_API_KEY
with patch.dict(os.environ, {"GEMINI_API_KEY": "test-gemini-key"}, clear=True):
llm = LLM(model="google/gemini-2.0-flash-001")
assert isinstance(llm, GeminiCompletion)
assert llm.api_key == "test-gemini-key"
def test_gemini_model_capabilities():
"""
Test that model capabilities are correctly identified
"""
# Test Gemini 2.0 model
llm_2_0 = LLM(model="google/gemini-2.0-flash-001")
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm_2_0, GeminiCompletion)
assert llm_2_0.is_gemini_2
assert llm_2_0.supports_tools
# Test Gemini 1.5 model
llm_1_5 = LLM(model="google/gemini-1.5-pro")
assert isinstance(llm_1_5, GeminiCompletion)
assert llm_1_5.is_gemini_1_5
assert llm_1_5.supports_tools
def test_gemini_generation_config():
"""
Test that generation config is properly prepared
"""
llm = LLM(
model="google/gemini-2.0-flash-001",
temperature=0.7,
top_p=0.9,
top_k=40,
max_output_tokens=1000
)
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion)
# Test config preparation
config = llm._prepare_generation_config()
# Verify config has the expected parameters
assert hasattr(config, 'temperature') or 'temperature' in str(config)
assert hasattr(config, 'top_p') or 'top_p' in str(config)
assert hasattr(config, 'top_k') or 'top_k' in str(config)
assert hasattr(config, 'max_output_tokens') or 'max_output_tokens' in str(config)
def test_gemini_model_detection():
"""
Test that various Gemini model formats are properly detected
"""
# Test Gemini model naming patterns that actually work with provider detection
gemini_test_cases = [
"google/gemini-2.0-flash-001",
"gemini/gemini-2.0-flash-001",
"google/gemini-1.5-pro",
"gemini/gemini-1.5-flash"
]
for model_name in gemini_test_cases:
llm = LLM(model=model_name)
from crewai.llms.providers.gemini.completion import GeminiCompletion
assert isinstance(llm, GeminiCompletion), f"Failed for model: {model_name}"
def test_gemini_supports_stop_words():
"""
Test that Gemini models support stop sequences
"""
llm = LLM(model="google/gemini-2.0-flash-001")
assert llm.supports_stop_words()
def test_gemini_context_window_size():
"""
Test that Gemini models return correct context window sizes
"""
# Test Gemini 2.0 Flash
llm_2_0 = LLM(model="google/gemini-2.0-flash-001")
context_size_2_0 = llm_2_0.get_context_window_size()
assert context_size_2_0 > 500000 # Should be substantial (1M tokens)
# Test Gemini 1.5 Pro
llm_1_5 = LLM(model="google/gemini-1.5-pro")
context_size_1_5 = llm_1_5.get_context_window_size()
assert context_size_1_5 > 1000000 # Should be very large (2M tokens)
def test_gemini_message_formatting():
"""
Test that messages are properly formatted for Gemini API
"""
llm = LLM(model="google/gemini-2.0-flash-001")
# Test message formatting
test_messages = [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Hello"},
{"role": "assistant", "content": "Hi there!"},
{"role": "user", "content": "How are you?"}
]
formatted_contents, system_instruction = llm._format_messages_for_gemini(test_messages)
# System message should be extracted
assert system_instruction == "You are a helpful assistant."
# Remaining messages should be Content objects
assert len(formatted_contents) >= 3 # Should have user, model, user messages
# First content should be user role
assert formatted_contents[0].role == "user"
# Second should be model (converted from assistant)
assert formatted_contents[1].role == "model"
def test_gemini_streaming_parameter():
"""
Test that streaming parameter is properly handled
"""
# Test non-streaming
llm_no_stream = LLM(model="google/gemini-2.0-flash-001", stream=False)
assert not llm_no_stream.stream
# Test streaming
llm_stream = LLM(model="google/gemini-2.0-flash-001", stream=True)
assert llm_stream.stream
def test_gemini_tool_conversion():
"""
Test that tools are properly converted to Gemini format
"""
llm = LLM(model="google/gemini-2.0-flash-001")
# Mock tool in CrewAI format
crewai_tools = [{
"type": "function",
"function": {
"name": "test_tool",
"description": "A test tool",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string", "description": "Search query"}
},
"required": ["query"]
}
}
}]
# Test tool conversion
gemini_tools = llm._convert_tools_for_interference(crewai_tools)
assert len(gemini_tools) == 1
# Gemini tools are Tool objects with function_declarations
assert hasattr(gemini_tools[0], 'function_declarations')
assert len(gemini_tools[0].function_declarations) == 1
func_decl = gemini_tools[0].function_declarations[0]
assert func_decl.name == "test_tool"
assert func_decl.description == "A test tool"
def test_gemini_environment_variable_api_key():
"""
Test that Google API key is properly loaded from environment
"""
with patch.dict(os.environ, {"GOOGLE_API_KEY": "test-google-key"}):
llm = LLM(model="google/gemini-2.0-flash-001")
assert llm.client is not None
assert hasattr(llm.client, 'models')
assert llm.api_key == "test-google-key"
def test_gemini_token_usage_tracking():
"""
Test that token usage is properly tracked for Gemini responses
"""
llm = LLM(model="google/gemini-2.0-flash-001")
# Mock the Gemini response with usage information
with patch.object(llm.client.models, 'generate_content') as mock_generate:
mock_response = MagicMock()
mock_response.text = "test response"
mock_response.candidates = []
mock_response.usage_metadata = MagicMock(
prompt_token_count=50,
candidates_token_count=25,
total_token_count=75
)
mock_generate.return_value = mock_response
result = llm.call("Hello")
# Verify the response
assert result == "test response"
# Verify token usage was extracted
usage = llm._extract_token_usage(mock_response)
assert usage["prompt_token_count"] == 50
assert usage["candidates_token_count"] == 25
assert usage["total_token_count"] == 75
assert usage["total_tokens"] == 75


@@ -0,0 +1,484 @@
import os
import sys
import types
from unittest.mock import patch, MagicMock
import openai
import pytest
from crewai.llm import LLM
from crewai.llms.providers.openai.completion import OpenAICompletion
from crewai.crew import Crew
from crewai.agent import Agent
from crewai.task import Task
from crewai.cli.constants import DEFAULT_LLM_MODEL
def test_openai_completion_is_used_when_openai_provider():
"""
Test that OpenAICompletion from completion.py is used when LLM uses provider 'openai'
"""
llm = LLM(model="openai/gpt-4o")
assert llm.__class__.__name__ == "OpenAICompletion"
assert llm.provider == "openai"
assert llm.model == "gpt-4o"
def test_openai_completion_is_used_when_no_provider_prefix():
"""
Test that OpenAICompletion is used when no provider prefix is given (defaults to openai)
"""
llm = LLM(model="gpt-4o")
from crewai.llms.providers.openai.completion import OpenAICompletion
assert isinstance(llm, OpenAICompletion)
assert llm.provider == "openai"
assert llm.model == "gpt-4o"
@pytest.mark.vcr(filter_headers=["authorization"])
def test_openai_is_default_provider_without_explicit_llm_set_on_agent():
"""
Test that OpenAI is the default provider when no explicit LLM is set on the agent
"""
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant.",
)
task = Task(
description="Find information about the population of Tokyo",
expected_output="The population of Tokyo is 10 million",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
assert crew.agents[0].llm.__class__.__name__ == "OpenAICompletion"
assert crew.agents[0].llm.model == DEFAULT_LLM_MODEL
def test_openai_completion_module_is_imported():
"""
Test that the completion module is properly imported when using OpenAI provider
"""
module_name = "crewai.llms.providers.openai.completion"
# Remove module from cache if it exists
if module_name in sys.modules:
del sys.modules[module_name]
# Create LLM instance - this should trigger the import
LLM(model="openai/gpt-4o")
# Verify the module was imported
assert module_name in sys.modules
completion_mod = sys.modules[module_name]
assert isinstance(completion_mod, types.ModuleType)
# Verify the class exists in the module
assert hasattr(completion_mod, 'OpenAICompletion')
def test_native_openai_raises_error_when_initialization_fails():
"""
Test that LLM raises ImportError when native OpenAI completion fails to initialize.
This ensures we don't silently fall back when there's a configuration issue.
"""
# Mock the _get_native_provider to return a failing class
with patch('crewai.llm.LLM._get_native_provider') as mock_get_provider:
class FailingCompletion:
def __init__(self, *args, **kwargs):
raise Exception("Native SDK failed")
mock_get_provider.return_value = FailingCompletion
# This should raise ImportError, not fall back to LiteLLM
with pytest.raises(ImportError) as excinfo:
LLM(model="openai/gpt-4o")
assert "Error importing native provider" in str(excinfo.value)
assert "Native SDK failed" in str(excinfo.value)
def test_openai_completion_initialization_parameters():
"""
Test that OpenAICompletion is initialized with correct parameters
"""
llm = LLM(
model="openai/gpt-4o",
temperature=0.7,
max_tokens=1000,
api_key="test-key"
)
from crewai.llms.providers.openai.completion import OpenAICompletion
assert isinstance(llm, OpenAICompletion)
assert llm.model == "gpt-4o"
assert llm.temperature == 0.7
assert llm.max_tokens == 1000
def test_openai_completion_call():
"""
Test that OpenAICompletion call method works
"""
llm = LLM(model="openai/gpt-4o")
# Mock the call method on the instance
with patch.object(llm, 'call', return_value="Hello! I'm ready to help.") as mock_call:
result = llm.call("Hello, how are you?")
assert result == "Hello! I'm ready to help."
mock_call.assert_called_once_with("Hello, how are you?")
def test_openai_completion_called_during_crew_execution():
"""
Test that OpenAICompletion.call is actually invoked when running a crew
"""
# Create the LLM instance first
openai_llm = LLM(model="openai/gpt-4o")
# Mock the call method on the specific instance
with patch.object(openai_llm, 'call', return_value="Tokyo has 14 million people.") as mock_call:
# Create agent with explicit LLM configuration
agent = Agent(
role="Research Assistant",
goal="Find population info",
backstory="You research populations.",
llm=openai_llm,
)
task = Task(
description="Find Tokyo population",
expected_output="Population number",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
# Verify mock was called
assert mock_call.called
assert "14 million" in str(result)
def test_openai_completion_call_arguments():
"""
Test that OpenAICompletion.call is invoked with correct arguments
"""
# Create LLM instance first (like working tests)
openai_llm = LLM(model="openai/gpt-4o")
# Mock the instance method (like working tests)
with patch.object(openai_llm, 'call') as mock_call:
mock_call.return_value = "Task completed successfully."
agent = Agent(
role="Test Agent",
goal="Complete a simple task",
backstory="You are a test agent.",
llm=openai_llm # Use same instance
)
task = Task(
description="Say hello world",
expected_output="Hello world",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
# Verify call was made
assert mock_call.called
# Check the arguments passed to the call method
call_args = mock_call.call_args
assert call_args is not None
# The first argument should be the messages
messages = call_args[0][0] # First positional argument
assert isinstance(messages, (str, list))
# Verify that the task description appears in the messages
if isinstance(messages, str):
assert "hello world" in messages.lower()
elif isinstance(messages, list):
message_content = str(messages).lower()
assert "hello world" in message_content
def test_multiple_openai_calls_in_crew():
"""
Test that OpenAICompletion.call is invoked multiple times for multiple tasks
"""
# Create LLM instance first
openai_llm = LLM(model="openai/gpt-4o")
# Mock the instance method
with patch.object(openai_llm, 'call') as mock_call:
mock_call.return_value = "Task completed."
agent = Agent(
role="Multi-task Agent",
goal="Complete multiple tasks",
backstory="You can handle multiple tasks.",
llm=openai_llm # Use same instance
)
task1 = Task(
description="First task",
expected_output="First result",
agent=agent,
)
task2 = Task(
description="Second task",
expected_output="Second result",
agent=agent,
)
crew = Crew(
agents=[agent],
tasks=[task1, task2]
)
crew.kickoff()
# Verify multiple calls were made
assert mock_call.call_count >= 2 # At least one call per task
# Verify each call had proper arguments
for call in mock_call.call_args_list:
assert len(call[0]) > 0 # Has positional arguments
messages = call[0][0]
assert messages is not None
def test_openai_completion_with_tools():
"""
Test that OpenAICompletion.call is invoked with tools when agent has tools
"""
from crewai.tools import tool
@tool
def sample_tool(query: str) -> str:
"""A sample tool for testing"""
return f"Tool result for: {query}"
# Create LLM instance first
openai_llm = LLM(model="openai/gpt-4o")
# Mock the instance method (not the class method)
with patch.object(openai_llm, 'call') as mock_call:
mock_call.return_value = "Task completed with tools."
agent = Agent(
role="Tool User",
goal="Use tools to complete tasks",
backstory="You can use tools.",
llm=openai_llm, # Use same instance
tools=[sample_tool]
)
task = Task(
description="Use the sample tool",
expected_output="Tool usage result",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
crew.kickoff()
assert mock_call.called
call_args = mock_call.call_args
call_kwargs = call_args[1] if len(call_args) > 1 else {}
if 'tools' in call_kwargs:
assert call_kwargs['tools'] is not None
assert len(call_kwargs['tools']) > 0
@pytest.mark.vcr(filter_headers=["authorization"])
def test_openai_completion_call_returns_usage_metrics():
"""
Test that OpenAICompletion.call returns usage metrics
"""
agent = Agent(
role="Research Assistant",
goal="Find information about the population of Tokyo",
backstory="You are a helpful research assistant.",
llm=LLM(model="openai/gpt-4o"),
verbose=True,
)
task = Task(
description="Find information about the population of Tokyo",
expected_output="The population of Tokyo is 10 million",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
assert result.token_usage is not None
assert result.token_usage.total_tokens == 289
assert result.token_usage.prompt_tokens == 173
assert result.token_usage.completion_tokens == 116
assert result.token_usage.successful_requests == 1
assert result.token_usage.cached_prompt_tokens == 0
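# NOTE: the exact token counts asserted above are pinned to the recorded VCR cassette for
# this test; re-recording the cassette would change these numbers.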
def test_openai_raises_error_when_model_not_supported():
"""Test that OpenAICompletion raises ValueError when model not supported"""
with patch('crewai.llms.providers.openai.completion.OpenAI') as mock_openai_class:
mock_client = MagicMock()
mock_openai_class.return_value = mock_client
mock_client.chat.completions.create.side_effect = openai.NotFoundError(
message="The model `model-doesnt-exist` does not exist",
response=MagicMock(),
body={}
)
llm = LLM(model="openai/model-doesnt-exist")
with pytest.raises(ValueError, match="Model.*not found"):
llm.call("Hello")
def test_openai_client_setup_with_extra_arguments():
"""
Test that OpenAICompletion is initialized with correct parameters
"""
llm = LLM(
model="openai/gpt-4o",
temperature=0.7,
max_tokens=1000,
top_p=0.5,
max_retries=3,
timeout=30
)
# Check that model parameters are stored on the LLM instance
assert llm.temperature == 0.7
assert llm.max_tokens == 1000
assert llm.top_p == 0.5
# Check that client parameters are properly configured
assert llm.client.max_retries == 3
assert llm.client.timeout == 30
# Test that parameters are properly used in API calls
with patch.object(llm.client.chat.completions, 'create') as mock_create:
mock_create.return_value = MagicMock(
choices=[MagicMock(message=MagicMock(content="test response", tool_calls=None))],
usage=MagicMock(prompt_tokens=10, completion_tokens=20, total_tokens=30)
)
llm.call("Hello")
# Verify the API was called with the right parameters
call_args = mock_create.call_args[1] # keyword arguments
assert call_args['temperature'] == 0.7
assert call_args['max_tokens'] == 1000
assert call_args['top_p'] == 0.5
assert call_args['model'] == 'gpt-4o'
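# NOTE: the split asserted above treats max_retries/timeout as OpenAI client-level settings,
# while temperature/max_tokens/top_p travel with each chat.completions request.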
def test_extra_arguments_are_passed_to_openai_completion():
"""
Test that extra arguments are passed to OpenAICompletion
"""
llm = LLM(model="openai/gpt-4o", temperature=0.7, max_tokens=1000, top_p=0.5, max_retries=3)
with patch.object(llm.client.chat.completions, 'create') as mock_create:
mock_create.return_value = MagicMock(
choices=[MagicMock(message=MagicMock(content="test response", tool_calls=None))],
usage=MagicMock(prompt_tokens=10, completion_tokens=20, total_tokens=30)
)
llm.call("Hello, how are you?")
assert mock_create.called
call_kwargs = mock_create.call_args[1]
assert call_kwargs['temperature'] == 0.7
assert call_kwargs['max_tokens'] == 1000
assert call_kwargs['top_p'] == 0.5
assert call_kwargs['model'] == 'gpt-4o'
def test_openai_get_client_params_with_api_base():
"""
Test that _get_client_params correctly converts api_base to base_url
"""
llm = OpenAICompletion(
model="gpt-4o",
api_base="https://custom.openai.com/v1",
)
client_params = llm._get_client_params()
assert client_params["base_url"] == "https://custom.openai.com/v1"
def test_openai_get_client_params_with_base_url_priority():
"""
Test that base_url takes priority over api_base in _get_client_params
"""
llm = OpenAICompletion(
model="gpt-4o",
base_url="https://priority.openai.com/v1",
api_base="https://fallback.openai.com/v1",
)
client_params = llm._get_client_params()
assert client_params["base_url"] == "https://priority.openai.com/v1"
def test_openai_get_client_params_with_env_var():
"""
Test that _get_client_params uses OPENAI_BASE_URL environment variable as fallback
"""
with patch.dict(os.environ, {
"OPENAI_BASE_URL": "https://env.openai.com/v1",
}):
llm = OpenAICompletion(model="gpt-4o")
client_params = llm._get_client_params()
assert client_params["base_url"] == "https://env.openai.com/v1"
def test_openai_get_client_params_priority_order():
"""
Test the priority order: base_url > api_base > OPENAI_BASE_URL env var
"""
with patch.dict(os.environ, {
"OPENAI_BASE_URL": "https://env.openai.com/v1",
}):
# Test base_url beats api_base and env var
llm1 = OpenAICompletion(
model="gpt-4o",
base_url="https://base-url.openai.com/v1",
api_base="https://api-base.openai.com/v1",
)
params1 = llm1._get_client_params()
assert params1["base_url"] == "https://base-url.openai.com/v1"
# Test api_base beats env var when base_url is None
llm2 = OpenAICompletion(
model="gpt-4o",
api_base="https://api-base.openai.com/v1",
)
params2 = llm2._get_client_params()
assert params2["base_url"] == "https://api-base.openai.com/v1"
# Test env var is used when both base_url and api_base are None
llm3 = OpenAICompletion(model="gpt-4o")
params3 = llm3._get_client_params()
assert params3["base_url"] == "https://env.openai.com/v1"
def test_openai_get_client_params_no_base_url():
"""
Test that _get_client_params works correctly when no base_url is specified
"""
llm = OpenAICompletion(model="gpt-4o")
client_params = llm._get_client_params()
# When no base_url is provided, it should not be in the params (filtered out as None)
assert "base_url" not in client_params or client_params.get("base_url") is None