Lorenze/native inference sdks (#3619)

* ruff linted

* using native sdks with litellm fallback

* drop exa

* drop print on completion

* Refactor LLM and utility functions for type consistency

- Updated `max_tokens` parameter in `LLM` class to accept `float` in addition to `int`.
- Modified `create_llm` function to ensure consistent type hints and return types, now returning `LLM | BaseLLM | None` (sketched after this list).
- Adjusted type hints for various parameters in `create_llm` and `_llm_via_environment_or_fallback` functions for improved clarity and type safety.
- Enhanced test cases to reflect changes in type handling and ensure proper instantiation of LLM instances.
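
A minimal sketch of the described signature, with a simplified body; only the function names and the `LLM | BaseLLM | None` return type come from the bullets above, everything else is an assumption:

```python
from __future__ import annotations

import os

from crewai.llm import LLM
from crewai.llms.base_llm import BaseLLM


def _llm_via_environment_or_fallback() -> LLM | None:
    # Helper named above; this body (and the default model) is illustrative.
    model = os.environ.get("OPENAI_MODEL_NAME", "gpt-4o-mini")
    return LLM(model=model)


def create_llm(llm_value: object = None) -> LLM | BaseLLM | None:
    """Normalize whatever the caller passed into a concrete LLM instance."""
    if llm_value is None:
        return _llm_via_environment_or_fallback()
    if isinstance(llm_value, BaseLLM):
        return llm_value  # already concrete; LLM itself subclasses BaseLLM
    if isinstance(llm_value, str):
        return LLM(model=llm_value)
    return None
```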

* fix agent_tests

* fix litellm tests and UsageMetrics handling

* drop print

* Refactor LLM event handling and improve test coverage

- Removed commented-out event emission for LLM call failures in `llm.py`.
- Added `from_agent` parameter to `CrewAgentExecutor` for better context in LLM responses.
- Enhanced test for LLM call failure to simulate OpenAI API failure and updated assertions for clarity.
- Updated agent and task ID assertions in tests to ensure they are consistently treated as strings.

* fix test_converter

* fixed tests/agents/test_agent.py

* Refactor LLM context length exception handling and improve provider integration

- Renamed `LLMContextLengthExceededException` to `LLMContextLengthExceededExceptionError` for clarity and consistency.
- Updated LLM class to pass the provider parameter correctly during initialization.
- Enhanced error handling in various LLM provider implementations to raise the new exception type (see the sketch after this list).
- Adjusted tests to reflect the updated exception name and ensure proper error handling in context length scenarios.
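
For instance, a provider-side guard raising the renamed error might look like this; the import path and constructor argument are assumptions, only the class name comes from the bullets:

```python
from crewai.utilities.exceptions.context_window_exceeding_exception import (
    LLMContextLengthExceededExceptionError,  # module path assumed
)


def ensure_fits(prompt_tokens: int, context_window: int) -> None:
    """Illustrative guard; each provider raises the renamed error on overflow."""
    if prompt_tokens > context_window:
        raise LLMContextLengthExceededExceptionError(
            f"prompt is {prompt_tokens} tokens but the window is {context_window}"
        )
```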

* Enhance LLM context window handling across providers

- Introduced CONTEXT_WINDOW_USAGE_RATIO to adjust context window sizes dynamically for the Anthropic, Azure, Gemini, and OpenAI LLMs (a sketch follows this list).
- Added validation for context window sizes in Azure and Gemini providers to ensure they fall within acceptable limits.
- Updated context window size calculations to use the new ratio, improving consistency and adaptability across different models.
- Removed hardcoded context window sizes in favor of ratio-based calculations for better flexibility.
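
Only the constant's name appears above; the value, bounds, and helper in this sketch are illustrative:

```python
CONTEXT_WINDOW_USAGE_RATIO = 0.75  # value assumed; leaves headroom for the response

# Nominal windows for a few models (illustrative subset).
MODEL_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet-20240620": 200_000,
}

MIN_CONTEXT_WINDOW = 1_024       # assumed lower validation bound
MAX_CONTEXT_WINDOW = 2_097_152   # assumed upper validation bound


def get_context_window_size(model: str) -> int:
    """Usable window: the nominal size scaled by the safety ratio."""
    nominal = MODEL_CONTEXT_WINDOWS.get(model, 8_192)  # fallback assumed
    if not MIN_CONTEXT_WINDOW <= nominal <= MAX_CONTEXT_WINDOW:
        raise ValueError(f"context window {nominal} is outside the supported range")
    return int(nominal * CONTEXT_WINDOW_USAGE_RATIO)
```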

* fix test agent again

* fix test agent

* feat: add native LLM providers for Anthropic, Azure, and Gemini

- Introduced new completion implementations for Anthropic, Azure, and Gemini, integrating their respective SDKs (a skeleton follows this list).
- Added utility functions for tool validation and extraction to support function calling across LLM providers.
- Enhanced context window management and token usage extraction for each provider.
- Created a common utility module for shared functionality among LLM providers.
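
A skeleton of what one native provider could look like under this layout; the class name, constructor, and `call` signature are assumptions (only the provider split, the native SDKs, and `BaseLLM` come from the PR):

```python
from __future__ import annotations

from typing import Any

from anthropic import Anthropic  # native SDK, per the commit message

from crewai.llms.base_llm import BaseLLM


class AnthropicCompletion(BaseLLM):
    """Sketch of a native Anthropic-backed completion; not the real class."""

    def __init__(self, model: str, api_key: str | None = None, **kwargs: Any) -> None:
        super().__init__(model=model)
        self.client = Anthropic(api_key=api_key)
        self.max_tokens = int(kwargs.get("max_tokens", 1024))

    def call(self, messages: list[dict[str, str]], **kwargs: Any) -> str:
        response = self.client.messages.create(
            model=self.model,
            max_tokens=self.max_tokens,
            messages=messages,
        )
        return response.content[0].text
```

Azure and Gemini would follow the same shape with their own clients, with the shared tool-validation and extraction helpers factored into the common utility module mentioned above.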

* chore: update dependencies and improve context management

- Removed the direct dependency on `litellm` from the main dependencies and added it under extras for better modularity (an import-guard sketch follows this list).
- Updated the `litellm` dependency specification to allow for greater flexibility in versioning.
- Refactored context length exception handling across various LLM providers to use a consistent error class.
- Enhanced platform-specific dependency markers for NVIDIA packages to ensure compatibility across different systems.
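
With `litellm` optional, imports presumably need a guard; a minimal sketch of that pattern (the extra's name and the error message are assumptions):

```python
from types import ModuleType

try:
    import litellm
    LITELLM_AVAILABLE = True
except ImportError:
    litellm = None  # type: ignore[assignment]
    LITELLM_AVAILABLE = False


def require_litellm() -> ModuleType:
    """Fail with an actionable hint when the optional extra is missing."""
    if not LITELLM_AVAILABLE:
        raise ImportError(
            "litellm is not installed; install the extra, e.g. pip install 'crewai[litellm]'"
        )
    return litellm
```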

* refactor(tests): update LLM instantiation to include is_litellm flag in test cases

- Modified multiple test cases in test_llm.py to set the is_litellm parameter to True when instantiating the LLM class.
- This change ensures that the tests are aligned with the latest LLM configuration requirements and improves consistency across test scenarios.
- Adjusted relevant assertions and comments to reflect the updated LLM behavior.

* linter

* linted

* revert constants

* fix(tests): correct type hint in expected model description

- Updated the expected description in the test_generate_model_description_dict_field function to use 'Dict' instead of 'dict' for consistency with type hinting conventions.
- This change ensures that the test accurately reflects the expected output format for model descriptions.

* refactor(llm): enhance LLM instantiation and error handling

- Updated the LLM class to include validation for the model parameter, ensuring it is a non-empty string.
- Improved error handling by logging a warning when the native SDK fails and falling back to LiteLLM (sketched after this list).
- Adjusted the instantiation of LLM in test cases to consistently include the is_litellm flag, aligning with recent changes in LLM configuration.
- Modified relevant tests to reflect these updates, ensuring better coverage and accuracy in testing scenarios.
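
A condensed sketch of the validate-then-fall-back flow; the class is a stand-in for `crewai.llm.LLM` and the internal method name is an assumption:

```python
import logging
from typing import Any

import litellm

logger = logging.getLogger(__name__)


class NativeFirstLLM:
    """Illustrative stand-in; not the real crewai.llm.LLM."""

    def __init__(self, model: str, is_litellm: bool = False) -> None:
        if not isinstance(model, str) or not model.strip():
            raise ValueError("model must be a non-empty string")
        self.model = model
        self.is_litellm = is_litellm

    def _call_native_sdk(self, messages: list[dict[str, str]], **kwargs: Any) -> str:
        raise NotImplementedError  # provider-specific SDK call would live here

    def call(self, messages: list[dict[str, str]], **kwargs: Any) -> str:
        if not self.is_litellm:
            try:
                return self._call_native_sdk(messages, **kwargs)
            except Exception as exc:  # noqa: BLE001
                logger.warning("Native SDK failed (%s); falling back to LiteLLM", exc)
        response = litellm.completion(model=self.model, messages=messages, **kwargs)
        return response["choices"][0]["message"]["content"]
```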

* fixed test

* refactor(llm): enhance token usage tracking and add copy methods

- Updated the LLM class to track token usage and log callbacks in streaming mode, improving monitoring capabilities.
- Introduced shallow and deep copy methods for the LLM instance, allowing better management of LLM configurations and parameters (sketched after this list).
- Adjusted test cases to instantiate LLM with the is_litellm flag, ensuring alignment with recent changes in LLM configuration.
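
The copy methods presumably follow the standard `__copy__`/`__deepcopy__` pattern; everything below is an illustrative stand-in, not the real `LLM` class:

```python
import copy
from typing import Any


class LLMConfig:
    """Illustrative holder for the copy behavior described above."""

    def __init__(self, model: str, **params: Any) -> None:
        self.model = model
        self.additional_params = params

    def __copy__(self) -> "LLMConfig":
        # Shallow copy: the attribute dict is duplicated, nested objects are shared.
        new = self.__class__.__new__(self.__class__)
        new.__dict__.update(self.__dict__)
        return new

    def __deepcopy__(self, memo: dict[int, Any]) -> "LLMConfig":
        # Deep copy: nested params are duplicated so edits don't leak back.
        new = self.__class__.__new__(self.__class__)
        memo[id(self)] = new
        for key, value in self.__dict__.items():
            setattr(new, key, copy.deepcopy(value, memo))
        return new
```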

* refactor(tests): reorganize imports and enhance error messages in test cases

- Cleaned up import statements in test_crew.py for better organization and readability.
- Wrapped expected error messages in `re.escape` so that `pytest.raises(match=...)` treats them as literal text rather than as a regex (example after this list).
- Adjusted comments for clarity and consistency across test scenarios.
- Ensured that all necessary modules are imported correctly to avoid potential runtime issues.
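
The `re.escape` change looks like this in practice (values here are illustrative):

```python
import re

import pytest


def test_error_message_matches_literally():
    message = "Invalid response format: expected {'type': 'json_object'}"
    # match= is interpreted as a regex; re.escape keeps braces and quotes literal.
    with pytest.raises(ValueError, match=re.escape(message)):
        raise ValueError(message)
```
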
Commit 126b91eab3 (parent 428810bd6f) by Lorenze Jay, committed via GitHub on 2025-10-03 14:32:35 -07:00.
77 changed files with 25,026 additions and 493 deletions.

View File: tests/agents/test_agent.py

@@ -1,14 +1,9 @@
"""Test Agent creation and execution basic functionality."""
# ruff: noqa: S106
import os
from unittest import mock
from unittest.mock import MagicMock, patch
import pytest
from crewai import Agent, Crew, Task
from crewai.agents.cache import CacheHandler
from crewai.agents.crew_agent_executor import AgentFinish, CrewAgentExecutor
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.tool_usage_events import ToolUsageFinishedEvent
@@ -17,12 +12,17 @@ from crewai.knowledge.knowledge_config import KnowledgeConfig
from crewai.knowledge.source.base_knowledge_source import BaseKnowledgeSource
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
from crewai.llm import LLM
from crewai.llms.base_llm import BaseLLM
from crewai.process import Process
from crewai.tools import tool
from crewai.tools.tool_calling import InstructorToolCalling
from crewai.tools.tool_usage import ToolUsage
from crewai.utilities import RPMController
from crewai.utilities.errors import AgentRepositoryError
import pytest
from crewai import Agent, Crew, Task
from crewai.agents.cache import CacheHandler
from crewai.tools import tool
from crewai.utilities import RPMController
def test_agent_llm_creation_with_env_vars():
@@ -40,7 +40,7 @@ def test_agent_llm_creation_with_env_vars():
agent = Agent(role="test role", goal="test goal", backstory="test backstory")
# Check if LLM is created correctly
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert agent.llm.model == "gpt-4-turbo"
assert agent.llm.api_key == "test_api_key"
assert agent.llm.base_url == "https://test-api-base.com"
@@ -50,11 +50,18 @@ def test_agent_llm_creation_with_env_vars():
del os.environ["OPENAI_API_BASE"]
del os.environ["OPENAI_MODEL_NAME"]
if original_api_key:
os.environ["OPENAI_API_KEY"] = original_api_key
if original_api_base:
os.environ["OPENAI_API_BASE"] = original_api_base
if original_model_name:
os.environ["OPENAI_MODEL_NAME"] = original_model_name
# Create an agent without specifying LLM
agent = Agent(role="test role", goal="test goal", backstory="test backstory")
# Check if LLM is created correctly
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert agent.llm.model != "gpt-4-turbo"
assert agent.llm.api_key != "test_api_key"
assert agent.llm.base_url != "https://test-api-base.com"
@@ -456,18 +463,30 @@ def test_agent_custom_max_iterations():
allow_delegation=False,
)
with patch.object(
LLM, "call", wraps=LLM("gpt-4o", stop=["\nObservation:"]).call
) as private_mock:
task = Task(
description="The final answer is 42. But don't give it yet, instead keep using the `get_final_answer` tool.",
expected_output="The final answer",
)
agent.execute_task(
task=task,
tools=[get_final_answer],
)
assert private_mock.call_count == 3
original_call = agent.llm.call
call_count = 0
def counting_call(*args, **kwargs):
nonlocal call_count
call_count += 1
return original_call(*args, **kwargs)
agent.llm.call = counting_call
task = Task(
description="The final answer is 42. But don't give it yet, instead keep using the `get_final_answer` tool.",
expected_output="The final answer",
)
result = agent.execute_task(
task=task,
tools=[get_final_answer],
)
assert result is not None
assert isinstance(result, str)
assert len(result) > 0
assert call_count > 0
assert call_count == 3
@pytest.mark.vcr(filter_headers=["authorization"])
@@ -888,9 +907,8 @@ def test_agent_function_calling_llm():
crew = Crew(agents=[agent1], tasks=tasks)
from unittest.mock import patch
import instructor
from crewai.tools.tool_usage import ToolUsage
import instructor
with (
patch.object(
@@ -1413,7 +1431,7 @@ def test_agent_with_llm():
llm=LLM(model="gpt-3.5-turbo", temperature=0.7),
)
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert agent.llm.model == "gpt-3.5-turbo"
assert agent.llm.temperature == 0.7
@@ -1427,7 +1445,7 @@ def test_agent_with_custom_stop_words():
llm=LLM(model="gpt-3.5-turbo", stop=stop_words),
)
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert set(agent.llm.stop) == set([*stop_words, "\nObservation:"])
assert all(word in agent.llm.stop for word in stop_words)
assert "\nObservation:" in agent.llm.stop
@@ -1441,10 +1459,12 @@ def test_agent_with_callbacks():
role="test role",
goal="test goal",
backstory="test backstory",
llm=LLM(model="gpt-3.5-turbo", callbacks=[dummy_callback]),
llm=LLM(model="gpt-3.5-turbo", callbacks=[dummy_callback], is_litellm=True),
)
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
# All LLM implementations now support callbacks consistently
assert hasattr(agent.llm, "callbacks")
assert len(agent.llm.callbacks) == 1
assert agent.llm.callbacks[0] == dummy_callback
@@ -1463,7 +1483,7 @@ def test_agent_with_additional_kwargs():
),
)
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert agent.llm.model == "gpt-3.5-turbo"
assert agent.llm.temperature == 0.8
assert agent.llm.top_p == 0.9
@@ -1580,40 +1600,40 @@ def test_agent_with_all_llm_attributes():
timeout=10,
temperature=0.7,
top_p=0.9,
n=1,
# n=1,
stop=["STOP", "END"],
max_tokens=100,
presence_penalty=0.1,
frequency_penalty=0.1,
logit_bias={50256: -100}, # Example: bias against the EOT token
# logit_bias={50256: -100}, # Example: bias against the EOT token
response_format={"type": "json_object"},
seed=42,
logprobs=True,
top_logprobs=5,
base_url="https://api.openai.com/v1",
api_version="2023-05-15",
# api_version="2023-05-15",
api_key="sk-your-api-key-here",
),
)
assert isinstance(agent.llm, LLM)
assert isinstance(agent.llm, BaseLLM)
assert agent.llm.model == "gpt-3.5-turbo"
assert agent.llm.timeout == 10
assert agent.llm.temperature == 0.7
assert agent.llm.top_p == 0.9
assert agent.llm.n == 1
# assert agent.llm.n == 1
assert set(agent.llm.stop) == set(["STOP", "END", "\nObservation:"])
assert all(word in agent.llm.stop for word in ["STOP", "END", "\nObservation:"])
assert agent.llm.max_tokens == 100
assert agent.llm.presence_penalty == 0.1
assert agent.llm.frequency_penalty == 0.1
assert agent.llm.logit_bias == {50256: -100}
# assert agent.llm.logit_bias == {50256: -100}
assert agent.llm.response_format == {"type": "json_object"}
assert agent.llm.seed == 42
assert agent.llm.logprobs
assert agent.llm.top_logprobs == 5
assert agent.llm.base_url == "https://api.openai.com/v1"
assert agent.llm.api_version == "2023-05-15"
# assert agent.llm.api_version == "2023-05-15"
assert agent.llm.api_key == "sk-your-api-key-here"
@@ -1982,7 +2002,7 @@ def test_agent_with_knowledge_sources_works_with_copy():
assert len(agent_copy.knowledge_sources) == 1
assert isinstance(agent_copy.knowledge_sources[0], StringKnowledgeSource)
assert agent_copy.knowledge_sources[0].content == content
assert isinstance(agent_copy.llm, LLM)
assert isinstance(agent_copy.llm, BaseLLM)
@pytest.mark.vcr(filter_headers=["authorization"])
@@ -2130,7 +2150,7 @@ def test_litellm_auth_error_handling():
role="test role",
goal="test goal",
backstory="test backstory",
llm=LLM(model="gpt-4"),
llm=LLM(model="gpt-4", is_litellm=True),
max_retry_limit=0, # Disable retries for authentication errors
)
@@ -2157,16 +2177,15 @@ def test_litellm_auth_error_handling():
def test_crew_agent_executor_litellm_auth_error():
"""Test that CrewAgentExecutor handles LiteLLM authentication errors by raising them."""
from litellm.exceptions import AuthenticationError
from crewai.agents.tools_handler import ToolsHandler
from litellm.exceptions import AuthenticationError
# Create an agent and executor
agent = Agent(
role="test role",
goal="test goal",
backstory="test backstory",
llm=LLM(model="gpt-4", api_key="invalid_api_key"),
llm=LLM(model="gpt-4", api_key="invalid_api_key", is_litellm=True),
)
task = Task(
description="Test task",
@@ -2224,7 +2243,7 @@ def test_litellm_anthropic_error_handling():
role="test role",
goal="test goal",
backstory="test backstory",
llm=LLM(model="claude-3.5-sonnet-20240620"),
llm=LLM(model="claude-3.5-sonnet-20240620", is_litellm=True),
max_retry_limit=0,
)

View File

@@ -3,16 +3,17 @@ from collections import defaultdict
from typing import cast
from unittest.mock import Mock, patch
import pytest
from crewai import LLM, Agent
from crewai.events.event_bus import crewai_event_bus
from crewai.events.types.agent_events import LiteAgentExecutionStartedEvent
from crewai.events.types.tool_usage_events import ToolUsageStartedEvent
from crewai.flow import Flow, start
from crewai.lite_agent import LiteAgent, LiteAgentOutput
from crewai.llms.base_llm import BaseLLM
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
import pytest
from crewai import LLM, Agent
from crewai.flow import Flow, start
from crewai.tools import BaseTool
# A simple test tool
@@ -197,10 +198,6 @@ def test_lite_agent_structured_output():
response_format=SimpleOutput,
)
print(f"\n=== Agent Result Type: {type(result)}")
print(f"=== Agent Result: {result}")
print(f"=== Pydantic: {result.pydantic}")
assert result.pydantic is not None, "Should return a Pydantic model"
output = cast(SimpleOutput, result.pydantic)
@@ -295,6 +292,17 @@ def test_sets_parent_flow_when_inside_flow():
mock_llm.call.return_value = "Test response"
mock_llm.stop = []
from crewai.types.usage_metrics import UsageMetrics
mock_usage_metrics = UsageMetrics(
total_tokens=100,
prompt_tokens=50,
completion_tokens=50,
cached_prompt_tokens=0,
successful_requests=1,
)
mock_llm.get_token_usage_summary.return_value = mock_usage_metrics
class MyFlow(Flow):
@start()
def start(self):