Lorenze/native inference sdks (#3619)

* ruff linted

* using native sdks with litellm fallback

* drop exa

* drop print on completion

* Refactor LLM and utility functions for type consistency

- Updated `max_tokens` parameter in `LLM` class to accept `float` in addition to `int`.
- Modified `create_llm` function to ensure consistent type hints and return types, now returning `LLM | BaseLLM | None`.
- Adjusted type hints for various parameters in `create_llm` and `_llm_via_environment_or_fallback` functions for improved clarity and type safety.
- Enhanced test cases to reflect changes in type handling and ensure proper instantiation of LLM instances.
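A minimal sketch of what the signatures described above could look like, assuming `BaseLLM` lives at `crewai.llms.base_llm`; the parameter name and normalization logic are illustrative, not the PR's actual code:

```python
from typing import Any

from crewai.llm import LLM
from crewai.llms.base_llm import BaseLLM  # module path assumed


def create_llm(llm_value: Any = None) -> LLM | BaseLLM | None:
    """Normalize whatever the caller passed into an LLM instance (sketch)."""
    if llm_value is None:
        return None
    if isinstance(llm_value, (LLM, BaseLLM)):
        return llm_value
    if isinstance(llm_value, str):
        return LLM(model=llm_value)
    # Fall back to reading a model name off an arbitrary object.
    model = getattr(llm_value, "model_name", None) or getattr(llm_value, "model", None)
    return LLM(model=str(model)) if model else None


# `max_tokens` is now typed to accept a float as well as an int:
llm = create_llm(LLM(model="gpt-4o", max_tokens=1500.0))
```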

* fix agent_tests

* fix litellm tests and usagemetrics fix

* drop print

* Refactor LLM event handling and improve test coverage

- Removed commented-out event emission for LLM call failures in `llm.py`.
- Added `from_agent` parameter to `CrewAgentExecutor` for better context in LLM responses.
- Enhanced test for LLM call failure to simulate OpenAI API failure and updated assertions for clarity.
- Updated agent and task ID assertions in tests to ensure they are consistently treated as strings.
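A hedged sketch of how such an LLM-call-failure test might simulate a provider outage; the patch target and the exact exception that surfaces are assumptions, not the PR's actual test:

```python
from unittest.mock import patch

import pytest

from crewai.llm import LLM


def test_llm_call_failure_surfaces_provider_error():
    # Force the LiteLLM path via the flag this PR introduces.
    llm = LLM(model="gpt-4o", is_litellm=True)
    with patch("litellm.completion", side_effect=Exception("OpenAI API is down")):
        with pytest.raises(Exception, match="OpenAI API is down"):
            llm.call("Say hello")
```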

* fix test_converter

* fixed tests/agents/test_agent.py

* Refactor LLM context length exception handling and improve provider integration

- Renamed `LLMContextLengthExceededException` to `LLMContextLengthExceededExceptionError` for clarity and consistency.
- Updated LLM class to pass the provider parameter correctly during initialization.
- Enhanced error handling in various LLM provider implementations to raise the new exception type.
- Adjusted tests to reflect the updated exception name and ensure proper error handling in context length scenarios.
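For context, a sketch of the renamed exception and the kind of provider-side mapping the bullets describe; the constructor, matching strings, and helper name are assumptions:

```python
class LLMContextLengthExceededExceptionError(Exception):
    """Raised when a prompt exceeds the model's context window (sketch)."""

    def __init__(self, error_message: str) -> None:
        super().__init__(f"LLM context length exceeded: {error_message}")


def raise_if_context_exceeded(provider_message: str) -> None:
    # Each provider maps its own "context length" error text onto the shared type.
    lowered = provider_message.lower()
    if "context length" in lowered or "maximum context" in lowered:
        raise LLMContextLengthExceededExceptionError(provider_message)
```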

* Enhance LLM context window handling across providers

- Introduced CONTEXT_WINDOW_USAGE_RATIO to adjust context window sizes dynamically for Anthropic, Azure, Gemini, and OpenAI LLMs.
- Added validation for context window sizes in Azure and Gemini providers to ensure they fall within acceptable limits.
- Updated context window size calculations to use the new ratio, improving consistency and adaptability across different models.
- Removed hardcoded context window sizes in favor of ratio-based calculations for better flexibility.
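A sketch of ratio-based context window sizing as described above; the 0.75 value, the bounds, and the helper name are assumptions rather than the PR's constants:

```python
CONTEXT_WINDOW_USAGE_RATIO = 0.75

# Advertised maximum context sizes per model (illustrative values only).
MODEL_CONTEXT_WINDOWS = {
    "gpt-4o": 128_000,
    "claude-3-5-sonnet-latest": 200_000,
    "gemini-1.5-pro": 1_048_576,
}

MIN_CONTEXT_WINDOW = 1_024
MAX_CONTEXT_WINDOW = 2_097_152


def get_context_window_size(model: str, default: int = 8_192) -> int:
    """Return the usable window, leaving headroom for the model's response."""
    advertised = MODEL_CONTEXT_WINDOWS.get(model, default)
    # Validation like the Azure/Gemini bullets describe: clamp to sane bounds.
    advertised = max(MIN_CONTEXT_WINDOW, min(advertised, MAX_CONTEXT_WINDOW))
    return int(advertised * CONTEXT_WINDOW_USAGE_RATIO)
```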

* fix test agent again

* fix test agent

* feat: add native LLM providers for Anthropic, Azure, and Gemini

- Introduced new completion implementations for Anthropic, Azure, and Gemini, integrating their respective SDKs.
- Added utility functions for tool validation and extraction to support function calling across LLM providers.
- Enhanced context window management and token usage extraction for each provider.
- Created a common utility module for shared functionality among LLM providers.
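A hedged sketch of the shared tool-handling utilities the bullets describe; the function names and the expected tool-call schema are assumptions, not the PR's API:

```python
import json
from typing import Any


def validate_tools(tools: list[dict[str, Any]] | None) -> list[dict[str, Any]]:
    """Keep only tool specs that carry the fields function calling needs."""
    valid = []
    for tool in tools or []:
        fn = tool.get("function", tool)
        if isinstance(fn.get("name"), str) and isinstance(fn.get("parameters"), dict):
            valid.append(tool)
    return valid


def extract_tool_call(message: dict[str, Any]) -> tuple[str, dict[str, Any]] | None:
    """Pull (tool_name, arguments) out of a provider response message, if any."""
    calls = message.get("tool_calls") or []
    if not calls:
        return None
    fn = calls[0].get("function", {})
    return fn.get("name", ""), json.loads(fn.get("arguments") or "{}")
```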

* chore: update dependencies and improve context management

- Removed direct dependency on `litellm` from the main dependencies and added it under extras for better modularity.
- Updated the `litellm` dependency specification to allow for greater flexibility in versioning.
- Refactored context length exception handling across various LLM providers to use a consistent error class.
- Enhanced platform-specific dependency markers for NVIDIA packages to ensure compatibility across different systems.
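A sketch of the import-guard pattern that moving `litellm` to an optional extra usually implies; the extras name ("litellm") and the error text are assumptions:

```python
from types import ModuleType

try:
    import litellm
except ImportError:  # litellm is now optional, so it may be absent
    litellm = None  # type: ignore[assignment]


def require_litellm() -> ModuleType:
    """Return the litellm module, or explain how to install the extra."""
    if litellm is None:
        raise ImportError(
            "litellm is not installed. Install it with: pip install 'crewai[litellm]'"
        )
    return litellm
```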

* refactor(tests): update LLM instantiation to include is_litellm flag in test cases

- Modified multiple test cases in test_llm.py to set the is_litellm parameter to True when instantiating the LLM class.
- This change ensures that the tests are aligned with the latest LLM configuration requirements and improves consistency across test scenarios.
- Adjusted relevant assertions and comments to reflect the updated LLM behavior.

* linter

* linted

* revert constants

* fix(tests): correct type hint in expected model description

- Updated the expected description in the test_generate_model_description_dict_field function to use 'Dict' instead of 'dict' for consistency with type hinting conventions.
- This change ensures that the test accurately reflects the expected output format for model descriptions.
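A hedged illustration of the convention the updated test expects, assuming `generate_model_description` lives in `crewai.utilities.converter`; the model, field, and assertion below are invented for illustration:

```python
from pydantic import BaseModel

from crewai.utilities.converter import generate_model_description  # path assumed


class Profile(BaseModel):
    attributes: dict[str, str]


description = generate_model_description(Profile)
# The expected fragment now uses typing-style capitalization ('Dict', not 'dict'):
assert "Dict[str, str]" in description
```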

* refactor(llm): enhance LLM instantiation and error handling

- Updated the LLM class to include validation for the model parameter, ensuring it is a non-empty string.
- Improved error handling by logging warnings when the native SDK fails, allowing for a fallback to LiteLLM.
- Adjusted the instantiation of LLM in test cases to consistently include the is_litellm flag, aligning with recent changes in LLM configuration.
- Modified relevant tests to reflect these updates, ensuring better coverage and accuracy in testing scenarios.
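A hedged sketch of the validate-then-fall-back flow described above; the class name, the `_native_call`/`_litellm_call` placeholders, and the logger setup are assumptions rather than the PR's actual implementation:

```python
import logging

logger = logging.getLogger(__name__)


class SketchLLM:
    def __init__(self, model: str, is_litellm: bool = False) -> None:
        if not isinstance(model, str) or not model.strip():
            raise ValueError("model must be a non-empty string")
        self.model = model
        self.is_litellm = is_litellm

    def call(self, messages: list[dict[str, str]]) -> str:
        if not self.is_litellm:
            try:
                return self._native_call(messages)
            except Exception as exc:  # fall back to LiteLLM on any SDK failure
                logger.warning("Native SDK call failed (%s); falling back to LiteLLM", exc)
        return self._litellm_call(messages)

    def _native_call(self, messages):  # placeholder for a provider SDK call
        raise NotImplementedError

    def _litellm_call(self, messages):  # placeholder for litellm.completion
        raise NotImplementedError
```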

* fixed test

* refactor(llm): enhance token usage tracking and add copy methods

- Updated the LLM class to track token usage and log callbacks in streaming mode, improving monitoring capabilities.
- Introduced shallow and deep copy methods for the LLM instance, allowing for better management of LLM configurations and parameters.
- Adjusted test cases to instantiate LLM with the is_litellm flag, ensuring alignment with recent changes in LLM configuration.
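A sketch of the shallow/deep copy support described above; the attribute names are assumptions rather than the PR's actual fields:

```python
import copy


class LLMSketch:
    def __init__(self, model: str, **params) -> None:
        self.model = model
        self.additional_params = dict(params)
        self._token_usage = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}

    def __copy__(self) -> "LLMSketch":
        new = type(self)(self.model, **self.additional_params)
        new._token_usage = self._token_usage  # shallow: shares the counters
        return new

    def __deepcopy__(self, memo: dict) -> "LLMSketch":
        new = type(self)(self.model, **copy.deepcopy(self.additional_params, memo))
        new._token_usage = copy.deepcopy(self._token_usage, memo)
        return new
```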

* refactor(tests): reorganize imports and enhance error messages in test cases

- Cleaned up import statements in test_crew.py for better organization and readability.
- Wrapped expected error messages in `re.escape` inside `pytest.raises(match=...)` calls so literal characters such as parentheses and dots are not interpreted as regex metacharacters.
- Adjusted comments for clarity and consistency across test scenarios.
- Ensured that all necessary modules are imported correctly to avoid potential runtime issues.
Lorenze Jay authored 2025-10-03 14:32:35 -07:00 · committed by GitHub
parent 428810bd6f · commit 126b91eab3
77 changed files with 25026 additions and 493 deletions


@@ -1,16 +1,14 @@
"""Test Agent creation and execution basic functionality."""
import json
from collections import defaultdict
from concurrent.futures import Future
from hashlib import md5
import json
import re
from unittest import mock
from unittest.mock import ANY, MagicMock, patch
import pydantic_core
import pytest
from crewai.agent import Agent
from crewai.agents import CacheHandler
from crewai.crew import Crew
from crewai.crews.crew_output import CrewOutput
from crewai.events.event_bus import crewai_event_bus
@@ -30,7 +28,6 @@ from crewai.events.types.memory_events import (
MemorySaveFailedEvent,
MemorySaveStartedEvent,
)
from crewai.flow import Flow, start
from crewai.knowledge.knowledge import Knowledge
from crewai.knowledge.source.string_knowledge_source import StringKnowledgeSource
from crewai.llm import LLM
@@ -46,6 +43,11 @@ from crewai.tasks.task_output import TaskOutput
from crewai.types.usage_metrics import UsageMetrics
from crewai.utilities.rpm_controller import RPMController
from crewai.utilities.task_output_storage_handler import TaskOutputStorageHandler
import pydantic_core
import pytest
from crewai.agents import CacheHandler
from crewai.flow import Flow, start
@pytest.fixture
@@ -199,7 +201,9 @@ def test_async_task_cannot_include_sequential_async_tasks_in_context(
# This should raise an error because task2 is async and has task1 in its context without a sync task in between
with pytest.raises(
ValueError,
match="Task 'Task 2' is asynchronous and cannot include other sequential asynchronous tasks in its context.",
match=re.escape(
"Task 'Task 2' is asynchronous and cannot include other sequential asynchronous tasks in its context."
),
):
Crew(tasks=[task1, task2, task3, task4, task5], agents=[researcher, writer])
@@ -237,7 +241,9 @@ def test_context_no_future_tasks(researcher, writer):
# This should raise an error because task1 has a context dependency on a future task (task4)
with pytest.raises(
ValueError,
match="Task 'Task 1' has a context dependency on a future task 'Task 4', which is not allowed.",
match=re.escape(
"Task 'Task 1' has a context dependency on a future task 'Task 4', which is not allowed."
),
):
Crew(tasks=[task1, task2, task3, task4], agents=[researcher, writer])
@@ -568,9 +574,10 @@ def test_crew_with_delegating_agents(ceo, writer):
@pytest.mark.vcr(filter_headers=["authorization"])
def test_crew_with_delegating_agents_should_not_override_task_tools(ceo, writer):
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
class TestToolInput(BaseModel):
"""Input schema for TestTool."""
@@ -627,9 +634,10 @@ def test_crew_with_delegating_agents_should_not_override_task_tools(ceo, writer)
@pytest.mark.vcr(filter_headers=["authorization"])
def test_crew_with_delegating_agents_should_not_override_agent_tools(ceo, writer):
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
class TestToolInput(BaseModel):
"""Input schema for TestTool."""
@@ -688,9 +696,10 @@ def test_crew_with_delegating_agents_should_not_override_agent_tools(ceo, writer
@pytest.mark.vcr(filter_headers=["authorization"])
def test_task_tools_override_agent_tools(researcher):
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
class TestToolInput(BaseModel):
"""Input schema for TestTool."""
@@ -744,9 +753,10 @@ def test_task_tools_override_agent_tools_with_allow_delegation(researcher, write
Test that task tools override agent tools while preserving delegation tools when allow_delegation=True
"""
from crewai.tools import BaseTool
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
class TestToolInput(BaseModel):
query: str = Field(..., description="Query to process")
@@ -1005,7 +1015,7 @@ def test_crew_kickoff_streaming_usage_metrics():
role="{topic} Researcher",
goal="Express hot takes on {topic}.",
backstory="You have a lot of experience with {topic}.",
llm=LLM(model="gpt-4o", stream=True),
llm=LLM(model="gpt-4o", stream=True, is_litellm=True),
max_iter=3,
)
@@ -1773,7 +1783,7 @@ def test_hierarchical_kickoff_usage_metrics_include_manager(researcher):
agent=researcher, # *regular* agent
)
# ── 2. Stub out each agent's _token_process.get_summary() ───────────────────
# ── 2. Stub out each agent's token usage methods ───────────────────
researcher_metrics = UsageMetrics(
total_tokens=120, prompt_tokens=80, completion_tokens=40, successful_requests=2
)
@@ -1781,10 +1791,10 @@ def test_hierarchical_kickoff_usage_metrics_include_manager(researcher):
total_tokens=30, prompt_tokens=20, completion_tokens=10, successful_requests=1
)
# Replace the internal _token_process objects with simple mocks
researcher._token_process = MagicMock(
get_summary=MagicMock(return_value=researcher_metrics)
)
# Mock the LLM's get_token_usage_summary method for the researcher
researcher.llm.get_token_usage_summary = MagicMock(return_value=researcher_metrics)
# Mock the manager's _token_process since it uses the fallback path
manager._token_process = MagicMock(
get_summary=MagicMock(return_value=manager_metrics)
)
@@ -3334,7 +3344,9 @@ def test_replay_with_invalid_task_id():
):
with pytest.raises(
ValueError,
match="Task with id bf5b09c9-69bd-4eb8-be12-f9e5bae31c2d not found in the crew's tasks.",
match=re.escape(
"Task with id bf5b09c9-69bd-4eb8-be12-f9e5bae31c2d not found in the crew's tasks."
),
):
crew.replay("bf5b09c9-69bd-4eb8-be12-f9e5bae31c2d")
@@ -3813,10 +3825,11 @@ def test_task_tools_preserve_code_execution_tools():
"""
Test that task tools don't override code execution tools when allow_code_execution=True
"""
from crewai.tools import BaseTool
from crewai_tools import CodeInterpreterTool
from pydantic import BaseModel, Field
from crewai.tools import BaseTool
class TestToolInput(BaseModel):
"""Input schema for TestTool."""