Lorenze/tracing v1 (#3279)

* initial setup

* feat: enhance CrewKickoffCompletedEvent to include total token usage

- Added a `total_tokens` attribute to `CrewKickoffCompletedEvent` for better tracking of token usage during crew execution (sketched after this list).
- Updated Crew class to emit total token usage upon kickoff completion.
- Removed obsolete context handler and execution context tracker files to streamline event handling.
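
A minimal sketch of the event shape this describes, assuming a pydantic-style event model; field names other than `total_tokens` are illustrative:

```python
from pydantic import BaseModel


class CrewKickoffCompletedEvent(BaseModel):
    """Sketch only: the real event class lives in crewai's events module."""

    type: str = "crew_kickoff_completed"
    crew_name: str | None = None  # illustrative field
    total_tokens: int = 0         # new: aggregate token usage for the whole run
```

When the crew finishes, `Crew.kickoff()` can then populate `total_tokens` from its usage metrics before emitting the event.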

* cleanup

* replace print statements with loggers

* feat: add CrewAI base URL and improve logging in tracing

- Introduced `CREWAI_BASE_URL` constant for easy access to the CrewAI application URL.
- Replaced print statements with logging in the `TraceSender` class for better error tracking.
- Enhanced the `TraceBatchManager` to provide default values for flow names and removed unnecessary comments.
- Implemented a singleton pattern in `TraceCollectionListener` to ensure a single instance is used (see the sketch below).
- Added a new test case to verify that the trace listener correctly collects events during crew execution.
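
A rough sketch of the singleton pattern described above, with the real listener's batch state and event wiring omitted:

```python
class TraceCollectionListener:
    """Sketch only: the real listener also registers event handlers."""

    _instance = None

    def __new__(cls, *args, **kwargs):
        # Every construction returns the same object, so all trace events
        # funnel into one collector no matter where the listener is built.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
```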

* clear

* fix: update datetime serialization in tracing interfaces

- Removed the 'Z' suffix from datetime serialization in `TraceSender` and `TraceEvent` to ensure a consistent ISO format (illustrated below).
- Added new test cases to validate the functionality of the TraceBatchManager and event collection during crew execution.
- Introduced fixtures to clear event bus listeners before each test to maintain isolation.
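
The serialization fix is easiest to see in isolation: a timezone-aware `isoformat()` already carries the UTC offset, so a hand-appended 'Z' double-encodes it.

```python
from datetime import datetime, timezone

ts = datetime.now(timezone.utc)
bad = ts.isoformat() + "Z"  # "...+00:00Z" -- offset encoded twice, invalid ISO 8601
good = ts.isoformat()       # "...+00:00"  -- the consistent form the fix settles on
```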

* test: enhance tracing tests with mock authentication token

- Added a mock authentication token to the tracing tests to ensure proper setup and event collection.
- Updated test methods to include the mock token, improving isolation and reliability of tests related to the TraceListener and BatchManager.
- Ensured that the tests validate the correct behavior of event collection during crew execution.

* test: refactor tracing tests to improve mock usage

- Moved the mock authentication token patching inside the test class to enhance readability and maintainability.
- Updated test methods to remove unnecessary mock parameters, streamlining the test signatures.
- Ensured that the tests continue to validate the correct behavior of event collection during crew execution while improving isolation.

* test: refactor tracing tests for improved mock usage and consistency

- Moved mock authentication token patching into individual test methods for better clarity and maintainability.
- Corrected the backstory string in the `Agent` instantiation to fix a typo.
- Ensured that all tests validate the correct behavior of event collection during crew execution while enhancing isolation and readability.

* test: add new tracing test for disabled trace listener

- Introduced a new test case to verify that the trace listener does not make HTTP calls when tracing is disabled via environment variables (condensed below).
- Enhanced existing tests by mocking PlusAPI HTTP calls to avoid authentication and network requests, improving test isolation and reliability.
- Updated the test setup to ensure proper initialization of the trace listener and its components during crew execution.
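
A condensed sketch of that test; the patch target and the tiny crew built here are assumptions rather than the suite's exact fixtures:

```python
from unittest.mock import patch

from crewai import Agent, Crew, Task


def test_no_trace_http_calls_when_disabled(monkeypatch):
    monkeypatch.setenv("CREWAI_TRACING_ENABLED", "false")
    # Assumed patch target; the real PlusAPI import path may differ.
    with patch("crewai.cli.plus_api.PlusAPI") as mock_plus_api:
        agent = Agent(role="Researcher", goal="Answer briefly", backstory="A test agent")
        task = Task(description="Say hello", expected_output="A greeting", agent=agent)
        Crew(agents=[agent], tasks=[task]).kickoff()
        mock_plus_api.assert_not_called()
```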

* refactor: update LLM class to utilize new completion function and improve cost calculation

- Replaced calls to `litellm.completion` with a top-level import of `completion` for better clarity and maintainability.
- Introduced a new optional `completion_cost` attribute in the LLM class to track the cost of completions (pattern shown below).
- Updated the handling of completion responses to ensure accurate cost calculations and improved error handling.
- Removed outdated test cassettes for gemini models to streamline test suite and avoid redundancy.
- Enhanced existing tests to reflect changes in the LLM class and ensure proper functionality.
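
litellm exposes a `completion_cost` helper alongside `completion`, which is roughly the pattern behind this change (not the exact code in the `LLM` class):

```python
from litellm import completion, completion_cost

response = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hi"}],
)
cost = completion_cost(completion_response=response)  # estimated USD for this call
```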

* test: enhance tracing tests with additional request and response scenarios

- Added new test cases to validate the behavior of the trace listener and batch manager when handling 404 responses from the tracing API (compressed below).
- Updated existing test cassettes to include detailed request and response structures, ensuring comprehensive coverage of edge cases.
- Improved mock setup to avoid unnecessary network calls and enhance test reliability.
- Ensured that the tests validate the correct behavior of event collection during crew execution, particularly in scenarios where the tracing service is unavailable.
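
Compressed to its core, the 404 scenario checks that a missing tracing endpoint never breaks a run; the mocked method name below is an assumption about the client's surface:

```python
from unittest.mock import MagicMock, patch

from crewai import Agent, Crew, Task


def test_kickoff_survives_tracing_404(monkeypatch):
    monkeypatch.setenv("CREWAI_TRACING_ENABLED", "true")
    not_found = MagicMock(status_code=404)
    # Hypothetical method name; the real PlusAPI surface may differ.
    with patch("crewai.cli.plus_api.PlusAPI.send_trace_batch", return_value=not_found):
        agent = Agent(role="Researcher", goal="Answer briefly", backstory="A test agent")
        task = Task(description="Say hello", expected_output="A greeting", agent=agent)
        result = Crew(agents=[agent], tasks=[task]).kickoff()
    assert result is not None  # execution completes despite the 404
```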

* feat: enable conditional tracing based on environment variable

- Added support for enabling or disabling the trace listener based on the `CREWAI_TRACING_ENABLED` environment variable (outlined below).
- Updated the `Crew` class to conditionally set up the trace listener only when tracing is enabled, improving performance and resource management.
- Refactored test cases to ensure proper cleanup of event bus listeners before and after each test, enhancing test reliability and isolation.
- Improved mock setup in tracing tests to validate the behavior of the trace listener when tracing is disabled.
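
In outline, the gating reads like the following; the exact truthiness check and wiring call inside `Crew` are assumptions:

```python
import os


def tracing_enabled() -> bool:
    # Only explicit truthy spellings turn tracing on; the default is off.
    return os.getenv("CREWAI_TRACING_ENABLED", "false").lower() in ("true", "1")


# Inside Crew setup (sketch):
# if tracing_enabled():
#     listener = TraceCollectionListener()        # singleton sketched earlier
#     listener.setup_listeners(crewai_event_bus)  # hypothetical wiring call
```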

* fix: downgrade litellm version from 1.74.9 to 1.74.3

- Updated the `pyproject.toml` and `uv.lock` files to reflect the change in the `litellm` dependency version.
- This downgrade addresses compatibility issues and ensures stability in the project environment.

* refactor: improve tracing test setup by moving mock authentication token patching

- Removed the module-level patch for the authentication token and implemented a fixture to mock the token for all tests in the class, enhancing test isolation and readability.
- Updated the event bus clearing logic to ensure original handlers are restored after tests (roughly as sketched below), improving the reliability of the test environment.
- This refactor streamlines the test setup and ensures consistent behavior across tracing tests.
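
The save-and-restore idea reads roughly like this; the `_handlers` attribute is an assumption about the bus internals:

```python
import pytest

from crewai.utilities.events import crewai_event_bus


@pytest.fixture(autouse=True)
def isolated_event_bus():
    saved = dict(crewai_event_bus._handlers)  # assumed internal handler registry
    crewai_event_bus._handlers.clear()
    yield
    crewai_event_bus._handlers.clear()
    crewai_event_bus._handlers.update(saved)
```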

* test: enhance tracing test setup with comprehensive mock authentication

- Expanded the mock authentication token patching to cover all instances where `get_auth_token` is used across different modules (sketched below), ensuring consistent behavior in tests.
- Introduced a new fixture to reset tracing singleton instances between tests, improving test isolation and reliability.
- This update enhances the overall robustness of the tracing tests by ensuring that all necessary components are properly mocked and reset, leading to more reliable test outcomes.
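
Taken together, the class-level setup amounts to something like this sketch; every patch target and import path below is illustrative, not the suite's exact paths:

```python
import pytest
from unittest.mock import patch


@pytest.fixture(autouse=True)
def mock_auth_token():
    # Patch each import site of get_auth_token (paths are assumptions).
    with patch(
        "crewai.utilities.tracing.trace_batch_manager.get_auth_token",
        return_value="fake-token",
    ), patch(
        "crewai.utilities.tracing.trace_listener.get_auth_token",
        return_value="fake-token",
    ):
        yield


@pytest.fixture(autouse=True)
def reset_tracing_singletons():
    # Drop the cached instance (see the singleton sketch earlier) so each
    # test constructs a fresh listener; the import path is assumed.
    from crewai.utilities.tracing.trace_listener import TraceCollectionListener

    TraceCollectionListener._instance = None
    yield
    TraceCollectionListener._instance = None
```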

* just drop the test for now

* refactor: comment out completion-related code in LLM and LLM event classes

- Commented out the `completion` and `completion_cost` imports and their usage in the `LLM` class to prevent potential issues during execution.
- Updated the `LLMCallCompletedEvent` class to comment out the `response_cost` attribute, ensuring consistency with the changes in the LLM class.
- This refactor aims to streamline the code and prepare for future updates without affecting current functionality.

* refactor: update LLM response handling in LiteAgent

- Commented out the `response_cost` attribute in the LLM response handling to align with recent refactoring in the LLM class.
- This change aims to maintain consistency in the codebase and prepare for future updates without affecting current functionality.

* refactor: remove commented-out response cost attributes in LLM and LiteAgent

- Removed the commented-out `response_cost` attribute in both the `LiteAgent` and `LLM` classes to maintain consistency with recent refactoring efforts.
- This change aligns with previous updates aimed at streamlining the codebase and preparing for future enhancements without impacting current functionality.

* bring back the litellm version upgrade

Author: Lorenze Jay
Date:   2025-08-06 14:05:14 -07:00 (committed by GitHub)
Commit: 8f4a6cc61c
Parent: 7dc86dc79a
22 changed files with 3035 additions and 231 deletions


@@ -282,9 +282,6 @@ def test_gemini_models(model):
 @pytest.mark.parametrize(
     "model",
     [
         "gemini/gemma-3-1b-it",
         "gemini/gemma-3-4b-it",
         "gemini/gemma-3-12b-it",
         "gemini/gemma-3-27b-it",
     ],
 )
@@ -377,6 +374,7 @@ def get_weather_tool_schema():
         },
     }
 
+
 def test_context_window_exceeded_error_handling():
     """Test that litellm.ContextWindowExceededError is converted to LLMContextLengthExceededException."""
     from litellm.exceptions import ContextWindowExceededError
@@ -392,7 +390,7 @@ def test_context_window_exceeded_error_handling():
         mock_completion.side_effect = ContextWindowExceededError(
             "This model's maximum context length is 8192 tokens. However, your messages resulted in 10000 tokens.",
             model="gpt-4",
-            llm_provider="openai"
+            llm_provider="openai",
         )
 
         with pytest.raises(LLMContextLengthExceededException) as excinfo:
@@ -407,7 +405,7 @@ def test_context_window_exceeded_error_handling():
         mock_completion.side_effect = ContextWindowExceededError(
             "This model's maximum context length is 8192 tokens. However, your messages resulted in 10000 tokens.",
             model="gpt-4",
-            llm_provider="openai"
+            llm_provider="openai",
         )
 
         with pytest.raises(LLMContextLengthExceededException) as excinfo:
@@ -598,6 +596,7 @@ def test_handle_streaming_tool_calls(get_weather_tool_schema, mock_emit):
         expected_final_chunk_result=expected_final_chunk_result,
     )
 
+
 @pytest.mark.vcr(filter_headers=["authorization"])
 def test_handle_streaming_tool_calls_with_error(get_weather_tool_schema, mock_emit):
     def get_weather_error(location):
@@ -609,9 +608,7 @@ def test_handle_streaming_tool_calls_with_error(get_weather_tool_schema, mock_emit):
             {"role": "user", "content": "What is the weather in New York?"},
         ],
         tools=[get_weather_tool_schema],
-        available_functions={
-            "get_weather": get_weather_error
-        },
+        available_functions={"get_weather": get_weather_error},
     )
     assert response == ""
     expected_final_chunk_result = '{"location":"New York, NY"}'
@@ -676,8 +673,11 @@ def test_llm_call_when_stop_is_unsupported(caplog):
     assert isinstance(result, str)
     assert "Paris" in result
 
+
 @pytest.mark.vcr(filter_headers=["authorization"])
-def test_llm_call_when_stop_is_unsupported_when_additional_drop_params_is_provided(caplog):
+def test_llm_call_when_stop_is_unsupported_when_additional_drop_params_is_provided(
+    caplog,
+):
     llm = LLM(model="o1-mini", stop=["stop"], additional_drop_params=["another_param"])
     with caplog.at_level(logging.INFO):
         result = llm.call("What is the capital of France?")
@@ -690,6 +690,7 @@ def test_llm_call_when_stop_is_unsupported_when_additional_drop_params_is_provided(caplog):
 def ollama_llm():
     return LLM(model="ollama/llama3.2:3b")
 
+
 def test_ollama_appends_dummy_user_message_when_last_is_assistant(ollama_llm):
     original_messages = [
         {"role": "user", "content": "Hi there"},