Compare commits


15 Commits

Author SHA1 Message Date
Lorenze Jay
17a19dee0c lint 2025-02-24 12:19:36 -08:00
Lorenze Jay
75a84a55c2 Merge branch 'main' of github.com:crewAIInc/crewAI into better-telemetry-tests 2025-02-24 12:18:43 -08:00
Lorenze Jay
7460906712 refactor: Improve telemetry span tracking in EventListener
- Remove `execution_span` from Task class
- Add `execution_spans` dictionary to EventListener to track spans
- Update task event handlers to use new span tracking mechanism
- Simplify span management across task lifecycle events
2025-02-24 12:17:56 -08:00
Lorenze Jay
c62fb615b1 feat: Add LLM call events for improved observability (#2214)
* feat: Add LLM call events for improved observability

- Introduce new LLM call events: LLMCallStartedEvent, LLMCallCompletedEvent, and LLMCallFailedEvent
- Emit events for LLM calls and tool calls to provide better tracking and debugging
- Add event handling in the LLM class to track call lifecycle
- Update event bus to support new LLM-related events
- Add test cases to validate LLM event emissions

* feat: Add event handling for LLM call lifecycle events

- Implement event listeners for LLM call events in EventListener
- Add logging for LLM call start, completion, and failure events
- Import and register new LLM-specific event types

* less log

* refactor: Update LLM event response type to support Any

* refactor: Simplify LLM call completed event emission

Remove unnecessary LLMCallType conversion when emitting LLMCallCompletedEvent

* refactor: Update LLM event docstrings for clarity

Improve docstrings for LLM call events to more accurately describe their purpose and lifecycle

* feat: Add LLMCallFailedEvent emission for tool execution errors

Enhance error handling by emitting a specific event when tool execution fails during LLM calls
2025-02-24 15:17:44 -05:00
Brandon Hancock (bhancock_ai)
78797c64b0 fix reset memory issue (#2182)
Co-authored-by: Lorenze Jay <63378463+lorenzejay@users.noreply.github.com>
2025-02-24 14:51:58 -05:00
Lorenze Jay
84c809eee2 dropped comment 2025-02-24 11:05:47 -08:00
Lorenze Jay
9bb46f158c test: Improve crew verbose output test with event log filtering
- Filter out event listener logs in verbose output test
- Ensure no output when verbose is set to False
- Enhance test coverage for crew logging behavior
2025-02-24 11:05:10 -08:00
Lorenze Jay
2d4a7701e6 Merge branch 'main' of github.com:crewAIInc/crewAI into better-telemetry-tests 2025-02-24 09:03:08 -08:00
Lorenze Jay
9c040c9e97 Remove telemetry references from Crew class
- Remove Telemetry import and initialization from Crew class
- Delete _telemetry attribute from class configuration
- Clean up unused telemetry-related code
2025-02-24 09:01:26 -08:00
Lorenze Jay
2d07c8d2e4 feat: Enhance event listener and telemetry tracking
- Update event listener to improve telemetry span handling
- Add execution_span field to Task for better tracing
- Modify event handling in EventListener to use new span tracking
- Remove debug print statements
- Improve test coverage for crew and flow events
- Update cassettes to reflect new event tracking behavior
2025-02-24 09:00:06 -08:00
Brandon Hancock (bhancock_ai)
8a7584798b Better support async flows (#2193)
* Better support async

* Drop coroutine
2025-02-24 10:25:30 -05:00
Jannik Maierhöfer
b50772a38b docs: add header image to langfuse guide (#2128)
Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
2025-02-21 10:11:55 -05:00
João Moura
96a7e8038f cassetes 2025-02-20 21:00:10 -06:00
Brandon Hancock (bhancock_ai)
ec050e5d33 drop prints (#2181) 2025-02-20 12:35:39 -05:00
Brandon Hancock (bhancock_ai)
e2ce65fc5b Check the right property for tool calling (#2160)
* Check the right property

* Fix failing tests

* Update cassettes

* Update cassettes again

* Update cassettes again 2

* Update cassettes again 3

* fix other test that fails in ci/cd

* Fix issues pointed out by lorenze
2025-02-20 12:12:52 -05:00
20 changed files with 2803 additions and 1727 deletions

.gitignore vendored
View File

@@ -21,4 +21,5 @@ crew_tasks_output.json
.mypy_cache
.ruff_cache
.venv
agentops.log
agentops.log
test_flow.html

View File

@@ -10,6 +10,8 @@ This notebook demonstrates how to integrate **Langfuse** with **CrewAI** using O
> **What is Langfuse?** [Langfuse](https://langfuse.com) is an open-source LLM engineering platform. It provides tracing and monitoring capabilities for LLM applications, helping developers debug, analyze, and optimize their AI systems. Langfuse integrates with various tools and frameworks via native integrations, OpenTelemetry, and APIs/SDKs.
[![Langfuse Overview Video](https://github.com/user-attachments/assets/3926b288-ff61-4b95-8aa1-45d041c70866)](https://langfuse.com/watch-demo)
## Get Started
We'll walk through a simple example of using CrewAI and integrating it with Langfuse via OpenTelemetry using OpenLit.
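For context, the integration the guide walks through boils down to pointing OpenTelemetry at Langfuse and letting OpenLit instrument CrewAI. A minimal sketch, assuming Langfuse Cloud endpoints and the stock `openlit.init()` entry point (keys are placeholders):

```python
import base64
import os

import openlit  # OpenLit auto-instruments CrewAI/LiteLLM via OpenTelemetry

# Placeholder credentials; endpoint/header scheme follows Langfuse's OTel docs.
LANGFUSE_PUBLIC_KEY = "pk-lf-..."
LANGFUSE_SECRET_KEY = "sk-lf-..."
auth = base64.b64encode(
    f"{LANGFUSE_PUBLIC_KEY}:{LANGFUSE_SECRET_KEY}".encode()
).decode()

os.environ["OTEL_EXPORTER_OTLP_ENDPOINT"] = "https://cloud.langfuse.com/api/public/otel"
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = f"Authorization=Basic {auth}"

openlit.init()  # after this, crew kickoffs show up as traces in Langfuse
```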

View File

@@ -31,11 +31,11 @@ class OutputConverter(BaseModel, ABC):
)
@abstractmethod
def to_pydantic(self, current_attempt=1):
def to_pydantic(self, current_attempt=1) -> BaseModel:
"""Convert text to pydantic."""
pass
@abstractmethod
def to_json(self, current_attempt=1):
def to_json(self, current_attempt=1) -> dict:
"""Convert text to json."""
pass
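With the new return annotations, concrete converters are pinned to returning a `BaseModel` and a `dict`. A hypothetical minimal subclass illustrating the tightened contract (the `OutputConverter` import path is an assumption, not shown in this diff):

```python
import json

from pydantic import BaseModel

# Assumed import path for the abstract base shown above:
from crewai.agents.agent_builder.utilities.base_output_converter import (
    OutputConverter,
)


class EchoConverter(OutputConverter):
    """Toy converter that trusts `text` to already be valid JSON."""

    def to_pydantic(self, current_attempt=1) -> BaseModel:
        return self.model.model_validate_json(self.text)

    def to_json(self, current_attempt=1) -> dict:
        return json.loads(self.text)
```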

View File

@@ -35,7 +35,6 @@ from crewai.process import Process
from crewai.task import Task
from crewai.tasks.conditional_task import ConditionalTask
from crewai.tasks.task_output import TaskOutput
from crewai.telemetry import Telemetry
from crewai.tools.agent_tools.agent_tools import AgentTools
from crewai.tools.base_tool import Tool
from crewai.traces.unified_trace_controller import init_crew_main_trace
@@ -258,8 +257,6 @@ class Crew(BaseModel):
if self.function_calling_llm and not isinstance(self.function_calling_llm, LLM):
self.function_calling_llm = create_llm(self.function_calling_llm)
self._telemetry = Telemetry()
self._telemetry.set_tracer()
return self
@model_validator(mode="after")
@@ -1115,7 +1112,6 @@ class Crew(BaseModel):
"_short_term_memory",
"_long_term_memory",
"_entity_memory",
"_telemetry",
"agents",
"tasks",
"knowledge_sources",
@@ -1278,11 +1274,11 @@ class Crew(BaseModel):
def _reset_all_memories(self) -> None:
"""Reset all available memory systems."""
memory_systems = [
("short term", self._short_term_memory),
("entity", self._entity_memory),
("long term", self._long_term_memory),
("task output", self._task_output_handler),
("knowledge", self.knowledge),
("short term", getattr(self, "_short_term_memory", None)),
("entity", getattr(self, "_entity_memory", None)),
("long term", getattr(self, "_long_term_memory", None)),
("task output", getattr(self, "_task_output_handler", None)),
("knowledge", getattr(self, "knowledge", None)),
]
for name, system in memory_systems:
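The `getattr` guards matter because these private attributes only exist once the crew's validators have run. A sketch of the public path that reaches `_reset_all_memories` (the `reset_memories` signature is assumed from CrewAI's API, not shown in this diff):

```python
from crewai import Agent, Crew, Task

agent = Agent(role="Researcher", goal="Research", backstory="...")
task = Task(description="Research AI", expected_output="notes", agent=agent)

# Memory systems may never be initialized on this crew.
crew = Crew(agents=[agent], tasks=[task])
crew.reset_memories(command_type="all")  # getattr guards skip unset systems
```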

View File

@@ -713,16 +713,35 @@ class Flow(Generic[T], metaclass=FlowMeta):
raise TypeError(f"State must be dict or BaseModel, got {type(self._state)}")
def kickoff(self, inputs: Optional[Dict[str, Any]] = None) -> Any:
"""Start the flow execution.
"""
Start the flow execution in a synchronous context.
This method wraps kickoff_async so that all state initialization and event
emission is handled in the asynchronous method.
"""
async def run_flow():
return await self.kickoff_async(inputs)
return asyncio.run(run_flow())
@init_flow_main_trace
async def kickoff_async(self, inputs: Optional[Dict[str, Any]] = None) -> Any:
"""
Start the flow execution asynchronously.
This method performs state restoration (if an 'id' is provided and persistence is available)
and updates the flow state with any additional inputs. It then emits the FlowStartedEvent,
logs the flow startup, and executes all start methods. Once completed, it emits the
FlowFinishedEvent and returns the final output.
Args:
inputs: Optional dictionary containing input values and potentially a state ID to restore
"""
# Handle state restoration if ID is provided in inputs
if inputs and "id" in inputs and self._persistence is not None:
restore_uuid = inputs["id"]
stored_state = self._persistence.load_state(restore_uuid)
inputs: Optional dictionary containing input values and/or a state ID for restoration.
Returns:
The final output from the flow, which is the result of the last executed method.
"""
if inputs:
# Override the id in the state if it exists in inputs
if "id" in inputs:
if isinstance(self._state, dict):
@@ -730,24 +749,27 @@ class Flow(Generic[T], metaclass=FlowMeta):
elif isinstance(self._state, BaseModel):
setattr(self._state, "id", inputs["id"])
if stored_state:
self._log_flow_event(
f"Loading flow state from memory for UUID: {restore_uuid}",
color="yellow",
)
# Restore the state
self._restore_state(stored_state)
else:
self._log_flow_event(
f"No flow state found for UUID: {restore_uuid}", color="red"
)
# If persistence is enabled, attempt to restore the stored state using the provided id.
if "id" in inputs and self._persistence is not None:
restore_uuid = inputs["id"]
stored_state = self._persistence.load_state(restore_uuid)
if stored_state:
self._log_flow_event(
f"Loading flow state from memory for UUID: {restore_uuid}",
color="yellow",
)
self._restore_state(stored_state)
else:
self._log_flow_event(
f"No flow state found for UUID: {restore_uuid}", color="red"
)
# Apply any additional inputs after restoration
# Update state with any additional inputs (ignoring the 'id' key)
filtered_inputs = {k: v for k, v in inputs.items() if k != "id"}
if filtered_inputs:
self._initialize_state(filtered_inputs)
# Start flow execution
# Emit FlowStartedEvent and log the start of the flow.
crewai_event_bus.emit(
self,
FlowStartedEvent(
@@ -760,27 +782,18 @@ class Flow(Generic[T], metaclass=FlowMeta):
f"Flow started with ID: {self.flow_id}", color="bold_magenta"
)
if inputs is not None and "id" not in inputs:
self._initialize_state(inputs)
async def run_flow():
return await self.kickoff_async()
return asyncio.run(run_flow())
@init_flow_main_trace
async def kickoff_async(self, inputs: Optional[Dict[str, Any]] = None) -> Any:
if not self._start_methods:
raise ValueError("No start method defined")
# Execute all start methods concurrently.
tasks = [
self._execute_start_method(start_method)
for start_method in self._start_methods
]
await asyncio.gather(*tasks)
final_output = self._method_outputs[-1] if self._method_outputs else None
# Emit FlowFinishedEvent after all processing is complete.
crewai_event_bus.emit(
self,
FlowFinishedEvent(
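The restructured entry points keep every side effect (state restoration, event emission, start-method execution) on the async path, with `kickoff` reduced to an `asyncio.run` wrapper. A minimal flow exercising it, mirroring the `TestFlow` pattern used in the tests further down:

```python
from crewai.flow.flow import Flow, start


class GreetFlow(Flow[dict]):
    @start()
    def begin(self):
        return "started"


flow = GreetFlow()
result = flow.kickoff()  # sync wrapper; internally awaits kickoff_async()
print(result)            # "started" -- the output of the last executed method
```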

View File

@@ -21,14 +21,20 @@ from typing import (
from dotenv import load_dotenv
from pydantic import BaseModel
from crewai.utilities.events.llm_events import (
LLMCallCompletedEvent,
LLMCallFailedEvent,
LLMCallStartedEvent,
LLMCallType,
)
from crewai.utilities.events.tool_usage_events import ToolExecutionErrorEvent
with warnings.catch_warnings():
warnings.simplefilter("ignore", UserWarning)
import litellm
from litellm import Choices, get_supported_openai_params
from litellm import Choices
from litellm.types.utils import ModelResponse
from litellm.utils import supports_response_schema
from litellm.utils import get_supported_openai_params, supports_response_schema
from crewai.traces.unified_trace_controller import trace_llm_call
@@ -259,6 +265,15 @@ class LLM:
>>> print(response)
"The capital of France is Paris."
"""
crewai_event_bus.emit(
self,
event=LLMCallStartedEvent(
messages=messages,
tools=tools,
callbacks=callbacks,
available_functions=available_functions,
),
)
# Validate parameters before proceeding with the call.
self._validate_call_params()
@@ -333,12 +348,13 @@ class LLM:
# --- 4) If no tool calls, return the text response
if not tool_calls or not available_functions:
self._handle_emit_call_events(text_response, LLMCallType.LLM_CALL)
return text_response
# --- 5) Handle the tool call
tool_call = tool_calls[0]
function_name = tool_call.function.name
print("function_name", function_name)
if function_name in available_functions:
try:
function_args = json.loads(tool_call.function.arguments)
@@ -350,6 +366,7 @@ class LLM:
try:
# Call the actual tool function
result = fn(**function_args)
self._handle_emit_call_events(result, LLMCallType.TOOL_CALL)
return result
except Exception as e:
@@ -365,6 +382,12 @@ class LLM:
error=str(e),
),
)
crewai_event_bus.emit(
self,
event=LLMCallFailedEvent(
error=f"Tool execution error: {str(e)}"
),
)
return text_response
else:
@@ -374,12 +397,28 @@ class LLM:
return text_response
except Exception as e:
crewai_event_bus.emit(
self,
event=LLMCallFailedEvent(error=str(e)),
)
if not LLMContextLengthExceededException(
str(e)
)._is_context_limit_error(str(e)):
logging.error(f"LiteLLM call failed: {str(e)}")
raise
def _handle_emit_call_events(self, response: Any, call_type: LLMCallType):
"""Handle the events for the LLM call.
Args:
response (str): The response from the LLM call.
call_type (str): The type of call, either "tool_call" or "llm_call".
"""
crewai_event_bus.emit(
self,
event=LLMCallCompletedEvent(response=response, call_type=call_type),
)
def _format_messages_for_provider(
self, messages: List[Dict[str, str]]
) -> List[Dict[str, str]]:
@@ -449,7 +488,7 @@ class LLM:
def supports_function_calling(self) -> bool:
try:
params = get_supported_openai_params(model=self.model)
return "response_format" in params
return params is not None and "tools" in params
except Exception as e:
logging.error(f"Failed to get supported params: {str(e)}")
return False
@@ -457,7 +496,7 @@ class LLM:
def supports_stop_words(self) -> bool:
try:
params = get_supported_openai_params(model=self.model)
return "stop" in params
return params is not None and "stop" in params
except Exception as e:
logging.error(f"Failed to get supported params: {str(e)}")
return False
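Both capability probes now tolerate models that `get_supported_openai_params` doesn't recognize (it can return `None`), and `supports_function_calling` checks the `tools` key instead of `response_format`. Callers can branch on them the same way `Converter` does below:

```python
from crewai.llm import LLM

llm = LLM(model="gpt-4o-mini")

# Neither probe raises for models unknown to litellm; they just return False.
if llm.supports_function_calling():
    print("model advertises tool calling")
if llm.supports_stop_words():
    print("model honors stop sequences")
```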

View File

@@ -20,11 +20,11 @@ class ConverterError(Exception):
class Converter(OutputConverter):
"""Class that converts text into either pydantic or json."""
def to_pydantic(self, current_attempt=1):
def to_pydantic(self, current_attempt=1) -> BaseModel:
"""Convert text to pydantic."""
try:
if self.llm.supports_function_calling():
return self._create_instructor().to_pydantic()
result = self._create_instructor().to_pydantic()
else:
response = self.llm.call(
[
@@ -32,18 +32,40 @@ class Converter(OutputConverter):
{"role": "user", "content": self.text},
]
)
return self.model.model_validate_json(response)
try:
# Try to directly validate the response JSON
result = self.model.model_validate_json(response)
except ValidationError:
# If direct validation fails, attempt to extract valid JSON
result = handle_partial_json(response, self.model, False, None)
# Ensure result is a BaseModel instance
if not isinstance(result, BaseModel):
if isinstance(result, dict):
result = self.model.parse_obj(result)
elif isinstance(result, str):
try:
parsed = json.loads(result)
result = self.model.parse_obj(parsed)
except Exception as parse_err:
raise ConverterError(
f"Failed to convert partial JSON result into Pydantic: {parse_err}"
)
else:
raise ConverterError(
"handle_partial_json returned an unexpected type."
)
return result
except ValidationError as e:
if current_attempt < self.max_attempts:
return self.to_pydantic(current_attempt + 1)
raise ConverterError(
f"Failed to convert text into a Pydantic model due to the following validation error: {e}"
f"Failed to convert text into a Pydantic model due to validation error: {e}"
)
except Exception as e:
if current_attempt < self.max_attempts:
return self.to_pydantic(current_attempt + 1)
raise ConverterError(
f"Failed to convert text into a Pydantic model due to the following error: {e}"
f"Failed to convert text into a Pydantic model due to error: {e}"
)
def to_json(self, current_attempt=1):
@@ -197,11 +219,15 @@ def get_conversion_instructions(model: Type[BaseModel], llm: Any) -> str:
if llm.supports_function_calling():
model_schema = PydanticSchemaParser(model=model).get_schema()
instructions += (
f"\n\nThe JSON should follow this schema:\n```json\n{model_schema}\n```"
f"\n\nOutput ONLY the valid JSON and nothing else.\n\n"
f"The JSON must follow this schema exactly:\n```json\n{model_schema}\n```"
)
else:
model_description = generate_model_description(model)
instructions += f"\n\nThe JSON should follow this format:\n{model_description}"
instructions += (
f"\n\nOutput ONLY the valid JSON and nothing else.\n\n"
f"The JSON must follow this format exactly:\n{model_description}"
)
return instructions
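End to end, the hardened `to_pydantic` and stricter instructions are used like this (same shape as the llama3.2 converter test further down; the import path is assumed to be `crewai.utilities.converter`):

```python
from pydantic import BaseModel

from crewai.llm import LLM
from crewai.utilities.converter import Converter, get_conversion_instructions


class SimpleModel(BaseModel):
    name: str
    age: int


llm = LLM(model="ollama/llama3.2:3b", base_url="http://localhost:11434")
instructions = get_conversion_instructions(SimpleModel, llm)
converter = Converter(
    llm=llm,
    text="Name: Alice Llama, Age: 30",
    model=SimpleModel,
    instructions=instructions,
)
output = converter.to_pydantic()  # a BaseModel instance, or ConverterError
```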

View File

@@ -34,6 +34,7 @@ from .tool_usage_events import (
ToolUsageEvent,
ToolValidateInputErrorEvent,
)
from .llm_events import LLMCallCompletedEvent, LLMCallFailedEvent, LLMCallStartedEvent
# events
from .event_listener import EventListener

View File

@@ -1,9 +1,17 @@
from pydantic import PrivateAttr
from typing import Any, Dict
from pydantic import Field, PrivateAttr
from crewai.task import Task
from crewai.telemetry.telemetry import Telemetry
from crewai.utilities import Logger
from crewai.utilities.constants import EMITTER_COLOR
from crewai.utilities.events.base_event_listener import BaseEventListener
from crewai.utilities.events.llm_events import (
LLMCallCompletedEvent,
LLMCallFailedEvent,
LLMCallStartedEvent,
)
from .agent_events import AgentExecutionCompletedEvent, AgentExecutionStartedEvent
from .crew_events import (
@@ -37,6 +45,7 @@ class EventListener(BaseEventListener):
_instance = None
_telemetry: Telemetry = PrivateAttr(default_factory=lambda: Telemetry())
logger = Logger(verbose=True, default_color=EMITTER_COLOR)
execution_spans: Dict[Task, Any] = Field(default_factory=dict)
def __new__(cls):
if cls._instance is None:
@@ -49,6 +58,7 @@ class EventListener(BaseEventListener):
super().__init__()
self._telemetry = Telemetry()
self._telemetry.set_tracer()
self.execution_spans = {}
self._initialized = True
# ----------- CREW EVENTS -----------
@@ -57,7 +67,7 @@ class EventListener(BaseEventListener):
@crewai_event_bus.on(CrewKickoffStartedEvent)
def on_crew_started(source, event: CrewKickoffStartedEvent):
self.logger.log(
f"🚀 Crew '{event.crew_name}' started",
f"🚀 Crew '{event.crew_name}' started, {source.id}",
event.timestamp,
)
self._telemetry.crew_execution_span(source, event.inputs)
@@ -67,28 +77,28 @@ class EventListener(BaseEventListener):
final_string_output = event.output.raw
self._telemetry.end_crew(source, final_string_output)
self.logger.log(
f"✅ Crew '{event.crew_name}' completed",
f"✅ Crew '{event.crew_name}' completed, {source.id}",
event.timestamp,
)
@crewai_event_bus.on(CrewKickoffFailedEvent)
def on_crew_failed(source, event: CrewKickoffFailedEvent):
self.logger.log(
f"❌ Crew '{event.crew_name}' failed",
f"❌ Crew '{event.crew_name}' failed, {source.id}",
event.timestamp,
)
@crewai_event_bus.on(CrewTestStartedEvent)
def on_crew_test_started(source, event: CrewTestStartedEvent):
cloned_crew = source.copy()
cloned_crew._telemetry.test_execution_span(
self._telemetry.test_execution_span(
cloned_crew,
event.n_iterations,
event.inputs,
event.eval_llm,
event.eval_llm or "",
)
self.logger.log(
f"🚀 Crew '{event.crew_name}' started test",
f"🚀 Crew '{event.crew_name}' started test, {source.id}",
event.timestamp,
)
@@ -131,9 +141,9 @@ class EventListener(BaseEventListener):
@crewai_event_bus.on(TaskStartedEvent)
def on_task_started(source, event: TaskStartedEvent):
source._execution_span = self._telemetry.task_started(
crew=source.agent.crew, task=source
)
span = self._telemetry.task_started(crew=source.agent.crew, task=source)
self.execution_spans[source] = span
self.logger.log(
f"📋 Task started: {source.description}",
event.timestamp,
@@ -141,24 +151,22 @@ class EventListener(BaseEventListener):
@crewai_event_bus.on(TaskCompletedEvent)
def on_task_completed(source, event: TaskCompletedEvent):
if source._execution_span:
self._telemetry.task_ended(
source._execution_span, source, source.agent.crew
)
span = self.execution_spans.get(source)
if span:
self._telemetry.task_ended(span, source, source.agent.crew)
self.logger.log(
f"✅ Task completed: {source.description}",
event.timestamp,
)
source._execution_span = None
self.execution_spans[source] = None
@crewai_event_bus.on(TaskFailedEvent)
def on_task_failed(source, event: TaskFailedEvent):
if source._execution_span:
span = self.execution_spans.get(source)
if span:
if source.agent and source.agent.crew:
self._telemetry.task_ended(
source._execution_span, source, source.agent.crew
)
source._execution_span = None
self._telemetry.task_ended(span, source, source.agent.crew)
self.execution_spans[source] = None
self.logger.log(
f"❌ Task failed: {source.description}",
event.timestamp,
@@ -184,7 +192,7 @@ class EventListener(BaseEventListener):
@crewai_event_bus.on(FlowCreatedEvent)
def on_flow_created(source, event: FlowCreatedEvent):
self._telemetry.flow_creation_span(self.__class__.__name__)
self._telemetry.flow_creation_span(event.flow_name)
self.logger.log(
f"🌊 Flow Created: '{event.flow_name}'",
event.timestamp,
@@ -193,17 +201,17 @@ class EventListener(BaseEventListener):
@crewai_event_bus.on(FlowStartedEvent)
def on_flow_started(source, event: FlowStartedEvent):
self._telemetry.flow_execution_span(
source.__class__.__name__, list(source._methods.keys())
event.flow_name, list(source._methods.keys())
)
self.logger.log(
f"🤖 Flow Started: '{event.flow_name}'",
f"🤖 Flow Started: '{event.flow_name}', {source.flow_id}",
event.timestamp,
)
@crewai_event_bus.on(FlowFinishedEvent)
def on_flow_finished(source, event: FlowFinishedEvent):
self.logger.log(
f"👍 Flow Finished: '{event.flow_name}'",
f"👍 Flow Finished: '{event.flow_name}', {source.flow_id}",
event.timestamp,
)
@@ -253,5 +261,28 @@ class EventListener(BaseEventListener):
#
)
# ----------- LLM EVENTS -----------
@crewai_event_bus.on(LLMCallStartedEvent)
def on_llm_call_started(source, event: LLMCallStartedEvent):
self.logger.log(
f"🤖 LLM Call Started",
event.timestamp,
)
@crewai_event_bus.on(LLMCallCompletedEvent)
def on_llm_call_completed(source, event: LLMCallCompletedEvent):
self.logger.log(
f"✅ LLM Call Completed",
event.timestamp,
)
@crewai_event_bus.on(LLMCallFailedEvent)
def on_llm_call_failed(source, event: LLMCallFailedEvent):
self.logger.log(
f"❌ LLM Call Failed: '{event.error}'",
event.timestamp,
)
event_listener = EventListener()
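The same plumbing is available to user code: subclass `BaseEventListener` and register handlers for the new LLM events. A sketch (assumes `setup_listeners` is the abstract hook `BaseEventListener` invokes, consistent with the listener above):

```python
from crewai.utilities.events.base_event_listener import BaseEventListener
from crewai.utilities.events.llm_events import (
    LLMCallCompletedEvent,
    LLMCallStartedEvent,
)


class LLMCallLogger(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMCallStartedEvent)
        def on_started(source, event):
            print("LLM call started at", event.timestamp)

        @crewai_event_bus.on(LLMCallCompletedEvent)
        def on_completed(source, event):
            print("LLM call finished:", event.call_type)


llm_logger = LLMCallLogger()  # instantiation wires the handlers to the bus
```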

View File

@@ -0,0 +1,36 @@
from enum import Enum
from typing import Any, Dict, List, Optional, Union
from crewai.utilities.events.base_events import CrewEvent
class LLMCallType(Enum):
"""Type of LLM call being made"""
TOOL_CALL = "tool_call"
LLM_CALL = "llm_call"
class LLMCallStartedEvent(CrewEvent):
"""Event emitted when a LLM call starts"""
type: str = "llm_call_started"
messages: Union[str, List[Dict[str, str]]]
tools: Optional[List[dict]] = None
callbacks: Optional[List[Any]] = None
available_functions: Optional[Dict[str, Any]] = None
class LLMCallCompletedEvent(CrewEvent):
"""Event emitted when a LLM call completes"""
type: str = "llm_call_completed"
response: Any
call_type: LLMCallType
class LLMCallFailedEvent(CrewEvent):
"""Event emitted when a LLM call fails"""
error: str
type: str = "llm_call_failed"
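Handlers for these events hang directly off the event bus; the tests at the bottom of this diff use exactly this pattern (`scoped_handlers` keeps registrations test-local):

```python
from crewai.llm import LLM
from crewai.utilities.events.crewai_event_bus import crewai_event_bus
from crewai.utilities.events.llm_events import LLMCallCompletedEvent

received = []

with crewai_event_bus.scoped_handlers():

    @crewai_event_bus.on(LLMCallCompletedEvent)
    def on_llm_done(source, event):
        received.append(event.call_type)  # LLMCallType.LLM_CALL or .TOOL_CALL

    LLM(model="gpt-4o-mini").call("Hello, how are you?")  # needs OPENAI_API_KEY

print(received)
```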

View File

@@ -1,4 +1,4 @@
from typing import Any, Optional
from typing import Optional
from crewai.tasks.task_output import TaskOutput
from crewai.utilities.events.base_events import CrewEvent

View File

@@ -833,6 +833,12 @@ def test_crew_verbose_output(capsys):
crew.kickoff()
captured = capsys.readouterr()
# Filter out event listener logs (lines starting with '[')
filtered_output = "\n".join(
line for line in captured.out.split("\n") if not line.startswith("[")
)
expected_strings = [
"\x1b[1m\x1b[95m# Agent:\x1b[00m \x1b[1m\x1b[92mResearcher",
"\x1b[00m\n\x1b[95m## Task:\x1b[00m \x1b[92mResearch AI advancements.",
@@ -845,27 +851,19 @@ def test_crew_verbose_output(capsys):
]
for expected_string in expected_strings:
assert expected_string in captured.out
assert expected_string in filtered_output
# Now test with verbose set to False
crew.verbose = False
crew._logger = Logger(verbose=False)
crew.kickoff()
expected_listener_logs = [
"[🚀 CREW 'CREW' STARTED]",
"[📋 TASK STARTED: RESEARCH AI ADVANCEMENTS.]",
"[🤖 AGENT 'RESEARCHER' STARTED TASK]",
"[✅ AGENT 'RESEARCHER' COMPLETED TASK]",
"[✅ TASK COMPLETED: RESEARCH AI ADVANCEMENTS.]",
"[📋 TASK STARTED: WRITE ABOUT AI IN HEALTHCARE.]",
"[🤖 AGENT 'SENIOR WRITER' STARTED TASK]",
"[✅ AGENT 'SENIOR WRITER' COMPLETED TASK]",
"[✅ TASK COMPLETED: WRITE ABOUT AI IN HEALTHCARE.]",
"[✅ CREW 'CREW' COMPLETED]",
]
captured = capsys.readouterr()
for log in expected_listener_logs:
assert log in captured.out
filtered_output = "\n".join(
line
for line in captured.out.split("\n")
if not line.startswith("[") and line.strip() and not line.startswith("\x1b")
)
assert filtered_output == ""
@pytest.mark.vcr(filter_headers=["authorization"])

View File

@@ -1,14 +1,9 @@
interactions:
- request:
body: '{"model": "llama3.2:3b", "prompt": "### User:\nName: Alice Llama, Age:
30\n\n### System:\nProduce JSON OUTPUT ONLY! Adhere to this format {\"name\":
\"function_name\", \"arguments\":{\"argument_name\": \"argument_value\"}} The
following functions are available to you:\n{''type'': ''function'', ''function'':
{''name'': ''SimpleModel'', ''description'': ''Correctly extracted `SimpleModel`
with all the required parameters with correct types'', ''parameters'': {''properties'':
{''name'': {''title'': ''Name'', ''type'': ''string''}, ''age'': {''title'':
''Age'', ''type'': ''integer''}}, ''required'': [''age'', ''name''], ''type'':
''object''}}}\n\n\n", "options": {}, "stream": false, "format": "json"}'
body: '{"model": "llama3.2:3b", "prompt": "### System:\nPlease convert the following
text into valid JSON.\n\nOutput ONLY the valid JSON and nothing else.\n\nThe
JSON must follow this format exactly:\n{\n \"name\": str,\n \"age\": int\n}\n\n###
User:\nName: Alice Llama, Age: 30\n\n", "options": {"stop": []}, "stream": false}'
headers:
accept:
- '*/*'
@@ -17,23 +12,23 @@ interactions:
connection:
- keep-alive
content-length:
- '657'
- '321'
host:
- localhost:11434
user-agent:
- litellm/1.57.4
- litellm/1.60.2
method: POST
uri: http://localhost:11434/api/generate
response:
content: '{"model":"llama3.2:3b","created_at":"2025-01-15T20:47:11.926411Z","response":"{\"name\":
\"SimpleModel\", \"arguments\":{\"name\": \"Alice Llama\", \"age\": 30}}","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,14711,2724,512,678,25,30505,445,81101,11,13381,25,220,966,271,14711,744,512,1360,13677,4823,32090,27785,0,2467,6881,311,420,3645,5324,609,794,330,1723,1292,498,330,16774,23118,14819,1292,794,330,14819,3220,32075,578,2768,5865,527,2561,311,499,512,13922,1337,1232,364,1723,518,364,1723,1232,5473,609,1232,364,16778,1747,518,364,4789,1232,364,34192,398,28532,1595,16778,1747,63,449,682,279,2631,5137,449,4495,4595,518,364,14105,1232,5473,13495,1232,5473,609,1232,5473,2150,1232,364,678,518,364,1337,1232,364,928,25762,364,425,1232,5473,2150,1232,364,17166,518,364,1337,1232,364,11924,8439,2186,364,6413,1232,2570,425,518,364,609,4181,364,1337,1232,364,1735,23742,3818,128009,128006,78191,128007,271,5018,609,794,330,16778,1747,498,330,16774,23118,609,794,330,62786,445,81101,498,330,425,794,220,966,3500],"total_duration":3374470708,"load_duration":1075750500,"prompt_eval_count":167,"prompt_eval_duration":1871000000,"eval_count":24,"eval_duration":426000000}'
content: '{"model":"llama3.2:3b","created_at":"2025-02-21T02:57:55.059392Z","response":"{\"name\":
\"Alice Llama\", \"age\": 30}","done":true,"done_reason":"stop","context":[128006,9125,128007,271,38766,1303,33025,2696,25,6790,220,2366,18,271,128009,128006,882,128007,271,14711,744,512,5618,5625,279,2768,1495,1139,2764,4823,382,5207,27785,279,2764,4823,323,4400,775,382,791,4823,2011,1833,420,3645,7041,512,517,220,330,609,794,610,345,220,330,425,794,528,198,633,14711,2724,512,678,25,30505,445,81101,11,13381,25,220,966,271,128009,128006,78191,128007,271,5018,609,794,330,62786,445,81101,498,330,425,794,220,966,92],"total_duration":4675906000,"load_duration":836091458,"prompt_eval_count":82,"prompt_eval_duration":3561000000,"eval_count":15,"eval_duration":275000000}'
headers:
Content-Length:
- '1263'
- '761'
Content-Type:
- application/json; charset=utf-8
Date:
- Wed, 15 Jan 2025 20:47:12 GMT
- Fri, 21 Feb 2025 02:57:55 GMT
http_version: HTTP/1.1
status_code: 200
- request:
@@ -52,7 +47,7 @@ interactions:
host:
- localhost:11434
user-agent:
- litellm/1.57.4
- litellm/1.60.2
method: POST
uri: http://localhost:11434/api/show
response:
@@ -228,7 +223,7 @@ interactions:
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama
3.2: LlamaUseReport@meta.com\",\"modelfile\":\"# Modelfile generated by \\\"ollama
show\\\"\\n# To build a new Modelfile based on this, replace FROM with:\\n#
FROM llama3.2:3b\\n\\nFROM /Users/brandonhancock/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff\\nTEMPLATE
FROM llama3.2:3b\\n\\nFROM /Users/joaomoura/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff\\nTEMPLATE
\\\"\\\"\\\"\\u003c|start_header_id|\\u003esystem\\u003c|end_header_id|\\u003e\\n\\nCutting
Knowledge Date: December 2023\\n\\n{{ if .System }}{{ .System }}\\n{{- end }}\\n{{-
if .Tools }}When you receive a tool call response, use the output to format
@@ -441,12 +436,12 @@ interactions:
.Content }}\\n{{- end }}{{ if not $last }}\\u003c|eot_id|\\u003e{{ end }}\\n{{-
else if eq .Role \\\"tool\\\" }}\\u003c|start_header_id|\\u003eipython\\u003c|end_header_id|\\u003e\\n\\n{{
.Content }}\\u003c|eot_id|\\u003e{{ if $last }}\\u003c|start_header_id|\\u003eassistant\\u003c|end_header_id|\\u003e\\n\\n{{
end }}\\n{{- end }}\\n{{- end }}\",\"details\":{\"parent_model\":\"\",\"format\":\"gguf\",\"family\":\"llama\",\"families\":[\"llama\"],\"parameter_size\":\"3.2B\",\"quantization_level\":\"Q4_K_M\"},\"model_info\":{\"general.architecture\":\"llama\",\"general.basename\":\"Llama-3.2\",\"general.file_type\":15,\"general.finetune\":\"Instruct\",\"general.languages\":[\"en\",\"de\",\"fr\",\"it\",\"pt\",\"hi\",\"es\",\"th\"],\"general.parameter_count\":3212749888,\"general.quantization_version\":2,\"general.size_label\":\"3B\",\"general.tags\":[\"facebook\",\"meta\",\"pytorch\",\"llama\",\"llama-3\",\"text-generation\"],\"general.type\":\"model\",\"llama.attention.head_count\":24,\"llama.attention.head_count_kv\":8,\"llama.attention.key_length\":128,\"llama.attention.layer_norm_rms_epsilon\":0.00001,\"llama.attention.value_length\":128,\"llama.block_count\":28,\"llama.context_length\":131072,\"llama.embedding_length\":3072,\"llama.feed_forward_length\":8192,\"llama.rope.dimension_count\":128,\"llama.rope.freq_base\":500000,\"llama.vocab_size\":128256,\"tokenizer.ggml.bos_token_id\":128000,\"tokenizer.ggml.eos_token_id\":128009,\"tokenizer.ggml.merges\":null,\"tokenizer.ggml.model\":\"gpt2\",\"tokenizer.ggml.pre\":\"llama-bpe\",\"tokenizer.ggml.token_type\":null,\"tokenizer.ggml.tokens\":null},\"modified_at\":\"2024-12-31T11:53:14.529771974-05:00\"}"
end }}\\n{{- end }}\\n{{- end }}\",\"details\":{\"parent_model\":\"\",\"format\":\"gguf\",\"family\":\"llama\",\"families\":[\"llama\"],\"parameter_size\":\"3.2B\",\"quantization_level\":\"Q4_K_M\"},\"model_info\":{\"general.architecture\":\"llama\",\"general.basename\":\"Llama-3.2\",\"general.file_type\":15,\"general.finetune\":\"Instruct\",\"general.languages\":[\"en\",\"de\",\"fr\",\"it\",\"pt\",\"hi\",\"es\",\"th\"],\"general.parameter_count\":3212749888,\"general.quantization_version\":2,\"general.size_label\":\"3B\",\"general.tags\":[\"facebook\",\"meta\",\"pytorch\",\"llama\",\"llama-3\",\"text-generation\"],\"general.type\":\"model\",\"llama.attention.head_count\":24,\"llama.attention.head_count_kv\":8,\"llama.attention.key_length\":128,\"llama.attention.layer_norm_rms_epsilon\":0.00001,\"llama.attention.value_length\":128,\"llama.block_count\":28,\"llama.context_length\":131072,\"llama.embedding_length\":3072,\"llama.feed_forward_length\":8192,\"llama.rope.dimension_count\":128,\"llama.rope.freq_base\":500000,\"llama.vocab_size\":128256,\"tokenizer.ggml.bos_token_id\":128000,\"tokenizer.ggml.eos_token_id\":128009,\"tokenizer.ggml.merges\":null,\"tokenizer.ggml.model\":\"gpt2\",\"tokenizer.ggml.pre\":\"llama-bpe\",\"tokenizer.ggml.token_type\":null,\"tokenizer.ggml.tokens\":null},\"modified_at\":\"2025-02-20T18:55:09.150577031-08:00\"}"
headers:
Content-Type:
- application/json; charset=utf-8
Date:
- Wed, 15 Jan 2025 20:47:12 GMT
- Fri, 21 Feb 2025 02:57:55 GMT
Transfer-Encoding:
- chunked
http_version: HTTP/1.1
@@ -467,7 +462,7 @@ interactions:
host:
- localhost:11434
user-agent:
- litellm/1.57.4
- litellm/1.60.2
method: POST
uri: http://localhost:11434/api/show
response:
@@ -643,7 +638,7 @@ interactions:
Reporting violations of the Acceptable Use Policy or unlicensed uses of Llama
3.2: LlamaUseReport@meta.com\",\"modelfile\":\"# Modelfile generated by \\\"ollama
show\\\"\\n# To build a new Modelfile based on this, replace FROM with:\\n#
FROM llama3.2:3b\\n\\nFROM /Users/brandonhancock/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff\\nTEMPLATE
FROM llama3.2:3b\\n\\nFROM /Users/joaomoura/.ollama/models/blobs/sha256-dde5aa3fc5ffc17176b5e8bdc82f587b24b2678c6c66101bf7da77af9f7ccdff\\nTEMPLATE
\\\"\\\"\\\"\\u003c|start_header_id|\\u003esystem\\u003c|end_header_id|\\u003e\\n\\nCutting
Knowledge Date: December 2023\\n\\n{{ if .System }}{{ .System }}\\n{{- end }}\\n{{-
if .Tools }}When you receive a tool call response, use the output to format
@@ -856,12 +851,12 @@ interactions:
.Content }}\\n{{- end }}{{ if not $last }}\\u003c|eot_id|\\u003e{{ end }}\\n{{-
else if eq .Role \\\"tool\\\" }}\\u003c|start_header_id|\\u003eipython\\u003c|end_header_id|\\u003e\\n\\n{{
.Content }}\\u003c|eot_id|\\u003e{{ if $last }}\\u003c|start_header_id|\\u003eassistant\\u003c|end_header_id|\\u003e\\n\\n{{
end }}\\n{{- end }}\\n{{- end }}\",\"details\":{\"parent_model\":\"\",\"format\":\"gguf\",\"family\":\"llama\",\"families\":[\"llama\"],\"parameter_size\":\"3.2B\",\"quantization_level\":\"Q4_K_M\"},\"model_info\":{\"general.architecture\":\"llama\",\"general.basename\":\"Llama-3.2\",\"general.file_type\":15,\"general.finetune\":\"Instruct\",\"general.languages\":[\"en\",\"de\",\"fr\",\"it\",\"pt\",\"hi\",\"es\",\"th\"],\"general.parameter_count\":3212749888,\"general.quantization_version\":2,\"general.size_label\":\"3B\",\"general.tags\":[\"facebook\",\"meta\",\"pytorch\",\"llama\",\"llama-3\",\"text-generation\"],\"general.type\":\"model\",\"llama.attention.head_count\":24,\"llama.attention.head_count_kv\":8,\"llama.attention.key_length\":128,\"llama.attention.layer_norm_rms_epsilon\":0.00001,\"llama.attention.value_length\":128,\"llama.block_count\":28,\"llama.context_length\":131072,\"llama.embedding_length\":3072,\"llama.feed_forward_length\":8192,\"llama.rope.dimension_count\":128,\"llama.rope.freq_base\":500000,\"llama.vocab_size\":128256,\"tokenizer.ggml.bos_token_id\":128000,\"tokenizer.ggml.eos_token_id\":128009,\"tokenizer.ggml.merges\":null,\"tokenizer.ggml.model\":\"gpt2\",\"tokenizer.ggml.pre\":\"llama-bpe\",\"tokenizer.ggml.token_type\":null,\"tokenizer.ggml.tokens\":null},\"modified_at\":\"2024-12-31T11:53:14.529771974-05:00\"}"
end }}\\n{{- end }}\\n{{- end }}\",\"details\":{\"parent_model\":\"\",\"format\":\"gguf\",\"family\":\"llama\",\"families\":[\"llama\"],\"parameter_size\":\"3.2B\",\"quantization_level\":\"Q4_K_M\"},\"model_info\":{\"general.architecture\":\"llama\",\"general.basename\":\"Llama-3.2\",\"general.file_type\":15,\"general.finetune\":\"Instruct\",\"general.languages\":[\"en\",\"de\",\"fr\",\"it\",\"pt\",\"hi\",\"es\",\"th\"],\"general.parameter_count\":3212749888,\"general.quantization_version\":2,\"general.size_label\":\"3B\",\"general.tags\":[\"facebook\",\"meta\",\"pytorch\",\"llama\",\"llama-3\",\"text-generation\"],\"general.type\":\"model\",\"llama.attention.head_count\":24,\"llama.attention.head_count_kv\":8,\"llama.attention.key_length\":128,\"llama.attention.layer_norm_rms_epsilon\":0.00001,\"llama.attention.value_length\":128,\"llama.block_count\":28,\"llama.context_length\":131072,\"llama.embedding_length\":3072,\"llama.feed_forward_length\":8192,\"llama.rope.dimension_count\":128,\"llama.rope.freq_base\":500000,\"llama.vocab_size\":128256,\"tokenizer.ggml.bos_token_id\":128000,\"tokenizer.ggml.eos_token_id\":128009,\"tokenizer.ggml.merges\":null,\"tokenizer.ggml.model\":\"gpt2\",\"tokenizer.ggml.pre\":\"llama-bpe\",\"tokenizer.ggml.token_type\":null,\"tokenizer.ggml.tokens\":null},\"modified_at\":\"2025-02-20T18:55:09.150577031-08:00\"}"
headers:
Content-Type:
- application/json; charset=utf-8
Date:
- Wed, 15 Jan 2025 20:47:12 GMT
- Fri, 21 Feb 2025 02:57:55 GMT
Transfer-Encoding:
- chunked
http_version: HTTP/1.1

File diff suppressed because one or more lines are too long

View File

@@ -0,0 +1,236 @@
interactions:
- request:
body: '{"messages": [{"role": "system", "content": "You are base_agent. You are
a helpful assistant that just says hi\nYour personal goal is: Just say hi\nTo
give my best complete final answer to the task respond using the exact following
format:\n\nThought: I now can give a great answer\nFinal Answer: Your final
answer must be the great and the most complete as possible, it must be outcome
described.\n\nI MUST use these formats, my job depends on it!"}, {"role": "user",
"content": "\nCurrent Task: Just say hi\n\nThis is the expected criteria for
your final answer: hi\nyou MUST return the actual complete content as the final
answer, not a summary.\n\nBegin! This is VERY important to you, use the tools
available and give your best Final Answer, your job depends on it!\n\nThought:"}],
"model": "gpt-4o-mini", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '838'
content-type:
- application/json
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- arm64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- MacOS
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.8
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-B4VsaBZ4ec4b0ab4pkqWgyxTFVVfc\",\n \"object\":
\"chat.completion\",\n \"created\": 1740415556,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
Answer: hi\",\n \"refusal\": null\n },\n \"logprobs\": null,\n
\ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
161,\n \"completion_tokens\": 12,\n \"total_tokens\": 173,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_7fcd609668\"\n}\n"
headers:
CF-RAY:
- 9170edc5da6f230e-SJC
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Mon, 24 Feb 2025 16:45:57 GMT
Server:
- cloudflare
Set-Cookie:
- __cf_bm=lvRw4Nyef7N35to64fj2_kHDfbZp0KSFbwgF5chYMRI-1740415557-1.0.1.1-o5BaN1FpBwv5Wq6zIlv0rCB28lk5hVI9wZQWU3pig1jgyAKDkYzTwZ0MlSR6v6TPIX9RfepjrO3.Gk3FEmcVRw;
path=/; expires=Mon, 24-Feb-25 17:15:57 GMT; domain=.api.openai.com; HttpOnly;
Secure; SameSite=None
- _cfuvid=ySaVoTQvAcQyH5QoJQJDj75e5j8HwGFPOlFMAWEvXJk-1740415557302-0.0.1.1-604800000;
path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '721'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999808'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_fc3b3bcd4382cddaa3c04ce7003e4857
http_version: HTTP/1.1
status_code: 200
- request:
body: '{"messages": [{"role": "system", "content": "You are Task Execution Evaluator.
Evaluator agent for crew evaluation with precise capabilities to evaluate the
performance of the agents in the crew based on the tasks they have performed\nYour
personal goal is: Your goal is to evaluate the performance of the agents in
the crew based on the tasks they have performed using score from 1 to 10 evaluating
on completion, quality, and overall performance.\nTo give my best complete final
answer to the task respond using the exact following format:\n\nThought: I now
can give a great answer\nFinal Answer: Your final answer must be the great and
the most complete as possible, it must be outcome described.\n\nI MUST use these
formats, my job depends on it!"}, {"role": "user", "content": "\nCurrent Task:
Based on the task description and the expected output, compare and evaluate
the performance of the agents in the crew based on the Task Output they have
performed using score from 1 to 10 evaluating on completion, quality, and overall
performance.task_description: Just say hi task_expected_output: hi agent: base_agent
agent_goal: Just say hi Task Output: hi\n\nThis is the expected criteria for
your final answer: Evaluation Score from 1 to 10 based on the performance of
the agents on the tasks\nyou MUST return the actual complete content as the
final answer, not a summary.\nEnsure your final answer contains only the content
in the following format: {\n \"quality\": float\n}\n\nEnsure the final output
does not include any code block markers like ```json or ```python.\n\nBegin!
This is VERY important to you, use the tools available and give your best Final
Answer, your job depends on it!\n\nThought:"}], "model": "gpt-4o-mini", "stop":
["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '1765'
content-type:
- application/json
cookie:
- __cf_bm=lvRw4Nyef7N35to64fj2_kHDfbZp0KSFbwgF5chYMRI-1740415557-1.0.1.1-o5BaN1FpBwv5Wq6zIlv0rCB28lk5hVI9wZQWU3pig1jgyAKDkYzTwZ0MlSR6v6TPIX9RfepjrO3.Gk3FEmcVRw;
_cfuvid=ySaVoTQvAcQyH5QoJQJDj75e5j8HwGFPOlFMAWEvXJk-1740415557302-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- arm64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- MacOS
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.8
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-B4Vsbd9AsRaJ2exDtWnHAwC8rIjfi\",\n \"object\":
\"chat.completion\",\n \"created\": 1740415557,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"I now can give a great answer \\nFinal
Answer: { \\n \\\"quality\\\": 10 \\n} \",\n \"refusal\": null\n
\ },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n
\ ],\n \"usage\": {\n \"prompt_tokens\": 338,\n \"completion_tokens\":
22,\n \"total_tokens\": 360,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n
\ \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_7fcd609668\"\n}\n"
headers:
CF-RAY:
- 9170edd15bb5230e-SJC
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Mon, 24 Feb 2025 16:45:58 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '860'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999578'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_fad452c2d10b5fc95809130912b08837
http_version: HTTP/1.1
status_code: 200
version: 1

View File

@@ -0,0 +1,103 @@
interactions:
- request:
body: '{"messages": [{"role": "user", "content": "Hello, how are you?"}], "model":
"gpt-4o-mini", "stop": []}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '102'
content-type:
- application/json
cookie:
- _cfuvid=IY8ppO70AMHr2skDSUsGh71zqHHdCQCZ3OvkPi26NBc-1740424913267-0.0.1.1-604800000;
__cf_bm=fU6K5KZoDmgcEuF8_yWAYKUO5fKHh6q5.wDPnna393g-1740424913-1.0.1.1-2iOaq3JVGWs439V0HxJee0IC9HdJm7dPkeJorD.AGw0YwkngRPM8rrTzn_7ht1BkbOauEezj.wPKcBz18gIYUg
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- arm64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- MacOS
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.8
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-B4YLA2SrC2rwdVQ3U87G5a0P5lsLw\",\n \"object\":
\"chat.completion\",\n \"created\": 1740425016,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Hello! I'm just a computer program, so
I don't have feelings, but I'm here and ready to help you. How can I assist
you today?\",\n \"refusal\": null\n },\n \"logprobs\": null,\n
\ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
13,\n \"completion_tokens\": 30,\n \"total_tokens\": 43,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_709714d124\"\n}\n"
headers:
CF-RAY:
- 9171d4c0ed44236e-SJC
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Mon, 24 Feb 2025 19:23:38 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '1954'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999978'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_ea2703502b8827e4297cd2a7bae9d9c8
http_version: HTTP/1.1
status_code: 200
version: 1

View File

@@ -0,0 +1,108 @@
interactions:
- request:
body: '{"messages": [{"role": "user", "content": "Hello, how are you?"}], "model":
"gpt-4o-mini", "stop": []}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
connection:
- keep-alive
content-length:
- '102'
content-type:
- application/json
cookie:
- _cfuvid=GefCcEtb_Gem93E4a9Hvt3Xyof1YQZVJAXBb9I6pEUs-1739398417375-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- arm64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- MacOS
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.8
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-B4YJU8IWKGyBQtAyPDRd3SFI2flYR\",\n \"object\":
\"chat.completion\",\n \"created\": 1740424912,\n \"model\": \"gpt-4o-mini-2024-07-18\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Hello! I'm just a computer program, so
I don't have feelings, but I'm here and ready to help you. How can I assist
you today?\",\n \"refusal\": null\n },\n \"logprobs\": null,\n
\ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
13,\n \"completion_tokens\": 30,\n \"total_tokens\": 43,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": \"fp_7fcd609668\"\n}\n"
headers:
CF-RAY:
- 9171d230d8ed7ae0-SJC
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Mon, 24 Feb 2025 19:21:53 GMT
Server:
- cloudflare
Set-Cookie:
- __cf_bm=fU6K5KZoDmgcEuF8_yWAYKUO5fKHh6q5.wDPnna393g-1740424913-1.0.1.1-2iOaq3JVGWs439V0HxJee0IC9HdJm7dPkeJorD.AGw0YwkngRPM8rrTzn_7ht1BkbOauEezj.wPKcBz18gIYUg;
path=/; expires=Mon, 24-Feb-25 19:51:53 GMT; domain=.api.openai.com; HttpOnly;
Secure; SameSite=None
- _cfuvid=IY8ppO70AMHr2skDSUsGh71zqHHdCQCZ3OvkPi26NBc-1740424913267-0.0.1.1-604800000;
path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '993'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '30000'
x-ratelimit-limit-tokens:
- '150000000'
x-ratelimit-remaining-requests:
- '29999'
x-ratelimit-remaining-tokens:
- '149999978'
x-ratelimit-reset-requests:
- 2ms
x-ratelimit-reset-tokens:
- 0s
x-request-id:
- req_d9c4d49185e97b1797061efc1e55d811
http_version: HTTP/1.1
status_code: 200
version: 1

View File

@@ -1,4 +1,5 @@
import json
import os
from typing import Dict, List, Optional
from unittest.mock import MagicMock, Mock, patch
@@ -220,10 +221,13 @@ def test_get_conversion_instructions_gpt():
supports_function_calling.return_value = True
instructions = get_conversion_instructions(SimpleModel, llm)
model_schema = PydanticSchemaParser(model=SimpleModel).get_schema()
assert (
instructions
== f"Please convert the following text into valid JSON.\n\nThe JSON should follow this schema:\n```json\n{model_schema}\n```"
expected_instructions = (
"Please convert the following text into valid JSON.\n\n"
"Output ONLY the valid JSON and nothing else.\n\n"
"The JSON must follow this schema exactly:\n```json\n"
f"{model_schema}\n```"
)
assert instructions == expected_instructions
def test_get_conversion_instructions_non_gpt():
@@ -346,12 +350,17 @@ def test_convert_with_instructions():
assert output.age == 30
@pytest.mark.vcr(filter_headers=["authorization"])
# Skip tests that call external APIs when running in CI/CD
skip_external_api = pytest.mark.skipif(
os.getenv("CI") is not None, reason="Skipping tests that call external API in CI/CD"
)
@skip_external_api
@pytest.mark.vcr(filter_headers=["authorization"], record_mode="once")
def test_converter_with_llama3_2_model():
llm = LLM(model="ollama/llama3.2:3b", base_url="http://localhost:11434")
sample_text = "Name: Alice Llama, Age: 30"
instructions = get_conversion_instructions(SimpleModel, llm)
converter = Converter(
llm=llm,
@@ -359,19 +368,17 @@ def test_converter_with_llama3_2_model():
model=SimpleModel,
instructions=instructions,
)
output = converter.to_pydantic()
assert isinstance(output, SimpleModel)
assert output.name == "Alice Llama"
assert output.age == 30
@pytest.mark.vcr(filter_headers=["authorization"])
@skip_external_api
@pytest.mark.vcr(filter_headers=["authorization"], record_mode="once")
def test_converter_with_llama3_1_model():
llm = LLM(model="ollama/llama3.1", base_url="http://localhost:11434")
sample_text = "Name: Alice Llama, Age: 30"
instructions = get_conversion_instructions(SimpleModel, llm)
converter = Converter(
llm=llm,
@@ -379,14 +386,19 @@ def test_converter_with_llama3_1_model():
model=SimpleModel,
instructions=instructions,
)
output = converter.to_pydantic()
assert isinstance(output, SimpleModel)
assert output.name == "Alice Llama"
assert output.age == 30
# Skip tests that call external APIs when running in CI/CD
skip_external_api = pytest.mark.skipif(
os.getenv("CI") is not None, reason="Skipping tests that call external API in CI/CD"
)
@skip_external_api
@pytest.mark.vcr(filter_headers=["authorization"])
def test_converter_with_nested_model():
llm = LLM(model="gpt-4o-mini")
@@ -563,7 +575,7 @@ def test_converter_with_ambiguous_input():
with pytest.raises(ConverterError) as exc_info:
output = converter.to_pydantic()
assert "validation error" in str(exc_info.value).lower()
assert "failed to convert text into a pydantic model" in str(exc_info.value).lower()
# Tests for function calling support

View File

@@ -1,6 +1,5 @@
import json
from datetime import datetime
from unittest.mock import MagicMock, patch
from unittest.mock import Mock, patch
import pytest
from pydantic import Field
@@ -9,9 +8,9 @@ from crewai.agent import Agent
from crewai.agents.crew_agent_executor import CrewAgentExecutor
from crewai.crew import Crew
from crewai.flow.flow import Flow, listen, start
from crewai.llm import LLM
from crewai.task import Task
from crewai.tools.base_tool import BaseTool
from crewai.tools.tool_usage import ToolUsage
from crewai.utilities.events.agent_events import (
AgentExecutionCompletedEvent,
AgentExecutionErrorEvent,
@@ -21,8 +20,11 @@ from crewai.utilities.events.crew_events import (
CrewKickoffCompletedEvent,
CrewKickoffFailedEvent,
CrewKickoffStartedEvent,
CrewTestCompletedEvent,
CrewTestStartedEvent,
)
from crewai.utilities.events.crewai_event_bus import crewai_event_bus
from crewai.utilities.events.event_listener import EventListener
from crewai.utilities.events.event_types import ToolUsageFinishedEvent
from crewai.utilities.events.flow_events import (
FlowCreatedEvent,
@@ -31,6 +33,12 @@ from crewai.utilities.events.flow_events import (
MethodExecutionFailedEvent,
MethodExecutionStartedEvent,
)
from crewai.utilities.events.llm_events import (
LLMCallCompletedEvent,
LLMCallFailedEvent,
LLMCallStartedEvent,
LLMCallType,
)
from crewai.utilities.events.task_events import (
TaskCompletedEvent,
TaskFailedEvent,
@@ -52,26 +60,35 @@ base_task = Task(
expected_output="hi",
agent=base_agent,
)
event_listener = EventListener()
@pytest.mark.vcr(filter_headers=["authorization"])
def test_crew_emits_start_kickoff_event():
received_events = []
mock_span = Mock()
with crewai_event_bus.scoped_handlers():
@crewai_event_bus.on(CrewKickoffStartedEvent)
def handle_crew_start(source, event):
received_events.append(event)
crew = Crew(agents=[base_agent], tasks=[base_task], name="TestCrew")
@crewai_event_bus.on(CrewKickoffStartedEvent)
def handle_crew_start(source, event):
received_events.append(event)
crew = Crew(agents=[base_agent], tasks=[base_task], name="TestCrew")
with (
patch.object(
event_listener._telemetry, "crew_execution_span", return_value=mock_span
) as mock_crew_execution_span,
patch.object(
event_listener._telemetry, "end_crew", return_value=mock_span
) as mock_crew_ended,
):
crew.kickoff()
mock_crew_execution_span.assert_called_once_with(crew, None)
mock_crew_ended.assert_called_once_with(crew, "hi")
assert len(received_events) == 1
assert received_events[0].crew_name == "TestCrew"
assert isinstance(received_events[0].timestamp, datetime)
assert received_events[0].type == "crew_kickoff_started"
assert len(received_events) == 1
assert received_events[0].crew_name == "TestCrew"
assert isinstance(received_events[0].timestamp, datetime)
assert received_events[0].type == "crew_kickoff_started"
@pytest.mark.vcr(filter_headers=["authorization"])
@@ -92,6 +109,45 @@ def test_crew_emits_end_kickoff_event():
assert received_events[0].type == "crew_kickoff_completed"
@pytest.mark.vcr(filter_headers=["authorization"])
def test_crew_emits_test_kickoff_type_event():
received_events = []
mock_span = Mock()
@crewai_event_bus.on(CrewTestStartedEvent)
def handle_crew_end(source, event):
received_events.append(event)
@crewai_event_bus.on(CrewTestCompletedEvent)
def handle_crew_test_end(source, event):
received_events.append(event)
eval_llm = LLM(model="gpt-4o-mini")
with (
patch.object(
event_listener._telemetry, "test_execution_span", return_value=mock_span
) as mock_crew_execution_span,
):
crew = Crew(agents=[base_agent], tasks=[base_task], name="TestCrew")
crew.test(n_iterations=1, eval_llm=eval_llm)
# Verify the call was made with correct argument types and values
assert mock_crew_execution_span.call_count == 1
args = mock_crew_execution_span.call_args[0]
assert isinstance(args[0], Crew)
assert args[1] == 1
assert args[2] is None
assert args[3] == eval_llm
assert len(received_events) == 2
assert received_events[0].crew_name == "TestCrew"
assert isinstance(received_events[0].timestamp, datetime)
assert received_events[0].type == "crew_test_started"
assert received_events[1].crew_name == "TestCrew"
assert isinstance(received_events[1].timestamp, datetime)
assert received_events[1].type == "crew_test_completed"
@pytest.mark.vcr(filter_headers=["authorization"])
def test_crew_emits_kickoff_failed_event():
received_events = []
@@ -142,9 +198,20 @@ def test_crew_emits_end_task_event():
def handle_task_end(source, event):
received_events.append(event)
mock_span = Mock()
crew = Crew(agents=[base_agent], tasks=[base_task], name="TestCrew")
with (
patch.object(
event_listener._telemetry, "task_started", return_value=mock_span
) as mock_task_started,
patch.object(
event_listener._telemetry, "task_ended", return_value=mock_span
) as mock_task_ended,
):
crew.kickoff()
crew.kickoff()
mock_task_started.assert_called_once_with(crew=crew, task=base_task)
mock_task_ended.assert_called_once_with(mock_span, base_task, crew)
assert len(received_events) == 1
assert isinstance(received_events[0].timestamp, datetime)
@@ -334,24 +401,29 @@ def test_tools_emits_error_events():
def test_flow_emits_start_event():
received_events = []
mock_span = Mock()
with crewai_event_bus.scoped_handlers():
@crewai_event_bus.on(FlowStartedEvent)
def handle_flow_start(source, event):
received_events.append(event)
@crewai_event_bus.on(FlowStartedEvent)
def handle_flow_start(source, event):
received_events.append(event)
class TestFlow(Flow[dict]):
@start()
def begin(self):
return "started"
class TestFlow(Flow[dict]):
@start()
def begin(self):
return "started"
with (
patch.object(
event_listener._telemetry, "flow_execution_span", return_value=mock_span
) as mock_flow_execution_span,
):
flow = TestFlow()
flow.kickoff()
assert len(received_events) == 1
assert received_events[0].flow_name == "TestFlow"
assert received_events[0].type == "flow_started"
mock_flow_execution_span.assert_called_once_with("TestFlow", ["begin"])
assert len(received_events) == 1
assert received_events[0].flow_name == "TestFlow"
assert received_events[0].type == "flow_started"
def test_flow_emits_finish_event():
@@ -455,6 +527,7 @@ def test_multiple_handlers_for_same_event():
def test_flow_emits_created_event():
received_events = []
mock_span = Mock()
@crewai_event_bus.on(FlowCreatedEvent)
def handle_flow_created(source, event):
@@ -465,8 +538,15 @@ def test_flow_emits_created_event():
def begin(self):
return "started"
flow = TestFlow()
flow.kickoff()
with (
patch.object(
event_listener._telemetry, "flow_creation_span", return_value=mock_span
) as mock_flow_creation_span,
):
flow = TestFlow()
flow.kickoff()
mock_flow_creation_span.assert_called_once_with("TestFlow")
assert len(received_events) == 1
assert received_events[0].flow_name == "TestFlow"
@@ -495,3 +575,43 @@ def test_flow_emits_method_execution_failed_event():
assert received_events[0].flow_name == "TestFlow"
assert received_events[0].type == "method_execution_failed"
assert received_events[0].error == error
@pytest.mark.vcr(filter_headers=["authorization"])
def test_llm_emits_call_started_event():
received_events = []
@crewai_event_bus.on(LLMCallStartedEvent)
def handle_llm_call_started(source, event):
received_events.append(event)
@crewai_event_bus.on(LLMCallCompletedEvent)
def handle_llm_call_completed(source, event):
received_events.append(event)
llm = LLM(model="gpt-4o-mini")
llm.call("Hello, how are you?")
assert len(received_events) == 2
assert received_events[0].type == "llm_call_started"
assert received_events[1].type == "llm_call_completed"
@pytest.mark.vcr(filter_headers=["authorization"])
def test_llm_emits_call_failed_event():
received_events = []
@crewai_event_bus.on(LLMCallFailedEvent)
def handle_llm_call_failed(source, event):
received_events.append(event)
error_message = "Simulated LLM call failure"
with patch.object(LLM, "_call_llm", side_effect=Exception(error_message)):
llm = LLM(model="gpt-4o-mini")
with pytest.raises(Exception) as exc_info:
llm.call("Hello, how are you?")
assert str(exc_info.value) == error_message
assert len(received_events) == 1
assert received_events[0].type == "llm_call_failed"
assert received_events[0].error == error_message