mirror of https://github.com/crewAIInc/crewAI.git

Compare commits

4 Commits: devin/1768... → gl/feat/de...

| SHA1 |
|---|
| c1776aca8a |
| 8f99fa76ed |
| 17e3fcbe1f |
| b858d705a8 |
@@ -574,6 +574,10 @@ When you run this Flow, the output will change based on the random boolean value

### Human in the Loop (human feedback)

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**.
</Note>

The `@human_feedback` decorator enables human-in-the-loop workflows by pausing flow execution to collect feedback from a human. This is useful for approval gates, quality review, and decision points that require human judgment.

```python Code
@@ -91,6 +91,10 @@ The `A2AConfig` class accepts the following parameters:
Update mechanism for receiving task status. Options: `StreamingConfig`, `PollingConfig`, or `PushNotificationConfig`.
</ParamField>

<ParamField path="transport_protocol" type="Literal['JSONRPC', 'GRPC', 'HTTP+JSON']" default="JSONRPC">
Transport protocol for A2A communication. Options: `JSONRPC` (default), `GRPC`, or `HTTP+JSON`.
</ParamField>
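To see how a `Literal`-typed option like `transport_protocol` behaves at validation time, here is a small self-contained Pydantic sketch that mirrors the field described above; it illustrates the pattern only and is not the actual `A2AConfig` class:

```python
from typing import Literal

from pydantic import BaseModel, Field, ValidationError


class TransportDemo(BaseModel):
    """Illustrative stand-in for the transport_protocol field on A2AConfig."""

    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"] = Field(
        default="JSONRPC",
        description="Specified mode of A2A transport protocol",
    )


print(TransportDemo().transport_protocol)                            # JSONRPC (default)
print(TransportDemo(transport_protocol="GRPC").transport_protocol)   # GRPC

try:
    TransportDemo(transport_protocol="WEBSOCKET")  # not one of the allowed literals
except ValidationError as exc:
    print(exc.errors()[0]["type"])  # "literal_error" in Pydantic v2
```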

## Authentication

For A2A agents that require authentication, use one of the provided auth schemes:
@@ -7,6 +7,10 @@ mode: "wide"

## Overview

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**. Make sure to update your installation before using this feature.
</Note>

The `@human_feedback` decorator enables human-in-the-loop (HITL) workflows directly within CrewAI Flows. It allows you to pause flow execution, present output to a human for review, collect their feedback, and optionally route to different listeners based on the feedback outcome.

This is particularly valuable for:
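To make the pause-and-review behavior concrete, here is a small runnable sketch of a manual human-review step built from existing Flow primitives and `input()`; it is roughly the pattern the `@human_feedback` decorator is described as automating, but it deliberately avoids the decorator itself, since its exact signature and options are not shown in this changeset:

```python
# Manual human-in-the-loop sketch using only established Flow APIs (@start/@listen).
# The @human_feedback decorator described above automates this pause-and-collect
# pattern; its exact parameters are not shown here, so this sketch does not use it.
from crewai.flow.flow import Flow, listen, start


class DraftReviewFlow(Flow):
    @start()
    def write_draft(self) -> str:
        return "Draft announcement: the new feature ships next week."

    @listen(write_draft)
    def review_draft(self, draft: str) -> str:
        print(f"\n--- Please review ---\n{draft}\n")
        feedback = input("Approve, or type requested changes: ")  # pauses for a human
        # Downstream listeners (or a router) could branch on this value,
        # e.g. approved vs. needs-revision.
        return feedback


# DraftReviewFlow().kickoff()  # blocks at review_draft until the reviewer responds
```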
@@ -11,10 +11,10 @@ Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelli

CrewAI offers two main approaches for implementing human-in-the-loop workflows:

| Approach | Best For | Integration |
|----------|----------|-------------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/en/learn/human-feedback-in-flows) |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide |
| Approach | Best For | Integration | Version |
|----------|----------|-------------|---------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/en/learn/human-feedback-in-flows) | **1.8.0+** |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide | - |

<Tip>
If you're building flows and want to add human review steps with routing based on feedback, check out the [Human Feedback in Flows](/en/learn/human-feedback-in-flows) guide for the `@human_feedback` decorator.
@@ -567,6 +567,10 @@ Fourth method running

### Human in the Loop (human feedback)

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**.
</Note>

The `@human_feedback` decorator enables human-in-the-loop workflows that pause flow execution to collect feedback from a human. This is useful for approval gates, quality review, and decision points that require human judgment.

```python Code
@@ -7,6 +7,10 @@ mode: "wide"

## Overview

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**. Update your installation before using this feature.
</Note>

The `@human_feedback` decorator enables human-in-the-loop (HITL) workflows directly within CrewAI Flows. It lets you pause flow execution, present output to a human for review, collect their feedback, and optionally route to different listeners based on the feedback outcome.

This is particularly valuable for:
@@ -5,9 +5,22 @@ icon: "user-check"
mode: "wide"
---

Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelligence with human expertise to enhance decision-making and improve task outcomes. This guide walks you through implementing HITL within CrewAI.
Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelligence with human expertise to enhance decision-making and improve task outcomes. CrewAI offers several ways to implement HITL depending on your needs.

## Setting Up HITL Workflows
## Choosing Your HITL Approach

CrewAI offers two main approaches for implementing human-in-the-loop workflows:

| Approach | Best For | Integration | Version |
|----------|----------|-------------|---------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/ko/learn/human-feedback-in-flows) | **1.8.0+** |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide | - |

<Tip>
If you're building flows and want to add human review steps that route based on feedback, see the [Human Feedback in Flows](/ko/learn/human-feedback-in-flows) guide for the `@human_feedback` decorator.
</Tip>

## Setting Up Webhook-Based HITL Workflows

<Steps>
  <Step title="Configure Your Task">
@@ -309,6 +309,10 @@ When you run this Flow, the output will differ depending on the random boolean value

### Human in the Loop (human feedback)

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**.
</Note>

The `@human_feedback` decorator enables human-in-the-loop workflows by pausing flow execution to collect feedback from a human. This is useful for approval gates, quality review, and decision points that require human judgment.

```python Code
@@ -7,6 +7,10 @@ mode: "wide"

## Overview

<Note>
The `@human_feedback` decorator requires **CrewAI version 1.8.0 or higher**. Make sure to update your installation before using this feature.
</Note>

The `@human_feedback` decorator enables human-in-the-loop (HITL) workflows directly within CrewAI Flows. It lets you pause flow execution, present output to a human for review, collect their feedback, and optionally route to different listeners based on the feedback outcome.

This is particularly valuable for:
@@ -5,9 +5,22 @@ icon: "user-check"
mode: "wide"
---

Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelligence with human expertise to enhance decision-making and improve task outcomes. This guide shows how to implement HITL within CrewAI.
Human-in-the-Loop (HITL) is a powerful approach that combines artificial intelligence with human expertise to enhance decision-making and improve task outcomes. CrewAI offers several ways to implement HITL depending on your needs.

## Setting Up HITL Workflows
## Choosing Your HITL Approach

CrewAI offers two main approaches for implementing human-in-the-loop workflows:

| Approach | Best For | Integration | Version |
|----------|----------|-------------|---------|
| **Flow-based** (`@human_feedback` decorator) | Local development, console-based review, synchronous workflows | [Human Feedback in Flows](/pt-BR/learn/human-feedback-in-flows) | **1.8.0+** |
| **Webhook-based** (Enterprise) | Production deployments, async workflows, external integrations (Slack, Teams, etc.) | This guide | - |

<Tip>
If you're building flows and want to add human review steps with routing based on feedback, check out the [Human Feedback in Flows](/pt-BR/learn/human-feedback-in-flows) guide for the `@human_feedback` decorator.
</Tip>

## Setting Up Webhook-Based HITL Workflows

<Steps>
  <Step title="Configure Your Task">
@@ -5,7 +5,7 @@ This module is separate from experimental.a2a to avoid circular imports.

from __future__ import annotations

from typing import Annotated, Any, ClassVar
from typing import Annotated, Any, ClassVar, Literal

from pydantic import (
    BaseModel,
@@ -53,6 +53,7 @@ class A2AConfig(BaseModel):
        fail_fast: If True, raise error when agent unreachable; if False, skip and continue.
        trust_remote_completion_status: If True, return A2A agent's result directly when completed.
        updates: Update mechanism config.
        transport_protocol: A2A transport protocol (grpc, jsonrpc, http+json).
    """

    model_config: ClassVar[ConfigDict] = ConfigDict(extra="forbid")
@@ -82,3 +83,7 @@ class A2AConfig(BaseModel):
        default_factory=_get_default_update_config,
        description="Update mechanism config",
    )
    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"] = Field(
        default="JSONRPC",
        description="Specified mode of A2A transport protocol",
    )
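Two details of the class shown in this hunk are worth noting: `extra="forbid"` makes misspelled field names fail loudly, and the `Literal` default keeps the transport value constrained. Here is a self-contained Pydantic sketch of the `extra="forbid"` behavior (an illustrative model, not the real `A2AConfig`):

```python
from pydantic import BaseModel, ConfigDict, ValidationError


class StrictConfig(BaseModel):
    """Illustrative model mirroring A2AConfig's extra="forbid" setting."""

    model_config = ConfigDict(extra="forbid")

    timeout: int = 30


print(StrictConfig(timeout=60).timeout)       # 60

try:
    StrictConfig(timeuot=60)                  # misspelled field name is rejected
except ValidationError as exc:
    print(exc.errors()[0]["type"])            # "extra_forbidden" in Pydantic v2
```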
@@ -7,7 +7,7 @@ from collections.abc import AsyncIterator, MutableMapping
from contextlib import asynccontextmanager
from functools import lru_cache
import time
from typing import TYPE_CHECKING, Any
from typing import TYPE_CHECKING, Any, Literal
import uuid

from a2a.client import A2AClientHTTPError, Client, ClientConfig, ClientFactory
@@ -18,7 +18,6 @@ from a2a.types import (
    PushNotificationConfig as A2APushNotificationConfig,
    Role,
    TextPart,
    TransportProtocol,
)
from aiocache import cached  # type: ignore[import-untyped]
from aiocache.serializers import PickleSerializer  # type: ignore[import-untyped]
@@ -259,6 +258,7 @@ async def _afetch_agent_card_impl(

def execute_a2a_delegation(
    endpoint: str,
    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"],
    auth: AuthScheme | None,
    timeout: int,
    task_description: str,
@@ -282,6 +282,23 @@ def execute_a2a_delegation(
    use aexecute_a2a_delegation directly.

    Args:
        endpoint: A2A agent endpoint URL (AgentCard URL)
        transport_protocol: Optional A2A transport protocol (grpc, jsonrpc, http+json)
        auth: Optional AuthScheme for authentication (Bearer, OAuth2, API Key, HTTP Basic/Digest)
        timeout: Request timeout in seconds
        task_description: The task to delegate
        context: Optional context information
        context_id: Context ID for correlating messages/tasks
        task_id: Specific task identifier
        reference_task_ids: List of related task IDs
        metadata: Additional metadata (external_id, request_id, etc.)
        extensions: Protocol extensions for custom fields
        conversation_history: Previous Message objects from conversation
        agent_id: Agent identifier for logging
        agent_role: Role of the CrewAI agent delegating the task
        agent_branch: Optional agent tree branch for logging
        response_model: Optional Pydantic model for structured outputs
        turn_number: Optional turn number for multi-turn conversations
        endpoint: A2A agent endpoint URL.
        auth: Optional AuthScheme for authentication.
        timeout: Request timeout in seconds.
@@ -323,6 +340,7 @@ def execute_a2a_delegation(
        agent_role=agent_role,
        agent_branch=agent_branch,
        response_model=response_model,
        transport_protocol=transport_protocol,
        turn_number=turn_number,
        updates=updates,
    )
@@ -333,6 +351,7 @@ def execute_a2a_delegation(

async def aexecute_a2a_delegation(
    endpoint: str,
    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"],
    auth: AuthScheme | None,
    timeout: int,
    task_description: str,
@@ -356,6 +375,23 @@ async def aexecute_a2a_delegation(
    in an async context (e.g., with Crew.akickoff() or agent.aexecute_task()).

    Args:
        endpoint: A2A agent endpoint URL
        transport_protocol: Optional A2A transport protocol (grpc, jsonrpc, http+json)
        auth: Optional AuthScheme for authentication
        timeout: Request timeout in seconds
        task_description: Task to delegate
        context: Optional context
        context_id: Context ID for correlation
        task_id: Specific task identifier
        reference_task_ids: Related task IDs
        metadata: Additional metadata
        extensions: Protocol extensions
        conversation_history: Previous Message objects
        turn_number: Current turn number
        agent_branch: Agent tree branch for logging
        agent_id: Agent identifier for logging
        agent_role: Agent role for logging
        response_model: Optional Pydantic model for structured outputs
        endpoint: A2A agent endpoint URL.
        auth: Optional AuthScheme for authentication.
        timeout: Request timeout in seconds.
@@ -414,6 +450,7 @@ async def aexecute_a2a_delegation(
        agent_role=agent_role,
        response_model=response_model,
        updates=updates,
        transport_protocol=transport_protocol,
    )

    crewai_event_bus.emit(
@@ -431,6 +468,7 @@ async def aexecute_a2a_delegation(

async def _aexecute_a2a_delegation_impl(
    endpoint: str,
    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"],
    auth: AuthScheme | None,
    timeout: int,
    task_description: str,
@@ -524,7 +562,6 @@ async def _aexecute_a2a_delegation_impl(
        extensions=extensions,
    )

    transport_protocol = TransportProtocol("JSONRPC")
    new_messages: list[Message] = [*conversation_history, message]
    crewai_event_bus.emit(
        None,
@@ -596,7 +633,7 @@ async def _aexecute_a2a_delegation_impl(
@asynccontextmanager
async def _create_a2a_client(
    agent_card: AgentCard,
    transport_protocol: TransportProtocol,
    transport_protocol: Literal["JSONRPC", "GRPC", "HTTP+JSON"],
    timeout: int,
    headers: MutableMapping[str, str],
    streaming: bool,
@@ -640,7 +677,7 @@ async def _create_a2a_client(

    config = ClientConfig(
        httpx_client=httpx_client,
        supported_transports=[str(transport_protocol.value)],
        supported_transports=[transport_protocol],
        streaming=streaming and not use_polling,
        polling=use_polling,
        accepted_output_modes=["application/json"],
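The `_create_a2a_client` change above narrows `transport_protocol` from a `TransportProtocol` enum to a plain `Literal` string and passes it straight into `supported_transports`. The following self-contained sketch, using a stand-in enum rather than the actual `a2a.types.TransportProtocol`, shows why the two spellings carry the same wire value when the enum is string-backed:

```python
from enum import Enum
from typing import Literal


class DemoTransport(str, Enum):
    """Stand-in for a string-backed transport enum (not the a2a SDK's)."""

    JSONRPC = "JSONRPC"
    GRPC = "GRPC"
    HTTP_JSON = "HTTP+JSON"


def show_supported(old: DemoTransport, new: Literal["JSONRPC", "GRPC", "HTTP+JSON"]) -> None:
    # Old style: convert the enum member's value to str before handing it over.
    print([str(old.value)])   # ['JSONRPC']
    # New style: the Literal-typed string is already the exact wire value.
    print([new])              # ['JSONRPC']


show_supported(DemoTransport.JSONRPC, "JSONRPC")
```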
@@ -771,6 +771,7 @@ def _delegate_to_a2a(
        response_model=agent_config.response_model,
        turn_number=turn_num + 1,
        updates=agent_config.updates,
        transport_protocol=agent_config.transport_protocol,
    )

    conversation_history = a2a_result.get("history", [])
@@ -1085,6 +1086,7 @@ async def _adelegate_to_a2a(
        agent_branch=agent_branch,
        response_model=agent_config.response_model,
        turn_number=turn_num + 1,
        transport_protocol=agent_config.transport_protocol,
        updates=agent_config.updates,
    )
@@ -209,10 +209,9 @@ class EventListener(BaseEventListener):
        @crewai_event_bus.on(TaskCompletedEvent)
        def on_task_completed(source: Any, event: TaskCompletedEvent) -> None:
            # Handle telemetry
            span = self.execution_spans.get(source)
            span = self.execution_spans.pop(source, None)
            if span:
                self._telemetry.task_ended(span, source, source.agent.crew)
            self.execution_spans[source] = None

            # Pass task name if it exists
            task_name = get_task_name(source)
@@ -222,11 +221,10 @@ class EventListener(BaseEventListener):

        @crewai_event_bus.on(TaskFailedEvent)
        def on_task_failed(source: Any, event: TaskFailedEvent) -> None:
            span = self.execution_spans.get(source)
            span = self.execution_spans.pop(source, None)
            if span:
                if source.agent and source.agent.crew:
                    self._telemetry.task_ended(span, source, source.agent.crew)
            self.execution_spans[source] = None

            # Pass task name if it exists
            task_name = get_task_name(source)
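The listener change above replaces a `get`-then-overwrite-with-`None` pattern with a single `pop`, which removes the entry from `execution_spans` instead of leaving a `None` placeholder behind. A minimal illustration of the difference:

```python
# Why pop(...) is preferable for one-shot cleanup: get() + "= None" keeps dead keys
# in the dict, while pop() removes the entry in one step.
spans: dict[str, object] = {"task-1": object(), "task-2": object()}

# Old pattern: the key stays in the dict, mapped to None.
span = spans.get("task-1")
spans["task-1"] = None
print(len(spans))                  # 2 -- "task-1" still present as a None placeholder

# New pattern: the key is removed; a repeated pop is a harmless no-op.
span = spans.pop("task-2", None)
print(len(spans))                  # 1
print(spans.pop("task-2", None))   # None -- safe even if the span was already cleaned up
```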
@@ -341,7 +341,6 @@ class AccumulatedToolArgs(BaseModel):

class LLM(BaseLLM):
    completion_cost: float | None = None
    _callback_lock: threading.RLock = threading.RLock()

    def __new__(cls, model: str, is_litellm: bool = False, **kwargs: Any) -> LLM:
        """Factory method that routes to native SDK or falls back to LiteLLM.
@@ -1145,7 +1144,7 @@ class LLM(BaseLLM):
            if response_model:
                params["response_model"] = response_model
            response = litellm.completion(**params)


            if hasattr(response,"usage") and not isinstance(response.usage, type) and response.usage:
                usage_info = response.usage
                self._track_token_usage_internal(usage_info)
@@ -1364,7 +1363,7 @@ class LLM(BaseLLM):
        """
        full_response = ""
        chunk_count = 0


        usage_info = None

        accumulated_tool_args: defaultdict[int, AccumulatedToolArgs] = defaultdict(
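The `__new__` signature above points at a factory pattern: construction inspects its arguments and hands back an instance of one backend or another. A generic, self-contained sketch of that routing idea (not CrewAI's actual `LLM` implementation):

```python
# Generic illustration of a __new__-based factory: the base class inspects its
# arguments and returns the appropriate subclass. Names here are made up.
from typing import Any


class Client:
    def __new__(cls, model: str, use_fallback: bool = False, **kwargs: Any) -> "Client":
        if cls is Client:  # only reroute direct construction, not subclass construction
            target = FallbackClient if use_fallback else NativeClient
            return super().__new__(target)
        return super().__new__(cls)

    def __init__(self, model: str, use_fallback: bool = False, **kwargs: Any) -> None:
        self.model = model


class NativeClient(Client):
    """Stand-in for a native SDK-backed implementation."""


class FallbackClient(Client):
    """Stand-in for a LiteLLM-style fallback implementation."""


print(type(Client("gpt-4o-mini")).__name__)                     # NativeClient
print(type(Client("gpt-4o-mini", use_fallback=True)).__name__)  # FallbackClient
```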
@@ -1658,92 +1657,78 @@ class LLM(BaseLLM):
|
||||
raise ValueError("LLM call blocked by before_llm_call hook")
|
||||
|
||||
# --- 5) Set up callbacks if provided
|
||||
# Use a class-level lock to synchronize access to global litellm callbacks.
|
||||
# This prevents race conditions when multiple LLM instances call set_callbacks
|
||||
# concurrently, which could cause callbacks to be removed before they fire.
|
||||
with suppress_warnings():
|
||||
with LLM._callback_lock:
|
||||
if callbacks and len(callbacks) > 0:
|
||||
self.set_callbacks(callbacks)
|
||||
try:
|
||||
# --- 6) Prepare parameters for the completion call
|
||||
params = self._prepare_completion_params(messages, tools)
|
||||
# --- 7) Make the completion call and handle response
|
||||
if self.stream:
|
||||
result = self._handle_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
else:
|
||||
result = self._handle_non_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
|
||||
if isinstance(result, str):
|
||||
result = self._invoke_after_llm_call_hooks(
|
||||
messages, result, from_agent
|
||||
)
|
||||
|
||||
return result
|
||||
except LLMContextLengthExceededError:
|
||||
# Re-raise LLMContextLengthExceededError as it should be handled
|
||||
# by the CrewAgentExecutor._invoke_loop method, which can then decide
|
||||
# whether to summarize the content or abort based on the respect_context_window flag
|
||||
raise
|
||||
except Exception as e:
|
||||
unsupported_stop = "Unsupported parameter" in str(
|
||||
e
|
||||
) and "'stop'" in str(e)
|
||||
|
||||
if unsupported_stop:
|
||||
if (
|
||||
"additional_drop_params" in self.additional_params
|
||||
and isinstance(
|
||||
self.additional_params["additional_drop_params"], list
|
||||
)
|
||||
):
|
||||
self.additional_params["additional_drop_params"].append(
|
||||
"stop"
|
||||
)
|
||||
else:
|
||||
self.additional_params = {
|
||||
"additional_drop_params": ["stop"]
|
||||
}
|
||||
|
||||
logging.info(
|
||||
"Retrying LLM call without the unsupported 'stop'"
|
||||
)
|
||||
|
||||
# Recursive call happens inside the lock since we're using
|
||||
# a reentrant-safe pattern (the lock is released when we
|
||||
# exit the with block, and the recursive call will acquire
|
||||
# it again)
|
||||
return self.call(
|
||||
messages,
|
||||
tools=tools,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
|
||||
crewai_event_bus.emit(
|
||||
self,
|
||||
event=LLMCallFailedEvent(
|
||||
error=str(e), from_task=from_task, from_agent=from_agent
|
||||
),
|
||||
if callbacks and len(callbacks) > 0:
|
||||
self.set_callbacks(callbacks)
|
||||
try:
|
||||
# --- 6) Prepare parameters for the completion call
|
||||
params = self._prepare_completion_params(messages, tools)
|
||||
# --- 7) Make the completion call and handle response
|
||||
if self.stream:
|
||||
result = self._handle_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
raise
|
||||
else:
|
||||
result = self._handle_non_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
|
||||
if isinstance(result, str):
|
||||
result = self._invoke_after_llm_call_hooks(
|
||||
messages, result, from_agent
|
||||
)
|
||||
|
||||
return result
|
||||
except LLMContextLengthExceededError:
|
||||
# Re-raise LLMContextLengthExceededError as it should be handled
|
||||
# by the CrewAgentExecutor._invoke_loop method, which can then decide
|
||||
# whether to summarize the content or abort based on the respect_context_window flag
|
||||
raise
|
||||
except Exception as e:
|
||||
unsupported_stop = "Unsupported parameter" in str(
|
||||
e
|
||||
) and "'stop'" in str(e)
|
||||
|
||||
if unsupported_stop:
|
||||
if (
|
||||
"additional_drop_params" in self.additional_params
|
||||
and isinstance(
|
||||
self.additional_params["additional_drop_params"], list
|
||||
)
|
||||
):
|
||||
self.additional_params["additional_drop_params"].append("stop")
|
||||
else:
|
||||
self.additional_params = {"additional_drop_params": ["stop"]}
|
||||
|
||||
logging.info("Retrying LLM call without the unsupported 'stop'")
|
||||
|
||||
return self.call(
|
||||
messages,
|
||||
tools=tools,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
|
||||
crewai_event_bus.emit(
|
||||
self,
|
||||
event=LLMCallFailedEvent(
|
||||
error=str(e), from_task=from_task, from_agent=from_agent
|
||||
),
|
||||
)
|
||||
raise
|
||||
|
||||
async def acall(
|
||||
self,
|
||||
@@ -1805,27 +1790,14 @@ class LLM(BaseLLM):
|
||||
msg_role: Literal["assistant"] = "assistant"
|
||||
message["role"] = msg_role
|
||||
|
||||
# Use a class-level lock to synchronize access to global litellm callbacks.
|
||||
# This prevents race conditions when multiple LLM instances call set_callbacks
|
||||
# concurrently, which could cause callbacks to be removed before they fire.
|
||||
with suppress_warnings():
|
||||
with LLM._callback_lock:
|
||||
if callbacks and len(callbacks) > 0:
|
||||
self.set_callbacks(callbacks)
|
||||
try:
|
||||
params = self._prepare_completion_params(messages, tools)
|
||||
if callbacks and len(callbacks) > 0:
|
||||
self.set_callbacks(callbacks)
|
||||
try:
|
||||
params = self._prepare_completion_params(messages, tools)
|
||||
|
||||
if self.stream:
|
||||
return await self._ahandle_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
|
||||
return await self._ahandle_non_streaming_response(
|
||||
if self.stream:
|
||||
return await self._ahandle_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
@@ -1833,49 +1805,52 @@ class LLM(BaseLLM):
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
except LLMContextLengthExceededError:
|
||||
raise
|
||||
except Exception as e:
|
||||
unsupported_stop = "Unsupported parameter" in str(
|
||||
e
|
||||
) and "'stop'" in str(e)
|
||||
|
||||
if unsupported_stop:
|
||||
if (
|
||||
"additional_drop_params" in self.additional_params
|
||||
and isinstance(
|
||||
self.additional_params["additional_drop_params"], list
|
||||
)
|
||||
):
|
||||
self.additional_params["additional_drop_params"].append(
|
||||
"stop"
|
||||
)
|
||||
else:
|
||||
self.additional_params = {
|
||||
"additional_drop_params": ["stop"]
|
||||
}
|
||||
return await self._ahandle_non_streaming_response(
|
||||
params=params,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
except LLMContextLengthExceededError:
|
||||
raise
|
||||
except Exception as e:
|
||||
unsupported_stop = "Unsupported parameter" in str(
|
||||
e
|
||||
) and "'stop'" in str(e)
|
||||
|
||||
logging.info(
|
||||
"Retrying LLM call without the unsupported 'stop'"
|
||||
if unsupported_stop:
|
||||
if (
|
||||
"additional_drop_params" in self.additional_params
|
||||
and isinstance(
|
||||
self.additional_params["additional_drop_params"], list
|
||||
)
|
||||
):
|
||||
self.additional_params["additional_drop_params"].append("stop")
|
||||
else:
|
||||
self.additional_params = {"additional_drop_params": ["stop"]}
|
||||
|
||||
return await self.acall(
|
||||
messages,
|
||||
tools=tools,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
logging.info("Retrying LLM call without the unsupported 'stop'")
|
||||
|
||||
crewai_event_bus.emit(
|
||||
self,
|
||||
event=LLMCallFailedEvent(
|
||||
error=str(e), from_task=from_task, from_agent=from_agent
|
||||
),
|
||||
return await self.acall(
|
||||
messages,
|
||||
tools=tools,
|
||||
callbacks=callbacks,
|
||||
available_functions=available_functions,
|
||||
from_task=from_task,
|
||||
from_agent=from_agent,
|
||||
response_model=response_model,
|
||||
)
|
||||
raise
|
||||
|
||||
crewai_event_bus.emit(
|
||||
self,
|
||||
event=LLMCallFailedEvent(
|
||||
error=str(e), from_task=from_task, from_agent=from_agent
|
||||
),
|
||||
)
|
||||
raise
|
||||
|
||||
def _handle_emit_call_events(
|
||||
self,
|
||||
|
||||
lib/crewai/src/crewai/utilities/deferred_import_utils.py (new file, 370 lines)
@@ -0,0 +1,370 @@
"""Lazy loader for Python packages.
|
||||
|
||||
Makes it easy to load subpackages and functions on demand.
|
||||
|
||||
Pulled from https://github.com/scientific-python/lazy-loader/blob/main/src/lazy_loader/__init__.py,
|
||||
modernized a little.
|
||||
"""
|
||||
|
||||
import ast
|
||||
from collections.abc import Callable, Sequence
|
||||
from dataclasses import dataclass, field
|
||||
import importlib
|
||||
import importlib.metadata
|
||||
import importlib.util
|
||||
import inspect
|
||||
import os
|
||||
from pathlib import Path
|
||||
import sys
|
||||
import threading
|
||||
import types
|
||||
from typing import Any, NoReturn
|
||||
import warnings
|
||||
|
||||
import packaging.requirements
|
||||
|
||||
|
||||
_threadlock = threading.Lock()
|
||||
|
||||
|
||||
@dataclass(frozen=True, slots=True)
|
||||
class _FrameData:
|
||||
"""Captured stack frame information for delayed error reporting."""
|
||||
|
||||
filename: str
|
||||
lineno: int
|
||||
function: str
|
||||
code_context: Sequence[str] | None
|
||||
|
||||
|
||||
def attach(
|
||||
package_name: str,
|
||||
submodules: set[str] | None = None,
|
||||
submod_attrs: dict[str, list[str]] | None = None,
|
||||
) -> tuple[Callable[[str], Any], Callable[[], list[str]], list[str]]:
|
||||
"""Attach lazily loaded submodules, functions, or other attributes.
|
||||
|
||||
Replaces a package's `__getattr__`, `__dir__`, and `__all__` such that
|
||||
imports work normally but occur upon first use.
|
||||
|
||||
Example:
|
||||
__getattr__, __dir__, __all__ = lazy.attach(
|
||||
__name__, ["mysubmodule"], {"foo": ["someattr"]}
|
||||
)
|
||||
|
||||
Args:
|
||||
package_name: The package name, typically ``__name__``.
|
||||
submodules: Set of submodule names to attach.
|
||||
submod_attrs: Mapping of submodule names to lists of attributes.
|
||||
These attributes are imported as they are used.
|
||||
|
||||
Returns:
|
||||
A tuple of (__getattr__, __dir__, __all__) to assign in the package.
|
||||
"""
|
||||
submod_attrs = submod_attrs or {}
|
||||
submodules = set(submodules) if submodules else set()
|
||||
attr_to_modules = {
|
||||
attr: mod for mod, attrs in submod_attrs.items() for attr in attrs
|
||||
}
|
||||
__all__ = sorted(submodules | attr_to_modules.keys())
|
||||
|
||||
def __getattr__(name: str) -> Any: # noqa: N807
|
||||
if name in submodules:
|
||||
return importlib.import_module(f"{package_name}.{name}")
|
||||
if name in attr_to_modules:
|
||||
submod_path = f"{package_name}.{attr_to_modules[name]}"
|
||||
submod = importlib.import_module(submod_path)
|
||||
attr = getattr(submod, name)
|
||||
|
||||
# If the attribute lives in a file (module) with the same
|
||||
# name as the attribute, ensure that the attribute and *not*
|
||||
# the module is accessible on the package.
|
||||
if name == attr_to_modules[name]:
|
||||
pkg = sys.modules[package_name]
|
||||
pkg.__dict__[name] = attr
|
||||
|
||||
return attr
|
||||
raise AttributeError(f"No {package_name} attribute {name}")
|
||||
|
||||
def __dir__() -> list[str]: # noqa: N807
|
||||
return __all__.copy()
|
||||
|
||||
if os.environ.get("EAGER_IMPORT"):
|
||||
for attr in set(attr_to_modules.keys()) | submodules:
|
||||
__getattr__(attr)
|
||||
|
||||
return __getattr__, __dir__, __all__.copy()
|
||||
|
||||
|
||||
class DelayedImportErrorModule(types.ModuleType):
|
||||
"""Module type that delays raising ModuleNotFoundError until attribute access.
|
||||
|
||||
Captures stack frame data to provide helpful error messages showing where
|
||||
the original import was attempted.
|
||||
"""
|
||||
|
||||
def __init__(
|
||||
self,
|
||||
frame_data: _FrameData,
|
||||
*args: Any,
|
||||
message: str,
|
||||
**kwargs: Any,
|
||||
) -> None:
|
||||
"""Initialize the delayed error module.
|
||||
|
||||
Args:
|
||||
frame_data: Captured frame information for error reporting.
|
||||
*args: Positional arguments passed to ModuleType.
|
||||
message: The error message to display when accessed.
|
||||
**kwargs: Keyword arguments passed to ModuleType.
|
||||
"""
|
||||
self._frame_data = frame_data
|
||||
self._message = message
|
||||
super().__init__(*args, **kwargs)
|
||||
|
||||
def __getattr__(self, name: str) -> NoReturn:
|
||||
"""Raise ModuleNotFoundError with detailed context on any attribute access."""
|
||||
frame = self._frame_data
|
||||
code = "".join(frame.code_context) if frame.code_context else ""
|
||||
raise ModuleNotFoundError(
|
||||
f"{self._message}\n\n"
|
||||
"This error is lazily reported, having originally occurred in\n"
|
||||
f" File {frame.filename}, line {frame.lineno}, in {frame.function}\n\n"
|
||||
f"----> {code.strip()}"
|
||||
)
|
||||
|
||||
|
||||
def load(
|
||||
fullname: str,
|
||||
*,
|
||||
require: str | None = None,
|
||||
error_on_import: bool = False,
|
||||
suppress_warning: bool = False,
|
||||
) -> types.ModuleType:
|
||||
"""Return a lazily imported proxy for a module.
|
||||
|
||||
The proxy module delays actual import until first attribute access.
|
||||
|
||||
Example:
|
||||
np = lazy.load("numpy")
|
||||
|
||||
def myfunc():
|
||||
np.norm(...)
|
||||
|
||||
Warning:
|
||||
Lazily loading subpackages causes the parent package to be eagerly
|
||||
loaded. Use `lazy_loader.attach` instead for subpackages.
|
||||
|
||||
Args:
|
||||
fullname: The full name of the module to import (e.g., "scipy").
|
||||
require: A PEP-508 dependency requirement (e.g., "numpy >=1.24").
|
||||
If specified, raises an error if the installed version doesn't match.
|
||||
error_on_import: If True, raise import errors immediately.
|
||||
If False (default), delay errors until module is accessed.
|
||||
suppress_warning: If True, suppress the warning when loading subpackages.
|
||||
|
||||
Returns:
|
||||
A proxy module that loads on first attribute access.
|
||||
"""
|
||||
with _threadlock:
|
||||
module = sys.modules.get(fullname)
|
||||
|
||||
# Most common, short-circuit
|
||||
if module is not None and require is None:
|
||||
return module
|
||||
|
||||
have_module = module is not None
|
||||
|
||||
if not suppress_warning and "." in fullname:
|
||||
msg = (
|
||||
"subpackages can technically be lazily loaded, but it causes the "
|
||||
"package to be eagerly loaded even if it is already lazily loaded. "
|
||||
"So, you probably shouldn't use subpackages with this lazy feature."
|
||||
)
|
||||
warnings.warn(msg, RuntimeWarning, stacklevel=2)
|
||||
|
||||
spec = None
|
||||
|
||||
if not have_module:
|
||||
spec = importlib.util.find_spec(fullname)
|
||||
have_module = spec is not None
|
||||
|
||||
if not have_module:
|
||||
not_found_message = f"No module named '{fullname}'"
|
||||
elif require is not None:
|
||||
try:
|
||||
have_module = _check_requirement(require)
|
||||
except ModuleNotFoundError as e:
|
||||
raise ValueError(
    f"Found module '{fullname}' but cannot test "
    f"requirement '{require}'. "
|
||||
"Requirements must match distribution name, not module name."
|
||||
) from e
|
||||
|
||||
not_found_message = f"No distribution can be found matching '{require}'"
|
||||
|
||||
if not have_module:
|
||||
if error_on_import:
|
||||
raise ModuleNotFoundError(not_found_message)
|
||||
|
||||
parent = inspect.stack()[1]
|
||||
frame_data = _FrameData(
|
||||
filename=parent.filename,
|
||||
lineno=parent.lineno,
|
||||
function=parent.function,
|
||||
code_context=parent.code_context,
|
||||
)
|
||||
del parent
|
||||
return DelayedImportErrorModule(
|
||||
frame_data,
|
||||
"DelayedImportErrorModule",
|
||||
message=not_found_message,
|
||||
)
|
||||
|
||||
if spec is not None:
|
||||
module = importlib.util.module_from_spec(spec)
|
||||
sys.modules[fullname] = module
|
||||
|
||||
if spec.loader is not None:
|
||||
loader = importlib.util.LazyLoader(spec.loader)
|
||||
loader.exec_module(module)
|
||||
|
||||
if module is None:
|
||||
raise ModuleNotFoundError(f"No module named '{fullname}'")
|
||||
|
||||
return module
|
||||
|
||||
|
||||
def _check_requirement(require: str) -> bool:
|
||||
"""Verify that a package requirement is satisfied.
|
||||
|
||||
Args:
|
||||
require: A dependency requirement as defined in PEP-508.
|
||||
|
||||
Returns:
|
||||
True if the installed version matches the requirement, False otherwise.
|
||||
|
||||
Raises:
|
||||
ModuleNotFoundError: If the package is not installed.
|
||||
"""
|
||||
req = packaging.requirements.Requirement(require)
|
||||
return req.specifier.contains(
|
||||
importlib.metadata.version(req.name),
|
||||
prereleases=True,
|
||||
)
|
||||
|
||||
|
||||
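Since `_check_requirement` leans on the `packaging` library, here is a small standalone example of the same PEP 508 check outside this module; the package name and versions are made up for illustration:

```python
# Standalone illustration of the PEP 508 requirement check used by _check_requirement.
from packaging.requirements import Requirement

req = Requirement("examplepkg >=1.24")
print(req.name)                                              # "examplepkg"
print(req.specifier.contains("1.26.0", prereleases=True))    # True
print(req.specifier.contains("1.20.0", prereleases=True))    # False
```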
@dataclass
|
||||
class _StubVisitor(ast.NodeVisitor):
|
||||
"""AST visitor to parse a stub file for submodules and submod_attrs."""
|
||||
|
||||
_submodules: set[str] = field(default_factory=set)
|
||||
_submod_attrs: dict[str, list[str]] = field(default_factory=dict)
|
||||
|
||||
def visit_ImportFrom(self, node: ast.ImportFrom) -> None:
|
||||
"""Visit an ImportFrom node and extract submodule/attribute information.
|
||||
|
||||
Args:
|
||||
node: The AST ImportFrom node to visit.
|
||||
|
||||
Raises:
|
||||
ValueError: If the import is not a relative import or uses star import.
|
||||
"""
|
||||
if node.level != 1:
|
||||
raise ValueError(
|
||||
"Only within-module imports are supported (`from .* import`)"
|
||||
)
|
||||
names = [alias.name for alias in node.names]
|
||||
if node.module:
|
||||
if "*" in names:
|
||||
raise ValueError(
|
||||
f"lazy stub loader does not support star import "
|
||||
f"`from {node.module} import *`"
|
||||
)
|
||||
self._submod_attrs.setdefault(node.module, []).extend(names)
|
||||
else:
|
||||
self._submodules.update(names)
|
||||
|
||||
|
||||
def attach_stub(
|
||||
package_name: str,
|
||||
filename: str,
|
||||
) -> tuple[Callable[[str], Any], Callable[[], list[str]], list[str]]:
|
||||
"""Attach lazily loaded submodules and functions from a type stub.
|
||||
|
||||
Parses a `.pyi` stub file to infer submodules and attributes. This allows
|
||||
static type checkers to find imports while providing lazy loading at runtime.
|
||||
|
||||
Args:
|
||||
package_name: The package name, typically ``__name__``.
|
||||
filename: Path to `.py` file with an adjacent `.pyi` file.
|
||||
Typically use ``__file__``.
|
||||
|
||||
Returns:
|
||||
A tuple of (__getattr__, __dir__, __all__) to assign in the package.
|
||||
|
||||
Raises:
|
||||
ValueError: If stub file is not found or contains invalid imports.
|
||||
"""
|
||||
path = Path(filename)
|
||||
stubfile = path if path.suffix == ".pyi" else path.with_suffix(".pyi")
|
||||
|
||||
if not stubfile.exists():
|
||||
raise ValueError(f"Cannot load imports from non-existent stub {stubfile!r}")
|
||||
|
||||
visitor = _StubVisitor()
|
||||
visitor.visit(ast.parse(stubfile.read_text()))
|
||||
return attach(package_name, visitor._submodules, visitor._submod_attrs)
|
||||
|
||||
|
||||
def lazy_exports_stub(package_name: str, filename: str) -> None:
|
||||
"""Install lazy loading on a module based on its .pyi stub file.
|
||||
|
||||
Parses the adjacent `.pyi` stub file to determine what to export lazily.
|
||||
Type checkers see the stub, runtime gets lazy loading.
|
||||
|
||||
Example:
|
||||
# __init__.py
|
||||
from crewai.utilities.lazy import lazy_exports_stub
|
||||
lazy_exports_stub(__name__, __file__)
|
||||
|
||||
# __init__.pyi
|
||||
from .config import ChromaDBConfig, ChromaDBSettings
|
||||
from .types import EmbeddingType
|
||||
|
||||
Args:
|
||||
package_name: The package name, typically ``__name__``.
|
||||
filename: Path to the module file, typically ``__file__``.
|
||||
"""
|
||||
__getattr__, __dir__, __all__ = attach_stub(package_name, filename)
|
||||
module = sys.modules[package_name]
|
||||
module.__getattr__ = __getattr__ # type: ignore[method-assign]
|
||||
module.__dir__ = __dir__ # type: ignore[method-assign]
|
||||
module.__dict__["__all__"] = __all__
|
||||
|
||||
|
||||
def lazy_exports(
|
||||
package_name: str,
|
||||
submod_attrs: dict[str, list[str]],
|
||||
submodules: set[str] | None = None,
|
||||
) -> None:
|
||||
"""Install lazy loading on a module.
|
||||
|
||||
Example:
|
||||
from crewai.utilities.lazy import lazy_exports
|
||||
|
||||
lazy_exports(__name__, {
|
||||
'config': ['ChromaDBConfig', 'ChromaDBSettings'],
|
||||
'types': ['EmbeddingType'],
|
||||
})
|
||||
|
||||
Args:
|
||||
package_name: The package name, typically ``__name__``.
|
||||
submod_attrs: Mapping of submodule names to lists of attributes.
|
||||
submodules: Optional set of submodule names to expose directly.
|
||||
"""
|
||||
__getattr__, __dir__, __all__ = attach(package_name, submodules, submod_attrs)
|
||||
module = sys.modules[package_name]
|
||||
module.__getattr__ = __getattr__ # type: ignore[method-assign]
|
||||
module.__dir__ = __dir__ # type: ignore[method-assign]
|
||||
module.__dict__["__all__"] = __all__
|
||||
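Under the hood, `attach`/`lazy_exports` rely on module-level `__getattr__` and `__dir__` (PEP 562). A minimal self-contained demonstration of that mechanism, independent of this utility and with illustrative names:

```python
# demo_pkg/__init__.py -- minimal PEP 562 lazy attribute loading, the mechanism
# that attach()/lazy_exports() install automatically.
import importlib
from typing import Any

_lazy_attrs = {"sqrt": "math", "dumps": "json"}  # attribute -> providing module
__all__ = sorted(_lazy_attrs)


def __getattr__(name: str) -> Any:
    if name in _lazy_attrs:
        module = importlib.import_module(_lazy_attrs[name])  # imported on first access
        value = getattr(module, name)
        globals()[name] = value  # cache so later lookups skip __getattr__
        return value
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")


def __dir__() -> list[str]:
    return __all__.copy()
```

Importing `demo_pkg` and then touching `demo_pkg.sqrt` triggers the `math` import at that moment rather than at package import time, which is the effect `attach()` automates for a package's declared submodules and attributes.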
@@ -1,6 +1,6 @@
|
||||
import logging
|
||||
import os
|
||||
import threading
|
||||
from time import sleep
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
from crewai.agents.agent_builder.utilities.base_token_process import TokenProcess
|
||||
@@ -18,15 +18,9 @@ from pydantic import BaseModel
|
||||
import pytest
|
||||
|
||||
|
||||
# TODO: This test fails without print statement, which makes me think that something is happening asynchronously that we need to eventually fix and dive deeper into at a later date
|
||||
@pytest.mark.vcr()
|
||||
def test_llm_callback_replacement():
|
||||
"""Test that callbacks are properly isolated between LLM instances.
|
||||
|
||||
This test verifies that the race condition fix (using _callback_lock) works
|
||||
correctly. Previously, this test required a sleep(5) workaround because
|
||||
callbacks were being modified globally without synchronization, causing
|
||||
one LLM instance's callbacks to interfere with another's.
|
||||
"""
|
||||
llm1 = LLM(model="gpt-4o-mini", is_litellm=True)
|
||||
llm2 = LLM(model="gpt-4o-mini", is_litellm=True)
|
||||
|
||||
@@ -43,6 +37,7 @@ def test_llm_callback_replacement():
|
||||
messages=[{"role": "user", "content": "Hello, world from another agent!"}],
|
||||
callbacks=[calc_handler_2],
|
||||
)
|
||||
sleep(5)
|
||||
usage_metrics_2 = calc_handler_2.token_cost_process.get_summary()
|
||||
|
||||
# The first handler should not have been updated
|
||||
@@ -51,66 +46,6 @@ def test_llm_callback_replacement():
|
||||
assert usage_metrics_1 == calc_handler_1.token_cost_process.get_summary()
|
||||
|
||||
|
||||
def test_llm_callback_lock_prevents_race_condition():
|
||||
"""Test that the _callback_lock prevents race conditions in concurrent LLM calls.
|
||||
|
||||
This test verifies that multiple threads can safely call LLM.call() with
|
||||
different callbacks without interfering with each other. The lock ensures
|
||||
that callbacks are properly isolated between concurrent calls.
|
||||
"""
|
||||
num_threads = 5
|
||||
results: list[int] = []
|
||||
errors: list[Exception] = []
|
||||
lock = threading.Lock()
|
||||
|
||||
def make_llm_call(thread_id: int, mock_completion: MagicMock) -> None:
|
||||
try:
|
||||
llm = LLM(model="gpt-4o-mini", is_litellm=True)
|
||||
calc_handler = TokenCalcHandler(token_cost_process=TokenProcess())
|
||||
|
||||
mock_message = MagicMock()
|
||||
mock_message.content = f"Response from thread {thread_id}"
|
||||
mock_choice = MagicMock()
|
||||
mock_choice.message = mock_message
|
||||
mock_response = MagicMock()
|
||||
mock_response.choices = [mock_choice]
|
||||
mock_response.usage = {
|
||||
"prompt_tokens": 10,
|
||||
"completion_tokens": 10,
|
||||
"total_tokens": 20,
|
||||
}
|
||||
mock_completion.return_value = mock_response
|
||||
|
||||
llm.call(
|
||||
messages=[{"role": "user", "content": f"Hello from thread {thread_id}"}],
|
||||
callbacks=[calc_handler],
|
||||
)
|
||||
|
||||
usage = calc_handler.token_cost_process.get_summary()
|
||||
with lock:
|
||||
results.append(usage.successful_requests)
|
||||
except Exception as e:
|
||||
with lock:
|
||||
errors.append(e)
|
||||
|
||||
with patch("litellm.completion") as mock_completion:
|
||||
threads = [
|
||||
threading.Thread(target=make_llm_call, args=(i, mock_completion))
|
||||
for i in range(num_threads)
|
||||
]
|
||||
|
||||
for t in threads:
|
||||
t.start()
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
assert len(errors) == 0, f"Errors occurred: {errors}"
|
||||
assert len(results) == num_threads
|
||||
assert all(
|
||||
r == 1 for r in results
|
||||
), f"Expected all callbacks to have 1 successful request, got {results}"
|
||||
|
||||
|
||||
@pytest.mark.vcr()
|
||||
def test_llm_call_with_string_input():
|
||||
llm = LLM(model="gpt-4o-mini")
|
||||
@@ -1054,4 +989,4 @@ async def test_usage_info_streaming_with_acall():
|
||||
assert llm._token_usage["completion_tokens"] > 0
|
||||
assert llm._token_usage["total_tokens"] > 0
|
||||
|
||||
assert len(result) > 0
|
||||
assert len(result) > 0
|
||||