Compare commits


4 Commits

| Author | SHA1 | Message | Date |
|--------|------|---------|------|
| Lorenze Jay | 75bd0310f3 | Merge branch 'main' into brandon/improve-llm-structured-output | 2025-02-04 13:44:28 -08:00 |
| Brandon Hancock | 3de4653023 | Merge branch 'main' into brandon/improve-llm-structured-output | 2025-02-04 12:42:30 -05:00 |
| Brandon Hancock | ce6ffb1570 | update docs | 2025-02-04 12:41:02 -05:00 |
| Brandon Hancock | 47b3d8f3fa | code and tests work | 2025-02-04 11:44:48 -05:00 |
38 changed files with 292 additions and 2200 deletions

View File

@@ -1,18 +1,10 @@
<div align="center">
![Logo of CrewAI](./docs/crewai_logo.png)
![Logo of CrewAI, two people rowing on a boat](./docs/crewai_logo.png)
# **CrewAI**
**CrewAI**: Production-grade framework for orchestrating sophisticated AI agent systems. From simple automations to complex real-world applications, CrewAI provides precise control and deep customization. By fostering collaborative intelligence through flexible, production-ready architecture, CrewAI empowers agents to work together seamlessly, tackling complex business challenges with predictable, consistent results.
**CrewAI Enterprise**
Want to plan, build (+ no code), deploy, monitor, and iterate on your agents: [CrewAI Enterprise](https://www.crewai.com/enterprise). Designed for complex, real-world applications, our enterprise solution offers:
- **Seamless Integrations**
- **Scalable & Secure Deployment**
- **Actionable Insights**
- **24/7 Support**
🤖 **CrewAI**: Production-grade framework for orchestrating sophisticated AI agent systems. From simple automations to complex real-world applications, CrewAI provides precise control and deep customization. By fostering collaborative intelligence through flexible, production-ready architecture, CrewAI empowers agents to work together seamlessly, tackling complex business challenges with predictable, consistent results.
<h3>
@@ -400,7 +392,7 @@ class AdvancedAnalysisFlow(Flow[MarketState]):
goal="Gather and validate supporting market data",
backstory="You excel at finding and correlating multiple data sources"
)
analysis_task = Task(
description="Analyze {sector} sector data for the past {timeframe}",
expected_output="Detailed market analysis with confidence score",
@@ -411,7 +403,7 @@ class AdvancedAnalysisFlow(Flow[MarketState]):
expected_output="Corroborating evidence and potential contradictions",
agent=researcher
)
# Demonstrate crew autonomy
analysis_crew = Crew(
agents=[analyst, researcher],

View File

@@ -23,14 +23,14 @@ A crew in crewAI represents a collaborative group of agents working together to
| **Language** _(optional)_ | `language` | Language used for the crew, defaults to English. |
| **Language File** _(optional)_ | `language_file` | Path to the language file to be used for the crew. |
| **Memory** _(optional)_ | `memory` | Utilized for storing execution memories (short-term, long-term, entity memory). |
| **Memory Config** _(optional)_ | `memory_config` | Configuration for the memory provider to be used by the crew. |
| **Cache** _(optional)_ | `cache` | Specifies whether to use a cache for storing the results of tools' execution. Defaults to `True`. |
| **Embedder** _(optional)_ | `embedder` | Configuration for the embedder to be used by the crew. Mostly used by memory for now. Default is `{"provider": "openai"}`. |
| **Full Output** _(optional)_ | `full_output` | Whether the crew should return the full output with all tasks outputs or just the final output. Defaults to `False`. |
| **Step Callback** _(optional)_ | `step_callback` | A function that is called after each step of every agent. This can be used to log the agent's actions or to perform other operations; it won't override the agent-specific `step_callback`. |
| **Task Callback** _(optional)_ | `task_callback` | A function that is called after the completion of each task. Useful for monitoring or additional operations post-task execution. |
| **Share Crew** _(optional)_ | `share_crew` | Whether you want to share the complete crew information and execution with the crewAI team to make the library better, and allow us to train models. |
| **Output Log File** _(optional)_ | `output_log_file` | Set to True to save logs as logs.txt in the current directory or provide a file path. Logs will be in JSON format if the filename ends in .json, otherwise .txt. Defaults to `None`. |
| **Output Log File** _(optional)_ | `output_log_file` | Whether you want to have a file with the complete crew output and execution. You can set it using True and it will default to the folder you are currently in and it will be called logs.txt or passing a string with the full path and name of the file. |
| **Manager Agent** _(optional)_ | `manager_agent` | `manager` sets a custom agent that will be used as a manager. |
| **Prompt File** _(optional)_ | `prompt_file` | Path to the prompt JSON file to be used for the crew. |
| **Planning** *(optional)* | `planning` | Adds planning ability to the Crew. When activated, all Crew data is sent to an AgentPlanner before each Crew iteration, and the resulting plan is added to each task description. |
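The attributes above map directly onto `Crew` constructor keyword arguments. As a minimal sketch combining several of them (the agent and task variables are hypothetical placeholders, not taken from this diff):

```python Code
from crewai import Crew, Process

# researcher/writer and their tasks are assumed to be defined elsewhere.
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    memory=True,                       # short-term, long-term, and entity memory
    cache=True,                        # cache tool results (the default)
    embedder={"provider": "openai"},   # default embedder configuration
    output_log_file="crew_logs.json",  # JSON logs because of the .json extension
    planning=True,                     # run an AgentPlanner before each iteration
)
```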
@@ -240,23 +240,6 @@ print(f"Tasks Output: {crew_output.tasks_output}")
print(f"Token Usage: {crew_output.token_usage}")
```
## Accessing Crew Logs
You can see real-time logs of the crew execution by setting `output_log_file` to `True` (boolean) or to a file name (string). Events can be logged as either `file_name.txt` or `file_name.json`.
If set to `True`, logs are saved as `logs.txt`.
If `output_log_file` is set to `False` or `None`, no logs are saved.
```python Code
# Save crew logs
crew = Crew(output_log_file=True)               # Logs will be saved as logs.txt
crew = Crew(output_log_file="file_name")        # Logs will be saved as file_name.txt
crew = Crew(output_log_file="file_name.txt")    # Logs will be saved as file_name.txt
crew = Crew(output_log_file="file_name.json")   # Logs will be saved as file_name.json
```
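The `FileHandler` changes later in this diff show that JSON logs are written as a list of entries, each containing a `timestamp` plus the logged key/value pairs. A hedged sketch for reading such a file back, assuming `output_log_file="crew_logs.json"` was used:

```python Code
import json

# Each entry is a dict with a "timestamp" key plus the logged fields,
# per the FileHandler implementation shown further down in this diff.
with open("crew_logs.json", "r", encoding="utf-8") as f:
    entries = json.load(f)

for entry in entries:
    fields = {k: v for k, v in entry.items() if k != "timestamp"}
    print(entry["timestamp"], fields)
```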
## Memory Utilization
Crews can utilize memory (short-term, long-term, and entity memory) to enhance their execution and learning over time. This feature allows crews to store and recall execution memories, aiding in decision-making and task execution strategies.

View File

@@ -232,18 +232,18 @@ class UnstructuredExampleFlow(Flow):
def first_method(self):
# The state automatically includes an 'id' field
print(f"State ID: {self.state['id']}")
self.state['counter'] = 0
self.state['message'] = "Hello from structured flow"
self.state.message = "Hello from structured flow"
self.state.counter = 0
@listen(first_method)
def second_method(self):
self.state['counter'] += 1
self.state['message'] += " - updated"
self.state.counter += 1
self.state.message += " - updated"
@listen(second_method)
def third_method(self):
self.state['counter'] += 1
self.state['message'] += " - updated again"
self.state.counter += 1
self.state.message += " - updated again"
print(f"State after third_method: {self.state}")

View File

@@ -463,32 +463,26 @@ Learn how to get the most out of your LLM configuration:
<Accordion title="Google">
```python Code
# Option 1: Gemini accessed with an API key.
# Option 1. Gemini accessed with an API key.
# https://ai.google.dev/gemini-api/docs/api-key
GEMINI_API_KEY=<your-api-key>
# Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden.
# Option 2. Vertex AI IAM credentials for Gemini, Anthropic, and anything in the Model Garden.
# https://cloud.google.com/vertex-ai/generative-ai/docs/overview
```
Get credentials:
```python Code
import json
## GET CREDENTIALS
file_path = 'path/to/vertex_ai_service_account.json'
# Load the JSON file
with open(file_path, 'r') as file:
vertex_credentials = json.load(file)
# Convert the credentials to a JSON string
# Convert to JSON string
vertex_credentials_json = json.dumps(vertex_credentials)
```
Example usage:
```python Code
from crewai import LLM
llm = LLM(
model="gemini/gemini-1.5-pro-latest",
temperature=0.7,

View File

@@ -58,107 +58,41 @@ my_crew = Crew(
### Example: Use Custom Memory Instances, e.g. FAISS as the VectorDB
```python Code
from crewai import Crew, Process
from crewai.memory import LongTermMemory, ShortTermMemory, EntityMemory
from crewai.memory.storage import LTMSQLiteStorage, RAGStorage
from typing import List, Optional
from crewai import Crew, Agent, Task, Process
# Assemble your crew with memory capabilities
my_crew: Crew = Crew(
agents = [...],
tasks = [...],
process = Process.sequential,
memory = True,
# Long-term memory for persistent storage across sessions
long_term_memory = LongTermMemory(
my_crew = Crew(
agents=[...],
tasks=[...],
process="Process.sequential",
memory=True,
long_term_memory=EnhanceLongTermMemory(
storage=LTMSQLiteStorage(
db_path="/my_crew1/long_term_memory_storage.db"
db_path="/my_data_dir/my_crew1/long_term_memory_storage.db"
)
),
# Short-term memory for current context using RAG
short_term_memory = ShortTermMemory(
storage = RAGStorage(
embedder_config={
"provider": "openai",
"config": {
"model": 'text-embedding-3-small'
}
},
type="short_term",
path="/my_crew1/"
)
short_term_memory=EnhanceShortTermMemory(
storage=CustomRAGStorage(
crew_name="my_crew",
storage_type="short_term",
data_dir="//my_data_dir",
model=embedder["model"],
dimension=embedder["dimension"],
),
),
# Entity memory for tracking key information about entities
entity_memory = EntityMemory(
storage=RAGStorage(
embedder_config={
"provider": "openai",
"config": {
"model": 'text-embedding-3-small'
}
},
type="short_term",
path="/my_crew1/"
)
entity_memory=EnhanceEntityMemory(
storage=CustomRAGStorage(
crew_name="my_crew",
storage_type="entities",
data_dir="//my_data_dir",
model=embedder["model"],
dimension=embedder["dimension"],
),
),
verbose=True,
)
```
## Security Considerations
When configuring memory storage:
- Use environment variables for storage paths (e.g., `CREWAI_STORAGE_DIR`)
- Never hardcode sensitive information like database credentials
- Consider access permissions for storage directories
- Use relative paths when possible to maintain portability
Example using environment variables:
```python
import os
from crewai import Crew
from crewai.memory import LongTermMemory
from crewai.memory.storage import LTMSQLiteStorage
# Configure storage path using environment variable
storage_path = os.getenv("CREWAI_STORAGE_DIR", "./storage")
crew = Crew(
memory=True,
long_term_memory=LongTermMemory(
storage=LTMSQLiteStorage(
db_path="{storage_path}/memory.db".format(storage_path=storage_path)
)
)
)
```
## Configuration Examples
### Basic Memory Configuration
```python
from crewai import Crew
from crewai.memory import LongTermMemory
# Simple memory configuration
crew = Crew(memory=True) # Uses default storage locations
```
### Custom Storage Configuration
```python
from crewai import Crew
from crewai.memory import LongTermMemory
from crewai.memory.storage import LTMSQLiteStorage
# Configure custom storage paths
crew = Crew(
memory=True,
long_term_memory=LongTermMemory(
storage=LTMSQLiteStorage(db_path="./memory.db")
)
)
```
## Integrating Mem0 for Enhanced User Memory
[Mem0](https://mem0.ai/) is a self-improving memory layer for LLM applications, enabling personalized AI experiences.
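A minimal sketch of enabling it is shown below. It is consistent with the `memory_config` provider checks visible later in this diff, but the `MEM0_API_KEY` variable and the `user_id` config key are assumptions drawn from the Mem0 integration docs rather than from this change set:

```python Code
import os

from crewai import Crew, Process

os.environ["MEM0_API_KEY"] = "your-mem0-api-key"  # assumed environment variable

my_crew = Crew(
    agents=[...],   # placeholder agents/tasks, as in the examples above
    tasks=[...],
    process=Process.sequential,
    memory=True,
    # Routes short-term and entity memory through Mem0, keyed to a specific user.
    memory_config={"provider": "mem0", "config": {"user_id": "john"}},
)
```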
@@ -251,12 +185,7 @@ my_crew = Crew(
process=Process.sequential,
memory=True,
verbose=True,
embedder={
"provider": "openai",
"config": {
"model": 'text-embedding-3-small'
}
}
embedder=OpenAIEmbeddingFunction(api_key=os.getenv("OPENAI_API_KEY"), model="text-embedding-3-small"),
)
```
@@ -313,15 +242,13 @@ my_crew = Crew(
process=Process.sequential,
memory=True,
verbose=True,
embedder={
"provider": "openai",
"config": {
"api_key": "YOUR_API_KEY",
"api_base": "YOUR_API_BASE_PATH",
"api_version": "YOUR_API_VERSION",
"model_name": 'text-embedding-3-small'
}
}
embedder=OpenAIEmbeddingFunction(
api_key="YOUR_API_KEY",
api_base="YOUR_API_BASE_PATH",
api_type="azure",
api_version="YOUR_API_VERSION",
model="text-embedding-3-small"
)
)
```
@@ -337,15 +264,12 @@ my_crew = Crew(
process=Process.sequential,
memory=True,
verbose=True,
embedder={
"provider": "vertexai",
"config": {
"project_id"="YOUR_PROJECT_ID",
"region"="YOUR_REGION",
"api_key"="YOUR_API_KEY",
"model_name"="textembedding-gecko"
}
}
embedder=GoogleVertexEmbeddingFunction(
project_id="YOUR_PROJECT_ID",
region="YOUR_REGION",
api_key="YOUR_API_KEY",
model="textembedding-gecko"
)
)
```
@@ -434,33 +358,6 @@ my_crew = Crew(
)
```
### Adding Custom Embedding Function
```python Code
from crewai import Crew, Agent, Task, Process
from chromadb import Documents, EmbeddingFunction, Embeddings
# Create a custom embedding function
class CustomEmbedder(EmbeddingFunction):
def __call__(self, input: Documents) -> Embeddings:
# generate embeddings
return [1, 2, 3] # this is a dummy embedding
my_crew = Crew(
agents=[...],
tasks=[...],
process=Process.sequential,
memory=True,
verbose=True,
embedder={
"provider": "custom",
"config": {
"embedder": CustomEmbedder()
}
}
)
```
### Resetting Memory
```shell

View File

@@ -45,7 +45,6 @@ image_analyst = Agent(
# Create a task for image analysis
task = Task(
description="Analyze the product image at https://example.com/product.jpg and provide a detailed description",
expected_output="A detailed description of the product image",
agent=image_analyst
)
@@ -82,7 +81,6 @@ inspection_task = Task(
3. Compliance with standards
Provide a detailed report highlighting any issues found.
""",
expected_output="A detailed report highlighting any issues found",
agent=expert_analyst
)

View File

@@ -8,9 +8,9 @@ icon: file-pen
## Description
The `FileWriterTool` is a component of the crewai_tools package, designed to simplify the process of writing content to files with cross-platform compatibility (Windows, Linux, macOS).
The `FileWriterTool` is a component of the crewai_tools package, designed to simplify the process of writing content to files.
It is particularly useful in scenarios such as generating reports, saving logs, creating configuration files, and more.
This tool handles path differences across operating systems, supports UTF-8 encoding, and automatically creates directories if they don't exist, making it easier to organize your output reliably across different platforms.
This tool supports creating new directories if they don't exist, making it easier to organize your output.
## Installation
@@ -43,8 +43,6 @@ print(result)
## Conclusion
By integrating the `FileWriterTool` into your crews, the agents can reliably write content to files across different operating systems.
This tool is essential for tasks that require saving output data, creating structured file systems, and handling cross-platform file operations.
It's particularly recommended for Windows users who may encounter file writing issues with standard Python file operations.
By adhering to the setup and usage guidelines provided, incorporating this tool into projects is straightforward and ensures consistent file writing behavior across all platforms.
By integrating the `FileWriterTool` into your crews, the agents can execute the process of writing content to files and creating directories.
This tool is essential for tasks that require saving output data, creating structured file systems, and more. By adhering to the setup and usage guidelines provided,
incorporating this tool into projects is straightforward and efficient.
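The usage hunk itself is collapsed in this diff (only its trailing `print(result)` appears in the hunk header above). As a hedged sketch of typical usage, with parameter names assumed rather than taken from this change set:

```python Code
from crewai_tools import FileWriterTool

# Parameter names (filename, content, directory) are assumptions; check your
# installed crewai_tools version for the exact signature.
file_writer_tool = FileWriterTool()
result = file_writer_tool._run(
    filename="report.txt",
    content="Quarterly findings...",
    directory="output",
)
print(result)
```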

View File

@@ -1,6 +1,6 @@
[project]
name = "crewai"
version = "0.100.1"
version = "0.100.0"
description = "Cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks."
readme = "README.md"
requires-python = ">=3.10,<3.13"

View File

@@ -14,7 +14,7 @@ warnings.filterwarnings(
category=UserWarning,
module="pydantic.main",
)
__version__ = "0.100.1"
__version__ = "0.100.0"
__all__ = [
"Agent",
"Crew",

View File

@@ -1,7 +1,6 @@
import re
import shutil
import subprocess
from typing import Any, Dict, List, Literal, Optional, Sequence, Union
from typing import Any, Dict, List, Literal, Optional, Union
from pydantic import Field, InstanceOf, PrivateAttr, model_validator
@@ -16,6 +15,7 @@ from crewai.memory.contextual.contextual_memory import ContextualMemory
from crewai.task import Task
from crewai.tools import BaseTool
from crewai.tools.agent_tools.agent_tools import AgentTools
from crewai.tools.base_tool import Tool
from crewai.utilities import Converter, Prompts
from crewai.utilities.constants import TRAINED_AGENTS_DATA_FILE, TRAINING_DATA_FILE
from crewai.utilities.converter import generate_model_description
@@ -54,6 +54,7 @@ class Agent(BaseAgent):
llm: The language model that will run the agent.
function_calling_llm: The language model that will handle the tool calling for this agent, it overrides the crew function_calling_llm.
max_iter: Maximum number of iterations for an agent to execute a task.
memory: Whether the agent should have memory or not.
max_rpm: Maximum number of requests per minute for the agent execution to be respected.
verbose: Whether the agent execution should be in verbose mode.
allow_delegation: Whether the agent is allowed to delegate tasks to other agents.
@@ -70,6 +71,9 @@ class Agent(BaseAgent):
)
agent_ops_agent_name: str = None # type: ignore # Incompatible types in assignment (expression has type "None", variable has type "str")
agent_ops_agent_id: str = None # type: ignore # Incompatible types in assignment (expression has type "None", variable has type "str")
cache_handler: InstanceOf[CacheHandler] = Field(
default=None, description="An instance of the CacheHandler class."
)
step_callback: Optional[Any] = Field(
default=None,
description="Callback to be executed after each step of the agent execution.",
@@ -103,6 +107,10 @@ class Agent(BaseAgent):
default=True,
description="Keep messages under the context window size by summarizing content.",
)
max_iter: int = Field(
default=20,
description="Maximum number of iterations for an agent to execute a task before giving it's best answer",
)
max_retry_limit: int = Field(
default=2,
description="Maximum number of retries for an agent to execute a task when an error occurs.",
@@ -145,8 +153,7 @@ class Agent(BaseAgent):
def _set_knowledge(self):
try:
if self.knowledge_sources:
full_pattern = re.compile(r"[^a-zA-Z0-9\-_\r\n]|(\.\.)")
knowledge_agent_name = f"{re.sub(full_pattern, '_', self.role)}"
knowledge_agent_name = f"{self.role.replace(' ', '_')}"
if isinstance(self.knowledge_sources, list) and all(
isinstance(k, BaseKnowledgeSource) for k in self.knowledge_sources
):
@@ -188,15 +195,13 @@ class Agent(BaseAgent):
if task.output_json:
# schema = json.dumps(task.output_json, indent=2)
schema = generate_model_description(task.output_json)
task_prompt += "\n" + self.i18n.slice(
"formatted_task_instructions"
).format(output_format=schema)
elif task.output_pydantic:
schema = generate_model_description(task.output_pydantic)
task_prompt += "\n" + self.i18n.slice(
"formatted_task_instructions"
).format(output_format=schema)
task_prompt += "\n" + self.i18n.slice("formatted_task_instructions").format(
output_format=schema
)
if context:
task_prompt = self.i18n.slice("task_with_context").format(
@@ -324,14 +329,14 @@ class Agent(BaseAgent):
tools = agent_tools.tools()
return tools
def get_multimodal_tools(self) -> Sequence[BaseTool]:
def get_multimodal_tools(self) -> List[Tool]:
from crewai.tools.agent_tools.add_image_tool import AddImageTool
return [AddImageTool()]
def get_code_execution_tools(self):
try:
from crewai_tools import CodeInterpreterTool # type: ignore
from crewai_tools import CodeInterpreterTool
# Set the unsafe_mode based on the code_execution_mode attribute
unsafe_mode = self.code_execution_mode == "unsafe"

View File

@@ -24,7 +24,6 @@ from crewai.tools import BaseTool
from crewai.tools.base_tool import Tool
from crewai.utilities import I18N, Logger, RPMController
from crewai.utilities.config import process_config
from crewai.utilities.converter import Converter
T = TypeVar("T", bound="BaseAgent")
@@ -43,7 +42,7 @@ class BaseAgent(ABC, BaseModel):
max_rpm (Optional[int]): Maximum number of requests per minute for the agent execution.
allow_delegation (bool): Allow delegation of tasks to agents.
tools (Optional[List[Any]]): Tools at the agent's disposal.
max_iter (int): Maximum iterations for an agent to execute a task.
max_iter (Optional[int]): Maximum iterations for an agent to execute a task.
agent_executor (InstanceOf): An instance of the CrewAgentExecutor class.
llm (Any): Language model that will run the agent.
crew (Any): Crew to which the agent belongs.
@@ -115,7 +114,7 @@ class BaseAgent(ABC, BaseModel):
tools: Optional[List[Any]] = Field(
default_factory=list, description="Tools at agents' disposal"
)
max_iter: int = Field(
max_iter: Optional[int] = Field(
default=25, description="Maximum iterations for an agent to execute a task"
)
agent_executor: InstanceOf = Field(
@@ -126,12 +125,11 @@ class BaseAgent(ABC, BaseModel):
)
crew: Any = Field(default=None, description="Crew to which the agent belongs.")
i18n: I18N = Field(default=I18N(), description="Internationalization settings.")
cache_handler: Optional[InstanceOf[CacheHandler]] = Field(
cache_handler: InstanceOf[CacheHandler] = Field(
default=None, description="An instance of the CacheHandler class."
)
tools_handler: InstanceOf[ToolsHandler] = Field(
default_factory=ToolsHandler,
description="An instance of the ToolsHandler class.",
default=None, description="An instance of the ToolsHandler class."
)
max_tokens: Optional[int] = Field(
default=None, description="Maximum number of tokens for the agent's execution."
@@ -256,7 +254,7 @@ class BaseAgent(ABC, BaseModel):
@abstractmethod
def get_output_converter(
self, llm: Any, text: str, model: type[BaseModel] | None, instructions: str
) -> Converter:
):
"""Get the converter class for the agent to create json/pydantic outputs."""
pass

View File

@@ -2,7 +2,11 @@ import subprocess
import click
from crewai.cli.utils import get_crew
from crewai.knowledge.storage.knowledge_storage import KnowledgeStorage
from crewai.memory.entity.entity_memory import EntityMemory
from crewai.memory.long_term.long_term_memory import LongTermMemory
from crewai.memory.short_term.short_term_memory import ShortTermMemory
from crewai.utilities.task_output_storage_handler import TaskOutputStorageHandler
def reset_memories_command(
@@ -26,35 +30,30 @@ def reset_memories_command(
"""
try:
crew = get_crew()
if not crew:
raise ValueError("No crew found.")
if all:
crew.reset_memories(command_type="all")
ShortTermMemory().reset()
EntityMemory().reset()
LongTermMemory().reset()
TaskOutputStorageHandler().reset()
KnowledgeStorage().reset()
click.echo("All memories have been reset.")
return
else:
if long:
LongTermMemory().reset()
click.echo("Long term memory has been reset.")
if not any([long, short, entity, kickoff_outputs, knowledge]):
click.echo(
"No memory type specified. Please specify at least one type to reset."
)
return
if long:
crew.reset_memories(command_type="long")
click.echo("Long term memory has been reset.")
if short:
crew.reset_memories(command_type="short")
click.echo("Short term memory has been reset.")
if entity:
crew.reset_memories(command_type="entity")
click.echo("Entity memory has been reset.")
if kickoff_outputs:
crew.reset_memories(command_type="kickoff_outputs")
click.echo("Latest Kickoff outputs stored has been reset.")
if knowledge:
crew.reset_memories(command_type="knowledge")
click.echo("Knowledge has been reset.")
if short:
ShortTermMemory().reset()
click.echo("Short term memory has been reset.")
if entity:
EntityMemory().reset()
click.echo("Entity memory has been reset.")
if kickoff_outputs:
TaskOutputStorageHandler().reset()
click.echo("Latest Kickoff outputs stored has been reset.")
if knowledge:
KnowledgeStorage().reset()
click.echo("Knowledge has been reset.")
except subprocess.CalledProcessError as e:
click.echo(f"An error occurred while resetting the memories: {e}", err=True)

View File

@@ -5,7 +5,7 @@ description = "{{name}} using crewAI"
authors = [{ name = "Your Name", email = "you@example.com" }]
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.100.1,<1.0.0"
"crewai[tools]>=0.100.0,<1.0.0"
]
[project.scripts]

View File

@@ -5,7 +5,7 @@ description = "{{name}} using crewAI"
authors = [{ name = "Your Name", email = "you@example.com" }]
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.100.1,<1.0.0",
"crewai[tools]>=0.100.0,<1.0.0",
]
[project.scripts]

View File

@@ -5,7 +5,7 @@ description = "Power up your crews with {{folder_name}}"
readme = "README.md"
requires-python = ">=3.10,<3.13"
dependencies = [
"crewai[tools]>=0.100.1"
"crewai[tools]>=0.100.0"
]
[tool.crewai]

View File

@@ -9,7 +9,6 @@ import tomli
from rich.console import Console
from crewai.cli.constants import ENV_VARS
from crewai.crew import Crew
if sys.version_info >= (3, 11):
import tomllib
@@ -248,64 +247,3 @@ def write_env_file(folder_path, env_vars):
with open(env_file_path, "w") as file:
for key, value in env_vars.items():
file.write(f"{key}={value}\n")
def get_crew(crew_path: str = "crew.py", require: bool = False) -> Crew | None:
"""Get the crew instance from the crew.py file."""
try:
import importlib.util
import os
for root, _, files in os.walk("."):
if "crew.py" in files:
crew_path = os.path.join(root, "crew.py")
try:
spec = importlib.util.spec_from_file_location(
"crew_module", crew_path
)
if not spec or not spec.loader:
continue
module = importlib.util.module_from_spec(spec)
try:
sys.modules[spec.name] = module
spec.loader.exec_module(module)
for attr_name in dir(module):
attr = getattr(module, attr_name)
try:
if callable(attr) and hasattr(attr, "crew"):
crew_instance = attr().crew()
return crew_instance
except Exception as e:
print(f"Error processing attribute {attr_name}: {e}")
continue
except Exception as exec_error:
print(f"Error executing module: {exec_error}")
import traceback
print(f"Traceback: {traceback.format_exc()}")
except (ImportError, AttributeError) as e:
if require:
console.print(
f"Error importing crew from {crew_path}: {str(e)}",
style="bold red",
)
continue
break
if require:
console.print("No valid Crew instance found in crew.py", style="bold red")
raise SystemExit
return None
except Exception as e:
if require:
console.print(
f"Unexpected error while loading crew: {str(e)}", style="bold red"
)
raise SystemExit
return None

View File

@@ -183,9 +183,9 @@ class Crew(BaseModel):
default=None,
description="Path to the prompt json file to be used for the crew.",
)
output_log_file: Optional[Union[bool, str]] = Field(
output_log_file: Optional[str] = Field(
default=None,
description="Path to the log file to be saved",
description="output_log_file",
)
planning: Optional[bool] = Field(
default=False,
@@ -293,7 +293,7 @@ class Crew(BaseModel):
):
self.knowledge = Knowledge(
sources=self.knowledge_sources,
embedder=self.embedder,
embedder_config=self.embedder,
collection_name="crew",
)
@@ -380,22 +380,6 @@ class Crew(BaseModel):
return self
@model_validator(mode="after")
def validate_must_have_non_conditional_task(self) -> "Crew":
"""Ensure that a crew has at least one non-conditional task."""
if not self.tasks:
return self
non_conditional_count = sum(
1 for task in self.tasks if not isinstance(task, ConditionalTask)
)
if non_conditional_count == 0:
raise PydanticCustomError(
"only_conditional_tasks",
"Crew must include at least one non-conditional task",
{},
)
return self
@model_validator(mode="after")
def validate_first_task(self) -> "Crew":
"""Ensure the first task is not a ConditionalTask."""
@@ -455,8 +439,6 @@ class Crew(BaseModel):
)
return self
@property
def key(self) -> str:
source = [agent.key for agent in self.agents] + [
@@ -699,7 +681,12 @@ class Crew(BaseModel):
manager.tools = []
raise Exception("Manager agent should not have tools")
else:
self.manager_llm = create_llm(self.manager_llm)
self.manager_llm = (
getattr(self.manager_llm, "model_name", None)
or getattr(self.manager_llm, "model", None)
or getattr(self.manager_llm, "deployment_name", None)
or self.manager_llm
)
manager = Agent(
role=i18n.retrieve("hierarchical_manager_agent", "role"),
goal=i18n.retrieve("hierarchical_manager_agent", "goal"),
@@ -759,7 +746,6 @@ class Crew(BaseModel):
task, task_outputs, futures, task_index, was_replayed
)
if skipped_task_output:
task_outputs.append(skipped_task_output)
continue
if task.async_execution:
@@ -783,7 +769,7 @@ class Crew(BaseModel):
context=context,
tools=tools_for_task,
)
task_outputs.append(task_output)
task_outputs = [task_output]
self._process_task_result(task, task_output)
self._store_execution_log(task, task_output, task_index, was_replayed)
@@ -804,7 +790,7 @@ class Crew(BaseModel):
task_outputs = self._process_async_tasks(futures, was_replayed)
futures.clear()
previous_output = task_outputs[-1] if task_outputs else None
previous_output = task_outputs[task_index - 1] if task_outputs else None
if previous_output is not None and not task.should_execute(previous_output):
self._logger.log(
"debug",
@@ -926,15 +912,11 @@ class Crew(BaseModel):
)
def _create_crew_output(self, task_outputs: List[TaskOutput]) -> CrewOutput:
if not task_outputs:
raise ValueError("No task outputs available to create crew output.")
# Filter out empty outputs and get the last valid one as the main output
valid_outputs = [t for t in task_outputs if t.raw]
if not valid_outputs:
raise ValueError("No valid task outputs available to create crew output.")
final_task_output = valid_outputs[-1]
if len(task_outputs) != 1:
raise ValueError(
"Something went wrong. Kickoff should return only one task output."
)
final_task_output = task_outputs[0]
final_string_output = final_task_output.raw
self._finish_execution(final_string_output)
token_usage = self.calculate_usage_metrics()
@@ -943,7 +925,7 @@ class Crew(BaseModel):
raw=final_task_output.raw,
pydantic=final_task_output.pydantic,
json_dict=final_task_output.json_dict,
tasks_output=task_outputs,
tasks_output=[task.output for task in self.tasks if task.output],
token_usage=token_usage,
)
@@ -1147,32 +1129,20 @@ class Crew(BaseModel):
def test(
self,
n_iterations: int = 1,
n_iterations: int,
openai_model_name: Optional[str] = None,
llm: Optional[Union[str, LLM]] = None,
inputs: Optional[Dict[str, Any]] = None,
) -> None:
"""Test and evaluate the Crew with the given inputs for n iterations.
Args:
n_iterations: Number of iterations to run the test
openai_model_name: OpenAI model name to use for evaluation (deprecated)
llm: LLM instance or model name to use for evaluation
inputs: Optional dictionary of inputs to pass to the crew
"""
if not llm and not openai_model_name:
raise ValueError("Must provide either 'llm' or 'openai_model_name' parameter")
model_to_use = self._get_llm_instance(llm, openai_model_name)
"""Test and evaluate the Crew with the given inputs for n iterations concurrently using concurrent.futures."""
test_crew = self.copy()
self._test_execution_span = test_crew._telemetry.test_execution_span(
test_crew,
n_iterations,
inputs,
str(model_to_use.model),
)
evaluator = CrewEvaluator(test_crew, model_to_use)
openai_model_name, # type: ignore[arg-type]
) # type: ignore[arg-type]
evaluator = CrewEvaluator(test_crew, openai_model_name) # type: ignore[arg-type]
for i in range(1, n_iterations + 1):
evaluator.set_iteration(i)
@@ -1180,104 +1150,5 @@ class Crew(BaseModel):
evaluator.print_crew_evaluation_result()
def _get_llm_instance(self, llm: Optional[Union[str, LLM]], openai_model_name: Optional[str]) -> LLM:
"""Get an LLM instance from either llm or openai_model_name parameter.
Args:
llm: LLM instance or model name
openai_model_name: OpenAI model name (deprecated)
Returns:
LLM instance
Raises:
ValueError: If neither llm nor openai_model_name is provided
"""
model = llm if llm is not None else openai_model_name
if model is None:
raise ValueError("Must provide either 'llm' or 'openai_model_name' parameter")
if isinstance(model, str):
return LLM(model=model)
if not isinstance(model, LLM):
raise ValueError("Model must be either a string or an LLM instance")
return model
def __repr__(self):
return f"Crew(id={self.id}, process={self.process}, number_of_agents={len(self.agents)}, number_of_tasks={len(self.tasks)})"
def reset_memories(self, command_type: str) -> None:
"""Reset specific or all memories for the crew.
Args:
command_type: Type of memory to reset.
Valid options: 'long', 'short', 'entity', 'knowledge',
'kickoff_outputs', or 'all'
Raises:
ValueError: If an invalid command type is provided.
RuntimeError: If memory reset operation fails.
"""
VALID_TYPES = frozenset(
["long", "short", "entity", "knowledge", "kickoff_outputs", "all"]
)
if command_type not in VALID_TYPES:
raise ValueError(
f"Invalid command type. Must be one of: {', '.join(sorted(VALID_TYPES))}"
)
try:
if command_type == "all":
self._reset_all_memories()
else:
self._reset_specific_memory(command_type)
self._logger.log("info", f"{command_type} memory has been reset")
except Exception as e:
error_msg = f"Failed to reset {command_type} memory: {str(e)}"
self._logger.log("error", error_msg)
raise RuntimeError(error_msg) from e
def _reset_all_memories(self) -> None:
"""Reset all available memory systems."""
memory_systems = [
("short term", self._short_term_memory),
("entity", self._entity_memory),
("long term", self._long_term_memory),
("task output", self._task_output_handler),
("knowledge", self.knowledge),
]
for name, system in memory_systems:
if system is not None:
try:
system.reset()
except Exception as e:
raise RuntimeError(f"Failed to reset {name} memory") from e
def _reset_specific_memory(self, memory_type: str) -> None:
"""Reset a specific memory system.
Args:
memory_type: Type of memory to reset
Raises:
RuntimeError: If the specified memory system fails to reset
"""
reset_functions = {
"long": (self._long_term_memory, "long term"),
"short": (self._short_term_memory, "short term"),
"entity": (self._entity_memory, "entity"),
"knowledge": (self.knowledge, "knowledge"),
"kickoff_outputs": (self._task_output_handler, "task output"),
}
memory_system, name = reset_functions[memory_type]
if memory_system is None:
raise RuntimeError(f"{name} memory system is not initialized")
try:
memory_system.reset()
except Exception as e:
raise RuntimeError(f"Failed to reset {name} memory") from e

View File

@@ -600,7 +600,7 @@ class Flow(Generic[T], metaclass=FlowMeta):
```
"""
try:
if not hasattr(self, "_state"):
if not hasattr(self, '_state'):
return ""
if isinstance(self._state, dict):
@@ -706,31 +706,26 @@ class Flow(Generic[T], metaclass=FlowMeta):
inputs: Optional dictionary containing input values and potentially a state ID to restore
"""
# Handle state restoration if ID is provided in inputs
if inputs and "id" in inputs and self._persistence is not None:
restore_uuid = inputs["id"]
if inputs and 'id' in inputs and self._persistence is not None:
restore_uuid = inputs['id']
stored_state = self._persistence.load_state(restore_uuid)
# Override the id in the state if it exists in inputs
if "id" in inputs:
if 'id' in inputs:
if isinstance(self._state, dict):
self._state["id"] = inputs["id"]
self._state['id'] = inputs['id']
elif isinstance(self._state, BaseModel):
setattr(self._state, "id", inputs["id"])
setattr(self._state, 'id', inputs['id'])
if stored_state:
self._log_flow_event(
f"Loading flow state from memory for UUID: {restore_uuid}",
color="yellow",
)
self._log_flow_event(f"Loading flow state from memory for UUID: {restore_uuid}", color="yellow")
# Restore the state
self._restore_state(stored_state)
else:
self._log_flow_event(
f"No flow state found for UUID: {restore_uuid}", color="red"
)
self._log_flow_event(f"No flow state found for UUID: {restore_uuid}", color="red")
# Apply any additional inputs after restoration
filtered_inputs = {k: v for k, v in inputs.items() if k != "id"}
filtered_inputs = {k: v for k, v in inputs.items() if k != 'id'}
if filtered_inputs:
self._initialize_state(filtered_inputs)
@@ -742,11 +737,9 @@ class Flow(Generic[T], metaclass=FlowMeta):
flow_name=self.__class__.__name__,
),
)
self._log_flow_event(
f"Flow started with ID: {self.flow_id}", color="bold_magenta"
)
self._log_flow_event(f"Flow started with ID: {self.flow_id}", color="bold_magenta")
if inputs is not None and "id" not in inputs:
if inputs is not None and 'id' not in inputs:
self._initialize_state(inputs)
return asyncio.run(self.kickoff_async())
@@ -991,9 +984,7 @@ class Flow(Generic[T], metaclass=FlowMeta):
traceback.print_exc()
def _log_flow_event(
self, message: str, color: str = "yellow", level: str = "info"
) -> None:
def _log_flow_event(self, message: str, color: str = "yellow", level: str = "info") -> None:
"""Centralized logging method for flow events.
This method provides a consistent interface for logging flow-related events,

View File

@@ -67,9 +67,3 @@ class Knowledge(BaseModel):
source.add()
except Exception as e:
raise e
def reset(self) -> None:
if self.storage:
self.storage.reset()
else:
raise ValueError("Storage is not initialized.")

View File

@@ -164,7 +164,6 @@ class LLM:
self.context_window_size = 0
self.reasoning_effort = reasoning_effort
self.additional_params = kwargs
self.is_anthropic = self._is_anthropic_model(model)
litellm.drop_params = True
@@ -179,62 +178,42 @@ class LLM:
self.set_callbacks(callbacks)
self.set_env_callbacks()
def _is_anthropic_model(self, model: str) -> bool:
"""Determine if the model is from Anthropic provider.
Args:
model: The model identifier string.
Returns:
bool: True if the model is from Anthropic, False otherwise.
"""
ANTHROPIC_PREFIXES = ('anthropic/', 'claude-', 'claude/')
return any(prefix in model.lower() for prefix in ANTHROPIC_PREFIXES)
def call(
self,
messages: Union[str, List[Dict[str, str]]],
tools: Optional[List[dict]] = None,
callbacks: Optional[List[Any]] = None,
available_functions: Optional[Dict[str, Any]] = None,
) -> Union[str, Any]:
"""High-level LLM call method.
Args:
messages: Input messages for the LLM.
Can be a string or list of message dictionaries.
If string, it will be converted to a single user message.
If list, each dict must have 'role' and 'content' keys.
tools: Optional list of tool schemas for function calling.
Each tool should define its name, description, and parameters.
callbacks: Optional list of callback functions to be executed
during and after the LLM call.
available_functions: Optional dict mapping function names to callables
that can be invoked by the LLM.
) -> str:
"""
High-level llm call method that:
1) Accepts either a string or a list of messages
2) Converts string input to the required message format
3) Calls litellm.completion
4) Handles function/tool calls if any
5) Returns the final text response or tool result
Parameters:
- messages (Union[str, List[Dict[str, str]]]): The input messages for the LLM.
- If a string is provided, it will be converted into a message list with a single entry.
- If a list of dictionaries is provided, each dictionary should have 'role' and 'content' keys.
- tools (Optional[List[dict]]): A list of tool schemas for function calling.
- callbacks (Optional[List[Any]]): A list of callback functions to be executed.
- available_functions (Optional[Dict[str, Any]]): A dictionary mapping function names to actual Python functions.
Returns:
Union[str, Any]: Either a text response from the LLM (str) or
the result of a tool function call (Any).
Raises:
TypeError: If messages format is invalid
ValueError: If response format is not supported
LLMContextLengthExceededException: If input exceeds model's context limit
- str: The final text response from the LLM or the result of a tool function call.
Examples:
# Example 1: Simple string input
>>> response = llm.call("Return the name of a random city.")
>>> print(response)
"Paris"
# Example 2: Message list with system and user messages
>>> messages = [
... {"role": "system", "content": "You are a geography expert"},
... {"role": "user", "content": "What is France's capital?"}
... ]
>>> response = llm.call(messages)
>>> print(response)
"The capital of France is Paris."
---------
# Example 1: Using a string input
response = llm.call("Return the name of a random city in the world.")
print(response)
# Example 2: Using a list of messages
messages = [{"role": "user", "content": "What is the capital of France?"}]
response = llm.call(messages)
print(response)
"""
# Validate parameters before proceeding with the call.
self._validate_call_params()
@@ -242,25 +221,15 @@ class LLM:
if isinstance(messages, str):
messages = [{"role": "user", "content": messages}]
# For O1 models, system messages are not supported.
# Convert any system messages into assistant messages.
if "o1" in self.model.lower():
for message in messages:
if message.get("role") == "system":
message["role"] = "assistant"
with suppress_warnings():
if callbacks and len(callbacks) > 0:
self.set_callbacks(callbacks)
try:
# --- 1) Format messages according to provider requirements
formatted_messages = self._format_messages_for_provider(messages)
# --- 2) Prepare the parameters for the completion call
# --- 1) Prepare the parameters for the completion call
params = {
"model": self.model,
"messages": formatted_messages,
"messages": messages,
"timeout": self.timeout,
"temperature": self.temperature,
"top_p": self.top_p,
@@ -348,38 +317,6 @@ class LLM:
logging.error(f"LiteLLM call failed: {str(e)}")
raise
def _format_messages_for_provider(self, messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
"""Format messages according to provider requirements.
Args:
messages: List of message dictionaries with 'role' and 'content' keys.
Can be empty or None.
Returns:
List of formatted messages according to provider requirements.
For Anthropic models, ensures first message has 'user' role.
Raises:
TypeError: If messages is None or contains invalid message format.
"""
if messages is None:
raise TypeError("Messages cannot be None")
# Validate message format first
for msg in messages:
if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
raise TypeError("Invalid message format. Each message must be a dict with 'role' and 'content' keys")
if not self.is_anthropic:
return messages
# Anthropic requires messages to start with 'user' role
if not messages or messages[0]["role"] == "system":
# If first message is system or empty, add a placeholder user message
return [{"role": "user", "content": "."}, *messages]
return messages
def _get_custom_llm_provider(self) -> str:
"""
Derives the custom_llm_provider from the model string.

View File

@@ -1,7 +1,3 @@
from typing import Optional
from pydantic import PrivateAttr
from crewai.memory.entity.entity_memory_item import EntityMemoryItem
from crewai.memory.memory import Memory
from crewai.memory.storage.rag_storage import RAGStorage
@@ -14,15 +10,13 @@ class EntityMemory(Memory):
Inherits from the Memory class.
"""
_memory_provider: Optional[str] = PrivateAttr()
def __init__(self, crew=None, embedder_config=None, storage=None, path=None):
if crew and hasattr(crew, "memory_config") and crew.memory_config is not None:
memory_provider = crew.memory_config.get("provider")
if hasattr(crew, "memory_config") and crew.memory_config is not None:
self.memory_provider = crew.memory_config.get("provider")
else:
memory_provider = None
self.memory_provider = None
if memory_provider == "mem0":
if self.memory_provider == "mem0":
try:
from crewai.memory.storage.mem0_storage import Mem0Storage
except ImportError:
@@ -42,13 +36,11 @@ class EntityMemory(Memory):
path=path,
)
)
super().__init__(storage=storage)
self._memory_provider = memory_provider
super().__init__(storage)
def save(self, item: EntityMemoryItem) -> None: # type: ignore # BUG?: Signature of "save" incompatible with supertype "Memory"
"""Saves an entity item into the SQLite storage."""
if self._memory_provider == "mem0":
if self.memory_provider == "mem0":
data = f"""
Remember details about the following entity:
Name: {item.name}

View File

@@ -17,7 +17,7 @@ class LongTermMemory(Memory):
def __init__(self, storage=None, path=None):
if not storage:
storage = LTMSQLiteStorage(db_path=path) if path else LTMSQLiteStorage()
super().__init__(storage=storage)
super().__init__(storage)
def save(self, item: LongTermMemoryItem) -> None: # type: ignore # BUG?: Signature of "save" incompatible with supertype "Memory"
metadata = item.metadata

View File

@@ -1,19 +1,15 @@
from typing import Any, Dict, List, Optional
from pydantic import BaseModel
from crewai.memory.storage.rag_storage import RAGStorage
class Memory(BaseModel):
class Memory:
"""
Base class for memory, now supporting agent tags and generic metadata.
"""
embedder_config: Optional[Dict[str, Any]] = None
storage: Any
def __init__(self, storage: Any, **data: Any):
super().__init__(storage=storage, **data)
def __init__(self, storage: RAGStorage):
self.storage = storage
def save(
self,

View File

@@ -1,7 +1,5 @@
from typing import Any, Dict, Optional
from pydantic import PrivateAttr
from crewai.memory.memory import Memory
from crewai.memory.short_term.short_term_memory_item import ShortTermMemoryItem
from crewai.memory.storage.rag_storage import RAGStorage
@@ -16,15 +14,13 @@ class ShortTermMemory(Memory):
MemoryItem instances.
"""
_memory_provider: Optional[str] = PrivateAttr()
def __init__(self, crew=None, embedder_config=None, storage=None, path=None):
if crew and hasattr(crew, "memory_config") and crew.memory_config is not None:
memory_provider = crew.memory_config.get("provider")
if hasattr(crew, "memory_config") and crew.memory_config is not None:
self.memory_provider = crew.memory_config.get("provider")
else:
memory_provider = None
self.memory_provider = None
if memory_provider == "mem0":
if self.memory_provider == "mem0":
try:
from crewai.memory.storage.mem0_storage import Mem0Storage
except ImportError:
@@ -43,8 +39,7 @@ class ShortTermMemory(Memory):
path=path,
)
)
super().__init__(storage=storage)
self._memory_provider = memory_provider
super().__init__(storage)
def save(
self,
@@ -53,7 +48,7 @@ class ShortTermMemory(Memory):
agent: Optional[str] = None,
) -> None:
item = ShortTermMemoryItem(data=value, metadata=metadata, agent=agent)
if self._memory_provider == "mem0":
if self.memory_provider == "mem0":
item.data = f"Remember the following insights from Agent run: {item.data}"
super().save(value=item.data, metadata=item.metadata, agent=item.agent)

View File

@@ -13,7 +13,7 @@ class BaseRAGStorage(ABC):
self,
type: str,
allow_reset: bool = True,
embedder_config: Optional[Dict[str, Any]] = None,
embedder_config: Optional[Any] = None,
crew: Any = None,
):
self.type = type

View File

@@ -423,10 +423,6 @@ class Task(BaseModel):
if self.callback:
self.callback(self.output)
crew = self.agent.crew # type: ignore[union-attr]
if crew and crew.task_callback and crew.task_callback != self.callback:
crew.task_callback(self.output)
if self._execution_span:
self._telemetry.task_ended(self._execution_span, self, agent.crew)
self._execution_span = None
@@ -674,32 +670,19 @@ class Task(BaseModel):
return OutputFormat.PYDANTIC
return OutputFormat.RAW
def _save_file(self, result: Union[Dict, str, Any]) -> None:
def _save_file(self, result: Any) -> None:
"""Save task output to a file.
Note:
For cross-platform file writing, especially on Windows, consider using FileWriterTool
from the crewai_tools package:
pip install 'crewai[tools]'
from crewai_tools import FileWriterTool
Args:
result: The result to save to the file. Can be a dict or any stringifiable object.
Raises:
ValueError: If output_file is not set
RuntimeError: If there is an error writing to the file. For cross-platform
compatibility, especially on Windows, use FileWriterTool from crewai_tools
package.
RuntimeError: If there is an error writing to the file
"""
if self.output_file is None:
raise ValueError("output_file is not set.")
FILEWRITER_RECOMMENDATION = (
"For cross-platform file writing, especially on Windows, "
"use FileWriterTool from crewai_tools package."
)
try:
resolved_path = Path(self.output_file).expanduser().resolve()
directory = resolved_path.parent
@@ -715,12 +698,7 @@ class Task(BaseModel):
else:
file.write(str(result))
except (OSError, IOError) as e:
raise RuntimeError(
"\n".join([
f"Failed to save output file: {e}",
FILEWRITER_RECOMMENDATION
])
)
raise RuntimeError(f"Failed to save output file: {e}")
return None
def __repr__(self):

View File

@@ -7,11 +7,11 @@ from crewai.utilities import I18N
i18n = I18N()
class AddImageToolSchema(BaseModel):
image_url: str = Field(..., description="The URL or path of the image to add")
action: Optional[str] = Field(
default=None, description="Optional context or question about the image"
default=None,
description="Optional context or question about the image"
)
@@ -36,7 +36,10 @@ class AddImageTool(BaseTool):
"image_url": {
"url": image_url,
},
},
}
]
return {"role": "user", "content": content}
return {
"role": "user",
"content": content
}

View File

@@ -15,7 +15,7 @@
"final_answer_format": "If you don't need to use any more tools, you must give your best complete final answer, make sure it satisfies the expected criteria, use the EXACT format below:\n\n```\nThought: I now can give a great answer\nFinal Answer: my best complete final answer to the task.\n\n```",
"format_without_tools": "\nSorry, I didn't use the right format. I MUST either use a tool (among the available ones), OR give my best final answer.\nHere is the expected format I must follow:\n\n```\nQuestion: the input question you must answer\nThought: you should always think about what to do\nAction: the action to take, should be one of [{tool_names}]\nAction Input: the input to the action\nObservation: the result of the action\n```\n This Thought/Action/Action Input/Result process can repeat N times. Once I know the final answer, I must return the following format:\n\n```\nThought: I now can give a great answer\nFinal Answer: Your final answer must be the great and the most complete as possible, it must be outcome described\n\n```",
"task_with_context": "{task}\n\nThis is the context you're working with:\n{context}",
"expected_output": "\nThis is the expected criteria for your final answer: {expected_output}\nyou MUST return the actual complete content as the final answer, not a summary.",
"expected_output": "\nThis is the expect criteria for your final answer: {expected_output}\nyou MUST return the actual complete content as the final answer, not a summary.",
"human_feedback": "You got human feedback on your work, re-evaluate it and give a new Final Answer when ready.\n {human_feedback}",
"getting_input": "This is the agent's final answer: {final_answer}\n\n",
"summarizer_system_message": "You are a helpful assistant that summarizes text.",

View File

@@ -1,5 +1,5 @@
import os
from typing import Any, Dict, Optional, cast
from typing import Any, Dict, cast
from chromadb import Documents, EmbeddingFunction, Embeddings
from chromadb.api.types import validate_embedding_function
@@ -18,12 +18,11 @@ class EmbeddingConfigurator:
"bedrock": self._configure_bedrock,
"huggingface": self._configure_huggingface,
"watson": self._configure_watson,
"custom": self._configure_custom,
}
def configure_embedder(
self,
embedder_config: Optional[Dict[str, Any]] = None,
embedder_config: Dict[str, Any] | None = None,
) -> EmbeddingFunction:
"""Configures and returns an embedding function based on the provided config."""
if embedder_config is None:
@@ -31,19 +30,20 @@ class EmbeddingConfigurator:
provider = embedder_config.get("provider")
config = embedder_config.get("config", {})
model_name = config.get("model") if provider != "custom" else None
model_name = config.get("model")
if isinstance(provider, EmbeddingFunction):
try:
validate_embedding_function(provider)
return provider
except Exception as e:
raise ValueError(f"Invalid custom embedding function: {str(e)}")
if provider not in self.embedding_functions:
raise Exception(
f"Unsupported embedding provider: {provider}, supported providers: {list(self.embedding_functions.keys())}"
)
embedding_function = self.embedding_functions[provider]
return (
embedding_function(config)
if provider == "custom"
else embedding_function(config, model_name)
)
return self.embedding_functions[provider](config, model_name)
@staticmethod
def _create_default_embedding_function():
@@ -64,13 +64,6 @@ class EmbeddingConfigurator:
return OpenAIEmbeddingFunction(
api_key=config.get("api_key") or os.getenv("OPENAI_API_KEY"),
model_name=model_name,
api_base=config.get("api_base", None),
api_type=config.get("api_type", None),
api_version=config.get("api_version", None),
default_headers=config.get("default_headers", None),
dimensions=config.get("dimensions", None),
deployment_id=config.get("deployment_id", None),
organization_id=config.get("organization_id", None),
)
@staticmethod
@@ -85,10 +78,6 @@ class EmbeddingConfigurator:
api_type=config.get("api_type", "azure"),
api_version=config.get("api_version"),
model_name=model_name,
default_headers=config.get("default_headers"),
dimensions=config.get("dimensions"),
deployment_id=config.get("deployment_id"),
organization_id=config.get("organization_id"),
)
@staticmethod
@@ -111,8 +100,6 @@ class EmbeddingConfigurator:
return GoogleVertexEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
project_id=config.get("project_id"),
region=config.get("region"),
)
@staticmethod
@@ -124,7 +111,6 @@ class EmbeddingConfigurator:
return GoogleGenerativeAiEmbeddingFunction(
model_name=model_name,
api_key=config.get("api_key"),
task_type=config.get("task_type"),
)
@staticmethod
@@ -209,28 +195,3 @@ class EmbeddingConfigurator:
raise e
return WatsonEmbeddingFunction()
@staticmethod
def _configure_custom(config):
custom_embedder = config.get("embedder")
if isinstance(custom_embedder, EmbeddingFunction):
try:
validate_embedding_function(custom_embedder)
return custom_embedder
except Exception as e:
raise ValueError(f"Invalid custom embedding function: {str(e)}")
elif callable(custom_embedder):
try:
instance = custom_embedder()
if isinstance(instance, EmbeddingFunction):
validate_embedding_function(instance)
return instance
raise ValueError(
"Custom embedder does not create an EmbeddingFunction instance"
)
except Exception as e:
raise ValueError(f"Error instantiating custom embedder: {str(e)}")
else:
raise ValueError(
"Custom embedder must be an instance of `EmbeddingFunction` or a callable that creates one"
)

View File

@@ -1,5 +1,4 @@
from collections import defaultdict
from typing import Union
from pydantic import BaseModel, Field
from rich.box import HEAVY_EDGE
@@ -7,7 +6,6 @@ from rich.console import Console
from rich.table import Table
from crewai.agent import Agent
from crewai.llm import LLM
from crewai.task import Task
from crewai.tasks.task_output import TaskOutput
from crewai.telemetry import Telemetry
@@ -34,9 +32,9 @@ class CrewEvaluator:
run_execution_times: defaultdict = defaultdict(list)
iteration: int = 0
def __init__(self, crew, llm: Union[str, LLM]):
def __init__(self, crew, openai_model_name: str):
self.crew = crew
self.llm = LLM(model=llm) if isinstance(llm, str) else llm
self.openai_model_name = openai_model_name
self._telemetry = Telemetry()
self._setup_for_evaluating()
@@ -53,7 +51,7 @@ class CrewEvaluator:
),
backstory="Evaluator agent for crew evaluation with precise capabilities to evaluate the performance of the agents in the crew based on the tasks they have performed",
verbose=False,
llm=self.llm,
llm=self.openai_model_name,
)
def _evaluation_task(
@@ -183,7 +181,7 @@ class CrewEvaluator:
self.crew,
evaluation_result.pydantic.quality,
current_task.execution_duration,
self.llm.model,
self.openai_model_name,
)
self.tasks_scores[self.iteration].append(evaluation_result.pydantic.quality)
self.run_execution_times[self.iteration].append(

View File

@@ -1,64 +1,30 @@
import json
import os
import pickle
from datetime import datetime
from typing import Union
class FileHandler:
"""Handler for file operations supporting both JSON and text-based logging.
Args:
file_path (Union[bool, str]): Path to the log file or boolean flag
"""
"""take care of file operations, currently it only logs messages to a file"""
def __init__(self, file_path: Union[bool, str]):
self._initialize_path(file_path)
def _initialize_path(self, file_path: Union[bool, str]):
if file_path is True: # File path is boolean True
def __init__(self, file_path):
if isinstance(file_path, bool):
self._path = os.path.join(os.curdir, "logs.txt")
elif isinstance(file_path, str): # File path is a string
if file_path.endswith((".json", ".txt")):
self._path = file_path # No modification if the file ends with .json or .txt
else:
self._path = file_path + ".txt" # Append .txt if the file doesn't end with .json or .txt
elif isinstance(file_path, str):
self._path = file_path
else:
raise ValueError("file_path must be a string or boolean.") # Handle the case where file_path isn't valid
raise ValueError("file_path must be either a boolean or a string.")
def log(self, **kwargs):
try:
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
log_entry = {"timestamp": now, **kwargs}
now = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
message = (
f"{now}: "
+ ", ".join([f'{key}="{value}"' for key, value in kwargs.items()])
+ "\n"
)
with open(self._path, "a", encoding="utf-8") as file:
file.write(message + "\n")
if self._path.endswith(".json"):
# Append log in JSON format
with open(self._path, "a", encoding="utf-8") as file:
# If the file is empty, start with a list; else, append to it
try:
# Try reading existing content to avoid overwriting
with open(self._path, "r", encoding="utf-8") as read_file:
existing_data = json.load(read_file)
existing_data.append(log_entry)
except (json.JSONDecodeError, FileNotFoundError):
# If no valid JSON or file doesn't exist, start with an empty list
existing_data = [log_entry]
with open(self._path, "w", encoding="utf-8") as write_file:
json.dump(existing_data, write_file, indent=4)
write_file.write("\n")
else:
# Append log in plain text format
message = f"{now}: " + ", ".join([f"{key}=\"{value}\"" for key, value in kwargs.items()]) + "\n"
with open(self._path, "a", encoding="utf-8") as file:
file.write(message)
except Exception as e:
raise ValueError(f"Failed to log message: {str(e)}")
class PickleHandler:
def __init__(self, file_name: str) -> None:
"""

View File

@@ -1183,7 +1183,7 @@ def test_agent_max_retry_limit():
[
mock.call(
{
"input": "Say the word: Hi\n\nThis is the expected criteria for your final answer: The word: Hi\nyou MUST return the actual complete content as the final answer, not a summary.",
"input": "Say the word: Hi\n\nThis is the expect criteria for your final answer: The word: Hi\nyou MUST return the actual complete content as the final answer, not a summary.",
"tool_names": "",
"tools": "",
"ask_for_human_input": True,
@@ -1191,7 +1191,7 @@ def test_agent_max_retry_limit():
),
mock.call(
{
"input": "Say the word: Hi\n\nThis is the expected criteria for your final answer: The word: Hi\nyou MUST return the actual complete content as the final answer, not a summary.",
"input": "Say the word: Hi\n\nThis is the expect criteria for your final answer: The word: Hi\nyou MUST return the actual complete content as the final answer, not a summary.",
"tool_names": "",
"tools": "",
"ask_for_human_input": True,

View File

@@ -55,83 +55,72 @@ def test_train_invalid_string_iterations(train_crew, runner):
)
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_all_memories(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
result = runner.invoke(reset_memories, ["-a"])
@mock.patch("crewai.cli.reset_memories_command.ShortTermMemory")
@mock.patch("crewai.cli.reset_memories_command.EntityMemory")
@mock.patch("crewai.cli.reset_memories_command.LongTermMemory")
@mock.patch("crewai.cli.reset_memories_command.TaskOutputStorageHandler")
def test_reset_all_memories(
MockTaskOutputStorageHandler,
MockLongTermMemory,
MockEntityMemory,
MockShortTermMemory,
runner,
):
result = runner.invoke(reset_memories, ["--all"])
MockShortTermMemory().reset.assert_called_once()
MockEntityMemory().reset.assert_called_once()
MockLongTermMemory().reset.assert_called_once()
MockTaskOutputStorageHandler().reset.assert_called_once()
mock_crew.reset_memories.assert_called_once_with(command_type="all")
assert result.output == "All memories have been reset.\n"
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_short_term_memories(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
@mock.patch("crewai.cli.reset_memories_command.ShortTermMemory")
def test_reset_short_term_memories(MockShortTermMemory, runner):
result = runner.invoke(reset_memories, ["-s"])
mock_crew.reset_memories.assert_called_once_with(command_type="short")
MockShortTermMemory().reset.assert_called_once()
assert result.output == "Short term memory has been reset.\n"
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_entity_memories(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
@mock.patch("crewai.cli.reset_memories_command.EntityMemory")
def test_reset_entity_memories(MockEntityMemory, runner):
result = runner.invoke(reset_memories, ["-e"])
mock_crew.reset_memories.assert_called_once_with(command_type="entity")
MockEntityMemory().reset.assert_called_once()
assert result.output == "Entity memory has been reset.\n"
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_long_term_memories(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
@mock.patch("crewai.cli.reset_memories_command.LongTermMemory")
def test_reset_long_term_memories(MockLongTermMemory, runner):
result = runner.invoke(reset_memories, ["-l"])
mock_crew.reset_memories.assert_called_once_with(command_type="long")
MockLongTermMemory().reset.assert_called_once()
assert result.output == "Long term memory has been reset.\n"
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_kickoff_outputs(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
@mock.patch("crewai.cli.reset_memories_command.TaskOutputStorageHandler")
def test_reset_kickoff_outputs(MockTaskOutputStorageHandler, runner):
result = runner.invoke(reset_memories, ["-k"])
mock_crew.reset_memories.assert_called_once_with(command_type="kickoff_outputs")
MockTaskOutputStorageHandler().reset.assert_called_once()
assert result.output == "Latest Kickoff outputs stored has been reset.\n"
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_multiple_memory_flags(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
result = runner.invoke(reset_memories, ["-s", "-l"])
# Check that reset_memories was called twice with the correct arguments
assert mock_crew.reset_memories.call_count == 2
mock_crew.reset_memories.assert_has_calls(
[mock.call(command_type="long"), mock.call(command_type="short")]
@mock.patch("crewai.cli.reset_memories_command.ShortTermMemory")
@mock.patch("crewai.cli.reset_memories_command.LongTermMemory")
def test_reset_multiple_memory_flags(MockShortTermMemory, MockLongTermMemory, runner):
result = runner.invoke(
reset_memories,
[
"-s",
"-l",
],
)
MockShortTermMemory().reset.assert_called_once()
MockLongTermMemory().reset.assert_called_once()
assert (
result.output
== "Long term memory has been reset.\nShort term memory has been reset.\n"
)
@mock.patch("crewai.cli.reset_memories_command.get_crew")
def test_reset_knowledge(mock_get_crew, runner):
mock_crew = mock.Mock()
mock_get_crew.return_value = mock_crew
result = runner.invoke(reset_memories, ["--knowledge"])
mock_crew.reset_memories.assert_called_once_with(command_type="knowledge")
assert result.output == "Knowledge has been reset.\n"
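Stripped of the mocks, both versions of these tests drive the same Click command; the sketch below mirrors the invocations above. The import location of the command object is an assumption, since the tests' own imports are not shown in this hunk.

# Sketch of exercising the reset-memories command with Click's test runner.
from click.testing import CliRunner
from crewai.cli.reset_memories_command import reset_memories  # assumed import location

runner = CliRunner()
result = runner.invoke(reset_memories, ["--all"])  # other flags above: -s, -l, -e, -k, --knowledge
print(result.output)                               # e.g. "All memories have been reset."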
def test_reset_no_memory_flags(runner):
result = runner.invoke(
reset_memories,

View File

@@ -49,39 +49,6 @@ writer = Agent(
)
def test_crew_with_only_conditional_tasks_raises_error():
"""Test that creating a crew with only conditional tasks raises an error."""
def condition_func(task_output: TaskOutput) -> bool:
return True
conditional1 = ConditionalTask(
description="Conditional task 1",
expected_output="Output 1",
agent=researcher,
condition=condition_func,
)
conditional2 = ConditionalTask(
description="Conditional task 2",
expected_output="Output 2",
agent=researcher,
condition=condition_func,
)
conditional3 = ConditionalTask(
description="Conditional task 3",
expected_output="Output 3",
agent=researcher,
condition=condition_func,
)
with pytest.raises(
pydantic_core._pydantic_core.ValidationError,
match="Crew must include at least one non-conditional task",
):
Crew(
agents=[researcher],
tasks=[conditional1, conditional2, conditional3],
)
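The condition hook used throughout these tests is simply a predicate over the previous task's TaskOutput; a minimal sketch, with the ConditionalTask import path assumed:

# Minimal sketch of a ConditionalTask gated on the previous task's raw output.
from crewai import Agent
from crewai.tasks.conditional_task import ConditionalTask  # assumed import path
from crewai.tasks.task_output import TaskOutput

agent = Agent(role="test", goal="test", backstory="test")

def has_findings(task_output: TaskOutput) -> bool:
    # Execute the follow-up task only if the prior output mentions "success".
    return "success" in task_output.raw.lower()

follow_up = ConditionalTask(
    description="Summarize the findings",
    expected_output="Short summary",
    agent=agent,
    condition=has_findings,
)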
def test_crew_config_conditional_requirement():
with pytest.raises(ValueError):
Crew(process=Process.sequential)
@@ -1950,77 +1917,6 @@ def test_task_callback_on_crew():
assert isinstance(args[0], TaskOutput)
def test_task_callback_both_on_task_and_crew():
from unittest.mock import MagicMock, patch
mock_callback_on_task = MagicMock()
mock_callback_on_crew = MagicMock()
researcher_agent = Agent(
role="Researcher",
goal="Make the best research and analysis on content about AI and AI agents",
backstory="You're an expert researcher, specialized in technology, software engineering, AI and startups. You work as a freelancer and is now working on doing research and analysis for a new customer.",
allow_delegation=False,
)
list_ideas = Task(
description="Give me a list of 5 interesting ideas to explore for na article, what makes them unique and interesting.",
expected_output="Bullet point list of 5 important events.",
agent=researcher_agent,
async_execution=True,
callback=mock_callback_on_task,
)
crew = Crew(
agents=[researcher_agent],
process=Process.sequential,
tasks=[list_ideas],
task_callback=mock_callback_on_crew,
)
with patch.object(Agent, "execute_task") as execute:
execute.return_value = "ok"
crew.kickoff()
assert list_ideas.callback is not None
mock_callback_on_task.assert_called_once_with(list_ideas.output)
mock_callback_on_crew.assert_called_once_with(list_ideas.output)
def test_task_same_callback_both_on_task_and_crew():
from unittest.mock import MagicMock, patch
mock_callback = MagicMock()
researcher_agent = Agent(
role="Researcher",
goal="Make the best research and analysis on content about AI and AI agents",
backstory="You're an expert researcher, specialized in technology, software engineering, AI and startups. You work as a freelancer and is now working on doing research and analysis for a new customer.",
allow_delegation=False,
)
list_ideas = Task(
description="Give me a list of 5 interesting ideas to explore for na article, what makes them unique and interesting.",
expected_output="Bullet point list of 5 important events.",
agent=researcher_agent,
async_execution=True,
callback=mock_callback,
)
crew = Crew(
agents=[researcher_agent],
process=Process.sequential,
tasks=[list_ideas],
task_callback=mock_callback,
)
with patch.object(Agent, "execute_task") as execute:
execute.return_value = "ok"
crew.kickoff()
assert list_ideas.callback is not None
mock_callback.assert_called_once_with(list_ideas.output)
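Outside the mocks, the callback surface exercised by the two removed tests is a callable that receives a TaskOutput, attachable per task and crew-wide; a sketch under that reading:

# Sketch of attaching the same callback at both the task and crew level,
# mirroring what the mocked tests above assert.
from crewai import Agent, Crew, Process, Task
from crewai.tasks.task_output import TaskOutput

def on_task_done(output: TaskOutput) -> None:
    print(f"[task done] {output.description}: {output.raw[:60]}")

agent = Agent(role="Researcher", goal="Research AI topics", backstory="Experienced researcher")
task = Task(
    description="List 5 article ideas",
    expected_output="Bullet point list of 5 ideas",
    agent=agent,
    callback=on_task_done,        # per-task callback
)
crew = Crew(
    agents=[agent],
    tasks=[task],
    process=Process.sequential,
    task_callback=on_task_done,   # crew-wide callback, fired for every task
)
# crew.kickoff() would call on_task_done twice for this single task,
# once through each registration, exactly as the tests above verify.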
@pytest.mark.vcr(filter_headers=["authorization"])
def test_tools_with_custom_caching():
from unittest.mock import patch
@@ -2093,195 +1989,6 @@ def test_tools_with_custom_caching():
assert result.raw == "3"
@pytest.mark.vcr(filter_headers=["authorization"])
def test_conditional_task_uses_last_output():
"""Test that conditional tasks use the last task output for condition evaluation."""
task1 = Task(
description="First task",
expected_output="First output",
agent=researcher,
)
def condition_fails(task_output: TaskOutput) -> bool:
# This condition will never be met
return "never matches" in task_output.raw.lower()
def condition_succeeds(task_output: TaskOutput) -> bool:
# This condition will match first task's output
return "first success" in task_output.raw.lower()
conditional_task1 = ConditionalTask(
description="Second task - conditional that fails condition",
expected_output="Second output",
agent=researcher,
condition=condition_fails,
)
conditional_task2 = ConditionalTask(
description="Third task - conditional that succeeds using first task output",
expected_output="Third output",
agent=writer,
condition=condition_succeeds,
)
crew = Crew(
agents=[researcher, writer],
tasks=[task1, conditional_task1, conditional_task2],
)
# Mock outputs for tasks
mock_first = TaskOutput(
description="First task output",
raw="First success output", # Will be used by third task's condition
agent=researcher.role,
)
mock_skipped = TaskOutput(
description="Second task output",
raw="", # Empty output since condition fails
agent=researcher.role,
)
mock_third = TaskOutput(
description="Third task output",
raw="Third task executed", # Output when condition succeeds using first task output
agent=writer.role,
)
# Set up mocks for task execution and conditional logic
with patch.object(ConditionalTask, "should_execute") as mock_should_execute:
# First conditional fails, second succeeds
mock_should_execute.side_effect = [False, True]
with patch.object(Task, "execute_sync") as mock_execute:
mock_execute.side_effect = [mock_first, mock_third]
result = crew.kickoff()
# Verify execution behavior
assert mock_execute.call_count == 2 # Only first and third tasks execute
assert mock_should_execute.call_count == 2 # Both conditionals checked
# Verify outputs collection
assert len(result.tasks_output) == 3
assert result.tasks_output[0].raw == "First success output" # First task succeeded
assert result.tasks_output[1].raw == "" # Second task skipped (condition failed)
assert result.tasks_output[2].raw == "Third task executed" # Third task used first task's output
@pytest.mark.vcr(filter_headers=["authorization"])
def test_conditional_tasks_result_collection():
"""Test that task outputs are properly collected based on execution status."""
task1 = Task(
description="Normal task that always executes",
expected_output="First output",
agent=researcher,
)
def condition_never_met(task_output: TaskOutput) -> bool:
return "never matches" in task_output.raw.lower()
def condition_always_met(task_output: TaskOutput) -> bool:
return "success" in task_output.raw.lower()
task2 = ConditionalTask(
description="Conditional task that never executes",
expected_output="Second output",
agent=researcher,
condition=condition_never_met,
)
task3 = ConditionalTask(
description="Conditional task that always executes",
expected_output="Third output",
agent=writer,
condition=condition_always_met,
)
crew = Crew(
agents=[researcher, writer],
tasks=[task1, task2, task3],
)
# Mock outputs for different execution paths
mock_success = TaskOutput(
description="Success output",
raw="Success output", # Triggers third task's condition
agent=researcher.role,
)
mock_skipped = TaskOutput(
description="Skipped output",
raw="", # Empty output for skipped task
agent=researcher.role,
)
mock_conditional = TaskOutput(
description="Conditional output",
raw="Conditional task executed",
agent=writer.role,
)
# Set up mocks for task execution and conditional logic
with patch.object(ConditionalTask, "should_execute") as mock_should_execute:
# First conditional fails, second succeeds
mock_should_execute.side_effect = [False, True]
with patch.object(Task, "execute_sync") as mock_execute:
mock_execute.side_effect = [mock_success, mock_conditional]
result = crew.kickoff()
# Verify execution behavior
assert mock_execute.call_count == 2 # Only first and third tasks execute
assert mock_should_execute.call_count == 2 # Both conditionals checked
# Verify task output collection
assert len(result.tasks_output) == 3
assert result.tasks_output[0].raw == "Success output" # Normal task executed
assert result.tasks_output[1].raw == "" # Second task skipped
assert result.tasks_output[2].raw == "Conditional task executed" # Third task executed
@pytest.mark.vcr(filter_headers=["authorization"])
def test_multiple_conditional_tasks():
"""Test that having multiple conditional tasks in sequence works correctly."""
task1 = Task(
description="Initial research task",
expected_output="Research output",
agent=researcher,
)
def condition1(task_output: TaskOutput) -> bool:
return "success" in task_output.raw.lower()
def condition2(task_output: TaskOutput) -> bool:
return "proceed" in task_output.raw.lower()
task2 = ConditionalTask(
description="First conditional task",
expected_output="Conditional output 1",
agent=writer,
condition=condition1,
)
task3 = ConditionalTask(
description="Second conditional task",
expected_output="Conditional output 2",
agent=writer,
condition=condition2,
)
crew = Crew(
agents=[researcher, writer],
tasks=[task1, task2, task3],
)
# Mock different task outputs to test conditional logic
mock_success = TaskOutput(
description="Mock success",
raw="Success and proceed output",
agent=researcher.role,
)
# Set up mocks for task execution
with patch.object(Task, "execute_sync", return_value=mock_success) as mock_execute:
result = crew.kickoff()
# Verify all tasks were executed (no IndexError)
assert mock_execute.call_count == 3
assert len(result.tasks_output) == 3
@pytest.mark.vcr(filter_headers=["authorization"])
def test_using_contextual_memory():
from unittest.mock import patch
@@ -3306,7 +3013,8 @@ def test_conditional_should_execute():
@mock.patch("crewai.crew.CrewEvaluator")
@mock.patch("crewai.crew.Crew.copy")
def test_crew_testing_function(copy_mock, crew_evaluator_mock):
@mock.patch("crewai.crew.Crew.kickoff")
def test_crew_testing_function(kickoff_mock, copy_mock, crew_evaluator):
task = Task(
description="Come up with a list of 5 interesting ideas to explore for an article, then write one amazing paragraph highlight for each idea that showcases how good an article about this topic could be. Return the list of ideas with their paragraph and your notes.",
expected_output="5 bullet points with a paragraph for each idea.",
@@ -3318,28 +3026,25 @@ def test_crew_testing_function(copy_mock, crew_evaluator_mock):
tasks=[task],
)
# Create a mock for the copied crew with a mock kickoff method
copied_crew = MagicMock()
copy_mock.return_value = copied_crew
# Create a mock for the CrewEvaluator instance
evaluator_instance = MagicMock()
crew_evaluator_mock.return_value = evaluator_instance
# Create a mock for the copied crew
copy_mock.return_value = crew
n_iterations = 2
crew.test(n_iterations, openai_model_name="gpt-4o-mini", inputs={"topic": "AI"})
# Ensure kickoff is called on the copied crew
copied_crew.kickoff.assert_has_calls(
kickoff_mock.assert_has_calls(
[mock.call(inputs={"topic": "AI"}), mock.call(inputs={"topic": "AI"})]
)
# Verify CrewEvaluator interactions
# We don't check the exact LLM object since it's created internally
assert len(crew_evaluator_mock.mock_calls) == 4
assert crew_evaluator_mock.mock_calls[1] == mock.call().set_iteration(1)
assert crew_evaluator_mock.mock_calls[2] == mock.call().set_iteration(2)
assert crew_evaluator_mock.mock_calls[3] == mock.call().print_crew_evaluation_result()
crew_evaluator.assert_has_calls(
[
mock.call(crew, "gpt-4o-mini"),
mock.call().set_iteration(1),
mock.call().set_iteration(2),
mock.call().print_crew_evaluation_result(),
]
)
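Behind the mocks, the behaviour under test is the crew.test entry point: copy the crew, kick it off n_iterations times, and hand the results to CrewEvaluator. A sketch of the call as it appears in this test; running it for real performs LLM calls, and the evaluator-model keyword differs between the two sides of this compare.

# Sketch of the crew.test(...) call exercised above (real runs hit the LLM API).
from crewai import Agent, Crew, Task

agent = Agent(role="Researcher", goal="Research AI topics", backstory="Experienced researcher")
task = Task(
    description="Come up with 5 article ideas, each with a highlight paragraph.",
    expected_output="5 bullet points with a paragraph for each idea.",
    agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])

# Runs the crew twice with the same inputs and prints an evaluation table.
crew.test(n_iterations=2, openai_model_name="gpt-4o-mini", inputs={"topic": "AI"})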
@pytest.mark.vcr(filter_headers=["authorization"])

View File

@@ -286,79 +286,6 @@ def test_o3_mini_reasoning_effort_medium():
@pytest.mark.vcr(filter_headers=["authorization"])
@pytest.fixture
def anthropic_llm():
"""Fixture providing an Anthropic LLM instance."""
return LLM(model="anthropic/claude-3-sonnet")
@pytest.fixture
def system_message():
"""Fixture providing a system message."""
return {"role": "system", "content": "test"}
@pytest.fixture
def user_message():
"""Fixture providing a user message."""
return {"role": "user", "content": "test"}
def test_anthropic_message_formatting_edge_cases(anthropic_llm):
"""Test edge cases for Anthropic message formatting."""
# Test None messages
with pytest.raises(TypeError, match="Messages cannot be None"):
anthropic_llm._format_messages_for_provider(None)
# Test empty message list
formatted = anthropic_llm._format_messages_for_provider([])
assert len(formatted) == 1
assert formatted[0]["role"] == "user"
assert formatted[0]["content"] == "."
# Test invalid message format
with pytest.raises(TypeError, match="Invalid message format"):
anthropic_llm._format_messages_for_provider([{"invalid": "message"}])
def test_anthropic_model_detection():
"""Test Anthropic model detection with various formats."""
models = [
("anthropic/claude-3", True),
("claude-instant", True),
("claude/v1", True),
("gpt-4", False),
("", False),
("anthropomorphic", False), # Should not match partial words
]
for model, expected in models:
llm = LLM(model=model)
assert llm.is_anthropic == expected, f"Failed for model: {model}"
def test_anthropic_message_formatting(anthropic_llm, system_message, user_message):
"""Test Anthropic message formatting with fixtures."""
# Test when first message is system
formatted = anthropic_llm._format_messages_for_provider([system_message])
assert len(formatted) == 2
assert formatted[0]["role"] == "user"
assert formatted[0]["content"] == "."
assert formatted[1] == system_message
# Test when first message is already user
formatted = anthropic_llm._format_messages_for_provider([user_message])
assert len(formatted) == 1
assert formatted[0] == user_message
# Test with empty message list
formatted = anthropic_llm._format_messages_for_provider([])
assert len(formatted) == 1
assert formatted[0]["role"] == "user"
assert formatted[0]["content"] == "."
# Test with non-Anthropic model (should not modify messages)
non_anthropic_llm = LLM(model="gpt-4")
formatted = non_anthropic_llm._format_messages_for_provider([system_message])
assert len(formatted) == 1
assert formatted[0] == system_message
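The removed tests pin down a specific normalization rule for Anthropic-style models: a leading system message gets a placeholder user turn (".") inserted in front of it, an empty history becomes that single placeholder, and malformed or None input raises TypeError. The standalone function below is a hypothetical restatement of that rule, not the library's implementation, which lives in LLM._format_messages_for_provider.

# Hypothetical standalone restatement of the behaviour the tests above describe.
from typing import Dict, List, Optional

def format_messages_for_anthropic(
    messages: Optional[List[Dict[str, str]]],
) -> List[Dict[str, str]]:
    if messages is None:
        raise TypeError("Messages cannot be None")
    placeholder = {"role": "user", "content": "."}
    if not messages:
        return [placeholder]              # empty history -> single placeholder user turn
    if "role" not in messages[0] or "content" not in messages[0]:
        raise TypeError("Invalid message format")
    if messages[0]["role"] == "system":
        return [placeholder, *messages]   # first turn must come from the user
    return list(messages)

# Matches the assertions above:
assert format_messages_for_anthropic([]) == [{"role": "user", "content": "."}]
assert format_messages_for_anthropic([{"role": "system", "content": "test"}])[0]["role"] == "user"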
def test_deepseek_r1_with_open_router():
if not os.getenv("OPEN_ROUTER_API_KEY"):
pytest.skip("OPEN_ROUTER_API_KEY not set; skipping test.")

View File

@@ -1,942 +0,0 @@
interactions:
- request:
body: '{"messages": [{"role": "system", "content": "You are test. test\nYour personal
goal is: test\nTo give my best complete final answer to the task respond using
the exact following format:\n\nThought: I now can give a great answer\nFinal
Answer: Your final answer must be the great and the most complete as possible,
it must be outcome described.\n\nI MUST use these formats, my job depends on
it!"}, {"role": "user", "content": "\nCurrent Task: test\n\nThis is the expected
criteria for your final answer: test output\nyou MUST return the actual complete
content as the final answer, not a summary.\n\nBegin! This is VERY important
to you, use the tools available and give your best Final Answer, your job depends
on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '780'
content-type:
- application/json
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzAMqMWFX8hC1szIKxWNSyXm0SPFi\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141224,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"I am prepared to conduct the test efficiently.\\n\\nFinal
Answer: The test output that aligns with the given criteria is a detailed description
of the testing process, providing a thorough understanding for anyone reviewing
it. The output not only contains the raw data or results but also includes step-by-step
documentation of the process employed, thoughts and reasoning behind each step,
deviations if any from the original plan, and how these deviations impacted
the results. In addition, it captures any errors or unexpected occurrences during
the course of the test, and proposes possible explanations or solutions for
these. It is detailed yet comprehensible, catering to both technical and non-technical
audiences. It is a result of meticulous planning, diligent execution, and robust
post-test analysis, making it a complete content.\",\n \"refusal\": null\n
\ },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n
\ ],\n \"usage\": {\n \"prompt_tokens\": 149,\n \"completion_tokens\":
151,\n \"total_tokens\": 300,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n
\ \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-RAY:
- 90f7662cab11ba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:09 GMT
Server:
- cloudflare
Set-Cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
path=/; expires=Sun, 09-Feb-25 23:17:09 GMT; domain=.api.openai.com; HttpOnly;
Secure; SameSite=None
- _cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000;
path=/; domain=.api.openai.com; HttpOnly; Secure; SameSite=None
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '4585'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999822'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 10ms
x-request-id:
- req_1ba81a80018602119b871a7a42d7becf
http_version: HTTP/1.1
status_code: 200
- request:
body: !!binary |
Ct8LCiQKIgoMc2VydmljZS5uYW1lEhIKEGNyZXdBSS10ZWxlbWV0cnkStgsKEgoQY3Jld2FpLnRl
bGVtZXRyeRL5AQoQT8uAOJ+suOhFs22RW56o6BII0Ob64+TP3XQqE0NyZXcgVGVzdCBFeGVjdXRp
b24wATk1IlKguqsiGEGGgWOguqsiGEobCg5jcmV3YWlfdmVyc2lvbhIJCgcwLjEwMC4xSi4KCGNy
ZXdfa2V5EiIKIGZlYjFlMjFiMzI1NmM1OWE2NDcxNTJhZmRkNjYzMjJlSjEKB2NyZXdfaWQSJgok
M2Q1MGJkYWItZDI1NS00MjFiLThkMzMtZjZmOTAzMThhOWQwShEKCml0ZXJhdGlvbnMSAwoBMUoV
Cgptb2RlbF9uYW1lEgcKBWdwdC00egIYAYUBAAEAABKSBwoQgzrB2KxaHe9FwPNktJHbFRIIT8gM
r7rSvJUqDENyZXcgQ3JlYXRlZDABOW4kaqC6qyIYQe+yeaC6qyIYShsKDmNyZXdhaV92ZXJzaW9u
EgkKBzAuMTAwLjFKGgoOcHl0aG9uX3ZlcnNpb24SCAoGMy4xMi43Si4KCGNyZXdfa2V5EiIKIGZl
YjFlMjFiMzI1NmM1OWE2NDcxNTJhZmRkNjYzMjJlSjEKB2NyZXdfaWQSJgokM2Q1MGJkYWItZDI1
NS00MjFiLThkMzMtZjZmOTAzMThhOWQwShwKDGNyZXdfcHJvY2VzcxIMCgpzZXF1ZW50aWFsShEK
C2NyZXdfbWVtb3J5EgIQAEoaChRjcmV3X251bWJlcl9vZl90YXNrcxICGAFKGwoVY3Jld19udW1i
ZXJfb2ZfYWdlbnRzEgIYAUrFAgoLY3Jld19hZ2VudHMStQIKsgJbeyJrZXkiOiAiOTc2ZjhmNTBh
Y2NmZWJhMjIzZTQ5YzQyYjE2ZTk5ZTYiLCAiaWQiOiAiN2E3NmZjNmYtZTI5YS00MDBlLWI0NGEt
NzAyMDNlMzg1Y2RmIiwgInJvbGUiOiAidGVzdCIsICJ2ZXJib3NlPyI6IGZhbHNlLCAibWF4X2l0
ZXIiOiAyNSwgIm1heF9ycG0iOiBudWxsLCAiZnVuY3Rpb25fY2FsbGluZ19sbG0iOiAiIiwgImxs
bSI6ICJncHQtNCIsICJkZWxlZ2F0aW9uX2VuYWJsZWQ/IjogZmFsc2UsICJhbGxvd19jb2RlX2V4
ZWN1dGlvbj8iOiBmYWxzZSwgIm1heF9yZXRyeV9saW1pdCI6IDIsICJ0b29sc19uYW1lcyI6IFtd
fV1K+QEKCmNyZXdfdGFza3MS6gEK5wFbeyJrZXkiOiAiZGE5NWViZGIzNmU0Y2RmOTJkZjZhNmRk
MTZiY2VlMGUiLCAiaWQiOiAiNTcwYmJlYjQtYzkzNi00NTNkLTg2MjktYzhjMDM0ODA5NDhjIiwg
ImFzeW5jX2V4ZWN1dGlvbj8iOiBmYWxzZSwgImh1bWFuX2lucHV0PyI6IGZhbHNlLCAiYWdlbnRf
cm9sZSI6ICJ0ZXN0IiwgImFnZW50X2tleSI6ICI5NzZmOGY1MGFjY2ZlYmEyMjNlNDljNDJiMTZl
OTllNiIsICJ0b29sc19uYW1lcyI6IFtdfV16AhgBhQEAAQAAEo4CChAus8hZAJcezzXdP2XqhVyF
Egi1wnliqIdQdSoMVGFzayBDcmVhdGVkMAE5nneIoLqrIhhBKDyJoLqrIhhKLgoIY3Jld19rZXkS
IgogZmViMWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jld19pZBImCiQzZDUwYmRh
Yi1kMjU1LTQyMWItOGQzMy1mNmY5MDMxOGE5ZDBKLgoIdGFza19rZXkSIgogZGE5NWViZGIzNmU0
Y2RmOTJkZjZhNmRkMTZiY2VlMGVKMQoHdGFza19pZBImCiQ1NzBiYmViNC1jOTM2LTQ1M2QtODYy
OS1jOGMwMzQ4MDk0OGN6AhgBhQEAAQAA
headers:
Accept:
- '*/*'
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
Content-Length:
- '1506'
Content-Type:
- application/x-protobuf
User-Agent:
- OTel-OTLP-Exporter-Python/1.27.0
method: POST
uri: https://telemetry.crewai.com:4319/v1/traces
response:
body:
string: "\n\0"
headers:
Content-Length:
- '2'
Content-Type:
- application/x-protobuf
Date:
- Sun, 09 Feb 2025 22:47:09 GMT
status:
code: 200
message: OK
- request:
body: '{"messages": [{"role": "system", "content": "You are Task Execution Evaluator.
Evaluator agent for crew evaluation with precise capabilities to evaluate the
performance of the agents in the crew based on the tasks they have performed\nYour
personal goal is: Your goal is to evaluate the performance of the agents in
the crew based on the tasks they have performed using score from 1 to 10 evaluating
on completion, quality, and overall performance.\nTo give my best complete final
answer to the task respond using the exact following format:\n\nThought: I now
can give a great answer\nFinal Answer: Your final answer must be the great and
the most complete as possible, it must be outcome described.\n\nI MUST use these
formats, my job depends on it!"}, {"role": "user", "content": "\nCurrent Task:
Based on the task description and the expected output, compare and evaluate
the performance of the agents in the crew based on the Task Output they have
performed using score from 1 to 10 evaluating on completion, quality, and overall
performance.task_description: test task_expected_output: test output agent:
test agent_goal: test Task Output: The test output that aligns with the given
criteria is a detailed description of the testing process, providing a thorough
understanding for anyone reviewing it. The output not only contains the raw
data or results but also includes step-by-step documentation of the process
employed, thoughts and reasoning behind each step, deviations if any from the
original plan, and how these deviations impacted the results. In addition, it
captures any errors or unexpected occurrences during the course of the test,
and proposes possible explanations or solutions for these. It is detailed yet
comprehensible, catering to both technical and non-technical audiences. It is
a result of meticulous planning, diligent execution, and robust post-test analysis,
making it a complete content.\n\nThis is the expected criteria for your final
answer: Evaluation Score from 1 to 10 based on the performance of the agents
on the tasks\nyou MUST return the actual complete content as the final answer,
not a summary.\nEnsure your final answer contains only the content in the following
format: {\n \"quality\": float\n}\n\nEnsure the final output does not include
any code block markers like ```json or ```python.\n\nBegin! This is VERY important
to you, use the tools available and give your best Final Answer, your job depends
on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '2523'
content-type:
- application/json
cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
_cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzAMvAOSw5847reo2vh61focjnyK2\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141229,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Based on the given task output, I can
determine that the test agent has performed impressively well. Their work is
comprehensive, catering to both non-technical and technical audiences and includes
complete and detailed process documentation. Further, the way they detect and
elaborate deviances and errors shows their meticulousness and efficiency. Their
planning, execution, and analysis are sound.\\n\\nFinal Answer: The quality
of this task is admirable, paying attention to details and meticulously planning
and reasoning behind each step. Considering all these, on a scale from 1 to
10, I would rate the task performed by the test agent as follows:\\n\\n{\\n
\ \\\"quality\\\": 9.5\\n}\",\n \"refusal\": null\n },\n \"logprobs\":
null,\n \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
471,\n \"completion_tokens\": 133,\n \"total_tokens\": 604,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-Cache-Status:
- DYNAMIC
CF-RAY:
- 90f7664a581eba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:14 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '4884'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999388'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 36ms
x-request-id:
- req_0335ff13c1777c1bcdbee89879bc132c
http_version: HTTP/1.1
status_code: 200
- request:
body: !!binary |
CvsNCiQKIgoMc2VydmljZS5uYW1lEhIKEGNyZXdBSS10ZWxlbWV0cnkS0g0KEgoQY3Jld2FpLnRl
bGVtZXRyeRKZAgoQJQy0LAglHxA7Ok+n0Gmi9hIIybabc5KDQlkqG0NyZXcgSW5kaXZpZHVhbCBU
ZXN0IFJlc3VsdDABOQK5tem8qyIYQVTw0+m8qyIYShsKDmNyZXdhaV92ZXJzaW9uEgkKBzAuMTAw
LjFKLgoIY3Jld19rZXkSIgogZmViMWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jl
d19pZBImCiQzZDUwYmRhYi1kMjU1LTQyMWItOGQzMy1mNmY5MDMxOGE5ZDBKEAoHcXVhbGl0eRIF
CgM5LjVKFwoJZXhlY190aW1lEgoKCDQuODA5MTc4ShUKCm1vZGVsX25hbWUSBwoFZ3B0LTR6AhgB
hQEAAQAAEvkBChCnePSvJJg/cFeYF3HlEvyVEgh3YAewpHkssyoTQ3JldyBUZXN0IEV4ZWN1dGlv
bjABOeiBSOq8qyIYQQWTV+q8qyIYShsKDmNyZXdhaV92ZXJzaW9uEgkKBzAuMTAwLjFKLgoIY3Jl
d19rZXkSIgogZmViMWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jld19pZBImCiQ2
YzYyNmEyZi05OGRlLTQ2ODAtOWJhNC01NWVkYzdmODhiZTNKEQoKaXRlcmF0aW9ucxIDCgExShUK
Cm1vZGVsX25hbWUSBwoFZ3B0LTR6AhgBhQEAAQAAEpIHChB0Gz3vlppAjams1hbMI/RQEggPCufR
e9thfSoMQ3JldyBDcmVhdGVkMAE5X3td6ryrIhhBWNhr6ryrIhhKGwoOY3Jld2FpX3ZlcnNpb24S
CQoHMC4xMDAuMUoaCg5weXRob25fdmVyc2lvbhIICgYzLjEyLjdKLgoIY3Jld19rZXkSIgogZmVi
MWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jld19pZBImCiQ2YzYyNmEyZi05OGRl
LTQ2ODAtOWJhNC01NWVkYzdmODhiZTNKHAoMY3Jld19wcm9jZXNzEgwKCnNlcXVlbnRpYWxKEQoL
Y3Jld19tZW1vcnkSAhAAShoKFGNyZXdfbnVtYmVyX29mX3Rhc2tzEgIYAUobChVjcmV3X251bWJl
cl9vZl9hZ2VudHMSAhgBSsUCCgtjcmV3X2FnZW50cxK1AgqyAlt7ImtleSI6ICI5NzZmOGY1MGFj
Y2ZlYmEyMjNlNDljNDJiMTZlOTllNiIsICJpZCI6ICJkYTA4M2Q5ZS0xOWU5LTQyMzAtYjZmNC0y
NjlhNzM1NzViOWQiLCAicm9sZSI6ICJ0ZXN0IiwgInZlcmJvc2U/IjogZmFsc2UsICJtYXhfaXRl
ciI6IDI1LCAibWF4X3JwbSI6IG51bGwsICJmdW5jdGlvbl9jYWxsaW5nX2xsbSI6ICIiLCAibGxt
IjogImdwdC00IiwgImRlbGVnYXRpb25fZW5hYmxlZD8iOiBmYWxzZSwgImFsbG93X2NvZGVfZXhl
Y3V0aW9uPyI6IGZhbHNlLCAibWF4X3JldHJ5X2xpbWl0IjogMiwgInRvb2xzX25hbWVzIjogW119
XUr5AQoKY3Jld190YXNrcxLqAQrnAVt7ImtleSI6ICJkYTk1ZWJkYjM2ZTRjZGY5MmRmNmE2ZGQx
NmJjZWUwZSIsICJpZCI6ICJhNGUwYjM1Ny0zMDBlLTQ0MjMtYTU1My0yZTZlMWQxODg1M2MiLCAi
YXN5bmNfZXhlY3V0aW9uPyI6IGZhbHNlLCAiaHVtYW5faW5wdXQ/IjogZmFsc2UsICJhZ2VudF9y
b2xlIjogInRlc3QiLCAiYWdlbnRfa2V5IjogIjk3NmY4ZjUwYWNjZmViYTIyM2U0OWM0MmIxNmU5
OWU2IiwgInRvb2xzX25hbWVzIjogW119XXoCGAGFAQABAAASjgIKEC5jZek+sSlZP8lSwF5zTSYS
CIHcVJhIpsWuKgxUYXNrIENyZWF0ZWQwATl9vnrqvKsiGEFgjnvqvKsiGEouCghjcmV3X2tleRIi
CiBmZWIxZTIxYjMyNTZjNTlhNjQ3MTUyYWZkZDY2MzIyZUoxCgdjcmV3X2lkEiYKJDZjNjI2YTJm
LTk4ZGUtNDY4MC05YmE0LTU1ZWRjN2Y4OGJlM0ouCgh0YXNrX2tleRIiCiBkYTk1ZWJkYjM2ZTRj
ZGY5MmRmNmE2ZGQxNmJjZWUwZUoxCgd0YXNrX2lkEiYKJGE0ZTBiMzU3LTMwMGUtNDQyMy1hNTUz
LTJlNmUxZDE4ODUzY3oCGAGFAQABAAA=
headers:
Accept:
- '*/*'
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
Content-Length:
- '1790'
Content-Type:
- application/x-protobuf
User-Agent:
- OTel-OTLP-Exporter-Python/1.27.0
method: POST
uri: https://telemetry.crewai.com:4319/v1/traces
response:
body:
string: "\n\0"
headers:
Content-Length:
- '2'
Content-Type:
- application/x-protobuf
Date:
- Sun, 09 Feb 2025 22:47:14 GMT
status:
code: 200
message: OK
- request:
body: '{"messages": [{"role": "system", "content": "You are test. test\nYour personal
goal is: test\nTo give my best complete final answer to the task respond using
the exact following format:\n\nThought: I now can give a great answer\nFinal
Answer: Your final answer must be the great and the most complete as possible,
it must be outcome described.\n\nI MUST use these formats, my job depends on
it!"}, {"role": "user", "content": "\nCurrent Task: test\n\nThis is the expected
criteria for your final answer: test output\nyou MUST return the actual complete
content as the final answer, not a summary.\n\nBegin! This is VERY important
to you, use the tools available and give your best Final Answer, your job depends
on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '780'
content-type:
- application/json
cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
_cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzAN0cgAktzQnGukedPNpZsTy461c\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141234,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"I have an understanding of the task at
hand and am ready to provide an in-depth and comprehensive answer.\\n\\nFinal
Answer: As per the requirement of the task to provide a complete output, I am
returning this test output as my final answer. It is not a summary, but rather
a full and comprehensive response that fully addresses the question and expectations
set forth. Your test output is ready.\",\n \"refusal\": null\n },\n
\ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
\ \"usage\": {\n \"prompt_tokens\": 149,\n \"completion_tokens\": 77,\n
\ \"total_tokens\": 226,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n
\ \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-Cache-Status:
- DYNAMIC
CF-RAY:
- 90f76669bd9eba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:17 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '3379'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999822'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 10ms
x-request-id:
- req_977371ae262154885689d766016ed132
http_version: HTTP/1.1
status_code: 200
- request:
body: '{"messages": [{"role": "system", "content": "You are Task Execution Evaluator.
Evaluator agent for crew evaluation with precise capabilities to evaluate the
performance of the agents in the crew based on the tasks they have performed\nYour
personal goal is: Your goal is to evaluate the performance of the agents in
the crew based on the tasks they have performed using score from 1 to 10 evaluating
on completion, quality, and overall performance.\nTo give my best complete final
answer to the task respond using the exact following format:\n\nThought: I now
can give a great answer\nFinal Answer: Your final answer must be the great and
the most complete as possible, it must be outcome described.\n\nI MUST use these
formats, my job depends on it!"}, {"role": "user", "content": "\nCurrent Task:
Based on the task description and the expected output, compare and evaluate
the performance of the agents in the crew based on the Task Output they have
performed using score from 1 to 10 evaluating on completion, quality, and overall
performance.task_description: test task_expected_output: test output agent:
test agent_goal: test Task Output: As per the requirement of the task to provide
a complete output, I am returning this test output as my final answer. It is
not a summary, but rather a full and comprehensive response that fully addresses
the question and expectations set forth. Your test output is ready.\n\nThis
is the expected criteria for your final answer: Evaluation Score from 1 to 10
based on the performance of the agents on the tasks\nyou MUST return the actual
complete content as the final answer, not a summary.\nEnsure your final answer
contains only the content in the following format: {\n \"quality\": float\n}\n\nEnsure
the final output does not include any code block markers like ```json or ```python.\n\nBegin!
This is VERY important to you, use the tools available and give your best Final
Answer, your job depends on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '2017'
content-type:
- application/json
cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
_cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzAN3PPgDvH836sMJBXCVRc3im99S\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141237,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Based on the information provided, the
agent appears to have completed the task, providing an output that they have
defined as 'full and comprehensive'. It appears that the agent has attempted
to meet all the expectations of the task description and has reached the goal
of returning a 'test output' as the final answer.\\n\\nFinal Answer: Considering
these aspects, for task completion, the agent receives a 10 as they have successfully
generated an output. For quality, the agent again receives a significant score
of 10, because the full and comprehensive nature of the output matches the task's
expectations. Finally, taking into account both the completion and quality aspects,
the overall performance evaluation is also 10, recognizing the perfect alignment
between the task's expected output and the output delivered by the agent. \\nTherefore,
the final evaluation score can be summarized in the below format:\\n{\\n\\\"completion\\\":
10,\\n\\\"quality\\\": 10,\\n\\\"overall performance\\\": 10\\n}\",\n \"refusal\":
null\n },\n \"logprobs\": null,\n \"finish_reason\": \"stop\"\n
\ }\n ],\n \"usage\": {\n \"prompt_tokens\": 385,\n \"completion_tokens\":
190,\n \"total_tokens\": 575,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n
\ \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-RAY:
- 90f7667f9bfdba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:24 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '6909'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999515'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 29ms
x-request-id:
- req_3e283ae8c1cd132001ecf2d96198bbd6
http_version: HTTP/1.1
status_code: 200
- request:
body: '{"messages": [{"role": "system", "content": "You are test. test\nYour personal
goal is: test\nTo give my best complete final answer to the task respond using
the exact following format:\n\nThought: I now can give a great answer\nFinal
Answer: Your final answer must be the great and the most complete as possible,
it must be outcome described.\n\nI MUST use these formats, my job depends on
it!"}, {"role": "user", "content": "\nCurrent Task: test\n\nThis is the expected
criteria for your final answer: test output\nyou MUST return the actual complete
content as the final answer, not a summary.\n\nBegin! This is VERY important
to you, use the tools available and give your best Final Answer, your job depends
on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '780'
content-type:
- application/json
cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
_cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzANAznjrZdppuFIsRnEouHG8WuM0\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141244,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"I am ready to prepare my final answer
based on the test output criteria provided\\nFinal Answer: I have followed all
the instructions provided in the task to the best of my ability, and the outcome
of the test is as described in the final answer. It is complete, detailed, and
accurate.\",\n \"refusal\": null\n },\n \"logprobs\": null,\n
\ \"finish_reason\": \"stop\"\n }\n ],\n \"usage\": {\n \"prompt_tokens\":
149,\n \"completion_tokens\": 59,\n \"total_tokens\": 208,\n \"prompt_tokens_details\":
{\n \"cached_tokens\": 0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\":
{\n \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-Cache-Status:
- DYNAMIC
CF-RAY:
- 90f766abaaa3ba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:27 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '2533'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999822'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 10ms
x-request-id:
- req_6ea4c81627695f58de56727aa8d8cc59
http_version: HTTP/1.1
status_code: 200
- request:
body: !!binary |
CvwNCiQKIgoMc2VydmljZS5uYW1lEhIKEGNyZXdBSS10ZWxlbWV0cnkS0w0KEgoQY3Jld2FpLnRl
bGVtZXRyeRKaAgoQGJwIgEdh/Dq2y8ue+Gl/XxIInTNNpEL8yjQqG0NyZXcgSW5kaXZpZHVhbCBU
ZXN0IFJlc3VsdDABOaAPzl6/qyIYQYnY6V6/qyIYShsKDmNyZXdhaV92ZXJzaW9uEgkKBzAuMTAw
LjFKLgoIY3Jld19rZXkSIgogZmViMWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jl
d19pZBImCiQ2YzYyNmEyZi05OGRlLTQ2ODAtOWJhNC01NWVkYzdmODhiZTNKEQoHcXVhbGl0eRIG
CgQxMC4wShcKCWV4ZWNfdGltZRIKCggzLjUwMzUyNkoVCgptb2RlbF9uYW1lEgcKBWdwdC00egIY
AYUBAAEAABL5AQoQsfCFg6/ZkEo2LShWV3X+WhII4I9o90lQzxMqE0NyZXcgVGVzdCBFeGVjdXRp
b24wATmWhUlfv6siGEFeYlZfv6siGEobCg5jcmV3YWlfdmVyc2lvbhIJCgcwLjEwMC4xSi4KCGNy
ZXdfa2V5EiIKIGZlYjFlMjFiMzI1NmM1OWE2NDcxNTJhZmRkNjYzMjJlSjEKB2NyZXdfaWQSJgok
ZDU2ZjljMWEtYmRkMS00MDI3LWI1ZjctMzg1ZGVlMWU2YjljShEKCml0ZXJhdGlvbnMSAwoBMUoV
Cgptb2RlbF9uYW1lEgcKBWdwdC00egIYAYUBAAEAABKSBwoQlucrHD/mwnCU8Dl9QKzgYhIIGyix
8K7RcoAqDENyZXcgQ3JlYXRlZDABOTQxXF+/qyIYQamXaV+/qyIYShsKDmNyZXdhaV92ZXJzaW9u
EgkKBzAuMTAwLjFKGgoOcHl0aG9uX3ZlcnNpb24SCAoGMy4xMi43Si4KCGNyZXdfa2V5EiIKIGZl
YjFlMjFiMzI1NmM1OWE2NDcxNTJhZmRkNjYzMjJlSjEKB2NyZXdfaWQSJgokZDU2ZjljMWEtYmRk
MS00MDI3LWI1ZjctMzg1ZGVlMWU2YjljShwKDGNyZXdfcHJvY2VzcxIMCgpzZXF1ZW50aWFsShEK
C2NyZXdfbWVtb3J5EgIQAEoaChRjcmV3X251bWJlcl9vZl90YXNrcxICGAFKGwoVY3Jld19udW1i
ZXJfb2ZfYWdlbnRzEgIYAUrFAgoLY3Jld19hZ2VudHMStQIKsgJbeyJrZXkiOiAiOTc2ZjhmNTBh
Y2NmZWJhMjIzZTQ5YzQyYjE2ZTk5ZTYiLCAiaWQiOiAiMzcwMzA5YTQtMDU5OS00MWVlLWFiMTgt
YWE1ZmQ1Mjg2ZGQ1IiwgInJvbGUiOiAidGVzdCIsICJ2ZXJib3NlPyI6IGZhbHNlLCAibWF4X2l0
ZXIiOiAyNSwgIm1heF9ycG0iOiBudWxsLCAiZnVuY3Rpb25fY2FsbGluZ19sbG0iOiAiIiwgImxs
bSI6ICJncHQtNCIsICJkZWxlZ2F0aW9uX2VuYWJsZWQ/IjogZmFsc2UsICJhbGxvd19jb2RlX2V4
ZWN1dGlvbj8iOiBmYWxzZSwgIm1heF9yZXRyeV9saW1pdCI6IDIsICJ0b29sc19uYW1lcyI6IFtd
fV1K+QEKCmNyZXdfdGFza3MS6gEK5wFbeyJrZXkiOiAiZGE5NWViZGIzNmU0Y2RmOTJkZjZhNmRk
MTZiY2VlMGUiLCAiaWQiOiAiZTBmNDgzNjAtYzNjNS00ZGY1LThkZjEtNDg2ZTc4OWNiZWUyIiwg
ImFzeW5jX2V4ZWN1dGlvbj8iOiBmYWxzZSwgImh1bWFuX2lucHV0PyI6IGZhbHNlLCAiYWdlbnRf
cm9sZSI6ICJ0ZXN0IiwgImFnZW50X2tleSI6ICI5NzZmOGY1MGFjY2ZlYmEyMjNlNDljNDJiMTZl
OTllNiIsICJ0b29sc19uYW1lcyI6IFtdfV16AhgBhQEAAQAAEo4CChA4OLnKHp32b0EUM2g5rs+r
EgjifMpu5dQ6xCoMVGFzayBDcmVhdGVkMAE5x1d3X7+rIhhBMxF4X7+rIhhKLgoIY3Jld19rZXkS
IgogZmViMWUyMWIzMjU2YzU5YTY0NzE1MmFmZGQ2NjMyMmVKMQoHY3Jld19pZBImCiRkNTZmOWMx
YS1iZGQxLTQwMjctYjVmNy0zODVkZWUxZTZiOWNKLgoIdGFza19rZXkSIgogZGE5NWViZGIzNmU0
Y2RmOTJkZjZhNmRkMTZiY2VlMGVKMQoHdGFza19pZBImCiRlMGY0ODM2MC1jM2M1LTRkZjUtOGRm
MS00ODZlNzg5Y2JlZTJ6AhgBhQEAAQAA
headers:
Accept:
- '*/*'
Accept-Encoding:
- gzip, deflate
Connection:
- keep-alive
Content-Length:
- '1791'
Content-Type:
- application/x-protobuf
User-Agent:
- OTel-OTLP-Exporter-Python/1.27.0
method: POST
uri: https://telemetry.crewai.com:4319/v1/traces
response:
body:
string: "\n\0"
headers:
Content-Length:
- '2'
Content-Type:
- application/x-protobuf
Date:
- Sun, 09 Feb 2025 22:47:29 GMT
status:
code: 200
message: OK
- request:
body: '{"messages": [{"role": "system", "content": "You are Task Execution Evaluator.
Evaluator agent for crew evaluation with precise capabilities to evaluate the
performance of the agents in the crew based on the tasks they have performed\nYour
personal goal is: Your goal is to evaluate the performance of the agents in
the crew based on the tasks they have performed using score from 1 to 10 evaluating
on completion, quality, and overall performance.\nTo give my best complete final
answer to the task respond using the exact following format:\n\nThought: I now
can give a great answer\nFinal Answer: Your final answer must be the great and
the most complete as possible, it must be outcome described.\n\nI MUST use these
formats, my job depends on it!"}, {"role": "user", "content": "\nCurrent Task:
Based on the task description and the expected output, compare and evaluate
the performance of the agents in the crew based on the Task Output they have
performed using score from 1 to 10 evaluating on completion, quality, and overall
performance.task_description: test task_expected_output: test output agent:
test agent_goal: test Task Output: I have followed all the instructions provided
in the task to the best of my ability, and the outcome of the test is as described
in the final answer. It is complete, detailed, and accurate.\n\nThis is the
expected criteria for your final answer: Evaluation Score from 1 to 10 based
on the performance of the agents on the tasks\nyou MUST return the actual complete
content as the final answer, not a summary.\nEnsure your final answer contains
only the content in the following format: {\n \"quality\": float\n}\n\nEnsure
the final output does not include any code block markers like ```json or ```python.\n\nBegin!
This is VERY important to you, use the tools available and give your best Final
Answer, your job depends on it!\n\nThought:"}], "model": "gpt-4", "stop": ["\nObservation:"]}'
headers:
accept:
- application/json
accept-encoding:
- gzip, deflate
authorization:
- Bearer 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
connection:
- keep-alive
content-length:
- '1935'
content-type:
- application/json
cookie:
- __cf_bm=p1aGVyahvfLAvEwvbX0FMmrN5o18PpVAu2dG_dTgMSU-1739141229-1.0.1.1-_q7aCslZTr11IMFZ81VgyuqsGiqTARFPANUvBEWM_0dZdb97Py78KE1omxdNv5F1pFKoWUqA1kEF2wzQ2wz4aA;
_cfuvid=bsF0jwE67cS.ywAaQU59jKPFC03S1dvynClHm_wTQik-1739141229143-0.0.1.1-604800000
host:
- api.openai.com
user-agent:
- OpenAI/Python 1.61.0
x-stainless-arch:
- x64
x-stainless-async:
- 'false'
x-stainless-lang:
- python
x-stainless-os:
- Linux
x-stainless-package-version:
- 1.61.0
x-stainless-raw-response:
- 'true'
x-stainless-retry-count:
- '0'
x-stainless-runtime:
- CPython
x-stainless-runtime-version:
- 3.12.7
method: POST
uri: https://api.openai.com/v1/chat/completions
response:
content: "{\n \"id\": \"chatcmpl-AzANDGSiRIu1XO56ZMfLuO2SuL4l0\",\n \"object\":
\"chat.completion\",\n \"created\": 1739141247,\n \"model\": \"gpt-4-0613\",\n
\ \"choices\": [\n {\n \"index\": 0,\n \"message\": {\n \"role\":
\"assistant\",\n \"content\": \"Looking at the task output provided by
the agent, the agent has expressed a high level of confidence in their ability
to follow the instructions provided to the best of their ability. The agent
seems to have executed the task with detailed attention and accuracy.\\n\\nI
need to evaluate both the quality and the overall performance of the agent keeping
in mind the task description and the expected output. Given the agent's output
and goal, I can deduce the quality of their work as well as their overall performance.\\n\\nFinal
Answer: \\n{\\n \\\"quality\\\": 8.5\\n}\",\n \"refusal\": null\n },\n
\ \"logprobs\": null,\n \"finish_reason\": \"stop\"\n }\n ],\n
\ \"usage\": {\n \"prompt_tokens\": 372,\n \"completion_tokens\": 112,\n
\ \"total_tokens\": 484,\n \"prompt_tokens_details\": {\n \"cached_tokens\":
0,\n \"audio_tokens\": 0\n },\n \"completion_tokens_details\": {\n
\ \"reasoning_tokens\": 0,\n \"audio_tokens\": 0,\n \"accepted_prediction_tokens\":
0,\n \"rejected_prediction_tokens\": 0\n }\n },\n \"service_tier\":
\"default\",\n \"system_fingerprint\": null\n}\n"
headers:
CF-RAY:
- 90f766bc4d87ba33-SEA
Connection:
- keep-alive
Content-Encoding:
- gzip
Content-Type:
- application/json
Date:
- Sun, 09 Feb 2025 22:47:32 GMT
Server:
- cloudflare
Transfer-Encoding:
- chunked
X-Content-Type-Options:
- nosniff
access-control-expose-headers:
- X-Request-ID
alt-svc:
- h3=":443"; ma=86400
cf-cache-status:
- DYNAMIC
openai-organization:
- crewai-iuxna1
openai-processing-ms:
- '5056'
openai-version:
- '2020-10-01'
strict-transport-security:
- max-age=31536000; includeSubDomains; preload
x-ratelimit-limit-requests:
- '10000'
x-ratelimit-limit-tokens:
- '1000000'
x-ratelimit-remaining-requests:
- '9999'
x-ratelimit-remaining-tokens:
- '999535'
x-ratelimit-reset-requests:
- 6ms
x-ratelimit-reset-tokens:
- 27ms
x-request-id:
- req_ff2926b015823a70e2173c71f8d63209
http_version: HTTP/1.1
status_code: 200
version: 1

View File

@@ -1,71 +0,0 @@
from unittest.mock import MagicMock
import pytest
from crewai.agent import Agent
from crewai.crew import Crew
from crewai.llm import LLM
from crewai.task import Task
from crewai.utilities.evaluators.crew_evaluator_handler import CrewEvaluator
@pytest.mark.vcr()
def test_crew_test_with_custom_llm():
"""Test Crew.test() with both string model name and LLM instance."""
# Setup
agent = Agent(
role="test",
goal="test",
backstory="test",
llm=LLM(model="gpt-4"),
)
task = Task(
description="test",
expected_output="test output",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
# Test with string model name
crew.test(n_iterations=1, llm="gpt-4")
# Test with LLM instance
custom_llm = LLM(model="gpt-4")
crew.test(n_iterations=1, llm=custom_llm)
# Test backward compatibility
crew.test(n_iterations=1, openai_model_name="gpt-4")
# Test error when neither parameter is provided
with pytest.raises(ValueError, match="Must provide either 'llm' or 'openai_model_name' parameter"):
crew.test(n_iterations=1)
def test_crew_evaluator_with_custom_llm():
# Setup
agent = Agent(
role="test",
goal="test",
backstory="test",
llm=LLM(model="gpt-4"),
)
task = Task(
description="test",
expected_output="test output",
agent=agent,
)
crew = Crew(agents=[agent], tasks=[task])
# Test with string model name
evaluator = CrewEvaluator(crew, "gpt-4")
assert isinstance(evaluator.llm, LLM)
assert evaluator.llm.model == "gpt-4"
# Test with LLM instance
custom_llm = LLM(model="gpt-4")
evaluator = CrewEvaluator(crew, custom_llm)
assert evaluator.llm == custom_llm
# Test that evaluator agent uses the correct LLM
evaluator_agent = evaluator._evaluator_agent()
assert evaluator_agent.llm == evaluator.llm

uv.lock (generated, 2 changes)
View File

@@ -649,7 +649,7 @@ wheels = [
[[package]]
name = "crewai"
version = "0.100.1"
version = "0.100.0"
source = { editable = "." }
dependencies = [
{ name = "appdirs" },