diff --git a/docs/en/concepts/agents.mdx b/docs/en/concepts/agents.mdx index 5240c5a9f..7d0ceb9f3 100644 --- a/docs/en/concepts/agents.mdx +++ b/docs/en/concepts/agents.mdx @@ -23,6 +23,17 @@ In the CrewAI framework, an `Agent` is an autonomous unit that can: at creating content. +## When to Use Agents + +- You need role-specific reasoning and decision-making. +- You need tool-enabled execution with delegated responsibilities. +- You need reusable behavioral units across tasks and crews. + +## When Not to Use Agents + +- Deterministic business logic in plain code is sufficient. +- A static transformation without reasoning is sufficient. + CrewAI AMP includes a Visual Agent Builder that simplifies agent creation and configuration without writing code. Design your agents visually and test them in real-time. diff --git a/docs/en/concepts/crews.mdx b/docs/en/concepts/crews.mdx index 07fcfd59d..cee0db4a1 100644 --- a/docs/en/concepts/crews.mdx +++ b/docs/en/concepts/crews.mdx @@ -9,6 +9,17 @@ mode: "wide" A crew in crewAI represents a collaborative group of agents working together to achieve a set of tasks. Each crew defines the strategy for task execution, agent collaboration, and the overall workflow. +## When to Use Crews + +- You need multiple specialized agents collaborating on a shared outcome. +- You need process-level orchestration (`sequential` or `hierarchical`). +- You need task-level handoffs and context propagation. + +## When Not to Use Crews + +- A single agent can complete the work end-to-end. +- You do not need multi-step task decomposition. + ## Crew Attributes | Attribute | Parameters | Description | @@ -417,3 +428,17 @@ crewai replay -t ``` These commands let you replay from your latest kickoff tasks, still retaining context from previously executed tasks. + +## Common Failure Modes + +### Agents overlap responsibilities +- Cause: role/goal definitions are too broad. +- Fix: tighten role boundaries and task ownership. + +### Hierarchical runs stall or degrade +- Cause: weak manager configuration or unclear delegation criteria. +- Fix: define a stronger manager objective and explicit completion criteria. + +### Crew outputs are inconsistent +- Cause: expected outputs are underspecified across tasks. +- Fix: enforce structured outputs and stronger task contracts. diff --git a/docs/en/concepts/flows.mdx b/docs/en/concepts/flows.mdx index 6c39cf761..e0a9fbf75 100644 --- a/docs/en/concepts/flows.mdx +++ b/docs/en/concepts/flows.mdx @@ -19,6 +19,19 @@ Flows allow you to create structured, event-driven workflows. They provide a sea 4. **Flexible Control Flow**: Implement conditional logic, loops, and branching within your workflows. +## When to Use Flows + +- You need deterministic orchestration and branching logic. +- You need explicit state transitions across multiple steps. +- You need resumable workflows with persistence. +- You need to combine crews, direct model calls, and Python logic in one runtime. + +## When Not to Use Flows + +- A single prompt/response call is sufficient. +- A single crew kickoff with no orchestration logic is sufficient. +- You do not need stateful multi-step execution. + ## Getting Started The example below shows a realistic Flow for support-ticket triage. It demonstrates features teams use in production: typed state, routing, memory access, and persistence. @@ -767,201 +780,17 @@ This example demonstrates several key features of using Agents in flows: 3. 
**Tool Integration**: Agents can use tools (like `WebsiteSearchTool`) to enhance their capabilities. -## Adding Crews to Flows +## Multi-Crew Flows and Plotting -Creating a flow with multiple crews in CrewAI is straightforward. +Detailed build walkthroughs and project scaffolding are documented in guide pages to keep this concepts page focused. -You can generate a new CrewAI project that includes all the scaffolding needed to create a flow with multiple crews by running the following command: +- Build your first flow: [/en/guides/flows/first-flow](/en/guides/flows/first-flow) +- Master state and persistence: [/en/guides/flows/mastering-flow-state](/en/guides/flows/mastering-flow-state) +- Real-world chat-state pattern: [/en/learn/flowstate-chat-history](/en/learn/flowstate-chat-history) -```bash -crewai create flow name_of_flow -``` - -This command will generate a new CrewAI project with the necessary folder structure. The generated project includes a prebuilt crew called `poem_crew` that is already working. You can use this crew as a template by copying, pasting, and editing it to create other crews. - -### Folder Structure - -After running the `crewai create flow name_of_flow` command, you will see a folder structure similar to the following: - -| Directory/File | Description | -| :--------------------- | :----------------------------------------------------------------- | -| `name_of_flow/` | Root directory for the flow. | -| ├── `crews/` | Contains directories for specific crews. | -| │ └── `poem_crew/` | Directory for the "poem_crew" with its configurations and scripts. | -| │ ├── `config/` | Configuration files directory for the "poem_crew". | -| │ │ ├── `agents.yaml` | YAML file defining the agents for "poem_crew". | -| │ │ └── `tasks.yaml` | YAML file defining the tasks for "poem_crew". | -| │ ├── `poem_crew.py` | Script for "poem_crew" functionality. | -| ├── `tools/` | Directory for additional tools used in the flow. | -| │ └── `custom_tool.py` | Custom tool implementation. | -| ├── `main.py` | Main script for running the flow. | -| ├── `README.md` | Project description and instructions. | -| ├── `pyproject.toml` | Configuration file for project dependencies and settings. | -| └── `.gitignore` | Specifies files and directories to ignore in version control. | - -### Building Your Crews - -In the `crews` folder, you can define multiple crews. Each crew will have its own folder containing configuration files and the crew definition file. For example, the `poem_crew` folder contains: - -- `config/agents.yaml`: Defines the agents for the crew. -- `config/tasks.yaml`: Defines the tasks for the crew. -- `poem_crew.py`: Contains the crew definition, including agents, tasks, and the crew itself. - -You can copy, paste, and edit the `poem_crew` to create other crews. - -### Connecting Crews in `main.py` - -The `main.py` file is where you create your flow and connect the crews together. You can define your flow by using the `Flow` class and the decorators `@start` and `@listen` to specify the flow of execution. 
- -Here's an example of how you can connect the `poem_crew` in the `main.py` file: - -```python Code -#!/usr/bin/env python -from random import randint - -from pydantic import BaseModel -from crewai.flow.flow import Flow, listen, start -from .crews.poem_crew.poem_crew import PoemCrew - -class PoemState(BaseModel): - sentence_count: int = 1 - poem: str = "" - -class PoemFlow(Flow[PoemState]): - - @start() - def generate_sentence_count(self): - print("Generating sentence count") - self.state.sentence_count = randint(1, 5) - - @listen(generate_sentence_count) - def generate_poem(self): - print("Generating poem") - result = PoemCrew().crew().kickoff(inputs={"sentence_count": self.state.sentence_count}) - - print("Poem generated", result.raw) - self.state.poem = result.raw - - @listen(generate_poem) - def save_poem(self): - print("Saving poem") - with open("poem.txt", "w") as f: - f.write(self.state.poem) - -def kickoff(): - poem_flow = PoemFlow() - poem_flow.kickoff() - - -def plot(): - poem_flow = PoemFlow() - poem_flow.plot("PoemFlowPlot") - -if __name__ == "__main__": - kickoff() - plot() -``` - -In this example, the `PoemFlow` class defines a flow that generates a sentence count, uses the `PoemCrew` to generate a poem, and then saves the poem to a file. The flow is kicked off by calling the `kickoff()` method. The PoemFlowPlot will be generated by `plot()` method. - -![Flow Visual image](/images/crewai-flow-8.png) - -### Running the Flow - -(Optional) Before running the flow, you can install the dependencies by running: - -```bash -crewai install -``` - -Once all of the dependencies are installed, you need to activate the virtual environment by running: - -```bash -source .venv/bin/activate -``` - -After activating the virtual environment, you can run the flow by executing one of the following commands: - -```bash -crewai flow kickoff -``` - -or - -```bash -uv run kickoff -``` - -The flow will execute, and you should see the output in the console. - -## Plot Flows - -Visualizing your AI workflows can provide valuable insights into the structure and execution paths of your flows. CrewAI offers a powerful visualization tool that allows you to generate interactive plots of your flows, making it easier to understand and optimize your AI workflows. - -### What are Plots? - -Plots in CrewAI are graphical representations of your AI workflows. They display the various tasks, their connections, and the flow of data between them. This visualization helps in understanding the sequence of operations, identifying bottlenecks, and ensuring that the workflow logic aligns with your expectations. - -### How to Generate a Plot - -CrewAI provides two convenient methods to generate plots of your flows: - -#### Option 1: Using the `plot()` Method - -If you are working directly with a flow instance, you can generate a plot by calling the `plot()` method on your flow object. This method will create an HTML file containing the interactive plot of your flow. - -```python Code -# Assuming you have a flow instance -flow.plot("my_flow_plot") -``` - -This will generate a file named `my_flow_plot.html` in your current directory. You can open this file in a web browser to view the interactive plot. - -#### Option 2: Using the Command Line - -If you are working within a structured CrewAI project, you can generate a plot using the command line. This is particularly useful for larger projects where you want to visualize the entire flow setup. 
- -```bash -crewai flow plot -``` - -This command will generate an HTML file with the plot of your flow, similar to the `plot()` method. The file will be saved in your project directory, and you can open it in a web browser to explore the flow. - -### Understanding the Plot - -The generated plot will display nodes representing the tasks in your flow, with directed edges indicating the flow of execution. The plot is interactive, allowing you to zoom in and out, and hover over nodes to see additional details. - -By visualizing your flows, you can gain a clearer understanding of the workflow's structure, making it easier to debug, optimize, and communicate your AI processes to others. - -### Conclusion - -Plotting your flows is a powerful feature of CrewAI that enhances your ability to design and manage complex AI workflows. Whether you choose to use the `plot()` method or the command line, generating plots will provide you with a visual representation of your workflows, aiding in both development and presentation. - -## Next Steps - -If you're interested in exploring additional examples of flows, we have a variety of recommendations in our examples repository. Here are four specific flow examples, each showcasing unique use cases to help you match your current problem type to a specific example: - -1. **Email Auto Responder Flow**: This example demonstrates an infinite loop where a background job continually runs to automate email responses. It's a great use case for tasks that need to be performed repeatedly without manual intervention. [View Example](https://github.com/crewAIInc/crewAI-examples/tree/main/email_auto_responder_flow) - -2. **Lead Score Flow**: This flow showcases adding human-in-the-loop feedback and handling different conditional branches using the router. It's an excellent example of how to incorporate dynamic decision-making and human oversight into your workflows. [View Example](https://github.com/crewAIInc/crewAI-examples/tree/main/lead-score-flow) - -3. **Write a Book Flow**: This example excels at chaining multiple crews together, where the output of one crew is used by another. Specifically, one crew outlines an entire book, and another crew generates chapters based on the outline. Eventually, everything is connected to produce a complete book. This flow is perfect for complex, multi-step processes that require coordination between different tasks. [View Example](https://github.com/crewAIInc/crewAI-examples/tree/main/write_a_book_with_flows) - -4. **Meeting Assistant Flow**: This flow demonstrates how to broadcast one event to trigger multiple follow-up actions. For instance, after a meeting is completed, the flow can update a Trello board, send a Slack message, and save the results. It's a great example of handling multiple outcomes from a single event, making it ideal for comprehensive task management and notification systems. [View Example](https://github.com/crewAIInc/crewAI-examples/tree/main/meeting_assistant_flow) - -By exploring these examples, you can gain insights into how to leverage CrewAI Flows for various use cases, from automating repetitive tasks to managing complex, multi-step processes with dynamic decision-making and human feedback. - -Also, check out our YouTube video on how to use flows in CrewAI below! - - +For visualization: +- Use `flow.plot("my_flow_plot")` in code, or +- Use `crewai flow plot` in CLI projects. 
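+To make the in-code option concrete, here is a minimal, hypothetical sketch (the `MiniFlow` class, step names, and output filename are illustrative, not part of the API):
+
+```python
+from crewai.flow.flow import Flow, listen, start
+
+class MiniFlow(Flow):
+    @start()
+    def gather(self):
+        # Entry point: produce a value for downstream listeners.
+        return "crewai flows"
+
+    @listen(gather)
+    def summarize(self, topic):
+        # Runs after gather() and receives its return value.
+        return f"Summary of: {topic}"
+
+flow = MiniFlow()
+flow.plot("mini_flow_plot")  # writes mini_flow_plot.html to the current directory
+```
+
+Opening the generated HTML file in a browser shows an interactive graph of the flow's steps and trigger edges, which helps with debugging and with communicating the workflow to others.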
## Running Flows @@ -972,7 +801,7 @@ There are two ways to run a flow: You can run a flow programmatically by creating an instance of your flow class and calling the `kickoff()` method: ```python -flow = ExampleFlow() +flow = SupportTriageFlow() result = flow.kickoff() ``` @@ -1091,3 +920,21 @@ crewai flow kickoff ``` However, the `crewai run` command is now the preferred method as it works for both crews and flows. + +## Common Failure Modes + +### Router branch not firing +- Cause: returned label does not match a `@listen("label")` value. +- Fix: align router return strings with listener labels exactly. + +### State fields missing at runtime +- Cause: untyped dynamic fields or missing kickoff inputs. +- Fix: use typed state and validate required fields in `@start()`. + +### Prompt/token growth over time +- Cause: appending unbounded message history in state. +- Fix: apply sliding-window state and summary compaction patterns. + +### Non-idempotent retries +- Cause: side effects executed on retried steps. +- Fix: add idempotency keys/markers to state and guard external writes. diff --git a/docs/en/concepts/llms.mdx b/docs/en/concepts/llms.mdx index 1889d479d..b9c956a39 100644 --- a/docs/en/concepts/llms.mdx +++ b/docs/en/concepts/llms.mdx @@ -9,6 +9,17 @@ mode: "wide" CrewAI integrates with multiple LLM providers through provider-native SDKs, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects. +## When to Use Advanced LLM Configuration + +- You need strict control of latency, cost, and output format. +- You need model routing by task type. +- You need reproducible, policy-sensitive behavior in production. + +## When Not to Over-Configure + +- You are in early prototyping with one simple task path. +- You do not yet need structured outputs or model routing. + ## What are LLMs? @@ -202,999 +213,19 @@ reasoning_llm = LLM( This is especially useful in long-running assistants where you want conversation continuity and controllable reasoning depth. -## Provider Configuration Examples +## Provider Configuration -CrewAI supports a multitude of LLM providers, each offering unique features, authentication methods, and model capabilities. -In this section, you'll find detailed examples that help you select, configure, and optimize the LLM that best fits your project's needs. +For concept-level usage, keep provider setup minimal and explicit: - - - CrewAI provides native integration with OpenAI through the OpenAI Python SDK. +1. Set provider credentials via environment variables. +2. Pin model IDs explicitly in code or YAML. +3. Set reliability defaults (`timeout`, `max_retries`, low `temperature`) for production. - ```toml Code - # Required - OPENAI_API_KEY=sk-...
- - # Optional - OPENAI_BASE_URL= - ``` - - **Basic Usage:** - ```python Code - from crewai import LLM - - llm = LLM( - model="openai/gpt-4o", - api_key="your-api-key", # Or set OPENAI_API_KEY - temperature=0.7, - max_tokens=4000 - ) - ``` - - **Advanced Configuration:** - ```python Code - from crewai import LLM - - llm = LLM( - model="openai/gpt-4o", - api_key="your-api-key", - base_url="https://api.openai.com/v1", # Optional custom endpoint - organization="org-...", # Optional organization ID - project="proj_...", # Optional project ID - temperature=0.7, - max_tokens=4000, - max_completion_tokens=4000, # For newer models - top_p=0.9, - frequency_penalty=0.1, - presence_penalty=0.1, - stop=["END"], - seed=42, # For reproducible outputs - stream=True, # Enable streaming - timeout=60.0, # Request timeout in seconds - max_retries=3, # Maximum retry attempts - logprobs=True, # Return log probabilities - top_logprobs=5, # Number of most likely tokens - reasoning_effort="medium" # For o1 models: low, medium, high - ) - ``` - - **Structured Outputs:** - ```python Code - from pydantic import BaseModel - from crewai import LLM - - class ResponseFormat(BaseModel): - name: str - age: int - summary: str - - llm = LLM( - model="openai/gpt-4o", - ) - ``` - - **Supported Environment Variables:** - - `OPENAI_API_KEY`: Your OpenAI API key (required) - - `OPENAI_BASE_URL`: Custom base URL for OpenAI API (optional) - - **Features:** - - Native function calling support (except o1 models) - - Structured outputs with JSON schema - - Streaming support for real-time responses - - Token usage tracking - - Stop sequences support (except o1 models) - - Log probabilities for token-level insights - - Reasoning effort control for o1 models - - **Supported Models:** - - | Model | Context Window | Best For | - |---------------------|------------------|-----------------------------------------------| - | gpt-4.1 | 1M tokens | Latest model with enhanced capabilities | - | gpt-4.1-mini | 1M tokens | Efficient version with large context | - | gpt-4.1-nano | 1M tokens | Ultra-efficient variant | - | gpt-4o | 128,000 tokens | Optimized for speed and intelligence | - | gpt-4o-mini | 200,000 tokens | Cost-effective with large context | - | gpt-4-turbo | 128,000 tokens | Long-form content, document analysis | - | gpt-4 | 8,192 tokens | High-accuracy tasks, complex reasoning | - | o1 | 200,000 tokens | Advanced reasoning, complex problem-solving | - | o1-preview | 128,000 tokens | Preview of reasoning capabilities | - | o1-mini | 128,000 tokens | Efficient reasoning model | - | o3-mini | 200,000 tokens | Lightweight reasoning model | - | o4-mini | 200,000 tokens | Next-gen efficient reasoning | - - **Responses API:** - - OpenAI offers two APIs: Chat Completions (default) and the newer Responses API. The Responses API was designed from the ground up with native multimodal support—text, images, audio, and function calls are all first-class citizens. It provides better performance with reasoning models and supports additional features like auto-chaining and built-in tools. 
- - ```python Code - from crewai import LLM - - # Use the Responses API instead of Chat Completions - llm = LLM( - model="openai/gpt-4o", - api="responses", # Enable Responses API - store=True, # Store responses for multi-turn (optional) - auto_chain=True, # Auto-chain for reasoning models (optional) - ) - ``` - - **Responses API Parameters:** - - `api`: Set to `"responses"` to use the Responses API (default: `"completions"`) - - `instructions`: System-level instructions (Responses API only) - - `store`: Whether to store responses for multi-turn conversations - - `previous_response_id`: ID of previous response for multi-turn - - `include`: Additional data to include in response (e.g., `["reasoning.encrypted_content"]`) - - `builtin_tools`: List of OpenAI built-in tools: `"web_search"`, `"file_search"`, `"code_interpreter"`, `"computer_use"` - - `parse_tool_outputs`: Return structured `ResponsesAPIResult` with parsed built-in tool outputs - - `auto_chain`: Automatically track and use response IDs for multi-turn conversations - - `auto_chain_reasoning`: Track encrypted reasoning items for ZDR (Zero Data Retention) compliance - - - Use the Responses API for new projects, especially when working with reasoning models (o1, o3, o4) or when you need native multimodal support for [files](/en/concepts/files). - - - **Note:** To use OpenAI, install the required dependencies: - ```bash - uv add "crewai[openai]" - ``` - - - - Meta's Llama API provides access to Meta's family of large language models. - The API is available through the [Meta Llama API](https://llama.developer.meta.com?utm_source=partner-crewai&utm_medium=website). - Set the following environment variables in your `.env` file: - - ```toml Code - # Meta Llama API Key Configuration - LLAMA_API_KEY=LLM|your_api_key_here - ``` - - Example usage in your CrewAI project: - ```python Code - from crewai import LLM - - # Initialize Meta Llama LLM - llm = LLM( - model="meta_llama/Llama-4-Scout-17B-16E-Instruct-FP8", - temperature=0.8, - stop=["END"], - seed=42 - ) - ``` - - All models listed here https://llama.developer.meta.com/docs/models/ are supported. - - | Model ID | Input context length | Output context length | Input Modalities | Output Modalities | - | --- | --- | --- | --- | --- | - | `meta_llama/Llama-4-Scout-17B-16E-Instruct-FP8` | 128k | 4028 | Text, Image | Text | - | `meta_llama/Llama-4-Maverick-17B-128E-Instruct-FP8` | 128k | 4028 | Text, Image | Text | - | `meta_llama/Llama-3.3-70B-Instruct` | 128k | 4028 | Text | Text | - | `meta_llama/Llama-3.3-8B-Instruct` | 128k | 4028 | Text | Text | - - - - CrewAI provides native integration with Anthropic through the Anthropic Python SDK. - - ```toml Code - # Required - ANTHROPIC_API_KEY=sk-ant-... 
- ``` - - **Basic Usage:** - ```python Code - from crewai import LLM - - llm = LLM( - model="anthropic/claude-3-5-sonnet-20241022", - api_key="your-api-key", # Or set ANTHROPIC_API_KEY - max_tokens=4096 # Required for Anthropic - ) - ``` - - **Advanced Configuration:** - ```python Code - from crewai import LLM - - llm = LLM( - model="anthropic/claude-3-5-sonnet-20241022", - api_key="your-api-key", - base_url="https://api.anthropic.com", # Optional custom endpoint - temperature=0.7, - max_tokens=4096, # Required parameter - top_p=0.9, - stop_sequences=["END", "STOP"], # Anthropic uses stop_sequences - stream=True, # Enable streaming - timeout=60.0, # Request timeout in seconds - max_retries=3 # Maximum retry attempts - ) - ``` - - **Extended Thinking (Claude Sonnet 4 and Beyond):** - - CrewAI supports Anthropic's Extended Thinking feature, which allows Claude to think through problems in a more human-like way before responding. This is particularly useful for complex reasoning, analysis, and problem-solving tasks. - - ```python Code - from crewai import LLM - - # Enable extended thinking with default settings - llm = LLM( - model="anthropic/claude-sonnet-4", - thinking={"type": "enabled"}, - max_tokens=10000 - ) - - # Configure thinking with budget control - llm = LLM( - model="anthropic/claude-sonnet-4", - thinking={ - "type": "enabled", - "budget_tokens": 5000 # Limit thinking tokens - }, - max_tokens=10000 - ) - ``` - - **Thinking Configuration Options:** - - `type`: Set to `"enabled"` to activate extended thinking mode - - `budget_tokens` (optional): Maximum tokens to use for thinking (helps control costs) - - **Models Supporting Extended Thinking:** - - `claude-sonnet-4` and newer models - - `claude-3-7-sonnet` (with extended thinking capabilities) - - **When to Use Extended Thinking:** - - Complex reasoning and multi-step problem solving - - Mathematical calculations and proofs - - Code analysis and debugging - - Strategic planning and decision making - - Research and analytical tasks - - **Note:** Extended thinking consumes additional tokens but can significantly improve response quality for complex tasks. 
- - **Supported Environment Variables:** - - `ANTHROPIC_API_KEY`: Your Anthropic API key (required) - - **Features:** - - Native tool use support for Claude 3+ models - - Extended Thinking support for Claude Sonnet 4+ - - Streaming support for real-time responses - - Automatic system message handling - - Stop sequences for controlled output - - Token usage tracking - - Multi-turn tool use conversations - - **Important Notes:** - - `max_tokens` is a **required** parameter for all Anthropic models - - Claude uses `stop_sequences` instead of `stop` - - System messages are handled separately from conversation messages - - First message must be from the user (automatically handled) - - Messages must alternate between user and assistant - - **Supported Models:** - - | Model | Context Window | Best For | - |------------------------------|----------------|-----------------------------------------------| - | claude-sonnet-4 | 200,000 tokens | Latest with extended thinking capabilities | - | claude-3-7-sonnet | 200,000 tokens | Advanced reasoning and agentic tasks | - | claude-3-5-sonnet-20241022 | 200,000 tokens | Latest Sonnet with best performance | - | claude-3-5-haiku | 200,000 tokens | Fast, compact model for quick responses | - | claude-3-opus | 200,000 tokens | Most capable for complex tasks | - | claude-3-sonnet | 200,000 tokens | Balanced intelligence and speed | - | claude-3-haiku | 200,000 tokens | Fastest for simple tasks | - | claude-2.1 | 200,000 tokens | Extended context, reduced hallucinations | - | claude-2 | 100,000 tokens | Versatile model for various tasks | - | claude-instant | 100,000 tokens | Fast, cost-effective for everyday tasks | - - **Note:** To use Anthropic, install the required dependencies: - ```bash - uv add "crewai[anthropic]" - ``` - - - - CrewAI provides native integration with Google Gemini through the Google Gen AI Python SDK. - - Set your API key in your `.env` file. If you need a key, check [AI Studio](https://aistudio.google.com/apikey). - - ```toml .env - # Required (one of the following) - GOOGLE_API_KEY= - GEMINI_API_KEY= - - # For Vertex AI Express mode (API key authentication) - GOOGLE_GENAI_USE_VERTEXAI=true - GOOGLE_API_KEY= - - # For Vertex AI with service account - GOOGLE_CLOUD_PROJECT= - GOOGLE_CLOUD_LOCATION= # Defaults to us-central1 - ``` - - **Basic Usage:** - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-2.0-flash", - api_key="your-api-key", # Or set GOOGLE_API_KEY/GEMINI_API_KEY - temperature=0.7 - ) - ``` - - **Advanced Configuration:** - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-2.5-flash", - api_key="your-api-key", - temperature=0.7, - top_p=0.9, - top_k=40, # Top-k sampling parameter - max_output_tokens=8192, - stop_sequences=["END", "STOP"], - stream=True, # Enable streaming - safety_settings={ - "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE", - "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE" - } - ) - ``` - - **Vertex AI Express Mode (API Key Authentication):** - - Vertex AI Express mode allows you to use Vertex AI with simple API key authentication instead of service account credentials. This is the quickest way to get started with Vertex AI. 
- - To enable Express mode, set both environment variables in your `.env` file: - ```toml .env - GOOGLE_GENAI_USE_VERTEXAI=true - GOOGLE_API_KEY= - ``` - - Then use the LLM as usual: - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-2.0-flash", - temperature=0.7 - ) - ``` - - - To get an Express mode API key: - - New Google Cloud users: Get an [express mode API key](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey) - - Existing Google Cloud users: Get a [Google Cloud API key bound to a service account](https://cloud.google.com/docs/authentication/api-keys) - - For more details, see the [Vertex AI Express mode documentation](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/start/quickstart?usertype=apikey). - - - **Vertex AI Configuration (Service Account):** - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-1.5-pro", - project="your-gcp-project-id", - location="us-central1" # GCP region - ) - ``` - - **Supported Environment Variables:** - - `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API and Vertex AI Express mode) - - `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI (required for Express mode) - - `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI with service account) - - `GOOGLE_CLOUD_LOCATION`: GCP location (defaults to `us-central1`) - - **Features:** - - Native function calling support for Gemini 1.5+ and 2.x models - - Streaming support for real-time responses - - Multimodal capabilities (text, images, video) - - Safety settings configuration - - Support for both Gemini API and Vertex AI - - Automatic system instruction handling - - Token usage tracking - - **Gemini Models:** - - Google offers a range of powerful models optimized for different use cases. - - | Model | Context Window | Best For | - |--------------------------------|----------------|-------------------------------------------------------------------| - | gemini-2.5-flash | 1M tokens | Adaptive thinking, cost efficiency | - | gemini-2.5-pro | 1M tokens | Enhanced thinking and reasoning, multimodal understanding | - | gemini-2.0-flash | 1M tokens | Next generation features, speed, thinking | - | gemini-2.0-flash-thinking | 32,768 tokens | Advanced reasoning with thinking process | - | gemini-2.0-flash-lite | 1M tokens | Cost efficiency and low latency | - | gemini-1.5-pro | 2M tokens | Best performing, logical reasoning, coding | - | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks | - | gemini-1.5-flash-8b | 1M tokens | Fastest, most cost-efficient | - | gemini-1.0-pro | 32,768 tokens | Earlier generation model | - - **Gemma Models:** - - The Gemini API also supports [Gemma models](https://ai.google.dev/gemma/docs) hosted on Google infrastructure. - - | Model | Context Window | Best For | - |----------------|----------------|------------------------------------| - | gemma-3-1b | 32,000 tokens | Ultra-lightweight tasks | - | gemma-3-4b | 128,000 tokens | Efficient general-purpose tasks | - | gemma-3-12b | 128,000 tokens | Balanced performance and efficiency| - | gemma-3-27b | 128,000 tokens | High-performance tasks | - - **Note:** To use Google Gemini, install the required dependencies: - ```bash - uv add "crewai[google-genai]" - ``` - - The full list of models is available in the [Gemini model docs](https://ai.google.dev/gemini-api/docs/models). 
- - - Get credentials from your Google Cloud Console and save it to a JSON file, then load it with the following code: - ```python Code - import json - - file_path = 'path/to/vertex_ai_service_account.json' - - # Load the JSON file - with open(file_path, 'r') as file: - vertex_credentials = json.load(file) - - # Convert the credentials to a JSON string - vertex_credentials_json = json.dumps(vertex_credentials) - ``` - - Example usage in your CrewAI project: - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini-1.5-pro-latest", # or vertex_ai/gemini-1.5-pro-latest - temperature=0.7, - vertex_credentials=vertex_credentials_json - ) - ``` - - Google offers a range of powerful models optimized for different use cases: - - | Model | Context Window | Best For | - |--------------------------------|----------------|-------------------------------------------------------------------| - | gemini-2.5-flash-preview-04-17 | 1M tokens | Adaptive thinking, cost efficiency | - | gemini-2.5-pro-preview-05-06 | 1M tokens | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more | - | gemini-2.0-flash | 1M tokens | Next generation features, speed, thinking, and realtime streaming | - | gemini-2.0-flash-lite | 1M tokens | Cost efficiency and low latency | - | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks | - | gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks | - | gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration | - - - - CrewAI provides native integration with Azure AI Inference and Azure OpenAI through the Azure AI Inference Python SDK. - - ```toml Code - # Required - AZURE_API_KEY= - AZURE_ENDPOINT= - - # Optional - AZURE_API_VERSION= # Defaults to 2024-06-01 - ``` - - **Endpoint URL Formats:** - - For Azure OpenAI deployments: - ``` - https://.openai.azure.com/openai/deployments/ - ``` - - For Azure AI Inference endpoints: - ``` - https://.inference.azure.com - ``` - - **Basic Usage:** - ```python Code - llm = LLM( - model="azure/gpt-4", - api_key="", # Or set AZURE_API_KEY - endpoint="", - api_version="2024-06-01" - ) - ``` - - **Advanced Configuration:** - ```python Code - llm = LLM( - model="azure/gpt-4o", - temperature=0.7, - max_tokens=4000, - top_p=0.9, - frequency_penalty=0.0, - presence_penalty=0.0, - stop=["END"], - stream=True, - timeout=60.0, - max_retries=3 - ) - ``` - - **Supported Environment Variables:** - - `AZURE_API_KEY`: Your Azure API key (required) - - `AZURE_ENDPOINT`: Your Azure endpoint URL (required, also checks `AZURE_OPENAI_ENDPOINT` and `AZURE_API_BASE`) - - `AZURE_API_VERSION`: API version (optional, defaults to `2024-06-01`) - - **Features:** - - Native function calling support for Azure OpenAI models (gpt-4, gpt-4o, gpt-3.5-turbo, etc.) - - Streaming support for real-time responses - - Automatic endpoint URL validation and correction - - Comprehensive error handling with retry logic - - Token usage tracking - - **Note:** To use Azure AI Inference, install the required dependencies: - ```bash - uv add "crewai[azure-ai-inference]" - ``` - - - - CrewAI provides native integration with AWS Bedrock through the boto3 SDK using the Converse API. 
- - ```toml Code - # Required - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - - # Optional - AWS_SESSION_TOKEN= # For temporary credentials - AWS_DEFAULT_REGION= # Defaults to us-east-1 - ``` - - **Basic Usage:** - ```python Code - from crewai import LLM - - llm = LLM( - model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", - region_name="us-east-1" - ) - ``` - - **Advanced Configuration:** - ```python Code - from crewai import LLM - - llm = LLM( - model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0", - aws_access_key_id="your-access-key", # Or set AWS_ACCESS_KEY_ID - aws_secret_access_key="your-secret-key", # Or set AWS_SECRET_ACCESS_KEY - aws_session_token="your-session-token", # For temporary credentials - region_name="us-east-1", - temperature=0.7, - max_tokens=4096, - top_p=0.9, - top_k=250, # For Claude models - stop_sequences=["END", "STOP"], - stream=True, # Enable streaming - guardrail_config={ # Optional content filtering - "guardrailIdentifier": "your-guardrail-id", - "guardrailVersion": "1" - }, - additional_model_request_fields={ # Model-specific parameters - "top_k": 250 - } - ) - ``` - - **Supported Environment Variables:** - - `AWS_ACCESS_KEY_ID`: AWS access key (required) - - `AWS_SECRET_ACCESS_KEY`: AWS secret key (required) - - `AWS_SESSION_TOKEN`: AWS session token for temporary credentials (optional) - - `AWS_DEFAULT_REGION`: AWS region (defaults to `us-east-1`) - - **Features:** - - Native tool calling support via Converse API - - Streaming and non-streaming responses - - Comprehensive error handling with retry logic - - Guardrail configuration for content filtering - - Model-specific parameters via `additional_model_request_fields` - - Token usage tracking and stop reason logging - - Support for all Bedrock foundation models - - Automatic conversation format handling - - **Important Notes:** - - Uses the modern Converse API for unified model access - - Automatic handling of model-specific conversation requirements - - System messages are handled separately from conversation - - First message must be from user (automatically handled) - - Some models (like Cohere) require conversation to end with user message - - [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) is a managed service that provides access to multiple foundation models from top AI companies through a unified API. - - | Model | Context Window | Best For | - |-------------------------|----------------------|-------------------------------------------------------------------| - | Amazon Nova Pro | Up to 300k tokens | High-performance, model balancing accuracy, speed, and cost-effectiveness across diverse tasks. | - | Amazon Nova Micro | Up to 128k tokens | High-performance, cost-effective text-only model optimized for lowest latency responses. | - | Amazon Nova Lite | Up to 300k tokens | High-performance, affordable multimodal processing for images, video, and text with real-time capabilities. | - | Claude 3.7 Sonnet | Up to 128k tokens | High-performance, best for complex reasoning, coding & AI agents | - | Claude 3.5 Sonnet v2 | Up to 200k tokens | State-of-the-art model specialized in software engineering, agentic capabilities, and computer interaction at optimized cost. | - | Claude 3.5 Sonnet | Up to 200k tokens | High-performance model delivering superior intelligence and reasoning across diverse tasks with optimal speed-cost balance. 
| - | Claude 3.5 Haiku | Up to 200k tokens | Fast, compact multimodal model optimized for quick responses and seamless human-like interactions | - | Claude 3 Sonnet | Up to 200k tokens | Multimodal model balancing intelligence and speed for high-volume deployments. | - | Claude 3 Haiku | Up to 200k tokens | Compact, high-speed multimodal model optimized for quick responses and natural conversational interactions | - | Claude 3 Opus | Up to 200k tokens | Most advanced multimodal model exceling at complex tasks with human-like reasoning and superior contextual understanding. | - | Claude 2.1 | Up to 200k tokens | Enhanced version with expanded context window, improved reliability, and reduced hallucinations for long-form and RAG applications | - | Claude | Up to 100k tokens | Versatile model excelling in sophisticated dialogue, creative content, and precise instruction following. | - | Claude Instant | Up to 100k tokens | Fast, cost-effective model for everyday tasks like dialogue, analysis, summarization, and document Q&A | - | Llama 3.1 405B Instruct | Up to 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. | - | Llama 3.1 70B Instruct | Up to 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | Llama 3.1 8B Instruct | Up to 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. | - | Llama 3 70B Instruct | Up to 8k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | Llama 3 8B Instruct | Up to 8k tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | Titan Text G1 - Lite | Up to 4k tokens | Lightweight, cost-effective model optimized for English tasks and fine-tuning with focus on summarization and content generation. | - | Titan Text G1 - Express | Up to 8k tokens | Versatile model for general language tasks, chat, and RAG applications with support for English and 100+ languages. | - | Cohere Command | Up to 4k tokens | Model specialized in following user commands and delivering practical enterprise solutions. | - | Jurassic-2 Mid | Up to 8,191 tokens | Cost-effective model balancing quality and affordability for diverse language tasks like Q&A, summarization, and content generation. | - | Jurassic-2 Ultra | Up to 8,191 tokens | Model for advanced text generation and comprehension, excelling in complex tasks like analysis and content creation. | - | Jamba-Instruct | Up to 256k tokens | Model with extended context window optimized for cost-effective text generation, summarization, and Q&A. | - | Mistral 7B Instruct | Up to 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | Mistral 8x7B Instruct | Up to 32k tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. 
| - | DeepSeek R1 | 32,768 tokens | Advanced reasoning model | - - **Note:** To use AWS Bedrock, install the required dependencies: - ```bash - uv add "crewai[bedrock]" - ``` - - - - ```toml Code - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - AWS_DEFAULT_REGION= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="sagemaker/" - ) - ``` - - - - Set the following environment variables in your `.env` file: - ```toml Code - MISTRAL_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="mistral/mistral-large-latest", - temperature=0.7 - ) - ``` - - - - Set the following environment variables in your `.env` file: - ```toml Code - NVIDIA_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="nvidia_nim/meta/llama3-70b-instruct", - temperature=0.7 - ) - ``` - - Nvidia NIM provides a comprehensive suite of models for various use cases, from general-purpose tasks to specialized applications. - - | Model | Context Window | Best For | - |-------------------------------------------------------------------------|----------------|-------------------------------------------------------------------| - | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation. | - | nvidia/nemotron-4-mini-hindi-4b-instruct | 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language. | - | nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Customized for enhanced helpfulness in responses | - | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses | - | nvidia/neva-22 | 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses | - | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks | - | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. | - | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. | - | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. | - | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. 
| - | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. | - | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. | - | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. | - | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. | - | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. | - | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. | - | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. | - | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. | - | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. | - | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. 
| - | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments | - | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on Mixture of Experts architecture to deliver compute efficient content generation | - | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. | - | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. | - | snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. | - | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia | - | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI | - | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification | - | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. | - | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. | - | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. | - | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing | - | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. | - | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling | - | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. 
| - | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Support Chinese and English chat, coding, math, instruction following, solving quizzes | - - - - - NVIDIA NIM enables you to run powerful LLMs locally on your Windows machine using WSL2 (Windows Subsystem for Linux). - This approach allows you to leverage your NVIDIA GPU for private, secure, and cost-effective AI inference without relying on cloud services. - Perfect for development, testing, or production scenarios where data privacy or offline capabilities are required. - - Here is a step-by-step guide to setting up a local NVIDIA NIM model: - - 1. Follow installation instructions from [NVIDIA Website](https://docs.nvidia.com/nim/wsl2/latest/getting-started.html) - - 2. Install the local model. For Llama 3.1-8b follow [instructions](https://build.nvidia.com/meta/llama-3_1-8b-instruct/deploy) - - 3. Configure your crewai local models: - - ```python Code - from crewai.llm import LLM - - local_nvidia_nim_llm = LLM( - model="openai/meta/llama-3.1-8b-instruct", # it's an openai-api compatible model - base_url="http://localhost:8000/v1", - api_key="", # api_key is required, but you can use any text - ) - - # Then you can use it in your crew: - - @CrewBase - class MyCrew(): - # ... - - @agent - def researcher(self) -> Agent: - return Agent( - config=self.agents_config['researcher'], # type: ignore[index] - llm=local_nvidia_nim_llm - ) - - # ... - ``` - - - - Set the following environment variables in your `.env` file: - - ```toml Code - GROQ_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="groq/llama-3.2-90b-text-preview", - temperature=0.7 - ) - ``` - | Model | Context Window | Best For | - |-------------------|------------------|--------------------------------------------| - | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks | - | Llama 3.2 Series | 8,192 tokens | General-purpose tasks | - | Mixtral 8x7B | 32,768 tokens | Balanced performance and context | - - - - Set the following environment variables in your `.env` file: - ```toml Code - # Required - WATSONX_URL= - WATSONX_APIKEY= - WATSONX_PROJECT_ID= - - # Optional - WATSONX_TOKEN= - WATSONX_DEPLOYMENT_SPACE_ID= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="watsonx/meta-llama/llama-3-1-70b-instruct", - base_url="https://api.watsonx.ai/v1" - ) - ``` - - - - 1. Install Ollama: [ollama.ai](https://ollama.ai/) - 2. Run a model: `ollama run llama3` - 3. 
Configure: - - ```python Code - llm = LLM( - model="ollama/llama3:70b", - base_url="http://localhost:11434" - ) - ``` - - - - Set the following environment variables in your `.env` file: - ```toml Code - FIREWORKS_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct", - temperature=0.7 - ) - ``` - - - - Set the following environment variables in your `.env` file: - ```toml Code - PERPLEXITY_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="llama-3.1-sonar-large-128k-online", - base_url="https://api.perplexity.ai/" - ) - ``` - - - - Set the following environment variables in your `.env` file: - ```toml Code - HF_TOKEN= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct" - ) - ``` - - - - Set the following environment variables in your `.env` file: - - ```toml Code - SAMBANOVA_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="sambanova/Meta-Llama-3.1-8B-Instruct", - temperature=0.7 - ) - ``` - | Model | Context Window | Best For | - |--------------------|------------------------|----------------------------------------------| - | Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks | - | Llama 3.1 405B | 8,192 tokens | High-performance and output quality | - | Llama 3.2 Series | 8,192 tokens | General-purpose, multimodal tasks | - | Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality | - | Qwen2 familly | 8,192 tokens | High-performance and output quality | - - - - Set the following environment variables in your `.env` file: - ```toml Code - # Required - CEREBRAS_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="cerebras/llama3.1-70b", - temperature=0.7, - max_tokens=8192 - ) - ``` - - - Cerebras features: - - Fast inference speeds - - Competitive pricing - - Good balance of speed and quality - - Support for long context windows - - - - - Set the following environment variables in your `.env` file: - ```toml Code - OPENROUTER_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="openrouter/deepseek/deepseek-r1", - base_url="https://openrouter.ai/api/v1", - api_key=OPENROUTER_API_KEY - ) - ``` - - - Open Router models: - - openrouter/deepseek/deepseek-r1 - - openrouter/deepseek/deepseek-chat - - - - - Set the following environment variables in your `.env` file: - ```toml Code - NEBIUS_API_KEY= - ``` - - Example usage in your CrewAI project: - ```python Code - llm = LLM( - model="nebius/Qwen/Qwen3-30B-A3B" - ) - ``` - - - Nebius AI Studio features: - - Large collection of open source models - - Higher rate limits - - Competitive pricing - - Good balance of speed and quality - - - +Use these pages for deeper provider setup and runtime decisions: +- Connections and provider setup: [/en/learn/llm-connections](/en/learn/llm-connections) +- Custom provider integration: [/en/learn/custom-llm](/en/learn/custom-llm) +- Production routing and reliability patterns: [/en/ai/llms/patterns](/en/ai/llms/patterns) +- Parameter contract reference: [/en/ai/llms/reference](/en/ai/llms/reference) ## Streaming Responses diff --git a/docs/en/concepts/planning.mdx b/docs/en/concepts/planning.mdx index c1992718d..a40eef2e1 100644 --- a/docs/en/concepts/planning.mdx +++ b/docs/en/concepts/planning.mdx @@ 
-10,6 +10,17 @@ mode: "wide"
The planning feature in CrewAI allows you to add planning capability to your crew. When enabled, before each Crew iteration, all Crew information is sent to an AgentPlanner that will plan the tasks step by step, and this plan will be added to each task description.
+## When to Use Planning
+
+- Tasks require multi-step decomposition before execution.
+- You need more consistent execution quality on complex tasks.
+- You want transparent planning traces in crew runs.
+
+## When Not to Use Planning
+
+- Tasks are simple and deterministic.
+- Latency and token budgets are strict, and planning overhead is not justified.
+
### Using the Planning Feature
Getting started with the planning feature is easy; the only step required is adding `planning=True` to your Crew:
@@ -31,7 +42,7 @@ my_crew = Crew(
From this point on, your crew will have planning enabled, and the tasks will be planned before each iteration.
-When planning is enabled, crewAI will use `gpt-4o-mini` as the default LLM for planning, which requires a valid OpenAI API key. Since your agents might be using different LLMs, this could cause confusion if you don't have an OpenAI API key configured or if you're experiencing unexpected behavior related to LLM API calls.
+Planning model defaults can vary by version and environment. To avoid implicit provider dependencies, set `planning_llm` explicitly in your crew configuration.
#### Planning LLM
@@ -152,4 +163,14 @@ A list with 10 bullet points of the most relevant information about AI LLMs.
**Expected Output:**
A fully fledged report with the main topics, each with a full section of information. Formatted as markdown without '```'.
```
-
\ No newline at end of file
+
+
+## Common Failure Modes
+
+### Planning adds cost/latency without quality gains
+- Cause: planning enabled for simple tasks.
+- Fix: disable `planning` for straightforward pipelines.
+
+### Unexpected provider authentication errors
+- Cause: implicit planner model/provider assumptions.
+- Fix: set `planning_llm` explicitly and ensure matching credentials are configured.
diff --git a/docs/en/concepts/processes.mdx b/docs/en/concepts/processes.mdx
index b237ce737..8c7a7f186 100644
--- a/docs/en/concepts/processes.mdx
+++ b/docs/en/concepts/processes.mdx
@@ -12,11 +12,20 @@ mode: "wide"
These processes ensure tasks are distributed and executed efficiently, in alignment with a predefined strategy.
+## When to Use Each Process
+
+- Use `sequential` when task order is fixed and outputs feed directly into the next task.
+- Use `hierarchical` when you need a manager to delegate and validate work dynamically.
+
+## When Not to Use Hierarchical
+
+- You do not need dynamic delegation.
+- You cannot provide a reliable `manager_llm` or `manager_agent`.
+
## Process Implementations
- **Sequential**: Executes tasks sequentially, ensuring tasks are completed in an orderly progression.
- **Hierarchical**: Organizes tasks in a managerial hierarchy, where tasks are delegated and executed based on a structured chain of command. A manager language model (`manager_llm`) or a custom manager agent (`manager_agent`) must be specified in the crew to enable the hierarchical process, facilitating the creation and management of tasks by the manager.
-- **Consensual Process (Planned)**: Aiming for collaborative decision-making among agents on task execution, this process type introduces a democratic approach to task management within CrewAI.
It is planned for future development and is not currently implemented in the codebase. ## The Role of Processes in Teamwork Processes enable individual agents to operate as a cohesive unit, streamlining their efforts to achieve common objectives with efficiency and coherence. @@ -59,9 +68,17 @@ Emulates a corporate hierarchy, CrewAI allows specifying a custom manager agent ## Process Class: Detailed Overview -The `Process` class is implemented as an enumeration (`Enum`), ensuring type safety and restricting process values to the defined types (`sequential`, `hierarchical`). The consensual process is planned for future inclusion, emphasizing our commitment to continuous development and innovation. +The `Process` class is implemented as an enumeration (`Enum`), ensuring type safety and restricting process values to the defined types (`sequential`, `hierarchical`). ## Conclusion The structured collaboration facilitated by processes within CrewAI is crucial for enabling systematic teamwork among agents. -This documentation has been updated to reflect the latest features, enhancements, and the planned integration of the Consensual Process, ensuring users have access to the most current and comprehensive information. \ No newline at end of file +## Common Failure Modes + +### Hierarchical process fails at startup +- Cause: missing `manager_llm` or `manager_agent`. +- Fix: provide one of them explicitly in crew configuration. + +### Sequential process produces weak outputs +- Cause: task boundaries/context are underspecified. +- Fix: improve task descriptions, expected outputs, and task context chaining. diff --git a/docs/en/concepts/testing.mdx b/docs/en/concepts/testing.mdx index dbac110b4..53d024a00 100644 --- a/docs/en/concepts/testing.mdx +++ b/docs/en/concepts/testing.mdx @@ -9,9 +9,20 @@ mode: "wide" Testing is a crucial part of the development process, and it is essential to ensure that your crew is performing as expected. With crewAI, you can easily test your crew and evaluate its performance using the built-in testing capabilities. +## When to Use Testing + +- Before promoting a crew to production. +- After changing prompts, tools, or model configurations. +- When benchmarking quality/cost/latency tradeoffs. + +## When Not to Rely on Testing Alone + +- For safety-critical deployments without human review gates. +- When test datasets are too small or unrepresentative. + ### Using the Testing Feature -We added the CLI command `crewai test` to make it easy to test your crew. This command will run your crew for a specified number of iterations and provide detailed performance metrics. The parameters are `n_iterations` and `model`, which are optional and default to 2 and `gpt-4o-mini` respectively. For now, the only provider available is OpenAI. +Use the CLI command `crewai test` to run repeated crew executions and compare outputs across iterations. The parameters are `n_iterations` and `model`, which are optional and default to `2` and `gpt-4o-mini`. ```bash crewai test @@ -47,3 +58,13 @@ A table of scores at the end will show the performance of the crew in terms of t | Execution Time (s) | 126 | 145 | **135** | | | The example above shows the test results for two runs of the crew with two tasks, with the average total score for each task and the crew as a whole. + +## Common Failure Modes + +### Scores fluctuate too much between runs +- Cause: high sampling randomness or unstable prompts. +- Fix: lower temperature and tighten output constraints. 
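+
+A minimal sketch of this fix follows (the agent fields are illustrative, and the model name is only an example): pin a low-temperature LLM for evaluation runs so score variance reflects crew quality rather than sampling noise.
+
+```python
+from crewai import Agent, LLM
+
+# A low temperature keeps outputs, and therefore test scores, more repeatable.
+eval_llm = LLM(model="gpt-4o-mini", temperature=0)
+
+analyst = Agent(
+    role="Research Analyst",
+    goal="Produce the same structured summary for the same input",
+    backstory="A deterministic analyst profile used for benchmark runs.",
+    llm=eval_llm,
+)
+```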
+ +### Good test scores but poor production quality +- Cause: test prompts do not match real workload. +- Fix: build a representative test set from real production inputs. diff --git a/docs/en/concepts/tools.mdx b/docs/en/concepts/tools.mdx index 1023d1281..b08e75b04 100644 --- a/docs/en/concepts/tools.mdx +++ b/docs/en/concepts/tools.mdx @@ -10,6 +10,17 @@ mode: "wide" CrewAI tools empower agents with capabilities ranging from web searching and data analysis to collaboration and delegating tasks among coworkers. This documentation outlines how to create, integrate, and leverage these tools within the CrewAI framework, including a new focus on collaboration tools. +## When to Use Tools + +- Agents need external data or side effects. +- You need deterministic actions wrapped in reusable interfaces. +- You need to connect APIs, files, databases, or browser actions into agent workflows. + +## When Not to Use Tools + +- The task can be solved entirely from prompt context. +- The external side effect cannot be made safe or idempotent. + ## What is a Tool? A tool in CrewAI is a skill or function that agents can utilize to perform various actions. @@ -285,3 +296,17 @@ writer1 = Agent( Tools are pivotal in extending the capabilities of CrewAI agents, enabling them to undertake a broad spectrum of tasks and collaborate effectively. When building solutions with CrewAI, leverage both custom and existing tools to empower your agents and enhance the AI ecosystem. Consider utilizing error handling, caching mechanisms, and the flexibility of tool arguments to optimize your agents' performance and capabilities. + +## Common Failure Modes + +### Tool schema mismatch +- Cause: model-generated arguments do not match tool signature. +- Fix: tighten tool descriptions and validate input schemas. + +### Repeated side effects +- Cause: retries trigger duplicate writes/actions. +- Fix: add idempotency keys and deduplication checks in tool logic. + +### Tool timeouts under load +- Cause: unbounded retries or slow external services. +- Fix: set explicit timeout/retry policy and graceful fallbacks.
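+
+As one illustration of these fixes, here is a minimal custom-tool sketch. The endpoint URL, input model, and cache are hypothetical stand-ins; the `BaseTool`/`args_schema` pattern follows CrewAI's custom tool interface.
+
+```python
+from typing import Type
+
+import requests
+from pydantic import BaseModel, Field
+from crewai.tools import BaseTool
+
+# Hypothetical in-process dedup cache: a retried call with the same
+# record_id returns the cached result instead of hitting the service again.
+_LOOKUP_CACHE: dict[str, str] = {}
+
+class LookupInput(BaseModel):
+    """Validated input schema so model-generated arguments fail fast on mismatch."""
+    record_id: str = Field(..., description="Identifier of the record to fetch.")
+
+class LookupTool(BaseTool):
+    name: str = "record_lookup"
+    description: str = "Fetch a record by id. Input: record_id (string)."
+    args_schema: Type[BaseModel] = LookupInput
+
+    def _run(self, record_id: str) -> str:
+        if record_id in _LOOKUP_CACHE:
+            return _LOOKUP_CACHE[record_id]
+        try:
+            # Explicit timeout so a slow service fails fast instead of hanging.
+            response = requests.get(
+                f"https://api.example.com/records/{record_id}", timeout=10
+            )
+            response.raise_for_status()
+        except requests.RequestException as exc:
+            # Graceful fallback: surface the failure instead of retrying forever.
+            return f"Lookup failed for {record_id}: {exc}"
+        _LOOKUP_CACHE[record_id] = response.text
+        return response.text
+```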