From 4eaa8755ebb3c71e97703d0f7edcb848b89686b2 Mon Sep 17 00:00:00 2001 From: Tony Kipkemboi Date: Wed, 19 Feb 2025 11:06:46 -0500 Subject: [PATCH] docs: update accordions and fix layout (#2110) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> --- docs/concepts/llms.mdx | 1037 ++++++++++++++++++---------------------- 1 file changed, 461 insertions(+), 576 deletions(-) diff --git a/docs/concepts/llms.mdx b/docs/concepts/llms.mdx index 117face04..12061d1a6 100644 --- a/docs/concepts/llms.mdx +++ b/docs/concepts/llms.mdx @@ -27,157 +27,6 @@ Large Language Models (LLMs) are the core intelligence behind CrewAI agents. The -## Available Models and Their Capabilities - -Here's a detailed breakdown of supported models and their capabilities, you can compare performance at [lmarena.ai](https://lmarena.ai/?leaderboard) and [artificialanalysis.ai](https://artificialanalysis.ai/): - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning | - | GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis | - | GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing | - | o3-mini | 200,000 tokens | Fast reasoning, complex reasoning | - - - 1 token ≈ 4 characters in English. For example, 8,192 tokens ≈ 32,768 characters or about 6,000 words. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation. | - | nvidia/nemotron-4-mini-hindi-4b-instruct| 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language. | - | "nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA in order to improve the helpfulness of LLM generated responses. | - | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses | - | nvidia/neva-22| 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses | - | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks | - | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. | - | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. | - | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. | - | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. 
| - | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. | - | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. | - | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. | - | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. | - | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. | - | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. | - | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. | - | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. 
| - | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. | - | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments | - | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on Mixture of Experts architecure to deliver compute efficient content generation | - | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. | - | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. | - | snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. | - | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia | - | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI | - | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification | - | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. | - | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. | - | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. 
| - | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing | - | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. | - | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling | - | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Support Chinese and English chat, coding, math, instruction following, solving quizzes | - - - NVIDIA's NIM support for models is expanding continuously! For the most up-to-date list of available models, please visit build.nvidia.com. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | gemini-2.0-flash-exp | 1M tokens | Higher quality at faster speed, multimodal model, good for most tasks | - | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks | - | gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks | - | gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration | - - - Google's Gemini models are all multimodal, supporting audio, images, video and text, supporting context caching, json schema, function calling, etc. - - These models are available via API_KEY from - [The Gemini API](https://ai.google.dev/gemini-api/docs) and also from - [Google Cloud Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/migrate-google-ai) as part of the - [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models). - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks | - | Llama 3.2 Series | 8,192 tokens | General-purpose tasks | - | Mixtral 8x7B | 32,768 tokens | Balanced performance and context | - - - Groq is known for its fast inference speeds, making it suitable for real-time applications. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks | - | Llama 3.1 405B | 8,192 tokens | High-performance and output quality | - | Llama 3.2 Series | 8,192 tokens | General-purpose tasks, multimodal | - | Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality| - | Qwen2 familly | 8,192 tokens | High-performance and output quality | - - - [SambaNova](https://cloud.sambanova.ai/) has several models with fast inference speed at full precision. 
- - - - | Provider | Context Window | Key Features | - |----------|---------------|--------------| - | Deepseek Chat | 64,000 tokens | Specialized in technical discussions | - | Deepseek R1 | 64,000 tokens | Affordable reasoning model | - | Claude 3 | Up to 200K tokens | Strong reasoning, code understanding | - | Gemma Series | 8,192 tokens | Efficient, smaller-scale tasks | - - - Provider selection should consider factors like: - - API availability in your region - - Pricing structure - - Required features (e.g., streaming, function calling) - - Performance requirements - - - - ## Setting Up Your LLM There are three ways to configure LLMs in CrewAI. Choose the method that best fits your workflow: @@ -206,102 +55,12 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi ```yaml researcher: - # Agent Definition role: Research Specialist goal: Conduct comprehensive research and analysis backstory: A dedicated research professional with years of experience verbose: true - - # Model Selection (uncomment your choice) - - # OpenAI Models - Known for reliability and performance - llm: openai/gpt-4o-mini - # llm: openai/gpt-4 # More accurate but expensive - # llm: openai/gpt-4-turbo # Fast with large context - # llm: openai/gpt-4o # Optimized for longer texts - # llm: openai/o1-preview # Latest features - # llm: openai/o1-mini # Cost-effective - - # Azure Models - For enterprise deployments - # llm: azure/gpt-4o-mini - # llm: azure/gpt-4 - # llm: azure/gpt-35-turbo - - # Anthropic Models - Strong reasoning capabilities - # llm: anthropic/claude-3-opus-20240229-v1:0 - # llm: anthropic/claude-3-sonnet-20240229-v1:0 - # llm: anthropic/claude-3-haiku-20240307-v1:0 - # llm: anthropic/claude-2.1 - # llm: anthropic/claude-2.0 - - # Google Models - Strong reasoning, large cachable context window, multimodal - # llm: gemini/gemini-1.5-pro-latest - # llm: gemini/gemini-1.5-flash-latest - # llm: gemini/gemini-1.5-flash-8b-latest - - # AWS Bedrock Models - Enterprise-grade - # llm: bedrock/anthropic.claude-3-sonnet-20240229-v1:0 - # llm: bedrock/anthropic.claude-v2:1 - # llm: bedrock/amazon.titan-text-express-v1 - # llm: bedrock/meta.llama2-70b-chat-v1 - - # Amazon SageMaker Models - Enterprise-grade - # llm: sagemaker/ - - # Mistral Models - Open source alternative - # llm: mistral/mistral-large-latest - # llm: mistral/mistral-medium-latest - # llm: mistral/mistral-small-latest - - # Groq Models - Fast inference - # llm: groq/mixtral-8x7b-32768 - # llm: groq/llama-3.1-70b-versatile - # llm: groq/llama-3.2-90b-text-preview - # llm: groq/gemma2-9b-it - # llm: groq/gemma-7b-it - - # IBM watsonx.ai Models - Enterprise features - # llm: watsonx/ibm/granite-13b-chat-v2 - # llm: watsonx/meta-llama/llama-3-1-70b-instruct - # llm: watsonx/bigcode/starcoder2-15b - - # Ollama Models - Local deployment - # llm: ollama/llama3:70b - # llm: ollama/codellama - # llm: ollama/mistral - # llm: ollama/mixtral - # llm: ollama/phi - - # Fireworks AI Models - Specialized tasks - # llm: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct - # llm: fireworks_ai/accounts/fireworks/models/mixtral-8x7b - # llm: fireworks_ai/accounts/fireworks/models/zephyr-7b-beta - - # Perplexity AI Models - Research focused - # llm: pplx/llama-3.1-sonar-large-128k-online - # llm: pplx/mistral-7b-instruct - # llm: pplx/codellama-34b-instruct - # llm: pplx/mixtral-8x7b-instruct - - # Hugging Face Models - Community models - # llm: huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct - # llm: 
huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
-      # llm: huggingface/tiiuae/falcon-180B-chat
-      # llm: huggingface/google/gemma-7b-it
-
-      # Nvidia NIM Models - GPU-optimized
-      # llm: nvidia_nim/meta/llama3-70b-instruct
-      # llm: nvidia_nim/mistral/mixtral-8x7b
-      # llm: nvidia_nim/google/gemma-7b
-
-      # SambaNova Models - Enterprise AI
-      # llm: sambanova/Meta-Llama-3.1-8B-Instruct
-      # llm: sambanova/BioMistral-7B
-      # llm: sambanova/Falcon-180B
-
-      # Open Router Models - Affordable reasoning
-      # llm: openrouter/deepseek/deepseek-r1
-      # llm: openrouter/deepseek/deepseek-chat
+      llm: openai/gpt-4o-mini # your model here
+      # (see the provider configuration examples below for more)
    ```

@@ -349,6 +108,465 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi

+## Provider Configuration Examples
+
+
+CrewAI supports a multitude of LLM providers, each offering unique features, authentication methods, and model capabilities.
+In this section, you'll find detailed examples that help you select, configure, and optimize the LLM that best fits your project's needs.
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Required
+ OPENAI_API_KEY=sk-...
+
+ # Optional
+ OPENAI_API_BASE=
+ OPENAI_ORGANIZATION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+     model="openai/gpt-4",  # call the model as provider/model_name
+     temperature=0.8,
+     max_tokens=150,
+     top_p=0.9,
+     frequency_penalty=0.1,
+     presence_penalty=0.1,
+     stop=["END"],
+     seed=42
+ )
+ ```
+
+ OpenAI is one of the leading providers of LLMs, with a wide range of models and features.
+
+ | Model                | Context Window   | Best For                                       |
+ |----------------------|------------------|------------------------------------------------|
+ | GPT-4                | 8,192 tokens     | High-accuracy tasks, complex reasoning         |
+ | GPT-4 Turbo          | 128,000 tokens   | Long-form content, document analysis           |
+ | GPT-4o & GPT-4o-mini | 128,000 tokens   | Cost-effective large context processing        |
+ | o3-mini              | 200,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1-mini              | 128,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1-preview           | 128,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1                   | 200,000 tokens   | Fast reasoning, complex reasoning              |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ ANTHROPIC_API_KEY=sk-ant-...
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="anthropic/claude-3-sonnet-20240229-v1:0",
+     temperature=0.7
+ )
+ ```
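+
+ Once configured, the LLM is handed to an agent the same way for every provider — a minimal sketch (the agent fields are illustrative, borrowed from the YAML example above):
+ ```python Code
+ from crewai import Agent, LLM
+
+ llm = LLM(
+     model="anthropic/claude-3-sonnet-20240229-v1:0",
+     temperature=0.7
+ )
+
+ # Attach the configured LLM to an agent
+ agent = Agent(
+     role="Research Specialist",
+     goal="Conduct comprehensive research and analysis",
+     backstory="A dedicated research professional with years of experience",
+     llm=llm
+ )
+ ```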
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Option 1: Gemini accessed with an API key.
+ # https://ai.google.dev/gemini-api/docs/api-key
+ GEMINI_API_KEY=
+
+ # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden.
+ # https://cloud.google.com/vertex-ai/generative-ai/docs/overview
+ ```
+
+ For Option 2, get credentials from your Google Cloud Console, save them to a JSON file, and load them with the following code:
+ ```python Code
+ import json
+
+ file_path = 'path/to/vertex_ai_service_account.json'
+
+ # Load the JSON file
+ with open(file_path, 'r') as file:
+     vertex_credentials = json.load(file)
+
+ # Convert the credentials to a JSON string
+ vertex_credentials_json = json.dumps(vertex_credentials)
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+     model="gemini/gemini-1.5-pro-latest",
+     temperature=0.7,
+     vertex_credentials=vertex_credentials_json
+ )
+ ```
+ Google offers a range of powerful models optimized for different use cases:
+
+ | Model                 | Context Window | Best For                                                           |
+ |-----------------------|----------------|--------------------------------------------------------------------|
+ | gemini-2.0-flash-exp  | 1M tokens      | Higher quality at faster speed, multimodal model, good for most tasks |
+ | gemini-1.5-flash      | 1M tokens      | Balanced multimodal model, good for most tasks                     |
+ | gemini-1.5-flash-8B   | 1M tokens      | Fastest, most cost-efficient, good for high-frequency tasks        |
+ | gemini-1.5-pro        | 2M tokens      | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Required
+ AZURE_API_KEY=
+ AZURE_API_BASE=
+ AZURE_API_VERSION=
+
+ # Optional
+ AZURE_AD_TOKEN=
+ AZURE_API_TYPE=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="azure/gpt-4",
+     api_version="2023-05-15"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="sagemaker/"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ MISTRAL_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="mistral/mistral-large-latest",
+     temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ NVIDIA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="nvidia_nim/meta/llama3-70b-instruct",
+     temperature=0.7
+ )
+ ```
+
+ Nvidia NIM provides a comprehensive suite of models for various use cases, from general-purpose tasks to specialized applications.
+
+ | Model | Context Window | Best For |
+ |-------|----------------|----------|
+ | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation. |
+ | nvidia/nemotron-4-mini-hindi-4b-instruct | 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language. |
+ | nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Customized for enhanced helpfulness in responses |
+ | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses |
+ | nvidia/neva-22 | 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses |
+ | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks |
+ | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. |
+ | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. |
+ | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. |
+ | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning, and text generation. |
+ | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning, and text generation. |
+ | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
+ | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art vision-language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art vision-language model with language understanding, superior reasoning, and text generation. |
+ | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. |
+ | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. |
+ | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent-architecture language model for faster inference when generating long sequences. |
+ | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. |
+ | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. |
+ | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+ | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. |
+ | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+ | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. |
+ | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. |
+ | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments |
+ | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on a Mixture of Experts architecture to deliver compute-efficient content generation |
+ | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. |
+ | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+ | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+ | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. |
+ | snowflake/arctic | 1,024 tokens | Delivers high-efficiency inference for enterprise applications focused on SQL generation and coding. |
+ | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia |
+ | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+ | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+ | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI |
+ | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification |
+ | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. |
+ | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing |
+ | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. |
+ | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling |
+ | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Supports Chinese and English chat, coding, math, instruction following, and quiz solving |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ GROQ_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="groq/llama-3.2-90b-text-preview",
+     temperature=0.7
+ )
+ ```
+ | Model            | Context Window   | Best For                                   |
+ |------------------|------------------|--------------------------------------------|
+ | Llama 3.1 70B/8B | 131,072 tokens   | High-performance, large context tasks      |
+ | Llama 3.2 Series | 8,192 tokens     | General-purpose tasks                      |
+ | Mixtral 8x7B     | 32,768 tokens    | Balanced performance and context           |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ WATSONX_URL=
+ WATSONX_APIKEY=
+ WATSONX_PROJECT_ID=
+
+ # Optional
+ WATSONX_TOKEN=
+ WATSONX_DEPLOYMENT_SPACE_ID=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="watsonx/meta-llama/llama-3-1-70b-instruct",
+     base_url="https://api.watsonx.ai/v1"
+ )
+ ```
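+
+ The provider examples above all read credentials from environment variables. If you keep them in a `.env` file, load it before constructing the LLM — a minimal sketch, assuming the `python-dotenv` package is installed:
+ ```python Code
+ from dotenv import load_dotenv
+ from crewai import LLM
+
+ load_dotenv()  # copies the variables from .env into os.environ
+
+ llm = LLM(model="openai/gpt-4o-mini", temperature=0.7)
+ ```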
+
+
+ 1. Install Ollama: [ollama.ai](https://ollama.ai/)
+ 2. Run a model: `ollama run llama2`
+ 3. Configure:
+
+ ```python Code
+ llm = LLM(
+     model="ollama/llama3:70b",
+     base_url="http://localhost:11434"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ FIREWORKS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
+     temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ PERPLEXITY_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="llama-3.1-sonar-large-128k-online",
+     base_url="https://api.perplexity.ai/"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ HUGGINGFACE_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
+     base_url="your_api_endpoint"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ SAMBANOVA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="sambanova/Meta-Llama-3.1-8B-Instruct",
+     temperature=0.7
+ )
+ ```
+ | Model             | Context Window         | Best For                                    |
+ |-------------------|------------------------|---------------------------------------------|
+ | Llama 3.1 70B/8B  | Up to 131,072 tokens   | High-performance, large context tasks       |
+ | Llama 3.1 405B    | 8,192 tokens           | High-performance and output quality         |
+ | Llama 3.2 Series  | 8,192 tokens           | General-purpose, multimodal tasks           |
+ | Llama 3.3 70B     | Up to 131,072 tokens   | High-performance and output quality         |
+ | Qwen2 family      | 8,192 tokens           | High-performance and output quality         |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ CEREBRAS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="cerebras/llama3.1-70b",
+     temperature=0.7,
+     max_tokens=8192
+ )
+ ```
+
+
+ Cerebras features:
+ - Fast inference speeds
+ - Competitive pricing
+ - Good balance of speed and quality
+ - Support for long context windows
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ OPENROUTER_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ import os
+
+ llm = LLM(
+     model="openrouter/deepseek/deepseek-r1",
+     base_url="https://openrouter.ai/api/v1",
+     api_key=os.getenv("OPENROUTER_API_KEY")
+ )
+ ```
+
+
+ Open Router models:
+ - openrouter/deepseek/deepseek-r1
+ - openrouter/deepseek/deepseek-chat
+
+
+
+## Structured LLM Calls
+
+CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
+
+For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
+
+```python Code
+from pydantic import BaseModel
+
+from crewai import LLM
+
+class Dog(BaseModel):
+    name: str
+    age: int
+    breed: str
+
+
+llm = LLM(model="gpt-4o", response_format=Dog)
+
+response = llm.call(
+    "Analyze the following messages and return the name, age, and breed. "
+    "Meet Kona! She is 3 years old and is a black german shepherd."
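+    # The free-form text above is parsed into the Dog schema passed as response_format.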
+) +print(response) + +# Output: +# Dog(name='Kona', age=3, breed='black german shepherd') +``` + ## Advanced Features and Optimization Learn how to get the most out of your LLM configuration: @@ -417,339 +635,6 @@ Learn how to get the most out of your LLM configuration: -## Provider Configuration Examples - - - - ```python Code - # Required - OPENAI_API_KEY=sk-... - - # Optional - OPENAI_API_BASE= - OPENAI_ORGANIZATION= - ``` - - Example usage: - ```python Code - from crewai import LLM - - llm = LLM( - model="gpt-4", - temperature=0.8, - max_tokens=150, - top_p=0.9, - frequency_penalty=0.1, - presence_penalty=0.1, - stop=["END"], - seed=42 - ) - ``` - - - - ```python Code - ANTHROPIC_API_KEY=sk-ant-... - ``` - - Example usage: - ```python Code - llm = LLM( - model="anthropic/claude-3-sonnet-20240229-v1:0", - temperature=0.7 - ) - ``` - - - - ```python Code - # Option 1: Gemini accessed with an API key. - # https://ai.google.dev/gemini-api/docs/api-key - GEMINI_API_KEY= - - # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden. - # https://cloud.google.com/vertex-ai/generative-ai/docs/overview - ``` - - Get credentials: - ```python Code - import json - - file_path = 'path/to/vertex_ai_service_account.json' - - # Load the JSON file - with open(file_path, 'r') as file: - vertex_credentials = json.load(file) - - # Convert the credentials to a JSON string - vertex_credentials_json = json.dumps(vertex_credentials) - ``` - - Example usage: - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-1.5-pro-latest", - temperature=0.7, - vertex_credentials=vertex_credentials_json - ) - ``` - - - - ```python Code - # Required - AZURE_API_KEY= - AZURE_API_BASE= - AZURE_API_VERSION= - - # Optional - AZURE_AD_TOKEN= - AZURE_API_TYPE= - ``` - - Example usage: - ```python Code - llm = LLM( - model="azure/gpt-4", - api_version="2023-05-15" - ) - ``` - - - - ```python Code - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - AWS_DEFAULT_REGION= - ``` - - Example usage: - ```python Code - llm = LLM( - model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0" - ) - ``` - - - - ```python Code - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - AWS_DEFAULT_REGION= - ``` - - Example usage: - ```python Code - llm = LLM( - model="sagemaker/" - ) - ``` - - - - ```python Code - MISTRAL_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="mistral/mistral-large-latest", - temperature=0.7 - ) - ``` - - - - ```python Code - NVIDIA_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="nvidia_nim/meta/llama3-70b-instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - GROQ_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="groq/llama-3.2-90b-text-preview", - temperature=0.7 - ) - ``` - - - - ```python Code - # Required - WATSONX_URL= - WATSONX_APIKEY= - WATSONX_PROJECT_ID= - - # Optional - WATSONX_TOKEN= - WATSONX_DEPLOYMENT_SPACE_ID= - ``` - - Example usage: - ```python Code - llm = LLM( - model="watsonx/meta-llama/llama-3-1-70b-instruct", - base_url="https://api.watsonx.ai/v1" - ) - ``` - - - - 1. Install Ollama: [ollama.ai](https://ollama.ai/) - 2. Run a model: `ollama run llama2` - 3. 
Configure: - - ```python Code - llm = LLM( - model="ollama/llama3:70b", - base_url="http://localhost:11434" - ) - ``` - - - - ```python Code - FIREWORKS_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - PERPLEXITY_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="llama-3.1-sonar-large-128k-online", - base_url="https://api.perplexity.ai/" - ) - ``` - - - - ```python Code - HUGGINGFACE_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct", - base_url="your_api_endpoint" - ) - ``` - - - - ```python Code - SAMBANOVA_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="sambanova/Meta-Llama-3.1-8B-Instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - # Required - CEREBRAS_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="cerebras/llama3.1-70b", - temperature=0.7, - max_tokens=8192 - ) - ``` - - - Cerebras features: - - Fast inference speeds - - Competitive pricing - - Good balance of speed and quality - - Support for long context windows - - - - - ```python Code - OPENROUTER_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="openrouter/deepseek/deepseek-r1", - base_url="https://openrouter.ai/api/v1", - api_key=OPENROUTER_API_KEY - ) - ``` - - - Open Router models: - - openrouter/deepseek/deepseek-r1 - - openrouter/deepseek/deepseek-chat - - - - -## Structured LLM Calls - -CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing. - -For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object. - -```python Code -from crewai import LLM - -class Dog(BaseModel): - name: str - age: int - breed: str - - -llm = LLM(model="gpt-4o", response_format=Dog) - -response = llm.call( - "Analyze the following messages and return the name, age, and breed. " - "Meet Kona! She is 3 years old and is a black german shepherd." -) -print(response) -``` - ## Common Issues and Solutions