docs: update LLM integration details and examples

* docs: update LLM integration details and examples - Changed references from LiteLLM to native SDKs for LLM providers. - Enhanced OpenAI and AWS Bedrock sections with new usage examples and advanced configuration options. - Added structured output examples and supported environment variables for better clarity. - Improved documentation on additional parameters and features for LLM configurations. * drop this example - should use strucutred output from task instead --------- Co-authored-by: Greyson LaLonde <greyson.r.lalonde@gmail.com>
2025-12-15 20:08:29 +00:00 · 2025-10-21 11:39:50 -07:00
parent dba27cf8b5
commit e7b3ce27ca
1 changed files with 343 additions and 58 deletions
--- a/docs/en/concepts/llms.mdx
+++ b/docs/en/concepts/llms.mdx
@@ -7,7 +7,7 @@ mode: "wide"

 ## Overview

-CrewAI integrates with multiple LLM providers through LiteLLM, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.
+CrewAI integrates with multiple LLM providers through providers native sdks, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.


 ## What are LLMs?
@@ -113,44 +113,104 @@ In this section, you'll find detailed examples that help you select, configure,

 <AccordionGroup>
  <Accordion title="OpenAI">
-    Set the following environment variables in your `.env` file:
+    CrewAI provides native integration with OpenAI through the OpenAI Python SDK.

    ```toml Code
    # Required
    OPENAI_API_KEY=sk-...

    # Optional
-    OPENAI_API_BASE=<custom-base-url>
-    OPENAI_ORGANIZATION=<your-org-id>
+    OPENAI_BASE_URL=<custom-base-url>
    ```

-    Example usage in your CrewAI project:
+    **Basic Usage:**
    ```python Code
    from crewai import LLM

    llm = LLM(
-        model="openai/gpt-4", # call model by provider/model_name
-        temperature=0.8,
-        max_tokens=150,
+        model="openai/gpt-4o",
+        api_key="your-api-key",  # Or set OPENAI_API_KEY
+        temperature=0.7,
+        max_tokens=4000
+    )
+    ```
+
+    **Advanced Configuration:**
+    ```python Code
+    from crewai import LLM
+
+    llm = LLM(
+        model="openai/gpt-4o",
+        api_key="your-api-key",
+        base_url="https://api.openai.com/v1",  # Optional custom endpoint
+        organization="org-...",  # Optional organization ID
+        project="proj_...",  # Optional project ID
+        temperature=0.7,
+        max_tokens=4000,
+        max_completion_tokens=4000,  # For newer models
        top_p=0.9,
        frequency_penalty=0.1,
        presence_penalty=0.1,
        stop=["END"],
-        seed=42
+        seed=42,  # For reproducible outputs
+        stream=True,  # Enable streaming
+        timeout=60.0,  # Request timeout in seconds
+        max_retries=3,  # Maximum retry attempts
+        logprobs=True,  # Return log probabilities
+        top_logprobs=5,  # Number of most likely tokens
+        reasoning_effort="medium"  # For o1 models: low, medium, high
    )
    ```

-    OpenAI is one of the leading providers of LLMs with a wide range of models and features.
+    **Structured Outputs:**
+    ```python Code
+    from pydantic import BaseModel
+    from crewai import LLM
+
+    class ResponseFormat(BaseModel):
+        name: str
+        age: int
+        summary: str
+
+    llm = LLM(
+        model="openai/gpt-4o",
+    )
+    ```
+
+    **Supported Environment Variables:**
+    - `OPENAI_API_KEY`: Your OpenAI API key (required)
+    - `OPENAI_BASE_URL`: Custom base URL for OpenAI API (optional)
+
+    **Features:**
+    - Native function calling support (except o1 models)
+    - Structured outputs with JSON schema
+    - Streaming support for real-time responses
+    - Token usage tracking
+    - Stop sequences support (except o1 models)
+    - Log probabilities for token-level insights
+    - Reasoning effort control for o1 models
+
+    **Supported Models:**

    | Model               | Context Window   | Best For                                      |
    |---------------------|------------------|-----------------------------------------------|
-    | GPT-4               | 8,192 tokens     | High-accuracy tasks, complex reasoning        |
-    | GPT-4 Turbo         | 128,000 tokens   | Long-form content, document analysis          |
-    | GPT-4o & GPT-4o-mini  | 128,000 tokens  | Cost-effective large context processing       |
-    | o3-mini             | 200,000 tokens   | Fast reasoning, complex reasoning             |
-    | o1-mini             | 128,000 tokens   | Fast reasoning, complex reasoning             |
-    | o1-preview          | 128,000 tokens   | Fast reasoning, complex reasoning             |
-    | o1                  | 200,000 tokens   | Fast reasoning, complex reasoning             |
+    | gpt-4.1             | 1M tokens        | Latest model with enhanced capabilities       |
+    | gpt-4.1-mini        | 1M tokens        | Efficient version with large context          |
+    | gpt-4.1-nano        | 1M tokens        | Ultra-efficient variant                       |
+    | gpt-4o              | 128,000 tokens   | Optimized for speed and intelligence          |
+    | gpt-4o-mini         | 200,000 tokens   | Cost-effective with large context             |
+    | gpt-4-turbo         | 128,000 tokens   | Long-form content, document analysis          |
+    | gpt-4               | 8,192 tokens     | High-accuracy tasks, complex reasoning        |
+    | o1                  | 200,000 tokens   | Advanced reasoning, complex problem-solving   |
+    | o1-preview          | 128,000 tokens   | Preview of reasoning capabilities             |
+    | o1-mini             | 128,000 tokens   | Efficient reasoning model                     |
+    | o3-mini             | 200,000 tokens   | Lightweight reasoning model                   |
+    | o4-mini             | 200,000 tokens   | Next-gen efficient reasoning                  |
+
+    **Note:** To use OpenAI, install the required dependencies:
+    ```bash
+    uv add "crewai[openai]"
+    ```
  </Accordion>

  <Accordion title="Meta-Llama">
@@ -187,69 +247,186 @@ In this section, you'll find detailed examples that help you select, configure,
  </Accordion>

  <Accordion title="Anthropic">
+    CrewAI provides native integration with Anthropic through the Anthropic Python SDK.
+
    ```toml Code
    # Required
    ANTHROPIC_API_KEY=sk-ant-...
-
-    # Optional
-    ANTHROPIC_API_BASE=<custom-base-url>
    ```

-    Example usage in your CrewAI project:
+    **Basic Usage:**
    ```python Code
+    from crewai import LLM
+
    llm = LLM(
-        model="anthropic/claude-3-sonnet-20240229-v1:0",
-        temperature=0.7
+        model="anthropic/claude-3-5-sonnet-20241022",
+        api_key="your-api-key",  # Or set ANTHROPIC_API_KEY
+        max_tokens=4096  # Required for Anthropic
    )
    ```
+
+    **Advanced Configuration:**
+    ```python Code
+    from crewai import LLM
+
+    llm = LLM(
+        model="anthropic/claude-3-5-sonnet-20241022",
+        api_key="your-api-key",
+        base_url="https://api.anthropic.com",  # Optional custom endpoint
+        temperature=0.7,
+        max_tokens=4096,  # Required parameter
+        top_p=0.9,
+        stop_sequences=["END", "STOP"],  # Anthropic uses stop_sequences
+        stream=True,  # Enable streaming
+        timeout=60.0,  # Request timeout in seconds
+        max_retries=3  # Maximum retry attempts
+    )
+    ```
+
+    **Supported Environment Variables:**
+    - `ANTHROPIC_API_KEY`: Your Anthropic API key (required)
+
+    **Features:**
+    - Native tool use support for Claude 3+ models
+    - Streaming support for real-time responses
+    - Automatic system message handling
+    - Stop sequences for controlled output
+    - Token usage tracking
+    - Multi-turn tool use conversations
+
+    **Important Notes:**
+    - `max_tokens` is a **required** parameter for all Anthropic models
+    - Claude uses `stop_sequences` instead of `stop`
+    - System messages are handled separately from conversation messages
+    - First message must be from the user (automatically handled)
+    - Messages must alternate between user and assistant
+
+    **Supported Models:**
+
+    | Model                        | Context Window | Best For                                      |
+    |------------------------------|----------------|-----------------------------------------------|
+    | claude-3-7-sonnet            | 200,000 tokens | Advanced reasoning and agentic tasks          |
+    | claude-3-5-sonnet-20241022   | 200,000 tokens | Latest Sonnet with best performance           |
+    | claude-3-5-haiku             | 200,000 tokens | Fast, compact model for quick responses       |
+    | claude-3-opus                | 200,000 tokens | Most capable for complex tasks                |
+    | claude-3-sonnet              | 200,000 tokens | Balanced intelligence and speed               |
+    | claude-3-haiku               | 200,000 tokens | Fastest for simple tasks                      |
+    | claude-2.1                   | 200,000 tokens | Extended context, reduced hallucinations      |
+    | claude-2                     | 100,000 tokens | Versatile model for various tasks             |
+    | claude-instant               | 100,000 tokens | Fast, cost-effective for everyday tasks       |
+
+    **Note:** To use Anthropic, install the required dependencies:
+    ```bash
+    uv add "crewai[anthropic]"
+    ```
  </Accordion>

  <Accordion title="Google (Gemini API)">
-    Set your API key in your `.env` file. If you need a key, or need to find an
-    existing key, check [AI Studio](https://aistudio.google.com/apikey).
+    CrewAI provides native integration with Google Gemini through the Google Gen AI Python SDK.
+
+    Set your API key in your `.env` file. If you need a key, check [AI Studio](https://aistudio.google.com/apikey).

    ```toml .env
-    # https://ai.google.dev/gemini-api/docs/api-key
+    # Required (one of the following)
+    GOOGLE_API_KEY=<your-api-key>
    GEMINI_API_KEY=<your-api-key>
+
+    # Optional - for Vertex AI
+    GOOGLE_CLOUD_PROJECT=<your-project-id>
+    GOOGLE_CLOUD_LOCATION=<location>  # Defaults to us-central1
+    GOOGLE_GENAI_USE_VERTEXAI=true  # Set to use Vertex AI
    ```

-    Example usage in your CrewAI project:
+    **Basic Usage:**
    ```python Code
    from crewai import LLM

    llm = LLM(
        model="gemini/gemini-2.0-flash",
-        temperature=0.7,
+        api_key="your-api-key",  # Or set GOOGLE_API_KEY/GEMINI_API_KEY
+        temperature=0.7
    )
    ```

-    ### Gemini models
+    **Advanced Configuration:**
+    ```python Code
+    from crewai import LLM
+
+    llm = LLM(
+        model="gemini/gemini-2.5-flash",
+        api_key="your-api-key",
+        temperature=0.7,
+        top_p=0.9,
+        top_k=40,  # Top-k sampling parameter
+        max_output_tokens=8192,
+        stop_sequences=["END", "STOP"],
+        stream=True,  # Enable streaming
+        safety_settings={
+            "HARM_CATEGORY_HARASSMENT": "BLOCK_NONE",
+            "HARM_CATEGORY_HATE_SPEECH": "BLOCK_NONE"
+        }
+    )
+    ```
+
+    **Vertex AI Configuration:**
+    ```python Code
+    from crewai import LLM
+
+    llm = LLM(
+        model="gemini/gemini-1.5-pro",
+        project="your-gcp-project-id",
+        location="us-central1"  # GCP region
+    )
+    ```
+
+    **Supported Environment Variables:**
+    - `GOOGLE_API_KEY` or `GEMINI_API_KEY`: Your Google API key (required for Gemini API)
+    - `GOOGLE_CLOUD_PROJECT`: Google Cloud project ID (for Vertex AI)
+    - `GOOGLE_CLOUD_LOCATION`: GCP location (defaults to `us-central1`)
+    - `GOOGLE_GENAI_USE_VERTEXAI`: Set to `true` to use Vertex AI
+
+    **Features:**
+    - Native function calling support for Gemini 1.5+ and 2.x models
+    - Streaming support for real-time responses
+    - Multimodal capabilities (text, images, video)
+    - Safety settings configuration
+    - Support for both Gemini API and Vertex AI
+    - Automatic system instruction handling
+    - Token usage tracking
+
+    **Gemini Models:**

    Google offers a range of powerful models optimized for different use cases.

    | Model                          | Context Window | Best For                                                          |
    |--------------------------------|----------------|-------------------------------------------------------------------|
-    | gemini-2.5-flash-preview-04-17 | 1M tokens      | Adaptive thinking, cost efficiency                                |
-    | gemini-2.5-pro-preview-05-06   | 1M tokens      | Enhanced thinking and reasoning, multimodal understanding, advanced coding, and more |
-    | gemini-2.0-flash               | 1M tokens      | Next generation features, speed, thinking, and realtime streaming |
+    | gemini-2.5-flash               | 1M tokens      | Adaptive thinking, cost efficiency                                |
+    | gemini-2.5-pro                 | 1M tokens      | Enhanced thinking and reasoning, multimodal understanding         |
+    | gemini-2.0-flash               | 1M tokens      | Next generation features, speed, thinking                         |
+    | gemini-2.0-flash-thinking      | 32,768 tokens  | Advanced reasoning with thinking process                          |
    | gemini-2.0-flash-lite          | 1M tokens      | Cost efficiency and low latency                                   |
+    | gemini-1.5-pro                 | 2M tokens      | Best performing, logical reasoning, coding                        |
    | gemini-1.5-flash               | 1M tokens      | Balanced multimodal model, good for most tasks                    |
-    | gemini-1.5-flash-8B            | 1M tokens      | Fastest, most cost-efficient, good for high-frequency tasks       |
-    | gemini-1.5-pro                 | 2M tokens      | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
+    | gemini-1.5-flash-8b            | 1M tokens      | Fastest, most cost-efficient                                      |
+    | gemini-1.0-pro                 | 32,768 tokens  | Earlier generation model                                          |
+
+    **Gemma Models:**
+
+    The Gemini API also supports [Gemma models](https://ai.google.dev/gemma/docs) hosted on Google infrastructure.
+
+    | Model          | Context Window | Best For                           |
+    |----------------|----------------|------------------------------------|
+    | gemma-3-1b     | 32,000 tokens  | Ultra-lightweight tasks            |
+    | gemma-3-4b     | 128,000 tokens | Efficient general-purpose tasks    |
+    | gemma-3-12b    | 128,000 tokens | Balanced performance and efficiency|
+    | gemma-3-27b    | 128,000 tokens | High-performance tasks             |
+
+    **Note:** To use Google Gemini, install the required dependencies:
+    ```bash
+    uv add "crewai[google-genai]"
+    ```

    The full list of models is available in the [Gemini model docs](https://ai.google.dev/gemini-api/docs/models).
-
-    ### Gemma
-
-    The Gemini API also allows you to use your API key to access [Gemma models](https://ai.google.dev/gemma/docs) hosted on Google infrastructure.
-
-    | Model          | Context Window |
-    |----------------|----------------|
-    | gemma-3-1b-it  | 32k tokens     |
-    | gemma-3-4b-it  | 32k tokens     |
-    | gemma-3-12b-it | 32k tokens     |
-    | gemma-3-27b-it | 128k tokens    |
-
  </Accordion>
  <Accordion title="Google (Vertex AI)">
    Get credentials from your Google Cloud Console and save it to a JSON file, then load it with the following code:
@@ -291,43 +468,146 @@ In this section, you'll find detailed examples that help you select, configure,
  </Accordion>

  <Accordion title="Azure">
+    CrewAI provides native integration with Azure AI Inference and Azure OpenAI through the Azure AI Inference Python SDK.
+
    ```toml Code
    # Required
    AZURE_API_KEY=<your-api-key>
-    AZURE_API_BASE=<your-resource-url>
-    AZURE_API_VERSION=<api-version>
+    AZURE_ENDPOINT=<your-endpoint-url>

    # Optional
-    AZURE_AD_TOKEN=<your-azure-ad-token>
-    AZURE_API_TYPE=<your-azure-api-type>
+    AZURE_API_VERSION=<api-version>  # Defaults to 2024-06-01
    ```

-    Example usage in your CrewAI project:
+    **Endpoint URL Formats:**
+
+    For Azure OpenAI deployments:
+    ```
+    https://<resource-name>.openai.azure.com/openai/deployments/<deployment-name>
+    ```
+
+    For Azure AI Inference endpoints:
+    ```
+    https://<resource-name>.inference.azure.com
+    ```
+
+    **Basic Usage:**
    ```python Code
    llm = LLM(
        model="azure/gpt-4",
-        api_version="2023-05-15"
+        api_key="<your-api-key>",  # Or set AZURE_API_KEY
+        endpoint="<your-endpoint-url>",
+        api_version="2024-06-01"
    )
    ```
+
+    **Advanced Configuration:**
+    ```python Code
+    llm = LLM(
+        model="azure/gpt-4o",
+        temperature=0.7,
+        max_tokens=4000,
+        top_p=0.9,
+        frequency_penalty=0.0,
+        presence_penalty=0.0,
+        stop=["END"],
+        stream=True,
+        timeout=60.0,
+        max_retries=3
+    )
+    ```
+
+    **Supported Environment Variables:**
+    - `AZURE_API_KEY`: Your Azure API key (required)
+    - `AZURE_ENDPOINT`: Your Azure endpoint URL (required, also checks `AZURE_OPENAI_ENDPOINT` and `AZURE_API_BASE`)
+    - `AZURE_API_VERSION`: API version (optional, defaults to `2024-06-01`)
+
+    **Features:**
+    - Native function calling support for Azure OpenAI models (gpt-4, gpt-4o, gpt-3.5-turbo, etc.)
+    - Streaming support for real-time responses
+    - Automatic endpoint URL validation and correction
+    - Comprehensive error handling with retry logic
+    - Token usage tracking
+
+    **Note:** To use Azure AI Inference, install the required dependencies:
+    ```bash
+    uv add "crewai[azure-ai-inference]"
+    ```
  </Accordion>

  <Accordion title="AWS Bedrock">
+    CrewAI provides native integration with AWS Bedrock through the boto3 SDK using the Converse API.
+
    ```toml Code
+    # Required
    AWS_ACCESS_KEY_ID=<your-access-key>
    AWS_SECRET_ACCESS_KEY=<your-secret-key>
-    AWS_DEFAULT_REGION=<your-region>
+
+    # Optional
+    AWS_SESSION_TOKEN=<your-session-token>  # For temporary credentials
+    AWS_DEFAULT_REGION=<your-region>  # Defaults to us-east-1
    ```

-    Example usage in your CrewAI project:
+    **Basic Usage:**
    ```python Code
+    from crewai import LLM
+
    llm = LLM(
-        model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
+        model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
+        region_name="us-east-1"
    )
    ```

-    Before using Amazon Bedrock, make sure you have boto3 installed in your environment
+    **Advanced Configuration:**
+    ```python Code
+    from crewai import LLM

-    [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) is a managed service that provides access to multiple foundation models from top AI companies through a unified API, enabling secure and responsible AI application development.
+    llm = LLM(
+        model="bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0",
+        aws_access_key_id="your-access-key",  # Or set AWS_ACCESS_KEY_ID
+        aws_secret_access_key="your-secret-key",  # Or set AWS_SECRET_ACCESS_KEY
+        aws_session_token="your-session-token",  # For temporary credentials
+        region_name="us-east-1",
+        temperature=0.7,
+        max_tokens=4096,
+        top_p=0.9,
+        top_k=250,  # For Claude models
+        stop_sequences=["END", "STOP"],
+        stream=True,  # Enable streaming
+        guardrail_config={  # Optional content filtering
+            "guardrailIdentifier": "your-guardrail-id",
+            "guardrailVersion": "1"
+        },
+        additional_model_request_fields={  # Model-specific parameters
+            "top_k": 250
+        }
+    )
+    ```
+
+    **Supported Environment Variables:**
+    - `AWS_ACCESS_KEY_ID`: AWS access key (required)
+    - `AWS_SECRET_ACCESS_KEY`: AWS secret key (required)
+    - `AWS_SESSION_TOKEN`: AWS session token for temporary credentials (optional)
+    - `AWS_DEFAULT_REGION`: AWS region (defaults to `us-east-1`)
+
+    **Features:**
+    - Native tool calling support via Converse API
+    - Streaming and non-streaming responses
+    - Comprehensive error handling with retry logic
+    - Guardrail configuration for content filtering
+    - Model-specific parameters via `additional_model_request_fields`
+    - Token usage tracking and stop reason logging
+    - Support for all Bedrock foundation models
+    - Automatic conversation format handling
+
+    **Important Notes:**
+    - Uses the modern Converse API for unified model access
+    - Automatic handling of model-specific conversation requirements
+    - System messages are handled separately from conversation
+    - First message must be from user (automatically handled)
+    - Some models (like Cohere) require conversation to end with user message
+
+    [Amazon Bedrock](https://docs.aws.amazon.com/bedrock/latest/userguide/models-regions.html) is a managed service that provides access to multiple foundation models from top AI companies through a unified API.

    | Model                   | Context Window       | Best For                                                          |
    |-------------------------|----------------------|-------------------------------------------------------------------|
@@ -357,7 +637,12 @@ In this section, you'll find detailed examples that help you select, configure,
    | Jamba-Instruct          | Up to 256k tokens    | Model with extended context window optimized for cost-effective text generation, summarization, and Q&A. |
    | Mistral 7B Instruct     | Up to 32k tokens     | This LLM follows instructions, completes requests, and generates creative text. |
    | Mistral 8x7B Instruct   | Up to 32k tokens     | An MOE LLM that follows instructions, completes requests, and generates creative text. |
+    | DeepSeek R1             | 32,768 tokens        | Advanced reasoning model                                                       |

+    **Note:** To use AWS Bedrock, install the required dependencies:
+    ```bash
+    uv add "crewai[bedrock]"
+    ```
  </Accordion>

  <Accordion title="Amazon SageMaker">
@@ -899,7 +1184,7 @@ Learn how to get the most out of your LLM configuration:
  </Accordion>

  <Accordion title="Drop Additional Parameters">
-    CrewAI internally uses Litellm for LLM calls, which allows you to drop additional parameters that are not needed for your specific use case. This can help simplify your code and reduce the complexity of your LLM configuration.
+    CrewAI internally uses native sdks for LLM calls, which allows you to drop additional parameters that are not needed for your specific use case. This can help simplify your code and reduce the complexity of your LLM configuration.
    For example, if you don't need to send the <code>stop</code> parameter, you can simply omit it from your LLM call:

    ```python