crewAI/docs/concepts/llms.mdx

---
title: 'LLMs'
description: 'A comprehensive guide to configuring and using Large Language Models (LLMs) in your CrewAI projects'
icon: 'microchip-ai'
---

<Note>
  CrewAI integrates with multiple LLM providers through LiteLLM, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.
</Note>

## What are LLMs?

Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here's what you need to know:

<CardGroup cols={2}>
  <Card title="LLM Basics" icon="brain">
    Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.
  </Card>
  <Card title="Context Window" icon="window">
    The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.
  </Card>
  <Card title="Temperature" icon="temperature-three-quarters">
    Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
  </Card>
  <Card title="Provider Selection" icon="server">
    Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.
  </Card>
</CardGroup>

## Available Models and Their Capabilities

Here's a detailed breakdown of supported models and their capabilities, you can compare performance at [lmarena.ai](https://lmarena.ai/):

<Tabs>
  <Tab title="OpenAI">
    | Model | Context Window | Best For |
    |-------|---------------|-----------|
    | GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning |
    | GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis |
    | GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing |

    <Note>
      1 token ≈ 4 characters in English. For example, 8,192 tokens ≈ 32,768 characters or about 6,000 words.
    </Note>
  </Tab>
  <Tab title="Gemini">
    | Model | Context Window | Best For |
    |-------|---------------|-----------|
    | Gemini 1.5 Flash | 1M tokens | Balanced multimodal model, good for most tasks |
    | Gemini 1.5 Flash 8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks |
    | Gemini 1.5 Pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |

    <Tip>
      Google's Gemini models are all multimodal, supporting audio, images, video and text, supporting context caching, json schema, function calling, etc.
    </Tip>
  </Tab>
  <Tab title="Groq">
    | Model | Context Window | Best For |
    |-------|---------------|-----------|
    | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks |
    | Llama 3.2 Series | 8,192 tokens | General-purpose tasks |
    | Mixtral 8x7B | 32,768 tokens | Balanced performance and context |
    | Gemma Series | 8,192 tokens | Efficient, smaller-scale tasks |

    <Tip>
      Groq is known for its fast inference speeds, making it suitable for real-time applications.
    </Tip>
  </Tab>
  <Tab title="Others">
    | Provider | Context Window | Key Features |
    |----------|---------------|--------------|
    | Deepseek Chat | 128,000 tokens | Specialized in technical discussions |
    | Claude 3 | Up to 200K tokens | Strong reasoning, code understanding |
    | Gemini | Varies by model | Multimodal capabilities |

    <Info>
      Provider selection should consider factors like:
      - API availability in your region
      - Pricing structure
      - Required features (e.g., streaming, function calling)
      - Performance requirements
    </Info>
  </Tab>
</Tabs>

## Setting Up Your LLM

There are three ways to configure LLMs in CrewAI. Choose the method that best fits your workflow:

<Tabs>
  <Tab title="1. Environment Variables">
    The simplest way to get started. Set these variables in your environment:

    ```bash
    # Required: Your API key for authentication
    OPENAI_API_KEY=<your-api-key>

    # Optional: Default model selection
    OPENAI_MODEL_NAME=gpt-4o-mini  # Default if not set

    # Optional: Organization ID (if applicable)
    OPENAI_ORGANIZATION_ID=<your-org-id>
    ```

    <Warning>
      Never commit API keys to version control. Use environment files (.env) or your system's secret management.
    </Warning>
  </Tab>
  <Tab title="2. YAML Configuration">
    Create a YAML file to define your agent configurations. This method is great for version control and team collaboration:

    ```yaml
    researcher:
        # Agent Definition
        role: Research Specialist
        goal: Conduct comprehensive research and analysis
        backstory: A dedicated research professional with years of experience
        verbose: true

        # Model Selection (uncomment your choice)

        # OpenAI Models - Known for reliability and performance
        llm: openai/gpt-4o-mini
        # llm: openai/gpt-4        # More accurate but expensive
        # llm: openai/gpt-4-turbo  # Fast with large context
        # llm: openai/gpt-4o       # Optimized for longer texts
        # llm: openai/o1-preview   # Latest features
        # llm: openai/o1-mini      # Cost-effective

        # Azure Models - For enterprise deployments
        # llm: azure/gpt-4o-mini
        # llm: azure/gpt-4
        # llm: azure/gpt-35-turbo

        # Anthropic Models - Strong reasoning capabilities
        # llm: anthropic/claude-3-opus-20240229-v1:0
        # llm: anthropic/claude-3-sonnet-20240229-v1:0
        # llm: anthropic/claude-3-haiku-20240307-v1:0
        # llm: anthropic/claude-2.1
        # llm: anthropic/claude-2.0

        # Google Models - Strong reasoning, large cachable context window, multimodal
        # llm: gemini/gemini-1.5-pro-latest
        # llm: gemini/gemini-1.5-flash-latest
        # llm: gemini/gemini-1.5-flash-8b-latest

        # AWS Bedrock Models - Enterprise-grade
        # llm: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
        # llm: bedrock/anthropic.claude-v2:1
        # llm: bedrock/amazon.titan-text-express-v1
        # llm: bedrock/meta.llama2-70b-chat-v1

        # Mistral Models - Open source alternative
        # llm: mistral/mistral-large-latest
        # llm: mistral/mistral-medium-latest
        # llm: mistral/mistral-small-latest

        # Groq Models - Fast inference
        # llm: groq/mixtral-8x7b-32768
        # llm: groq/llama-3.1-70b-versatile
        # llm: groq/llama-3.2-90b-text-preview
        # llm: groq/gemma2-9b-it
        # llm: groq/gemma-7b-it

        # IBM watsonx.ai Models - Enterprise features
        # llm: watsonx/ibm/granite-13b-chat-v2
        # llm: watsonx/meta-llama/llama-3-1-70b-instruct
        # llm: watsonx/bigcode/starcoder2-15b

        # Ollama Models - Local deployment
        # llm: ollama/llama3:70b
        # llm: ollama/codellama
        # llm: ollama/mistral
        # llm: ollama/mixtral
        # llm: ollama/phi

        # Fireworks AI Models - Specialized tasks
        # llm: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct
        # llm: fireworks_ai/accounts/fireworks/models/mixtral-8x7b
        # llm: fireworks_ai/accounts/fireworks/models/zephyr-7b-beta

        # Perplexity AI Models - Research focused
        # llm: pplx/llama-3.1-sonar-large-128k-online
        # llm: pplx/mistral-7b-instruct
        # llm: pplx/codellama-34b-instruct
        # llm: pplx/mixtral-8x7b-instruct

        # Hugging Face Models - Community models
        # llm: huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct
        # llm: huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
        # llm: huggingface/tiiuae/falcon-180B-chat
        # llm: huggingface/google/gemma-7b-it

        # Nvidia NIM Models - GPU-optimized
        # llm: nvidia_nim/meta/llama3-70b-instruct
        # llm: nvidia_nim/mistral/mixtral-8x7b
        # llm: nvidia_nim/google/gemma-7b

        # SambaNova Models - Enterprise AI
        # llm: sambanova/Meta-Llama-3.1-8B-Instruct
        # llm: sambanova/BioMistral-7B
        # llm: sambanova/Falcon-180B
    ```

    <Info>
      The YAML configuration allows you to:
      - Version control your agent settings
      - Easily switch between different models
      - Share configurations across team members
      - Document model choices and their purposes
    </Info>
  </Tab>
  <Tab title="3. Direct Code">
    For maximum flexibility, configure LLMs directly in your Python code:

    ```python
    from crewai import LLM

    # Basic configuration
    llm = LLM(model="gpt-4")

    # Advanced configuration with detailed parameters
    llm = LLM(
        model="gpt-4o-mini",
        temperature=0.7,        # Higher for more creative outputs
        timeout=120,           # Seconds to wait for response
        max_tokens=4000,       # Maximum length of response
        top_p=0.9,            # Nucleus sampling parameter
        frequency_penalty=0.1, # Reduce repetition
        presence_penalty=0.1,  # Encourage topic diversity
        response_format={"type": "json"},  # For structured outputs
        seed=42               # For reproducible results
    )
    ```

    <Info>
      Parameter explanations:
      - `temperature`: Controls randomness (0.0-1.0)
      - `timeout`: Maximum wait time for response
      - `max_tokens`: Limits response length
      - `top_p`: Alternative to temperature for sampling
      - `frequency_penalty`: Reduces word repetition
      - `presence_penalty`: Encourages new topics
      - `response_format`: Specifies output structure
      - `seed`: Ensures consistent outputs
    </Info>
  </Tab>
</Tabs>

## Advanced Features and Optimization

Learn how to get the most out of your LLM configuration:

<AccordionGroup>
  <Accordion title="Context Window Management">
    CrewAI includes smart context management features:

    ```python
    from crewai import LLM

    # CrewAI automatically handles:
    # 1. Token counting and tracking
    # 2. Content summarization when needed
    # 3. Task splitting for large contexts

    llm = LLM(
        model="gpt-4",
        max_tokens=4000,  # Limit response length
    )
    ```

    <Info>
      Best practices for context management:
      1. Choose models with appropriate context windows
      2. Pre-process long inputs when possible
      3. Use chunking for large documents
      4. Monitor token usage to optimize costs
    </Info>
  </Accordion>

  <Accordion title="Performance Optimization">
    <Steps>
      <Step title="Token Usage Optimization">
        Choose the right context window for your task:
        - Small tasks (up to 4K tokens): Standard models
        - Medium tasks (between 4K-32K): Enhanced models
        - Large tasks (over 32K): Large context models

        ```python
        # Configure model with appropriate settings
        llm = LLM(
            model="openai/gpt-4-turbo-preview",
            temperature=0.7,    # Adjust based on task
            max_tokens=4096,    # Set based on output needs
            timeout=300        # Longer timeout for complex tasks
        )
        ```
        <Tip>
          - Lower temperature (0.1 to 0.3) for factual responses
          - Higher temperature (0.7 to 0.9) for creative tasks
        </Tip>
      </Step>

      <Step title="Best Practices">
        1. Monitor token usage
        2. Implement rate limiting
        3. Use caching when possible
        4. Set appropriate max_tokens limits
      </Step>
    </Steps>

    <Info>
      Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
    </Info>
  </Accordion>
</AccordionGroup>

## Provider Configuration Examples

<AccordionGroup>
  <Accordion title="OpenAI">
    ```python Code
    # Required
    OPENAI_API_KEY=sk-...

    # Optional
    OPENAI_API_BASE=<custom-base-url>
    OPENAI_ORGANIZATION=<your-org-id>
    ```

    Example usage:
    ```python Code
    from crewai import LLM

    llm = LLM(
        model="gpt-4",
        temperature=0.8,
        max_tokens=150,
        top_p=0.9,
        frequency_penalty=0.1,
        presence_penalty=0.1,
        stop=["END"],
        seed=42
    )
    ```
  </Accordion>

  <Accordion title="Anthropic">
    ```python Code
    ANTHROPIC_API_KEY=sk-ant-...
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="anthropic/claude-3-sonnet-20240229-v1:0",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="Google">
    ```python Code
    # Option 1. Gemini accessed with an API key.
    # https://ai.google.dev/gemini-api/docs/api-key
    GEMINI_API_KEY=<your-api-key>

    # Option 2. Vertex AI IAM credentials for Gemini, Anthropic, and anything in the Model Garden.
    # https://cloud.google.com/vertex-ai/generative-ai/docs/overview
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="gemini/gemini-1.5-pro-latest",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="Azure">
    ```python Code
    # Required
    AZURE_API_KEY=<your-api-key>
    AZURE_API_BASE=<your-resource-url>
    AZURE_API_VERSION=<api-version>

    # Optional
    AZURE_AD_TOKEN=<your-azure-ad-token>
    AZURE_API_TYPE=<your-azure-api-type>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="azure/gpt-4",
        api_version="2023-05-15"
    )
    ```
  </Accordion>

  <Accordion title="AWS Bedrock">
    ```python Code
    AWS_ACCESS_KEY_ID=<your-access-key>
    AWS_SECRET_ACCESS_KEY=<your-secret-key>
    AWS_DEFAULT_REGION=<your-region>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
    )
    ```
  </Accordion>

  <Accordion title="Mistral">
    ```python Code
    MISTRAL_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="mistral/mistral-large-latest",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="Groq">
    ```python Code
    GROQ_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="groq/llama-3.2-90b-text-preview",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="IBM watsonx.ai">
    ```python Code
    # Required
    WATSONX_URL=<your-url>
    WATSONX_APIKEY=<your-apikey>
    WATSONX_PROJECT_ID=<your-project-id>

    # Optional
    WATSONX_TOKEN=<your-token>
    WATSONX_DEPLOYMENT_SPACE_ID=<your-space-id>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="watsonx/meta-llama/llama-3-1-70b-instruct",
        base_url="https://api.watsonx.ai/v1"
    )
    ```
  </Accordion>

  <Accordion title="Ollama (Local LLMs)">
    1. Install Ollama: [ollama.ai](https://ollama.ai/)
    2. Run a model: `ollama run llama2`
    3. Configure:

    ```python Code
    llm = LLM(
        model="ollama/llama3:70b",
        base_url="http://localhost:11434"
    )
    ```
  </Accordion>

  <Accordion title="Fireworks AI">
    ```python Code
    FIREWORKS_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="Perplexity AI">
    ```python Code
    PERPLEXITY_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="llama-3.1-sonar-large-128k-online",
        base_url="https://api.perplexity.ai/"
    )
    ```
  </Accordion>

  <Accordion title="Hugging Face">
    ```python Code
    HUGGINGFACE_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
        base_url="your_api_endpoint"
    )
    ```
  </Accordion>

  <Accordion title="Nvidia NIM">
    ```python Code
    NVIDIA_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="nvidia_nim/meta/llama3-70b-instruct",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="SambaNova">
    ```python Code
    SAMBANOVA_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="sambanova/Meta-Llama-3.1-8B-Instruct",
        temperature=0.7
    )
    ```
  </Accordion>

  <Accordion title="Cerebras">
    ```python Code
    # Required
    CEREBRAS_API_KEY=<your-api-key>
    ```

    Example usage:
    ```python Code
    llm = LLM(
        model="cerebras/llama3.1-70b",
        temperature=0.7,
        max_tokens=8192
    )
    ```

    <Info>
      Cerebras features:
      - Fast inference speeds
      - Competitive pricing
      - Good balance of speed and quality
      - Support for long context windows
    </Info>
  </Accordion>
</AccordionGroup>

## Common Issues and Solutions

<Tabs>
  <Tab title="Authentication">
    <Warning>
      Most authentication issues can be resolved by checking API key format and environment variable names.
    </Warning>

    ```bash
    # OpenAI
    OPENAI_API_KEY=sk-...

    # Anthropic
    ANTHROPIC_API_KEY=sk-ant-...
    ```
  </Tab>
  <Tab title="Model Names">
    <Check>
      Always include the provider prefix in model names
    </Check>

    ```python
    # Correct
    llm = LLM(model="openai/gpt-4")

    # Incorrect
    llm = LLM(model="gpt-4")
    ```
  </Tab>
  <Tab title="Context Length">
    <Tip>
      Use larger context models for extensive tasks
    </Tip>

    ```python
    # Large context model
    llm = LLM(model="openai/gpt-4o")  # 128K tokens
    ```
  </Tab>
</Tabs>

## Getting Help

If you need assistance, these resources are available:

<CardGroup cols={3}>
  <Card
    title="LiteLLM Documentation"
    href="https://docs.litellm.ai/docs/"
    icon="book"
  >
    Comprehensive documentation for LiteLLM integration and troubleshooting common issues.
  </Card>
  <Card
    title="GitHub Issues"
    href="https://github.com/joaomdmoura/crewAI/issues"
    icon="bug"
  >
    Report bugs, request features, or browse existing issues for solutions.
  </Card>
  <Card
    title="Community Forum"
    href="https://community.crewai.com"
    icon="comment-question"
  >
    Connect with other CrewAI users, share experiences, and get help from the community.
  </Card>
</CardGroup>

<Note>
  Best Practices for API Key Security:
  - Use environment variables or secure vaults
  - Never commit keys to version control
  - Rotate keys regularly
  - Use separate keys for development and production
  - Monitor key usage for unusual patterns
</Note>