Files
crewAI/docs/en/concepts/llms.mdx
2026-02-19 11:33:49 -08:00

568 lines
18 KiB
Plaintext

---
title: 'LLMs'
description: 'A comprehensive guide to configuring and using Large Language Models (LLMs) in your CrewAI projects'
icon: 'microchip-ai'
mode: "wide"
---
## Overview
CrewAI integrates with multiple LLM providers through providers native sdks, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.
## When to Use Advanced LLM Configuration
- You need strict control of latency, cost, and output format.
- You need model routing by task type.
- You need reproducible, policy-sensitive behavior in production.
## When Not to Over-Configure
- You are in early prototyping with one simple task path.
- You do not yet need structured outputs or model routing.
## What are LLMs?
Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here's what you need to know:
<CardGroup cols={2}>
<Card title="LLM Basics" icon="brain">
Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.
</Card>
<Card title="Context Window" icon="window">
The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.
</Card>
<Card title="Temperature" icon="temperature-three-quarters">
Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
</Card>
<Card title="Provider Selection" icon="server">
Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.
</Card>
</CardGroup>
## Setting up your LLM
There are different places in CrewAI code where you can specify the model to use. Once you specify the model you are using, you will need to provide the configuration (like an API key) for each of the model providers you use. See the [provider configuration examples](#provider-configuration-examples) section for your provider.
<Tabs>
<Tab title="1. Environment Variables">
The simplest way to get started. Set the model in your environment directly, through an `.env` file or in your app code. If you used `crewai create` to bootstrap your project, it will be set already.
```bash .env
MODEL=model-id # e.g. gpt-4o, gemini-2.0-flash, claude-3-sonnet-...
# Be sure to set your API keys here too. See the Provider
# section below.
```
<Warning>
Never commit API keys to version control. Use environment files (.env) or your system's secret management.
</Warning>
</Tab>
<Tab title="2. YAML Configuration">
Create a YAML file to define your agent configurations. This method is great for version control and team collaboration:
```yaml agents.yaml {6}
researcher:
role: Research Specialist
goal: Conduct comprehensive research and analysis
backstory: A dedicated research professional with years of experience
verbose: true
llm: provider/model-id # e.g. openai/gpt-4o, google/gemini-2.0-flash, anthropic/claude...
# (see provider configuration examples below for more)
```
<Info>
The YAML configuration allows you to:
- Version control your agent settings
- Easily switch between different models
- Share configurations across team members
- Document model choices and their purposes
</Info>
</Tab>
<Tab title="3. Direct Code">
For maximum flexibility, configure LLMs directly in your Python code:
```python {4,8}
from crewai import LLM
# Basic configuration
llm = LLM(model="model-id-here") # gpt-4o, gemini-2.0-flash, anthropic/claude...
# Advanced configuration with detailed parameters
llm = LLM(
model="model-id-here", # gpt-4o, gemini-2.0-flash, anthropic/claude...
temperature=0.7, # Higher for more creative outputs
timeout=120, # Seconds to wait for response
max_tokens=4000, # Maximum length of response
top_p=0.9, # Nucleus sampling parameter
frequency_penalty=0.1 , # Reduce repetition
presence_penalty=0.1, # Encourage topic diversity
response_format={"type": "json"}, # For structured outputs
seed=42 # For reproducible results
)
```
<Info>
Parameter explanations:
- `temperature`: Controls randomness (0.0-1.0)
- `timeout`: Maximum wait time for response
- `max_tokens`: Limits response length
- `top_p`: Alternative to temperature for sampling
- `frequency_penalty`: Reduces word repetition
- `presence_penalty`: Encourages new topics
- `response_format`: Specifies output structure
- `seed`: Ensures consistent outputs
</Info>
</Tab>
</Tabs>
## Production LLM Patterns
The basics above show how to configure one model. In real systems, you usually combine several LLM patterns for cost, quality, and reliability.
### Pattern 1: Route models by agent role
Use faster/cheaper models for extraction and heavier models for synthesis or critical decisions.
```python Code
from crewai import Agent, Crew, Process, Task
researcher = Agent(
role="Researcher",
goal="Collect factual inputs quickly",
backstory="Fast information-gathering specialist",
llm="openai/gpt-4o-mini",
)
reviewer = Agent(
role="Reviewer",
goal="Validate claims and produce final answer",
backstory="Careful editor focused on correctness",
llm="provider/model-id",
)
crew = Crew(
agents=[researcher, reviewer],
tasks=[
Task(
description="Find the latest policy changes and list the key points",
expected_output="Bullet list of validated policy changes",
agent=researcher,
),
Task(
description="Review findings and produce a final executive summary",
expected_output="Concise, decision-ready summary",
agent=reviewer,
),
],
process=Process.sequential,
)
```
### Pattern 2: Set reliability defaults once
Configure retry, timeout, and deterministic sampling in one reusable `LLM` object.
```python Code
from crewai import LLM
reliable_llm = LLM(
model="openai/gpt-4o-mini",
temperature=0.1,
timeout=45,
max_retries=3,
max_tokens=1200,
seed=7,
)
```
Use this for extraction, classification, and policy-sensitive tasks where variance should be low.
### Pattern 3: Use structured outputs for machine-readable responses
For downstream automation, force JSON-shaped outputs rather than free-form prose.
```python Code
from crewai import LLM
json_llm = LLM(
model="openai/gpt-4o",
response_format={"type": "json"},
temperature=0.0,
)
```
This reduces parser fragility in pipelines that feed APIs, databases, or workflow routers.
### Pattern 4: Use OpenAI Responses API for multi-turn reasoning flows
When you need built-in tools, response chaining, or reasoning-model workflows, enable the Responses API explicitly.
```python Code
from crewai import LLM
reasoning_llm = LLM(
model="openai/o4-mini",
api="responses",
auto_chain=True,
store=True,
reasoning_effort="medium",
)
```
This is especially useful in long-running assistants where you want conversation continuity and controllable reasoning depth.
## Provider Configuration
For concept-level usage, keep provider setup minimal and explicit:
1. Set provider credentials via environment variables.
2. Pin model IDs explicitly in code or YAML.
3. Set reliability defaults (`timeout`, `max_retries`, low `temperature`) for production.
Use these pages for deeper provider setup and runtime decisions:
- Connections and provider setup: [/en/learn/llm-connections](/en/learn/llm-connections)
- Custom provider integration: [/en/learn/custom-llm](/en/learn/custom-llm)
- Production routing and reliability patterns: [/en/ai/llms/patterns](/en/ai/llms/patterns)
- Parameter contract reference: [/en/ai/llms/reference](/en/ai/llms/reference)
## Streaming Responses
CrewAI supports streaming responses from LLMs, allowing your application to receive and process outputs in real-time as they're generated.
<Tabs>
<Tab title="Basic Setup">
Enable streaming by setting the `stream` parameter to `True` when initializing your LLM:
```python
from crewai import LLM
# Create an LLM with streaming enabled
llm = LLM(
model="openai/gpt-4o",
stream=True # Enable streaming
)
```
When streaming is enabled, responses are delivered in chunks as they're generated, creating a more responsive user experience.
</Tab>
<Tab title="Event Handling">
CrewAI emits events for each chunk received during streaming:
```python
from crewai.events import (
LLMStreamChunkEvent
)
from crewai.events import BaseEventListener
class MyCustomListener(BaseEventListener):
def setup_listeners(self, crewai_event_bus):
@crewai_event_bus.on(LLMStreamChunkEvent)
def on_llm_stream_chunk(self, event: LLMStreamChunkEvent):
# Process each chunk as it arrives
print(f"Received chunk: {event.chunk}")
my_listener = MyCustomListener()
```
<Tip>
[Click here](/en/concepts/event-listener#event-listeners) for more details
</Tip>
</Tab>
<Tab title="Agent & Task Tracking">
All LLM events in CrewAI include agent and task information, allowing you to track and filter LLM interactions by specific agents or tasks:
```python
from crewai import LLM, Agent, Task, Crew
from crewai.events import LLMStreamChunkEvent
from crewai.events import BaseEventListener
class MyCustomListener(BaseEventListener):
def setup_listeners(self, crewai_event_bus):
@crewai_event_bus.on(LLMStreamChunkEvent)
def on_llm_stream_chunk(source, event):
if researcher.id == event.agent_id:
print("\n==============\n Got event:", event, "\n==============\n")
my_listener = MyCustomListener()
llm = LLM(model="gpt-4o-mini", temperature=0, stream=True)
researcher = Agent(
role="About User",
goal="You know everything about the user.",
backstory="""You are a master at understanding people and their preferences.""",
llm=llm,
)
search = Task(
description="Answer the following questions about the user: {question}",
expected_output="An answer to the question.",
agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[search])
result = crew.kickoff(
inputs={"question": "..."}
)
```
<Info>
This feature is particularly useful for:
- Debugging specific agent behaviors
- Logging LLM usage by task type
- Auditing which agents are making what types of LLM calls
- Performance monitoring of specific tasks
</Info>
</Tab>
</Tabs>
## Async LLM Calls
CrewAI supports asynchronous LLM calls for improved performance and concurrency in your AI workflows. Async calls allow you to run multiple LLM requests concurrently without blocking, making them ideal for high-throughput applications and parallel agent operations.
<Tabs>
<Tab title="Basic Usage">
Use the `acall` method for asynchronous LLM requests:
```python
import asyncio
from crewai import LLM
async def main():
llm = LLM(model="openai/gpt-4o")
# Single async call
response = await llm.acall("What is the capital of France?")
print(response)
asyncio.run(main())
```
The `acall` method supports all the same parameters as the synchronous `call` method, including messages, tools, and callbacks.
</Tab>
<Tab title="With Streaming">
Combine async calls with streaming for real-time concurrent responses:
```python
import asyncio
from crewai import LLM
async def stream_async():
llm = LLM(model="openai/gpt-4o", stream=True)
response = await llm.acall("Write a short story about AI")
print(response)
asyncio.run(stream_async())
```
</Tab>
</Tabs>
## Structured LLM Calls
CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
```python Code
from crewai import LLM
class Dog(BaseModel):
name: str
age: int
breed: str
llm = LLM(model="gpt-4o", response_format=Dog)
response = llm.call(
"Analyze the following messages and return the name, age, and breed. "
"Meet Kona! She is 3 years old and is a black german shepherd."
)
print(response)
# Output:
# Dog(name='Kona', age=3, breed='black german shepherd')
```
## Advanced Features and Optimization
Learn how to get the most out of your LLM configuration:
<AccordionGroup>
<Accordion title="Context Window Management">
CrewAI includes smart context management features:
```python
from crewai import LLM
# CrewAI automatically handles:
# 1. Token counting and tracking
# 2. Content summarization when needed
# 3. Task splitting for large contexts
llm = LLM(
model="gpt-4",
max_tokens=4000, # Limit response length
)
```
<Info>
Best practices for context management:
1. Choose models with appropriate context windows
2. Pre-process long inputs when possible
3. Use chunking for large documents
4. Monitor token usage to optimize costs
</Info>
</Accordion>
<Accordion title="Performance Optimization">
<Steps>
<Step title="Token Usage Optimization">
Choose the right context window for your task:
- Small tasks (up to 4K tokens): Standard models
- Medium tasks (between 4K-32K): Enhanced models
- Large tasks (over 32K): Large context models
```python
# Configure model with appropriate settings
llm = LLM(
model="openai/gpt-4-turbo-preview",
temperature=0.7, # Adjust based on task
max_tokens=4096, # Set based on output needs
timeout=300 # Longer timeout for complex tasks
)
```
<Tip>
- Lower temperature (0.1 to 0.3) for factual responses
- Higher temperature (0.7 to 0.9) for creative tasks
</Tip>
</Step>
<Step title="Best Practices">
1. Monitor token usage
2. Implement rate limiting
3. Use caching when possible
4. Set appropriate max_tokens limits
</Step>
</Steps>
<Info>
Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
</Info>
</Accordion>
<Accordion title="Drop Additional Parameters">
CrewAI internally uses native sdks for LLM calls, which allows you to drop additional parameters that are not needed for your specific use case. This can help simplify your code and reduce the complexity of your LLM configuration.
For example, if you don't need to send the <code>stop</code> parameter, you can simply omit it from your LLM call:
```python
from crewai import LLM
import os
os.environ["OPENAI_API_KEY"] = "<api-key>"
o3_llm = LLM(
model="o3",
drop_params=True,
additional_drop_params=["stop"]
)
```
</Accordion>
<Accordion title="Transport Interceptors">
CrewAI provides message interceptors for several providers, allowing you to hook into request/response cycles at the transport layer.
**Supported Providers:**
- ✅ OpenAI
- ✅ Anthropic
**Basic Usage:**
```python
import httpx
from crewai import LLM
from crewai.llms.hooks import BaseInterceptor
class CustomInterceptor(BaseInterceptor[httpx.Request, httpx.Response]):
"""Custom interceptor to modify requests and responses."""
def on_outbound(self, request: httpx.Request) -> httpx.Request:
"""Print request before sending to the LLM provider."""
print(request)
return request
def on_inbound(self, response: httpx.Response) -> httpx.Response:
"""Process response after receiving from the LLM provider."""
print(f"Status: {response.status_code}")
print(f"Response time: {response.elapsed}")
return response
# Use the interceptor with an LLM
llm = LLM(
model="openai/gpt-4o",
interceptor=CustomInterceptor()
)
```
**Important Notes:**
- Both methods must return the received object or type of object.
- Modifying received objects may result in unexpected behavior or application crashes.
- Not all providers support interceptors - check the supported providers list above
<Info>
Interceptors operate at the transport layer. This is particularly useful for:
- Message transformation and filtering
- Debugging API interactions
</Info>
</Accordion>
</AccordionGroup>
## Common Issues and Solutions
<Tabs>
<Tab title="Authentication">
<Warning>
Most authentication issues can be resolved by checking API key format and environment variable names.
</Warning>
```bash
# OpenAI
OPENAI_API_KEY=sk-...
# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
```
</Tab>
<Tab title="Model Names">
<Check>
Always include the provider prefix in model names
</Check>
```python
# Correct
llm = LLM(model="openai/gpt-4")
# Incorrect
llm = LLM(model="gpt-4")
```
</Tab>
<Tab title="Context Length">
<Tip>
Use larger context models for extensive tasks
</Tip>
```python
# Large context model
llm = LLM(model="openai/gpt-4o") # 128K tokens
```
</Tab>
</Tabs>