mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-04-25 12:22:38 +00:00
568 lines
18 KiB
Plaintext
---
title: 'LLMs'
description: 'A comprehensive guide to configuring and using Large Language Models (LLMs) in your CrewAI projects'
icon: 'microchip-ai'
mode: "wide"
---

## Overview

CrewAI integrates with multiple LLM providers through each provider's native SDK, giving you the flexibility to choose the right model for your specific use case. This guide will help you understand how to configure and use different LLM providers in your CrewAI projects.

## When to Use Advanced LLM Configuration

- You need strict control of latency, cost, and output format.
- You need model routing by task type.
- You need reproducible, policy-sensitive behavior in production.
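
In practice, routing by task type is often just a lookup from task category to model ID. A minimal sketch of that idea (the model IDs and categories here are illustrative, not recommendations):

```python
# Illustrative task-type -> model routing table; the categories and
# model IDs are examples, not prescriptions.
ROUTING = {
    "extraction": "openai/gpt-4o-mini",  # fast, cheap, mechanical work
    "synthesis": "openai/gpt-4o",        # heavier model for final output
}

def model_for(task_type: str, default: str = "openai/gpt-4o-mini") -> str:
    """Look up the model ID for a task type, falling back to a cheap default."""
    return ROUTING.get(task_type, default)
```

Keeping the table in one place makes it easy to swap models from configuration without touching agent code.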

## When Not to Over-Configure

- You are in early prototyping with one simple task path.
- You do not yet need structured outputs or model routing.

## What are LLMs?

Large Language Models (LLMs) are the core intelligence behind CrewAI agents. They enable agents to understand context, make decisions, and generate human-like responses. Here's what you need to know:

<CardGroup cols={2}>
<Card title="LLM Basics" icon="brain">
Large Language Models are AI systems trained on vast amounts of text data. They power the intelligence of your CrewAI agents, enabling them to understand and generate human-like text.
</Card>
<Card title="Context Window" icon="window">
The context window determines how much text an LLM can process at once. Larger windows (e.g., 128K tokens) allow for more context but may be more expensive and slower.
</Card>
<Card title="Temperature" icon="temperature-three-quarters">
Temperature (0.0 to 1.0) controls response randomness. Lower values (e.g., 0.2) produce more focused, deterministic outputs, while higher values (e.g., 0.8) increase creativity and variability.
</Card>
<Card title="Provider Selection" icon="server">
Each LLM provider (e.g., OpenAI, Anthropic, Google) offers different models with varying capabilities, pricing, and features. Choose based on your needs for accuracy, speed, and cost.
</Card>
</CardGroup>

## Setting up your LLM

There are different places in CrewAI code where you can specify the model to use. Once you specify the model you are using, you will need to provide the configuration (like an API key) for each of the model providers you use. See the [provider configuration examples](#provider-configuration-examples) section for your provider.

<Tabs>
<Tab title="1. Environment Variables">
The simplest way to get started. Set the model in your environment directly, through an `.env` file, or in your app code. If you used `crewai create` to bootstrap your project, it will be set already.

```bash .env
MODEL=model-id  # e.g. gpt-4o, gemini-2.0-flash, claude-3-sonnet-...

# Be sure to set your API keys here too. See the Provider
# section below.
```

<Warning>
Never commit API keys to version control. Use environment files (.env) or your system's secret management.
</Warning>
</Tab>

<Tab title="2. YAML Configuration">
Create a YAML file to define your agent configurations. This method is great for version control and team collaboration:

```yaml agents.yaml {6}
researcher:
  role: Research Specialist
  goal: Conduct comprehensive research and analysis
  backstory: A dedicated research professional with years of experience
  verbose: true
  llm: provider/model-id  # e.g. openai/gpt-4o, google/gemini-2.0-flash, anthropic/claude...
  # (see provider configuration examples below for more)
```

<Info>
The YAML configuration allows you to:
- Version control your agent settings
- Easily switch between different models
- Share configurations across team members
- Document model choices and their purposes
</Info>
</Tab>

<Tab title="3. Direct Code">
For maximum flexibility, configure LLMs directly in your Python code:

```python {4,7}
from crewai import LLM

# Basic configuration
llm = LLM(model="model-id-here")  # gpt-4o, gemini-2.0-flash, anthropic/claude...

# Advanced configuration with detailed parameters
llm = LLM(
    model="model-id-here",  # gpt-4o, gemini-2.0-flash, anthropic/claude...
    temperature=0.7,        # Higher for more creative outputs
    timeout=120,            # Seconds to wait for response
    max_tokens=4000,        # Maximum length of response
    top_p=0.9,              # Nucleus sampling parameter
    frequency_penalty=0.1,  # Reduce repetition
    presence_penalty=0.1,   # Encourage topic diversity
    response_format={"type": "json"},  # For structured outputs
    seed=42                 # For reproducible results
)
```

<Info>
Parameter explanations:
- `temperature`: Controls randomness (0.0-1.0)
- `timeout`: Maximum wait time for response
- `max_tokens`: Limits response length
- `top_p`: Alternative to temperature for sampling
- `frequency_penalty`: Reduces word repetition
- `presence_penalty`: Encourages new topics
- `response_format`: Specifies output structure
- `seed`: Ensures consistent outputs
</Info>
</Tab>
</Tabs>

## Production LLM Patterns

The basics above show how to configure one model. In real systems, you usually combine several LLM patterns for cost, quality, and reliability.

### Pattern 1: Route models by agent role

Use faster/cheaper models for extraction and heavier models for synthesis or critical decisions.

```python Code
from crewai import Agent, Crew, Process, Task

researcher = Agent(
    role="Researcher",
    goal="Collect factual inputs quickly",
    backstory="Fast information-gathering specialist",
    llm="openai/gpt-4o-mini",
)

reviewer = Agent(
    role="Reviewer",
    goal="Validate claims and produce final answer",
    backstory="Careful editor focused on correctness",
    llm="provider/model-id",
)

crew = Crew(
    agents=[researcher, reviewer],
    tasks=[
        Task(
            description="Find the latest policy changes and list the key points",
            expected_output="Bullet list of validated policy changes",
            agent=researcher,
        ),
        Task(
            description="Review findings and produce a final executive summary",
            expected_output="Concise, decision-ready summary",
            agent=reviewer,
        ),
    ],
    process=Process.sequential,
)
```

### Pattern 2: Set reliability defaults once

Configure retry, timeout, and deterministic sampling in one reusable `LLM` object.

```python Code
from crewai import LLM

reliable_llm = LLM(
    model="openai/gpt-4o-mini",
    temperature=0.1,
    timeout=45,
    max_retries=3,
    max_tokens=1200,
    seed=7,
)
```

Use this for extraction, classification, and policy-sensitive tasks where variance should be low.
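
If you ever need the same behavior outside the `LLM` object, retry-with-backoff is straightforward to sketch. This is an illustration of what a `max_retries`-style setting buys you, not CrewAI's internal logic; the stub stands in for a flaky provider call, and a real deployment would use a non-zero `base_delay` (e.g. 0.5 seconds):

```python
import time

def call_with_retries(fn, max_retries=3, base_delay=0.0):
    """Retry a flaky call with exponential backoff; re-raise after the last attempt."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(base_delay * (2 ** attempt))  # backoff grows: d, 2d, 4d, ...

# Stub that fails twice, then succeeds -- stands in for a flaky provider call.
attempts = {"count": 0}
def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("transient provider error")
    return "ok"

result = call_with_retries(flaky_call, max_retries=3)
```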

### Pattern 3: Use structured outputs for machine-readable responses

For downstream automation, force JSON-shaped outputs rather than free-form prose.

```python Code
from crewai import LLM

json_llm = LLM(
    model="openai/gpt-4o",
    response_format={"type": "json"},
    temperature=0.0,
)
```

This reduces parser fragility in pipelines that feed APIs, databases, or workflow routers.
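
The payoff shows up in the consumer: downstream code can validate with strict checks instead of scraping prose. A minimal sketch of such a consumer (the `summary` field name is hypothetical, chosen for illustration):

```python
import json

def parse_structured(raw: str) -> dict:
    """Parse a JSON-mode response and validate the field the pipeline depends on."""
    data = json.loads(raw)  # raises ValueError on free-form prose
    if not isinstance(data.get("summary"), str):
        raise ValueError("missing or non-string 'summary' field")
    return data

# A well-formed JSON response passes; conversational prose fails immediately.
good = parse_structured('{"summary": "Rates unchanged.", "confidence": 0.9}')
```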

### Pattern 4: Use OpenAI Responses API for multi-turn reasoning flows

When you need built-in tools, response chaining, or reasoning-model workflows, enable the Responses API explicitly.

```python Code
from crewai import LLM

reasoning_llm = LLM(
    model="openai/o4-mini",
    api="responses",
    auto_chain=True,
    store=True,
    reasoning_effort="medium",
)
```

This is especially useful in long-running assistants where you want conversation continuity and controllable reasoning depth.

## Provider Configuration

For concept-level usage, keep provider setup minimal and explicit:

1. Set provider credentials via environment variables.
2. Pin model IDs explicitly in code or YAML.
3. Set reliability defaults (`timeout`, `max_retries`, low `temperature`) for production.
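
Step 1 is easy to enforce with a fail-fast check at startup, so a missing key surfaces before any crew runs. A sketch (the variable names follow the providers' conventions; extend the tuple for the providers you actually use):

```python
import os

REQUIRED_KEYS = ("OPENAI_API_KEY",)  # extend per provider you actually use

def check_credentials(env=os.environ):
    """Fail fast at startup if a required provider credential is unset or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
```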

Use these pages for deeper provider setup and runtime decisions:

- Connections and provider setup: [/en/learn/llm-connections](/en/learn/llm-connections)
- Custom provider integration: [/en/learn/custom-llm](/en/learn/custom-llm)
- Production routing and reliability patterns: [/en/ai/llms/patterns](/en/ai/llms/patterns)
- Parameter contract reference: [/en/ai/llms/reference](/en/ai/llms/reference)

## Streaming Responses

CrewAI supports streaming responses from LLMs, allowing your application to receive and process outputs in real time as they're generated.

<Tabs>
<Tab title="Basic Setup">
Enable streaming by setting the `stream` parameter to `True` when initializing your LLM:

```python
from crewai import LLM

# Create an LLM with streaming enabled
llm = LLM(
    model="openai/gpt-4o",
    stream=True  # Enable streaming
)
```

When streaming is enabled, responses are delivered in chunks as they're generated, creating a more responsive user experience.
</Tab>

<Tab title="Event Handling">
CrewAI emits events for each chunk received during streaming:

```python
from crewai.events import BaseEventListener, LLMStreamChunkEvent

class MyCustomListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMStreamChunkEvent)
        def on_llm_stream_chunk(source, event: LLMStreamChunkEvent):
            # Process each chunk as it arrives
            print(f"Received chunk: {event.chunk}")

my_listener = MyCustomListener()
```
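
A common listener job is reassembling the chunks into the full response. Stripped of the event-bus wiring, the accumulation itself is just a string join; in this sketch, plain strings stand in for `event.chunk` values:

```python
class ChunkCollector:
    """Accumulate streamed text chunks and expose the full response so far."""

    def __init__(self):
        self._chunks = []

    def on_chunk(self, chunk: str) -> None:
        self._chunks.append(chunk)

    @property
    def text(self) -> str:
        return "".join(self._chunks)

collector = ChunkCollector()
for piece in ["Hel", "lo, ", "world"]:  # stands in for streamed event.chunk values
    collector.on_chunk(piece)
```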

<Tip>
[Click here](/en/concepts/event-listener#event-listeners) for more details
</Tip>
</Tab>

<Tab title="Agent & Task Tracking">
All LLM events in CrewAI include agent and task information, allowing you to track and filter LLM interactions by specific agents or tasks:

```python
from crewai import LLM, Agent, Task, Crew
from crewai.events import BaseEventListener, LLMStreamChunkEvent

class MyCustomListener(BaseEventListener):
    def setup_listeners(self, crewai_event_bus):
        @crewai_event_bus.on(LLMStreamChunkEvent)
        def on_llm_stream_chunk(source, event):
            if researcher.id == event.agent_id:
                print("\n==============\n Got event:", event, "\n==============\n")

my_listener = MyCustomListener()

llm = LLM(model="gpt-4o-mini", temperature=0, stream=True)

researcher = Agent(
    role="About User",
    goal="You know everything about the user.",
    backstory="You are a master at understanding people and their preferences.",
    llm=llm,
)

search = Task(
    description="Answer the following questions about the user: {question}",
    expected_output="An answer to the question.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[search])

result = crew.kickoff(
    inputs={"question": "..."}
)
```

<Info>
This feature is particularly useful for:
- Debugging specific agent behaviors
- Logging LLM usage by task type
- Auditing which agents are making what types of LLM calls
- Performance monitoring of specific tasks
</Info>
</Tab>
</Tabs>

## Async LLM Calls

CrewAI supports asynchronous LLM calls for improved performance and concurrency in your AI workflows. Async calls allow you to run multiple LLM requests concurrently without blocking, making them ideal for high-throughput applications and parallel agent operations.

<Tabs>
<Tab title="Basic Usage">
Use the `acall` method for asynchronous LLM requests:

```python
import asyncio

from crewai import LLM

async def main():
    llm = LLM(model="openai/gpt-4o")

    # Single async call
    response = await llm.acall("What is the capital of France?")
    print(response)

asyncio.run(main())
```

The `acall` method supports all the same parameters as the synchronous `call` method, including messages, tools, and callbacks.
</Tab>

<Tab title="With Streaming">
Combine async calls with streaming for real-time concurrent responses:

```python
import asyncio

from crewai import LLM

async def stream_async():
    llm = LLM(model="openai/gpt-4o", stream=True)

    response = await llm.acall("Write a short story about AI")
    print(response)

asyncio.run(stream_async())
```
</Tab>
</Tabs>
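
The concurrency benefit comes from scheduling several `acall`-style coroutines at once with `asyncio.gather`. With a stub in place of the real LLM (so the sketch is self-contained), the pattern looks like this; the total wall-clock time is roughly one request's latency, not the sum:

```python
import asyncio

async def fake_acall(prompt: str) -> str:
    """Stand-in for llm.acall so the sketch is self-contained: simulates latency."""
    await asyncio.sleep(0.01)
    return f"answer to: {prompt}"

async def ask_all(prompts):
    """Schedule all requests concurrently and collect results in input order."""
    return await asyncio.gather(*(fake_acall(p) for p in prompts))

results = asyncio.run(ask_all(["q1", "q2", "q3"]))
```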

## Structured LLM Calls

CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.

For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.

```python Code
from pydantic import BaseModel

from crewai import LLM

class Dog(BaseModel):
    name: str
    age: int
    breed: str


llm = LLM(model="gpt-4o", response_format=Dog)

response = llm.call(
    "Analyze the following messages and return the name, age, and breed. "
    "Meet Kona! She is 3 years old and is a black german shepherd."
)
print(response)

# Output:
# Dog(name='Kona', age=3, breed='black german shepherd')
```
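
Conceptually, the structured call boils down to validating the model's JSON against the declared schema. A stdlib-only sketch of that check (not CrewAI's actual implementation, which delegates to Pydantic):

```python
import json

# Expected fields and types, mirroring the Dog model above.
DOG_FIELDS = {"name": str, "age": int, "breed": str}

def parse_dog(raw: str) -> dict:
    """Validate raw JSON output against the expected fields and types."""
    data = json.loads(raw)
    for field, expected_type in DOG_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} missing or not a {expected_type.__name__}")
    return data

dog = parse_dog('{"name": "Kona", "age": 3, "breed": "black german shepherd"}')
```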

## Advanced Features and Optimization

Learn how to get the most out of your LLM configuration:

<AccordionGroup>
<Accordion title="Context Window Management">
CrewAI includes smart context management features:

```python
from crewai import LLM

# CrewAI automatically handles:
# 1. Token counting and tracking
# 2. Content summarization when needed
# 3. Task splitting for large contexts

llm = LLM(
    model="gpt-4",
    max_tokens=4000,  # Limit response length
)
```

<Info>
Best practices for context management:
1. Choose models with appropriate context windows
2. Pre-process long inputs when possible
3. Use chunking for large documents
4. Monitor token usage to optimize costs
</Info>
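
Pre-processing and chunking can be sketched with a rough token estimate. The ~4-characters-per-token ratio below is a heuristic for English text, not a tokenizer; use a real tokenizer when you need billing-accurate counts:

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def chunk_text(text: str, max_tokens: int = 1000) -> list[str]:
    """Split text into pieces that each fit the token budget."""
    max_chars = max_tokens * 4
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

# 9000 characters at a 1000-token budget splits into three chunks.
chunks = chunk_text("x" * 9000, max_tokens=1000)
```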
</Accordion>

<Accordion title="Performance Optimization">
<Steps>
<Step title="Token Usage Optimization">
Choose the right context window for your task:
- Small tasks (up to 4K tokens): Standard models
- Medium tasks (between 4K-32K): Enhanced models
- Large tasks (over 32K): Large context models

```python
# Configure model with appropriate settings
llm = LLM(
    model="openai/gpt-4-turbo-preview",
    temperature=0.7,  # Adjust based on task
    max_tokens=4096,  # Set based on output needs
    timeout=300       # Longer timeout for complex tasks
)
```

<Tip>
- Lower temperature (0.1 to 0.3) for factual responses
- Higher temperature (0.7 to 0.9) for creative tasks
</Tip>
</Step>

<Step title="Best Practices">
1. Monitor token usage
2. Implement rate limiting
3. Use caching when possible
4. Set appropriate max_tokens limits
</Step>
</Steps>

<Info>
Remember to regularly monitor your token usage and adjust your configuration as needed to optimize costs and performance.
</Info>
</Accordion>

<Accordion title="Drop Additional Parameters">
CrewAI internally uses the providers' native SDKs for LLM calls, which allows you to drop additional parameters that are not needed for your specific use case. This can help simplify your code and reduce the complexity of your LLM configuration.

For example, if you don't need to send the `stop` parameter, you can simply omit it from your LLM call:

```python
from crewai import LLM
import os

os.environ["OPENAI_API_KEY"] = "<api-key>"

o3_llm = LLM(
    model="o3",
    drop_params=True,
    additional_drop_params=["stop"]
)
```
</Accordion>

<Accordion title="Transport Interceptors">
CrewAI provides message interceptors for several providers, allowing you to hook into request/response cycles at the transport layer.

**Supported Providers:**
- ✅ OpenAI
- ✅ Anthropic

**Basic Usage:**

```python
import httpx
from crewai import LLM
from crewai.llms.hooks import BaseInterceptor

class CustomInterceptor(BaseInterceptor[httpx.Request, httpx.Response]):
    """Custom interceptor to inspect requests and responses."""

    def on_outbound(self, request: httpx.Request) -> httpx.Request:
        """Inspect the request before it is sent to the LLM provider."""
        print(request)
        return request

    def on_inbound(self, response: httpx.Response) -> httpx.Response:
        """Process the response after it is received from the LLM provider."""
        print(f"Status: {response.status_code}")
        print(f"Response time: {response.elapsed}")
        return response

# Use the interceptor with an LLM
llm = LLM(
    model="openai/gpt-4o",
    interceptor=CustomInterceptor()
)
```

**Important Notes:**
- Both methods must return an object of the same type they received.
- Modifying received objects may result in unexpected behavior or application crashes.
- Not all providers support interceptors; check the supported providers list above.

<Info>
Interceptors operate at the transport layer. This is particularly useful for:
- Message transformation and filtering
- Debugging API interactions
</Info>
</Accordion>
</AccordionGroup>

## Common Issues and Solutions

<Tabs>
<Tab title="Authentication">
<Warning>
Most authentication issues can be resolved by checking API key format and environment variable names.
</Warning>

```bash
# OpenAI
OPENAI_API_KEY=sk-...

# Anthropic
ANTHROPIC_API_KEY=sk-ant-...
```
</Tab>

<Tab title="Model Names">
<Check>
Always include the provider prefix in model names
</Check>

```python
# Correct
llm = LLM(model="openai/gpt-4")

# Incorrect
llm = LLM(model="gpt-4")
```
</Tab>

<Tab title="Context Length">
<Tip>
Use larger context models for extensive tasks
</Tip>

```python
# Large context model
llm = LLM(model="openai/gpt-4o")  # 128K tokens
```
</Tab>
</Tabs>