diff --git a/docs/concepts/llms.mdx b/docs/concepts/llms.mdx
index 117face04..12061d1a6 100644
--- a/docs/concepts/llms.mdx
+++ b/docs/concepts/llms.mdx
@@ -27,157 +27,6 @@ Large Language Models (LLMs) are the core intelligence behind CrewAI agents. The
-## Available Models and Their Capabilities
-
-Here's a detailed breakdown of supported models and their capabilities, you can compare performance at [lmarena.ai](https://lmarena.ai/?leaderboard) and [artificialanalysis.ai](https://artificialanalysis.ai/):
-
-
-
- | Model | Context Window | Best For |
- |-------|---------------|-----------|
- | GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning |
- | GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis |
- | GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing |
- | o3-mini | 200,000 tokens | Fast reasoning, complex reasoning |
-
-
- 1 token ≈ 4 characters in English. For example, 8,192 tokens ≈ 32,768 characters or about 6,000 words.
-
-
-
- | Model | Context Window | Best For |
- |-------|---------------|-----------|
- | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation. |
- | nvidia/nemotron-4-mini-hindi-4b-instruct| 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language. |
- | "nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA in order to improve the helpfulness of LLM generated responses. |
- | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
- | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
- | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses |
- | nvidia/neva-22| 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses |
- | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks |
- | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. |
- | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
- | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. |
- | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. |
- | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
- | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
- | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
- | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
- | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
- | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
- | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
- | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
- | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
- | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
- | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. |
- | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. |
- | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. |
- | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. |
- | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. |
- | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. |
- | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. |
- | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. |
- | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. |
- | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. |
- | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
- | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. |
- | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
- | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
- | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
- | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. |
- | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. |
- | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
- | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments |
- | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on Mixture of Experts architecure to deliver compute efficient content generation |
- | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. |
- | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. |
- | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. |
- | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. |
- | snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. |
- | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia |
- | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. |
- | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. |
- | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI |
- | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification |
- | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
- | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. |
- | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
- | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
- | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing |
- | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. |
- | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling |
- | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
- | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
- | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Support Chinese and English chat, coding, math, instruction following, solving quizzes |
-
-
- NVIDIA's NIM support for models is expanding continuously! For the most up-to-date list of available models, please visit build.nvidia.com.
-
-
-
- | Model | Context Window | Best For |
- |-------|---------------|-----------|
- | gemini-2.0-flash-exp | 1M tokens | Higher quality at faster speed, multimodal model, good for most tasks |
- | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks |
- | gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks |
- | gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
-
-
- Google's Gemini models are all multimodal, supporting audio, images, video and text, supporting context caching, json schema, function calling, etc.
-
- These models are available via API_KEY from
- [The Gemini API](https://ai.google.dev/gemini-api/docs) and also from
- [Google Cloud Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/migrate-google-ai) as part of the
- [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models).
-
-
-
- | Model | Context Window | Best For |
- |-------|---------------|-----------|
- | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks |
- | Llama 3.2 Series | 8,192 tokens | General-purpose tasks |
- | Mixtral 8x7B | 32,768 tokens | Balanced performance and context |
-
-
- Groq is known for its fast inference speeds, making it suitable for real-time applications.
-
-
-
- | Model | Context Window | Best For |
- |-------|---------------|-----------|
- | Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks |
- | Llama 3.1 405B | 8,192 tokens | High-performance and output quality |
- | Llama 3.2 Series | 8,192 tokens | General-purpose tasks, multimodal |
- | Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality|
- | Qwen2 familly | 8,192 tokens | High-performance and output quality |
-
-
- [SambaNova](https://cloud.sambanova.ai/) has several models with fast inference speed at full precision.
-
-
-
- | Provider | Context Window | Key Features |
- |----------|---------------|--------------|
- | Deepseek Chat | 64,000 tokens | Specialized in technical discussions |
- | Deepseek R1 | 64,000 tokens | Affordable reasoning model |
- | Claude 3 | Up to 200K tokens | Strong reasoning, code understanding |
- | Gemma Series | 8,192 tokens | Efficient, smaller-scale tasks |
-
-
- Provider selection should consider factors like:
- - API availability in your region
- - Pricing structure
- - Required features (e.g., streaming, function calling)
- - Performance requirements
-
-
-
-
## Setting Up Your LLM
There are three ways to configure LLMs in CrewAI. Choose the method that best fits your workflow:
@@ -206,102 +55,12 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi
```yaml
researcher:
- # Agent Definition
role: Research Specialist
goal: Conduct comprehensive research and analysis
backstory: A dedicated research professional with years of experience
verbose: true
-
- # Model Selection (uncomment your choice)
-
- # OpenAI Models - Known for reliability and performance
- llm: openai/gpt-4o-mini
- # llm: openai/gpt-4 # More accurate but expensive
- # llm: openai/gpt-4-turbo # Fast with large context
- # llm: openai/gpt-4o # Optimized for longer texts
- # llm: openai/o1-preview # Latest features
- # llm: openai/o1-mini # Cost-effective
-
- # Azure Models - For enterprise deployments
- # llm: azure/gpt-4o-mini
- # llm: azure/gpt-4
- # llm: azure/gpt-35-turbo
-
- # Anthropic Models - Strong reasoning capabilities
- # llm: anthropic/claude-3-opus-20240229-v1:0
- # llm: anthropic/claude-3-sonnet-20240229-v1:0
- # llm: anthropic/claude-3-haiku-20240307-v1:0
- # llm: anthropic/claude-2.1
- # llm: anthropic/claude-2.0
-
- # Google Models - Strong reasoning, large cachable context window, multimodal
- # llm: gemini/gemini-1.5-pro-latest
- # llm: gemini/gemini-1.5-flash-latest
- # llm: gemini/gemini-1.5-flash-8b-latest
-
- # AWS Bedrock Models - Enterprise-grade
- # llm: bedrock/anthropic.claude-3-sonnet-20240229-v1:0
- # llm: bedrock/anthropic.claude-v2:1
- # llm: bedrock/amazon.titan-text-express-v1
- # llm: bedrock/meta.llama2-70b-chat-v1
-
- # Amazon SageMaker Models - Enterprise-grade
- # llm: sagemaker/
-
- # Mistral Models - Open source alternative
- # llm: mistral/mistral-large-latest
- # llm: mistral/mistral-medium-latest
- # llm: mistral/mistral-small-latest
-
- # Groq Models - Fast inference
- # llm: groq/mixtral-8x7b-32768
- # llm: groq/llama-3.1-70b-versatile
- # llm: groq/llama-3.2-90b-text-preview
- # llm: groq/gemma2-9b-it
- # llm: groq/gemma-7b-it
-
- # IBM watsonx.ai Models - Enterprise features
- # llm: watsonx/ibm/granite-13b-chat-v2
- # llm: watsonx/meta-llama/llama-3-1-70b-instruct
- # llm: watsonx/bigcode/starcoder2-15b
-
- # Ollama Models - Local deployment
- # llm: ollama/llama3:70b
- # llm: ollama/codellama
- # llm: ollama/mistral
- # llm: ollama/mixtral
- # llm: ollama/phi
-
- # Fireworks AI Models - Specialized tasks
- # llm: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct
- # llm: fireworks_ai/accounts/fireworks/models/mixtral-8x7b
- # llm: fireworks_ai/accounts/fireworks/models/zephyr-7b-beta
-
- # Perplexity AI Models - Research focused
- # llm: pplx/llama-3.1-sonar-large-128k-online
- # llm: pplx/mistral-7b-instruct
- # llm: pplx/codellama-34b-instruct
- # llm: pplx/mixtral-8x7b-instruct
-
- # Hugging Face Models - Community models
- # llm: huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct
- # llm: huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
- # llm: huggingface/tiiuae/falcon-180B-chat
- # llm: huggingface/google/gemma-7b-it
-
- # Nvidia NIM Models - GPU-optimized
- # llm: nvidia_nim/meta/llama3-70b-instruct
- # llm: nvidia_nim/mistral/mixtral-8x7b
- # llm: nvidia_nim/google/gemma-7b
-
- # SambaNova Models - Enterprise AI
- # llm: sambanova/Meta-Llama-3.1-8B-Instruct
- # llm: sambanova/BioMistral-7B
- # llm: sambanova/Falcon-180B
-
- # Open Router Models - Affordable reasoning
- # llm: openrouter/deepseek/deepseek-r1
- # llm: openrouter/deepseek/deepseek-chat
+ llm: openai/gpt-4o-mini # your model here
+ # (see provider configuration examples below for more)
```
@@ -349,6 +108,465 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi
+## Provider Configuration Examples
+
+
+CrewAI supports a multitude of LLM providers, each offering unique features, authentication methods, and model capabilities.
+In this section, you'll find detailed examples that help you select, configure, and optimize the LLM that best fits your project's needs.
+
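+Whichever provider you choose, the configured `LLM` instance is used the same way. As a minimal sketch (assuming a valid `OPENAI_API_KEY` in your `.env` file), you can pass an `LLM` directly to an agent:
+
+```python Code
+from crewai import Agent, LLM
+
+# Configure the LLM once and reuse it across agents
+llm = LLM(model="openai/gpt-4o-mini", temperature=0.7)
+
+agent = Agent(
+    role="Research Specialist",
+    goal="Conduct comprehensive research and analysis",
+    backstory="A dedicated research professional with years of experience",
+    llm=llm,  # any of the providers configured below can be used here
+)
+```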
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Required
+ OPENAI_API_KEY=sk-...
+
+ # Optional
+ OPENAI_API_BASE=
+ OPENAI_ORGANIZATION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+ model="openai/gpt-4", # call model by provider/model_name
+ temperature=0.8,
+ max_tokens=150,
+ top_p=0.9,
+ frequency_penalty=0.1,
+ presence_penalty=0.1,
+ stop=["END"],
+ seed=42
+ )
+ ```
+
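+  The configured instance can also be invoked directly; here is a quick sketch of a sanity check (assuming the key above is set and that `llm.call` returns the completion as text):
+
+  ```python Code
+  # Send a single prompt and print the completion
+  response = llm.call("Summarize the benefits of multi-agent systems in one sentence.")
+  print(response)
+  ```
+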
+ OpenAI is one of the leading providers of LLMs with a wide range of models and features.
+
+ | Model | Context Window | Best For |
+ |---------------------|------------------|-----------------------------------------------|
+ | GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning |
+ | GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis |
+ | GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing |
+  | o3-mini              | 200,000 tokens   | Fast reasoning on complex tasks                |
+  | o1-mini              | 128,000 tokens   | Fast reasoning on complex tasks                |
+  | o1-preview           | 128,000 tokens   | Fast reasoning on complex tasks                |
+  | o1                   | 200,000 tokens   | Advanced reasoning on complex tasks            |
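+
+  Note: 1 token ≈ 4 characters in English, so 8,192 tokens ≈ 32,768 characters, or roughly 6,000 words.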
+
+
+
+  Set the following environment variables in your `.env` file:
+
+  ```toml Code
+ ANTHROPIC_API_KEY=sk-ant-...
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+      model="anthropic/claude-3-sonnet-20240229",
+ temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Option 1: Gemini accessed with an API key.
+ # https://ai.google.dev/gemini-api/docs/api-key
+ GEMINI_API_KEY=
+
+ # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden.
+ # https://cloud.google.com/vertex-ai/generative-ai/docs/overview
+ ```
+
+  Get credentials from your Google Cloud Console, save them to a JSON file, and load them with the following code:
+ ```python Code
+ import json
+
+ file_path = 'path/to/vertex_ai_service_account.json'
+
+ # Load the JSON file
+ with open(file_path, 'r') as file:
+ vertex_credentials = json.load(file)
+
+ # Convert the credentials to a JSON string
+ vertex_credentials_json = json.dumps(vertex_credentials)
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+ model="gemini/gemini-1.5-pro-latest",
+ temperature=0.7,
+ vertex_credentials=vertex_credentials_json
+ )
+ ```
+ Google offers a range of powerful models optimized for different use cases:
+
+ | Model | Context Window | Best For |
+ |-----------------------|----------------|------------------------------------------------------------------|
+ | gemini-2.0-flash-exp | 1M tokens | Higher quality at faster speed, multimodal model, good for most tasks |
+ | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks |
+ | gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks |
+ | gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
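+
+  Google's Gemini models are all multimodal, supporting text, images, audio, and video, along with features such as context caching, JSON schema output, and function calling. They are available via API key from [the Gemini API](https://ai.google.dev/gemini-api/docs) and through [Google Cloud Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/migrate-google-ai) as part of the [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models).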
+
+
+
+  Set the following environment variables in your `.env` file:
+
+  ```toml Code
+ # Required
+ AZURE_API_KEY=
+ AZURE_API_BASE=
+ AZURE_API_VERSION=
+
+ # Optional
+ AZURE_AD_TOKEN=
+ AZURE_API_TYPE=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="azure/gpt-4",
+ api_version="2023-05-15"
+ )
+ ```
+
+
+
+  Set the following environment variables in your `.env` file:
+
+  ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
+ )
+ ```
+
+
+
+  Set the following environment variables in your `.env` file:
+
+  ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+  llm = LLM(
+      model="sagemaker/<your-endpoint-name>"  # replace with the name of your deployed SageMaker endpoint
+  )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ MISTRAL_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="mistral/mistral-large-latest",
+ temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ NVIDIA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="nvidia_nim/meta/llama3-70b-instruct",
+ temperature=0.7
+ )
+ ```
+
+ Nvidia NIM provides a comprehensive suite of models for various use cases, from general-purpose tasks to specialized applications.
+
+ | Model | Context Window | Best For |
+ |-------------------------------------------------------------------------|----------------|-------------------------------------------------------------------|
+  | nvidia/mistral-nemo-minitron-8b-8k-instruct                               | 8,192 tokens   | State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation. |
+  | nvidia/nemotron-4-mini-hindi-4b-instruct                                  | 4,096 tokens   | A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language. |
+ | nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Customized for enhanced helpfulness in responses |
+ | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses |
+ | nvidia/neva-22 | 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses |
+ | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks |
+ | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. |
+ | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. |
+ | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. |
+ | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
+ | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. |
+ | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
+ | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+  | meta/llama-3.2-11b-vision-instruct                                        | 128k tokens    | Advanced state-of-the-art vision-language model with image understanding, superior reasoning, and text generation. |
+  | meta/llama-3.2-90b-vision-instruct                                        | 128k tokens    | Advanced state-of-the-art vision-language model with image understanding, superior reasoning, and text generation. |
+  | google/gemma-7b                                                           | 8,192 tokens   | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+  | google/gemma-2b                                                           | 8,192 tokens   | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. |
+ | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. |
+ | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. |
+  | google/gemma-2-9b-it                                                      | 8,192 tokens   | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+  | google/gemma-2-27b-it                                                     | 8,192 tokens   | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+  | google/gemma-2-2b-it                                                      | 8,192 tokens   | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. |
+ | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. |
+ | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+  | mistralai/mixtral-8x7b-instruct-v0.1                                      | 8,192 tokens   | An MoE LLM that follows instructions, completes requests, and generates creative text. |
+ | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+ | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. |
+ | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. |
+ | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+  | microsoft/phi-3.5-mini-instruct                                           | 128K tokens    | Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments |
+  | microsoft/phi-3.5-moe-instruct                                            | 128K tokens    | Advanced LLM based on a Mixture of Experts architecture to deliver compute-efficient content generation |
+ | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. |
+  | microsoft/phi-3-vision-128k-instruct                                      | 128k tokens    | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+  | microsoft/phi-3.5-vision-instruct                                         | 128k tokens    | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+ | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. |
+ | snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. |
+ | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia |
+  | ibm/granite-8b-code-instruct                                              | 4,096 tokens   | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+  | ibm/granite-34b-code-instruct                                             | 8,192 tokens   | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+ | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI |
+ | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification |
+ | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. |
+ | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing |
+ | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. |
+ | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling |
+ | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+  | baichuan-inc/baichuan2-13b-chat                                           | 4,096 tokens   | Supports Chinese and English chat, coding, math, instruction following, and quiz solving |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ GROQ_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="groq/llama-3.2-90b-text-preview",
+ temperature=0.7
+ )
+ ```
+
+  Groq is known for its fast inference speeds, making it suitable for real-time applications.
+
+  | Model             | Context Window   | Best For                                   |
+ |-------------------|------------------|--------------------------------------------|
+ | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks |
+ | Llama 3.2 Series | 8,192 tokens | General-purpose tasks |
+ | Mixtral 8x7B | 32,768 tokens | Balanced performance and context |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ WATSONX_URL=
+ WATSONX_APIKEY=
+ WATSONX_PROJECT_ID=
+
+ # Optional
+ WATSONX_TOKEN=
+ WATSONX_DEPLOYMENT_SPACE_ID=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="watsonx/meta-llama/llama-3-1-70b-instruct",
+ base_url="https://api.watsonx.ai/v1"
+ )
+ ```
+
+
+
+ 1. Install Ollama: [ollama.ai](https://ollama.ai/)
+  2. Run the model you plan to use: `ollama run llama3:70b`
+ 3. Configure:
+
+ ```python Code
+ llm = LLM(
+ model="ollama/llama3:70b",
+ base_url="http://localhost:11434"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ FIREWORKS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
+ temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ PERPLEXITY_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+      model="perplexity/llama-3.1-sonar-large-128k-online",
+ base_url="https://api.perplexity.ai/"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ HUGGINGFACE_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
+ base_url="your_api_endpoint"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ SAMBANOVA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="sambanova/Meta-Llama-3.1-8B-Instruct",
+ temperature=0.7
+ )
+ ```
+
+  [SambaNova](https://cloud.sambanova.ai/) offers several models with fast inference speeds at full precision.
+
+  | Model              | Context Window         | Best For                                      |
+ |--------------------|------------------------|----------------------------------------------|
+ | Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks |
+ | Llama 3.1 405B | 8,192 tokens | High-performance and output quality |
+ | Llama 3.2 Series | 8,192 tokens | General-purpose, multimodal tasks |
+ | Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality |
+  | Qwen2 family       | 8,192 tokens           | High-performance and output quality           |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ CEREBRAS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+ model="cerebras/llama3.1-70b",
+ temperature=0.7,
+ max_tokens=8192
+ )
+ ```
+
+
+ Cerebras features:
+ - Fast inference speeds
+ - Competitive pricing
+ - Good balance of speed and quality
+ - Support for long context windows
+
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ OPENROUTER_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+  ```python Code
+  import os
+
+  llm = LLM(
+      model="openrouter/deepseek/deepseek-r1",
+      base_url="https://openrouter.ai/api/v1",
+      api_key=os.environ["OPENROUTER_API_KEY"]  # read the key set in your .env file
+  )
+  ```
+
+
+  OpenRouter models:
+ - openrouter/deepseek/deepseek-r1
+ - openrouter/deepseek/deepseek-chat
+
+
+
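+When choosing a provider, consider factors such as API availability in your region, pricing structure, required features (e.g., streaming, function calling), and performance requirements.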
+
+## Structured LLM Calls
+
+CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
+
+For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
+
+```python Code
+from crewai import LLM
+from pydantic import BaseModel
+
+class Dog(BaseModel):
+ name: str
+ age: int
+ breed: str
+
+
+llm = LLM(model="gpt-4o", response_format=Dog)
+
+response = llm.call(
+ "Analyze the following messages and return the name, age, and breed. "
+ "Meet Kona! She is 3 years old and is a black german shepherd."
+)
+print(response)
+
+# Output:
+# Dog(name='Kona', age=3, breed='black german shepherd')
+```
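+
+As the example output suggests, the validated response maps onto the `Dog` schema, so downstream code can use typed fields such as `response.name` and `response.age` instead of parsing free-form text.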
+
## Advanced Features and Optimization
Learn how to get the most out of your LLM configuration:
@@ -417,339 +635,6 @@ Learn how to get the most out of your LLM configuration:
-## Provider Configuration Examples
-
-
-
- ```python Code
- # Required
- OPENAI_API_KEY=sk-...
-
- # Optional
- OPENAI_API_BASE=
- OPENAI_ORGANIZATION=
- ```
-
- Example usage:
- ```python Code
- from crewai import LLM
-
- llm = LLM(
- model="gpt-4",
- temperature=0.8,
- max_tokens=150,
- top_p=0.9,
- frequency_penalty=0.1,
- presence_penalty=0.1,
- stop=["END"],
- seed=42
- )
- ```
-
-
-
- ```python Code
- ANTHROPIC_API_KEY=sk-ant-...
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="anthropic/claude-3-sonnet-20240229-v1:0",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- # Option 1: Gemini accessed with an API key.
- # https://ai.google.dev/gemini-api/docs/api-key
- GEMINI_API_KEY=
-
- # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden.
- # https://cloud.google.com/vertex-ai/generative-ai/docs/overview
- ```
-
- Get credentials:
- ```python Code
- import json
-
- file_path = 'path/to/vertex_ai_service_account.json'
-
- # Load the JSON file
- with open(file_path, 'r') as file:
- vertex_credentials = json.load(file)
-
- # Convert the credentials to a JSON string
- vertex_credentials_json = json.dumps(vertex_credentials)
- ```
-
- Example usage:
- ```python Code
- from crewai import LLM
-
- llm = LLM(
- model="gemini/gemini-1.5-pro-latest",
- temperature=0.7,
- vertex_credentials=vertex_credentials_json
- )
- ```
-
-
-
- ```python Code
- # Required
- AZURE_API_KEY=
- AZURE_API_BASE=
- AZURE_API_VERSION=
-
- # Optional
- AZURE_AD_TOKEN=
- AZURE_API_TYPE=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="azure/gpt-4",
- api_version="2023-05-15"
- )
- ```
-
-
-
- ```python Code
- AWS_ACCESS_KEY_ID=
- AWS_SECRET_ACCESS_KEY=
- AWS_DEFAULT_REGION=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
- )
- ```
-
-
-
- ```python Code
- AWS_ACCESS_KEY_ID=
- AWS_SECRET_ACCESS_KEY=
- AWS_DEFAULT_REGION=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="sagemaker/"
- )
- ```
-
-
-
- ```python Code
- MISTRAL_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="mistral/mistral-large-latest",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- NVIDIA_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="nvidia_nim/meta/llama3-70b-instruct",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- GROQ_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="groq/llama-3.2-90b-text-preview",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- # Required
- WATSONX_URL=
- WATSONX_APIKEY=
- WATSONX_PROJECT_ID=
-
- # Optional
- WATSONX_TOKEN=
- WATSONX_DEPLOYMENT_SPACE_ID=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="watsonx/meta-llama/llama-3-1-70b-instruct",
- base_url="https://api.watsonx.ai/v1"
- )
- ```
-
-
-
- 1. Install Ollama: [ollama.ai](https://ollama.ai/)
- 2. Run a model: `ollama run llama2`
- 3. Configure:
-
- ```python Code
- llm = LLM(
- model="ollama/llama3:70b",
- base_url="http://localhost:11434"
- )
- ```
-
-
-
- ```python Code
- FIREWORKS_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- PERPLEXITY_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="llama-3.1-sonar-large-128k-online",
- base_url="https://api.perplexity.ai/"
- )
- ```
-
-
-
- ```python Code
- HUGGINGFACE_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
- base_url="your_api_endpoint"
- )
- ```
-
-
-
- ```python Code
- SAMBANOVA_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="sambanova/Meta-Llama-3.1-8B-Instruct",
- temperature=0.7
- )
- ```
-
-
-
- ```python Code
- # Required
- CEREBRAS_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="cerebras/llama3.1-70b",
- temperature=0.7,
- max_tokens=8192
- )
- ```
-
-
- Cerebras features:
- - Fast inference speeds
- - Competitive pricing
- - Good balance of speed and quality
- - Support for long context windows
-
-
-
-
- ```python Code
- OPENROUTER_API_KEY=
- ```
-
- Example usage:
- ```python Code
- llm = LLM(
- model="openrouter/deepseek/deepseek-r1",
- base_url="https://openrouter.ai/api/v1",
- api_key=OPENROUTER_API_KEY
- )
- ```
-
-
- Open Router models:
- - openrouter/deepseek/deepseek-r1
- - openrouter/deepseek/deepseek-chat
-
-
-
-
-## Structured LLM Calls
-
-CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
-
-For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
-
-```python Code
-from crewai import LLM
-
-class Dog(BaseModel):
- name: str
- age: int
- breed: str
-
-
-llm = LLM(model="gpt-4o", response_format=Dog)
-
-response = llm.call(
- "Analyze the following messages and return the name, age, and breed. "
- "Meet Kona! She is 3 years old and is a black german shepherd."
-)
-print(response)
-```
-
## Common Issues and Solutions