From 4eaa8755ebb3c71e97703d0f7edcb848b89686b2 Mon Sep 17 00:00:00 2001 From: Tony Kipkemboi Date: Wed, 19 Feb 2025 11:06:46 -0500 Subject: [PATCH] docs: update accordions and fix layout (#2110) Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com> --- docs/concepts/llms.mdx | 1037 ++++++++++++++++++---------------------- 1 file changed, 461 insertions(+), 576 deletions(-) diff --git a/docs/concepts/llms.mdx b/docs/concepts/llms.mdx index 117face04..12061d1a6 100644 --- a/docs/concepts/llms.mdx +++ b/docs/concepts/llms.mdx @@ -27,157 +27,6 @@ Large Language Models (LLMs) are the core intelligence behind CrewAI agents. The -## Available Models and Their Capabilities - -Here's a detailed breakdown of supported models and their capabilities, you can compare performance at [lmarena.ai](https://lmarena.ai/?leaderboard) and [artificialanalysis.ai](https://artificialanalysis.ai/): - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | GPT-4 | 8,192 tokens | High-accuracy tasks, complex reasoning | - | GPT-4 Turbo | 128,000 tokens | Long-form content, document analysis | - | GPT-4o & GPT-4o-mini | 128,000 tokens | Cost-effective large context processing | - | o3-mini | 200,000 tokens | Fast reasoning, complex reasoning | - - - 1 token ≈ 4 characters in English. For example, 8,192 tokens ≈ 32,768 characters or about 6,000 words. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbot, virtual assistants, and content generation. | - | nvidia/nemotron-4-mini-hindi-4b-instruct| 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for Hindi Language. | - | "nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Llama-3.1-Nemotron-70B-Instruct is a large language model customized by NVIDIA in order to improve the helpfulness of LLM generated responses. | - | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. | - | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses | - | nvidia/neva-22| 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses | - | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks | - | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. | - | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. | - | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. | - | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. 
| - | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. | - | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. | - | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning and text generation. | - | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. | - | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. | - | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent architecture based language model for faster inference when generating long sequences. | - | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model text understanding, transformation, and code generation. | - | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. | - | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. | - | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. | - | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. | - | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. 
| - | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. | - | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. | - | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency bound, memory/compute constrained environments | - | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on Mixture of Experts architecure to deliver compute efficient content generation | - | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. | - | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model exceling in high-quality reasoning from images. | - | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. | - | snowflake/arctic | 1,024 tokens | Delivers high efficiency inference for enterprise applications focused on SQL generation and coding. | - | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia | - | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversion. | - | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI | - | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification | - | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. | - | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. | - | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. | - | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. 
| - | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing | - | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. | - | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling | - | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. | - | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Support Chinese and English chat, coding, math, instruction following, solving quizzes | - - - NVIDIA's NIM support for models is expanding continuously! For the most up-to-date list of available models, please visit build.nvidia.com. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | gemini-2.0-flash-exp | 1M tokens | Higher quality at faster speed, multimodal model, good for most tasks | - | gemini-1.5-flash | 1M tokens | Balanced multimodal model, good for most tasks | - | gemini-1.5-flash-8B | 1M tokens | Fastest, most cost-efficient, good for high-frequency tasks | - | gemini-1.5-pro | 2M tokens | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration | - - - Google's Gemini models are all multimodal, supporting audio, images, video and text, supporting context caching, json schema, function calling, etc. - - These models are available via API_KEY from - [The Gemini API](https://ai.google.dev/gemini-api/docs) and also from - [Google Cloud Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/migrate/migrate-google-ai) as part of the - [Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models). - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | Llama 3.1 70B/8B | 131,072 tokens | High-performance, large context tasks | - | Llama 3.2 Series | 8,192 tokens | General-purpose tasks | - | Mixtral 8x7B | 32,768 tokens | Balanced performance and context | - - - Groq is known for its fast inference speeds, making it suitable for real-time applications. - - - - | Model | Context Window | Best For | - |-------|---------------|-----------| - | Llama 3.1 70B/8B | Up to 131,072 tokens | High-performance, large context tasks | - | Llama 3.1 405B | 8,192 tokens | High-performance and output quality | - | Llama 3.2 Series | 8,192 tokens | General-purpose tasks, multimodal | - | Llama 3.3 70B | Up to 131,072 tokens | High-performance and output quality| - | Qwen2 familly | 8,192 tokens | High-performance and output quality | - - - [SambaNova](https://cloud.sambanova.ai/) has several models with fast inference speed at full precision. 
- - - - | Provider | Context Window | Key Features | - |----------|---------------|--------------| - | Deepseek Chat | 64,000 tokens | Specialized in technical discussions | - | Deepseek R1 | 64,000 tokens | Affordable reasoning model | - | Claude 3 | Up to 200K tokens | Strong reasoning, code understanding | - | Gemma Series | 8,192 tokens | Efficient, smaller-scale tasks | - - - Provider selection should consider factors like: - - API availability in your region - - Pricing structure - - Required features (e.g., streaming, function calling) - - Performance requirements - - - - ## Setting Up Your LLM There are three ways to configure LLMs in CrewAI. Choose the method that best fits your workflow: @@ -206,102 +55,12 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi ```yaml researcher: - # Agent Definition role: Research Specialist goal: Conduct comprehensive research and analysis backstory: A dedicated research professional with years of experience verbose: true - - # Model Selection (uncomment your choice) - - # OpenAI Models - Known for reliability and performance - llm: openai/gpt-4o-mini - # llm: openai/gpt-4 # More accurate but expensive - # llm: openai/gpt-4-turbo # Fast with large context - # llm: openai/gpt-4o # Optimized for longer texts - # llm: openai/o1-preview # Latest features - # llm: openai/o1-mini # Cost-effective - - # Azure Models - For enterprise deployments - # llm: azure/gpt-4o-mini - # llm: azure/gpt-4 - # llm: azure/gpt-35-turbo - - # Anthropic Models - Strong reasoning capabilities - # llm: anthropic/claude-3-opus-20240229-v1:0 - # llm: anthropic/claude-3-sonnet-20240229-v1:0 - # llm: anthropic/claude-3-haiku-20240307-v1:0 - # llm: anthropic/claude-2.1 - # llm: anthropic/claude-2.0 - - # Google Models - Strong reasoning, large cachable context window, multimodal - # llm: gemini/gemini-1.5-pro-latest - # llm: gemini/gemini-1.5-flash-latest - # llm: gemini/gemini-1.5-flash-8b-latest - - # AWS Bedrock Models - Enterprise-grade - # llm: bedrock/anthropic.claude-3-sonnet-20240229-v1:0 - # llm: bedrock/anthropic.claude-v2:1 - # llm: bedrock/amazon.titan-text-express-v1 - # llm: bedrock/meta.llama2-70b-chat-v1 - - # Amazon SageMaker Models - Enterprise-grade - # llm: sagemaker/ - - # Mistral Models - Open source alternative - # llm: mistral/mistral-large-latest - # llm: mistral/mistral-medium-latest - # llm: mistral/mistral-small-latest - - # Groq Models - Fast inference - # llm: groq/mixtral-8x7b-32768 - # llm: groq/llama-3.1-70b-versatile - # llm: groq/llama-3.2-90b-text-preview - # llm: groq/gemma2-9b-it - # llm: groq/gemma-7b-it - - # IBM watsonx.ai Models - Enterprise features - # llm: watsonx/ibm/granite-13b-chat-v2 - # llm: watsonx/meta-llama/llama-3-1-70b-instruct - # llm: watsonx/bigcode/starcoder2-15b - - # Ollama Models - Local deployment - # llm: ollama/llama3:70b - # llm: ollama/codellama - # llm: ollama/mistral - # llm: ollama/mixtral - # llm: ollama/phi - - # Fireworks AI Models - Specialized tasks - # llm: fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct - # llm: fireworks_ai/accounts/fireworks/models/mixtral-8x7b - # llm: fireworks_ai/accounts/fireworks/models/zephyr-7b-beta - - # Perplexity AI Models - Research focused - # llm: pplx/llama-3.1-sonar-large-128k-online - # llm: pplx/mistral-7b-instruct - # llm: pplx/codellama-34b-instruct - # llm: pplx/mixtral-8x7b-instruct - - # Hugging Face Models - Community models - # llm: huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct - # llm: 
huggingface/mistralai/Mixtral-8x7B-Instruct-v0.1
-      # llm: huggingface/tiiuae/falcon-180B-chat
-      # llm: huggingface/google/gemma-7b-it
-
-      # Nvidia NIM Models - GPU-optimized
-      # llm: nvidia_nim/meta/llama3-70b-instruct
-      # llm: nvidia_nim/mistral/mixtral-8x7b
-      # llm: nvidia_nim/google/gemma-7b
-
-      # SambaNova Models - Enterprise AI
-      # llm: sambanova/Meta-Llama-3.1-8B-Instruct
-      # llm: sambanova/BioMistral-7B
-      # llm: sambanova/Falcon-180B
-
-      # Open Router Models - Affordable reasoning
-      # llm: openrouter/deepseek/deepseek-r1
-      # llm: openrouter/deepseek/deepseek-chat
+      llm: openai/gpt-4o-mini # your model here
+      # (see the provider configuration examples below for more)
    ```

@@ -349,6 +108,465 @@ There are three ways to configure LLMs in CrewAI. Choose the method that best fi

+## Provider Configuration Examples
+
+
+CrewAI supports a multitude of LLM providers, each offering unique features, authentication methods, and model capabilities.
+In this section, you'll find detailed examples that help you select, configure, and optimize the LLM that best fits your project's needs.
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Required
+ OPENAI_API_KEY=sk-...
+
+ # Optional
+ OPENAI_API_BASE=
+ OPENAI_ORGANIZATION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+     model="openai/gpt-4",  # call the model as provider/model_name
+     temperature=0.8,
+     max_tokens=150,
+     top_p=0.9,
+     frequency_penalty=0.1,
+     presence_penalty=0.1,
+     stop=["END"],
+     seed=42
+ )
+ ```
+
+ OpenAI is one of the leading providers of LLMs, with a wide range of models and features.
+
+ | Model                | Context Window   | Best For                                       |
+ |----------------------|------------------|------------------------------------------------|
+ | GPT-4                | 8,192 tokens     | High-accuracy tasks, complex reasoning         |
+ | GPT-4 Turbo          | 128,000 tokens   | Long-form content, document analysis           |
+ | GPT-4o & GPT-4o-mini | 128,000 tokens   | Cost-effective large context processing        |
+ | o3-mini              | 200,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1-mini              | 128,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1-preview           | 128,000 tokens   | Fast reasoning, complex reasoning              |
+ | o1                   | 200,000 tokens   | Fast reasoning, complex reasoning              |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ ANTHROPIC_API_KEY=sk-ant-...
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="anthropic/claude-3-sonnet-20240229-v1:0",
+     temperature=0.7
+ )
+ ```
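+
+ Once configured, the LLM is handed to an agent the same way for every provider — a minimal sketch (the agent fields are illustrative, borrowed from the YAML example above):
+ ```python Code
+ from crewai import Agent, LLM
+
+ llm = LLM(
+     model="anthropic/claude-3-sonnet-20240229-v1:0",
+     temperature=0.7
+ )
+
+ # Attach the configured LLM to an agent
+ agent = Agent(
+     role="Research Specialist",
+     goal="Conduct comprehensive research and analysis",
+     backstory="A dedicated research professional with years of experience",
+     llm=llm
+ )
+ ```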
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Option 1: Gemini accessed with an API key.
+ # https://ai.google.dev/gemini-api/docs/api-key
+ GEMINI_API_KEY=
+
+ # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden.
+ # https://cloud.google.com/vertex-ai/generative-ai/docs/overview
+ ```
+
+ For Option 2, get credentials from your Google Cloud Console, save them to a JSON file, and load them with the following code:
+ ```python Code
+ import json
+
+ file_path = 'path/to/vertex_ai_service_account.json'
+
+ # Load the JSON file
+ with open(file_path, 'r') as file:
+     vertex_credentials = json.load(file)
+
+ # Convert the credentials to a JSON string
+ vertex_credentials_json = json.dumps(vertex_credentials)
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ from crewai import LLM
+
+ llm = LLM(
+     model="gemini/gemini-1.5-pro-latest",
+     temperature=0.7,
+     vertex_credentials=vertex_credentials_json
+ )
+ ```
+ Google offers a range of powerful models optimized for different use cases:
+
+ | Model                 | Context Window | Best For                                                           |
+ |-----------------------|----------------|--------------------------------------------------------------------|
+ | gemini-2.0-flash-exp  | 1M tokens      | Higher quality at faster speed, multimodal model, good for most tasks |
+ | gemini-1.5-flash      | 1M tokens      | Balanced multimodal model, good for most tasks                     |
+ | gemini-1.5-flash-8B   | 1M tokens      | Fastest, most cost-efficient, good for high-frequency tasks        |
+ | gemini-1.5-pro        | 2M tokens      | Best performing, wide variety of reasoning tasks including logical reasoning, coding, and creative collaboration |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ # Required
+ AZURE_API_KEY=
+ AZURE_API_BASE=
+ AZURE_API_VERSION=
+
+ # Optional
+ AZURE_AD_TOKEN=
+ AZURE_API_TYPE=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="azure/gpt-4",
+     api_version="2023-05-15"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ AWS_ACCESS_KEY_ID=
+ AWS_SECRET_ACCESS_KEY=
+ AWS_DEFAULT_REGION=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="sagemaker/"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ MISTRAL_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="mistral/mistral-large-latest",
+     temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ NVIDIA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="nvidia_nim/meta/llama3-70b-instruct",
+     temperature=0.7
+ )
+ ```
+
+ Nvidia NIM provides a comprehensive suite of models for various use cases, from general-purpose tasks to specialized applications.
+
+ | Model | Context Window | Best For |
+ |-------|----------------|----------|
+ | nvidia/mistral-nemo-minitron-8b-8k-instruct | 8,192 tokens | State-of-the-art small language model delivering superior accuracy for chatbots, virtual assistants, and content generation. |
+ | nvidia/nemotron-4-mini-hindi-4b-instruct | 4,096 tokens | A bilingual Hindi-English SLM for on-device inference, tailored specifically for the Hindi language. |
+ | nvidia/llama-3.1-nemotron-70b-instruct | 128k tokens | Customized for enhanced helpfulness in responses |
+ | nvidia/llama3-chatqa-1.5-8b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/llama3-chatqa-1.5-70b | 128k tokens | Advanced LLM to generate high-quality, context-aware responses for chatbots and search engines. |
+ | nvidia/vila | 128k tokens | Multi-modal vision-language model that understands text/img/video and creates informative responses |
+ | nvidia/neva-22 | 4,096 tokens | Multi-modal vision-language model that understands text/images and generates informative responses |
+ | nvidia/nemotron-mini-4b-instruct | 8,192 tokens | General-purpose tasks |
+ | nvidia/usdcode-llama3-70b-instruct | 128k tokens | State-of-the-art LLM that answers OpenUSD knowledge queries and generates USD-Python code. |
+ | nvidia/nemotron-4-340b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | meta/codellama-70b | 100k tokens | LLM capable of generating code from natural language and vice versa. |
+ | meta/llama2-70b | 4,096 tokens | Cutting-edge large language AI model capable of generating text and code in response to prompts. |
+ | meta/llama3-8b-instruct | 8,192 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | meta/llama3-70b-instruct | 8,192 tokens | Powers complex conversations with superior contextual understanding, reasoning, and text generation. |
+ | meta/llama-3.1-8b-instruct | 128k tokens | Advanced state-of-the-art model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.1-70b-instruct | 128k tokens | Powers complex conversations with superior contextual understanding, reasoning, and text generation. |
+ | meta/llama-3.1-405b-instruct | 128k tokens | Advanced LLM for synthetic data generation, distillation, and inference for chatbots, coding, and domain-specific tasks. |
+ | meta/llama-3.2-1b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-3b-instruct | 128k tokens | Advanced state-of-the-art small language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-11b-vision-instruct | 128k tokens | Advanced state-of-the-art vision-language model with language understanding, superior reasoning, and text generation. |
+ | meta/llama-3.2-90b-vision-instruct | 128k tokens | Advanced state-of-the-art vision-language model with language understanding, superior reasoning, and text generation. |
+ | google/gemma-7b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2b | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/codegemma-7b | 8,192 tokens | Cutting-edge model built on Google's Gemma-7B specialized for code generation and code completion. |
+ | google/codegemma-1.1-7b | 8,192 tokens | Advanced programming model for code generation, completion, reasoning, and instruction following. |
+ | google/recurrentgemma-2b | 8,192 tokens | Novel recurrent-architecture language model for faster inference when generating long sequences. |
+ | google/gemma-2-9b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2-27b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/gemma-2-2b-it | 8,192 tokens | Cutting-edge text generation model for text understanding, transformation, and code generation. |
+ | google/deplot | 512 tokens | One-shot visual language understanding model that translates images of plots into tables. |
+ | google/paligemma | 8,192 tokens | Vision language model adept at comprehending text and visual inputs to produce informative responses. |
+ | mistralai/mistral-7b-instruct-v0.2 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+ | mistralai/mixtral-8x7b-instruct-v0.1 | 8,192 tokens | An MOE LLM that follows instructions, completes requests, and generates creative text. |
+ | mistralai/mistral-large | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mixtral-8x22b-instruct-v0.1 | 8,192 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | mistralai/mistral-7b-instruct-v0.3 | 32k tokens | This LLM follows instructions, completes requests, and generates creative text. |
+ | nv-mistralai/mistral-nemo-12b-instruct | 128k tokens | Most advanced language model for reasoning, code, multilingual tasks; runs on a single GPU. |
+ | mistralai/mamba-codestral-7b-v0.1 | 256k tokens | Model for writing and interacting with code across a wide range of programming languages and tasks. |
+ | microsoft/phi-3-mini-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-mini-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-8k-instruct | 8,192 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-small-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-4k-instruct | 4,096 tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3-medium-128k-instruct | 128K tokens | Lightweight, state-of-the-art open LLM with strong math and logical reasoning skills. |
+ | microsoft/phi-3.5-mini-instruct | 128K tokens | Lightweight multilingual LLM powering AI applications in latency-bound, memory/compute-constrained environments |
+ | microsoft/phi-3.5-moe-instruct | 128K tokens | Advanced LLM based on a Mixture of Experts architecture to deliver compute-efficient content generation |
+ | microsoft/kosmos-2 | 1,024 tokens | Groundbreaking multimodal model designed to understand and reason about visual elements in images. |
+ | microsoft/phi-3-vision-128k-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+ | microsoft/phi-3.5-vision-instruct | 128k tokens | Cutting-edge open multimodal model excelling in high-quality reasoning from images. |
+ | databricks/dbrx-instruct | 12k tokens | A general-purpose LLM with state-of-the-art performance in language understanding, coding, and RAG. |
+ | snowflake/arctic | 1,024 tokens | Delivers high-efficiency inference for enterprise applications focused on SQL generation and coding. |
+ | aisingapore/sea-lion-7b-instruct | 4,096 tokens | LLM to represent and serve the linguistic and cultural diversity of Southeast Asia |
+ | ibm/granite-8b-code-instruct | 4,096 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+ | ibm/granite-34b-code-instruct | 8,192 tokens | Software programming LLM for code generation, completion, explanation, and multi-turn conversation. |
+ | ibm/granite-3.0-8b-instruct | 4,096 tokens | Advanced Small Language Model supporting RAG, summarization, classification, code, and agentic AI |
+ | ibm/granite-3.0-3b-a800m-instruct | 4,096 tokens | Highly efficient Mixture of Experts model for RAG, summarization, entity extraction, and classification |
+ | mediatek/breeze-7b-instruct | 4,096 tokens | Creates diverse synthetic data that mimics the characteristics of real-world data. |
+ | upstage/solar-10.7b-instruct | 4,096 tokens | Excels in NLP tasks, particularly in instruction-following, reasoning, and mathematics. |
+ | writer/palmyra-med-70b-32k | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-med-70b | 32k tokens | Leading LLM for accurate, contextually relevant responses in the medical domain. |
+ | writer/palmyra-fin-70b-32k | 32k tokens | Specialized LLM for financial analysis, reporting, and data processing |
+ | 01-ai/yi-large | 32k tokens | Powerful model trained on English and Chinese for diverse tasks including chatbot and creative writing. |
+ | deepseek-ai/deepseek-coder-6.7b-instruct | 2k tokens | Powerful coding model offering advanced capabilities in code generation, completion, and infilling |
+ | rakuten/rakutenai-7b-instruct | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | rakuten/rakutenai-7b-chat | 1,024 tokens | Advanced state-of-the-art LLM with language understanding, superior reasoning, and text generation. |
+ | baichuan-inc/baichuan2-13b-chat | 4,096 tokens | Supports Chinese and English chat, coding, math, instruction following, and quiz solving |
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ GROQ_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="groq/llama-3.2-90b-text-preview",
+     temperature=0.7
+ )
+ ```
+ | Model            | Context Window   | Best For                                   |
+ |------------------|------------------|--------------------------------------------|
+ | Llama 3.1 70B/8B | 131,072 tokens   | High-performance, large context tasks      |
+ | Llama 3.2 Series | 8,192 tokens     | General-purpose tasks                      |
+ | Mixtral 8x7B     | 32,768 tokens    | Balanced performance and context           |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ WATSONX_URL=
+ WATSONX_APIKEY=
+ WATSONX_PROJECT_ID=
+
+ # Optional
+ WATSONX_TOKEN=
+ WATSONX_DEPLOYMENT_SPACE_ID=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="watsonx/meta-llama/llama-3-1-70b-instruct",
+     base_url="https://api.watsonx.ai/v1"
+ )
+ ```
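+
+ The provider examples above all read credentials from environment variables. If you keep them in a `.env` file, load it before constructing the LLM — a minimal sketch, assuming the `python-dotenv` package is installed:
+ ```python Code
+ from dotenv import load_dotenv
+ from crewai import LLM
+
+ load_dotenv()  # copies the variables from .env into os.environ
+
+ llm = LLM(model="openai/gpt-4o-mini", temperature=0.7)
+ ```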
+
+
+ 1. Install Ollama: [ollama.ai](https://ollama.ai/)
+ 2. Run a model: `ollama run llama2`
+ 3. Configure:
+
+ ```python Code
+ llm = LLM(
+     model="ollama/llama3:70b",
+     base_url="http://localhost:11434"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ FIREWORKS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct",
+     temperature=0.7
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ PERPLEXITY_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="llama-3.1-sonar-large-128k-online",
+     base_url="https://api.perplexity.ai/"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ HUGGINGFACE_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct",
+     base_url="your_api_endpoint"
+ )
+ ```
+
+
+
+ Set the following environment variables in your `.env` file:
+
+ ```toml Code
+ SAMBANOVA_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="sambanova/Meta-Llama-3.1-8B-Instruct",
+     temperature=0.7
+ )
+ ```
+ | Model             | Context Window         | Best For                                    |
+ |-------------------|------------------------|---------------------------------------------|
+ | Llama 3.1 70B/8B  | Up to 131,072 tokens   | High-performance, large context tasks       |
+ | Llama 3.1 405B    | 8,192 tokens           | High-performance and output quality         |
+ | Llama 3.2 Series  | 8,192 tokens           | General-purpose, multimodal tasks           |
+ | Llama 3.3 70B     | Up to 131,072 tokens   | High-performance and output quality         |
+ | Qwen2 family      | 8,192 tokens           | High-performance and output quality         |
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ # Required
+ CEREBRAS_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ llm = LLM(
+     model="cerebras/llama3.1-70b",
+     temperature=0.7,
+     max_tokens=8192
+ )
+ ```
+
+
+ Cerebras features:
+ - Fast inference speeds
+ - Competitive pricing
+ - Good balance of speed and quality
+ - Support for long context windows
+
+
+
+ Set the following environment variables in your `.env` file:
+ ```toml Code
+ OPENROUTER_API_KEY=
+ ```
+
+ Example usage in your CrewAI project:
+ ```python Code
+ import os
+
+ llm = LLM(
+     model="openrouter/deepseek/deepseek-r1",
+     base_url="https://openrouter.ai/api/v1",
+     api_key=os.getenv("OPENROUTER_API_KEY")
+ )
+ ```
+
+
+ Open Router models:
+ - openrouter/deepseek/deepseek-r1
+ - openrouter/deepseek/deepseek-chat
+
+
+
+## Structured LLM Calls
+
+CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing.
+
+For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object.
+
+```python Code
+from pydantic import BaseModel
+
+from crewai import LLM
+
+class Dog(BaseModel):
+    name: str
+    age: int
+    breed: str
+
+
+llm = LLM(model="gpt-4o", response_format=Dog)
+
+response = llm.call(
+    "Analyze the following messages and return the name, age, and breed. "
+    "Meet Kona! She is 3 years old and is a black german shepherd."
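+    # The free-form text above is parsed into the Dog schema passed as response_format.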
+) +print(response) + +# Output: +# Dog(name='Kona', age=3, breed='black german shepherd') +``` + ## Advanced Features and Optimization Learn how to get the most out of your LLM configuration: @@ -417,339 +635,6 @@ Learn how to get the most out of your LLM configuration: -## Provider Configuration Examples - - - - ```python Code - # Required - OPENAI_API_KEY=sk-... - - # Optional - OPENAI_API_BASE= - OPENAI_ORGANIZATION= - ``` - - Example usage: - ```python Code - from crewai import LLM - - llm = LLM( - model="gpt-4", - temperature=0.8, - max_tokens=150, - top_p=0.9, - frequency_penalty=0.1, - presence_penalty=0.1, - stop=["END"], - seed=42 - ) - ``` - - - - ```python Code - ANTHROPIC_API_KEY=sk-ant-... - ``` - - Example usage: - ```python Code - llm = LLM( - model="anthropic/claude-3-sonnet-20240229-v1:0", - temperature=0.7 - ) - ``` - - - - ```python Code - # Option 1: Gemini accessed with an API key. - # https://ai.google.dev/gemini-api/docs/api-key - GEMINI_API_KEY= - - # Option 2: Vertex AI IAM credentials for Gemini, Anthropic, and Model Garden. - # https://cloud.google.com/vertex-ai/generative-ai/docs/overview - ``` - - Get credentials: - ```python Code - import json - - file_path = 'path/to/vertex_ai_service_account.json' - - # Load the JSON file - with open(file_path, 'r') as file: - vertex_credentials = json.load(file) - - # Convert the credentials to a JSON string - vertex_credentials_json = json.dumps(vertex_credentials) - ``` - - Example usage: - ```python Code - from crewai import LLM - - llm = LLM( - model="gemini/gemini-1.5-pro-latest", - temperature=0.7, - vertex_credentials=vertex_credentials_json - ) - ``` - - - - ```python Code - # Required - AZURE_API_KEY= - AZURE_API_BASE= - AZURE_API_VERSION= - - # Optional - AZURE_AD_TOKEN= - AZURE_API_TYPE= - ``` - - Example usage: - ```python Code - llm = LLM( - model="azure/gpt-4", - api_version="2023-05-15" - ) - ``` - - - - ```python Code - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - AWS_DEFAULT_REGION= - ``` - - Example usage: - ```python Code - llm = LLM( - model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0" - ) - ``` - - - - ```python Code - AWS_ACCESS_KEY_ID= - AWS_SECRET_ACCESS_KEY= - AWS_DEFAULT_REGION= - ``` - - Example usage: - ```python Code - llm = LLM( - model="sagemaker/" - ) - ``` - - - - ```python Code - MISTRAL_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="mistral/mistral-large-latest", - temperature=0.7 - ) - ``` - - - - ```python Code - NVIDIA_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="nvidia_nim/meta/llama3-70b-instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - GROQ_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="groq/llama-3.2-90b-text-preview", - temperature=0.7 - ) - ``` - - - - ```python Code - # Required - WATSONX_URL= - WATSONX_APIKEY= - WATSONX_PROJECT_ID= - - # Optional - WATSONX_TOKEN= - WATSONX_DEPLOYMENT_SPACE_ID= - ``` - - Example usage: - ```python Code - llm = LLM( - model="watsonx/meta-llama/llama-3-1-70b-instruct", - base_url="https://api.watsonx.ai/v1" - ) - ``` - - - - 1. Install Ollama: [ollama.ai](https://ollama.ai/) - 2. Run a model: `ollama run llama2` - 3. 
Configure: - - ```python Code - llm = LLM( - model="ollama/llama3:70b", - base_url="http://localhost:11434" - ) - ``` - - - - ```python Code - FIREWORKS_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="fireworks_ai/accounts/fireworks/models/llama-v3-70b-instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - PERPLEXITY_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="llama-3.1-sonar-large-128k-online", - base_url="https://api.perplexity.ai/" - ) - ``` - - - - ```python Code - HUGGINGFACE_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="huggingface/meta-llama/Meta-Llama-3.1-8B-Instruct", - base_url="your_api_endpoint" - ) - ``` - - - - ```python Code - SAMBANOVA_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="sambanova/Meta-Llama-3.1-8B-Instruct", - temperature=0.7 - ) - ``` - - - - ```python Code - # Required - CEREBRAS_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="cerebras/llama3.1-70b", - temperature=0.7, - max_tokens=8192 - ) - ``` - - - Cerebras features: - - Fast inference speeds - - Competitive pricing - - Good balance of speed and quality - - Support for long context windows - - - - - ```python Code - OPENROUTER_API_KEY= - ``` - - Example usage: - ```python Code - llm = LLM( - model="openrouter/deepseek/deepseek-r1", - base_url="https://openrouter.ai/api/v1", - api_key=OPENROUTER_API_KEY - ) - ``` - - - Open Router models: - - openrouter/deepseek/deepseek-r1 - - openrouter/deepseek/deepseek-chat - - - - -## Structured LLM Calls - -CrewAI supports structured responses from LLM calls by allowing you to define a `response_format` using a Pydantic model. This enables the framework to automatically parse and validate the output, making it easier to integrate the response into your application without manual post-processing. - -For example, you can define a Pydantic model to represent the expected response structure and pass it as the `response_format` when instantiating the LLM. The model will then be used to convert the LLM output into a structured Python object. - -```python Code -from crewai import LLM - -class Dog(BaseModel): - name: str - age: int - breed: str - - -llm = LLM(model="gpt-4o", response_format=Dog) - -response = llm.call( - "Analyze the following messages and return the name, age, and breed. " - "Meet Kona! She is 3 years old and is a black german shepherd." -) -print(response) -``` - ## Common Issues and Solutions