docs: update multimodal agents guide and mint.json configuration

Tony Kipkemboi
2025-01-15 14:13:37 -05:00
parent 835557e648
commit c12343a8b8
2 changed files with 6 additions and 6 deletions


@@ -1,14 +1,14 @@
 ---
 title: Using Multimodal Agents
 description: Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
-icon: image
+icon: video
 ---
-# Using Multimodal Agents
+## Using Multimodal Agents
 CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
-## Enabling Multimodal Capabilities
+### Enabling Multimodal Capabilities
 To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
@@ -25,7 +25,7 @@ agent = Agent(
 When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
-## Working with Images
+### Working with Images
 The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to manually add this tool - it's automatically included when you enable multimodal capabilities.
@@ -108,7 +108,7 @@ The multimodal agent will automatically handle the image processing through its
 - Process image content with optional context or specific questions
 - Provide analysis and insights based on the visual information and task requirements
-## Best Practices
+### Best Practices
 When working with multimodal agents, keep these best practices in mind:


@@ -91,7 +91,7 @@
 "how-to/custom-manager-agent",
 "how-to/llm-connections",
 "how-to/customizing-agents",
-"how-to/multimodal-agents.mdx",
+"how-to/multimodal-agents",
 "how-to/coding-agents",
 "how-to/force-tool-output-as-result",
 "how-to/human-input-on-execution",
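
The navigation fix in this hunk removes a stray `.mdx` suffix from a page entry. A check like the following could flag similar entries before they are committed. This is only a sketch: the helper name `pages_with_extension` and the sample list are hypothetical, and it assumes page entries are plain path strings like the ones shown in the hunk above.

```python
import posixpath


def pages_with_extension(pages, extensions=(".mdx", ".md")):
    """Return the page entries that end in one of the given file extensions.

    Hypothetical helper: assumes each entry is a slash-separated path string.
    """
    return [page for page in pages if posixpath.splitext(page)[1] in extensions]


# Sample entries modelled on the hunk above (the pre-fix state of the list).
pages = [
    "how-to/customizing-agents",
    "how-to/multimodal-agents.mdx",
    "how-to/coding-agents",
]

offenders = pages_with_extension(pages)
print(offenders)
```

Run against the pre-fix list, the helper reports only the `how-to/multimodal-agents.mdx` entry. Entries without a suffix pass through untouched.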