mirror of https://github.com/crewAIInc/crewAI.git (synced 2026-01-09 16:18:30 +00:00)
docs: update multimodal agents guide and mint.json configuration
@@ -1,14 +1,14 @@
 ---
 title: Using Multimodal Agents
 description: Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
-icon: image
+icon: video
 ---
 
-# Using Multimodal Agents
+## Using Multimodal Agents
 
 CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
 
-## Enabling Multimodal Capabilities
+### Enabling Multimodal Capabilities
 
 To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
 
@@ -25,7 +25,7 @@ agent = Agent(
 
 When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
 
-## Working with Images
+### Working with Images
 
 The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to manually add this tool - it's automatically included when you enable multimodal capabilities.
 
@@ -108,7 +108,7 @@ The multimodal agent will automatically handle the image processing through its
 - Process image content with optional context or specific questions
 - Provide analysis and insights based on the visual information and task requirements
 
-## Best Practices
+### Best Practices
 
 When working with multimodal agents, keep these best practices in mind:
 

@@ -91,7 +91,7 @@
 "how-to/custom-manager-agent",
 "how-to/llm-connections",
 "how-to/customizing-agents",
-"how-to/multimodal-agents.mdx",
+"how-to/multimodal-agents",
 "how-to/coding-agents",
 "how-to/force-tool-output-as-result",
 "how-to/human-input-on-execution",
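
The navigation change in this commit drops a stray `.mdx` suffix from a single page entry. A minimal sketch of how such entries could be detected and normalized is shown below. It assumes that page references in the navigation list are meant to be extension-less paths, which is what this change suggests; the helper name `normalize_pages` and the sample list are hypothetical and not part of the repository.

```python
import re

# Assumption: navigation entries should be page paths without a
# ".md" or ".mdx" suffix, as the change in this commit suggests.
_EXTENSION = re.compile(r"\.mdx?$")


def normalize_pages(pages):
    """Return (cleaned, changed).

    cleaned: the entries with any trailing .md/.mdx suffix removed.
    changed: the original entries that carried such a suffix.
    """
    cleaned = [_EXTENSION.sub("", page) for page in pages]
    changed = [page for page, fixed in zip(pages, cleaned) if page != fixed]
    return cleaned, changed


# Hypothetical sample taken from the entries visible in the hunk.
pages = [
    "how-to/customizing-agents",
    "how-to/multimodal-agents.mdx",
    "how-to/coding-agents",
]
cleaned, changed = normalize_pages(pages)
print(changed)
```

Run against the three sample entries, only the `multimodal-agents` entry is reported as changed; the others pass through untouched.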