Compare commits

..

5 Commits

Author SHA1 Message Date
Tony Kipkemboi
c12343a8b8 docs: update multimodal agents guide and mint.json configuration 2025-01-15 14:13:37 -05:00
Tony Kipkemboi
835557e648 fix: add multimodal docs path to mint.json 2025-01-15 13:54:32 -05:00
Daniel Barreto
4185ea688f fix: get rid of translation typo (#1880)
Co-authored-by: Brandon Hancock (bhancock_ai) <109994880+bhancockio@users.noreply.github.com>
2025-01-14 14:06:01 -05:00
Brandon Hancock (bhancock_ai)
0532089246 Incorporate y4izus fix (#1893) 2025-01-14 13:35:21 -05:00
Brandon Hancock (bhancock_ai)
24b155015c before kickoff breaks if inputs are none. (#1883)
* before kickoff breaks if inputs are none.

* improve none type

* Fix failing tests

* add tests for new code

* Fix failing test

* drop extra comments

* clean up based on eduardo feedback
2025-01-14 13:24:03 -05:00
4 changed files with 8 additions and 6 deletions

View File

@@ -1,14 +1,14 @@
---
title: Using Multimodal Agents
description: Learn how to enable and use multimodal capabilities in your agents for processing images and other non-text content within the CrewAI framework.
icon: image
icon: video
---
# Using Multimodal Agents
## Using Multimodal Agents
CrewAI supports multimodal agents that can process both text and non-text content like images. This guide will show you how to enable and use multimodal capabilities in your agents.
## Enabling Multimodal Capabilities
### Enabling Multimodal Capabilities
To create a multimodal agent, simply set the `multimodal` parameter to `True` when initializing your agent:
@@ -25,7 +25,7 @@ agent = Agent(
When you set `multimodal=True`, the agent is automatically configured with the necessary tools for handling non-text content, including the `AddImageTool`.
## Working with Images
### Working with Images
The multimodal agent comes pre-configured with the `AddImageTool`, which allows it to process images. You don't need to manually add this tool - it's automatically included when you enable multimodal capabilities.
@@ -108,7 +108,7 @@ The multimodal agent will automatically handle the image processing through its
- Process image content with optional context or specific questions
- Provide analysis and insights based on the visual information and task requirements
## Best Practices
### Best Practices
When working with multimodal agents, keep these best practices in mind:

View File

@@ -91,6 +91,7 @@
"how-to/custom-manager-agent",
"how-to/llm-connections",
"how-to/customizing-agents",
"how-to/multimodal-agents",
"how-to/coding-agents",
"how-to/force-tool-output-as-result",
"how-to/human-input-on-execution",

View File

@@ -676,6 +676,7 @@ class Crew(BaseModel):
else:
self.manager_llm = (
getattr(self.manager_llm, "model_name", None)
or getattr(self.manager_llm, "model", None)
or getattr(self.manager_llm, "deployment_name", None)
or self.manager_llm
)

View File

@@ -43,7 +43,7 @@
"ask_question": "Ask a specific question to one of the following coworkers: {coworkers}\nThe input to this tool should be the coworker, the question you have for them, and ALL necessary context to ask the question properly, they know nothing about the question, so share absolute everything you know, don't reference things but instead explain them.",
"add_image": {
"name": "Add image to content",
"description": "See image to understand it's content, you can optionally ask a question about the image",
"description": "See image to understand its content, you can optionally ask a question about the image",
"default_action": "Please provide a detailed description of this image, including all visual elements, context, and any notable details you can observe."
}
}