Commit Graph

1 Commits

Author SHA1 Message Date
Devin AI
29d40de651 fix: properly handle multimodal image content in agents (Issue #4016)
This commit fixes the multimodal image handling in CrewAI agents:

1. CrewAgentExecutor._handle_agent_action: Fixed to properly append
   multimodal messages without double-wrapping. The add_image tool
   result is now appended directly if it's already a properly formatted
   message dict with role and content.

2. ToolUsage._use: Fixed to preserve the raw dict result for add_image
   tool instead of stringifying it via _format_result(). This ensures
   the multimodal content structure is maintained.

3. Gemini provider _format_messages_for_gemini: Added proper handling
   for multimodal content with image_url parts. The provider now:
   - Converts image_url parts to Gemini's inline_data format
   - Supports HTTP(S) URLs by fetching and converting to base64
   - Supports data URLs by parsing and extracting base64 data
   - Supports local file paths by reading and converting to base64

4. Added comprehensive tests for all multimodal handling fixes.

Fixes #4016

Co-Authored-By: João <joao@crewai.com>
2025-12-02 10:13:10 +00:00