This commit fixes the multimodal image handling in CrewAI agents:
1. CrewAgentExecutor._handle_agent_action: Fixed to properly append
multimodal messages without double-wrapping. The add_image tool
result is now appended directly if it's already a properly formatted
message dict with role and content.
2. ToolUsage._use: Fixed to preserve the raw dict result for add_image
tool instead of stringifying it via _format_result(). This ensures
the multimodal content structure is maintained.
3. Gemini provider _format_messages_for_gemini: Added proper handling
for multimodal content with image_url parts. The provider now:
- Converts image_url parts to Gemini's inline_data format
- Supports HTTP(S) URLs by fetching and converting to base64
- Supports data URLs by parsing and extracting base64 data
- Supports local file paths by reading and converting to base64
4. Added comprehensive tests for all multimodal handling fixes.
Fixes#4016
Co-Authored-By: João <joao@crewai.com>