mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-07-02 05:38:12 +00:00
When using input_files with PDFFile, the read_file tool was returning the entire PDF as base64-encoded binary data. This caused:

- Massive context bloat for the LLM
- Inconsistent responses and context overflow
- The same file being re-processed on each tool call

Now ReadFileTool detects application/pdf content and extracts text using pypdf (already a dependency via crewai-files) instead of base64-encoding the raw bytes. Each page is labeled with a page number header for clarity.

Graceful fallbacks are provided when:

- pypdf is not installed (short install message)
- The PDF contains no extractable text (friendly message)
- The PDF is corrupted (error message, never base64)

Co-Authored-By: João <joao@crewai.com>
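
For illustration only, the extraction and fallback behavior described above could be sketched roughly as follows. This is not the actual ReadFileTool code: the function names `format_pages` and `pdf_bytes_to_text`, the `--- Page N ---` header format, and the exact fallback messages are all assumptions made for this example. Only `PdfReader`, `reader.pages`, and `page.extract_text()` are real pypdf calls.

```python
import io


def format_pages(page_texts):
    """Join per-page text, labeling each non-empty page with a header.

    The header format is hypothetical; the commit only says that each
    page gets a page number header.
    """
    sections = []
    for number, text in enumerate(page_texts, start=1):
        text = (text or "").strip()
        if text:
            sections.append(f"--- Page {number} ---\n{text}")
    if not sections:
        # Fallback: e.g. a scanned PDF with no text layer.
        return "No extractable text found in this PDF."
    return "\n\n".join(sections)


def pdf_bytes_to_text(data: bytes) -> str:
    """Return readable text for PDF bytes, never base64."""
    try:
        from pypdf import PdfReader
    except ImportError:
        # Fallback: pypdf missing, return a short install hint.
        return "pypdf is not installed. Install it with: pip install pypdf"
    try:
        reader = PdfReader(io.BytesIO(data))
        return format_pages([page.extract_text() for page in reader.pages])
    except Exception as exc:
        # Deliberately broad: a corrupted file should yield an error
        # message rather than raw or encoded bytes.
        return f"Could not read PDF: {exc}"
```

A caller would pass the file's raw bytes, for example `pdf_bytes_to_text(Path("report.pdf").read_bytes())`, and hand the returned string to the LLM in place of the encoded file.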