crewAI/lib
Devin AI 53a726ad6d Fix #5930: Extract text from PDFs in ReadFileTool instead of returning base64
When using input_files with PDFFile, the read_file tool was returning
the entire PDF as base64-encoded binary data. This caused:
- Massive context bloat for the LLM
- Inconsistent responses and context overflow
- The same file being re-processed on each tool call

Now ReadFileTool detects application/pdf content and extracts text
using pypdf (already a dependency via crewai-files) instead of
base64-encoding the raw bytes. Each page is labeled with a page
number header for clarity. Graceful fallbacks are provided when:
- pypdf is not installed (short install message)
- The PDF contains no extractable text (friendly message)
- The PDF is corrupted (error message, never base64)
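
The behavior described above could be sketched roughly as follows. This is a minimal illustration, not the actual ReadFileTool code: the function names (extract_pdf_text, read_file), the message strings, and the "--- Page N ---" header format are all assumptions made for the example. Only PdfReader, reader.pages, and page.extract_text() come from pypdf itself.

```python
import io

# Hypothetical message strings; the real tool's wording may differ.
MSG_NO_PYPDF = "PDF text extraction requires pypdf. Install it with: pip install pypdf"
MSG_NO_TEXT = "The PDF contains no extractable text (it may consist of scanned images)."
MSG_CORRUPT = "Error: could not read PDF"


def extract_pdf_text(data: bytes) -> str:
    """Return page-labeled text for a PDF, or a short message. Never base64."""
    try:
        # Imported lazily so that a missing package yields a message, not a crash.
        from pypdf import PdfReader
    except ImportError:
        return MSG_NO_PYPDF

    try:
        reader = PdfReader(io.BytesIO(data))
        sections = []
        for number, page in enumerate(reader.pages, start=1):
            text = (page.extract_text() or "").strip()
            if text:
                # Assumed header format for the per-page label.
                sections.append(f"--- Page {number} ---\n{text}")
    except Exception as exc:
        # Broad catch on purpose: any parse failure becomes a short error string.
        return f"{MSG_CORRUPT}: {type(exc).__name__}"

    return "\n\n".join(sections) if sections else MSG_NO_TEXT


def read_file(data: bytes, content_type: str) -> str:
    """Hypothetical dispatcher: PDFs go through text extraction, the rest is decoded."""
    if content_type == "application/pdf":
        return extract_pdf_text(data)
    return data.decode("utf-8", errors="replace")
```

Calling read_file(pdf_bytes, "application/pdf") in this sketch returns either the labeled page text or one of the three short messages, so the LLM context never receives the raw encoded bytes.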

Co-Authored-By: João <joao@crewai.com>
2026-05-26 13:13:56 +00:00