mirror of https://github.com/crewAIInc/crewAI.git synced 2026-01-10 16:48:30 +00:00

Files

Greyson Lalonde e16606672a Squashed 'packages/tools/' content from commit 78317b9c

git-subtree-dir: packages/tools
git-subtree-split: 78317b9c127f18bd040c1d77e3c0840cdc9a5b38

2025-09-12 21:58:02 -04:00

ocr_tool.py

Squashed 'packages/tools/' content from commit 78317b9c

2025-09-12 21:58:02 -04:00

README.md

Squashed 'packages/tools/' content from commit 78317b9c

2025-09-12 21:58:02 -04:00

README.md

OCR Tool

Description

This tool performs Optical Character Recognition (OCR) on images using supported LLMs. It can extract text from both local image files and images available via URLs. The tool leverages the LLM's vision capabilities to provide accurate text extraction from images.

Installation

Install the crewai_tools package

pip install 'crewai[tools]'

Supported LLMs

Any LLM that supports the vision feature should work. It must accept image_url as a user message. The tool has been tested with:

OpenAI's gpt-4o
Gemini's gemini/gemini-1.5-pro

Usage

In order to use the OCRTool, make sure your LLM supports the vision feature and the appropriate API key is set in the environment (e.g., OPENAI_API_KEY for OpenAI).

from crewai_tools import OCRTool

selected_llm = LLM(model="gpt-4o") # select your LLM, the tool has been tested with gpt-4o and gemini/gemini-1.5-pro

ocr_tool = OCRTool(llm=selected_llm)

@agent
def researcher(self) -> Agent:
    return Agent(
        config=self.agents_config["researcher"],
        allow_delegation=False,
        tools=[ocr_tool]
    )

The tool accepts either a local file path or a URL to the image:

For local files, provide the absolute or relative path
For remote images, provide the complete URL starting with 'http' or 'https'