---
title: OCR Tool
description: The `OCRTool` extracts text from local images or image URLs using an LLM with vision.
icon: image
mode: "wide"
---
# `OCRTool`
## Description
Extracts text from images (local path or URL). Uses a vision-capable LLM via CrewAI's LLM interface.
## Installation
No extra installation is required beyond `crewai-tools`. Ensure your selected LLM supports vision.
## Parameters
### Run Parameters
- `image_path_url` (str, required): Local image path or HTTP(S) URL.
## Examples
### Direct usage
```python Code
from crewai_tools import OCRTool

# Run the tool directly on a local image file
tool = OCRTool()
print(tool.run(image_path_url="/tmp/receipt.png"))
```
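The same call accepts a publicly reachable HTTP(S) image URL; the address below is a placeholder.

```python Code
from crewai_tools import OCRTool

# Remote images can be referenced by URL; example.com is a placeholder address
text = OCRTool().run(image_path_url="https://example.com/invoice.jpg")
print(text)
```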
### With an agent
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import OCRTool

ocr = OCRTool()

agent = Agent(
    role="OCR",
    goal="Extract text",
    backstory="Vision-enabled analyst",
    tools=[ocr],
)

task = Task(
    description="Extract text from https://example.com/invoice.jpg",
    expected_output="All detected text in plain text",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
```
## Notes
- Ensure the selected LLM supports image inputs.
- For large images, consider downscaling to reduce token usage.
- You can pass a specific LLM instance to the tool (e.g., `LLM(model="gpt-4o")`) if needed, matching the README guidance; see the sketch below.
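A minimal sketch of passing an explicit vision-capable model. The `llm` keyword is an assumption here; check the crewai-tools README for the exact constructor argument in your installed version.

```python Code
from crewai import LLM
from crewai_tools import OCRTool

# The `llm` keyword is assumed; verify against the crewai-tools README for your version
ocr = OCRTool(llm=LLM(model="gpt-4o"))
print(ocr.run(image_path_url="/tmp/receipt.png"))
```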
## Complete example
```python Code
from crewai import Agent, Task, Crew
from crewai_tools import OCRTool

tool = OCRTool()

agent = Agent(
    role="OCR Specialist",
    goal="Extract text from images",
    backstory="Vision-enabled analyst",
    tools=[tool],
    verbose=True,
)

task = Task(
    description="Extract text from https://example.com/receipt.png",
    expected_output="All detected text in plain text",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff()
```