Files
crewAI/lib/crewai-tools/src/crewai_tools/tools/pdf_search_tool
Greyson LaLonde a928cde6ee fix: rag tool embeddings config
* fix: ensure config is not flattened, add tests

* chore: refactor inits to model_validator

* chore: refactor rag tool config parsing

* chore: add initial docs

* chore: add additional validation aliases for provider env vars

* chore: add solid docs

* chore: move imports to top

* fix: revert circular import

* fix: lazy import qdrant-client

* fix: allow collection name config

* chore: narrow model names for google

* chore: update additional docs

* chore: add backward compat on model name aliases

* chore: add tests for config changes
2025-11-24 16:51:28 -05:00
..
2025-10-20 14:10:19 -07:00
2025-10-20 14:10:19 -07:00

PDFSearchTool

Description

The PDFSearchTool is a RAG tool designed for semantic searches within PDF content. It allows for inputting a search query and a PDF document, leveraging advanced search techniques to find relevant content efficiently. This capability makes it especially useful for extracting specific information from large PDF files quickly.

Installation

To get started with the PDFSearchTool, first, ensure the crewai_tools package is installed with the following command:

pip install 'crewai[tools]'

Example

Here's how to use the PDFSearchTool to search within a PDF document:

from crewai_tools import PDFSearchTool

# Initialize the tool allowing for any PDF content search if the path is provided during execution
tool = PDFSearchTool()

# OR

# Initialize the tool with a specific PDF path for exclusive search within that document
tool = PDFSearchTool(pdf='path/to/your/document.pdf')

Arguments

  • pdf: Optinal The PDF path for the search. Can be provided at initialization or within the run method's arguments. If provided at initialization, the tool confines its search to the specified document.

Custom model and embeddings

By default, the tool uses OpenAI for both embeddings and summarization. To customize the model, you can use a config dictionary as follows:

tool = PDFSearchTool(
    config=dict(
        llm=dict(
            provider="ollama", # or google, openai, anthropic, llama2, ...
            config=dict(
                model="llama2",
                # temperature=0.5,
                # top_p=1,
                # stream=true,
            ),
        ),
        embedder=dict(
            provider="google",
            config=dict(
                model="models/embedding-001",
                task_type="retrieval_document",
                # title="Embeddings",
            ),
        ),
    )
)