mirror of https://github.com/crewAIInc/crewAI.git synced 2026-01-20 13:28:13 +00:00

Shady Ali 5af2108307 Fix: FireCrawl FirecrawlCrawlWebsiteTool for crawling. FireCrawl API does not recognize sent parameters (HTTPError: Unexpected error during start crawl job: Status code 400. Bad Request -

[{'code': 'unrecognized_keys', 'keys': ['crawlerOptions', 'timeout'], 'path': [], 'message': 'Unrecognized key in body -- please review the v1 API documentation for request body changes'}]) because it has been updated to v1. I updated the sent parameters to match v1 and updated their description in the readme file

2025-03-08 09:35:23 +02:00

1.6 KiB

FirecrawlCrawlWebsiteTool

Description

Firecrawl is a platform for crawling any website and converting it into clean markdown or structured data.

Installation

Get an API key from firecrawl.dev and set it in environment variables (FIRECRAWL_API_KEY).
Install the Firecrawl SDK along with the crewai[tools] package:

pip install firecrawl-py 'crewai[tools]'
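
Before constructing the tool, it can help to confirm that the key is actually visible to the Python process. The snippet below is a minimal sketch, not part of crewai_tools: the helper name get_firecrawl_api_key is hypothetical, and only the FIRECRAWL_API_KEY variable name comes from this README.

```python
import os


def get_firecrawl_api_key(env=None):
    """Return the Firecrawl API key, or raise a clear error if it is missing.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    can be replaced with a plain dict when testing.
    """
    env = os.environ if env is None else env
    key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "FIRECRAWL_API_KEY is not set; export it before creating the tool."
        )
    return key
```

Calling get_firecrawl_api_key() at startup surfaces a missing key immediately instead of at the first crawl request.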

Example

Use the FirecrawlCrawlWebsiteTool as follows to allow your agent to crawl websites:

from crewai_tools import FirecrawlCrawlWebsiteTool

tool = FirecrawlCrawlWebsiteTool(url='firecrawl.dev')

Arguments

- api_key: Optional. Specifies the Firecrawl API key. Defaults to the FIRECRAWL_API_KEY environment variable.
- url: The base URL to start crawling from.
- page_options: Optional.
  - onlyMainContent: Optional. Only return the main content of the page, excluding headers, navs, footers, etc.
  - includeHtml: Optional. Include the raw HTML content of the page. Adds an html key to the response.
- crawler_options: Optional. Options for controlling the crawling behavior.
  - maxDepth: Optional. Maximum depth to crawl. Depth 1 is the base URL, depth 2 includes the base URL and its direct children, and so on.
  - limit: Optional. Maximum number of pages to crawl.
  - scrapeOptions: Optional. Additional options for controlling the crawler.
    - formats: Optional. Formats in which to return the page content (e.g., markdown, html, screenshot, links).
    - timeout: Optional. Timeout in milliseconds for the crawling operation.
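
The nesting of the crawl options listed above can be easier to read as a dictionary. The following is a minimal sketch that only assembles such a dictionary. The helper name build_crawl_params and its parameter names are hypothetical. How the result is handed to the tool depends on the installed crewai_tools version and is not shown here.

```python
def build_crawl_params(max_depth=None, limit=None, formats=None, timeout_ms=None):
    """Assemble crawl options using the key names listed in this README.

    Hypothetical helper for illustration; it only builds a dict and does not
    call Firecrawl or crewai_tools.
    """
    params = {}
    if max_depth is not None:
        params["maxDepth"] = max_depth  # depth 1 is the base URL itself
    if limit is not None:
        params["limit"] = limit  # maximum number of pages to crawl

    # formats and timeout are nested under scrapeOptions, not top-level keys.
    scrape_options = {}
    if formats:
        scrape_options["formats"] = list(formats)
    if timeout_ms is not None:
        scrape_options["timeout"] = timeout_ms  # milliseconds
    if scrape_options:
        params["scrapeOptions"] = scrape_options
    return params


params = build_crawl_params(max_depth=2, limit=10, formats=["markdown"], timeout_ms=30000)
```

Unset options are left out, so the request body carries only the keys the caller explicitly chose.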
