Files
crewAI/lib/crewai-tools/src/crewai_tools/tools/scrapegraph_scrape_tool
Greyson LaLonde c5ea415cda
Some checks failed
CodeQL Advanced / Analyze (actions) (push) Has been cancelled
CodeQL Advanced / Analyze (python) (push) Has been cancelled
Check Documentation Broken Links / Check broken links (push) Has been cancelled
Vulnerability Scan / pip-audit (push) Has been cancelled
Nightly Canary Release / Check for new commits (push) Has been cancelled
Nightly Canary Release / Build nightly packages (push) Has been cancelled
Nightly Canary Release / Publish nightly to PyPI (push) Has been cancelled
Mark stale issues and pull requests / stale (push) Has been cancelled
chore(crewai-tools): drop self-explanatory comments
2026-05-26 16:25:07 -07:00
..
2025-10-20 14:10:19 -07:00
2025-10-20 14:10:19 -07:00

ScrapegraphScrapeTool

Description

A tool that leverages Scrapegraph AI's SmartScraper API to intelligently extract content from websites. This tool provides advanced web scraping capabilities with AI-powered content extraction, making it ideal for targeted data collection and content analysis tasks.

Installation

Install the required packages:

pip install 'crewai[tools]'

Example Usage

Basic Usage

from crewai_tools import ScrapegraphScrapeTool

# Basic usage with API key
tool = ScrapegraphScrapeTool(api_key="your_api_key")
result = tool.run(
    website_url="https://www.example.com",
    user_prompt="Extract the main heading and summary"
)

Fixed Website URL

# Initialize with a fixed website URL
tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    api_key="your_api_key"
)
result = tool.run()

Custom Prompt

# With custom prompt
tool = ScrapegraphScrapeTool(
    api_key="your_api_key",
    user_prompt="Extract all product prices and descriptions"
)
result = tool.run(website_url="https://www.example.com")

Error Handling

try:
    tool = ScrapegraphScrapeTool(api_key="your_api_key")
    result = tool.run(
        website_url="https://www.example.com",
        user_prompt="Extract the main heading"
    )
except ValueError as e:
    print(f"Configuration error: {e}")  # Handles invalid URLs or missing API keys
except RuntimeError as e:
    print(f"Scraping error: {e}")  # Handles API or network errors

Arguments

  • website_url: The URL of the website to scrape (required if not set during initialization)
  • user_prompt: Custom instructions for content extraction (optional)
  • api_key: Your Scrapegraph API key (required, can be set via SCRAPEGRAPH_API_KEY environment variable)

Environment Variables

  • SCRAPEGRAPH_API_KEY: Your Scrapegraph API key, you can obtain one here

Rate Limiting

The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:

  • Implement appropriate delays between requests when processing multiple URLs
  • Handle rate limit errors gracefully in your application
  • Check your API plan limits on the Scrapegraph dashboard

Error Handling

The tool may raise the following exceptions:

  • ValueError: When API key is missing or URL format is invalid
  • RuntimeError: When scraping operation fails (network issues, API errors)
  • RateLimitError: When API rate limits are exceeded

Best Practices

  1. Always validate URLs before making requests
  2. Implement proper error handling as shown in examples
  3. Consider caching results for frequently accessed pages
  4. Monitor your API usage through the Scrapegraph dashboard