Files
crewAI/crewai_tools/tools/scrapegraph_scrape_tool
Greyson Lalonde e16606672a Squashed 'packages/tools/' content from commit 78317b9c
git-subtree-dir: packages/tools
git-subtree-split: 78317b9c127f18bd040c1d77e3c0840cdc9a5b38
2025-09-12 21:58:02 -04:00
..

ScrapegraphScrapeTool

Description

A tool that leverages Scrapegraph AI's SmartScraper API to intelligently extract content from websites. This tool provides advanced web scraping capabilities with AI-powered content extraction, making it ideal for targeted data collection and content analysis tasks.

Installation

Install the required packages:

pip install 'crewai[tools]'

Example Usage

Basic Usage

from crewai_tools import ScrapegraphScrapeTool

# Basic usage with API key
tool = ScrapegraphScrapeTool(api_key="your_api_key")
result = tool.run(
    website_url="https://www.example.com",
    user_prompt="Extract the main heading and summary"
)

Fixed Website URL

# Initialize with a fixed website URL
tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    api_key="your_api_key"
)
result = tool.run()

Custom Prompt

# With custom prompt
tool = ScrapegraphScrapeTool(
    api_key="your_api_key",
    user_prompt="Extract all product prices and descriptions"
)
result = tool.run(website_url="https://www.example.com")

Error Handling

try:
    tool = ScrapegraphScrapeTool(api_key="your_api_key")
    result = tool.run(
        website_url="https://www.example.com",
        user_prompt="Extract the main heading"
    )
except ValueError as e:
    print(f"Configuration error: {e}")  # Handles invalid URLs or missing API keys
except RuntimeError as e:
    print(f"Scraping error: {e}")  # Handles API or network errors

Arguments

  • website_url: The URL of the website to scrape (required if not set during initialization)
  • user_prompt: Custom instructions for content extraction (optional)
  • api_key: Your Scrapegraph API key (required, can be set via SCRAPEGRAPH_API_KEY environment variable)

Environment Variables

  • SCRAPEGRAPH_API_KEY: Your Scrapegraph API key, you can obtain one here

Rate Limiting

The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:

  • Implement appropriate delays between requests when processing multiple URLs
  • Handle rate limit errors gracefully in your application
  • Check your API plan limits on the Scrapegraph dashboard

Error Handling

The tool may raise the following exceptions:

  • ValueError: When API key is missing or URL format is invalid
  • RuntimeError: When scraping operation fails (network issues, API errors)
  • RateLimitError: When API rate limits are exceeded

Best Practices

  1. Always validate URLs before making requests
  2. Implement proper error handling as shown in examples
  3. Consider caching results for frequently accessed pages
  4. Monitor your API usage through the Scrapegraph dashboard