ScrapegraphScrapeTool
Description
A tool that uses Scrapegraph AI's SmartScraper API to extract content from websites. It provides AI-powered content extraction, which makes it well suited to targeted data collection and content analysis tasks.
Installation
Install the required packages:
pip install 'crewai[tools]'
Example Usage
Basic Usage
from crewai_tools import ScrapegraphScrapeTool
# Basic usage with API key
tool = ScrapegraphScrapeTool(api_key="your_api_key")
result = tool.run(
    website_url="https://www.example.com",
    user_prompt="Extract the main heading and summary",
)
Fixed Website URL
# Initialize with a fixed website URL
tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    api_key="your_api_key",
)
result = tool.run()
Custom Prompt
# With custom prompt
tool = ScrapegraphScrapeTool(
    api_key="your_api_key",
    user_prompt="Extract all product prices and descriptions",
)
result = tool.run(website_url="https://www.example.com")
Error Handling
try:
    tool = ScrapegraphScrapeTool(api_key="your_api_key")
    result = tool.run(
        website_url="https://www.example.com",
        user_prompt="Extract the main heading",
    )
except ValueError as e:
    print(f"Configuration error: {e}")  # Handles invalid URLs or missing API keys
except RuntimeError as e:
    print(f"Scraping error: {e}")  # Handles API or network errors
Arguments
- website_url: The URL of the website to scrape (required if not set during initialization)
- user_prompt: Custom instructions for content extraction (optional)
- api_key: Your Scrapegraph API key (required, can be set via the SCRAPEGRAPH_API_KEY environment variable)
Environment Variables
SCRAPEGRAPH_API_KEY: Your Scrapegraph API key. You can obtain one from Scrapegraph.
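As a minimal sketch of reading the key from the environment, the helper below looks up SCRAPEGRAPH_API_KEY and fails early with a ValueError when it is unset. The name `load_scrapegraph_api_key` and its `env` parameter are hypothetical and not part of crewai_tools; the tool may already read the variable on its own, as noted under Arguments.

```python
import os


def load_scrapegraph_api_key(env=None):
    """Return the Scrapegraph API key from the environment (hypothetical helper)."""
    # Default to the real process environment; a plain dict can be passed in tests.
    env = os.environ if env is None else env
    key = env.get("SCRAPEGRAPH_API_KEY", "").strip()
    if not key:
        # Fail before any request is made, with a message naming the variable.
        raise ValueError("SCRAPEGRAPH_API_KEY is not set")
    return key
```

The returned value can then be passed explicitly, for example `ScrapegraphScrapeTool(api_key=load_scrapegraph_api_key())`.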
Rate Limiting
The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:
- Implement appropriate delays between requests when processing multiple URLs
- Handle rate limit errors gracefully in your application
- Check your API plan limits on the Scrapegraph dashboard
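The first practice above, spacing out requests, could be sketched as follows. `scrape_with_delay` is a hypothetical helper, not part of crewai_tools, and the one-second default is an arbitrary assumption; choose a delay that matches your plan's limits. The `scrape` argument is any callable that takes a URL.

```python
import time


def scrape_with_delay(scrape, urls, delay_seconds=1.0, sleep=time.sleep):
    """Call scrape(url) for each URL, pausing between consecutive requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # Pause before every request except the first.
            sleep(delay_seconds)
        results[url] = scrape(url)
    return results
```

With the tool from the examples above, a call might look like `scrape_with_delay(lambda url: tool.run(website_url=url), urls, delay_seconds=2.0)`.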
Error Handling
The tool may raise the following exceptions:
- ValueError: When API key is missing or URL format is invalid
- RuntimeError: When scraping operation fails (network issues, API errors)
- RateLimitError: When API rate limits are exceeded
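One way to handle transient failures such as rate limiting is to retry with exponential backoff. The sketch below is an assumption about how you might do this, not behavior built into the tool. `run_with_retry` is a hypothetical helper, and because this document does not state where RateLimitError is imported from, the exception type to retry on is passed in by the caller.

```python
import time


def run_with_retry(scrape, retry_on, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call scrape() and retry with exponential backoff on `retry_on` errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return scrape()
        except retry_on:
            if attempt == max_attempts:
                # Out of attempts: let the caller see the original error.
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors that retrying cannot fix, such as the ValueError raised for a missing API key, should be left out of `retry_on` so they surface immediately.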
Best Practices
- Always validate URLs before making requests
- Implement proper error handling as shown in examples
- Consider caching results for frequently accessed pages
- Monitor your API usage through the Scrapegraph dashboard
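The URL validation and caching practices above could be combined along these lines. Both `is_valid_http_url` and `make_cached_scraper` are hypothetical helpers rather than part of crewai_tools. The cache is a plain in-memory dictionary with no expiry, which is an assumption that suits short-lived scripts more than long-running services.

```python
from urllib.parse import urlparse


def is_valid_http_url(url):
    """Return True if `url` is an absolute http(s) URL with a host."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def make_cached_scraper(scrape):
    """Wrap scrape(url, prompt) so repeated calls reuse the first result."""
    cache = {}

    def cached(url, prompt):
        if not is_valid_http_url(url):
            raise ValueError(f"Invalid URL: {url!r}")
        key = (url, prompt)
        if key not in cache:
            # Only the first call for a given (url, prompt) pair hits the API.
            cache[key] = scrape(url, prompt)
        return cache[key]

    return cached
```

A wrapper around the tool might be built with `make_cached_scraper(lambda url, prompt: tool.run(website_url=url, user_prompt=prompt))`.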