ScrapegraphScrapeTool

Description

A tool that uses Scrapegraph AI's SmartScraper API to extract content from websites according to a natural-language prompt. It is suited to targeted data collection and content analysis tasks.

Installation

Install the required packages:

pip install 'crewai[tools]'

Example Usage

Basic Usage

from crewai_tools import ScrapegraphScrapeTool

# Basic usage with API key
tool = ScrapegraphScrapeTool(api_key="your_api_key")
result = tool.run(
    website_url="https://www.example.com",
    user_prompt="Extract the main heading and summary"
)

Fixed Website URL

from crewai_tools import ScrapegraphScrapeTool

# Initialize with a fixed website URL; run() then needs no URL argument
tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    api_key="your_api_key"
)
result = tool.run()

Custom Prompt

from crewai_tools import ScrapegraphScrapeTool

# Set the prompt once at initialization and pass the URL on each run
tool = ScrapegraphScrapeTool(
    api_key="your_api_key",
    user_prompt="Extract all product prices and descriptions"
)
result = tool.run(website_url="https://www.example.com")

Error Handling

from crewai_tools import ScrapegraphScrapeTool

try:
    tool = ScrapegraphScrapeTool(api_key="your_api_key")
    result = tool.run(
        website_url="https://www.example.com",
        user_prompt="Extract the main heading"
    )
except ValueError as e:
    # Invalid URL or missing API key
    print(f"Configuration error: {e}")
except RuntimeError as e:
    # API or network error
    print(f"Scraping error: {e}")

Arguments

website_url: The URL of the website to scrape (required if not set during initialization)
user_prompt: Custom instructions for content extraction (optional)
api_key: Your Scrapegraph API key (required; can also be set via the SCRAPEGRAPH_API_KEY environment variable)

Environment Variables

SCRAPEGRAPH_API_KEY: Your Scrapegraph API key. You can obtain one from Scrapegraph AI.
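
If you prefer to fail early with a clear message when the key is missing, one option is to resolve it yourself before constructing the tool. The sketch below is illustrative only: `resolve_api_key` is a hypothetical helper, not part of crewai_tools, and it assumes nothing about how the tool reads the variable internally.

```python
import os

ENV_VAR = "SCRAPEGRAPH_API_KEY"


def resolve_api_key(explicit_key=None, env=None):
    """Return explicit_key if given, otherwise the value of SCRAPEGRAPH_API_KEY.

    Hypothetical helper for illustration; `env` defaults to os.environ and
    exists only so the lookup can be exercised without touching the real
    environment.
    """
    env = os.environ if env is None else env
    key = explicit_key or env.get(ENV_VAR)
    if not key:
        raise ValueError(f"No API key given and {ENV_VAR} is not set")
    return key
```

The result can then be passed as `ScrapegraphScrapeTool(api_key=resolve_api_key())`.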

Rate Limiting

The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:

Implement appropriate delays between requests when processing multiple URLs
Handle rate limit errors gracefully in your application
Check your API plan limits on the Scrapegraph dashboard
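
The first practice above, spacing out requests, can be sketched as a small loop. This is a minimal sketch under stated assumptions: `scrape_with_delay` is a hypothetical helper, `scrape` is any callable you supply that takes a URL, and the one-second default is a placeholder rather than a documented Scrapegraph limit.

```python
import time


def scrape_with_delay(urls, scrape, delay_seconds=1.0, sleep=time.sleep):
    """Call scrape(url) for each URL, pausing between consecutive requests.

    `scrape` is a caller-supplied callable; `sleep` is injectable so the
    pacing can be checked without actually waiting.
    """
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            # No pause before the first request, only between requests.
            sleep(delay_seconds)
        results[url] = scrape(url)
    return results
```

With the tool, `scrape` could be something like `lambda url: tool.run(website_url=url)`. Choose `delay_seconds` from the limits of your own plan.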

Error Handling

The tool may raise the following exceptions:

ValueError: When the API key is missing or the URL format is invalid
RuntimeError: When the scraping operation fails (network issues, API errors)
RateLimitError: When API rate limits are exceeded
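
Of the exceptions listed above, a rate-limit error is the one that usually makes sense to retry, while a ValueError indicates a configuration problem that a retry will not fix. The following is a sketch of retrying with exponential backoff, not the tool's own behavior. `run_with_backoff` is a hypothetical helper, and the exception type to retry on is passed in as a parameter because this README does not state where RateLimitError is imported from.

```python
import time


def run_with_backoff(call, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call call(); on a `retry_on` exception, wait and try again.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...).
    Any other exception, and the final failure, propagate to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(base_delay * (2 ** attempt))
```

A call might look like `run_with_backoff(lambda: tool.run(website_url=url), retry_on=RateLimitError)`, assuming you have imported the tool's RateLimitError class.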

Best Practices

Always validate URLs before making requests
Implement proper error handling as shown in the examples
Consider caching results for frequently accessed pages
Monitor your API usage through the Scrapegraph dashboard
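
The URL validation and caching suggestions above can be combined in a thin wrapper. This is an illustrative sketch: `is_valid_http_url` and `CachedScraper` are hypothetical names, the check uses only the standard library, and the cache is a plain in-memory dict with no expiry, which may not suit pages that change often.

```python
from urllib.parse import urlparse


def is_valid_http_url(url):
    """Return True if url has an http(s) scheme and a host part."""
    try:
        parsed = urlparse(url)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 host.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


class CachedScraper:
    """Validate the URL, then reuse earlier results for the same (url, prompt)."""

    def __init__(self, scrape):
        self._scrape = scrape  # caller-supplied callable: scrape(url, prompt)
        self._cache = {}

    def get(self, url, prompt):
        if not is_valid_http_url(url):
            raise ValueError(f"Invalid URL: {url!r}")
        key = (url, prompt)
        if key not in self._cache:
            self._cache[key] = self._scrape(url, prompt)
        return self._cache[key]
```

Here `scrape` could wrap the tool, for example `lambda url, prompt: tool.run(website_url=url, user_prompt=prompt)`. Repeated calls with the same URL and prompt then cost one API request instead of several.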