crewAI/docs/tools/scrapegraphscrapetool.mdx

---
title: Scrapegraph Scrape Tool
description: The `ScrapegraphScrapeTool` leverages Scrapegraph AI's SmartScraper API to intelligently extract content from websites.
icon: chart-area
---

# `ScrapegraphScrapeTool`

## Description

The `ScrapegraphScrapeTool` is designed to leverage Scrapegraph AI's SmartScraper API to intelligently extract content from websites. This tool provides advanced web scraping capabilities with AI-powered content extraction, making it ideal for targeted data collection and content analysis tasks. Unlike traditional web scrapers, it can understand the context and structure of web pages to extract the most relevant information based on natural language prompts.

## Installation

To use this tool, you need to install the Scrapegraph Python client:

```shell
uv add scrapegraph-py
```

You'll also need to set up your Scrapegraph API key as an environment variable:

```shell
export SCRAPEGRAPH_API_KEY="your_api_key"
```

You can obtain an API key from [Scrapegraph AI](https://scrapegraphai.com).

## Steps to Get Started

To effectively use the `ScrapegraphScrapeTool`, follow these steps:

1. **Install Dependencies**: Install the required package using the command above.
2. **Set Up API Key**: Set your Scrapegraph API key as an environment variable or provide it during initialization.
3. **Initialize the Tool**: Create an instance of the tool with the necessary parameters.
4. **Define Extraction Prompts**: Create natural language prompts to guide the extraction of specific content.

## Example

The following example demonstrates how to use the `ScrapegraphScrapeTool` to extract content from a website:

```python Code
from crewai import Agent, Task, Crew
from crewai_tools import ScrapegraphScrapeTool

# Initialize the tool
scrape_tool = ScrapegraphScrapeTool(api_key="your_api_key")

# Define an agent that uses the tool
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Example task to extract product information from an e-commerce site
scrape_task = Task(
    description="Extract product names, prices, and descriptions from the featured products section of example.com.",
    expected_output="A structured list of product information including names, prices, and descriptions.",
    agent=web_scraper_agent,
)

# Create and run the crew
crew = Crew(agents=[web_scraper_agent], tasks=[scrape_task])
result = crew.kickoff()
```

You can also initialize the tool with predefined parameters:

```python Code
# Initialize the tool with predefined parameters
scrape_tool = ScrapegraphScrapeTool(
    website_url="https://www.example.com",
    user_prompt="Extract all product prices and descriptions",
    api_key="your_api_key"
)
```

## Parameters

The `ScrapegraphScrapeTool` accepts the following parameters during initialization:

- **api_key**: Optional. Your Scrapegraph API key. If not provided, it will look for the `SCRAPEGRAPH_API_KEY` environment variable.
- **website_url**: Optional. The URL of the website to scrape. If provided during initialization, the agent won't need to specify it when using the tool.
- **user_prompt**: Optional. Custom instructions for content extraction. If provided during initialization, the agent won't need to specify it when using the tool.
- **enable_logging**: Optional. Whether to enable logging for the Scrapegraph client. Default is `False`.

## Usage

When using the `ScrapegraphScrapeTool` with an agent, the agent will need to provide the following parameters (unless they were specified during initialization):

- **website_url**: The URL of the website to scrape.
- **user_prompt**: Optional. Custom instructions for content extraction. Default is "Extract the main content of the webpage".

The tool will return the extracted content based on the provided prompt.

```python Code
# Example of using the tool with an agent
web_scraper_agent = Agent(
    role="Web Scraper",
    goal="Extract specific information from websites",
    backstory="An expert in web scraping who can extract targeted content from web pages.",
    tools=[scrape_tool],
    verbose=True,
)

# Create a task for the agent to extract specific content
extract_task = Task(
    description="Extract the main heading and summary from example.com",
    expected_output="The main heading and summary from the website",
    agent=web_scraper_agent,
)

# Run the task
crew = Crew(agents=[web_scraper_agent], tasks=[extract_task])
result = crew.kickoff()
```

## Error Handling

The `ScrapegraphScrapeTool` may raise the following exceptions:

- **ValueError**: When API key is missing or URL format is invalid.
- **RateLimitError**: When API rate limits are exceeded.
- **RuntimeError**: When scraping operation fails (network issues, API errors).

It's recommended to instruct agents to handle potential errors gracefully:

```python Code
# Create a task that includes error handling instructions
robust_extract_task = Task(
    description="""
    Extract the main heading from example.com.
    Be aware that you might encounter errors such as:
    - Invalid URL format
    - Missing API key
    - Rate limit exceeded
    - Network or API errors

    If you encounter any errors, provide a clear explanation of what went wrong
    and suggest possible solutions.
    """,
    expected_output="Either the extracted heading or a clear error explanation",
    agent=web_scraper_agent,
)
```

## Rate Limiting

The Scrapegraph API has rate limits that vary based on your subscription plan. Consider the following best practices:

- Implement appropriate delays between requests when processing multiple URLs.
- Handle rate limit errors gracefully in your application.
- Check your API plan limits on the Scrapegraph dashboard.

## Implementation Details

The `ScrapegraphScrapeTool` uses the Scrapegraph Python client to interact with the SmartScraper API:

```python Code
class ScrapegraphScrapeTool(BaseTool):
    """
    A tool that uses Scrapegraph AI to intelligently scrape website content.
    """

    # Implementation details...

    def _run(self, **kwargs: Any) -> Any:
        website_url = kwargs.get("website_url", self.website_url)
        user_prompt = (
            kwargs.get("user_prompt", self.user_prompt)
            or "Extract the main content of the webpage"
        )

        if not website_url:
            raise ValueError("website_url is required")

        # Validate URL format
        self._validate_url(website_url)

        try:
            # Make the SmartScraper request
            response = self._client.smartscraper(
                website_url=website_url,
                user_prompt=user_prompt,
            )

            return response
        # Error handling...
```

## Conclusion

The `ScrapegraphScrapeTool` provides a powerful way to extract content from websites using AI-powered understanding of web page structure. By enabling agents to target specific information using natural language prompts, it makes web scraping tasks more efficient and focused. This tool is particularly useful for data extraction, content monitoring, and research tasks where specific information needs to be extracted from web pages.