mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-01-08 23:58:34 +00:00
- Added documentation for file operation tools - Added documentation for search tools - Added documentation for web scraping tools - Added documentation for specialized tools (RAG, code interpreter) - Added documentation for API-based tools (SerpApi, Serply) Link to Devin run: https://app.devin.ai/sessions/d2f72a2dfb214659aeb3e9f67ed961f7 Co-Authored-By: Joe Moura <joao@crewai.com>
221 lines
4.6 KiB
Plaintext
221 lines
4.6 KiB
Plaintext
---
title: JinaScrapeWebsiteTool
description: A tool for scraping website content using Jina.ai's reader service with markdown output
icon: globe
---
|
|
|
|
## JinaScrapeWebsiteTool
|
|
|
|
The JinaScrapeWebsiteTool provides website content scraping capabilities using Jina.ai's reader service. It converts web content into clean markdown format and supports both fixed and dynamic URL modes with optional authentication.
|
|
|
|
## Installation
|
|
|
|
```bash
|
|
pip install 'crewai[tools]'
|
|
```
|
|
|
|
## Usage Example
|
|
|
|
```python
|
|
from crewai import Agent
|
|
from crewai_tools import JinaScrapeWebsiteTool
|
|
|
|
# Method 1: Fixed URL (specified at initialization)
|
|
fixed_tool = JinaScrapeWebsiteTool(
|
|
website_url="https://example.com",
|
|
api_key="your-jina-api-key" # Optional
|
|
)
|
|
|
|
# Method 2: Dynamic URL (specified at runtime)
|
|
dynamic_tool = JinaScrapeWebsiteTool(
|
|
api_key="your-jina-api-key" # Optional
|
|
)
|
|
|
|
# Create an agent with the tool
|
|
researcher = Agent(
|
|
role='Web Content Researcher',
|
|
goal='Extract and analyze website content',
|
|
backstory='Expert at gathering and processing web information.',
|
|
tools=[fixed_tool], # or [dynamic_tool]
|
|
verbose=True
|
|
)
|
|
```
|
|
|
|
## Input Schema
|
|
|
|
```python
|
|
class JinaScrapeWebsiteToolInput(BaseModel):
|
|
website_url: str = Field(
|
|
description="Mandatory website url to read the file"
|
|
)
|
|
```
|
|
|
|
## Function Signature
|
|
|
|
```python
|
|
def __init__(
|
|
self,
|
|
website_url: Optional[str] = None,
|
|
api_key: Optional[str] = None,
|
|
custom_headers: Optional[dict] = None,
|
|
**kwargs
|
|
):
|
|
"""
|
|
Initialize the website scraping tool.
|
|
|
|
Args:
|
|
website_url (Optional[str]): URL to scrape (optional for dynamic mode)
|
|
api_key (Optional[str]): Jina.ai API key for authentication
|
|
custom_headers (Optional[dict]): Custom HTTP headers
|
|
**kwargs: Additional arguments for base tool
|
|
"""
|
|
|
|
def _run(
|
|
self,
|
|
website_url: Optional[str] = None
|
|
) -> str:
|
|
"""
|
|
Execute website scraping.
|
|
|
|
Args:
|
|
website_url (Optional[str]): URL to scrape (required for dynamic mode)
|
|
|
|
Returns:
|
|
str: Markdown-formatted website content
|
|
"""
|
|
```
|
|
|
|
## Best Practices
|
|
|
|
1. URL Handling:
|
|
- Use complete URLs
|
|
- Validate URL format
|
|
- Handle redirects
|
|
- Monitor timeouts
|
|
|
|
2. Authentication:
|
|
- Secure API key storage
|
|
- Use environment variables
|
|
- Manage headers properly
|
|
- Handle auth errors
|
|
|
|
3. Content Processing:
|
|
- Handle large pages
|
|
- Process markdown output
|
|
- Manage encoding
|
|
- Handle errors
|
|
|
|
4. Mode Selection:
|
|
- Choose fixed mode for static sites
|
|
- Use dynamic mode for variable URLs
|
|
- Consider caching
|
|
- Manage timeouts
|
|
|
|
## Integration Example
|
|
|
|
```python
|
|
from crewai import Agent, Task, Crew
|
|
from crewai_tools import JinaScrapeWebsiteTool
|
|
import os
|
|
|
|
# Initialize tool with API key
|
|
scraper_tool = JinaScrapeWebsiteTool(
|
|
api_key=os.getenv('JINA_API_KEY'),
|
|
custom_headers={
|
|
'User-Agent': 'CrewAI Bot 1.0'
|
|
}
|
|
)
|
|
|
|
# Create agent
|
|
researcher = Agent(
|
|
role='Web Content Analyst',
|
|
goal='Extract and analyze website content',
|
|
backstory='Expert at processing web information.',
|
|
tools=[scraper_tool]
|
|
)
|
|
|
|
# Define task
|
|
analysis_task = Task(
    description="""Analyze the content of
    https://example.com/blog for key insights.""",
    expected_output="A summary of the key insights found in the blog content.",
    agent=researcher
)
|
|
|
|
# Create crew
|
|
crew = Crew(
|
|
agents=[researcher],
|
|
tasks=[analysis_task]
|
|
)
|
|
|
|
# Execute
|
|
result = crew.kickoff()
|
|
```
|
|
|
|
## Advanced Usage
|
|
|
|
### Multiple Site Analysis
|
|
```python
|
|
# Initialize tool
|
|
scraper = JinaScrapeWebsiteTool(
|
|
api_key=os.getenv('JINA_API_KEY')
|
|
)
|
|
|
|
# Analyze multiple sites
|
|
results = []
|
|
sites = [
|
|
"https://site1.com",
|
|
"https://site2.com",
|
|
"https://site3.com"
|
|
]
|
|
|
|
for site in sites:
|
|
content = scraper.run(
|
|
website_url=site
|
|
)
|
|
results.append(content)
|
|
```
|
|
|
|
### Custom Headers Configuration
|
|
```python
|
|
# Initialize with custom headers
|
|
tool = JinaScrapeWebsiteTool(
|
|
custom_headers={
|
|
'User-Agent': 'Custom Bot 1.0',
|
|
'Accept-Language': 'en-US,en;q=0.9',
|
|
'Accept': 'text/html,application/xhtml+xml'
|
|
}
|
|
)
|
|
|
|
# Use the tool
|
|
content = tool.run(
|
|
website_url="https://example.com"
|
|
)
|
|
```
|
|
|
|
### Error Handling Example
|
|
```python
|
|
import requests

try:
|
|
scraper = JinaScrapeWebsiteTool()
|
|
content = scraper.run(
|
|
website_url="https://example.com"
|
|
)
|
|
print(content)
|
|
except requests.exceptions.RequestException as e:
|
|
print(f"Error accessing website: {str(e)}")
|
|
except Exception as e:
|
|
print(f"Error processing content: {str(e)}")
|
|
```
|
|
|
|
## Notes
|
|
|
|
- Uses Jina.ai reader service
|
|
- Markdown output format
|
|
- API key authentication
|
|
- Custom headers support
|
|
- Error handling
|
|
- Timeout management
|
|
- Content processing
|
|
- URL validation
|
|
- Redirect handling
|
|
- Response formatting
|