docs: major docs updates (#2897)

docs/tools/web-scraping/overview.mdx (new file, 103 lines)
@@ -0,0 +1,103 @@
---
title: "Overview"
description: "Extract data from websites and automate browser interactions with powerful scraping tools"
icon: "face-smile"
---

These tools enable your agents to interact with the web, extract data from websites, and automate browser-based tasks. From simple web scraping to complex browser automation, these tools cover all your web interaction needs.

## **Available Tools**

<CardGroup cols={2}>
  <Card title="Scrape Website Tool" icon="globe" href="/tools/web-scraping/scrapewebsitetool">
    General-purpose web scraping tool for extracting content from any website.
  </Card>

  <Card title="Scrape Element Tool" icon="crosshairs" href="/tools/web-scraping/scrapeelementfromwebsitetool">
    Target specific elements on web pages with precision scraping capabilities.
  </Card>

  <Card title="Firecrawl Crawl Tool" icon="spider" href="/tools/web-scraping/firecrawlcrawlwebsitetool">
    Crawl entire websites systematically with Firecrawl's powerful engine.
  </Card>

  <Card title="Firecrawl Scrape Tool" icon="fire" href="/tools/web-scraping/firecrawlscrapewebsitetool">
    High-performance web scraping with Firecrawl's advanced capabilities.
  </Card>

  <Card title="Firecrawl Search Tool" icon="magnifying-glass" href="/tools/web-scraping/firecrawlsearchtool">
    Search and extract specific content using Firecrawl's search features.
  </Card>

  <Card title="Selenium Scraping Tool" icon="robot" href="/tools/web-scraping/seleniumscrapingtool">
    Browser automation and scraping with Selenium WebDriver capabilities.
  </Card>

  <Card title="ScrapFly Tool" icon="plane" href="/tools/web-scraping/scrapflyscrapetool">
    Professional web scraping with ScrapFly's premium scraping service.
  </Card>

<Card title="ScrapGraph Tool" icon="network-wired" href="/tools/web-scraping/scrapegraphscrapetool">
|
||||
Graph-based web scraping for complex data relationships.
|
||||
</Card>
|
||||
|
||||
<Card title="Spider Tool" icon="spider" href="/tools/web-scraping/spidertool">
|
||||
Comprehensive web crawling and data extraction capabilities.
|
||||
</Card>
|
||||
|
||||
<Card title="BrowserBase Tool" icon="browser" href="/tools/web-scraping/browserbaseloadtool">
|
||||
Cloud-based browser automation with BrowserBase infrastructure.
|
||||
</Card>
|
||||
|
||||
<Card title="HyperBrowser Tool" icon="window-maximize" href="/tools/web-scraping/hyperbrowserloadtool">
|
||||
Fast browser interactions with HyperBrowser's optimized engine.
|
||||
</Card>
|
||||
|
||||
<Card title="Stagehand Tool" icon="hand" href="/tools/web-scraping/stagehandtool">
|
||||
Intelligent browser automation with natural language commands.
|
||||
</Card>
|
||||
</CardGroup>
|
||||
|
||||
## **Common Use Cases**

- **Data Extraction**: Scrape product information, prices, and reviews
- **Content Monitoring**: Track changes on websites and news sources
- **Lead Generation**: Extract contact information and business data
- **Market Research**: Gather competitive intelligence and market data
- **Testing & QA**: Automate browser testing and validation workflows
- **Social Media**: Extract posts, comments, and social media analytics

## **Quick Start Example**

```python
from crewai import Agent
from crewai_tools import ScrapeWebsiteTool, FirecrawlScrapeWebsiteTool, SeleniumScrapingTool

# Create scraping tools
simple_scraper = ScrapeWebsiteTool()
advanced_scraper = FirecrawlScrapeWebsiteTool()
browser_automation = SeleniumScrapingTool()

# Add the tools to your agent
agent = Agent(
    role="Web Research Specialist",
    goal="Extract and analyze web data efficiently",
    backstory="An experienced researcher who gathers and structures information from the web.",
    tools=[simple_scraper, advanced_scraper, browser_automation],
)
```

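To run the agent end to end, it still needs a task and a crew. The snippet below is a minimal sketch; the task description, expected output, and URL are placeholders rather than part of the tool docs.

```python
from crewai import Crew, Task

# Hypothetical task for the agent defined above
research_task = Task(
    description="Scrape https://example.com and summarize the main content of the page.",
    expected_output="A short summary of the page's key points.",
    agent=agent,
)

crew = Crew(agents=[agent], tasks=[research_task])
result = crew.kickoff()
print(result)
```
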
## **Scraping Best Practices**

- **Respect robots.txt**: Always check and follow website scraping policies
- **Rate Limiting**: Implement delays between requests to avoid overwhelming servers (see the sketch after this list)
- **User Agents**: Use appropriate user agent strings to identify your bot
- **Legal Compliance**: Ensure your scraping activities comply with terms of service
- **Error Handling**: Implement robust error handling for network issues and blocked requests
- **Data Quality**: Validate and clean extracted data before processing

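As a rough illustration of the rate-limiting and error-handling points, the sketch below loops over a few placeholder URLs, pauses between requests, and records failures instead of aborting the batch. It assumes `ScrapeWebsiteTool` accepts a `website_url` argument and exposes a `run()` method, as shown on the tool's own page.

```python
import time

from crewai_tools import ScrapeWebsiteTool

# Placeholder URLs -- replace with the pages you actually need
urls = [
    "https://example.com/page-1",
    "https://example.com/page-2",
]

results = {}
for url in urls:
    tool = ScrapeWebsiteTool(website_url=url)
    try:
        # run() performs the scrape and returns the extracted page text
        results[url] = tool.run()
    except Exception as exc:
        # Record the failure and keep going instead of aborting the batch
        results[url] = f"scrape failed: {exc}"
    # Basic rate limiting: pause between requests
    time.sleep(2)
```
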
## **Tool Selection Guide**

- **Simple Tasks**: Use `ScrapeWebsiteTool` for basic content extraction
- **JavaScript-Heavy Sites**: Use `SeleniumScrapingTool` for dynamic content (see the sketch after this list)
- **Scale & Performance**: Use `FirecrawlScrapeWebsiteTool` for high-volume scraping
- **Cloud Infrastructure**: Use `BrowserBaseLoadTool` for scalable browser automation
- **Complex Workflows**: Use `StagehandTool` for intelligent browser interactions

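As a quick illustration of the first two rules, a static page can go straight to `ScrapeWebsiteTool`, while a JavaScript-rendered page is better handled by `SeleniumScrapingTool`. The URLs and the `css_element` selector below are placeholders; the constructor arguments follow the individual tool pages.

```python
from crewai_tools import ScrapeWebsiteTool, SeleniumScrapingTool

# Static page: plain HTTP scraping is enough
static_scraper = ScrapeWebsiteTool(website_url="https://example.com/docs")

# JavaScript-heavy page: render it in a real browser before extracting content
dynamic_scraper = SeleniumScrapingTool(
    website_url="https://example.com/dashboard",
    css_element=".main-content",
)
```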