Migrate docs from MkDocs to Mintlify (#1423)
* add new mintlify docs * add favicon.svg * minor edits * add github stats
76
docs/tools/seleniumscrapingtool.mdx
Normal file
@@ -0,0 +1,76 @@
---
title: Selenium Scraper
description: The `SeleniumScrapingTool` is designed to extract and read the content of a specified website using Selenium.
icon: clipboard-user
---

# `SeleniumScrapingTool`

<Note>
This tool is currently in development. As we refine its capabilities, users may encounter unexpected behavior.
Your feedback is invaluable to us for making improvements.
</Note>

## Description

The `SeleniumScrapingTool` scrapes web pages through a Selenium-driven browser.
It extracts content precisely by using CSS selectors to target specific elements.
It works with any website URL you provide.

## Installation

To get started with the `SeleniumScrapingTool`, install `crewai` with its `tools` extra, which provides the `crewai_tools` package:

```shell
pip install 'crewai[tools]'
```

## Usage Examples

The examples below show common ways to initialize the `SeleniumScrapingTool`:

```python Code
from crewai_tools import SeleniumScrapingTool

# Example 1:
# Initialize the tool without any parameters to scrape
# the current page it navigates to
tool = SeleniumScrapingTool()

# Example 2:
# Scrape the entire webpage of a given URL
tool = SeleniumScrapingTool(website_url='https://example.com')

# Example 3:
# Target and scrape a specific CSS element from a webpage
tool = SeleniumScrapingTool(
    website_url='https://example.com',
    css_element='.main-content'
)

# Example 4:
# Perform scraping with additional parameters for a customized experience
tool = SeleniumScrapingTool(
    website_url='https://example.com',
    css_element='.main-content',
    cookie={'name': 'user', 'value': 'John Doe'},
    wait_time=10
)
```
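
The four initializations above differ only in which keyword arguments are passed. As a minimal sketch, the helper below collects the provided arguments and does some light validation before they reach the tool. `build_tool_kwargs` is a hypothetical name and not part of `crewai_tools`. The checks are assumptions chosen for illustration: Selenium requires a cookie to have at least `name` and `value` keys, and a negative wait makes no sense.

```python
def build_tool_kwargs(website_url=None, css_element=None, cookie=None, wait_time=None):
    """Hypothetical helper: return only the arguments that were provided."""
    # Selenium cookies need at least a 'name' and a 'value' key.
    if cookie is not None and not {"name", "value"} <= set(cookie):
        raise ValueError("cookie needs at least 'name' and 'value' keys")
    # A delay in seconds cannot be negative.
    if wait_time is not None and wait_time < 0:
        raise ValueError("wait_time must be non-negative")
    candidates = {
        "website_url": website_url,
        "css_element": css_element,
        "cookie": cookie,
        "wait_time": wait_time,
    }
    # Drop anything left unset so the tool's own defaults apply.
    return {name: value for name, value in candidates.items() if value is not None}


kwargs = build_tool_kwargs(
    website_url="https://example.com",
    css_element=".main-content",
    wait_time=10,
)
print(kwargs)
# {'website_url': 'https://example.com', 'css_element': '.main-content', 'wait_time': 10}
```

The resulting dictionary could then be unpacked into the constructor, for example `SeleniumScrapingTool(**kwargs)`.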

## Arguments

The following parameters can be used to customize the SeleniumScrapingTool's scraping process:

| Argument        | Type     | Description                                                                                                                                   |
|:----------------|:---------|:----------------------------------------------------------------------------------------------------------------------------------------------|
| **website_url** | `string` | **Mandatory**. Specifies the URL of the website from which content is to be scraped.                                                          |
| **css_element** | `string` | **Mandatory**. The CSS selector for a specific element to target on the website, enabling focused scraping of a particular part of a webpage. |
| **cookie**      | `object` | **Optional**. A dictionary containing cookie information, useful for simulating a logged-in session to access restricted content.             |
| **wait_time**   | `int`    | **Optional**. Specifies the delay (in seconds) before scraping, allowing the website and any dynamic content to fully load.                   |

<Warning>
Since the `SeleniumScrapingTool` is under active development, the parameters and functionality may evolve over time.
Users are encouraged to keep the tool updated and report any issues or suggestions for enhancements.
</Warning>
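
To make the roles of `cookie` and `wait_time` more concrete, here is a rough sketch of how the four arguments could map onto plain Selenium WebDriver calls. This is not the tool's actual implementation. The function name `scrape_text`, the default selector `"body"`, and the default delay of 3 seconds are all assumptions made for illustration. The driver is passed in so that the sketch itself does not need a browser.

```python
import time

# Value of selenium.webdriver.common.by.By.CSS_SELECTOR, inlined so this
# sketch does not have to import Selenium itself.
CSS_SELECTOR = "css selector"


def scrape_text(driver, website_url, css_element="body", cookie=None, wait_time=3):
    """Illustrative only: return the text of all elements matching css_element."""
    driver.get(website_url)
    if cookie is not None:
        # Selenium only accepts cookies for the domain that is currently
        # loaded, so set the cookie after the first load, then reload.
        driver.add_cookie(cookie)
        driver.get(website_url)
    # Fixed delay so that dynamic content has time to render.
    time.sleep(wait_time)
    elements = driver.find_elements(CSS_SELECTOR, css_element)
    return "\n".join(element.text for element in elements)
```

With Selenium installed, a real driver such as `webdriver.Chrome()` could be passed as `driver` and closed afterwards with `driver.quit()`.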