Updating docs

This commit is contained in:
João Moura
2024-04-04 13:25:04 -03:00
parent fcffc4a898
commit a7f007f475
23 changed files with 337 additions and 360 deletions

View File

@@ -1,36 +1,44 @@
# SeleniumScrapingTool
!!! note "Experimental"
We are still working on improving tools, so there might be unexpected behavior or changes in the future.
This tool is currently in development. As we refine its capabilities, users may encounter unexpected behavior. Your feedback is invaluable to us for making improvements.
## Description
This tool is designed for efficient web scraping, enabling users to extract content from web pages. It supports targeted scraping by allowing the specification of a CSS selector for desired elements. The flexibility of the tool enables it to be used on any website URL provided by the user, making it a versatile tool for various web scraping needs.
The SeleniumScrapingTool is crafted for high-efficiency web scraping tasks. It allows for precise extraction of content from web pages by using CSS selectors to target specific elements. Its design caters to a wide range of scraping needs, offering flexibility to work with any provided website URL.
## Installation
Install the crewai_tools package
To get started with the SeleniumScrapingTool, install the crewai_tools package using pip:
```
pip install 'crewai[tools]'
```
## Example
## Usage Examples
Below are some scenarios where the SeleniumScrapingTool can be utilized:
```python
from crewai_tools import SeleniumScrapingTool
# Example 1: Scrape any website it finds during its execution
# Example 1: Initialize the tool without any parameters to scrape the current page it navigates to
tool = SeleniumScrapingTool()
# Example 2: Scrape the entire webpage
# Example 2: Scrape the entire webpage of a given URL
tool = SeleniumScrapingTool(website_url='https://example.com')
# Example 3: Scrape a specific CSS element from the webpage
# Example 3: Target and scrape a specific CSS element from a webpage
tool = SeleniumScrapingTool(website_url='https://example.com', css_element='.main-content')
# Example 4: Scrape using optional parameters for customized scraping
tool = SeleniumScrapingTool(website_url='https://example.com', css_element='.main-content', cookie={'name': 'user', 'value': 'John Doe'})
# Example 4: Perform scraping with additional parameters for a customized experience
tool = SeleniumScrapingTool(website_url='https://example.com', css_element='.main-content', cookie={'name': 'user', 'value': 'John Doe'}, wait_time=10)
```
## Arguments
- `website_url`: Mandatory. The URL of the website to scrape.
- `css_element`: Mandatory. The CSS selector for a specific element to scrape from the website.
- `cookie`: Optional. A dictionary containing cookie information. This parameter allows the tool to simulate a session with cookie information, providing access to content that may be restricted to logged-in users.
- `wait_time`: Optional. The number of seconds the tool waits after loading the website and after setting a cookie, before scraping the content. This allows for dynamic content to load properly.
The following parameters can be used to customize the SeleniumScrapingTool's scraping process:
- `website_url`: **Mandatory**. Specifies the URL of the website from which content is to be scraped.
- `css_element`: **Mandatory**. The CSS selector for a specific element to target on the website. This enables focused scraping of a particular part of a webpage.
- `cookie`: **Optional**. A dictionary that contains cookie information. Useful for simulating a logged-in session, thereby providing access to content that might be restricted to non-logged-in users.
- `wait_time`: **Optional**. Specifies the delay (in seconds) before the content is scraped. This delay allows for the website and any dynamic content to fully load, ensuring a successful scrape.
!!! attention
Since the SeleniumScrapingTool is under active development, the parameters and functionality may evolve over time. Users are encouraged to keep the tool updated and report any issues or suggestions for enhancements.