mirror of
https://github.com/crewAIInc/crewAI.git
synced 2026-01-23 15:18:14 +00:00
Squashed 'packages/tools/' content from commit 78317b9c
git-subtree-dir: packages/tools git-subtree-split: 78317b9c127f18bd040c1d77e3c0840cdc9a5b38
crewai_tools/tools/selenium_scraping_tool/README.md
# SeleniumScrapingTool

## Description
This tool extracts content from web pages. To scrape only part of a page, specify a CSS selector for the desired elements. It works with any website URL the user provides.

## Installation
Install the `crewai_tools` package:
```
pip install 'crewai[tools]'
```

## Example
```python
from crewai_tools import SeleniumScrapingTool

# Example 1: No preset URL; scrape any website the agent finds during its execution
tool = SeleniumScrapingTool()

# Example 2: Scrape the entire webpage
tool = SeleniumScrapingTool(website_url='https://example.com')

# Example 3: Scrape a specific CSS element from the webpage
tool = SeleniumScrapingTool(website_url='https://example.com', css_element='.main-content')

# Example 4: Scrape using optional parameters for customized scraping (here, a cookie)
tool = SeleniumScrapingTool(
    website_url='https://example.com',
    css_element='.main-content',
    cookie={'name': 'user', 'value': 'John Doe'},
)

# Example 5: Scrape content in HTML format
tool = SeleniumScrapingTool(website_url='https://example.com', return_html=True)
result = tool._run()
# Returns HTML content like: ['<div class="content">Hello World</div>', '<div class="footer">Copyright 2024</div>']

# Example 6: Scrape content in text format (default)
tool = SeleniumScrapingTool(website_url='https://example.com', return_html=False)
result = tool._run()
# Returns text content like: ['Hello World', 'Copyright 2024']
```
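
To make the relationship between the two output formats in Examples 5 and 6 concrete, here is a minimal standard-library sketch that strips the tags from the sample HTML fragments and arrives at the sample text output. It is only an illustration: `html_to_text` is a hypothetical helper, the input is the sample data from the comments above rather than live output, and the tool's own text extraction may treat whitespace or hidden elements differently.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and ignores the tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def html_to_text(fragment: str) -> str:
    """Hypothetical helper: return the text of one HTML fragment without tags."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.parts).strip()


# Sample data copied from the comment in Example 5 (not live output).
html_result = [
    '<div class="content">Hello World</div>',
    '<div class="footer">Copyright 2024</div>',
]

# Stripping the tags yields the sample text shown in Example 6.
text_result = [html_to_text(fragment) for fragment in html_result]
print(text_result)
```

If you need both the markup and the text, a post-processing step along these lines lets you scrape once with `return_html=True` instead of running the tool twice.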
## Arguments
- `website_url`: Mandatory. The URL of the website to scrape. Set it in the constructor (Examples 2 to 6), or leave it unset so that the URL is supplied when the tool runs (Example 1).
- `css_element`: The CSS selector for the specific element to scrape from the website. When the tool is constructed with only a `website_url`, as in Example 2, the entire webpage is scraped.
- `cookie`: Optional. A dictionary containing cookie information, such as `{'name': 'user', 'value': 'John Doe'}`. It lets the tool simulate a session, giving access to content that may be restricted to logged-in users.
- `wait_time`: Optional. The number of seconds the tool waits after loading the website, and again after setting a cookie, before scraping the content. This gives dynamic content time to load.
- `return_html`: Optional. If `True`, the tool returns HTML content. If `False` (the default), the tool returns text content.
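
The arguments above can be assembled and sanity-checked before the tool is constructed. The sketch below is a hedged illustration, not part of `crewai_tools`: `validate_cookie` and `build_tool_kwargs` are hypothetical helpers, and the check assumes the cookie dictionary is handed to Selenium's `add_cookie`, which requires the `name` and `value` keys.

```python
def validate_cookie(cookie: dict) -> dict:
    """Hypothetical check: ensure the cookie has the 'name' and 'value' keys."""
    missing = {"name", "value"} - set(cookie)
    if missing:
        raise ValueError(f"cookie is missing required keys: {sorted(missing)}")
    return cookie


def build_tool_kwargs(website_url, css_element=None, cookie=None,
                      wait_time=None, return_html=False) -> dict:
    """Hypothetical helper: collect constructor arguments, skipping unset ones."""
    kwargs = {"website_url": website_url, "return_html": return_html}
    if css_element:
        kwargs["css_element"] = css_element
    if cookie is not None:
        kwargs["cookie"] = validate_cookie(cookie)
    if wait_time is not None:
        if wait_time < 0:
            raise ValueError("wait_time must be a non-negative number of seconds")
        kwargs["wait_time"] = wait_time
    return kwargs


kwargs = build_tool_kwargs(
    "https://example.com",
    css_element=".main-content",
    cookie={"name": "user", "value": "John Doe"},
    wait_time=5,
)
print(kwargs)

# With crewai_tools installed, the result could then be passed on as:
# tool = SeleniumScrapingTool(**kwargs)
```

Validating up front surfaces a malformed cookie or a negative wait before a browser session is started, which is usually cheaper to debug.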