docs: add StagehandTool documentation and improve MDX structure (#2842)

2026-04-30 14:52:36 +00:00 · 2025-05-15 12:24:25 -04:00
parent 49bbf3f234
commit 0b35e40a24
9 changed files with 245 additions and 14 deletions
--- a/docs/tools/stagehandtool.mdx
+++ b/docs/tools/stagehandtool.mdx
@@ -0,0 +1,244 @@
+---
+title: Stagehand Tool
+description: Web automation tool that integrates Stagehand with CrewAI for browser interaction and automation
+icon: hand
+---
+
+
+# Overview
+
+The `StagehandTool` integrates the [Stagehand](https://docs.stagehand.dev/get_started/introduction) framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.
+
+## Overview
+
+Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:
+
+- Navigate to websites
+- Click buttons, links, and other elements
+- Fill in forms
+- Extract data from web pages
+- Observe and identify elements
+- Perform complex workflows
+
+The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:
+
+1. **Act**: Perform actions like clicking, typing, or navigating
+2. **Extract**: Extract structured data from web pages
+3. **Observe**: Identify and analyze elements on the page
+
+## Prerequisites
+
+Before using this tool, ensure you have:
+
+1. A [Browserbase](https://www.browserbase.com/) account with API key and project ID
+2. An API key for an LLM (OpenAI or Anthropic Claude)
+3. The Stagehand Python SDK installed
+
+Install the required dependency:
+
+```bash
+pip install stagehand-py
+```
+
+## Usage
+
+### Basic Implementation
+
+The StagehandTool can be implemented in two ways:
+
+#### 1. Using Context Manager (Recommended)
+<Tip>
+  The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.
+</Tip>
+
+```python
+from crewai import Agent, Task, Crew
+from crewai_tools import StagehandTool
+from stagehand.schemas import AvailableModel
+
+# Initialize the tool with your API keys using a context manager
+with StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
+) as stagehand_tool:
+    # Create an agent with the tool
+    researcher = Agent(
+        role="Web Researcher",
+        goal="Find and summarize information from websites",
+        backstory="I'm an expert at finding information online.",
+        verbose=True,
+        tools=[stagehand_tool],
+    )
+
+    # Create a task that uses the tool
+    research_task = Task(
+        description="Go to https://www.example.com and tell me what you see on the homepage.",
+        agent=researcher,
+    )
+
+    # Run the crew
+    crew = Crew(
+        agents=[researcher],
+        tasks=[research_task],
+        verbose=True,
+    )
+
+    result = crew.kickoff()
+    print(result)
+```
+
+#### 2. Manual Resource Management
+
+```python
+from crewai import Agent, Task, Crew
+from crewai_tools import StagehandTool
+from stagehand.schemas import AvailableModel
+
+# Initialize the tool with your API keys
+stagehand_tool = StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
+)
+
+try:
+    # Create an agent with the tool
+    researcher = Agent(
+        role="Web Researcher",
+        goal="Find and summarize information from websites",
+        backstory="I'm an expert at finding information online.",
+        verbose=True,
+        tools=[stagehand_tool],
+    )
+
+    # Create a task that uses the tool
+    research_task = Task(
+        description="Go to https://www.example.com and tell me what you see on the homepage.",
+        agent=researcher,
+    )
+
+    # Run the crew
+    crew = Crew(
+        agents=[researcher],
+        tasks=[research_task],
+        verbose=True,
+    )
+
+    result = crew.kickoff()
+    print(result)
+finally:
+    # Explicitly clean up resources
+    stagehand_tool.close()
+```
+
+## Command Types
+
+The StagehandTool supports three different command types for specific web automation tasks:
+
+### 1. Act Command
+
+The `act` command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.
+
+```python
+# Perform an action (default behavior)
+result = stagehand_tool.run(
+    instruction="Click the login button", 
+    url="https://example.com",
+    command_type="act"  # Default, so can be omitted
+)
+
+# Fill out a form
+result = stagehand_tool.run(
+    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'", 
+    url="https://example.com/contact"
+)
+```
+
+### 2. Extract Command
+
+The `extract` command type retrieves structured data from webpages.
+
+```python
+# Extract all product information
+result = stagehand_tool.run(
+    instruction="Extract all product names, prices, and descriptions", 
+    url="https://example.com/products",
+    command_type="extract"
+)
+
+# Extract specific information with a selector
+result = stagehand_tool.run(
+    instruction="Extract the main article title and content", 
+    url="https://example.com/blog/article",
+    command_type="extract",
+    selector=".article-container"  # Optional CSS selector
+)
+```
+
+### 3. Observe Command
+
+The `observe` command type identifies and analyzes webpage elements.
+
+```python
+# Find interactive elements
+result = stagehand_tool.run(
+    instruction="Find all interactive elements in the navigation menu", 
+    url="https://example.com",
+    command_type="observe"
+)
+
+# Identify form fields
+result = stagehand_tool.run(
+    instruction="Identify all the input fields in the registration form", 
+    url="https://example.com/register",
+    command_type="observe",
+    selector="#registration-form"
+)
+```
+
+## Configuration Options
+
+Customize the StagehandTool behavior with these parameters:
+
+```python
+stagehand_tool = StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
+    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
+    headless=True,  # Run browser in headless mode
+    self_heal=True,  # Attempt to recover from errors
+    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
+    verbose=1,  # Control logging verbosity (0-3)
+)
+```
+
+## Best Practices
+
+1. **Be Specific**: Provide detailed instructions for better results
+2. **Choose Appropriate Command Type**: Select the right command type for your task
+3. **Use Selectors**: Leverage CSS selectors to improve accuracy
+4. **Break Down Complex Tasks**: Split complex workflows into multiple tool calls
+5. **Implement Error Handling**: Add error handling for potential issues
+
+## Troubleshooting
+
+
+Common issues and solutions:
+
+- **Session Issues**: Verify API keys for both Browserbase and LLM provider
+- **Element Not Found**: Increase `dom_settle_timeout_ms` for slower pages
+- **Action Failures**: Use `observe` to identify correct elements first
+- **Incomplete Data**: Refine instructions or provide specific selectors
+
+
+## Additional Resources
+
+For questions about the CrewAI integration:
+- Join Stagehand's [Slack community](https://stagehand.dev/slack)
+- Open an issue in the [Stagehand repository](https://github.com/browserbase/stagehand)
+- Visit [Stagehand documentation](https://docs.stagehand.dev/)