docs: add StagehandTool documentation and improve MDX structure (#2842)

2025-12-16 04:18:35 +00:00 · 2025-05-15 12:24:25 -04:00
parent 49bbf3f234
commit 0b35e40a24
9 changed files with 245 additions and 14 deletions
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -129,6 +129,7 @@
              "tools/seleniumscrapingtool",
              "tools/snowflakesearchtool",
              "tools/spidertool",
+              "tools/stagehandtool",
              "tools/txtsearchtool",
              "tools/visiontool",
              "tools/weaviatevectorsearchtool",
--- a/docs/guides/advanced/customizing-prompts.mdx
+++ b/docs/guides/advanced/customizing-prompts.mdx
@@ -4,8 +4,6 @@ description: Dive deeper into low-level prompt customization for CrewAI, enablin
 icon: message-pen
 ---

-# Customizing Prompts at a Low Level
-
 ## Why Customize Prompts?

 Although CrewAI's default prompts work well for many scenarios, low-level customization opens the door to significantly more flexible and powerful agent behavior. Here’s why you might want to take advantage of this deeper control:
--- a/docs/guides/advanced/fingerprinting.mdx
+++ b/docs/guides/advanced/fingerprinting.mdx
@@ -4,8 +4,6 @@ description: Learn how to use CrewAI's fingerprinting system to uniquely identif
 icon: fingerprint
 ---

-# Fingerprinting in CrewAI
-
 ## Overview

 Fingerprints in CrewAI provide a way to uniquely identify and track components throughout their lifecycle. Each `Agent`, `Crew`, and `Task` automatically receives a unique fingerprint when created, which cannot be manually overridden.
--- a/docs/guides/agents/crafting-effective-agents.mdx
+++ b/docs/guides/agents/crafting-effective-agents.mdx
@@ -4,8 +4,6 @@ description: Learn best practices for designing powerful, specialized AI agents
 icon: robot
 ---

-# Crafting Effective Agents
-
 ## The Art and Science of Agent Design

 At the heart of CrewAI lies the agent - a specialized AI entity designed to perform specific roles within a collaborative framework. While creating basic agents is simple, crafting truly effective agents that produce exceptional results requires understanding key design principles and best practices.
--- a/docs/guides/concepts/evaluating-use-cases.mdx
+++ b/docs/guides/concepts/evaluating-use-cases.mdx
@@ -4,8 +4,6 @@ description: Learn how to assess your AI application needs and choose the right
 icon: scale-balanced
 ---

-# Evaluating Use Cases for CrewAI
-
 ## Understanding the Decision Framework

 When building AI applications with CrewAI, one of the most important decisions you'll make is choosing the right approach for your specific use case. Should you use a Crew? A Flow? A combination of both? This guide will help you evaluate your requirements and make informed architectural decisions.
--- a/docs/guides/crews/first-crew.mdx
+++ b/docs/guides/crews/first-crew.mdx
@@ -4,8 +4,6 @@ description: Step-by-step tutorial to create a collaborative AI team that works
 icon: users-gear
 ---

-# Build Your First Crew
-
 ## Unleashing the Power of Collaborative AI

 Imagine having a team of specialized AI agents working together seamlessly to solve complex problems, each contributing their unique skills to achieve a common goal. This is the power of CrewAI - a framework that enables you to create collaborative AI systems that can accomplish tasks far beyond what a single AI could achieve alone.
--- a/docs/guides/flows/first-flow.mdx
+++ b/docs/guides/flows/first-flow.mdx
@@ -4,8 +4,6 @@ description: Learn how to create structured, event-driven workflows with precise
 icon: diagram-project
 ---

-# Build Your First Flow
-
 ## Taking Control of AI Workflows with Flows

 CrewAI Flows represent the next level in AI orchestration - combining the collaborative power of AI agent crews with the precision and flexibility of procedural programming. While crews excel at agent collaboration, flows give you fine-grained control over exactly how and when different components of your AI system interact.
--- a/docs/guides/flows/mastering-flow-state.mdx
+++ b/docs/guides/flows/mastering-flow-state.mdx
@@ -4,8 +4,6 @@ description: A comprehensive guide to managing, persisting, and leveraging state
 icon: diagram-project
 ---

-# Mastering Flow State Management
-
 ## Understanding the Power of State in Flows

 State management is the backbone of any sophisticated AI workflow. In CrewAI Flows, the state system allows you to maintain context, share data between steps, and build complex application logic. Mastering state management is essential for creating reliable, maintainable, and powerful AI applications.
--- a/docs/tools/stagehandtool.mdx
+++ b/docs/tools/stagehandtool.mdx
@@ -0,0 +1,244 @@
+---
+title: Stagehand Tool
+description: Web automation tool that integrates Stagehand with CrewAI for browser interaction and automation
+icon: hand
+---
+
+
+# Overview
+
+The `StagehandTool` integrates the [Stagehand](https://docs.stagehand.dev/get_started/introduction) framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.
+
+## Overview
+
+Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:
+
+- Navigate to websites
+- Click buttons, links, and other elements
+- Fill in forms
+- Extract data from web pages
+- Observe and identify elements
+- Perform complex workflows
+
+The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:
+
+1. **Act**: Perform actions like clicking, typing, or navigating
+2. **Extract**: Extract structured data from web pages
+3. **Observe**: Identify and analyze elements on the page
+
+## Prerequisites
+
+Before using this tool, ensure you have:
+
+1. A [Browserbase](https://www.browserbase.com/) account with API key and project ID
+2. An API key for an LLM (OpenAI or Anthropic Claude)
+3. The Stagehand Python SDK installed
+
+Install the required dependency:
+
+```bash
+pip install stagehand-py
+```
+
+## Usage
+
+### Basic Implementation
+
+The StagehandTool can be implemented in two ways:
+
+#### 1. Using Context Manager (Recommended)
+<Tip>
+  The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.
+</Tip>
+
+```python
+from crewai import Agent, Task, Crew
+from crewai_tools import StagehandTool
+from stagehand.schemas import AvailableModel
+
+# Initialize the tool with your API keys using a context manager
+with StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
+) as stagehand_tool:
+    # Create an agent with the tool
+    researcher = Agent(
+        role="Web Researcher",
+        goal="Find and summarize information from websites",
+        backstory="I'm an expert at finding information online.",
+        verbose=True,
+        tools=[stagehand_tool],
+    )
+
+    # Create a task that uses the tool
+    research_task = Task(
+        description="Go to https://www.example.com and tell me what you see on the homepage.",
+        agent=researcher,
+    )
+
+    # Run the crew
+    crew = Crew(
+        agents=[researcher],
+        tasks=[research_task],
+        verbose=True,
+    )
+
+    result = crew.kickoff()
+    print(result)
+```
+
+#### 2. Manual Resource Management
+
+```python
+from crewai import Agent, Task, Crew
+from crewai_tools import StagehandTool
+from stagehand.schemas import AvailableModel
+
+# Initialize the tool with your API keys
+stagehand_tool = StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
+)
+
+try:
+    # Create an agent with the tool
+    researcher = Agent(
+        role="Web Researcher",
+        goal="Find and summarize information from websites",
+        backstory="I'm an expert at finding information online.",
+        verbose=True,
+        tools=[stagehand_tool],
+    )
+
+    # Create a task that uses the tool
+    research_task = Task(
+        description="Go to https://www.example.com and tell me what you see on the homepage.",
+        agent=researcher,
+    )
+
+    # Run the crew
+    crew = Crew(
+        agents=[researcher],
+        tasks=[research_task],
+        verbose=True,
+    )
+
+    result = crew.kickoff()
+    print(result)
+finally:
+    # Explicitly clean up resources
+    stagehand_tool.close()
+```
+
+## Command Types
+
+The StagehandTool supports three different command types for specific web automation tasks:
+
+### 1. Act Command
+
+The `act` command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.
+
+```python
+# Perform an action (default behavior)
+result = stagehand_tool.run(
+    instruction="Click the login button", 
+    url="https://example.com",
+    command_type="act"  # Default, so can be omitted
+)
+
+# Fill out a form
+result = stagehand_tool.run(
+    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'", 
+    url="https://example.com/contact"
+)
+```
+
+### 2. Extract Command
+
+The `extract` command type retrieves structured data from webpages.
+
+```python
+# Extract all product information
+result = stagehand_tool.run(
+    instruction="Extract all product names, prices, and descriptions", 
+    url="https://example.com/products",
+    command_type="extract"
+)
+
+# Extract specific information with a selector
+result = stagehand_tool.run(
+    instruction="Extract the main article title and content", 
+    url="https://example.com/blog/article",
+    command_type="extract",
+    selector=".article-container"  # Optional CSS selector
+)
+```
+
+### 3. Observe Command
+
+The `observe` command type identifies and analyzes webpage elements.
+
+```python
+# Find interactive elements
+result = stagehand_tool.run(
+    instruction="Find all interactive elements in the navigation menu", 
+    url="https://example.com",
+    command_type="observe"
+)
+
+# Identify form fields
+result = stagehand_tool.run(
+    instruction="Identify all the input fields in the registration form", 
+    url="https://example.com/register",
+    command_type="observe",
+    selector="#registration-form"
+)
+```
+
+## Configuration Options
+
+Customize the StagehandTool behavior with these parameters:
+
+```python
+stagehand_tool = StagehandTool(
+    api_key="your-browserbase-api-key",
+    project_id="your-browserbase-project-id",
+    model_api_key="your-llm-api-key",
+    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
+    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
+    headless=True,  # Run browser in headless mode
+    self_heal=True,  # Attempt to recover from errors
+    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
+    verbose=1,  # Control logging verbosity (0-3)
+)
+```
+
+## Best Practices
+
+1. **Be Specific**: Provide detailed instructions for better results
+2. **Choose Appropriate Command Type**: Select the right command type for your task
+3. **Use Selectors**: Leverage CSS selectors to improve accuracy
+4. **Break Down Complex Tasks**: Split complex workflows into multiple tool calls
+5. **Implement Error Handling**: Add error handling for potential issues
+
+## Troubleshooting
+
+
+Common issues and solutions:
+
+- **Session Issues**: Verify API keys for both Browserbase and LLM provider
+- **Element Not Found**: Increase `dom_settle_timeout_ms` for slower pages
+- **Action Failures**: Use `observe` to identify correct elements first
+- **Incomplete Data**: Refine instructions or provide specific selectors
+
+
+## Additional Resources
+
+For questions about the CrewAI integration:
+- Join Stagehand's [Slack community](https://stagehand.dev/slack)
+- Open an issue in the [Stagehand repository](https://github.com/browserbase/stagehand)
+- Visit [Stagehand documentation](https://docs.stagehand.dev/)