docs: add StagehandTool documentation and improve MDX structure (#2842)

2025-12-16 04:18:35 +00:00 · 2025-05-15 12:24:25 -04:00
parent 49bbf3f234
commit 0b35e40a24
9 changed files with 245 additions and 14 deletions
--- a/docs/docs.json
+++ b/docs/docs.json
@@ -129,6 +129,7 @@
              "tools/seleniumscrapingtool",
              "tools/snowflakesearchtool",
              "tools/spidertool",
              "tools/stagehandtool",
              "tools/txtsearchtool",
              "tools/visiontool",
              "tools/weaviatevectorsearchtool",
--- a/docs/guides/advanced/customizing-prompts.mdx
+++ b/docs/guides/advanced/customizing-prompts.mdx
@@ -4,8 +4,6 @@ description: Dive deeper into low-level prompt customization for CrewAI, enablin
 icon: message-pen
 ---
 # Customizing Prompts at a Low Level
 ## Why Customize Prompts?
 Although CrewAI's default prompts work well for many scenarios, low-level customization opens the door to significantly more flexible and powerful agent behavior. Here’s why you might want to take advantage of this deeper control:
--- a/docs/guides/advanced/fingerprinting.mdx
+++ b/docs/guides/advanced/fingerprinting.mdx
@@ -4,8 +4,6 @@ description: Learn how to use CrewAI's fingerprinting system to uniquely identif
 icon: fingerprint
 ---
 # Fingerprinting in CrewAI
 ## Overview
 Fingerprints in CrewAI provide a way to uniquely identify and track components throughout their lifecycle. Each `Agent`, `Crew`, and `Task` automatically receives a unique fingerprint when created, which cannot be manually overridden.
--- a/docs/guides/agents/crafting-effective-agents.mdx
+++ b/docs/guides/agents/crafting-effective-agents.mdx
@@ -4,8 +4,6 @@ description: Learn best practices for designing powerful, specialized AI agents
 icon: robot
 ---
 # Crafting Effective Agents
 ## The Art and Science of Agent Design
 At the heart of CrewAI lies the agent - a specialized AI entity designed to perform specific roles within a collaborative framework. While creating basic agents is simple, crafting truly effective agents that produce exceptional results requires understanding key design principles and best practices.
--- a/docs/guides/concepts/evaluating-use-cases.mdx
+++ b/docs/guides/concepts/evaluating-use-cases.mdx
@@ -4,8 +4,6 @@ description: Learn how to assess your AI application needs and choose the right
 icon: scale-balanced
 ---
 # Evaluating Use Cases for CrewAI
 ## Understanding the Decision Framework
 When building AI applications with CrewAI, one of the most important decisions you'll make is choosing the right approach for your specific use case. Should you use a Crew? A Flow? A combination of both? This guide will help you evaluate your requirements and make informed architectural decisions.
--- a/docs/guides/crews/first-crew.mdx
+++ b/docs/guides/crews/first-crew.mdx
@@ -4,8 +4,6 @@ description: Step-by-step tutorial to create a collaborative AI team that works
 icon: users-gear
 ---
 # Build Your First Crew
 ## Unleashing the Power of Collaborative AI
 Imagine having a team of specialized AI agents working together seamlessly to solve complex problems, each contributing their unique skills to achieve a common goal. This is the power of CrewAI - a framework that enables you to create collaborative AI systems that can accomplish tasks far beyond what a single AI could achieve alone.
--- a/docs/guides/flows/first-flow.mdx
+++ b/docs/guides/flows/first-flow.mdx
@@ -4,8 +4,6 @@ description: Learn how to create structured, event-driven workflows with precise
 icon: diagram-project
 ---
 # Build Your First Flow
 ## Taking Control of AI Workflows with Flows
 CrewAI Flows represent the next level in AI orchestration - combining the collaborative power of AI agent crews with the precision and flexibility of procedural programming. While crews excel at agent collaboration, flows give you fine-grained control over exactly how and when different components of your AI system interact.
--- a/docs/guides/flows/mastering-flow-state.mdx
+++ b/docs/guides/flows/mastering-flow-state.mdx
@@ -4,8 +4,6 @@ description: A comprehensive guide to managing, persisting, and leveraging state
 icon: diagram-project
 ---
 # Mastering Flow State Management
 ## Understanding the Power of State in Flows
 State management is the backbone of any sophisticated AI workflow. In CrewAI Flows, the state system allows you to maintain context, share data between steps, and build complex application logic. Mastering state management is essential for creating reliable, maintainable, and powerful AI applications.
--- a/docs/tools/stagehandtool.mdx
+++ b/docs/tools/stagehandtool.mdx
@@ -0,0 +1,244 @@
 ---
 title: Stagehand Tool
 description: Web automation tool that integrates Stagehand with CrewAI for browser interaction and automation
 icon: hand
 ---
 # Overview
 The `StagehandTool` integrates the [Stagehand](https://docs.stagehand.dev/get_started/introduction) framework with CrewAI, enabling agents to interact with websites and automate browser tasks using natural language instructions.
 ## About Stagehand
 Stagehand is a powerful browser automation framework built by Browserbase that allows AI agents to:
 - Navigate to websites
 - Click buttons, links, and other elements
 - Fill in forms
 - Extract data from web pages
 - Observe and identify elements
 - Perform complex workflows
 The StagehandTool wraps the Stagehand Python SDK to provide CrewAI agents with browser control capabilities through three core primitives:
 1. **Act**: Perform actions like clicking, typing, or navigating
 2. **Extract**: Extract structured data from web pages
 3. **Observe**: Identify and analyze elements on the page
 ## Prerequisites
 Before using this tool, ensure you have:
 1. A [Browserbase](https://www.browserbase.com/) account with API key and project ID
 2. An API key for an LLM (OpenAI or Anthropic Claude)
 3. The Stagehand Python SDK installed
 Install the required dependency:
 ```bash
 pip install stagehand-py
 ```
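The prerequisites above amount to three credentials. A minimal sketch of loading them from environment variables rather than hard-coding them follows; the variable names and the `load_stagehand_credentials` helper are illustrative assumptions, not part of the Stagehand or CrewAI API.

```python
import os

# Hypothetical variable names; use whatever your deployment defines.
REQUIRED_VARS = ("BROWSERBASE_API_KEY", "BROWSERBASE_PROJECT_ID", "MODEL_API_KEY")


def load_stagehand_credentials(env=None):
    """Return constructor keyword arguments, or raise if a variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "api_key": env["BROWSERBASE_API_KEY"],
        "project_id": env["BROWSERBASE_PROJECT_ID"],
        "model_api_key": env["MODEL_API_KEY"],
    }
```

The returned dictionary can be unpacked into the constructor, for example `StagehandTool(**load_stagehand_credentials())`.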
 ## Usage
 ### Basic Implementation
 You can manage the `StagehandTool` lifecycle in two ways:
 #### 1. Using Context Manager (Recommended)
 <Tip>
  The context manager approach is recommended as it ensures proper cleanup of resources even if exceptions occur.
 </Tip>
 ```python
 from crewai import Agent, Task, Crew
 from crewai_tools import StagehandTool
 from stagehand.schemas import AvailableModel
 # Initialize the tool with your API keys using a context manager
 with StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",  # OpenAI or Anthropic API key
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,  # Optional: specify which model to use
 ) as stagehand_tool:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )
    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        agent=researcher,
    )
    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )
    result = crew.kickoff()
    print(result)
 ```
 #### 2. Manual Resource Management
 ```python
 from crewai import Agent, Task, Crew
 from crewai_tools import StagehandTool
 from stagehand.schemas import AvailableModel
 # Initialize the tool with your API keys
 stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
 )
 try:
    # Create an agent with the tool
    researcher = Agent(
        role="Web Researcher",
        goal="Find and summarize information from websites",
        backstory="I'm an expert at finding information online.",
        verbose=True,
        tools=[stagehand_tool],
    )
    # Create a task that uses the tool
    research_task = Task(
        description="Go to https://www.example.com and tell me what you see on the homepage.",
        agent=researcher,
    )
    # Run the crew
    crew = Crew(
        agents=[researcher],
        tasks=[research_task],
        verbose=True,
    )
    result = crew.kickoff()
    print(result)
 finally:
    # Explicitly clean up resources
    stagehand_tool.close()
 ```
 ## Command Types
 The StagehandTool supports three different command types for specific web automation tasks:
 ### 1. Act Command
 The `act` command type (default) enables webpage interactions like clicking buttons, filling forms, and navigation.
 ```python
 # Assumes `stagehand_tool` was created as shown in the Usage section
 # Perform an action (default behavior)
 result = stagehand_tool.run(
    instruction="Click the login button", 
    url="https://example.com",
    command_type="act"  # Default, so can be omitted
 )
 # Fill out a form
 result = stagehand_tool.run(
    instruction="Fill the contact form with name 'John Doe', email 'john@example.com', and message 'Hello world'", 
    url="https://example.com/contact"
 )
 ```
 ### 2. Extract Command
 The `extract` command type retrieves structured data from webpages.
 ```python
 # Extract all product information
 result = stagehand_tool.run(
    instruction="Extract all product names, prices, and descriptions", 
    url="https://example.com/products",
    command_type="extract"
 )
 # Extract specific information with a selector
 result = stagehand_tool.run(
    instruction="Extract the main article title and content", 
    url="https://example.com/blog/article",
    command_type="extract",
    selector=".article-container"  # Optional CSS selector
 )
 ```
 ### 3. Observe Command
 The `observe` command type identifies and analyzes webpage elements.
 ```python
 # Find interactive elements
 result = stagehand_tool.run(
    instruction="Find all interactive elements in the navigation menu", 
    url="https://example.com",
    command_type="observe"
 )
 # Identify form fields
 result = stagehand_tool.run(
    instruction="Identify all the input fields in the registration form", 
    url="https://example.com/register",
    command_type="observe",
    selector="#registration-form"
 )
 ```
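The three command types share the same call shape, so the arguments can be assembled and checked before they reach the tool. The sketch below is one possible way to do that; `build_command` and `COMMAND_TYPES` are hypothetical helpers, and the keyword names mirror the `run` calls shown in this section.

```python
from typing import Any, Dict, Optional

# The three command types described in this section.
COMMAND_TYPES = ("act", "extract", "observe")


def build_command(
    instruction: str,
    url: str,
    command_type: str = "act",
    selector: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for `run`, rejecting unknown command types."""
    if command_type not in COMMAND_TYPES:
        raise ValueError(
            f"command_type must be one of {COMMAND_TYPES}, got {command_type!r}"
        )
    kwargs: Dict[str, Any] = {
        "instruction": instruction,
        "url": url,
        "command_type": command_type,
    }
    # Only pass a selector when one was given.
    if selector is not None:
        kwargs["selector"] = selector
    return kwargs
```

A call such as `stagehand_tool.run(**build_command("Extract the main article title", "https://example.com/blog/article", "extract"))` then fails early on a mistyped command type instead of after a browser session has started.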
 ## Configuration Options
 Customize the StagehandTool behavior with these parameters:
 ```python
 from crewai_tools import StagehandTool
 from stagehand.schemas import AvailableModel
 stagehand_tool = StagehandTool(
    api_key="your-browserbase-api-key",
    project_id="your-browserbase-project-id",
    model_api_key="your-llm-api-key",
    model_name=AvailableModel.CLAUDE_3_7_SONNET_LATEST,
    dom_settle_timeout_ms=5000,  # Wait longer for DOM to settle
    headless=True,  # Run browser in headless mode
    self_heal=True,  # Attempt to recover from errors
    wait_for_captcha_solves=True,  # Wait for CAPTCHA solving
    verbose=1,  # Control logging verbosity (0-3)
 )
 ```
 ## Best Practices
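 1. **Be Specific**: Name the target element and the intended outcome in each instruction, for example "Click the login button" rather than "log in"
 2. **Choose Appropriate Command Type**: Use `act` for interactions, `extract` for retrieving data, and `observe` for locating elements
 3. **Use Selectors**: Pass a CSS `selector` to restrict `extract` and `observe` to the relevant part of the page
 4. **Break Down Complex Tasks**: Split complex workflows into multiple tool calls, one instruction per call
 5. **Implement Error Handling**: Wrap tool calls in error handling so that a failed step is reported and the browser session is still closed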
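Practices 4 and 5 can be combined in a small driver. The following is a minimal sketch, not part of the tool itself: `run_steps` is a hypothetical helper that takes any callable with the same keyword interface as `stagehand_tool.run` and retries a failed step a fixed number of times.

```python
from typing import Any, Callable, Dict, List


def run_steps(
    run: Callable[..., Any],
    steps: List[Dict[str, Any]],
    retries: int = 1,
) -> List[Any]:
    """Run each step in order, retrying a failed step up to `retries` times."""
    results: List[Any] = []
    for step in steps:
        for attempt in range(retries + 1):
            try:
                results.append(run(**step))
                break  # Step succeeded; move on to the next one.
            except Exception:
                if attempt == retries:
                    raise  # Out of retries; let the caller handle it.
    return results
```

With a configured tool, a call might look like `run_steps(stagehand_tool.run, [{"instruction": "Click the login button", "url": "https://example.com"}])`, placed inside the context manager so cleanup still happens if the final retry fails.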
 ## Troubleshooting
 Common issues and solutions:
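 - **Session Issues**: Verify the Browserbase API key and project ID, and the API key for your LLM provider
 - **Element Not Found**: Increase `dom_settle_timeout_ms` so slower pages have time to finish loading
 - **Action Failures**: Run an `observe` command first to confirm the target element is found, then issue the `act` command
 - **Incomplete Data**: Refine the instruction or pass a `selector` that scopes extraction to the relevant container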
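The "Action Failures" advice can be sketched as an observe-then-act wrapper. This is an illustration under assumptions: `observe_then_act` is a hypothetical helper, `run` stands in for `stagehand_tool.run`, and treating a falsy `observe` result as "nothing found" is a guess about the return value rather than documented behavior.

```python
from typing import Any, Callable


def observe_then_act(
    run: Callable[..., Any],
    url: str,
    target: str,
    action: str,
) -> Any:
    """Confirm `target` is found with `observe` before sending the `act` instruction."""
    observed = run(instruction=f"Find {target}", url=url, command_type="observe")
    # Assumption: a falsy result (None, "", []) means nothing was found.
    if not observed:
        raise LookupError(f"observe found nothing for {target!r} at {url}")
    return run(instruction=action, url=url, command_type="act")
```

For example, `observe_then_act(stagehand_tool.run, "https://example.com", "the login button", "Click the login button")` raises before attempting the click if the element cannot be located.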
 ## Additional Resources
 For questions about the CrewAI integration:
 - Join Stagehand's [Slack community](https://stagehand.dev/slack)
 - Open an issue in the [Stagehand repository](https://github.com/browserbase/stagehand)
 - Visit [Stagehand documentation](https://docs.stagehand.dev/)