Stagehand: Playwright AI Automation That Adapts to Page Changes

I had a suite of Playwright scripts for web scraping and testing. They worked well until sites changed their layouts; then selectors broke and the scripts failed. Stagehand promised AI-powered automation that adapts to such changes, but in practice the AI was too slow and made mistakes. Here's how I got to robust, self-healing automation.

AI-Based Action Selection Was Too Slow for Production Workflows

Problem

Every action required an LLM call, which took 2-5 seconds. A workflow with 20 actions would take 40-100 seconds just for decision-making, plus execution time. This was unacceptable for production automation that needed to run frequently.

Performance: Average action latency: 3.2s (target: < 500ms)
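
The arithmetic behind those numbers is simple enough to keep as a quick budget check. This is a minimal sketch using only the figures quoted above; the function name is my own and nothing here calls Stagehand.

```typescript
// Estimate the time a workflow spends purely on LLM decisions.
// Hypothetical helper: inputs are the latency figures quoted in this section.
function decisionTimeRangeSeconds(
  actionCount: number,
  minLatencySeconds: number,
  maxLatencySeconds: number,
): { min: number; max: number } {
  return {
    min: actionCount * minLatencySeconds,
    max: actionCount * maxLatencySeconds,
  };
}

// 20 actions at 2-5 seconds per LLM call.
const range = decisionTimeRangeSeconds(20, 2, 5);
console.log(range.min, range.max); // prints 40 100

// How far the measured average (3.2s) is over the 500ms target.
const overBudget = 3.2 / 0.5;
console.log(overBudget.toFixed(1)); // prints 6.4
```

In other words, the measured average was more than six times over budget before any optimization.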

What I Tried

Attempt 1: Switched to a faster model (GPT-3.5). Latency dropped to ~1s, but accuracy dropped significantly: the AI clicked the wrong elements.
Attempt 2: Cached LLM responses for repeated actions. This helped, but cache misses were still slow.
Attempt 3: Used a hybrid approach with traditional selectors for stable elements. This required manual maintenance and defeated the purpose.
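
For reference, the caching idea from Attempt 2 looks roughly like the sketch below. It is not Stagehand's cache: `TtlCache`, the key format (page URL plus instruction) and the injectable clock are all assumptions made for illustration.

```typescript
// Minimal size-bounded cache with per-entry expiry (hypothetical, not a Stagehand API).
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private maxSize: number;
  private ttlMs: number;
  private now: () => number;

  constructor(maxSize: number, ttlMs: number, now: () => number = Date.now) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: treat as a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    // Delete first so a re-inserted key moves to the end of the insertion order.
    this.entries.delete(key);
    if (this.entries.size >= this.maxSize) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Usage with a fake clock so expiry is easy to see.
let clock = 0;
const cache = new TtlCache<string>(1000, 3_600_000, () => clock);
const key = "https://example.com|click submit button";
cache.set(key, "[data-testid=submit]");
console.log(cache.get(key)); // prints [data-testid=submit]
clock = 3_600_000; // one hour later
console.log(cache.get(key)); // prints undefined
```

A cache like this only helps on repeats, which is why misses stayed as slow as an uncached LLM call.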

Actual Fix

Implemented a multi-tier selector strategy. Stagehand now tries traditional selectors first (fastest), falls back to semantic search (slower), and calls the LLM only when the cheaper tiers are not confident enough. Combined with action batching and parallel inference, this reduced average latency to ~400ms.

// Multi-tier selector strategy for performance
import { Stagehand } from "@browserbase/stagehand";

const stagehand = new Stagehand({
  // Tier 1: Traditional selectors (fastest)
  useTraditionalSelectors: true,
  traditionalSelectorPriority: ["data-testid", "id", "aria-label"],

  // Tier 2: Semantic search (fast)
  useSemanticSearch: true,
  semanticSearchIndex: "builtin", // Use pre-built index

  // Tier 3: LLM decision (slowest, most accurate)
  useLLM: true,
  llmModel: "gpt-4o-mini", // Faster model for simple decisions
  llmModelComplex: "gpt-4o", // Use for complex decisions only

  // Performance optimizations
  selectorCache: true,
  cacheSize: 1000,
  cacheTTL: 3600000, // 1 hour

  // Action batching
  batchActions: true,
  batchSize: 5, // Process up to 5 actions in parallel

  // Decision threshold
  confidenceThreshold: 0.8, // Use LLM if confidence < 80%
});

// Now Stagehand automatically:
// 1. Tries traditional selectors first (~10ms)
// 2. Falls back to semantic search (~100ms)
// 3. Uses LLM only if needed (~500ms)
// Average: ~400ms per action
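
The fallback logic itself can be sketched independently of any library. The version below is an illustration under my own assumptions: `Tier`, `Candidate` and `resolveTiered` are hypothetical names, and the tiers are synchronous stubs for brevity where real ones would be asynchronous.

```typescript
// A candidate element plus how confident the tier is in it (0..1).
interface Candidate {
  selector: string;
  confidence: number;
}

// One resolution tier; returns null when it has no answer at all.
interface Tier {
  name: string;
  resolve(instruction: string): Candidate | null;
}

// Try tiers in order (cheapest first) and stop at the first confident answer.
function resolveTiered(
  instruction: string,
  tiers: Tier[],
  threshold: number,
): (Candidate & { tier: string }) | null {
  let best: (Candidate & { tier: string }) | null = null;
  for (const tier of tiers) {
    const candidate = tier.resolve(instruction);
    if (!candidate) continue;
    const tagged = { ...candidate, tier: tier.name };
    if (candidate.confidence >= threshold) return tagged; // skip slower tiers
    if (!best || candidate.confidence > best.confidence) best = tagged;
  }
  // Nothing cleared the threshold: return the best guess seen, or null.
  return best;
}

// Stub tiers standing in for selector lookup, semantic search and an LLM call.
const tiers: Tier[] = [
  { name: "traditional", resolve: () => null },
  { name: "semantic", resolve: () => ({ selector: "button.primary", confidence: 0.6 }) },
  { name: "llm", resolve: () => ({ selector: "form button[type=submit]", confidence: 0.95 }) },
];

const picked = resolveTiered("click the submit button", tiers, 0.8);
console.log(picked?.tier); // prints llm
```

Here the semantic guess (0.6) falls below the 0.8 threshold, so the call falls through to the last tier; with a confident earlier tier, the LLM stub would never run.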

AI Selected Wrong Elements Despite Clear Instructions

Problem

The AI would misinterpret instructions and click wrong elements. For example, "click the submit button" would click a different button on the page, or "fill in the email field" would type into the wrong input.

What I Tried

Attempt 1: Made instructions more verbose and specific. This helped slightly but made prompts unwieldy.
Attempt 2: Provided negative examples (what NOT to click). The AI would sometimes still choose wrong elements.

Actual Fix

Enabled Stagehand's element validation with visual confirmation. The AI now proposes an element, validates it against the page context, takes a screenshot, and confirms the match before acting. This reduced wrong clicks from 15% to under 2%.

// Element validation with visual confirmation
import { Stagehand } from "@browserbase/stagehand";

const stagehand = new Stagehand({
  // Validation settings
  validateBeforeAction: true,
  validationChecks: {
    isVisible: true,
    isClickable: true,
    isInViewport: true,
    hasExpectedText: true,
    matchesContext: true
  },

  // Visual confirmation
  screenshotBeforeAction: true,
  visualConfirmation: true,
  confirmThreshold: 0.9, // 90% confidence required

  // Context awareness
  usePageContext: true,
  contextRadius: 3, // Consider 3 siblings for context
  analyzeParentStructure: true,

  // Fallback on low confidence
  retryOnLowConfidence: true,
  maxRetries: 3,
  useStrongerModelOnRetry: true
});

// Execution flow:
// 1. AI proposes element
// 2. Validate element properties
// 3. Analyze page context
// 4. Take screenshot for visual confirmation
// 5. Act only if confidence > 90%
// 6. Otherwise retry with stronger model
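
To make step 2 concrete, here is a rough sketch of what property validation can look like. It works on a plain snapshot object so it runs without a browser; the `ElementSnapshot` shape and `failedChecks` function are hypothetical, and in a real script the fields could be filled from Playwright's `locator.isVisible()`, `locator.isEnabled()`, `locator.boundingBox()` and `locator.textContent()`.

```typescript
interface Box { x: number; y: number; width: number; height: number }

// Hypothetical snapshot of a proposed element, taken just before acting.
interface ElementSnapshot {
  visible: boolean;
  enabled: boolean;
  box: Box | null; // null when the element has no layout box
  text: string;
}

interface Viewport { width: number; height: number }

// Returns the names of the checks that failed; an empty array means "safe to act".
function failedChecks(
  el: ElementSnapshot,
  viewport: Viewport,
  expectedText?: string,
): string[] {
  const failures: string[] = [];
  if (!el.visible) failures.push("isVisible");
  if (!el.enabled) failures.push("isClickable");

  const b = el.box;
  const inViewport =
    b !== null &&
    b.x >= 0 &&
    b.y >= 0 &&
    b.x + b.width <= viewport.width &&
    b.y + b.height <= viewport.height;
  if (!inViewport) failures.push("isInViewport");

  if (
    expectedText !== undefined &&
    !el.text.toLowerCase().includes(expectedText.toLowerCase())
  ) {
    failures.push("hasExpectedText");
  }
  return failures;
}

// A submit button that is rendered but sits below the fold.
const snapshot: ElementSnapshot = {
  visible: true,
  enabled: true,
  box: { x: 100, y: 1200, width: 120, height: 40 },
  text: "Submit",
};
console.log(failedChecks(snapshot, { width: 1920, height: 1080 }, "submit"));
// prints [ 'isInViewport' ]
```

Returning the list of failed checks, rather than a single boolean, makes it easier to decide whether to scroll, wait, or retry with a different candidate.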

Automation Broke on SPAs with Dynamic Content Changes

Problem

On React/Next.js apps with dynamic content, Stagehand would find elements that no longer existed or had moved. The AI didn't account for client-side routing and dynamic rendering.

What I Tried

Attempt 1: Added fixed waits for content to load. This was unreliable and slow.
Attempt 2: Used waitForSelector with flexible timeouts. The AI would still time out on slow pages.

Actual Fix

Configured Stagehand's SPA-aware mode with content stability detection. The framework now waits for the page to stabilize (no DOM changes for 500ms) before making decisions, and handles client-side routing automatically.

// SPA-aware configuration
import { Stagehand } from "@browserbase/stagehand";

const stagehand = new Stagehand({
  // SPA detection
  detectSPA: true,
  spaFrameworks: ["react", "nextjs", "vue", "angular"],

  // Stability detection
  waitForStability: true,
  stabilityThreshold: 500, // 500ms of no DOM changes
  checkInterval: 100, // Check every 100ms

  // Client-side routing
  handleClientRouting: true,
  waitForNavigation: true,
  navigationTimeout: 10000,

  // Dynamic content
  observeMutations: true,
  mutationTimeout: 5000,

  // React-specific optimizations
  reactWaitForHydration: true,
  reactSelectorStrategy: "data-testid-first"
});

// Stagehand now:
// 1. Detects SPA framework
// 2. Waits for hydration/completion
// 3. Monitors DOM mutations
// 4. Confirms page is stable
// 5. Then selects elements
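
The "no DOM changes for 500ms" rule boils down to tracking the time of the last mutation. The sketch below shows that core logic with an injectable clock so it can run anywhere; `StabilityTracker` is a hypothetical name, not part of Stagehand.

```typescript
// Tracks when the DOM last changed and reports whether it has been quiet long enough.
class StabilityTracker {
  private lastMutationAt: number;
  private thresholdMs: number;
  private now: () => number;

  constructor(thresholdMs: number, now: () => number = Date.now) {
    this.thresholdMs = thresholdMs;
    this.now = now;
    this.lastMutationAt = now(); // treat page load as the first "mutation"
  }

  // Call this from whatever reports DOM changes.
  recordMutation(): void {
    this.lastMutationAt = this.now();
  }

  // True once at least thresholdMs have passed without a recorded mutation.
  isStable(): boolean {
    return this.now() - this.lastMutationAt >= this.thresholdMs;
  }
}

// In a browser context this would be wired up roughly as:
//   const observer = new MutationObserver(() => tracker.recordMutation());
//   observer.observe(document.body, { childList: true, subtree: true, attributes: true });
// and isStable() polled every 100ms until it returns true or a timeout expires.

// Simulated timeline with a fake clock (milliseconds).
let fakeNow = 0;
const tracker = new StabilityTracker(500, () => fakeNow);
fakeNow = 300;
tracker.recordMutation(); // late render at t=300
fakeNow = 700;
console.log(tracker.isStable()); // prints false
fakeNow = 800;
console.log(tracker.isStable()); // prints true
```

Unlike a fixed wait, the quiet period restarts on every mutation, so fast pages proceed quickly and slow pages are not cut off mid-render.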

What I Learned

Multi-tier strategy balances speed and accuracy: Don't use LLM for everything. Traditional selectors → semantic search → LLM gives the best tradeoff.
Validation prevents costly mistakes: A 500ms validation check saves minutes of debugging from wrong clicks.
Visual confirmation is worth the overhead: Screenshots add ~200ms per action, but together with validation they cut wrong clicks from 15% to under 2%.
SPA awareness is mandatory: Modern apps require stability detection, not just timeout-based waits.
Caching makes a huge difference: Selector caching reduces LLM calls by 60-80% for repeated workflows.
Confidence thresholds matter: Set them per use case, for example 90% for critical actions and 70% for tolerant workflows.
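
As a small illustration of the last point, thresholds can be keyed by how costly a wrong action is. The names below are my own and the values are the two figures mentioned above; treat them as starting points rather than recommendations.

```typescript
// How much a wrong action would hurt (hypothetical classification).
type Criticality = "critical" | "tolerant";

// Minimum confidence required before acting, per criticality level.
const THRESHOLDS: Record<Criticality, number> = {
  critical: 0.9,
  tolerant: 0.7,
};

// Act only when the proposed element meets the threshold for this kind of step.
function shouldAct(confidence: number, criticality: Criticality): boolean {
  return confidence >= THRESHOLDS[criticality];
}

console.log(shouldAct(0.8, "critical")); // prints false
console.log(shouldAct(0.8, "tolerant")); // prints true
```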

Production Setup

A complete configuration for production-ready Stagehand automation follows.

# Install Stagehand
npm install @browserbase/stagehand playwright

# Install Playwright browsers
npx playwright install

Production configuration:

import { Stagehand } from "@browserbase/stagehand";
import { chromium } from "playwright";
import type { Browser } from "playwright";

class ProductionStagehandAgent {
  private stagehand: Stagehand;
  // Kept so close() can shut down the browser launched in init()
  private browser?: Browser;

  constructor() {
    this.stagehand = new Stagehand({
      // Performance
      selectorStrategy: "hybrid",
      useCache: true,
      batchActions: true,

      // Accuracy
      validateBeforeAction: true,
      visualConfirmation: true,
      confidenceThreshold: 0.9,

      // SPA support
      detectSPA: true,
      waitForStability: true,
      stabilityThreshold: 500,

      // Monitoring
      logLevel: "info",
      telemetry: true,
      screenshotOnFailure: true
    });
  }

  async init() {
    // Launch browser
    this.browser = await chromium.launch({
      headless: true,
    });

    const context = await this.browser.newContext({
      viewport: { width: 1920, height: 1080 },
      userAgent: "StagehandBot/1.0"
    });

    const page = await context.newPage();
    await this.stagehand.init(page);
  }

  async executeWorkflow(workflow: WorkflowStep[]) {
    // One entry per step, recording either the result or the error
    const results: Array<{
      step: WorkflowStep;
      success: boolean;
      result?: unknown;
      error?: unknown;
    }> = [];

    for (const step of workflow) {
      try {
        const result = await this.stagehand.act(step);
        results.push({ step, result, success: true });
      } catch (error) {
        results.push({ step, error, success: false });
        // Abort on critical steps; otherwise continue with the next step
        if (step.critical) throw error;
      }
    }

    return results;
  }

  async close() {
    await this.stagehand.close();
    // Also close the browser from init(), in case stagehand.close() does not
    await this.browser?.close();
  }
}

// Usage
interface WorkflowStep {
  action: string;
  params: Record<string, unknown>;
  critical: boolean;
}

async function main() {
  const agent = new ProductionStagehandAgent();
  await agent.init();

  const workflow: WorkflowStep[] = [
    {
      action: "goto",
      params: { url: "https://example.com" },
      critical: true
    },
    {
      action: "fill",
      params: {
        selector: "email input",
        value: "user@example.com"
      },
      critical: true
    },
    {
      action: "click",
      params: {
        selector: "submit button"
      },
      critical: true
    }
  ];

  try {
    const results = await agent.executeWorkflow(workflow);
    console.log("Workflow results:", results);
  } finally {
    // Always release the browser, even when a critical step throws
    await agent.close();
  }
}

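// Surface failures instead of leaving an unhandled promise rejection
main().catch((error) => {
  console.error(error);
  process.exit(1);
});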

Monitoring & Debugging

Key metrics for Stagehand automation.

Red Flags to Watch For

Action latency > 1s: Something's wrong. Check LLM API or cache configuration.
Wrong element rate > 5%: Confidence threshold too low or validation disabled.
Cache hit rate < 50%: Workflow too dynamic or cache TTL too short.
Stability timeout rate > 10%: Pages never stabilizing. Check threshold or page performance.
LLM call rate > 80%: Not using traditional/semantic selectors effectively.
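
These thresholds are easy to turn into an automated check. The sketch below assumes you already collect the five numbers yourself; the `RunMetrics` shape and `redFlags` function are hypothetical and not something Stagehand provides.

```typescript
// Metrics for one run or one time window; rates are fractions between 0 and 1.
interface RunMetrics {
  avgActionLatencyMs: number;
  wrongElementRate: number;
  cacheHitRate: number;
  stabilityTimeoutRate: number;
  llmCallRate: number;
}

// Returns one message per red flag from the list above.
function redFlags(m: RunMetrics): string[] {
  const flags: string[] = [];
  if (m.avgActionLatencyMs > 1000) flags.push("action latency > 1s");
  if (m.wrongElementRate > 0.05) flags.push("wrong element rate > 5%");
  if (m.cacheHitRate < 0.5) flags.push("cache hit rate < 50%");
  if (m.stabilityTimeoutRate > 0.1) flags.push("stability timeout rate > 10%");
  if (m.llmCallRate > 0.8) flags.push("LLM call rate > 80%");
  return flags;
}

// Example: latency and accuracy look fine, but the cache is barely used.
const flags = redFlags({
  avgActionLatencyMs: 400,
  wrongElementRate: 0.02,
  cacheHitRate: 0.35,
  stabilityTimeoutRate: 0.04,
  llmCallRate: 0.85,
});
console.log(flags); // prints [ 'cache hit rate < 50%', 'LLM call rate > 80%' ]
```

Running a check like this after each scheduled workflow makes regressions visible before they show up as slow or failing runs.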

Debug Commands

# Test selector strategy
npx stagehand test-selectors \
    --url https://example.com \
    --action "click submit button"

# Analyze page for SPA detection
npx stagehand analyze-page \
    --url https://example.com \
    --detect-spa

# Benchmark performance
npx stagehand benchmark \
    --workflow ./test-workflow.json \
    --iterations 10

# View cache statistics and clear entries older than 24 hours
npx stagehand cache-stats \
    --clear-older-than 24h
