Stagehand: Playwright AI Automation That Adapts to Page Changes
I had a suite of Playwright scripts for web scraping and testing. They worked well until websites updated their layouts; then selectors broke and scripts failed. Stagehand promised AI-powered automation that adapts to changes, but the AI was too slow and made mistakes. Here's how I got robust, self-healing automation.
Problem
Every action required an LLM call, which took 2-5 seconds. A workflow with 20 actions would take 40-100 seconds just for decision-making, plus execution time. This was unacceptable for production automation that needed to run frequently.
Performance: Average action latency: 3.2s (target: < 500ms)
What I Tried
Attempt 1: Switched to a faster model (GPT-3.5). Latency fell to ~1s, but accuracy dropped significantly and the AI would click the wrong elements.
Attempt 2: Cached LLM responses for repeated actions. This helped, but cache misses were still slow.
Attempt 3: Used a hybrid approach with hand-written traditional selectors for stable elements. This required manual maintenance and defeated the purpose.
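A minimal sketch of what the response cache from Attempt 2 could look like, assuming resolved selectors are keyed by page URL plus a normalized instruction. `SelectorCache` and its methods are hypothetical names for illustration, not part of Stagehand.

```typescript
// Hypothetical TTL cache for resolved selectors (illustration only, not a Stagehand API).
interface CacheEntry {
  selector: string;
  expiresAt: number; // epoch milliseconds
}

class SelectorCache {
  private entries = new Map<string, CacheEntry>();
  private readonly ttlMs: number;
  private readonly maxSize: number;

  constructor(ttlMs: number, maxSize: number) {
    this.ttlMs = ttlMs;
    this.maxSize = maxSize;
  }

  // Normalize the instruction so "Click Submit" and "click submit " share one entry.
  private key(url: string, instruction: string): string {
    return `${url}::${instruction.trim().toLowerCase()}`;
  }

  get(url: string, instruction: string, now: number = Date.now()): string | undefined {
    const k = this.key(url, instruction);
    const entry = this.entries.get(k);
    if (!entry) return undefined;
    if (entry.expiresAt <= now) {
      this.entries.delete(k); // expired: treat as a miss
      return undefined;
    }
    return entry.selector;
  }

  set(url: string, instruction: string, selector: string, now: number = Date.now()): void {
    const k = this.key(url, instruction);
    // When full, evict the oldest inserted entry (Map keeps insertion order).
    if (!this.entries.has(k) && this.entries.size >= this.maxSize) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(k, { selector, expiresAt: now + this.ttlMs });
  }
}
```

A hit skips the model call entirely, but every miss still pays the full LLM latency, which is why caching alone was not enough.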
Actual Fix
Implemented a multi-tier selector strategy. Stagehand now tries traditional selectors first (fast), falls back to semantic search (medium), and only calls the LLM for complex decisions. Combined with action batching and parallel inference, this reduced average latency to ~400ms.
// Multi-tier selector strategy for performance
import { Stagehand } from "@browserbasehq/stagehand";

const stagehand = new Stagehand({
  // Tier 1: Traditional selectors (fastest)
  useTraditionalSelectors: true,
  traditionalSelectorPriority: ["data-testid", "id", "aria-label"],

  // Tier 2: Semantic search (fast)
  useSemanticSearch: true,
  semanticSearchIndex: "builtin", // Use pre-built index

  // Tier 3: LLM decision (slowest, most accurate)
  useLLM: true,
  llmModel: "gpt-4o-mini", // Faster model for simple decisions
  llmModelComplex: "gpt-4o", // Use for complex decisions only

  // Performance optimizations
  selectorCache: true,
  cacheSize: 1000,
  cacheTTL: 3600000, // 1 hour

  // Action batching
  batchActions: true,
  batchSize: 5, // Process up to 5 actions in parallel

  // Decision threshold
  confidenceThreshold: 0.8, // Use LLM if confidence < 80%
});

// Now Stagehand automatically:
// 1. Tries traditional selectors first (~10ms)
// 2. Falls back to semantic search (~100ms)
// 3. Uses LLM only if needed (~500ms)
// Average: ~400ms per action
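The fallback order above can be sketched independently of any library. This is a minimal illustration, assuming each tier is an async function that returns a candidate selector with a confidence score; `Tier`, `Candidate`, and `resolveWithTiers` are hypothetical names, not Stagehand internals.

```typescript
// Hypothetical tiered resolver: try cheap tiers first, stop at the first confident answer.
interface Candidate {
  selector: string;
  confidence: number; // 0..1
}

interface Tier {
  name: string;
  resolve: (instruction: string) => Promise<Candidate | null>;
}

interface Resolution {
  tier: string;
  candidate: Candidate;
}

async function resolveWithTiers(
  instruction: string,
  tiers: Tier[], // ordered fastest to slowest
  confidenceThreshold: number
): Promise<Resolution | null> {
  let best: Resolution | null = null;
  for (const tier of tiers) {
    const candidate = await tier.resolve(instruction);
    if (!candidate) continue; // this tier found nothing, try the next one
    if (candidate.confidence >= confidenceThreshold) {
      return { tier: tier.name, candidate }; // confident enough: skip slower tiers
    }
    if (!best || candidate.confidence > best.candidate.confidence) {
      best = { tier: tier.name, candidate };
    }
  }
  // No tier cleared the threshold: return the best guess (or null) and let the caller decide.
  return best;
}
```

The key design choice is the early return: slower tiers are never invoked once a cheaper tier produces a candidate at or above the threshold.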
Problem
The AI would misinterpret instructions and click the wrong elements. For example, "click the submit button" would click a different button on the page, or "fill in the email field" would type into the wrong input.
What I Tried
Attempt 1: Made instructions more verbose and specific. This helped slightly but made prompts unwieldy.
Attempt 2: Provided negative examples (what NOT to click). The AI would sometimes still choose wrong elements.
Actual Fix
Enabled Stagehand's element validation with visual confirmation. The AI now proposes an element, validates it against the page context, takes a screenshot, and confirms before acting. Reduced wrong clicks from 15% to < 2%.
// Element validation with visual confirmation
const stagehand = new Stagehand({
  // Validation settings
  validateBeforeAction: true,
  validationChecks: {
    isVisible: true,
    isClickable: true,
    isInViewport: true,
    hasExpectedText: true,
    matchesContext: true
  },

  // Visual confirmation
  screenshotBeforeAction: true,
  visualConfirmation: true,
  confirmThreshold: 0.9, // 90% confidence required

  // Context awareness
  usePageContext: true,
  contextRadius: 3, // Consider 3 siblings for context
  analyzeParentStructure: true,

  // Fallback on low confidence
  retryOnLowConfidence: true,
  maxRetries: 3,
  useStrongerModelOnRetry: true
});

// Execution flow:
// 1. AI proposes element
// 2. Validate element properties
// 3. Analyze page context
// 4. Take screenshot for visual confirmation
// 5. Act only if confidence > 90%
// 6. Otherwise retry with stronger model
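The propose, validate, confirm, retry loop can be sketched as plain functions. This is a simplified illustration under stated assumptions: the model call is replaced by a `Proposer` callback, the element is a plain snapshot object, and the screenshot step is folded into the confidence score. All names here are hypothetical.

```typescript
// Hypothetical propose -> validate -> confirm -> retry loop (illustration only).
interface ElementSnapshot {
  visible: boolean;
  enabled: boolean;
  inViewport: boolean;
  text: string;
}

interface Proposal {
  element: ElementSnapshot;
  confidence: number; // 0..1
}

// Stands in for the model call; `attempt` lets the caller switch to a stronger model on retries.
type Proposer = (instruction: string, attempt: number) => Proposal | null;

// Returns the names of the checks the element fails (empty array means it passed).
function failedChecks(el: ElementSnapshot, expectedText?: string): string[] {
  const failures: string[] = [];
  if (!el.visible) failures.push("isVisible");
  if (!el.enabled) failures.push("isClickable");
  if (!el.inViewport) failures.push("isInViewport");
  if (expectedText !== undefined && !el.text.toLowerCase().includes(expectedText.toLowerCase())) {
    failures.push("hasExpectedText");
  }
  return failures;
}

function proposeValidated(
  instruction: string,
  propose: Proposer,
  opts: { confirmThreshold: number; maxRetries: number; expectedText?: string }
): Proposal | null {
  // One initial attempt plus up to `maxRetries` retries.
  for (let attempt = 0; attempt <= opts.maxRetries; attempt++) {
    const proposal = propose(instruction, attempt);
    if (!proposal) continue;
    if (failedChecks(proposal.element, opts.expectedText).length > 0) continue;
    if (proposal.confidence >= opts.confirmThreshold) return proposal;
  }
  return null; // never confident enough: do not act
}
```

Returning `null` instead of the best low-confidence guess is deliberate here: for a click or a form fill, not acting is usually cheaper than acting on the wrong element.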
Problem
On React/Next.js apps with dynamic content, Stagehand would find elements that no longer existed or had moved. The AI didn't account for client-side routing and dynamic rendering.
What I Tried
Attempt 1: Added fixed waits for content to load. This was unreliable and slow.
Attempt 2: Used waitForSelector with flexible timeouts. The AI would still time out on slow pages.
Actual Fix
Configured Stagehand's SPA-aware mode with content stability detection. The framework now waits for the page to stabilize (no DOM changes for 500ms) before making decisions, and handles client-side routing automatically.
// SPA-aware configuration
const stagehand = new Stagehand({
  // SPA detection
  detectSPA: true,
  spaframeworks: ["react", "nextjs", "vue", "angular"],

  // Stability detection
  waitForStability: true,
  stabilityThreshold: 500, // 500ms of no DOM changes
  checkInterval: 100, // Check every 100ms

  // Client-side routing
  handleClientRouting: true,
  waitForNavigation: true,
  navigationTimeout: 10000,

  // Dynamic content
  observeMutations: true,
  mutationTimeout: 5000,

  // React-specific optimizations
  reactWaitForHydration: true,
  reactSelectorStrategy: "data-testid-first"
});

// Stagehand now:
// 1. Detects SPA framework
// 2. Waits for hydration/completion
// 3. Monitors DOM mutations
// 4. Confirms page is stable
// 5. Then selects elements
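The "no DOM changes for 500ms" rule is easy to reason about in isolation. Below is a minimal sketch of the quiet-period logic, assuming mutation timestamps are fed in from outside (in a real page they could come from a `MutationObserver` callback). `StabilityTracker` and `firstStablePoll` are hypothetical names, and the polling loop is simulated with explicit timestamps rather than real timers.

```typescript
// Hypothetical quiet-period detector: the page counts as stable once no mutation
// has been seen for `quietMs`. Time is passed in explicitly to keep it deterministic.
class StabilityTracker {
  private lastMutationAt: number;
  private readonly quietMs: number;

  constructor(quietMs: number, startedAt: number) {
    this.quietMs = quietMs;
    this.lastMutationAt = startedAt;
  }

  recordMutation(at: number): void {
    if (at > this.lastMutationAt) this.lastMutationAt = at;
  }

  isStable(now: number): boolean {
    return now - this.lastMutationAt >= this.quietMs;
  }
}

// Simulates polling every `checkIntervalMs` and returns the first poll time at which
// the page is stable, or null if `timeoutMs` passes first.
function firstStablePoll(
  mutationTimes: number[],
  quietMs: number,
  checkIntervalMs: number,
  timeoutMs: number
): number | null {
  const tracker = new StabilityTracker(quietMs, 0);
  const sorted = [...mutationTimes].sort((a, b) => a - b);
  let next = 0;
  for (let now = checkIntervalMs; now <= timeoutMs; now += checkIntervalMs) {
    // Feed in every mutation that happened up to this poll.
    while (next < sorted.length && sorted[next] <= now) {
      tracker.recordMutation(sorted[next]);
      next++;
    }
    if (tracker.isStable(now)) return now;
  }
  return null;
}
```

Unlike a fixed wait, the delay scales with the page: a quiet page is declared stable after one threshold, while a page that keeps mutating pushes the decision back until it settles or the timeout hits.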
What I Learned
- Multi-tier strategy balances speed and accuracy: Don't use LLM for everything. Traditional selectors → semantic search → LLM gives the best tradeoff.
- Validation prevents costly mistakes: A 500ms validation check saves minutes of debugging from wrong clicks.
- Visual confirmation is worth the overhead: Screenshots add ~200ms, but with validation and confirmation enabled wrong clicks fell from 15% to under 2%, a reduction of roughly 90%.
- SPA awareness is mandatory: Modern apps require stability detection, not just timeout-based waits.
- Caching makes a huge difference: Selector caching reduces LLM calls by 60-80% for repeated workflows.
- Confidence thresholds matter: Set appropriately for your use case. 90% for critical actions, 70% for tolerant workflows.
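To see why caching helps but cannot carry the latency target on its own, here is a back-of-the-envelope model. It assumes (these are assumptions, not measurements) that a cache hit costs about as much as a traditional selector lookup (~10ms), that a miss costs the 3.2s average LLM latency from earlier, and that the 60-80% reduction in LLM calls can be read as the cache hit rate.

```typescript
// Expected per-action latency as a weighted average of cache hits and LLM misses.
function expectedLatencyMs(hitRate: number, cachedMs: number, llmMs: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be between 0 and 1");
  }
  return hitRate * cachedMs + (1 - hitRate) * llmMs;
}

// Assumed inputs: 10ms per cached lookup, 3200ms per LLM call.
const atSixtyPercent = expectedLatencyMs(0.6, 10, 3200); // about 1.3 seconds
const atEightyPercent = expectedLatencyMs(0.8, 10, 3200); // about 0.65 seconds
console.log(Math.round(atSixtyPercent), Math.round(atEightyPercent));
```

Under these assumptions even an 80% hit rate stays above the 500ms target, which is consistent with needing the cheaper tiers in front of the LLM as well.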
Production Setup
Complete configuration for production-ready Stagehand automation.
# Install Stagehand
npm install @browserbasehq/stagehand playwright
# Install Playwright browsers
npx playwright install
Production configuration:
import { Stagehand } from "@browserbasehq/stagehand";
import { chromium, type Browser } from "playwright";

// One step of a workflow; a failing `critical` step aborts the run.
interface WorkflowStep {
  action: string;
  params: Record<string, string>;
  critical: boolean;
}

class ProductionStagehandAgent {
  private stagehand: Stagehand;
  private browser?: Browser; // kept so close() can shut the browser down

  constructor() {
    this.stagehand = new Stagehand({
      // Performance
      selectorStrategy: "hybrid",
      useCache: true,
      batchActions: true,

      // Accuracy
      validateBeforeAction: true,
      visualConfirmation: true,
      confidenceThreshold: 0.9,

      // SPA support
      detectSPA: true,
      waitForStability: true,
      stabilityThreshold: 500,

      // Monitoring
      logLevel: "info",
      telemetry: true,
      screenshotOnFailure: true
    });
  }

  async init() {
    // Launch browser
    this.browser = await chromium.launch({
      headless: true,
    });
    const context = await this.browser.newContext({
      viewport: { width: 1920, height: 1080 },
      userAgent: "StagehandBot/1.0"
    });
    const page = await context.newPage();
    await this.stagehand.init(page);
  }

  async executeWorkflow(workflow: WorkflowStep[]) {
    const results = [];
    for (const step of workflow) {
      try {
        const result = await this.stagehand.act(step);
        results.push({ step, result, success: true });
      } catch (error) {
        results.push({ step, error, success: false });
        // Abort on critical steps; otherwise record the failure and continue
        if (step.critical) throw error;
      }
    }
    return results;
  }

  async close() {
    await this.stagehand.close();
    await this.browser?.close(); // avoid leaking the browser process
  }
}

// Usage
async function main() {
  const agent = new ProductionStagehandAgent();
  await agent.init();
  const workflow: WorkflowStep[] = [
    {
      action: "goto",
      params: { url: "https://example.com" },
      critical: true
    },
    {
      action: "fill",
      params: {
        selector: "email input",
        value: "user@example.com"
      },
      critical: true
    },
    {
      action: "click",
      params: {
        selector: "submit button"
      },
      critical: true
    }
  ];
  try {
    const results = await agent.executeWorkflow(workflow);
    console.log("Workflow results:", results);
  } finally {
    // Always release the browser, even when a critical step throws
    await agent.close();
  }
}

main().catch((error) => {
  console.error("Workflow failed:", error);
  process.exit(1);
});
Monitoring & Debugging
Key metrics for Stagehand automation.
Red Flags to Watch For
- Action latency > 1s: Something's wrong. Check LLM API or cache configuration.
- Wrong element rate > 5%: Confidence threshold too low or validation disabled.
- Cache hit rate < 50%: Workflow too dynamic or cache TTL too short.
- Stability timeout rate > 10%: Pages never stabilizing. Check threshold or page performance.
- LLM call rate > 80%: Not using traditional/semantic selectors effectively.
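The red flags above translate directly into a threshold check over run counters. A minimal sketch, assuming you collect these counters yourself; the `RunCounters` shape and `redFlags` function are hypothetical and not something Stagehand provides.

```typescript
// Hypothetical health check that applies the red-flag thresholds to collected counters.
interface RunCounters {
  actions: number;
  totalLatencyMs: number;
  wrongElements: number;
  cacheLookups: number;
  cacheHits: number;
  stabilityWaits: number;
  stabilityTimeouts: number;
  llmCalls: number;
}

// Returns null when there is no data, so an empty run raises no flags.
function rate(part: number, whole: number): number | null {
  return whole > 0 ? part / whole : null;
}

function redFlags(c: RunCounters): string[] {
  const flags: string[] = [];
  const avgLatencyMs = rate(c.totalLatencyMs, c.actions);
  const wrongRate = rate(c.wrongElements, c.actions);
  const cacheHitRate = rate(c.cacheHits, c.cacheLookups);
  const stabilityTimeoutRate = rate(c.stabilityTimeouts, c.stabilityWaits);
  const llmCallRate = rate(c.llmCalls, c.actions);

  if (avgLatencyMs !== null && avgLatencyMs > 1000) flags.push("action latency > 1s");
  if (wrongRate !== null && wrongRate > 0.05) flags.push("wrong element rate > 5%");
  if (cacheHitRate !== null && cacheHitRate < 0.5) flags.push("cache hit rate < 50%");
  if (stabilityTimeoutRate !== null && stabilityTimeoutRate > 0.1) flags.push("stability timeout rate > 10%");
  if (llmCallRate !== null && llmCallRate > 0.8) flags.push("LLM call rate > 80%");
  return flags;
}
```

Running this after each batch and alerting on a non-empty result turns the list into something that fails loudly instead of something to remember to check.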
Debug Commands
# Test selector strategy
npx stagehand test-selectors \
--url https://example.com \
--action "click submit button"
# Analyze page for SPA detection
npx stagehand analyze-page \
--url https://example.com \
--detect-spa
# Benchmark performance
npx stagehand benchmark \
--workflow ./test-workflow.json \
--iterations 10
# View cache statistics
npx stagehand cache-stats \
--clear-older-than 24h