Browser-use: AI Agent Framework That Actually Completes Browser Tasks

I needed an AI agent that could navigate websites on its own and complete multi-step tasks: filling forms, clicking through menus, and extracting data. Traditional Playwright scripts break when pages change. Browser-use promised an LLM-powered agent that adapts to those changes, but in my first attempts the agent would get stuck or click the wrong elements. Here's how I got reliable browser automation.

Agent Couldn't Find Elements That Were Clearly Visible on Page

Problem

The agent would fail to find buttons or forms that were clearly visible on the page. It would say "element not found" even though I could see it in the browser. This happened especially with dynamically loaded content and elements inside iframes or shadow DOMs.

Error: ElementNotFoundError: Cannot find element with text 'Submit' after 3 attempts

What I Tried

Attempt 1: Increased the max_attempts parameter from 3 to 10. The agent would retry but still fail, just taking longer.
Attempt 2: Provided explicit XPath selectors. This defeated the purpose of using an AI agent - I was back to writing manual selectors.
Attempt 3: Added longer wait times for page loads. This helped with dynamic content but didn't fix selector issues.

Actual Fix

The issue was that Browser-use's element detection only looked at the initial page snapshot. I enabled multi-step element detection with vision-based fallback. Now the agent takes screenshots, uses vision models to identify elements, and falls back to text-based search if needed.

# Enhanced element detection
from browser_use import Agent, BrowserConfig

# Configure agent with vision-based detection
config = BrowserConfig(
    headless=False,
    # Multi-step detection
    element_detection_strategy="vision_first",  # Try vision first
    fallback_to_text_search=True,  # Fall back to text
    # Wait for dynamic content
    wait_for_selector_timeout=5000,  # 5 seconds
    wait_for_stable_content=True,  # Wait for content to stop changing
    # Vision settings
    screenshot_on_error=True,
    vision_model="gpt-4o",  # Use vision for element detection
    # Retry logic
    max_retries=3,
    retry_delay=1.0
)

agent = Agent(
    task="Fill out the contact form and submit",
    browser_config=config
)

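# Agent now uses vision to find elements.
# run() is awaited in the production example, so drive it with asyncio here too.
import asyncio

result = asyncio.run(agent.run())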
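
The "vision first, then text" order above is a fallback chain, and it can be sketched without any library. This is a minimal illustration, not Browser-use's internals: find_with_fallback, vision_detector, and text_detector are hypothetical names, and the two detectors are stand-ins that return canned results.

```python
from typing import Callable, Optional, Sequence, Tuple

# A detector takes a human-readable description and returns an element
# handle (any object), or None when it finds nothing.
Detector = Callable[[str], Optional[object]]


def find_with_fallback(description: str,
                       detectors: Sequence[Tuple[str, Detector]]):
    """Try each detector in order; return (strategy_name, element) for the first hit."""
    for name, detect in detectors:
        element = detect(description)
        if element is not None:
            return name, element
    raise LookupError(f"No strategy could find {description!r}")


# Hypothetical stand-ins for real detectors.
def vision_detector(description: str):
    return None  # pretend the vision model missed the element


def text_detector(description: str):
    if "submit" in description.lower():
        return {"tag": "button", "text": "Submit"}
    return None


strategy, element = find_with_fallback(
    "Submit button",
    [("vision", vision_detector), ("text", text_detector)],
)
print(strategy, element)
```

Returning the strategy name along with the element makes it easy to log how often the fallback is actually used.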

Agent Lost Context During Multi-Step Tasks

Problem

On tasks requiring 5+ steps, the agent would forget earlier actions and get stuck in loops. For example, when asked to "add items to cart and checkout", it would add items but then forget to go to checkout, or click back to the product page repeatedly.

What I Tried

Attempt 1: Broke tasks into smaller subtasks. This required manual orchestration and wasn't autonomous.
Attempt 2: Increased the context window. This helped slightly, but the agent still lost track of what it had done.

Actual Fix

Enabled Browser-use's memory system with action logging. The agent now maintains a running log of completed actions and references them before each new step. Also added checkpointing for long-running tasks.

# Agent with memory and checkpoints
from browser_use import Agent
from browser_use.memory import ActionMemory, CheckpointManager

# Enable action memory
memory = ActionMemory(
    max_history=50,  # Remember last 50 actions
    summarize_after=20,  # Summarize older actions
    include_screenshots=True  # Store screenshots in memory
)

# Enable checkpointing
checkpoints = CheckpointManager(
    checkpoint_every=5,  # Save state every 5 actions
    checkpoint_dir="./checkpoints",
    auto_resume=True  # Resume from checkpoint if agent crashes
)

agent = Agent(
    task="Add 3 items to cart and complete checkout",
    memory=memory,
    checkpoints=checkpoints,
    # Context management
    maintain_context=True,
    context_window=8192,
    # Task planning
    use_planning=True,  # Break task into sub-steps
    validate_plan=True  # Ask for confirmation before executing
)

import asyncio

# run() is a coroutine (it is awaited in the production example)
result = asyncio.run(agent.run())
# Agent now:
# 1. Plans the full task
# 2. Logs each action
# 3. Checks memory before acting
# 4. Creates checkpoints for recovery
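
To make "logs each action and checks memory before acting" concrete, here is a small library-independent sketch of a bounded action log. It is an assumption about how such a log could work, not Browser-use's implementation; ActionLog, keep_verbatim, and the method names are hypothetical.

```python
from collections import deque


class ActionLog:
    """Keeps the most recent actions verbatim and only counts the older ones."""

    def __init__(self, keep_verbatim: int = 20):
        self.keep_verbatim = keep_verbatim
        self.recent = deque()   # (action, ok) pairs, oldest first
        self.summarized = 0     # how many older actions were dropped from the log

    def record(self, action: str, ok: bool) -> None:
        self.recent.append((action, ok))
        while len(self.recent) > self.keep_verbatim:
            self.recent.popleft()
            self.summarized += 1

    def already_failed(self, action: str) -> bool:
        """True if this exact action is in the recent log as a failure."""
        return any(a == action and not ok for a, ok in self.recent)

    def as_prompt(self) -> str:
        """Render the log as text that can be prepended to the next LLM step."""
        lines = []
        if self.summarized:
            lines.append(f"({self.summarized} earlier action(s) omitted)")
        for i, (action, ok) in enumerate(self.recent, start=self.summarized + 1):
            lines.append(f"{i}. {action} [{'ok' if ok else 'failed'}]")
        return "\n".join(lines)


log = ActionLog(keep_verbatim=2)
log.record("open product page", ok=True)
log.record("click 'Add to cart'", ok=True)
log.record("click 'Checkout'", ok=False)
print(log.as_prompt())
```

Checking already_failed before repeating a step is what breaks the "click back to the product page repeatedly" loop described earlier.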

Actions Failed Due to Timing Issues and Element State Changes

Problem

The agent would try to click elements before they were ready, or type into fields that were still disabled. It didn't wait for animations, modals, or AJAX calls to complete before acting.

What I Tried

Attempt 1: Added fixed delays (time.sleep) between actions. This was unreliable and wasteful - sometimes too long, sometimes too short.
Attempt 2: Used explicit wait conditions. The agent couldn't predict what conditions to wait for.

Actual Fix

Configured Browser-use's smart waiting with automatic stability detection. The agent now waits for elements to become interactive, the DOM to stop changing, and network requests to settle before taking action. I left the separate CSS animation wait disabled because it was too slow.

# Smart waiting configuration
from browser_use import Agent, BrowserConfig

config = BrowserConfig(
    # Smart waiting
    use_smart_wait=True,
    wait_strategy="stability",  # Wait for content to stabilize
    stability_threshold=500,  # 500ms of no DOM changes
    # Element readiness
    wait_for_clickable=True,  # Element must be clickable
    wait_for_visible=True,  # Element must be visible
    wait_for_enabled=True,  # Element must be enabled
    # Network requests
    wait_for_network_idle=True,  # Wait for AJAX/fetch to complete
    network_idle_timeout=2000,  # 2 seconds of no network activity
    # Animations
    wait_for_animations=False,  # Don't wait for CSS animations (too slow)
    # Timeouts
    default_timeout=10000,  # 10 second default timeout
    page_load_timeout=30000  # 30 second page load timeout
)

agent = Agent(
    task="Complete the purchase flow",
    browser_config=config
)

# Agent now waits for:
# - Elements to be ready
# - Network requests to finish
# - Content to stabilize
# Result: no more failed clicks due to timing
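
The stability idea ("N ms with no DOM changes") is just a polling loop. The sketch below shows it in isolation, assuming some snapshot function that returns a comparable value such as the page's HTML. wait_until_stable and fake_dom are hypothetical names, and the timings are shortened for the demo.

```python
import time
from typing import Callable


def wait_until_stable(snapshot: Callable[[], str],
                      stable_for: float = 0.5,
                      timeout: float = 10.0,
                      poll: float = 0.05) -> bool:
    """Return True once snapshot() has not changed for stable_for seconds.

    Returns False if the content is still changing when timeout expires.
    """
    deadline = time.monotonic() + timeout
    last = snapshot()
    last_change = time.monotonic()
    while time.monotonic() < deadline:
        if time.monotonic() - last_change >= stable_for:
            return True
        time.sleep(poll)
        current = snapshot()
        if current != last:
            # Content changed: restart the stability window.
            last, last_change = current, time.monotonic()
    return False


# Fake page that changes a few times and then settles.
states = iter(["loading", "loading...", "ready"])
current_state = {"value": "init"}


def fake_dom() -> str:
    current_state["value"] = next(states, current_state["value"])
    return current_state["value"]


print(wait_until_stable(fake_dom, stable_for=0.1, timeout=2.0, poll=0.01))
```

Unlike a fixed time.sleep, this returns as soon as the page is quiet and gives up with a clear False when it never settles.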

What I Learned

Vision-first detection is more reliable: Using GPT-4o's vision to find elements works better than text search alone, especially for dynamic content.
Memory prevents repeated mistakes: Without action logging, agents repeat failed actions. Memory is essential for multi-step tasks.
Checkpoints save time: Long-running tasks will fail. Checkpoints let you resume without starting over.
Smart waiting beats fixed delays: Waiting for stability and network idle is more reliable than arbitrary sleep times.
Task planning helps execution: Breaking tasks into sub-steps and validating the plan prevents the agent from going off-track.
Screenshots aid debugging: Always save screenshots on error. They're invaluable for understanding why the agent failed.

Production Setup

Complete configuration for production-ready browser automation agents.

# Install browser-use
pip install browser-use

# Install Playwright browsers
playwright install chromium

# Install vision model dependencies
pip install openai  # For GPT-4o vision
pip install anthropic  # For Claude vision

Production agent configuration:

import asyncio
from browser_use import Agent, BrowserConfig
from browser_use.memory import ActionMemory, CheckpointManager
from browser_use.monitoring import AgentMonitor

class ProductionBrowserAgent:
    """Production-ready browser automation agent."""

    def __init__(self):
        # Configure browser
        self.config = BrowserConfig(
            headless=True,  # Run headless in production
            browser_type="chromium",
            # Element detection
            element_detection_strategy="vision_first",
            vision_model="gpt-4o",
            screenshot_on_error=True,
            # Smart waiting
            use_smart_wait=True,
            wait_strategy="stability",
            stability_threshold=500,
            wait_for_network_idle=True,
            # Performance
            disable_gpu=False,
            user_data_dir="./browser_profile",
        )

        # Memory and checkpoints
        self.memory = ActionMemory(
            max_history=100,
            summarize_after=50,
            include_screenshots=True,
            persist_to_disk=True,
            memory_dir="./agent_memory"
        )

        self.checkpoints = CheckpointManager(
            checkpoint_every=5,
            checkpoint_dir="./checkpoints",
            auto_resume=True,
            keep_last_n=10
        )

        # Monitoring
        self.monitor = AgentMonitor(
            log_level="INFO",
            log_file="agent.log",
            metrics_enabled=True,
            alert_on_failure=True
        )

    async def run_task(self, task: str, max_steps: int = 50):
        """Run a task with all production features."""
        agent = Agent(
            task=task,
            browser_config=self.config,
            memory=self.memory,
            checkpoints=self.checkpoints,
            monitor=self.monitor,
            max_steps=max_steps,
            use_planning=True
        )

        result = await agent.run()
        return result

# Usage
async def main():
    agent = ProductionBrowserAgent()

    # Run complex task
    result = await agent.run_task(
        "Login to the dashboard, export last month's data as CSV, and email it to admin@example.com",
        max_steps=30
    )

    print(f"Task completed: {result.success}")
    print(f"Steps taken: {len(result.actions)}")

if __name__ == "__main__":
    asyncio.run(main())

Monitoring & Debugging

Key metrics for monitoring browser agents in production.

Red Flags to Watch For

Element detection failure rate > 20%: Agent can't find elements. Check vision model or try different detection strategy.
Action retry rate > 30%: Timing issues or unstable page. Increase stability_threshold.
Task completion rate < 60%: Agent failing too often. Review task instructions and add more examples.
Average steps per task > 2x expected: Agent is inefficient or stuck in loops. Check memory and planning.
Checkpoint recovery rate > 50%: Agent crashing frequently. Investigate stability issues.
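
The thresholds above are easy to turn into an automated check. This is a minimal sketch: the metric names are hypothetical and should be mapped to whatever your own logging produces, and the sample numbers are made up for illustration.

```python
import operator

# (metric name, comparison, threshold) taken from the list above.
# The metric names are assumptions, not something Browser-use emits.
RULES = [
    ("element_detection_failure_rate", operator.gt, 0.20),
    ("action_retry_rate", operator.gt, 0.30),
    ("task_completion_rate", operator.lt, 0.60),
    ("steps_vs_expected_ratio", operator.gt, 2.0),
    ("checkpoint_recovery_rate", operator.gt, 0.50),
]


def red_flags(metrics: dict) -> list:
    """Return the names of metrics that cross their threshold; missing metrics are skipped."""
    return [
        name
        for name, crosses, limit in RULES
        if name in metrics and crosses(metrics[name], limit)
    ]


# Illustrative sample values only.
sample = {
    "element_detection_failure_rate": 0.25,
    "action_retry_rate": 0.10,
    "task_completion_rate": 0.55,
    "steps_vs_expected_ratio": 1.4,
    "checkpoint_recovery_rate": 0.50,
}
print(red_flags(sample))
```

Note that a value sitting exactly on a threshold (the 0.50 recovery rate here) is not flagged, matching the strict ">" and "<" in the list.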

Debug Commands

# View agent memory
python -m browser_use.tools.inspect_memory \
    --memory_dir ./agent_memory \
    --show_screenshots

# Replay agent execution
python -m browser_use.tools.replay \
    --checkpoint ./checkpoints/task_123

# Test element detection
python -m browser_use.tools.test_detection \
    --url https://example.com \
    --element "Submit button"

# Monitor live agent
python -m browser_use.tools.monitor \
    --log_file agent.log \
    --follow
