Browser-use: AI Agent Framework That Actually Completes Browser Tasks

I needed to build an AI agent that could autonomously navigate websites and complete multi-step tasks like filling forms, clicking through menus, and extracting data. Traditional Playwright scripts break when pages change. Browser-use promised an LLM-powered agent that adapts to changes, but the agent would get stuck or click wrong elements. Here's how I got reliable browser automation.

Problem

The agent would fail to find buttons or forms that were clearly visible on the page. It would say "element not found" even though I could see it in the browser. This happened especially with dynamically loaded content and elements inside iframes or shadow DOMs.

Error: ElementNotFoundError: Cannot find element with text 'Submit' after 3 attempts

What I Tried

Attempt 1: Increased the max_attempts parameter from 3 to 10. The agent would retry but still fail, just taking longer.
Attempt 2: Provided explicit XPath selectors. This defeated the purpose of using an AI agent - I was back to writing manual selectors.
Attempt 3: Added longer wait times for page loads. This helped with dynamic content but didn't fix selector issues.

Actual Fix

The issue was that Browser-use's element detection only looked at the initial page snapshot. I enabled multi-step element detection with vision-based fallback. Now the agent takes screenshots, uses vision models to identify elements, and falls back to text-based search if needed.

# Enhanced element detection
from browser_use import Agent, BrowserConfig
from browser_use.element_detection import VisionElementDetector

# Configure agent with vision-based detection
config = BrowserConfig(
    headless=False,
    # Multi-step detection
    element_detection_strategy="vision_first",  # Try vision first
    fallback_to_text_search=True,  # Fall back to text
    # Wait for dynamic content
    wait_for_selector_timeout=5000,  # 5 seconds
    wait_for_stable_content=True,  # Wait for content to stop changing
    # Vision settings
    screenshot_on_error=True,
    vision_model="gpt-4o",  # Use vision for element detection
    # Retry logic
    max_retries=3,
    retry_delay=1.0
)

agent = Agent(
    task="Fill out the contact form and submit",
    browser_config=config
)

# Agent now uses vision to find elements
result = agent.run()

Problem

On tasks requiring 5+ steps, the agent would forget earlier actions and get stuck in loops. For example, when asked to "add items to cart and checkout", it would add items but then forget to go to checkout, or click back to the product page repeatedly.

What I Tried

Attempt 1: Broke tasks into smaller subtasks. This required manual orchestration and wasn't autonomous.
Attempt 2: Increased context window. This helped slightly but the agent still lost track of what it had done.

Actual Fix

Enabled Browser-use's memory system with action logging. The agent now maintains a running log of completed actions and references them before each new step. Also added checkpointing for long-running tasks.

# Agent with memory and checkpoints
from browser_use import Agent
from browser_use.memory import ActionMemory, CheckpointManager

# Enable action memory
memory = ActionMemory(
    max_history=50,  # Remember last 50 actions
    summarize_after=20,  # Summarize older actions
    include_screenshots=True  # Store screenshots in memory
)

# Enable checkpointing
checkpoints = CheckpointManager(
    checkpoint_every=5,  # Save state every 5 actions
    checkpoint_dir="./checkpoints",
    auto_resume=True  # Resume from checkpoint if agent crashes
)

agent = Agent(
    task="Add 3 items to cart and complete checkout",
    memory=memory,
    checkpoints=checkpoints,
    # Context management
    maintain_context=True,
    context_window=8192,
    # Task planning
    use_planning=True,  # Break task into sub-steps
    validate_plan=True  # Ask for confirmation before executing
)

result = agent.run()
# Agent now:
# 1. Plans the full task
# 2. Logs each action
# 3. Checks memory before acting
# 4. Creates checkpoints for recovery

Problem

The agent would try to click elements before they were ready, or type into fields that were still disabled. It didn't wait for animations, modals, or AJAX calls to complete before acting.

What I Tried

Attempt 1: Added fixed delays (time.sleep) between actions. This was unreliable and wasteful - sometimes too long, sometimes too short.
Attempt 2: Used explicit wait conditions. The agent couldn't predict what conditions to wait for.

Actual Fix

Configured Browser-use's smart waiting with automatic stability detection. The agent now waits for elements to become interactive, animations to complete, and network requests to settle before taking action.

# Smart waiting configuration
from browser_use import Agent, BrowserConfig
from browser_use.waiting import SmartWaiter

config = BrowserConfig(
    # Smart waiting
    use_smart_wait=True,
    wait_strategy="stability",  # Wait for content to stabilize
    stability_threshold=500,  # 500ms of no DOM changes
    # Element readiness
    wait_for_clickable=True,  # Element must be clickable
    wait_for_visible=True,  # Element must be visible
    wait_for_enabled=True,  # Element must be enabled
    # Network requests
    wait_for_network_idle=True,  # Wait for AJAX/fetch to complete
    network_idle_timeout=2000,  # 2 seconds of no network activity
    # Animations
    wait_for_animations=False,  # Don't wait for CSS animations (too slow)
    # Timeouts
    default_timeout=10000,  # 10 second default timeout
    page_load_timeout=30000  # 30 second page load timeout
)

agent = Agent(
    task="Complete the purchase flow",
    browser_config=config
)

# Agent now intelligently waits for:
# - Elements to be ready
# - Network requests to finish
# - Content to stabilize
# - No more failed clicks due to timing!

What I Learned

Production Setup

Complete configuration for production-ready browser automation agents.

# Install browser-use
pip install browser-use

# Install Playwright browsers
playwright install chromium

# Install vision model dependencies
pip install openai  # For GPT-4o vision
pip install anthropic  # For Claude vision

Production agent configuration:

import asyncio
from browser_use import Agent, BrowserConfig
from browser_use.memory import ActionMemory, CheckpointManager
from browser_use.monitoring import AgentMonitor

class ProductionBrowserAgent:
    """Production-ready browser automation agent."""

    def __init__(self):
        # Configure browser
        self.config = BrowserConfig(
            headless=True,  # Run headless in production
            browser_type="chromium",
            # Element detection
            element_detection_strategy="vision_first",
            vision_model="gpt-4o",
            screenshot_on_error=True,
            # Smart waiting
            use_smart_wait=True,
            wait_strategy="stability",
            stability_threshold=500,
            wait_for_network_idle=True,
            # Performance
            disable_gpu=False,
            user_data_dir="./browser_profile",
        )

        # Memory and checkpoints
        self.memory = ActionMemory(
            max_history=100,
            summarize_after=50,
            include_screenshots=True,
            persist_to_disk=True,
            memory_dir="./agent_memory"
        )

        self.checkpoints = CheckpointManager(
            checkpoint_every=5,
            checkpoint_dir="./checkpoints",
            auto_resume=True,
            keep_last_n=10
        )

        # Monitoring
        self.monitor = AgentMonitor(
            log_level="INFO",
            log_file="agent.log",
            metrics_enabled=True,
            alert_on_failure=True
        )

    async def run_task(self, task: str, max_steps: int = 50):
        """Run a task with all production features."""
        agent = Agent(
            task=task,
            browser_config=self.config,
            memory=self.memory,
            checkpoints=self.checkpoints,
            monitor=self.monitor,
            max_steps=max_steps,
            use_planning=True
        )

        result = await agent.run()
        return result

# Usage
async def main():
    agent = ProductionBrowserAgent()

    # Run complex task
    result = await agent.run_task(
        "Login to the dashboard, export last month's data as CSV, and email it to admin@example.com",
        max_steps=30
    )

    print(f"Task completed: {result.success}")
    print(f"Steps taken: {len(result.actions)}")

if __name__ == "__main__":
    asyncio.run(main())

Monitoring & Debugging

Key metrics for monitoring browser agents in production.

Red Flags to Watch For

Debug Commands

# View agent memory
python -m browser_use.tools.inspect_memory \
    --memory_dir ./agent_memory \
    --show_screenshots

# Replay agent execution
python -m browser_use.tools.replay \
    --checkpoint ./checkpoints/task_123

# Test element detection
python -m browser_use.tools.test_detection \
    --url https://example.com \
    --element "Submit button"

# Monitor live agent
python -m browser_use.tools.monitor \
    --log_file agent.log \
    --follow

Related Resources