Playwright Stealth: Scraping Modern React/Vue Apps Without Getting Blocked

Handle dynamic content and avoid detection with playwright-stealth and the code recorder.

I spent two weeks trying to scrape a client's React dashboard. The requests library returned empty divs. Selenium got blocked immediately. Regular Playwright lasted about 20 requests before hitting a 403.

Eventually figured out the problem: modern SPAs (React, Vue, Angular) render everything client-side, and they've gotten really good at detecting automation. The content you want appears seconds after page load, and if you're using WebDriver, they know.

The solution

Use Playwright with playwright-stealth (a port of the puppeteer-extra stealth plugin). It patches the traces browser automation leaves in the page environment. Combine it with Playwright's code recorder and you can scrape most SPAs without writing selectors by hand.

Why SPAs break traditional scrapers

# What requests/BeautifulSoup sees:
<div id="app"></div>
<div id="root"></div>
<!-- content rendered by JavaScript later -->

# What you actually want:
<div class="product">iPhone 15</div>
<div class="product">Samsung S24</div>

The HTML source never contains the product data. It arrives via XHR calls after page load, then gets inserted into the DOM. You need a browser that executes JavaScript.
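A stdlib-only sketch makes this concrete. The HTML string below is a hypothetical SPA shell, parsed the way a non-JS client would see it:

```python
# Sketch: what a non-JS HTTP client actually downloads from a hypothetical SPA.
# The mount point is empty; product data arrives later via XHR, so it never
# exists in the page source at all.
from html.parser import HTMLParser

RAW = '<html><body><div id="root"></div><script src="/bundle.js"></script></body></html>'

class RootText(HTMLParser):
    """Collects the text inside <div id="root">."""
    def __init__(self):
        super().__init__()
        self.depth = 0
        self.text = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'div' and (self.depth or ('id', 'root') in attrs):
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text += data

parser = RootText()
parser.feed(RAW)
print(repr(parser.text))  # '' - nothing for a static scraper to extract
```

The same parse against the rendered DOM (after a browser runs the bundle) would find the product divs; that gap is the whole problem.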

Installing the tools

pip install playwright playwright-stealth
playwright install chromium

playwright-stealth is a Python port of the puppeteer-extra stealth plugin. It's maintained separately from Playwright itself.

Getting stealth to work

Basic setup to avoid detection:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()

    # This patches the browser
    stealth_sync(page)

    page.goto('https://facebook.com')
    page.wait_for_selector('.data-loaded')

    content = page.inner_text('.data-loaded')
    print(content)

    browser.close()

Always test with headless=False first. You can see what's happening and verify stealth is working.

How stealth patches the browser

The plugin modifies several things bot detection scripts look for:

Detection vector        What stealth does
navigator.webdriver     Removes the flag (sets it to undefined)
window.chrome           Adds the missing chrome object
navigator.permissions   Mocks permission query responses
navigator.plugins       Returns a fake plugin list
WebGL renderer          Masks GPU information
CDP traces              Hides Chrome DevTools Protocol indicators
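To make the table concrete, here is a simplified sketch of the kind of init script a stealth plugin injects before any page JavaScript runs. This is not the actual playwright-stealth payload (the real plugin patches far more, and more carefully), just an illustration of the technique; with plain Playwright you could register such a script yourself via context.add_init_script:

```python
# Simplified sketch of a stealth-style init script (illustrative only,
# NOT the real playwright-stealth payload).
STEALTH_PATCH = """
// Hide the automation flag
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });

// Provide the chrome object that headless Chromium lacks
window.chrome = window.chrome || { runtime: {} };
"""

def apply_patch(context):
    # add_init_script runs the snippet in every page before the page's own
    # scripts execute, so detection code never sees the unpatched values
    context.add_init_script(STEALTH_PATCH)

print('webdriver' in STEALTH_PATCH)  # True
```

playwright-stealth does essentially this, across many more detection vectors at once.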

Use the code recorder

This is the killer feature. No more inspecting elements or guessing CSS selectors:

# Start recording
playwright codegen https://facebook.com

What happens:

  1. Browser opens with the site loaded
  2. You click around, fill forms, scroll
  3. Code generates in the sidebar in real-time
  4. Copy and paste when done

Generated output:

from playwright.sync_api import Page, expect

def run(page: Page) -> None:
    page.goto("https://facebook.com/")
    page.get_by_role("button", name="Load More").click()
    page.wait_for_selector(".product-card")

    products = page.locator(".product-card").all()
    for product in products:
        print(product.inner_text())

Why this matters

The recorder uses role-based selectors (button, link, heading) instead of CSS classes. These don't break when the app rebuilds and class names change.

Handling async content

React/Vue apps load data asynchronously. Multiple ways to wait:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    stealth_sync(page)

    page.goto('https://facebook.com')

    # Option 1: Wait for element to appear
    page.wait_for_selector('.loaded-content')

    # Option 2: Wait for no network requests
    page.wait_for_load_state('networkidle')

    # Option 3: Wait for specific text
    page.wait_for_selector('text=Data loaded')

    # Option 4: Wait for a specific API response (glob pattern matches the full URL)
    with page.expect_response('**/api/data') as response_info:
        page.click('button:has-text("Refresh")')
    response = response_info.value

    # Now scrape
    items = page.locator('.item').all()
    print(f"Found {len(items)} items")

I usually use networkidle + selector wait. Belt and suspenders approach.
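That combination can be wrapped in a small helper (a hypothetical convenience function, not part of Playwright's API):

```python
def wait_for_app(page, selector, timeout=15_000):
    """Belt and suspenders: wait for the network to settle, then for the
    target element to be visible. `page` is a Playwright sync Page."""
    page.wait_for_load_state('networkidle', timeout=timeout)
    page.wait_for_selector(selector, state='visible', timeout=timeout)
    return page.locator(selector)
```

Then `items = wait_for_app(page, '.item')` replaces the two separate waits at every call site.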

Selecting elements in React apps

Playwright's built-in locators work well with React apps, and there's an experimental React selector engine. These are more stable than raw CSS:

# By role (accessible name)
submit = page.get_by_role('button', name='Submit')

# By test ID (if developers added them)
element = page.get_by_test_id('submit-button')

# By text content
title = page.get_by_text('Welcome back')

# Combine filters
sale_items = page.get_by_role('listitem').filter(
    has_text='sale'
)

# Experimental React selector engine (matches by component name)
profile = page.locator('_react=UserProfile')

Dealing with infinite scroll

Most modern shops use this. Here's a pattern that works:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync
import time

def scrape_infinite_scroll(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        page = browser.new_page()
        stealth_sync(page)

        page.goto(url)

        items = set()

        while True:
            # Wait for items to load
            page.wait_for_selector('.item')
            current = page.locator('.item').all_text_contents()
            new_items = set(current) - items

            if not new_items:
                break  # Reached the end

            items.update(new_items)
            print(f"Collected {len(items)} items...")

            # Scroll down to trigger more
            page.evaluate('window.scrollTo(0, document.body.scrollHeight)')
            time.sleep(2)

        browser.close()
        return list(items)

Intercept API responses

Sometimes it's easier to grab the JSON directly:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.new_page()
    stealth_sync(page)

    api_responses = []

    def handle_response(response):
        if '/api/products' in response.url:
            try:
                api_responses.append(response.json())
            except Exception:
                pass  # ignore non-JSON or unavailable bodies

    page.on('response', handle_response)

    page.goto('https://facebook.com')

    # Trigger the API call
    page.click('button:has-text("Load")')
    page.wait_for_load_state('networkidle')

    # Process the JSON data directly
    if api_responses:
        for product in api_responses[0]['results']:
            print(f"{product['name']}: ${product['price']}")

This skips all the HTML parsing. The data arrives already structured.

Errors I encountered

"Element not found" even with wait_for_selector

React components render conditionally. The element might be in a loading state:

# Wait for the element to be visible ('visible' implies it is attached)
page.wait_for_selector('.item', state='visible')

# Or wait for it to NOT be loading
page.wait_for_selector('.content:not(.loading)')

Still getting blocked

Some sites check more than WebDriver flags:

# Don't use headless mode
browser = p.chromium.launch(headless=False)

# Set a realistic viewport
page.set_viewport_size({'width': 1920, 'height': 1080})

# Add a user agent via a dedicated context
context = browser.new_context(
    user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64)'
)
page = context.new_page()

Content loads but shows loading spinner forever

Multiple loading states in SPAs:

# Wait for spinner to disappear first
page.wait_for_selector('.spinner', state='hidden', timeout=15000)

# Then wait for actual content
page.wait_for_selector('.content', state='visible')

Code recorder uses fragile selectors

It might generate CSS selectors like `div > div:nth-child(2)`:

# Replace fragile selectors with stable ones:
page.click('button:has-text("Submit")')            # text-based, fairly stable
page.get_by_role('button', name='Submit').click()  # role-based, most stable

Extra stealth measures

For sites with aggressive detection:

from playwright.sync_api import sync_playwright
from playwright_stealth import stealth_sync

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=False,
        args=[
            '--disable-blink-features=AutomationControlled',
            '--disable-infobars',
        ]
    )

    context = browser.new_context(
        viewport={'width': 1920, 'height': 1080},
        user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
        locale='en-US',
        timezone_id='America/New_York',
    )

    page = context.new_page()
    stealth_sync(page)

    # Scroll gradually like a human would
    page.goto('https://facebook.com')
    for i in range(5):
        page.evaluate(f'window.scrollTo(0, {i * 300})')
        page.wait_for_timeout(800)

Which approach to use

Not every site needs stealth:

  1. Static HTML: requests + BeautifulSoup is enough
  2. JavaScript-rendered content, no bot detection: plain Playwright
  3. 403s or empty responses after a few requests: add playwright-stealth
  4. Aggressive fingerprinting: stealth plus a realistic context (viewport, user agent, locale) and human-like scrolling

Writing this down because I know I'll forget the stealth configuration next time I need it. The code recorder alone is worth the setup time - no more guessing selectors or debugging why `element.click()` keeps failing.