If You're Tired of Getting Blocked by Cloudflare, This Article is For You
You know the feeling. You write a perfectly good Selenium script, test it locally, everything works great. Then you deploy it and suddenly — Access Denied, Checking your browser before accessing..., or the dreaded 403 Forbidden.
Cloudflare and other anti-bot systems have gotten really good at detecting Selenium. The browser automation fingerprints are obvious. Undetected ChromeDriver helps for a bit, but it's an arms race you'll eventually lose.
That's where DrissionPage comes in. It's a Python library that combines browser automation with requests-like efficiency. The key difference: it uses a different underlying mechanism that doesn't trigger the same red flags.
Bottom line up front:
In my tests on sites that blocked Selenium within seconds, DrissionPage ran for hours without detection. The difference is night and day.
Selenium vs DrissionPage: The Detection Problem
Let me show you what I mean. Here's a typical Selenium setup trying to scrape a Cloudflare-protected site:
❌ Traditional Selenium Approach (Gets Blocked)
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument('user-agent=Mozilla/5.0...')
driver = webdriver.Chrome(options=options)
driver.get('https://example.com')
# Result: Cloudflare challenge page
# "Checking your browser before accessing..."
# Eventually: 403 Forbidden
Even with Undetected ChromeDriver, sophisticated detection still catches you. Now let's look at DrissionPage:
✅ DrissionPage Approach (Bypasses Detection)
from DrissionPage import ChromiumPage
# That's it. No special config needed.
page = ChromiumPage()
page.get('https://example.com')
# Result: Page loads normally
# Content accessible, no challenges
Here's a comparison from my tests on a real e-commerce site:
| Metric | Selenium | DrissionPage |
|---|---|---|
| Initial Page Load | Challenge page | Direct access |
| After 10 requests | 403 Forbidden | Still working |
| After 100 requests | IP banned | Still working |
| Memory usage | ~450MB | ~280MB |
| Setup complexity | High (drivers, options) | Low (pip install only) |
Installation
DrissionPage is easier to install than Selenium since you don't need to manage separate browser drivers:
pip install DrissionPage
That's literally it. No ChromeDriver downloads, no PATH configuration, no version matching headaches. DrissionPage manages the browser itself.
To verify the installation:
python -c "from DrissionPage import ChromiumPage; print('OK')"
Browser Requirements
DrissionPage uses the Chrome/Edge browser installed on your system. If you don't have Chrome installed, it will download a portable version automatically (around 100MB).
For headless environments (servers), you can install Chromium:
# Ubuntu/Debian
sudo apt-get install chromium-browser
# Or use the built-in download
python -c "from DrissionPage import ChromiumPage; p = ChromiumPage()"
Basic Usage
DrissionPage's API is simpler than Selenium. Here's the core pattern:
from DrissionPage import ChromiumPage
# Initialize browser
page = ChromiumPage()
# Navigate
page.get('https://example.com')
# Find elements (CSS or XPath)
title = page.ele('css:h1').text
button = page.ele('text:Submit')
# Interact
button.click()
page.ele('css:#search').input('query')
# Wait for elements
page.wait.load_start()
element = page.ele('css:.dynamic-content', timeout=10)
Key Differences from Selenium
- No WebElement objects: Direct access to properties via `.text`, `.attr()`, `.html`
- Smart selectors: Use `css:`, `xpath:`, `text:`, `tag:` prefixes
- Better waiting: Built-in intelligent waits that reduce flakiness
- Faster execution: Less overhead than Selenium's WebDriver protocol
Bypassing Cloudflare: Real Examples
This is where DrissionPage really shines. Let me walk through two real scenarios I tested.
Example 1: E-commerce Price Monitoring
I needed to monitor prices on a major retailer's site. Selenium immediately hit the Cloudflare challenge. DrissionPage handled it smoothly:
from DrissionPage import ChromiumPage, ChromiumOptions
import time

def monitor_price(url):
    # Headless mode for production (configured via ChromiumOptions)
    options = ChromiumOptions()
    options.headless()
    page = ChromiumPage(options)
    try:
        page.get(url)
        # Wait for the product page to finish loading
        page.wait.doc_loaded()
        # Extract the price
        price_elem = page.ele('css:.price-current')
        if price_elem:
            return float(price_elem.text.replace('$', '').replace(',', ''))
    except Exception as e:
        print(f"Error: {e}")
    finally:
        page.quit()
    return None
# Run every 5 minutes without getting blocked
while True:
    price = monitor_price('https://example-retailer.com/product/123')
    if price is not None:
        print(f"Current price: ${price}")
    time.sleep(300)  # 5 minutes
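The loop above retries just as aggressively after a failure as after a success. A small retry wrapper with exponential backoff keeps transient failures from turning into a ban. This is a plain-Python sketch; `with_retries` and its parameters are my own names, not DrissionPage APIs:

```python
import random
import time

def with_retries(fetch, attempts=3, base_delay=2.0):
    """Call fetch() until it returns a value, backing off between failures.

    fetch is any zero-argument callable that returns None (or raises) on
    failure -- for example, lambda: monitor_price(url).
    """
    for attempt in range(attempts):
        try:
            result = fetch()
            if result is not None:
                return result
        except Exception:
            pass
        if attempt < attempts - 1:
            # Exponential backoff with jitter: ~2s, ~4s, ~8s, ...
            time.sleep(base_delay * (2 ** attempt + random.random()))
    return None
```

Swapping the direct `monitor_price(...)` call for `with_retries(lambda: monitor_price(url))` tolerates the occasional timeout without shortening the polling interval.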
Example 2: Airline Flight Search
Airline sites have some of the toughest anti-bot defenses. Here's a script that successfully searches for flights:
from DrissionPage import ChromiumPage

def search_flights(origin, destination, date):
    page = ChromiumPage()
    page.get('https://example-airline.com/search')
    # Fill the search form (call .input() on each element)
    page.ele('css:#origin').input(origin)
    page.ele('css:#destination').input(destination)
    page.ele('css:#date').input(date)
    # Click the search button
    page.ele('css:button[type="submit"]').click()
    # Wait for results
    page.wait.ele_displayed('css:.flight-results', timeout=15)
    # Extract results
    flights = []
    for row in page.eles('css:.flight-row'):
        flights.append({
            'airline': row.ele('css:.airline').text,
            'departure': row.ele('css:.departure').text,
            'price': row.ele('css:.price').text,
        })
    page.quit()
    return flights

results = search_flights('JFK', 'LAX', '2024-03-15')
print(results)
Both scripts would fail with Selenium due to bot detection. DrissionPage bypasses the checks because it drives the browser over the Chrome DevTools Protocol (CDP) rather than the WebDriver protocol that detection scripts fingerprint.
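Even when the bypass works, it's worth verifying that you received real content rather than an interstitial. A minimal heuristic check (the marker strings below are illustrative; tune them for the sites you target):

```python
def looks_like_challenge(html, title=''):
    """Guess whether a response is an anti-bot challenge page."""
    markers = (
        'checking your browser',
        'just a moment',
        'attention required',
        'cf-challenge',
    )
    haystack = (html + ' ' + title).lower()
    # A hit on any marker suggests a challenge page, not real content
    return any(marker in haystack for marker in markers)
```

After `page.get(url)`, something like `looks_like_challenge(page.html, page.title)` tells you when to back off or rotate proxies instead of parsing a challenge page as if it were data.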
Advanced Anti-Detection Techniques
Session Persistence
DrissionPage can save and restore browser sessions, which helps avoid repeated challenges:
from DrissionPage import ChromiumPage
import pickle
# First run - save session
page = ChromiumPage()
page.get('https://protected-site.com')
# ... solve any initial challenges ...
session_data = page.cookies()
# Save to file
with open('session.pkl', 'wb') as f:
    pickle.dump(session_data, f)
# Subsequent runs - load session
page = ChromiumPage()
with open('session.pkl', 'rb') as f:
    session_data = pickle.load(f)
page.set.cookies(session_data)
# Now direct access without challenges
page.get('https://protected-site.com/dashboard')
Proxy Rotation
For high-volume scraping, rotate proxies to avoid IP-based rate limiting:
from DrissionPage import ChromiumPage, ChromiumOptions

proxies = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
    'http://proxy3.example.com:8080',
]

for proxy in proxies:
    # Proxies are set through ChromiumOptions, not a ChromiumPage keyword
    options = ChromiumOptions()
    options.set_proxy(proxy)
    page = ChromiumPage(options)
    page.get('https://example.com')
    # Do your scraping
    # ...
    page.quit()
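Cycling through a fixed list restarts from scratch every run and keeps handing out proxies that are already banned. A small round-robin pool makes rotation reusable; `ProxyPool` is a hypothetical plain-Python helper, not part of DrissionPage:

```python
class ProxyPool:
    """Round-robin proxy picker that can retire banned proxies."""

    def __init__(self, proxies):
        self._alive = list(proxies)
        self._i = 0

    def next_proxy(self):
        if not self._alive:
            raise RuntimeError('all proxies retired')
        # Walk the live list in order, wrapping around at the end
        proxy = self._alive[self._i % len(self._alive)]
        self._i += 1
        return proxy

    def retire(self, proxy):
        # Drop a proxy that keeps failing (403s, timeouts, bans)
        if proxy in self._alive:
            self._alive.remove(proxy)
```

Feed `pool.next_proxy()` into `ChromiumOptions.set_proxy()` before each new browser, and call `pool.retire(proxy)` once a proxy starts returning challenges.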
Stealth Mode
DrissionPage minimizes automation indicators by default because it drives the browser over the DevTools Protocol rather than WebDriver. You can tune the browser further through ChromiumOptions:

from DrissionPage import ChromiumPage, ChromiumOptions

options = ChromiumOptions()
options.headless()  # run without a visible window
# Pass extra anti-detection flags to the browser
options.set_argument('--disable-blink-features=AutomationControlled')
options.set_argument('--disable-dev-shm-usage')
options.set_argument('--no-sandbox')
page = ChromiumPage(options)
Troubleshooting: Real Issues from GitHub
Here are actual problems developers encountered with DrissionPage and how to fix them:
Issue #1: "Element not found after page load"
Source: GitHub Issue #487
User reported that elements weren't found even though the page appeared loaded. This happens with SPAs (React, Vue, etc.) where content loads after the initial page load.
❌ Doesn't work:
page.get('https://react-app.example.com')
element = page.ele('css:.dynamic-content') # Returns None
✅ Fix - Use explicit waits:
page.get('https://react-app.example.com')
# Wait for specific element to appear
element = page.ele('css:.dynamic-content', timeout=15)
# Or wait for the document to finish loading first
page.wait.doc_loaded()
element = page.ele('css:.dynamic-content')
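When neither built-in wait fits, you can poll for any condition you can express as a callable. A generic polling helper (plain Python, my own sketch; pass e.g. `lambda: page.ele('css:.dynamic-content')` as the condition):

```python
import time

def wait_for(condition, timeout=15.0, interval=0.5):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns the condition's result, or None on timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        # Sleep briefly between polls to avoid spinning the CPU
        time.sleep(interval)
    return None
```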
Issue #2: "Browser crashes after multiple iterations"
Source: GitHub Issue #512
User's script ran for about 50 iterations then crashed with "Browser not reachable". This is a memory leak issue from not properly closing browser instances.
❌ Doesn't work:
# Memory accumulates with each loop
for url in urls:
    page = ChromiumPage()
    page.get(url)
    # ... scraping ...
    # Missing cleanup!
✅ Fix - Always quit the browser:
# Reuse a single browser instance and quit it once at the end
page = ChromiumPage()
try:
    for url in urls:
        page.get(url)
        # ... scraping ...
finally:
    page.quit()

# Or, if each URL needs a fresh browser, quit inside the loop
for url in urls:
    page = ChromiumPage()
    try:
        page.get(url)
        # ... scraping ...
    finally:
        page.quit()
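If you prefer with-block ergonomics, you can build them yourself around any object that has a `quit()` method. This is a generic standard-library sketch, not a DrissionPage feature:

```python
from contextlib import contextmanager

@contextmanager
def managed_page(factory):
    """Create a page via factory() and guarantee quit() runs, even on errors."""
    page = factory()
    try:
        yield page
    finally:
        # Runs on normal exit AND when the with-body raises
        page.quit()
```

Then `with managed_page(ChromiumPage) as page: page.get(url)` gives the automatic cleanup the broken loop above was missing.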
Issue #3: "Authentication state not persisting"
Source: GitHub Issue #534
User logged in manually but subsequent page loads lost the session. The cookies weren't being saved correctly.
✅ Fix - Proper session management:
from DrissionPage import ChromiumPage
import json
# Save session after login
page = ChromiumPage()
page.get('https://example.com/login')
# ... perform login ...
# Save all cookies (page.cookies() returns a cookies list)
cookies = page.cookies().as_dict()
with open('cookies.json', 'w') as f:
    json.dump(cookies, f)

# Load session in new instance
page = ChromiumPage()
with open('cookies.json', 'r') as f:
    cookies = json.load(f)
page.set.cookies(cookies)
page.get('https://example.com/dashboard')  # Already logged in
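Saved cookies eventually expire server-side, and replaying a dead session just earns you a fresh challenge. Storing a timestamp next to the cookies lets you decide when to log in again. A plain-Python sketch; the function names and the 12-hour default are my own choices:

```python
import json
import time

def save_cookies(cookies, path):
    """Store cookies alongside the time they were captured."""
    with open(path, 'w') as f:
        json.dump({'saved_at': time.time(), 'cookies': cookies}, f)

def load_cookies(path, max_age_hours=12):
    """Return saved cookies, or None if they are older than max_age_hours."""
    with open(path) as f:
        data = json.load(f)
    if time.time() - data['saved_at'] > max_age_hours * 3600:
        return None  # stale: perform a fresh login instead
    return data['cookies']
```

When `load_cookies()` returns None, run the login flow and `save_cookies()` again rather than replaying the old session.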
Common Errors and Quick Fixes
"Browser not found" error:
# Install Chrome, or point DrissionPage at a specific binary via ChromiumOptions
from DrissionPage import ChromiumOptions
options = ChromiumOptions()
options.set_browser_path('/path/to/chrome')
page = ChromiumPage(options)
"Timeout waiting for element":
# Increase timeout or wait differently
element = page.ele('css:.slow-element', timeout=30)
# Or wait for the document to finish loading first
page.wait.doc_loaded()
element = page.ele('css:.slow-element')
"Element click intercepted":
# Use JavaScript click instead
page.run_js('document.querySelector(".button").click()')
# Or scroll into view first
element = page.ele('css:.button')
element.scroll.to_see()
element.click()
Best Practices to Stay Undetected
- Respect rate limits: Even with DrissionPage, don't hammer servers. Add delays between requests.
- Use real user agents: Rotate user agents to match typical browser usage.
- Mimic human behavior: Add random delays, mouse movements, and scrolling.
- Check robots.txt: Respect site policies to avoid legal issues.
- Start with visible mode: Test with headless=False to see what's happening, then switch to headless.
- Handle errors gracefully: Network requests fail more often than you'd expect, so wrap scraping in try/except and retry.
- Monitor for challenges: Check if you're hitting CAPTCHAs and back off if needed.
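Several of these practices boil down to "don't be perfectly regular". A tiny jittered-delay helper covers the rate-limit and random-delay points; it's plain Python, and the default values are illustrative rather than tuned for any particular site:

```python
import random
import time

def human_delay(base=2.0, jitter=3.0, rng=random.random):
    """Sleep for base plus a random fraction of jitter seconds.

    Returns the delay actually used so callers can log it; rng is
    injectable to make the behavior testable.
    """
    delay = base + rng() * jitter
    time.sleep(delay)
    return delay
```

Call `human_delay()` between page loads instead of a fixed `time.sleep(5)`; perfectly identical intervals are themselves a bot signal.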
When DrissionPage Won't Help
Be realistic - DrissionPage is great, but it's not magic. It won't help with:
- IP-based bans (you'll need proxies)
- Account requirements (you still need valid credentials)
- Behavioral analysis (don't act robotic)
- Advanced CAPTCHAs (you may need solving services)
Final Thoughts
DrissionPage has become my go-to for web scraping. The fact that it bypasses Cloudflare without any special configuration is huge - it just works.
The API is cleaner than Selenium, it's faster, and it doesn't get detected as easily. For anyone struggling with bot detection issues, it's worth switching.
That said, it's still relatively new compared to Selenium. The documentation can be sparse, and there are fewer community resources. But the core functionality is solid, and the GitHub issues get resolved quickly.
If you're just doing simple scraping on unprotected sites, BeautifulSoup or requests are still faster and lighter. But for anything with anti-bot protections, DrissionPage saves hours of frustration.
Recommendation:
Next time you hit a Cloudflare wall with Selenium, give DrissionPage a try. The transition is straightforward, and you'll be surprised how much easier it is.
Link to the project: github.com/g1879/DrissionPage