Tweepy: Scraping Twitter After the API Price Hike

I needed to scrape 50K tweets for a research project. Then X/Twitter changed their API pricing in 2023 and the free tier became practically useless. Here's what actually works in 2026.

The API Pricing Problem

Before 2023, Tweepy with the free API tier could fetch thousands of tweets per month. After Elon Musk's acquisition, the free tier lost essentially all read access, and any meaningful read volume moved behind paid tiers.

For my research project, the Basic tier would cost $500. Not happening.
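The arithmetic behind that number is worth making explicit. A minimal sketch, assuming the Basic-tier figures at the time (roughly 10,000 tweet reads per month at about $100/month - verify against the current pricing page before trusting these):

```python
import math

def api_cost(tweets_needed, monthly_read_cap, monthly_price):
    """Back-of-the-envelope cost of reading tweets through the paid API."""
    months = math.ceil(tweets_needed / monthly_read_cap)
    return months, months * monthly_price

months, cost = api_cost(50_000, 10_000, 100)
print(f"{months} months, ${cost}")  # 5 months, $500
```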

Problem

I enabled Tweepy's automatic rate limit handling, but still kept getting 429 errors after about 200 requests. The API was cutting me off before the documented limits.

Error: 429 Too Many Requests - Rate limit exceeded

What I Tried

Attempt 1: Set wait_on_rate_limit=True - Still hit 429 errors
Attempt 2: Added manual delays between requests - Worked but incredibly slow (10 req/min)
Attempt 3: Rotated between multiple bearer tokens - Got the account banned after an hour
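Before settling on a fix, it helps to see what limits you are actually getting: the x-rate-limit-* response headers report the real window, whatever the docs say. A sketch using plain requests (the bearer token is a placeholder):

```python
import requests

def check_rate_limit(bearer_token, query="test"):
    """Make one recent-search request and report the rate-limit headers."""
    resp = requests.get(
        "https://api.twitter.com/2/tweets/search/recent",
        params={"query": query, "max_results": 10},
        headers={"Authorization": f"Bearer {bearer_token}"},
        timeout=10,
    )
    return {
        "status": resp.status_code,
        "limit": resp.headers.get("x-rate-limit-limit"),
        "remaining": resp.headers.get("x-rate-limit-remaining"),
        "reset": resp.headers.get("x-rate-limit-reset"),
    }
```

If "remaining" hits zero long before the documented limit, you have found your real ceiling.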

Actual Fix

The issue was that Twitter's v2 API has undocumented rate limits that differ from v1.1. The fix combines proper authentication with aggressive rate limiting.

import tweepy
import time

# Use OAuth 1.0a user context instead of bearer token
# User context has higher rate limits than app-only auth
client = tweepy.Client(
    bearer_token="YOUR_BEARER_TOKEN",
    consumer_key="YOUR_CONSUMER_KEY",
    consumer_secret="YOUR_CONSUMER_SECRET",
    access_token="YOUR_ACCESS_TOKEN",
    access_token_secret="YOUR_ACCESS_TOKEN_SECRET",
    wait_on_rate_limit=True
)

# Conservative wrapper: paginate with next_token and keep a safety margin
def safe_get_tweets(query, max_results=100):
    tweets = []
    next_token = None
    try:
        for _ in range(30):  # cap total requests well below documented limits
            response = client.search_recent_tweets(
                query=query,
                max_results=min(max_results, 100),
                next_token=next_token,
                tweet_fields=['created_at', 'public_metrics']
            )
            if response.data:
                tweets.extend(response.data)

            # Stop when there are no more pages
            next_token = response.meta.get('next_token')
            if not next_token:
                break

            # 2-second delay between requests as an extra safety margin
            time.sleep(2)

    except tweepy.errors.TooManyRequests:
        print("Hit rate limit, waiting 15 min...")
        time.sleep(900)

    return tweets
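Repeated search calls can still return overlapping pages, so it's worth deduplicating by tweet ID before saving. A small helper (assumes objects with an `id` attribute, as Tweepy's Tweet objects have):

```python
def dedupe_by_id(tweets):
    """Keep the first occurrence of each tweet ID, preserving order."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet.id not in seen:
            seen.add(tweet.id)
            unique.append(tweet)
    return unique
```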

Problem

Some v2 endpoints returned 403 Forbidden even with valid authentication. Search worked, but user lookup and timeline fetching failed.

Error: 403 Forbidden - Endpoint not accessible with current access level

What I Tried

Attempt 1: Regenerated bearer token - Same 403 errors
Attempt 2: Switched to OAuth 1.0a user context - Some endpoints worked, others still 403
Attempt 3: Checked Twitter Developer Portal - My access level was "Free" (most restricted)
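A quick way to map what your tier actually permits is to probe each endpoint and record the status code: 200 means allowed, 403 means your access level blocks it. A sketch (the endpoint list is illustrative, not exhaustive, and the token is a placeholder):

```python
import requests

ENDPOINTS = {
    "recent search": "https://api.twitter.com/2/tweets/search/recent?query=test&max_results=10",
    "user lookup": "https://api.twitter.com/2/users/by/username/TwitterDev",
}

def probe_access(bearer_token):
    """Return {endpoint name: HTTP status} for the current access level."""
    headers = {"Authorization": f"Bearer {bearer_token}"}
    return {
        name: requests.get(url, headers=headers, timeout=10).status_code
        for name, url in ENDPOINTS.items()
    }
```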

Actual Fix

The 403 errors appeared because the Free tier doesn't grant access to most v2 endpoints. I had to use a hybrid approach: scraping publicly available data without authentication where possible.

# For public tweets, use the search endpoint (available on Free tier)
# For user data, you'll need a different approach

import requests
from bs4 import BeautifulSoup

def scrape_user_profile(username):
    """
    Fallback: Scrape public profile data without API
    Only works for publicly available information
    """
    url = f"https://twitter.com/{username}"
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
    }

    response = requests.get(url, headers=headers, timeout=10)
    if response.status_code == 200:
        soup = BeautifulSoup(response.text, 'html.parser')
        # Parse public data from the HTML
        # Note: Twitter renders most content with JavaScript, so a plain
        # requests call often gets an empty shell, and the markup changes
        # frequently - treat this as best-effort only
        return {
            'username': username,
            'public_data': 'extract from HTML'
        }

    return None

Problem

When using the filtered stream endpoint, the connection would drop after ~30 seconds with a "Stream ended" message. No errors, just silent disconnection.

What I Tried

Attempt 1: Added keep-alive ping - Didn't help
Attempt 2: Increased timeout values - Connection still dropped
Attempt 3: Tried different rules - Same issue

Actual Fix

Twitter's streaming API on the Free tier has severe restrictions. The solution is to use polling with exponential backoff instead of streaming.

import tweepy
import time

def poll_tweets(query, interval=30):
    """
    Polling alternative to the streaming API.
    More reliable on the free tier; since_id keeps each poll
    limited to tweets newer than the last one already seen.
    """
    client = tweepy.Client(bearer_token="YOUR_TOKEN")

    since_id = None
    backoff = interval

    while True:
        try:
            response = client.search_recent_tweets(
                query=query,
                max_results=100,
                since_id=since_id,
                tweet_fields=['created_at', 'author_id']
            )

            if response.data:
                # Remember the newest ID so the next poll skips duplicates
                since_id = max(tweet.id for tweet in response.data)
                for tweet in response.data:
                    yield tweet

            # Reset backoff after a successful poll, then wait
            backoff = interval
            time.sleep(interval)

        except tweepy.errors.TooManyRequests:
            # Double the wait on each consecutive hit, capped at 15 min
            backoff = min(backoff * 2, 900)
            print(f"Rate limited. Waiting {backoff}s...")
            time.sleep(backoff)

        except Exception as e:
            print(f"Error: {e}")
            time.sleep(60)
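For reference, "exponential backoff" means doubling the wait after each consecutive rate-limit hit, capped at the 15-minute rate-limit window. A standalone sketch of that schedule:

```python
def backoff_schedule(base, cap=900, steps=6):
    """Successive wait times (seconds) after consecutive rate-limit hits."""
    waits = []
    wait = base
    for _ in range(steps):
        wait = min(wait * 2, cap)
        waits.append(wait)
    return waits

print(backoff_schedule(30))  # [60, 120, 240, 480, 900, 900]
```

The cap matters: Twitter's search rate limits reset on a 15-minute window, so waiting longer than 900 seconds buys nothing.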

What I Learned

API-Free Alternatives

When the official API won't cut it, here are alternatives that still work in 2026:

1. Nitter Instances (Public Frontends)

Nitter is an open-source Twitter frontend. Public instances don't require authentication:

import requests
from bs4 import BeautifulSoup

def scrape_with_nitter(username):
    """
    Use public Nitter instances to avoid API entirely
    Note: Instances go down frequently, need fallback list
    """
    instances = [
        "nitter.net",
        "nitter.poast.org",
        "nitter.privacydev.net"
    ]

    for instance in instances:
        try:
            url = f"https://{instance}/{username}"
            headers = {'User-Agent': 'Mozilla/5.0'}
            response = requests.get(url, headers=headers, timeout=10)

            if response.status_code == 200:
                soup = BeautifulSoup(response.text, 'html.parser')
                # Parse tweets from the HTML; extract_tweet_data is a
                # placeholder for a parser written against Nitter's markup
                tweets = soup.find_all('div', class_='timeline-item')
                return [extract_tweet_data(t) for t in tweets]

        except requests.RequestException:
            continue

    return None

2. Browser Automation (Last Resort)

For limited data needs, undetected-chromedriver can work:

import undetected_chromedriver as uc
from selenium.webdriver.common.by import By
import time

def scrape_with_selenium(url):
    """
    Only use for small-scale scraping
    Twitter detects automation quickly
    """
    driver = uc.Chrome()
    try:
        driver.get(url)

        # Wait for manual login if needed
        time.sleep(10)

        # Scroll to load more tweets
        for _ in range(5):
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(2)

        # Extract tweet elements; parse text/metadata from each as needed
        tweets = driver.find_elements(By.CSS_SELECTOR, '[data-testid="tweet"]')
        return [t.text for t in tweets]
    finally:
        driver.quit()
Production Setup That Works

Here's my final setup that reliably fetches tweets without hitting API limits:

# twitter_scraper.py - Production configuration

import tweepy
import time
from typing import Generator, Optional
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

class TwitterScraper:
    def __init__(self, bearer_token: str, consumer_key: Optional[str] = None,
                 consumer_secret: Optional[str] = None, access_token: Optional[str] = None,
                 access_token_secret: Optional[str] = None):
        """
        Initialize with OAuth 1.0a user context for better rate limits
        Falls back to bearer-only if user context not provided
        """
        self.client = tweepy.Client(
            bearer_token=bearer_token,
            consumer_key=consumer_key,
            consumer_secret=consumer_secret,
            access_token=access_token,
            access_token_secret=access_token_secret,
            wait_on_rate_limit=True
        )

    def fetch_tweets(self, query: str, max_tweets: int = 1000) -> Generator:
        """
        Fetch tweets with built-in rate limit protection

        Args:
            query: Search query
            max_tweets: Maximum tweets to fetch

        Yields:
            Tweet objects
        """
        collected = 0
        next_token = None

        while collected < max_tweets:
            try:
                response = self.client.search_recent_tweets(
                    query=query,
                    # The API requires max_results between 10 and 100
                    max_results=max(10, min(100, max_tweets - collected)),
                    next_token=next_token,
                    tweet_fields=['created_at', 'public_metrics', 'author_id']
                )

                if not response.data:
                    logger.info("No more tweets available")
                    break

                for tweet in response.data:
                    yield tweet
                    collected += 1
                    if collected >= max_tweets:
                        return

                next_token = response.meta.get('next_token')
                if not next_token:
                    break

                # Conservative delay between requests
                time.sleep(3)

            except tweepy.errors.TooManyRequests:
                logger.warning("Rate limit hit, waiting 15 min")
                time.sleep(900)

            except Exception as e:
                logger.error(f"Error fetching tweets: {e}")
                time.sleep(60)

# Usage
if __name__ == "__main__":
    scraper = TwitterScraper(
        bearer_token="YOUR_TOKEN",
        consumer_key="YOUR_KEY",
        consumer_secret="YOUR_SECRET"
    )

    for tweet in scraper.fetch_tweets("python programming", max_tweets=500):
        # author_id is a numeric ID, not a handle; look up usernames separately
        print(f"[{tweet.author_id}] {tweet.text[:100]}...")

Monitoring & Debugging

When scraping Twitter at scale, keep a close eye on recurring 429/403 responses, a dwindling x-rate-limit-remaining header, and silent stream disconnects.

Debugging Checklist

# Test authentication (note: /2/users/me requires a user-context token)
curl -i "https://api.twitter.com/2/users/me" \
  -H "Authorization: Bearer $BEARER_TOKEN"

# Test the search endpoint
curl -i "https://api.twitter.com/2/tweets/search/recent?query=test" \
  -H "Authorization: Bearer $BEARER_TOKEN"

# The -i flag prints response headers; look for:
#   x-rate-limit-limit, x-rate-limit-remaining, x-rate-limit-reset

⚠️ Legal Note

Web scraping Twitter's data may violate their Terms of Service. This article is for educational purposes. Always check Twitter's current ToS and API terms before scraping. Consider using the official API with appropriate licensing for production use.