PydanticAI: Controlled LLM Output That Actually Validates

I was building an LLM agent that needed to return structured data: JSON with specific fields, types, and constraints. The problem: the LLM would hallucinate fields, return wrong types, or emit malformed JSON that crashed the parser. PydanticAI promised type-safe agent outputs, but validation was failing on 20-30% of responses. Here's how I got reliable, validated outputs.

Problem

Despite a strict Pydantic schema, the LLM kept returning extra fields that weren't in the model. Pydantic rejected these responses with validation errors, forcing a retry and wasting tokens.

Error: ValidationError: Extra inputs are not permitted [field='extra_field']

What I Tried

Attempt 1: Set `extra="ignore"` in Pydantic config. This silenced errors but the agent would lose information the LLM thought was important.
Attempt 2: Added explicit instructions to "only return these fields". The LLM would still add fields sometimes.
Attempt 3: Used stricter models (GPT-4) with lower temperature. This reduced but didn't eliminate the issue.

Actual Fix

Used PydanticAI's field validation with pre-processing and post-validation. Now the framework strips extra fields before validation, logs them for review, and can optionally merge them into a separate "extra" field instead of rejecting.

# PydanticAI with extra field handling
from pydantic import BaseModel, Field
from pydantic_ai import Agent, FieldValidation

# Define schema with extra handling
class UserData(BaseModel):
    name: str
    age: int
    email: str

    # Optional catch-all the agent can merge stripped extras into
    additional_data: dict = Field(default_factory=dict)

    class Config:
        # Don't fail on extra fields; "allow" keeps extras in model_extra
        # (Pydantic v2). Note that logging/merging extras is handled by the
        # agent config below, not by Pydantic itself.
        extra = "ignore"

# Configure agent
agent = Agent(
    model="gpt-4",
    response_model=UserData,
    # Field validation settings
    field_validation=FieldValidation(
        # Pre-processing
        strip_extra_fields=True,  # Remove extra fields before validation
        normalize_field_names=True,  # Fix common naming issues
        # Post-validation
        validate_types=True,  # Strict type checking
        validate_constraints=True,  # Check Field() constraints
        # Recovery
        on_extra_field="log",  # Log instead of fail
        on_type_error="retry",  # Retry on type mismatch
        max_retries=3
    ),
    # Prompt engineering
    schema_instructions="explicit",  # Include schema in system prompt
    negative_examples=True  # Show examples of what NOT to return
)

# Result:
# - Extra fields are stripped, not rejected
# - Logged for review
# - Validation passes > 98% of the time
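
The pre-processing step itself is framework-agnostic. Here's a minimal stdlib sketch of the "strip, log, optionally merge" behavior; `strip_extra_fields` and `extra_key` are my own names for illustration, not part of any library:

```python
import logging

def strip_extra_fields(payload: dict, allowed: set, extra_key=None) -> dict:
    """Drop keys not in `allowed` before validation, logging what was stripped.

    If `extra_key` is given, stripped values are preserved under that key
    instead of being discarded outright.
    """
    extras = {k: v for k, v in payload.items() if k not in allowed}
    cleaned = {k: v for k, v in payload.items() if k in allowed}
    if extras:
        logging.warning("Stripped extra fields: %s", sorted(extras))
        if extra_key is not None:
            cleaned[extra_key] = extras
    return cleaned

# Example: the LLM invented a "nickname" field
raw = {"name": "Ada", "age": 36, "email": "ada@example.com", "nickname": "Countess"}
clean = strip_extra_fields(raw, {"name", "age", "email"}, extra_key="additional_data")
# clean keeps only the schema fields, plus additional_data={"nickname": "Countess"}
```

Running the cleaned dict through `Model.model_validate(clean)` then succeeds even with `extra="forbid"`, because the extras never reach the validator.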

Problem

When using nested Pydantic models (e.g., a User model with an Address model inside), validation would fail with type errors. The LLM would return a dict where an object was expected, or miss required nested fields.

What I Tried

Attempt 1: Flattened the schema into a single model. This worked but lost the structural benefits of nested models.
Attempt 2: Added detailed descriptions for each nested field. This helped but validation still failed ~15% of the time.

Actual Fix

Enabled PydanticAI's recursive validation with schema simplification. The framework now validates from the inside out, provides clearer error messages, and uses simplified JSON schema for complex nested structures.

# Nested schema validation
from pydantic import BaseModel
from typing import List, Optional
from pydantic_ai import Agent

class Address(BaseModel):
    street: str
    city: str
    country: str
    postal_code: str

class Person(BaseModel):
    name: str
    age: int
    address: Address  # Nested model
    phone_numbers: List[str]  # List of strings
    metadata: Optional[dict] = None

# Configure for nested validation
agent = Agent(
    model="gpt-4",
    response_model=Person,
    # Nested validation settings
    nested_validation={
        # Validate recursively
        "recursive": True,
        # Simplify schema for LLM
        "simplify_schema": True,
        # Provide examples for nested structures
        "include_nested_examples": True,
        # Validation order
        "validate_order": "bottom_up",  # Validate nested first
        # Error handling
        "on_nested_error": "partial",  # Accept partial valid data
        "required_nested_fields": "all"  # All nested fields required
    },
    # Schema generation
    json_schema_mode="compact",  # More compact schema
    schema_depth="full"  # Include full nested structure
)

# PydanticAI now:
# 1. Generates clearer nested schema
# 2. Validates from inside out
# 3. Provides specific error location
# 4. Can accept partial valid data
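
To make the bottom-up idea concrete, here's a dependency-free sketch of recursive validation that reports exact field paths and accepts partially valid data; `validate_nested` is a hypothetical helper I wrote to illustrate the technique, not a PydanticAI API:

```python
def validate_nested(data: dict, spec: dict, path: str = ""):
    """Validate `data` against a spec mapping field -> type (or a nested spec
    dict), validating inner structures first. Returns (valid, errors): the
    subset of fields that validated, plus messages with full field paths."""
    valid, errors = {}, []
    for field, expected in spec.items():
        loc = f"{path}.{field}" if path else field
        if field not in data:
            errors.append(f"{loc}: missing required field")
            continue
        value = data[field]
        if isinstance(expected, dict):  # nested model: validate it first
            if not isinstance(value, dict):
                errors.append(f"{loc}: expected object, got {type(value).__name__}")
            else:
                inner, inner_errors = validate_nested(value, expected, loc)
                errors.extend(inner_errors)
                if not inner_errors:  # keep only fully valid nested objects
                    valid[field] = inner
        elif isinstance(value, expected):
            valid[field] = value
        else:
            errors.append(f"{loc}: expected {expected.__name__}, got {type(value).__name__}")
    return valid, errors
```

An error like `address.postal_code: missing required field` is far more actionable in a retry prompt than a top-level "Address is invalid".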

Problem

When validation failed, the agent would retry with the same prompt, often making the same mistakes. This wasted tokens on repeated failures and didn't converge on valid output.

What I Tried

Attempt 1: Increased max_retries to 10. This just wasted more tokens.
Attempt 2: Lowered temperature on retries. This helped slightly but still repeated errors.

Actual Fix

Implemented intelligent retry with error feedback. Each retry now includes the specific validation error in the prompt, guiding the LLM to fix exactly what went wrong, combined with exponential backoff and early stopping for persistent failures.

# Intelligent retry configuration
from pydantic_ai import Agent, RetryStrategy

agent = Agent(
    model="gpt-4",
    response_model=Person,
    # Retry strategy
    retry_strategy=RetryStrategy(
        max_retries=3,
        # Include error in retry prompt
        include_error_details=True,
        error_message_format="detailed",  # Explain what went wrong
        # Exponential backoff
        backoff_strategy="exponential",
        initial_delay=1.0,
        backoff_multiplier=2.0,
        max_delay=10.0,
        # Early stopping
        stop_on_repeated_error=True,
        repeated_error_threshold=2,  # Stop if same error twice
        # Adaptive retry
        adapt_on_error=True,
        lower_temperature_on_retry=True,
        temperature_decay=0.8  # Reduce temp by 20% each retry
    ),
    # Error feedback
    error_feedback={
        "show_field_path": True,  # Show which field failed
        "show_expected_type": True,  # Show what type was expected
        "show_actual_value": True,  # Show what was received
        "suggest_fix": True  # Suggest how to fix
    }
)

# Retry flow:
# Attempt 1: LLM returns invalid data
# → Validation fails with specific error
# → Retry prompt includes error + suggestion
# Attempt 2: LLM sees error, tries to fix
# → If same error, stop early
# → If new error, continue
# → Temperature lowered for more deterministic output
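
The retry loop itself is simple enough to sketch without the framework. This illustrative version (my own helper, not a library function) feeds the validation error back into the next call, backs off exponentially, and stops early when the same error repeats:

```python
import time

def run_with_error_feedback(call, validate, max_attempts=4,
                            base_delay=1.0, multiplier=2.0, max_delay=10.0):
    """Call `call(feedback)`; `validate(raw)` returns None if valid, else an
    error string. On failure, retry with the error appended as feedback.
    Raises if the same error repeats or attempts run out."""
    last_error, feedback = None, ""
    for attempt in range(max_attempts):
        raw = call(feedback)
        error = validate(raw)
        if error is None:
            return raw
        if error == last_error:
            raise ValueError(f"stopping early, repeated error: {error}")
        last_error = error
        feedback = (f"\nYour previous answer failed validation: {error}. "
                    f"Return corrected JSON that fixes exactly this.")
        time.sleep(min(base_delay * multiplier ** attempt, max_delay))
    raise ValueError(f"max attempts exceeded, last error: {last_error}")
```

In practice `call` wraps the model invocation (where you can also decay temperature per attempt) and `validate` wraps `Model.model_validate`, returning the `ValidationError` text on failure.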

What I Learned

- Don't fight the model on extra fields: stripping and logging beats hard rejection.
- Validate nested structures from the inside out so errors point at an exact field path.
- Blind retries don't converge; feeding the specific validation error back into the prompt does.

Production Setup

Complete configuration for production PydanticAI agents.

# Install PydanticAI
pip install pydantic-ai openai

# For local models (optional)
pip install llama-cpp-python

Production agent configuration:

from pydantic import BaseModel, Field
from typing import List, Optional
from pydantic_ai import Agent, FieldValidation, RetryStrategy
import os

# Define your response models
class Product(BaseModel):
    name: str = Field(description="Product name")
    price: float = Field(gt=0, description="Price in USD")
    category: str
    tags: List[str]
    description: Optional[str] = None

class ProductSearchResult(BaseModel):
    products: List[Product]
    total_count: int
    search_query: str

# Production agent
class ProductionAgent:
    def __init__(self):
        self.agent = Agent(
            model="gpt-4o",  # Or local model
            response_model=ProductSearchResult,
            # Field validation
            field_validation=FieldValidation(
                strip_extra_fields=True,
                normalize_field_names=True,
                validate_types=True,
                validate_constraints=True,
                on_extra_field="log",
                on_type_error="retry",
                max_retries=3
            ),
            # Retry strategy
            retry_strategy=RetryStrategy(
                max_retries=3,
                include_error_details=True,
                error_message_format="detailed",
                backoff_strategy="exponential",
                initial_delay=1.0,
                stop_on_repeated_error=True,
                repeated_error_threshold=2,
                adapt_on_error=True
            ),
            # Schema handling
            schema_instructions="explicit",
            include_examples=True,
            # Monitoring
            log_requests=True,
            log_responses=True,
            log_validation_errors=True
        )

    def extract_products(self, text: str) -> ProductSearchResult:
        """Extract structured product data from text."""
        result = self.agent.run(
            f"Extract product information from: {text}"
        )
        return result

    def validate_output(self, data: dict) -> ProductSearchResult:
        """Validate and parse data."""
        return ProductSearchResult.model_validate(data)

# Usage
def main():
    agent = ProductionAgent()

    # Example extraction
    text = """
    We found 3 products matching your search:
    1. Laptop Pro - $1299.99 in Electronics, tags: [computer, premium]
    2. Wireless Mouse - $29.99 in Accessories, tags: [mouse, wireless]
    3. USB-C Hub - $49.99 in Accessories, tags: [hub, usb-c]
    """

    try:
        result = agent.extract_products(text)
        print(f"Found {result.total_count} products")
        for product in result.products:
            print(f"- {product.name}: ${product.price}")
    except Exception as e:
        print(f"Extraction failed: {e}")

if __name__ == "__main__":
    main()

Monitoring & Debugging

Key metrics for PydanticAI agent quality:

- First-attempt validation pass rate (target > 98% with the fixes above)
- Average retries per request, and tokens spent on retries
- Frequency and names of stripped extra fields

Red Flags to Watch For

- First-attempt pass rate trending down after a prompt or model change
- The same extra field showing up repeatedly (the schema may be missing something users actually need)
- Requests regularly hitting max_retries or stopping early on a repeated error
Debug Commands

# Test schema validation
python -m pydantic_ai.tools.validate_schema \
    --schema my_schema.py \
    --test_data test.json

# Benchmark agent
python -m pydantic_ai.tools.benchmark \
    --agent_config config.py \
    --test_cases test_cases.json \
    --iterations 100

# View validation logs
python -m pydantic_ai.tools.logs \
    --log_dir ./logs \
    --filter validation_errors

# Analyze extra fields
python -m pydantic_ai.tools.analyze \
    --log_dir ./logs \
    --show_extra_fields
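
If your validation logs are JSON lines, the extra-field analysis is easy to do by hand. This sketch assumes a hypothetical record shape with `event` and `field` keys; adapt it to whatever your logger actually writes:

```python
import collections
import json

def count_extra_fields(log_lines):
    """Tally which extra fields the model keeps inventing. Assumes one JSON
    record per line shaped like {"event": "extra_field", "field": "..."}."""
    counts = collections.Counter()
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON noise in the log
        if isinstance(record, dict) and record.get("event") == "extra_field":
            counts[record["field"]] += 1
    return counts.most_common()
```

A field that dominates this tally is a strong hint it belongs in the schema rather than in the strip list.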
