Level 2 · Chapter 1.4

Structured Output Engineering

AI can generate more than prose. This chapter teaches you how to engineer prompts that reliably produce JSON, XML, CSV, and other structured formats that integrate seamlessly with other systems and workflows.

The Power of Structured Output

Most AI output is natural language text. But in professional workflows, you often need data in specific formats: JSON for APIs, CSV for spreadsheets, XML for enterprise systems. Asking a language model for structured output may seem strange at first, but it works well: given clear instructions, models can produce valid JSON, CSV, or XML most of the time.

The advantage is enormous. Instead of getting prose that you manually transform into a structured format, you get directly usable data. This enables automation: AI output feeds directly into your other systems without human transformation. A document processing workflow can extract data directly into your database. A research summary can generate structured findings ready for analysis. This is the foundation of AI-powered automation.

Generating JSON from AI

JSON (JavaScript Object Notation) is the most common structured format. Most modern APIs expect JSON. Here is how to engineer prompts that generate valid JSON:

Be explicit about format. "Generate output in valid JSON format" is better than leaving the format unstated. Specifying the schema is better still: "Generate output as a JSON object with these fields: name (string), email (string), purchase_count (integer), last_purchase_date (string in YYYY-MM-DD format)".

Provide a schema example. Show the exact structure you want: {"name": "John Doe", "email": "john.doe@example.com", "purchase_count": 5, "last_purchase_date": "2026-03-01"}

Specify what happens with edge cases. What if a field is missing or unknown? Should it be null, empty string, or omitted? Being explicit prevents variations in output. "If email is unknown, use null. If purchase_count is unknown, use 0."
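The three techniques above can be combined into one reusable prompt. The following is a minimal sketch, not a definitive template: the FIELDS and EXAMPLE constants and the build_extraction_prompt function are hypothetical names introduced for illustration, and the exact wording of the prompt is an assumption you should adapt to your own task.

```python
import json

# Hypothetical field spec for illustration; field names mirror the example
# above, and each description folds in the edge-case rule for that field.
FIELDS = {
    "name": "string",
    "email": "string, or null if unknown",
    "purchase_count": "integer, or 0 if unknown",
    "last_purchase_date": "string in YYYY-MM-DD format",
}

# An example record that also demonstrates the null rule for email.
EXAMPLE = {
    "name": "John Doe",
    "email": None,
    "purchase_count": 5,
    "last_purchase_date": "2026-03-01",
}


def build_extraction_prompt(source_text: str) -> str:
    """Assemble format instruction, schema, example, and source text."""
    field_lines = "\n".join(f"- {name} ({rule})" for name, rule in FIELDS.items())
    return (
        "Generate output as a single JSON object with these fields:\n"
        f"{field_lines}\n\n"
        "Example of valid output:\n"
        f"{json.dumps(EXAMPLE)}\n\n"
        "Return only the JSON object, with no commentary.\n\n"
        f"Text:\n{source_text}"
    )
```

Keeping the schema in one data structure means the prompt and any later validation code can be generated from the same source, so they are less likely to drift apart.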

With these instructions, the AI will usually generate valid JSON objects that match your schema. This output can be parsed programmatically and, once it passes validation, fed directly into your systems.
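Parsing the model's reply programmatically might look like the sketch below. It uses only the Python standard library; the function name parse_customer is hypothetical, and the checks assume the four-field customer schema described in this section, with null allowed for email.

```python
import json
from datetime import datetime


def parse_customer(raw: str) -> dict:
    """Parse model output and check it against the four-field schema."""
    record = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    expected = {"name", "email", "purchase_count", "last_purchase_date"}
    if set(record) != expected:
        raise ValueError(f"unexpected field set: {sorted(record)}")

    if not isinstance(record["name"], str):
        raise ValueError("name must be a string")
    if record["email"] is not None and not isinstance(record["email"], str):
        raise ValueError("email must be a string or null")

    count = record["purchase_count"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        raise ValueError("purchase_count must be an integer")

    date = record["last_purchase_date"]
    if not isinstance(date, str):
        raise ValueError("last_purchase_date must be a string")
    # strptime raises ValueError if the text does not parse as year-month-day.
    datetime.strptime(date, "%Y-%m-%d")
    return record
```

Because json.JSONDecodeError is a subclass of ValueError, a caller can catch ValueError alone to handle both malformed JSON and schema mismatches.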

CSV Output for Data Analysis

CSV (Comma-Separated Values) is the standard format for spreadsheets and data analysis. Asking an AI to extract data and format it as CSV is powerful for research, data synthesis, and analysis workflows.

Prompt example: "Extract the following information from the text and format as CSV: product name, price, rating (1-5), number of reviews. Use this header row: product_name,price,rating,review_count"

The AI will then generate CSV that can usually be imported directly into Excel or used for further data processing. Values that contain commas, such as some product names, must be wrapped in double quotes, so state that in the prompt. This is particularly valuable for bulk data extraction from documents or research synthesis.
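Before importing model-generated CSV anywhere, it helps to parse it and convert each column to its intended type. The sketch below uses Python's standard csv module; parse_product_csv is a hypothetical name, and treating rating as a decimal number between 1 and 5 is an assumption, since the prompt above does not say whether fractional ratings are allowed.

```python
import csv
import io

EXPECTED_HEADER = ["product_name", "price", "rating", "review_count"]


def parse_product_csv(raw: str) -> list[dict]:
    """Parse CSV text from the model and convert each column to its type."""
    reader = csv.DictReader(io.StringIO(raw.strip()))
    if reader.fieldnames != EXPECTED_HEADER:
        raise ValueError(f"unexpected header: {reader.fieldnames}")

    rows = []
    for row_no, row in enumerate(reader, start=1):
        # DictReader stores extra columns under the key None and fills
        # missing columns with the value None.
        if None in row or None in row.values():
            raise ValueError(f"row {row_no}: wrong number of columns")
        rating = float(row["rating"])
        if not 1 <= rating <= 5:
            raise ValueError(f"row {row_no}: rating out of range")
        rows.append({
            "product_name": row["product_name"],
            "price": float(row["price"]),  # fails on values like "$19.99"
            "rating": rating,
            "review_count": int(row["review_count"]),
        })
    return rows
```

A failed float or int conversion also raises ValueError, so a currency symbol or stray text in a numeric column is caught rather than imported silently.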

Defining Clear Schemas in Prompts

The key to reliable structured output is a clear, unambiguous schema. A schema is the blueprint that defines what fields exist, what type each field is, and any constraints on values.

Simple schema example:

"Generate JSON with this structure: { customer: { name (string), email (string), phone (string) }, order: { order_id (integer), total_amount (number with 2 decimals), items: [{ name (string), quantity (integer), price (number) }] } }"

This schema is clear: it specifies the object structure, the type of each field, and nesting. Given a schema this explicit, the AI's JSON will usually match it.
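The same schema can be written down in code and used to check the structure of a parsed reply. This is a minimal sketch under stated assumptions: the schema notation (a dict for an object, a one-item list for an array, a Python type for a leaf) and the names ORDER_SCHEMA and find_mismatches are invented here for illustration and are not a standard such as JSON Schema.

```python
# Hypothetical schema notation for this sketch: a dict means "object with
# exactly these keys", a one-item list means "array of that shape", and a
# Python type (or tuple of types) means "value of that type".
ORDER_SCHEMA = {
    "customer": {"name": str, "email": str, "phone": str},
    "order": {
        "order_id": int,
        "total_amount": (int, float),
        "items": [{"name": str, "quantity": int, "price": (int, float)}],
    },
}


def find_mismatches(value, schema, path="$"):
    """Return a list of human-readable mismatches between value and schema."""
    if isinstance(schema, dict):
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        problems = [f"{path}.{k}: missing" for k in schema if k not in value]
        problems += [f"{path}.{k}: unexpected" for k in value if k not in schema]
        for key in schema:
            if key in value:
                problems += find_mismatches(value[key], schema[key], f"{path}.{key}")
        return problems
    if isinstance(schema, list):
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        problems = []
        for i, item in enumerate(value):
            problems += find_mismatches(item, schema[0], f"{path}[{i}]")
        return problems
    # Leaf value: bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, schema):
        return [f"{path}: wrong type {type(value).__name__}"]
    return []
```

Returning a list of all mismatches, rather than stopping at the first, makes it easier to see whether a prompt has one recurring problem or several.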

Validation requirements: You can also specify validation rules in the schema: "email must be a valid email format", "phone must be 10 digits", "price must be greater than zero". The AI will generally follow these constraints, but because they are instructions rather than guarantees, check them again in code.
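The three example rules can be checked in code after parsing. The sketch below assumes the input already has the customer and order structure from the schema example; check_rules is a hypothetical name, and the email pattern is deliberately loose, since it is only meant to catch obvious formatting mistakes rather than fully validate addresses.

```python
import re

# Loose pattern: something@something.tld with no spaces or extra "@".
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
PHONE_RE = re.compile(r"[0-9]{10}")


def check_rules(data: dict) -> list[str]:
    """Apply the three example rules to an already-parsed order object."""
    errors = []
    customer = data["customer"]
    if not EMAIL_RE.fullmatch(customer["email"]):
        errors.append("email is not in a valid format")
    if not PHONE_RE.fullmatch(customer["phone"]):
        errors.append("phone must be exactly 10 digits")
    for i, item in enumerate(data["order"]["items"]):
        if item["price"] <= 0:
            errors.append(f"items[{i}].price must be greater than zero")
    return errors
```

An empty list means every rule passed. The error messages are written so they could be sent back to the model in a follow-up prompt asking it to correct its output.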

Handling Formatting Errors

Sometimes the AI generates structured output that is almost valid but has small errors. Here are techniques to minimize this:

1. Explicit examples of valid output. Show what valid output looks like. Show multiple examples if you have variations.

2. Validation instructions. "Double-check that the JSON is valid before sending it" or "Verify all required fields are present and in the correct format".

3. Error recovery instructions. "If you cannot generate valid output, explain why instead of generating invalid output." This prevents garbage output that breaks downstream systems.

4. Testing before deployment. Before using structured output in production, test the prompt on representative inputs and confirm the outputs match your schema exactly. In production, keep validating every output programmatically.
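On the receiving side, two common near-misses are a reply wrapped in a Markdown code fence and a plain-language explanation returned in place of data. The sketch below handles both; extract_json is a hypothetical helper, and the assumption that fences are the main wrapper to strip comes from general experience rather than from any guarantee about a particular model.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code fence marker
FENCE_RE = re.compile(rf"^{FENCE}[a-zA-Z]*\s*\n(.*?)\n?{FENCE}\s*$", re.DOTALL)


def extract_json(reply: str):
    """Return (data, None) on success or (None, reason) on failure."""
    text = reply.strip()
    match = FENCE_RE.match(text)
    if match:  # the model wrapped its answer in a Markdown code fence
        text = match.group(1).strip()
    try:
        return json.loads(text), None
    except json.JSONDecodeError as exc:
        # Covers both malformed JSON and a plain-language explanation
        # returned under an "explain why instead" instruction.
        return None, f"not valid JSON ({exc.msg}): {text[:80]!r}"
```

Returning a reason instead of raising lets the caller log the failure, retry with a corrective prompt, or route the item to a person, without letting bad data reach downstream systems.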

Practical Exercise: Build a Structured Data Pipeline

Choose a text source: Customer emails, research papers, product reviews, or meeting notes.

Define what you want to extract: What structured data would be valuable from this text?

Design your schema: Create a clear JSON or CSV schema for the data you want.

Write your extraction prompt: "Extract the following information from the text and format as JSON matching this schema: [YOUR SCHEMA]. Here is an example of valid output: [EXAMPLE]"

Test it: Feed the prompt and source text to ChatGPT or Claude. Does it generate valid structured output? Refine until it works reliably.
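Once the prompt works interactively, the exercise steps can be turned into a small batch pipeline. The sketch below is one possible shape, not a reference implementation: run_pipeline is a hypothetical name, call_model stands for whatever function you write to send a prompt to your chosen model and return its reply text, and the [TEXT] placeholder in the template is an assumed convention.

```python
import json
from typing import Callable


def run_pipeline(
    documents: list[str],
    prompt_template: str,
    call_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
    required_fields: set[str],
) -> tuple[list[dict], list[tuple[int, str]]]:
    """Run extraction over documents; return (records, failures)."""
    records, failures = [], []
    for index, doc in enumerate(documents):
        reply = call_model(prompt_template.replace("[TEXT]", doc))
        try:
            record = json.loads(reply)
        except json.JSONDecodeError as exc:
            failures.append((index, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            failures.append((index, "expected a JSON object"))
            continue
        missing = required_fields - set(record)
        if missing:
            failures.append((index, f"missing fields: {sorted(missing)}"))
            continue
        records.append(record)
    return records, failures
```

Passing call_model in as a parameter keeps the pipeline independent of any particular provider and lets you test it offline with a stand-in function. The failures list, which records the index of each document that did not yield a usable record, is a direct measure of how reliably the prompt works.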

Key Takeaway

Structured output engineering unlocks AI's integration potential. By specifying clear schemas in your prompts, you enable the AI to generate data in formats that directly feed into your other systems. This transforms AI from a tool that produces prose you manually process into a tool that generates structured data you can automate with.

The investment in designing clear schemas pays dividends across many future applications. Once you have engineered one good JSON extraction prompt, you can reuse and adapt it for similar tasks. Structured output is the bridge between AI's language capabilities and your organization's data systems.

Chapter Details
Reading Time ~50 minutes
Difficulty Intermediate
Prerequisites Chapters 1.1-1.3