Convert JSON to CSV with Python: A Practical Developer's Guide

Learn practical methods to convert JSON to CSV using Python, covering pure Python and pandas approaches, flattening nested data, encoding choices, and performance considerations.

MyDataTables Team · 5 min read
Quick Answer

To convert JSON to CSV in Python, use either the standard library (json + csv) or the pandas library. The quick steps: load the JSON, flatten it if needed, and write rows to a CSV file. This article demonstrates both approaches with practical code, including encoding considerations and basic validation for reliability across common datasets.

Why Python is a natural choice for JSON to CSV

Python is a popular choice for data tasks like converting JSON to CSV thanks to its readable syntax, mature standard library, and strong ecosystem. This section covers why Python's json and csv modules provide a reliable baseline for JSON-to-CSV conversion, and how to structure scripts for clarity and reuse.

Python
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# If the JSON is a dictionary with a list under a key, normalize it
if isinstance(data, dict):
    data = data.get('items', [])  # adapt to your data structure

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    if not data:
        raise SystemExit('No data found')
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
  • Key takeaways:
  • Use DictWriter to automatically handle header row and mapping keys to columns.
  • Ensure consistent field order by fixing fieldnames or sorting keys before writing.

This approach works well for flat JSON arrays and keeps dependencies minimal. For simple datasets, the standard library is often the most portable choice.

Two common approaches: pure Python vs pandas

Two common pathways exist for converting JSON to CSV in Python: a pure-Python approach with the json and csv modules, and a pandas-based approach that leverages built-in JSON reading and DataFrame serialization for convenience and speed. The pure-Python route gives full control with minimal dependencies, while pandas simplifies handling irregular shapes and nested data with built-in normalization helpers.

Python
# Pure Python example (same data as Block 1)
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

if isinstance(data, dict):
    data = data.get('items', [])

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
Python
# Pandas approach
import pandas as pd

df = pd.read_json('data.json')
df.to_csv('data.csv', index=False)
  • When data is nested, pandas often offers easier flattening with json_normalize.
  • If you prefer zero dependencies, stick to the pure-Python method; for rapid prototyping, pandas is a strong choice.

Flattening nested JSON for CSV output

Many real-world JSON datasets contain nested objects or arrays. CSV, however, is tabular, so you must flatten or normalize nested structures before writing rows. Here are two robust approaches.

Python
# Method A: pandas json_normalize
import json
from pandas import json_normalize

with open('nested.json', 'r', encoding='utf-8') as f:
    raw = json.load(f)

flat = json_normalize(raw)
flat.to_csv('flattened.csv', index=False)
Python
# Method B: pure-Python flatten
import json, csv

def flatten_dict(d, parent_key='', sep='_'):
    items = {}
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten_dict(v, new_key, sep=sep))
        else:
            items[new_key] = v
    return items

with open('nested.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

flattened = [flatten_dict(item) for item in data]

with open('flattened.csv', 'w', newline='', encoding='utf-8') as f:
    fieldnames = sorted(flattened[0].keys())
    w = csv.DictWriter(f, fieldnames=fieldnames)
    w.writeheader()
    w.writerows(flattened)
  • Flattening with pandas is convenient for complex schemas but requires understanding of json_normalize; the pure-Python path gives full control and avoids extra dependencies.
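
The flatten_dict helper above handles nested dictionaries but leaves lists untouched. One hedged extension, sketched below, JSON-encodes list values so each list lands in a single CSV cell; that encode-as-string policy is an assumption, and exploding lists into multiple rows is an equally valid choice:

```python
import json

def flatten_dict(d, parent_key='', sep='_'):
    """Flatten nested dicts; JSON-encode lists so each fits in one CSV cell."""
    items = {}
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten_dict(v, new_key, sep=sep))
        elif isinstance(v, list):
            # Assumed policy: keep the list as a JSON string in a single column
            items[new_key] = json.dumps(v)
        else:
            items[new_key] = v
    return items

row = flatten_dict({'id': 1, 'user': {'name': 'Alice'}, 'tags': ['a', 'b']})
# row == {'id': 1, 'user_name': 'Alice', 'tags': '["a", "b"]'}
```

The JSON-string column can later be parsed back with json.loads if a downstream step needs the original list.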

Streaming large JSON files to CSV

For very large JSON datasets, loading everything into memory is impractical. Streaming can prevent out-of-memory errors and reduce peak memory usage. Below are two common strategies.

Python
# Strategy A: JSON Lines (one JSON object per line)
import json, csv

with open('large.jsonl', 'r', encoding='utf-8') as f, \
     open('large.csv', 'w', newline='', encoding='utf-8') as out:
    writer = None
    for line in f:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        obj = json.loads(line)
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=obj.keys())
            writer.writeheader()
        writer.writerow(obj)
Python
# Strategy B: iterative parsing with ijson (requires an extra package)
import ijson, csv

with open('large.json', 'r', encoding='utf-8') as f, \
     open('large.csv', 'w', newline='', encoding='utf-8') as out:
    writer = None
    for obj in ijson.items(f, 'item'):
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(obj.keys()))
            writer.writeheader()
        writer.writerow(obj)
  • JSON Lines is memory-friendly for simple objects; ijson enables streaming from standard JSON arrays but adds an external dependency.
  • When streaming, predefine fieldnames if possible to avoid repeated key detection and improve performance.
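
As the second bullet suggests, predefining fieldnames avoids repeated key detection. A minimal sketch, assuming a fixed schema is known up front (the FIELDNAMES list and sample lines here are hypothetical): restval fills missing keys and extrasaction='ignore' silently drops unexpected ones.

```python
import csv, io, json

# Hypothetical fixed schema, known before streaming begins
FIELDNAMES = ['id', 'name', 'email']

lines = [
    '{"id": 1, "name": "Alice", "email": "a@example.com", "extra": "ignored"}',
    '{"id": 2, "name": "Bob"}',
]

out = io.StringIO()
# restval fills missing keys; extrasaction='ignore' drops unknown ones
writer = csv.DictWriter(out, fieldnames=FIELDNAMES,
                        restval='', extrasaction='ignore')
writer.writeheader()
for line in lines:
    writer.writerow(json.loads(line))
```

With the schema fixed, every row writes in one pass regardless of which keys each record happens to carry.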

Encoding, CSV dialects, and edge cases

Encoding choices can affect compatibility with downstream tools like Excel. Use UTF-8 by default, and consider including a UTF-8 BOM for Excel compatibility. Also be mindful of CSV dialects when sharing across teams.

Python
# UTF-8 with BOM for Excel friendliness
import pandas as pd

pd.read_json('data.json').to_csv('data.csv', index=False, encoding='utf-8-sig')
Python
# Dialects example with the csv module
import csv

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'name', 'email'], dialect='excel')
    writer.writeheader()
    writer.writerow({'id': 1, 'name': 'Alice', 'email': 'alice@example.com'})
  • If some records are missing fields, decide whether to fill them with empty strings or drop those rows. DictWriter fills missing keys with its restval (empty by default) as long as every possible column appears in fieldnames, so consider a preprocessing pass that computes the union of keys across all records.
  • For large datasets, write headers once and avoid re-creating writers in a loop to minimize overhead.
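
Beyond the built-in 'excel' dialect, the csv module also lets you register a custom dialect once and reuse it by name. A small sketch, assuming a hypothetical team convention of semicolon delimiters with full quoting:

```python
import csv, io

# Hypothetical team convention: semicolon-delimited, every field quoted
csv.register_dialect('team', delimiter=';', quoting=csv.QUOTE_ALL)

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=['id', 'name'], dialect='team')
writer.writeheader()
writer.writerow({'id': 1, 'name': 'Alice'})
```

Registering the dialect centrally keeps every script on the team producing the same CSV flavor.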

End-to-end example: from input.json to output.csv

This section demonstrates a complete, runnable example from a small input.json to a CSV file, including input and output visuals.

JSON
# data.json
[
  {"id": 1, "name": "Alice", "email": "alice@example.com"},
  {"id": 2, "name": "Bob", "email": "bob@example.com"}
]
Python
# json_to_csv.py
import json, csv, sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)

with open(output_file, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)
Bash
# Run (from project root):
python3 json_to_csv.py data.json data.csv
Text
# Output (data.csv)
id,name,email
1,Alice,alice@example.com
2,Bob,bob@example.com
  • This end-to-end example shows a flat JSON structure; if your data contains nested objects, apply flattening steps before writing to CSV.
  • For validation, consider loading the resulting CSV back into Python to verify row counts and field alignment.
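
The validation idea in the second bullet can be sketched as a small round-trip check. The validate_round_trip helper below is hypothetical and compares only row counts and headers; the demonstration uses throwaway temp files rather than the article's data.json:

```python
import csv, json, os, tempfile

def validate_round_trip(json_path, csv_path):
    """Check that a CSV has the same row count and columns as its JSON source."""
    with open(json_path, 'r', encoding='utf-8') as f:
        source = json.load(f)
    with open(csv_path, 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        header = reader.fieldnames
    return len(rows) == len(source) and header == list(source[0].keys())

# Demonstration on throwaway files (paths are placeholders)
tmp = tempfile.mkdtemp()
jpath, cpath = os.path.join(tmp, 'data.json'), os.path.join(tmp, 'data.csv')
sample = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
with open(jpath, 'w', encoding='utf-8') as f:
    json.dump(sample, f)
with open(cpath, 'w', newline='', encoding='utf-8') as f:
    w = csv.DictWriter(f, fieldnames=sample[0].keys())
    w.writeheader()
    w.writerows(sample)

ok = validate_round_trip(jpath, cpath)
```

A fuller check might also compare individual cell values, but counts and headers catch the most common misalignments.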

Troubleshooting common issues

Even small JSON-to-CSV tasks can encounter edge cases. A common issue is inconsistent fields across records: keys that don't appear in fieldnames cause DictWriter to raise a ValueError, and columns can end up misaligned. A robust approach computes the union of fields across all records and fills missing values with empty strings.

Python
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

fieldnames = set()
for item in data:
    fieldnames.update(item.keys())
fieldnames = sorted(fieldnames)

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow({k: row.get(k, '') for k in fieldnames})
  • If you encounter encoding errors, ensure input JSON is UTF-8 and write CSV with an explicit encoding, e.g., encoding='utf-8' or encoding='utf-8-sig' for Excel compatibility.
  • For nested data, flatten first; otherwise, a direct DictWriter will miss nested fields or serialize them as strings, which may not be desirable.

Performance tips and best practices

Performance hinges on data shape, memory, and I/O. Prefer streaming for large arrays, cache the fieldnames, and minimize per-row Python overhead. When possible, convert in a single pass and avoid repeated dictionary lookups.

Python
# Precompute fieldnames and reuse the writer
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)  # assume data is a list of dictionaries

fieldnames = list(data[0].keys())

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
  • If you need to process streaming data, prefer a streaming parser (ijson) and avoid loading the entire dataset into memory.
  • For extremely large datasets, consider a pipeline that writes to CSV in chunks or uses a database export as an intermediate step.
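
The chunked-pipeline idea above can be sketched in pure Python. write_in_chunks is a hypothetical helper that batches rows with itertools.islice, so a generator-backed source never has to be materialized in full:

```python
import csv, io
from itertools import islice

def write_in_chunks(rows, out, fieldnames, chunk_size=1000):
    """Write an iterable of dicts to CSV in fixed-size batches."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        writer.writerows(chunk)

# Demonstration with an in-memory buffer and a lazy generator
out = io.StringIO()
write_in_chunks(({'id': i} for i in range(5)), out, ['id'], chunk_size=2)
```

The same helper works unchanged with a real file handle in place of the StringIO buffer.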

Steps

Estimated time: 45-90 minutes

  1. Prepare the JSON data

    Ensure your JSON data is in a list of objects format (array of dictionaries). If your JSON is a dict with a top-level list, extract that list before converting.

    Tip: Validate the shape with a quick Python snippet: print(type(data)); print(len(data))
  2. Choose a conversion method

    Decide between a pure Python approach (no extra dependencies) or a pandas-based approach for ease of flattening and handling irregular structures.

    Tip: If you expect nested data, start with pandas for easier normalization.
  3. Write or prepare the script

    Create a script like json_to_csv.py that reads the JSON, flattens if needed, and writes a CSV with a stable header.

    Tip: Always define fieldnames to produce deterministic CSV columns.
  4. Run the conversion

    Execute the script from your project root and verify the output CSV is created with the expected columns.

    Tip: Use a dry-run by printing fieldnames and a few rows first.
  5. Validate the CSV

    Read the resulting CSV and confirm there are no missing values or misaligned columns. Handle missing fields by filling empty strings.

    Tip: Cross-check a few sample records manually or with a quick test script.
  6. Handle edge cases

    If your JSON contains nested objects or arrays, apply flattening or normalization before CSV conversion, and consider encoding issues.

    Tip: For Excel compatibility, consider encoding='utf-8-sig'.
Pro Tip: Use DictWriter with a fixed fieldnames list to ensure consistent CSV columns regardless of input order.
Warning: Large JSON files can exhaust memory; prefer streaming or chunked processing for reliability.
Note: When sharing CSVs, choose UTF-8 or UTF-8-SIG to improve compatibility across tools.

Prerequisites

Required

  • Python 3 installed
  • pip package manager
  • Access to a JSON data file (e.g., data.json)
  • Command line or terminal access

Optional

  • pandas (only needed for the DataFrame-based approach)

Commands

Action                                     | Command
Convert JSON to CSV using a Python script  | Requires a script that accepts input/output file paths
Run a one-liner Python command             | Inline script to convert without a separate file
Convert via pandas for quick results       | Requires pandas; good for quick exploration

People Also Ask

Can I convert JSONL (newline-delimited JSON) to CSV?

Yes. For JSON Lines, read each line as a separate JSON object, then write to CSV with a DictWriter. This avoids loading the entire dataset into memory and is a common pattern for large data streams.

What if the JSON contains nested objects or arrays?

Flatten the data before writing to CSV. Use pandas.json_normalize or a custom recursive flatten function to convert nested keys into top-level columns.

Should I use pandas for all JSON-to-CSV tasks?

Pandas is convenient for complex or nested data and rapid development. For simple, flat JSON, the standard library approach is lighter and more portable.

How do I handle missing keys in some records?

Precompute the union of keys across all records and fill missing fields with empty strings to avoid KeyError during write.

What encoding should I use for CSV output?

UTF-8 is standard. Use UTF-8-SIG if you need to open the file in Excel without data corruption.

Can I verify the CSV output automatically?

Yes. Load the generated CSV back into Python and compare the number of rows and fieldnames with the input data.

Main Points

  • Use Python's json and csv modules for a dependency-light workflow
  • Pandas simplifies nested JSON handling and rapid prototyping
  • Flatten nested structures before CSV to ensure clean tabular data
  • Prefer streaming for large datasets to prevent memory issues
  • Choose proper encoding (UTF-8 or UTF-8-SIG) for cross-tool compatibility
