Convert JSON to CSV with Python: A Practical Developer's Guide
Learn practical methods to convert JSON to CSV using Python, covering pure Python and pandas approaches, flattening nested data, encoding choices, and performance considerations.

To convert JSON to CSV in Python, use either the standard library (json + csv) or the pandas library. The quick steps are: load JSON, flatten if needed, and write rows to a CSV file. This article demonstrates both approaches with practical code, including encoding considerations and basic validation, for reliability across common datasets.
Why Python is a natural choice for JSON to CSV
Python is a popular choice for data tasks like JSON-to-CSV conversion due to its readable syntax, mature standard library, and strong ecosystem. This section explains why Python's json and csv modules provide a reliable baseline for JSON-to-CSV conversion and how to structure scripts for clarity and reuse.
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# If the JSON is a dictionary with a list under a key, normalize it
if isinstance(data, dict):
    data = data.get('items', [])  # adapt to your data structure

if not data:
    raise SystemExit('No data found')

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

Key takeaways:
- Use DictWriter to automatically handle the header row and map keys to columns.
- Ensure a consistent field order by fixing fieldnames or sorting keys before writing.
This approach works well for flat JSON arrays and keeps dependencies minimal. For simple datasets, the standard library is often the most portable choice.
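Fixing the field order is easy to sketch in isolation: pass an explicit fieldnames list and the columns come out the same regardless of key order in each record. This minimal example uses io.StringIO so it runs without any files:

```python
import csv
import io

# Records whose keys arrive in different orders
rows = [
    {"name": "Alice", "id": 1},
    {"id": 2, "name": "Bob"},
]

# An explicit fieldnames list makes the column order deterministic
fieldnames = ["id", "name"]
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Both records serialize as id-then-name even though the second dict lists id first.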
Two common approaches: pure Python vs pandas
Two common pathways exist for converting JSON to CSV in Python: a pure Python approach with the json and csv modules, and a pandas-based approach that leverages JSON reading and DataFrame serialization for convenience and speed. The pure-Python route gives full control and minimal dependencies, while pandas can simplify handling irregular shapes and nested data with built-in normalization helpers.
# Pure Python example (same data as above)
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
if isinstance(data, dict):
    data = data.get('items', [])

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

# Pandas approach
import pandas as pd

df = pd.read_json('data.json')
df.to_csv('data.csv', index=False)

- When data is nested, pandas often offers easier flattening with json_normalize.
- If you prefer zero dependencies, stick to the pure-Python method; for rapid prototyping, pandas is a strong choice.
Flattening nested JSON for CSV output
Many real-world JSON datasets contain nested objects or arrays. CSV, however, is tabular, so you must flatten or normalize nested structures before writing rows. Here are two robust approaches.
# Method A: pandas json_normalize
import json
from pandas import json_normalize

with open('nested.json', 'r', encoding='utf-8') as f:
    raw = json.load(f)
flat = json_normalize(raw)
flat.to_csv('flattened.csv', index=False)

# Method B: Pure Python flatten
import json, csv

def flatten_dict(d, parent_key='', sep='_'):
    items = {}
    for k, v in d.items():
        new_key = f"{parent_key}{sep}{k}" if parent_key else k
        if isinstance(v, dict):
            items.update(flatten_dict(v, new_key, sep=sep))
        else:
            items[new_key] = v
    return items

with open('nested.json', 'r', encoding='utf-8') as f:
    data = json.load(f)
flattened = [flatten_dict(item) for item in data]
with open('flattened.csv', 'w', newline='', encoding='utf-8') as f:
    fieldnames = sorted(flattened[0].keys())
    w = csv.DictWriter(f, fieldnames=fieldnames)
    w.writeheader()
    w.writerows(flattened)

- Flattening with pandas is convenient for complex schemas but requires understanding of json_normalize; the pure-Python path gives full control and avoids extra dependencies.
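One caveat: the pure-Python flatten_dict descends only into nested dicts, so list values land in a single cell unchanged. If you also want list elements expanded into indexed columns, a sketch along these lines works; the function name, separator, and sample record here are illustrative, not part of the helper above:

```python
def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts and lists; list elements get numeric suffixes."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            new_key = f"{parent_key}{sep}{k}" if parent_key else k
            items.update(flatten(v, new_key, sep=sep))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            new_key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten(v, new_key, sep=sep))
    else:
        items[parent_key] = obj
    return items

record = {"id": 1, "tags": ["a", "b"], "address": {"city": "Oslo"}}
print(flatten(record))
# {'id': 1, 'tags_0': 'a', 'tags_1': 'b', 'address_city': 'Oslo'}
```

Note that records with lists of different lengths will produce different column sets, so combine this with the union-of-keys technique from the troubleshooting section.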
Streaming large JSON files to CSV
For very large JSON datasets, loading everything into memory is impractical. Streaming can prevent out-of-memory errors and reduce peak memory usage. Below are two common strategies.
# Strategy A: JSON Lines (one JSON object per line)
import json, csv

with open('large.jsonl', 'r', encoding='utf-8') as f, \
     open('large.csv', 'w', newline='', encoding='utf-8') as out:
    writer = None
    for line in f:
        obj = json.loads(line)
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=obj.keys())
            writer.writeheader()
        writer.writerow(obj)

# Strategy B: Iterative deserialization with ijson (requires an extra package)
import ijson, csv

with open('large.json', 'rb') as f, \
     open('large.csv', 'w', newline='', encoding='utf-8') as out:
    writer = None
    for obj in ijson.items(f, 'item'):
        if writer is None:
            writer = csv.DictWriter(out, fieldnames=list(obj.keys()))
            writer.writeheader()
        writer.writerow(obj)

- JSON Lines is memory-friendly for simple objects; ijson enables streaming from standard JSON arrays but adds an external dependency.
- When streaming, predefine fieldnames if possible to avoid repeated key detection and improve performance.
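Predefining fieldnames can be sketched like this: DictWriter's restval fills keys a record lacks, and extrasaction='ignore' drops keys outside the schema, so no per-record key detection is needed. The schema and sample lines below are illustrative, and io.StringIO stands in for the input and output files:

```python
import csv
import io
import json

# Simulated JSON Lines input with ragged records
lines = [
    '{"id": 1, "name": "Alice"}',
    '{"id": 2, "name": "Bob", "extra": "dropped"}',
    '{"id": 3}',
]

fieldnames = ["id", "name"]  # known schema, fixed up front
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=fieldnames,
                        restval="", extrasaction="ignore")
writer.writeheader()
for line in lines:
    writer.writerow(json.loads(line))
print(out.getvalue())
```

The missing name in the third record becomes an empty cell, and the unexpected "extra" key is silently discarded instead of raising ValueError.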
Encoding, CSV dialects, and edge cases
Encoding choices can affect compatibility with downstream tools like Excel. Use UTF-8 by default, and consider including a UTF-8 BOM for Excel compatibility. Also be mindful of CSV dialects when sharing across teams.
# UTF-8 with BOM for Excel friendliness
import pandas as pd

pd.read_json('data.json').to_csv('data.csv', index=False, encoding='utf-8-sig')

# Dialects example with the csv module
import csv

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'name', 'email'], dialect='excel')
    writer.writeheader()
    writer.writerow({'id': 1, 'name': 'Alice', 'email': '[email protected]'})

- If some records are missing fields, decide whether to fill with empty strings or drop those rows. DictWriter fills missing keys with its restval (empty string by default) but raises ValueError for keys not listed in fieldnames, so consider a preprocessing step that computes the union of all possible keys.
- For large datasets, write headers once and avoid re-creating writers in a loop to minimize overhead.
End-to-end example: from input.json to output.csv
This section demonstrates a complete, runnable example from a small input.json to a CSV file, including input and output visuals.
# data.json
[
  {"id": 1, "name": "Alice", "email": "[email protected]"},
  {"id": 2, "name": "Bob", "email": "[email protected]"}
]

# json_to_csv.py
import json, csv, sys

input_file = sys.argv[1]
output_file = sys.argv[2]
with open(input_file, 'r', encoding='utf-8') as f:
    data = json.load(f)
with open(output_file, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=data[0].keys())
    writer.writeheader()
    writer.writerows(data)

# Run (from project root):
python3 json_to_csv.py data.json data.csv

# Output (data.csv)
id,name,email
1,Alice,[email protected]
2,Bob,[email protected]

- This end-to-end example shows a flat JSON structure; if your data contains nested objects, apply flattening steps before writing to CSV.
- For validation, consider loading the resulting CSV back into Python to verify row counts and field alignment.
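That validation suggestion can be scripted as a round-trip check. This sketch runs in memory via io.StringIO so it is self-contained; swap in open() calls to check real data.json and data.csv files:

```python
import csv
import io
import json

source_json = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
data = json.loads(source_json)

# Write the CSV the same way the example script does
csv_buf = io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)

# Read it back and verify row count and field alignment
reader = csv.DictReader(io.StringIO(csv_buf.getvalue()))
rows = list(reader)
assert reader.fieldnames == list(data[0].keys()), "header mismatch"
assert len(rows) == len(data), "row count mismatch"
print("validated", len(rows), "rows")
```

Keep in mind that DictReader returns every value as a string, so compare counts and keys rather than raw values when the source contains numbers.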
Troubleshooting common issues
Even small JSON-to-CSV tasks can encounter edge cases. A common issue is records with inconsistent keys: if you take fieldnames from the first record, fields that appear only in later records are silently dropped from the header, and any record containing a key not listed in fieldnames makes DictWriter raise ValueError. A robust approach computes the union of fields across all records and fills missing values with empty strings.
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

fieldnames = set()
for item in data:
    fieldnames.update(item.keys())
fieldnames = sorted(fieldnames)

with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow({k: row.get(k, '') for k in fieldnames})

- If you encounter encoding errors, ensure the input JSON is UTF-8 and write the CSV with an explicit encoding, e.g., encoding='utf-8' or encoding='utf-8-sig' for Excel compatibility.
- For nested data, flatten first; otherwise, a direct DictWriter will miss nested fields or serialize them as strings, which may not be desirable.
Performance tips and best practices
Performance hinges on data shape, memory, and I/O. Prefer streaming for large arrays, cache the fieldnames, and minimize per-row Python overhead. When possible, convert in a single pass and avoid repeated dictionary lookups.
# Precompute fieldnames and reuse the writer
import json, csv

with open('data.json', 'r', encoding='utf-8') as f:
    data = json.load(f)

# Assume data is a list of dictionaries
fieldnames = list(data[0].keys())
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow(row)

- If you need to process streaming data, prefer a streaming parser (ijson) and avoid loading the entire dataset into memory.
- For extremely large datasets, consider a pipeline that writes to CSV in chunks or uses a database export as an intermediate step.
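The chunked-write idea can be sketched as a small helper that buffers rows and flushes them with writerows(); the helper name, chunk size, and sample data here are arbitrary:

```python
import csv
import io

def write_csv_in_chunks(records, out, fieldnames, chunk_size=1000):
    """Buffer rows and flush them with writerows() to cut per-row overhead."""
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    chunk = []
    for rec in records:
        chunk.append(rec)
        if len(chunk) >= chunk_size:
            writer.writerows(chunk)
            chunk = []
    if chunk:  # flush the final partial chunk
        writer.writerows(chunk)

data = ({"id": i, "val": i * i} for i in range(2500))  # any iterable works
buf = io.StringIO()
write_csv_in_chunks(data, buf, fieldnames=["id", "val"])
print(len(buf.getvalue().splitlines()))  # header + 2500 rows
```

Because the helper accepts any iterable, you can feed it a generator (as above) or an ijson stream without materializing the full dataset in memory.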
Steps
Estimated time: 45-90 minutes
1. Prepare the JSON data
Ensure your JSON data is a list of objects (an array of dictionaries). If your JSON is a dict with a top-level list, extract that list before converting.
Tip: Validate the shape with a quick Python snippet: print(type(data)); print(len(data))
2. Choose a conversion method
Decide between a pure Python approach (no extra dependencies) or a pandas-based approach for easier flattening and handling of irregular structures.
Tip: If you expect nested data, start with pandas for easier normalization.
3. Write or prepare the script
Create a script like json_to_csv.py that reads the JSON, flattens if needed, and writes a CSV with a stable header.
Tip: Always define fieldnames to produce deterministic CSV columns.
4. Run the conversion
Execute the script from your project root and verify the output CSV is created with the expected columns.
Tip: Do a dry run first by printing the fieldnames and a few rows.
5. Validate the CSV
Read the resulting CSV back and confirm there are no missing values or misaligned columns. Handle missing fields by filling in empty strings.
Tip: Cross-check a few sample records manually or with a quick test script.
6. Handle edge cases
If your JSON contains nested objects or arrays, apply flattening or normalization before conversion, and watch for encoding issues.
Tip: For Excel compatibility, consider encoding='utf-8-sig'.
Prerequisites
Required
- Python 3 installed
- pip package manager
- Access to a JSON data file (e.g., data.json)
- Command line or terminal access
Optional
- Text editor or IDE
Commands
| Action | Command |
|---|---|
| Convert JSON to CSV using a Python script (accepts input/output file paths) | `python3 json_to_csv.py data.json data.csv` |
| Run a one-liner Python command (inline conversion without a separate script) | `python3 -c "import json,csv,sys; d=json.load(open('data.json')); w=csv.DictWriter(sys.stdout, d[0].keys()); w.writeheader(); w.writerows(d)" > data.csv` |
| Convert via pandas for quick results (requires pandas) | `python3 -c "import pandas as pd; pd.read_json('data.json').to_csv('data.csv', index=False)"` |
People Also Ask
Can I convert JSONL (newline-delimited JSON) to CSV?
Yes. For JSON Lines, read each line as a separate JSON object, then write to CSV with a DictWriter. This avoids loading the entire dataset into memory and is a common pattern for large data streams.
What if the JSON contains nested objects or arrays?
Flatten the data before writing to CSV. Use pandas.json_normalize or a custom recursive flatten function to convert nested keys into top-level columns.
Should I use pandas for all JSON-to-CSV tasks?
Pandas is convenient for complex or nested data and rapid development. For simple, flat JSON, the standard library approach is lighter and more portable.
How do I handle missing keys in some records?
Precompute the union of keys across all records and fill missing fields with empty strings to avoid KeyError during write.
What encoding should I use for CSV output?
UTF-8 is standard. Use encoding='utf-8-sig' if the file will be opened in Excel, which otherwise may misinterpret non-ASCII characters.
Can I verify the CSV output automatically?
Yes. Load the generated CSV back into Python and compare the number of rows and fieldnames with the input data.
Main Points
- Use Python's json and csv modules for a dependency-light workflow
- Pandas simplifies nested JSON handling and rapid prototyping
- Flatten nested structures before CSV to ensure clean tabular data
- Prefer streaming for large datasets to prevent memory issues
- Choose proper encoding (UTF-8 or UTF-8-SIG) for cross-tool compatibility