Create CSV in Python: A Practical Developer's Guide

Master creating CSV files with Python’s csv module: writer and DictWriter patterns, proper encoding, newline handling, and robust I/O practices for reliable data export.

MyDataTables Team
·5 min read

Getting Started with CSVs in Python

CSV is a lightweight, widely supported format for tabular data. In Python, the built-in csv module provides two primary ways to write data: writer and DictWriter. We'll start with a simple writer example, then show a dictionary-based approach so you can map your data to headers. This section sets up the minimal patterns you’ll reuse across projects.

Python
import csv

rows = [
    {"name": "Alice", "age": 30, "city": "London"},
    {"name": "Bob", "age": 25, "city": "Paris"},
]

with open("people_writer.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(["name", "age", "city"])
    for r in rows:
        w.writerow([r["name"], r["age"], r["city"]])
Python
import csv

rows = [
    {"name": "Alice", "age": 30, "city": "London"},
    {"name": "Bob", "age": 25, "city": "Paris"},
]

with open("people_dictwriter.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["name", "age", "city"]
    dw = csv.DictWriter(f, fieldnames=fieldnames)
    dw.writeheader()
    for r in rows:
        dw.writerow(r)

Why this matters: The writer API writes simple sequences, while DictWriter uses headers as keys, reducing the chance of misaligned columns. Both approaches handle escaping and quotes correctly, which is essential when data contains commas or special characters.
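To see that escaping in action, here is a small sketch (the note text is illustrative, not from the examples above) showing how the writer quotes a field that contains both a comma and embedded double quotes:

```python
import csv
import io

# A field containing a comma and quotes; csv.writer quotes the field
# and doubles the embedded quotes automatically.
buf = io.StringIO()
w = csv.writer(buf)
w.writerow(["name", "note"])
w.writerow(["Alice", 'Likes "tea", not coffee'])
print(buf.getvalue())
```

No manual escaping is needed; any consumer that follows the usual CSV quoting rules will read the field back intact.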

Writing CSV Files: writer vs DictWriter

Choosing between csv.writer and csv.DictWriter depends on your data shape. If you already have rows as lists, writer is straightforward. If your data is a collection of dictionaries or you want headers baked into the output, DictWriter is more ergonomic. We'll compare both with practical examples and show how to configure quoting and delimiters.

Python
import csv

# Writer with custom delimiter and minimal quoting
with open("custom_delimiter.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f, delimiter=";")
    w.writerow(["name", "score"])
    w.writerow(["Eve", 92.5])
Python
import csv

# DictWriter with all fields quoted
with open("quoted.csv", "w", newline="", encoding="utf-8") as f:
    fieldnames = ["name", "score"]
    dw = csv.DictWriter(f, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
    dw.writeheader()
    dw.writerow({"name": "Grace", "score": 88.0})

Edge choices: If your data can include the delimiter or newline characters, consider setting the delimiter explicitly and controlling quoting with csv.QUOTE_MINIMAL (default) or csv.QUOTE_ALL for consistency across tools. For most app work, DictWriter is preferable when you operate on dict records, while writer is fine for small, simple lists.
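The difference between the two quoting modes is easiest to see side by side. This sketch renders the same row under both settings (writing to an in-memory buffer rather than a file, purely for comparison):

```python
import csv
import io

def render(quoting):
    # Write one header row and one data row with the given quoting mode
    buf = io.StringIO()
    w = csv.writer(buf, quoting=quoting)
    w.writerow(["name", "score"])
    w.writerow(["Eve", 92.5])
    return buf.getvalue()

print(render(csv.QUOTE_MINIMAL))  # quotes only fields that need them
print(render(csv.QUOTE_ALL))      # quotes every field, including numbers
```

QUOTE_MINIMAL produces the smallest output; QUOTE_ALL is more verbose but gives downstream tools a completely uniform field shape.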

Reading CSV Files Safely

Reading CSV data correctly is as important as writing it. The csv module offers both reader and DictReader: csv.reader yields each row as a list, while DictReader returns dictionaries keyed by the headers in the first row. We'll demonstrate both approaches and show how to handle missing headers gracefully.

Python
import csv

with open("people_writer.csv", "r", newline="", encoding="utf-8") as f:
    r = csv.reader(f)
    header = next(r)  # capture headers
    for row in r:
        print(dict(zip(header, row)))
Python
import csv

with open("people_dictwriter.csv", "r", newline="", encoding="utf-8") as f:
    dr = csv.DictReader(f)
    for row in dr:
        print(row)

Notes: If a header row is missing, pass fieldnames to DictReader to supply your own keys; the first line of the file is then treated as data rather than headers. Always specify encoding to avoid Unicode errors when data contains non‑ASCII characters. When consuming very large files, iterate over the reader instead of loading the entire file into memory.
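Here is a minimal sketch of the headerless case, using an in-memory buffer with made-up values to stand in for a file whose first line is already data:

```python
import csv
import io

# Simulated headerless input: the first line is data, not headers
raw = io.StringIO("Alice,30,London\nBob,25,Paris\n")

# fieldnames supplies the keys that the missing header row would have provided
dr = csv.DictReader(raw, fieldnames=["name", "age", "city"])
rows = list(dr)
print(rows[0])
```

Without the fieldnames argument, DictReader would have consumed "Alice,30,London" as the header row and silently dropped a record.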

Handling Encoding and Delimiters

CSV files come from diverse environments, so encoding and delimiters matter. UTF-8 is a safe default for most data. If your source uses a semicolon or a tab instead of a comma, set the delimiter accordingly. The csv module supports quoting and escaping, which helps preserve data integrity when fields contain commas or quotes.

Python
import csv

with open("data_semicolon.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f, delimiter=";")
    w.writerow(["id", "value"])
    w.writerow([1, "αβγ"])
Python
import csv

with open("data_semicolon.csv", "r", newline="", encoding="utf-8") as f:
    r = csv.reader(f, delimiter=";")
    for row in r:
        print(row)

Best practice: Always define encoding and newline handling explicitly. If your data mixes encodings, normalize to UTF-8 before writing, or convert on read to a common internal representation. When exporting for ingestion by other tools, keep a stable delimiter and consistent quote rules to avoid breaking downstream parsing.
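A common normalization pass looks like the following sketch: read a legacy Latin-1, semicolon-delimited file and re-emit it as UTF-8 with commas (the file names and sample values here are hypothetical):

```python
import csv
from pathlib import Path

# Create a small Latin-1 source file to stand in for legacy input
Path("legacy_latin1.csv").write_bytes("id;café\n1;naïve\n".encode("latin-1"))

# Decode as latin-1 on read, encode as utf-8 on write
with open("legacy_latin1.csv", "r", newline="", encoding="latin-1") as src, \
     open("normalized_utf8.csv", "w", newline="", encoding="utf-8") as dst:
    reader = csv.reader(src, delimiter=";")
    writer = csv.writer(dst)  # default comma delimiter
    for row in reader:
        writer.writerow(row)

print(Path("normalized_utf8.csv").read_text(encoding="utf-8"))
```

Because the decode happens at the file-handle level, the csv module only ever sees proper Python strings, and the output is clean UTF-8 regardless of the source encoding.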

Working with Large CSV Files

Large CSVs require streaming rather than loading everything into memory. The csv module supports iteration over the file handle, which lets you process millions of rows efficiently. We'll show a simple processor and a generator-based writing pattern to keep memory usage predictable.

Python
import csv

def process(row):
    # placeholder for complex processing
    return row

with open("large.csv", "r", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    for row in reader:
        _ = process(row)
Python
# Incremental write to avoid large temporary data structures
import csv

with open("large_out.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(["id", "value"])
    for i in range(1_000_000):
        w.writerow([i, i * 2])

Scaling tip: For extremely large datasets, write in chunks, use DictWriter to maintain clear headers, and consider buffering strategies to minimize disk I/O. Pair CSV with a streaming processor or a producer-consumer pattern if you have a data pipeline.
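One way to combine the generator pattern with DictWriter's header handling is sketched below (the row shape and file name are illustrative). writerows accepts any iterable, so the full dataset never has to exist in memory at once:

```python
import csv

def generate_rows(n):
    # Lazily yield one dict per row instead of building a list
    for i in range(n):
        yield {"id": i, "value": i * 2}

with open("chunked_out.csv", "w", newline="", encoding="utf-8") as f:
    dw = csv.DictWriter(f, fieldnames=["id", "value"])
    dw.writeheader()
    dw.writerows(generate_rows(10_000))  # consumes the generator incrementally
```

Memory use stays flat regardless of row count, since each row is produced, written, and discarded before the next one is generated.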

Using Pandas for CSV I/O (When to use it)

Pandas offers high-level APIs for CSV I/O that are convenient for data analysis tasks. If you’re performing filtering, aggregation, or complex transforms, pandas can simplify code and improve readability. However, it adds a dependency and may require more memory for very large files. Here are common patterns.

Python
import pandas as pd

df = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [30, 25],
    "city": ["London", "Paris"],
})
df.to_csv("pd_output.csv", index=False)
Python
import pandas as pd

df_read = pd.read_csv("pd_output.csv")
print(df_read.head())

Guidance: Use pandas when your workflow includes data exploration or analysis; otherwise, the csv module keeps dependencies light and is perfectly adequate for straightforward CSV creation. For incremental or streaming needs, rely on the standard library.
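If you do reach for pandas on a large file, its chunksize parameter gives a middle ground between full in-memory loading and hand-rolled streaming. A sketch (file name and values are made up for the demonstration):

```python
import pandas as pd

# Build a small demo file standing in for a large CSV
pd.DataFrame({"id": range(100), "value": [i * 2 for i in range(100)]}).to_csv(
    "chunk_demo.csv", index=False
)

# chunksize makes read_csv yield DataFrames of at most 25 rows each,
# so only one chunk is resident in memory at a time
total = 0
for chunk in pd.read_csv("chunk_demo.csv", chunksize=25):
    total += chunk["value"].sum()
print(total)  # 9900
```

Each chunk supports the full DataFrame API, so per-chunk aggregation or filtering composes naturally with the streaming read.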

Best Practices and Common Pitfalls

Even experienced developers trip over a few CSV pitfalls. The following examples illustrate common mistakes and how to fix them. Always specify newline and encoding to prevent extra blank lines or encoding errors. When headers are present, prefer DictReader/DictWriter for robust mapping. If a file will be used across platforms, test on Windows and macOS to confirm newline behavior.

Python
import csv

# Pitfall: omitting newline="" lets open() translate line endings,
# which produces blank lines between rows on Windows
with open("bad.csv", "w") as f:
    w = csv.writer(f)
    w.writerow(["a", "b"])
    w.writerow([1, 2])
Python
# Correct: explicit newline and utf-8 encoding
import csv

with open("good.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.writer(f)
    w.writerow(["a", "b"])
    w.writerow([1, 2])
Python
# DictWriter with headers ensures consistent column order
import csv

rows = [{"name": "Carol", "score": 95}]
with open("headers.csv", "w", newline="", encoding="utf-8") as f:
    dw = csv.DictWriter(f, fieldnames=["name", "score"])
    dw.writeheader()
    for r in rows:
        dw.writerow(r)

Takeaway: Favor explicit encodings, consistent delimiters, and header-aware writing to reduce downstream parsing errors. Validate outputs with a quick read-back check in development before deploying to production.
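The read-back check itself can be a few lines. This sketch (with a hypothetical file name and sample rows) writes a file, immediately reads it back, and asserts on the headers and row count:

```python
import csv

rows = [{"a": 1, "b": 2}, {"a": 3, "b": 4}]
with open("check.csv", "w", newline="", encoding="utf-8") as f:
    dw = csv.DictWriter(f, fieldnames=["a", "b"])
    dw.writeheader()
    dw.writerows(rows)

# Read back and verify structure; note DictReader yields string values
with open("check.csv", "r", newline="", encoding="utf-8") as f:
    read_back = list(csv.DictReader(f))

assert len(read_back) == len(rows)
assert list(read_back[0]) == ["a", "b"]
print("read-back check passed")
```

In a test suite this becomes a single assertion-backed helper; the key point is that DictReader returns every value as a string, so compare against string literals or convert before comparing.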
