XML to CSV: A Practical Python Guide
A comprehensive guide to converting XML to CSV using Python. Learn parsing with ElementTree, handling namespaces and attributes, streaming large files, and optional use of lxml or pandas for succinct workflows.

To convert XML to CSV in Python, parse the XML with ElementTree (or lxml) and write rows using the csv module. Map each record node to a CSV row, handle missing data, and manage namespaces. This approach supports attributes and nested elements and scales to large files with streaming parsing.
Introduction: Why convert XML to CSV with Python?
XML remains a widely used data interchange format, but for data analysis and reporting, CSV is often easier to consume in tools like spreadsheets, databases, and data frames. The process of converting XML to CSV (often described as python xml to csv) involves mapping hierarchical XML structures to flat rows and columns. According to MyDataTables, this conversion enables efficient data analysis workflows and reproducible pipelines. In this guide, you’ll learn practical patterns for parsing XML, handling namespaces and attributes, and using a mix of built-in Python libraries and optional third-party tools to tailor the workflow to your data. The focus is on reliability, readability, and scalability, so you can adapt to small datasets or large XML files.
Basic XML to CSV with ElementTree
```python
import xml.etree.ElementTree as ET
import csv

# Load the XML document
tree = ET.parse('data.xml')
root = tree.getroot()

# Create the CSV writer
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    # Header row – adjust columns to your data model
    writer.writerow(['id', 'name', 'value'])
    # Iterate over each record element (adjust path as needed)
    for rec in root.findall('.//record'):
        rid = rec.findtext('id') or ''
        name = rec.findtext('name') or ''
        value = rec.findtext('value') or ''
        writer.writerow([rid, name, value])
```

This minimal pattern is great for flat XML structures where each <record> contains simple child elements. It demonstrates the essential approach: parse, extract, and write. The next sections extend this pattern to handle namespaces, attributes, nested structures, and large files.
Handling Attributes and Nested Elements in XML to CSV
```python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('data.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['record_id', 'type', 'category'])
    for rec in root.findall('.//record'):
        record_id = rec.findtext('id') or ''
        # Access an attribute on the <record> element
        rec_type = rec.attrib.get('type', '')
        # Nested element value
        category = rec.findtext('./category/name') or ''
        writer.writerow([record_id, rec_type, category])
```

- Attributes are read from the element's .attrib dictionary; use .get() to supply a default when the attribute is absent.
- Nested elements should be addressed with explicit paths; add defensive checks if elements may be missing.
- This approach keeps the code readable while supporting richer XML schemas. If you anticipate complex nesting, build a small mapping function to normalize fields before writing.
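The mapping-function idea above can be sketched as a small field map that pairs each CSV column with an extraction rule, keeping all row-building logic in one place. FIELD_MAP and extract_row are illustrative names, not part of any library:

```python
import xml.etree.ElementTree as ET

# Each CSV column maps to a rule taking a <record> element and returning a string
FIELD_MAP = {
    'record_id': lambda rec: rec.findtext('id') or '',
    'type':      lambda rec: rec.attrib.get('type', ''),
    'category':  lambda rec: rec.findtext('./category/name') or '',
}

def extract_row(rec, field_map=FIELD_MAP):
    """Return one CSV row (a list of strings) for a record element."""
    return [rule(rec) for rule in field_map.values()]

xml = ('<records><record type="A"><id>1</id>'
       '<category><name>Books</name></category></record></records>')
root = ET.fromstring(xml)
print(extract_row(root.find('record')))  # ['1', 'A', 'Books']
```

With this shape, adapting the converter to a new schema means editing the dictionary, not the loop.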
Namespaces and Robust XML Matching
```python
import xml.etree.ElementTree as ET
import csv

ns = {'ns': 'http://example.com/ns'}  # replace with your actual namespace

tree = ET.parse('data.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'title'])
    # Use explicit namespace mapping for precise queries
    for rec in root.findall('.//ns:record', ns):
        # Pass namespaces as a keyword argument: findtext's second
        # positional parameter is the default value, not the namespace map
        rec_id = rec.findtext('ns:id', namespaces=ns) or ''
        title = rec.findtext('ns:title', namespaces=ns) or ''
        writer.writerow([rec_id, title])
```

If your XML uses a default or multiple namespaces, you may need to adapt the namespace dictionary or switch to a lax approach that strips namespaces during processing. For very complex schemas, consider an XML library with robust XPath support such as lxml; for many standard tasks, ElementTree is sufficient.
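When precise namespace handling is more trouble than it is worth, the lax approach mentioned above can be sketched as follows. strip_namespaces is a hypothetical helper, and note that it only rewrites element tags, leaving any namespaced attributes untouched:

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Drop the '{uri}' prefix from every tag in the tree, in place."""
    for elem in root.iter():
        if '}' in elem.tag:
            elem.tag = elem.tag.split('}', 1)[1]
    return root

xml = '<r xmlns="http://example.com/ns"><record><id>7</id></record></r>'
root = strip_namespaces(ET.fromstring(xml))
# Queries now work without namespace prefixes or a namespace dict
print(root.findtext('.//record/id'))  # 7
```

This trades precision for convenience: if two namespaces define the same local tag name, stripping will conflate them, so prefer the explicit namespace dictionary when that risk exists.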
Streaming for Large XML Files: Iterative Parsing
```python
import csv
import xml.etree.ElementTree as ET

def iter_records(xml_path):
    # Use iterative parsing to keep memory usage low
    context = ET.iterparse(xml_path, events=('end',))
    for event, elem in context:
        # Process only record elements (adjust tag as needed)
        if elem.tag.endswith('record'):
            yield {
                'id': (elem.findtext('./id') or '').strip(),
                'name': (elem.findtext('./name') or '').strip(),
                'value': (elem.findtext('./value') or '').strip(),
            }
            elem.clear()  # free memory

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name', 'value'])
    for rec in iter_records('data.xml'):
        writer.writerow([rec['id'], rec['name'], rec['value']])
```

Streaming transforms are essential for very large XML files, as they avoid loading the entire document into memory. The trade-off is slightly more intricate control flow, but you gain the ability to process gigabytes of XML data row by row. For many teams, this is the difference between a practical workflow and an in-memory-only prototype.
Real-World Example: Catalog XML to CSV (Attributes, Nested Elements, and Prices)
```python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('catalog.xml')
root = tree.getroot()

with open('catalog.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['sku', 'name', 'price', 'currency', 'category', 'stock'])
    for item in root.findall('.//item'):
        sku = item.findtext('sku') or ''
        name = item.findtext('name') or ''
        price = item.findtext('./pricing/price') or ''
        currency = item.findtext('./pricing/currency') or 'USD'
        category = item.findtext('./category/name') or ''
        stock = item.attrib.get('stock', '') or (item.findtext('./stock') or '')
        writer.writerow([sku, name, price, currency, category, stock])
```

This example demonstrates how to bring together attributes (stock), nested structures (pricing/price), and disjoint paths (category/name). When the XML structure varies by product type, you can implement a small mapping layer to normalize fields before writing to CSV. If you need to preserve the original order of records, consider adding an index field to the header.
Alternative: Using pandas read_xml for concise pipelines
```python
import pandas as pd

# read_xml provides a straightforward path to extract matching records
df = pd.read_xml('data.xml', xpath='.//record')

# Write to CSV with a single call
df.to_csv('output.csv', index=False, encoding='utf-8')
```

Pandas offers a concise alternative for quick data extraction, especially when the XML structure maps cleanly to tabular rows. The read_xml function (available since pandas 1.3) supports XPath queries, which simplify extracting repeated structures; full XPath support requires the lxml parser. For large datasets, you may prefer streaming approaches; pandas is excellent for small to mid-sized XML-to-CSV tasks where development speed matters.
Validation, Troubleshooting, and Best Practices
```python
import csv

with open('output.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader, None)
    sample = [row for _, row in zip(range(5), reader)]

print('Header:', header)
print('Sample rows:', sample)
```

- Always validate headers and row counts to ensure your mapping is correct.
- If you encounter missing fields, normalize them to empty strings to preserve column alignment.
- When dealing with large CSVs, consider writing in chunks or using a streaming approach to minimize memory usage.
- If you reuse the same script for multiple XML files, parameterize the input/output paths and the field mapping to improve reusability.
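Parameterizing the converter can be sketched with argparse from the standard library. The column list, default record path, and function names here are assumptions to adapt to your own schema:

```python
import argparse
import csv
import xml.etree.ElementTree as ET

DEFAULT_FIELDS = ['id', 'name', 'value']

def convert(xml_path, csv_path, fields=DEFAULT_FIELDS, record_path='.//record'):
    """Read records from xml_path and write one CSV row per record."""
    root = ET.parse(xml_path).getroot()
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        for rec in root.findall(record_path):
            # Missing fields become empty strings to keep columns aligned
            writer.writerow([rec.findtext(name) or '' for name in fields])

def main(argv=None):
    parser = argparse.ArgumentParser(description='Convert XML records to CSV')
    parser.add_argument('input_xml')
    parser.add_argument('output_csv')
    args = parser.parse_args(argv)
    convert(args.input_xml, args.output_csv)
```

Calling main(['data.xml', 'output.csv']), or running the script with those two arguments on the command line, reuses the same mapping across files; adding a --fields option is a natural next step.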
Steps
Estimated time: 60-90 minutes

1. Plan data mapping: Define which XML elements become CSV columns and how to handle missing data and attributes. Create a small sample showing the expected structure.
   Tip: Draft a tiny sample XML to validate your mapping.
2. Choose parser strategy: Decide between ElementTree (built-in) or lxml for your XPath needs. For simple mappings, ElementTree is usually sufficient.
   Tip: Start with ElementTree and switch to lxml if you need advanced queries.
3. Write a minimal converter: Implement a Python script that reads XML, extracts fields, and writes to CSV with a header row.
   Tip: Comment each extraction so the intent is clear to future you.
4. Add error handling: Handle missing fields gracefully and normalize data types so the CSV remains consistent.
   Tip: Treat missing values as empty strings to preserve column alignment.
5. Test with sample and validation: Run the script on a small sample and validate the header and first few rows against expectations.
   Tip: Print a sample row during early development to verify mapping.
6. Scale to large files: Switch to streaming parsing (iterparse) and write rows incrementally to avoid high memory usage.
   Tip: Call elem.clear() after processing a record to free memory.
Prerequisites
Required
- Python 3.8+
- pip package manager
- Basic command line knowledge
Optional
- VS Code or any code editor
Commands
| Action | Command |
|---|---|
| Check Python version (ensure Python 3.8+ is installed) | `python --version` |
| Install optional lxml (for advanced XPath support if needed) | `pip install lxml` |
| Run conversion script (replace with your actual file paths) | `python convert_xml_to_csv.py input.xml output.csv` |
People Also Ask
What is XML and CSV?
XML is a hierarchical format suited to nested data; CSV is a flat table. Converting means mapping nested structures to a row-column layout.
Which Python parser should I use?
ElementTree is built-in and reliable for many tasks; lxml offers richer XPath support and better speed when needed. Start with ElementTree and switch to lxml if you need advanced features.
How do I handle namespaces?
Namespaces add prefixes to tags. You can strip them for simplicity, or map them with a namespace dictionary when querying.
Can I process large XML files?
Yes: use streaming parsing (iterparse) and write to CSV incrementally to control memory usage.
Is pandas a good alternative?
pandas provides read_xml for concise syntax, but it adds dependencies and memory pressure. Use it when quick prototyping matters more than footprint.
How can I validate the resulting CSV?
Check header consistency and row counts, and inspect sample rows to confirm the mapping is correct.
Main Points
- Plan mapping before coding
- Handle namespaces and attributes carefully
- Use streaming for large XML files
- Validate CSV output after conversion
- Consider pandas for concise pipelines when appropriate