XML to CSV: A Practical Python Guide
A comprehensive guide to converting XML to CSV using Python. Learn parsing with ElementTree, handling namespaces and attributes, streaming large files, and optional use of lxml or pandas for succinct workflows.

To convert XML to CSV in Python, parse the XML with ElementTree (or lxml) and write rows using the csv module. Map each record node to a CSV row, handle missing data, and manage namespaces. This approach supports attributes and nested elements and scales to large files with streaming parsing.
Introduction: Why convert XML to CSV with Python?
XML remains a widely used data interchange format, but for data analysis and reporting, CSV is often easier to consume in tools like spreadsheets, databases, and data frames. The process of converting XML to CSV (often described as python xml to csv) involves mapping hierarchical XML structures to flat rows and columns. According to MyDataTables, this conversion enables efficient data analysis workflows and reproducible pipelines. In this guide, you’ll learn practical patterns for parsing XML, handling namespaces and attributes, and using a mix of built-in Python libraries and optional third-party tools to tailor the workflow to your data. The focus is on reliability, readability, and scalability, so you can adapt to small datasets or large XML files.
Basic XML to CSV with ElementTree
```python
import xml.etree.ElementTree as ET
import csv

# Load the XML document
tree = ET.parse('data.xml')
root = tree.getroot()

# Create the CSV writer
with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    # Header row – adjust columns to your data model
    writer.writerow(['id', 'name', 'value'])
    # Iterate over each record element (adjust path as needed)
    for rec in root.findall('.//record'):
        rid = rec.findtext('id') or ''
        name = rec.findtext('name') or ''
        value = rec.findtext('value') or ''
        writer.writerow([rid, name, value])
```

This minimal pattern is great for flat XML structures where each <record> contains simple child elements. It demonstrates the essential approach: parse, extract, and write. The next sections extend this pattern to handle namespaces, attributes, nested structures, and large files.
Handling Attributes and Nested Elements in XML to CSV
```python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('data.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['record_id', 'type', 'category'])
    for rec in root.findall('.//record'):
        record_id = rec.findtext('id') or ''
        # Access an attribute on the <record> element
        rec_type = rec.attrib.get('type', '')
        # Nested element value
        category = rec.findtext('./category/name') or ''
        writer.writerow([record_id, rec_type, category])
```

- Attributes are read from the element's .attrib dictionary; use .get() to supply a default when the attribute is absent.
- Nested elements should be addressed with explicit paths; add defensive checks if elements may be missing.
- This approach keeps the code readable while supporting richer XML schemas. If you anticipate complex nesting, build a small mapping function to normalize fields before writing.
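The mapping-function idea above can be sketched as a small field map that pairs each CSV column with an extraction rule, keeping all row-building logic in one place. FIELD_MAP and extract_row are illustrative names, not part of any library:

```python
import xml.etree.ElementTree as ET

# Each CSV column maps to a rule taking a <record> element and returning a string
FIELD_MAP = {
    'record_id': lambda rec: rec.findtext('id') or '',
    'type':      lambda rec: rec.attrib.get('type', ''),
    'category':  lambda rec: rec.findtext('./category/name') or '',
}

def extract_row(rec, field_map=FIELD_MAP):
    """Return one CSV row (a list of strings) for a record element."""
    return [rule(rec) for rule in field_map.values()]

xml = ('<records><record type="A"><id>1</id>'
       '<category><name>Books</name></category></record></records>')
root = ET.fromstring(xml)
print(extract_row(root.find('record')))  # ['1', 'A', 'Books']
```

With this shape, adapting the converter to a new schema means editing the dictionary, not the loop.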
Namespaces and Robust XML Matching
```python
import xml.etree.ElementTree as ET
import csv

ns = {'ns': 'http://example.com/ns'}  # replace with your actual namespace

tree = ET.parse('data.xml')
root = tree.getroot()

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'title'])
    # Use explicit namespace mapping for precise queries
    for rec in root.findall('.//ns:record', ns):
        # Pass namespaces as a keyword argument: findtext's second
        # positional parameter is the default value, not the namespace map
        rec_id = rec.findtext('ns:id', namespaces=ns) or ''
        title = rec.findtext('ns:title', namespaces=ns) or ''
        writer.writerow([rec_id, title])
```

If your XML uses a default or multiple namespaces, you may need to adapt the namespace dictionary or switch to a lax approach that strips namespaces during processing. For very complex schemas, consider an XML library with robust XPath support such as lxml; for many standard tasks, ElementTree is sufficient.
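When precise namespace handling is more trouble than it is worth, the lax approach mentioned above can be sketched as follows. strip_namespaces is a hypothetical helper, and note that it only rewrites element tags, leaving any namespaced attributes untouched:

```python
import xml.etree.ElementTree as ET

def strip_namespaces(root):
    """Drop the '{uri}' prefix from every tag in the tree, in place."""
    for elem in root.iter():
        if '}' in elem.tag:
            elem.tag = elem.tag.split('}', 1)[1]
    return root

xml = '<r xmlns="http://example.com/ns"><record><id>7</id></record></r>'
root = strip_namespaces(ET.fromstring(xml))
# Queries now work without namespace prefixes or a namespace dict
print(root.findtext('.//record/id'))  # 7
```

This trades precision for convenience: if two namespaces define the same local tag name, stripping will conflate them, so prefer the explicit namespace dictionary when that risk exists.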
Streaming for Large XML Files: Iterative Parsing
```python
import csv
import xml.etree.ElementTree as ET

def iter_records(xml_path):
    # Use iterative parsing to keep memory usage low
    context = ET.iterparse(xml_path, events=('end',))
    for event, elem in context:
        # Process only record elements (adjust tag as needed)
        if elem.tag.endswith('record'):
            yield {
                'id': (elem.findtext('./id') or '').strip(),
                'name': (elem.findtext('./name') or '').strip(),
                'value': (elem.findtext('./value') or '').strip(),
            }
            elem.clear()  # free memory

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['id', 'name', 'value'])
    for rec in iter_records('data.xml'):
        writer.writerow([rec['id'], rec['name'], rec['value']])
```

Streaming transforms are essential for very large XML files, as they avoid loading the entire document into memory. The trade-off is slightly more intricate control flow, but you gain the ability to process gigabytes of XML data row by row. For many teams, this is the difference between a practical workflow and an in-memory-only prototype.
Real-World Example: Catalog XML to CSV (Attributes, Nested Elements, and Prices)
```python
import xml.etree.ElementTree as ET
import csv

tree = ET.parse('catalog.xml')
root = tree.getroot()

with open('catalog.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['sku', 'name', 'price', 'currency', 'category', 'stock'])
    for item in root.findall('.//item'):
        sku = item.findtext('sku') or ''
        name = item.findtext('name') or ''
        price = item.findtext('./pricing/price') or ''
        currency = item.findtext('./pricing/currency') or 'USD'
        category = item.findtext('./category/name') or ''
        stock = item.attrib.get('stock', '') or (item.findtext('./stock') or '')
        writer.writerow([sku, name, price, currency, category, stock])
```

This example demonstrates how to bring together attributes (stock), nested structures (pricing/price), and disjoint paths (category/name). When the XML structure varies by product type, you can implement a small mapping layer to normalize fields before writing to CSV. If you need to preserve the original order of records, consider adding an index field to the header.
Alternative: Using pandas read_xml for concise pipelines
```python
import pandas as pd

# read_xml provides a straightforward path to extract matching records
df = pd.read_xml('data.xml', xpath='.//record')

# Write to CSV with a single call
df.to_csv('output.csv', index=False, encoding='utf-8')
```

Pandas offers a concise alternative for quick data extraction, especially when the XML structure maps cleanly to tabular rows. The read_xml function (available since pandas 1.3) supports XPath queries, which simplify extracting repeated structures; full XPath support requires the lxml parser. For large datasets, you may prefer streaming approaches; pandas is excellent for small to mid-sized XML-to-CSV tasks where development speed matters.
Validation, Troubleshooting, and Best Practices
```python
import csv

with open('output.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    header = next(reader, None)
    sample = [row for _, row in zip(range(5), reader)]

print('Header:', header)
print('Sample rows:', sample)
```

- Always validate headers and row counts to ensure your mapping is correct.
- If you encounter missing fields, normalize them to empty strings to preserve column alignment.
- When dealing with large CSVs, consider writing in chunks or using a streaming approach to minimize memory usage.
- If you reuse the same script for multiple XML files, parameterize the input/output paths and the field mapping to improve reusability.
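Parameterizing the converter can be sketched with argparse from the standard library. The column list, default record path, and function names here are assumptions to adapt to your own schema:

```python
import argparse
import csv
import xml.etree.ElementTree as ET

DEFAULT_FIELDS = ['id', 'name', 'value']

def convert(xml_path, csv_path, fields=DEFAULT_FIELDS, record_path='.//record'):
    """Read records from xml_path and write one CSV row per record."""
    root = ET.parse(xml_path).getroot()
    with open(csv_path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(fields)
        for rec in root.findall(record_path):
            # Missing fields become empty strings to keep columns aligned
            writer.writerow([rec.findtext(name) or '' for name in fields])

def main(argv=None):
    parser = argparse.ArgumentParser(description='Convert XML records to CSV')
    parser.add_argument('input_xml')
    parser.add_argument('output_csv')
    args = parser.parse_args(argv)
    convert(args.input_xml, args.output_csv)
```

Calling main(['data.xml', 'output.csv']), or running the script with those two arguments on the command line, reuses the same mapping across files; adding a --fields option is a natural next step.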
Steps
Estimated time: 60-90 minutes

1. Plan data mapping: Define which XML elements become CSV columns and how to handle missing data and attributes. Create a small sample showing the expected structure.
   Tip: Draft a tiny sample XML to validate your mapping.
2. Choose parser strategy: Decide between ElementTree (built-in) or lxml for your XPath needs. For simple mappings, ElementTree is usually sufficient.
   Tip: Start with ElementTree and switch to lxml if you need advanced queries.
3. Write a minimal converter: Implement a Python script that reads XML, extracts fields, and writes to CSV with a header row.
   Tip: Comment each extraction so the intent is clear to future you.
4. Add error handling: Handle missing fields gracefully and normalize data types so the CSV remains consistent.
   Tip: Treat missing values as empty strings to preserve column alignment.
5. Test with sample and validation: Run the script on a small sample and validate the header and first few rows against expectations.
   Tip: Print a sample row during early development to verify mapping.
6. Scale to large files: Switch to streaming parsing (iterparse) and write rows incrementally to avoid high memory usage.
   Tip: Call elem.clear() after processing a record to free memory.
Prerequisites
Required
- Python 3.8+
- pip package manager
- Basic command line knowledge
Optional
- VS Code or any code editor
Commands
| Action | Command |
|---|---|
| Check Python version (ensure Python 3.8+ is installed) | `python --version` |
| Install optional lxml (for advanced XPath support if needed) | `pip install lxml` |
| Run conversion script (replace with your actual file paths) | `python convert_xml_to_csv.py input.xml output.csv` |
People Also Ask
What is XML and CSV?
XML is a hierarchical format suited to nested data; CSV is a flat table. Converting means mapping nested structures to a row-column layout.
Which Python parser should I use?
ElementTree is built-in and reliable for many tasks; lxml offers richer XPath support and better speed when needed. Start with ElementTree and switch to lxml if you need advanced features.
How do I handle namespaces?
Namespaces add prefixes to tags. You can strip them for simplicity, or map them with a namespace dictionary when querying.
Can I process large XML files?
Yes: use streaming parsing (iterparse) and write to CSV incrementally to control memory usage.
Is pandas a good alternative?
pandas provides read_xml for concise syntax, but it adds dependencies and memory pressure. Use it when quick prototyping matters more than footprint.
How can I validate the resulting CSV?
Check header consistency and row counts, and inspect sample rows to confirm the mapping is correct.
Main Points
- Plan mapping before coding
- Handle namespaces and attributes carefully
- Use streaming for large XML files
- Validate CSV output after conversion
- Consider pandas for concise pipelines when appropriate