Python XLSX to CSV: A Practical Guide for Data Engineers
Learn how to convert Excel (.xlsx) to CSV in Python using pandas. This guide covers single and multi-sheet workbooks, encoding options, and performance tips for reliable data pipelines.

To convert an XLSX file to CSV with Python, use pandas' read_excel to load the workbook and to_csv to write CSV. Start with a single-sheet workflow, then scale to multi-sheet processing by looping through the sheet names. Use UTF-8 encoding to maximize compatibility, and decide up front how empty cells should be represented.
Why convert XLSX to CSV with Python

Converting XLSX to CSV is a common pattern in data analytics. CSV is lighter, easier to ingest, and widely supported by data processing tools. Python, with the pandas library, provides a clean API to read Excel (.xlsx) data and export it as CSV. This approach preserves the data values while dropping formatting and formulas, which are Excel-specific constructs. In this article, we'll walk through practical, copy-paste-ready examples that show how to convert a single-sheet workbook and then scale to multi-sheet workbooks. The goal is a reliable, repeatable method you can plug into ETL pipelines or data-cleaning scripts.

```python
import pandas as pd

# Load the first sheet (index 0) from input.xlsx and export to CSV
df = pd.read_excel("input.xlsx", sheet_name=0, engine="openpyxl")
df.to_csv("output.csv", index=False, encoding="utf-8")
```

```python
import pandas as pd

# Convert every sheet in a workbook to a separate CSV file
xlsx = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet)
    df.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```

Notes: Use engine="openpyxl" for modern Excel files. To preserve non-ASCII text, UTF-8 encoding is recommended, with a UTF-8 BOM (utf-8-sig) if your downstream tools require it.
Single-sheet vs multi-sheet conversion: choices

In many cases you only need a single-sheet extraction. The pandas API lets you pick a particular sheet with sheet_name, which streamlines scripts and reduces overhead. For multi-sheet workbooks, you can loop through all sheet names or use pd.ExcelFile to read sheet by sheet. The key decision is whether to combine the data into one CSV or create one CSV per sheet. In both scenarios you'll typically still pass index=False to avoid writing row indices, and encoding="utf-8" for compatibility with international data.

```python
# Load a specific sheet named 'Data' and export to CSV
import pandas as pd

fp = "input.xlsx"
df = pd.read_excel(fp, sheet_name="Data", engine="openpyxl")
df.to_csv("Data.csv", index=False, encoding="utf-8")
```

```python
# Read all sheets at once (sheet_name=None) and export each to its own CSV
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for name, df in pd.read_excel(xls, sheet_name=None).items():
    df.to_csv(f"{name}.csv", index=False, encoding="utf-8")
```
Performance tips for large workbooks

As workbook size grows, memory usage becomes a concern. A straightforward approach loads whole sheets into memory, which may cause issues for very large files. A memory-friendly pattern is to process one sheet at a time and write results incrementally. You can also switch to a true streaming approach: open the workbook with openpyxl in read_only mode and write rows with Python's csv module as they are read, so no sheet is ever held fully in memory.

```python
# Stream rows sheet by sheet without loading whole sheets into memory
from openpyxl import load_workbook
import csv

wb = load_workbook("input.xlsx", read_only=True, data_only=True)
for ws in wb.worksheets:
    with open(f"{ws.title}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)
```

```python
# Alternative: load one sheet at a time with pandas (simpler, but each
# sheet is held fully in memory while it is converted)
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```
Handling encoding and data types

Excel data can contain a mix of strings, numbers, dates, and missing values. When converting to CSV, decide how to represent missing data and how to handle non-ASCII characters. The standard approach is UTF-8 encoding and, if needed, forcing consistent data types to avoid surprises downstream. You'll often convert columns to strings or apply an explicit dtype, then export. Remember that CSV cannot capture Excel-specific constructs like formulas or charts; only the evaluated values survive.

```python
import pandas as pd

df = pd.read_excel("input.xlsx", sheet_name="Data", engine="openpyxl")
# Normalize all columns to strings for consistent CSV output.
# Note: astype(str) turns missing values into the literal string "nan";
# apply df.fillna("") first if you want blank cells instead.
df = df.astype(str)
df.to_csv("Data_str.csv", index=False, encoding="utf-8-sig")
```

```python
# Preserve numeric types where appropriate (no cast before export)
import pandas as pd

with pd.ExcelFile("input.xlsx", engine="openpyxl") as xls:
    df = pd.read_excel(xls, sheet_name="Data")
df.to_csv("Data_native.csv", index=False, encoding="utf-8")
```
Saving to CSV with proper encoding and dialect

Choosing the right encoding and newline style helps compatibility across systems and tools. UTF-8 without a BOM is typically sufficient, but some Windows tools expect a BOM, which you can produce with encoding="utf-8-sig". The line terminator can also be adjusted (via to_csv's lineterminator argument) if your downstream system requires CRLF or LF. Getting these details right prevents subtle corruption of text data.

```python
import pandas as pd

# Per-sheet export with UTF-8 (no BOM); to produce a single CSV across
# all sheets, concatenate the DataFrames first
sheets = pd.read_excel("input.xlsx", sheet_name=None, engine="openpyxl")
for sheet, data in sheets.items():
    data.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```

```python
# Alternative: UTF-8 with BOM (utf-8-sig) for Windows tools
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df.to_csv(f"{sheet}_bom.csv", index=False, encoding="utf-8-sig")
```
Practical end-to-end example

Putting it all together, here is a complete script that reads an input workbook and exports each sheet as its own CSV file. This is the typical end-to-end workflow you would embed in an ETL script or a notebook. You can adapt it to produce a single CSV, or to merge sheets with an added Sheet column for traceability; clear, reproducible naming conventions make the output easier to work with downstream.

```python
import pandas as pd
from pathlib import Path

def convert_workbook_to_csv(input_path: str, output_dir: str = ".") -> None:
    """Export every sheet of an .xlsx workbook as its own CSV file."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    xls = pd.ExcelFile(input_path, engine="openpyxl")
    for sheet in xls.sheet_names:
        df = pd.read_excel(xls, sheet_name=sheet)
        df.to_csv(f"{output_dir}/{sheet}.csv", index=False, encoding="utf-8")

if __name__ == "__main__":
    convert_workbook_to_csv("input.xlsx", "output")
```

If you want to combine data from all sheets into a single CSV, adapt the function to append a Sheet column to each DataFrame and then concatenate them before writing.
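As a sketch of that merge variant, the helper below combines a {sheet_name: DataFrame} mapping (the shape returned by read_excel with sheet_name=None) into one DataFrame with a Sheet column; combine_sheets is a hypothetical name, not a pandas API.

```python
import pandas as pd

def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Combine a {sheet_name: DataFrame} mapping into one DataFrame,
    tagging each row with its source sheet for traceability."""
    frames = []
    for name, df in sheets.items():
        tagged = df.copy()
        tagged["Sheet"] = name  # record which sheet each row came from
        frames.append(tagged)
    return pd.concat(frames, ignore_index=True)

# In a real pipeline you would obtain the mapping with:
#   sheets = pd.read_excel("input.xlsx", sheet_name=None, engine="openpyxl")
# and then write one CSV:
#   combine_sheets(sheets).to_csv("combined.csv", index=False, encoding="utf-8")
```

This keeps per-row provenance, which is useful when sheets share columns but represent different periods or regions.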
Steps
Estimated time: 15-25 minutes
1. Define target sheet scope. Decide whether you need a single CSV or one CSV per sheet; this guides script design and memory usage. Tip: Start with a small workbook to test the flow.
2. Set up the Python environment. Create a dedicated virtual environment to isolate dependencies and avoid conflicts. Tip: Use venv or conda to manage environments.
3. Install dependencies. Install pandas and openpyxl to enable read_excel and Excel parsing. Tip: Lock versions if you're building a reproducible pipeline.
4. Write the conversion script. Create a script that reads from input.xlsx and writes CSV files, handling encoding and sheet names. Tip: Comment key decisions for future maintenance.
5. Run and verify. Execute the script and inspect the resulting CSV files for structure and data integrity. Tip: Use head or tail to sample lines quickly.
6. Handle edge cases. Address blank cells, date formats, and non-ASCII characters to ensure portability. Tip: Define a test set with diverse data.
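As a sketch of the edge-case step, the export below handles blank cells, date formatting, and non-ASCII text in one pass; the file name, column names, and sample values are illustrative, not from the original workbook.

```python
import pandas as pd

# Illustrative data standing in for a sheet loaded via read_excel:
# a non-ASCII string, a missing name, and a missing date
df = pd.DataFrame({
    "name": ["Zoë", None, "Ole"],
    "joined": pd.to_datetime(["2023-01-05", "2023-02-10", None]),
})

# na_rep controls how missing cells appear; date_format normalizes dates;
# UTF-8 keeps non-ASCII characters intact
df.to_csv(
    "clean.csv",
    index=False,
    encoding="utf-8",
    na_rep="",               # blank cells instead of the literal string "nan"
    date_format="%Y-%m-%d",  # ISO dates instead of full timestamps
)
```

A quick way to verify the result is to open the CSV with a UTF-8-aware editor and confirm that blanks, dates, and accented characters came through unchanged.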
Prerequisites
Required
- Python 3
- pip
- pandas and openpyxl
Optional
- A virtual environment (venv or conda)
- A small multi-sheet workbook for testing
Commands
| Action | Command |
|---|---|
| Check Python version. If your system uses Python 3, use 'python3 --version'. | python --version |
| Install pandas and openpyxl. Use 'pip3' if your system requires it. | pip install pandas openpyxl |
| Run the conversion script. The script loads Excel and writes one CSV per sheet. | python convert_xlsx_to_csv.py |
| Inline one-liner conversion. Ad-hoc conversion without a script. | python -c 'import pandas as pd; df = pd.read_excel("input.xlsx"); df.to_csv("output.csv", index=False)' |
| Verify the generated CSV. Check the first few lines to validate header and content. | head -n 5 output.csv |
People Also Ask
What is the simplest way to convert XLSX to CSV with Python?
The simplest approach uses pandas: read_excel to load the sheet and to_csv to write a CSV. For multi-sheet workbooks, loop through sheet names and export each as a separate CSV.
How do I convert multiple sheets to separate CSV files?
Load the workbook with ExcelFile and iterate over sheet names, exporting each DataFrame with to_csv. This keeps data organized by sheet name.
Can I preserve non-ASCII text in the CSV?
Yes, export with UTF-8 encoding (encoding='utf-8' or 'utf-8-sig' if BOM is required). This avoids character loss.
What about Excel formulas and date types?
CSV stores values only. Formulas aren’t saved; dates are exported as strings or as numbers depending on your handling.
How can I handle very large Excel files efficiently?
Process one sheet at a time, or use a streaming approach with openpyxl in read_only mode to avoid loading the entire workbook into memory.
Main Points
- Convert XLSX to CSV with pandas using read_excel and to_csv
- Handle single-sheet and multi-sheet workbooks with sheet_name and ExcelFile
- Prefer UTF-8 encoding for broad compatibility
- Process large workbooks one sheet at a time, or stream rows, to save memory
- Validate the output with quick read-back checks
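The read-back check in the last point can be as simple as the sketch below: re-read the exported CSV and confirm shape and headers survived. The helper name validate_roundtrip and the sample data are illustrative.

```python
import pandas as pd

def validate_roundtrip(df: pd.DataFrame, csv_path: str) -> None:
    """Re-read an exported CSV and check that shape and headers survived."""
    roundtrip = pd.read_csv(csv_path, encoding="utf-8")
    assert roundtrip.shape == df.shape, "row/column count changed"
    assert list(roundtrip.columns) == list(df.columns), "headers changed"

# In-memory example standing in for a sheet loaded via read_excel
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Kyiv", "Lima"]})
df.to_csv("output.csv", index=False, encoding="utf-8")
validate_roundtrip(df, "output.csv")
```

Note that dtypes can legitimately differ after a round trip (CSV has no type metadata), so comparing shape and headers is a pragmatic baseline; add value-level checks where exact fidelity matters.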