Python XLSX to CSV: A Practical Guide for Data Engineers
Learn how to convert Excel (.xlsx) to CSV in Python using pandas. This guide covers single and multi-sheet workbooks, encoding options, and performance tips for reliable data pipelines.

To convert an XLSX file to CSV with Python, use pandas' read_excel to load the workbook and to_csv to write CSV. Start with a single-sheet workflow, then scale to multi-sheet processing by looping through the sheet names. Use UTF-8 encoding to maximize compatibility, and decide up front how empty cells should be represented.
Why convert XLSX to CSV with Python

Converting XLSX to CSV is a common pattern in data analytics. CSV is lighter, easier to ingest, and widely supported by data processing tools. Python, with the pandas library, provides a clean API to read Excel (.xlsx) data and export it as CSV. This approach preserves the data values while dropping formatting and formulas, which are Excel-specific constructs. In this article, we'll walk through practical, copy-paste-ready examples that show how to convert a single-sheet workbook and then scale to multi-sheet workbooks. The goal is a reliable, repeatable method you can plug into ETL pipelines or data-cleaning scripts.

```python
import pandas as pd

# Load the first sheet (index 0) from input.xlsx and export to CSV
df = pd.read_excel("input.xlsx", sheet_name=0, engine="openpyxl")
df.to_csv("output.csv", index=False, encoding="utf-8")
```

```python
import pandas as pd

# Convert every sheet in a workbook to a separate CSV file
xlsx = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet)
    df.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```

Notes: Use engine="openpyxl" for modern Excel files. To preserve non-ASCII text, UTF-8 encoding is recommended, with a UTF-8 BOM (utf-8-sig) if your downstream tools require it.
Single-sheet vs multi-sheet conversion: choices

In many cases you only need a single-sheet extraction. The pandas API lets you pick a particular sheet with sheet_name, which streamlines scripts and reduces overhead. For multi-sheet workbooks, you can loop through all sheet names or use pd.ExcelFile to read sheet by sheet. The key decision is whether to combine the data into one CSV or create one CSV per sheet. In both scenarios you'll typically still pass index=False to avoid writing row indices, and encoding="utf-8" for compatibility with international data.

```python
# Load a specific sheet named 'Data' and export to CSV
import pandas as pd

fp = "input.xlsx"
df = pd.read_excel(fp, sheet_name="Data", engine="openpyxl")
df.to_csv("Data.csv", index=False, encoding="utf-8")
```

```python
# Read all sheets at once (sheet_name=None) and export each to its own CSV
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for name, df in pd.read_excel(xls, sheet_name=None).items():
    df.to_csv(f"{name}.csv", index=False, encoding="utf-8")
```
Performance tips for large workbooks

As workbook size grows, memory usage becomes a concern. A straightforward approach loads whole sheets into memory, which may cause issues for very large files. A memory-friendly pattern is to process one sheet at a time and write results incrementally. You can also switch to a true streaming approach: open the workbook with openpyxl in read_only mode and write rows with Python's csv module as they are read, so no sheet is ever held fully in memory.

```python
# Stream rows sheet by sheet without loading whole sheets into memory
from openpyxl import load_workbook
import csv

wb = load_workbook("input.xlsx", read_only=True, data_only=True)
for ws in wb.worksheets:
    with open(f"{ws.title}.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for row in ws.iter_rows(values_only=True):
            writer.writerow(row)
```

```python
# Alternative: load one sheet at a time with pandas (simpler, but each
# sheet is held fully in memory while it is converted)
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```
Handling encoding and data types

Excel data can contain a mix of strings, numbers, dates, and missing values. When converting to CSV, decide how to represent missing data and how to handle non-ASCII characters. The standard approach is UTF-8 encoding and, if needed, forcing consistent data types to avoid surprises downstream. You'll often convert columns to strings or apply an explicit dtype, then export. Remember that CSV cannot capture Excel-specific constructs like formulas or charts; only the evaluated values survive.

```python
import pandas as pd

df = pd.read_excel("input.xlsx", sheet_name="Data", engine="openpyxl")
# Normalize all columns to strings for consistent CSV output.
# Note: astype(str) turns missing values into the literal string "nan";
# apply df.fillna("") first if you want blank cells instead.
df = df.astype(str)
df.to_csv("Data_str.csv", index=False, encoding="utf-8-sig")
```

```python
# Preserve numeric types where appropriate (no cast before export)
import pandas as pd

with pd.ExcelFile("input.xlsx", engine="openpyxl") as xls:
    df = pd.read_excel(xls, sheet_name="Data")
df.to_csv("Data_native.csv", index=False, encoding="utf-8")
```
Saving to CSV with proper encoding and dialect

Choosing the right encoding and newline style helps compatibility across systems and tools. UTF-8 without a BOM is typically sufficient, but some Windows tools expect a BOM, which you can produce with encoding="utf-8-sig". The line terminator can also be adjusted (via to_csv's lineterminator argument) if your downstream system requires CRLF or LF. Getting these details right prevents subtle corruption of text data.

```python
import pandas as pd

# Per-sheet export with UTF-8 (no BOM); to produce a single CSV across
# all sheets, concatenate the DataFrames first
sheets = pd.read_excel("input.xlsx", sheet_name=None, engine="openpyxl")
for sheet, data in sheets.items():
    data.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")
```

```python
# Alternative: UTF-8 with BOM (utf-8-sig) for Windows tools
import pandas as pd

xls = pd.ExcelFile("input.xlsx", engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df.to_csv(f"{sheet}_bom.csv", index=False, encoding="utf-8-sig")
```
Practical end-to-end example

Putting it all together, here is a complete script that reads an input workbook and exports each sheet as its own CSV file. This is the typical end-to-end workflow you would embed in an ETL script or a notebook. You can adapt it to produce a single CSV, or to merge sheets with an added Sheet column for traceability; clear, reproducible naming conventions make the output easier to work with downstream.

```python
import pandas as pd
from pathlib import Path

def convert_workbook_to_csv(input_path: str, output_dir: str = ".") -> None:
    """Export every sheet of an .xlsx workbook as its own CSV file."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    xls = pd.ExcelFile(input_path, engine="openpyxl")
    for sheet in xls.sheet_names:
        df = pd.read_excel(xls, sheet_name=sheet)
        df.to_csv(f"{output_dir}/{sheet}.csv", index=False, encoding="utf-8")

if __name__ == "__main__":
    convert_workbook_to_csv("input.xlsx", "output")
```

If you want to combine data from all sheets into a single CSV, adapt the function to append a Sheet column to each DataFrame and then concatenate them before writing.
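As a sketch of that merge variant, the helper below combines a {sheet_name: DataFrame} mapping (the shape returned by read_excel with sheet_name=None) into one DataFrame with a Sheet column; combine_sheets is a hypothetical name, not a pandas API.

```python
import pandas as pd

def combine_sheets(sheets: dict) -> pd.DataFrame:
    """Combine a {sheet_name: DataFrame} mapping into one DataFrame,
    tagging each row with its source sheet for traceability."""
    frames = []
    for name, df in sheets.items():
        tagged = df.copy()
        tagged["Sheet"] = name  # record which sheet each row came from
        frames.append(tagged)
    return pd.concat(frames, ignore_index=True)

# In a real pipeline you would obtain the mapping with:
#   sheets = pd.read_excel("input.xlsx", sheet_name=None, engine="openpyxl")
# and then write one CSV:
#   combine_sheets(sheets).to_csv("combined.csv", index=False, encoding="utf-8")
```

This keeps per-row provenance, which is useful when sheets share columns but represent different periods or regions.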
Steps
Estimated time: 15-25 minutes
1. Define target sheet scope. Decide whether you need a single CSV or one CSV per sheet; this guides script design and memory usage. Tip: Start with a small workbook to test the flow.
2. Set up the Python environment. Create a dedicated virtual environment to isolate dependencies and avoid conflicts. Tip: Use venv or conda to manage environments.
3. Install dependencies. Install pandas and openpyxl to enable read_excel and Excel parsing. Tip: Lock versions if you're building a reproducible pipeline.
4. Write the conversion script. Create a script that reads from input.xlsx and writes CSV files, handling encoding and sheet names. Tip: Comment key decisions for future maintenance.
5. Run and verify. Execute the script and inspect the resulting CSV files for structure and data integrity. Tip: Use head or tail to sample lines quickly.
6. Handle edge cases. Address blank cells, date formats, and non-ASCII characters to ensure portability. Tip: Define a test set with diverse data.
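As a sketch of the edge-case step, the export below handles blank cells, date formatting, and non-ASCII text in one pass; the file name, column names, and sample values are illustrative, not from the original workbook.

```python
import pandas as pd

# Illustrative data standing in for a sheet loaded via read_excel:
# a non-ASCII string, a missing name, and a missing date
df = pd.DataFrame({
    "name": ["Zoë", None, "Ole"],
    "joined": pd.to_datetime(["2023-01-05", "2023-02-10", None]),
})

# na_rep controls how missing cells appear; date_format normalizes dates;
# UTF-8 keeps non-ASCII characters intact
df.to_csv(
    "clean.csv",
    index=False,
    encoding="utf-8",
    na_rep="",               # blank cells instead of the literal string "nan"
    date_format="%Y-%m-%d",  # ISO dates instead of full timestamps
)
```

A quick way to verify the result is to open the CSV with a UTF-8-aware editor and confirm that blanks, dates, and accented characters came through unchanged.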
Prerequisites
Required
- Python 3
- pip
- pandas and openpyxl
Optional
- A virtual environment (venv or conda)
- A small multi-sheet workbook for testing
Commands
| Action | Command |
|---|---|
| Check Python version. If your system uses Python 3, use 'python3 --version'. | python --version |
| Install pandas and openpyxl. Use 'pip3' if your system requires it. | pip install pandas openpyxl |
| Run the conversion script. The script loads Excel and writes one CSV per sheet. | python convert_xlsx_to_csv.py |
| Inline one-liner conversion. Ad-hoc conversion without a script. | python -c 'import pandas as pd; df = pd.read_excel("input.xlsx"); df.to_csv("output.csv", index=False)' |
| Verify the generated CSV. Check the first few lines to validate header and content. | head -n 5 output.csv |
People Also Ask
What is the simplest way to convert XLSX to CSV with Python?
The simplest approach uses pandas: read_excel to load the sheet and to_csv to write a CSV. For multi-sheet workbooks, loop through sheet names and export each as a separate CSV.
How do I convert multiple sheets to separate CSV files?
Load the workbook with ExcelFile and iterate over sheet names, exporting each DataFrame with to_csv. This keeps data organized by sheet name.
Can I preserve non-ASCII text in the CSV?
Yes, export with UTF-8 encoding (encoding='utf-8' or 'utf-8-sig' if BOM is required). This avoids character loss.
What about Excel formulas and date types?
CSV stores values only. Formulas aren’t saved; dates are exported as strings or as numbers depending on your handling.
How can I handle very large Excel files efficiently?
Process one sheet at a time, or use a streaming approach with openpyxl in read_only mode to avoid loading the entire workbook into memory.
Main Points
- Convert XLSX to CSV with pandas using read_excel and to_csv
- Handle single-sheet and multi-sheet workbooks with sheet_name and ExcelFile
- Prefer UTF-8 encoding for broad compatibility
- Process large workbooks one sheet at a time, or stream rows, to save memory
- Validate the output with quick read-back checks
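The read-back check in the last point can be as simple as the sketch below: re-read the exported CSV and confirm shape and headers survived. The helper name validate_roundtrip and the sample data are illustrative.

```python
import pandas as pd

def validate_roundtrip(df: pd.DataFrame, csv_path: str) -> None:
    """Re-read an exported CSV and check that shape and headers survived."""
    roundtrip = pd.read_csv(csv_path, encoding="utf-8")
    assert roundtrip.shape == df.shape, "row/column count changed"
    assert list(roundtrip.columns) == list(df.columns), "headers changed"

# In-memory example standing in for a sheet loaded via read_excel
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Oslo", "Kyiv", "Lima"]})
df.to_csv("output.csv", index=False, encoding="utf-8")
validate_roundtrip(df, "output.csv")
```

Note that dtypes can legitimately differ after a round trip (CSV has no type metadata), so comparing shape and headers is a pragmatic baseline; add value-level checks where exact fidelity matters.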