Convert Excel to CSV with Python: A Practical Guide
Learn to convert Excel to CSV with Python using pandas. This guide covers single- and multi-sheet workflows, encoding choices, and performance considerations.

To convert Excel to CSV in Python, install pandas and openpyxl, then read the workbook with read_excel and export it with to_csv. For multi-sheet workbooks, iterate through sheet names and save each as a separate CSV. Use encoding='utf-8' and index=False to produce clean, portable CSV files. This approach works on Windows, macOS, and Linux.
Why Python is a natural fit for Excel to CSV workflows
According to MyDataTables, Python has become a natural choice for Excel to CSV pipelines due to readability, a rich ecosystem, and consistent cross‑platform behavior. Teams rely on this language for reproducible data work, from quick one‑off conversions to automated batch jobs. The pandas library provides a simple, expressive API for loading Excel files and exporting CSV output, while engines like openpyxl offer robust support for modern .xlsx files. This section explains why Python excels at this task and sets expectations for what's possible.
- Readability keeps data engineers productive.
- A vast ecosystem means fewer compatibility headaches.
- Cross‑platform scripts run the same on Windows, macOS, and Linux.
- Clear, portable CSV output reduces downstream friction for data consumers.
Quick start: read all sheets and export to CSV
import pandas as pd
xlsx = "input.xlsx"
# Read all sheets as a dictionary of DataFrames
data = pd.read_excel(xlsx, sheet_name=None, engine="openpyxl")
for name, df in data.items():
    df.to_csv(f"{name}.csv", index=False, encoding="utf-8")

Explanation:
- sheet_name=None loads every sheet into a dictionary keyed by sheet name.
- Iterating over items exports each DataFrame as a separate CSV named after the sheet.
- encoding="utf-8" keeps non-ASCII characters (accents, symbols) intact and is widely supported by modern systems.
Common variations:
- If you only need a subset of sheets, pass sheet_name=["Sheet1", "Sheet2"] or a single string.
- For large workbooks, consider reading specific sheets one at a time to avoid peak memory usage.
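The subset variation above can be sketched as follows; the filename demo.xlsx and the sheet names are placeholders, and the workbook is generated inline so the example is self-contained:

```python
import pandas as pd

# Build a small two-sheet workbook so the example runs on its own
with pd.ExcelWriter("demo.xlsx", engine="openpyxl") as writer:
    pd.DataFrame({"a": [1, 2]}).to_excel(writer, sheet_name="Sheet1", index=False)
    pd.DataFrame({"b": [3, 4]}).to_excel(writer, sheet_name="Sheet2", index=False)

# Passing a list to sheet_name returns a dict containing only those sheets
subset = pd.read_excel("demo.xlsx", sheet_name=["Sheet1"], engine="openpyxl")
for name, df in subset.items():
    df.to_csv(f"{name}.csv", index=False, encoding="utf-8")
```

Passing a single string instead (sheet_name="Sheet1") returns a DataFrame directly rather than a dict, so the loop would not be needed.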
Basic conversion: single-sheet Excel to CSV with pandas
Converting a single sheet is the most common case and is straightforward with pandas. The read_excel function loads the specified sheet into a DataFrame, and to_csv writes it to a CSV file. This approach preserves column headers and row values, while giving you full control over encoding and delimiter choices. You can also disable the index column to keep a clean CSV.
import pandas as pd
# Read a single sheet
df = pd.read_excel("input.xlsx", sheet_name="Sheet1", engine="openpyxl")
# Write as CSV without the index
df.to_csv("Sheet1.csv", index=False, encoding="utf-8")

Line-by-line:
- read_excel loads the target sheet; sheet_name can be a string, an int, or None.
- to_csv writes the DataFrame to a CSV file; index=False omits the index column, yielding a clean dataset.
- encoding specifies text encoding for portability across systems.
Alternative: if the sheet names are unknown, you can first inspect them with
pd.ExcelFile("input.xlsx", engine="openpyxl").sheet_names

This yields a list you can loop over.
Handling multiple sheets in a workbook and exporting to separate CSVs
When a workbook contains several sheets, a common pattern is to export each sheet to its own CSV. This keeps downstream data consumers organized and aligns with typical ETL pipelines. The approach shown here opens the workbook once as an ExcelFile and reads one sheet at a time, so only one DataFrame is materialized at any moment rather than every sheet at once. We then iterate through all sheet names, read each sheet into a DataFrame, and write a separate CSV per sheet.
import pandas as pd
path = "workbook.xlsx"
# Create a lightweight loader for the workbook
xls = pd.ExcelFile(path, engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    df.to_csv(f"{sheet}.csv", index=False, encoding="utf-8")

Notes:
- This approach preserves the sheet names as CSV filenames, making it easy to map back to origin data.
- If you want a different destination directory, adjust the to_csv path accordingly.
- For very large workbooks, consider processing one sheet at a time to minimize memory pressure.
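The destination-directory note above can be sketched like this; the directory name csv_out and the workbook book.xlsx are hypothetical, and the workbook is generated inline so the snippet runs on its own:

```python
import os

import pandas as pd

# Self-contained demo workbook with one sheet
pd.DataFrame({"x": [1, 2]}).to_excel("book.xlsx", sheet_name="Data",
                                     index=False, engine="openpyxl")

out_dir = "csv_out"  # hypothetical destination directory
os.makedirs(out_dir, exist_ok=True)

xls = pd.ExcelFile("book.xlsx", engine="openpyxl")
for sheet in xls.sheet_names:
    df = pd.read_excel(xls, sheet_name=sheet)
    # Join the directory and the sheet-derived filename
    df.to_csv(os.path.join(out_dir, f"{sheet}.csv"), index=False, encoding="utf-8")
```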
Handling dates, types, and encoding for CSV fidelity
CSV is a plain text format, so preserving date formats and data types requires explicit handling. Use parse_dates to convert date columns during import, and date_format to control how dates appear in the CSV. You can also provide converters for specific columns to enforce normalization rules. This reduces downstream surprises when CSV is consumed by other tools.
import pandas as pd
# Example with a date column and a numeric column that should not be coerced
df = pd.read_excel("input.xlsx", sheet_name="Sheet1", engine="openpyxl",
                   parse_dates=["DateColumn"])
# Persist a consistent date format in the CSV
df.to_csv("Sheet1.csv", index=False, encoding="utf-8", date_format="%Y-%m-%d")

Best practices:
- Always specify encoding="utf-8" to maximize compatibility, especially for non‑ASCII content.
- If you work across locales, choose your delimiter (comma or semicolon) deliberately and document it for downstream users; some locales treat semicolon-separated CSV as the default.
- For CSVs with many columns, you may want to inspect the data types first to avoid surprises in downstream processing.
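As a sketch of the converters idea mentioned above, the following normalizes a numeric-looking column into zero-padded strings on load; the column name zip and the file codes.xlsx are invented for illustration:

```python
import pandas as pd

# Self-contained workbook whose codes lose their leading zeros as integers
pd.DataFrame({"zip": [501, 10001]}).to_excel("codes.xlsx", index=False,
                                             engine="openpyxl")

# A converter runs per cell on load, enforcing a normalization rule
df = pd.read_excel("codes.xlsx", engine="openpyxl",
                   converters={"zip": lambda v: str(v).zfill(5)})
print(df.dtypes)  # inspect types before exporting
df.to_csv("codes.csv", index=False, encoding="utf-8")
```

Inspecting df.dtypes before the export is a cheap way to catch columns that were coerced to an unexpected type.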
Performance considerations for large Excel files
Excel files with many rows can stress memory and CPU, particularly when loading with read_excel. While pandas loads the entire sheet into memory, you can adopt strategies that reduce peak usage. One approach is to process data in chunks by reading targeted row ranges with skiprows and nrows, then appending to a CSV. Another strategy is to convert the workbook to CSV sheet-by-sheet using a streaming tool or an intermediate conversion step that writes to disk progressively.
import pandas as pd
chunk_size = 100000 # number of rows per chunk
path = "large.xlsx"
with pd.ExcelFile(path, engine="openpyxl") as xls:
    sheet = xls.sheet_names[0]
    out_path = "large.csv"
    first_chunk = True
    for start in range(0, 1000000, chunk_size):  # example range; adapt to actual row count
        # skiprows skips data rows after the header; the header row itself is kept
        df = pd.read_excel(xls, sheet_name=sheet, skiprows=range(1, start + 1),
                           nrows=chunk_size)
        if df.empty:
            break
        mode = "w" if first_chunk else "a"
        header = first_chunk
        df.to_csv(out_path, index=False, encoding="utf-8", mode=mode, header=header)
        first_chunk = False

Notes:
- This approach is not magic: each chunked read still parses the file from the top, so it lowers peak memory rather than total CPU time. Some large Excel workflows benefit from an intermediate conversion to a binary format, then a subsequent pass to CSV.
- If you truly need streaming, consider a workflow that converts to CSV per sheet in separate processes or uses a library designed for out-of-core processing.
- Always measure memory usage with a representative dataset before choosing a strategy.
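One simple way to follow the measurement advice above is the standard library's tracemalloc; this sketch generates a small workbook inline, so the numbers are illustrative only, and the file name sample.xlsx is a placeholder:

```python
import tracemalloc

import pandas as pd

# Generate a modest workbook to measure against
pd.DataFrame({"n": range(5000)}).to_excel("sample.xlsx", index=False,
                                          engine="openpyxl")

# tracemalloc reports Python-level allocations, including pandas objects
tracemalloc.start()
df = pd.read_excel("sample.xlsx", engine="openpyxl")
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"peak Python memory during read: {peak / 1024:.0f} KiB")
```

Run this with a representative slice of your real data before committing to a chunking strategy.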
End-to-end script you can reuse
The practical path is an end-to-end script that handles the common variants: single sheet, multiple sheets, and basic CLI integration. This script uses argparse to accept input, an optional output directory, and flags to convert all sheets or just one. It demonstrates a reusable pattern suitable for automation tasks and pipelines.
import argparse
import os
import pandas as pd
def convert(input_file, output_dir=".", sheet=None, all_sheets=False):
    xls = pd.ExcelFile(input_file, engine="openpyxl")
    sheets = xls.sheet_names if all_sheets else ([sheet] if sheet else [xls.sheet_names[0]])
    base = os.path.splitext(os.path.basename(input_file))[0]
    os.makedirs(output_dir, exist_ok=True)
    for sh in sheets:
        df = pd.read_excel(xls, sheet_name=sh)
        out_path = os.path.join(output_dir, f"{base}_{sh}.csv")
        df.to_csv(out_path, index=False, encoding="utf-8")
    print(f"Wrote CSV for {len(sheets)} sheet(s) to {output_dir}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Convert Excel to CSV using pandas")
    parser.add_argument("input_file", help="Path to Excel workbook (.xlsx)")
    parser.add_argument("--output-dir", default=".", help="Directory to save CSVs")
    parser.add_argument("--sheet", default=None, help="Convert a single sheet by name")
    parser.add_argument("--all-sheets", action="store_true", help="Convert all sheets")
    args = parser.parse_args()
    convert(args.input_file, args.output_dir, args.sheet, args.all_sheets)

This script can be extended with error handling, logging, or additional options like custom date formats and delimiter choices. Adapt the code to your environment and test with representative Excel files before running in production.
Common variations and troubleshooting
- If you encounter a module import error for pandas or openpyxl, install them in your active environment using pip.
- Different Excel files might have leading/trailing spaces in column names; consider cleaning headers before exporting.
- Some Excel cells contain formulas; pandas reads the cached (last-calculated) values, not the underlying formulas. If you need the formulas themselves, use a specialized reader such as openpyxl directly, or a pre-export step in Excel.
- When integrating with other tools, you may want to export with a delimiter other than a comma; pandas supports separators via the to_csv function (sep parameter).
# Custom delimiter example
import pandas as pd
df = pd.read_excel("input.xlsx", sheet_name="Sheet1")
df.to_csv("Sheet1_semicolon.csv", index=False, encoding="utf-8", sep=";")

These variations help you tailor the process to specific data governance and downstream consumption requirements.
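The header-cleaning point from the troubleshooting list above can be as simple as stripping whitespace from column names before export; the sample headers here are invented:

```python
import pandas as pd

# Headers with stray whitespace, as sometimes found in hand-edited workbooks
df = pd.DataFrame({" Name ": ["Ada"], "Age  ": [36]})
df.columns = df.columns.str.strip()  # normalize headers before exporting
df.to_csv("clean.csv", index=False, encoding="utf-8")
```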
Steps
Estimated time: 60-90 minutes
1. Install prerequisites
   Install Python 3.8+ and set up a working environment. Confirm that PATH includes the Python and pip executables. This ensures smooth installation of pandas and openpyxl.
   Tip: Use a virtual environment to isolate project dependencies.
2. Install pandas and openpyxl
   Install the core libraries required for Excel I/O and CSV export. Verify the installation by importing pandas in a Python shell.
   Tip: If you use a project manager, pin versions to avoid breaking changes.
3. Create a simple single-sheet converter
   Write a small Python script that reads a single sheet and writes a CSV. This validates your environment and Python path.
   Tip: Start with a single, known-good file before generalizing.
4. Extend to multiple sheets
   Modify the script to iterate through all sheets, exporting each to its own CSV. This extension scales to workbooks with many sheets.
   Tip: Keep filenames deterministic (e.g., sheet name as the CSV name).
5. Handle dates and encodings
   Add parse_dates and date_format to preserve date values, and use UTF-8 encoding for portability.
   Tip: Always verify a sample of the CSV to confirm formatting.
6. Run and validate
   Execute the script from the CLI and inspect the resulting CSVs. Compare against the original Excel data for accuracy.
   Tip: Automate tests or a small diff to catch regressions.
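The validation step above can be automated with a quick round-trip check; this sketch compares shapes, which catches dropped rows or columns but not subtler type changes, and the file names are placeholders:

```python
import pandas as pd

src = pd.DataFrame({"v": [1, 2, 3]})
src.to_excel("check.xlsx", index=False, engine="openpyxl")

# Convert, then read the CSV back and compare shapes as a sanity check
pd.read_excel("check.xlsx", engine="openpyxl").to_csv("check.csv", index=False,
                                                      encoding="utf-8")
roundtrip = pd.read_csv("check.csv")
assert roundtrip.shape == src.shape
```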
Prerequisites
Required
- Python 3.8+
- pip package manager
- Basic command line knowledge
Commands
| Action | Description | Command |
|---|---|---|
| Install required Python packages | Installs pandas and the Excel engine | pip install pandas openpyxl |
| Check Python version | Ensure Python 3.8+ is available | python --version |
| Convert single sheet | Converts one sheet to a CSV file | python script.py input.xlsx --sheet Sheet1 |
| Convert all sheets | Exports every sheet to separate CSVs | python script.py input.xlsx --all-sheets |
People Also Ask
What library do I need to convert Excel to CSV in Python?
The pandas library is the typical choice. It provides read_excel for loading Excel data and to_csv for exporting CSV files. Supporting engines like openpyxl handle .xlsx files reliably.
Can pandas read both .xlsx and .xls formats?
Pandas can read .xlsx files using the openpyxl engine and .xls files with xlrd or other supported engines. Ensure the correct engine is installed for your file type.
How do I preserve date formats in the CSV?
Use parse_dates to interpret date columns on load and date_format to control CSV presentation. Remember that CSV stores dates as text, so consistent formatting helps downstream users.
Is it possible to convert multiple sheets at once?
Yes. Iterate over the workbook's sheet_names and export each sheet to its own CSV file. This keeps data organized and scalable for workbooks with many sheets.
What if the Excel file is very large and won’t fit in memory?
For very large files, read and write in chunks or per sheet. Pandas doesn’t stream Excel in the same way as CSV, so plan a chunked strategy or convert sheets individually to limit peak memory.
How can I run the script from the command line?
Create a small CLI using argparse that accepts the input file, output directory, and options for all-sheets or a specific sheet. Then run the script with python script.py input.xlsx --all-sheets.
Main Points
- Install pandas and openpyxl to begin
- Read Excel with read_excel, then export with to_csv
- Use encoding='utf-8' and index=False for clean CSVs
- Handle multiple sheets by looping over sheet_names
- Verify results to ensure data fidelity