xls to csv python: Practical guide for data teams

Learn how to convert Excel files to CSV with Python using pandas, covering single-sheet and multi-sheet workflows, data-type handling, date parsing, and CLI alternatives for speed.

MyDataTables
MyDataTables Team

Converting an Excel workbook to CSV in Python is straightforward with pandas: read the file with pandas.read_excel, then write each sheet to its own CSV with DataFrame.to_csv. This approach keeps the header row, inferred data types, and missing values intact, and it scales from a single sheet to multi-sheet workbooks with minimal code.

Introduction to xls to csv python

In data workflows, converting Excel workbooks to CSV is a common preprocessing step. Analysts and developers frequently face mixed data types, leading zeros, and date columns that require careful handling. The goal is a portable, readable CSV that preserves the essential structure of the original workbook. This section demonstrates a practical, Python-first approach using pandas, which is well-suited for both quick ad-hoc conversions and production-grade pipelines. You will see a minimal, repeatable pattern you can extend to larger datasets.

Python
# Minimal example: single-sheet Excel to CSV
import pandas as pd

# Read a single sheet from an Excel file
df = pd.read_excel('data.xls', sheet_name='Sheet1')

# Write to CSV without the index column
df.to_csv('data.csv', index=False)

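Notes: pandas infers data types and treats the first row as the header by default. Reading legacy .xls files requires the xlrd package, while .xlsx files are read with openpyxl. You can adjust the delimiter (sep), encoding (encoding), and missing-value text (na_rep) through to_csv parameters for different environments.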
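As a minimal sketch of those to_csv options, the snippet below writes a small in-memory DataFrame as tab-separated text with an explicit missing-value marker. The sample data is hypothetical and stands in for a sheet loaded with read_excel.

```python
import pandas as pd

# Hypothetical sample data standing in for a sheet read with read_excel
df = pd.DataFrame({"name": ["Ana", "Bo"], "score": [9.5, None]})

# sep sets the delimiter; na_rep sets the text written for missing values
csv_text = df.to_csv(index=False, sep="\t", na_rep="NA")
print(csv_text)

# When writing to a file, encoding controls the byte encoding, for example:
# df.to_csv("data.csv", index=False, encoding="utf-8-sig")
```

The commented utf-8-sig variant adds a byte-order mark, which can help spreadsheet applications detect UTF-8. Whether you need it depends on the consumer of the file.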

Handling multiple sheets in a workbook

Excel workbooks often contain several sheets. The simplest approach is to iterate over the workbook's sheet_names and export each sheet as its own CSV. The following patterns show both a per-sheet loop and a bulk approach that returns all sheets as a dictionary.

Python
import pandas as pd

# Pattern 1: ExcelFile for efficient repeated reads
xlsx = pd.ExcelFile('workbook.xls')
for sheet in xlsx.sheet_names:
    df = pd.read_excel(xlsx, sheet_name=sheet)
    df.to_csv(f'{sheet}.csv', index=False)

# Pattern 2: sheet_name=None returns all sheets as a dict
sheets = pd.read_excel('workbook.xls', sheet_name=None)
for name, df in sheets.items():
    df.to_csv(f'{name}.csv', index=False)

Both patterns write one CSV per sheet, named after the sheet. Pattern 1 opens the workbook once and builds one DataFrame at a time, while Pattern 2 holds a DataFrame for every sheet in memory at once.
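Sheet names often contain spaces or punctuation that make awkward filenames. The sketch below is one possible way to handle that: the helper names safe_name and write_sheets and the output directory are assumptions for illustration, not part of pandas.

```python
from pathlib import Path
import re

import pandas as pd


def safe_name(sheet_name: str) -> str:
    """Replace runs of characters that are awkward in filenames with underscores."""
    return re.sub(r"[^\w.-]+", "_", sheet_name).strip("_") or "sheet"


def write_sheets(sheets: dict, out_dir: str) -> list:
    """Write each DataFrame in `sheets` to <out_dir>/<safe sheet name>.csv."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, frame in sheets.items():
        path = target / f"{safe_name(name)}.csv"
        frame.to_csv(path, index=False)
        written.append(path)
    return written


# Usage with the dict returned by sheet_name=None (file name is illustrative):
# sheets = pd.read_excel("workbook.xls", sheet_name=None)
# write_sheets(sheets, "csv_out")
```

Two sheet names that differ only in punctuation would map to the same file in this sketch, so a production version would likely need a collision check.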

Data typing, dates, and encoding considerations

CSV is a plain-text format with no type information, so data types and dates survive only if you handle them deliberately. Numeric precision, leading zeros, and date values can all shift if they are not explicitly managed. You can set dtypes on read, parse dates, and control the output encoding so that downstream systems expecting UTF-8 and consistent columns read the file correctly. The following example shows these practices together.

Python
# Example: preserve IDs as strings to avoid losing leading zeros
import pandas as pd

dtype_map = {'ID': str}
df = pd.read_excel('data.xls', sheet_name='Sheet1', dtype=dtype_map)

# Parse dates explicitly
df['OrderDate'] = pd.to_datetime(df['OrderDate'], errors='coerce')

# Write with UTF-8 encoding and no index
df.to_csv('data.csv', index=False, encoding='utf-8')

If you must maintain numeric fidelity for large datasets, consider reading with specified dtypes and validating a subset of rows before writing.
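The same type inference applies when the CSV is read back, so leading zeros can be lost on the consuming side even when the file itself is correct. The following sketch uses hypothetical in-memory data to show the difference between default inference and an explicit dtype on read.

```python
import io

import pandas as pd

# Hypothetical sample: IDs already held as strings, as with dtype={'ID': str}
df = pd.DataFrame({"ID": ["00123", "04567"], "Amount": [10.5, 20.0]})
csv_text = df.to_csv(index=False)

# Reading back with default inference turns the IDs into integers
inferred = pd.read_csv(io.StringIO(csv_text))

# Reading back with an explicit dtype keeps them as text
preserved = pd.read_csv(io.StringIO(csv_text), dtype={"ID": str})

print(inferred["ID"].tolist())
print(preserved["ID"].tolist())
```

A spot check like this on a few rows is a cheap way to confirm that ID columns round-trip before converting a large workbook.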

CLI options: speed and automation

Beyond Python scripts, CLI tools like csvkit and xlsx2csv offer one-line conversions for automation pipelines or CI jobs. They are especially handy when you want to convert many sheets without writing Python code. Note that xlsx2csv reads only .xlsx files, so legacy .xls workbooks need csvkit's in2csv or a prior conversion to .xlsx.

Bash
# Using csvkit (in2csv) to convert a specific sheet
pip install csvkit
in2csv workbook.xls --sheet "Sheet1" > sheet1.csv

# Using xlsx2csv to export all sheets with a delimiter
# (xlsx2csv reads .xlsx files only, not legacy .xls)
pip install xlsx2csv
xlsx2csv workbook.xlsx -a -d ',' > all_sheets.csv

These CLI options complement a Python-centric approach and are ideal for scripting in data pipelines.
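If you would rather keep the pandas logic but still call it from a shell or CI job, a small argparse wrapper is one option. This is a sketch only: the argument names and the --sheet flag are assumptions chosen for illustration, not an existing tool.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Define the command-line interface for a one-sheet converter."""
    parser = argparse.ArgumentParser(description="Convert one Excel sheet to CSV.")
    parser.add_argument("excel_path", help="input .xls or .xlsx file")
    parser.add_argument("csv_path", help="output CSV file")
    parser.add_argument("--sheet", default=0,
                        help="sheet name (default: first sheet)")
    return parser


def main(argv=None) -> None:
    args = build_parser().parse_args(argv)
    import pandas as pd  # imported late so --help works without pandas

    df = pd.read_excel(args.excel_path, sheet_name=args.sheet)
    df.to_csv(args.csv_path, index=False)


# To use this as a script, call main() under an
# `if __name__ == "__main__":` guard and run, for example:
#   python convert.py workbook.xls sheet1.csv --sheet Sheet1
```

The default of 0 selects the first sheet by position, while any value passed on the command line is treated as a sheet name.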

Validation, testing, and best practices

Finally, validate outputs to catch edge cases such as empty rows or unusual encodings. A quick check helps prevent downstream failures in dashboards or data warehouses. This section shows lightweight validation techniques you can adapt to your workflow; validating early keeps the conversion dependable in production.

Python
import pandas as pd
import hashlib

# Read back the produced CSV to validate basic structure
csv_path = 'data.csv'
df = pd.read_csv(csv_path, nrows=5)  # quick skim
print(df.columns.tolist())

# Simple hash of the first 1024 bytes for quick integrity check
with open(csv_path, 'rb') as f:
    h = hashlib.sha256(f.read(1024)).hexdigest()
print('SHA-256 (first 1024 bytes):', h)

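These checks help you detect encoding issues or malformed headers early in the workflow. Because they sample only the start of the file, they will not catch truncation or stray newlines further down; comparing row counts or hashing the full file covers those cases.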
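For a check that covers the whole file rather than its first bytes, one possible approach is to hash the file in chunks and count parsed records. The helper names sha256_of_file and count_data_rows below are hypothetical, and the sketch uses only the standard library.

```python
import csv
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash the whole file in chunks so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_data_rows(path: str, encoding: str = "utf-8") -> int:
    """Count records after the header; quoted newlines stay inside one record."""
    with open(path, newline="", encoding=encoding) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return sum(1 for _ in reader)
```

Comparing count_data_rows against the length of the DataFrame you wrote is a simple way to spot truncated output, and the full-file hash lets you confirm that a copied file matches the original.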

Steps

Estimated time: 30-60 minutes

  1. Install dependencies

    Install Python 3.8+ and pandas (plus xlrd if you need to read legacy .xls files), then print the pandas version to confirm the environment is ready.

    Tip: Use a virtual environment to avoid version conflicts.
  2. Write a simple converter

    Create a Python script that reads an Excel file and writes a single sheet to CSV. Start with this straightforward path before handling multiple sheets.

    Tip: Explicitly set index=False to avoid an extra CSV column.
  3. Test with a sample workbook

    Run the script on a small workbook and inspect the resulting CSV for header correctness and expected column order.

    Tip: Open the CSV in a text editor to visually inspect delimiters and line endings.
  4. Extend to multiple sheets

    Modify the script to loop through all sheets and export each as a separate CSV, using the sheet name to generate filenames.

    Tip: Use f-strings for clean output filenames.
  5. Add validation

    Read back a few of the produced CSVs to validate columns and spot-check data types.

    Tip: Compute a SHA-256 hash of each file so later copies can be checked against it.
Pro Tip: UTF-8 encoding is recommended to avoid non-ASCII issues.
Warning: Large Excel files can consume substantial memory when loaded entirely.
Note: CSV imports are affected by locale and decimal separators.
Pro Tip: Specify dtype for columns with leading zeros to preserve IDs.
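The locale note above matters most when the CSV is headed for a tool that expects decimal commas. As a hedged sketch with hypothetical sample data, to_csv can write a semicolon-delimited file with a decimal comma through its sep and decimal parameters.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({"item": ["widget", "gadget"], "price": [1.5, 20.25]})

# Semicolon-delimited output with a decimal comma
csv_text = df.to_csv(index=False, sep=";", decimal=",")
print(csv_text)
```

Changing the delimiter along with the decimal character avoids quoting every numeric field, which would be needed if both were commas.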

Keyboard Shortcuts

Action | Description | Shortcut
Copy | Copy selected text in your editor or terminal | Ctrl+C
Paste | Paste copied content into editor | Ctrl+V
Find | Search within a document or terminal output | Ctrl+F
Run Python script | Execute the converter script | python script.py

People Also Ask

What library should I use for xls to csv python?

Pandas is the standard choice for xls to csv python workflows, offering read_excel and to_csv. For CLI-only tasks, csvkit and xlsx2csv provide lightweight alternatives.

Pandas is the go-to for most Excel-to-CSV tasks, with simple read and write methods.

Can I convert multiple sheets at once?

Yes. Load the workbook and iterate over sheet names, exporting each sheet to its own CSV. Pandas can also load all sheets as a dict for fast iteration.

Yes, you can export all sheets in one go by looping through them.

How do I preserve leading zeros in IDs?

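Read the relevant columns as strings (dtype=str), or pass a dtype map to read_excel, so pandas does not convert the values to numbers and drop the leading zeros. This works when the cells are stored as text in the workbook; zeros that exist only through a number format are not part of the stored value.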

Treat IDs as strings to keep leading zeros.

How should I handle dates?

Parse dates with pandas.to_datetime after reading, using errors='coerce' to handle invalid values. This keeps date columns usable in CSV.

Parse dates explicitly to avoid misinterpreting dates.
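A short sketch of that date handling, using hypothetical sample data: invalid entries are coerced to missing values, and the date_format parameter of to_csv fixes how valid dates are written.

```python
import pandas as pd

# Hypothetical sample: one valid date string and one invalid entry
df = pd.DataFrame({"OrderID": [1, 2], "OrderDate": ["2024-01-15", "not a date"]})

# Invalid values become NaT instead of raising an error
df["OrderDate"] = pd.to_datetime(df["OrderDate"], errors="coerce")

# date_format controls how datetimes are written; NaT becomes an empty field
csv_text = df.to_csv(index=False, date_format="%Y-%m-%d")
print(csv_text)
```

Counting the NaT values after coercion is a quick way to see how many rows had unparseable dates before you write the file.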

Are there CLI alternatives?

Yes. Tools like csvkit's in2csv or xlsx2csv can convert Excel sheets to CSV from the command line, ideal for automation.

Yes, CLI tools exist for quick conversions.

What about encoding issues?

Prefer UTF-8 when writing CSVs and ensure source encodings are properly read to avoid garbled text.

UTF-8 encoding is recommended.

Main Points

  • Use pandas to read Excel files efficiently
  • Export each sheet to CSV for clarity
  • Preserve data types with dtype and to_datetime
  • Validate outputs with quick checks
  • CLI options enable fast automation
