Apache Commons CSV + Maven: A Practical Guide for Java

A comprehensive, code-rich guide to using Apache Commons CSV with Maven for reading and writing CSV data in Java, including headers, delimiters, and best practices.

MyDataTables Team
Quick Answer

To use Apache Commons CSV with Maven, declare the commons-csv dependency in your pom.xml, then read or write CSV data with CSVParser and CSVPrinter. Start by choosing a delimiter (comma by default), enabling header support if present, and handling quotes and escapes correctly. This approach keeps parsing robust, testable, and easy to extend with custom formats.

What is Apache Commons CSV and why use it with Maven

Apache Commons CSV provides a simple, robust API for reading and writing CSV data in Java. It handles common edge cases like quoted fields, embedded newlines, and escaped delimiters so you don’t manually parse lines. When you combine it with Maven, you gain repeatable builds and centralized dependency management across teams. The canonical approach is to declare a dependency in pom.xml and then use a few concise classes to parse or generate CSV. In this section, we review core concepts and show a minimal end-to-end example that emphasizes readability and maintainability.

Java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadCsvExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data/products.csv");
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .withIgnoreSurroundingSpaces()
                .parse(Files.newBufferedReader(path))) {
            for (CSVRecord record : parser) {
                String id = record.get("id");
                String name = record.get("name");
                String price = record.get("price");
                System.out.printf("Product %s: %s costs %s%n", id, name, price);
            }
        }
    }
}
  • Core concepts: CSVFormat configuration, header handling, and safe resource management.
  • Variations: use withFirstRecordAsHeader() for header rows, or parse without headers and access by index.

Maven setup: dependency management

Before you can parse or write CSVs, you must add the Apache Commons CSV dependency to your Maven project. The dependency coordinates keep your project aligned with the library across environments and teams. In many teams, a property-driven version is preferred to simplify upgrades:

XML
<dependencies>
  <dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-csv</artifactId>
    <version>${commons.csv.version}</version>
  </dependency>
</dependencies>
  • The version uses a property so you can upgrade in one place. Check your organization’s policy for version management and pinning.
  • If you prefer explicit versions, replace the property with a specific version like 1.x.y, ensuring compatibility with your Java runtime.

In addition, verify that your Maven project is using a compatible Java version and that your IDE refreshes dependencies when pom.xml changes.
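For completeness, the version property referenced above lives in the `<properties>` section of pom.xml. A minimal sketch; the version number shown is illustrative, so check Maven Central for the current release:

```xml
<properties>
  <!-- Single place to bump the library version across the project; 1.10.0 is illustrative -->
  <commons.csv.version>1.10.0</commons.csv.version>
</properties>
```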

Reading CSV data with headers and without headers

Reading CSV data is straightforward once you configure the format. If your file includes a header row, you should enable header mapping to access fields by name. If there is no header, access fields by index. The following examples show both approaches:

Java
// With headers
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .parse(Files.newBufferedReader(Paths.get("data/products.csv")))) {
    for (CSVRecord record : parser) {
        String id = record.get("id");
        String name = record.get("name");
        System.out.printf("%s - %s%n", id, name);
    }
}

// Without headers: use the plain format and access fields by index
try (CSVParser parser = CSVFormat.DEFAULT
        .parse(Files.newBufferedReader(Paths.get("data/products_no_header.csv")))) {
    for (CSVRecord record : parser) {
        String a = record.get(0);
        String b = record.get(1);
        System.out.println(a + ":" + b);
    }
}
  • Notes: withFirstRecordAsHeader() binds column names so you can access fields by name; without any header configuration, every row, including the first, is treated as data and accessed by index. By contrast, withSkipHeaderRecord() tells the parser to skip an existing header row during iteration. If your data mixes quoted fields and embedded newlines, CSVFormat handles these correctly.

Writing CSV with CSVPrinter and headers

Writing CSV data is equally ergonomic with CSVPrinter. You can write headers once and then stream rows, which is ideal for log-like outputs or exporting large datasets. The example below prints to a file; the same approach works with an in-memory Appendable such as StringWriter when you need a small report as a String.

Java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteCsvExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data/output.csv");
        try (CSVPrinter printer = new CSVPrinter(Files.newBufferedWriter(path),
                CSVFormat.DEFAULT.withHeader("id", "name", "price"))) {
            printer.printRecord(101, "Widget", 9.99);
            printer.printRecord(102, "Gadget", 14.5);
        }
    }
}
  • The withHeader() variant writes the header row; you can omit it if you already have headers in your source. CSVPrinter supports printing complex records with quotes and escapes automatically.
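For the in-memory case, a StringWriter can stand in for the file writer. A minimal sketch using the same withHeader configuration:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import java.io.IOException;
import java.io.StringWriter;

public class WriteCsvToStringExample {
    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // The header row is emitted automatically when the format declares one
        try (CSVPrinter printer = new CSVPrinter(out,
                CSVFormat.DEFAULT.withHeader("id", "name"))) {
            printer.printRecord(101, "Widget");
        }
        // The CSV text is now available as a plain String
        System.out.print(out.toString());
    }
}
```

Because CSVPrinter accepts any Appendable, the same printing code serves files, sockets, and in-memory buffers unchanged.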

Custom formats: delimiters, quotes, and escaping

Apache Commons CSV supports customizing delimiters, quote characters, and escape characters to handle unusual data formats. If you consume CSV from systems that use semicolons or pipes, you can configure the formatter accordingly. This flexibility also helps when exporting data to external tools that expect specific formats.

Java
CSVFormat format = CSVFormat.DEFAULT
        .withDelimiter(';')
        .withQuote('"')
        .withEscape('\\')
        .withRecordSeparator("\n");
try (CSVPrinter printer = new CSVPrinter(
        Files.newBufferedWriter(Paths.get("data/delimited.csv")), format)) {
    printer.printRecord(1, "Alice", 12.3);
    printer.printRecord(2, "Bob", 4.56);
}
  • Delimiter control is essential for interoperability. Depending on your data source, you may also need to trim whitespace or ignore surrounding spaces.
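Reading a semicolon-delimited file back uses a matching format object. A minimal sketch, assuming the file written above:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadDelimitedExample {
    public static void main(String[] args) throws IOException {
        // The reader must be configured with the same delimiter the writer used
        CSVFormat semicolon = CSVFormat.DEFAULT.withDelimiter(';');
        try (Reader reader = Files.newBufferedReader(Paths.get("data/delimited.csv"));
             CSVParser parser = semicolon.parse(reader)) {
            for (CSVRecord record : parser) {
                System.out.println(record.get(1)); // second column by index
            }
        }
    }
}
```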

Streaming large CSV files to avoid OOM and improve performance

When CSV files are large, loading the entire file into memory is impractical. Streaming parsing with a buffered reader and a streaming API is the preferred approach. You can process each record as it arrives, perform transformations, and write results incrementally. This approach reduces peak memory usage and makes the pipeline more resilient to data size.

Java
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .parse(Files.newBufferedReader(Paths.get("data/large.csv")))) {
    for (CSVRecord record : parser) {
        // Process each row on the fly
        String status = record.get("status");
        // ... business logic
    }
}
  • For writing, consider streaming builders or flushing intermittently to a destination rather than buffering all rows in memory.
  • If you must accumulate results, use a bounded collection with backpressure to avoid memory pressure.
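The read-transform-write pattern can be sketched as a single streaming pass; the file paths and the "status" column are illustrative assumptions:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamCopyExample {
    // Copies only rows whose "status" column is "active", one record at a time,
    // so peak memory stays constant regardless of input size
    public static long copyActive(Path in, Path out) throws IOException {
        long written = 0;
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(Files.newBufferedReader(in));
             CSVPrinter printer = new CSVPrinter(
                     Files.newBufferedWriter(out),
                     CSVFormat.DEFAULT.withHeader("id", "status"))) {
            for (CSVRecord record : parser) {
                if ("active".equals(record.get("status"))) {
                    printer.printRecord(record.get("id"), record.get("status"));
                    written++;
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        long rows = copyActive(Paths.get("data/large.csv"), Paths.get("data/filtered.csv"));
        System.out.println("Wrote " + rows + " rows");
    }
}
```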

Testing and validation patterns for CSV parsing and writing

Tests are crucial to ensure your CSV handling remains correct across schema changes and format variants. Use a mix of unit tests and property-based tests to cover edge cases: quoted fields, embedded newlines, empty rows, and mixed delimited formats. Validate both parsed values and the generated CSV string against expected outputs.

Java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvTest {
    @Test
    public void testParseWithHeader() throws Exception {
        // parse(...) expects a Reader, so open one explicitly
        try (CSVParser p = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(Files.newBufferedReader(Paths.get("data/products.csv")))) {
            assertTrue(p.iterator().hasNext());
        }
    }
}
  • Tests verify behavior under header presence, escaping, and large rows.
  • Consider snapshot tests for complex records to detect regressions.

Common pitfalls and debugging tips for Apache Commons CSV usage

Even experienced developers encounter pitfalls when parsing CSV data. Common issues include assuming a fixed number of columns, mishandling quotes, or ignoring missing headers. A robust approach is to enable header mapping and validate records against a schema. When debugging, print representative samples and the full header map to confirm field names.

Java
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .withIgnoreEmptyLines()
        .parse(Files.newBufferedReader(Paths.get("data/sample.csv")))) {
    for (CSVRecord record : parser) {
        // Quick validation: record.get throws on unmapped columns,
        // so check isSet first, then check for empty values
        if (!record.isSet("id") || record.get("id").isEmpty()) {
            System.err.println("Missing id on line " + record.getRecordNumber());
        }
    }
}
  • If you encounter parsing errors, enable verbose logging for the CSV library and inspect the failing lines to determine whether the issue is a format mismatch or corrupted data.
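The header map mentioned above is available directly from the parser. A small helper for confirming field names, with the column names in the comment purely illustrative:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;

public class HeaderMapDebug {
    // Returns the parsed header map, e.g. {id=0, name=1, price=2}
    public static Map<String, Integer> headerMap(Reader reader) throws IOException {
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(reader)) {
            return parser.getHeaderMap();
        }
    }
}
```

Printing this map next to a failing record usually reveals typos or unexpected column orders quickly.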

Practical tips for production use and maintenance

In production, prefer a single source of truth for your CSV format: standardized delimiters, consistent header names, and explicit quoting rules. Keep your code resilient by handling exceptions gracefully, and document the expected CSV schema in your repository. Regularly refresh dependencies and test against representative data samples to catch regressions early.
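One lightweight way to establish that single source of truth is a shared CSVFormat constant that every reader and writer references; the class name and column names here are illustrative:

```java
import org.apache.commons.csv.CSVFormat;

public final class CsvFormats {
    // One canonical format for the whole codebase; change it here, not at call sites
    public static final CSVFormat PRODUCTS = CSVFormat.DEFAULT
            .withHeader("id", "name", "price")
            .withIgnoreSurroundingSpaces();

    private CsvFormats() {
        // Utility holder; not meant to be instantiated
    }
}
```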

How to integrate Apache Commons CSV into a larger data pipeline

CSV processing often sits at the edge of data pipelines. A common pattern is to place the CSV parsing component behind a simple interface that accepts a path or stream and returns a list of domain objects or a streaming iterator. This isolation makes it easier to swap the underlying CSV library or add additional data transformations later, without touching downstream components.

Java
public interface CsvReader<T> {
    Iterable<T> read(Path path) throws IOException;
}

public class Product {
    String id;
    String name;
    double price;
    // constructor, getters, setters
}

public class ProductReader implements CsvReader<Product> {
    @Override
    public Iterable<Product> read(Path path) throws IOException {
        // Map each CSVRecord to a Product by header name
        List<Product> products = new ArrayList<>();
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(Files.newBufferedReader(path))) {
            for (CSVRecord r : parser) {
                Product p = new Product();
                p.id = r.get("id");
                p.name = r.get("name");
                p.price = Double.parseDouble(r.get("price"));
                products.add(p);
            }
        }
        return products;
    }
}
  • This approach supports testability and clean separation of concerns in data workflows.

Steps

Estimated time: 60-90 minutes

  1. Create a new Maven project

     Generate a small Maven project to host CSV parsing code. Initialize a basic package structure and a main class you will expand.

     Tip: Use the quickstart archetype to save boilerplate.

  2. Add the commons-csv dependency

     Add the Apache Commons CSV dependency to pom.xml using a version property to simplify upgrades across teams.

     Tip: Coordinate with your build team to align the version policy.

  3. Prepare sample CSV data

     Create a sample CSV under data/ with headers to exercise read/write paths and test edge cases.

     Tip: Include quotes and embedded newlines to test robustness.

  4. Implement CSV reading

     Write a small class that uses CSVFormat.DEFAULT.withFirstRecordAsHeader() to parse records by header name.

     Tip: Wrap IO in try-with-resources to guarantee closure.

  5. Implement CSV writing

     Add a CSVPrinter-based writer that outputs headers and rows with proper quoting.

     Tip: Use withHeader to ensure the output schema matches the input.

  6. Run and verify

     Build, run, and compare parsed results against the expected values in your test data.

     Tip: Run mvn dependency:tree first to confirm dependency resolution.
Pro Tip: Always close CSVParser and CSVPrinter resources with try-with-resources.
Warning: For large files, stream data row-by-row instead of loading entire content into memory.
Note: Prefer explicit headers in CSVFormat to avoid misaligned fields after edits.

Prerequisites

Required

  • Basic knowledge of Java and Maven
  • An IDE or code editor (IntelliJ IDEA, Eclipse, VS Code)
  • Access to the internet to fetch dependencies

Optional

  • JUnit or similar test framework for validation

Commands

  • Create new Maven project: mvn archetype:generate -DarchetypeArtifactId=maven-archetype-quickstart (generates a basic Java project structure suitable for CSV work)
  • Build and package: mvn clean package (compiles sources and creates a runnable JAR in target/)
  • List dependencies: mvn dependency:tree (verify transitive dependencies and ensure compatibility)
  • Run tests: mvn test (execute unit tests for CSV parsing logic)
  • Run a Java class from the built artifact: java -cp target/<your-jar>.jar <MainClass> (manual testing of CSV processing in a running app)

People Also Ask

What is Apache Commons CSV in a sentence?

Apache Commons CSV is a Java library that simplifies reading and writing CSV data with robust handling for headers, quotes, and edge cases. It integrates nicely with Maven for dependency management and project builds.


How do I add Apache Commons CSV to a Maven project?

Include the commons-csv dependency in pom.xml, preferably using a version property to simplify upgrades across environments. Ensure Maven can reach the repository to download the artifact.


Can I parse CSV files with or without a header row?

Yes. Use withFirstRecordAsHeader() when a header exists; otherwise parse by column index. This makes the code resilient to format variations.


How do I customize delimiters or quotes?

Configure CSVFormat withDelimiter(char) and withQuote(char) as needed. This allows interoperability with non-standard CSV formats.


Is Apache Commons CSV suitable for large datasets?

Yes, with streaming parsing and writing. Avoid loading all data into memory; process records incrementally to conserve memory.


Can I map CSV rows to POJOs directly?

You can map rows to POJOs by reading each CSVRecord and constructing objects, or use a helper library to bind fields to object properties.


Main Points

  • Add the commons-csv dependency via Maven
  • Configure CSVFormat for headers, delimiters, and quoting
  • Parse with CSVParser and access by header name
  • Write with CSVPrinter using a header row
  • Test edge cases like embedded newlines and quotes
