Apache Commons CSV + Maven: A Practical Guide for Java
A comprehensive, code-rich guide to using Apache Commons CSV with Maven for reading and writing CSV data in Java, including headers, delimiters, and best practices.

To use Apache Commons CSV with Maven, declare the commons-csv dependency in your pom.xml, then read or write CSV data with CSVParser and CSVPrinter. Start by choosing a delimiter (comma by default), enabling header support if present, and handling quotes and escapes correctly. This approach keeps parsing robust, testable, and easy to extend with custom formats.
What is Apache Commons CSV and why use it with Maven
Apache Commons CSV provides a simple, robust API for reading and writing CSV data in Java. It handles common edge cases like quoted fields, embedded newlines, and escaped delimiters so you don’t manually parse lines. When you combine it with Maven, you gain repeatable builds and centralized dependency management across teams. The canonical approach is to declare a dependency in pom.xml and then use a few concise classes to parse or generate CSV. In this section, we review core concepts and show a minimal end-to-end example that emphasizes readability and maintainability.
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadCsvExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data/products.csv");
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .withIgnoreSurroundingSpaces()
                .parse(Files.newBufferedReader(path))) {
            for (CSVRecord record : parser) {
                String id = record.get("id");
                String name = record.get("name");
                String price = record.get("price");
                System.out.printf("Product %s: %s costs %s%n", id, name, price);
            }
        }
    }
}

- Core concepts: CSVFormat configuration, header handling, and safe resource management.
- Variations: use withFirstRecordAsHeader() for header rows, or parse without headers and access by index.
Maven setup: dependency management
Before you can parse or write CSVs, you must add the Apache Commons CSV dependency to your Maven project. The dependency coordinates keep your project aligned with the library across environments and teams. In many teams, a property-driven version is preferred to simplify upgrades:
<dependencies>
    <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-csv</artifactId>
        <version>${commons.csv.version}</version>
    </dependency>
</dependencies>

- The version uses a property so you can upgrade in one place. Check your organization’s policy for version management and pinning.
- If you prefer explicit versions, replace the property with a specific version like 1.x.y, ensuring compatibility with your Java runtime.
In addition, verify that your Maven project is using a compatible Java version and that your IDE refreshes dependencies when pom.xml changes.
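The `${commons.csv.version}` property referenced above lives in the `<properties>` block of the same pom.xml. A minimal sketch; the version shown is a placeholder, so substitute whichever release your policy mandates:

```xml
<properties>
    <!-- Placeholder version: pin to the release your team has approved -->
    <commons.csv.version>1.10.0</commons.csv.version>
</properties>
```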
Reading CSV data with headers and without headers
Reading CSV data is straightforward once you configure the format. If your file includes a header row, you should enable header mapping to access fields by name. If there is no header, access fields by index. The following examples show both approaches:
// With headers
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .parse(Files.newBufferedReader(Paths.get("data/products.csv")))) {
    for (CSVRecord record : parser) {
        String id = record.get("id");
        String name = record.get("name");
        System.out.printf("%s - %s%n", id, name);
    }
}

// Without headers
try (CSVParser parser = CSVFormat.DEFAULT
        .parse(Files.newBufferedReader(Paths.get("data/products_no_header.csv")))) {
    for (CSVRecord record : parser) {
        String a = record.get(0);
        String b = record.get(1);
        System.out.println(a + ":" + b);
    }
}

- Notes: withFirstRecordAsHeader() binds column names so you can access fields by name. For files without a header row, use the default format as-is and access fields by index. withSkipHeaderRecord() is only meaningful in combination with a configured header: it skips the physical header line instead of returning it as data. If your data mixes quoted fields and embedded newlines, CSVFormat handles these correctly.
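When you need a whole row at once, for logging or generic transformations, CSVRecord.toMap() copies a header-mapped record into a plain Map. A small sketch, parsing from an in-memory string for brevity:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;

public class RecordToMapExample {
    public static void main(String[] args) throws IOException {
        String csv = "id,name\n1,Widget\n2,Gadget\n";
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(new StringReader(csv))) {
            for (CSVRecord record : parser) {
                // toMap() copies the record into a header-name -> value map
                Map<String, String> row = record.toMap();
                System.out.println(row.get("id") + " -> " + row.get("name"));
            }
        }
    }
}
```

Because toMap() makes a copy, the map remains valid after the parser advances to the next record.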
Writing CSV with CSVPrinter and headers
Writing CSV data is equally ergonomic with CSVPrinter. You can write headers once and then stream rows, which is ideal for log-like outputs or exporting large datasets. The example below prints to a file; the same API writes to any java.io.Writer, such as a StringWriter for small in-memory reports.
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WriteCsvExample {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data/output.csv");
        try (CSVPrinter printer = new CSVPrinter(Files.newBufferedWriter(path),
                CSVFormat.DEFAULT.withHeader("id", "name", "price"))) {
            printer.printRecord(101, "Widget", 9.99);
            printer.printRecord(102, "Gadget", 14.5);
        }
    }
}

- The withHeader() variant writes the header row once when the printer is created; omit it if the output should not contain a header. CSVPrinter applies quoting and escaping automatically when values require it.
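The in-memory variant is the same API pointed at a StringWriter instead of a file; a minimal sketch:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import java.io.IOException;
import java.io.StringWriter;

public class InMemoryCsvExample {
    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        // CSVPrinter accepts any Appendable, so a StringWriter works for small reports
        try (CSVPrinter printer = new CSVPrinter(out,
                CSVFormat.DEFAULT.withHeader("id", "name"))) {
            printer.printRecord(101, "Widget");
            printer.printRecord(102, "Gadget");
        }
        System.out.print(out);
    }
}
```

This is also a convenient shape for unit tests, since the generated CSV can be asserted on directly as a string.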
Custom formats: delimiters, quotes, and escaping
Apache Commons CSV supports customizing delimiters, quote characters, and escape characters to handle unusual data formats. If you consume CSV from systems that use semicolons or pipes, you can configure the formatter accordingly. This flexibility also helps when exporting data to external tools that expect specific formats.
CSVFormat format = CSVFormat.DEFAULT
        .withDelimiter(';')
        .withQuote('|')
        .withEscape('\\')
        .withRecordSeparator("\n");
try (CSVPrinter printer = new CSVPrinter(Files.newBufferedWriter(Paths.get("data/delimited.csv")), format)) {
    printer.printRecord(1, "Alice", 12.3);
    printer.printRecord(2, "Bob", 4.56);
}

- Delimiter control is essential for interoperability. Depending on your data source, you may also need to trim whitespace or ignore surrounding spaces.
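Reading with a custom format uses the same CSVFormat on the parser side; the delimiter, quote, and escape settings must match whatever the producer used. A sketch, assuming semicolon-separated input supplied as a string:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;

public class SemicolonParseExample {
    public static void main(String[] args) throws IOException {
        String data = "1;Alice;12.3\n2;Bob;4.56\n";
        // Same delimiter the producer used
        CSVFormat format = CSVFormat.DEFAULT.withDelimiter(';');
        try (CSVParser parser = CSVParser.parse(data, format)) {
            for (CSVRecord record : parser) {
                System.out.println(record.get(0) + " -> " + record.get(1));
            }
        }
    }
}
```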
Streaming large CSV files to avoid OOM and improve performance
When CSV files are large, loading the entire file into memory is impractical. Streaming parsing with a buffered reader and a streaming API is the preferred approach. You can process each record as it arrives, perform transformations, and write results incrementally. This approach reduces peak memory usage and makes the pipeline more resilient to data size.
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .parse(Files.newBufferedReader(Paths.get("data/large.csv")))) {
    for (CSVRecord record : parser) {
        // Process each row on the fly
        String status = record.get("status");
        // ... business logic
    }
}

- For writing, stream rows to the destination and flush intermittently rather than buffering all output in memory.
- If you must accumulate results, use a bounded collection with backpressure to avoid memory pressure.
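The read-transform-write loop described above can be sketched as a single streaming pass. The column name "status", the "ACTIVE" value, and the flush interval are all hypothetical placeholders:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamFilterExample {
    // Copies rows whose "status" column equals keepStatus from in to out,
    // one record at a time; flushes periodically so output buffers stay bounded.
    static long filterByStatus(Reader in, Writer out, String keepStatus) throws IOException {
        long written = 0;
        try (CSVParser parser = CSVFormat.DEFAULT.withFirstRecordAsHeader().parse(in);
             CSVPrinter printer = new CSVPrinter(out,
                     CSVFormat.DEFAULT.withHeader("id", "status"))) {
            for (CSVRecord record : parser) {
                if (keepStatus.equals(record.get("status"))) {
                    printer.printRecord(record.get("id"), record.get("status"));
                    if (++written % 10_000 == 0) {
                        printer.flush();
                    }
                }
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        filterByStatus(new StringReader("id,status\n1,ACTIVE\n2,RETIRED\n"), out, "ACTIVE");
        System.out.print(out);
    }
}
```

For real files, pass Files.newBufferedReader and Files.newBufferedWriter instead of the in-memory streams; the method body is unchanged.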
Testing and validation patterns for CSV parsing and writing
Tests are crucial to ensure your CSV handling remains correct across schema changes and format variants. Use a mix of unit tests and property-based tests to cover edge cases: quoted fields, embedded newlines, empty rows, and mixed delimited formats. Validate both parsed values and the generated CSV string against expected outputs.
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.*;
import java.nio.file.Files;
import java.nio.file.Paths;

public class CsvTest {
    @Test
    public void testParseWithHeader() throws Exception {
        try (CSVParser p = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(Files.newBufferedReader(Paths.get("data/products.csv")))) {
            assertTrue(p.iterator().hasNext());
        }
    }
}

- Tests verify behavior under header presence, escaping, and large rows.
- Consider snapshot tests for complex records to detect regressions.
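A round-trip test is a cheap way to cover the quoted-field and embedded-newline cases in one assertion: print a tricky value, re-parse the output, and assert equality. A sketch assuming JUnit 5 is on the test classpath:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;
import java.io.StringReader;
import java.io.StringWriter;

public class CsvRoundTripTest {
    @Test
    public void quotedFieldSurvivesRoundTrip() throws Exception {
        String tricky = "line one\nline two, with comma";
        StringWriter out = new StringWriter();
        try (CSVPrinter printer = new CSVPrinter(out, CSVFormat.DEFAULT.withHeader("note"))) {
            // DEFAULT quotes fields containing delimiters or newlines on output
            printer.printRecord(tricky);
        }
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(new StringReader(out.toString()))) {
            CSVRecord record = parser.iterator().next();
            assertEquals(tricky, record.get("note"));
        }
    }
}
```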
Common pitfalls and debugging tips for Apache Commons CSV usage
Even experienced developers encounter pitfalls when parsing CSV data. Common issues include assuming a fixed number of columns, mishandling quotes, or ignoring missing headers. A robust approach is to enable header mapping and validate records against a schema. When debugging, print representative samples and the full header map to confirm field names.
try (CSVParser parser = CSVFormat.DEFAULT
        .withFirstRecordAsHeader()
        .withIgnoreEmptyLines()
        .parse(Files.newBufferedReader(Paths.get("data/sample.csv")))) {
    for (CSVRecord record : parser) {
        // Quick validation: ensure required fields are present and non-empty.
        // Note: record.get("id") throws IllegalArgumentException if the "id"
        // header is missing entirely, so guard with isSet() first.
        String id = record.isSet("id") ? record.get("id") : "";
        if (id.isEmpty()) {
            System.err.println("Missing id on record " + record.getRecordNumber());
        }
    }
}

- If you encounter parsing errors, add verbose logging around your CSV handling and inspect the failing lines to determine whether the issue is a format mismatch or corrupted data.
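To print the full header map mentioned above, use CSVParser.getHeaderMap(), which returns column name -> index. Stray whitespace in header names, a frequent cause of "column not mapped" errors, shows up immediately. A sketch with an in-memory sample:

```java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import java.io.IOException;
import java.io.StringReader;

public class HeaderMapDebugExample {
    public static void main(String[] args) throws IOException {
        // Note the stray spaces around "name" in the header line
        String csv = "id, name ,price\n1,Widget,9.99\n";
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(new StringReader(csv))) {
            // DEFAULT does not trim header names, so the key is " name ", not "name"
            System.out.println(parser.getHeaderMap());
        }
    }
}
```

Enabling withIgnoreSurroundingSpaces() should make such columns addressable by their trimmed names instead.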
Practical tips for production use and maintenance
In production, prefer a single source of truth for your CSV format: standardized delimiters, consistent header names, and explicit quoting rules. Keep your code resilient by handling exceptions gracefully, and document the expected CSV schema in your repository. Regularly refresh dependencies and test against representative data samples to catch regressions early.
How to integrate Apache Commons CSV into a larger data pipeline
CSV processing often sits at the edge of data pipelines. A common pattern is to place the CSV parsing component behind a simple interface that accepts a path or stream and returns a list of domain objects or a streaming iterator. This isolation makes it easier to swap the underlying CSV library or add additional data transformations later, without touching downstream components.
public interface CsvReader<T> {
    Iterable<T> read(Path path) throws IOException;
}

public class Product {
    String id; String name; double price;
    // constructor, getters, setters
}

public class ProductReader implements CsvReader<Product> {
    @Override
    public Iterable<Product> read(Path path) throws IOException {
        List<Product> products = new ArrayList<>();
        try (CSVParser parser = CSVFormat.DEFAULT
                .withFirstRecordAsHeader()
                .parse(Files.newBufferedReader(path))) {
            for (CSVRecord record : parser) {
                Product p = new Product();
                p.id = record.get("id");
                p.name = record.get("name");
                p.price = Double.parseDouble(record.get("price"));
                products.add(p);
            }
        }
        return products;
    }
}

- This approach supports testability and clean separation of concerns in data workflows.
Steps
Estimated time: 60-90 minutes

1. Create a new Maven project
   Generate a small Maven project to host CSV parsing code. Initialize basic package structure and a main class you will expand.
   Tip: Use the quickstart archetype to save boilerplate.
2. Add commons-csv dependency
   Add the Apache Commons CSV dependency to pom.xml using a version property to simplify upgrades across teams.
   Tip: Coordinate with your build team to align the version policy.
3. Prepare sample CSV data
   Create a sample CSV under data/ with headers to exercise read/write paths and test edge cases.
   Tip: Include quotes and embedded newlines to test robustness.
4. Implement CSV reading
   Write a small class that uses CSVFormat.DEFAULT.withFirstRecordAsHeader() to parse records by header name.
   Tip: Wrap IO in try-with-resources to guarantee closure.
5. Implement CSV writing
   Add a CSVPrinter-based writer that outputs headers and rows with proper quoting.
   Tip: Use withHeader to ensure the output schema matches the input.
6. Run and verify
   Build, run, and compare parsed results against the expected values in your test data.
   Tip: Run mvn dependency:tree first to confirm dependency resolution.
Prerequisites
Required
- Basic knowledge of Java and Maven
- An IDE or code editor (IntelliJ IDEA, Eclipse, VS Code)
- Access to the internet to fetch dependencies
Optional
- JUnit or similar test framework for validation
Commands
| Action | Description | Command |
|---|---|---|
| Create new Maven project | Generates a basic Java project structure suitable for CSV work | mvn archetype:generate -DarchetypeArtifactId=maven-archetype-quickstart |
| Build and package | Compiles sources and creates a runnable JAR in target/ | mvn package |
| List dependencies | Verify transitive dependencies and ensure compatibility | mvn dependency:tree |
| Run tests | Execute unit tests for CSV parsing logic | mvn test |
| Run a Java class from the built artifact | Manual testing of CSV processing in a running app | — |
People Also Ask
What is Apache Commons CSV in a sentence?
Apache Commons CSV is a Java library that simplifies reading and writing CSV data with robust handling for headers, quotes, and edge cases. It integrates nicely with Maven for dependency management and project builds.
How do I add Apache Commons CSV to a Maven project?
Include the commons-csv dependency in pom.xml, preferably using a version property to simplify upgrades across environments. Ensure Maven can reach the repository to download the artifact.
Can I parse CSV files with or without a header row?
Yes. Use withFirstRecordAsHeader() when a header exists; otherwise parse by column index. This makes the code resilient to format variations.
How do I customize delimiters or quotes?
Configure CSVFormat withDelimiter(char) and withQuote(char) as needed. This allows interoperability with non-standard CSV formats.
Is Apache Commons CSV suitable for large datasets?
Yes, with streaming parsing and writing. Avoid loading all data into memory; process records incrementally to conserve memory.
Can I map CSV rows to POJOs directly?
You can map rows to POJOs by reading each CSVRecord and constructing objects, or use a helper library to bind fields to object properties.
Main Points
- Add the commons-csv dependency via Maven
- Configure CSVFormat for headers, delimiters, and quoting
- Parse with CSVParser and access by header name
- Write with CSVPrinter using a header row
- Test edge cases like embedded newlines and quotes