CSV File Read in Java: A Practical Guide for Developers

Learn how to read CSV files in Java using plain I/O and libraries like OpenCSV and Apache Commons CSV. This guide covers encoding, delimiters, headers, streaming for large files, and mapping rows to Java objects with real code samples.

MyDataTables Team · 5 min read
Quick Answer

Reading a CSV file in Java is a common task for data ingestion and processing. This guide covers plain Java approaches with BufferedReader and modern libraries to reliably parse CSV, handle quotes, delimiters, and encodings, and map rows to POJOs or records for further processing. You will learn practical patterns suitable for large files and streaming data.

Understanding the CSV reading landscape in Java

Reading CSV files in Java is a foundational skill for data pipelines and analytics. This section outlines why CSV remains a first-class data interchange format in Java applications, especially when integrating with databases, message queues, or analytics workflows. You’ll see how different approaches affect performance, memory usage, and error handling. In practice, teams choose between a naive line-split parser for small datasets and streaming-based readers for large files to keep memory footprints predictable. The MyDataTables team emphasizes starting with a clear contract on encoding and delimiters to avoid subtle bugs as data grows.

Java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class CsvIntro {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data.csv");
        try (BufferedReader br = Files.newBufferedReader(path)) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] cols = line.split(",");
                System.out.println(Arrays.toString(cols));
            }
        }
    }
}

This naive approach is quick for tiny files but falls apart with quoted fields, embedded delimiters, or multi-line records.

Plain Java approach: read with BufferedReader

A straightforward way to read a CSV in Java is to use a BufferedReader with a controlled loop. This technique is portable and dependency-free, ideal for quick scripts or simple utilities. The trade-off is that it requires you to handle parsing details like quoted values and escaping yourself, which can be error-prone for real-world CSVs. The example below shows a minimal pattern, followed by a more robust option later in the article.

Java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class PlainCsvReader {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data.csv");
        try (BufferedReader br = Files.newBufferedReader(path)) {
            String header = br.readLine(); // optional: header row
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.split(",");
                // Basic processing
                if (fields.length > 0) {
                    System.out.println("First field: " + fields[0]);
                }
            }
        }
    }
}

Key pitfalls: this method treats every comma as a delimiter and ignores quotes. For data with quoted fields, commas inside quotes, or newlines inside a field, this approach will misparse records.
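The misparse is easy to demonstrate with a single line. The snippet below (with an illustrative name and city) shows that a quoted field containing a comma is broken into two pieces by String.split:

```java
import java.util.Arrays;

public class SplitPitfall {
    public static void main(String[] args) {
        // Logically 3 columns: name, city, age -- but the city is quoted
        // and contains a comma
        String line = "Alice,\"New York, NY\",30";
        String[] cols = line.split(",");
        // split treats every comma as a delimiter, so it reports 4 fields
        // and tears the quoted city value apart
        System.out.println(cols.length);           // 4, not 3
        System.out.println(Arrays.toString(cols));
    }
}
```

Any data source that can emit quoted values makes this failure mode inevitable, which is why the library-based approaches below are preferred for real-world input.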

Reading with a CSV library: OpenCSV example

Using a dedicated CSV library like OpenCSV dramatically reduces parsing errors and handles edge cases such as quoted fields and embedded newlines. This approach is robust for production code and easy to maintain. You’ll typically add the library as a dependency and then read rows with a simple loop or a mapping strategy.

Java
import com.opencsv.CSVReader;
import java.io.FileReader;
import java.util.Arrays;

public class OpenCsvExample {
    public static void main(String[] args) throws Exception {
        // Note: FileReader uses the platform default charset; wrap an
        // InputStreamReader with an explicit charset for predictable results
        try (CSVReader reader = new CSVReader(new FileReader("data.csv"))) {
            String[] row;
            while ((row = reader.readNext()) != null) {
                System.out.println(Arrays.toString(row));
            }
        }
    }
}

Notes:

  • Add the OpenCSV dependency (Maven coordinates: com.opencsv:opencsv) in your build tool configuration
  • This approach handles quoted fields and escaping for you
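OpenCSV can also map rows to beans directly instead of returning String arrays. Below is a sketch assuming OpenCSV 5.x on the classpath, a data.csv whose header row contains name and age columns, and a hypothetical Person bean; adjust the column names to match your file:

```java
import com.opencsv.bean.CsvBindByName;
import com.opencsv.bean.CsvToBeanBuilder;
import java.io.FileReader;
import java.io.Reader;
import java.util.List;

public class OpenCsvBeans {
    // Hypothetical bean; annotations bind fields to header column names
    public static class Person {
        @CsvBindByName(column = "name") public String name;
        @CsvBindByName(column = "age") public int age;
    }

    public static void main(String[] args) throws Exception {
        try (Reader reader = new FileReader("data.csv")) {
            List<Person> people = new CsvToBeanBuilder<Person>(reader)
                    .withType(Person.class)
                    .build()
                    .parse();
            people.forEach(p -> System.out.println(p.name + " is " + p.age));
        }
    }
}
```

This trades a little reflection overhead for type conversion and far less hand-written parsing code.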

Robust parsing with Apache Commons CSV

Apache Commons CSV provides a flexible API for parsing CSV files with headers, custom delimiters, and strict validation. It’s a good choice when you want to express your CSV contract clearly and reuse parsers across projects. The example below demonstrates reading with headers and mapping to a record-like structure.

Java
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;

public class CommonsCsvDemo {
    public static void main(String[] args) throws Exception {
        Path path = Path.of("data.csv");
        // Both reader and parser are closed by try-with-resources
        try (Reader reader = Files.newBufferedReader(path);
             CSVParser csvParser = new CSVParser(reader,
                     CSVFormat.DEFAULT.withFirstRecordAsHeader())) {
            for (CSVRecord record : csvParser) {
                String name = record.get("Name");
                String age = record.get("Age");
                System.out.println(name + " - " + age);
            }
        }
    }
}

Dependencies:

  • Add the Commons CSV dependency (Maven coordinates: org.apache.commons:commons-csv) in your build tool

Handling large CSV files: streaming and memory considerations

When CSV files grow large, loading the entire dataset into memory can be risky. Streaming readers process one line at a time, keeping a small memory footprint. Java’s NIO and streams allow you to process records on the fly, reducing GC pressure and improving latency in data pipelines. Combine a streaming API with proper error handling to skip malformed lines without stopping the entire job.

Java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.Arrays;
import java.util.stream.Stream;

public class StreamingCsv {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("large.csv");
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            lines.skip(1).forEach(line -> {           // skip the header row
                String[] cols = line.split(",");
                // Basic processing for each line
                System.out.println("cols=" + Arrays.toString(cols));
            });
        }
    }
}

Real-world streaming should use a library that supports streaming CSV records to avoid splitting issues. The key is to minimize memory usage while ensuring you can recover from bad lines without terminating the job.
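One reason line-by-line streaming breaks down is that a quoted field may contain a newline, so a logical record can span several physical lines. A minimal sketch of the idea, assuming RFC 4180-style quoting (a record is complete only when its quote characters are balanced); a real library handles this plus escaping:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LogicalRecords {
    // Join physical lines into logical CSV records: a record is complete
    // only when it contains an even number of double-quote characters.
    public static List<String> records(Reader source) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(source)) {
            StringBuilder pending = new StringBuilder();
            int quotes = 0;
            String line;
            while ((line = br.readLine()) != null) {
                if (pending.length() > 0) pending.append('\n');
                pending.append(line);
                quotes += line.chars().filter(c -> c == '"').count();
                if (quotes % 2 == 0) {        // quotes balanced: record done
                    out.add(pending.toString());
                    pending.setLength(0);
                    quotes = 0;
                }
            }
            if (pending.length() > 0) out.add(pending.toString()); // trailing partial
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        String csv = "id,note\n1,\"line one\nline two\"\n2,plain";
        records(new StringReader(csv)).forEach(r -> System.out.println("[" + r + "]"));
    }
}
```

The sample input has four physical lines but only three logical records, because the quoted note field spans a line break.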

Encoding, delimiters, and headers: handling real-world CSVs

CSV files come in many flavors: different delimiters (comma, semicolon), different encodings (UTF-8, UTF-16), and optional header rows. A robust reader should be explicit about encoding and use a parser configured for the expected format. The following example demonstrates reading with UTF-8 and a custom delimiter, with a header line processed as data or skipped as needed.

Java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

public class CustomDelimiter {
    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data.csv");
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String header = br.readLine(); // skip header if present
            String line;
            while ((line = br.readLine()) != null) {
                String[] fields = line.split(";"); // semicolon-delimited
                System.out.println(Arrays.toString(fields));
            }
        }
    }
}

If you expect BOMs or non-UTF8 inputs, wrap the reader with appropriate decoders and test with representative samples.
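A UTF-8 BOM decodes as the character U+FEFF at the start of the stream, and if left in place it silently corrupts the first header name. One minimal JDK-only approach is to peek at the first character and rewind if it is not a BOM; the helper name below is illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BomAwareReader {
    // Wrap a Reader so a leading UTF-8 BOM (decoded as U+FEFF) is skipped
    public static BufferedReader skipBom(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        br.mark(1);                 // remember the stream start
        int first = br.read();
        if (first != 0xFEFF) {      // not a BOM: rewind to the start
            br.reset();
        }
        return br;
    }

    public static void main(String[] args) throws IOException {
        Reader withBom = new StringReader("\uFEFFname,age\nAlice,30");
        BufferedReader br = skipBom(withBom);
        System.out.println(br.readLine());  // header without the BOM
    }
}
```

Without the wrapper, a header lookup for "name" would fail because the stored key is actually "\uFEFFname".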

Mapping CSV rows to Java objects: POJOs and Records

One of the main reasons to parse CSV is to map each row to a Java object. You can implement a simple POJO (or a Java 14+ record) and populate it inside the read loop. This pattern improves type safety and maintainability, especially when downstream components expect strongly-typed data.

Java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class CsvToPojo {
    static class Person {
        String name;
        int age;
        String email;

        Person(String n, int a, String e) {
            this.name = n;
            this.age = a;
            this.email = e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get("data.csv");
        List<Person> people = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(path)) {
            br.readLine(); // skip header
            String line;
            while ((line = br.readLine()) != null) {
                String[] f = line.split(",");
                if (f.length >= 3) {
                    Person p = new Person(f[0], Integer.parseInt(f[1]), f[2]);
                    people.add(p);
                }
            }
        }
        // Print or pass along
        people.forEach(p -> System.out.println(p.name + " (" + p.age + ")"));
    }
}

For a more modern approach, consider Java records to reduce boilerplate.
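A minimal sketch of the record-based variant, assuming Java 16 or later; the fields mirror the POJO above, the sample data is inlined, and the naive comma split still applies only to unquoted data:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CsvToRecord {
    // A record gives an immutable, typed row with no constructor/getter boilerplate
    public record Person(String name, int age, String email) {}

    public static List<Person> parse(BufferedReader br) throws IOException {
        List<Person> people = new ArrayList<>();
        br.readLine();                        // skip header
        String line;
        while ((line = br.readLine()) != null) {
            String[] f = line.split(",");     // assumes no quoted fields
            if (f.length >= 3) {
                people.add(new Person(f[0], Integer.parseInt(f[1]), f[2]));
            }
        }
        return people;
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,age,email\nAlice,30,alice@example.com";
        parse(new BufferedReader(new StringReader(csv)))
                .forEach(p -> System.out.println(p.name() + " (" + p.age() + ")"));
    }
}
```

Because records are immutable, rows can be shared across threads in a pipeline without defensive copying.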

Common pitfalls, testing, and best practices

Readers often run into parsing errors when fields contain commas, quotes, or line breaks. Always validate input against a schema and consider using a library for complex data. Unit tests with representative CSV samples help guard against regressions as formats evolve. Finally, avoid mixing parsing strategies in production; pick a single, well-supported approach and document its limitations.

Java
public class CsvTests {
    // Illustrative in-code checks only; real projects should use a testing
    // framework such as JUnit with representative sample files
    public static void main(String[] args) {
        // Edge cases a parser must survive: a quoted field with an embedded
        // comma, and an escaped (doubled) quote inside a quoted field
        String sample = "name,city\n\"Smith, Alice\",\"New \"\"Big Apple\"\" York\"";
        System.out.println("Sample:\n" + sample);
    }
}

Remember to keep a clear separation between parsing logic and business rules, and monitor parsing performance in production.

Steps

Estimated time: 60-90 minutes

  1. Create a Java project

     Initialize a new Java project with your chosen build tool and set source compatibility to at least Java 11.

     Tip: Use a modular project layout to separate IO, parsing, and mapping logic.

  2. Add dependencies

     Add your CSV parsing library or rely on JDK APIs for simple parsing. Include a test suite and example data.

     Tip: Prefer a library for robustness and future-proofing.

  3. Implement a reader

     Write a class that reads data.csv using a buffered reader or a library, handling headers and errors.

     Tip: Encapsulate CSV logic behind a small API: readAll(), streamAll().

  4. Test reading

     Run tests with sample data that includes edge cases: quoted fields, empty fields, multiline entries.

     Tip: Add test data for edge cases; automate tests.

  5. Extend to POJOs

     Map rows to Java objects or records; implement validation and type conversion.

     Tip: Keep mapping logic isolated from parsing.
Pro Tip: Always specify the encoding when reading files; UTF-8 is the safe default most teams standardize on.
Warning: Avoid using String.split for real CSVs; quoted fields and embedded delimiters require a robust parser.
Note: Test with large files and edge cases to prevent runtime surprises in production.
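The small API suggested in step 3 (readAll(), streamAll()) can be sketched as a thin wrapper over the JDK; the class name is illustrative, and the naive comma split should be swapped for a real parser when quoted data is expected:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SimpleCsvReader {
    private final Path path;

    public SimpleCsvReader(Path path) { this.path = path; }

    // Eager: loads everything into memory, suitable for small files only
    public List<String[]> readAll() throws IOException {
        try (Stream<String[]> rows = streamAll()) {
            return rows.collect(Collectors.toList());
        }
    }

    // Lazy: the caller must close the stream (use try-with-resources)
    public Stream<String[]> streamAll() throws IOException {
        return Files.lines(path, StandardCharsets.UTF_8)
                .skip(1)                        // drop header row
                .map(line -> line.split(","));  // naive split; no quote handling
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".csv");
        Files.writeString(tmp, "name,age\nAlice,30\nBob,25");
        new SimpleCsvReader(tmp).readAll()
                .forEach(r -> System.out.println(String.join(" | ", r)));
    }
}
```

Keeping both entry points on one class lets callers pick eager or streaming access per call site without touching the parsing logic.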

People Also Ask

Do I need a CSV library to read CSV files in Java?

Not strictly. For simple files you can use basic I/O, but robust parsing with quotes and multi-line fields is best achieved with a library such as OpenCSV or Apache Commons CSV.

How do I handle quoted fields and embedded commas?

Quoted fields and embedded commas are tricky with plain string splitting. Use a library that implements RFC 4180 parsing or configure a parser to handle quotes and escapes.

What encoding should I use when reading CSV?

UTF-8 is the recommended default encoding. Always specify the encoding when creating a reader to avoid platform-specific differences.

How can I map CSV rows to Java objects?

Read each line and convert fields to appropriate types, then instantiate a POJO or a Java record. Consider using a mapping utility to reduce boilerplate.

How to read large CSV files efficiently?

Use streaming parsing or process lines one-by-one to avoid loading the entire file into memory. Choose a library that supports streaming.

Main Points

  • Choose a parsing approach that matches file size and complexity.
  • Prefer library-based parsing for reliability and simplicity.
  • Map rows to typed objects to improve downstream processing.
  • Test with edge cases like quotes and multi-line fields.
  • Consider encoding and delimiter configurations upfront.
