
Reading and Writing Text Files in Java

John Lamertina (Deitel Java 5.0 Chp 14, 19, 29) April 2007

Content

- Reading and Writing Data Files (chp 14)
- String Tokenizer to Parse Data (chp 29)
- Comma Separated Value (CSV) Files, an exercise which applies:
  - Multi-dimensional arrays (chp 7)
  - Exception Handling (chp 13)
  - Files (chp 14)
  - ArrayList Collection (chp 19)
  - Tokenizer (chp 29)

Data Hierarchy

- Field: a group of characters or bytes that conveys meaning
- Record: a group of related fields
- File: a group of related records
- Record key: identifies a record as belonging to a particular person or entity; used for easy retrieval of specific records
- Sequential file: a file in which records are stored in order by the record-key field

Reading & Writing Files

Java Streams and Files


- Each file is a sequential stream of bytes
- Operating system provides a mechanism to determine end of file
  - End-of-file marker
  - Count of total bytes in file
- A Java program processing a stream of bytes receives an indication from the operating system when the program reaches end of stream

File - Object - Stream

- Java opens a file by creating an object and associating a stream with it
- Standard streams: each stream can be redirected
  - System.in: standard input stream object, can be redirected with method setIn
  - System.out: standard output stream object, can be redirected with method setOut
  - System.err: standard error stream object, can be redirected with method setErr
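As a small illustration of redirecting a standard stream, the sketch below points System.out at an in-memory buffer with setOut and then restores it. The class name RedirectDemo and the helper capture are made up for this example, and the lambda syntax assumes Java 8 or later (newer than the Java 5 text the slides follow).

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class RedirectDemo {
    // Runs the action with System.out redirected to an in-memory buffer
    // and returns whatever the action printed.
    static String capture(Runnable action) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        System.setOut(new PrintStream(buffer, true));
        try {
            action.run();
        } finally {
            System.out.flush();
            System.setOut(original); // always restore the standard stream
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        String text = capture(() -> System.out.print("hello"));
        System.out.println("Captured: " + text);
    }
}
```

The same pattern applies to setIn and setErr; restoring the original stream in a finally block keeps later output from silently disappearing into the buffer.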


Classes related to Files

- java.io classes
  - FileInputStream and FileOutputStream: byte-based I/O
  - FileReader and FileWriter: character-based I/O
  - ObjectInputStream and ObjectOutputStream: used for input and output of objects or variables of primitive data types
  - File: useful for obtaining information about files and directories
- Classes Scanner and Formatter
  - Scanner: can be used to easily read data from a file
  - Formatter: can be used to easily write data to a file


File Class

Common File methods
- exists: returns true if file exists where it is specified
- isFile: returns true if File is a file, not a directory
- isDirectory: returns true if File is a directory
- getPath: returns file path as a string
- list: retrieves contents of a directory
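The File methods listed above can be tried out with a short sketch like the following. The class name FileInfoDemo and the helper describe are illustrative only, and the scratch file is created in the system temporary directory purely for demonstration.

```java
import java.io.File;
import java.io.IOException;

class FileInfoDemo {
    // Classifies a path using exists, isDirectory and isFile.
    static String describe(File f) {
        if (!f.exists()) return "missing";
        if (f.isDirectory()) return "directory";
        if (f.isFile()) return "file";
        return "other";
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("demo", ".txt"); // scratch file for the demo
        temp.deleteOnExit();
        System.out.println(temp.getPath() + " -> " + describe(temp));

        File dir = temp.getParentFile();
        System.out.println(dir.getPath() + " -> " + describe(dir));

        String[] names = dir.list(); // null if the directory cannot be read
        System.out.println("entries: " + (names == null ? 0 : names.length));
    }
}
```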


Write with Formatter Class

Formatter class can be used to open a text file for writing
- Pass name of file to constructor
- If file does not exist, it will be created
- If file already exists, contents are truncated (discarded)
- Use method format to write formatted text to file
- Use method close to close the Formatter object (if method not called, OS normally closes file when program exits)
- Example: see figure 14.7 (p 686-7)
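The steps above might look roughly like this in code. This is a minimal sketch and not the textbook's figure 14.7: the class name FormatterWriteDemo, the sample records, and the file name clients.txt are all assumptions. It uses try-with-resources (Java 7+) so that close is called automatically.

```java
import java.io.FileNotFoundException;
import java.util.Formatter;

class FormatterWriteDemo {
    // Writes two records to fileName, replacing any existing contents.
    // Returns false if the file could not be opened for writing.
    static boolean writeRecords(String fileName) {
        // The constructor creates the file, or truncates it if it exists.
        try (Formatter output = new Formatter(fileName)) {
            output.format("%d %s %s%n", 100, "Bob", "Jones");
            output.format("%d %s %s%n", 200, "Sue", "Doe");
            return true;
        } catch (FileNotFoundException e) {
            System.err.println("Cannot create or open " + fileName);
            return false;
        } catch (SecurityException e) {
            System.err.println("No write permission for " + fileName);
            return false;
        }
    }

    public static void main(String[] args) {
        // "clients.txt" is a hypothetical file name
        System.out.println(writeRecords("clients.txt") ? "written" : "failed");
    }
}
```

Calling writeRecords twice on the same file leaves only one copy of the records, because the second call truncates what the first one wrote.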

Possible Exceptions
- SecurityException: occurs when opening file using Formatter object, if user does not have permission to write data to file
- FileNotFoundException: occurs when opening file using Formatter object, if file cannot be found and new file cannot be created
- NoSuchElementException: occurs when invalid input is read in by a Scanner object
- FormatterClosedException: occurs when an attempt is made to write to a file using an already closed Formatter object
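To see one of these exceptions in isolation, the sketch below triggers FormatterClosedException by writing after close. It writes to an in-memory StringBuilder rather than a file so it can run anywhere; the class and method names are invented for the example.

```java
import java.util.Formatter;
import java.util.FormatterClosedException;

class ClosedFormatterDemo {
    // Returns true if writing to a closed Formatter raises
    // FormatterClosedException, as described above.
    static boolean writeAfterCloseFails() {
        Formatter output = new Formatter(new StringBuilder()); // in-memory destination
        output.format("%s%n", "first record");
        output.close();
        try {
            output.format("%s%n", "second record"); // Formatter is already closed
            return false;
        } catch (FormatterClosedException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Exception raised: " + writeAfterCloseFails());
    }
}
```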


Read with Scanner Class

Scanner object can be used to read data sequentially from a text file
- Pass File object representing file to be read to Scanner constructor
- FileNotFoundException occurs if file cannot be found
- Data read from file using same methods as for keyboard input: nextInt, nextDouble, next, etc.
- IllegalStateException occurs if attempt is made to read from closed Scanner object
- Example: see Figure 14.11 (p 690-1)
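A rough sketch of reading with Scanner follows; it is not the textbook's Figure 14.11. It assumes a made-up record layout of an integer account number followed by a one-word name, and the names ScannerReadDemo, readRecords, and clients.txt are hypothetical. The parsing method takes a Scanner so the same code works for a file or for a string.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Scanner;

class ScannerReadDemo {
    // Reads records of the form "<int account> <name>" until input runs out.
    // Stops at the first malformed record and returns what was read so far.
    static List<String> readRecords(Scanner input) {
        List<String> records = new ArrayList<String>();
        try {
            while (input.hasNext()) {
                int account = input.nextInt(); // same methods as keyboard input
                String name = input.next();
                records.add(account + ":" + name);
            }
        } catch (NoSuchElementException e) {
            // InputMismatchException (wrong type) is a subclass of this
            System.err.println("Improperly formed record; stopping.");
        }
        return records;
    }

    public static void main(String[] args) {
        String fileName = args.length > 0 ? args[0] : "clients.txt"; // hypothetical file
        try (Scanner input = new Scanner(new File(fileName))) {
            for (String record : readRecords(input))
                System.out.println(record);
        } catch (FileNotFoundException e) {
            System.err.println("Cannot open " + fileName);
        }
    }
}
```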

Tokens: Fields of a Record

- Tokenization breaks a statement, sentence, or line of data into individual pieces
- Tokens are the individual pieces
  - Words from a sentence
  - Keywords, identifiers, operators from a Java statement
  - Individual data items or fields of a record (that were separated by white space, tab, new line, comma, or other delimiter)
String Tokenizer

String Classes
- Class java.lang.String
- Class java.lang.StringBuffer
- Class java.util.StringTokenizer


StringTokenizer

- Breaks a string into component tokens
- Default delimiters: " \t\n\r\f" (space, tab, new line, return, or form feed)
- Specify other delimiter(s) at construction or in method nextToken:

    String delimiter = " ,\n";
    StringTokenizer tokens = new StringTokenizer(sentence, delimiter);

  -or-

    String newDelimiterString = "|,";
    tokens.nextToken(newDelimiterString);


Example 29.18
import java.util.Scanner;
import java.util.StringTokenizer;

public class TokenTest {
    public static void main (String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.println("Enter a sentence to tokenize and press Enter:");
        String sentence = scan.nextLine();

        // default delimiter is " \t\n\r\f"
        String delimiter = " ,\n";
        StringTokenizer tokens = new StringTokenizer(sentence, delimiter);
        System.out.printf("Number of elements: %d\n", tokens.countTokens());

        System.out.println("The tokens are:");
        while (tokens.hasMoreTokens())
            System.out.println(tokens.nextToken());
    }
}

(Refer to p 1378)

Comma Separated Value (CSV) Data Files


- Fields are separated by commas
- For data exchange between disparate systems
- Pseudo standard used by Microsoft Excel and other systems

Comma Separated Values


CSV File Format Rules


1. Each record is one line
2. Fields are separated by comma delimiters
3. Leading and trailing white space in a field is ignored unless the field is enclosed in double quotes
4. First record in a CSV may be a header of field names. A CSV application needs some boolean indication of whether first record is a header.
5. Empty fields are indicated by consecutive comma delimiters. Thus every record should have the same number of delimiters
6. Fields with embedded commas must be enclosed in double quotes

For more information: http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm



CSV Format vs StringTokenizer

StringTokenizer with a comma delimiter will read most CSV files, but does not account for empty fields or a quoted field with embedded commas:
- Empty fields in a CSV file are indicated by consecutive commas. Example: 123, John ,, Doe (Middle Name field is blank)
- Fields with embedded commas are enclosed in quotes. Example: 456, "King, the Gorilla", Kong
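The empty-field limitation can be checked directly. The sketch below counts the fields that StringTokenizer reports for a record and compares that with String.split using a negative limit, which is one alternative (an addition here, not something the slides propose) that keeps empty fields. Neither approach understands quoted fields. The class and method names are illustrative.

```java
import java.util.StringTokenizer;

class EmptyFieldDemo {
    // StringTokenizer treats consecutive delimiters as one,
    // so an empty field between two commas is silently dropped.
    static int countWithTokenizer(String record) {
        return new StringTokenizer(record, ",").countTokens();
    }

    // split with a negative limit keeps empty fields,
    // including trailing ones.
    static int countWithSplit(String record) {
        return record.split(",", -1).length;
    }

    public static void main(String[] args) {
        String record = "123, John ,, Doe"; // Middle Name field is blank
        System.out.println("StringTokenizer fields: " + countWithTokenizer(record));
        System.out.println("split fields:           " + countWithSplit(record));
    }
}
```

For the record with the blank middle name, the tokenizer reports three fields while split reports four, so downstream code that indexes fields by position would read the wrong column after tokenizing.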

Exercise Part 1

Develop and test classes to read and write CSV data files, satisfying the first four CSV File Format Rules (listed on a previous slide). Your completed classes must:
- Handle the usual possible file exceptions
- Read CSV-formatted data from one or more files into a single array
- Print the data array
- Write data from the array to a single file in CSV format

Test your CSV reader to read and print sample files:
- TestFile1.csv
- TestFile2.csv

Multi-dimensional Arrays
Java implements multi-dimensional arrays as arrays of 1-dimensional arrays. Rows can actually have different numbers of columns. Example:

int b[][];
b = new int[ 2 ][ ];     // create 2 rows
b[ 0 ] = new int[ 5 ];   // create 5 columns for row 0
b[ 1 ] = new int[ 3 ];   // create 3 columns for row 1

(Refer to p 311-315)

Array Dimension: Length

Recall that for a one-dimensional array:

int a[ ] = new int[ 10 ];
int size = a.length;

For a two-dimensional array:

int b[][] = new int[ 10 ][ 20 ];
int size1 = b.length;        // number of rows
int size2 = b[ i ].length;   // number of cols for i-th row
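Because each row carries its own length, a loop over a two-dimensional array should ask every row for its size rather than assume a fixed column count. The following sketch totals the elements of a ragged array; the class and method names are made up for the example.

```java
class RaggedArrayDemo {
    // Counts all elements by asking each row for its own length.
    static int countElements(int[][] table) {
        int total = 0;
        for (int i = 0; i < table.length; i++)   // table.length = number of rows
            total += table[i].length;            // columns in the i-th row
        return total;
    }

    public static void main(String[] args) {
        int[][] b = new int[2][];
        b[0] = new int[5];   // row 0 has 5 columns
        b[1] = new int[3];   // row 1 has 3 columns
        System.out.println("Total elements: " + countElements(b));
    }
}
```

This matters for CSV data, where a record with missing or extra fields produces a row that is shorter or longer than its neighbors.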


TestFile1.csv
987, Thomas ,Jefferson,7 Estate Ave.,Loretto, PA, 15940
413, Martha,Washington,1600 Penna Ave,Washington, DC,20002
123, Martin , Martina ,777 Williams Ct.,Smallville, PA,15990
990, Shelby, Roosevelt,15 Jackson Pl,NYC,NY, 12345

TestFile2.csv
ID, FName, LName, StreetAddress, City, State, Zip
123, John ,Dozer,120 Main st.,Loretto, PA, 15940
107, Jane,Washington,220 Hobokin Ave.,Philadelphia, PA,0911
123, William , Adams ,120 Jefferson St.,Johnstown, PA,15904
451, Brenda, Bronson,127 Terrace Road,Barrows,AK, 99789
729, Brainfield,Blanktowm, PA, 16600


Exercise Part 2

- Develop an application that uses your CSV reader and writer classes
- Read the test files (or create your own test files) and perform data validity checks by displaying an appropriate error message and the offending record(s):
  - If any fields are missing
  - If extra fields are found
  - If any records have duplicate IDs
  - If any record has an invalid zip code (i.e. not exactly 5 digits)
- Write all records to a single CSV file (i.e. concatenate the multiple test files in a single file)
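Two of these checks can be sketched as small helper methods. This is only one possible starting point, not a full solution: the class name RecordChecks and both method names are hypothetical, and the sketch assumes fields are trimmed before comparison, in line with the white space rule for unquoted fields.

```java
import java.util.HashSet;
import java.util.Set;

class RecordChecks {
    // A zip code is treated as valid only if it is exactly five digits
    // once surrounding white space is removed.
    static boolean isValidZip(String zip) {
        return zip.trim().matches("\\d{5}");
    }

    // Returns true if two rows share the same trimmed value in idColumn.
    // Rows too short to contain idColumn are skipped.
    static boolean hasDuplicateIds(String[][] data, int idColumn) {
        Set<String> seen = new HashSet<String>();
        for (String[] row : data) {
            if (row.length <= idColumn) continue;
            if (!seen.add(row[idColumn].trim())) return true; // add fails on a repeat
        }
        return false;
    }

    public static void main(String[] args) {
        String[][] data = {
            {"123", " John ", "Dozer"},
            {"107", " Jane", "Washington"},
            {"123", " William ", " Adams "}
        };
        System.out.println("Zip ' 15940' valid: " + isValidZip(" 15940"));
        System.out.println("Zip '0911' valid:   " + isValidZip("0911"));
        System.out.println("Duplicate IDs:      " + hasDuplicateIds(data, 0));
    }
}
```

A complete application would also report which records failed each check, as the exercise requires.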


Exercise Part 3 (extra credit)


Extend your classes to be fully compliant with the CSV File Format Rules. Hint: Review some existing CSV Java libraries online.


Hints 1.a

CSVFile

  boolean hasHeaderRow;
  String fileName;
  Scanner input;
  List<String> records;
  String data[][];
  int numRecords;
  int maxNumFields;

+ CSVFile(String fileName)
+ CSVFile(boolean headerRow, String fileName)
+ boolean getHasHeaderRow()
+ String getFileName()
+ int getNumRecords()
+ int getMaxNumFields()
+ void getData(String a[][])
+ void openFile()
+ void readRecords()
+ void parseFields()
+ void printData()

Hints 1.b
import java.io.File;
import java.util.Scanner;
import java.io.FileNotFoundException;
import java.lang.IllegalStateException;
import java.util.NoSuchElementException;
import java.util.List;
import java.util.ArrayList;
import java.util.StringTokenizer;


Hints 1.c
public void openFile() {
    try {
        input = new Scanner(new File(fileName));
    }
    catch (FileNotFoundException fileNotFound) {

...

public void readRecords() {
    // Read all lines (records) from the file into an ArrayList
    records = new ArrayList<String>();
    try {
        while (input.hasNext())
            records.add( input.nextLine() );

...

Hints 1.d
public void parseFields() {
    String delimiter = ",\n";

    // Create two-dimensional array to hold data (see Deitel, p 313-315)
    int rows = records.size();   // #rows for array = #lines in file
    data = new String[rows][];   // create the rows for the array
    int row = 0;

    for (String record : records) {
        StringTokenizer tokens = new StringTokenizer(record,delimiter);
        int cols = tokens.countTokens();
        data[row] = new String[cols];   // create columns for current row
        int col = 0;
        while (tokens.hasMoreTokens()) {
            data[row][col] = tokens.nextToken();
            col++;
        }


Hints 1.e
public static void main (String[] args) {
    CSVFile file1 = new CSVFile(true,"TestFile1.csv");
    file1.openFile();
    file1.readRecords();
    file1.parseFields();
    file1.printData();

    String fileData[][] = new String[file1.getNumRecords()][file1.getMaxNumFields()];
    file1.getData(fileData);


CSV Libraries

- http://ostermiller.org/utils/CSV.html
- http://opencsv.sourceforge.net/
