File formats

Here is a brief introduction to some of the ways that data files can be formatted.

ASCII vs. binary

ASCII files can be viewed with a simple text editor, such as the "notepad" program. Binary files can only be read intelligibly by a program designed for the specific format used.

Binary files make up for the inconvenience of needing a special program to read them by being much more compact than ASCII files. For example, an image in an ASCII PostScript file requires 3 to 20 times as much disk space as the same image in a binary image file, such as a .gif or .jpeg file.
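
As a rough illustration of the size difference, the sketch below encodes the same data both ways in memory. The pixel values are made up for the example, and the comparison covers only the encoding overhead; real image formats such as .gif and .jpeg also compress the data, which this sketch does not attempt.

```python
# Hypothetical sample data: 1000 pixel intensities in the range 0-255.
pixels = [(i * 7) % 256 for i in range(1000)]

# ASCII encoding: decimal numbers separated by spaces, readable in a text editor.
ascii_bytes = " ".join(str(p) for p in pixels).encode("ascii")

# Binary encoding: exactly one byte per pixel, not readable as text.
binary_bytes = bytes(pixels)

def decode_ascii(data):
    """Recover the pixel list from the ASCII encoding."""
    return [int(token) for token in data.decode("ascii").split()]

def decode_binary(data):
    """Recover the pixel list from the binary encoding."""
    return list(data)

print(len(ascii_bytes), len(binary_bytes))
```

Both encodings decode back to the same list, but the ASCII form needs one byte per digit plus a separator, while the binary form needs a single byte per value.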

For small amounts of data or information that users may wish to access directly, an ASCII file is an excellent choice. For this class, it is recommended that you use ASCII files at all times.

Fixed format files

One of the easiest file formats for a programmer to work with is a fixed format file. This is a file which always contains the same type of information at the same place in the file. Thus the code to read such a file is a simple succession of read statements. The disadvantage of a fixed file format is that it is inflexible: there is no built-in way to change the type of information in the file and still be compatible with the original program.
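
A minimal sketch of reading a fixed format file follows. The layout (a title line, then a temperature, then a step count) is invented for illustration, and an in-memory stream stands in for a real file.

```python
import io

def read_fixed(stream):
    """Read a hypothetical three-line fixed format: title, temperature, steps."""
    # Each read assumes a specific item at a specific position in the file.
    title = stream.readline().strip()
    temperature = float(stream.readline())
    steps = int(stream.readline())
    return title, temperature, steps

# io.StringIO behaves like an open text file; open("name.txt") would work the same way.
sample = io.StringIO("water simulation\n298.15\n5000\n")
title, temperature, steps = read_fixed(sample)
print(title, temperature, steps)
```

If a later version of the file added a line or reordered these three, this reader would fail or silently misread the data, which is the inflexibility described above.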

Record format

Another way to structure a file is as a sequence of records, each of which is identified by a keyword. The program reading the file identifies the keyword, then calls the routine that reads that particular type of information. The program can be designed to skip to the next keyword if it finds an unknown one. This lets several programs put different types of information in the same file format and still be compatible with older programs (which just skip over the unknown data types).
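
The idea can be sketched as follows, assuming a made-up format in which every record is one line that starts with a keyword. The keywords and the handler table are hypothetical.

```python
def read_records(lines):
    """Parse one-line records of the form 'KEYWORD payload'."""
    # Map each known keyword to the routine that reads its payload.
    handlers = {
        "TITLE": lambda text: text,
        "TEMPERATURE": float,
        "STEPS": int,
    }
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        keyword, _, payload = line.partition(" ")
        handler = handlers.get(keyword)
        if handler is None:
            continue  # unknown record type: skip to the next keyword
        result[keyword] = handler(payload.strip())
    return result

# PRESSURE is not a keyword this reader knows, so that record is skipped.
data = read_records(["TITLE water box", "PRESSURE 1.0", "STEPS 5000"])
print(data)
```

Because unknown keywords are skipped rather than treated as errors, a newer program can add record types without breaking this reader.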

A variation on a record-based file is a hierarchical file format. This is simply a record-based file which allows records to be placed inside of other records. This can be a more natural way to store information for some applications.
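
One way to read nested records is with a recursive routine. The sketch below assumes an invented syntax in which "BEGIN name" opens a nested record and "END" closes it; real hierarchical formats use their own delimiters.

```python
def parse_block(lines, start=0):
    """Parse lines from index `start` up to the matching END.

    Returns (record, next_index), where record is a dict that may
    contain further dicts for nested records.
    """
    record = {}
    i = start
    while i < len(lines):
        parts = lines[i].split(None, 1)
        i += 1
        if not parts:
            continue  # blank line
        keyword = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "END":
            break  # close the current record
        if keyword == "BEGIN":
            # A nested record: recurse, then resume after its END.
            record[rest], i = parse_block(lines, i)
        else:
            record[keyword] = rest
    return record, i

lines = [
    "BEGIN molecule",
    "NAME water",
    "BEGIN atom",
    "ELEMENT O",
    "END",
    "END",
    "UNITS angstrom",
]
tree, _ = parse_block(lines)
print(tree)
```

In this simple sketch, a repeated name at the same level overwrites the earlier entry; a fuller reader would collect such records in a list.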

Counters

A file may or may not include counters. A counter is simply an integer which is read from the file before the data is read. This number tells the program how much data is going to be read next. This may seem like a trivial point, but it affects how the data can be stored in the program. If the program knows how much data is going to be read, it can allocate the correct amount of memory before reading in the data and use a loop to read the data.
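
A short sketch of reading counted data, assuming a hypothetical layout where the first line holds the counter and each following line holds one number:

```python
import io

def read_counted(stream):
    """Read a counter, then exactly that many floating-point values."""
    count = int(stream.readline())
    values = [0.0] * count  # storage allocated up front, using the counter
    for i in range(count):
        values[i] = float(stream.readline())
    return values

# io.StringIO stands in for an open text file.
sample = io.StringIO("3\n1.0\n2.5\n-4.0\n")
values = read_counted(sample)
print(values)
```

Without the counter, the reader would have to grow its storage as it goes, or scan the file once just to count the entries.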

Random access files

Many programming languages have a built-in facility to create random access files. A random access file is designed to be used when not all of the data is to be read into memory at once. In a typical (sequential) file, the data is read or written in precisely the order in which it appears in the file. In a random access file, any given record can be read at any time.
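
A common way to get random access is to make every record the same size, so that the position of record n can be computed directly. The sketch below assumes a made-up record of one 32-bit integer and one 64-bit float, and uses an in-memory binary stream in place of a file opened in binary mode.

```python
import io
import struct

# Hypothetical record layout: little-endian int32 id followed by a float64 value.
RECORD = struct.Struct("<id")

def write_records(stream, records):
    """Write (id, value) pairs one after another as fixed-size records."""
    for rec_id, value in records:
        stream.write(RECORD.pack(rec_id, value))

def read_record(stream, index):
    """Jump straight to record `index` and read only that record."""
    stream.seek(index * RECORD.size)
    return RECORD.unpack(stream.read(RECORD.size))

stream = io.BytesIO()
write_records(stream, [(1, 0.5), (2, 1.5), (3, 2.5)])

# Read the last record first, then the first, without touching the middle one.
last = read_record(stream, 2)
first = read_record(stream, 0)
print(last, first)
```

The seek call is what distinguishes this from sequential reading: the program moves to a computed byte offset rather than reading everything that comes before the record it wants.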