Data Structures

This page provides a review of some of the terminology surrounding variables and data types. Some of these may not be used for this class but are included here for the sake of completeness. The C++ syntax is provided for some of these types.

Elementary data types

Elementary data types are the simplest pieces of data that can be manipulated. With the exception of the "char" type, the number of bytes used to represent each of these types depends upon the specific computer hardware being used. Here are some of the elementary data types in C++.

char A single byte of information, 8 bits long, which represents a single ASCII character.
int denotes an integer variable which can only contain whole numbers. The designations short int and long int are used to specify integers of a smaller or larger number of bytes respectively. unsigned int denotes an integer which can only hold zero or positive values.
float denotes a floating point number consisting of a number with a decimal point times 10 raised to some power. This is the same as the "real" type in Pascal.
double denotes a floating point number with a higher precision due to using twice as many bytes as are used for a variable of type "float".
unsigned char, in spite of its name, is used to hold a single byte that is to be used purely as 8 binary digits without representing any character, number or address. C and C++ have bitwise operators for manipulating binary data directly just as could be done in assembly language.
word is not a keyword in standard C++, but some compilers and libraries define it as a raw binary value of a fixed size chosen by that compiler or library.
long is shorthand for "long int": an integer that uses at least as many bytes as an "int".
pointers A pointer contains the memory address of where a piece of data is stored. Thus it points to where the data is. A variable that is declared with an asterisk in front of the name is a pointer to the location of a piece of data of that type. For example

	int	*current_month;

creates a variable named "current_month" which can be set to point to an integer. Changing where current_month points does not change that integer; the integer changes only if a new value is assigned through the pointer, as in *current_month = 5.
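
The difference between changing a pointer and changing the integer it points to can be sketched as follows. This is only an illustration; the function names set_through_pointer and pointer_demo and the variables month and other are assumptions, not part of the original example.

```cpp
#include <cassert>

// Hypothetical helper: writes a new value into the integer that p points to.
void set_through_pointer(int *p, int value)
{
    assert(p != nullptr);   // a pointer must point somewhere before it is used
    *p = value;             // "*p" means "the integer that p points to"
}

// Hypothetical demonstration; returns month + other at the end.
int pointer_demo()
{
    int month = 3;
    int other = 7;
    int *current_month = &month;            // "&" takes the address of month

    set_through_pointer(current_month, 4);  // month is now 4
    current_month = &other;                 // re-point: month is still 4
    *current_month = 12;                    // changes other, not month

    return month + other;                   // 4 + 12
}
```

Assigning to current_month moves the pointer; assigning to *current_month changes the data.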

user renamed types C++ allows the programmer to create a new data type which is simply a new name for something else. For example, a string of characters (strings are not built into the core language, although the standard library provides std::string) could be defined like this.

	typedef char string[40];

This makes the following two declarations equivalent

	char	name[40];
	string	name;

If the program must use many strings, the second format would prevent errors due to accidentally making one of them a different length from the rest.

user defined types C++ allows the programmer to create a whole new type of data like this

	enum week_day  {SUN, MON, TUE, WED, THR, FRI, SAT};

	week_day	today;	// create a variable of type week_day

The variable "today" can only be assigned one of seven values (SUN, MON, etc.). The same job could be done with an integer. However, using an integer requires the programmer to put in error checking routines to ensure that an incorrect value is not assigned. By using an "enum" statement we are telling the compiler how to put the error checking in for us.
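
Here is a small sketch of functions that work with the week_day type. The helper names is_weekend and next_day are assumptions made for illustration.

```cpp
enum week_day {SUN, MON, TUE, WED, THR, FRI, SAT};

// Hypothetical helper: true for Saturday and Sunday.
bool is_weekend(week_day d)
{
    return d == SAT || d == SUN;
}

// Hypothetical helper: the day after d, wrapping from SAT back to SUN.
week_day next_day(week_day d)
{
    // The enumerators are numbered 0 through 6, so the arithmetic is done
    // on the integer value and explicitly converted back to week_day.
    return static_cast<week_day>((d + 1) % 7);
}

// week_day today = 3;   // rejected: C++ will not silently turn an int into a week_day
```

The commented-out line shows the checking the compiler does for us: a plain integer cannot be assigned without an explicit conversion.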

All of the rest of the data structures discussed in this document are considered "compound data types".
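
Before moving on to the compound types, the hardware-dependent sizes mentioned earlier can be checked on any particular machine with the sizeof operator. The sketch below also shows one bitwise operator acting on an unsigned char. The function names print_type_sizes and set_bit are assumptions for illustration, and the numbers printed will differ from one computer to another.

```cpp
#include <climits>    // CHAR_BIT: number of bits in a char
#include <iostream>

// Hypothetical helper: prints the size in bytes of several elementary types
// on whatever machine compiles and runs it.
void print_type_sizes()
{
    std::cout << "bits per char: " << CHAR_BIT << '\n'
              << "char:      " << sizeof(char) << '\n'
              << "short int: " << sizeof(short int) << '\n'
              << "int:       " << sizeof(int) << '\n'
              << "long int:  " << sizeof(long int) << '\n'
              << "float:     " << sizeof(float) << '\n'
              << "double:    " << sizeof(double) << '\n'
              << "int *:     " << sizeof(int *) << '\n';
}

// Hypothetical helper: turns on bit number "bit" (0 = lowest) in a raw byte
// using the bitwise OR operator.
unsigned char set_bit(unsigned char byte, int bit)
{
    return static_cast<unsigned char>(byte | (1u << bit));
}
```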

Arrays

An array is just a collection of variables of the same type. The data (called the elements of the array) can be accessed either as a collection or individually like this

	char	first_name[40];

	cout << first_name;	// print the first name
	cout << first_name[0];	// print the first initial

Arrays can be made out of elementary data types or compound data types (for example an array of objects).
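
The sketch below shows both kinds of access: filling a character array as a whole and visiting the elements of an integer array one at a time. The names NAME_LENGTH, set_name and sum are assumptions made for illustration.

```cpp
#include <cstring>

const int NAME_LENGTH = 40;

// Hypothetical helper: copies text into a character array that holds
// NAME_LENGTH characters, cutting the text short if necessary so that the
// terminating '\0' always fits.
void set_name(char name[], const char *text)
{
    std::strncpy(name, text, NAME_LENGTH - 1);
    name[NAME_LENGTH - 1] = '\0';
}

// Hypothetical helper: adds up the elements of an integer array.
int sum(const int values[], int count)
{
    int total = 0;
    for (int i = 0; i < count; ++i)
        total += values[i];     // individual access by index, starting at 0
    return total;
}
```

Note that the first element is number 0, so an array of 40 elements is indexed from 0 to 39.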

Records

A record is a collection of pieces of data that do not have to be of the same type. In C and C++ books, records are often referred to as structures because they are created with the "struct" keyword like this

	struct employee
	  { 
	    char	name[80];
	    long int	SSN;
	    float	salary;
	  };

	employee	new_person;   	// create a variable of type employee

	new_person.salary = 0;		// access data with a period
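
A slightly longer sketch of working with this record follows. The helper functions make_employee and give_raise are assumptions added for illustration; only the employee structure itself comes from the example above.

```cpp
#include <cstring>

struct employee
{
    char     name[80];
    long int SSN;
    float    salary;
};

// Hypothetical helper: fills in every field of a new employee record.
employee make_employee(const char *name, long int ssn, float salary)
{
    employee e;
    std::strncpy(e.name, name, sizeof(e.name) - 1);
    e.name[sizeof(e.name) - 1] = '\0';   // make sure the name is terminated
    e.SSN = ssn;
    e.salary = salary;
    return e;
}

// Hypothetical helper: raises a salary through a pointer to the record.
void give_raise(employee *e, float amount)
{
    e->salary += amount;    // "->" is the pointer form of the period
}
```

When a function is handed a pointer to a record rather than the record itself, the arrow is used in place of the period.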

Unions

A union looks like a record but is actually much different. Defining a record results in setting aside a memory location for each element of the record. A union sets aside only one memory location which may be used for several different types of data (but not all at once).

Here is an example of how this is used. Suppose that you are creating a database for a wholesaler. Some manufacturers use catalog numbers which are integers while others use floating point numbers or characters. In order to give the program the ability to use all three you might have statements like this

	union value 
	  {
	    long int	i_value;
	    double	f_value; 
	    char	c_value[8];
	  };

	value  catalog_number;	// create a variable

	catalog_number.i_value = 123;	    // use as an integer
	catalog_number.f_value = 12.34;	    // use as floating point 
	strcpy(catalog_number.c_value, "BK123");   // use as a string (an array cannot be assigned with =; strcpy is declared in <cstring>)
		// use any one of these but only one for any given variable.

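The word "union" could be replaced with the word "struct" and the code would still compile. The difference is that a "struct" sets aside separate storage for all three members, typically 24 bytes here, while a "union" shares one location large enough for its biggest member, typically 8 bytes. The exact sizes depend on the computer hardware being used.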

This same effect can be obtained, usually more elegantly, by using objects.
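
The size difference can be checked with sizeof, as in the sketch below. Because a union does not remember which member was stored last, programs commonly keep a separate tag next to it; that pairing is shown here as one possible approach, not as the only one. The names value_record, value_kind, tagged_value and make_text_number are assumptions made for illustration.

```cpp
#include <cstring>

union value
{
    long int i_value;
    double   f_value;
    char     c_value[8];
};

// The same members as a struct, used only to compare sizes.
struct value_record
{
    long int i_value;
    double   f_value;
    char     c_value[8];
};

// Hypothetical tag recording which member of the union is in use.
enum value_kind {INTEGER, FLOATING, TEXT};

struct tagged_value
{
    value_kind kind;
    value      data;
};

// Hypothetical helper: stores up to 7 characters of text in the union.
tagged_value make_text_number(const char *text)
{
    tagged_value n;
    n.kind = TEXT;
    std::strncpy(n.data.c_value, text, sizeof(n.data.c_value) - 1);
    n.data.c_value[sizeof(n.data.c_value) - 1] = '\0';
    return n;
}
```

Comparing sizeof(value) with sizeof(value_record) shows the saving on a given machine.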

Objects

Objects will be covered in far greater detail on other web pages. Here is a quick listing of some of the properties of objects

Objects contain different types of data, similar to a record.
Objects contain the functions which act on their data.
Objects can have data which is only accessible to their functions.
There can be sub-classes of objects which have all of the properties of the parent object, plus any new data or functions that are defined only for the sub-class. This is called "inheritance".
An object of a given class and an object of one of its sub-classes can have functions with the exact same name that behave differently. This is called "polymorphism".
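
The properties listed above can be seen together in a short sketch. The classes account and student_account, their member functions and the fee amounts are all invented for illustration.

```cpp
class account
{
public:
    explicit account(double opening) : balance(opening) {}
    virtual ~account() {}

    // Functions that act on the object's own data.
    void   deposit(double amount) { balance += amount; }
    double get_balance() const    { return balance; }

    // "virtual" lets a sub-class supply its own version of this function.
    virtual double monthly_fee() const { return 5.0; }

private:
    double balance;     // only account's own functions can reach this
};

// Inheritance: a student_account has deposit() and get_balance() automatically.
class student_account : public account
{
public:
    explicit student_account(double opening) : account(opening) {}

    // Polymorphism: same name as in the parent, different behavior.
    double monthly_fee() const { return 0.0; }
};

// Works with any kind of account; the matching monthly_fee() is chosen
// while the program runs.
double balance_after_fee(const account &a)
{
    return a.get_balance() - a.monthly_fee();
}
```

Passing a student_account to balance_after_fee uses the sub-class version of monthly_fee even though the function is written in terms of account.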

Linked lists

The last two data structures discussed here, linked lists and trees, are actually ways of using the data types listed above.

It is possible to define a record which contains a pointer to the memory location of another record just like it. Thus a list of records, each pointing to the next, can be made. The pointers are called "links".

At first glance this may seem to serve the same function as having an array of records. However, there are a few important differences.

Once an array is created it cannot be made larger or smaller to accommodate additional or discarded records. A linked list is set up to create new records and add them into the list by simply changing where the links point to.
Alphabetizing an array of records requires copying entire records, which could be very large, from one spot in the array to another. Reordering a linked list only requires changing which link points where, which can make the program run much faster when the records are large.
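
A minimal sketch of such a list follows, with a single integer standing in for the data of each record. The names node, push_front, insert_after, length and destroy are assumptions made for illustration.

```cpp
struct node
{
    int   value;    // stands in for the data of a real record
    node *next;     // the "link" to the next record, or nullptr at the end
};

// Adds a new record at the front of the list and returns the new first record.
node *push_front(node *head, int value)
{
    node *n = new node;
    n->value = value;
    n->next = head;
    return n;
}

// Inserts a new record directly after "position" by changing two links.
// No existing record is moved or copied.
void insert_after(node *position, int value)
{
    node *n = new node;
    n->value = value;
    n->next = position->next;
    position->next = n;
}

// Counts the records by following the links to the end.
int length(const node *head)
{
    int count = 0;
    for (const node *p = head; p != nullptr; p = p->next)
        ++count;
    return count;
}

// Releases every record in the list.
void destroy(node *head)
{
    while (head != nullptr) {
        node *next = head->next;
        delete head;
        head = next;
    }
}
```

An empty list is simply a null pointer, and the list grows one record at a time as needed.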

Trees

A tree is very similar to a linked list. The difference is that in a tree each record may point to several other records. Thus from each record there are several paths to follow. This can be a very natural way to represent the organizational structure of a corporation or a series of yes/no decisions.
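
A series of yes/no decisions might be sketched like this, with each record pointing to two others. The names decision and decide are assumptions made for illustration.

```cpp
struct decision
{
    const char *text;   // a question, or the final answer at the end of a path
    decision   *yes;    // record to follow if the answer is yes
    decision   *no;     // record to follow if the answer is no
};

// Follows one path down the tree, using answers[0] at the first question,
// answers[1] at the second, and so on. Stops at a record with no further
// links or when the answers run out, and returns the text found there.
const char *decide(const decision *root, const bool answers[], int count)
{
    const decision *current = root;
    int i = 0;
    while (current->yes != nullptr && current->no != nullptr && i < count) {
        current = answers[i] ? current->yes : current->no;
        ++i;
    }
    return current->text;
}
```

Each different sequence of answers follows a different path from the first record to a final one.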