CISC220, Summer 2006, Project 2b

Introduction

In Project 2, you will build two classes:
one to represent an object, and another to represent a list of those objects.

This project reviews the following concepts from CISC181:

classes
linked lists
working with dynamic memory (allocation from the heap)
the "big-3" (copy constructor, overloaded = operator, and destructor)
reading data from an input file

In addition, this project introduces some (possibly) new concepts:

A "test-driven development" approach

The basic class will represent one line of data
from one file from the Lahman Baseball Database (v5.3).

You will start by implementing a class that represents one line from one file from the Lahman Baseball Database. This file (proj2a.html) covers only this initial step.

A local copy of this database is available in the following places:

the following directory on strauss: /www/htdocs/CIS/restricted/data/baseball/lahman53
from on campus, at the restricted web address http://www.udel.edu/CIS/restricted/data/baseball/lahman53
from off campus, at the same address (via UD Proxy)

Getting Started

Step 1: Look at the sample code for the file Master.csv

In the directory proj2/step01, I've provided some sample code for a class that represents one line from the file Master.csv.

If you look at the first few lines of this file, they look like this:

lahmanID,playerID,managerID,hofID,birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity,deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity,nameFirst,nameLast,nameNote,nameGiven,nameNick,weight,height,bats,throws,debut,finalGame,college,lahman40ID,lahman45ID,retroID,holtzID,bbrefID
1,aaronha01,,aaronha01h,1934,2,5,USA,AL,Mobile,,,,,,,Hank,Aaron,,Henry Louis,"Hammer,Hammerin' Hank,Bad Henry",180,72,R,R,4/13/1954,10/3/1976,,aaronha01,aaronha01,aaroh101,aaronha01,aaronha01
2,aaronto01,,,1939,8,5,USA,AL,Mobile,1984,8,16,USA,GA,Atlanta,Tommie,Aaron,,Tommie Lee,,200,73,R,R,4/10/1962,9/24/1971,,aaronto01,aaronto01,aarot101,aaronto01,aaronto01
3,aasedo01,,,1954,9,8,USA,CA,Orange,,,,,,,Don,Aase,,Donald William,,210,75,R,R,7/26/1977,10/3/1990,Cal St. Fullerton,aasedo01,aasedo01,aased001,aasedo01,aasedo01
...

The first line contains a comma-separated list of all the fields in the file. For the most part, you should use these field names as the names of the private data members of your class. (We'll note some exceptions below.)

You should also look at the file readme53.txt (via proxy). Note that it contains descriptions of each field in the file Master.csv (see Section 2.1).

Now look at the file baseballMasterTest.cc. Note that this file uses test-driven development (including the class RunTests_C illustrated in lecture on 06/19 and 06/20) to test out a class that corresponds to one line from this file.

Your job in Step 1

Your job in this step is to understand how this program works, including examining the following source code files:

Because of the use of "composition" (explained in the next section), you'll also need to look at the files:

Note the use of Composition ("has-a" relationships)

Note that the BaseballMaster_C class (specified in the file baseballMaster.h) uses composition, which in Object-Oriented Design corresponds to the so-called "has-a" relationship between two objects. Note that rather than having six data members to represent various aspects of a player's birth, namely:

birthYear,birthMonth,birthDay,birthCountry,birthState,birthCity

we instead store only one data member, birth, that is of type Event_C *. The data member birth is a pointer to an Event_C object. The Event_C object, in turn, includes a Date_C object with month/day/year, and then pointers to fields for the country, city, and state of that birth.

Structuring things in this way saves us a lot of time, because we can reuse this structure for the fields relating to a player's death. Instead of

deathYear,deathMonth,deathDay,deathCountry,deathState,deathCity

we just have the data member death.

The OOP terminology for this: composition, "has-a"

This technique is called "composition", and it allows us to have just two "get" functions, i.e. getBirth() and getDeath(), instead of having to write twelve get functions, i.e. getBirthYear(), getBirthMonth(), etc.

In the language of object oriented programming, composition is called a "has-a" relationship. That is, we say that

one record from the Master file "has-a" birth event
one birth event "has-a" date
one record from the Master file "has-a" death event
one death event "has-a" date
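The has-a relationships listed above can be sketched roughly as follows. This is only an illustration, not the provided code: the names DateSketch, EventSketch and MasterRecordSketch are hypothetical stand-ins for Date_C, Event_C and BaseballMaster_C, and the sketch assumes std::string members held by value, whereas the provided class stores birth as an Event_C * and uses pointers for the place fields.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for Date_C: just month/day/year.
struct DateSketch {
    int year;
    int month;
    int day;
};

// Hypothetical stand-in for Event_C: an event "has-a" date, plus a place.
class EventSketch {
public:
    EventSketch(const DateSketch& date, const std::string& country,
                const std::string& state, const std::string& city)
        : date_(date), country_(country), state_(state), city_(city) {}

    const DateSketch& getDate() const { return date_; }
    const std::string& getCountry() const { return country_; }
    const std::string& getState() const { return state_; }
    const std::string& getCity() const { return city_; }

private:
    DateSketch date_;      // composition: the event "has-a" date
    std::string country_;
    std::string state_;
    std::string city_;
};

// Hypothetical stand-in for BaseballMaster_C: two members replace twelve.
class MasterRecordSketch {
public:
    MasterRecordSketch(const EventSketch& birth, const EventSketch& death)
        : birth_(birth), death_(death) {}

    // Two get functions instead of getBirthYear(), getBirthMonth(), ...
    const EventSketch& getBirth() const { return birth_; }
    const EventSketch& getDeath() const { return death_; }

private:
    EventSketch birth_;    // the record "has-a" birth event
    EventSketch death_;    // the record "has-a" death event
};
```

With this layout a caller reaches an individual field by chaining, for example record.getBirth().getDate().year, so the same EventSketch code serves both birth and death.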

Step 2: Choosing a topic

Once you understand the code from Step 1, you are ready to get started on your own code. First, you need to choose a topic. Look at the file readme53.txt (via proxy) again, and look at the list of data files other than the master file (sections 2.2 through 2.21).

Your job in this step is to choose one of these as the topic of your project 2. Try to make sure you understand the data in the file you choose before you select it. Also look at the actual data for that file by consulting the data directory (via proxy).

Getting help with the baseball background

If you are not very familiar with baseball, you may need to do a bit of background research to understand the table you select. I'll set up a wiki page so that you can ask for help from those in the class who are more familiar with baseball statistics.

Once you have chosen one of these files, record your selection on the Wiki, on the page:

http://jaguar.cis.udel.edu:8051/220wiki/Wiki.jsp?page=06J.Proj2

Note the additional rules on that page also.

Step 3: Writing the code

My suggestion is that you proceed like this:

Start with the test program, i.e., a program similar to baseballMasterTest.cc.
1. Copy in one line from your file as a string literal called inputLine1, following the example.
2. Write a call to the constructor that takes an input line and constructs an instance of the object.
3. Write a call to a get function to get the value of the first field only.
4. Write a call to the assertEquals() function to test that the value was set correctly.
Now, implement just enough of your baseballXxxx.h and baseballXxxx.cc files so that your test program compiles, but does not yet pass the test. For example, implement the constructor and the get function, but make the get function return a bogus value.
Then, correct the constructor and the get function so that the code passes the test for that field, i.e.:
1. Add code to the constructor that calls strtok (or strtokWithQuotedStrings) to parse the field out of the input line.
2. Add a private data member for the field, and code in the constructor to set that field.
3. Add code to the get function for that member so that it returns the correct value.
4. Make sure the code now passes the test.

Now repeat those steps until you have code for every field in your file.
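One pass through that cycle, for the first field only, might look something like the sketch below. Everything here is illustrative: BaseballXxxxSketch and getFirstField() are made-up names, the first field is assumed to be an integer, and the assertEquals() shown is a simple stand-in, not the interface of the RunTests_C class provided in lecture.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <iostream>

// Hypothetical class for one line of a data file; only the first field so far.
class BaseballXxxxSketch {
public:
    explicit BaseballXxxxSketch(const char* const inputLine) : firstField_(0) {
        // strtok modifies the string it scans, so tokenize a private copy.
        char* copy = new char[std::strlen(inputLine) + 1];
        std::strcpy(copy, inputLine);

        // Note: strtok treats a run of commas as ONE delimiter, so empty
        // fields (",,") are skipped; later fields need more careful parsing.
        const char* token = std::strtok(copy, ",");
        if (token != 0) {
            firstField_ = std::atoi(token);
        }
        delete[] copy;
    }

    int getFirstField() const { return firstField_; }

private:
    int firstField_;   // assumed to be an integer key, for illustration
};

// Stand-in test helper (NOT the RunTests_C API): reports and returns the result.
bool assertEquals(int expected, int actual, const char* label) {
    const bool ok = (expected == actual);
    std::cout << (ok ? "PASSED: " : "FAILED: ") << label << std::endl;
    return ok;
}
```

A test program would then construct an object from inputLine1 and call assertEquals(3, rec.getFirstField(), "first field"). Making getFirstField() return a bogus value first is a quick way to confirm that the test can actually fail.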

Step 4: Understand the Data Structure we are going to build

Look at the following diagram. It corresponds to a data structure for a class Acct_C, and a class AcctList_C. The Acct_C class has only two private data members: an account number (stored as an int), and a name (stored as a C-string, allocated separately on the heap). The AcctList_C class has only two private data members: a head and a tail pointer that point to the head and tail of a linked list of "Node" structs. Each struct has only two members: a pointer to an Acct_C object (on the heap), and a pointer to the next node.
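In code, the structure described in this paragraph could look roughly like the sketch below. It is not the contents of acctList.h: AcctSketch and AcctListSketch are hypothetical names, append() and find() are illustrative member functions, and copying is simply disabled here to keep the example short (your own classes need the full "big-3").

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for Acct_C: an int plus a heap-allocated C-string.
class AcctSketch {
public:
    AcctSketch(int acctNum, const char* name)
        : acctNum_(acctNum), name_(new char[std::strlen(name) + 1]) {
        std::strcpy(name_, name);
    }
    ~AcctSketch() { delete[] name_; }

    int getAcctNum() const { return acctNum_; }
    const char* getName() const { return name_; }

private:
    AcctSketch(const AcctSketch&);             // copying disabled in this sketch
    AcctSketch& operator=(const AcctSketch&);  // (declared, never defined)

    int acctNum_;
    char* name_;   // separate allocation on the heap
};

// Hypothetical stand-in for AcctList_C: only a head and a tail pointer.
class AcctListSketch {
public:
    AcctListSketch() : head_(0), tail_(0) {}

    ~AcctListSketch() {
        Node* p = head_;
        while (p != 0) {
            Node* next = p->next;
            delete p->data;   // the list owns the objects it points to
            delete p;
            p = next;
        }
    }

    // Takes ownership of acct and adds it at the tail, preserving file order.
    void append(AcctSketch* acct) {
        Node* n = new Node;
        n->data = acct;
        n->next = 0;
        if (tail_ == 0) { head_ = n; } else { tail_->next = n; }
        tail_ = n;
    }

    // Counts by traversal, so no third data member is needed.
    int getCount() const {
        int count = 0;
        for (const Node* p = head_; p != 0; p = p->next) { ++count; }
        return count;
    }

    // Returns the matching object, or 0 if the account number is not present.
    const AcctSketch* find(int acctNum) const {
        for (const Node* p = head_; p != 0; p = p->next) {
            if (p->data->getAcctNum() == acctNum) { return p->data; }
        }
        return 0;
    }

private:
    struct Node {
        AcctSketch* data;   // pointer to an object on the heap
        Node* next;         // pointer to the next node
    };

    AcctListSketch(const AcctListSketch&);             // copying disabled here
    AcctListSketch& operator=(const AcctListSketch&);

    Node* head_;
    Node* tail_;
};
```

Keeping a tail pointer makes each append a constant-time operation, which matters when reading a long data file in order.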

Your job is to write code for classes that will build a corresponding structure for a list of the baseball data in the file you selected to work with in Step 2. We'll talk more about this in lecture on Wednesday and Thursday.

Note: you can find versions of the diagram below in PowerPoint, PDF, png and jpg format in the directory proj2/step04

Step 5: A linked list class

The file acctList.h provides an idea about how to proceed with writing the linkedList class. We will also cover this in lecture on 06/21, so consult the lecture notes on the Wiki for that day, and the lecture code directory.

What we want to write is a main program that takes the filename on the command line, and then presents the user with four menu options:

f: find record
l: list records
s: summarize list
q: quit

The program will start by reading all the records from that file into a linked list. Since the files are already sorted, we won't worry about creating a sorted list—we'll maintain the order from the file.

Note that when reading the files, you'll want to skip over the first line in the file (since it contains field descriptions rather than actual data.)

Then allow the user to search for a record by its primary key, list the records, summarize the list (with a count of the records), and quit.

For your list option, don't try to print all the fields---just print a sample of the fields---as many as will fit comfortably in 80 characters wide. Be sure, though, to include the primary key.

If you aren't sure what the primary key is, ask!
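The overall flow described in this step (skip the header line, load the records, then dispatch on a menu letter) might be sketched as below. This is a simplified illustration under several assumptions: it stores raw lines in a std::vector rather than objects in your linked list class, it treats the first comma-separated field as the primary key, and the function names readDataLines, findByKey and handleCommand are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Reads all data lines in file order, skipping the first (header) line.
std::vector<std::string> readDataLines(std::istream& in) {
    std::vector<std::string> lines;
    std::string line;
    if (!std::getline(in, line)) {
        return lines;              // empty input: nothing to skip or read
    }
    while (std::getline(in, line)) {
        if (!line.empty()) {
            lines.push_back(line);
        }
    }
    return lines;
}

// Returns the index of the line whose first field equals key, or -1.
int findByKey(const std::vector<std::string>& lines, const std::string& key) {
    for (std::size_t i = 0; i < lines.size(); ++i) {
        // substr(0, npos) yields the whole line when there is no comma.
        if (lines[i].substr(0, lines[i].find(',')) == key) {
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Handles one menu choice; returns false when the user asks to quit.
bool handleCommand(char choice, const std::string& key,
                   const std::vector<std::string>& lines, std::ostream& out) {
    switch (choice) {
    case 'f': {
        const int i = findByKey(lines, key);
        if (i >= 0) {
            out << lines[static_cast<std::size_t>(i)] << '\n';
        } else {
            out << "not found\n";
        }
        return true;
    }
    case 'l':
        for (std::size_t i = 0; i < lines.size(); ++i) {
            out << lines[i] << '\n';
        }
        return true;
    case 's':
        out << lines.size() << " records\n";
        return true;
    case 'q':
        return false;
    default:
        out << "unknown option\n";
        return true;
    }
}
```

A main program would open the file named on the command line with std::ifstream, call readDataLines once, and then loop on handleCommand until it returns false. Taking std::istream and std::ostream parameters keeps each piece testable without a real file or keyboard.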

Step 6: Finishing up: what you must turn in

To finish up, you need all of the following:

A class for your data file (.h and .cc files). That class should contain (at least)

a constructor that converts one line in the data file (as a const char * const inputLine) to a new object
get functions for all the fields in the file
a print function (that prints the primary key, and the most important fields, up to 80 characters worth)
the "big-three"
an overloaded << operator (not as a member function, and not as a friend) that calls your print member function
All char * fields that can vary in length should be allocated as separate space on the heap (as variable length C-strings)

A test program (in the style of baseballMasterTest.cc) that uses the RunTests_C class to test whether your constructor and get functions work
Two test programs (in the style of baseballReadFileTest.cc and baseballReadFileTest2.cc) to test whether reading from the file works.
A class for a linked list for your data file (.h and .cc files). That class needs
- a constructor (default constructor) that takes no parameters and creates an empty list.
- the "big-three"
- a print member function that calls your print function for the object class, in a loop that traverses the entire list
- a get function for the count of the linked list
- an overloaded << operator (not as a member function, and not as a friend) that calls your print member function
A main program that reads the entire file into a list, and presents the user with a menu of options
A small test file (you can use the head unix command to get the first 6 lines of the file: the header line and 5 data lines).
A Makefile that compiles all your code, and runs all your test programs (it does not run the main program with the menu)
A script that

shows you doing a make clean and make all (that should run all your tests).
Then, test the main program that contains the menu of options, first on the short file (in that test, run the list option).
In your script, cat all the source files (.h, .cc, Makefile) you wrote yourself or modified, and cat the "short" data file. You do NOT need to cat the .h and .cc files that were just provided for you (and that you didn't write yourself, or modify.)
You do not have to cat the "long" data file in your script.
Also include a test, running your main program with the menu on the "long" file. Do not include a "list" option for the long data file in your script, but you should show that the "find" option works on the long data file.

A tar file including your entire directory structure (including the provided source code that you used), after a make clean.
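For reference, the "big-three" and the non-member, non-friend << operator required above could take roughly the following shape for a class with one variable-length C-string field. This is a minimal sketch, not a model solution: PlayerSketch, its two fields, and the copyString helper are all hypothetical.

```cpp
#include <cassert>
#include <cstring>
#include <iostream>
#include <sstream>

// Hypothetical object class with one variable-length C-string on the heap.
class PlayerSketch {
public:
    PlayerSketch(int id, const char* name)
        : id_(id), name_(copyString(name)) {}

    // Big-three, part 1: copy constructor makes a deep copy of the string.
    PlayerSketch(const PlayerSketch& other)
        : id_(other.id_), name_(copyString(other.name_)) {}

    // Big-three, part 2: assignment guards against self-assignment and
    // allocates the new string before releasing the old one.
    PlayerSketch& operator=(const PlayerSketch& other) {
        if (this != &other) {
            char* fresh = copyString(other.name_);
            delete[] name_;
            name_ = fresh;
            id_ = other.id_;
        }
        return *this;
    }

    // Big-three, part 3: destructor releases the heap string.
    ~PlayerSketch() { delete[] name_; }

    // Public print member function; operator<< below relies only on this.
    void print(std::ostream& out) const { out << id_ << ' ' << name_; }

private:
    // Allocates exactly enough space for s, including the terminating '\0'.
    static char* copyString(const char* s) {
        char* p = new char[std::strlen(s) + 1];
        std::strcpy(p, s);
        return p;
    }

    int id_;
    char* name_;
};

// Neither a member nor a friend: it needs only the public print() function.
std::ostream& operator<<(std::ostream& out, const PlayerSketch& p) {
    p.print(out);
    return out;
}
```

Because operator<< calls only the public print() function, it needs no access to private members, which is why it does not have to be a friend.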


Read more about composition on p.80-85 of your Malik textbook!


End of file proj2b, for Project 2, CISC220, 06J
