lab07, CISC106, Fall 2007

Special note about this lab

Tip for success: skim over the entire lab before starting

This lab contains 12 steps. Here is a general overview.

You will probably find the lab easier to understand and complete if you look over the entire lab before jumping in.

Special note for Conrad's honors sections only:

On this lab, you may work in pairs.

If you work in pairs,

The words above are specific to this lab.

This does NOT establish a precedent for future labs—unless specifically instructed otherwise, always assume that collaboration at the level of working on code together is not permitted, and may constitute academic dishonesty.

Overview

In this lab, we'll cover

Goals

By the time you complete this lab, you should be able to:

  1. Explain how to use cell arrays to store strings of different lengths
  2. Convert a script M-file to a function M-file
  3. Demonstrate how to separate out parts of a comma-separated string using the split option of the regexp function
  4. Write a script that reads data from a file of comma-separated values into columns of data

Step-by-Step Instructions

Step 1: Preliminaries

To prepare for this week's lab, do all the following steps. If you are not sure what to do at any stage, you can consult earlier labs for details:

Because there are lots of files this week, we are dividing them into two directories:

Step 2: A brief introduction to Cell Arrays

Some parts of this step in the lab reiterate material that you may already have covered in lecture. However, other parts are new, or emphasize aspects of cell arrays you might not yet have seen.

So, while it is tempting to skim over this step, you are encouraged instead to read it in detail and work the examples.

Even though there is nothing to turn in for credit,

So, don't skip over it.

The files you need for this step should be in your ~/cisc106/lab07a directory, so you may want to cd into that directory before proceeding.

Step 2a: (parentheses) vs. {braces}

So, something that you will have to get used to in this lab is seeing the difference between a(1,2) and a{1,2}

The difference is subtle, but crucial, and it can be hard to see on the computer screen. Let's zoom in for a closer look:

a(1,2) vs. a{1,2}

In the first case, we have parentheses: a(1,2). In the second case we have braces: a{1,2}. Braces are "the curly kind".

It can be difficult to see the difference if the font size on your monitor is small, so be sure that you increase the font size on your monitor so that you can clearly see the difference. We'll be using both this week, and discussing the difference between writing one vs. the other.

As long as we are on the subject, we might as well mention brackets too: brackets are the square kind: a = [ 1  2 ];

So, we have:

() {} []
parentheses braces brackets

Learn the difference!

Step 2b: Something regular arrays can't do: a column with strings of different lengths

Suppose you want to create a MATLAB variable containing a name, with your first name above your last name, like this:

name =
John
Ford

So, if your name happens to be John Ford, or Bette Davis, or Austin Powers, an ordinary MATLAB array will work just fine:

>> name = ['John';'Ford']

name =

John
Ford

>> name = ['Bette';'Davis']

name =

Bette
Davis

>> name = ['Austin';'Powers']

name =

Austin
Powers

>>

However, if your name is Phillip Conrad, Terry Harvey, or Dave Saunders, there will be trouble:

>> name = ['Phill';'Conrad'] 
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
>>


The problem is that, since everything in MATLAB is a matrix, MATLAB wants to create a matrix with the same number of columns in every row. Thus, the two male leads in Fight Club (Brad Pitt and Edward Norton) can each be put into a regular MATLAB array, and even the lead actress, Helena Bonham Carter, works despite having three names, because each of her names has six letters:

>> name = ['Helena';'Bonham';'Carter']
name =
Helena
Bonham
Carter
>> 
 

But if we want to store Terry Harvey in this way, we are out of luck.

>> name = ['Terry';'Harvey'] 
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
>>

Step 2c: But why would I want to do that anyway?

This is really just a simple example of a more general problem—storing a column of strings with different lengths.

Suppose, for example, we have a file called elementNames.txt, containing the first ten elements in the periodic table, and we want to store these in an array:

>> type elementNames.txt

Hydrogen
Helium
Lithium
Beryllium
Boron
Carbon
Nitrogen
Oxygen
Fluorine
Neon

>>

One approach is to use the char function, which pads a character array with spaces, so that every row has the same number of columns. (As a reminder, the ... here means that we are continuing a MATLAB statement across multiple lines.)

>> elements = char('Hydrogen','Helium','Lithium','Beryllium','Boron', ...
'Carbon','Nitrogen','Oxygen','Fluorine','Neon')                       

elements =

Hydrogen 
Helium   
Lithium  
Beryllium
Boron    
Carbon   
Nitrogen 
Oxygen   
Fluorine 
Neon     

>> 

But this is really kind of a hack—an inelegant solution that works, but isn't really what we want.

First, if we ask for elements(3), we don't get ans = Lithium. Instead, we get ans = L, because a single index into a character array picks out a single character.

>> elements(3)

ans =

L

>> 

To get Lithium, we have to ask for elements(3,:)

>> elements(3,:)

ans =

Lithium  

>> 

Next, if we want to add a new element, it has to be padded out with exactly the right number of spaces. This is super-duper annoying:

>> elements(11)='Sodium ' 
??? In an assignment  A(:) = B, the number of elements in A and B
must be the same.

>> elements(11,:)='Sodium '
??? Subscripted assignment dimension mismatch.

>> elements(11,:)='Sodium   '

elements =

Hydrogen 
Helium   
Lithium  
Beryllium
Boron    
Carbon   
Nitrogen 
Oxygen   
Fluorine 
Neon     
Sodium   

>>                
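
Here is a minimal sketch of what the padding means in practice. The variable names padded and trimmed are made up for this illustration, and strtrim is a MATLAB built-in (not otherwise used in this lab) that removes leading and trailing spaces.

```matlab
% char() pads each row with spaces so that every row has the
% same number of columns as the longest string.
elements = char('Hydrogen', 'Helium', 'Lithium', 'Beryllium');

numCols = size(elements, 2);   % 9, the length of 'Beryllium'

padded  = elements(3,:);       % 'Lithium' followed by two spaces
trimmed = strtrim(padded);     % 'Lithium' with the padding removed

disp(length(padded));          % 9
disp(length(trimmed));         % 7
```

Every time we pull a row out of the padded array, we have to remember to strip the spaces off again, which is one more reason to look for a better approach.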

Fortunately, there is a better way—the cell array.

Step 2d: Cell arrays as a way to store an array of strings of different lengths

MATLAB offers a different kind of array: the cell array.

Where normal arrays in MATLAB are built with the square bracket [] symbol, e.g.:

>> name = ['John';'Ford']   

name =

John
Ford

>> 

The cell array is built with the curly brace symbol {}, e.g.

>> name = {'Julia';'Roberts'}

name = 

    'Julia'
    'Roberts'

>>

Note three crucial differences between the cell array version and the standard version:

Difference number 1: We can store variable length strings

With cell arrays, we can store both Salma Hayek, and Julia Roberts. With regular arrays, Julia is out of luck.

>> name = {'Salma';'Hayek'}  

name = 

    'Salma'
    'Hayek'

>> name = {'Julia';'Roberts'}

name = 

    'Julia'
    'Roberts'

>> name = ['Salma';'Hayek']  

name =

Salma
Hayek

>> name = ['Julia';'Roberts']
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.

>> 
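
Returning to the element names from Step 2c, here is a small sketch of the same idea: with a cell array, adding a new string needs no padding at all. The particular names stored here are just for illustration.

```matlab
% A cell array holds strings of different lengths without padding.
elements = {'Hydrogen'; 'Helium'; 'Lithium'};

% Add two more names; no trailing spaces are needed.
elements{4} = 'Beryllium';
elements{5} = 'Boron';

disp(size(elements));         % 5 rows, 1 column
disp(length(elements{4}));    % 9
disp(length(elements{5}));    % 5
```

Compare this with the three attempts it took to add Sodium to the padded character array in Step 2c.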

Difference number 2: The cell array has only one column

The regular array for Salma Hayek has two rows, and five columns. But the cell array has only one column.

Compare:

>> clear
>> name = ['Salma';'Hayek']

name =

Salma
Hayek

>> whos
  Name      Size            Bytes  Class    Attributes

  name      2x5                20  char               

>> clear                   
>> name = {'Salma';'Hayek'}  

name = 

    'Salma'
    'Hayek'

>> whos
  Name      Size            Bytes  Class    Attributes

  name      2x1               140  cell               

>> 

Difference number 3: The elements are entire strings, not single characters

If we index into the regular array with name(1), we'll just get the S in Salma. But with the cell array, name(1) gives us the entire string Salma.

>> name = ['Salma';'Hayek']

name =

Salma
Hayek

>> 

>> name(1)

ans =

S

>> name = {'Salma';'Hayek'}

name = 

    'Salma'
    'Hayek'

>> name(1)

ans = 

    'Salma'

>> 

This last issue brings us to the real truth about what cell arrays actually are, and invites us to take a deeper look:

Step 3: A deeper look at Cell Arrays

As with Step 2, there is nothing to turn in from Step 3, but exam or quiz questions, and later steps in the lab depend on the stuff you will learn from doing this step. So, don't skip it!

The files you need for this step should be in your ~/cisc106/lab07a directory, so you may want to cd into that directory before proceeding.

Step 3a: What it means to be a cell

Cell arrays in MATLAB are an example of a principle of computer science called indirection. Indirection is a general term for language features such as:

This is the important point:

Each element of a cell array is actually a pointer to a complete MATLAB matrix.

Let's look at this principle more deeply.

When we write this:

>> element{1}='Hydrogen';
>> element{2}='Helium';
>> element{3}='Lithium';
>>

There are actually two levels at which things are going on.

First, we see that at the cell array level, we have created a 1x3 cell array called element.


>> whos
  Name         Size            Bytes  Class    Attributes

  element      1x3               222  cell               

>> 
 

However, at a deeper level, we have actually created four matrices, not one—because each element in the array element is actually a full matrix in its own right. We can see this by using the cellplot function:

>> cellplot(element);
>> print('-djpeg','element.jpg');                                        
>> 

The resulting plot looks like this:

What we see is that the array element, in this case, is composed of three "cells", each of which contains a complete matrix.

Thus, if we want to pull out the fourth character of the word Hydrogen, we can write {1} to pull out the first element of the cell array, and then write (4) to pull out the fourth element of the matrix in that cell.

Notice the subtle but crucial difference between the curly braces {} around the 1, and the round parentheses () around the 4.

>> element{1}(4)

ans =

r

>> 
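
The same two-level indexing works for ranges and for built-in functions. This is a small sketch; the variable names on the left hand side are made up for illustration.

```matlab
element = {'Hydrogen', 'Helium', 'Lithium'};

firstName  = element{1};          % braces: the whole 1x8 char matrix 'Hydrogen'
fourthChar = element{1}(4);       % braces, then parentheses: 'r'
firstThree = element{3}(1:3);     % first three characters of 'Lithium': 'Lit'
nameLength = length(element{2});  % length of 'Helium': 6
```

In each case the braces select a cell and hand back the matrix inside it, and anything after the braces operates on that matrix.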

Step 3b: Not all cells have to be the same data type

We can build a cell array that has different data types. For example, suppose we have a file called nflTeams.txt containing data about NFL teams (note that this is typed at the Unix prompt, not the MATLAB prompt):

> more nflTeams.txt 
Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4
> 

This file has a column for the team name, a column for the stadium capacity, a column for the ticket price, and a column for the number of wins so far this season (note: this file was generated on 10/10/2007).

We can save the first line of this file into a cell array, as follows:

>> nfl{1} = 'Eagles';
>> nfl{2} = 70000;   
>> nfl{3} = 50;      
>> nfl{4} = 1;       
>> nfl

nfl = 

    'Eagles'    [70000]    [50]    [1]

>> 

Note that the elements {2}, {3} and {4} are listed with a set of [] around them. This reflects the fact that each of the cells in a cell array is a complete matrix in its own right. That is, the element nfl{2} is not just the number 70000, but is in fact a pointer to a 1x1 matrix containing the number 70000.
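
One way to check this for yourself is to ask MATLAB for the class and size of what is inside each cell. The sketch below uses the built-in functions class, size and fprintf; the loop variable k is just an illustrative name.

```matlab
nfl = {'Eagles', 70000, 50, 1};

% Report the data type and dimensions of the matrix inside each cell.
for k = 1:length(nfl)
    fprintf('nfl{%d} is a %s matrix of size %dx%d\n', ...
            k, class(nfl{k}), size(nfl{k}, 1), size(nfl{k}, 2));
end
```

The first cell should be reported as a 1x6 char matrix, and the other three as 1x1 double matrices.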

Step 3c: Cell arrays can have multiple dimensions

Because cell arrays can have multiple dimensions, we can actually store all the data from our nflTeams.txt file into a single cell array:

>> nfl{1,1} = 'Eagles'; 
>> nfl{1,2} = 70000;    
>> nfl{1,3} = 50;       
>> nfl{1,4} = 1;        
>> nfl{2,1} = 'Giants'; 
>> nfl{2,2} = 65000;    
>> nfl{2,3} = 55;       
>> nfl{2,4} = 3;        
>> nfl{3,1} = 'Cowboys';
>> nfl{3,2}= 80000;     
>> nfl{3,3}= 60;        
>> nfl{3,4}= 4;         
>> nfl

nfl = 

    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]

>> 
 

A cell plot of the nfl cell array shows the nature of the data contained inside. We can see that all the cells in the first column are strings, of different lengths: Eagles is a 1x6 character array, while Cowboys is a 1x7 character array. Meanwhile, the cells in the second, third, and fourth columns are all 1x1 matrices (scalars) containing individual numbers.
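
Once the data is in a two-dimensional cell array, we can step through it with a for loop, just as with a regular matrix. The following is only a sketch: the variable names numTeams and totalWins are made up for this example.

```matlab
nfl = {'Eagles',  70000, 50, 1; ...
       'Giants',  65000, 55, 3; ...
       'Cowboys', 80000, 60, 4};

numTeams  = size(nfl, 1);    % number of rows: 3
totalWins = 0;

% Column 4 holds the number of wins; braces give us the number itself.
for row = 1:numTeams
    totalWins = totalWins + nfl{row, 4};
end

disp(totalWins);             % 1 + 3 + 4 = 8
```

Note the braces in nfl{row, 4}: we need the number inside the cell, not the cell, in order to do arithmetic.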

One of our goals in this lab is to learn how to automatically convert a file such as nflTeams.txt into a cell array such as nfl without having to tediously type all the assignment statements:

>> nfl{1,1} = 'Eagles'; 
>> nfl{1,2} = 70000;    
>> nfl{1,3} = 50;       
>> nfl{1,4} = 1;        
>> nfl{2,1} = 'Giants'; 
>> nfl{2,2} = 65000;    
etc... 

Instead, we'd like to be able to just do something like this, and achieve the same effect:

>> nfl = fileToCellArray('nflTeams.txt');
>>

For the time being, though, until we have written an M-file for a fileToCellArray() function that can achieve this feat, we can use the script M-file setUpNflCellArray.m, which is in your lab07a directory this week. Try typing in these commands:

>> clear
>> cd ~/cisc106/lab07a
>> setUpNflCellArray 
>> whos
  Name      Size            Bytes  Class    Attributes

  nfl       3x4               830  cell               

>> nfl

nfl = 

    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]

>> 

Step 3d: Content indexing vs. Cell indexing

Now that we have the nfl cell array set up, we can show the difference between content indexing and cell indexing.

If you haven't already done so, type the commands at the end of step 3c that set up the nfl cell array:

Then, try the following:

>> nfl(2,1)

ans = 

    'Giants'

>> nfl{2,1}  

ans =

Giants

>>  

Note that we can type both nfl(2,1) and nfl{2,1} and in both cases we learn that the team in row two is the Giants, but there is a subtle difference:

What is going on here?

To look into this more deeply, we can try the same trick with the stadium capacity for the Giants.

>> nfl(2,2)

ans = 

    [65000]

>> nfl{2,2}

ans =

       65000

>> 

So, this is interesting. When we use the notation nfl(2,1) or nfl(2,2), we seem to get extra punctuation, either '' or [] around our result.

We can learn more by doing a whos command after we type each of these:

>> nfl(1,1)

ans = 

    'Eagles'

>> whos
  Name      Size            Bytes  Class    Attributes

  ans       1x1                72  cell               
  nfl       3x4               830  cell               

>> 

Here we see that ans is showing up as a 1x1 cell. But, if we use nfl{1,1} instead, look what we get:

>> nfl{1,1}

ans =

Eagles

>> whos    
  Name      Size            Bytes  Class    Attributes

  ans       1x6                12  char               
  nfl       3x4               830  cell               

>> 

If we use the {} to index into the array, we get an actual 1x6 character string.

In MATLAB, this is called the difference between cell indexing, and content indexing.

One way to understand the difference is this:

When we write nfl(2,1) we get 'Giants', and when we write nfl(2,2) we get [65000]. In both cases, the result is a cell. The actual content is still wrapped up inside a cell. Think of the cell as like a cellophane wrapper that has to be peeled off.

However, when we write nfl{2,1}, we go deeper. We automatically peel off the cellophane wrapper, and get at the content inside. Thus the result is an actual character string, or number, or whatever. Thus, the result of nfl{2,1} is Giants (without the wrapper) and the result of nfl{2,2} is 65000 (again, without the wrapper.)

This also works when making assignment statements. If we want to assign another row of data, we can do it in one of two ways.

We can either assign to the contents of the cells:

>> nfl{4,1}='49ers';
>> nfl{4,2}=70000;  
>> nfl{4,3}=59;     
>> nfl{4,4}=2; 
>> nfl

nfl = 

    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
    '49ers'      [70000]    [59]    [2]

>> 

Or, we can create cells on the right hand of the assignment statement by putting the contents inside a set of braces, and then assign the cell itself:

>> nfl(4,1)={'49ers'}; 
>> nfl(4,2)={70000};  
>> nfl(4,3)={59};   
>> nfl(4,4)={2}; 
>> nfl

nfl = 

    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
    '49ers'      [70000]    [59]    [2]

>> 

What we cannot do is assign content directly into a cell:

>> nfl(4,1)='49ers';  
??? Conversion to cell from char is not possible.

>> nfl(4,3)=70000;  
??? Conversion to cell from double is not possible.

>> 

 

Understand this point, and be prepared to answer questions about it on an exam or quiz. If it is still not clear, go back through these steps, and experiment with adding data to the table, or changing data in the table, until it becomes more clear. You can also ask your TA or instructor to clarify, during lab, lecture, or office hours.
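
As one more way to experiment, the sketch below uses the built-in functions iscell and ischar to test what each kind of indexing returns. The variable names wrapped, unwrapped and doubled are made up for illustration.

```matlab
nfl = {'Eagles', 70000, 50, 1; ...
       'Giants', 65000, 55, 3};

wrapped   = nfl(2,1);    % cell indexing: a 1x1 cell (still in the wrapper)
unwrapped = nfl{2,1};    % content indexing: the char array 'Giants'

disp(iscell(wrapped));   % 1 (true)
disp(ischar(unwrapped)); % 1 (true)

% We can peel the wrapper off a 1x1 cell with braces.
alsoUnwrapped = wrapped{1};      % 'Giants'

% Arithmetic needs the content, so we use braces here.
doubled = nfl{2,2} * 2;          % 130000
```

Trying nfl(2,2) * 2 instead produces an error, because a cell (the wrapper) cannot be multiplied.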

Step 4: Returning more than one value from a function

In MATLAB, we can return more than one value from a function. We'll need this fact later in this lab, so this step is here to review how this feature of MATLAB works. And, as usual, this could show up on an exam or quiz, so don't skip this step.

The files you need for this step should be in your ~/cisc106/lab07a directory, so you may want to cd into that directory before proceeding.

Take a look at the function M-file myFunc.m. Notice that there are two result variables, sum and prod.

>> type myFunc.m 
function [sum, prod] = myFunc(a,b)
%myFunc returns sum and product of two number
%
% consumes: two numbers, a and b
% produces: their sum and their product
%
% Examples:
%
%>> myFunc(3,6)
%ans =
%     9
%>> [theSum, theProd] = myFunc(3, 4)
%theSum =
%     7
%theProd =
%    12
%>> [x , y ] = myFunc(5,7)
%x =
%    12
%y =
%    35
%
% P. Conrad for CISC106, Sect 99, 10/12/2007

sum = a+b;
prod = a * b;
return;

end

>>

If we type a function call to myFunc() and do not put anything on the left hand side of the assignment statement, we get only the first of these two result variables, which represents the sum:

 >> myFunc(2,4)           

ans =

     6

>> 
 
 

The same is true if we only put one variable on the left hand side of an assignment statement:

>> x = myFunc(2,4)

x =

     6

>> 
 
 

But, if we provide a matrix containing two result variables on the left hand side of the assignment statement, we can access both output variables:

>> [x y] = myFunc(2,4)

x =

     6


y =

     8

>> 
 
 

In this way, we can return more than one result from a function.

A side note: this is an unusual feature of MATLAB. Most programming languages, including C, C++, and Java, only allow one result to be returned from a function. In those languages, there are ways to simulate the effect of returning more than one result from a function, but that's a topic for another day.
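
The same syntax works with MATLAB's built-in functions, many of which return more than one result. The sketch below uses max and size as examples; the variable names are made up for illustration.

```matlab
wins = [1 3 4];

% With one output variable, max returns only the largest value.
justMax = max(wins);                  % 4

% With two output variables, max also returns its position.
[mostWins, whichTeam] = max(wins);    % mostWins is 4, whichTeam is 3

% size can return the number of rows and columns separately.
[numRows, numCols] = size(zeros(3, 4));   % numRows is 3, numCols is 4
```

As with myFunc(), if we ask for fewer outputs than the function can produce, we simply get the first ones.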

Step 5: Reading text from a file into a cell array of strings

In this step, we'll look at reading data from an input file.

There is nothing to turn in from this step, but material could show up on labs/quizzes, and you need it for later steps in the lab.

The files you need for this step should be in your ~/cisc106/lab07a directory, so you may want to cd into that directory before proceeding.

Step 5a: Why we care about reading from files

Knowing how to read from an input file is important, because

So, let's suppose we have a file of data, and we want to read that data into MATLAB. In an earlier lab (lab05) we've already seen that if the data consists entirely of numbers, formatted in columns and rows, we can read that data in using the load command. As a reminder, if katrina.dat contains only rows and columns of numbers, we can type:

>> load katrina.dat;
>>

and the result is a MATLAB matrix called katrina, containing all of our data. However, this does not work for files containing a mixture of text and numbers. For example, if our file is input.txt, containing our NFL data:

 >> type input.txt

Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4

>> load input.txt
??? Error using ==> load
Unknown text on line number 1 of ASCII file /home/kodos/vol0/hsm/22/pconrad/106/07F/labPrep/lab07/input.txt
"Eagles".

>> 

 

Further, sometimes data is stored in other formats, such as "comma-separated values", or CSV format. Consider the file nflTeams.csv:

>> type nflTeams.csv
 
Eagles,70000,50,1
Giants,65000,55,3
Cowboys,80000,60,4

>> 

So, we need a different approach.

Step 5b: The fopen() function

In this step we'll cover three MATLAB built-in functions that we can use to access every line of any input file that we can create with a regular text editor (e.g. emacs on Unix, Notepad on Windows, or TextEdit on Mac). This approach will work even if the contents of the file are a mixture of text and numbers.

This is especially useful, for example, for processing data that often comes with a mixture of data types. For example, the katrina.dat file that we used in lab05 was originally in this format (and taken from the web link: http://weather.unisys.com/hurricane/atlantic/2005H/KATRINA/track.dat)

This file is also in your directory this week as track.dat

Date: 23-31 AUG 2005
Hurricane KATRINA
ADV  LAT    LON      TIME     WIND  PR  STAT
  1  23.20  -75.50 08/23/21Z   30  1007 TROPICAL DEPRESSION
 1A  23.30  -75.80 08/24/00Z   30  1007 TROPICAL DEPRESSION
  2  23.40  -76.00 08/24/03Z   30  1007 TROPICAL DEPRESSION
 2A  23.60  -76.00 08/24/06Z   30  1007 TROPICAL DEPRESSION
etc ...
25A  27.90  -89.50 08/29/03Z  140   908 HURRICANE-5
25B  28.20  -89.60 08/29/07Z  135   910 HURRICANE-4
 26  28.80  -89.60 08/29/09Z  130   915 HURRICANE-4
etc ...
 34  41.10  -81.60 08/31/09Z   15   996 TROPICAL DEPRESSION
 

The first of these three functions is the fopen() function. Just as with many other things in life, you have to open a file before you can access its contents (you have to open the cookie jar to get at the cookies).

Here's how we can use fopen() to open the file track.dat to get ready to read lines from it:

>> [fid  message] = fopen('track.dat')

A few notes:

Try typing in this command:

>> [fid  message] = fopen('track.dat')

If you get the following result, it probably means that either

>> [fid  message] = fopen('track.dat')

fid =

    -1


message =

No such file or directory

>> 

On the other hand, if you get this result (or one like it), then all is well. The exact number that is returned in fid may vary from what is shown here—the exact value is not important as long as the number isn't -1

>> [fid  message] = fopen('track.dat')

fid =

     3


message =

     ''


>> 

Try typing in the name of a file that doesn't exist, just so you can see the error message:

>> [fid  message] = fopen('blah.txt') 

fid =

    -1


message =

No such file or directory

>> 
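
In a script, we usually want to check the value of fid ourselves rather than read it off the screen. Here is a minimal sketch of that check, using the if/else from lab06. The variable name filename is made up for this example.

```matlab
filename = 'track.dat';

% Ask for both results: the file identifier and the error message.
[fid, message] = fopen(filename);

if fid == -1
    % The open failed; message explains why.
    disp(['Could not open ' filename ': ' message]);
else
    disp(['Opened ' filename ' successfully.']);
    fclose(fid);    % fclose is covered in Step 5d
end
```

Either branch can run, depending on whether track.dat is in your current directory.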

Once you are comfortable with how to open a file, we can move on to the next step.

Step 5c: The fgetl() function

Now, open the file, and try to read some lines of text using the fgetl() function. The name fgetl stands for "file get line". Notice that each time you call fgetl(), it returns another line of the file:

>> [fid  message] = fopen('track.dat')

fid =

     4


message =

     ''


>> fgetl(fid)

ans =

Date: 23-31 AUG 2005

>> fgetl(fid)

ans =

Hurricane KATRINA

>> fgetl(fid)

ans =

ADV  LAT    LON      TIME     WIND  PR  STAT

>> fgetl(fid)

ans =

  1  23.20  -75.50 08/23/21Z   30  1007 TROPICAL DEPRESSION

>> 

If we know exactly how many lines are in a file, we can use a for loop to read them into a cell array. For example, since the file input.txt (which is an exact copy of nflTeams.txt) has exactly three lines, we can do this:

>> [nflFile message] = fopen('input.txt')

nflFile =

     5


message =

     ''


>> for i = [1:3]
lines{i} = fgetl(nflFile);
end
>> lines

lines = 

    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'

>> 

However, more typically, we do not know how many lines are in a file. The fgetl() function signals that it has hit the end of the file by returning -1 as the result. For example:

>> [nflFile message] = fopen('input.txt')

nflFile =

     3


message =

     ''


>> fgetl(nflFile)

ans =

Eagles 70000 50 1

>> fgetl(nflFile)

ans =

Giants 65000 55 3

>> fgetl(nflFile)

ans =

Cowboys 80000 60 4

>> fgetl(nflFile)

ans =

    -1

>> 

You can see that after we've read the last line, if we try to read again, the function fgetl() returns -1 as the result.

In a later step in this lab, we'll use this fact to write a while loop that can read all of the lines in a file into a cell array called lines.

We'll also calculate how many lines are in the file by starting a variable at 0 (e.g. numLines=0), and adding one to it each time we see a new line (e.g. numLines = numLines + 1).
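
As a preview, here is one way such a loop could be sketched. This is not necessarily how the script readInputFile.m in your lab07a directory does it. In particular, this sketch assumes we test for the end of the file with the built-in ischar function: a real line is always a character string, while the -1 that signals the end of the file is a number. The variable name nextLine is made up for illustration.

```matlab
lines    = {};     % start with an empty cell array
numLines = 0;      % no lines seen yet

fid = fopen('input.txt');

if fid ~= -1
    nextLine = fgetl(fid);
    % Keep going as long as fgetl gave us a string rather than -1.
    while ischar(nextLine)
        numLines = numLines + 1;
        lines{numLines} = nextLine;
        nextLine = fgetl(fid);
    end
    fclose(fid);   % fclose is covered in Step 5d
end

disp(numLines);
```

If input.txt is missing, the loop is skipped, and lines stays empty with numLines equal to 0.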

Step 5d: The fclose() function

If you don't close the cookie jar, the cookies can go stale.

In the same way, whenever you are working with a file, it is a good programming practice to close the file when you are finished with it. We do this by calling the fclose() function:

>> fclose(fid);
>>  

or

>> fclose (nflFile); 
>>

The variable we pass in to fclose is the fid variable, NOT the name of the file.

The cookie analogy is not perfect

Closing a file you are reading from doesn't keep the "bits" from going stale. The bits in a file you are reading from are safe whether you close the file or not. Rather, it is a matter of resources. If you leave the file open, then MATLAB has to keep some memory tied up keeping track of that open file. This may cause MATLAB to run slower.

Also, in a large program, you might open hundreds or thousands of files. MATLAB has an upper limit on how many files can be open at a time. If you exceed this limit, your program will fail.

Later, we'll talk about writing to a file—a so-called output file. In the case of writing to a file, closing the file is even more important. Sometimes when writing to a file, if you fail to close the file, the last few things you wrote might not actually make it into the file because of something called buffering. Without going into details, let's just say that closing a file is even more important with output files than with input files.

Now with our three basic tools—fopen(), fgetl() and fclose(), we are ready to look at a complete script M-file that reads from a file into a cell array.

Step 6: A script M-file to read text from a file into a cell array of strings

The file readInputFile.m is a script M-file that uses all of the concepts we've covered so far in Step 1 through Step 5. It also includes the concepts of if/else and while loops from lab06.

The script M-file readInputFile.m reads lines of text from an input file called input.txt into a cell array called lines.

Your job in this step is easy to explain, but complex to undertake: you need to try to understand how the file readInputFile.m does its job.

To do this, here are some steps that may help to get started:

That, actually, was the easy part. The more challenging part is to go through the file, line by line, and try to understand what each line of code is doing.

Once you've figured that out, you are ready for Step 7, the first part of this lab where you have to do something that you will turn in for credit.

Step 7: A function M-file to read text from a file into a cell array of strings

Your job in this step is easy to explain:

This is a step you will turn in for credit.

Why would we want to do this?

Here's why

How to get started

You can start by copying the file readInputFile.m from your ~/cisc106/lab07a directory to your ~/cisc106/lab07 directory, changing the name as you copy it from readInputFile.m to readFileToCellArray.m.

Then you need to edit the readFileToCellArray.m file, working in your ~/cisc106/lab07 directory (since that is where you will work on files that you are turning in for credit.)

The H1 comment of this file should look like this:

function [lines, numLinesRead]  = readFileToCellArray(filename)
%readFileToCellArray   read lines of filename into cell array
%Examples: 
%         >>lines=readFileToCellArray('lines.txt');
%         >>[lines,numLinesRead]=readFileToCellArray('myData.dat');
%
% Reads input from filename.   Each line is read into successive
% entries in a cell Array called lines, i.e. lines{1} is the first
% line, lines{2} is the second line, etc.    numLinesRead is the
% variable that keeps track of how many lines are in the file.
% (your name, course, date, section)

For purposes of this file, this is sufficient for the inputs/outputs and examples. Feel free to copy and paste the H1 comment above directly into your file. (Be sure to change the last line though!)

Now, the detailed part

Now, you need to make the changes in the body of the file so that it works as a function M-file, instead of as a script. As it turns out, this is not particularly complicated: if you have understood Steps 1 through 6 in detail, you can probably accomplish this in less than 10 minutes. It involves very few changes to the file.

If it works properly, you should be able to do the following. Note the use of ../lab07a to reach lab07a, the "sibling" directory of your lab07 directory, through the parent directory (..)

>> [lines, howmany] = readFileToCellArray('nflTeams.txt');
>> lines

lines = 

    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'

>> howmany

howmany =

     3

>> [lines, howmany] = readFileToCellArray('../lab07a/track.dat');   
>> howmany

howmany =

    66

>> lines{1}

ans =

Date: 23-31 AUG 2005

>> lines{2}

ans =

Hurricane KATRINA

>> 

Try it also on the file ../lab07a/elementNames.txt.

We'll make a diary file to document your work on Step 7 later, in Step 10. So if all this is working, it is on to Step 8!

Step 8: Parsing lines of data

According to thefreedictionary.com, the word parse means, among other things:

"To analyze or separate (input, for example) into more easily processed components."

In this step, we'll look into different ways of parsing input in MATLAB. The goal is to get input from a file into a format that we can work with.

Step 8a: Understanding the goal

So, once we have the lines into a cell array called lines, we then may want to try to parse them into their various parts.

For example, the file nflTeams.txt contains lines such as this.

Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4
 

The file nflTeams.txt appears not only in your ~/cisc106/lab07a directory but also in your ~/cisc106/lab07 directory.

So, you should be able to use the function M-file you wrote in Step 7 to get each line into a cell as a string, like this:

>> [lines,howmany] = readFileToCellArray('nflTeams.txt')

lines = 

    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'


howmany =

     3

>> lines

lines = 

    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'

>> lines'

ans = 

    'Eagles 70000 50 1'
    'Giants 65000 55 3'
    'Cowboys 80000 60 4'

>> 

 

Now, we want those lines to be split up into their various parts, so we have a cell array like this one:

>> nfl                                                  

nfl = 

    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]

>> 

The first step is to figure out how to separate a single line of text such as 'Eagles 70000 50 1' into its various parts. Fortunately, there is a function that can help.

Step 8b: Using the split option of the regexp function

The MATLAB built in function regexp() is a very powerful function. The name, regexp, stands for regular expression. The concept of a regular expression is a fundamental concept in Computer Science. In this step, we are only going to scratch the surface of the power of this function.

In particular, we are going to focus on only one use of this function: splitting apart a string based on a delimiter symbol. The web site thefreedictionary.com defines delimiter as "a character or sequence of characters marking the beginning or end of a unit of data." Consider the string:

'Eagles 70000 50 1'

In the string above, the delimiter is the space character. Contrast that with these two strings:

'Eagles,70000,50,1'
'1,Hydrogen,H,1,1,Nonmetal'

In these two strings, the delimiter is the comma.

In MATLAB, we use a special form of the regexp function to divide a string into a cell array. It works like this. Note that the second actual parameter to the function is a space character between two single quotation marks: ' '

>> regexp('Eagles 70000 50 1',' ','split')       

ans = 

    'Eagles'    '70000'    '50'    '1'

>> 

What gets returned is a cell array.

It also works if we use a comma as the delimiter. Note that the second actual parameter in this function call is a comma between two single quotation marks: ','

>> regexp('2,Helium,He,1,18,Noble gas',',','split')

ans = 

    '2'    'Helium'    'He'    '1'    '18'    'Noble gas'

>> 
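
One detail worth knowing: the second parameter is treated as a regular expression, not only as a literal character. The short sketch below (the input strings are made up for illustration) shows what that means when two delimiters appear in a row, or when the delimiter is a character that has a special meaning in regular expressions:

```matlab
% Two spaces in a row: splitting on a single space leaves an empty piece
% between them, so we get 3 pieces instead of 2.
a = regexp('Eagles  70000', ' ', 'split');

% ' +' means "one or more spaces", so a run of spaces counts as
% a single delimiter and we get 2 pieces.
b = regexp('Eagles  70000', ' +', 'split');

% A period is special in a regular expression (it matches any character),
% so put a backslash in front of it to split on a literal period.
c = regexp('a.b.c', '\.', 'split');

fprintf('%d pieces, %d pieces, %d pieces\n', numel(a), numel(b), numel(c));
```

For the space-delimited and comma-delimited files in this lab, the plain ' ' and ',' delimiters are all you need.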

We can combine this technique with the function that reads lines from a file, and use it to divide up a line into its parts:

>> [lines,howmany] = readFileToCellArray('nflTeams.txt');
>> lines{1}

ans =

Eagles 70000 50 1

>> parts=regexp(lines{1},' ','split');                   
>> parts

parts = 

    'Eagles'    '70000'    '50'    '1'

>> 
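
Notice that every piece returned by regexp is a string, even '70000'. If you want real numbers in the numeric columns, one possible approach (a sketch only, not something this step requires) is the built-in function str2double, which returns NaN for any text that is not a number. The variable names values and row are made up for this example:

```matlab
parts  = regexp('Eagles 70000 50 1', ' ', 'split');
values = str2double(parts);   % NaN for 'Eagles', numbers for the rest

row = parts;                  % start with the strings
for k = 1:numel(parts)
  if ~isnan(values(k))
    row{k} = values(k);       % replace numeric text with the number
  end
end
```

After this runs, row{1} is still the string 'Eagles', while row{2}, row{3} and row{4} hold the numbers 70000, 50 and 1.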

Step 8c: Before we combine these ideas together...

If we have a file that contains values separated by a delimiter, we can first use our readFileToCellArray() function to read the lines of this file into a cell array. We can then step through that cell array, split out the items, and fill a 2-dimensional cell array with all of the values.

In a moment, we'll show a script that combines these two ideas. But first, here are two concepts that you'll see in the script that may need some extra explanation.

Pre-allocating a cell array

If we know that we are trying to build a cell array of a certain size (a certain number of rows and columns), we can help MATLAB work more efficiently and quickly by pre-allocating. We tell MATLAB the size of the cell array we are trying to build, and it pre-allocates space for it, filling each cell with the empty matrix:

>> myNewArray = cell(4,5);
>> myNewArray

myNewArray = 

     []     []     []     []     []
     []     []     []     []     []
     []     []     []     []     []
     []     []     []     []     []

>> 
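
Once the cell array is pre-allocated, we can fill individual cells using curly braces with a row and column index. Here is a small sketch; the name myCells and the values stored in it are just for illustration:

```matlab
myCells = cell(2, 3);                  % 2 rows, 3 columns, every cell holds []

sz = size(myCells);                    % number of rows and columns
startsEmpty = isempty(myCells{1,1});   % true: nothing stored there yet

myCells{1,2} = 'Giants';               % a string in row 1, column 2
myCells{2,3} = 65000;                  % a number in row 2, column 3

stillEmpty = isempty(myCells{2,1});    % true: untouched cells remain []
```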

A nested loop

We can nest one for loop inside another, much like nested Russian dolls.

Consider the following script M-file, called nestedLoop.m (available in your lab07a directory—cd back over to there if you want to try running it.)

% nestedLoop.m    Illustrate nested for loop
% P. Conrad for CISC106, 10/14/2007

for i=1:4
  fprintf('+');
  for j=1:5
    fprintf('-');
  end
  fprintf('+\n');
end

% end nestedLoop.m

Here is the output:

>> nestedLoop
+-----+
+-----+
+-----+
+-----+
>> 

Here is a second example called nestedLoop2.m—one that shows how we use i and j inside the loop:

% nestedLoop2.m    Illustrate nested for loop
% P. Conrad for CISC106, 10/14/2007

% use a nested loop that prints out value of i and j

for i=1:4
  for j=1:5
    fprintf('(%d,%d) ',i,j);
  end
  fprintf('\n');
end

% end nestedLoop2.m

And here is the output of the second example:

>> nestedLoop2 
(1,1) (1,2) (1,3) (1,4) (1,5) 
(2,1) (2,2) (2,3) (2,4) (2,5) 
(3,1) (3,2) (3,3) (3,4) (3,5) 
(4,1) (4,2) (4,3) (4,4) (4,5) 
>>
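
The same pattern can fill a pre-allocated cell array instead of printing. The following sketch (the variable name labels is made up for this example) stores in each cell the text that nestedLoop2.m printed, using sprintf to build the string:

```matlab
labels = cell(4, 5);                       % pre-allocate 4 rows, 5 columns

for i = 1:4
  for j = 1:5
    labels{i,j} = sprintf('(%d,%d)', i, j);  % row i, column j
  end
end
```

After this runs, labels{3,2} holds the string '(3,2)'. The outer loop picks the row, and the inner loop visits every column of that row.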

Be sure you understand these two examples before proceeding.

Step 8d: Ok, now we can combine these ideas together:

So, here is a script M-file called readNflTeams.m (available in your lab07a directory) that combines the ideas of:

Read through the script and try to understand how it works. Then look at the sample output:

% readNflTeams.m        P. Conrad for CISC106, 07F
% set up a cell array containing data from nflTeams.txt
% column 1: team name
% column 2: stadium capacity
% column 3: ticket price
% column 4: wins so far this season (as of 10/10/2007)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% read input into lines array     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


[lines, howMany] = readFileToCellArray('nflTeams.txt');


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pre-allocate cell array   %
% this improves efficiency  %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


nfl = cell(howMany, 4); % howMany rows, 4 columns


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% loop through the lines, parsing out %
% input and filling cells             %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

for i=[1:howMany]

  % make a cell array of this row
  thisRow = regexp(lines{i},' ','split');

  % loop through this row, copying into the big array

  for j=[1:4]
     nfl{i,j} = thisRow{j};
  end

end

% end of readNflTeams.m


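Here's the sample output. Note that every value, including the numbers, is stored as a string, because regexp returns each piece as a string: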

>> readNflTeams
>> nfl

nfl = 

    'Eagles'     '70000'    '50'    '1'
    'Giants'     '65000'    '55'    '3'
    'Cowboys'    '80000'    '60'    '4'

>> 

 

Understand this script M-file in detail before going to the next step.
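
One thing to watch for: the inner loop assumes every line has exactly 4 parts. If a line has fewer, thisRow{j} causes an error. The sketch below shows one possible way to guard against that. It uses a hypothetical, hard-coded lines array (with a deliberately short second line) instead of reading a file, and the name numCols is made up for this example:

```matlab
% Hypothetical input: the second line is missing its last value.
lines = {'Eagles 70000 50 1', 'Giants 65000 55', 'Cowboys 80000 60 4'};
numCols = 4;

nfl = cell(numel(lines), numCols);

for i = 1:numel(lines)
  thisRow = regexp(lines{i}, ' ', 'split');

  % skip any line that does not have the expected number of parts
  if numel(thisRow) ~= numCols
    fprintf('line %d has %d parts, expected %d; skipped\n', ...
            i, numel(thisRow), numCols);
    continue;
  end

  for j = 1:numCols
    nfl{i,j} = thisRow{j};
  end
end
```

Here rows 1 and 3 are filled in, and row 2 is left as empty cells. Whether to skip a bad line, stop with an error, or do something else is a design choice.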

Step 8e: What you must do for credit: A function M-file to do the same thing

So, it is all well and good to be able to do this with a script M-file. But note that this script is not very reusable:

What we want is a more reusable function M-file to do the same task. We'd like to be able to pass in parameters for:

Your task is to come up with a function M-file that can be used in place of readNflTeams.m. It should be able to read the file nflTeams.txt, but it should also be able to read the file nflTeams.csv (which is a comma delimited version of the file nflTeams.txt), and the file elements.csv (a comma-delimited file containing the first 10 elements of the periodic table.)

Think about what your output variable(s) should be, and what a reasonable name for the function M file is. Then write the function M-file that will accomplish the task.

Note that your function M-file needs to be generic—it should be able to read any CSV file, or any file that has the same number of rows and columns, all delimited by some character. It should not be specific to the NFL, and should not contain any variable names or comments that refer to the NFL!

Put this function M-file in your lab07 directory, not your lab07a directory.

When you are done, go on to step 9.

Step 9: Come up with your own comma delimited file.

Think about a topic that is of interest to you, and then write your own comma delimited file. It should have at least 4 lines of data in it, and should have at least 4 columns. At least two of these columns should be numeric, and at least one should be a character string.

Make sure the function M-file you wrote in step 8e works on your new comma delimited file. When it does, you are ready to go to steps 10, 11 and 12, where you make your diary file, make your zip file, and submit on WebCT.

Step 10: Make a diary file lab07.txt.

Now, we want to make a diary file called lab07.txt documenting the work from steps 7, 8 and 9 of this week's lab.

Put yourself inside MATLAB, inside the directory ~/cisc106/lab07, and start a diary file called lab07.txt.

Then, do each of the following steps:

To document your work from step 7

To document your work from step 8

To document your work from step 9

 

Step 11: Make a zip file lab07.zip of your .m files

Just like last week, we want to create a zip file of the files you are going to submit.

Here's a list of all the files we want to put in the zip file for this week—all of these files (and only these files) should be in your lab07 directory.

Step   Filename(s)
7      readFileToCellArray.m
8      the new function M-file you created, nflTeams.txt, nflTeams.csv and elements.csv
9      the new data file you created

Here's how to do it:

  1. Get to a Unix prompt in the directory above ~/cisc106/lab07, i.e. ~/cisc106/.
  2. Type the following Unix command, which will make a zip file consisting of only the .m, .txt, .csv and .dat files in your lab07 directory:

    zip -r lab07.zip lab07 -i \*.m \*.csv \*.txt \*.dat

    Afterwards you'll have a file called lab07.zip in your ~/cisc106 directory that you can submit on WebCT.

  3. To test whether creating the zip file worked, you can make a temporary directory, copy the zip file into it, unzip it, and check that it creates a lab07 directory containing the appropriate files. This is optional, but highly recommended. (Detailed instructions for how to do this are available in the instructions for lab06.)

Step 12: Submit your lab07.txt diary file, and your lab07.zip file

Now you can submit your work on WebCT, and you are done!


Grading

 

Step       What we're looking for                                              Points
step 7     readFileToCellArray.m                                               30
           (15 pts for correctness, 15 for programming style)
step 8     the new function M-file you created                                 30
           (15 pts for correctness, 15 for programming style)
step 9     the new data file you created                                       10
           (10 points for creating a file that parses correctly)
step 10    lab07.txt diary file: following directions in scripting             10
step 11    lab07.zip file: following directions to correctly create zip file   10
           (should unzip into a directory called lab07, not just a bunch
           of files)
overall    following of directions: student should follow the directions       10
           given
Total                                                                          100

 

End of lab07 for CISC106, Fall 2007 (100 pts)
Due Date: 10/25/2007