This lab contains 12 steps. Here is a general overview.
You will probably find the lab easier to understand and complete if you look over the entire lab before jumping in.
If you work in pairs,
% lab07a.m Generate a plot of the path of hurricane Katrina
% Liz Lemon, CISC106 section 082, lab07, 09/13/2007
...
Instead, if Liz Lemon and Jack Donaghy are working together, Liz should write:
% lab07a.m Generate a plot of the path of hurricane Katrina
% Liz Lemon (with Jack Donaghy), CISC106 section 082, lab07, 09/13/2007
and Jack should write:
% lab07a.m Generate a plot of the path of hurricane Katrina
% Jack Donaghy (with Liz Lemon), CISC106 section 080, lab07, 09/13/2007
This does NOT establish a precedent for future labs—unless specifically instructed otherwise, always assume that collaboration at the level of working on code together is not permitted, and may constitute academic dishonesty.
In this lab, we'll cover
By the time you complete this lab, you should be able to:
To prepare for this week's lab, do all the following steps. If you are not sure what to do at any stage, you can consult earlier labs for details:
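Create a new directory ~/cisc106/lab07
Copy all of the files from /www/htdocs/CIS/106/pconrad/07F/labs/lab07 into your new directory ~/cisc106/lab07
Create a new directory ~/cisc106/lab07a
Copy all of the files from /www/htdocs/CIS/106/pconrad/07F/labs/lab07a into your new directory ~/cisc106/lab07a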
Because there are lots of files this week, we are dividing them into two directories:
The files in lab07 include the ones you will turn in for credit.
The files in lab07a are ones that are only for practice.
Some parts of this step in the lab are a reiteration of material that you may already have covered in lecture. However, some parts of it are new, or emphasize different aspects of cell arrays you might not yet have seen.
So, while there is a temptation to skim over this, you are encouraged instead, to read in detail and work the examples.
Even though there is nothing to turn in for credit,
So, don't skip over it.
The files you need for this step should be in your ~/cisc106/lab07a
directory, so you may want to cd
into that directory before proceeding.
So, something that you will have to get used to in this lab is seeing the difference between a(1,2)
and a{1,2}
The difference is subtle, but crucial, and it can be hard to see on the computer screen. Let's zoom in for a closer look:
a(1,2)
vs. a{1,2}
In the first case, we have parentheses: a(1,2)
. In the second case we have braces: a{1,2}
. Braces are "the curly kind".
It can be difficult to see the difference if the font size on your monitor is small, so be sure that you increase the font size on your monitor so that you can clearly see the difference. We'll be using both this week, and discussing the difference between writing one vs. the other.
As long as we are on the subject, we might as well mention brackets too: brackets are the square kind: a = [ 1 2 ];
So, we have:
() | {} | []
parentheses | braces | brackets
Learn the difference!
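To see all three kinds of punctuation side by side, here is a minimal sketch. The variable names a, c, x, y and z are made up for this illustration and do not come from the lab files.

```matlab
% Brackets build an ordinary array; braces build a cell array.
a = [10 20 30];          % brackets: ordinary 1x3 numeric array
c = {'ten', 20, [1 2]};  % braces: 1x3 cell array holding mixed data

x = a(2);    % parentheses on an ordinary array: the number 20
y = c(2);    % parentheses on a cell array: a 1x1 cell containing 20
z = c{2};    % braces on a cell array: the number 20 itself

disp(class(x));   % double
disp(class(y));   % cell
disp(class(z));   % double
```

The difference between y and z is exactly the a(1,2) versus a{1,2} distinction that the rest of this lab explores.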
Suppose you want to create a MATLAB variable containing a name, with your first name above your last name, like this:
name =
John
Ford
So, if your name happens to be John Ford
, or Bette Davis
, or Austin Powers
, an ordinary MATLAB array will work just fine:
>> name = ['John';'Ford']
name =
John
Ford
>> name = ['Bette';'Davis']
name =
Bette
Davis
>> name = ['Austin';'Powers']
name =
Austin
Powers
>>
However, if your name is Phillip Conrad, Terry Harvey, or Dave Saunders, there will be trouble:
>> name = ['Phill';'Conrad']
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
>>
The problem is that, since everything in MATLAB is a matrix, MATLAB wants to create a matrix with the same number of columns in every row. Thus, the names of the two male leads in Fight Club (Brad Pitt, Edward Norton) can each be put into a regular MATLAB array, and even the name of the lead actress, Helena Bonham Carter, which has three parts, works:
>> name = ['Helena';'Bonham';'Carter']
name =
Helena
Bonham
Carter
>>
But if we want to store Terry Harvey, in this way, we are out of luck.
>> name = ['Terry';'Harvey']
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
>>
This is really just a simple example of a more general problem—storing a column of strings with different lengths.
Suppose, for example, we have a file called elementNames.txt, containing the first ten elements in the periodic table, and we want to store these in an array:
>> type elementNames.txt
Hydrogen
Helium
Lithium
Beryllium
Boron
Carbon
Nitrogen
Oxygen
Fluorine
Neon
>>
One approach is to use the char function, which pads a character array with spaces, so that every row has the same number of columns. (As a reminder, the ...
here means that we are continuing a MATLAB statement across multiple lines.)
>> elements = char('Hydrogen','Helium','Lithium','Beryllium','Boron', ...
                  'Carbon','Nitrogen','Oxygen','Fluorine','Neon')
elements =
Hydrogen
Helium
Lithium
Beryllium
Boron
Carbon
Nitrogen
Oxygen
Fluorine
Neon
>>
But this is really kind of a hack—an inelegant solution that works, but isn't really what we want.
First, if we ask for elements(3), we don't get ans = Lithium. Instead, we get ans = L.
>> elements(3)
ans =
L
>>
To get Lithium, we have to ask for elements(3,:)
>> elements(3,:)
ans =
Lithium
>>
Next, if we want to add a new element, it has to be padded out with exactly the right number of spaces. This is super-duper annoying:
>> elements(11)='Sodium '
??? In an assignment A(:) = B, the number of elements in A and B must be the same.
>> elements(11,:)='Sodium '
??? Subscripted assignment dimension mismatch.
>> elements(11,:)='Sodium   '
elements =
Hydrogen
Helium
Lithium
Beryllium
Boron
Carbon
Nitrogen
Oxygen
Fluorine
Neon
Sodium
>>
Only the last attempt works, because every row must have exactly nine characters (the length of Beryllium), so Sodium needs three trailing spaces.
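If you do stay with a padded character array, one way to avoid counting spaces by hand is to let char() do the padding again. The sketch below uses a shortened element list to stay brief; it is an illustration, not part of the lab files.

```matlab
% Build a padded character array, then append a row without hand-padding.
elements = char('Hydrogen', 'Helium', 'Lithium', 'Beryllium');
elements = char(elements, 'Sodium');      % char() pads 'Sodium' to the row width

disp(size(elements));                     % 5 rows, 9 columns
disp(['[' elements(5,:) ']']);            % the brackets make the padding visible
disp(['[' strtrim(elements(5,:)) ']']);   % strtrim removes the padding again
```

This still leaves the padding in every row, which is why the rest of this step moves on to a different data structure.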
Fortunately, there is a better way—the cell array.
The cell array is a different kind of MATLAB array. Normal arrays in MATLAB are built with the square bracket [] symbol, e.g.:
>> name = ['John';'Ford']
name =
John
Ford
>>
The cell array is built with the curly brace symbol {}, e.g.:
>> name = {'Julia';'Roberts'}
name =
    'Julia'
    'Roberts'
>>
Note three crucial differences between the cell array version and the standard version:
With cell arrays, we can store both Salma Hayek, and Julia Roberts. With regular arrays, Julia is out of luck.
>> name = {'Salma';'Hayek'}
name =
    'Salma'
    'Hayek'
>> name = {'Julia';'Roberts'}
name =
    'Julia'
    'Roberts'
>> name = ['Salma';'Hayek']
name =
Salma
Hayek
>> name = ['Julia';'Roberts']
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
>>
The regular array for Salma Hayek has two rows, and five columns. But the cell array has only one column.
Compare:
>> clear
>> name = ['Salma';'Hayek']
name =
Salma
Hayek
>> whos
  Name      Size            Bytes  Class    Attributes
  name      2x5                20  char
>> clear
>> name = {'Salma';'Hayek'}
name =
    'Salma'
    'Hayek'
>> whos
  Name      Size            Bytes  Class    Attributes
  name      2x1               140  cell
>>
If we index into the regular array with name(1)
, we'll just get the S in Salma. But with the cell array, name(1) gives us the entire string Salma
>> name = ['Salma';'Hayek']
name =
Salma
Hayek
>> name(1)
ans =
S
>> name = {'Salma';'Hayek'}
name =
    'Salma'
    'Hayek'
>> name(1)
ans =
    'Salma'
>>
This last issue brings us to the real truth about what cell arrays actually are, and invites us to take a deeper look:
As with Step 2, there is nothing to turn in from Step 3, but exam or quiz questions, and later steps in the lab depend on the stuff you will learn from doing this step. So, don't skip it!
The files you need for this step should be in your ~/cisc106/lab07a
directory, so you may want to cd
into that directory before proceeding.
Cell arrays in MATLAB are an example of a principle of computer science called indirection. Indirection is a general term for language features such as:
This is the important point:
Each element of a cell array is actually a pointer to a complete MATLAB matrix.
Let's look at this principle more deeply.
When we write this:
>> element{1}='Hydrogen';
>> element{2}='Helium';
>> element{3}='Lithium';
>>
There are actually two levels at which things are going on.
First, we see that at the cell array level, we have created a 1x3 cell array called element.
>> whos
  Name         Size            Bytes  Class    Attributes
  element      1x3               222  cell
>>
However, at a deeper level, we have actually created four matrices, not one, because each element in the array element is actually a full matrix in its own right. We can see this by using the cellplot function:
>> cellplot(element);
>> print('-djpeg','element.jpg');
>>
The resulting plot looks like this:
What we see is that the array element, in this case, is composed of three "cells", each of which contains a complete matrix.
Thus, if we want to pull out the fourth character of the word Hydrogen, we can write {1}
to pull out the first element of the cell array, and then write (4)
to pull out the fourth element of the matrix in that cell.
Notice the subtle but crucial difference between the curly braces {}
around the 1, and the round parentheses ()
around the four.
>> element{1}(4)
ans =
r
>>
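The same two-level indexing works inside a loop. The following sketch builds the element cell array in one statement (an alternative to the three assignments above) and pulls out the fourth character of each word; the variable names k and word are made up for this illustration.

```matlab
% Two-level indexing: braces pick the cell, parentheses pick the character.
element = {'Hydrogen', 'Helium', 'Lithium'};

for k = 1:numel(element)
    word = element{k};    % braces: the character array stored in cell k
    fprintf('%s has %d characters; character 4 is %s\n', ...
            word, length(word), element{k}(4));
end
```

Each word has a different length, which is exactly what an ordinary character matrix could not handle without padding.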
We can build a cell array that has different data types. For example, suppose we have a file called nflTeams.txt containing data about NFL teams (note that this is typed at the Unix prompt, not the MATLAB prompt):
> more nflTeams.txt
Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4
>
This file has a column for the team name, a column for the stadium capacity, a column for the ticket price, and a column for the number of wins so far this season (note: this file was generated on 10/10/2007).
We can save the first line of this file into a cell array, as follows:
>> nfl{1} = 'Eagles';
>> nfl{2} = 70000;
>> nfl{3} = 50;
>> nfl{4} = 1;
>> nfl
nfl =
    'Eagles'    [70000]    [50]    [1]
>>
Note that the elements {2}, {3} and {4} are listed with a set of [] around them. This reflects the fact that each of the cells in a cell array is a complete matrix in its own right. That is, the element nfl{2} is not just the number 70000, but is, in fact, a pointer to a 1x1 matrix containing the number 70000.
Because cell arrays can have multiple dimensions, we can actually store all the data from our nflTeams.txt file into a single cell array:
>> nfl{1,1} = 'Eagles';
>> nfl{1,2} = 70000;
>> nfl{1,3} = 50;
>> nfl{1,4} = 1;
>> nfl{2,1} = 'Giants';
>> nfl{2,2} = 65000;
>> nfl{2,3} = 55;
>> nfl{2,4} = 3;
>> nfl{3,1} = 'Cowboys';
>> nfl{3,2} = 80000;
>> nfl{3,3} = 60;
>> nfl{3,4} = 4;
>> nfl
nfl =
    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
>>
A cell plot of the nfl
cell array shows the nature of the data contained inside. We can see that all the cells in the first column are strings, of different lengths: Eagles is a 1x6 character array, while Cowboys is a 1x7 character array. Meanwhile, the cells in the second, third, and fourth columns are all 1x1 matrices (scalars) containing individual numbers.
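If you do not have the cell plot in front of you, you can check the same facts from the command line with class() and size(). This sketch builds the whole nfl array in a single brace expression, which is simply an alternative to the twelve separate assignments shown above.

```matlab
% Build the 3x4 cell array in one statement; semicolons separate the rows.
nfl = {'Eagles',  70000, 50, 1; ...
       'Giants',  65000, 55, 3; ...
       'Cowboys', 80000, 60, 4};

disp(size(nfl));          % 3 4
disp(class(nfl{3,1}));    % char
disp(size(nfl{3,1}));     % 1 7 (Cowboys is a 1x7 character array)
disp(class(nfl{3,2}));    % double
disp(size(nfl{3,2}));     % 1 1 (a scalar)
```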
One of our goals in this lab is to learn how to automatically convert a file such as nflTeams.txt into a cell array such as nfl without having to tediously type all the assignment statements:
>> nfl{1,1} = 'Eagles';
>> nfl{1,2} = 70000;
>> nfl{1,3} = 50;
>> nfl{1,4} = 1;
>> nfl{2,1} = 'Giants';
>> nfl{2,2} = 65000;
etc...
Instead, we'd like to be able to just do something like this, and achieve the same effect:
>> nfl = fileToCellArray('nflTeams.txt'); >>
For the time being, though, until we have written an M-file for a fileToCellArray() function that can achieve this feat, we can use the script M-file setUpNflCellArray.m, which is in your lab07a directory this week. Try typing in these commands:
>> clear
>> cd ~/cisc106/lab07a
>> setUpNflCellArray
>> whos
  Name      Size            Bytes  Class    Attributes
  nfl       3x4               830  cell
>> nfl
nfl =
    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
>>
Now that we have the nfl
cell array set up, we can show the difference between content indexing and cell indexing.
If you haven't already done so, type the commands at the end of step 3c that set up the nfl
cell array:
Then, try the following:
>> nfl(2,1)
ans =
    'Giants'
>> nfl{2,1}
ans =
Giants
>>
Note that we can type both nfl(2,1) and nfl{2,1}, and in both cases we learn that the team in row two is the Giants, but there is a subtle difference:
With nfl(2,1) we get the answer 'Giants'
With nfl{2,1} we get the answer Giants
What is going on here?
To look into this more deeply, we can try the same trick with the stadium capacity for the Giants.
>> nfl(2,2)
ans =
    [65000]
>> nfl{2,2}
ans =
       65000
>>
So, this is interesting. When we use the notation nfl(2,1)
or nfl(2,2)
, we seem to get extra punctuation, either ''
or []
around our result.
We can learn more by doing a whos command after we type each of these:
>> nfl(1,1)
ans =
    'Eagles'
>> whos
  Name      Size            Bytes  Class    Attributes
  ans       1x1                72  cell
  nfl       3x4               830  cell
>>
Here we see that ans
is showing up as a 1x1 cell
. But, if we use nfl{1,1}
instead, look what we get:
>> nfl{1,1}
ans =
Eagles
>> whos
  Name      Size            Bytes  Class    Attributes
  ans       1x6                12  char
  nfl       3x4               830  cell
>>
If we use the {}
to index into the array, we get an actual 1x6 character string.
In MATLAB, this is called the difference between cell indexing, which uses parentheses (), and content indexing, which uses braces {}.
One way to understand the difference is this:
When we write nfl(2,1)
we get 'Giants'
, and when we write nfl(2,2)
we get [65000]. In both cases, the result is a cell. The actual content is still wrapped up inside a cell. Think of the cell as like a cellophane wrapper that has to be peeled off.
However, when we write nfl{2,1}
, we go deeper. We automatically peel off the cellophane wrapper, and get at the content inside. Thus the result is an actual character string, or number, or whatever. Thus, the result of nfl{2,1}
is Giants
(without the wrapper) and the result of nfl{2,2}
is 65000
(again, without the wrapper.)
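One way to check which kind of result you have is to ask MATLAB with iscell(). The sketch below uses a two-row version of the nfl array; the names wrapped, bare, unwrapped, doubled and firstRow are made up for this illustration.

```matlab
% Cell indexing keeps the wrapper; content indexing removes it.
nfl = {'Eagles', 70000, 50, 1; ...
       'Giants', 65000, 55, 3};

wrapped = nfl(2,2);       % cell indexing: a 1x1 cell containing 65000
bare    = nfl{2,2};       % content indexing: the number 65000

disp(iscell(wrapped));    % 1
disp(iscell(bare));       % 0

unwrapped = wrapped{1};   % the wrapper can also be peeled off later
doubled   = bare * 2;     % arithmetic works on the content
% wrapped * 2 would be an error: arithmetic is not defined for cells

firstRow = nfl(1,:);      % cell indexing with a range: a 1x4 cell array
disp(size(firstRow));     % 1 4
```

Cell indexing is the natural choice when you want a smaller cell array (such as one whole row), while content indexing is what you need before doing arithmetic or string operations.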
This also works when making assignment statements. If we want to assign another row of data, we can do it in one of two ways.
We can either assign to the contents of the cells:
>> nfl{4,1}='49ers';
>> nfl{4,2}=70000;
>> nfl{4,3}=59;
>> nfl{4,4}=2;
>> nfl
nfl =
    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
    '49ers'      [70000]    [59]    [2]
>>
Or, we can create cells on the right hand side of the assignment statement by putting the contents inside a set of braces, and then assign the cell itself:
>> nfl(4,1)={'49ers'};
>> nfl(4,2)={70000};
>> nfl(4,3)={59};
>> nfl(4,4)={2};
>> nfl
nfl =
    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
    '49ers'      [70000]    [59]    [2]
>>
What we cannot do is assign content directly into a cell:
>> nfl(4,1)='49ers';
??? Conversion to cell from char is not possible.
>> nfl(4,2)=70000;
??? Conversion to cell from double is not possible.
>>
Understand this point, and be prepared to answer questions about it on an exam or quiz. If it is still not clear, go back through these steps, and experiment with adding data to the table, or changing data in the table, until it becomes more clear. You can also ask your TA or instructor to clarify, during lab, lecture, or office hours.
In MATLAB, we can return more than one value from a function. We'll need this fact later in this lab, so this step is here to review how this feature of MATLAB works. And, as usual, this could show up on an exam or quiz, so don't skip this step.
The files you need for this step should be in your ~/cisc106/lab07a
directory, so you may want to cd
into that directory before proceeding.
Take a look at the function M-file myFunc.m. Notice that there are two result variables, sum
and prod
>> type myFunc.m
function [sum, prod] = myFunc(a,b)
%myFunc returns sum and product of two numbers
%
% consumes: two numbers, a and b
% produces: their sum and their product
%
% Examples:
%
%>> myFunc(3,6)
%ans =
%     9
%>> [theSum, theProd] = myFunc(3, 4)
%theSum =
%     7
%theProd =
%    12
%>> [x , y ] = myFunc(5,7)
%x =
%    12
%y =
%    35
%
% P. Conrad for CISC106, Sect 99, 10/12/2007

sum = a+b;
prod = a * b;
return;
end
>>
If we type a function call to myFunc()
and do not put anything on the left hand side of the assignment statement, we get only the first of these two result variables, which represents the sum:
>> myFunc(2,4)
ans =
     6
>>
The same is true if we only put one variable on the left hand side of an assignment statement:
>> x = myFunc(2,4)
x =
     6
>>
But, if we provide a matrix containing two result variables on the left hand side of the assignment statement, we can access both output variables:
>> [x y] = myFunc(2,4)
x =
     6
y =
     8
>>
In this way, we can return more than one result from a function.
A side note: this is an unusual feature of MATLAB. Most programming languages, including C, C++, and Java only allow one result to be returned from a function. In those languages, there are ways to simulate the effect of returning more than one result from a function, but that's a topic for another day.
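Many built-in MATLAB functions use the same mechanism, so you can practice it without writing a function of your own. The sketch below uses the built-in max() and size() functions; the variable names are made up for this illustration.

```matlab
% Built-in functions can also return more than one result.
v = [3 9 4];

m = max(v);               % one output requested: the largest value, 9
[m, idx] = max(v);        % two outputs: the value 9 and its position, 2

[r, c] = size(zeros(3,4));   % r is 3 (rows), c is 4 (columns)

fprintf('max is %d at position %d\n', m, idx);
fprintf('size is %d by %d\n', r, c);
```

As with myFunc(), asking for only one output simply gives you the first result.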
In this step, we'll look at reading data from an input file.
There is nothing to turn in from this step, but material could show up on labs/quizzes, and you need it for later steps in the lab.
The files you need for this step should be in your ~/cisc106/lab07a
directory, so you may want to cd
into that directory before proceeding.
Knowing how to read from an input file is important, because
So, let's suppose we have a file of data, and we want to read that data into MATLAB. In an earlier lab (lab05) we've already seen that if the data consists entirely of numbers, formatted in columns and rows, we can read that data in using the load command. As a reminder, if katrina.dat
contains only rows and columns of numbers, we can type:
>> load katrina.dat;
>>
and the result is a MATLAB matrix called katrina, containing all of our data. However, this does not work for files containing a mixture of text and numbers. For example, if our file is input.txt, containing our NFL data:
>> type input.txt
Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4
>> load input.txt
??? Error using ==> load
Unknown text on line number 1 of ASCII file /home/kodos/vol0/hsm/22/pconrad/106/07F/labPrep/lab07/input.txt "Eagles".
>>
Further, sometimes data is stored in other formats, such as "comma-separated values", or CSV format. Consider the file nflTeams.csv:
>> type nflTeams.csv
Eagles,70000,50,1
Giants,65000,55,3
Cowboys,80000,60,4
>>
So, we need a different approach.
In this step we'll cover three MATLAB built-in functions that we can use to access every line of any input file that we can create with a regular text editor (e.g. emacs on Unix, Notepad on Windows, or TextEdit on Mac). This approach will work even if the contents of the file are a mixture of text and numbers.
This is especially useful, for example, for processing data that often comes with a mixture of data types. For example, the katrina.dat file that we used in lab05 was originally in this format (and taken from the web link: http://weather.unisys.com/hurricane/atlantic/2005H/KATRINA/track.dat)
This file is also in your directory this week as track.dat
Date: 23-31 AUG 2005
Hurricane KATRINA
ADV  LAT    LON     TIME       WIND  PR    STAT
1    23.20  -75.50  08/23/21Z  30    1007  TROPICAL DEPRESSION
1A   23.30  -75.80  08/24/00Z  30    1007  TROPICAL DEPRESSION
2    23.40  -76.00  08/24/03Z  30    1007  TROPICAL DEPRESSION
2A   23.60  -76.00  08/24/06Z  30    1007  TROPICAL DEPRESSION
etc ...
25A  27.90  -89.50  08/29/03Z  140   908   HURRICANE-5
25B  28.20  -89.60  08/29/07Z  135   910   HURRICANE-4
26   28.80  -89.60  08/29/09Z  130   915   HURRICANE-4
etc ...
34   41.10  -81.60  08/31/09Z  15    996   TROPICAL DEPRESSION
The first of these three functions is the fopen()
function. Just like with many other things in life, you have to open a file before you can access its contents (you have to open the cookie jar to get at the cookies).
Here's how we can use fopen()
to open the file track.dat
to get ready to read lines from it:
>> [fid message] = fopen('track.dat')
A few notes:
fid is the file identifier. It keeps track of where we are in the file. It also allows us to open more than one file at a time, and keep track of which file is which.
Try typing in this command:
>> [fid message] = fopen('track.dat')
If you get the following result, it probably means that either
you are not in the right directory (try cd'ing into ~/cisc106/lab07a first), or
you did not copy track.dat into your directory back at step 1
>> [fid message] = fopen('track.dat')
fid =
    -1
message =
No such file or directory
>>
On the other hand, if you get this result (or one like it), then all is well. The exact number that is returned in fid
may vary from what is shown here—the exact value is not important as long as the number isn't -1
>> [fid message] = fopen('track.dat')
fid =
     3
message =
     ''
>>
Try typing in the name of a file that doesn't exist, just so you can see the error message:
>> [fid message] = fopen('blah.txt')
fid =
    -1
message =
No such file or directory
>>
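In a script, it is common to test fid right after calling fopen() rather than reading the result by eye. Here is a minimal sketch; it assumes that no file called blah.txt exists in your current directory, so the first branch is the one that runs.

```matlab
% Check the result of fopen() before trying to read from the file.
[fid, message] = fopen('blah.txt');   % assumed not to exist

if fid == -1
    fprintf('Could not open the file: %s\n', message);
else
    fprintf('Opened the file; fid is %d\n', fid);
    fclose(fid);                      % close it again, since we are only testing
end
```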
Once you are comfortable with how to open a file, we can move on to the next step.
Now, open the file, and try to read some lines of text using the fgetl()
function. The name fgetl
stands for "file get line". Notice that each time you call fgetl()
, it returns another line of the file:
>> [fid message] = fopen('track.dat')
fid =
     4
message =
     ''
>> fgetl(fid)
ans =
Date: 23-31 AUG 2005
>> fgetl(fid)
ans =
Hurricane KATRINA
>> fgetl(fid)
ans =
ADV LAT LON TIME WIND PR STAT
>> fgetl(fid)
ans =
1 23.20 -75.50 08/23/21Z 30 1007 TROPICAL DEPRESSION
>>
If we know exactly how many lines are in a file, we can use a for
loop to read them into a cell array. For example, since the file input.txt (which is an exact copy of nflTeams.txt) has exactly three lines, we can do this:
>> [nflFile message] = fopen('input.txt')
nflFile =
     5
message =
     ''
>> for i = [1:3]
lines{i} = fgetl(nflFile);
end
>> lines
lines =
    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'
>>
However, more typically, we do not know how many lines are in a file. The fgetl() function signals that it has hit the end of the file by returning -1 as the result. For example:
>> [nflFile message] = fopen('input.txt')
nflFile =
     3
message =
     ''
>> fgetl(nflFile)
ans =
Eagles 70000 50 1
>> fgetl(nflFile)
ans =
Giants 65000 55 3
>> fgetl(nflFile)
ans =
Cowboys 80000 60 4
>> fgetl(nflFile)
ans =
    -1
>>
You can see that after we've read the last line, if we try to read again, the function fgetl()
returns -1
as the result.
In a later step in this lab, we'll use this fact to write a while
loop that can read all of the lines in a file into a cell array called lines
.
We'll also calculate how many lines are in the file by starting a variable at 0 (e.g. numLines=0
), and adding one to it each time we see a new line (e.g. numLines = numLines + 1
).
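As a preview, the idea can be sketched as follows. This is only an illustration under stated assumptions, and the readInputFile.m script supplied with the lab may be organized differently. So that the sketch runs on its own, it first writes a small file with the made-up name sample_lines.txt; writing to files is a topic for later.

```matlab
% Create a small sample file (hypothetical name) so this sketch is self-contained.
fid = fopen('sample_lines.txt', 'w');
fprintf(fid, 'Eagles 70000 50 1\nGiants 65000 55 3\nCowboys 80000 60 4\n');
fclose(fid);

% Read every line into a cell array, counting lines as we go.
fid = fopen('sample_lines.txt');
lines = {};
numLines = 0;
thisLine = fgetl(fid);
while ischar(thisLine)            % at end of file, fgetl returns the number -1
    numLines = numLines + 1;
    lines{numLines} = thisLine;
    thisLine = fgetl(fid);
end
fclose(fid);

fprintf('Read %d lines\n', numLines);
```

Testing ischar(thisLine) is one way to detect the -1: a real line is always a character array, while the end-of-file signal is a number.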
If you don't close the cookie jar, the cookies can go stale.
In the same way, whenever you are working with a file, it is a good programming practice to close the file when you are finished with it. We do this by calling the fclose()
function:
>> fclose(fid);
>>
or
>> fclose(nflFile);
>>
Note that what we pass to fclose() is the fid variable, NOT the name of the file:
We write fclose(fid); because we previously did [fid,message]=fopen('input.txt');
If we write fclose(nflFile) it is because we used the variable nflFile when we did [nflFile,message]=fopen('nflFile.txt');
If we had done [banana,message]=fopen('nflFile.txt'); earlier, we would write fclose(banana); not fclose(nflFile);
If we write fclose(nflFile.txt); or even fclose('nflFile.txt'); we will get an error.
Closing a file you are reading from doesn't keep the "bits" from going stale. The bits in a file you are reading from are safe whether you close the file or not. Rather, it is a matter of resources. If you leave the file open, then MATLAB has to keep some memory tied up keeping track of that open file. This may cause MATLAB to run slower.
Also, in a large program, you might open hundreds or thousands of files. MATLAB has an upper limit on how many files can be open at a time. If you exceed this limit, your program will fail.
Later, we'll talk about writing to a file—a so-called output file. In the case of writing to a file, closing the file is even more important. Sometimes when writing to a file, if you fail to close the file, the last few things you wrote might not actually make it into the file because of something called buffering. Without going into details, let's just say that closing a file is even more important with output files than with input files.
Now with our three basic tools—fopen()
, fgetl()
and fclose()
, we are ready to look at a complete script M-file that reads from a file into a cell array.
The file readInputFile.m
is a script M-file that uses all of the concepts we've covered so far in Step 1 through Step 5. It also includes the concepts of if/else and while loops from lab06.
The script M-file readInputFile.m
reads lines of text from an input file called input.txt
into a cell array called lines
.
Your job in this step is easy to explain, but complex to undertake: you need to try to understand how the file readInputFile.m
does its job.
To do this, here are some steps that may help to get started:
input.txt
and readInputFile.m
input.txt
readInputFile.m
lines
and numLinesRead
to see what their contents are
That, actually, was the easy part. The more challenging part is to go through the file, line by line, and try to understand what each line of code is doing.
Once you've figured that out, you are ready for Step 7, the first part of this lab where you have to do something that you will turn in for credit.
Your job in this step is easy to explain:
This is a step you will turn in for credit.
Here's why
input.txt
You can start by copying the file readInputFile.m
from your ~/cisc106/lab07a
directory to your ~/cisc106/lab07
directory, changing the name as you copy it from readInputFile.m
to readFileToCellArray.m
.
Then you need to edit the readFileToCellArray.m
file, working in your ~/cisc106/lab07
directory (since that is where you will work on files that you are turning in for credit.)
The H1 comment of this file should look like this:
function [lines, numLinesRead] = readFileToCellArray(filename)
%readFileToCellArray read lines of filename into cell array
%Examples:
% >>lines=readFileToCellArray('lines.txt');
% >>[lines,numLinesRead]=readFileToCellArray('myData.dat');
%
% Reads input from filename. Each line is read into successive
% entries in a cell Array called lines, i.e. lines{1} is the first
% line, lines{2} is the second line, etc. numLinesRead is the
% variable that keeps track of how many lines are in the file.
% (your name, course, date, section)
For purposes of this file, this is sufficient for the inputs/outputs and examples. Feel free to copy and paste the H1 comment above directly into your file. (Be sure to change the last line though!)
Now, you need to make the changes in the body of the file so that it works as a function M-file, instead of as a script. As it turns out, this is not particularly complicated. If you have understood Steps 1 through 6 in detail, you can probably accomplish this in less than 10 minutes. It involves very few changes to the file.
If it works properly, you should be able to do this. Note the use of ../lab07a to access lab07a, the "sibling" directory of your lab07 directory, through the parent directory (..)
>> [lines, howmany] = readFileToCellArray('nflTeams.txt');
>> lines
lines =
    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'
>> howmany
howmany =
     3
>> [lines, howmany] = readFileToCellArray('../lab07a/track.dat');
>> howmany
howmany =
    66
>> lines{1}
ans =
Date: 23-31 AUG 2005
>> lines{2}
ans =
Hurricane KATRINA
>>
Try it also on the file ../lab07a/elementNames.txt
.
We'll make a diary file to document your work on Step 7 later, in Step 10. So if all this is working, it is on to Step 8!
According to thefreedictionary.com, the word parse means, among other things:
"To analyze or separate (input, for example) into more easily processed components."
In this step, we'll look into different ways of parsing input in MATLAB. The goal is to get input from a file into a format that we can work with.
So, once we have the lines into a cell array called lines, we then may want to try to parse them into their various parts.
For example, the file nflTeams.txt
contains lines such as this.
Eagles 70000 50 1
Giants 65000 55 3
Cowboys 80000 60 4
The file nflTeams.txt appears not only in your ~/cisc106/lab07a
directory but also in your ~/cisc106/lab07
directory.
So, you should be able to use the function M-file you wrote in Step 7 to get each line into a cell as a string, like this:
>> [lines,howmany] = readFileToCellArray('nflTeams.txt')
lines =
    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'
howmany =
     3
>> lines
lines =
    'Eagles 70000 50 1'    'Giants 65000 55 3'    'Cowboys 80000 60 4'
>> lines'
ans =
    'Eagles 70000 50 1'
    'Giants 65000 55 3'
    'Cowboys 80000 60 4'
>>
Now, we want those lines to be split up into their various parts, so we have a cell array like this one:
>> nfl
nfl =
    'Eagles'     [70000]    [50]    [1]
    'Giants'     [65000]    [55]    [3]
    'Cowboys'    [80000]    [60]    [4]
>>
The first step is to figure out how to separate a single line of text such as 'Eagles 70000 50 1' into its various parts. Fortunately, there is a function that can help.
The MATLAB built in function regexp()
is a very powerful function. The name, regexp
, stands for regular expression. The concept of a regular expression is a fundamental concept in Computer Science. In this step, we are only going to scratch the surface of the power of this function.
In particular, we are going to focus on only one use of this function: its use to split apart a string based on a delimiter symbol. The web site thefreedictionary.com defines delimiter as "a character or sequence of characters marking the beginning or end of a unit of data." Consider the string:
'Eagles 70000 50 1'
In the string above, the delimiter is the space character. Contrast that with these two strings:
'Eagles,70000,50,1'
'1,Hydrogen,H,1,1,Nonmetal'
In these two strings, the delimiter is the comma.
In MATLAB, we use a special form of the regexp function to divide a string into a cell array. It works like this. Note that the second actual parameter to the function is a space character between two single quotation marks: ' '
>> regexp('Eagles 70000 50 1',' ','split')
ans =
    'Eagles'    '70000'    '50'    '1'
>>
What gets returned is a cell array.
It works too if we use a comma as the delimiter. Note that the second actual parameter in this function call is a comma in-between two single quotation marks: ','
>> regexp('2,Helium,He,1,18,Noble gas',',','split')
ans =
    '2'    'Helium'    'He'    '1'    '18'    'Noble gas'
>>
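Because the second parameter is a regular expression and not just a single character, it can also describe a run of delimiters. The sketch below is an illustration that goes slightly beyond what this lab requires: it uses a made-up string with several spaces between some fields, and compares splitting on one space with splitting on the pattern '\s+' (one or more whitespace characters).

```matlab
% Splitting on a single space versus on a run of whitespace.
textLine = 'Eagles   70000  50 1';                  % uneven spacing (made-up example)

byOneSpace = regexp(textLine, ' ',   'split');      % every single space is a delimiter
byAnySpace = regexp(textLine, '\s+', 'split');      % each run of spaces is one delimiter

fprintf('%d parts when splitting on one space\n', numel(byOneSpace));
fprintf('%d parts when splitting on runs of whitespace\n', numel(byAnySpace));
```

With the single-space delimiter, the extra spaces produce empty strings in the result, so it matters which delimiter you choose when fields are separated by more than one space.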
We can combine this technique with the function that reads lines from a file, and use it to divide up a line into its parts:
>> [lines,howmany] = readFileToCellArray('nflTeams.txt');
>> lines{1}
ans =
Eagles 70000 50 1
>> parts=regexp(lines{1},' ','split');
>> parts
parts =
    'Eagles'    '70000'    '50'    '1'
>>
If we have a file that contains values separated by a delimiter, we can first use our readFileToCellArray() function to read the lines of this file into a cell array. We can then step through that cell array, split out the items, and fill a 2-dimensional cell array with all of the values.
In a moment, we'll show a script that combines these two ideas. But first, here are two concepts that you'll see in the script that may need some extra explanation.
If we know that we are trying to build a cell array of a certain size (a certain number of rows and columns), we can help MATLAB work more efficiently and quickly by pre-allocating. We tell MATLAB the size of the cell array we are trying to build, and it pre-allocates space for it, filling each cell with the empty matrix:
>> myNewArray = cell(4,5);
>> myNewArray
myNewArray =
     []     []     []     []     []
     []     []     []     []     []
     []     []     []     []     []
     []     []     []     []     []
>>
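Once the space is pre-allocated, a loop can fill in the cells one at a time. Here is a small sketch with made-up contents (the variable name squares is just for this illustration):

```matlab
% Pre-allocate a 1x5 cell array, then fill each cell in a loop.
squares = cell(1, 5);               % five cells, each holding the empty matrix

disp(isempty(squares{3}));          % 1: nothing has been stored there yet

for k = 1:5
    squares{k} = sprintf('%d squared is %d', k, k^2);
end

disp(squares{3});                   % 3 squared is 9
```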
We can nest one for loop inside another, much like nested Russian dolls.
Consider the following script M-file, called nestedLoop.m
(available in your lab07a directory—cd back over to there if you want to try running it.)
% nestedLoop.m Illustrate nested for loop
% P. Conrad for CISC106, 10/14/2007

for i=1:4
    fprintf('+');
    for j=1:5
        fprintf('-');
    end
    fprintf('+\n');
end

% end nestedLoop.m
Here is the output:
>> nestedLoop
+-----+
+-----+
+-----+
+-----+
>>
Here is a second example called nestedLoop2.m, one that shows how we use i and j inside the loop:
% nestedLoop2.m  Illustrate nested for loop
% P. Conrad for CISC106, 10/14/2007

% use a nested loop that prints out value of i and j

for i=1:4
   for j=1:5
      fprintf('(%d,%d) ',i,j);
   end
   fprintf('\n');
end

% end nestedLoop2.m
And here is the output of the second example:
>> nestedLoop2
(1,1) (1,2) (1,3) (1,4) (1,5)
(2,1) (2,2) (2,3) (2,4) (2,5)
(3,1) (3,2) (3,3) (3,4) (3,5)
(4,1) (4,2) (4,3) (4,4) (4,5)
>>
Be sure you understand these two examples before proceeding.
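To connect nested loops with pre-allocation, here is a short sketch that stores each (i,j) label in a cell array instead of printing it. The variable name labels is made up for this illustration.

```matlab
% Pre-allocate a 4-by-5 cell array, then fill it with a nested loop
labels = cell(4,5);

for i = 1:4          % outer loop: one pass per row
    for j = 1:5      % inner loop: one pass per column within that row
        % sprintf builds the string; row i, column j is where it goes
        labels{i,j} = sprintf('(%d,%d)', i, j);
    end
end

% Look at one cell to check that it landed in the right place
disp(labels{2,3});
```

The outer loop variable picks the row and the inner loop variable picks the column, which is the same pattern used when copying the pieces of a line into a 2-dimensional cell array.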
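So, here is a script M-file called readNflTeams.m (available in your lab07a directory) that combines the ideas covered above: reading the lines of a file into a cell array, splitting each line with regexp, pre-allocating a cell array, and nested for loops.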
Read through the script and try to understand how it works. Then look at the sample output:
% readNflTeams.m  P. Conrad for CISC106, 07F
% set up a cell array containing data from nflTeams.txt
%   column 1: team name
%   column 2: stadium capacity
%   column 3: ticket price
%   column 4: wins so far this season (as of 10/10/2007)

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% read input into lines array     %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

[lines, howMany] = readFileToCellArray('nflTeams.txt');

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% pre-allocate cell array   %
% this improves efficiency  %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

nfl = cell(howMany, 4); % howMany rows, 4 columns

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% loop through the lines, parsing out %
% input and filling cells             %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

for i=[1:howMany]

   % make a cell array of this row
   thisRow = regexp(lines{i},' ','split');

   % loop through this row, copying into the big array
   for j=[1:4]
      nfl{i,j} = thisRow{j};
   end

end

% end of readNflTeams.m
Here's the sample output:
>> readNflTeams
>> nfl
nfl =
    'Eagles'     '70000'    '50'    '1'
    'Giants'     '65000'    '55'    '3'
    'Cowboys'    '80000'    '60'    '4'
>>
Understand this script M-file in detail before going to the next step.
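If you want to trace the loop without the data file at hand, the sketch below runs the same steps on a hand-typed cell array of lines. The three lines are an assumption based on the sample output above, standing in for what readFileToCellArray('nflTeams.txt') would return.

```matlab
% Stand-in for: [lines, howMany] = readFileToCellArray('nflTeams.txt');
% (assumed contents, taken from the sample output shown above)
lines = {'Eagles 70000 50 1', 'Giants 65000 55 3', 'Cowboys 80000 60 4'};
howMany = length(lines);

% Pre-allocate: howMany rows, 4 columns
nfl = cell(howMany, 4);

for i = 1:howMany
    % Split line i into its four pieces
    thisRow = regexp(lines{i}, ' ', 'split');
    % Copy each piece into row i of the big array
    for j = 1:4
        nfl{i,j} = thisRow{j};
    end
end

% Every cell holds a character string, even the ones that look like numbers
disp(nfl{3,1});
disp(class(nfl{3,2}));
```

Note that regexp with 'split' always produces strings, so '80000' in row 3 is of class char, not a number.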
So, it is all well and good to be able to do this with a script M-file. But note that this script is not very reusable: it has the file name 'nflTeams.txt' hard coded into it, and it has the variable name nfl hard coded into it.
What we want is a more reusable function M-file to do the same task. We'd like to be able to pass in parameters for the things that vary from file to file, such as the delimiter (' ' or ',').
Your task is to come up with a function M-file that can be used in place of readNflTeams.m. It should be able to read the file nflTeams.txt, but it should also be able to read the file nflTeams.csv (which is a comma-delimited version of the file nflTeams.txt), and the file elements.csv (a comma-delimited file containing the first 10 elements of the periodic table).
Think about what your output variable(s) should be, and what a reasonable name for the function M file is. Then write the function M-file that will accomplish the task.
Note that your function M-file needs to be generic—it should be able to read any CSV file, or any file that has the same number of rows and columns, all delimited by some character. It should not be specific to the NFL, and should not contain any variable names or comments that refer to the NFL!
Put this function M-file in your lab07 directory, not your lab07a directory.
When you are done, go on to step 9.
Think about a topic that is of interest to you, and then write your own comma-delimited file. It should have at least 4 lines of data in it, and should have at least 4 columns. At least two of these columns should be numeric, and at least one should be a character string.
Make sure the function M-file you wrote in step 8e works on your new comma-delimited file. When it does, you are ready to go to steps 10, 11 and 12, where you make your diary file, make your zip file, and submit on WebCT.
Now, we want to make a diary file called lab07.txt documenting the work from steps 7, 8 and 9 of this week's lab.
Put yourself inside MATLAB, inside the directory ~/cisc106/lab07, and start a diary file called lab07.txt.
Then, do each of the following steps:
readFileToCellArray.m
input.txt as well as ../lab07a/elementNames.txt
nflTeams.txt, nflTeams.csv and elements.csv to show that it works for all three.
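As a rough sketch of the diary mechanics described above, a session is bracketed by turning the diary on and off. The disp line is only a placeholder for the commands you actually need to record.

```matlab
% Start recording the session in lab07.txt
% (if the file already exists, MATLAB appends to it)
diary lab07.txt

% ... run the commands you want recorded here; this line is a placeholder
disp('commands for steps 7, 8 and 9 go here');

% Stop recording so the file is complete before you zip it
diary off
```

Because an existing diary file is appended to rather than replaced, you may want to delete an old lab07.txt before starting over.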
Just like last week, we want to create a zip file of the files you are going to submit.
Here's a list of all the files we want to put in the zip file for this week. All of these files (and only these files) should be in your lab07 directory.
Step | Filename(s) |
---|---|
7 | |
8 | the new function M-file you created |
9 | the new data file you created |
Here's how to do it:
Change into the directory ~/cisc106, not ~/cisc106/lab07.
Then use this command to zip up the .m, .txt, .csv and .dat files in your lab07 directory:
zip -r lab07.zip lab07 -i \*.m \*.csv \*.txt \*.dat
Afterwards you'll have a file called lab07.zip in your ~/cisc106 directory that you can submit on WebCT.
lab07 directory containing the appropriate information. This is optional, but highly recommended. (Detailed instructions for how to do this are available in the instructions for lab06.)
Now you can submit your work on WebCT, and you are done!
step | what we're looking for | points |
---|---|---|
step 7 | | 30 |
step 8 | the new function M-file you created | 30 |
step 9 | the new data file you created | 10 |
step 10 | lab07.txt diary file: following directions in scripting | 10 |
step 11 | lab07.zip file: following directions to correctly create zip file (should unzip into a directory called lab07, not just a bunch of files) | 10 |
overall following of directions | student should follow the directions given | 10 |
Total |