Note: If you get a warning about "blocked active content", don't worry. It is because of the JavaScript drag and drop elements in Section 1.8.5. You can allow the content, or block it, as you see fit; if you block it though, the drag and drop demo won't work.
Sebesta has great tutorials for several useful topics that aren't covered in HFSJ, including practical details of XHTML, CSS, JavaScript, and MySQL (all technologies we'll definitely be using this semester).
In addition, there is coverage of Perl, PHP, ASP.NET, and Java applets, which we may or may not get to, but which you are likely to encounter in the real world, and might decide to use in your projects.
Finally, Sebesta also provides coverage of Java Servlets and JSP that can help summarize and reinforce what we learn in HFSJ.
Having said that, my advice to you as you read Chapter 1 is: don't judge a book by its first chapter.
I have some advice to Sebesta at the end of this document on how the book could be improved; I'd be interested to hear opinions from CISC474 students on that advice! But read the chapter and the reading notes first, so that you'll have an informed opinion.
If you are an impatient reader you might want to skip pages 1-13 (i.e. up through Section 1.5) on your first reading. Start with Section 1.6, and come back to pages 1-13 afterwards.
However, be sure to read through these reading notes, even for Sections 1.1 through 1.5. There may be details that are only in these notes that you are responsible for on the exams.
These notes will tell you what I think is important in the readings (and since I'm designing and grading your exams, you probably will think those things are important too). I'll also fill in gaps, offer contrary points of view, refer you to related material in your HFSJ textbook, and point out online resources that relate to what you are reading.
Be particularly vigilant about anything that appears in a box such as this one. These boxes contain material specifically set aside as important (e.g. for your exams) and often not covered in the textbook.
On the other hand, notes that are in beige shaded boxes like this one are ones that are provided for your background information only. They are unlikely to appear on an exam.
You may also see the following background color used for code listings, or transcripts of terminal sessions. That color is just to help you find them more easily, and separate them from the text that surrounds them.
<html>
<head>
<title>P. Conrad's Web Page</title>
</head>
<body>
...
Ok, let's start reading!
Gack. A textbook that seems, on the whole, quite good, gets off to an inauspicious start by reinforcing one of the most common misconceptions about the origin of the Internet.
It doesn't have a whole lot to do with the central themes of the course, but you might as well know:
"Surviving a nuclear war" was not, per se, a design goal of the early ARPAnet. This is an urban legend.
To be fair, Sebesta doesn't say that, but by emphasizing the following point, he does tend to lend support to that wrong view:
"One fundamental requirement was that the network be sufficiently robust so that even if some network nodes were lost due to sabotage, war, or some more benign reason, the network could continue to function." (Sebesta, p. 2)
It is true that the designers of the Internet wanted it to be robust to failure. However, as for being worried about sabotage or war—while there is some dispute about this—the best sources indicate that the designers of the Internet had no such idea in mind.
Probably the best source that debunks this canard is a history of the Internet from the Internet Society's own web site, written by nine co-authors, several of whom are well known to have been present at the very creation of the Internet, including:
According to the most reliable accounts, the real purpose of the Internet was to enable the DoD, specifically the Advanced Research Projects Agency (ARPA), to better exchange data with the various universities that were doing DoD sponsored research, and to enable those universities to exchange data with one another. Although it was military money that paid for the network, the military itself wasn't even a primary user of the network—the users were academic scientists, and DoD civilians that oversaw their work.
Anyway, for purposes of this course, this is all just an aside, so let's move on. While the material in Section 1.1.1 is probably good background for your general knowledge, the only details from Section 1.1.1 I want you to know for this course are:
The key idea in this section is that the Internet is a network of networks, and that it is the TCP/IP protocol suite that allows all the computers on the Internet to communicate.
Again, though, Sebesta is slightly misleading on a key detail:
TCP/IP is not a "single low-level protocol", as the book suggests (p. 3). Rather, the name TCP/IP (with the slash between the TCP and the IP) refers to the entire suite of protocols used on the Internet.
The TCP/IP protocol suite includes the protocols TCP and IP, but also includes protocols such as HTTP, DHCP, UDP, and others. The details of the Domain Name System (DNS) are part of the TCP/IP protocol suite as well.
Another name for the TCP/IP protocol suite is the "Internet Protocol Suite". We can say that HTTP is an "Internet protocol" or that it is a "TCP/IP protocol" to indicate that it is part of that protocol suite.
A few facts you should know about the TCP/IP protocol suite:
The TCP/IP protocol suite is standardized by the Internet Engineering Task Force (IETF). Documents called RFCs (Requests For Comments) specify Internet Standards. Their web site is www.ietf.org.
An organization called the Internet Society (ISOC) oversees both the IETF, and the Internet as a whole. Their web site is www.isoc.org
The TCP/IP protocol suite can be divided into a number of layers. The number of layers differs from author to author. For our purposes, a brief overview of five layers will be enough (see box below).
If you are interested in more details than are provided here, you can take CISC450 (Computer Networks)
Finally, while HTTP is a TCP/IP protocol, and as such is standardized in RFC2616 by the IETF, languages such as HTML and CSS are not part of the TCP/IP protocol suite. Those languages are standardized by the World Wide Web Consortium (W3C) (web site: www.w3.org). We'll revisit the topic of the World Wide Web Consortium in Section 2.1.1.
The five layers we'll concern ourselves with are (from top to bottom):
application (e.g. HTTP)
transport (e.g. TCP)
network (e.g. IP)
data-link (e.g. Ethernet)
physical (e.g. CAT 5 twisted pair cable)
We'll now look at each of those layers in more detail (this time from bottom to top).
The lowest layer is the physical layer, which concerns the representation of bits (1s and 0s) as voltages, light, or radio waves.
The data-link layer sits directly on top of the physical layer. Its main purpose is framing: turning a sequence of bits into a sequence of packets. (What is a packet? See the bullet point below.)
Examples of data-link layers include:
Ethernet, which typically uses "CAT 5 Twisted Pair Cable" as its physical layer.
Wireless LANs, where data-link layer standards such as 802.11b and 802.11g use radio waves as the physical layer (physical-layer standards include DSSS and OFDM).
A packet is a sequence of bytes with a well-defined beginning and end, clearly separated into a header (containing address, sequence number, and other information) and a payload (containing the user data to be transmitted).
The network layer sits on top of the data-link layer and provides a way to move packets from any node in the network to any other node in the network. The network layer protocol in the Internet is IP.
The current version of IP is version 4 (IPv4).
A new version, IPv6 is about 3-4 years away, and has been for about a decade. Ten years from now, it will likely still be 3-4 years away. (That was a joke, but not entirely). I can say more about this if you are interested, but it isn't really a topic for this course; we'll only deal with IPv4 in CISC474.
There are at least three separate concerns at the network layer:
addressing (making sure each node has a unique address, i.e., an IP address; see also Sebesta, Section 1.1.3),
routing (figuring out how to get from point A to point B, for every (A,B) pair in the network), and
packet forwarding (actually moving the packets along the calculated routes).
The transport layer sits at the end hosts and takes care of the fact that the IP network layer sometimes can lose, corrupt, duplicate, or reorder packets. TCP is the main transport layer protocol in the Internet.
Other transport layers include UDP, which is sometimes used for streaming of multimedia, and SCTP, a new protocol that was developed primarily for voice-over-IP signalling. We'll probably have no occasion to encounter either of those in CISC474.
TCP provides a reliable byte-stream service to application layer protocols that sit on top of it. TCP does error checking and resequencing on network-layer packets, manages retransmission of lost or corrupted packets, and throws out duplicates. These functions of TCP will be mostly invisible to us in CISC474.
The main transport-layer function we have to be directly aware of in CISC474 is multiplexing. TCP allows multiple applications (e.g. web browsing, email, file transfer, remote login) to communicate between two IP addresses by maintaining separate logical connections. TCP does this by providing port numbers. (HFSJ p. 21 has more detail on port numbers.)
A TCP connection is defined by four numbers (illustrated in the sketch after this list):
local IP address
local port number
remote IP address
remote port number
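To make those four numbers concrete, here is a small sketch of my own (not from Sebesta or HFSJ) that opens a TCP connection from Java and prints the four numbers that identify it. The host name and port are just illustrative choices; any web server you can reach would do.

import java.io.IOException;
import java.net.Socket;

public class ConnectionTuple {
    public static void main(String[] args) throws IOException {
        // Open a TCP connection to port 80 (the well-known HTTP port) on some host.
        Socket s = new Socket("www.udel.edu", 80);

        // The four numbers that define this TCP connection:
        System.out.println("local IP address:   " + s.getLocalAddress().getHostAddress());
        System.out.println("local port number:  " + s.getLocalPort());
        System.out.println("remote IP address:  " + s.getInetAddress().getHostAddress());
        System.out.println("remote port number: " + s.getPort());

        s.close();
    }
}

Notice that the remote port is the well-known port for the service (80 for HTTP), while the local port is typically an "ephemeral" port picked by your operating system, different each time you run the program.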
A few facts to know from this section: IP addresses are 32 bits long, and are usually written in "dotted-decimal" form. For example, the IP address of strauss.udel.edu is 128.175.13.74. This corresponds to 32 bits as follows:
128. | 175. | 13. | 74 |
1000 0000 | 1010 1111 | 0000 1101 | 0100 1010 |
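If you'd like to see that conversion for yourself, here is a tiny sketch of my own (not from the book) that prints each byte of a dotted-decimal address in binary.

public class DottedDecimalToBits {
    public static void main(String[] args) {
        String dotted = "128.175.13.74";   // the address of strauss.udel.edu

        for (String part : dotted.split("\\.")) {
            int octet = Integer.parseInt(part);   // one byte, in the range 0-255
            // Pad to 8 bits so that leading zeros show up.
            String bits = String.format("%8s", Integer.toBinaryString(octet)).replace(' ', '0');
            System.out.println(part + " = " + bits);
        }
    }
}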
The book is mostly right when it says that the four parts (i.e. four bytes) of an IP address are used separately to route messages. Back in the day, this was literally true. Now, something called Classless Inter-Domain Routing (CIDR) is being used to route messages on portions of an IP address that don't necessarily fall on 8-bit boundaries. But the basic idea is still the same.
CIDR has helped to slow the demand for IPv6 by allowing organizations to use blocks of addresses that don't necessarily correspond to 8-bit boundaries.
Another technology that has slowed the need for IPv6 is the use of Network Address Translation (NAT), where a boundary router converts between public IP addresses and private IP addresses (for example, those that start with the first byte being 10, such as 10.0.0.1.) If you use a wireless router with a cable or DSL modem to share a single ISP connection among multiple computers, you are probably doing it by using NAT. Thus, you are using a single "public" IP address (the one registered by your cable modem or DSL modem) to connect multiple computers to the Internet. Many small and medium size businesses do this as well, but on a larger scale.
Both NAT and CIDR have had the effect that you should be a bit skeptical of predictions that "IPv6 is soon to be essential because the number of unused IP addresses is diminishing rapidly". The larger point is valid, but the pace of change is liable to be years, not months. There is even doubt among some as to whether IPv6 will ever take firm hold in the marketplace.
The Unix nslookup utility provides one way to look up IP addresses (this is a transcript of a terminal session from strauss.udel.edu):
> /usr/sbin/nslookup
Default Server: localhost.udel.edu
Address: 127.0.0.1

> strauss.udel.edu
Server: localhost.udel.edu
Address: 127.0.0.1

Name: strauss.udel.edu
Address: 128.175.13.74

> www.mit.edu
Server: localhost.udel.edu
Address: 127.0.0.1

Name: www.mit.edu
Address: 18.7.22.83

> www.microsoft.com
Server: localhost.udel.edu
Address: 127.0.0.1

Non-authoritative answer:
Name: lb1.www.ms.akadns.net
Addresses: 207.46.20.30, 207.46.19.30, 207.46.19.60, 207.46.20.60
           207.46.18.30, 207.46.199.30, 207.46.225.60, 207.46.198.30
Aliases: www.microsoft.com, toggle.www.ms.akadns.net
         g.www.ms.akadns.net

> www.gnu.org
Server: localhost.udel.edu
Address: 127.0.0.1

Name: gnu.org
Address: 199.232.41.10
Aliases: www.gnu.org

> exit
>
Be sure to know the following terminology from this section:
Another thing to point out: the book suggests that telnet is a good way to determine the IP address of a fully-qualified domain name (as illustrated in Section 1.7.1). This is true. Just be advised that telnet isn't necessarily a good way to connect to a system if you are going to be typing in a password—in those cases, use ssh instead. (The example in Section 1.7.1 does not involve typing in any passwords, so it is fine.)
One other correction: in the second-to-last paragraph on page 5, Sebesta indicates that telnet and ftp are protocols. This is true.
However, mailto is not a protocol. Sebesta's error is understandable; mailto appears in the "spot" in a URL where a protocol name normally goes. However, strictly speaking, mailto is a "URI scheme" (see Section 3.5 of RFC1738, as well as Section 1.5.1 of Sebesta itself).
Be sure to know the terms
Know the name Tim Berners-Lee and the significance of the date 1989 (twenty years after 1969, when the ARPAnet first came to life).
Also know that the terms document, page, and resource are used more or less interchangeably to talk about items available on the web.
The Web is not the Internet, and the Internet is not the Web.
Hopefully, you could explain that statement on an exam.
Particular concepts to pay attention to in this section
You should also know that Mosaic, released in 1993, was the first graphical browser for the Web.
The last paragraph on p. 7 contains a very nice overview of web architecture, particularly the part that starts "However, more complicated situations are common", so I commend this to your attention.
Sebesta mentions Internet Explorer and Netscape as the main browsers. It is likely that the Firefox browser came to prominence after the text of Sebesta's book was already finalized.
Our focus in CISC474 will be primarily on the current versions of
As the introduction to Section 1.4 points out, the most common web servers are Apache and IIS.
For our part, we'll be using a web server called Tomcat, which comes from the Apache project. It is a special web server that is designed to be a Java Servlet Container. (We'll read more about Servlet Containers in HFSJ, in particular, pages 39 through 43.) However, it can also do all the "basic" functions of a web server as well (as described in Section 1.4).
All the stuff in this box is stuff that is covered elsewhere in the course in much more detail. This is just a quick summary to help you tie all the pieces together as you read about Web Servers in Sebesta.
The web server we'll be using is called Tomcat. Tomcat allows us to serve not only static web pages (e.g. .html files), but also web pages where the content is the result of running some Java code. That Java code is specified in one of two forms: a Servlet or a Java Server Page (JSP).
Servlets are Java classes with methods that can take a request for a web page, and turn that request into a response.
Behind the scenes, Tomcat translates each JSP into a servlet. So whether you write a servlet directly, or write a JSP, either way your page ends up being generated by a Servlet.
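To give you an early taste of what that looks like, here is a minimal servlet sketch of my own (not from either book). Don't worry about the details yet; just notice the shape: a request comes in, and the method writes out a response.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class HelloServlet extends HttpServlet {
    // Tomcat calls doGet() when an HTTP GET request arrives for this servlet.
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        response.setContentType("text/html");   // the MIME type of the response
        PrintWriter out = response.getWriter();
        out.println("<html><body>");
        out.println("<h1>Hello from a servlet!</h1>");
        out.println("</body></html>");
    }
}

We'll see in HFSJ how a servlet like this gets packaged and deployed so that Tomcat can find it.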
Another technology for generating web pages as a result of some calculation on the server side is PHP, a scripting language whose syntax borrows heavily from Perl.
Tomcat is not designed to be able to serve PHP pages "out of the box". However, Tomcat can be configured to serve PHP pages.
Serving PHP with Tomcat is perhaps not the best architecture for a production environment—there are probably other servers that are more efficient or effective at serving PHP. For a learning/testing environment though, serving PHP with Tomcat might be "good enough". We may end up using Tomcat as a PHP server as well to avoid the overhead of having to install yet another piece of software.
Some key ideas from this section
Which takes more resources: serving a file, or displaying it? Justify your answer.
There is also a nice summary on p. 9 of the details of how browsers and web servers interact. Compare this summary with the five layers discussed in the box on the Five Layers of the TCP/IP protocol suite in Section 1.1.2 of these reading notes, and see if you can find the parallels.
Sebesta describes the document root and the server root. Read to find out what these terms mean.
Then explore the concept for yourself.
How could you check whether the assertions above are still accurate, and if they are, show evidence that they still hold?
You'll also see these concepts when we work with Tomcat. I might ask you on an exam to relate the definitions from this section to our work with Tomcat.
This section also includes mention of virtual document trees. Broadly speaking, virtual document trees allow you to serve documents from places other than subdirectories that are under the document root.
If you've ever maintained a personal web site on copland, you are aware of another mapping from URL to file system that is not mentioned in this section. What mapping am I referring to?
Also be familiar with the terms: virtual host and proxy server.
Two things to know from this section:
The rest is both too much and not enough to be useful: too much detail about a program we probably aren't going to use this semester, and way too little detail about Apache if we were going to use it! So once you've answered the two questions above, you can skip the rest of this section for purposes of CISC474.
Two things to know from this section:
The rest is just like Section 1.4.3: both too much and not enough to be useful (see details there). So once you've answered the two questions above, you can skip the rest of this section for purposes of CISC474.
See also, p. 20 in HFSJ.
You'll encounter these various terms in the W3C literature all over the place. So you should have some idea of the differences between and among these terms.
However, this time, I'm not going to deprive you of the opportunity to research this on your own. Search engines such as Google are your friend. See what you can find.
Extra credit points for the best postings to the WebCT discussion board about what these three stand for, the subtle differences in meaning among the three, and an explanation of why the confusion exists in the first place, and how the meanings of these have changed over time. You'll probably also discover whether the "U" in fact stands for "Uniform" or "Universal".
(Use the board marked URL vs. URI vs. URN). Points will be given for both the best summaries in your own words, as well as the best links to web sites that explain the difference.
Once these postings have been made, I'll summarize on that WebCT discussion board what you should know for the exam(s).
A few things to know from this section:
One thing to note: the book says that ampersand ("&") is a character that cannot be part of a URL. In general, this is true.
However, there is a circumstance where "&" characters have a particular meaning in a URL: they are used to separate name/value pairs for parameters in a query string. For example, the URL below can be used to look up information about CISC474 for Spring 2006. Note the & characters that separate the name/value pairs term=06S and course_sec=CISC474010. (Try clicking on it; it uses a Java Server Page to return the information!)
http://chico.nss.udel.edu/CoursesSearch/courseInfo.jsp?&term=06S&course_sec=CISC474010
For more information on parameters and query strings, see pp. 110-111 in HFSJ, and Section 10.3 in Sebesta.
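As a preview of what those name/value pairs look like from the server side, here is a hypothetical servlet sketch of my own (the class name and URL path are made up). The point is that the container splits the query string at the & characters for us, and we just ask for each parameter by name.

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Handles requests such as  /courseInfo?term=06S&course_sec=CISC474010
public class CourseInfoServlet extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Each name/value pair from the query string is available by name;
        // a parameter that wasn't supplied comes back as null.
        String term      = request.getParameter("term");        // e.g. "06S"
        String courseSec = request.getParameter("course_sec");  // e.g. "CISC474010"

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        out.println("<html><body><p>term = " + term
                + ", course_sec = " + courseSec + "</p></body></html>");
    }
}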
Here also, for your convenience, is the URL cited at the end of Section 1.5.1 in Sebesta as a hyperlink:
http://www.w3.org/Addressing/URL/URI_Overview.html
In the first paragraph of this section, Sebesta says something surprising about the direction of slashes in a URL path—if what he says is true, it is news to me. Extra credit for anyone who can find an independent source to verify (or authoritatively refute) Sebesta's assertions here, and/or show an example where his claims check out in practice.
The rest of this section is stuff you probably already know, but read it over to be sure. In particular, know what gets served up if the URL you specify maps to a directory name, and not to a specific file (there are two possible cases; know what happens in each case.)
If you've ever sent or received an email with an "attachment", you can thank MIME. MIME is the "under-the-hood", "behind-the-scenes" technology that makes email attachments work.
So what is MIME doing in a web technologies course? Well, it turns out that the format used to specify how attachments are handled in email was "repurposed" to serve as the way that Web content is identified.
So, even though the second M in MIME stands for Mail (the original purpose of MIME), today MIME types are just a "way to identify the type of some content".
On the Unix operating system, a file is just a sequence of bytes, so unless you know the type of the file, it is not possible to correctly "interpret" the contents. At some point you've probably accidentally opened a Microsoft Word document or a JPEG image as a text file (e.g. in vi or emacs); as you know, you just get a screen full of nonsense. (If you've never had that experience, do it once just so you can say that you have done it.)
Most of us are used to identifying the type of files by their file extensions—e.g., a web file ends in .htm or .html, an image file ends in .jpg, .jpeg, .gif, or .png, a sound file ends in .wav or .au, and a Microsoft Word document ends in .doc, etc. Most software is pre-programmed to only open files with the right kind of file extension, and/or to interpret the contents based on the extension.
For example, programs like Photoshop can typically open both .gif and .jpeg files; such programs look at the file extension when choosing which algorithm to use to decode the file and load it into a buffer for editing: one algorithm for .gif, and a different algorithm for .jpeg.
As it turns out, that creates some problems on the web, since some browsers (e.g. Internet Explorer) tend to follow that convention, while others (e.g. Firefox) tend to rely strictly on the MIME type set in the HTTP headers by the server. The problem is that the server doesn't always set the MIME type correctly, so sometimes we poor dumb users have to give the web server a little help.
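As a small illustration of that extension-to-MIME-type guessing, here is a sketch of my own that uses a lookup table shipped with the Java standard library (the file names are made up, and the exact results depend on your JDK's built-in table).

import java.net.URLConnection;

public class GuessMimeType {
    public static void main(String[] args) {
        // guessContentTypeFromName() maps a file extension to a MIME type
        // using a table built into the JDK.
        String[] names = { "index.html", "photo.jpeg", "logo.gif", "notes.txt" };
        for (String name : names) {
            System.out.println(name + " -> " + URLConnection.guessContentTypeFromName(name));
        }
    }
}

A web server does essentially the same kind of lookup when it decides what Content-Type header to send along with a file.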
MIME types turn up over and over again in working with web technologies. A few examples:
Anyway, all of this is just to say: MIME types are important. So read the subsections of Section 1.6 carefully, and see also this article on character sets and encodings: http://www.joelonsoftware.com/articles/Unicode.html
Before I tell you what you should know from this section, you need to read this Blue Box:
Sebesta makes a technical distinction among
However Sebesta's terminology differs from that used in RFC2045/RFC2046 (the Internet Standard for MIME), and, if search engine results and our HFSJ textbook are any indication, from common practice.
Here's the more common practice, and the one we'll use in this course:
In fact, compare the use of the term "MIME type" on p. 17 of HFSJ.
So, we'll use this more common terminology.
In fact, if you type "MIME specification" (in quotes) into a search engine, chances are that if you look at where that phrase appears in context, most often "MIME specification" is referring to RFC2045/RFC2046 themselves—that is, the "specification for MIME", the standards documents in which MIME is "specified."
Having said that, from this section, know the following:
From here on out, I'll usually omit the (in Sebesta: foo) stuff and just assume you've made the switch to the common terminology.
From this section, know
Sebesta refers you to the W3C web site (http://www.w3.org) for the HTTP protocol spec, RFC2616.
While RFC2616 is available at the W3C, it would be more appropriate to go to the IETF web site for that particular spec.
Remember:
Before proceeding into Sections 1.7.1 and 1.7.2, take a moment to familiarize yourself with the following terms defined in the intro to Section 1.7:
Before I tell you what you should know from this section, you need to read this Blue Box:
Because we'll be spending lots of time with both HTTP and Object Oriented Programming in Java, we'll use the word "method" frequently.
First, make sure you are clear that the word method has two meanings in this course:
The most common HTTP methods that you'll deal with are GET and POST. You'll hardly ever need to know about any others. Most requests for web pages use GET.
POST is only needed when you are sending information to the server along with your request (e.g. you've filled out some fields in a form on a web page.) Even then, you sometimes use GET, and sometimes POST. HFSJ has pages and pages (13-19, and 110-118) about when to use GET and when to use POST, so we'll leave that discussion for the HFSJ reading notes.
So, to review, altogether there are eight HTTP methods. Sebesta mentions only five of them, while HFSJ (on p. 109) mentions all eight. Which ones did Sebesta leave out?
This section discusses all the parts of an HTTP request. While this is a nice detailed discussion of all the pieces of an HTTP request object, one thing that is missing is the "big picture".
So you might find it helpful to refer to pages 15 and 16 of HFSJ while you read this section, where you can see a complete GET request (p. 15) and a complete POST request (p. 16).
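To whet your appetite, here is a rough sketch of my own (modeled on the kind of requests HFSJ shows, not copied from it) of what a minimal GET request and a minimal POST request look like on the wire. The host name is made up; the parameters are the ones from the course-lookup URL shown earlier in these notes. First, a GET request, where the parameters ride in the URL's query string:

GET /courseInfo.jsp?term=06S&course_sec=CISC474010 HTTP/1.1
Host: www.example.edu

And a POST request, where the same two parameters travel in the body of the request instead:

POST /courseInfo.jsp HTTP/1.1
Host: www.example.edu
Content-Type: application/x-www-form-urlencoded
Content-Length: 30

term=06S&course_sec=CISC474010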
I strongly encourage you to try the experiment that starts at the bottom of p. 17, where you use telnet to talk directly to a web server. Note that this is NOT a security problem, since you are never sending a password. Here, rather than using telnet as a login client, you are using telnet as a "general purpose TCP connection client", to establish a text-only connection directly with the web server. Essentially you are "pretending to be a web browser", and sending what the web browser "would send".
Instead of typing "telnet blanca.uccs.edu http", though, try this. The part you type is in bold.
It doesn't matter what system you type this on, as long as you have a telnet client and Internet access. Note, however, that with HTTP/1.1 your request must include the header line Host: copland.udel.edu, followed by a blank line. The part that starts with HTTP/1.1 200 OK is the response from the server, which is covered in the next section (Section 1.7.2).
$ telnet copland.udel.edu 80
Trying 128.175.13.92...
Connected to copland.udel.edu.
Escape character is '^]'.
GET /~pconrad/index.html HTTP/1.1
Host: copland.udel.edu

HTTP/1.1 200 OK
Date: Thu, 19 Jan 2006 01:46:46 GMT
Server: Apache/1.3.26 (Unix) mod_ssl/2.8.10 OpenSSL/0.9.6g
Last-Modified: Sun, 05 Sep 2004 13:52:13 GMT
ETag: "619e7b-29d-413b1a0d"
Accept-Ranges: bytes
Content-Length: 669
Content-Type: text/html
X-Pad: avoid browser bug

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html>
<head>
<title>Phillip Conrad, udel.edu home page</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<frameset rows="*" cols="133,*" frameborder="YES" border="1" framespacing="0">
  <frame src="index2c.html" name="leftFrame" scrolling="YES" noresize>
  <frameset rows="115,*" cols="*" framespacing="0" frameborder="YES" border="1">
    <frame src="index2b.html" name="topFrame" scrolling="NO" noresize>
    <frame src="index2a.html" name="mainFrame">
  </frameset>
</frameset>
<noframes><body>
</body></noframes>
</html>
Connection closed by foreign host.
$
Now try this same experiment with some other servers and pages.
Read through the description of the HTTP response, and compare it with the responses shown on
Sebesta notes that in HTTP/1.1 the default is to leave the connection open, and that this results in significant increases in efficiency.
These changes to HTTP were based on the work of Jeff Mogul, and were first published in an ACM SIGCOMM conference paper in 1995.
This intro is a great overview of the main topics that you should become familiar with this semester, and a good starting point for your Concept Map assignment.
Even if you already think you know a lot about HTML, this section is important to read, because it puts XHTML in context.
Some things you should get from this section:
There is more to say about XHTML, but we'll save that for Chapter 2 (which is entirely devoted to XHTML.)
From this section, you should know what a WYSIWYG editor is (you probably know already).
A couple of updates to this section, and things that may be helpful to know about particular WYSIWYG editors:
Adobe PageMill was discontinued in March 2000. Their newer product is called GoLive.
Adobe bought out Macromedia (former makers of Dreamweaver) this past year. Look for some realignment in this product area over the next year or two.
Dreamweaver MX is available on computers in a lab in Memorial 028. This lab is used to teach some classes in the English department (including one called "Designing Online Information" that covers Dreamweaver.) During times when the lab is not being used for classes, you can use Dreamweaver there if you like.
While Dreamweaver MX retails for $400, with academic discount it goes for around $200 through the UD Bookstore's software vendor (JourneyEd) as of January 2006.
You can also download Dreamweaver (fully functional) and try it free on any given computer for 30 days.
Dreamweaver is what I use. Although Sebesta says that Dreamweaver cannot handle all the tags of XHTML, I have yet to find any feature of XHTML that it does not support; in fact, I use it to create 100% XHTML 1.1 compliant documents. (This document itself was created in Dreamweaver MX, as a matter of fact.)
There is nothing in this section that you are responsible for for CISC474; feel free to skip it.
This is a good overview of XML, but without any examples it will probably seem hopelessly abstract.
The point is that in XHTML, you are given a set of tags to work with (e.g. <strong>, <p>, <h1>, <head>, <body>) but in XML, you come up with your own tags depending on what kind of data you are working with.
For medical data, your tags are things like <patient><diagnosis><condition><blood-pressure>.
For ski resorts, your tags are things like <lift-ticket-price><number-of-trails><snow-depth>
After you tag the data with its actual structure, you can write applications that format it into web pages, do queries on it (similar to database queries), format it for printing, or a variety of other transformations.
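For instance, a (completely made up) ski resort report might look something like this; the element names are the ones mentioned above, wrapped in a root element and attributes I invented for the example:

<?xml version="1.0" encoding="UTF-8"?>
<ski-resort name="Example Mountain">
  <lift-ticket-price currency="USD">59.00</lift-ticket-price>
  <number-of-trails>87</number-of-trails>
  <snow-depth units="inches">42</snow-depth>
</ski-resort>

Notice that the tags describe what the data is, not how to display it; deciding how to turn this into a web page, a printout, or the answer to a query is the job of a separate program or stylesheet.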
Taking a look at some examples may help you. You don't need to read these pages in detail, but at least glance at the code examples on the following pages:
Here are some more examples of XML files from various web sites. Note that in IE and Firefox, you'll see these files in a web browser with little minus signs in front of the elements. Clicking on these will change them into plus signs. Clicking again will change them back to minus signs. Observe what is happening. (See the explanation in Section 8.7, p. 322 through p. 324 if you aren't sure).
Example XML files from w3schools.com:
There are some nice examples on that same w3schools.com page of more interesting things you can do with XML files.
But beware; only about half of these things are cross-browser. Many of them work only in IE and only on Windows. As far as I know, these IE pages use Microsoft-specific extensions and not W3C standards, so it is not that Firefox is broken, but rather that Microsoft is trying to get "out ahead" of the standards bodies.
I could be mistaken about this, and will offer extra credit to anyone who can shed light on this, either by documenting that my assumption here is correct, or documenting that it is not correct.
Some things to know about JavaScript
The point about JavaScript being dynamically typed is important, but we'll cover that in much more detail when we cover Sebesta Chapter 4, which focuses on the JavaScript language.
You may have heard of Dynamic HTML (DHTML). It turns out that DHTML isn't really a separate technology at all, but is rather a set of techniques that involve using JavaScript, Cascading Style Sheets, HTML or XHTML, and something called the Document Object Model (DOM). DHTML allows you to do really cool things like drag-and-drop (for example, try dragging the pretty little Blue and Gold balls shown here around the page!). Chapters 5 and 6 in Sebesta will get us into that material.
JavaScript is also the basis for a hot new approach to building web applications called AJAX. AJAX is the basis of lots of hot web sites such as Google Maps and Gmail. The J in AJAX stands for JavaScript, and the X stands for XML.
The article that introduced AJAX to the world points out that AJAX isn't really a separate technology either. Just like DHTML, it is a particular way of using a combination of existing technologies to achieve a really cool result.
We may or may not have time to cover AJAX in detail, but I hope to at least introduce you to some of the basics. If you are interested, I encourage you to pursue applications of AJAX in your projects.
Given that CISC370 is a strict pre-requisite for CISC474, pretty much everything in this section should be review. In any case, be sure you know the following terms:
As long as we are on the subject, there is a longer list of things you should already know about Java in the file topics/java/thingsYouShouldKnow.txt on the course web site.
Also, somehow Sebesta mentions ASP.NET in the Java section, which is ironic, since with ASP.NET, Microsoft really seems to be trying to steer folks away from Java (and towards their own languages called Visual Basic and C#). We probably won't spend a lot of time on ASP.NET this semester; I was starting to move in the direction of including more ASP.NET in the course, but my contacts at Microsoft kind of dried up.
This section is probably useful for anyone who has "heard of Perl", but doesn't really know what it is.
This section will probably be more interesting if you read it in combination with pages 28 and 29 of HFSJ, where CGI programs are discussed, and the Kung Fu masters debate CGI/Perl vs. Java Servlets.
Two things you should know from this section
Beyond that, I don't plan to ask exam questions about the details of 1.8.7; if I ask you about CGI and/or Perl, it will only be after we actually do something with it. In that case, I'll probably look more to Sebesta Chapters 9 and 10 for questions.
However, I might ask you exam questions about p. 28 and p. 29 of HFSJ, so reading this section to help you understand those pages better might be very useful. This section might also help you with your Concept Mapping assignment.
Like the previous section, this section is probably useful for anyone who has "heard of PHP", but doesn't really know what it is.
As with Section 1.8.7, I want you to know
Beyond that, I don't plan to ask exam questions about the details of 1.8.8; if I ask you about PHP, it will only be after we actually do something with it. In that case, I'll probably look more to Sebesta Chapter 12 for questions.
This section may help you with your Concept Mapping assignment, though.
A nice summary. But beware of some inaccuracies that have already been covered in the detailed sections above. I'll leave it as an exercise to you to find them (I found at least three!)
Comments on these questions:
file as used here is not a protocol, but rather a URI scheme. Having said that, we can ask: what does "file" at the beginning of a URL signify?
I won't be using the exercises from this chapter. Don't worry; you'll be plenty busy without worrying about these.
Ok, before we get started, here's that promised advice for Sebesta.
I'd like to hear your opinions (those of CISC474 students) on this advice too—use the discussion board on WebCT with the title: "Reading Notes for Sebesta Questions/Discussions"
Here's my advice. Instead of the "yada yada yada" about how profound the Internet is, and the dry overviews of basic concepts, start the book this way: with a chapter that takes the reader on a tour of what various technologies can do.
This chapter would be an illustrated elaboration on the material currently in Section 1.8 ("The Web Programmer's Toolbox"). But instead of just describing what the different technologies can do, point the reader to example web sites that use the various technologies. Show some pictures in the book of the web sites described, but mostly, let the web sites speak for themselves. The text in the book would just be a brief description of how the technology is used to make that particular site "do what it does".
Get Addison-Wesley to host the sites so you'll know they'll stay up. Start with simple XHTML, and work your way through the technologies, so that the reader is motivated to learn more. This would get the book off to a much more exciting start!
You could even include on the site links to other "real-world" sites that use the technologies being described (since those links might change frequently, you probably wouldn't want to include them in the book, but on the web site, they could be updated as needed).
Then provide a second chapter (a separate chapter) of "background and basic concepts" with the stuff that is currently in Sections 1.1 through 1.7. Separating this out, and putting it after the "tour" would make it easier to swallow. You can point out that all of these things are necessary background before we can get on to the rewarding task of building interesting web sites (like the ones we saw in Chapter 1.)