Marketing and Electronic Commerce

What information can be found on the web?

  1. Introduction
    1. Distributed Medium
    2. Authorship
    3. Author Indexing Control
  2. Search Engines
    1. Basic Guidelines
    2. Simple Searches
    3. Advanced Searches
  3. Directories


There is a vast amount of information available on WWW. This information covers topics in all areas, from cultures all around the world. Because there is so much information available on WWW, the following dilemma is evolving.

Much Information x Poor Information Search Techniques = Little Information

Thus we need to understand the nature of WWW, what kind of information is available, and how to efficiently search for that information.

Distributed Medium

As previously discussed, WWW is a distributed medium. Information sources are worldwide and are hosted on any of the almost half a million WWW servers. Imagine trying to decide which TV station to watch if your TV allowed you to surf almost half a million stations! Because the information is distributed in such a fashion, it becomes a real problem for those who try to "catalog" WWW in order to provide efficient indexing and information retrieval for browsers. And once a "catalog" has been developed, updating it becomes a real headache. Not only would you have to be aware of new information from existing servers, but you would also need to develop a mechanism to account for new servers. Servers are becoming very simple to install, which compounds the problem further.


Authorship

Because there are significant barriers to entry in traditional media markets, the number of "publishers" is limited to a very few: those that can afford the capital outlay and ongoing expense to compete. Because this is a significant investment, the value of the material that is published must also be significant, at least to a particular target audience that finances the endeavour (either by subscription, pay per view or third party advertising). This is a good "checks and balances" mechanism to make sure that, in general, what is published off-line does carry some value.

While the low cost of entering WWW offers the real benefit of opening up the WWW market to small businesses, publishers and individuals, it also presents a real disadvantage. Many publishers create a lot of information, and since the cost to publish is very low, the return needed is also very low, so a lot of the information out there is relevant only to a very few (or only to the author). Thus much WWW information is of no value to most of the WWW audience, but is still a viable publishing proposition from an economic (utility) standpoint.

Author Indexing Control

Another issue that complicates the quality of information available to browse is that the author, to a large extent, can control the "indexing" process of a site. By using relevant keywords, and META tags that can hide irrelevant (but very popular) keywords, the author can try to manipulate when the site appears to a browser searching for information. This will be discussed in greater detail when we focus on marketing a web-site, but it does create a problem for the objective indexing of sites. Thus when you search using a particular keyword, some of your results may bear no (or very little) relevance to the search term in question. If that is the case, click on the view source option to see whether keywords have been placed in the source document that do not appear in the WWW document as displayed.
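
The "view source" check described above can also be approximated in code. The following is a minimal sketch in Python, assuming the page declares its keywords in a standard META tag with name="keywords"; the names KeywordAudit and hidden_keywords are illustrative only and do not come from any search engine.

```python
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects META keywords and the words a reader actually sees."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.visible_words = set()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            self.meta_keywords += [
                k.strip().lower() for k in content.split(",") if k.strip()
            ]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text outside <script>/<style> counts as visible (the title included).
        if not self._skip_depth:
            self.visible_words.update(
                w.strip(".,;:!?\"'()").lower() for w in data.split()
            )


def hidden_keywords(html):
    """Return META keywords with at least one word missing from the visible text."""
    audit = KeywordAudit()
    audit.feed(html)
    return [
        k for k in audit.meta_keywords
        if not all(w in audit.visible_words for w in k.split())
    ]


# Hypothetical page: two of its four keywords never appear in the text.
page = (
    '<html><head><title>Horse Feed</title>'
    '<meta name="keywords" content="horse, feed, lottery, free"></head>'
    '<body><p>We sell horse feed.</p></body></html>'
)
print(hidden_keywords(page))
```

A keyword reported by this sketch is not proof of manipulation, since an author may list legitimate synonyms, but a long list of popular and unrelated terms is a strong hint.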

Search Engines

The main method of searching for information on WWW is to access a search engine. The table below will give you access to the major search engines available on WWW. It is probably appropriate to become familiar with a couple of these engines and use them as your primary starting point for searches.

The first thing you should understand is that search engines do not search WWW itself, but search a database that they maintain containing WWW URLs. These databases are updated regularly (how regularly depends on the search engine) by using a "spider" that "crawls" WWW to find new sites. Thus each search engine's database is going to be slightly different, not because they are searching different things (they are all searching WWW) but because they crawl WWW at different times and update their databases at different times. Remember, WWW is a distributed medium: new WWW sites are created continuously, everywhere. Search engines will crawl new sites at different times and may overlook sites unless the author submits the URL to the engine (this will be discussed later).
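
The difference between searching WWW and searching a stored database can be illustrated with a small sketch. The Python code below is a toy model under stated assumptions: the URLs and page texts are hypothetical, and the functions build_index and search are invented for illustration rather than taken from any real engine.

```python
from collections import defaultdict


def build_index(pages):
    """Build a database (word -> set of URLs) from a snapshot of the web."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, word):
    """Look the word up in the stored database; the live web is never consulted."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical state of the web when the spider crawls it.
web = {
    "http://a.example/": "mba program admissions",
    "http://b.example/": "computer program guide",
}
index = build_index(web)

# A new site appears after the crawl.
web["http://c.example/"] = "evening mba program"

print(search(index, "mba"))  # the new site is missing from the results
index = build_index(web)     # the next crawl picks it up
print(search(index, "mba"))
```

Two engines that crawl at different times would hold different snapshots, which is why the same keyword can return different results on each.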

The results that each engine provides, based on keyword searches, will also differ. The results are dependent on the comprehensiveness of the database (which differs from engine to engine) and the method of searching through the database. It is very important, therefore, if you want to undertake a quality search for a particular topic, to understand how to search the database of the search engine. The following will give you some good overall pointers on how to use the AltaVista search engine. You should spend some time viewing the search engines' "help" pages to get a better understanding of the different techniques that are particular to each engine.

Major WWW Search Engines

Search Engine      Simple Search    Advanced Search
AltaVista          Simple           Advanced
Excite             Simple           Advanced
HotBot             HotBot Help
Infoseek           Simple           Advanced
Lycos              Simple           Advanced
WebCrawler         Simple           Advanced
Yahoo Directory    Simple           Advanced

Search Engines, Some Basic Guides

The above table gives you access to the primary engines and their help pages. These help pages are very useful in determining how to undertake an efficient search. The following guidelines are for AltaVista. All the major engines use similar commands. These tips are broken down into simple searches and advanced searches.

Simple Search Guidelines

  1. How are listings ranked?
    The output will be ranked in order of relevance. Relevance is determined by the number of keywords divided by the number of words in the document (or a formula similar to this). Also important (depending on the search engine) is whether the keyword(s) appear in the title of the document, or generally how "high" in the document the keyword(s) appear.
  2. Use multiple words in a search
    Using multiple words will help refine a search. Be careful, however, not to use general words that will match irrelevant documents. When using multiple words, the search will look first for documents that contain all the words, but will also find documents that contain only some of the words, or just one. Thus if you are looking for MBA programs and key in mba program, you are asking for documents containing both words, but also for all documents containing just the word mba or just the word program. You will get documents that cover psychology programs and computer programming guides (there are probably many of these!). If you want the search engine to look only for the phrase MBA Program, use quotes as follows: "mba program". This tells the search engine to list only documents containing the phrase MBA Program, a more precise search.
  3. When to capitalize letters
    If you use capital letters, then the results must be capitalized in a similar fashion. Hence a search using the keyword House will find documents containing the keyword House but not house. Conversely, if you use small letters, then the search is case insensitive. Therefore a search using the term house will find documents containing the words house, House and/or HOUSE. When you are searching for a proper name, it makes sense to capitalize the first letter of the name, since it will be capitalized in the documents you are interested in.
  4. *
    Using the wild card character * you can use partial words for your search. This is very useful if you are not sure of the spelling of a word. You must include four letters in the search for this to work. For example, a search using the term "catah* leopard" will find the breed of dog whether it is spelled Catahula Leopard or Catahoula Leopard.
  5. - +
    By putting a - sign in front of a word, you are asking that the word not appear in the documents in the results list; a + sign indicates that the word must appear. Make sure not to leave a space between the sign and the word. The search -thoroughbred +horse asks the search engine to return all documents that include the word horse but not the word thoroughbred.
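
The simple search rules above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AltaVista's actual algorithm: relevance is taken to be matching words divided by total words, quoted phrases and title weighting are left out, and the function names term_matches and score are hypothetical.

```python
def term_matches(term, word):
    """Apply the capitalization and * rules to one query term and one document word."""
    if term == term.lower():      # all-lowercase term: ignore case
        word = word.lower()
    if term.endswith("*"):        # trailing wild card: match on the prefix
        return word.startswith(term[:-1])
    return word == term


def score(query, text):
    """Return a relevance score, or None if the document does not qualify."""
    words = [w.strip(".,;:!?()\"'") for w in text.split()]
    words = [w for w in words if w]
    if not words:
        return None
    hits = 0
    for raw in query.split():
        sign, term = (raw[0], raw[1:]) if raw[0] in "+-" else ("", raw)
        count = sum(term_matches(term, w) for w in words)
        if sign == "+" and count == 0:
            return None           # a required word is missing
        if sign == "-" and count > 0:
            return None           # an excluded word is present
        if sign != "-":
            hits += count
    # Assumed formula: number of keyword matches divided by number of words.
    return hits / len(words) if hits else None


# Hypothetical documents, scored against the example query from the text.
docs = {
    "farm": "A horse farm",
    "sales": "Thoroughbred horse sales",
    "dogs": "Dog breeds",
}
for name, text in docs.items():
    print(name, score("-thoroughbred +horse", text))
```

Only the first document receives a score; the second is rejected by the - rule and the third by the + rule. Sorting the scored documents from highest to lowest would give the ranked results list.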

Advanced Searches

You can use more advanced search techniques using boolean search operators. These commands allow you to require words, exclude words, and build other, more complex queries. These commands are case insensitive. Make sure you select the advanced search option to use these commands.
  1. OR
    By using OR between two keywords, you are asking the search engine to look for either word or both words. This is basically the same as a simple search, where the results prioritize documents in which both terms appear, followed by documents in which either term appears.
  2. AND
    This command asks that both words appear in the results of the search, in either order.
  3. NOT
    This will exclude a word or a phrase that follows the command.
  4. ()
    Brackets allow you to combine search terms as follows:
    (cat OR dog) AND collar
    This will look for documents that contain the word collar together with either cat or dog, such as articles about cat collars or dog collars.
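
One way to picture the boolean operators is as set operations on the documents that contain each word. The Python sketch below assumes a tiny hypothetical database (the index contents and document names are made up) and is not how any particular engine implements its advanced search.

```python
def docs_with(index, word):
    """Return the set of documents containing the word (empty if unknown)."""
    return index.get(word.lower(), set())


# Hypothetical database: word -> documents containing it.
index = {
    "cat": {"doc1", "doc4"},
    "dog": {"doc2", "doc4"},
    "collar": {"doc1", "doc2", "doc3"},
}

# OR is a union, AND is an intersection, NOT is a difference.
cat_or_dog = docs_with(index, "cat") | docs_with(index, "dog")

# (cat OR dog) AND collar
print(sorted(cat_or_dog & docs_with(index, "collar")))  # ['doc1', 'doc2']

# collar AND NOT cat
print(sorted(docs_with(index, "collar") - docs_with(index, "cat")))  # ['doc2', 'doc3']
```

The brackets matter: without them, cat OR dog AND collar could be read as cat OR (dog AND collar), which would also return doc4 even though it never mentions a collar.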


Directories

Directories are generally more specialized areas to search. Although Yahoo! is a directory, I have included it above because its size and its search capabilities are similar to those of search engines. You can, however, also search Yahoo! by browsing its hierarchy of links. The top of the hierarchy is very general; the further down the hierarchy you go, the more specific your search becomes. This is generally how a directory is designed. Directories usually specialize in a particular area, and are therefore niche search areas. Often you will find them through your more general searches using one of the above search engines. Madalyn is a directory of business resources that was developed in order to help students perform business research on WWW.
