What information can be found on the web?
- Introduction
- Distributed Medium
- Authorship
- Author Indexing Control
- Search Engines
- Basic Guidelines
- Simple Searches
- Advanced Searches
- Directories
Introduction
There is a vast amount of information available on WWW. It covers topics
in every area, from cultures all around the world. Because so much
information is available on WWW, the following dilemma arises:
Much Information x Poor Information Search Techniques = Little Information
Thus we need to understand the nature of WWW, what kind of information is
available, and how to efficiently search for that information.
Distributed Medium
As previously discussed, WWW is a distributed medium. Information sources
are worldwide and are hosted on any of almost half a million WWW
servers. Imagine trying to decide which TV station to watch if
your TV allowed you to surf almost half a million stations! Because the
information is distributed in this way, it is a real problem for those
who try to "catalog" WWW in order to provide efficient indexing and
information retrieval for browsers. And once a "catalog" has been
developed, updating it becomes a real headache: not only would you have
to be aware of new information on existing servers, but you would also
need a mechanism for accounting for new servers. Servers are becoming
very simple to install, which compounds the problem further.
Authorship
Significant barriers to entry in traditional media markets limit the
number of "publishers" to a very few: those that can afford the capital
outlay and ongoing expense to compete. Because this is a significant
investment, the value of the material that is published must also be
considered significant, at least to a particular target audience who
finances the endeavour (either by subscription, pay per view or third
party advertising). This is a good "checks and balances" mechanism to
make sure that, in general, what is published off-line does carry some
value.
While the low cost of entering WWW offers the real benefit of opening up
the WWW market to small businesses, publishers and individuals, it also
presents a real disadvantage. Many publishers create a lot of
information, and since the cost to publish is very low, the return needed
is also very low, which leaves a lot of the information out there
relevant only to a very few (or only to the author). Thus much WWW
information is of no value to the wider WWW audience, but is still
a viable publishing proposition from an economic (utility) standpoint.
Author Indexing Control
Another issue that complicates the quality of information available to
browse is that the author, to a large extent, can control the "indexing"
process of a site. By using relevant keywords, and META tags that can
hide irrelevant (but very popular) keywords, the author can try to
manipulate when the site appears to a browser searching for information.
This will be discussed in greater detail when we focus on marketing a
web-site, but it does create a problem for the objective indexing of
sites. Thus when you search using a particular keyword, some of your
results may bear no (or very little) relevance to the search term in
question. If that is the case, click on the view source option to see if
keywords have been placed in the source document but do not appear in the
WWW document itself.
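The "view source" check described above can also be automated. The
following is only a rough sketch in Python, assuming the keywords are
stored in a META tag named "keywords" and separated by commas. The names
KeywordAudit and hidden_keywords are made up for this illustration, and
the page title is counted as visible text.

```python
import re
from html.parser import HTMLParser


class KeywordAudit(HTMLParser):
    """Collects the META keywords and the visible words of one HTML page."""

    def __init__(self):
        super().__init__()
        self.meta_keywords = []
        self.visible_words = set()
        self._skip_depth = 0  # text inside <script> or <style> is not visible

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: keywords are a comma-separated list in "content".
            content = attrs.get("content") or ""
            self.meta_keywords += [k.strip().lower()
                                   for k in content.split(",") if k.strip()]
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.visible_words.update(re.findall(r"[a-z0-9]+", data.lower()))


def hidden_keywords(html):
    """Return META keywords with at least one word missing from the visible text."""
    audit = KeywordAudit()
    audit.feed(html)
    audit.close()
    hidden = []
    for keyword in audit.meta_keywords:
        parts = re.findall(r"[a-z0-9]+", keyword)
        if not all(part in audit.visible_words for part in parts):
            hidden.append(keyword)
    return hidden


page = """<html><head><title>Horse Feed</title>
<meta name="keywords" content="horse, feed, free money, lottery">
</head><body><p>We sell horse feed.</p></body></html>"""

print(hidden_keywords(page))  # ['free money', 'lottery']
```

Keywords that show up in this list are in the source document but not in
the text a reader sees, which is the pattern to be suspicious of.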
Search Engines
The main method of searching for information on WWW is to access a search
engine. The table below will give you access to the major search
engines available on WWW.
It is probably a good idea to become familiar with a couple of these
engines and use them as your primary starting point for searches.
The first thing you should understand is that search engines do not
search WWW itself, but search a database that they maintain containing
WWW URLs. These databases are updated regularly (how regularly depends on
the search engine) by using a "spider" that "crawls" WWW to find new
sites. Thus each search engine's database is going to be slightly
different, not because they are searching different things (they are all
searching WWW) but because they crawl WWW at different times and update
their databases at different times.
Remember, WWW is a distributed medium: new WWW sites are created
continuously, everywhere. Search engines will crawl new sites at
different times and may overlook sites unless the author submits the URL
to the engine (this will be discussed later).
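To make the "spider plus database" idea concrete, here is a toy sketch in
Python. It is not how any real engine is built. The three pages, their
example URLs and the function names crawl and search are invented for the
illustration, and the "web" is just a dictionary held in memory.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "a.example/": ("mba program admissions", ["b.example/"]),
    "b.example/": ("horse breeding guide", []),
    "c.example/": ("new mba program", []),  # no other page links here
}


def crawl(seeds):
    """Follow links from the seed URLs and return an index: word -> set of URLs."""
    index, seen, queue = {}, set(), deque(seeds)
    while queue:
        url = queue.popleft()
        if url in seen or url not in WEB:
            continue
        seen.add(url)
        text, links = WEB[url]
        for word in text.split():
            index.setdefault(word, set()).add(url)
        queue.extend(links)
    return index


def search(index, word):
    """Look the word up in the stored index; the pages themselves are not read."""
    return sorted(index.get(word, set()))


# The spider starts from one known site and never reaches c.example/.
index = crawl(["a.example/"])
print(search(index, "mba"))  # ['a.example/']

# Once the author submits the URL, the next crawl picks the site up.
index = crawl(["a.example/", "c.example/"])
print(search(index, "mba"))  # ['a.example/', 'c.example/']
```

Two engines that start from different sites, or crawl on different days,
end up with different indexes, and therefore give different results for
the same keyword.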
The results that each engine provides, based on keyword searches, will
also differ. The results are dependent on the comprehensiveness of the
database (which differs between engines) and on the method of searching
through the database. It is very important, therefore, if you want to
undertake a quality search for a particular topic, to understand how to
search the database of the search engine. The following will give you
some good overall pointers on how to use the Alta Vista Search Engine.
You should spend some time viewing the search engines' "help" pages to
get a better understanding of the different techniques that are
particular to each engine.
Major WWW Search Engines
The above table gives you access to the primary engines and their help
pages. These help pages are very useful in determining how to undertake
an efficient search. The following guidelines are for Alta Vista. All the
major engines use similar commands. These tips are broken down into
simple searches and advanced searches.
Simple Search Guidelines
- How are listings ranked?
The output will be ranked in order of relevance. Relevance is determined
by the number of keyword occurrences divided by the number of words in
the document (or a formula similar to this).
Also important (depending on the search engine) is whether the keyword(s)
appear(s) in the title of the document, or generally how "high" in the
document the keyword(s) appear.
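The ranking idea can be sketched in a few lines of Python. This is only
an illustration of "keyword occurrences divided by number of words", not
the formula of any particular engine. The function name relevance and
the title_bonus weight of 0.5 are assumptions made for the example, and
the sketch ignores how "high" in the document a keyword appears.

```python
def relevance(keywords, title, body, title_bonus=0.5):
    """Keyword occurrences divided by document length, plus a bonus per title hit."""
    words = body.lower().split()
    if not words:
        return 0.0
    keywords = [k.lower() for k in keywords]
    density = sum(words.count(k) for k in keywords) / len(words)
    title_words = title.lower().split()
    # Assumed weighting: each keyword found in the title adds a fixed bonus.
    bonus = title_bonus * sum(1 for k in keywords if k in title_words)
    return density + bonus


# Two made-up documents: name -> (title, body).
docs = {
    "short": ("MBA Program", "the mba program at our school"),
    "long": ("School News",
             "the school has an mba program and many other programs "
             "for students this year"),
}

ranked = sorted(docs,
                key=lambda name: relevance(["mba", "program"], *docs[name]),
                reverse=True)
print(ranked)  # ['short', 'long']
```

The short document wins because the keywords make up a larger share of
its words and also appear in its title.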
- Use multiple words in a search
Using multiple words will help refine a search. Be careful, however, not
to use general words that will match irrelevant documents. When you use
multiple words, the search will list documents that contain all the words
first, but will also find documents that contain only some of the words,
or just one. Thus if you are looking for MBA programs and key in
mba program, you will get documents that contain both words, but also
every document that contains just the word program. You will get
documents that cover psychology programs, and computer programming guides
(there are probably many of these!). If you want the search engine to
look only for the phrase MBA Program, use quotes as follows: "mba
program".
This tells the search engine to list only documents that contain the
phrase MBA Program, a more precise search.
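The difference between a multiple-word search and a quoted phrase search
can be sketched as follows. The function names simple_search and
phrase_search and the three sample documents are invented for this
illustration, and matching is on whole words only.

```python
def simple_search(query, docs):
    """Rank documents by how many query words they contain; drop those with none."""
    terms = query.lower().split()
    hits = {}
    for name, text in docs.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched:
            hits[name] = matched
    # Most matched words first; ties are broken alphabetically by name.
    return sorted(hits, key=lambda name: (-hits[name], name))


def phrase_search(phrase, docs):
    """Keep only documents where the words appear together, in the given order."""
    target = phrase.lower().split()
    if not target:
        return []
    size = len(target)
    found = []
    for name, text in docs.items():
        words = text.lower().split()
        if any(words[i:i + size] == target
               for i in range(len(words) - size + 1)):
            found.append(name)
    return sorted(found)


docs = {
    "mba": "our mba program starts in fall",
    "psych": "psychology program overview",
    "code": "how to program a computer",
}

print(simple_search("mba program", docs))  # ['mba', 'code', 'psych']
print(phrase_search("mba program", docs))  # ['mba']
```

The simple search still returns the psychology and computing documents
because each contains the word program; the phrase search does not.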
- When to capitalize letters
If you use capital letters, the results must be capitalized in the same
way. Hence a search for houses using the keyword House will find
documents containing the keyword House but not house. Conversely, if you
use small letters, the search is case insensitive. Therefore a search
using the term house will find documents containing the words house,
House and/or HOUSE. When you are searching for a proper name, it makes
sense to capitalize the first letter of the name, since it will be
capitalized in the documents you are interested in.
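The capitalization rule fits in one small function. This is a sketch of
the rule as described above, with term_matches as a made-up name.

```python
def term_matches(term, word):
    """A lowercase term ignores case; a term with capitals must match exactly."""
    if term == term.lower():
        return term == word.lower()
    return term == word


for word in ["house", "House", "HOUSE"]:
    print(word, term_matches("house", word), term_matches("House", word))
```

Searching with house matches all three spellings, while searching with
House matches only House.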
- *
Using the wild card character * you can use partial words for your search.
This is very useful if you are not sure of the spelling of a word. You
must include at least four letters in the search for this to work. For
example, a search using "catah* leopard" will look for the breed of
dog spelled either Catahula Leopard or Catahoula Leopard.
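One way to picture the wild card is as a pattern that fixes the start of
the word and leaves the rest open. The sketch below uses Python regular
expressions and handles only a * at the end of a term. The function name
wildcard_match is an assumption for the example, and the four-letter
minimum is simply taken from the rule stated above.

```python
import re


def wildcard_match(term, word, min_letters=4):
    """Match word against a term that may end in the * wild card."""
    if term.endswith("*"):
        prefix = term[:-1]
        if len(prefix) < min_letters:
            raise ValueError("need at least %d letters before *" % min_letters)
        # The prefix is fixed; any run of word characters may follow it.
        pattern = re.escape(prefix) + r"\w*"
    else:
        pattern = re.escape(term)
    return re.fullmatch(pattern, word, re.IGNORECASE) is not None


for word in ["Catahula", "Catahoula", "catalog"]:
    print(word, wildcard_match("catah*", word))
```

Both spellings of the breed match catah*, while catalog does not because
it differs from the fixed part of the term.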
- - and +
By putting a - sign in front of a word, you are asking that the word not
appear in the documents in the results list; a + sign indicates that the
word must appear. Make sure not to leave a space between the sign and the
word. The search -thoroughbred +horse asks the search engine to produce
all documents that include the word horse, but not thoroughbred.
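The + and - signs can be handled by sorting the query words into three
groups before looking at a document. This is a minimal sketch with
made-up function names (parse_query, matches); it assumes that words
without a sign are optional and only matter when no + word is given.

```python
def parse_query(query):
    """Split a query into required (+), excluded (-) and plain words."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def matches(query, text):
    """Return True if the document text satisfies the + and - conditions."""
    required, excluded, optional = parse_query(query)
    words = set(text.lower().split())
    if any(word in words for word in excluded):
        return False
    if not all(word in words for word in required):
        return False
    # With no + words, at least one plain word (if any were given) must appear.
    return bool(required) or not optional or any(w in words for w in optional)


print(matches("-thoroughbred +horse", "a guide to horse care"))      # True
print(matches("-thoroughbred +horse", "thoroughbred horse racing"))  # False
```

Note that a sign followed by a space is treated here as an ordinary word,
which is why the sign must be attached directly to the word.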
Advanced Searches
You can use more advanced search techniques using Boolean search
operators. These commands allow you to include words, exclude words, and
build other, more complex queries. These commands are case insensitive.
Make sure you select the advanced search option to use these
commands.
- OR
By using OR between two keywords you are asking the search engine to
look for either word or both words. This is basically the same as a
simple search, where the results prioritize documents in which both terms
appear, then those in which either term appears.
- AND
This command asks that both words appear in the results of the search, in
either order.
- NOT
This will exclude a word or a phrase that follows the command.
- ()
Brackets allow you to combine search terms as follows:
(cat OR dog) AND collar
This will look for documents that include the word collar together with
either cat or dog (for example, documents about cat collars or dog
collars).
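The way brackets group the operators can be shown with a small evaluator
that checks a single document against a query. This is a sketch only: the
function names are invented, NOT is treated as applying to whatever
follows it, and the precedence used when no brackets are given (NOT, then
AND, then OR) is an assumption rather than documented engine behaviour.

```python
import re


def tokenize(query):
    """Split a query into brackets and words (operators are words too)."""
    return re.findall(r"\(|\)|[^\s()]+", query)


def evaluate(query, text):
    """Return True if the document text satisfies the Boolean query."""
    words = set(text.lower().split())
    tokens = tokenize(query)
    pos = 0

    def peek():
        # Operators are case insensitive, so compare in upper case.
        return tokens[pos].upper() if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "OR":
            advance()
            right = parse_and()  # always parse the right side first
            value = value or right
        return value

    def parse_and():
        value = parse_not()
        while peek() == "AND":
            advance()
            right = parse_not()
            value = value and right
        return value

    def parse_not():
        if peek() == "NOT":
            advance()
            return not parse_not()
        if peek() == "(":
            advance()
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing bracket")
            advance()
            return value
        if peek() is None or peek() == ")":
            raise ValueError("expected a search word")
        return advance().lower() in words

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result


print(evaluate("(cat OR dog) AND collar", "a leather dog collar"))  # True
print(evaluate("(cat OR dog) AND collar", "a cat toy"))             # False
print(evaluate("horse and not thoroughbred", "horse care"))         # True
```

Without the brackets, cat OR dog AND collar would be read by this sketch
as cat OR (dog AND collar), so any document mentioning cat would match.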
Directories
Directories are generally more specialized areas to search. Although
Yahoo! is a directory, I have included it above because its size and its
search capabilities are similar to those of search engines. You can,
however, also search Yahoo! by browsing its hierarchy of links. The top
of the hierarchy is very general; the further down the hierarchy you go,
the more specific your search. This is generally how a directory is
designed. Directories usually specialize in a particular area, and are
therefore niche search areas. Often you will find them through more
general searches using one of the above search engines. Madalyn is a
directory of business resources that was developed in order to help
students perform business research on WWW.