What is a Web Database?
A web database is an organized listing of web pages. It's like
the card catalog that you might find in the library. The database
holds a "surrogate" (or selected pieces like the title,
the headings, etc.) for each web page. The creation of these surrogates
is called "indexing", and each web database does it
in a different way. Web databases hold surrogates for anywhere
from 1 million to 30 million web pages. Each web database also has a search
interface, which is the box you type words into (like in Alta
Vista or Lycos) or the lists of directories you pick from (like
in Yahoo). Thus, each web database has a different indexing method
and a different search interface.
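As a rough illustration of what a surrogate might look like, here is a minimal Python sketch that keeps only the title and headings of a page. The Surrogate layout, the make_surrogate helper, and the example.com address are assumptions made for this example; each real web database chooses its own pieces to keep.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class Surrogate:
    """Selected pieces of one web page (hypothetical layout)."""
    url: str
    title: str = ""
    headings: list = field(default_factory=list)


class SurrogateBuilder(HTMLParser):
    """Collects the title and the h1-h3 headings while a page is parsed."""

    def __init__(self, url):
        super().__init__()
        self.surrogate = Surrogate(url)
        self._current = None   # tag whose text is being collected, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace inside the collected text.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.surrogate.title = text
            else:
                self.surrogate.headings.append(text)
            self._current = None


def make_surrogate(url, html):
    builder = SurrogateBuilder(url)
    builder.feed(html)
    builder.close()
    return builder.surrogate


page = ("<html><head><title>Bird Watching</title></head>"
        "<body><h1>Owls</h1><p>Body text is ignored.</p>"
        "<h2>Barn owls</h2></body></html>")
s = make_surrogate("http://example.com/birds", page)
print(s.title)     # Bird Watching
print(s.headings)  # ['Owls', 'Barn owls']
```

The body text of the page is thrown away here, which is the point of a surrogate: the database stores a small stand-in for the page, not the page itself.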
Methods of Indexing
a. Full-Text Indexing
As its name implies, full-text indexing is where every word on
the page is put into a database for searching.
Alta Vista at: http://altavista.digital.com
Open Text at: http://index.opentext.net/
These also offer a topical type of search and contain phone books, stock reports, weather, travel, maps, and various interest topics.
Info Seek at: http://www.infoseek.com/
Excite at: http://www.excite.com
are examples of full-text databases. Full-text indexing will
help you find every example of a reference to a specific name
or terminology. However, a general topic search will not be very
useful in these databases, and you will have to dig through a lot
of "false drops" (or returned pages that have nothing
to do with your search).
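The idea of full-text indexing can be sketched in a few lines of Python. This is only an illustration under simple assumptions (pages held in memory, words split on anything that is not a letter or digit), not the method of any particular service. The page names and texts are made up.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_full_text_index(pages):
    """Map every word on every page to the set of pages containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


# Hypothetical pages, invented for the example.
pages = {
    "page1": "Jaguar cars are built in England",
    "page2": "The jaguar is a large cat",
    "page3": "Cats and dogs",
}
index = build_full_text_index(pages)
print(sorted(index["jaguar"]))  # ['page1', 'page2']
```

A search for "jaguar" finds every page that mentions the word, whether the page is about the car or the animal. For a searcher who wanted the animal, page1 is a false drop.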
b. Keyword Indexing
In keyword indexing, only the "important" words and
phrases are put into the
database.
Lycos at: http://www.lycos.com
These are more like a topical type of search and contain phone
books, stock reports, weather, travel, maps, and various interest
topics. Keyword indexing allows a searcher to search on more general subjects
and have more accurate results. However, if a name is only mentioned
once or twice on a page, it won't be included in the database.
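The trade-off can be seen in a small Python sketch. How a real service decides which words are "important" is not stated here, so this example assumes one simple rule: ignore a short list of common words and keep only words that appear at least three times. The STOP_WORDS list, the min_count value, and the sample text are all assumptions.

```python
import re
from collections import Counter

# Assumed list of common words that never count as keywords.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "by", "at"}


def keywords(text, min_count=3):
    """Return the words that appear at least min_count times on the page."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, n in counts.items() if n >= min_count}


text = ("Owls hunt at night. Owls eat mice. Owls nest in barns. "
        "This page was written by Smith.")
print(keywords(text))  # {'owls'}
```

The page is indexed under "owls", so a general search on that subject finds it. The name "Smith" appears only once, so a search for that name will not find this page.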
c. Human Indexing
Yahoo at: http://www.yahoo.com
and some of
Magellan at: http://www.mckinley.com
are examples of human indexing. In keyword indexing,
all of the work is done by a computer program called a "spider"
or a "robot". In human indexing, a person examines the
page and determines a very few key phrases that describe it. This
gives the user a good starting set of works on a topic -
assuming that the topic was picked by the human as something that
describes the page. This is how the directory-based web databases
are developed.
Spiders, Robots, or People
How do the web databases select which pages are indexed? As there
is no centralized Internet computer, there's no one place where
these services can learn about new pages. Thus, many services
use automated programs called "spiders" or "robots"
that travel from site to site, looking for new WWW pages. Some
spiders only go to the "What's New" or the "What's
Hot" pages and use those for indexing the "popular"
sites. Others methodically examine every link leading from a page,
and every link leading from that page, and so on... In some cases,
people examine the pages brought back from these programs, and
don't index the pages that don't meet certain criteria. So, these
tools create three classes of web databases - those that look
at all WWW pages, those that examine popular WWW pages, and those
that examine quality web pages.
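The methodical kind of spider, the one that follows every link from a page and then every link from those pages, can be sketched in Python as follows. To keep the example self-contained it walks a small made-up table of links instead of fetching real pages; the page names in LINKS are hypothetical.

```python
from collections import deque

# Hypothetical miniature web: each page lists the pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["story1", "home"],
    "about": [],
    "story1": ["news"],
    "orphan": ["home"],  # no page links to this one
}


def crawl(start, links):
    """Visit every page reachable from start, nearest pages first."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:   # never visit the same page twice
                seen.add(target)
                queue.append(target)
    return order


print(crawl("home", LINKS))  # ['home', 'news', 'about', 'story1']
```

The page called "orphan" is never visited because nothing links to it. A spider can only learn about pages that some other known page points to, which is one reason no web database covers everything.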
Search Engines versus Pick Lists
Now that the web database has a group of pages indexed, how does
the user access them? Through one
of two methods - a search engine or a directory (otherwise known
as a pick list). A search engine allows the user to type in any
terminology he wishes, and will search the database to find those
web pages that match the terms entered. A directory structure
has pages organized by subject (like the Yellow Pages), and can
then be navigated by selecting things off the directory. The directory
structure usually allows a good starting point for a search, assuming
that the topic you desire has been selected as a directory entry.
One thing not to get confused about - Yahoo has both a search
engine and a directory tree. Instead of searching the pages, however,
the search engine just looks through the directory at Yahoo. It
can be used as a quick way to find the area of the directory with
the information you desire.
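The difference between browsing a directory and searching it can be shown with a short Python sketch. The categories and addresses below are invented for the example and are not taken from Yahoo or any other service.

```python
# Hypothetical directory: subjects contain sub-subjects, which list pages.
DIRECTORY = {
    "Science": {
        "Astronomy": ["http://example.com/stars"],
        "Biology": ["http://example.com/cells"],
    },
    "Recreation": {
        "Travel": ["http://example.com/maps"],
    },
}


def browse(directory, path):
    """Pick entries off the directory one level at a time."""
    node = directory
    for name in path:
        node = node[name]
    return node


def search_directory(directory, term, path=()):
    """Find directory entries whose names contain the term.

    Only the entry names are examined, never the text of the pages.
    """
    term = term.lower()
    matches = []
    for name, child in directory.items():
        here = path + (name,)
        if term in name.lower():
            matches.append(here)
        if isinstance(child, dict):
            matches.extend(search_directory(child, term, here))
    return matches


print(browse(DIRECTORY, ["Science", "Astronomy"]))  # ['http://example.com/stars']
print(search_directory(DIRECTORY, "astro"))         # [('Science', 'Astronomy')]
```

The search returns a place in the directory, not a list of matching pages. It is a shortcut to the right branch of the tree, after which you continue by browsing.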
Presentation of Results
You've entered in your search terms, the computer has matched
them to the
indexed database, and you are given a list of results. The documents
are almost always listed in order by relevance. Based upon your
search request, the computer ranks all of the documents that contain
your search term, and lists the ones that it thinks are most relevant
first. That is why you really shouldn't worry about the fact that
there are hundreds or thousands of pages matching your query term.
All you care about are the first 20 - 40. The better your search
terms, the better ranked the pages will be (and the less work
you will have to do).
Evaluation of Internet Sources of Information:
In evaluating Internet Documents, consider first the quality issues
that are relevant for print materials. Print materials may run
the gamut of scholarly, substantive news/general interest, popular,
and sensational.
The criteria, which are explained further in the following two
Internet sources, include: Authority (author and
institution or organization), Accuracy, Objectivity, Currency,
and Coverage (comprehensiveness).
Checklist for an Informational Web Page: How to Recognize an Informational Web Page
Checklist for an Advocacy Web Page: How to Recognize an Advocacy Page
WEB SEARCHING
Using the Web: Search Engines - Elmhurst College Library - Web
page by Anne Jordan-Baker, Assistant Librarian
Locating information on specific topics.
A good way to run a search for specific information on the Internet
is to use a search engine, a network tool that automates the
searching and retrieval process. Modern search engines can search
World Wide Web pages, gopher items, and news group articles. There
are many such search engines; a good one is "AltaVista".
On the location line, type in:
http://www.altavista.digital.com/
and press RETURN.
As an alternative, use the buttons on the Netscape menu bar
for Destinations or Net Search to bring up a variety of search
engines.
Like all search engines, AltaVista has a rectangular box where
you type in one or more keywords that describe the topic or item
you wish to find. You then click on the "submit" button.
After a few seconds, the search engine will return the number
of "hits" and a search list giving the name, address,
and short description of the information that it found. Each hit
has a blue-colored underlined "hyperlink" to the on-line
source of that information. If the item sounds useful, click on
the hyperlink to go to that source. To return to the search list,
click on the Back button at the top left of the Netscape window.
You can use any number of keywords to describe the information
that you want more completely; the search engine will put at the
top of the hit list those documents that contain all or most of
the keywords and that use them most often.
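The ranking rule just described can be sketched in Python. Real search engines use their own, more elaborate formulas, so treat this as an illustration only: pages are ordered first by how many of the keywords they contain, then by how often they use them. The page names and texts are made up.

```python
import re
from collections import Counter


def rank(pages, query):
    """Order pages by keywords matched, then by total keyword uses."""
    terms = query.lower().split()
    scored = []
    for url, text in pages.items():
        counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
        matched = sum(1 for t in terms if counts[t] > 0)  # distinct keywords found
        total = sum(counts[t] for t in terms)             # total keyword uses
        if matched:
            scored.append((matched, total, url))
    # Most keywords first, then most uses, then by name for a stable order.
    scored.sort(key=lambda s: (-s[0], -s[1], s[2]))
    return [url for _, _, url in scored]


# Hypothetical pages, invented for the example.
pages = {
    "a": "science education in schools and science fairs",
    "b": "education policy",
    "c": "science science science",
    "d": "cooking recipes",
}
print(rank(pages, "science education"))  # ['a', 'c', 'b']
```

Page "a" comes first because it contains both keywords. Page "c" uses one keyword three times, so it ranks ahead of page "b", which uses one keyword once. Page "d" matches nothing and is left out.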
In some cases you may need to search on a specific phrase (group
of words) in a specific order. In that case you put the phrase
in quotes in the keyword box. For example, say you need to find
information about the topic of science education. Using the AltaVista
search engine, type in "science education", with the quotation marks.
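What the quotation marks ask the search engine to do can be sketched in Python. This is an assumed, simplified version of phrase matching, not AltaVista's actual program: the quoted words must appear next to each other and in the same order.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def contains_phrase(text, phrase):
    """True if the words of phrase occur together, in order, in text."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return n > 0 and any(
        words[i:i + n] == target for i in range(len(words) - n + 1)
    )


def parse_query(query):
    """Split a query into quoted phrases and loose keywords."""
    phrases = re.findall(r'"([^"]+)"', query)
    loose = re.sub(r'"[^"]*"', " ", query).split()
    return phrases, loose


print(contains_phrase("New standards for science education.", "science education"))  # True
print(contains_phrase("Education in science", "science education"))                  # False
print(parse_query('"science education" grants'))  # (['science education'], ['grants'])
```

Without the quotation marks, the second page would also match, because it contains both words. The quotes rule it out because the words are not together in that order.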
An Internet resource from Widener University gives a series
of modules for use in learning search techniques:
A Modular Approach to Teaching the World Wide Web
Advanced Web Searching Techniques Modules (#6) from Widener University
Advanced Search Techniques using Alta Vista at: http://www.altavista.com/
1. Use the first screen for simple searches. You may use quotes
to define a phrase.
2. On the home screen, use the Advanced Search button to explore
other options and techniques to either narrow or broaden a search.
3. From this screen, use the Help button for more definitions
about how to do advanced searches. Follow any number of links
in this section to learn about Boolean logic and operators.
Essential Search Strategies:
Most search engines allow users the following advanced techniques.
Boolean Operators: AND, OR and NOT. AND limits your search by
requiring that both or all words appear. OR is used to capture
synonyms or related words. NOT eliminates possibilities that you
suspect will give extra hits. Some engines require that these
operators be capitalized; in others, such as Excite and Infoseek,
the symbols + and - may be used instead. Take
the time to learn the techniques for the specific database that
you use most of the time.
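The three Boolean operators correspond to simple operations on sets of pages, which a short Python sketch can show. The INDEX table below is invented for the example; it stands for a database that already knows which pages contain which words.

```python
# Hypothetical index: each word maps to the set of pages that contain it.
INDEX = {
    "cats": {"p1", "p2", "p4"},
    "dogs": {"p2", "p3"},
    "felines": {"p5"},
}


def lookup(word):
    """Pages containing the word (an empty set if the word is unknown)."""
    return INDEX.get(word.lower(), set())


both = lookup("cats") & lookup("dogs")        # cats AND dogs
either = lookup("cats") | lookup("felines")   # cats OR felines
without = lookup("cats") - lookup("dogs")     # cats NOT dogs

print(sorted(both))     # ['p2']
print(sorted(either))   # ['p1', 'p2', 'p4', 'p5']
print(sorted(without))  # ['p1', 'p4']
```

AND makes the list shorter, OR makes it longer, and NOT removes the pages you do not want.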
Info Seek
at: http://www.infoseek.com/
This is more of a topical type of search and contains phone books,
stock reports, weather, travel, maps, and various interest topics.