Introduction to Web Databases

What is a Web Database?

A web database is an organized listing of web pages. It's like the card catalog that you might find in the library. The database holds a "surrogate" (selected pieces like the title, the headings, etc.) for each web page. The creation of these surrogates is called "indexing", and each web database does it in a different way. Web databases hold surrogates for anywhere from 1 million to 30 million web pages. Each web database also has a search interface, which is the box you type words into (like in Alta Vista or Lycos) or the list of directories you pick from (like in Yahoo). Thus, each web database has a different indexing method and a different search interface.

Methods of Indexing

a. Full-Text Indexing

As its name implies, full-text indexing is where every word on the page is put into a database for searching.

Alta Vista at: http://altavista.digital.com

Open Text at: http://index.opentext.net/

Info Seek at: http://www.infoseek.com/

Excite at: http://www.excite.com

are examples of full-text databases. These services also offer a topical type of search and contain phone books, stock reports, weather, travel, maps, and various interest topics. Full-text indexing will help you find every reference to a specific name or term. However, a general topic search will not be very useful in these databases, and you will have to dig through a lot of "false drops" (returned pages that have nothing to do with your search).
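To make the idea concrete, here is a minimal sketch of full-text indexing in Python. It is not how any of the services above are actually built; the sample pages, the example.com addresses, and the function name build_full_text_index are all invented for illustration. Every word on every page goes into the index, so a search on "mercury" returns all three pages, including the "false drops".

```python
import re
from collections import defaultdict

def build_full_text_index(pages):
    """Map every word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Every word on the page is indexed; nothing is filtered out.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

# Hypothetical sample pages, invented for this example.
pages = {
    "http://example.com/a": "Mercury is the planet closest to the sun",
    "http://example.com/b": "Freddie Mercury sang with the band Queen",
    "http://example.com/c": "Mercury thermometers measure temperature",
}

index = build_full_text_index(pages)

# A specific name is easy to find...
print(sorted(index["queen"]))
# ...but a general term matches every page that mentions it, relevant or not.
print(sorted(index["mercury"]))
```

If you were looking for the planet, two of the three "mercury" hits would be false drops.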

b. Keyword Indexing

In keyword indexing, only the "important" words and phrases are put into the database.

Lycos at: http://www.lycos.com

is an example of keyword indexing. Services like this are more like a topical type of search and also contain phone books, stock reports, weather, travel, maps, and various interest topics. Keyword indexing allows a searcher to search on more general subjects and get more accurate results. However, if a name is only mentioned once or twice on a page, it won't be included in the database.
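The following Python sketch shows one possible way a program could pick the "important" words. It assumes, purely for illustration, that a word is important if it is not a common "stop word" and appears at least three times on the page; the stop-word list, the threshold, and the function name extract_keywords are assumptions, not the rules any real service used.

```python
import re
from collections import Counter

# Assumed stop-word list; real services applied their own rules.
STOP_WORDS = {"the", "a", "an", "and", "of", "in", "is", "to", "for", "on", "with"}

def extract_keywords(text, min_count=3):
    """Keep only non-stop words that appear at least min_count times."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, n in counts.items() if n >= min_count}

# Hypothetical page text, invented for this example.
page_text = (
    "Gardening tips for spring. Gardening in small spaces. "
    "Gardening tools and soil. Soil testing with Jane Doe. Soil for roses."
)

keywords = extract_keywords(page_text)
print(sorted(keywords))  # ['gardening', 'soil']
```

The name "Jane Doe" is mentioned only once, so it never reaches the database, which is exactly the weakness described above.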

c. Human Indexing

Yahoo at: http://www.yahoo.com

Magellan (in part) at: http://www.mckinley.com

WWW Virtual Library

are three examples of human indexing. In keyword indexing, all of the work is done by a computer program called a "spider" or a "robot". In human indexing, a person examines the page and chooses a very few key phrases that describe it. This gives the user a good starting set of works on a topic, assuming that the topic was picked by the indexer as something that describes the page. This is how the directory-based web databases are developed.

Spiders, Robots, or People

How do the web databases select which pages are indexed? As there is no centralized Internet computer, there's no one place where these services can learn about new pages. Thus, many services use automated programs called "spiders" or "robots" that travel from site to site, looking for new WWW pages. Some spiders only go to the "What's New" or the "What's Hot" pages and use those for indexing the "popular" sites. Others methodically examine every link leading from a page, and every link leading from that page, and so on... In some cases, people examine the pages brought back from these programs, and don't index the pages that don't meet certain criteria. So, these tools create three classes of web databases - those that look at all WWW pages, those that examine popular WWW pages, and those that examine quality web pages.
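As a rough illustration of the "examine every link" approach, here is a small Python sketch. To keep it self-contained it does not touch the network: a made-up dictionary called LINKS stands in for the web, and the function name crawl is also just an assumption for this example. A real spider would fetch and parse actual pages.

```python
from collections import deque

# Hypothetical link graph standing in for the web: page -> pages it links to.
LINKS = {
    "home": ["news", "about"],
    "news": ["story1", "home"],
    "about": [],
    "story1": ["story2"],
    "story2": [],
    "orphan": ["home"],  # no page links here, so the spider never finds it
}

def crawl(start, links):
    """Visit every page reachable from start by following links."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:  # skip pages already found
                seen.add(target)
                queue.append(target)
    return order

visited = crawl("home", LINKS)
print(visited)  # ['home', 'news', 'about', 'story1', 'story2']
```

Notice that "orphan" is never visited. Because there is no central list of pages, a spider can only learn about a page if some page it already knows links to it.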

Search Engines versus Pick Lists

Now that the web database has a group of pages indexed, how does the user access it? This is done through one of two methods: a search engine or a directory (otherwise known as a pick list). A search engine allows the user to type in any terms they wish, and searches the database to find the web pages that match the terms entered. A directory structure has pages organized by subject (like the Yellow Pages), and is navigated by selecting entries from the directory. The directory structure usually gives a good starting point for a search, assuming that the topic you desire has been selected as a directory entry.

One point that often causes confusion: Yahoo has both a search engine and a directory tree. Instead of searching the pages themselves, however, the search engine just looks through the directory at Yahoo. It can be used as a quick way to find the area of the directory with the information you desire.
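The difference between searching a directory and searching the pages themselves can be sketched in a few lines of Python. The directory contents, the example.com addresses, and the function name find_categories below are invented for illustration and do not reflect how Yahoo is actually implemented.

```python
# Hypothetical subject directory: category -> subcategories -> page addresses.
DIRECTORY = {
    "Science": {
        "Astronomy": ["http://example.com/planets"],
        "Biology": ["http://example.com/cells"],
    },
    "Recreation": {
        "Travel": ["http://example.com/maps"],
    },
}

def find_categories(directory, term, path=()):
    """Return the directory paths whose category name contains term."""
    matches = []
    for name, child in directory.items():
        current = path + (name,)
        if term.lower() in name.lower():
            matches.append(" > ".join(current))
        if isinstance(child, dict):  # descend into subcategories
            matches.extend(find_categories(child, term, current))
    return matches

found = find_categories(DIRECTORY, "astro")
print(found)  # ['Science > Astronomy']

# Only category names are searched, never the text of the listed pages.
print(find_categories(DIRECTORY, "planets"))  # []
```

A search of this kind takes you to the right area of the directory; from there you still pick pages off the list.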

Presentation of Results

You've entered your search terms, the computer has matched them to the indexed database, and you are given a list of results. The documents are almost always listed in order by relevance. Based upon your search request, the computer ranks all of the documents that contain your search term, and lists the ones that it thinks are most relevant first. That is why you really shouldn't worry about the fact that there are hundreds or thousands of pages matching your query term. All you care about are the first 20 to 40. The better your search terms, the better ranked the pages will be (and the less work you will have to do).
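Each search engine keeps its own ranking method, so the Python sketch below is only one simple possibility. It assumes a page is more relevant when it contains more of your search terms and, as a tie-breaker, when it uses them more often. The sample pages and the function names score and rank are invented for illustration.

```python
import re

def score(text, query_terms):
    """Return (number of query terms matched, total occurrences)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = [words.count(term) for term in query_terms]
    matched = sum(1 for c in counts if c > 0)
    return (matched, sum(counts))

def rank(pages, query, limit=20):
    """List matching pages, most relevant first, keeping only the top few."""
    terms = query.lower().split()
    scored = [(score(text, terms), url) for url, text in pages.items()]
    hits = [(s, url) for s, url in scored if s[0] > 0]  # drop non-matches
    hits.sort(key=lambda item: item[0], reverse=True)
    return [url for _, url in hits[:limit]]

# Hypothetical pages, invented for this example.
pages = {
    "page1": "Science education standards for science teachers",
    "page2": "Education funding news",
    "page3": "Science fair projects: science, science, science",
    "page4": "Cooking recipes",
}

results = rank(pages, "science education")
print(results)  # ['page1', 'page3', 'page2']
```

Here page1 comes first because it contains both terms, even though page3 repeats "science" more often. The limit parameter reflects the point above: only the top of the list matters to you.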


Evaluation of Internet Sources of Information:

In evaluating Internet documents, consider first the quality issues that are relevant for print materials. Print materials may run the gamut of scholarly, substantive news/general interest, popular, and sensational.
The criteria, which are explained further in the following two Internet sources, include: Authority (author and institution or organization), Accuracy, Objectivity, Currency, and Coverage (comprehensiveness).

Checklist for an Informational Web Page: How to Recognize an Informational Web Page

Checklist for an Advocacy Web Page: How to Recognize an Advocacy Page



WEB SEARCHING

Using the Web: Search Engines - Elmhurst College Library - Web page by Anne Jordan-Baker, Assistant Librarian

Locating information on specific topics.

A good way to run a search for specific information on the Internet is to use a search engine, a network tool that automates the searching and retrieval process. Modern search engines can search World Wide Web pages, gopher items, and newsgroup articles. There are many such search engines; a good one is "AltaVista". On the location line, type in:

http://www.altavista.digital.com/

and press RETURN.

As an alternative, use the buttons on the Netscape menu bar for Destinations or Net Search to bring up a variety of search engines.

Like all search engines, AltaVista has a rectangular box where you type in one or more keywords that describe the topic or item you wish to find. You then click on the "submit" button. After a few seconds, the search engine will return the number of "hits" and a search list giving the name, address, and short description of the information that it found. Each hit has a blue-colored underlined "hyperlink" to the on-line source of that information. If the item sounds useful, click on the hyperlink to go to that source. To return to the search list, click on the Back button at the top left of the Netscape window.

You can use any number of keywords to describe the information that you want more completely; the search engine will put at the top of the hit list those documents that contain all or most of the keywords and that use them most often.

In some cases you may need to search on a specific phrase (a group of words in a specific order). In that case you put the phrase in quotes in the keyword box. For example, say you need to find information about the topic of science education. Using the AltaVista search engine, type in the words with quotation marks: "science education".
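What a phrase search does differently can be sketched in Python as follows. This is a simplified illustration, not AltaVista's actual method; the sample documents and the function names tokenize and contains_phrase are assumptions made for this example.

```python
import re

def tokenize(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(text, phrase):
    """True only if the phrase's words appear together and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical documents; both contain the words "science" and "education".
docs = {
    "doc1": "New standards for science education in schools",
    "doc2": "Education about computer science",
}

phrase_hits = [name for name, text in docs.items()
               if contains_phrase(text, "science education")]
print(phrase_hits)  # ['doc1']
```

Without the quotation marks both documents would match, because each contains both words somewhere on the page.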

An Internet resource from Widener University gives a series of modules for use in learning search techniques:

A Modular Approach to Teaching the World Wide Web
Advanced Web Searching Techniques Modules (#6) from Widener University

Advanced Search Techniques using: Alta Vista at: http://www.altavista.com/

1. Use the first screen for simple searches. You may use quotes to define a phrase.
2. On the home screen, use the Advanced Search button to explore other options and techniques to either narrow or broaden a search.
3. From this screen, use the Help button for more definitions about how to do advanced searches. Follow any number of links in this section to learn about Boolean logic and operators.

Essential Search Strategies:

Most search engines allow users the following advanced techniques. Boolean Operators: AND, OR and NOT. AND limits your search by requiring that both or all words appear. OR is used to capture synonyms or related words. NOT eliminates possibilities that you suspect will give extra hits. Some engines require that these operators be capitalized; in others, such as Excite and Infoseek, the symbols + and - may be used instead. In practice, you need to take the time to learn the techniques for the specific database that you use most often.
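The three Boolean operators correspond closely to set operations, which the Python sketch below illustrates. The sample pages and the helper names build_index and lookup are invented for this example; real search engines have their own query syntax and internals.

```python
import re

def build_index(pages):
    """Map each word to the set of pages that contain it."""
    index = {}
    for url, text in pages.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index.setdefault(word, set()).add(url)
    return index

# Hypothetical pages, invented for this example.
pages = {
    "p1": "jaguar cars and engines",
    "p2": "jaguar habitat in the rainforest",
    "p3": "leopard habitat in Africa",
    "p4": "classic cars",
}
index = build_index(pages)

def lookup(word):
    """Pages containing the word (empty set if the word is not indexed)."""
    return index.get(word.lower(), set())

# jaguar AND habitat: both words must appear (narrows the search).
and_hits = lookup("jaguar") & lookup("habitat")
# jaguar OR leopard: either word may appear (broadens the search).
or_hits = lookup("jaguar") | lookup("leopard")
# jaguar NOT cars: drop pages that mention the unwanted word.
not_hits = lookup("jaguar") - lookup("cars")

print(sorted(and_hits))  # ['p2']
print(sorted(or_hits))   # ['p1', 'p2', 'p3']
print(sorted(not_hits))  # ['p2']
```

AND and NOT shrink the hit list, while OR makes it longer, so choose the operator based on whether you are getting too many hits or too few.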

Info Seek at: http://www.infoseek.com/
This is more of a topical type of search; it contains phone books, stock reports, weather, travel, maps, and various interest topics.