Wednesday, May 6, 2009

Introduction to the Internet: Database searching

Database structure
Database:
a collection of information organized for searching and retrieval. An example would be the Yellow Pages, organized by subject and then alphabetically.


Record: the complete description of one document in a database. For the Yellow Pages, a record would consist of all the information for an individual, generally name, address and telephone number.

Field: individual unit of info in a record e.g. author, title, subject. Referring above to the Yellow Pages, we would generally have three fields – name, address and telephone number.

A record is a collection of fields.
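As a rough illustration of these definitions, a record can be modelled as a mapping from field names to values, and a database as a collection of such records. The entry below is a made-up example, and the field names simply follow the Yellow Pages description above.

```python
# Hypothetical sample entry; the three fields follow the Yellow Pages example.
record = {
    "name": "Example Plumbing Ltd.",
    "address": "123 Main Street",
    "telephone": "555-0100",
}

# A database is then a collection of records.
database = [record]

def field_names(rec):
    """Return the names of the fields that make up a record."""
    return list(rec.keys())

print(field_names(record))  # ['name', 'address', 'telephone']
```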

Database conceptual structure
http://lib.radford.edu/tutorial/IV/index.asp



Sample record
http://lib.radford.edu/tutorial/IV/index.asp



In the above OPAC record example, the fields on display are the author, title and publication info.

Types of databases
Bibliographic:
contains citations or references to, but not the full text of, publications such as magazine articles, books, dissertations, videos, government documents, etc. Sometimes includes abstracts. The Red River College Library catalogue is an example of a bibliographic database (indexing only), as is ERIC (indexing plus abstracts).



In the beginning, all bibliographic databases contained bibliographic records. The records consist of the essential facts regarding the book to ensure it can be found again.

Below is an example of a bibliographic RRC catalogue record.

Weddings : a novel /
Title: Weddings : a novel / by Dave Williamson.
Author:
Williamson, Dave, 1934-
Publisher: Winnipeg : Great Plains Fiction, c1999.
ISBN: 189428304X :
Description: 252 p.
Format: Book
Subjects:
Canadian fiction. Red River College authors.
Held at
Holdings Record Display
Location: Book Stacks - Notre Dame Campus
Call Number: PS 8595 .I564 W43 1999
Number of Items: 1
Status: Available (Not Charged)

ERIC record



ERIC provides citations to books and journals, assigning an accession number to each item. The abstract lets a user decide whether they want to read it or not. The descriptors/identifiers are just fancy words for subject headings. The * before the identifiers indicates a major subject heading.

Types of databases

  • Full-text: contains complete text of documents e.g. The Canadian Encyclopedia. A full-text database doesn’t have to contain periodicals. It just needs to contain all the document information.

  • Directory: contains factual info about companies, organizations, products, people, or materials, e.g. Encyclopedia of Associations (which contains directory information similar to a telephone directory)

  • Numeric: contains data in a tabular or statistically manipulated form, often with some added text, e.g. E-STAT (which specializes in numeric data)

  • Hybrid: contains a mix of record types, e.g. Academic Search FullText Elite contains abstracts for all records and fulltext for many but not all records

  • Audio: can contain music, speeches, sound effects, etc., e.g. The History Channel: Speeches

  • Image: can contain photos, animations, prints, etc., e.g. AICT : Art Images for College Teaching, or Google.

Building a database

  • System software analyzes each record and indexes all significant words & phrases, resulting in indexes termed “inverted files” (like the printed index of a book). The database uses these indexes to construct a search result. Stop words are commonly used words that are not considered significant, so they are usually left out of the index.

  • Machine-readable records stored sequentially by accession no. in a linear file (record in a linear file like a page in a book)

  • Each record is analyzed for significant words and phrases. Each record is divided into fields, each field is labeled, and the position of each word within a field is noted.

289519 (Accession number)

Servants of Socrates in the land of OZ
TI1 TI3 TI4 TI6 TI8

Notice how the words in the title have been indexed by position. The stop words are not indexed, but they are still counted when the word positions are numbered.

Blackburn, Harold
AU

The author’s first name is not identified or indexed.

Clearing House, v57 n2 p69-71 Oct 1983
JN PY

The journal’s name and publication year can be used to limit the result.

Language: English
LA

Document Type: Position Paper
DT

Discusses the role of the microcomputer in the classroom.
AB1 AB3 AB6 AB7 AB9

The abstract field has been indexed in the same way, with the stop words left out.

Descriptors: Educational Improvement; Microcomputers
DE1 DE2 DE3
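The indexing of the sample record can be sketched in a few lines of Python. This is only an illustration of the inverted-file idea, not how any real system is implemented. The field codes TI and AB and the two-word stop list are assumptions chosen to match the sample record; each search system defines its own.

```python
from collections import defaultdict

# Assumed stop-word list, chosen to match the sample record above.
STOP_WORDS = {"of", "the"}

def index_field(index, accession_no, field_code, text):
    """Add the significant words of one field to the inverted index.

    Every word advances the position counter, but stop words are
    not entered in the index.
    """
    for position, word in enumerate(text.split(), start=1):
        term = word.strip(".,;:").lower()
        if term and term not in STOP_WORDS:
            index[term].append((accession_no, field_code, position))

def positions(index, field_code):
    """List the indexed word positions recorded for one field."""
    return sorted(pos for postings in index.values()
                  for (_, code, pos) in postings if code == field_code)

index = defaultdict(list)
index_field(index, 289519, "TI", "Servants of Socrates in the land of OZ")
index_field(index, 289519, "AB",
            "Discusses the role of the microcomputer in the classroom.")

print(index["socrates"])       # [(289519, 'TI', 3)]
print(positions(index, "TI"))  # [1, 3, 4, 6, 8]
print(positions(index, "AB"))  # [1, 3, 6, 7, 9]
```

Looking up a word in the index gives the record, field and position directly, which is what makes field searching and proximity searching possible.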

For further information on how a database is constructed see: Dialog Lab Workbook chapter 3 at
http://gep.dialog.com/instruction/workbook/gep-02-30017_lab_workbook2005.pdf
The database is based on the Dialog search system.

Forming a search strategy

  1. Summarize topic
    I need recent references which discuss the importance of colour in improving cycling safety.

    Put the topic into a sentence. Remember not all the words will be important.

  2. Identify concepts.
    I need recent references which discuss the importance of colour in improving cycling safety.

    Identify the concepts and important terms that need to be looked for. Translate ‘recent’ into a date range.

  3. Select terms e.g.
    colour(s) cycle(s) safety (recent)
    color(s) cycling safe
    visibility bicycle(s) helmet(s)
    visible bike(s) clothing

    What words can be put into a search to get out the best result? Clarify the terms – in this case, search for bicycle not motorcycle.

  4. Combine terms
    Always link key words representing the same concept with OR
    colour cycle safety
    OR OR OR
    color cycling safe
    OR OR OR
    visibility bicycle helmet
    OR OR OR
    visible bike clothing

    Use truncation as well.

    Link concepts together with AND
    (colour OR color OR…etc) AND
    (cycle OR bicycle OR…etc) AND
    (safety OR safe OR…etc)
  5. Perform search.

  6. Review and evaluate search results.

  7. Modify the search if necessary.

    These steps provide a theoretical plan for searching. The actual way you input the plan will vary depending upon the resource you are searching. After searching, ask yourself: did you get the result you wanted?
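Steps 3 and 4 can be sketched as a small helper that ORs together the terms within each concept and then ANDs the concepts. The function name build_query is made up for this illustration, and the exact syntax a database accepts will differ from system to system.

```python
def build_query(concepts):
    """OR the synonyms within each concept, then AND the concepts together."""
    groups = ["(" + " OR ".join(terms) + ")" for terms in concepts]
    return " AND ".join(groups)

# One list per concept, using the terms selected in step 3.
concepts = [
    ["colour", "color", "visibility", "visible"],
    ["cycle", "cycling", "bicycle", "bike"],
    ["safety", "safe", "helmet", "clothing"],
]

print(build_query(concepts))
# (colour OR color OR visibility OR visible) AND (cycle OR cycling OR bicycle OR bike) AND (safety OR safe OR helmet OR clothing)
```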

    Search types
    • Key word/”simple search”
      A keyword search is the most popular search. Remember that you may have to choose what you are looking for (e.g. the field to search) from a drop-down menu.

    • Boolean
      Uses the operators AND, OR and NOT to combine search terms.

    • Subject/descriptor
      Zero in for a more specific search.

    • Proximity/”phrase searching”

    • Field searching
      Fields are already provided in the database search.

    Selecting a database
    When choosing a database consider the following:

    • the scope and range
    • types of publications
    • range of publication date
    • place of publication
    • language of publication
    • availability of journals indexed

    What is the most appropriate database? What does the index cover? Use a database that specialises in the topic/subject you’re looking for. Be aware of how far back the database goes.

    Database organization
    To search a database effectively, you should know how it is organized:

    • search by descriptors (subject headings) allowed?
    • thesaurus or subject heading guide available?
    • keyword search in specific fields e.g. title, author, descriptor available?
    • limiting by year or language available?

    Is there an index to find the subject heading? What limitations are there?

    Subject vs. keyword/free text searching

    Subject searching
    • Searches subject or descriptor field only
    • Controlled terminology (from thesaurus)
    • High degree of relevancy
    • High precision
    • Low recall

    Keyword searching
    • Searches subject, title, full text and abstract fields
    • May search for any significant terms
    • May retrieve irrelevant records
    • Low precision
    • High recall

    Precision vs. recall

    • Recall measures how many of the relevant records in the database a strategy retrieves; a search that retrieves more records generally has higher recall
    • Precision measures how many of the records retrieved are actually relevant
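To make the two measures concrete, here is a minimal sketch using the usual information-retrieval definitions: precision is the share of retrieved records that are relevant, and recall is the share of relevant records that were retrieved. The record numbers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved record numbers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant records that the search found
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 4 records retrieved, 8 relevant records in the database.
retrieved = [1, 2, 3, 4]
relevant = [2, 4, 5, 6, 7, 8, 9, 10]

print(precision_recall(retrieved, relevant))  # (0.5, 0.25)
```

Here half of what was retrieved is relevant, but the search found only a quarter of the relevant records. Broadening the search would tend to raise recall at the cost of precision.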

    Field specific searching
    May be appropriate when looking for:

    • Articles in a particular journal
    • Items published in particular year(s)
    • Publications by a particular author
    • Keywords in the title
    • Keywords in the descriptor
    • Items in a particular language only

    For instructions on how to do a field specific search in a particular database, check the database system “Help guide”.

    Search operators

    System: EBSCOhost
    Truncation/wildcard:
    • Truncation: * replaces an unlimited no. of characters, including none
    • Wildcard: ? replaces one character only
    Proximity:
    • w: finds terms within specified number of words, in order specified
    • n: same as w but word order disregarded

    System: FirstSearch
    Truncation/wildcard:
    • Plurals: + (only for s or es)
    • Truncation: * replaces an unlimited no. of characters, including none
    • Wildcard: # replaces a single character
    • ? replaces an unlimited no. of characters
    Proximity:
    • w: finds terms within specified number of words, in order specified
    • n: same as w but word order disregarded

    System: WebSPIRS
    Truncation/wildcard:
    • Truncation: * replaces an unlimited no. of characters, including none
    • Wildcard: ? replaces one character or none
    Proximity:
    • adj: retrieves records with search terms next to each other in specified order.
    • near: retrieves records with both search terms in same sentence, within a specified no. of words.
    • with: retrieves records with both search terms in same field (title, abstract, descriptor, etc.)

    System: Wilson Web
    Truncation/wildcard:
    • Truncation: * replaces an unlimited no. of words, including none
    • Wildcard: ? replaces one character or none
    Proximity:
    • : retrieves records with both terms within a specified number (#) of words of each other no matter the order

    The same search interface is used for all databases available from a provider.

    Proximity operators specify arrangement of search terms, e.g. marine w biology would retrieve marine biology; while marine w4 biology would retrieve a record containing marine plant and animal biology; marine n4 biology would retrieve a record containing the phrase “biology of coastal and marine wildlife”.
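The behaviour of the w and n operators in these examples can be approximated with a short function. This is a sketch under one assumption: "within n words" is taken to mean that the two word positions differ by at most n, which fits the examples above but may not match how a given system counts. The function name and parameters are invented for the illustration.

```python
def proximity_match(text, first, second, max_distance=1, ordered=True):
    """Check whether two terms occur within max_distance words of each other.

    ordered=True mimics the w operator (first must come before second);
    ordered=False mimics the n operator (either order is accepted).
    """
    words = [w.strip(".,;:\"'").lower() for w in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    for i in first_positions:
        for j in second_positions:
            gap = j - i
            if ordered and 0 < gap <= max_distance:
                return True
            if not ordered and 0 < abs(gap) <= max_distance:
                return True
    return False

# marine w biology
print(proximity_match("marine biology", "marine", "biology"))  # True
# marine w4 biology
print(proximity_match("marine plant and animal biology",
                      "marine", "biology", max_distance=4))  # True
# marine n4 biology
print(proximity_match("biology of coastal and marine wildlife",
                      "marine", "biology", max_distance=4, ordered=False))  # True
```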

    Truncation searches for word variations by placing an operator at the end of a root word, e.g. computer* will retrieve computer, computerize, computers, computerization, etc.

    Wildcards also search for word variations. A wildcard operator replaces up to one character, or in some systems more than one, at the position where it is placed, e.g. colo?r will retrieve records containing the words colour or color. This is especially useful for retrieving American and Canadian spellings in the same search.
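One way to see what truncation and wildcard operators do is to translate a search term into a regular expression. The sketch below assumes that * stands for any number of characters (including none) and that ? stands for one character or none, as in the colo?r example. Some systems treat ? as exactly one character, so this is an illustration and not a description of any particular database.

```python
import re

def pattern_to_regex(pattern):
    """Translate a search term containing * and ? into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")   # truncation: any number of characters
        elif ch == "?":
            parts.append(r"\w?")   # wildcard: one character or none (assumed)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, word):
    """Check whether a whole word matches the search pattern."""
    return pattern_to_regex(pattern).fullmatch(word) is not None

print(matches("computer*", "computerization"))  # True
print(matches("colo?r", "colour"))              # True
print(matches("colo?r", "color"))               # True
print(matches("computer*", "compute"))          # False
```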
