
MISC 710: INFORMATION STORAGE AND RETRIEVAL

The objectives of this unit are:


To outline basic terminology and components in information storage and retrieval
systems
To compare and contrast information retrieval models and internal mechanisms such
as Boolean, Probability, and Vector Space Models
To outline the structure of queries and documents
To articulate fundamental functions used in information retrieval such as automatic
indexing, abstracting, and clustering
To critically evaluate information retrieval system effectiveness and improvement
techniques
To understand the unique features of Internet-based information retrieval
To describe current trends in information retrieval such as information visualization.

Course Schedule
Topic 1: Introduction to information retrieval

Definition of concepts
Objectives and functions of information retrieval systems
Components of information retrieval systems

Topic 2: Information retrieval models


Topic 3: Subject Analysis, Indexing & Abstracting
Topic 4: Vocabulary control and Thesaurus Construction
Topic 5: Systems for Categorization & Classification
Topic 6: Information Representation and Formatting: Metadata Schemas, ISBD, AACR, Dublin
Core, RDA
Topic 7: Encoding Standards for Document Representation, MARC, RDF/XML
Topic 8: Name Access Points & Name Authority Control: AACR, RDA, & others
Topic 9: Information Systems; Databases; and Bibliographic Systems
Topic 10: System Design, Implementation and Evaluation
Topic 11: Current trends in information retrieval techniques

References
Baeza-Yates, R., & Ribeiro-Neto, B. (1999). Modern Information Retrieval. Reading, MA:
Addison-Wesley.
Chan, Lois Mai (2007). Cataloging and Classification: An Introduction. 3rd ed. Lanham, MD:
Scarecrow Press.

Chu, Heting (2003). Information Representation and Retrieval in the Digital Age. Medford, NJ:
Information Today.
Korfhage, Robert R. (1997). Chapter 1: Overview. In Information Storage & Retrieval. New
York: Wiley & Sons.
Lancaster, F. W. (2003). Indexing Principles. In Indexing and Abstracting in Theory and
Practice. 3rd ed. Champaign: University of Illinois, Graduate School of Library and Information
Science.
Lancaster, F. W. (1986). Vocabulary Control for Information Retrieval. 2nd ed. Arlington, VA:
Information Resources Press.

Topic 1: Introduction to information retrieval


Information retrieval is the activity of obtaining information resources relevant to an
information need from a collection of information resources. Searches can be based
on metadata or on full-text indexing.
Automated information retrieval systems are used to reduce what has been called
"information overload".
Many universities and public libraries use IR systems to provide access to books,
journals and other documents.
Web search engines are the most visible IR applications.
An information retrieval process begins when a user enters a query into the system.
Queries are formal statements of information needs, for example search strings in
web search engines.
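The difference between metadata-based and full-text searching can be illustrated with a minimal sketch. The Document class, the two search functions and the sample records are all hypothetical and made up for illustration; real systems use indexes rather than scanning every record.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """A toy record: descriptive metadata (title, author) plus the full text."""
    title: str
    author: str
    text: str


def search_metadata(docs, field, value):
    """Return documents whose metadata field contains the value (case-insensitive)."""
    value = value.lower()
    return [d for d in docs if value in getattr(d, field).lower()]


def search_full_text(docs, term):
    """Return documents whose body text contains the term as a whole word."""
    term = term.lower()
    return [d for d in docs if term in d.text.lower().split()]


# Made-up sample collection (assumption, not from the course material).
docs = [
    Document("Indexing Basics", "A. Author", "indexing assigns terms to documents"),
    Document("Search Engines", "B. Writer", "a query is matched against an index"),
]

by_author = search_metadata(docs, "author", "writer")  # metadata search
by_text = search_full_text(docs, "query")              # full-text search
```

Here the query string "writer" is checked only against the author field, while "query" is checked against the words of the document body.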
Information need
An information need is defined as a state or process that begins when one perceives a
gap between the information and knowledge available to solve a problem and the
actual solution of the problem. Information competencies are defined as the
capabilities developed to reach the solution of a problem by searching for new
information or knowledge that could fill the perceived gap.
Types of Information Needs
Retrospective (searching the past): different queries posed against a static collection; time invariant
Prospective (searching the future): a static query posed against a dynamic collection; time dependent
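The two types of information need can be contrasted in a short sketch. The matches function, the sample collections and the queries are hypothetical, and the simple "all terms must occur" rule is an assumption chosen only to keep the example small.

```python
def matches(query_terms, text):
    """True if every query term occurs in the text (a simple AND match)."""
    words = set(text.lower().split())
    return all(t.lower() in words for t in query_terms)


# Retrospective: the collection is fixed, the queries vary.
static_collection = [
    "boolean retrieval model",
    "vector space model",
    "automatic indexing and abstracting",
]
queries = [["model"], ["indexing"]]
retrospective_hits = [
    [doc for doc in static_collection if matches(q, doc)] for q in queries
]

# Prospective: the query is fixed, the documents arrive over time.
standing_query = ["retrieval"]
incoming = ["new retrieval techniques", "library opening hours", "web retrieval trends"]
prospective_hits = [doc for doc in incoming if matches(standing_query, doc)]
```

In the retrospective case each new query is run over the same documents; in the prospective case the same standing query is applied to each newly arriving document.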
Conceptual model of IR
IR is a three-step process:

Asking the question (how to use the language to get what we want)
Building the answer from known data (how to refer to a given text)
Assessing the answer (does it contain the information we are looking for?)
Objectives and functions of information retrieval systems

The following are some typical information retrieval activities:


to search for information resources in a library's online public access catalogue
(OPAC), which provides access to the library's collections
to search for information in online bibliographic or full-text databases (database
search services) providing access to remote collections
to access e-books and e-journal services such as Emerald & Ingenta providing
access to electronic books and journal articles
to search for an e-mail address, a specific message, a phone number or an
address on a mobile phone or in e-mail services such as Outlook Express, Gmail, or
Eudora
to search for information on institutional intranets and databases, such as those
created by companies and institutions providing access to various information
resources created within the institution
to access information on websites either by going directly to the web page, by
entering the web address or Uniform Resource Locator (URL) of the site, or by using
tools such as search engines
to access information on the web using subject gateways that provide access to
selected web resources in one or more specific discipline(s)
to search for music on iTunes
to search for information on social networking sites such as Facebook, Twitter and
YouTube.
Functions
Information retrieval systems have the following functions:
to identify the information (sources) relevant to the areas of interest of the target
user community; this is a challenging job, especially in the web environment, where
virtually anybody in the world is a potential user of a web-based
information retrieval system

to analyze the contents of the sources (documents); this is becoming increasingly
challenging as the size, volume and variety of information sources (documents) are
increasing rapidly; in web information retrieval this is carried out automatically using
specially designed programs called spiders
to represent the contents of analyzed sources in a way that matches users'
queries; this is done by automatically creating one or more index files, and is
becoming an increasingly complex task due to the volume and variety of content
and increasing user demands
to analyze users' queries and represent them in a form that will be suitable for
matching against the database; this is done in a number of ways, through the design of
sophisticated search interfaces, including those that can help users
select appropriate search terms by using dictionaries and thesauri,
automatic spell checkers, a predefined set of search statements and so forth
to match the search statement with the stored database; a number of complex
information retrieval models have been developed over the years that are used to
determine the similarity of the query and stored documents
to retrieve relevant information; a variety of tools and techniques are used to
determine the relevance of retrieved items and their ranking
to make continuous changes in all aspects of the system, keeping in mind the
rapid developments in information and communication technologies (ICTs) relating
to changing patterns of society, users and their information needs and expectations.
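Several of the functions above (analyzing documents, building an index file, analyzing the query, matching and ranking) can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular system: the function names, the sample documents and the scoring rule (summing raw term counts) are all hypothetical, and real systems use the more sophisticated retrieval models covered in Topic 2.

```python
from collections import Counter, defaultdict

PUNCT = ".,;:!?()"  # punctuation stripped from word edges (assumed, minimal set)


def tokenize(text):
    """Analyze text: split on whitespace, strip punctuation, lowercase."""
    words = (raw.strip(PUNCT).lower() for raw in text.split())
    return [w for w in words if w]


def build_index(docs):
    """Represent documents as an inverted index: term -> {doc_id: count}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term][doc_id] += 1
    return index


def search(index, query):
    """Match the query against the index and rank by summed term counts."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties broken by document id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Made-up sample documents (assumption, for illustration only).
docs = {
    "d1": "Indexing and abstracting of documents.",
    "d2": "Retrieval models: Boolean retrieval and vector space retrieval.",
    "d3": "Evaluation of retrieval systems.",
}
index = build_index(docs)
results = search(index, "Retrieval models")
```

The query is processed with the same tokenize step as the documents, so that query terms and index terms are represented in the same form before matching.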
Components of an information retrieval system
An information retrieval system comprises six major subsystems:
the document subsystem
the indexing subsystem
the vocabulary subsystem
the searching subsystem
the user-system interface
the matching subsystem.
