
Document Imaging in the New Millennium

www.laserfiche.com

Copyright 2000 by Compulink Management Center, Inc.


All rights reserved.
LaserFiche is a registered trademark of Compulink Management Center, Inc.
All other trademarks are properties of their respective companies. No part of
this publication may be reproduced, photocopied, stored on a retrieval system,
or transmitted without the express prior written consent of the publisher.

LaserFiche Document Imaging


A Division of Compulink Management Center, Inc.
20000 Mariner Avenue
Torrance, CA 90503
USA
Document Imaging in the New Millennium
by LaserFiche
Second edition published April 2000
Printed in the U.S.A.

Contents
Introduction
What is Document Imaging?
  Bringing in Documents
  Storing Documents
  Indexing Documents
  Retrieving Documents
  Controlling Access
Benefits of Document Imaging
Implementing Document Imaging
  Evaluating Your Needs
  Scaling from Pilot Project to Enterprise Solution
  Installation
  Training
  Legal Issues
  Support and Maintenance
Additional Features
  Ease of Use
  Annotations
  Print/Fax/E-mail
  Full-text Search Options
  Internet/Intranet
  Security
  Portability and CDs
  Outsourcing Scanning
  Disaster Recovery
  Batch Processing
  Bar Codes
  Zone OCR
  Workflow
  Scalability
  System Compatibility
  Networked Systems
  Client/Server Architecture
  Non-proprietary File Formats
Frequently Asked Questions
  General
  Scanning/Importing/Storing
  Viewing/Printing/Exporting
  OCR: Optical Character Recognition
  COLD: Computer Output to Laser Disc
Glossary of Terms
About the Author

Introduction
"I've done searches that would have taken me probably three or four working days and I found the information in about 10 minutes. Our archives are historical treasures, which is one of the reasons we did this, because people use them for research and the records were wearing out. So we wanted to store the original materials away and not risk damaging them anymore."
Linda Butler, City Clerk
Flagstaff, AZ

Paper. We all need it to do our work, but paper accumulates quickly. Our files grow fatter and fatter, and then they grow some more. Folders and filing systems make it easier to find our documents. Records managers organize, archive and retrieve our information.

But the amount of paper keeps growing. Paper files are often hard to find. Records may not be in their proper folder. Or they may be borrowed and then lost on somebody's desk. Studies show that professionals often lose up to 500 hours a year just looking for documents.

Those days are gone. Document imaging offers a better way to manage the records you rely on.

Document imaging should:
Enable you to manage millions of records and retrieve the one you need in seconds
Be a pleasure to use, whether you're the person who needs the files, the records manager or the MIS manager
Let you share documents with colleagues while protecting confidential information
Allow you to e-mail or fax files with the click of a mouse
Provide an easy way to share documents with other offices or take them on the road
Conform to the way you work, rather than forcing you to change

Since 1987, we at LaserFiche have devoted countless hours to building document imaging and management software. Drawing upon that expertise, we have created this guide to explain what document imaging is, what to look for in a system, and how it can make storage and retrieval of documents a smooth process instead of a chore.

What is Document Imaging?

Document imaging is the conversion of paper documents into electronic images on your computer. Once on your desktop, these documents can be retrieved effortlessly in seconds. Every organization generates large amounts of paper and electronic documents. We have all developed our own ways to store important files, yet things continue to get misplaced. Everyone knows the frustration of not being able to find a file right when we need it most. Traditional methods of storing paper and electronic records require a great deal of effort to manage, distribute and find those documents. As the number of files grows, the time and effort required to manage them also increases.

Document imaging revolutionizes the archival of information and provides the means to rapidly find, retrieve and share all documents in your system. All document imaging systems should have five basic components:
Scanning and importing tools to bring documents into the system
Methods for archiving and storing documents
Indexing systems to organize documents
Retrieval tools to find documents
Access control to provide documents to authorized people

Just as the Internet has boomed in popularity because of the fast access it provides to information stored in web pages, document imaging systems provide tremendous value because of the fast access they provide to information stored within an organization's documents.

Document imaging builds on the strong points of paper documents: Files are scanned or electronically converted and a high-resolution photocopy is stored on a hard drive or optical disk. Electronic index cards can attach information to a document such as author, reference number or date created. Files can still be viewed, printed, shared and stored, but imaging adds an enormous advantage by giving documents active content.

No longer just ink on a page, document text is read by Optical Character Recognition (OCR) technology. A system should allow you to retrieve files by searching for any word or phrase in the text, by folder location or by index card information. Which documents people can read, and what actions or modifications they can perform on these documents, depends on their level of security, which should be controlled by the document imaging system.

Selecting the right document imaging system can be an exciting task. There are many aspects to consider to ensure that it fits your organization's needs. The following sections describe the five basic components to look for when choosing your system.

Bringing in Documents

There are three primary methods of bringing files into a document imaging system:
Scanning, for paper files
Conversion, for creating unalterable images of electronic documents
Importation, for creating modifiable versions of electronic documents

Scanning

Scanning a document produces a raster (picture) image that can be stored on a computer. When choosing a scanner, it is important to consider overall budget and the size and volume of paper to be scanned. The ability to use a wide range of scanners is one of the defining characteristics of a good imaging system.

A document imaging scanner should have an Automatic Document Feeder (ADF). This device allows stacks of paper to be placed into a tray and automatically fed one page at a time into the scanner, speeding up the scanning process. Scanners without an ADF are primarily designed for imaging graphics and require each page to be placed manually in the scanner.

[Figure: Scanner with ADF]

Scanners can handle a variety of paper sizes, from business cards to engineering drawings. Most departments only need to scan documents up to legal-size paper (8 1/2" x 14"). For organizations or departments that use blueprints, plans and architectural drawings, there are large-format scanners that support E-sized (34" x 44") documents. In general, the larger the paper size the scanner can handle, the more expensive it is. Other options, such as color or grayscale (used for photographs), also increase the scanner's price.

The speed of the document scanner is another consideration. Document imaging scanners can handle between 10 and 200 pages per minute. They are available in both simplex and duplex modes. Duplex scanning allows both sides of a two-sided document to be scanned in a single pass. As with other options, high-speed scanning and duplex scanning will increase the price of the scanner. In some instances, it is more economical to purchase two 20-page-per-minute scanners than one 40-page-per-minute scanner. This option is only available with document imaging systems that support multiple scan stations.

If there is a large volume of documents to scan, i.e. thousands or millions of pages, it may be more practical and economical to use an outside scanning service bureau. To support this option, the imaging system must accommodate easy database synchronization between information scanned by the service bureau and pages scanned in-house. The data volumes containing images and index information need to be modular and easily portable. This ensures that the documents scanned by the service bureau can be incorporated into a live system without interrupting or re-indexing existing work. This option is often referred to as portable volumes.

If an organization has several offices and needs to share the documents scanned by each, portable volumes capability provides an easy way to distribute files.

Conversion

Converting documents is the process of transforming electronic word processor or spreadsheet documents into a permanent raster image format for storage within an imaging system. Windows applications, such as Microsoft Word, Excel or Autodesk AutoCAD, can print existing files into an unalterable image of the document. These images are usually stored as archival-quality TIFF (Tagged Image File Format) files. The conversion process also generates a complete text file, while retaining the visual formatting and layout of the original file. This text file can then be used for full-text indexing of the document to assist with later retrieval. Converting electronic documents bypasses scanning, saves paper and printing resources, and produces a cleaner image than scanned paper files. This method of imaging electronic documents is best suited for permanent archives.

[Figure: Converting documents to images]

Importing

Importation, also known as electronic document management, is the second method for bringing electronic files, such as office suite documents, graphics, audio clips or video files, into a document imaging system. Files can be dragged and dropped into an imaging system, but are modifiable and remain in their native format. These files can be viewed in their original format by either launching the originating application or by using an embedded file viewer from within the imaging system.

[Figure: Importing files]
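
The scanner-speed trade-off described under Scanning (two 20-page-per-minute scanners versus one 40-page-per-minute scanner) can be checked with simple arithmetic. The sketch below is illustrative only: the function name and the backlog size are assumptions, and it uses rated speeds, ignoring paper loading, jams and indexing time.

```python
def scanning_hours(pages: int, pages_per_minute: int, stations: int = 1) -> float:
    """Hours needed to scan `pages` when `stations` identical scanners run in parallel."""
    if pages_per_minute <= 0 or stations <= 0:
        raise ValueError("speed and station count must be positive")
    return pages / (pages_per_minute * stations) / 60


# Hypothetical backlog, chosen only for illustration.
backlog_pages = 120_000

one_fast_scanner = scanning_hours(backlog_pages, 40)
two_slower_scanners = scanning_hours(backlog_pages, 20, stations=2)

print(f"One 40 ppm scanner:  {one_fast_scanner:.1f} hours")
print(f"Two 20 ppm scanners: {two_slower_scanners:.1f} hours")
```

At rated speed both configurations clear the backlog in the same time, so the choice comes down to purchase price and whether the imaging system supports multiple scan stations.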



Storing Documents

Once brought into the system, documents must be stored. Document imaging storage systems must encompass changing technologies, increasing numbers of document volumes and the tests of time. The needs and budget for image storage are best determined by the individual organization involved.

A good document imaging system should be able to use any storage device currently available, as well as those on the horizon, to provide long-term document storage. This allows you to select the equipment that best meets your needs, both now and in the future.

To ensure readability in the future, if a document imaging system is to be used for digital archiving, the files should be stored in a non-proprietary format. The computer industry advances so quickly that storing document images or text files in a proprietary format may leave a company held hostage to the fortunes and whims of a single vendor.

At this time, there are five primary storage options:
Magnetic Media (Hard Drives)
Magneto-Optical Storage
Compact Discs
DVDs
WORM

The advantages and drawbacks of each are described below.

Magnetic Media (Hard Drives)

Increasingly fast response times (the time it takes to store and retrieve a document) and dramatic reductions in storage prices make magnetic media such as hard drives or RAID (Redundant Array of Independent Disks) systems a popular choice.

RAID systems are a means of formatting multiple hard drives to appear as a single large storage volume. These systems protect data against loss or damage by providing data redundancy and fault tolerance against hardware problems. These devices are relatively inexpensive, can be linked together to store large numbers of documents, and provide the fastest response times.

The main drawback of magnetic media is that, while inexpensive, they still contain moving parts, which are subject to mechanical failure. Data files can also be completely erased. Computer personnel should perform regular backups of hard drives so that if data is erased or damaged, it can be restored.

Magneto-Optical Storage

A few years ago the magneto-optical (MO) diskette/disk drive was a popular way to back up files on a personal computer. As the term implies, an MO device employs both magnetic and optical technologies to obtain ultra-high data density. A typical MO cartridge is slightly larger than a conventional 3.5-inch magnetic diskette, and looks similar. But, while the older type of magnetic diskette can store 1.44 megabytes (MB) of data, an MO diskette can store many times that amount, ranging from 100 MB up to several gigabytes (GB).

An MO system achieves its high data density by using a laser and a magnetic read/write head in combination. Both the laser and the magnet are used to write data onto the diskette. The laser heats up the diskette surface so it can be easily magnetized, and also allows the region of magnetization to be precisely located and confined. A less intense laser is used to read data from the diskette. Data can be erased and/or overwritten an unlimited number of times, as with a conventional 3.5-inch diskette.

[Figure: Magneto-Optical diskette]

The chief assets of MO drives include convenience, modest cost, reliability, and (for some models) widespread availability approaching industry standardization. The chief limitation of MO drives is that they are slower than hard disk drives and still subject to mechanical failure, although they are usually faster than conventional 3.5-inch diskette drives. Data files can also be completely erased. With the drop in hard drive prices, the popularity of magneto-optical storage has faded. MO disks can be placed in jukeboxes that hold hundreds of disks.

Compact Discs

Compact discs (CDs) are small plastic discs used to store information digitally. Originally developed for audio systems as an alternative to phonograph records and audiotapes, CDs are now used for computer data storage. Digital information is recorded on a CD encoded as a series of microscopic pits on the reflective surface of an aluminum disc. The disc is covered with a transparent plastic coating and is played on a machine that uses an infrared laser to read the pattern of pitted and unpitted areas on the disc's surface. Since nothing touches the encoded portion, the CD is not worn out by the playing process. Standard CD formats include CD-ROM (Compact Disc-Read Only Memory), a preprinted media format; CD-R (CD Recordable), a single-use recordable disc; and CD-RW (CD Rewritable), a multi-use recordable disc.

[Figure: Compact disc tower]

CDs offer a safe and reliable medium that can provide long-term storage for images, in some cases up to 100 years. Furthermore, CD-ROMs do not require specialized hardware or software to retrieve information. CDs use the ISO 9660 specification, which means the data can be read on many computer platforms (e.g. PCs, Macs, NT servers, Novell servers), unlike magneto-optical or WORM disks. The primary drawback of this medium is its limited storage capacity of 650 MB. CD-ROMs can be accessed through CD-ROM drives, CD towers and jukeboxes of up to 500 discs, making them a convenient method of storing large numbers of imaged documents.

DVDs

DVD, which stands for Digital Video Disc or Digital Versatile Disc, is the next generation of optical disc storage technology. It is essentially a bigger, faster CD that can hold more information, including video as well as audio and computer data. DVD aims to encompass home entertainment, computers, and business information within a single digital format, eventually replacing audio CD, videotape, LaserDisc, CD-ROM, and perhaps even video game cartridges. DVD has unprecedented widespread support from all major electronics companies, all major computer hardware companies, and about half of the major movie and music studios, which says much for its chances of success. Never before has one new technology changed so many aspects of data storage and retrieval.

[Figure: Digital versatile discs]

DVD achieves its huge capacity by packing more data into the same physical space as a CD. It does this in several ways. First, its tracks are closer together and the pits in each track are smaller. Second, new data compression technology is highly efficient, minimizing the need to store repetitive unneeded data. Third, two separate layers of tracks can be combined into a single disc.

Like a compact disc or LaserDisc, DVD permits random access to any point on the disc. There's no need to shuttle forward or backward through a tape, and of course there's no rewinding. As an optical disc, DVD never physically contacts the pickup. The disc is played by a beam of laser light, so there is no wear and tear even if you keep reading the same data. The tough plastic surface is forgiving of fingerprints, dust and dirt. Care is the same as for compact discs: no special treatment is needed. This means DVDs can be played thousands of times and continue to represent the best long-term option for reliable document imaging storage.

The drawbacks of this medium are its high cost and an ongoing standards battle at the time of publication. Similar to the battle between VHS and Beta, different manufacturers are using different formats for rewritable DVDs.
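
The CD capacity figures quoted in the Compact Discs section (650 MB per disc, jukeboxes of up to 500 discs) can be turned into a rough sizing estimate. This is a minimal sketch, not a vendor formula: the 50 KB average page size is purely an assumption (actual size depends on resolution, compression and page content), and treating 1 MB as 1,024 KB is a simplification.

```python
import math

CD_CAPACITY_MB = 650     # capacity per disc, as quoted in the text
JUKEBOX_DISCS = 500      # largest jukebox mentioned in the text
ASSUMED_PAGE_KB = 50     # ASSUMPTION: average size of one stored page image


def pages_per_disc(page_kb: int = ASSUMED_PAGE_KB, disc_mb: int = CD_CAPACITY_MB) -> int:
    """Whole pages that fit on one disc at the assumed average page size."""
    return (disc_mb * 1024) // page_kb


def discs_needed(pages: int, page_kb: int = ASSUMED_PAGE_KB, disc_mb: int = CD_CAPACITY_MB) -> int:
    """Discs required to hold `pages` page images, rounded up."""
    return math.ceil(pages / pages_per_disc(page_kb, disc_mb))


jukebox_capacity_mb = CD_CAPACITY_MB * JUKEBOX_DISCS

print(f"Pages per disc:          {pages_per_disc():,}")
print(f"Discs for 1,000,000 pp.: {discs_needed(1_000_000)}")
print(f"Full jukebox capacity:   {jukebox_capacity_mb:,} MB")
```

Changing `ASSUMED_PAGE_KB` to a figure measured from a sample of your own scanned documents gives a more realistic estimate.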

WORM

The final storage medium is WORM. Short for Write Once, Read Many, WORM is an optical disk technology that allows you to write data onto a disk just once. After that, the data is permanent and can be read any number of times. This media format is not readily available and requires specialized hardware and software to operate. Unlike CD-ROM, there is no single standard for WORM disks, which means that they can only be read by the same type of drive that wrote them. This has hampered their acceptance, although they have found a niche market as an archival medium. Because of the limited number of companies that provide materials and support for WORM technology, it is not highly recommended.

[Figure: WORM disk]

Indexing Documents

When paper documents are received in an office, they must be organized to be useful. They are usually labeled, sorted, indexed, stapled, placed in folders and filed in a cabinet. Without these steps, nothing could be found in a busy workplace. Electronic documents are no different. A document imaging system should provide several different methods of organizing information for future use. Whatever combination of indexing methodologies is used, it needs to be easily used and understood by the people who retrieve the documents, as well as those who file them.

There are several schools of thought about how much change to filing methodologies should be introduced. In general, the more a document imaging system can adapt to existing procedures, the less upheaval and training is involved, and the greater the likelihood the system will be used on a regular basis.

There are three primary ways to organize documents in an imaging system:
Index Fields
Full-text Indexing
Folder/File Structure

Index Fields

Indexing documents using categorization fields and keywords is a method traditionally used with paper files that translates very well into electronic systems. An imaging system should allow users to customize index templates, create multiple templates and have different types of index field data within each template, such as date, number and alphanumeric characters. Index fields can be used to categorize documents, track creation or retention dates, or record subject matter, among other uses. In addition, an imaging system should allow pull-down boxes to speed index field entry and have tools available to help automate entering index information.

Full-text Indexing

By providing full-text indexing, imaging systems eliminate the time needed for qualified people to read and manually index documents using keywords. To do this, the software must have the capability to perform Optical Character Recognition (OCR). This process reads a scanned page and then indexes every word to track its location. This dramatically reduces indexing costs while providing improved searching capabilities. With full-text indexing, documents can be found using any word or phrase in the document, even if those words are not part of the keyword index. Typically, when a computer OCRs a document, it uses English as the default language. If multiple languages are required, the imaging system should support OCR and full-text searches in these languages.

To avoid creating extra work, a well-designed imaging system should provide the ability to automatically OCR and full-text index a document without requiring any human involvement.

Folder/File Structure

Along with index fields and full-text indexing, an imaging system should provide a visual method of finding documents. In most offices, files are normally found by looking in a particular folder, in a particular drawer, in a particular file cabinet. An imaging system should have the ability to electronically re-create this filing system through multiple levels of nested folders. A flexible folder structure eases the transition from paper filing to electronic filing and makes imaging systems more successful.

[Figure: Example of folder/file structure]
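
The full-text indexing step described above (read each page, then index every word to track its location) is commonly implemented as an inverted index. The sketch below is an illustration of that general technique, not a description of any particular product: the function names and sample documents are hypothetical, and it assumes OCR has already produced plain text for each page.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(documents: dict[str, list[str]]) -> dict[str, set[tuple[str, int]]]:
    """Map each lowercase word to the (document name, page number) pairs where it occurs."""
    index: dict[str, set[tuple[str, int]]] = defaultdict(set)
    for name, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            for word in WORD.findall(text.lower()):
                index[word].add((name, page_no))
    return index


def search(index: dict[str, set[tuple[str, int]]], query: str) -> set[tuple[str, int]]:
    """Return the pages that contain every word of the query (an AND search, not exact-phrase)."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


# Hypothetical OCR output: document name -> list of page texts.
documents = {
    "council-minutes": ["The council approved the zoning permit.", "Budget hearing scheduled."],
    "permit-4471": ["Zoning permit issued for Mariner Avenue."],
}
index = build_index(documents)
print(sorted(search(index, "zoning permit")))
print(sorted(search(index, "budget")))
```

Because every word is indexed, a search succeeds even when the word was never chosen as a keyword. Exact-phrase matching would additionally need word positions within each page, which this sketch omits.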

Retrieving Documents

Retrieval is where a powerful indexing system pays off. Users need to be able to use common-sense tools to find any document within the system based on what they know. In some cases, this means browsing through folders; in other cases it could mean conducting index field searches. If all that is known about a needed document is a word or phrase it contains, a full-text search would help find the relevant file. Whatever the method, document retrieval must be simple and user-friendly.

Users who are familiar with a document's text should be able to use that information to find what they need. Some systems can only find pages based on indexed keywords. This method is not always helpful because the person who selected the keywords may not be the one searching for the file. To be truly useful, a document imaging system must use full-text retrieval.

Similarly, using the document name and folder view to find a document can be helpful and intuitive, but is not always the best method. Sometimes a person will know exactly which document they need, but not know what folder it was placed in or how the document was named.

Using index field information to find a particular document can also be helpful. A full-featured imaging system will have user-definable template fields. Index field searches allow users to comb through millions of records in seconds to find their needed documents. Of course, a person will need to know how the document has been categorized and what index fields have been assigned to it.

To maximize search effectiveness, an intelligent search system should be able to combine template searches, full-text searches, and document or folder name searches into one comprehensive search. A good imaging system makes retrieval of relevant documents fast, easy and efficient.
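
The idea of combining index field (template) searches, full-text searches and document or folder name searches into one query can be sketched as a filter in which every criterion the user supplies must match. All names below (`Document`, `combined_search`, the field names and sample records) are hypothetical and chosen only for illustration; a production system would query an index rather than scan every record.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Document:
    name: str
    folder: str
    fields: dict[str, str] = field(default_factory=dict)  # index (template) fields
    text: str = ""                                         # OCR or converted text


def combined_search(docs: list[Document], *, fields: Optional[dict[str, str]] = None,
                    words: Optional[list[str]] = None,
                    name_contains: Optional[str] = None) -> list[Document]:
    """Return documents matching every supplied criterion; omitted criteria are ignored."""
    results = []
    for doc in docs:
        # Index field search: every requested field must have the requested value.
        if fields and any(doc.fields.get(k) != v for k, v in fields.items()):
            continue
        # Full-text search: every requested word must appear in the document text.
        if words:
            doc_words = set(re.findall(r"[a-z0-9']+", doc.text.lower()))
            if not all(w.lower() in doc_words for w in words):
                continue
        # Name search: the fragment may appear in the document name or its folder path.
        if name_contains:
            needle = name_contains.lower()
            if needle not in doc.name.lower() and needle not in doc.folder.lower():
                continue
        results.append(doc)
    return results


docs = [
    Document("Minutes 1998-03", "Council/Minutes", {"Department": "Clerk", "Year": "1998"},
             "The council approved the zoning permit."),
    Document("Permit 4471", "Planning/Permits", {"Department": "Planning", "Year": "1998"},
             "Zoning permit issued."),
    Document("Minutes 1999-01", "Council/Minutes", {"Department": "Clerk", "Year": "1999"},
             "Budget hearing scheduled."),
]

for doc in combined_search(docs, fields={"Year": "1998"}, words=["zoning"]):
    print(doc.name)
```

A user who knows only a word from the text, only a field value, or only part of a folder name can supply just that one criterion and still narrow the results.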


Controlling Access

The final mandatory component of a document imaging system is access control. In many computer environments, different people use different types of computer equipment from different locations to search for information. A full-featured imaging system must provide these different users with appropriate levels of access, without compromising confidentiality or security. To do this, a system must have two fundamental features:
Broad Availability
Comprehensive Security

Broad Availability

An imaging system must offer several different ways of accessing files. A broad level of access saves limited financial resources, intellectual capital and network bandwidth. The most common method of access is through the user's desktop. Every document imaging system must provide a client-based user interface that enables the scanning, indexing and retrieval of documents. Without this basic interface, the system cannot function.

To provide broad availability and access flexibility, imaging systems must now meet the requirements of offices with diverse uses and remote locations. Document imaging is no longer an in-the-office process. Many users require portability to exchange documents with other colleagues, or to work off-site. This is frequently done through CDs, notebook computers, or e-mailing of documents. Imaging systems without this flexibility limit the abilities of the user.

In addition, sharing documents through the Internet or an intranet allows system administrators to deploy an imaging system across their entire network or even to the public. Users should be able to search, retrieve and view documents with any web browser. Browser-based document access removes limitations of location and computer platform (Windows, Macintosh, Unix, etc.).

Comprehensive Security

As organizations use imaging systems to archive a larger variety of documents, both public and private, a system of access control needs to be present. A comprehensive security system must allow the system administrator to control what folders and documents users can see, and what actions they can perform on those documents (edit, copy, delete, etc.). This system must control access to folders, documents and even redacted images and text in a simple and complete manner. The ability to deploy imaging to a wide variety of users requires a robust security system combined with an elegant user interface.

A good access system will make document imaging available to every authorized person, whether in an office, at a remote location or over the Web, all without compromising system security.
Benefits of Document Imaging

Thousands of organizations around the world use document imaging every day instead of paper filing systems. Document imaging offers a number of benefits over paper or microfilm systems.

Fast retrieval: Imaging lets you find documents quickly without leaving your desk. Paper and microfiche are slower because users must go to files and search manually.

Flexible indexing: Imaging can index documents in several different ways simultaneously. Indexing paper and microfilm in more than one way is awkward, costly and time-consuming.

Full-text search: Imaging systems can retrieve files by any word or phrase in the document, a capability that is impossible with paper or microfiche.

No lost files: Imaged documents remain in their folders when being viewed, so none are lost or misplaced. Plus, index template and full-text searches can find documents if they are accidentally moved. Lost documents are expensive and time-consuming to replace.

Digital archiving: The risk of loss or damage to paper or electronic records is reduced with a document imaging system. Keeping archival versions of documents in a document imaging system helps protect paper documents from over-handling and keeps electronic documents in a non-proprietary format.

Share files easily: Imaging makes it easy to share documents electronically with colleagues and clients over a network, on CD or through the Web. Paper documents usually require photocopying to be shared, and microfilm requires conversion to paper.

Improved security: Imaging can provide better, more flexible control over sensitive documents. Imaging controls security at the folder, document or individual word level for different groups and individuals. In contrast, all paper documents in a filing cabinet or filing room have the same level of security.

Save space: Imaging will help recover valuable office space that was previously taken up by bulky paper files.

Disaster recovery: Imaging provides an easy way to back up documents for offsite storage and disaster recovery. Paper is a bulky and expensive way to back up records and is vulnerable to fire, flood and theft.

Implementing Document Imaging

Scaling from Pilot Project to Enterprise Solution

Often, offices find it best to start with a pilot project involving one or two departments before expanding the imaging system to the entire organization. Many offices have started with the Records Management Department, but document imaging can be implemented wherever there is need and interest. Pilot projects allow organizations to fully develop and test their imaging procedures before committing to an enterprise-wide solution.

Installation

The first step of an installation should be a site inspection by the software vendor to determine proper equipment placement and to identify any network connectivity problems. Hardware installation itself usually consists of unpacking, connecting, and setting up all components, as well as installing the necessary operating systems and drivers. It also includes the testing of equipment to ensure proper hardware functionality and network connectivity.

After hardware testing, software installation consists of installing the document imaging software on the imaging server and the necessary workstations and testing it to ensure operability. Generally, the software vendor will perform these tasks, with the office's MIS personnel available to answer questions.

Evaluating Your Needs

When deciding on a document imaging system, there are a number of questions to consider.

How many documents must the system store, considering both the number of existing documents and the number of documents added annually? This information determines how much storage space is needed, the hardware configuration and the cost of the system.

How many users will be using the system concurrently? This determines preliminary software costs and server size.

What departments will be using the system and will the public have access? This determines what specific features and levels of security will be needed.

What serious problems must absolutely be solved, and what issues should be addressed to make life easier or reduce costs or improve productivity? This determines which functions will be system requirements and which might be optional. It also helps determine whether plug-ins or customizations will be needed.

Do you want a turnkey solution or a customized one? This determines the amount of consulting, installation, training, configuration and support that is needed.

What type of network is currently used


NT, NLM, LAN, or other and will it stay
the same? This determines network constraints,
system configuration and workstation upgrades.


Implementation Consulting

This assists those responsible for the record management functions to develop strategies for translating existing paper filing and indexing structures into electronic systems. Electronic filing is different from paper filing, and records managers need to take these differences into account when setting up their systems. Decisions on retention schedules, storage and filing methodologies need to be made before the system is fully implemented. The length of the training depends on the complexity of the filing system, and should take place onsite.

Training

Training programs should address different levels of users and their concerns.

End User
This involves teaching end users the mechanics
of the system. This training should take place
onsite. Each group should receive all instruction
necessary to ensure comfort with the new imaging system. The amount of training necessary
will depend on the users' level of familiarity
with Windows applications, the imaging system's ease of use and the amount of change to
existing procedures. Given a user-friendly system and minimal change in procedures, most
users can become proficient in a short time
period. Comprehension is improved when the
class size is limited to no more than 10 people,
and participants are free from interruption by
phone calls and other day-to-day business.
Because of the real-world concerns of staff
turnover and the need for untrained people to
search for needed documents, a well-designed
imaging system should be intuitive to use.

Supervised Hands-on Operation


This involves the supervision of office personnel
using the system in actual operation. This
allows them to increase their comfort with the
system and pose questions as they arise directly
to the trainer. This form of training is an excellent way to get people to feel comfortable using
the new system.

System Administration
To ensure the document imaging system runs
smoothly, it is important to train select individuals on how to administer and maintain the system. Onsite training is recommended because it
increases familiarity with specific details of the
document imaging system. As with end user
training, freedom from interruption is necessary.


Legal Issues

As document imaging becomes more commonplace, numerous laws have arisen regarding the legality of imaged paper and electronic files. Many government agencies now accept imaged documents as legal records, meaning that the paper originals can be destroyed, given certain conditions.

Records must be stored in an unalterable format, such as CD, DVD or WORM.
The system has the ability to print copies of records.
The system has reasonable controls to ensure integrity, accuracy and reliability.
The system must provide some type of audit trail to prevent and detect unauthorized creation of, addition to, alteration of or deletion of records.
A complete and accurate transfer of records can be made.
The system has reasonable controls to prevent and detect deterioration of records.
There is an indexing system that assists with finding records.
The system has documentation on how the software works and how it has been set up.
The system must be able to cross-reference with other record-keeping systems and software.

The legality of imaged documents varies depending upon the federal agency, state, county, municipality and department involved. Organizations should consult with an attorney on the specific statutes governing their area.


Support and Maintenance

Like keeping a car running smoothly, document imaging systems require ongoing support. Vendors should offer various levels of support, from software upgrades to regular onsite maintenance visits. Several factors affect the level of support that an organization needs:

Size of the system purchased
Amount the system is used
Mission-critical systems that must be operational 24 hours a day, 7 days a week (a frequent requirement of Police and Fire departments)
MIS personnel's level of experience with document imaging
Internet access
Changes to the organization's computer network or infrastructure
Turnover among personnel

Support can include any or all of the following:

Software upgrades
Telephone hotline support
Remote dial-in access to your system
Software patches available through an FTP site
Regularly published technical bulletins or newsletters
Onsite maintenance visits
Additional and/or advanced training sessions

Hardware support
When purchasing hardware such as servers, storage devices and workstations, organizations should choose vendors with good reputations for service and support. While the initial cost may be higher, the benefits include less downtime and more consistent and reliable functioning.


Additional Features

Document imaging systems must provide the basics of scanning, retrieval and display. However, an imaging system designed for multiple users or many documents will have more stringent requirements. This section discusses these requirements and the various approaches to meeting them.

Ease of Use

One of the most important factors in how successful an imaging system will be is its ease of use. A system will only be used if it is simple to bring documents in, organize them and find them. The best systems are flexible, have intuitive graphical user interfaces, and conform to the way people already work, rather than forcing them to adapt.

Annotations

Annotations provide additional information about a document or its status without actually changing the original image. Paper records are often annotated with highlighting, stamps, redaction or sticky notes, and imaged records should support these annotations. An imaging system's security should control who can view annotations such as highlighting, stamps or sticky notes, and who can see through redaction.

Typical document imaging annotations include:

Highlighting images and text in various colors to emphasize words or sections
Redacting (blacking-out or whiting-out) images and text to preserve confidentiality
Stamping images with words such as FAXED or CONFIDENTIAL, or with signatures denoting approval or denial
Attaching sticky notes that contain additional comments

All annotations should be overlaid and not change the actual image. This way, a document can be printed with or without the annotations. More importantly, from a legal standpoint, a document stored in a document imaging system can often stand up as the best copy of a record, since users cannot modify the original images. The actual legal standing of a document varies in different states and courts of law.
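
As an illustration of annotations that are overlaid rather than written into the image, here is a minimal Python sketch. The class names, the `overlays` method and its two flags are hypothetical and are not taken from any particular imaging product. The sketch also assumes that redactions stay in place unless the viewer has the right to see through them, which is one possible reading of the security behavior described in this section.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Annotation:
    kind: str    # "highlight", "redaction", "stamp" or "note"
    page: int
    box: tuple   # (left, top, right, bottom) in image pixels
    text: str = ""

@dataclass
class ImagedDocument:
    pages: tuple                                   # original page images; never edited
    annotations: list = field(default_factory=list)

    def annotate(self, annotation):
        # Annotations are stored beside the images, not drawn into them.
        self.annotations.append(annotation)

    def overlays(self, show_annotations=True, see_through_redaction=False):
        """Return the annotations to draw over the untouched page images."""
        visible = []
        for a in self.annotations:
            if a.kind == "redaction":
                # Assumption: redactions apply unless the viewer may see through them.
                if not see_through_redaction:
                    visible.append(a)
            elif show_annotations:
                visible.append(a)
        return visible

doc = ImagedDocument(pages=("page1.tif", "page2.tif"))
doc.annotate(Annotation("highlight", 1, (10, 10, 200, 30)))
doc.annotate(Annotation("redaction", 1, (10, 50, 120, 70)))
doc.annotate(Annotation("stamp", 2, (300, 20, 420, 60), text="CONFIDENTIAL"))
```

Because the page images are never touched, the same document can be rendered with all overlays, with redactions only, or with redactions lifted for an authorized viewer.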

Print/Fax/E-mail
Imaging systems must provide ways of getting
information out of the system. Printing, faxing
and e-mailing documents are several ways of
doing this. To maximize their usefulness, imaging systems should support the most common
printer and fax drivers and be able to print
images, text and annotations.


As the Internet grows in popularity, more people are using e-mail to communicate and send
information. Organizations will make huge
gains in productivity if they can transmit their
documents via e-mail instead of using faxes or
the postal service. Imaging systems should have
options that allow images to be easily sent with
any MAPI (Messaging Application Programming
Interface)-compliant e-mail system and read by
recipients who do not have imaging systems.

Full-text Search Options

To maximize the effectiveness of full-text searches, there are several helpful options. These options take possible OCR errors into consideration, offer tools to narrow searches, provide lines of context for the search words, and make it easier to find the search word when the document is viewed.

Fuzzy Logic

Full-text searches assume that the search words have been spelled and OCRed correctly. Unfortunately, people misspell words and no OCR package is 100% perfect. Fuzzy logic compensates for these errors by searching for variations on the spelling of a word. An imaging system should allow the user to control the amount of fuzziness of the search by setting how many letters can be wrong or what percentage of a word can be wrong. For example, a fuzzy logic search for "goat" would find "goat," "gout" and "coat."

Wildcards

Wildcards are characters, * (asterisk) and ? (question mark), that can be used in full-text or index keyword searches to compensate for misspellings or when the spelling is uncertain. The asterisk stands for any character or characters, while the question mark stands for any single character. For example, searching for "c*t" could find the words "cat," "cot," "coat," "cut" and "chest." Searching for "c?t" would only find the words "cat," "cot" and "cut."

Boolean Operators

Whenever full-text searches are performed, there are usually several documents that meet the search criteria. Boolean operators (AND, OR and NOT) help fine-tune searches and reduce the number of unrelated documents. For example, to find documents relating to the former president and not to horticulture, users could search for "Bush AND President."

Proximity Searches

Proximity searches can also be used to narrow the search results. They are used to find words that occur within a certain number of words, sentences or paragraphs of each other. For example, to find documents relating to tobacco lawsuits, but not smoking ordinances or tobacco growing, users could search for "tobacco" within one sentence of "lawsuit."
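
The search options in this section can be illustrated with a short Python sketch. It is not how any particular imaging product implements them: all function names are hypothetical, fuzzy matching is approximated here with edit distance (which also counts inserted or dropped letters, not only wrong ones), and proximity is measured in words rather than sentences to keep the example small.

```python
import re
from fnmatch import fnmatchcase

def words(text):
    """Lowercase word tokens of a text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def wildcard_match(pattern, vocabulary):
    """* matches any run of characters, ? matches exactly one."""
    return sorted(w for w in vocabulary if fnmatchcase(w, pattern))

def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def fuzzy_match(word, vocabulary, max_wrong=1):
    """Words within max_wrong edits of the search word."""
    return sorted(w for w in vocabulary if edit_distance(word, w) <= max_wrong)

def docs_with(term, docs):
    """IDs of documents containing the term; combine with &, | and - for AND, OR, NOT."""
    return {doc_id for doc_id, text in docs.items() if term.lower() in words(text)}

def within_words(a, b, text, distance):
    """True if a and b occur within `distance` words of each other."""
    tokens = words(text)
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= distance for i in pos_a for j in pos_b)

vocabulary = {"cat", "cot", "coat", "cut", "chest", "goat", "gout", "dog"}
docs = {
    1: "President Bush signed the bill.",
    2: "Prune the rose bush in spring.",
    3: "The tobacco lawsuit was settled.",
    4: "Tobacco growing is regulated. Separately, a zoning lawsuit was filed.",
}
bush_and_president = docs_with("bush", docs) & docs_with("president", docs)
```

With this sample data, `bush_and_president` contains only document 1, and `within_words("tobacco", "lawsuit", docs[3], 3)` is true while the same test on document 4 is false.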


Lines of Context

Even specific searches usually produce several possible documents. In addition to providing users with a list of documents that meet their search criteria, some imaging systems also display lines of context that show how each occurrence of the search word is used in each document. Lines of context help users find the appropriate document without having to view every document in the search results.

Highlighted Search Words

Once a document is selected, the search word needs to be located within it. To help with this, some imaging systems display the appropriate page of the document and highlight the search word in both the text and image. This makes it easy for the user to immediately zoom in on the relevant section instead of having to look through multiple pages of a document. The importance of this becomes obvious when the needed word occurs on page 97 in a 200-page document.

Internet/Intranet

An imaging system should provide a simple way to publish information to the Internet or an intranet. This allows organizations to share information with other departments, remote offices, clients or the public. Web systems should be fully searchable and must support the same security protocols as network systems. Ideally, an imaging system will require no HTML or complex coding to post files to the Web.

Security

An imaging system's security is critical to a successful implementation. While security may not be a primary concern for a single department installation, it becomes more important as the system is expanded to allow different departments and the public access to files. A document imaging system should provide security on multiple levels to allow each installation to use the method that best fits its needs. The system's security should parallel that of the network and be simple to administer.

Access Rights

An imaging system should let organizations control access to folders and individual documents on both group and individual levels. Using groups and inherited rights allows administrators to quickly assign viewing privileges, while individual-level security allows specific users such as managers to view documents that the rest of the group cannot. For example, a human resources staff member might be able to view the personnel files of everyone except other HR personnel, the HR supervisor could view all personnel files, and staff in other departments wouldn't be able to view any personnel files at all.

Function Rights

An imaging system should also let organizations control the function rights to folders and individual documents on both individual and group levels. While access rights control what folders and documents users can view, function rights control what actions those users can perform on a document, such as adding, editing, copying or deleting records. For example, while different departments could have viewing privileges to City Council minutes, only the City Clerk would have modification rights to those files.
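
The City Council example can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a description of any product's security model: the `Folder` class, the right names ("view", "edit", "delete") and the principal names are all hypothetical, rights are inherited from parent folders, and grants are purely additive (there is no way to deny a right, which a real system would likely need for cases such as the HR example above).

```python
from dataclasses import dataclass, field

@dataclass
class Folder:
    name: str
    parent: "Folder | None" = None
    # principal (user or group name) -> set of rights granted on this folder
    grants: dict = field(default_factory=dict)

def rights_for(user, groups, folder):
    """Collect rights granted to the user or their groups, including inherited ones."""
    rights = set()
    node = folder
    while node is not None:          # walk up the folder tree
        for principal in (user, *groups):
            rights |= node.grants.get(principal, set())
        node = node.parent
    return rights

def allowed(user, groups, folder, action):
    """Access rights cover "view"; function rights cover actions such as "edit"."""
    return action in rights_for(user, groups, folder)

# Everyone may view city records; only the clerk may modify the minutes.
records = Folder("City Records", grants={"Everyone": {"view"}})
minutes = Folder("Council Minutes", parent=records,
                 grants={"city_clerk": {"edit", "delete"}})
```

Here a planning department user in the "Everyone" group can view the minutes through the inherited grant, while only `city_clerk` holds the function rights to edit or delete them.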

Redaction

Redaction (blackout or whiteout) allows security to be controlled down to the individual word level. An imaging system should offer the ability to redact portions of a document's image and/or text. The user's ability to view redacted text would depend on their security rights. For example, crime reports may be available to multiple departments, but only the Police Department would be able to see identifying information such as name.

Audit Trails

As an additional level of security, an imaging system should offer the ability to track who is using the system, what documents are being viewed, what actions are being performed on the documents, and when these actions are being performed. Audit trail abilities are especially important when documents are confidential and when there are many different users.

Portability and CDs

Imaging systems enable users to carry important documents anywhere for convenient viewing on other computers. When people go on business trips, they often need to bring key documents with them. Carrying many paper documents is often impractical, and copying an entire database to a laptop can be impossible, so important information may be left behind. With an imaging system that supports briefcases or portable volumes, documents can be detached or copied and moved to other databases in other locations. Imaging folders containing relevant documents can be transferred to other databases quickly and easily using searchable CDs that hold up to 12,000 pages each.

[Figure: Optical disks weigh much less than paper files]

If an imaging system does not provide this level of document portability, users of the system will find it difficult to bring their documents on the road and to transfer files between different offices. Briefcases and portable volumes help users to transfer their documents to other offices, laptops or customers quickly and easily.

Briefcases

For users who have a copy of the imaging software on a laptop or at a remote office, many systems allow users to simply drag and drop the appropriate imaging system folders into a "briefcase" and transfer the briefcase to the laptop or remote system.

Portable Volumes

Portable volumes are like very, very big briefcases and allow for constant updates to shared imaging databases in different locations. This ability is useful for organizations that use a scanning bureau on an ongoing basis or for organizations with multiple branches. On many large-scale imaging systems, the document files are stored on multiple drives or network volumes. Portable volumes allow entire volumes containing document images and text to be transferred en masse to another database.

Outsourcing Scanning

Organizations sometimes find it faster or more cost-effective to have a service bureau perform their backfile document conversion or ongoing document scanning. Generally, the imaging system is maintained by the organization and the service bureau regularly delivers CDs containing the scanned documents. In addition to storing images and text information, these CDs must also carry data describing the document names, index fields, folders, etc.

If the organization has been modifying its existing documents and creating new ones during this time, it can't simply overwrite its database with the new one provided by the service bureau. Instead, the imaging system must be able to merge the new data from the service bureau with the organization's existing data. Portable volumes do this automatically.
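
To make the difference between overwriting and merging concrete, here is a minimal Python sketch. It is only an illustration: the text does not say how portable volumes resolve conflicts, so the record layout, the use of a document ID as the key, and the policy of keeping the organization's version and flagging the conflict for review are all assumptions.

```python
def merge_volume(existing, delivered):
    """Merge a delivered batch into the existing database without overwriting it.

    Both arguments map a document ID to a record (name, folder, index fields).
    Returns the merged database and a list of IDs that need manual review.
    """
    merged = dict(existing)              # never modify the live data in place
    conflicts = []
    for doc_id, record in delivered.items():
        if doc_id in merged and merged[doc_id] != record:
            # Assumed policy: keep the organization's version, flag the clash.
            conflicts.append(doc_id)
        else:
            merged[doc_id] = record
    return merged, conflicts

existing = {
    "D-001": {"name": "Permit 17", "folder": "Permits", "status": "Approved"},
    "D-002": {"name": "Minutes 03/2000", "folder": "Council"},
}
delivered = {
    "D-001": {"name": "Permit 17", "folder": "Permits", "status": "Pending"},
    "D-003": {"name": "Permit 18", "folder": "Permits", "status": "Pending"},
}
merged, conflicts = merge_volume(existing, delivered)
```

In this example the newly scanned document D-003 is added, the organization's updated copy of D-001 is preserved, and D-001 is reported as a conflict instead of being silently overwritten.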

Distributing Documents

Most organizations need to share documents with their business associates or customers. With a paper system, multiple copies would be printed, perhaps bound and then delivered. Imaging systems allow an organization to quickly copy the appropriate files to a CD and then send that, saving on printing and postage costs. To be most effective, the imaging system should allow royalty-free CD duplication and provide free imaging viewers that enable even people without an imaging system to search for and view documents on the disc.

With the popularity of the Internet and private intranets, making sure that documents can be easily found and read over the Web may be an important priority.

Disaster Recovery

Disasters can strike at any time and damage or destroy an organization's documents. To help recover from a disaster, it may be worth keeping secure backups of documents with the aid of a document imaging system. Portable volumes that allow the re-creation of a system from scratch can simplify this.


Batch Processing

Organizations that image more than a handful of files a day will quickly realize the importance of batch processing. When large numbers of documents need to be brought into the imaging system daily, it is inefficient to process each one individually. An imaging system should allow records to be brought into the system in one batch to speed up the process.

The slow start/stop system of individual document scanning or conversion can be avoided by bringing in documents as one large document batch. Once all the images have been brought in, the system should allow users to easily group them into the appropriate documents before assigning index fields and moving them to their appropriate folder locations. The system should allow pages to be rearranged, removed or added to a document to correct any mistakes that may have occurred in the organization of a file. Similarly, it should be simple to update or add index fields at a later time.

Bar Codes

In high-volume scanning operations, automatically separating and indexing documents using bar codes can save time and money. Bar codes can be used to index documents by extracting fields from an external database, by filling in fields with pre-assigned values, or by associating certain documents with a particular index template. Bar codes can even act as markers to indicate the beginning of a new document, automating document separation. While bar codes require some preparation of the database, their benefits can be enormous. For example, if 2,000 voter registrations, 500 inquiries and 2,500 pages of legislative minutes were to be scanned, bar code stickers could be placed on each document. The system could then automatically read the stickers, determine the start of each new document, assign the correct type of index template for each, and fill in template information.

Zone OCR

Organizations that process the same forms repeatedly may want to use zone OCR to save data entry time and system memory. Zone OCR saves time by automating document indexing: it reads certain regions (zones) of a document, then places the text into the appropriate index template fields. The amount of storage space needed is also reduced because only the responses that have been entered are OCRed and indexed.

To minimize errors, the system should allow the user to set a minimum percent accuracy level for OCR. If any portion of the form does not meet this standard, the system should notify the user so that a staff member can read the form and manually enter the correct field information.

Workflow

Workflow can increase the benefits of a document imaging system by routing documents to various people. While this added functionality may not be crucial during a pilot phase, it becomes more important as an office expands its system.

Workflow should automatically notify specific users of specific imaging system events, based on lists created by the system administrator. Once an event is detected, communication is established with the existing e-mail server to send e-mail notification of the event to the recipient. Workflow should also include a set of condition tables that include the use of return receipts and timed responses. If a condition table is used, the imaging server should send either a reminder message or a second message to an alternate recipient. These condition tables help to eliminate bottlenecks and streamline business processes.

An essential component in any procedural workflow system is document automation. Rules-based document workflow requires that information move through a hierarchical system with a minimal amount of outside intervention. Workflow should be able to automatically move, copy or delete documents within the imaging database based on a predetermined set of rules. The System Administrator uses the rules list to establish the routing protocols and conditions.

Every office environment consists of the ideal and the practical when it comes to office automation and work distribution. The success of any workflow suite lies not in its ability to follow the strict routing and reporting features of a fully automated system, but in its ability to handle the exceptions to the rules that arise. An effective workflow system should provide complete access to on-the-fly routing of documents and information through its folder structure and system security. Using security access as the key to systems implementation allows the System Administrator to easily modify access rights as necessary to accommodate what actually happens in a dynamic working environment.

Workflow systems should offer administrators drag and drop simplicity, a simple GUI and an easily understood folder structure. Workflow applications should also be ODBC-compliant to link the imaging database with third-party external database utilities and customized applications to create a fully functional workflow environment. As a final component, workflow must provide for comprehensive security reporting through an audit trail function.

Scalability

The scalability of a system determines how much the imaging system can grow with your organization's needs. For full scalability, a system should have the following attributes:

Support an entire enterprise's users concurrently
Store all documents in the enterprise
Robust system architecture
Store information across multiple drives or servers
Support multiple databases
Expand to the Web
Publish information to CD or DVD


System Compatibility

Compatibility is the ability of an imaging system to work with existing computer and network systems. To maximize the likelihood of compatibility with your existing systems, an imaging system should:

Work with existing operating systems, such as Novell or Windows NT servers, and Windows desktops
Communicate using popular network protocols such as IPX/SPX or TCP/IP
Have the capability to find and view documents over the Web
Use an open architecture and non-proprietary database
Use client/server architecture with client-side image compression/decompression and server-side searching and indexing to minimize traffic loads on your network
Store files in industry-standard formats

Networked Systems

In any office, documents are used to transmit information between people. For document imaging to be truly useful in an office environment, documents must be accessible to everyone who is authorized. It is important for document imaging systems to have a central repository of records, accessible from any PC. Storing documents on individual PCs impairs the flow of information between coworkers and wastes valuable time and energy. To foster collaboration, networked systems are vital.

Networked systems can also carry out certain imaging functions more efficiently than individual PCs can. For example, Optical Character Recognition (OCR) of an image requires a great deal of computing power.

Client-Server Architecture

Imaging applications consume computer resources: image files are big and the databases must track large numbers of records. Furthermore, functions such as OCR, image display and searching require extensive computing power. Client-server architecture becomes a requirement when more than a handful of people need to access documents from an imaging system. Even if an installation starts out with a single-user pilot project, it is important to ensure that the imaging system will be able to handle future growth.

With true client-server architecture, tasks such as indexing, OCR and searches are distributed between the client (the PC workstation) and the server for optimal performance. Some tasks are more efficient for the client to perform, while others are better suited to a centralized server. Where specific tasks are performed may vary from imaging system to imaging system.

On traditional file-sharing programs, data file integrity can be compromised when a workstation program is interrupted in the middle of a transaction. With client-server architecture, the client does not open data files directly, so client interruptions do not threaten data integrity.

[Figure: Example of a client/server system]

Searches can be carried out much faster on the server since the server is typically more powerful than individual workstations. However, with traditional "file-sharing" imaging systems, a copy of the database is sent over the network to the PC and the PC performs the searches. This method leads to: (a) increased network traffic if, for example, the database is 800 MB in size; and (b) search times dependent on the speed of the PC workstation. File-sharing systems may be less expensive to start with, but their limitations restrict system expansion and flexibility.

Non-proprietary File Formats

Concerns about future readability make many records managers hesitate to implement an imaging solution. With the computer industry changing so rapidly, it is hard to predict what computers will be like ten or twenty years from now. However, the need for faster retrieval and improved records management means that many offices need to find an imaging solution today.

To address these concerns, imaging systems should use non-proprietary image and text formats. As the examples of word processors show, documents saved in WordStar, old WordPerfect or even old MS Word are already difficult to read. Since each word processing company uses proprietary formats for its documents, getting the latest software to read old formats can be a frustrating or expensive task. The same applies in the imaging world.

The non-proprietary formats available for storing document information are few, but stable. ASCII has been a standard for text information since 1963 and has become a basic building block for practically every program involving text. TIFF has been used as a standard, non-proprietary graphics format since 1986. It is widely used to transmit document information by imaging systems, fax machines and software. Given the prevalence of ASCII and TIFF, system purchasers can feel comfortable that no matter what new paradigm arises in the future, the developers of the new format will have a vested interest in providing a conversion for these standards. With proprietary document formats, or when proprietary headers are used in TIFF images, there is no such assurance.


Frequently Asked Questions

General

Q. What is a document?
A. A document can be from one to several thousand pages, and can include images and/or text, plus annotations, and one template (index card).

Q. What is the standard format used to store images?
A. Black and white images are most commonly stored as standard TIFF files using CCITT Group 4 (two-dimensional) compression. Grayscale and color images are frequently stored as TIFF files with JPEG compression.

Q. Which types of desktop operating systems are usually supported?
A. Most imaging systems have client applications that can run as Windows applications on Windows 95, 98 and Windows NT. Internet/intranet systems may be able to run on additional platforms, such as Macintosh and Unix, among others.

Q. Can I edit or alter images?
A. An imaging system should not provide any facility for editing or altering images. This is important as many users consider that images should be sacrosanct and that any changes would undermine the integrity of the system. In addition, the system should provide an audit trail function to keep track of which users have accessed which documents at what times.

Q. How much disk space does an imaging system typically require?
A. With the rapid drop in prices for hard drives and optical media, it costs much less to store documents on an imaging system than with paper. A single page typically occupies around 50 KB of disk space if the image is stored in TIFF Group 4. Each gigabyte (GB) of storage space (which costs only a few dollars) will hold approximately 20,000 pages.

Q. Do imaging systems support audit trails?
A. An imaging system's audit trail product should record a user name, date, time, document name and action whenever a user accesses a database or document. Various levels of audit-trail logging detail and activity tracking should be available. The system should also support a viewer for sorting and filtering these logs.
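
The fields listed in this answer map naturally onto a simple log record. The following Python sketch is illustrative only: the class and method names are hypothetical, the log is kept in memory rather than in a database, and the `view` method stands in for the sorting and filtering viewer mentioned above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class AuditEntry:
    user: str
    timestamp: datetime   # carries both the date and the time
    document: str
    action: str           # e.g. "view", "print", "edit", "delete"

class AuditLog:
    def __init__(self):
        self._entries = []

    def record(self, user, document, action, timestamp=None):
        """Append one entry; entries are frozen so they cannot be altered later."""
        entry = AuditEntry(user, timestamp or datetime.now(), document, action)
        self._entries.append(entry)
        return entry

    def view(self, user=None, document=None, action=None, sort_by="timestamp"):
        """Filter on any combination of fields, then sort by the chosen field."""
        rows = [e for e in self._entries
                if (user is None or e.user == user)
                and (document is None or e.document == document)
                and (action is None or e.action == action)]
        return sorted(rows, key=lambda e: getattr(e, sort_by))

log = AuditLog()
log.record("jsmith", "Crime Report 114", "view", datetime(2000, 4, 3, 9, 15))
log.record("clerk", "Council Minutes", "edit", datetime(2000, 4, 3, 8, 40))
log.record("jsmith", "Council Minutes", "print", datetime(2000, 4, 2, 16, 5))
```

For example, `log.view(user="jsmith")` returns that user's two entries in chronological order, and `log.view(document="Council Minutes", sort_by="user")` lists everyone who touched the minutes.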


Scanning/Importing/
Storing

Q.What if my database is too big to fit in


one data volume?
A. A high-end imaging system will allow data
and images to be stored across multiple volumes,
with each volume residing in a different directory or on a different drive, disk array, CD or MO
disk.

Q. Which manufacturers make document imaging scanners?

A. Some of the top scanner manufacturers include Ricoh, Fujitsu, Panasonic, Bell & Howell, Canon, Hewlett-Packard, Avision, Mitsubishi, Visionshape, Kodak and BancTec. Document imaging scanners typically have document feeders and fast scan rates so that large numbers of documents can be brought in quickly.

Q. How much total RAM does imaging software require?

A. Client software generally requires 16 to 20 MB of RAM to run, with higher requirements for scanning and OCR. Most systems recommend having 64 MB or more.

Q. What are the most common hardware and software scanner interfaces?

A. Kofax Image Controls (http://www.kofax.com) provide the most popular document imaging scanner interfaces. Many scanners attach to an Adaptec SCSI card or to a Kofax image processing board. Most scanners use either TWAIN or ISIS scanner drivers to communicate with the computer.

Q. Are special display cards or monitors required?

A. Most systems work with any Windows-compatible video card and VGA (or better) monitor, and recommend that you use at least a 15" monitor with a resolution of at least 800 x 600 pixels.


Q. How can I scan checks?

A. Several manufacturers make scanners specifically designed for checks that read the magnetically encoded MICR numbers at the bottom of the check. If you do not have one of these scanners, most checks can be scanned with regular document imaging scanners and OCRed as usual, though the MICR numbers will not be read.

Q. What about color files or photographs?

A. Imaging systems should support black and white, grayscale and color images. Color files can be scanned with a color scanner or imported into an imaging system. There is a wide range of color scanners on the market. Many document imaging scanners support color and grayscale.

Q. How can I scan large format documents?

A. Several manufacturers, including Contex, Vidar, Océ and Calcomp, make scanners specifically designed for large format documents up to E-size (34" x 44") and A-0 size (33" x 46.8"). If you do not have one of these, the document can be reduced in size using a photocopier and then scanned with a normal scanner, or sent to a service bureau that has large format scanners.

Q. How can I scan double-sided documents?

A. An imaging system should provide two different ways to do this. It should support duplex scanners, which simultaneously scan both sides of a page. Also, with a simplex scanner, the user should be able to scan all the front sides, turn the stack over and scan all the back sides, and then the system should automatically collate the pages into the correct order.
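
The collation step for simplex scanning can be sketched as follows. This is a minimal illustration that assumes turning the stack over makes the back sides arrive in reverse order (last sheet first); the function name and page labels are hypothetical.

```python
def collate(fronts, backs_reversed):
    """Interleave front and back images into reading order.

    fronts:          front sides in scan order (sheet 1, 2, 3, ...)
    backs_reversed:  back sides as scanned after flipping the stack,
                     assumed to arrive last sheet first
    """
    if len(fronts) != len(backs_reversed):
        raise ValueError("front and back passes must contain the same number of sheets")
    backs = list(reversed(backs_reversed))  # restore sheet order
    pages = []
    for front, back in zip(fronts, backs):
        pages.extend([front, back])
    return pages


print(collate(["p1", "p3", "p5"], ["p6", "p4", "p2"]))
```

If a given feeder delivers the back sides in the original sheet order instead, the `reversed` call would simply be dropped.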

Q. What image resolution should I use?

A. Most imaging systems can support documents scanned at various resolutions, from 50 dpi to 600 dpi (or more) depending on your scanner. The right setting depends on the purpose and the contents of the page, but most documents are scanned in black and white at 300 dpi.

Q. Can I scan landscape and portrait pages together in one batch?

A. An imaging system should allow you to change the orientation of pages as you scan or after scanning. A well-designed system will also include an option to automatically check and correct the orientation of pages.

Q. How are skewed images handled?

A. Skewed (crooked or tilted) images can adversely affect the accuracy of the OCR process, so an imaging system should include software that recognizes skewed images and compensates for them. This is particularly important when scanning press cuttings on a flatbed scanner or when scanning documents through a worn-out or poorly designed ADF (automatic document feeder).

Q. What file formats can a versatile system import?

A. A versatile system should be able to import the files you would encounter in your office. This includes word processing files, spreadsheets and presentations as well as common image formats such as TIFF 4, TIFF 3, TIFF Raw, TIFF LZW, PCX, BMP, CALS, JPEG, GIF, PICT, PNG and EPS Preview images. An imaging system providing long term archival of documents should allow the images of each page to be stored in a non-proprietary format. For example, electronic document pages would be printed to the imaging system, black and white graphical files would be converted to TIFF Group 4 format and color/grayscale images would be converted to TIFF JPEG.

Q. What is the difference between CD or DVD jukeboxes/changers and towers?

A. In a jukebox/changer, there are more slots and disks than there are drives. Robotic mechanisms automatically place the correct disk into one of the drives when the disk is needed. In a tower, many CD or DVD drives are stacked together in a single unit, and every disk is always sitting in a drive. Towers provide faster data access but typically cost more per disk and do not hold as many disks. Jukeboxes/changers cost less per disk and can hold up to 500 disks, but are slower because swapping disks in and out of the drives is time-consuming.

Viewing/Printing/Exporting

Q. Can I view combinations of images, text and index fields side by side?

A. To allow convenient access to document information, a well-designed imaging system will allow the view screen to be configured to show the text, images, template index fields or thumbnail images.

Q. Can I open and display more than one document at a time?

A. Some imaging systems will allow you to display multiple documents, with the number of documents you can have open simultaneously limited only by the amount of memory available.

Q. How can I re-sequence pages?

A. If pages are out of order and need to be resequenced, a well-designed imaging system will allow thumbnail views of pages to be simply dragged to the required position. In the same way, individual pages can be selected and deleted, subject to appropriate security access control and privileges.

Q. Will I need a specialized imaging display?

A. No, most systems run perfectly well on standard VGA and better monitors. A 15" display using a Super VGA controller should be considered the absolute minimum practical display for an ad hoc user of the system. Frequent users should have a 17" monitor, and users who scan or review imaged documents full-time may want to consider a 19" or 21" monitor.

Q. What is the advantage of a large monitor for power users?

A. For people who use an imaging system intensively, screen size can be a critical factor. If users are to flip between pages with the ease of real paper, they must be able to view the whole page at once in a way that allows the text to be readable. If 8 1/2" x 11" pages are the dominant paper size, then a 21" monitor capable of displaying 1600 x 1200 is optimal. Using a standard 14" VGA monitor will require scrolling and panning if the image is viewed at normal size.

Q. What is important besides monitor size?

A. Screen resolution and the refresh rate of the monitor are also important. Generally, the larger a monitor is and the higher resolution it has, the harder it is to get the high refresh rate that is required for sustained viewing without screen flicker. The threshold for minimum flicker is generally considered to be a vertical refresh rate of 72 Hz on a 21" monitor. The maximum refresh rate is a function of the monitor and the graphics controller.

Q. Will I need a specialized printer for images or OCRed text?

A. Generally no. Most imaging systems support most Windows-compatible printers, but recommend that you use a laser printer with at least 4 MB of RAM. If you are using a networked system and printing high volumes of pages to a network printer, you might consider installing a separate laser printer either locally or on its own network segment to minimize network traffic.

Q. In which formats can I export documents?

A. It depends on the imaging system. Common graphical formats you may need include TIFF III, TIFF IV, TIFF Raw, BMP, GIF, CALS and JPEG.

OCR: Optical Character Recognition

Q. What is OCR?

A. OCR stands for Optical Character Recognition, which is how a computer converts words in an unsearchable scanned image to searchable text. OCR is usually necessary in order to use full-text indexing and searches, and it should be included in an imaging system. OCR engines can generally only recognize typed or laser-printed text, not handwriting.

Q. Do I have to go through and correct OCR mistakes?

A. Not if the imaging system supports fuzzy logic, which will find words even if the OCR engine made a few mistakes.

Q. How fast is the OCR process?


A. The performance of the OCR and indexing
processes is entirely dependent on factors such
as the speed and configuration of the host system as well as the contents of the image. A 133
MHz Pentium generally needs about 6 seconds
per page, while a 450 MHz Pentium II will take
about 2-3 seconds per page.

Q. What is ICR (Intelligent Character Recognition)?

A. ICR is pattern-based character recognition and is also known as Hand-Print Recognition. Handwritten text is more difficult for computers to recognize and results in higher error rates than printed text. ICR engines usually do best at recognizing constrained printing, which means block printed letters with one letter in each box. Accurate recognition of unconstrained handwriting, especially cursive handwriting, typically requires that the ICR engine be trained to recognize each user's style of writing.

Q. What is the difference between OCR and indexing?

A. OCR is the process of converting scanned images to text files. Full-text indexing is the process of taking a text file and adding each word to an index file that specifies the location of every word on every document. Well-designed imaging software can make this a fast and easy procedure, providing rapid access to any word in any document.
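
The full-text index described in the answer on OCR and indexing can be pictured as a table from each word to every place it occurs. The sketch below is a minimal in-memory version, assuming the OCRed text is already available as one string per page. The function name, the sample documents and the choice to record (document, page, word position) are illustrative assumptions, not a description of any product's index file.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word to the (document, page number, word position) of every occurrence.

    documents: {document name: [text of page 1, text of page 2, ...]}
    """
    index = defaultdict(list)
    for name, pages in documents.items():
        for page_no, text in enumerate(pages, start=1):
            words = re.findall(r"[a-z0-9]+", text.lower())
            for position, word in enumerate(words):
                index[word].append((name, page_no, position))
    return index


# Hypothetical OCR output for two small documents
docs = {
    "memo": ["Budget approved for scanner purchase", "Scanner arrives Monday"],
    "invoice": ["Invoice for one scanner"],
}
index = build_index(docs)
print(index["scanner"])
```

Looking up a word is then a single dictionary access, which is why a search can reach any word in any document without rereading the text files.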

Q. How accurate is OCR?


A. Accuracy on a freshly laser-printed page is
typically better than 99.6%. Accuracy on faxed,
dirty or degraded documents will of course be
lower, but a few imaging systems have image
clean-up technology that can improve OCR
accuracy.


Q. What is OMR (Optical Mark Recognition)?

A. OMR, also called Mark-Sense Recognition, is the recognition of marks commonly used on forms, such as check marks, circled choices, and filled-in bubbles. OMR can be an important part of an imaging system for organizations that process many standard forms. Scantron exam forms and customer survey cards are perhaps the best-known examples of OMR in action.

Q. Can OCRed text be exported and re-used in a word processor?

A. Yes, you can usually cut and paste text between the imaging system and another Windows application, or you can export complete text files (all text pages in a document) to a directory and open them with your favorite word processor.

Q. Can I manually correct OCR errors and typos?

A. Well-designed systems allow users to correct OCR errors from within the system. However, when hundreds or thousands of pages are scanned every day, it is usually not practical to have someone clean up the text. If fuzzy logic search capabilities are available, it is not necessary to correct the text as searches will typically still find misread words.

COLD: Computer Output to Laser Disc

Q. What is the difference between COLD and imaging?

A. Imaging is for scanning, compressing, storing, indexing, OCRing, searching and retrieving millions of pages of paper documents or electronic documents archived as permanent images. COLD is for archiving, indexing, searching and printing reports from huge text files generated by mainframes, mini-computers and other computer applications. COLD stores huge report files and extracted index fields on hard disk, optical cartridge or CD-ROM instead of printing all the information out on paper or storing it to microfilm.

Q. How many index fields can the COLD server extract from each report?

A. The number of index fields is usually unlimited. However, the more fields extracted from each report, the slower the extraction process will run and the larger the index files will be.


Glossary of Terms
ADF
Automatic Document Feeder. This is the means by which a scanner feeds the paper document.

Annotations
The changes or additions made to a document using sticky notes, a highlighter, or other electronic tools. Document images or text can be highlighted in different colors, redacted (blacked-out or whited-out), stamped (e.g. FAXED or CONFIDENTIAL), or have electronic sticky notes attached. Annotations should be overlaid and not change the original document.

Figure: Stamps, sticky notes and highlighting annotations

ASCII
American Standard Code for Information Interchange. Used to define computer text that was built on a set of 128 alphanumeric and control characters. ASCII has been a standard, non-proprietary text format since 1963.

Bar Code
A small pattern of vertical lines that is read by a laser or an optical scanner, and which corresponds to a record in a database. An add-on component to imaging software, this feature is designed to increase the speed with which documents can be archived.

Batch Processing
The name of the technique used to input a large amount of information in a single step, as opposed to individual processes.

Bitmap/Bitmapped
See Raster/Rasterized.

BMP
A native file format of Windows for storing images called bitmaps.

Boolean Logic
The use of the terms AND, OR and NOT in conducting searches. Used to widen or narrow the scope of a search.
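
To make the Boolean Logic entry concrete, the sketch below applies AND, OR and NOT to sets of documents. It assumes a word index that maps each term to the set of documents containing it. The function name, parameter names and sample index are hypothetical.

```python
def boolean_search(index, all_of=(), any_of=(), none_of=()):
    """Combine search terms with AND (all_of), OR (any_of) and NOT (none_of).

    index: {word: set of names of documents containing that word}
    """
    results = set().union(*index.values())        # start from every document
    for word in all_of:                            # AND narrows the search
        results &= index.get(word, set())
    if any_of:                                     # OR widens within its group
        matches = set()
        for word in any_of:
            matches |= index.get(word, set())
        results &= matches
    for word in none_of:                           # NOT excludes documents
        results -= index.get(word, set())
    return results


# Hypothetical word index
index = {
    "contract": {"A", "B", "C"},
    "lease": {"B", "C"},
    "draft": {"C"},
    "invoice": {"D"},
}

print(sorted(boolean_search(index, all_of=["contract", "lease"])))
print(sorted(boolean_search(index, all_of=["contract", "lease"], none_of=["draft"])))
print(sorted(boolean_search(index, any_of=["lease", "invoice"])))
```

Each AND or NOT term can only shrink the result set, while each additional OR term can only grow it, which is the widening and narrowing the entry refers to.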

Briefcase
A method to simplify the transport of a group
of documents from one computer to another.

Burn (CDs or DVDs)
To record or write data on a CD or DVD.

Caching (of Images)
The temporary storage of image files on a hard disk for later migration to permanent storage, like an optical or CD jukebox.

CD-ROM Drive
A computer drive that reads compact discs.

Client-Server Architecture vs. File-Sharing
Two common application software architectures found on computer networks. With file-sharing applications, all searches occur on the workstation, while the document database resides on the server. With client-server architecture, CPU-intensive processes (such as searching and indexing) are completed on the server, while image viewing and OCR occur on the client. File-sharing applications are easier to develop, but they tend to generate tremendous network data traffic in document imaging applications. They also expose the database to corruption through workstation interruptions. Client-server applications are harder to develop, but dramatically reduce network data traffic and insulate the database from workstation interruptions.

CD Publishing
An alternative to photocopying large volumes of
paper documents. This method involves coupling image and text documents with viewer
software on CDs. Sometimes search software is
included on the CDs to enhance search capabilities.

CD-R
Short for CD-Recordable. This is a CD which can be written (or recorded) only once. It can be copied to distribute a large amount of data. CD-Rs can be read on any CD-ROM drive, whether on a standalone computer or a network system. This makes interchange between systems easier.

COLD
Computer Output to Laser Disk. A computer
programming process that outputs electronic
records and printed reports to laser disk instead
of a printer. Can be used to replace COM
(Computer Output to Microfilm) or printed
reports such as green-bar.

CD-ROM
Compact Disc Read-Only Memory. An optical disc storage medium popular for storing computer files as well as digitally recorded music. CD-ROMs are produced on a large scale rather than written on a standard computer CD burner (CD writer).


COM
Computer Output to Microfilm. A process that outputs electronic records and computer generated reports to microfilm.

Dithering
The process of converting grays to different densities of black dots, usually for the purposes of printing or storing color or grayscale images as black and white images.

Compression Ratio
The ratio of the size of an uncompressed file to the size of the same file after compression, e.g., with a 20:1 compression ratio, an uncompressed file of 1 MB is compressed to 50 KB.
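
The arithmetic behind the compression ratio can be checked with a short calculation. The sketch below assumes one bit per pixel for a black and white page, decimal units (1 MB = 1,000,000 bytes) and a flat 20:1 ratio. Real ratios vary with page content, and all function names are hypothetical.

```python
def raw_bitonal_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a black and white page at 1 bit per pixel."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels + 7) // 8


def compressed_bytes(uncompressed, ratio):
    """Size after compression at the given ratio (20 means 20:1)."""
    return uncompressed / ratio


def pages_per_gigabyte(page_bytes, gigabyte=1_000_000_000):
    """Whole pages of the given size that fit in one gigabyte."""
    return int(gigabyte // page_bytes)


letter_page = raw_bitonal_bytes(8.5, 11, 300)      # a letter-size page at 300 dpi
print(letter_page)                                  # roughly 1 MB uncompressed
print(compressed_bytes(1_000_000, 20))              # the 1 MB to 50 KB example
print(pages_per_gigabyte(50_000))                   # pages per GB at 50 KB each
print(pages_per_gigabyte(compressed_bytes(letter_page, 20)))
```

The last figure comes out somewhat lower than the round number obtained from 50 KB pages because a letter-size page at 300 dpi is slightly larger than 1 MB before compression.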

CPU
Central Processing Unit. The brain of the computer.

De-shading
Removing shaded areas to render images more easily recognizable by OCR. De-shading software typically searches for areas with a regular pattern of tiny dots.

Document Imaging
Software used to store, manage, retrieve and distribute documents quickly and easily on the computer.

Drag-and-Drop
The movement of on-screen objects by dragging them across the screen with the mouse.

Duplex Scanners vs. Double-Sided Scanning
Duplex scanners automatically scan both sides of a double-sided page, producing two images at once. Double-sided scanning uses a single-sided scanner to scan double-sided pages, scanning one collated stack of paper, then flipping it over and scanning the other side.

De-skewing
The process of straightening skewed (off-center) images. De-skewing is one of the image
enhancements that can improve OCR accuracy.
Documents often become skewed when they are
scanned or faxed.

DVD
Digital Video Disc or Digital Versatile Disc. A
plastic disc, like a CD, on which data can be
written and read. DVDs are faster, can hold
more information, and can support more data
formats than CDs.

De-speckling
Removing isolated speckles from an image file.
Speckles often develop when a document is
scanned or faxed.

Electronic Document Management
Imaging software that helps manage electronic documents.

Erasable Optical Drive
A type of optical drive that uses erasable optical discs.

Full-text Indexing and Search
Enables the retrieval of documents by either their word or phrase content. Every word in the document is indexed into a master word list with pointers to the documents and pages where each occurrence of the word appears.

Fuzzy Logic
A full-text search procedure that looks for exact
matches as well as similarities to the search criteria, in order to compensate for spelling or
OCR errors.
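
As an illustration of how a similarity search can compensate for OCR errors, the sketch below uses Python's standard `difflib.get_close_matches`. This is a stand-in technique chosen for brevity, not the matching method of any particular imaging system. The function name, the word list and the 0.8 cutoff are assumptions.

```python
import difflib


def fuzzy_find(term, indexed_words, cutoff=0.8):
    """Return indexed words that match the term exactly or closely, best match first."""
    return difflib.get_close_matches(term.lower(), indexed_words, n=5, cutoff=cutoff)


# Hypothetical word list from OCRed pages; two entries are misreadings of "invoice"
words = ["invoice", "lnvoice", "involce", "payment", "scanner"]
print(fuzzy_find("invoice", words))
```

A lower cutoff finds more misread words but also returns more false matches, so the threshold is a trade-off rather than a fixed value.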

Flatbed Scanner
A flat-surface scanner that allows users to input
books and other documents.

GIF
CompuServe's native file format for storing
images.

Folder Browser
A system of on-screen folders (usually hierarchical or stacked) used to organize documents. For example, the File Manager program
in Microsoft Windows is a type of folder browser that displays the directories on your disk.

Gigabyte
One billion bytes. Also expressed as one thousand megabytes. In terms of image storage capacity, one gigabyte equals approximately 17,000 8 1/2" x 11" pages scanned at 300 dpi, stored as TIFF Group IV images.

Forms Processing
A specialized imaging application designed for
handling pre-printed forms. Forms processing
systems often use high-end (or multiple) OCR
engines and elaborate data validation routines
to extract hand-written or poor quality print
from forms that go into a database. This type of
imaging application faces major challenges, since
many of the documents scanned were never
designed for imaging or OCR.

Grayscale
See Scale-to-Gray.


Internet Publishing
Specialized imaging software that allows large volumes of paper documents to be published on the Internet or intranet. These files can be made available to other departments, offsite colleagues or the public for searching, viewing and printing.

Hierarchical Storage Management (HSM)
Software that automatically migrates files from on-line to near-line storage media, usually on the basis of the age or frequency of use of the files.
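
The migration rule in the Hierarchical Storage Management entry can be sketched as a simple age test. The example below assumes the system records the date each file was last opened and uses a 90-day idle limit. Both the limit and the names are hypothetical, and a real HSM product would also weigh frequency of use.

```python
from datetime import date, timedelta


def files_to_migrate(last_access, today, max_idle_days=90):
    """Return names of files idle longer than the limit, longest-idle first.

    last_access: {file name: date the file was last opened}
    """
    cutoff = today - timedelta(days=max_idle_days)
    stale = [name for name, opened in last_access.items() if opened < cutoff]
    return sorted(stale, key=lambda name: last_access[name])


# Hypothetical access records for files held in on-line storage
last_access = {
    "a.tif": date(1999, 6, 1),
    "b.tif": date(2000, 3, 15),
    "c.tif": date(1999, 12, 1),
}
print(files_to_migrate(last_access, today=date(2000, 4, 1)))
```

Files returned by the function would be candidates for moving from on-line to near-line media; recently used files stay where they are.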

ICR
Intelligent Character Recognition. A software
process that recognizes handwritten and printed text as alphanumeric characters.

Image Enabling
A software function that creates links between existing applications and stored images.

Image Processing Card (IPC)
A board mounted in either the computer, scanner or printer that facilitates the acquisition and display of images. The primary function of most IPCs is the rapid compression and decompression of image files.

Index Fields
Database fields used to categorize and organize documents. Often user-defined, these fields can be used for searches.

IPX/SPX
Communications protocol used by Novell networks.

ISIS and TWAIN Scanner Drivers
Specialized applications used for communication between scanners and computers.

ISO 9660 CD Format
The International Organization for Standardization (ISO) format for creating CD-ROMs that can be read worldwide.

JPEG
An image compression format used for storing color photographs and images.

Jukebox
A mass storage device that holds optical disks
and loads them into a drive.


Key Field
Database fields used for document searches and retrieval. Synonymous with index field.

Magneto-Optical Drive
A drive that combines laser and magnetic technology to create high-capacity erasable storage.

MAPI
Messaging Application Programming Interface. This Windows software standard has become a popular e-mail interface and is used by MS Exchange, GroupWise, and other e-mail packages.

NT
New Technology. Refers to Microsoft Windows NT server and workstation software.

OCR
Optical Character Recognition. A software process that recognizes printed text as alphanumeric characters.

Off-Line
Archival documents stored on optical disks or compact disks that are not connected or installed in the computer, but instead require human intervention to be accessed.

Near-Line
Documents stored on optical disks or compact
disks that are housed in the jukebox or CD
changer and can be retrieved without human
intervention.

NetWare Loadable Module (NLM)
An application that runs as part of the network operating system (NOS) of a Novell NetWare server.

On-Line
Documents stored on the hard drive or magnetic disk of a computer that are available immediately.

Optical Disks
Computer media similar to a compact disc that cannot be rewritten. An optical drive uses a laser to read the stored data.


Optical Jukebox
See Jukebox.

Phase Change
A method of storing information on rewritable optical disks.

Raster/Rasterized (Raster or Bitmap Drawing)
A method of representing an image with a grid (or map) of dots or pixels. Typical raster file formats are GIF, JPEG, TIFF, PCX, BMP, etc.

Region (of an image)
An area of an image file that is selected for specialized processing. Also called a zone.

Pixel
Picture Element. A single dot in an image. It can
be black and white, grayscale or color.

Scale-to-Gray
An option to display a black and white image
file in an enhanced mode, making it easier to
view. A scale-to-gray display uses gray shading
to fill in gaps or jumps (known as aliasing) that
occur when displaying an image file on a computer screen. Also known as grayscale.

Portable Volumes
A feature that facilitates the moving of large volumes of documents without requiring copying
multiple files. Portable volumes enable individual CDs to be easily regrouped, detached and
reattached to different databases for a broader
information exchange.

Scalability
The capacity of a system to expand without
requiring major reconfiguration or re-entry of
data. Multiple servers or additional storage can
be easily added.

RAID
Redundant Array of Independent Disks. A collection of hard disks that act as a single unit. Files on RAID drives can be duplicated (mirrored) to preserve data. RAID systems vary in their level of redundancy: level 0 provides no redundancy, level 1 uses two disks that mirror each other, and so on up to level 5, the most common.

Scanner
An input device commonly used to convert
paper documents into computer images.
Scanner devices are also available to scan microfilm and microfiche.


SCSI
Small Computer Systems Interface. Pronounced "skuzzy." A standard for attaching peripherals (notably mass storage devices and scanners) to computers. SCSI allows for up to 7 devices to be attached in a chain via cables. The current SCSI standard is SCSI II, also known as Fast SCSI.

SCSI Scanner Interface
The device used to connect a scanner with a computer.

SQL
Structured Query Language. The popular standard for running database searches (queries) and reports.

TCP/IP
Network communications protocol. This is the protocol used by the Internet.

TIFF
Tagged Image File Format. A non-proprietary raster graphics image format that has many different compression formats. TIFF has been in use since 1986.

TIFF Group III (compression)
A one-dimensional compression format for storing black and white images that is utilized by most fax machines.

TIFF Group IV (compression)
A two-dimensional compression format for storing black and white images. Typically compresses at a 20-to-1 ratio for standard business documents.

Video Scanner Interface
A type of device used to connect scanners with computers. Scanners with this interface require a scanner control board designed by Kofax, Xionics or Dunord.

Templates, Document
Sets of index fields for documents.
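
One way to picture a document template is as a database table whose columns are the index fields, searched with SQL. The sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table layout, field names and sample rows are hypothetical and are not the schema of any imaging product.

```python
import sqlite3

# In-memory database standing in for an imaging system's index store
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (name TEXT, doc_type TEXT, customer TEXT, doc_date TEXT)"
)
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?, ?)",
    [
        ("INV-1042", "Invoice", "Acme", "2000-03-01"),
        ("INV-1077", "Invoice", "Acme", "2000-02-10"),
        ("CON-0007", "Contract", "Acme", "1999-11-20"),
        ("INV-1100", "Invoice", "Globex", "2000-03-15"),
    ],
)


def find_documents(conn, doc_type, customer):
    """Return names of documents whose index fields match, oldest first."""
    rows = conn.execute(
        "SELECT name FROM documents WHERE doc_type = ? AND customer = ? "
        "ORDER BY doc_date",
        (doc_type, customer),
    )
    return [row[0] for row in rows]


print(find_documents(conn, "Invoice", "Acme"))
```

Each template would typically have its own set of fields; the query parameters are passed separately from the SQL text so that user input is never pasted into the statement.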

Workflow, Ad Hoc
A simple manual process by which documents
can be moved around a multi-user imaging system on an as-needed basis.

Thumbnails
Small versions of an image used for quick
overviews or to get a general idea of what an
image looks like.


Workflow, Rule-Based
A programmed series of automated steps that
route documents to various users on a multiuser imaging system.

WORM Disks
Write Once Read Many Disks. A popular archival storage medium during the 1980s.
Acknowledged as the first optical disks, they are
primarily used to store archives of data that
cannot be altered. WORM disks are created by
standalone PCs and cannot be used on the network, unlike CD-Rs.

ZIP
A common file compression format that allows
quick and easy storage for transport.

Zone OCR
An add-on feature of the imaging software that
populates document templates by reading certain regions or zones of a document, and then
placing the text into a document index field.


About the Author

LaserFiche created this guide to provide a comprehensive overview of document imaging and management. We saw the need for a resource that was suitable for people just learning about imaging, those preparing to purchase a system, and for all of those in between.

We hold the philosophy that our customers are intelligent, responsible people who should get the documents they need as easily as possible. This means we try to make sure our software is a pleasure to use and adapts to existing procedures.

Since 1987, LaserFiche has pioneered high-volume document storage and retrieval systems. LaserFiche document imaging software helps manage documents in over 15,000 school districts, law offices, insurance companies and other business installations around the world, including over a thousand municipal, state and federal government agencies.

LaserFiche's primary resource is a group of exceptionally competent and experienced professionals who are well-rounded in both the theoretical and practical aspects of computers and office automation. Our analysts and programmers have held significant professional positions in business and industry before entering the field of program development, consulting and custom computer solutions. To each software project, LaserFiche strives to bring:

Objectivity competently and impartially maintained.

Experience in analysis and solution of similar problems for other organizations.

Understanding of the complexities and interrelationships of people.

Time to concentrate without interruption upon solving the problem at hand.

The experience of working directly with records management professionals has taught LaserFiche a great deal about what works, what works better, and what works best. Unlike companies whose only expertise is in technology, LaserFiche combines its knowledge of working systems with the capabilities of technology to create better working environments.

LaserFiche is a division of Compulink Management Center, Inc. Compulink is a certified WBE and MBE, and has a successful track record in assisting organizations in establishing electronic document management systems. LaserFiche welcomes the opportunity to answer in detail any questions about document imaging and to demonstrate the LaserFiche system.

LaserFiche Document Imaging
20000 Mariner Avenue
Torrance, CA 90503
(310) 793-1888
(800) 985-8533
(310) 793-8531 fax
www.laserfiche.com