Document Scanning: A Primer

What Does a Scanner Do?

A scanner is one of the first acquisitions you will make on the road to
a paperless office. It is a device used to convert paper documents into
digital files. Essentially, a scanner takes a digital picture
of each page of a paper document. That picture is an exact
duplicate of the source document. The scanner does not
differentiate between printed text, photographs, handwriting
or other marks. When you scan a pleading or a contract, the
scanner will produce an exact image of the actual paper
document as it existed, including printed text and any other
markings.

Scanners will also scan
photographs, newspaper articles, handwritten notes and any
other document that can be duplicated on your copier. In fact,
your copier uses the same process as a scanner. It makes an
image of the document you want to copy and then prints the
image on a piece of paper. Your fax machine does almost the
same thing. It makes an image of the document you want to fax,
sends that image electronically over the phone lines to the
receiving fax machine, which then prints the electronic image
on paper. Your printer is really a scanner in reverse. It
takes an electronic file on your computer and prints it on
paper. The technology involved in all of these devices is
essentially the same.

Types of Scanners

Scanners have developed to
perform two different functions: scanning photographs or
other graphic images, and scanning documents. Photographic
scanners or "flatbed scanners" have been engineered primarily
to scan and save a single image at a time, usually in color
and at high resolutions. Document scanners or "sheet feed
scanners," on the other hand, are engineered for the scanning
of documents. Sheet feed scanners are designed with integrated
automatic document feeders ("ADF"), operate at lower
resolutions than their photo scanner cousins and typically
scan only in black and white. Generally, consumer quality
flatbed scanners are less expensive than document scanners
that are designed for business use. Recently, flatbed scanner
manufacturers have added small automatic document feeders and
are marketing the machines as inexpensive document scanners.
However, most of these scanners are not well suited for even
small document imaging applications.

Photo scanners and so-called
"three-in-one" machines are not well suited for serious office
use. A flatbed scanner without an ADF would require that you
individually scan each page of a multi-page document and
combine the resulting separate files into a single document.
Even if the flatbed scanner has an ADF, these machines will
be slow (five pages per minute), lack memory and have ADFs
whose capacity is too small to be usable.

Instead, you will want to invest in
a good quality document scanner. Document scanners operate at
a minimum of 15 pages per minute and always have an integrated
automatic document feeder. The ADF will typically hold at
least 50 pages. Scanners operating at speeds of 20, 40, 50 or
more pages per minute will be an excellent choice in the law
office environment and will speed the imaging of paper
documents and the flow of paper out of the office.

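To see why rated speed and feeder capacity matter, here is a rough sketch that estimates how long a scanning job takes. The function name, the 500-page job, the 10-page feeder on the slow machine and the 30-second pause to reload the feeder are all illustrative assumptions, not manufacturer figures.

```python
import math


def scan_minutes(pages, pages_per_minute, adf_capacity, reload_seconds=30):
    """Rough time in minutes to scan `pages` single-sided sheets.

    reload_seconds is an assumed pause each time the feeder is refilled.
    """
    batches = math.ceil(pages / adf_capacity)
    feed_time = pages / pages_per_minute
    # No reload is needed before the first batch.
    reload_time = max(batches - 1, 0) * reload_seconds / 60
    return feed_time + reload_time


# Hypothetical 500-page file on three assumed machine profiles.
profiles = [
    ("5 ppm, 10-page ADF", 5, 10),
    ("15 ppm, 50-page ADF", 15, 50),
    ("40 ppm, 50-page ADF", 40, 50),
]
for label, ppm, adf in profiles:
    print(f"{label}: about {scan_minutes(500, ppm, adf):.0f} minutes")
```

Under these assumptions the slow machine needs roughly two hours for the file, while the 40 page per minute scanner finishes in well under half an hour.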
Scanners are either simplex or
duplex. Simplex scanners are able to scan only one side of the
document at a time. Duplex scanners can scan both sides of a
double-sided document simultaneously. If you need to scan a
two-sided document with a simplex scanner, look for scanning
software that provides the option of automatically
interleaving alternating pages when you scan one side of the
document and then turn it over and scan the other. If you
frequently scan two-sided documents, look for a duplex
scanner.

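The interleaving step can be sketched as follows. This is a minimal illustration, not any particular product's feature. It assumes the first pass captures the front sides in order and that, because the whole stack is turned over, the second pass captures the back sides in reverse order. The function name and page labels are hypothetical.

```python
def interleave_simplex_passes(fronts, backs, backs_reversed=True):
    """Merge two simplex passes into one correctly ordered page list.

    fronts: pages from the first pass (front sides, in order).
    backs: pages from the second pass (back sides).
    backs_reversed: True when flipping the stack reversed the back sides.
    """
    if len(fronts) != len(backs):
        raise ValueError("both passes must contain the same number of pages")
    if backs_reversed:
        backs = list(reversed(backs))
    merged = []
    for front, back in zip(fronts, backs):
        merged.extend([front, back])
    return merged


fronts = ["p1", "p3", "p5"]   # first pass
backs = ["p6", "p4", "p2"]    # second pass, after turning the stack over
print(interleave_simplex_passes(fronts, backs))
# ['p1', 'p2', 'p3', 'p4', 'p5', 'p6']
```

The length check matters in practice: if the feeder pulls two sheets at once during either pass, every later page would be paired with the wrong side.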
Scanners range in price from $750 to
$25,000. Manufacturers include Fujitsu, Panasonic, Canon,
Ricoh, Kodak and Bell & Howell. The more expensive
scanners are faster, rated for more pages per day (referred
to as the duty cycle), more ruggedly built and have more features.
Bell & Howell and Kodak are the premier machines in the
industry. Fujitsu is a value leader. Panasonic and Ricoh are
solid machines at fair prices. For light office use, the
Fujitsu 15c is a good value at $750. As a benchmark, you
should expect to pay about $1,000 per lawyer for a scanner.
For example, in an office of five lawyers, expect to pay
$5,000 for a scanner. The more document intensive your
practice, the more scanner you will need. Attached is a list
of scanners from various manufacturers, showing
prices and basic features. We have not used the website and do
not endorse it.

Digital copiers can do the
traditional work of a copier with the added capability of
scanning a document. Typically, the digital copier must be
equipped with the necessary hardware and software to attach
the scanned file to an e-mail message and send it to users of
the office network. Most digital copiers are useful for
scanning a few pages at a time and as a secondary or backup
scanner. However, with the exception of very large and
expensive machines, digital copiers are not a good choice for
your primary scanner because of their limited ADF capacity and
slow processing speed. All of the major copier manufacturers
sell digital copiers and most have the ability to add the
hardware and software needed for scanning.

Digital Senders. Hewlett-Packard manufactures a line of scanners that
it calls "digital senders." A digital sender is a network
device that scans a document, converts it to a "PDF" or
"TIF" file, attaches the scanned image to an e-mail and
sends the e-mail to any chosen recipient. The digital sender
also doubles as a fax machine. The H-P digital senders range
in price from $1,300 to $3,200. At the lower end, expect
speeds of four pages per minute and a 25 page ADF. At the
upper end, expect fifteen pages per minute and a 50 page ADF.
For more information, see
www.digitalsender.hp.com.

Scanning and Optical Character Recognition

When you scan
a paper document, the electronic file that's produced is just
a picture of the document that was scanned. To convert that
electronic image to text that you can search and copy, you
need to first use an Optical Character Recognition ("OCR")
program. These programs do almost the same thing that you do
when you read. They analyze the image and attempt to recognize
those parts of the image that are letters, combine the letters
into words and the words into paragraphs. Handwriting generally
cannot be OCR'd reliably because the program has no dependable way of
identifying the squiggles as particular letters.

Once the OCR program has completed
its work, the image file has been converted to searchable text
that can be copied and pasted into any other text-based
application, including word processors, spreadsheets and
e-mail. Some OCR programs alter the original image file so
that only the OCR'd text is now visible. Others create a
second file of searchable text. Still others display the image
file and hide the searchable text underneath, but in the same
electronic file.

OCR programs are not perfect.
Their accuracy - the degree to which they are able to
correctly "read" the graphical image and convert it into text
- depends to a large extent on the quality of the document
that was scanned and the size and type of font. In addition,
different OCR programs achieve different levels of accuracy.
Using a first-generation document with a standard font such as
Times New Roman, OCR accuracy of between 95% and 98% should be
achieved. Keep in mind what that means: if your double-spaced document
has 400 words per page, at 97% accuracy you'll still have
twelve errors per page.

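The arithmetic behind that figure can be sketched in a few lines. This assumes, as the example above does, that the accuracy percentage applies to each word; the function name is hypothetical.

```python
def expected_errors_per_page(words_per_page, accuracy):
    """Expected number of misread words, treating accuracy as a per-word rate."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_page * (1.0 - accuracy)


# The 95% to 98% range quoted above, for a 400-word page.
for accuracy in (0.95, 0.97, 0.98):
    errors = expected_errors_per_page(400, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors:.0f} errors per page")
# 95% accuracy: about 20 errors per page
# 97% accuracy: about 12 errors per page
# 98% accuracy: about 8 errors per page
```

Even at the top of the quoted range, a 50-page document would carry roughly 400 errors, which is why OCR'd text meant for editing needs proofreading.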
When Should OCR Be Used?

Scanned documents are typically
OCR'd in two circumstances. The first is when you want to
convert a document from a graphic image into a word processing
document for editing. For example, if someone sends you a
paper copy of a contract for review, you can OCR the contract
and then make the changes you desire in Word or WordPerfect.
The second situation in which OCR is useful is when you want
the text of the document to be searchable. That might be the
case, for example, with trial exhibits or deposition
transcripts.

Scanning

Your scanner will have a variety
of different settings that can be adjusted. One setting
affects the resolution of the scanned image. DPI refers to
dots per inch and is a measure of the scanning resolution.
Generally, scanning documents at between 200 and 300 DPI
produces adequate image quality while keeping the size of the
image files manageable. Increasing the scanning resolution
above 300 DPI will significantly increase the size of the
image files without much increase in image quality. Scanning
at less than 200 DPI normally produces images of unacceptable
quality.

Another scanner setting that you
need to be aware of is whether the document is being scanned
in black and white, grayscale or in color. For most purposes,
scanning documents in black and white rather than color is
preferable. Scanning documents in color will significantly
increase the size of your image files. This is true even if
the document contains only black and white text.

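A rough way to see how the resolution and color settings drive file size is to compute the raw, uncompressed size of one page. The sketch below assumes a letter-size page and the common bit depths of 1 bit per pixel for black and white, 8 for grayscale and 24 for color. Real scan files are compressed and will be much smaller, so only the relative growth is meant to be suggestive. The names are hypothetical.

```python
# Assumed bit depths for the three scan modes.
BITS_PER_PIXEL = {"black_and_white": 1, "grayscale": 8, "color": 24}


def uncompressed_bytes(dpi, mode, width_in=8.5, height_in=11.0):
    """Raw (uncompressed) size in bytes of one scanned page."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


for dpi in (200, 300, 600):
    for mode in BITS_PER_PIXEL:
        size_mb = uncompressed_bytes(dpi, mode) / 1_000_000
        print(f"{dpi} DPI, {mode}: {size_mb:.1f} MB per page")
```

Doubling the resolution quadruples the pixel count, and under these assumptions a color scan carries 24 times the raw data of a black and white scan of the same page, whether or not the page contains any color.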
Conclusion

Assess your needs and go get a
scanner now. You can then begin converting your paper into
digital files that take up no physical space, can always be found, can
be read by multiple users simultaneously and can be sent
by e-mail or transported on a laptop or optical
disk.