Document Scanning: A Primer

What Does a Scanner Do?

       A scanner, a device that converts paper documents into digital files, is one of your first acquisitions on the road to paperlessness. Essentially, a scanner takes a digital picture of each page of a paper document. That picture is an exact duplicate of the source document: the scanner does not differentiate between printed text, photographs, handwriting or other marks. When you scan a pleading or a contract, the scanner will produce an exact image of the actual paper document as it existed, including printed text and any other markings.

     Scanners will also scan photographs, newspaper articles, handwritten notes and any other document that can be duplicated on your copier. In fact, your copier uses the same process as a scanner. It makes an image of the document you want to copy and then prints the image on a piece of paper. Your fax machine does almost the same thing. It makes an image of the document you want to fax and sends that image electronically over the phone lines to the receiving fax machine, which then prints the electronic image on paper. Your printer is really a scanner in reverse. It takes an electronic file on your computer and prints it on paper. The technology involved in all of these devices is essentially the same.

Types of Scanners

     Scanners have developed to perform two different functions: scanning of photographs or other graphic images, and scanning of documents. Photographic scanners or "flatbed scanners" have been engineered primarily to scan and save a single image at a time, usually in color and at high resolutions. Document scanners or "sheet feed scanners," on the other hand, are engineered for the scanning of documents. Sheet feed scanners are designed with integrated automatic document feeders ("ADF"), operate at lower resolutions than their photo scanner cousins and typically scan only in black and white. Generally, consumer quality flatbed scanners are less expensive than document scanners that are designed for business use. Recently, flatbed scanner manufacturers have added small automatic document feeders and are marketing the machines as inexpensive document scanners. However, most of these scanners are not well suited for even small document imaging applications.

    Photo scanners and so-called "three-in-one" machines are not well suited for serious office use. A flatbed scanner without an ADF would require that you individually scan each page of a multi-page document and combine the resulting separate files into a single document. Even if the flatbed scanner has an ADF, these machines will be slow (five pages per minute) and will lack memory, and the capacity of their ADFs will be too small to be usable.

    Instead, you will want to invest in a good quality document scanner. Document scanners operate at a minimum of 15 pages per minute and always have an integrated automatic document feeder. The ADF will typically hold at least 50 pages. Scanners operating at speeds of 20, 40, 50 or more pages per minute will be an excellent choice in the law office environment and will speed the imaging of paper documents and the flow of paper out of the office.
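
    To see what these speeds mean in practice, the arithmetic can be sketched in a few lines of Python. This is only a rough illustration: the function name, the 500-page file and the 25-page ADF on the slower machine are assumptions made for the example, and the estimate counts feed time only, not the time spent reloading the feeder or saving files.

```python
import math
from typing import Tuple


def scan_job_estimate(pages: int, pages_per_minute: float, adf_capacity: int) -> Tuple[float, int]:
    """Return (feed time in minutes, number of ADF loads) for a scan job.

    Hypothetical helper for illustration only. It ignores reload time,
    warm-up and file saving, so real jobs will take longer.
    """
    if pages < 0 or pages_per_minute <= 0 or adf_capacity <= 0:
        raise ValueError("pages must be >= 0; speed and ADF capacity must be > 0")
    minutes = pages / pages_per_minute
    loads = math.ceil(pages / adf_capacity)
    return minutes, loads


# A hypothetical 500-page file on a flatbed with a small ADF
# (five pages per minute; the 25-page feeder is an assumption).
slow_minutes, slow_loads = scan_job_estimate(500, 5, 25)

# The same file on a document scanner (20 pages per minute, 50-page ADF).
fast_minutes, fast_loads = scan_job_estimate(500, 20, 50)

print(f"Flatbed with ADF: {slow_minutes:.0f} minutes, {slow_loads} loads")
print(f"Document scanner: {fast_minutes:.0f} minutes, {fast_loads} loads")
```

    Under these assumptions the slower machine needs 100 minutes and 20 trips to the feeder, while the document scanner needs 25 minutes and 10 loads.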

    Scanners are either simplex or duplex. Simplex scanners are able to scan only one side of the document at a time. Duplex scanners can scan both sides of a double-sided document simultaneously. If you need to scan a two-sided document with a simplex scanner, look for scanning software that provides the option of automatically interleaving alternating pages when you scan one side of the document and then turn it over and scan the other. If you frequently scan two-sided documents, look for a duplex scanner.
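
    The interleaving step described above can be sketched as follows. This is a minimal Python illustration, not the method used by any particular scanning program. It assumes that turning the stack over feeds the back sides in reverse order (last sheet first), which is common but depends on how your scanner feeds paper; the function and parameter names are hypothetical.

```python
def interleave_sides(fronts, backs, backs_reversed=True):
    """Merge front-side and back-side scans into reading order.

    fronts: front-side pages in the order scanned (pages 1, 3, 5, ...)
    backs:  back-side pages in the order scanned
    backs_reversed: True if the stack was flipped as a whole, so the
        backs arrive last sheet first (an assumption; check your scanner)
    """
    if len(fronts) != len(backs):
        raise ValueError("front and back passes must contain the same number of pages")
    if backs_reversed:
        backs = list(reversed(backs))
    merged = []
    for front, back in zip(fronts, backs):
        merged.append(front)
        merged.append(back)
    return merged


# Three double-sided sheets: fronts scanned first, then the flipped stack.
first_pass = ["page 1", "page 3", "page 5"]
second_pass = ["page 6", "page 4", "page 2"]
print(interleave_sides(first_pass, second_pass))
```

    If a sheet misfeeds during either pass, the two passes will not match in length and the pages cannot be paired reliably, which is one more reason to prefer a duplex scanner for frequent two-sided work.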

    Scanners range in price from $750 to $25,000. Manufacturers include Fujitsu, Panasonic, Canon, Ricoh, Kodak and Bell & Howell. The more expensive scanners are faster, rated for more pages per day (referred to as duty cycle), more ruggedly built and have more features. Bell & Howell and Kodak are the premier machines in the industry. Fujitsu is a value leader. Panasonic and Ricoh are solid machines at fair prices. For light office use, the Fujitsu 15c is a good value at $750. As a benchmark, you should expect to pay about $1,000 per lawyer for a scanner. For example, in an office of five lawyers, expect to pay $5,000 for a scanner. The more document intensive your practice, the more scanner you will need. Attached is a list of scanners from various manufacturers, showing prices and basic features. We have not used the website and do not endorse it.

    Digital copiers can do the traditional work of a copier with the added capability of scanning a document. Typically, the digital copier must be equipped with the necessary hardware and software to attach the scanned file to an e-mail message and send it to users of the office network. Most digital copiers are useful for scanning a few pages at a time and as a secondary or backup scanner. However, with the exception of very large and expensive machines, digital copiers are not a good choice for your primary scanner because of their limited ADF capacity and slow processing speed. All of the major copier manufacturers sell digital copiers and most have the ability to add the hardware and software needed for scanning.

    Digital Senders. Hewlett-Packard manufactures a line of scanners that it calls "digital senders." A digital sender is a network device that scans a document, converts the paper to a "PDF" or "TIF" file format, attaches the scanned image to an e-mail and sends the e-mail to any chosen recipient. The digital sender also doubles as a fax machine. The H-P digital senders range in price from $1,300 to $3,200. At the lower end, expect speeds of four pages per minute and a 25 page ADF. At the upper end, expect fifteen pages per minute and a 50 page ADF. For more information, see www.digitalsender.hp.com.

Scanning and Optical Character Recognition

    When you scan a paper document, the electronic file that's produced is just a picture of the document that was scanned. To convert that electronic image to text that you can search and copy, you need to first use an Optical Character Recognition ("OCR") program. These programs do almost the same thing that you do when you read. They analyze the image and attempt to recognize those parts of the image that are letters, combine the letters into words and the words into paragraphs. Handwriting cannot be OCR'd because the program has no way of identifying the squiggles as particular letters.

     Once the OCR program has completed its work, the image file has been converted to searchable text that can be copied and pasted into any other text-based application, including word processors, spreadsheets and e-mail. Some OCR programs alter the original image file so that only the OCR'd text is now visible. Others create a second file of searchable text. Still others display the image file and hide the searchable text underneath, but in the same electronic file.

     OCR programs are not perfect. Their accuracy - the degree to which they are able to correctly "read" the graphical image and convert it into text - depends to a large extent on the quality of the document that was scanned and the size and type of font. In addition, different OCR programs achieve different levels of accuracy. Using a first-generation document with a standard font such as Times New Roman, OCR accuracy of between 95% and 98% should be achieved. Keep in mind what that means: if your double-spaced document has 400 words per page, at 97% accuracy you will still have twelve errors per page.
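
    The error arithmetic is simple enough to check for your own documents. The short Python sketch below follows the example in the text and treats the accuracy figure as applying per word, which is an assumption; some OCR vendors quote accuracy per character, which would produce more errors per page. The function name is hypothetical.

```python
def expected_ocr_errors(words_per_page: int, accuracy: float) -> float:
    """Estimate misread words per page for a given word-level accuracy.

    accuracy is a fraction between 0 and 1 (0.97 means 97%).
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    return words_per_page * (1.0 - accuracy)


# The 400-word page from the text, across the 95% to 98% range.
for accuracy in (0.95, 0.97, 0.98):
    errors = expected_ocr_errors(400, accuracy)
    print(f"{accuracy:.0%} accuracy: about {errors:.0f} errors per page")
```

    On a 400-word page this works out to about 20 errors at 95%, 12 at 97% and 8 at 98%, so proofreading remains necessary whenever the OCR'd text will be relied upon.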

When Should OCR Be Used?

      Scanned documents are typically OCR'd in two circumstances. The first is when you want to convert a document from a graphic image into a word processing document for editing. For example, if someone sends you a paper copy of a contract for review, you can OCR the contract and then make the changes you desire in Word or WordPerfect. The second situation in which OCR is useful is when you want the text of the document to be searchable. That might be the case, for example, with trial exhibits or deposition transcripts.

Scanning

      Your scanner will have a variety of different settings that can be adjusted. One setting affects the resolution of the scanned image. DPI refers to dots per inch and is a measure of the scanning resolution. Generally, scanning documents at between 200 and 300 DPI produces adequate image quality while keeping the size of the image files manageable. Increasing the scanning resolution above 300 DPI will significantly increase the size of the image files without much increase in image quality. Scanning at less than 200 DPI normally produces images of unacceptable scan quality.
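
    The reason file size climbs so quickly with resolution is that the number of dots grows with the square of the DPI setting. The Python sketch below illustrates this for a letter-size page (8.5 by 11 inches, an assumption, as is the function name). It counts dots only; actual file sizes also depend on the color mode and on the compression your scanning software applies.

```python
def page_pixels(dpi: int, width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Number of dots captured for one page at the given resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return round(width_in * dpi) * round(height_in * dpi)


baseline = page_pixels(200)
for dpi in (200, 300, 600):
    pixels = page_pixels(dpi)
    print(f"{dpi} DPI: {pixels:,} dots ({pixels / baseline:.2f}x the 200 DPI page)")
```

    Moving from 200 to 300 DPI captures 2.25 times as many dots, and doubling from 300 to 600 DPI captures four times as many, which is why settings above 300 DPI add so much bulk for ordinary text documents.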

    Another scanner setting that you need to be aware of is whether the document is being scanned in black and white, grayscale or in color. For most purposes, scanning documents in black and white rather than color is preferable. Scanning documents in color will significantly increase the size of your image files. This is true even if the document contains only black and white text.
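
    A rough comparison shows why color scans are so much larger. The Python sketch below assumes the bit depths commonly used for each mode (1 bit per dot for black and white, 8 for grayscale, 24 for color) and a letter-size page at 300 DPI; these figures and the names in the code are assumptions for illustration. The sizes are uncompressed, so real files will be smaller, but the relative difference is the point.

```python
# Typical bits stored per dot in each mode (assumed values).
BITS_PER_PIXEL = {"bw": 1, "gray": 8, "color": 24}


def uncompressed_page_bytes(mode: str, dpi: int = 300,
                            width_in: float = 8.5, height_in: float = 11.0) -> int:
    """Uncompressed size in bytes of one scanned page in the given mode."""
    if mode not in BITS_PER_PIXEL:
        raise ValueError(f"unknown mode: {mode!r}")
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * BITS_PER_PIXEL[mode] // 8


for mode in ("bw", "gray", "color"):
    size_mb = uncompressed_page_bytes(mode) / 1_000_000
    print(f"{mode:>5}: about {size_mb:.1f} MB per page before compression")
```

    Before compression, the color page is 24 times the size of the black and white page even when the paper itself contains nothing but black text, because the scanner still records full color information for every dot.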

Conclusion

    Assess your needs and go get a scanner now. You can then begin converting your paper into digital files that take up no space, can always be found, can be read by multiple users simultaneously and that can be sent by e-mail or transported on a laptop or optical disk.