How do I scan text and save it as a Word Document in the Collaboratory?
You have a text-based, print document that you'd like to edit and save. Here's how you can use the D'Amour Library Collaboratory to scan the document, "read" the text, and save it as a simple Word document you can edit. First, find the D'Amour Library Collaboratory. It's on the main level, in the front left corner, near the microfilm files. Bring your document and something to save the files to (we recommend USB drives).
- Start at one of the three computers with a scanner attached.
- Place your original face down on the scanner glass and close the cover. If you have multiple pages, place the first down and have the rest ready.
- On the computer desktop, open the program OmniPage Professional 15.0.
OmniPage performs optical character recognition (OCR), which means that it "reads" a scanned (or uploaded) image as text, interprets it, and displays it in a format that can be saved and edited in Microsoft Word. Without OCR, text in scanned documents cannot be edited.
OCR will only recognize machine-produced type (typed, printed text), not handwriting. Handwritten comments usually appear as images and are not editable.
The clearer the print, the better the OCR results. Documents with small text or with distortions and speckling from multiple photocopies will produce many errors.
- You'll see a toolbar at the top with four blocks:
- Leave the first block as 1-2-3. In the second, choose the appropriate color option: Scan black and white (for all-text documents or those with simple line graphics), grayscale (for documents with photographs or other rich images), or color (when you wish to preserve color images or text). Other options allow you to perform OCR on uploaded files (for more information see the handout “Convert PDF to Word document”).
In the third block, most users will want to leave it as Automatic. This block guides the formatting of the text; in the automatic mode, OmniPage will identify formatting such as columns. If your document is a form with fill-in form elements, choose Form.
In the fourth block, leave it as Save to File to automatically save your document in a Word-readable format (.rtf) immediately after proofreading.
- If you are scanning a single-page document: Click on the 1-2-3 button to start the 3-part process. The scanner will initiate, and you'll see the scanning software open. OmniPage will automatically lead you through scanning, proofreading, and saving.
- If you are scanning a multi-page document: Click on the #1 button to scan. The scanner will initiate. When scanning is complete, a box will ask you if you have more pages to process; click Stop Loading Pages (really!).
- Click on button #2 to initiate proofreading (if scanning a single page, this begins automatically). The OCR Proofreader will ask you to correct text that does not appear in the OmniPage dictionary. To leave text as it appears, click Ignore; if one of the choices in the lower box is correct, select it and click Change. If no choices are correct, manually correct the text in the upper box and click Change. A box will tell you when proofreading is complete.
- For multiple pages: now place the next page on the scanner glass, and repeat the scanning and proofreading steps (buttons #1 and #2) as many times as needed.
- When scanning and OCR processing are complete, click on button #3 (if scanning a single page, this begins automatically). A Save File As window will appear. You may leave the file type as .rtf (which is readable in Word, WordPerfect, Works, and most other word processors) or change it to .doc (for the Microsoft Word proprietary format).
Be sure to save the file to a removable disk (a floppy disk, USB flash drive, or similar device) or to T: (Thaw Space). Files saved elsewhere on the computer will be deleted when the computer is restarted.
- Open your document in Microsoft Word to inspect it (go to Start > Open Office Document, and find the file where you just saved it). You may find, especially if your document has unusual formatting (multiple columns, uneven margins, boxes of text), that your document consists of a number of text boxes instead of a simple block of text. Unfortunately, this is how OmniPage is able to preserve the "look" of your document. For help on working with text boxes, consult the Help menu in Word.
- It is worth proofreading your document thoroughly: the OmniPage proofreader catches most mistakes, but not all of them. Run the spell-check (F7 or Tools > Spelling and Grammar), and check that all fonts and character spacing are consistent (Format > Font…). Look for common OCR errors, such as a lowercase L read as the numeral 1.
- If you are satisfied with the condition of your document in Word, you may close OmniPage. OmniPage will ask you if you would like to save the document—click No. You have already saved your document as a .rtf or .doc file on your disk; OmniPage is asking if you would like to save another file with a .opd extension, a format that is only readable by OmniPage. Saving a .opd file is only necessary if you need to add more pages to your document or re-proofread the document in OmniPage (in other words, rare cases).
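
For long documents, the manual check for OCR errors described above (such as a lowercase L read as the numeral 1) can be supplemented with a small script. The following is a minimal sketch in Python, not part of OmniPage or Word. It assumes you have copied the text into a plain-text form, and the function name flag_suspect_words is hypothetical. It only flags words that mix letters and digits, so legitimate items such as "3rd" will also be listed and every hit still needs to be reviewed by eye.

```python
import re

# Hypothetical helper, not part of OmniPage or Word.
# Matches whole words that contain at least one letter AND at least one digit,
# a common sign of OCR confusion (for example l/1 or O/0).
MIXED = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")

def flag_suspect_words(text):
    """Return words containing both letters and digits, in order of appearance."""
    return MIXED.findall(text)

# Example: two words with a 1 in place of a lowercase L; "204" is a real number.
sample = "The 1ibrary is c1osed on Sunday; room 204 is open."
print(flag_suspect_words(sample))
```

Words made up only of digits (such as room numbers) or only of letters are left alone, which keeps the list short enough to check against the original page.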
This guide was last updated on 31 March 2006.
- How do I save documents on library computers so that I won't lose my work?
- How do I get help with computers (programs, equipment, errors) in D'Amour Library?
- How do I scan a document and save it as a PDF in the Collaboratory?
- How do I convert a PDF into a Word document in the Collaboratory?
- How do I scan an image in the Collaboratory?
D'Amour Library, Western New England University, 1215 Wilbraham Road, Springfield, MA 01119
Phone: (413) 782-1535
Fax: (413) 796-2011