Run OCR processing

To generate a searchable text file from a scanned image, run optical character recognition (OCR) processing on the image file. The original scanned document in the file repository remains unchanged. Typically, a case administrator performs this task. Group members and group leaders can also run OCR processing if they have the required permissions.

Note: If you submit a secured PDF for OCR processing using the following procedure, an error appears. If any page in a document fails OCR processing, the application does not create a .txt file for the document. Likewise, if the option to embed text in PDF files is selected, the application does not generate an updated PDF file.

You can submit the following file formats for OCR processing: .bmp, .dcx, FAXserve Fax Document, .gif, JBIG2 compression standard, .jpeg, JPEG2000, .max, PDF, .png, .sim, TIFF, .wmp, .xif, .xps.

Note: The application does not support XFA-based PDF forms.

The maximum size for images that can be submitted for OCR processing is 8400 x 8400 pixels.
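The format list and size limit above can be checked before you submit a batch. The following is a minimal sketch, not part of the application: the function name precheck is hypothetical, the extensions .pdf, .tif, and .tiff are assumed for the formats that the list names without an extension, and the limit is assumed to apply to each side of the image separately. Formats without a listed extension (FAXserve Fax Document, JBIG2, JPEG2000) are not covered.

```python
from pathlib import PurePath

# Extensions that appear literally in the supported-format list.
LISTED_EXTENSIONS = {".bmp", ".dcx", ".gif", ".jpeg", ".max", ".png",
                     ".sim", ".wmp", ".xif", ".xps"}
# Assumption: common extensions for PDF and TIFF, which the list names
# without an extension.
ASSUMED_EXTENSIONS = {".pdf", ".tif", ".tiff"}

# Documented limit: 8400 x 8400 pixels (assumed to mean each side <= 8400).
MAX_SIDE_PIXELS = 8400


def precheck(filename: str, width: int, height: int) -> list[str]:
    """Return reasons the file might be rejected; an empty list means none found."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in LISTED_EXTENSIONS | ASSUMED_EXTENSIONS:
        problems.append(f"unrecognized extension: {ext or '(none)'}")
    if width > MAX_SIDE_PIXELS or height > MAX_SIDE_PIXELS:
        problems.append(
            f"image is {width} x {height}, limit is "
            f"{MAX_SIDE_PIXELS} x {MAX_SIDE_PIXELS}"
        )
    return problems


print(precheck("scan_001.png", 2480, 3508))
print(precheck("poster.tiff", 9000, 6000))
print(precheck("notes.docx", 800, 600))
```

A check like this only catches obvious mismatches. It does not detect secured PDFs or XFA-based PDF forms, which the application also rejects.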

To run OCR processing on documents:

In the List pane, select the check box next to the documents that you want to submit for OCR processing.

On the Tools menu, select OCR Processing.

By default, OCR processing runs on all selected documents, including documents that already have matching text files. If you do not want to process documents that have matching text files, select the Skip documents with text files check box.

Choose the OCR processing options:

Embed text in PDF files (process will not update original file): Embeds text in a PDF file, rather than creating a separate text file.

Auto-rotate images: Rotates images in 90-degree increments and attempts to align them to their correct upright position.

Run spelling checker: Corrects misspellings that occur when the OCR processor misreads an image. For example, corrects "speli" to "spell."

Note: The spelling checker does not correct misspellings that are present in the original document.

Auto-deskew images: If the image was scanned at an angle, adjusts the angle so that text is aligned horizontally.

Despeckle images: Removes spots or shading that occurred during scanning to improve the accuracy of the output.

Ignore OCR errors: Continues processing when the application encounters an error. To stop OCR processing at the first error instead, clear the check box. In that case, all unprocessed documents are marked "skipped due to error" and must be resubmitted.

Enable verbose logging: Sends job processing information to the system. You must select this option to view error information related to the job.

Under Recognized languages, select the language of the document that you are scanning:

To scan for English, select the check box.

To scan for a different language, click English, and then select a language from the list. Select the check box next to the language.

To scan for multiple languages, click the Add button. Click None, select a language from the list, and then select the check box next to the language that you added.

Note: For the best speed and quality, process only one recognized language at a time.

Click OK.

When processing is complete, OCR text is searchable and appears in the Formatted content or Unformatted content view of the document after the next indexing and enrichment job runs. Your administrator can also manually update the index.
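The effect of the Ignore OCR errors option on a batch can be pictured with a small model. This is an illustrative sketch only, not the application's implementation: run_batch and fake_ocr are hypothetical names, and the "processed" and "failed" labels are assumptions. Only "skipped due to error" comes from the option description.

```python
from typing import Callable


def run_batch(documents: list[str],
              ocr: Callable[[str], str],
              ignore_errors: bool) -> dict[str, str]:
    """Model the documented batch behavior for a list of documents."""
    status: dict[str, str] = {}
    stopped = False
    for doc in documents:
        if stopped:
            # Check box cleared and an earlier document failed:
            # the remaining documents are not processed.
            status[doc] = "skipped due to error"
            continue
        try:
            ocr(doc)
            status[doc] = "processed"   # assumed label
        except Exception:
            status[doc] = "failed"      # assumed label
            if not ignore_errors:
                stopped = True
    return status


def fake_ocr(doc: str) -> str:
    """Hypothetical stand-in for the OCR step; fails on one document."""
    if doc == "b.pdf":
        raise ValueError("secured PDF")
    return "recognized text"


docs = ["a.tif", "b.pdf", "c.png"]
print(run_batch(docs, fake_ocr, ignore_errors=True))
print(run_batch(docs, fake_ocr, ignore_errors=False))
```

With the option selected, the model continues past the failing document. With it cleared, every document after the failure is marked "skipped due to error" and would need to be resubmitted.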