OCR: Prepare image for text recognition


Text recognition from an image of ideal quality does not require any pre-processing. Unfortunately, most real documents are far from ideal and require some pre-processing (noise removal, text orientation detection, etc.) before recognition to produce acceptable OCR results.

VintaSoft Imaging .NET SDK and VintaSoft Document Cleanup .NET Plug-in offer professional document image processing functionality to run before OCR. Here is a partial list of the available functions:

Auto invert - automatically inverts a document image.
Border Clear - converts dark borders to white background color automatically.
Deskew - rotates the specified image automatically to straighten it.
Hole punch removal - removes hole punches from an image automatically.
Line removal - removes lines from a document image automatically (form and table lines, text underlining and strikethrough, noise).
Auto text invert - automatically inverts inverted text in a document image.
Despeckle - removes speckles from an image automatically.
Border Removal - removes dark borders automatically.
Document Segmentation - detects different zone types in the image, such as text, graphics, and lines.

Depending on the image quality, these commands can be used individually or in combination.

OcrPreprocessingCommand simplifies the code when several image processing commands need to be applied together. This composite command combines the most commonly used image processing commands that are intended to run before text recognition.
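The idea of a composite command can be illustrated with a minimal, library-agnostic sketch. It is written in plain Python and does not use the VintaSoft API: the names `auto_invert`, `despeckle`, and `CompositeCommand` are hypothetical, the image is a tiny grid of 0/1 pixels, and the two cleanup rules are deliberately simplified assumptions rather than the SDK's actual algorithms.

```python
from typing import Callable, List

# Hypothetical image model for illustration only: 0 = white background, 1 = black ink.
Image = List[List[int]]


def auto_invert(img: Image) -> Image:
    """Invert the image when more than half of its pixels are black (simplified rule)."""
    total = sum(len(row) for row in img)
    black = sum(sum(row) for row in img)
    if black * 2 > total:
        return [[1 - p for p in row] for row in img]
    return [row[:] for row in img]


def despeckle(img: Image) -> Image:
    """Clear black pixels that have no black pixel among their 8 neighbours."""
    h = len(img)
    w = len(img[0]) if h else 0
    out = [row[:] for row in img]
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1:
                continue
            neighbours = sum(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                out[y][x] = 0
    return out


class CompositeCommand:
    """Hypothetical composite: runs the given steps in order on one image."""

    def __init__(self, *steps: Callable[[Image], Image]) -> None:
        self.steps = list(steps)

    def execute(self, img: Image) -> Image:
        for step in self.steps:
            img = step(img)
        return img


# Build the pre-processing chain once, then apply it to a scanned page.
preprocess = CompositeCommand(auto_invert, despeckle)

page = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],  # the lone white pixel becomes an isolated speck after inversion
]
cleaned = preprocess.execute(page)
```

In this sketch the caller configures the chain once and invokes a single `execute` call per image, which is the convenience a composite command is meant to provide. The order of the steps matters: here inversion runs first so that the despeckle step sees dark ink on a light background.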
