VintaSoft Imaging .NET SDK 14.0: Documentation for .NET developer
    OCR: How to recognize MRZ characters from image in .NET
    VintaSoft Imaging .NET SDK with VintaSoft OCR .NET Plug-in can recognize text in images using the Tesseract OCR engine. The many dictionaries created for the Tesseract OCR engine make text recognition possible in more than 100 languages.

    Several authors on the Internet offer free dictionaries for recognizing MRZ (machine-readable zone) symbols with the Tesseract OCR engine.
    We tested some of them and confirmed that the "mrz.traineddata" dictionary provides good MRZ recognition quality.
    The "mrz.traineddata" dictionary is offered under the BSD-3 license, which allows free use and redistribution of this file.
    The "mrz.traineddata" dictionary is available for download from our site and can also be downloaded from other Internet resources.

    The "mrz.traineddata" dictionary has been included in the list of supported dictionaries (the MRZ item of the Vintasoft.Imaging.Ocr.OcrLanguage enumeration) since version 11.0.5.1 of VintaSoft OCR .NET Plug-in.

    More detailed information about MRZ symbols is available in Wikipedia: https://en.wikipedia.org/wiki/Machine-readable_passport

    The following image from Wikipedia shows a document with a machine-readable zone (MRZ):



    The following code, shown first in C# and then in VB.NET, demonstrates how to recognize MRZ symbols in an image using the Tesseract OCR engine:
    /// <summary>
    /// Recognizes MRZ characters from image using Tesseract OCR engine.
    /// </summary>
    /// <param name="filename">The name of the file that stores images with MRZ characters.</param>
    public static void RecognizeMRZCharactersUsingTesseractOCR(string filename)
    {
        // create an image collection
        using (Vintasoft.Imaging.ImageCollection images = 
            new Vintasoft.Imaging.ImageCollection())
        {
            // add images from file to the image collection
            images.Add(filename);
    
            System.Console.WriteLine("Create Tesseract OCR engine...");
            // create the Tesseract OCR engine
            using (Vintasoft.Imaging.Ocr.Tesseract.TesseractOcr tesseractOcr = 
                new Vintasoft.Imaging.Ocr.Tesseract.TesseractOcr())
            {
                System.Console.WriteLine("Initialize OCR engine...");
                // init the Tesseract OCR engine for recognition of MRZ characters (machine-readable zones)
                tesseractOcr.Init(new Vintasoft.Imaging.Ocr.OcrEngineSettings(Vintasoft.Imaging.Ocr.OcrLanguage.MRZ));
    
                // for each image in image collection
                foreach (Vintasoft.Imaging.VintasoftImage image in images)
                {
                    System.Console.WriteLine("Recognize the image...");
                    
                    // recognize text in image
                    Vintasoft.Imaging.Ocr.Results.OcrPage ocrResult = tesseractOcr.Recognize(image);
    
                    // output the recognized text
                    System.Console.WriteLine("Page Text:");
                    System.Console.WriteLine(ocrResult.GetText());
                    System.Console.WriteLine();
                }
    
                // shutdown the Tesseract OCR engine
                tesseractOcr.Shutdown();
            }
    
            // free images
            images.ClearAndDisposeItems();
        }
    }
    
    ''' <summary>
    ''' Recognizes MRZ characters from image using Tesseract OCR engine.
    ''' </summary>
    ''' <param name="filename">The name of the file that stores images with MRZ characters.</param>
    Public Shared Sub RecognizeMRZCharactersUsingTesseractOCR(filename As String)
        ' create an image collection
        Using images As New Vintasoft.Imaging.ImageCollection()
            ' add images from file to the image collection
            images.Add(filename)
    
            System.Console.WriteLine("Create Tesseract OCR engine...")
            ' create the Tesseract OCR engine
            Using tesseractOcr As New Vintasoft.Imaging.Ocr.Tesseract.TesseractOcr()
                System.Console.WriteLine("Initialize OCR engine...")
                ' init the Tesseract OCR engine for recognition of MRZ characters (machine-readable zones)
                tesseractOcr.Init(New Vintasoft.Imaging.Ocr.OcrEngineSettings(Vintasoft.Imaging.Ocr.OcrLanguage.MRZ))
    
                ' for each image in image collection
                For Each image As Vintasoft.Imaging.VintasoftImage In images
                    System.Console.WriteLine("Recognize the image...")
    
                    ' recognize text in image
                    Dim ocrResult As Vintasoft.Imaging.Ocr.Results.OcrPage = tesseractOcr.Recognize(image)
    
                    ' output the recognized text
                    System.Console.WriteLine("Page Text:")
                    System.Console.WriteLine(ocrResult.GetText())
                    System.Console.WriteLine()
                Next
    
                ' shutdown the Tesseract OCR engine
                tesseractOcr.Shutdown()
            End Using
    
            ' free images
            images.ClearAndDisposeItems()
        End Using
    End Sub
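
    The recognized text may contain OCR noise or misread characters, so it can be useful to sanity-check it before use. The C# sketch below is not part of the VintaSoft API: the MrzTextChecks class and both of its methods are hypothetical helpers introduced only for illustration. It assumes the standard MRZ layouts (lines of 30, 36 or 44 characters built from A-Z, 0-9 and '<') and the standard MRZ check digit rule (weights 7, 3, 1 repeated, sum modulo 10, with '<' counted as 0 and letters A-Z as 10-35).

```csharp
/// <summary>
/// Illustrative helpers for checking recognized MRZ text.
/// Hypothetical class, not part of VintaSoft Imaging .NET SDK.
/// </summary>
public static class MrzTextChecks
{
    /// <summary>
    /// Returns the lines of OCR text that look like MRZ lines:
    /// 30, 36 or 44 characters long and built only from A-Z, 0-9 and '<'.
    /// </summary>
    public static System.Collections.Generic.List<string> ExtractMrzLines(string ocrText)
    {
        System.Collections.Generic.List<string> result =
            new System.Collections.Generic.List<string>();
        foreach (string rawLine in ocrText.Split('\n'))
        {
            // remove spaces and carriage returns that OCR output may contain
            string line = rawLine.Replace(" ", "").Replace("\r", "");
            if (line.Length != 30 && line.Length != 36 && line.Length != 44)
                continue;

            bool isMrzLine = true;
            foreach (char c in line)
            {
                bool allowed = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '<';
                if (!allowed)
                {
                    isMrzLine = false;
                    break;
                }
            }
            if (isMrzLine)
                result.Add(line);
        }
        return result;
    }

    /// <summary>
    /// Computes the MRZ check digit of a field using weights 7, 3, 1.
    /// </summary>
    public static int ComputeCheckDigit(string field)
    {
        int[] weights = { 7, 3, 1 };
        int sum = 0;
        for (int i = 0; i < field.Length; i++)
        {
            char c = field[i];
            int value;
            if (c >= '0' && c <= '9')
                value = c - '0';        // digits keep their numeric value
            else if (c >= 'A' && c <= 'Z')
                value = c - 'A' + 10;   // A = 10, B = 11, ..., Z = 35
            else if (c == '<')
                value = 0;              // filler character counts as 0
            else
                throw new System.ArgumentException("Unexpected MRZ character: " + c);
            sum += value * weights[i % 3];
        }
        return sum % 10;
    }
}
```

    For example, in the second line of a 44-character passport MRZ the document number occupies characters 1-9 and its check digit is character 10. For the specimen document number "L898902C3", MrzTextChecks.ComputeCheckDigit("L898902C3") returns 6; if the digit read by OCR differs from the computed one, the field was probably misrecognized. The text passed to ExtractMrzLines would be the string returned by ocrResult.GetText() in the code above.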