OCR - Bad results

Questions, comments and suggestions concerning VintaSoft Imaging .NET SDK.

Moderator: Alex

SebastianB
Posts: 20
Joined: Thu Sep 27, 2012 12:39 pm

OCR - Bad results

Post by SebastianB » Mon Mar 25, 2013 4:59 pm

Hi,

I am playing around with the OCR Demo, using a PDF file that contains an invoice. I am wondering why the recognition rate is nowhere near 100% of the file's contents.
I am working with the default settings of the demo application, except that the default language is now set to German.

The PDF file I am using is the one I already sent for another support case (the case where the PDF characters were read in a weird order).

Am I doing anything wrong? Let me know.

Best,
Sebastian

SebastianB
Posts: 20
Joined: Thu Sep 27, 2012 12:39 pm

Re: OCR - Bad results

Post by SebastianB » Mon Mar 25, 2013 6:31 pm

After some more investigation I got the following interesting results:
  • Small segments that contain only one font, and perhaps only one font size, seem to be recognized much better than large segments.
  • Non-standard fonts that are not script fonts seem to be a general problem (by "standard" I mean Arial, Courier New or Times New Roman). My sample uses a font called Eurostile.

Alex
Site Admin
Posts: 1445
Joined: Thu Jul 10, 2008 2:21 pm

Re: OCR - Bad results

Post by Alex » Wed Mar 27, 2013 1:39 pm

Hello Sebastian,

Are you recognizing text in an image-only or a searchable PDF document?

Best regards, Alexander

SebastianB
Posts: 20
Joined: Thu Sep 27, 2012 12:39 pm

Re: OCR - Bad results

Post by SebastianB » Wed Mar 27, 2013 7:13 pm

I tried both.

Alex
Site Admin
Posts: 1445
Joined: Thu Jul 10, 2008 2:21 pm

Re: OCR - Bad results

Post by Alex » Thu Mar 28, 2013 8:54 am

Hello Sebastian,

You will get good text recognition results if
  • the PDF image-resource from an image-only PDF document has a resolution of 300 dpi or higher
  • the image of a PDF page from a searchable PDF document is rendered at a resolution of 300 dpi or higher
By the way, why are you recognizing text from a searchable PDF document? The VintaSoftPDF.NET Plugin allows you to extract text from a searchable PDF document.
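
To make the 300 dpi guideline above concrete, here is a minimal sketch of the arithmetic involved. It is plain Python and does not use the VintaSoft API; the function names are hypothetical and chosen only for illustration. It assumes the PDF default of 72 user-space units (points) per inch, and that you already know the page size in points and the pixel width of the embedded image.

```python
POINTS_PER_INCH = 72  # assumption: default PDF user space, 1 point = 1/72 inch


def render_size_px(width_pt, height_pt, dpi=300):
    """Pixel size of a page of the given point size rendered at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def effective_dpi(image_width_px, placed_width_pt):
    """Resolution of an embedded image, given the width it covers on the page."""
    return image_width_px * POINTS_PER_INCH / placed_width_pt


if __name__ == "__main__":
    # US Letter page (612 x 792 points) rendered at 300 dpi
    print(render_size_px(612, 792))
    # A 1275-pixel-wide scan stretched across the full Letter width
    print(effective_dpi(1275, 612))
```

If the effective resolution of a scanned page comes out well below 300, as it does for the 1275-pixel example, the image-resource itself is likely too coarse for reliable recognition, and rendering the page at a higher resolution will not add the missing detail.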

Best regards, Alexander
