Hi,
I am playing around with the OCR Demo. I am using a PDF file that contains an invoice, and I am wondering why the recognition rate is nowhere near 100% of its contents.
I am working with the default settings of the demo application, except that the language is now set to German.
The PDF file I am using is the one I already sent for another support case (the case where the PDF characters were read in a weird order).
Am I doing anything wrong? Let me know.
Best,
Sebastian
OCR - Bad results
Moderator: Alex
Re: OCR - Bad results
After some more investigation I got the following interesting results:
- Small segments that contain only one font (and perhaps only one font size) seem to be recognized much better than large segments.
- Non-standard fonts that are not script fonts seem to be a general problem (Arial, Courier New, and Times New Roman count as standard in this definition). My sample uses a font called Eurostile.
Re: OCR - Bad results
Hello Sebastian,
Are you recognizing text in an image-only PDF document or in a searchable PDF document?
Best regards, Alexander
Re: OCR - Bad results
Hello Sebastian,
You will get good text recognition results if:
- The PDF image resource from an image-only PDF document has a resolution of 300 dpi or higher
- The image of a PDF page from a searchable PDF document is rendered at a resolution of 300 dpi or higher
Best regards, Alexander
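To check the 300 dpi recommendation above against a concrete file, the arithmetic can be sketched as follows. This is only a rough illustration in Python, not part of the OCR Demo or its SDK: the function names are hypothetical, and it assumes the PDF page size is given in the default PDF unit of 1/72 inch (points).

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def render_size_px(width_pt, height_pt, dpi=300):
    """Pixel size needed to render a page of the given size (in points) at `dpi`."""
    return (round(width_pt * dpi / POINTS_PER_INCH),
            round(height_pt * dpi / POINTS_PER_INCH))


def effective_dpi(image_width_px, placed_width_pt):
    """Horizontal resolution of an embedded image, given its pixel width
    and the width (in points) at which it is drawn on the page."""
    return image_width_px * POINTS_PER_INCH / placed_width_pt


if __name__ == "__main__":
    # US Letter page: 612 x 792 points
    print(render_size_px(612, 792))   # (2550, 3300)
    # A scan that is 1275 px wide and fills the 612 pt page width
    print(effective_dpi(1275, 612))   # 150.0
```

In this sketch, a scan that comes out at around 150 dpi would fall below the recommended threshold, so rescanning or rendering at a higher resolution would be the first thing to try.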