Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Hi,
Does the Vintasoft.Imaging.Ocr.Tesseract plugin use Tesseract 4? If not, is it possible to make it do so?
Can I use multiple languages? In Tesseract we can combine languages, e.g. eng+latin.
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Hi David,
David_karlsson wrote: Mon Jan 14, 2019 3:47 pm
> Does the Vintasoft.Imaging.Ocr.Tesseract plugin use Tesseract 4? If not, is it possible to make it do so?
The current version of Vintasoft OCR .NET Plugin uses Tesseract OCR 3.04. We plan to use Tesseract OCR 4 in the near future.
David_karlsson wrote: Mon Jan 14, 2019 3:47 pm
> Can I use multiple languages? In Tesseract we can combine languages, e.g. eng+latin.
Please read how to recognize text in two languages here:
https://www.vintasoft.com/docs/vsimagin ... uages.html
Best regards, Alexander
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
> The current version of Vintasoft OCR .NET Plugin uses Tesseract OCR 3.04. We plan to use Tesseract OCR 4 in the near future.
How near? 1 month? 1 year?
> Please read how to recognize text in two languages here:
> https://www.vintasoft.com/docs/vsimagin ... uages.html
I have already read the documentation. It explains how to run OCR on different sections of a PDF with different languages.
In Tesseract, to get a better interpretation of the same section (page), you can combine several languages, e.g. eng+deu.
Tesseract can even use the traineddata of multiple languages at a time, e.g. English and German:
tesseract myscan.png out -l eng+deu
https://github.com/tesseract-ocr/tesseract/wiki
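The `-l eng+deu` form loads both language models for the same page in a single pass. The minimal shell sketch below wraps that call so the languages can be passed as separate arguments. The helper names `join_langs` and `ocr_multilang` are hypothetical, and it assumes the `tesseract` CLI is on PATH with the matching `.traineddata` files (here eng and deu) installed.

```shell
#!/bin/sh
# Hypothetical helpers: the names join_langs and ocr_multilang are illustrative.

# Join language codes with "+", the separator Tesseract's -l option expects.
# The subshell keeps the IFS change from leaking into the caller.
join_langs() {
  (IFS=+; printf '%s\n' "$*")
}

# ocr_multilang IMAGE OUTBASE LANG...
# Recognizes IMAGE with all listed languages loaded at once; Tesseract
# writes the recognized text to OUTBASE.txt.
# Assumes tesseract is on PATH and each LANG has an installed .traineddata file.
ocr_multilang() {
  if [ "$#" -lt 3 ]; then
    echo "usage: ocr_multilang IMAGE OUTBASE LANG..." >&2
    return 2
  fi
  image=$1
  outbase=$2
  shift 2
  tesseract "$image" "$outbase" -l "$(join_langs "$@")"
}
```

For example, `ocr_multilang myscan.png out eng deu` runs the same command as `tesseract myscan.png out -l eng+deu`. Running `tesseract --list-langs` shows which language models are actually installed before you combine them.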
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Tesseract 4 will be available in version 8.7.2 in 2 months.
David_karlsson wrote: Mon Jan 14, 2019 9:52 pm
> I have already read the documentation. It explains how to run OCR on different sections of a PDF with different languages.
> In Tesseract, to get a better interpretation of the same section (page), you can combine several languages, e.g. eng+deu.
Thank you for the information. We will analyze it and try to provide the best solution.
Best regards, Alexander
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Perfect. I will happily wait for the release of the Tesseract 4 plugin.
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Hi!
> Tesseract 4 will be available in version 8.7.2 in 2 months.
Is there any preview of version 8.7.2? We have started to develop our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
Is it possible to access Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Hi David,
David_karlsson wrote: Mon Feb 04, 2019 10:12 pm
> Is there any preview of version 8.7.2? We have started to develop our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
> Is it possible to access Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
I think a preview version will be available in 2 weeks.
Best regards, Alexander
Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4
Hi David,
David_karlsson wrote: Mon Feb 04, 2019 10:12 pm
> Is there any preview of version 8.7.2? We have started to develop our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
> Is it possible to access Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
Version 8.7.2.1 has been released today. In this version, the Tesseract OCR engine has been updated to version 4.0.
Also, in version 8.7.2.1 you can specify that text must be recognized in several languages. Here is an example that shows how to recognize text written in English and German: https://www.vintasoft.com/docs/vsimagin ... uages.html
Best regards, Alexander