Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


HPR3315: tesseract optical character recognition

Hosted by Ken Fallon on 2021-04-16 00:00:00

Tesseract (software)

From Wikipedia, the free encyclopedia

Tesseract is an optical character recognition engine for various operating systems. It is free software, released under the Apache License. Originally developed by Hewlett-Packard as proprietary software in the 1980s, it was released as open source in 2005 and development has been sponsored by Google since 2006.
In 2006, Tesseract was considered one of the most accurate open-source OCR engines then available.
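
Recognise an English page and a Dutch page, choosing the language with -l. Tesseract writes its result to the output base name with .txt appended: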
$ tesseract -l eng english-page.jpg english
$ tesseract -l nld dutch-page.jpg dutch
$ ls
dutch.txt english.txt 
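The same two commands can be extended to a whole batch of scanned pages. The following is a minimal shell sketch, not part of tesseract itself. It assumes the tesseract CLI is on the PATH and the language data you ask for is installed. The function name ocr_batch is hypothetical. The only tesseract features it relies on are -l, the image/output-base arguments shown above, and tesseract --list-langs.

```shell
#!/bin/sh
# Hypothetical helper (not a tesseract command): OCR each image given as an
# argument with one language code, or several joined by "+" (e.g. eng+nld).
# Assumes tesseract is on PATH and the language data is installed.
ocr_batch() {
  lang="$1"
  shift
  # Check that every requested language appears in `tesseract --list-langs`.
  for l in $(printf '%s\n' "$lang" | tr '+' ' '); do
    if ! tesseract --list-langs 2>&1 | grep -qx "$l"; then
      echo "ocr_batch: language '$l' is not installed" >&2
      return 1
    fi
  done
  for img in "$@"; do
    # english-page.jpg -> output base english-page -> english-page.txt
    tesseract -l "$lang" "$img" "${img%.*}" || return 1
  done
}
```

For example, ocr_batch eng english-page*.jpg followed by ocr_batch nld dutch-page*.jpg processes each set of pages with its own language. Tesseract also accepts several languages at once, as in -l eng+nld, which can help with pages that mix both languages.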



Copyright Information

Unless otherwise stated, our shows are released under a Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) license.

The HPR Website Design is released to the Public Domain.