Main Activities
Primary activity: Tesseract OCR is an open-source optical character recognition (OCR) engine dedicated to extracting text from images. It was originally developed by Hewlett-Packard, and Google took over its development in 2006.
Products / services: open-source OCR software, distributed through a public GitHub repository as both a standalone engine and a library. It can run locally or be integrated into APIs and applications to convert images and scans into usable text.
Target market: developers and companies across all sectors requiring OCR capabilities (banks and financial services, IT/software, public services/government, consulting, etc.). Examples of cited users: ING, HSBC, Bajaj Finserv, Scalable Capital, Evalueserve.
History
Tesseract OCR was initially developed at the Hewlett-Packard laboratories in Bristol, England, and Greeley, Colorado, between 1985 and 1994. In 1996, the software was modified to run on Windows, and in 1998 it was partially converted to C++. In 2005, Hewlett-Packard released Tesseract as open source. Since 2006, its development has been managed by Google. Over this period, Tesseract evolved from an internal HP project into a widely adopted open-source OCR solution.
Team
The Tesseract OCR project was initially developed by Hewlett-Packard (HP) between 1985 and 1994, with Ray Smith as its lead developer. Since 2006, the project has been maintained as open source by Google, which ensures its ongoing development. Ray Smith remains a key figure in the project, notably as the principal contributor and technical architect.
In terms of governance, Tesseract is primarily managed by an open-source community, with major contributions from Google developers and other active members of the GitHub community. There is no formal leadership structure, but Ray Smith and other regular contributors play a central role in setting the technical direction and reviewing changes to the project.
In summary, Ray Smith is the project's original lead developer and its most prominent technical figure, while Google and the open-source community ensure the maintenance and evolution of Tesseract OCR.