The Muhimbi PDF Converter supports a number of OCR (Optical Character Recognition) facilities, including the ability to make image-based PDFs (scans, faxes) fully searchable and indexable. In addition, it supports extracting the recognised text so that information such as invoice numbers, purchase order numbers or other identifying details can be used as part of a larger software / workflow process.
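As a rough illustration of the second use case, the sketch below shows how text that has already been extracted by OCR might be searched for an invoice number and a purchase order number before being passed on to a workflow. It does not call the Muhimbi API. The sample text, the field labels, the `extract_fields` helper and the regular expressions are all assumptions made for this example, and real documents will need patterns tuned to their own layout.

```python
import re

# Hypothetical sample, standing in for the text returned by the OCR step.
ocr_text = """ACME Ltd
Invoice No: INV-2023-0042
Purchase Order: PO-778123
Total due: 1,250.00
"""

# Assumed label and number formats; adjust these to match the real documents.
PATTERNS = {
    "invoice_number": re.compile(
        r"Invoice\s+(?:Number|No)\.?\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "po_number": re.compile(
        r"Purchase\s+Order\s*(?:Number|No)?\.?\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
}


def extract_fields(text):
    """Return the first match for each pattern, or None when a field is absent."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


if __name__ == "__main__":
    print(extract_fields(ocr_text))
```

Returning `None` for a missing field, rather than raising an error, lets the calling workflow decide how to handle documents where OCR did not pick up the expected label, for example by routing them for manual review.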
For more details and examples see the following articles:
- The How and Why of OCR / Providing document access to the visually impaired
- OCR Facilities provided by Muhimbi’s server based PDF Conversion products
- Converting scans and images to searchable PDFs using Java and server side OCR
- Converting scans and images to searchable PDFs using C# and server side OCR
- Converting scans and images to searchable PDFs using SharePoint Designer Workflows
- Converting scans and images to searchable PDFs using OCR & Nintex Workflow
- Extract text from scanned content using OCR and SharePoint Designer Workflows
- Extract text from scanned content using OCR and Nintex Workflow
- Utilise 3rd party OCR Engines in Muhimbi’s range of Server Side PDF Products
Please note that in order to use OCR in a production environment, a valid add-on license for the OCR and PDF/A Archiving Add-on must be installed alongside a regular license.