In this guide you’ll learn how to OCR and extract text from an image using Power Automate. We use the Muhimbi ‘Extract Text using OCR’ action to extract text from an image-based file and write the extracted text to a SharePoint List column. This example can be used to retrieve invoice numbers or other textual content from documents that conform to a particular template.
Before we start building the workflow, ensure all prerequisites are in place. It is also assumed that the reader has some knowledge of building Workflows using Power Automate.
- Muhimbi PDF Converter for Power Automate full or free trial subscription.
- Appropriate privileges to create flows in Power Automate.
- Working knowledge of Power Automate.
Extract text from Images & Scans and Convert to Searchable PDF using Power Automate
This example shows how to create a Power Automate (Flow) solution that extracts text from image-based SharePoint content and updates a field with the extracted text. At a high level, the flow looks like this:
Create a new Flow and use the SharePoint Online trigger ‘When a file is created (properties only)’. Fill out the Site Address and Folder Id accordingly.
Insert a ‘Get file content’ action and fill it out as per the screenshot displayed below.
For the ‘Site Address’ in the image below, specify the same address as used in the trigger. In the ‘File identifier’ field, choose the ‘identifier’ option inside the ‘When a file is created in folder’ trigger.
Insert an ‘Extract text using OCR’ action and fill it out as per the screenshot displayed below:
Source file name: Select ‘Name’ returned by the trigger.
Source file content: Select ‘Body’ returned by the ‘Get file content’ action.
Language: The language the source document is written in. It defaults to ‘English’; the supported languages are Arabic, Danish, German, English, Dutch, Finnish, French, Hebrew, Hungarian, Italian, Norwegian, Portuguese, Spanish and Swedish.
X coordinate, Y coordinate, Width and Height: Specify the exact area to extract text from. The unit of measure (UOM) is 1/72nd of an inch, so one inch equals 72 units.
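Because the region is entered in units of 1/72nd of an inch, measurements taken from a template in inches or millimetres have to be converted first. The following is a minimal Python sketch of that arithmetic, not part of the Muhimbi action itself; the `OcrRegion` class, the helper names and the sample measurements are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass

UNITS_PER_INCH = 72   # the action's unit of measure is 1/72nd of an inch
MM_PER_INCH = 25.4


@dataclass(frozen=True)
class OcrRegion:
    """Hypothetical container for the four values typed into the action."""
    x: int
    y: int
    width: int
    height: int


def inches_to_units(value_in: float) -> int:
    """Convert inches to 1/72-inch units, rounded to the nearest unit."""
    return round(value_in * UNITS_PER_INCH)


def mm_to_units(value_mm: float) -> int:
    """Convert millimetres to 1/72-inch units, rounded to the nearest unit."""
    return round(value_mm / MM_PER_INCH * UNITS_PER_INCH)


def region_from_inches(x: float, y: float, width: float, height: float) -> OcrRegion:
    """Build a region from measurements taken in inches."""
    return OcrRegion(*(inches_to_units(v) for v in (x, y, width, height)))


# Assumed example: an invoice-number box measured on a template as
# 5.5 in across, 0.5 in down, 2.5 in wide and 1 in high.
region = region_from_inches(5.5, 0.5, 2.5, 1.0)
print(region)  # OcrRegion(x=396, y=36, width=180, height=72)
```

The four resulting integers are the values to enter in the X coordinate, Y coordinate, Width and Height fields.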
Insert an ‘Update file properties’ action to write the extracted text back to SharePoint Online.
Once the text has been extracted, we want to update a column on the source file. To facilitate this, we have created a column named “convertedtext” in the library.
Specify the same site address and library used in the trigger.
In the ‘Id’ field, specify the ‘Id’ returned by the trigger.
Specify the ‘Out text’, as returned by the ‘Extract text using OCR’ action, in the ‘Extract Output’ field.
- Publish the workflow and upload a TIFF file to the specified document library. After a few seconds the Flow will trigger and the ‘convertedtext’ column in the document library will be updated with the text from the TIFF file.
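Once the OCR text is stored in the column, a specific value such as an invoice number can be pulled out of it with a pattern match. The sketch below shows one possible approach in Python, outside of Power Automate. The regular expression, the function name and the sample text are assumptions made for illustration; a real template will likely need its own pattern.

```python
import re
from typing import Optional

# Hypothetical pattern: the word "invoice", an optional "No"/"Number"/"#",
# an optional ":" or "-", then an identifier of at least three characters
# that contains at least one digit.
INVOICE_PATTERN = re.compile(
    r"invoice\s*(?:no\.?|number|#)?\s*[:\-]?\s*"
    r"((?=[A-Z0-9\-]*\d)[A-Z0-9][A-Z0-9\-]{2,})",
    re.IGNORECASE,
)


def find_invoice_number(ocr_text: str) -> Optional[str]:
    """Return the first invoice-like identifier in the text, or None."""
    match = INVOICE_PATTERN.search(ocr_text)
    return match.group(1) if match else None


# Assumed sample of text as it might come back from the OCR step.
sample = "ACME Ltd\nInvoice No: INV-2024-0042\nTotal due: 199.00"
print(find_invoice_number(sample))  # INV-2024-0042
```

Limiting the OCR region to the part of the page where the value appears keeps the extracted text short and makes a simple pattern like this less likely to match the wrong string.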