Optical character recognition (OCR) analyzes image-based content, such as a scanned PDF or an image embedded in an MS Word file, recognizes the characters in it, and embeds the resulting text in a PDF. The scanned content still looks the same, but you can copy text from the document, and search crawlers can index the text as well. Muhimbi PDF Converter for Power Automate can convert scans and images to editable and searchable PDFs.
Using Power Automate to OCR an Image
This example takes you through extracting text from an image (.png) file and writing the extracted text to a custom column, created for this purpose, in an MS SharePoint library.
1. Creating a New Flow
Create a new flow and use the When a file is created (properties only) SharePoint Online trigger. Enter the URL of the site collection as the Site Address, then select the relevant Library Name and Folder from the dropdown menus.
2. Getting the File Content
Insert MS SharePoint's Get file content action and fill it out as shown in the screenshot below. Set the Site Address field to your own site, and set the File Identifier field to the output value of the When a file is created (properties only) action.
3. Extracting Text Using OCR
Insert Muhimbi's Extract text using OCR action and fill it out as shown in the screenshot below.
- Source file name — Name of the source file, including the extension.
- Source file content — Content of the file to OCR. Select Body, which is the output value of the Get file content action.
- Language — Select the language of the file. This example uses English.
- X Coordinate — The X coordinate, in points (1/72 of an inch), of the region to OCR. This example uses 150.
- Y Coordinate — The Y coordinate, in points (1/72 of an inch), of the region to OCR. This example uses 368.
- Width — The width, in points (1/72 of an inch), of the region to OCR. This example uses 92.
- Height — The height, in points (1/72 of an inch), of the region to OCR. This example uses 80.
- Page number — The page to OCR. Leave this blank to OCR all pages, or when the source file is an image.
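Because the region is given in points rather than pixels, it can help to convert the values when you measure the area in an image editor. The following Python sketch does that arithmetic for the region used in this example. It is an illustration only and not part of the flow. The 300 DPI resolution is an assumption and should be replaced with the actual resolution of your scan. The helper names are hypothetical.

```python
POINTS_PER_INCH = 72  # 1 pt = 1/72 of an inch, as used by the action's fields


def points_to_inches(points: float) -> float:
    """Convert a length in points to inches."""
    return points / POINTS_PER_INCH


def points_to_pixels(points: float, dpi: int) -> int:
    """Convert a length in points to whole pixels at the given resolution."""
    return round(points * dpi / POINTS_PER_INCH)


# Region from this example, in points.
region_pt = {"x": 150, "y": 368, "width": 92, "height": 80}

# Assumption: the image was scanned at 300 DPI.
ASSUMED_DPI = 300

region_px = {name: points_to_pixels(value, ASSUMED_DPI)
             for name, value in region_pt.items()}

print(region_px)
# {'x': 625, 'y': 1533, 'width': 383, 'height': 333}
```

To go the other way, multiply a pixel measurement by 72 and divide by the image's DPI to get the value to enter in the action.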
4. Updating File Properties
Insert an MS SharePoint Update file properties action and fill it out as shown in the screenshot below. This writes the OCRed text to the library column for the item specified by the item ID.
- Site Address — Select the site address of the MS SharePoint library that will receive the OCRed content.
- Library Name — Select the MS SharePoint library that will receive the OCRed content.
- Id — This is the unique identifier of the item to be updated. In this case, select ID, which is the output of the When a file is created (properties only) action.
- Item — The library column to update. In this example, the column is named convertedtext, and its field is set to Out text, which is the output of the Extract text using OCR action. Use whatever name your column has in your scenario.
5. Publishing the Workflow
Publish the workflow and upload a .png file to the specified document library. After a few seconds, the flow triggers and the OCRed content is written to the convertedtext column in your MS SharePoint library.
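As a recap, the wiring between the four actions can be modeled as plain data. The Python sketch below is not Power Automate's flow definition format. It only records which output feeds which field, using the action and field names from the steps above, and checks that every input comes from an earlier action. The label file_identifier is a placeholder, since the text does not name that trigger output, and unresolved_inputs is a hypothetical helper.

```python
TRIGGER = "When a file is created (properties only)"

# Each step lists the fields it consumes as (source step, output name) pairs
# and the outputs it makes available to later steps.
STEPS = [
    {
        "name": TRIGGER,
        "inputs": {},
        # "file_identifier" is a placeholder label, not a documented output name.
        "outputs": {"ID", "file_identifier"},
    },
    {
        "name": "Get file content",
        "inputs": {"File Identifier": (TRIGGER, "file_identifier")},
        "outputs": {"Body"},
    },
    {
        "name": "Extract text using OCR",
        "inputs": {"Source file content": ("Get file content", "Body")},
        "outputs": {"Out text"},
    },
    {
        "name": "Update file properties",
        "inputs": {
            "Id": (TRIGGER, "ID"),
            "convertedtext": ("Extract text using OCR", "Out text"),
        },
        "outputs": set(),
    },
]


def unresolved_inputs(steps):
    """Return (step, field) pairs whose source is not an output of an earlier step."""
    available = {}  # step name -> outputs produced so far
    problems = []
    for step in steps:
        for field, (source_step, source_output) in step["inputs"].items():
            if source_output not in available.get(source_step, set()):
                problems.append((step["name"], field))
        available[step["name"]] = step["outputs"]
    return problems


print(unresolved_inputs(STEPS))  # prints []
```

If the flow does not populate the column, a check like this points to the usual cause: a field mapped to an output from the wrong action.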