OCR Images using Power Automate

OCR (optical character recognition) analyzes image-based content, such as a scanned PDF or an image embedded in an MS Word file, applies image recognition logic, and then embeds the recognized text in a PDF. The scanned content still looks the same, but you can now copy text from the document, and search crawlers can index that text. Muhimbi PDF Converter for Power Automate can convert scans and images to editable and searchable PDFs.

Using Power Automate to OCR an image

This example takes you through extracting text from an image (*.png) file and writing the extracted text to a custom column created for this purpose in an MS SharePoint library. At a high level, the steps are as follows:

  1. Create a new Flow and use the SharePoint Online trigger ‘When a file is created (properties only)’. Select or enter the relevant SharePoint Site Address (the URL of the site collection), then select the Library Name and Folder from the dropdown menus.

    [Screenshot: create flow]

  2. Insert MS SharePoint's ‘Get file content’ action and fill it out as per the screenshot displayed below. Substitute the Site Address with a suitable value, and set File Identifier to the output value of the ‘When a file is created (properties only)’ trigger.

  3. Insert Muhimbi's ‘Extract text using OCR’ action and fill it out as per the screenshot displayed below.

    • Source file name: Name of the source file including extension.

    • Source file content: Content of the file to OCR. Select ‘Body’, which is the output of the ‘Get file content’ action.

    • Language: Select the language of the file to OCR. In our case we select ‘English’.

    • X Coordinate: Enter the X coordinate (in points, 1/72 of an inch) of the region to be OCR’ed. In our case we enter ‘150’.

    • Y Coordinate: Enter the Y coordinate (in points, 1/72 of an inch) of the region to be OCR’ed. In our case we enter ‘368’.

    • Width: Enter the width (in points, 1/72 of an inch) of the region to be OCR’ed. In our case we enter ‘92’.

    • Height: Enter the height (in points, 1/72 of an inch) of the region to be OCR’ed. In our case we enter ‘80’.

    • Page number: Page number to be OCR’ed. Leave this blank to OCR all pages, or when the source is an image.

      [Screenshot: extract text using OCR]

  4. Insert an MS SharePoint ‘Update file properties’ action and fill it out as per the screenshot displayed below. This writes the OCR’ed text to the column in the library for the item specified by the item Id.

    • Site Address: Select the address of the site that contains the MS SharePoint library to be updated with the OCR’ed content.

    • Library Name: Select the MS SharePoint library to be updated with the OCR’ed content.

    • Id: The unique identifier of the item to be updated. In our case, select ‘Id’, which is an output of the ‘When a file is created (properties only)’ trigger.

    • Item: This is the library column to which the data is written. In our case the column is named ‘convertedtext’, and we set it to ‘out text’, which is the output of the ‘Extract text using OCR’ action. Use whatever name your column has in your scenario.

      [Screenshot: image to OCR]

  5. Publish the workflow and upload a *.png file to the specified document library. After a few seconds the Flow will trigger and the OCR’ed content will be written to the ‘convertedtext’ column in our MS SharePoint library.

    [Screenshot: publish the workflow]
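
The X Coordinate, Y Coordinate, Width and Height values in step 3 are expressed in points (1/72 of an inch), whereas image editors usually report positions in pixels. The following is a minimal Python sketch of that conversion. It assumes you know the resolution (DPI) of the source image and that the region is measured at that resolution; neither assumption is confirmed by this article, so check the result against a test image. `OcrRegion`, `pixels_to_points` and `region_from_pixels` are hypothetical helper names, not part of any Muhimbi or Power Automate API.

```python
from dataclasses import dataclass

POINTS_PER_INCH = 72  # one point is 1/72 of an inch


@dataclass(frozen=True)
class OcrRegion:
    """Region in points, mirroring the X, Y, Width and Height fields."""
    x: float
    y: float
    width: float
    height: float


def pixels_to_points(pixels: float, dpi: float) -> float:
    """Convert a pixel measurement to points for an image at `dpi`."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return pixels * POINTS_PER_INCH / dpi


def region_from_pixels(left: float, top: float, width: float,
                       height: float, dpi: float) -> OcrRegion:
    """Build a point-based region from a pixel box (assumed DPI is known)."""
    values = (left, top, width, height)
    return OcrRegion(*(round(pixels_to_points(v, dpi), 2) for v in values))


if __name__ == "__main__":
    # Hypothetical 144 DPI image with a box measured in an image editor.
    print(region_from_pixels(300, 736, 184, 160, dpi=144))
```

Under these assumptions, a pixel box at (300, 736) measuring 184 x 160 pixels on a 144 DPI image corresponds to the values entered in the example: 150, 368, 92 and 80 points.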

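The wiring between the steps above can be summarized as data: each step consumes outputs produced by an earlier one. The Python sketch below models that wiring and checks that every input refers to an output that already exists at that point in the Flow. It is only an illustration and not a Power Automate export format. The output names ‘Id’, ‘Body’ and ‘out text’ come from the steps above; ‘Identifier’ and ‘File name with extension’ are assumed names for the trigger outputs used in steps 2 and 3, and `FLOW` and `unresolved_inputs` are hypothetical.

```python
TRIGGER = "When a file is created (properties only)"
GET_CONTENT = "Get file content"
OCR = "Extract text using OCR"
UPDATE = "Update file properties"

# Each step: its name, the outputs it produces, and for every input field
# the (step, output) pair that feeds it.
FLOW = [
    {"name": TRIGGER,
     "outputs": ["Id", "Identifier", "File name with extension"],  # last two assumed
     "inputs": {}},
    {"name": GET_CONTENT,
     "outputs": ["Body"],
     "inputs": {"File Identifier": (TRIGGER, "Identifier")}},
    {"name": OCR,
     "outputs": ["out text"],
     "inputs": {"Source file name": (TRIGGER, "File name with extension"),
                "Source file content": (GET_CONTENT, "Body")}},
    {"name": UPDATE,
     "outputs": [],
     "inputs": {"Id": (TRIGGER, "Id"),
                "convertedtext": (OCR, "out text")}},
]


def unresolved_inputs(flow):
    """Return (step, field) pairs whose source output is not yet available."""
    available = set()
    problems = []
    for step in flow:
        for field, source in step["inputs"].items():
            if source not in available:
                problems.append((step["name"], field))
        for output in step["outputs"]:
            available.add((step["name"], output))
    return problems


if __name__ == "__main__":
    print(unresolved_inputs(FLOW))
```

An empty result means every field is fed by an earlier step. Reordering the steps, for example placing ‘Update file properties’ before ‘Extract text using OCR’, makes the check report the ‘convertedtext’ field as unresolved.
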

© Muhimbi Ltd. 2008 - 2023