Extract Info from PDF Fields using Power Automate

In this guide, you'll learn how to extract data from a PDF form using Power Automate. With the Muhimbi PDF Converter, you can extract form data from PDF files in the FDF, XFDF, and XML formats.

There are two ways to extract PDF form data:

  • Converting the XML string to a JSON object and parsing it.
  • Parsing the XML using XPath.

Each method has advantages and disadvantages. JSON represents data as key-value pairs that map directly onto data types, while XML is a markup language that uses tags (<>) to represent data items. JSON has less overhead than XML, so it is more lightweight and is easier and quicker to parse. However, converting XML to JSON can sometimes become unwieldy, in which case you may prefer to use the xpath() function to extract the information you need directly from the XML.

In this guide, you will learn how to extract data from a PDF form using both methods. In our scenario, a form filled in with relevant data is uploaded to an MS SharePoint Document Library. Our Flow picks up the uploaded form automatically, extracts the data contained within it, and then adds the data to an MS SharePoint List.

Prerequisites:

Before we begin, please ensure the following prerequisites are in place:

  • Power Automate subscription.
  • Muhimbi PDF Converter for SharePoint free or free trial subscription.
  • Appropriate privileges to create Power Automate Flows.
  • Working knowledge of Power Automate.

Convert XML string to JSON object and Parse it

  1. Create a new Flow using the ‘Automated cloud Flow’ option.

  2. Give your Flow a meaningful name, select the SharePoint ‘When a file is created in a folder’ trigger, and click the Create button.

  3. In the trigger, specify the path to the SharePoint Online Library to monitor for new files.

  4. Add the SharePoint ‘Get file content’ action to the Flow Canvas and configure it with the details below:

  • Site Address: Specify the path to the SharePoint Online site collection which holds the file.
  • File Identifier: ‘x-ms-file-id’, which is an output of the ‘When a file is created in a folder’ trigger.

  5. Add the Muhimbi ‘Convert document’ action to the Flow Canvas and configure it with the details below:
  • Source file name: ‘x-ms-file-name-encoded’, which is an output of the ‘When a file is created in a folder’ trigger.
  • Source file content: ‘File Content’, which is the output of the ‘Get file content’ action.
  • Output Format: XML

  6. Add a ‘Compose’ action to the Flow Canvas (named ‘Compose - base64ToString’ in this example), add the ‘Processed file content’ output from the ‘Convert document’ action, and convert it to a string using the base64ToString() expression.

  7. Add another ‘Compose’ action to the Flow Canvas (named ‘Compose - XML to JSON’ in this example), add the ‘Outputs’ output from the ‘Compose - base64ToString’ action, and convert it to JSON using the following expression: json(xml(outputs('Compose_-_base64ToString')))

  8. Save the Flow and perform a manual test. Upload a supported file type to the folder monitored by the trigger. After a few seconds, you should see the XML output in the ‘Compose - base64ToString’ action and the JSON output in the ‘Compose - XML to JSON’ action.

  9. Add the ‘Parse JSON’ action to the Flow Canvas and configure it with the details below:
  • Content: Add the output of the ‘Compose - XML to JSON’ action.
  • Click the ‘Generate from sample’ button and paste in the JSON output copied from the previous step.

  10. Add the SharePoint ‘Create item’ action; we will pass the expressions to it directly.
  • Site Address: Select the location of the SharePoint list where the list item should be added.
  • List Name: Select the SharePoint list where the list item should be added.

Note: In this step, we add the expressions directly and map them to the SharePoint list.

  11. Publish your Flow and upload a supported file type to the folder monitored by the SharePoint trigger. After a short wait, a new list item should be created in the target SharePoint list.
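
To see what the ‘Compose - base64ToString’ and ‘Compose - XML to JSON’ actions do to the data, it can help to reproduce them outside Power Automate. The following is a minimal Python sketch, not the Flow itself. It assumes a flat XML document with a root element named fields; the FamilyNameTextBox field and both sample values are made up for illustration, and the JSON that Power Automate produces may differ in detail (for example, it can include the XML declaration).

```python
import base64
import json
import xml.etree.ElementTree as ET

# Hypothetical form data; real field names depend on the PDF form.
SAMPLE_XML = (
    "<fields>"
    "<GivenNameTextBox>Jane</GivenNameTextBox>"
    "<FamilyNameTextBox>Doe</FamilyNameTextBox>"
    "</fields>"
)


def decode_base64_xml(encoded: str) -> str:
    """Mimic base64ToString(): decode base64 text into a UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


def xml_to_dict(xml_text: str) -> dict:
    """Roughly mimic json(xml(...)) for a flat document.

    Returns {root name: {child name: child text}}.
    """
    root = ET.fromstring(xml_text)
    return {root.tag: {child.tag: (child.text or "") for child in root}}


# Simulate the base64-encoded content handed over by the conversion step.
encoded = base64.b64encode(SAMPLE_XML.encode("utf-8")).decode("ascii")

xml_text = decode_base64_xml(encoded)
form_data = xml_to_dict(xml_text)
given_name = form_data["fields"]["GivenNameTextBox"]

print(json.dumps(form_data, indent=2))
```

Once the data is in this key-value shape, each form field can be read by name, which is what the ‘Parse JSON’ action makes possible inside the Flow.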

Parsing XML using XPath

Parsing XML using XPath is very similar to the method above. Follow the steps above up to and including Step 8 to obtain the XML output. Once you have the XML output, add the following actions to the Flow.

  1. Extract the text() value from the XML using an xpath() expression of the form xpath('<xml>', '<xpath>').

This returns an array of the XML nodes or values that match the specified XPath expression. In our case, the array contains only one element, so we take the first value using the first() expression:

first(xpath(xml(outputs('Base64_to_XML(String)')),'/fields/GivenNameTextBox/text()'))

Here, 'Base64_to_XML(String)' is the name of the action that holds the XML string; substitute the name of the corresponding action in your own Flow.

  2. Add the SharePoint ‘Create item’ action; we will pass the expressions to it directly.
  • Site Address: Select the location of the SharePoint list where the list item should be added.
  • List Name: Select the SharePoint list where the list item should be added.

Note: In this step, we add the expressions directly and map them to the SharePoint list.

  3. Publish your Flow and upload a supported file type to the folder monitored by the SharePoint trigger. After a short wait, a new list item should be created in the target SharePoint list.
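
The first(xpath(...)) pattern used above can also be tried offline. The following is a minimal Python sketch under stated assumptions: the sample XML and its value are made up, and first_text is a hypothetical helper, not part of any library. Python's built-in ElementTree supports only a subset of XPath and has no text() step, so the sketch selects the element and then reads its text. Paths are relative to the root element, so /fields/GivenNameTextBox becomes ./GivenNameTextBox.

```python
import xml.etree.ElementTree as ET

# Hypothetical form data; real field names depend on the PDF form.
SAMPLE_XML = "<fields><GivenNameTextBox>Jane</GivenNameTextBox></fields>"


def first_text(xml_text: str, path: str):
    """Return the text of the first element matching path, or None if nothing matches."""
    root = ET.fromstring(xml_text)
    matches = root.findall(path)  # list of matching elements, like the xpath() array
    return matches[0].text if matches else None


given_name = first_text(SAMPLE_XML, "./GivenNameTextBox")
missing = first_text(SAMPLE_XML, "./NoSuchField")

print(given_name)
```

Returning None for a missing field is a design choice made for this sketch; in the Flow, you would need to decide separately how to handle a form that lacks an expected field.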

© Muhimbi Ltd. 2008 - 2024