Extract Key-Value Pairs from PDFs Using Muhimbi PDF Converter Services

Marija Trpkovic

One of the key changes introduced in Muhimbi PDF Converter Services v11.0 is the inclusion of key-value pair (KVP) extraction from PDF documents. This feature takes advantage of AI, machine learning (ML), and advanced layout understanding to extract meaningful information from unstructured documents and images.

The Muhimbi Document Converter Diagnostics Tool, installed alongside the service, can help you explore these new features.

Key-Value Pair extraction with the Diagnostics Tool

This post provides a simple example describing how to take advantage of this new feature programmatically.

Benefits of Key-Value Pair Extraction

Key-value pair extraction offers numerous benefits for businesses and developers:

Automated Data Extraction: Automate the tedious process of extracting data from documents, reducing manual labor and human error.

Enhanced Accuracy: Utilize AI and ML technologies to ensure high accuracy in data extraction, even from complex and unstructured documents.

Save Time: Significantly speed up data processing times by automating extraction tasks that would otherwise take hours to complete manually.

Versatile Integration: Integrate with various applications and services, enhancing the functionality and efficiency of existing systems.

Improved Data Handling: Ensure consistent and structured data output, making it easier to handle, analyze, and utilize extracted information.

How to Implement Key-Value Pair Extraction

This tutorial shows how to create a .NET Framework console application and extract key-value pairs from a PDF document.

  1. Download and install Muhimbi PDF Converter or Muhimbi PDF Converter Services from our website.

  2. Create a new Console Application project in Visual Studio called KVPExtraction. The actual version of the .NET Framework isn’t important, as Web Services are system-agnostic, meaning they can be used by client applications written in a wide variety of programming languages.

  3. In the Solution Explorer window, right-click the project and select Add > Service Reference. Set the Address field to http://localhost:41734/Muhimbi.DocumentConverter.WebService/ and click Go. Define your desired namespace (in this case, DocumentConverterService) and click OK. This will generate the required proxy classes to be able to work with the Web Service.

Service Reference

  4. In your Program.cs file, add the following code:
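
As a starting point, the connection part of such a program can be sketched as shown here. This is only a sketch under stated assumptions, not a full extraction listing: it builds the WCF binding and endpoint in code using standard System.ServiceModel types, so that message-size limits and timeouts are explicit. The class name ServiceConnection, the 50 MB limit, the 30-minute timeouts, and the Windows authentication settings are assumptions for illustration; the URL is the one entered in step 3.

```csharp
using System;
using System.ServiceModel;

// Hypothetical helper (not part of the Muhimbi API): builds the WCF
// binding and endpoint for the conversion service in code.
static class ServiceConnection
{
    // Address from step 3 of this tutorial.
    const string ServiceUrl =
        "http://localhost:41734/Muhimbi.DocumentConverter.WebService/";

    // Illustrative limit: documents travel as byte arrays, so the
    // 64 KB WCF defaults are usually too small.
    const int MaxMessageBytes = 50 * 1024 * 1024;

    public static BasicHttpBinding CreateBinding()
    {
        var binding = new BasicHttpBinding
        {
            // In buffered mode these two values must match.
            MaxReceivedMessageSize = MaxMessageBytes,
            MaxBufferSize = MaxMessageBytes,
            // Illustrative timeouts for long-running requests.
            SendTimeout = TimeSpan.FromMinutes(30),
            ReceiveTimeout = TimeSpan.FromMinutes(30)
        };
        binding.ReaderQuotas.MaxArrayLength = MaxMessageBytes;
        binding.ReaderQuotas.MaxStringContentLength = MaxMessageBytes;

        // Assumption: the service is secured with Windows authentication
        // over plain HTTP. Adjust or remove if it is configured differently.
        binding.Security.Mode = BasicHttpSecurityMode.TransportCredentialOnly;
        binding.Security.Transport.ClientCredentialType =
            HttpClientCredentialType.Windows;
        return binding;
    }

    public static EndpointAddress CreateEndpoint()
    {
        return new EndpointAddress(new Uri(ServiceUrl));
    }
}
```

The binding and endpoint can then be passed to the client class that Add Service Reference generated in the DocumentConverterService namespace. That class is typically named DocumentConverterServiceClient, but check the generated code for the exact name. The extraction call itself is left out of this sketch; take its name and parameters from the generated proxy rather than from this example.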

Sample Input

When the program above is executed on the PDF document and expected keys JSON below, it retrieves values for the expected keys and their synonyms.

PDF Document

Here’s an example of how the PDF document looks.

Sample input PDF

Expected Keys

Here are the expected keys:

Sample Output

Note: The expectedKey property from the JSON above is used as the key property in the output.

Sample CSV output
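
Once the CSV output is saved, a client application will usually want to look values up by key. The following is a minimal sketch of that step, assuming the file is comma-delimited with a header row. The class KvpCsvReader is hypothetical, and the column names are passed in as parameters because the exact header names should be checked against the actual output. It uses TextFieldParser from the Microsoft.VisualBasic assembly (add a reference to it) so that quoted values containing commas are read correctly.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper: loads key-value CSV output into a dictionary.
static class KvpCsvReader
{
    // keyColumn and valueColumn are the header names to look for;
    // they are assumptions to be matched against the real file.
    public static Dictionary<string, string> Load(
        TextReader csv, string keyColumn, string valueColumn)
    {
        var result = new Dictionary<string, string>(
            StringComparer.OrdinalIgnoreCase);

        using (var parser = new TextFieldParser(csv))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            if (parser.EndOfData) return result; // empty input

            // Locate the two columns by header name.
            string[] header = parser.ReadFields();
            int keyIndex = Array.FindIndex(header, h =>
                string.Equals(h, keyColumn, StringComparison.OrdinalIgnoreCase));
            int valueIndex = Array.FindIndex(header, h =>
                string.Equals(h, valueColumn, StringComparison.OrdinalIgnoreCase));
            if (keyIndex < 0 || valueIndex < 0)
                throw new InvalidDataException("Expected columns not found.");

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                // Skip rows that are too short to hold both columns.
                if (fields.Length <= Math.Max(keyIndex, valueIndex)) continue;
                // A repeated key overwrites the earlier value.
                result[fields[keyIndex]] = fields[valueIndex];
            }
        }
        return result;
    }
}
```

A call such as KvpCsvReader.Load(File.OpenText(path), "key", "value") would then return a case-insensitive lookup keyed by the expected keys. Here path and the "value" column name are placeholders.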

Conclusion

By leveraging Muhimbi PDF Converter Services’ new key-value pair extraction feature, you can streamline data extraction processes, reduce manual labor, and ensure high accuracy and consistency in your data handling workflows. Whether you’re processing invoices, forms, or any other documents, this feature can greatly enhance your document management system’s efficiency and effectiveness.

Author

Marija Trpkovic

Product Marketing Manager

Marija is a product marketing manager who likes to launch new products and features and target the right people with them. Outside of work, she likes spending time outdoors with her family and dogs.
