One of the key changes introduced in Muhimbi PDF Converter Services v11.0 is the inclusion of key-value pair (KVP) extraction from PDF documents. This feature takes advantage of AI, machine learning (ML), and advanced layout understanding to extract meaningful information from unstructured documents and images.
The Muhimbi Document Converter Diagnostics Tool, installed alongside the service, can help you discover these new features.
This post provides a simple example describing how to take advantage of this new feature programmatically.
Benefits of Key-Value Pair Extraction
Key-value pair extraction offers numerous benefits for businesses and developers:
- Automated Data Extraction: Automate the tedious process of extracting data from documents, reducing manual labor and human error.
- Enhanced Accuracy: Utilize AI and ML technologies to ensure high accuracy in data extraction, even from complex and unstructured documents.
- Save Time: Significantly speed up data processing times by automating extraction tasks that would otherwise take hours to complete manually.
- Versatile Integration: Integrate with various applications and services, enhancing the functionality and efficiency of existing systems.
- Improved Data Handling: Ensure consistent and structured data output, making it easier to handle, analyze, and utilize extracted information.
How to Implement Key-Value Pair Extraction
This tutorial shows how to create a .NET Framework console application and extract key-value pairs from a PDF document.
- Download and install Muhimbi PDF Converter or Muhimbi PDF Converter Services from our website.
- Create a new Console Application project in Visual Studio called KVPExtraction. The actual version of the .NET Framework isn’t important, as Web Services are system-agnostic, meaning they can be used by client applications written in a wide variety of programming languages.
- In the Solution Explorer window, right-click the project and select Add > Service Reference. Set the Address field to http://localhost:41734/Muhimbi.DocumentConverter.WebService/ and click Go. Define your desired namespace (in this case, DocumentConverterService) and click OK. This will generate the required proxy classes to be able to work with the Web Service.
- In your Program.cs file, add the following code:
Sample Input
When the program above is run against the PDF document and the expected keys JSON below, it returns the values found for each expected key or any of its synonyms.
PDF Document
Here’s an example of how the PDF document looks.
Expected Keys
Here are the expected keys:
Sample Output
Note: The expectedKey property from the JSON above is used as the key property in the output.
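To make the relationship between expected keys, synonyms, and output keys concrete, here is a minimal, self-contained C# sketch. It is not the service's matching logic, which relies on AI and layout analysis. It only illustrates the idea that a label found in the document, whether it matches the expected key itself or one of its synonyms, is reported under the expected key. The class names (ExpectedKey, KvpNormalizer), the Synonyms member, and the sample labels and values are all hypothetical and chosen for illustration. Only the expectedKey and key property names come from this post.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one expected-key entry. "Key" stands for the
// expectedKey property; "Synonyms" is an assumed name for its alternatives.
public class ExpectedKey
{
    public string Key;
    public string[] Synonyms;
}

public static class KvpNormalizer
{
    // Builds a case-insensitive lookup from every label (the expected key
    // itself or one of its synonyms) to the expected key.
    public static Dictionary<string, string> BuildLookup(IEnumerable<ExpectedKey> expectedKeys)
    {
        var lookup = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (var entry in expectedKeys)
        {
            lookup[entry.Key] = entry.Key;
            foreach (var synonym in entry.Synonyms ?? new string[0])
            {
                lookup[synonym] = entry.Key;
            }
        }
        return lookup;
    }

    // Reports each found value under its expected key. Labels that match
    // neither an expected key nor a synonym are dropped.
    public static Dictionary<string, string> Normalize(
        IEnumerable<KeyValuePair<string, string>> found,
        IEnumerable<ExpectedKey> expectedKeys)
    {
        var lookup = BuildLookup(expectedKeys);
        var output = new Dictionary<string, string>();
        foreach (var pair in found)
        {
            // Strip surrounding whitespace and a trailing colon from the label.
            string label = pair.Key.Trim().TrimEnd(':', ' ');
            string key;
            if (lookup.TryGetValue(label, out key))
            {
                output[key] = pair.Value;
            }
        }
        return output;
    }
}

public static class Program
{
    public static void Main()
    {
        // Illustrative data only; not taken from the sample document.
        var expected = new[]
        {
            new ExpectedKey { Key = "InvoiceNumber", Synonyms = new[] { "Invoice No", "Invoice #" } },
            new ExpectedKey { Key = "TotalAmount", Synonyms = new[] { "Total", "Amount Due" } }
        };
        var found = new[]
        {
            new KeyValuePair<string, string>("Invoice No:", "INV-001"),
            new KeyValuePair<string, string>("Amount Due", "125.00"),
            new KeyValuePair<string, string>("Page", "1")
        };

        foreach (var pair in KvpNormalizer.Normalize(found, expected))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

In this sketch, the label "Invoice No:" is reported under InvoiceNumber and "Amount Due" under TotalAmount, while "Page" is ignored because it matches no expected key or synonym.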
Conclusion
By leveraging Muhimbi PDF Converter Services’ new key-value pair extraction feature, you can streamline data extraction processes, reduce manual labor, and ensure high accuracy and consistency in your data handling workflows. Whether you’re processing invoices, forms, or any other documents, this feature can greatly enhance your document management system’s efficiency and effectiveness.