In this guide, you’ll learn how to archive a wide variety of document types in SharePoint by converting them to the PDF/A format. This guide can be used in SharePoint Online or on-premises deployments.
What Does PDF/A Aim to Achieve?
PDF/A aims to produce files with static content that can be precisely reproduced both today and in the future. Files that are subject to long-term archiving should work regardless of the device or operating system used. The future usability of PDF/A files must also be guaranteed in a manufacturer-independent manner, and this includes Adobe. PDF/A is a “complete” format, which means that PDF files that comply with the PDF/A standard are complete on their own and use no external references or non-PDF data. The PDF/A-1 standard is based on PDF specification 1.4, which means that it works within the technical scope of the functions available in Acrobat 5.
A range of rules must be observed when generating PDF/A files to meet these goals. For example, when generating PDF/A files, it’s important to embed all fonts and clearly specify all colors. Forms, comments, and notes are only permitted to a limited extent. Compression is allowed as a general rule, but LZW and JPEG2000 are excluded. Transparent objects and layers (optional content groups) aren’t permitted. PDF/A uses rules for metadata that are based on the Extensible Metadata Platform (XMP). A PDF/A file must also identify itself as such.
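The last rule, that a PDF/A file must identify itself as such, can be illustrated with a minimal Python sketch. It assumes the file carries the standard XMP identification properties `pdfaid:part` and `pdfaid:conformance` and that the XMP packet is stored uncompressed, so a plain byte scan can find it. The function name `claimed_pdfa_level` is made up for this illustration. The sketch only reports what a file claims; it doesn't validate compliance, which still requires a validator.

```python
import re
from typing import Optional

# The XMP identification properties can be serialized as attributes
# (pdfaid:part="1") or as elements (<pdfaid:part>1</pdfaid:part>).
_PART = re.compile(rb'pdfaid:part(?:\s*=\s*["\']|\s*>)\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance(?:\s*=\s*["\']|\s*>)\s*([A-Za-z])')


def claimed_pdfa_level(pdf_bytes: bytes) -> Optional[str]:
    """Return the PDF/A level a file claims (e.g. '1B'), or None if it claims none.

    Hypothetical helper: it scans raw bytes and does not parse the PDF structure.
    """
    part = _PART.search(pdf_bytes)
    conformance = _CONFORMANCE.search(pdf_bytes)
    if part is None or conformance is None:
        return None
    return part.group(1).decode("ascii") + conformance.group(1).decode("ascii").upper()
```

To try it on a converted file, read the file in binary mode and pass the bytes to `claimed_pdfa_level`.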
Please note that to use this functionality, you need the [OCR and PDF/A Archiving] add-on license, in addition to a valid PDF Converter for SharePoint On-Premises or PDF Converter Services license.
Muhimbi’s Interpretation of the PDF/A Standard
There are as many PDF/A validators as there are interpretations of the specification. The validator we use internally for testing purposes is the one that comes with Adobe Acrobat Pro X.
In the screenshot below, you can see the validation results of a PowerPoint file that was converted to PDF by the Muhimbi PDF Converter and subsequently post-processed for output as a PDF/A1b file. As you can see, it validates successfully. The same is true for conversion to PDF/A2b.
In the following screenshot, the same document saved by PowerPoint in the PDF/A format doesn’t validate successfully.
The validator checks many rules, including important ones such as font embedding, but also rules that are perhaps not so important, like the fact that the modification date stored in the PDF must be exactly the same as the one stored in the XMP metadata.
Muhimbi’s range of PDF conversion products — including PDF Converter for SharePoint On-Premises and the PDF Converter Services — has provided some level of PDF/A support since being launched. Support for PDF/A has since been built into our web service’s object model.
Up to version 5.1 of the converter, any request to output the file in PDF/A format was passed on directly to the underlying converter.
The converters that support PDF/A1b (not PDF/A2b) natively are:
- MS Word
- MS Publisher
Starting with version 5.2 of the converter, any PDF file can be post-processed and converted to PDF/A1b.
As of version 7.0, PDF Converter also supports the PDF/A2b standard.
Before you start building the workflow, ensure all prerequisites are in place. This guide assumes you have some knowledge of building workflows using Nintex Workflow.
Install PDF Converter for SharePoint On-Premises version 4.1 or newer.
Install Nintex Workflow.
Activate the Muhimbi.PDFConverter.Nintex.WebApp SharePoint feature using SharePoint Central Administration in the relevant web application.
Make sure you have the appropriate privileges to create workflows.
If you intend to carry out PDF/A post-processing, configure Ghostscript as explained in the next section.
Configuring the Muhimbi PDF Converter to Use PDF/A
Muhimbi PDF Converter relies on third-party software to carry out the PDF/A post-processing step. This software is free to download and install for both individuals and organizations.
Installation of this software is optional and only required if you intend to carry out any PDF/A post-processing. The steps are as follows:
Download the latest AGPL release from the Ghostscript website. Depending on your hardware and operating system, you’ll need to download either the 32- or 64-bit version. Muhimbi has tested PDF/A1b post-processing with versions 9.04 and later. PDF/A2b requires Ghostscript 9.06 or later.
Install Ghostscript in the location of your choice on every server that runs the Muhimbi conversion service. If you accept the default location, Muhimbi PDF Converter will automatically detect Ghostscript’s file path.
Once Ghostscript has been installed, access our web services-based interface from your own software and make sure the PDFA.PostProcessing configuration value is correctly set (details for use with SharePoint can be found further below). Set the web service’s ConversionSettings.PDFProfile property to PDF_A1B, and any converter that doesn’t natively support PDF/A1b output will automatically send the generated PDF file to the post-processor.
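The Ghostscript version requirements in the steps above (9.04 or later for PDF/A1b, 9.06 or later for PDF/A2b) can be checked with a small Python sketch. The helper names and the `MIN_GHOSTSCRIPT` table are assumptions made for this illustration and aren't part of the Muhimbi product. One way to obtain the version string is to run the Ghostscript console executable with `--version`.

```python
import re

# Minimum Ghostscript versions named in this guide, keyed by target profile.
MIN_GHOSTSCRIPT = {"PDF_A1B": (9, 4), "PDF_A2B": (9, 6)}


def parse_version(text: str) -> tuple:
    """Turn a version string such as '9.06' or '10.02.1' into a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def ghostscript_is_recent_enough(version_text: str, profile: str) -> bool:
    """Compare an installed version against the minimum for the given profile."""
    return parse_version(version_text) >= MIN_GHOSTSCRIPT[profile]
```

Comparing tuples of integers avoids the pitfall of comparing version strings as text, where "10.0" would sort before "9.06".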
However, depending on your exact needs, you may want to update a number of settings in the Muhimbi conversion service configuration file:
Open Muhimbi.DocumentConverter.Service.exe.config in your favorite text editor. The file is stored in the same directory where the Muhimbi conversion service is installed. You can find a shortcut to this folder in the Muhimbi Document Converter group in your Start menu.
Change the following settings if needed:
Ghostscript.Path: Leave this value empty to auto-detect the path. When specifying the path manually, include the executable as well.
PDFA.PostProcessing: Not all converters are able to provide native PDF/A1b output. Use this setting to post-process any generated PDF file. Valid values are All (post-process files generated by all converters, including the ones that are supposed to already support PDF/A1b), WhenNeeded (post-process files for only those converters that don’t support native PDF/A1b output), or None (don’t post-process files generated by any converters). None is the default. Please note that these values are only used if the output format is set to PDF_A1B, either in the web service call or via the global configuration.
PDFA.RasterizeTransparentContent: Define how transparent content is dealt with during conversion to PDF/A. The default setting, False, removes all transparency. If you want to retain transparent objects, set this value to True, which will result in pages being rasterized, in turn resulting in considerably larger and slower PDF files.
ConversionSettings.ForcePDFProfile: Override the web service’s ConversionSettings.PDFProfile value during conversion. Leave this empty to use the setting specified in the web service call. Accepted values are members of the Muhimbi.DocumentConverter.WebService.Data.PDFProfile enumeration or an empty string, for example PDF_1_5 (use PDF version 1.5) or PDF_A1B (use the PDF/A standard for long-term archiving).
Restart the Muhimbi conversion service from the Windows Services Management Console.
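Before restarting the service, it can help to sanity-check the edited values. The following Python sketch does so under one loud assumption: that the settings are stored as standard .NET `appSettings` key/value entries, which this guide doesn't confirm. The function names are hypothetical, and the defaults (None and False) are the ones stated above.

```python
import xml.etree.ElementTree as ET

VALID_POST_PROCESSING = {"All", "WhenNeeded", "None"}


def read_app_settings(config_xml: str) -> dict:
    """Collect <add key="..." value="..."/> entries (assumed appSettings layout)."""
    root = ET.fromstring(config_xml)
    return {
        entry.get("key"): entry.get("value", "")
        for entry in root.findall("./appSettings/add")
    }


def check_pdfa_settings(settings: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    mode = settings.get("PDFA.PostProcessing", "None")
    if mode not in VALID_POST_PROCESSING:
        problems.append(f"PDFA.PostProcessing has unexpected value {mode!r}")
    rasterize = settings.get("PDFA.RasterizeTransparentContent", "False")
    if rasterize.lower() not in {"true", "false"}:
        problems.append(f"PDFA.RasterizeTransparentContent has unexpected value {rasterize!r}")
    return problems
```

To use it, read the configuration file as text, pass it to `read_app_settings`, and review whatever `check_pdfa_settings` reports.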
If you don’t interact with Muhimbi’s web services interface directly, but rather use the SharePoint frontend functionality that comes with PDF Converter for SharePoint (for example, workflow activities or manual PDF conversion), then you must set the ForcePDFProfile configuration value to PDF_A1B or PDF_A2B. Note that this is a global switch that forces all functionality provided by our product to output in PDF/A format. This may have side effects when applying PDF security. When using SharePoint workflows, you can also specify the PDF profile on a conversion-by-conversion basis. For more information, see this post.
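The way these settings interact can be summarized in a short Python sketch. It models only what this guide states: a non-empty ForcePDFProfile overrides the profile requested in the web service call, and the PDFA.PostProcessing value decides whether a PDF/A1b result is post-processed. The function names are hypothetical, and this is an illustration of the rules, not the converter's actual implementation.

```python
from typing import Optional


def effective_profile(requested: str, forced: Optional[str]) -> str:
    """A non-empty ForcePDFProfile wins over the value in the web service call."""
    return forced if forced else requested


def needs_post_processing(mode: str, native_pdfa1b: bool) -> bool:
    """Decide whether PDF/A1b output is post-processed.

    mode is the PDFA.PostProcessing value; native_pdfa1b says whether the
    converter produces PDF/A1b natively (MS Word and MS Publisher, per this guide).
    """
    if mode == "All":
        return True
    if mode == "WhenNeeded":
        return not native_pdfa1b
    if mode == "None":
        return False
    raise ValueError(f"unknown PDFA.PostProcessing value: {mode!r}")
```

For example, with the global switch set to PDF_A1B and WhenNeeded selected, a PowerPoint conversion would be post-processed while a Word conversion would not.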
Sample code to convert PDF documents to PDF/A using our web services interface can be found here. If you don’t want to use Ghostscript, or if your organization already uses a different PDF-to-PDF/A converter that you want to integrate with, contact us.
Issues and Limitations
We’re currently aware of the following issue:
- Merged documents — When documents are merged using the Muhimbi PDF Converter and subsequently post-processed for PDF/A output, Acrobat’s validator shows the “CIDSystemInfo and CMap dict not compatible” validation error. It doesn’t happen for all document types, and we’re confident this message doesn’t have any significant side effects.
There are other side effects inherent to the PDF/A1b standard that we currently don’t have a workaround for. For example, because transparency isn’t supported, any documents that use transparent or semi-transparent objects may not look the same as the source document. Similarly, because fonts must be embedded in PDF/A-compliant documents, the resulting PDF file may be larger than expected, although in many cases, you’ll find the files to be smaller than the source files.
Converting to PDF/A Format Using Nintex Workflow
The following example shows how you can use our Convert document action, along with overrides, to convert to PDF/A format. This is a single-step workflow configured to start in a document library when any document is created or updated. At a high level, the workflow will look like what’s shown in the following image.
Add the Convert document action listed under the Muhimbi PDF section in Nintex. You can fill in this section as shown in the image below.
You may want to leave the Destination Path field empty, which will write the PDF file to the same location as the source file. Hover over the small information icons for more information about the Destination Path or any other fields.
You’ll also need to add the following override under the Content section:
This override will make sure that all converted files conform to the PDF/A2b standard. The PDFProfile element supports the following values:
- PDF_A1B — Use the PDF/A1b standard for long-term archiving.
- PDF_A2B — Use the PDF/A2b standard for long-term archiving.
- PDF_A3B — Use the PDF/A3b standard for long-term archiving.
- PDF_1_1 — PDF 1.1 output (compatible with Acrobat 2.0 (1994) and later).
- PDF_1_2 — PDF 1.2 output (compatible with Acrobat 3.0 (1996) and later).
- PDF_1_3 — PDF 1.3 output (compatible with Acrobat 4.0 (2000) and later).
- PDF_1_4 — PDF 1.4 output (compatible with Acrobat 5.0 (2001) and later).
- PDF_1_5 — PDF 1.5 output (compatible with Acrobat 6.0 (2003) and later).
- PDF_1_6 — PDF 1.6 output (compatible with Acrobat 7.0 (2005) and later).
- PDF_1_7 — PDF 1.7 output (compatible with Acrobat 8.0 (2006) and later).
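The values listed above can be guarded against typos before they're pasted into an override. The Python sketch below validates a value against that list and emits just the PDFProfile element. The enclosing elements of the override aren't reproduced in this guide, so they're deliberately left out, and the names `PDF_PROFILES` and `pdf_profile_element` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Values taken from the list above.
PDF_PROFILES = (
    "PDF_A1B", "PDF_A2B", "PDF_A3B",
    "PDF_1_1", "PDF_1_2", "PDF_1_3", "PDF_1_4",
    "PDF_1_5", "PDF_1_6", "PDF_1_7",
)


def pdf_profile_element(profile: str) -> str:
    """Return the PDFProfile element as text, rejecting values not in the list."""
    if profile not in PDF_PROFILES:
        raise ValueError(f"unsupported PDF profile: {profile!r}")
    element = ET.Element("PDFProfile")
    element.text = profile
    return ET.tostring(element, encoding="unicode")
```

Calling `pdf_profile_element("PDF_A2B")` yields the element for the PDF/A2b profile used in this workflow, while a misspelled value such as `PDF_A2` raises an error instead of silently producing an invalid override.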
Click Save and Publish to publish the workflow.
Upload a document in the source location, and you’ll notice that this workflow will successfully convert it to PDF/A format.