Splitting PDF Files Using a SharePoint Workflow and the Muhimbi PDF Converter

Some time ago we introduced a facility in the Muhimbi PDF Converter for SharePoint to merge files using the User Interface, Workflow and Web Service calls. Naturally we have to support the ‘other side of the equation’ as well, resulting in the new PDF Split functionality described in this article.

This post shows how to use a SharePoint Designer Workflow to automatically split an existing PDF file into multiple files containing 10 pages each. This is a common scenario for organisations that deal with massive documents and frequently split these kinds of files into batches of 100 pages to keep them manageable. If your document uses a format other than PDF, make sure you use our Convert to PDF Workflow Activity first.

The SharePoint Designer Workflow Activity is named Split PDF. After adding it to your workflow you will see the following Workflow Sentence.

[Screenshot: Split PDF workflow sentence]

The workflow sentence is consistent with our other Workflow Activities (e.g. Converting / watermarking), and is largely self-describing. The following fields are available:

  • This document: The document to split up. For most workflows selecting Current Item will suffice, but some custom scenarios may require looking up a different item. You may also want to check that the file type of the document is ‘pdf’ before trying to split it up.
  • This file: The name and location of the split files. Leave this field empty to use the same folder and file name as the source file, but with sequential numbers added. Optionally, you can specify a path and / or a file name template.
  • Path: Enter a path, including the Document Library and any folder names, to write the split files to, e.g. “shared documents/split files/”. You can even specify a different site collection by starting the path with a '/' (never start with 'http:'). When specifying just a path, without a file name, make sure to use a trailing ‘/’.
  • File Name: The file name can be anything and allows the standard .NET string formatting facilities for numbering, e.g. 'split-{0:D3}' will use 3 digits for the sequential numbers, starting at ‘split-001.pdf’. When splitting by bookmark, an optional {1} parameter can be inserted in the file name to include the name of the bookmark as well.
  • Number of pages / bookmark level: Specify if you wish to split based on the number of pages or the level of the bookmark.
  • Batch size: When splitting based on the number of pages, this parameter must be set to the maximum number of pages to include in each split file. When splitting based on the bookmark level, this parameter should contain the ‘depth’ at which to split. E.g. specify ‘1’ to split on top level chapters (Chapter 1, Chapter 2, etc.) or a higher number to split at a deeper level (e.g. ‘2’ splits on Chapter 1, 1.1, 1.2, 2, 2.1 etc.).
  • Parameter ‘List ID’: The ID of the list the split files were written to. This can be used later in the workflow to perform additional tasks on the files, such as a check-in or check-out.
  • Parameter ‘List Item IDs’: Unlike our other workflow activities, this parameter will return a string with ‘;’ separated values of the generated item IDs. This list can then be used by other (custom) activities, e.g. the ones created by our Workflow Power Pack, to process the individual files further.
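To make the file name template and batch size settings more concrete, here is a minimal Python sketch. It is not the converter's implementation. It only emulates the small subset of .NET composite formatting mentioned above ({0}, {0:Dn} and {1}) and computes page ranges for a page-count split. The function names expand_template and page_batches, and the 25-page document, are hypothetical and used purely for illustration.

```python
import re

# Matches .NET-style placeholders such as {0}, {0:D3} or {1}.
_PLACEHOLDER = re.compile(r"\{(\d+)(?::([A-Za-z])(\d*))?\}")


def expand_template(template, sequence, bookmark=""):
    """Emulate the subset of .NET composite formatting used in this article.

    {0} is the sequential number, {1} the bookmark name (assumed mapping).
    """
    args = [sequence, bookmark]

    def substitute(match):
        value = args[int(match.group(1))]
        specifier, width = match.group(2), match.group(3)
        if specifier in ("D", "d") and isinstance(value, int):
            # 'D' followed by a number pads with leading zeros to that width.
            return str(value).zfill(int(width or 0))
        return str(value)

    return _PLACEHOLDER.sub(substitute, template)


def page_batches(total_pages, batch_size):
    """Return inclusive (first, last) page ranges of at most batch_size pages."""
    return [
        (first, min(first + batch_size - 1, total_pages))
        for first in range(1, total_pages + 1, batch_size)
    ]


# Hypothetical 25-page document split into batches of 10 pages.
batches = page_batches(25, 10)
names = [expand_template("spf-{0:D5}.pdf", n) for n in range(1, len(batches) + 1)]
print(batches)  # [(1, 10), (11, 20), (21, 25)]
print(names)    # ['spf-00001.pdf', 'spf-00002.pdf', 'spf-00003.pdf']
```

Note that the last batch simply holds the remaining pages, which is why the batch size is described as a maximum.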

A note about splitting based on bookmark levels: PDFs store bookmarks at the page level, so it is not clear on what part of the page a heading starts or ends. As a result an extra page will always be exported for each file split based on bookmark levels.

For example let’s assume the following document:

  • Page 1: Contains chapter 1 and sections 1.1 and 1.2.
  • Page 2: Contains the last paragraph of 1.2 and all of chapter 2.
  • Page 3: Contains Chapter 3.

When splitting this document based on bookmarks using ‘1’ as the batch size then the following files will be created:

  • File 1: Contains pages 1 and 2 as expected.
  • File 2: Contains pages 2 and 3 even though Chapter 2 is only really part of page 2. This is because there is no way to know if Chapter 2 runs over into page 3 or not.
  • File 3: Contains page 3, which holds Chapter 3.
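The page ranges in this example can be reproduced with a short Python sketch. It models the behaviour described above, where each file also includes the page on which the next bookmark starts. It is an assumption about the rule, not the converter's actual algorithm, and the function name bookmark_page_ranges is hypothetical.

```python
def bookmark_page_ranges(bookmark_start_pages, total_pages):
    """Return inclusive (first, last) page ranges, one per bookmark.

    Each range runs up to and including the page where the next bookmark
    starts, because the previous section may continue onto that page.
    """
    ranges = []
    for i, first in enumerate(bookmark_start_pages):
        if i + 1 < len(bookmark_start_pages):
            last = bookmark_start_pages[i + 1]  # the 'extra' page
        else:
            last = total_pages  # the final bookmark runs to the end
        ranges.append((first, last))
    return ranges


# Chapters 1, 2 and 3 start on pages 1, 2 and 3 of a 3-page document.
ranges = bookmark_page_ranges([1, 2, 3], total_pages=3)
print(ranges)  # [(1, 2), (2, 3), (3, 3)]
```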

With all the theory out of the way, let’s create a simple example to split up PDF files in batches of 10 pages.

  1. Download and install the Muhimbi PDF Converter for SharePoint. Version 5.2 or newer is required.

  2. Make sure you have the appropriate privileges to create workflows on a site collection.

  3. Create a new workflow using SharePoint Designer.

  4. Associate the workflow with the library of your choice. Do not tick any of the boxes next to the ‘Automatically start…’ options, as we want to start this workflow manually. If you wish to run this workflow automatically, you may want to add an extra column to determine if a file has been split before, similar to the technique used in this post.

  5. Design the workflow as per the following screen. In summary it does the following:

    • Check if the file is in PDF format. Otherwise it cannot be split.
    • Write the split files to a folder named ‘Split Files’, e.g. “Shared Documents/Split Files/spf-{0:D5}.pdf”, and make sure this folder exists. You can keep our sample file name or merge in the source file’s name using workflow lookups.
    • Log the generated list of Item IDs to the workflow history.

[Screenshot: Split PDF workflow design]

Publish the workflow and create / convert / upload a PDF file in the document library. From the file's context menu select 'Workflows' and run your workflow. Depending on the size of the document the split files will be generated in a matter of seconds.
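If a custom activity or script needs to process the individual split files, the ‘;’ separated ‘List Item IDs’ string logged to the workflow history first has to be broken up into separate IDs. The following Python sketch shows one way to do this. The sample value "12;13;14;" and the function name parse_item_ids are hypothetical, and tolerating a trailing separator is a defensive assumption rather than documented behaviour.

```python
def parse_item_ids(raw):
    """Split a ';'-separated ID string into a list of integers.

    Empty segments (for example after a trailing ';') are skipped.
    """
    return [int(part) for part in raw.split(";") if part.strip()]


# Hypothetical value as it might appear in the workflow history.
ids = parse_item_ids("12;13;14;")
print(ids)  # [12, 13, 14]
```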

Labels: Articles, News, pdf, PDF Converter, Splitting, Workflow
