How to Split PDF Pages & Files using C#

To support the new PDF splitting facility in our PDF Converter for SharePoint, we have added the ability to split a single file into multiple files to our core PDF Conversion engine, which the SharePoint product shares with our generic Java / .NET oriented PDF Converter Services.

In this post we’ll describe in detail how to invoke this new splitting facility from your own code. This demo uses C# and .NET, but the web services based interface is identical when used from Java (see this generic PDF Conversion sample).

This post is part of a series about manipulating PDF files using web services.

Key Features

The key features of the new splitting facility are as follows:

  1. Split a single PDF file into one or more individual PDF files.
  2. Split based on number of pages or bookmarks.
  3. Automatically generate numbered file names using .NET’s formatting syntax, e.g. 'split-{0:D3}.pdf' will use 3 digits for the sequential numbers, starting at ‘split-001.pdf’. When splitting by bookmark, an optional {1} parameter can be inserted in the file name to include the name of the bookmark as well.
  4. Can be combined with other actions, e.g. convert & merge.
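The file name template in item 3 behaves like a regular .NET composite format string. The following is a minimal sketch that expands such a template locally with string.Format, using the standard 'D3' specifier (decimal, zero-padded to three digits). The class and method names are made up for illustration, and the assumption that the service expands templates exactly like string.Format is ours.

```csharp
using System;

public static class FileNameTemplateDemo
{
    // Expands a template the way string.Format does:
    // {0} is the sequence number, {1} the (optional) bookmark name.
    // Assumption: the conversion service expands templates in the same way.
    public static string BuildName(string template, int sequence, string bookmark)
    {
        return string.Format(template, sequence, bookmark);
    }

    public static void Main()
    {
        // prints "split-001.pdf"
        Console.WriteLine(BuildName("split-{0:D3}.pdf", 1, null));
        // prints "split-012.pdf"
        Console.WriteLine(BuildName("split-{0:D3}.pdf", 12, null));
        // prints "split-002-Chapter 2.pdf"
        Console.WriteLine(BuildName("split-{0:D3}-{1}.pdf", 2, "Chapter 2"));
    }
}
```

Bookmark names can contain characters that are not valid in file names, so it may be worth checking the names that come back before writing them to disk.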

A note about splitting based on bookmark levels: PDFs store bookmarks at the page level, so it is not possible to tell where on a page a heading starts or ends. As a result, an extra page is always exported for each file when splitting by bookmark level.

For example, let’s assume the following document:

  • Page 1: Contains chapter 1 and sections 1.1. and 1.2.
  • Page 2: Contains the last paragraph of 1.2 and all of chapter 2.
  • Page 3: Contains Chapter 3.

When splitting this document based on bookmarks with a batch size of 1, the following files are created:

  • File 1: Contains pages 1 and 2 as expected.
  • File 2: Contains pages 2 and 3 even though Chapter 2 is only really part of page 2. This is because there is no way to know if Chapter 2 runs over into page 3 or not.
  • File 3: Contains Chapter 3.
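The page ranges in this example can be reproduced with a few lines of code. The sketch below is only a model of the behaviour described above, not the engine's actual algorithm: it assumes a batch size of 1 and that each file runs from the page its bookmark starts on up to and including the page the next bookmark starts on. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class BookmarkSplitSketch
{
    // Returns one { firstPage, lastPage } pair per bookmark. Each range runs up to
    // and including the page on which the next bookmark starts (the "extra" page);
    // the last range runs to the end of the document.
    public static List<int[]> PageRanges(IList<int> bookmarkStartPages, int totalPages)
    {
        var ranges = new List<int[]>();
        for (int i = 0; i < bookmarkStartPages.Count; i++)
        {
            int first = bookmarkStartPages[i];
            int last = (i + 1 < bookmarkStartPages.Count)
                ? bookmarkStartPages[i + 1]
                : totalPages;
            ranges.Add(new[] { first, last });
        }
        return ranges;
    }

    public static void Main()
    {
        // Chapters 1, 2 and 3 start on pages 1, 2 and 3 of a 3-page document.
        List<int[]> ranges = PageRanges(new[] { 1, 2, 3 }, 3);
        for (int i = 0; i < ranges.Count; i++)
        {
            Console.WriteLine("File {0}: pages {1}-{2}", i + 1, ranges[i][0], ranges[i][1]);
        }
    }
}
```

For the sample document this prints pages 1-2, 2-3 and 3-3, matching the three files listed above.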

Object Model

The object model is relatively straightforward. The classes related to PDF Splitting are displayed below. The various classes also use a number of enumerations, which can be found in our original post about Converting files using the Web Services interface.

[Figure: class diagram of the classes related to PDF Splitting]

The Web Service method that controls splitting (as well as merging) of files is called ProcessBatch. It accepts a ProcessingOptions object that holds all information about the files to process and the operations to apply. It returns a Results object that, when splitting files, contains one or more results, each holding the contents of a file as well as a suggested output file name, which you may use to save the file locally.
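Saving the returned files locally could look roughly like the sketch below. The SplitResultStandIn class is a hypothetical stand-in for a single returned result: the real type and its member names come from the generated proxy classes and may differ. Only the file handling itself uses standard .NET calls.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical stand-in for one entry of the returned results. The real
// proxy type and member names are defined by the generated service reference.
public class SplitResultStandIn
{
    public string FileName;   // suggested output file name
    public byte[] Content;    // contents of the split file
}

public static class SaveResultsSketch
{
    // Writes every result to targetFolder and returns the paths that were written.
    public static List<string> SaveAll(IEnumerable<SplitResultStandIn> results, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder);
        var written = new List<string>();
        foreach (SplitResultStandIn result in results)
        {
            // Keep only the file name part so a result cannot end up outside the target folder.
            string path = Path.Combine(targetFolder, Path.GetFileName(result.FileName));
            File.WriteAllBytes(path, result.Content);
            written.Add(path);
        }
        return written;
    }

    public static void Main()
    {
        // Dummy data; in a real program these entries come from the web service call.
        var results = new[]
        {
            new SplitResultStandIn { FileName = "split-001.pdf", Content = new byte[] { 0x25, 0x50, 0x44, 0x46 } },
            new SplitResultStandIn { FileName = "split-002.pdf", Content = new byte[] { 0x25, 0x50, 0x44, 0x46 } }
        };
        string folder = Path.Combine(Path.GetTempPath(), "split-demo");
        foreach (string path in SaveAll(results, folder))
        {
            Console.WriteLine("Wrote " + path);
        }
    }
}
```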

As the ProcessingOptions class accepts both MergeSettings and SplitOptions it is possible to convert and merge a set of input files and then split up the results, all in a single web service call. Just populate the various properties and the system will take care of the rest.

Example code

The following sample describes the steps needed to split up a single PDF file based on the number of pages. We are using Visual Studio and C#, but any environment that can invoke web services should be able to access this functionality. Note that the WSDL can be found at http://localhost:41734/Muhimbi.DocumentConverter.WebService/?wsdl.

A generic PDF Conversion Java based example is installed alongside the product and discussed in the User & Developer Guide. The source code for this example can be found in the folder the Muhimbi Conversion service has been installed to.

  1. Start a new Visual Studio project and create the project type of your choice. In this example we are using a standard .NET 3.0 project of type Console Application. Name it ‘Split PDF’.
  2. In the Solution Explorer window, right-click References and select Add Service Reference. (Do not use web references!)
  3. In the Address box enter the WSDL address listed in the introduction of this section. If the Conversion Service is located on a different machine then substitute localhost with the server’s name.
  4. Accept the default Namespace of ServiceReference1 and click the OK button to generate the proxy classes.
  5. Optionally add a PDF file to the solution, set the Build Action to None and Copy to Output Directory to Copy if newer. By doing this there will always be a valid test file in the same directory as the compiled executable.
  6. Copy and paste the following code and replace the contents of Program.cs.

Compile the application and run it from the command prompt, passing the path of the PDF file to split on the command line. Alternatively, if a PDF file is present in the executable’s folder, just run it without arguments.
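The choice between a command line argument and a PDF file in the executable’s folder can be sketched as follows. This is an illustration of the behaviour described above rather than the original sample code; the class and method names are hypothetical, and picking the alphabetically first PDF is our own assumption.

```csharp
using System;
using System.IO;

public static class SourceFileSketch
{
    // Returns the path given on the command line, or else the alphabetically
    // first PDF found in fallbackFolder. Returns null when neither is available.
    public static string Resolve(string[] args, string fallbackFolder)
    {
        if (args != null && args.Length > 0)
        {
            return args[0];
        }
        string[] candidates = Directory.GetFiles(fallbackFolder, "*.pdf");
        Array.Sort(candidates, StringComparer.OrdinalIgnoreCase);
        return candidates.Length > 0 ? candidates[0] : null;
    }

    public static void Main(string[] args)
    {
        // BaseDirectory is the folder that contains the compiled executable.
        string source = Resolve(args, AppDomain.CurrentDomain.BaseDirectory);
        if (source == null)
        {
            Console.WriteLine("No PDF file specified and none found next to the executable.");
            return;
        }
        // The web service expects the file contents, so read the file into a byte array.
        byte[] bytes = File.ReadAllBytes(source);
        Console.WriteLine("Read {0} bytes from {1}", bytes.Length, source);
    }
}
```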

Note that in this example we are programmatically configuring the WCF Bindings and End Points. If you wish, you can use the declarative approach using the config file as well.
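A programmatic configuration along these lines might look like the following sketch. It only uses standard System.ServiceModel classes. The timeout and size values are assumptions chosen for illustration, the endpoint URL is assumed to be the WSDL address without the ?wsdl suffix, and security settings are left at their defaults, which may not match your installation.

```csharp
using System;
using System.ServiceModel;

public static class BindingSketch
{
    // Builds a binding suitable for sending and receiving complete PDF files.
    public static BasicHttpBinding CreateBinding()
    {
        var binding = new BasicHttpBinding();
        // Allow for long running requests (30 minutes is an assumed value).
        binding.SendTimeout = TimeSpan.FromMinutes(30);
        binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
        // Raise the default 64 KB message limit so whole documents fit (50 MB is an assumed value).
        binding.MaxReceivedMessageSize = 50 * 1024 * 1024;
        binding.ReaderQuotas.MaxArrayLength = 50 * 1024 * 1024;
        binding.ReaderQuotas.MaxStringContentLength = 50 * 1024 * 1024;
        return binding;
    }

    public static EndpointAddress CreateEndpoint(string url)
    {
        return new EndpointAddress(new Uri(url));
    }

    public static void Main()
    {
        BasicHttpBinding binding = CreateBinding();
        // Assumption: the service endpoint is the WSDL address without "?wsdl".
        EndpointAddress endpoint = CreateEndpoint(
            "http://localhost:41734/Muhimbi.DocumentConverter.WebService/");
        Console.WriteLine("Endpoint: " + endpoint.Uri);
        Console.WriteLine("Max message size: " + binding.MaxReceivedMessageSize);
        // WCF clients generated by Add Service Reference normally offer a
        // constructor that accepts a binding and an endpoint address.
    }
}
```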

This new functionality is available as of version 5.2 of our software.
