Monday, July 5, 2010

iText - Reading PDF Documents with PdfReader

A PdfReader can be used for a variety of purposes, such as accessing document properties (PDF version, file length), page sizes and rotation, bookmark information and metadata.

The standard constructor for PdfReader looks like this:

PdfReader reader = new PdfReader("Test.pdf");

It is worth noting that when manipulating larger PDF documents there is an alternative, memory-saving constructor that can be used:

PdfReader reader = new PdfReader(new RandomAccessFileOrArray("Test.pdf"), null);

This keeps the initial memory use low because the document is only partially read up front. Memory use then grows as more of the PDF is worked with.

Document properties

The property you will use most often is the number of pages in a document, but you can also query other things such as the file length and whether the document is encrypted.

int pagesInDocument = reader.getNumberOfPages();
int fileLength = reader.getFileLength();
boolean encrypted = reader.isEncrypted();

Notice that here we read in a pdf file saved on disk. It is also possible to read a pdf buffered in memory.

ByteArrayOutputStream bufferedPdf = new ByteArrayOutputStream();
...
//add stuff to the output stream using appropriate writer
...
PdfReader reader = new PdfReader(bufferedPdf.toByteArray()); //read the buffered pdf

This works exactly the same as writing the PDF to disk and reading it back in, but saves having to create and delete a temporary file if you are manipulating the PDF before output.

Page Size and Rotation

PdfReader has methods for retrieving the size and rotation of each page in a document. Page numbers start at 1.

Rectangle pageSize = reader.getPageSize(1); //get the page size for the first page
int pageRotation = reader.getPageRotation(1); // get the rotation of the first page
Rectangle pageSizeWithRotation = reader.getPageSizeWithRotation(1); //takes rotation into account

Let's imagine we have a document with 2 pages in it, both A4 (595x842 points), but the second page is rotated 90 degrees so that it displays as landscape. getPageSize(1) and getPageSize(2) would both return a rectangle with dimensions 595x842. However, getPageSizeWithRotation(1) would return 595x842 and getPageSizeWithRotation(2) would return 842x595: the same rectangle with width and height swapped for landscape.
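To make the width/height swap concrete, here is a minimal sketch in plain Java that mimics the behaviour described above. It does not call iText at all; the class RotatedPageSize and the method withRotation are hypothetical names made up for illustration, and the assumption is simply that rotations of 90 or 270 degrees swap the two dimensions.

```java
// Hypothetical helper, not part of iText: illustrates how a rotation
// changes the effective page dimensions.
final class RotatedPageSize {

    /** Returns {width, height} as seen after applying the rotation (in degrees). */
    static float[] withRotation(float width, float height, int rotationDegrees) {
        // Normalise so that negative values and values >= 360 are handled too.
        int normalized = ((rotationDegrees % 360) + 360) % 360;
        if (normalized == 90 || normalized == 270) {
            return new float[] { height, width }; // dimensions swap
        }
        return new float[] { width, height };     // 0 or 180: unchanged
    }

    public static void main(String[] args) {
        float[] page1 = withRotation(595f, 842f, 0);  // portrait A4
        float[] page2 = withRotation(595f, 842f, 90); // rotated A4
        System.out.println(page1[0] + "x" + page1[1]);
        System.out.println(page2[0] + "x" + page2[1]);
    }
}
```

In real code you would pass in the values from getPageSize(n) and getPageRotation(n), or just call getPageSizeWithRotation(n) and let iText do the work.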

Retrieving Bookmarks

Internal bookmarks are retrieved as a List by passing the PdfReader to SimpleBookmark.

List bookmarkList = SimpleBookmark.getBookmark(reader);
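Each entry in that list is a map describing one bookmark. As a rough sketch, assuming each map holds the bookmark text under the key "Title" and any nested bookmarks as a List under the key "Kids", the tree can be walked recursively like this. The class BookmarkTitles is a hypothetical helper, and the sample data in main is invented so the example runs without a PDF.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of iText: flattens a bookmark tree
// into indented title lines.
final class BookmarkTitles {

    static List<String> flatten(List<? extends Map<String, Object>> bookmarks) {
        List<String> out = new ArrayList<String>();
        collect(bookmarks, 0, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(List<? extends Map<String, Object>> bookmarks,
                                int depth, List<String> out) {
        if (bookmarks == null) {
            return; // tolerate a missing list (e.g. a document without bookmarks)
        }
        for (Map<String, Object> bookmark : bookmarks) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  "); // two spaces per nesting level
            }
            line.append(bookmark.get("Title"));
            out.add(line.toString());
            Object kids = bookmark.get("Kids"); // assumed key for child bookmarks
            if (kids instanceof List) {
                collect((List<Map<String, Object>>) kids, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) {
        // Invented sample data standing in for a real bookmark list.
        Map<String, Object> section = new HashMap<String, Object>();
        section.put("Title", "Section 1.1");
        List<Map<String, Object>> kids = new ArrayList<Map<String, Object>>();
        kids.add(section);
        Map<String, Object> chapter = new HashMap<String, Object>();
        chapter.put("Title", "Chapter 1");
        chapter.put("Kids", kids);
        List<Map<String, Object>> top = new ArrayList<Map<String, Object>>();
        top.add(chapter);
        for (String line : flatten(top)) {
            System.out.println(line);
        }
    }
}
```

With a raw List from an older iText version you would need a cast before calling flatten.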

Metadata

Metadata is stored as key-value pairs in a Map. To access it, we can retrieve this Map and iterate over it.

Map metaDataInfo = reader.getInfo(); // Map of key value pairs for the meta data
String key;
String value;
for (Iterator i = metaDataInfo.keySet().iterator(); i.hasNext();){
    key = (String) i.next(); // e.g. "Title" or "Author"
    value = (String) metaDataInfo.get(key);
}
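If the map is typed as Map<String, String>, the same loop can be written more compactly with entrySet. The sketch below is plain Java with no iText calls; the class InfoDump and the sample entries are hypothetical, and sorting by key is just a choice to make the output predictable.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of iText: formats an info map as "key = value" lines.
final class InfoDump {

    static List<String> describe(Map<String, String> info) {
        List<String> lines = new ArrayList<String>();
        // Copy into a TreeMap so the entries come out sorted by key.
        for (Map.Entry<String, String> entry : new TreeMap<String, String>(info).entrySet()) {
            lines.add(entry.getKey() + " = " + entry.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        // Invented sample data standing in for the result of reader.getInfo().
        Map<String, String> info = new HashMap<String, String>();
        info.put("Title", "Test document");
        info.put("Author", "Someone");
        for (String line : describe(info)) {
            System.out.println(line);
        }
    }
}
```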

You can add to this metadata map using a PdfStamper:
...
//declarations
...
Map info = reader.getInfo();
info.put("Subject", "New Metadata");
stamper.setMoreInfo(info);

In-depth iText information can be found in Bruno Lowagie's excellent book iText in Action: http://www.manning.com/lowagie2/

