The standard constructor for PdfReader looks like this:
PdfReader reader = new PdfReader("Test.pdf");
When manipulating larger PDF documents, there is an alternative, memory-saving constructor that can be used:
PdfReader reader = new PdfReader(new RandomAccessFileOrArray("Test.pdf"), null);
With this constructor less memory is used initially; memory use then increases as the PDF is worked with.
Document properties
The main property you will be interested in is the number of pages in a document, but you can also query other things such as the file length and whether the document is encrypted.
int pagesInDocument = reader.getNumberOfPages();
int fileLength = reader.getFileLength();
boolean encrypted = reader.isEncrypted();
Notice that here we read in a pdf file saved on disk. It is also possible to read a pdf buffered in memory.
ByteArrayOutputStream bufferedPdf;
...
//add stuff to the output stream using appropriate writer
...
PdfReader reader = new PdfReader(bufferedPdf.toByteArray()); //read the buffered pdf
This works exactly the same as writing it to disk and reading it in, but saves having to create and delete a file if you were manipulating the pdf before output.
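The buffering step itself can be sketched with the standard library alone. This is only an illustration: no iText classes are used, the placeholder bytes stand in for real PDF output, and the `InMemoryBuffer` class with its `produce` and `looksLikePdf` methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in, not part of iText: captures output in memory
// and hands the resulting byte array to a consumer.
final class InMemoryBuffer {

    // Writes placeholder content into a memory buffer and returns the bytes,
    // the same way a PDF writer's output would be captured.
    static byte[] produce() {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] content = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        buffer.write(content, 0, content.length);
        return buffer.toByteArray();
    }

    // Checks that the bytes start with the "%PDF-" marker that opens a PDF file.
    static boolean looksLikePdf(byte[] data) {
        byte[] marker = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (data.length < marker.length) {
            return false;
        }
        for (int i = 0; i < marker.length; i++) {
            if (data[i] != marker[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] bytes = produce();
        System.out.println(looksLikePdf(bytes));
    }
}
```

In real use, the byte array returned by `toByteArray()` would be passed straight to the PdfReader constructor instead of being inspected by hand.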
Page Size and Rotation
There are various methods for retrieving interesting information about pdf pages within a document.
Rectangle pageSize = reader.getPageSize(1); //get the page size for the first page
int pageRotation = reader.getPageRotation(1); // get the rotation of the first page
Rectangle pageSizeWithRotation = reader.getPageSizeWithRotation(1); //takes rotation into account
Let's imagine we have a document with 2 pages in it, both A4 (595x842) but the second page is landscape. getPageSize(1) and getPageSize(2) would both return a rectangle with dimensions 595x842. However, getPageSizeWithRotation(1) would return 595x842 and getPageSizeWithRotation(2) would return 842x595 - the same as before but rotated 90 degrees for landscape.
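The relationship between the two methods can be sketched in plain Java. This is only an illustration of the idea, not iText's implementation: the `PageGeometry` class and its `sizeWithRotation` helper are hypothetical names, and the sketch assumes the rotation is always a multiple of 90 degrees.

```java
// Hypothetical helper, not part of iText: shows how a page rotation
// changes the effective width and height of a page.
final class PageGeometry {

    // Returns {width, height} after applying the rotation (in degrees).
    // Assumes the rotation is a multiple of 90.
    static float[] sizeWithRotation(float width, float height, int rotation) {
        // Normalize so that negative or large values map into 0..359.
        int normalized = ((rotation % 360) + 360) % 360;
        if (normalized == 90 || normalized == 270) {
            // A quarter turn swaps the two dimensions.
            return new float[] { height, width };
        }
        return new float[] { width, height };
    }

    public static void main(String[] args) {
        float[] portrait = sizeWithRotation(595f, 842f, 0);
        float[] landscape = sizeWithRotation(595f, 842f, 90);
        System.out.println(portrait[0] + "x" + portrait[1]);
        System.out.println(landscape[0] + "x" + landscape[1]);
    }
}
```

With the A4 example above, the unrotated call keeps the dimensions as they are, while the 90 degree call swaps them.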
Retrieving Bookmarks
Internal bookmarks are retrieved as a List by passing the PdfReader to SimpleBookmark.
List bookmarkList = SimpleBookmark.getBookmark(reader);
Metadata
Metadata is stored as key-value pairs in a map. To access it, we can retrieve this Map and iterate over it.
Map info = reader.getInfo(); // Map of key-value pairs for the metadata
String key;
String value;
for (Iterator i = info.keySet().iterator(); i.hasNext();) {
    key = (String) i.next();
    value = (String) info.get(key);
}
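On Java 5 or later, the same loop can be written with generics and a for-each loop, which removes the casts. The sketch below uses a plain `Map<String, String>` as a stand-in for the map returned by `reader.getInfo()`; whether that method returns a typed map depends on the iText version, so treat the types here as an assumption. The `MetadataDump` class and its `describe` method are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of iText: walks a metadata map
// without an explicit Iterator or casts.
final class MetadataDump {

    // Formats each key-value pair on its own line as "key = value".
    static String describe(Map<String, String> info) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : info.entrySet()) {
            out.append(entry.getKey())
               .append(" = ")
               .append(entry.getValue())
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in data; with iText this map would come from reader.getInfo().
        Map<String, String> info = new LinkedHashMap<String, String>();
        info.put("Title", "Test");
        info.put("Author", "Someone");
        System.out.print(describe(info));
    }
}
```

Iterating over `entrySet()` also avoids the second lookup that `info.get(key)` performs for every key.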
You can add to this metadata map using a PdfStamper:
...
//declarations
...
Map info = reader.getInfo();
info.put("Subject", "New Metadata");
stamper.setMoreInfo(info);
In-depth iText information can be found in Bruno Lowagie's excellent book iText In Action: http://www.manning.com/lowagie2/