Getting page orientation and size from PDFs

I had an interesting conversation today with Josh Knutson. He noticed that the getInfo action of CFPDF returns neither the page orientation (landscape versus portrait) nor the page size. He needed to add a watermark to a PDF, but the position depended on the orientation of the PDF.

I didn't believe him at first, but a quick test showed that he was right. This information is not returned. So we both looked into DDX. It turns out there is a DDX element called DocumentInformation that returns information about the PDF. It returns some, but not all, of the same things getInfo returns, but it also returns information that getInfo does not. This includes both a page size (reported per page, so one document can have multiple sizes) and an orientation. (DDX calls it rotate90, though. It will be 0 for portrait and 90 for landscape.) The DDX is fairly simple:


<DocumentInformation result="Out1" source="doc1" />
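For context, a fragment like the one above has to sit inside a full DDX document and be passed to cfpdf with action="processddx". Here is a minimal sketch of that, assuming a test.pdf next to the template and a scratch output file named docinfo.xml (both names are made up for illustration). The struct keys doc1 and Out1 match the source and result attributes in the DDX.

```cfml
<!--- Build the DDX as a string. The trim() call matters: the XML declaration must be the very first thing in the string. --->
<cfsavecontent variable="ddx"><?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<DocumentInformation result="Out1" source="doc1" />
</DDX>
</cfsavecontent>
<cfset ddx = trim(ddx)>

<!--- Map the DDX names to real files. test.pdf and docinfo.xml are hypothetical paths. --->
<cfset inputFiles = structNew()>
<cfset inputFiles["doc1"] = expandPath("./test.pdf")>
<cfset outputFiles = structNew()>
<cfset outputFiles["Out1"] = expandPath("./docinfo.xml")>

<!--- Run the DDX. ddxResult reports success or failure for each output. --->
<cfpdf action="processddx" ddxfile="#ddx#" inputfiles="#inputFiles#" outputfiles="#outputFiles#" name="ddxResult">
<cfdump var="#ddxResult#">

<!--- DocumentInformation writes XML rather than a PDF, so read it back and parse it. --->
<cffile action="read" file="#outputFiles['Out1']#" variable="docInfoXml">
<cfdump var="#xmlParse(docInfoXml)#">
```

Dumping the parsed XML is the easiest way to see exactly which elements carry the page size and rotate90 values on your server.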

I had a lot of trouble getting this working though. When I first tried this, I had:


<DocumentInformation result="Out1" source="doc1">
</DocumentInformation>

This kept giving me an invalid DDX error. I knew DocumentInformation was supported. I even did an isXML test on the DDX and it returned true. Yet an immediate call to isDDX always returned false. Then I tried this:


<DocumentInformation result="Out1" source="doc1"></DocumentInformation>

And it worked! For some reason, DDX isn't happy with the line break between the opening and closing tags, which is odd since it's certainly valid XML. Of course, the self-closing version (ending in />) works too and is the shortest form.
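If you want to reproduce the comparison yourself, something like the following sketch should do it. It wraps each of the three variants in the same DDX root element and prints what isXML and isDDX say about each one. The variable names (ddxHead, ddxTail, variants) are my own, and the results may differ depending on your ColdFusion version, so treat it as a test harness rather than a statement of what you will see.

```cfml
<!--- Shared wrapper so that only the DocumentInformation element differs between tests. --->
<cfset ddxHead = '<?xml version="1.0" encoding="UTF-8"?><DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">'>
<cfset ddxTail = '</DDX>'>

<!--- The three forms discussed above: self-closing, open/close on one line, open/close split by a line break. --->
<cfset variants = arrayNew(1)>
<cfset arrayAppend(variants, '<DocumentInformation result="Out1" source="doc1" />')>
<cfset arrayAppend(variants, '<DocumentInformation result="Out1" source="doc1"></DocumentInformation>')>
<cfset arrayAppend(variants, '<DocumentInformation result="Out1" source="doc1">' & chr(10) & '</DocumentInformation>')>

<!--- Report both checks for each variant. --->
<cfloop from="1" to="#arrayLen(variants)#" index="i">
	<cfset candidate = ddxHead & variants[i] & ddxTail>
	<cfoutput>Variant #i#: isXML=#isXML(candidate)#, isDDX=#isDDX(candidate)#<br></cfoutput>
</cfloop>
```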

So anyway - I took this code and added it to PDFUtils, my CFC wrapper for complex PDF actions. You can now do this:


<cfset pdf = createObject("component", "pdfutils")>

<cfset mypdf = expandPath("../test.pdf")>
<cfset eInfo = pdf.getExtraInfo(mypdf)>
<cfdump var="#eInfo#">

The result struct will contain all the information returned by the DDX action.
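To tie this back to Josh's original problem, here is a rough sketch of picking a watermark position based on orientation. I'm assuming you have already pulled the rotate90 value for the page out of the dumped struct into a variable named rotation. The exact key names depend on what the dump shows, so that part is left as a placeholder. The image path, output path, and coordinates are also hypothetical.

```cfml
<cfset mypdf = expandPath("../test.pdf")>

<!--- ASSUMPTION: rotation holds the rotate90 value (0 or 90) taken from the getExtraInfo result. --->
<!--- It is hard coded here because the struct layout is not shown in this post. --->
<cfset rotation = 0>

<!--- Placeholder coordinates: adjust to wherever the watermark should sit for each orientation. --->
<cfif rotation eq 90>
	<cfset wmPosition = "300,50">
<cfelse>
	<cfset wmPosition = "50,300">
</cfif>

<!--- watermark.png and test_marked.pdf are made-up file names. --->
<cfpdf action="addwatermark"
	source="#mypdf#"
	image="#expandPath('./watermark.png')#"
	destination="#expandPath('./test_marked.pdf')#"
	position="#wmPosition#"
	overwrite="yes">
```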

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Gary Gilbert posted on 8/13/2008 at 12:30 PM

Ray,
I think you mean

&lt;DocumentInformation result="Out1" source="doc1"&gt;&lt;/DocumentInformation&gt;

In your example you are missing the closing &gt;

Comment 2 by Gary Gilbert posted on 8/13/2008 at 12:31 PM

Hi Ray

<DocumentInformation result="Out1" source="doc1"></DocumentInformation>

You are missing the closing >

Please feel free to delete the first comment

Comment 3 by Raymond Camden posted on 8/13/2008 at 3:45 PM

Fixed. Thanks.

Comment 4 by phill.nacelli posted on 8/13/2008 at 5:50 PM

Another method is to use the already installed iText library that the cfpdf tag uses to get more information about your pdf file:

<cfscript>
// absolute path to an existing pdf file
pdfFilePath = expandPath("existingPdfFile.pdf");

// open the file for reading; PdfReader does not modify it
// (opening a FileOutputStream on the same path would overwrite the existing file)
pdfReader = createObject("java","com.lowagie.text.pdf.PdfReader").init(pdfFilePath);

// output page size info for each page; format is: baseXheight (rot: n degrees)
for (i = 1; i lte pdfReader.getNumberOfPages(); i = i + 1) {
	pageSize = pdfReader.getPageSizeWithRotation(javaCast("int", i));
	writeOutput("Page #i#: " & pageSize.toString() & "<br>");
}

// release the file handle
pdfReader.close();
</cfscript>

Comment 5 by Josh Knutson posted on 8/13/2008 at 5:55 PM

I still can't believe how long that took to actually get it working just right to return such simple values. I am more surprised that the getInfo from cfpdf didn't return anything that would have helped out.

Comment 6 by Sam Farmer posted on 8/14/2008 at 1:51 AM

Awesome find.

Much easier than my other solution: convert first page to jpeg at 100%, read image and get image width and height!