Getting page orientation and size from PDFs

August 12, 2008 coldfusion

(This post is more than 2 years old.)

I had an interesting conversation today with Josh Knutson. He noticed that the getInfo action of CFPDF did not return either the page orientation (landscape versus portrait). Nor does it say anything about the page size. He needed to put a watermark in a PDF but the position depended on the orientation of the PDF.

I didn't believe him at first, but a quick test showed that he was right. This information is not returned. So we both looked into DDX. Turns out there is a DDX operation called DocumentInformation. This DDX can be used to return information about the PDF. It returns some, but not all, of the same things getInfo returns, but it also returns information that getInfo does not. Included in this information is both a page size (for all the pages, so you could have multiple sizes) and an orientation. (Although they call it rotate90. It will be 0 for portrait and 90 for landscape.) The DDX is fairly simple:


<DocumentInformation result="Out1" source="doc1" />

I had a lot of trouble getting this working though. When I first tried this, I had:


<DocumentInformation result="Out1" source="doc1">
</DocumentInformation>

This kept giving me an invalid DDX error. I knew DocumentInformation was supported. I even did an isXML test on the DDX and it returned true. Yet an immediate call to isDDX always returned false. Then I tried this:


<DocumentInformation result="Out1" source="doc1"></DocumentInformation>

And it worked! For some reason, DDX isn't happy with the line break after the end of the tag, which is odd since it's certainly valid XML. Of course, the shorthand version (../>) works and is the shortest form.

So anyway - I took this code and added it to PDFUtils, my CFC wrapper for complex PDF actions. You can now do this:


<cfset pdf = createObject("component", "pdfutils")>

<cfset mypdf = expandPath("../test.pdf")>
<cfset eInfo = pdf.getExtraInfo(mypdf)>
<cfdump var="#eInfo#">

The result struct will contain all the information returned by the DDX action.