Yesterday I blogged about new PDF functions added in ColdFusion 8: isPDFFile and isPDFObject. Today I'm going to continue my discussion of the new PDF tools in ColdFusion 8 by introducing the CFPDF tag. This is one of the five new PDF-related tags added to ColdFusion 8. This one tag can do many things:
- It can add a watermark to a PDF or remove one.
- It can remove pages from a PDF. (Ever wanted to remove the legal crap from in front of a PDF? Or an ad?)
- It can return information about a PDF.
- It can merge multiple PDFs into one.
- It can add/remove security from a PDF.
- It can read a PDF. (Duh.)
- It can set metadata on a PDF.
- It can create thumbnails from a PDF.
- It can write out to a PDF.
Let's start off with a simple example of reading a PDF:
<cfif isPDFFile("book.pdf")>
<cfpdf action="read" source="book.pdf" name="mypdf">
<cfdump var="#mypdf#">
</cfif>
I begin by checking to see if a file is a proper PDF. If it is, I then use the CFPDF tag to read the PDF into a variable named mypdf. At that point I can dump the PDF and see information about it. By the way, the same trick (reading and dumping) works for images as well.
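For comparison, here is a minimal sketch of the same read-and-dump approach applied to an image. It assumes a hypothetical file named photo.jpg sitting in the same folder as the template, and uses the isImageFile function and the CFIMAGE tag that also shipped with ColdFusion 8:

```cfml
<!--- Assumption: photo.jpg is a hypothetical image stored next to this template. --->
<cfset imagePath = expandPath("./photo.jpg")>
<!--- Only read the file if it exists and ColdFusion recognizes it as an image. --->
<cfif fileExists(imagePath) and isImageFile(imagePath)>
    <cfimage action="read" source="#imagePath#" name="myimage">
    <!--- Dumping the image variable shows details about the image. --->
    <cfdump var="#myimage#">
</cfif>
```

The fileExists check is there only so the sketch fails quietly when the placeholder file is missing.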

I've displayed the dump to the left, and you can see it reveals quite a bit of information about my PDF. The PDF I used was one made from scratch using CFDOCUMENT, so some things like Author and Keywords are blank. But it did pick up the page size and security settings. It is too bad that CFDOCUMENT doesn't easily allow us to set the metadata, but guess what? We can use the CFPDF tag to correct that!
The setInfo command lets you pass in a struct of information. You can change the author, the subject, the title, and the keywords for a PDF. Let's look at a simple example:
<cfif isPDFFile("book.pdf")>
<cfset data = {author="Raymond Camden", Subject="Paris Hilton", Title="The Wit and Wison of Paris Hilton", KeyWords="paris hilton,wisdom,wit"}>
<cfpdf action="setinfo" source="book.pdf" info="#data#">
<cfpdf action="getinfo" source="book.pdf" name="mdata">
<cfdump var="#mdata#">
</cfif>
I first create a simple struct of data. I then pass this struct to the CFPDF tag, noting the action of setinfo, the source for my PDF, and the struct of data. I then use getInfo to get the information back, and dump it. Now my PDF created from CFDOCUMENT has proper metadata in it.
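If you would rather not modify the original file, one option is to write the updated PDF to a separate file. The following is a rough sketch, not a definitive recipe: it assumes book.pdf sits next to the template, uses a made-up output name (book-tagged.pdf) and a placeholder title, and relies on the destination and overwrite attributes of CFPDF. It then reads back individual keys instead of dumping the whole struct, guarding each one in case it is missing:

```cfml
<!--- Assumptions: book.pdf sits beside this template; book-tagged.pdf is a hypothetical output name. --->
<cfset sourcePath = expandPath("./book.pdf")>
<cfset destPath = expandPath("./book-tagged.pdf")>
<cfif isPDFFile(sourcePath)>
    <!--- "My Book" is a placeholder title used only for illustration. --->
    <cfset data = {Author="Raymond Camden", Title="My Book"}>
    <!--- Write the updated copy to a new file and leave the original untouched. --->
    <cfpdf action="setinfo" source="#sourcePath#" info="#data#" destination="#destPath#" overwrite="yes">
    <!--- Read the metadata back from the new file. --->
    <cfpdf action="getinfo" source="#destPath#" name="mdata">
    <cfoutput>
        <!--- Check each key before using it rather than assuming it is present. --->
        <cfif structKeyExists(mdata, "Author")>Author: #htmlEditFormat(mdata.Author)#<br></cfif>
        <cfif structKeyExists(mdata, "Title")>Title: #htmlEditFormat(mdata.Title)#<br></cfif>
    </cfoutput>
</cfif>
```

Keeping the source and destination separate makes it easy to compare the two files when the metadata does not look right in a viewer.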
Tomorrow I'll demonstrate adding and removing watermarks from your PDF documents.
Archived Comments
It disturbs me that when you type about Paris Hilton your spelling error chance increases by 1000%.
Look at the Title of your PDF, Wison huh?
"The Wit and Wisdom of Paris Hilton"
Wouldn't that be an empty document?
Well, it *might* have a watermark in it. Tune in tomorrow to see.
I am trying to change the Title of some PDFs (which already have titles) and I am not having any luck. It seems like setinfo does not override current settings.
The strange thing, though, is that when I view the CFDUMP everything is correct, but not when I open the PDF in Acrobat and view the document properties.
Any Ideas?
Did you _save_ the PDF object after you did setInfo?
Ray - just wondering if you knew a way to copy the content of a PDF that has ContentExtraction=NotAllowed and CopyContent=Allowed? Basically, I would like to use CFPDF to programmatically copy the content of the PDF to a text file so I could then parse it into normalized data. I can copy and paste manually from the PDF, just not sure if I can use CFPDF or some other technique to do this in an automated way. Any advice or suggestions are greatly appreciated.
Steve, search my blog for my pdfutils.cfc. It has a utility which uses DDX to get the text from a PDF. You can't do it directly with CFPDF, but you can with CFPDF and DDX.
Thanks Ray - I'll check that out.
After I do a SETINFO with CFPDF to update keywords, author and subject, the updated data is visible IN ACROBAT 5 but it does not display in newer versions of Acrobat or Acrobat Reader when you go to view document properties.
How can I correct this?
Thanks,
Jim
Not sure. I had no problem with this at all. Maybe you can share your code.
I set up a structure and enter the title, author, keywords, and subject. Then it is a simple cfpdf setinfo tag. The changed info shows up in Adobe reader 5, but not in newer versions, the older information still shows up. Is there a change in field names maybe?
<cfpdf action="SETINFO" source="#fldir##flnm#" info="#PDFInfo#" destination="#fldir##flnm#" overwrite="Yes" >
Hello Ray - just came across your article on CFPDF and I'm curious. Can CFPDF read a .pdf file that is saved as an image? We recently had a need to read a .pdf but nothing we had would read it. We finally purchased a product that did some OCR on it and pulled out the text. Just curious if CF8 could do that for me. Currently, still on CF7.
No - cfpdf reads PDFs only. But CF8 has image support and can work with images.
Thanks for the quick reply Ray.
Would the image support be able to read the text within the image .pdf?
Robert
No, there is no built-in OCR support.
Thank you! It appears we made the right decision then.
Ray, does your example work when you try to modify the Language property? It doesn't seem to update when using setinfo.
Turns out the Language property is read-only. However, I did find a workaround using iText here: http://www.henke.ws/post.cf...
Ray, does the AllowSecure (to allow electronic signing) only work with Acrobat? When I bring the resulting PDF up in Reader, its properties show that signing is not allowed.
Not sure. It kinda sounds like something they would lock down to the higher end viewer.
My website is under construction, so I am in the process of setting up searching of PDFs from the local disk using Solr. I am unable to show the context of the PDF.
Did you edit the collection to allow for context? Did you then reindex?