Posted in ColdFusion | Posted on 07-24-2007 | 13,410 views
In today's entry I'll be discussing the processDDX action of the CFPDF tag. I have to admit that I wasn't looking forward to this entry. Every time I had looked at the documentation, it just didn't make sense. I didn't see the point. But now that I've looked at it again more in depth, I'm almost in awe at how cool this feature is. I'm definitely just scratching the surface in this blog post, but hopefully it will encourage others to look into DDX and how it works with ColdFusion.
So as you can probably guess, CFPDF's processDDX action lets ColdFusion work with DDX. Ok, so what in the heck is DDX? DDX stands for Document Description XML. You can think of it like a template for a PDF file. At a basic level, it lets you lay out PDF files (like the Merge option does) and add special commands (generate a table of contents for example). DDX is used by Adobe's LiveCycle Assembler product. ColdFusion ships with a stripped down version of this product. The exact XML tags not allowed in ColdFusion are listed in the documentation. As far as I can see, there is no way to enter a serial and enable the full power of LiveCycle Assembler. But even with the restrictions there is an incredible amount of power that you have built in. As I mentioned above, this entry is only going to talk at a high level about DDX. You can find the DDX reference here. Also as Charlie Arehart has mentioned in a comment in my PDF series, the ColdFusion documentation is excellent. I want to credit them for my examples below as all are either direct copies or modified versions of their examples. Also note that this is a very complex topic. There is a good chance I will screw something up so please let me know if I do.
Let's begin by talking about how you use DDX in ColdFusion. ColdFusion 8 adds an isDDX() function. This function takes either a relative/absolute path to a filename or an actual string of DDX tags. Don't worry too much about the XML just yet, but here is a simple example of checking a string to see if it is valid DDX:
<cfsavecontent variable="myddx">
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<PDF result="Out1">
<PDF source="Title"/>
<TableOfContents/>
<PDF source="Doc1"/>
<PDF source="Doc2"/>
</PDF>
</DDX>
</cfsavecontent>
<cfset myddx = trim(myddx)>

<cfif isDDX(myddx)>
yes, it's DDX
<cfelse>
no, it's not
</cfif>
In this example I've just used the CFSAVECONTENT tag to wrap my DDX XML. I trim it and then check to see if it is DDX. Now that I've shown you a bit of DDX, let me talk a bit about what that example does. Ignoring the DDX tag, there are 2 XML tags in use here, PDF and TableOfContents. The first PDF tag uses result="Out1" and wraps the other tags. This basically says the result of everything on the inside should be put into a result named Out1. On the inside there are 3 PDF tags with a source. You can think of this like a merge. The tags specify an order based on names: Title, Doc1, and Doc2. So far so good. But then note that a TableOfContents tag exists right after the Title PDF. This particular tag can do a lot - but at a basic level, it just says, "Create a table of contents using the PDFs following me."
So let me repeat what I said above. This is partially for my sake to ensure I'm describing it right (remember what I said, I'm new to this!). What we have is a template that takes 3 PDFs. It puts the Title PDF first. It defines a page as a Table of Contents. It then lays down two more PDFs. Let's take a look at how ColdFusion can work with this DDX.
First note that the DDX worked with PDF names. Notice I don't have any real file names. Nor do I have ColdFusion variables. Instead I have labels like Out1, Title, Doc1, and Doc2. So we need a way to pass real values so that LiveCycle Assembler can use them when processing the DDX. The CFPDF tag takes two related attributes, inputFiles and outputFiles. Each of these is a structure that maps the names used in the DDX to file names. So using our sample DDX above, I can define my 3 input PDFs like so:
<cfset inputStruct = structNew()>
<cfset inputStruct.Title="title.pdf">
<cfset inputStruct.Doc1="paris.pdf">
<cfset inputStruct.Doc2="booger.pdf">
Defining the output file is also struct based:
<cfset outputStruct = structNew()>
<cfset outputStruct.Out1="output1.pdf">
Ok, so at this point I've detailed all the various variables used in the DDX file. Now let's use CFPDF to run the process:
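A minimal sketch of that call, assuming the myddx string and the inputStruct and outputStruct structures defined above, with the result stored in a variable named ddxVar:

```cfml
<!--- ddxfile takes either a path to a DDX file or, as here, the DDX string itself --->
<cfpdf action="processddx"
       ddxfile="#myddx#"
       inputfiles="#inputStruct#"
       outputfiles="#outputStruct#"
       name="ddxVar">

<!--- Inspect the status message returned for each output --->
<cfdump var="#ddxVar#">
```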
Pretty trivial I think. I passed in my structs and DDX. At this point I now have a result. If I dump ddxVar, I will see a structure. Each key of the structure maps to the output key from my DDX. I had used this tag:
<PDF result="Out1">
So ddxVar.out1 will contain a status message for my result. It will either be "successful" or "failed" followed by a reason. One quick note. You will notice I used paths for my PDFs. In order to use DDX, you have to work with real files. You can't pass in a PDF created in memory. Obviously you can make the PDF on the fly and save it in the same request.
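Here is a rough sketch of that last point. It is not code from my test files, just an illustration: build a PDF with cfdocument, save it to disk with the filename attribute, run a trivial DDX over it, and check the status string. The file names (generated.pdf, copy.pdf) and variable names are made up for the example.

```cfml
<!--- Generate a PDF on the fly, but write it to disk so DDX can read it --->
<cfdocument format="pdf" filename="#expandPath('./generated.pdf')#" overwrite="true">
<h2>Generated on the fly</h2>
<p>This PDF is written to disk before DDX touches it.</p>
</cfdocument>

<!--- A trivial DDX: one input, one output --->
<cfsavecontent variable="simpleDDX">
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<PDF result="Out1">
<PDF source="Doc1"/>
</PDF>
</DDX>
</cfsavecontent>

<!--- Map the DDX names to real files (hypothetical file names) --->
<cfset inFiles = structNew()>
<cfset inFiles.Doc1 = expandPath("./generated.pdf")>
<cfset outFiles = structNew()>
<cfset outFiles.Out1 = expandPath("./copy.pdf")>

<cfpdf action="processddx" ddxfile="#trim(simpleDDX)#" inputfiles="#inFiles#" outputfiles="#outFiles#" name="ddxVar">

<!--- The status is "successful", or "failed" followed by a reason --->
<cfoutput>
<cfif ddxVar.Out1 eq "successful">
    copy.pdf was created.
<cfelse>
    DDX processing failed: #htmlEditFormat(ddxVar.Out1)#
</cfif>
</cfoutput>
```

The htmlEditFormat() call is there because the failure messages can contain tag names in angle brackets, which the browser would otherwise swallow.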
If you view your PDF now (remember it was named output1.pdf), you may notice that you don't have a table of contents. Turns out that the TableOfContents tag looks for a bookmark. I had to switch this code:
<cfdocument format="pdf">
<h2>Paris Hilton</h2>

<p>
Here is the collected wisdom of Paris Hilton.
</p>
</cfdocument>
To this:
<cfdocument format="pdf" bookmark="true">
<cfdocumentsection name="Paris Section">
<h2>Paris Hilton</h2>

<p>
Here is the collected wisdom of Paris Hilton.
</p>
</cfdocumentsection>
</cfdocument>
Note the use of bookmark=true and a cfdocumentsection that wraps the entire page. That was slightly confusing at first, but the end result is perfect. What is great is that my ColdFusion Cookbook site will be able to benefit from this. Right now I have something like 120+ pages in a PDF with no real easy way to navigate. By using DDX I'll be able to add a real table of contents to the document!
So what else can you do with DDX? As I mentioned some features were removed from the bundled product, but what is left is still pretty awesome. Charlie Arehart added a comment to another of my blog articles saying that he wished it were simpler to add a watermark to a PDF. I.e., just add "Foo" to the PDF without needing to make a new PDF or an image. Turns out DDX supports that as well. Here is some sample DDX that demonstrates how to apply a watermark. Again - check the LiveCycle Assembler DDX documentation for explicit documentation on each tag.
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<PDF result="Out1">
<PDF source="Doc1">
<Watermark rotation="30" opacity="50%">
<StyledText><p font-size="85pt" font-weight="bold" color="gray" font="Arial">FINAL</p></StyledText>
</Watermark>
</PDF>
</PDF>
</DDX>
Nothing too terribly complex here. Frankly I find this a bit easier than my earlier blog article on PDFs and watermarks. Maybe not easier per se - but I find it to be more direct. And in case it isn't obvious - since the DDX is completely abstracted, you can pass in any PDF that you want and specify any output. One thing I'm not sure about is whether the value of the watermark, the text, can be dynamic as well. Obviously I can generate my DDX in ColdFusion, so yes, it can be dynamic, but I'm curious to know if DDX itself supports variables for values like the text between the P tags.
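To make the "generate my DDX in ColdFusion" idea concrete, here is a small sketch. The watermarkText variable and its value are made up for illustration; the rest is the watermark DDX from above, wrapped in cfsavecontent and cfoutput so the text can be filled in at runtime.

```cfml
<!--- Hypothetical value; it could just as well come from a form field or a query --->
<cfset watermarkText = "DRAFT #dateFormat(now(), 'mm/dd/yyyy')#">

<cfsavecontent variable="myddx"><cfoutput>
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<PDF result="Out1">
<PDF source="Doc1">
<Watermark rotation="30" opacity="50%">
<StyledText><p font-size="85pt" font-weight="bold" color="gray" font="Arial">#xmlFormat(watermarkText)#</p></StyledText>
</Watermark>
</PDF>
</PDF>
</DDX>
</cfoutput></cfsavecontent>
<!--- Trim so the XML declaration is the very first thing in the string --->
<cfset myddx = trim(myddx)>

<cfif isDDX(myddx)>
The generated DDX is valid.
<cfelse>
The generated DDX is not valid.
</cfif>
```

The xmlFormat() call escapes characters like & and < so that a user-supplied value can't break the XML. This only makes the DDX dynamic from the ColdFusion side; it says nothing about whether DDX has variables of its own.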
One more example. I always wondered why there wasn't a way to read the text of a PDF. Turns out there is - DDX. Consider this simple DDX example:
<?xml version="1.0" encoding="UTF-8"?>
<DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
<DocumentText result="Out1">
<PDF source="doc1"/>
</DocumentText>
</DDX>
Here is the source PDF I used:
<cfdocument format="pdf">
<h2>Paris Hilton</h2>

<p>
<cfoutput>
This is the text of a PDF. It has a bit of randomness (#randRange(1,100)#) in it.
</cfoutput>
</p>

<cfdocumentitem type="pagebreak" />

<h2>Fetch Adams</h2>

<p>
<cfoutput>
This is the second page. It has a bit of randomness (#randRange(1,100)#) in it.
</cfoutput>
</p>

</cfdocument>
When processed, you get an XML file. The result will look something like so:
<DocText xmlns="http://ns.adobe.com/DDX/DocText/1.0/">
<TextPerPage>
<Page pageNumber="1">Paris Hilton This is the text of a PDF . It has a bit of randomness ( 67 ) in it .</Page>
<Page pageNumber="2">Fetch Adams This is the second page . It has a bit of randomness ( 7 ) in it .</Page>
</TextPerPage>
</DocText>
Notice how the HTML was removed. What's cool about this is that if you need to index PDF data and you don't want to use Verity, you could use this instead. (I think tonight I'll write a quick UDF just for this.)
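As a rough idea of what such a UDF might look like, here is an untested sketch. The function name getPDFText, its argument, and the temp file handling are all assumptions made for illustration. It runs the DocumentText DDX from above against a PDF on disk and returns an array with one entry per page.

```cfml
<!--- Hypothetical helper: returns an array holding the text of each page of a PDF --->
<cffunction name="getPDFText" returntype="array" output="false">
    <cfargument name="pdfPath" type="string" required="true" hint="Absolute path to a PDF on disk">

    <cfset var ddx = "">
    <cfset var inputStruct = structNew()>
    <cfset var outputStruct = structNew()>
    <cfset var ddxResult = "">
    <cfset var xmlDoc = "">
    <cfset var pages = arrayNew(1)>
    <cfset var pageNode = "">
    <!--- Scratch file for the XML that DocumentText writes (assumes the temp dir path ends in a slash) --->
    <cfset var xmlPath = getTempDirectory() & createUUID() & ".xml">

    <cfsavecontent variable="ddx">
    <?xml version="1.0" encoding="UTF-8"?>
    <DDX xmlns="http://ns.adobe.com/DDX/1.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://ns.adobe.com/DDX/1.0/ coldfusion_ddx.xsd">
    <DocumentText result="Out1">
    <PDF source="doc1"/>
    </DocumentText>
    </DDX>
    </cfsavecontent>

    <cfset inputStruct.doc1 = arguments.pdfPath>
    <cfset outputStruct.Out1 = xmlPath>

    <cfpdf action="processddx" ddxfile="#trim(ddx)#" inputfiles="#inputStruct#" outputfiles="#outputStruct#" name="ddxResult">

    <!--- The status is "successful", or "failed" followed by a reason --->
    <cfif ddxResult.Out1 neq "successful">
        <cfthrow message="DDX text extraction failed" detail="#ddxResult.Out1#">
    </cfif>

    <!--- Parse the generated XML, then clean up the scratch file --->
    <cfset xmlDoc = xmlParse(xmlPath)>
    <cffile action="delete" file="#xmlPath#">

    <!--- Each Page element under TextPerPage holds the text of one page --->
    <cfloop array="#xmlDoc.DocText.TextPerPage.XmlChildren#" index="pageNode">
        <cfset arrayAppend(pages, pageNode.XmlText)>
    </cfloop>

    <cfreturn pages>
</cffunction>

<!--- Usage: paris.pdf is the sample PDF name used earlier --->
<cfset pages = getPDFText(expandPath("./paris.pdf"))>
<cfoutput>Found #arrayLen(pages)# page(s).</cfoutput>
```

From there, each array entry could be fed into whatever search index you prefer.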
That's it for this blog entry. I want to remind folks - DDX is a big topic and I didn't cover much at all. I also used a lot of code in this example so I've taken all my test CFMs and PDFs and packaged them as a zip attached to this article.


Do you know, are the CF8 PDF functions still using iText at all, or is it all Adobe technology now?
Too bad, I was really hoping for this functionality.
failed: DDXM_S18005: An error occurred in the PrepareTOC phase while building <TableOfContents>. Cause given.
I only get this error when using the TableOfContents element in the DDX:
<TableOfContents maxBookmarkLevel="infinite" bookmarkTitle="Table of Contents" includeInTOC="false">
<Footer styleReference="CatalogueFooter" />
</TableOfContents>
Any thoughts?
Unfortunately I have not been able to get it to work yet. I have added a comment to the CF8 docs but would also love to hear if anybody else has used this successfully.
or do they parse out pdf's into text files and search through those?
what are the differences in on resources?
I HAVE been able to use this with the PDFs I have created with ColdFusion 8. My initial confusion was dealing with (existing) PDFs that "looked" as if they had a header and footer, BUT when I converted those PDFs to text I found that it was actually body text stretched out to the edge of the page.
@Verity
I have been reluctant to use Verity because a) it does consume quite a bit of RAM and b) databases like MySQL come with full-text searching built in. Fair enough, MySQL doesn't search PDF documents though.
I tried using the Adobe docs to split Verity off onto its own server, but I was never able to make it work successfully on CF7 or CF8. If anyone has ever completed it successfully, I certainly would be interested.
Thanks!
Also - note that in 8.0.1, you can now supply HTML for watermarks. This means you don't need to use DDX for it anymore.
I've confirmed this same behavior on another Mac running Leopard too. You've got a Mac right? You don't see this behavior?
I appreciate the heads up on 8.0.1 allowing watermark text outside of ddx, I'll give that a shot.
http://www.adobe.com/go/wish
I'm running your example code for pdf generation using a DDX files. Specifically, the ddx2.cfm
I'm getting the same error as Brian above.
failed: DDXM_S18005: An error occurred in the PrepareTOC phase while building <TableOfContents>. Cause given.
I've narrowed it down. When you add bookmark="true" to cfdocument you get the error. If you don't have bookmark="true" it works but no TOC. But I saw your output2.pdf HAS a TOC. Any idea why your code won't run on my copy of CF8? I've tried it on the developer edition and a standard version.
Thanks!
Installing the 8.01 update solved the problem. That will teach me to not run the latest version of CF.
Did a post (http://www.designovermatter.com/blog/index.cfm/200...) for anyone who does a Google search on the error message. (Which is what I did.)
Cheers
http://livedocs.adobe.com/livecycle/8.2/ddxRef/wwh...
But certainly it's easier doing it in CFML (imho).
The OutXML seems like it's reading across the lines instead of down the columns.
The docs (http://livedocs.adobe.com/livecycle/es/sdkHelp/pro...) say "the order in which the words are listed is not guaranteed to be the reading order." I've also tried mode="WithQuads", but it seems like it would be really tough to reconstruct the text from the coordinates.
Any ideas?
When you get this field attribute for searching and print the result, you get a large paragraph that contains a lot of sentences from different positions in the PDF, but nothing separates these sentences. You know what I mean?
In the text of your article about ColdFusion 8 and DDX, you include a link to some in-depth documentation on the LiveCycle DDX 'language'. Unfortunately the link is now broken.
Is there any chance that you might be able to inform me of the current link to the same information?
You see, I am implementing some code to create a compound PDF document that could do with a Table of Contents (which I have working in a basic sense); however, the formatting could do with some tweaking. Someone else's blog mentioned how to do it, but their code throws some errors, so I want to check all of the syntax.
Thanks,
Bryn Parrott
http://help.adobe.com/en_US/livecycle/9.0/ddxRef.p... (PDF obviously)
Just wondering if CF(9) will let us set multiple levels in the TOC. It appears that DDX would have no problem with that (maxBookmarkLevel="infinite"), but the CF documentation explicitly states that all bookmarks are placed directly under the root.
Any ideas on how to get this done?
Thnx in advance!
Jasper
Now if you were to consolidate documents that were created some other way then it might work. PDFLib perhaps.
To be honest it would have been far better had Adobe provided the capability to create bookmarks based on HTML H tags e.g. h1, h2, h3 etc. or even something like <cfdocumentbookmark /> perhaps !!. Far more intuitive, flexible and easy to work with than documentsections.
Don't get me started on DDX by the way. No fun at all.