July 26, 2007

Getting a page from a PDF Document in ColdFusion 8

development

Ever needed to extract a page from a PDF document? Yesterday I blogged my new little CFC called PDFUtils. The idea was to take the power of CFPDF and wrap it in some utility functions. The first version contained a simple getText() utility that returns all the text in a PDF.

Today I added getPage(). As you can guess, it grabs one page from a PDF. How? Well, CFPDF doesn't support extracting a single page, but it does support deleting pages. All I did was add logic to "flip" the page number you want into a list of pages to delete. This then lets you do:


<cfset pdf = createObject("component", "pdfutils")>
<cfset mypdf = expandPath("./paristoberead.pdf")>
<cfset page2 = pdf.getPage(mypdf, 2)>
<cfdump var="#page2#">

<cfpdf action="write" source="page2" destination="page2.pdf" overwrite="true">

Running this gets you a dump of the PDF object and a new file named page2.pdf that is just - you guessed it - page 2.
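The post doesn't show the body of getPage(), so here is a minimal sketch of how the "flip" could work. It assumes CFPDF's getinfo action (whose result structure includes a TotalPages key) and its deletepages action with a name attribute, so the result stays in memory as a PDF variable instead of being written to disk. The helper getDeleteList() is a hypothetical name used only for illustration; the downloadable PDFUtils code may be structured differently.

```cfml
<cffunction name="getPage" access="public" returntype="any" output="false"
            hint="Sketch: returns a single page of a PDF as a PDF variable.">
    <cfargument name="pdf" type="string" required="true" hint="Full path to the source PDF.">
    <cfargument name="page" type="numeric" required="true" hint="1-based number of the page to keep.">

    <cfset var info = "">
    <cfset var result = "">
    <cfset var deleteList = "">

    <!--- getinfo returns a structure; TotalPages holds the page count. --->
    <cfpdf action="getinfo" source="#arguments.pdf#" name="info">

    <cfif arguments.page lt 1 or arguments.page gt info.TotalPages>
        <cfthrow message="Page #arguments.page# is out of range (1-#info.TotalPages#).">
    </cfif>

    <!--- "Flip" the page to keep into the pages to delete. --->
    <cfset deleteList = getDeleteList(arguments.page, info.TotalPages)>

    <cfif len(deleteList)>
        <!--- name= keeps the trimmed document in a variable; nothing is written to disk. --->
        <cfpdf action="deletepages" source="#arguments.pdf#" pages="#deleteList#" name="result">
    <cfelse>
        <!--- Single-page document: nothing to delete, so just read it. --->
        <cfpdf action="read" source="#arguments.pdf#" name="result">
    </cfif>

    <cfreturn result>
</cffunction>

<cffunction name="getDeleteList" access="public" returntype="string" output="false"
            hint="Hypothetical helper: turns the page to keep into a CFPDF pages list to delete.">
    <cfargument name="page" type="numeric" required="true">
    <cfargument name="totalPages" type="numeric" required="true">

    <cfset var ranges = "">

    <!--- Pages before the one we keep. --->
    <cfif arguments.page eq 2>
        <cfset ranges = listAppend(ranges, "1")>
    <cfelseif arguments.page gt 2>
        <cfset ranges = listAppend(ranges, "1-" & (arguments.page - 1))>
    </cfif>

    <!--- Pages after the one we keep. --->
    <cfif arguments.page eq arguments.totalPages - 1>
        <cfset ranges = listAppend(ranges, arguments.totalPages)>
    <cfelseif arguments.page lt arguments.totalPages - 1>
        <cfset ranges = listAppend(ranges, (arguments.page + 1) & "-" & arguments.totalPages)>
    </cfif>

    <cfreturn ranges>
</cffunction>
```

With this sketch, asking for page 2 of a five-page PDF makes getDeleteList() return "1,3-5", so deletepages removes everything except page 2. Using ranges rather than listing every page number keeps the pages attribute short even for very large documents.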

I've reattached the code plus sample files and PDFs.

Download attached file.


Archived Comments

Comment 1 by Ahamad posted on 7/27/2007 at 4:32 PM

Can't we achieve that using this code (get a particular page of a source PDF)?

Comment 2 by Raymond Camden posted on 7/27/2007 at 5:07 PM

I'll test that. If so - my code is also one line, but your version wouldn't need the conversion done on the page numbers, and would allow folks to get N pages, not just one.

Comment 3 by Raymond Camden posted on 7/27/2007 at 5:19 PM

Ok I tried this - and it threw an error:

The attribute source specified in the CFPDF tag is either empty or invalid.
The error occurred on line 31.

I tried this line:

Comment 4 by Ahamad posted on 7/28/2007 at 6:22 AM

Hi,

I have tested the below code. It works.

Comment 5 by Ahamad posted on 7/28/2007 at 6:23 AM

Please test the above code and let me know how it works for you.

Thanks,
-ahamad

Comment 6 by Raymond Camden posted on 7/28/2007 at 6:37 AM

Ah - I see the problem. You are writing to the file system. I don't want to do that. I want to return a PDF variable to the user. Then they can save/output/whatever.

Comment 7 by Raymond Camden posted on 9/12/2007 at 6:35 AM

This is added to RIAForge:

http://pdfutils.riaforge.org/

Comment 8 by Toby Dawes posted on 11/30/2007 at 9:30 PM

Ray, this is great! I do have a question. I have been running this against a 4399-page PDF (http://senate.state.ny.us/S.../B5A372B72DB18C75852572C3006068E2/$file/2007cpf.pdf?OpenElement) and grabbing out, for example, page 5. When <cfpdf> writes this single page back, the file size is about 381 KB, but if I do the same thing directly in Acrobat 8 (open the full document and delete every page but the one I want), the single page is a measly 13 KB. What's with that? Is it an issue with the PDF write in CF8? It's very frustrating for the system I'm creating, as this is now a space issue on the server: 57 MB of single PDF pages is much better than 1.6 GB!

Any insight into this? Cheers.

Comment 9 by Toby Dawes posted on 11/30/2007 at 9:35 PM

Sorry, I guess that link was too long, but try this one:

http://www.ppinapod.us/pdf/...

Comment 10 by Raymond Camden posted on 11/30/2007 at 10:39 PM

No idea, Toby.

Comment 11 by Paul posted on 1/10/2008 at 9:09 PM

Well, Toby, I'm experiencing the same thing: starting with a 125-page PDF that is 777 KB, each individual page created by CFPDF is 518 KB. I contacted Adobe, and they asked for PDFs to reproduce the issue. Since my PDF contains payroll data that can't leave our company, I provided a link to your comment and PDF. If I hear anything more, I'll keep you in the loop.

Comment 12 by Greg posted on 3/7/2008 at 12:40 AM

Has anyone run into issues with unintended spaces showing up in the extracted text? Is there a way around this?

Comment 13 by Brett Bruschke posted on 3/19/2008 at 10:05 PM

Toby and Paul,

I was having similar issues with the file size of the PDF being quite large: a one-page PDF was coming out at 483 KB. In my document I was using a JPG for a header and a JPG for a footer via the processddx action. The header JPG was 150 KB and the footer JPG was 180 KB, and each was roughly 2000 x 400 pixels (I was given these images by another employee), even though my img tag specified a height of 150px.

I ended up shrinking the images to just a little more than 150px in height and changed the file type to GIF (I have found from other posts that cfdocument does not like JPG files). Both images combined are now under 30 KB.

The good news is that my PDF went from 483 KB to 72 KB. Not sure if you are using images in your PDFs, but I thought this might help someone.

Comment 14 by Toby Dawes posted on 4/5/2008 at 6:49 AM

Brett,

No images in the first PDF, but I am now working on a project for my local church district website. This PDF does have some images, but even on pages that have no images I get this issue.

The new PDF that I'm having issues with is http://www.enynewesleyan.or..., which is 6.52 MB. I delete all pages but one, which contains only text, and the PDF is still 6.52 MB. When I download that file, open it in Acrobat, and re-save it as an Optimized PDF, it drops to only 67 KB, a HUGE difference. This is a 133-page document, and 6.52 MB for one page is just too big.

Paul,

Who did you contact at Adobe? This seems to me to be a major issue. Any additional help would be greatly appreciated. Cheers!

Comment 15 by Vinit Srivastva posted on 8/30/2010 at 7:54 AM

Hello,

Thanks for this blog, it helped me a lot. Just one question:

Is there any way we can use a remote URL of a PDF for the merge or thumbnail process? It only seems to work with a local path.

Any help would be appreciated.

Thanks

Comment 16 by Raymond Camden posted on 8/30/2010 at 5:39 PM

Nope. They need to be local.

Comment 17 by Bharath posted on 8/17/2014 at 1:18 AM

Hi Raymond,

I tried the above logic. It seems to get the page correctly, but when I tried to display it using

<cfpdf action="read" name="myDoc" source="page.pdf" />
<cfcontent variable="#toBinary(myDoc)#" type="application/pdf" />

it gave a response with special characters (a few lines given below) instead of just displaying the PDF.

Comment 18 by Raymond Camden posted on 8/17/2014 at 1:42 AM

@Bharath: I edited your comment to remove the binary content you posted. Please do not do that.

Comment 19 by Raymond Camden posted on 8/18/2014 at 3:08 PM

If you save that data and open it via Finder/Explorer, does it work?
