Ever needed to extract a page from a PDF document? Yesterday I blogged about my new little CFC called PDFUtils. The idea was to take the power of CFPDF and wrap up some utility functions. The first version contained a simple getText() utility that would return all the text in a PDF.
Today I added getPage(). As you can guess, it grabs one page from a PDF. How? Well, CFPDF doesn't support getting one page, but it does support deleting pages. All I did was add logic to "flip" a page number into a delete order, that is, the list of every page except the one you asked for. This then lets you do:
<cfset pdf = createObject("component", "pdfutils")>
<cfset mypdf = expandPath("./paristoberead.pdf")>
<cfset page2 = pdf.getPage(mypdf, 2)>
<cfdump var="#page2#">
<cfpdf action="write" source="page2" destination="page2.pdf" overwrite="true">
Running this gets you a dump of the PDF object and a new file named page2.pdf that is just - you guessed it - page 2.
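For the curious, the "flip" described above can be sketched roughly like this. Treat it as a minimal sketch rather than the exact code in PDFUtils: the local variable names (info, lastBefore, firstAfter, pagesToDelete, result) are made up for illustration, and it assumes the first argument is a full path to a readable PDF.

```cfml
<cffunction name="getPage" access="public" returntype="any" output="false"
            hint="Sketch: returns one page of a PDF as a PDF variable.">
    <cfargument name="pdf" type="string" required="true" hint="Full path to the source PDF.">
    <cfargument name="page" type="numeric" required="true" hint="Page to keep (1-based).">

    <cfset var info = "">
    <cfset var pagesToDelete = "">
    <cfset var result = "">
    <cfset var lastBefore = arguments.page - 1>
    <cfset var firstAfter = arguments.page + 1>

    <!--- getInfo returns a struct that includes TotalPages --->
    <cfpdf action="getInfo" source="#arguments.pdf#" name="info">

    <cfif arguments.page lt 1 or arguments.page gt info.totalPages>
        <cfthrow message="Page #arguments.page# is out of range (1-#info.totalPages#).">
    </cfif>

    <!--- Every page before the requested one goes on the delete list --->
    <cfif lastBefore eq 1>
        <cfset pagesToDelete = "1">
    <cfelseif lastBefore gt 1>
        <cfset pagesToDelete = "1-" & lastBefore>
    </cfif>

    <!--- ...and so does every page after it --->
    <cfif firstAfter eq info.totalPages>
        <cfset pagesToDelete = listAppend(pagesToDelete, firstAfter)>
    <cfelseif firstAfter lt info.totalPages>
        <cfset pagesToDelete = listAppend(pagesToDelete, firstAfter & "-" & info.totalPages)>
    </cfif>

    <cfif len(pagesToDelete)>
        <!--- Using name instead of destination keeps the result in memory --->
        <cfpdf action="deletePages" source="#arguments.pdf#" pages="#pagesToDelete#" name="result">
    <cfelse>
        <!--- One-page document: nothing to delete, so just read it --->
        <cfpdf action="read" source="#arguments.pdf#" name="result">
    </cfif>

    <cfreturn result>
</cffunction>
```

So for page 2 of a 10-page document, the delete list built here would be "1,3-10".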
I've reattached the code plus sample files and PDFs.
Can't we achieve that (getting a particular page of a source PDF) using this code?
<cfpdf action="merge" source="sourcefile1.pdf" pages="#n#" destination="destination.pdf">
I'll test that. If so - my code is also one line, but your version wouldn't need the conversion done on the page numbers, and would allow folks to get N pages, not just one.
Ok I tried this - and it threw an error:
The attribute source specified in the CFPDF tag is either empty or invalid.
The error occurred on line 31.
I tried this line:
<cfpdf action="merge" source="arguments.pdf" pages="#arguments.page#" name="result">
I have tested the below code. It works.
<cfpdf action="merge" source="source1.pdf" pages="7" destination="desti1.pdf" overwrite="true">
Please test the above code and let me know how it works for you.
Ah - I see the problem. You are writing to the file system. I don't want to do that. I want to return a PDF variable to the user. Then they can save/output/whatever.
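For what it's worth, merge accepts a name attribute as well as destination, so the result can stay in memory. Below is an untested sketch of how the failing line might look with the source wrapped in pound signs, so that the path stored in arguments.pdf is passed rather than the literal text "arguments.pdf". Whether that is what actually caused the error above is an assumption.

```cfml
<!--- Sketch only: assumes arguments.pdf holds a full path to the PDF and
      arguments.page holds a page number or range such as "2" or "2-4" --->
<cfpdf action="merge" source="#arguments.pdf#" pages="#arguments.page#" name="result">
<cfreturn result>
```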
This is added to RIAForge:
Ray, this is great! I do have a question. I have been running this against a 4399 page PDF (http://senate.state.ny.us/S...
/B5A372B72DB18C75852572C3006068E2/$file/2007cpf.pdf?OpenElement) and I'm grabbing out page 5, for example. When <cfpdf> writes this single page back the file size is about 381 KB, but if I do the same thing directly in Acrobat 8 (open the full document and simply delete all pages but the one I want), the single page is a measly 13 KB. What's with that? An issue with the PDF write in CF8? Very frustrating for the system I'm creating, as this is now a space issue on the server. Having 57 MB of single PDF pages is much better than 1.6 GB of single pages!
Any insight into this? Cheers.
Sorry, I guess that link was too long, but try this one:
No idea, Toby.
Well Toby, I'm experiencing the same thing. Starting with a 125-page PDF that is 777 KB, each individual page created by cfpdf is 518 KB. I contacted Adobe, and they asked for PDFs to duplicate the issue. Since my PDF contains payroll data that can't leave our company, I provided a link to your comment and PDF. If I hear anything more I'll keep you in the loop.
Has anyone run into issues with unintended spaces showing up in the extracted text? Is there a way around this?
Toby and Paul,
I was having similar issues with the file size of the PDF being quite large. A one-page PDF was coming out as 483 KB. In my document I was using a JPG for a header and a JPG for a footer by using the processddx command. The header was 150 KB and the footer JPG was 180 KB. The dimensions of each JPG were somewhere around 2000 x 400 (I was given these images by another employee), but in my img tag I was specifying the height as 150px.
I ended up reducing the size of the JPG images to just a little more than 150 px in height and changed the file type to GIF (I have found from other posts that cfdocument does not like JPG files). Both images combined are now under 30 KB.
The good news is that my PDF went from being 483 KB to 72 KB. Not sure if you are using images in your PDFs, but I thought this might help someone.
No images in the first PDF, but I am now working on a project for my local church district website. This PDF does have some images, but even on pages that have no images I get this issue.
The new PDF that I'm having issues with is http://www.enynewesleyan.or..., which is 6.52 MB. I delete all pages but one, which only has text, and the PDF is still 6.52 MB. I then download this, open it in Acrobat and re-save it as PDF Optimized, and now the file is only 67 KB. This is a HUGE difference. This is a 133-page document, and 6.52 MB for one page is just too big.
Who did you contact at Adobe? This seems to me to be a major issue. Any additional help would be greatly appreciated. Cheers!
Thanks for this blog, it helped me a lot. Just one question:
Is there any way we can use the remote URL of a PDF for the merge or thumbnail process? It only seems to work with a local path.
Any help would be appreciated.
Nope. They need to be local.
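One possible workaround is to pull the file down first and then point cfpdf at the local copy. This is a rough sketch, not something tested against a particular server: the URL is a placeholder, and the file name remote.pdf and the variable names localFile and firstPage are made up for the example.

```cfml
<!--- Hypothetical remote location; swap in the real URL --->
<cfset remoteUrl = "http://www.example.com/docs/sample.pdf">
<cfset localFile = getTempDirectory() & "remote.pdf">

<!--- Save the response body straight to disk; throwOnError stops here on a failed request --->
<cfhttp url="#remoteUrl#" method="get" getAsBinary="yes" throwOnError="yes"
        path="#getTempDirectory()#" file="remote.pdf">

<!--- Now that the PDF is local, cfpdf can work with it --->
<cfpdf action="merge" source="#localFile#" pages="1" name="firstPage">
```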
I tried the above logic. It seems to get the page correctly, but when I tried to display it using:
<cfpdf action="read" name="myDoc" source="page.pdf" />
<cfcontent variable="#toBinary(myDoc)#" type="application/pdf" />
it gave a response with special characters (a few lines given below) instead of just displaying the PDF.
@Bharath: I edited your comment to remove the binary content you posted. Please do not do that.
If you save that data and open it via Finder/Explorer, does it work?