Proof of Concept - Turning HTML into an Image

A few days ago a reader (Andrew Duvall) and I exchanged a few emails about ways to turn HTML into an image. He wanted to use ColdFusion's built-in rich text editor as a way for someone to write HTML that would then become an image. I recommended cfdocument: it can turn HTML into a PDF, and the cfpdf tag can then turn a PDF into an image. He took that advice and ran with it. Here is the simple POC Andrew created. It works - but has one slight drawback.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
</head>

<body bgcolor="#C0C0C0">


<cfparam name="tdata" default='<span style="font-family: Arial;"><strong>This</strong> <em>is a</em> <strong>test </strong><u>to</u> <span style="color: rgb(255, 153, 0);">see </span><strong>what </strong>is to <span style="color: rgb(255, 0, 0);">happen</span></span>'>
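<!--- The rich text editor wraps its content in <p> tags; strip them before rendering --->
<cfset tdata = replace(tdata,'<p>','','all')>
<cfset tdata = replace(tdata,'</p>','','all')>
<cfif isdefined('sbt')>
<!--- Wrap the HTML in a fixed-size, bordered box so we know which region to crop later --->
<cfsavecontent variable="PDFdata">
<div style="border:1px solid blue; background-color:transparent; font-size:12px; width:60px; height:80px;"><cfoutput>#tdata#</cfoutput></div>
</cfsavecontent>
<!--- Render the HTML to a PDF kept in the variable "data"; nothing is written to disk here --->
<cfdocument format="pdf" name="data" scaletofit="yes"><cfoutput>#PDFdata#</cfoutput></cfdocument>
<!--- Turn page 1 into a full-scale PNG; cfpdf names it Andrew_page_1.png (prefix_page_n.format) --->
<cfpdf source="data"
pages="1"
action="thumbnail"
destination="."
format="png"
transparent="true"
overwrite="true"
resolution="high"
scale="100"
imagePrefix="Andrew"
>
<cfscript>
// Load the PNG, crop it to the region where the bordered box landed, and save it back
thisImg = ImageRead( expandPath('.') & "/Andrew_page_1.png");
ImageSetAntialiasing(thisImg,"on");
ImageCrop(thisImg, 44, 48, 62, 80);
ImageWrite( thisImg, expandPath('.') & "/Andrew_page_1.png");
</cfscript>
<!--- Send the cropped image straight to the browser --->
<cfimage action="writeToBrowser" source="#thisImg#">
</cfif>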
<cfform>
<cftextarea richtext="true" name="tdata"><cfoutput>#tdata#</cfoutput></cftextarea>
<input name="sbt" type="submit" value="submit"  />
</cfform>

</body> </html>

Reading from the bottom up, you can see it begins with a simple form that uses the richtext version of cftextarea. He defaulted the value with some basic HTML (see the cfparam tag at the top). On submit the magic happens. First he wraps the HTML in a div. Let me come back to that in a minute. He then passes the HTML to the cfdocument tag to create the PDF. Notice he saves the PDF to a variable. There isn't any need to save it to the file system.

Next he uses cfpdf to convert the PDF into an image. Notice the scale is set to 100. This creates a full-scale representation of the PDF. After making the image he crops it (and again, we will come back to that) and then outputs the result. Here is a screenshot of it in action:
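One thing worth noting about this step: cfdocument keeps the PDF in memory, but cfpdf's thumbnail action writes a PNG to disk, and the POC always uses the same name (Andrew_page_1.png) in the same folder with overwrite="true". Two people submitting at the same moment could therefore read each other's image. The following is a minimal sketch, not part of Andrew's code, that gives each request its own file prefix in ColdFusion's temp directory and deletes the PNG once it has been read into memory. The sample HTML stands in for the editor content, and the crop numbers are simply the POC's.

```cfml
<!--- Sketch only: per-request thumbnail names instead of a shared Andrew_page_1.png --->
<cfdocument format="pdf" name="data" scaletofit="yes">
<div style="border:1px solid blue; font-size:12px; width:60px; height:80px;"><strong>This</strong> is a test</div>
</cfdocument>

<!--- getTempDirectory() returns a path ending in a separator --->
<cfset outDir = getTempDirectory()>
<cfset prefix = "thumb_" & createUUID()>
<cfpdf action="thumbnail" source="data" pages="1" destination="#outDir#"
       format="png" transparent="true" overwrite="true"
       resolution="high" scale="100" imagePrefix="#prefix#">

<!--- cfpdf names the file <imagePrefix>_page_<n>.<format> --->
<cfset thumbPath = outDir & prefix & "_page_1.png">
<cfset thisImg = ImageRead(thumbPath)>
<!--- The image is now in memory, so the file on disk is no longer needed --->
<cfset fileDelete(thumbPath)>
<cfset ImageCrop(thisImg, 44, 48, 62, 80)>
<!--- Output thisImg with cfimage action="writeToBrowser" as in the POC --->
```

createUUID() is just an easy way to get a name that is unique per request; any unique string would work as the prefix.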

So - what's with the div and the cropping? The technique I originally proposed worked, but since the HTML didn't fill the whole "page", the resulting image had a large expanse of white space around it. For his purposes, he was OK with using a div wrapper to force the content into a fixed-size box. He could then crop to that box once he had the image.
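Because the box size has to be known in two places, the div's style and the ImageCrop() call, one way to make that less fragile is to compute both from a single width/height pair. The cropBoxFor() helper below is hypothetical and not part of the POC. Its default x/y offsets (44, 48) are copied from Andrew's code and depend on cfdocument's page margins, so treat them as an assumption for any other layout. It also assumes one CSS pixel maps to one image pixel at scale="100", which the POC's 62-pixel crop of a 60px box with a 1px border suggests.

```cfml
<cfscript>
/*
  Hypothetical helper: builds the wrapper div's style and the matching
  ImageCrop() rectangle from one width/height pair so they can't drift apart.
  Default offsets (44, 48) come from the POC and are layout-dependent.
*/
function cropBoxFor(required numeric boxWidth, required numeric boxHeight,
                    numeric border = 1, numeric offsetX = 44, numeric offsetY = 48) {
    var box = {};
    box.divStyle = "border:#arguments.border#px solid blue; font-size:12px; "
                 & "width:#arguments.boxWidth#px; height:#arguments.boxHeight#px;";
    box.x = arguments.offsetX;
    box.y = arguments.offsetY;
    // CSS width/height exclude the border, so add it on both sides
    box.width = arguments.boxWidth + 2 * arguments.border;
    box.height = arguments.boxHeight + 2 * arguments.border;
    return box;
}
</cfscript>
```

In the POC you would then output #box.divStyle# (inside cfoutput) as the div's style attribute and call ImageCrop(thisImg, box.x, box.y, box.width, box.height). Note that this crops 82 pixels tall (80 plus both borders) rather than the POC's 80, so the bottom border is kept. The width and height could just as well come from form fields filled in by the user.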

To make this totally cool you would need a more generic approach. I'm not sure how one would do that outside of literally scanning the pixels and removing the blocks of whiteness. Alternatively, you could make the process three steps and use the second step as a preview (with just HTML) so that the user can specify his own height and width.
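The pixel-scanning idea can be sketched with ColdFusion's image functions plus the underlying Java image object returned by ImageGetBufferedImage(). The trimWhitespace() function below is a hypothetical sketch, not part of Andrew's POC: it treats any pixel that is near-white (or fully transparent) as background, finds the bounding box of everything else, and copies that region out with ImageCopy(). The tolerance default of 10 is an arbitrary assumption. Looping over every pixel in CFML will be slow on a full-page image, so this is a proof of concept for the proof of concept.

```cfml
<cfscript>
/*
  Hypothetical helper: trims the near-white (or fully transparent) margin
  from an image by scanning every pixel and cropping to the bounding box of
  everything else. tolerance = how far (0-255) each color channel may drop
  below 255 and still count as "white". Returns a new cropped image, or the
  original image if every pixel is background.
*/
function trimWhitespace(required any img, numeric tolerance = 10) {
    var buffered = ImageGetBufferedImage(arguments.img); // java.awt.image.BufferedImage
    var w = ImageGetWidth(arguments.img);
    var h = ImageGetHeight(arguments.img);
    var threshold = 255 - arguments.tolerance;
    var minX = w;
    var minY = h;
    var maxX = -1;
    var maxY = -1;
    var x = 0;
    var y = 0;
    var argb = 0;
    var isBackground = false;

    for (y = 0; y lt h; y++) {
        for (x = 0; x lt w; x++) {
            // getRGB() returns a packed ARGB int; shift and mask out each channel
            argb = buffered.getRGB(x, y);
            isBackground = bitAnd(bitSHRN(argb, 24), 255) eq 0
                || (bitAnd(bitSHRN(argb, 16), 255) gte threshold
                    && bitAnd(bitSHRN(argb, 8), 255) gte threshold
                    && bitAnd(argb, 255) gte threshold);
            if (not isBackground) {
                minX = min(minX, x);
                maxX = max(maxX, x);
                minY = min(minY, y);
                maxY = max(maxY, y);
            }
        }
    }

    if (maxX lt 0) {
        return arguments.img; // nothing but background
    }
    return ImageCopy(arguments.img, minX, minY, maxX - minX + 1, maxY - minY + 1);
}
</cfscript>
```

With something like this in place, the POC could replace the fixed ImageCrop(thisImg, 44, 48, 62, 80) call with thisImg = trimWhitespace(thisImg); and drop the sized div, although a visible border, if kept, would count as content.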

Archived Comments

Comment 1 by Josh Tischer posted on 10/1/2010 at 10:23 PM

Couldn't you use JavaScript before the submission to get the height and width of the div container? Then pass that along so the cropping can be appropriate.

Comment 2 by Andrew Duvall posted on 10/1/2010 at 10:57 PM

@Josh
Yes and no.
You could do that for the cropping part, which is the second of the two things going on here. The first part, however, turning the HTML into a PDF in the first place, also needs some way to know the height and width of the content in the PDF being generated via the cfdocument tag.

Comment 3 by Andrew Duvall posted on 10/1/2010 at 11:24 PM

Disregard my last comment. My head was somewhere else. Yes, if you already had a div existing on the page, you could get the height and width and pass them into both the cfdocument tag and the imageCrop() function.

Actually, if I could only find a way to get around the forced white background issue, I was going to take this prototype and put it into a remote function for additional Ajax stuff, so height and width would have been attributes passed in via JS like you suggested.

Comment 4 by Josh Tischer posted on 10/1/2010 at 11:48 PM

What about rendering the content in canvas first? Then you could have a transparent PNG if you wanted.

Quick googling:
http://ajaxian.com/archives...

Comment 5 by Andrew Duvall posted on 10/2/2010 at 8:10 PM

@Josh,
That canvas project seems quite interesting, and in my brief testing I verified that the background is in fact transparent. However, I felt it was lacking some necessities. For example, the demo doesn't work in IE7 or IE8, and it looks like the only HTML tags currently supported for parsing are HTML, BODY, P, B, and SPAN.

FCKeditor generates <strong> instead of <b>, among many other tags. So implementing this type of solution would require adding to the list of supported tags, and I'm not even sure how the CSS styles would come into play.

For me, I'm going to shy away from this since it doesn't seem cross-browser compatible (FF 3.5+ and WebKit-based browsers only).

Comment 6 by Edy Ionescu posted on 10/4/2010 at 3:37 PM

This would be really cool if it could be extended to webpage screenshot generation. Unfortunately, grabbing the HTML source code isn't enough, since cfdocument supports only a subset of CSS styles.

Comment 7 by Raymond Camden posted on 10/4/2010 at 3:53 PM

I've actually blogged about this before. cfdocument supports sourcing from a URL, which makes it pretty simple. As you say, it isn't "modern browser" compatible, but it looks close enough.

Raymond Camden
