I've been working on performance updates to ColdFusionBloggers over the past week or so - and the primary area I'm working on is the aggregator. One item that has been recommended to me by multiple people is to take a look at HTTP Conditional Gets. What in the heck is that?
Charles Miller has written an excellent blog post on it. I'd suggest reading it first. My short take on it is this:
HTTP Conditional Get is a way to ask a web server to return a document only if it has changed. I simply tell the server some information about my last request, and the server will either return the full body, or a header saying nothing has changed.
Again - read Miller's post for more information. In order for this to work - the remote server has to support it of course - and has to return special information in the header for your requests. BlogCFC does this for its RSS feed. So how can we use this in ColdFusion?
First off - you need to check for - and store - two headers: ETag and Last-Modified. If "result" is the result of a CFHTTP tag, this code would work:
<cfif structKeyExists(result.responseheader, "etag") and structKeyExists(result.responseheader, "Last-Modified")>
If we have them - we need to store them, obviously. I'm using an Application variable:
<cfset application.urlcache[attributes.url] = structNew()>
<cfset application.urlcache[attributes.url].etag = result.responseheader.etag>
<cfset application.urlcache[attributes.url].lastmodified = result.responseheader["Last-Modified"]>
The attributes.url value is just the URL. So at this point - we have our content, but we've also stored the ETag and Last-Modified values. Now what I'll do when I hit the URL again is to pass those values back in:
<cfhttp url="#attributes.url#" method="get" result="result" timeout="10">
<cfhttpparam type="header" name="If-None-Match" value="#application.urlcache[attributes.url].etag#">
<cfhttpparam type="header" name="If-Modified-Since" value="#application.urlcache[attributes.url].lastmodified#">
</cfhttp>
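The request above assumes a cache entry already exists for the URL. A minimal sketch of one way to handle the very first request as well, sending the conditional headers only when stored values are available (the hasValidators variable is a name made up for this example, and creating application.urlcache here rather than in application startup code is an assumption):

```cfml
<!--- Assumption: the cache struct may not exist yet; normally this would live in application startup code. --->
<cfif not structKeyExists(application, "urlcache")>
    <cfset application.urlcache = structNew()>
</cfif>

<!--- hasValidators is a hypothetical name: true only if we stored headers for this URL earlier. --->
<cfset hasValidators = structKeyExists(application.urlcache, attributes.url)>

<cfhttp url="#attributes.url#" method="get" result="result" timeout="10">
    <cfif hasValidators>
        <!--- Send back exactly what the server gave us last time. --->
        <cfhttpparam type="header" name="If-None-Match" value="#application.urlcache[attributes.url].etag#">
        <cfhttpparam type="header" name="If-Modified-Since" value="#application.urlcache[attributes.url].lastmodified#">
    </cfif>
</cfhttp>
```

On the first hit no conditional headers go out, so the server responds as it would to any plain GET.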
Now here is the cool part. All I have to do is check the status code of the result. If the status code is 304, it means nothing changed. If I dump the entire result, I will see no fileContent variable. This means my traffic was reduced quite a bit. If the status code is anything but that - it means either the content changed or the server doesn't support conditional GETs. I'd then update my application cache.
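As a rough sketch, the check could look something like this. It assumes "result" comes from the CFHTTP call shown earlier and that application.urlcache already exists; the contentChanged flag is a name invented for this example. CFHTTP reports the status as a string such as "304 Not Modified", so val() is used to pull out the leading number:

```cfml
<cfif val(result.statusCode) eq 304>
    <!--- Nothing changed: keep using whatever content we already have. --->
    <cfset contentChanged = false>
<cfelse>
    <!--- Anything else is treated as fresh content in this sketch (error statuses are not handled separately). --->
    <cfset contentChanged = true>
    <cfif structKeyExists(result.responseheader, "etag") and structKeyExists(result.responseheader, "Last-Modified")>
        <!--- Refresh the stored headers for the next request. --->
        <cfset application.urlcache[attributes.url] = structNew()>
        <cfset application.urlcache[attributes.url].etag = result.responseheader.etag>
        <cfset application.urlcache[attributes.url].lastmodified = result.responseheader["Last-Modified"]>
    <cfelse>
        <!--- The server sent no usable headers this time, so drop any stale entry. --->
        <cfset structDelete(application.urlcache, attributes.url)>
    </cfif>
</cfif>
```

Clearing the entry when the headers are missing keeps the next request from sending values the server no longer honors.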
What rocks is that if the remote server doesn't grok this stuff - it doesn't matter. Your GETs will still work. In my unscientific testing in my local copy of ColdFusionBloggers, I think I found that about 40% of my blogs supported it.
So - in working on this code, I found a good article by Pete Freitag on the topic: If-Modified-Since and CFML Part II. One interesting thing about his code is that he only works with one header value: If-Modified-Since. I asked him about that and he said he would respond on the blog. (He is busy now so it may be a bit.)
Later tonight I'm going to share a simple CFC that shows a way to wrap up this logic so you can do: contents = mycfc.get(someurl) and let the CFC worry about it.
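To give a rough idea of what such a wrapper might look like, here is a minimal sketch. It is an illustration only: the argument name theurl, the internal variables.cache struct, and the choice to keep the last body in memory next to the headers are all assumptions made for this example, and there is no locking or cache expiry.

```cfml
<cfcomponent output="false" hint="Hypothetical sketch of a conditional GET wrapper.">

    <!--- Per-instance cache keyed by URL; each entry holds etag, lastmodified and content. --->
    <cfset variables.cache = structNew()>

    <cffunction name="get" access="public" returntype="any" output="false">
        <cfargument name="theurl" type="string" required="true">
        <cfset var result = "">
        <cfset var entry = "">
        <cfset var hasEntry = structKeyExists(variables.cache, arguments.theurl)>

        <cfhttp url="#arguments.theurl#" method="get" result="result" timeout="10">
            <cfif hasEntry>
                <cfhttpparam type="header" name="If-None-Match" value="#variables.cache[arguments.theurl].etag#">
                <cfhttpparam type="header" name="If-Modified-Since" value="#variables.cache[arguments.theurl].lastmodified#">
            </cfif>
        </cfhttp>

        <!--- 304 means our stored copy is still good, so hand that back. --->
        <cfif hasEntry and val(result.statusCode) eq 304>
            <cfreturn variables.cache[arguments.theurl].content>
        </cfif>

        <!--- Otherwise remember the new headers and body, if the server sent both headers. --->
        <cfif structKeyExists(result.responseheader, "etag") and structKeyExists(result.responseheader, "Last-Modified")>
            <cfset entry = structNew()>
            <cfset entry.etag = result.responseheader.etag>
            <cfset entry.lastmodified = result.responseheader["Last-Modified"]>
            <cfset entry.content = result.fileContent>
            <cfset variables.cache[arguments.theurl] = entry>
        <cfelse>
            <cfset structDelete(variables.cache, arguments.theurl)>
        </cfif>

        <cfreturn result.fileContent>
    </cffunction>

</cfcomponent>
```

The component would need to live in a persistent scope (for example, one instance stored in the Application scope) for the cache to survive between requests, and a real version would want locking around the cache writes.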
Archived Comments
Hey Ray,
Yes, so the HTTP RFC doesn't mention ETag as being required when they mention If-Modified-Since, but it is probably a good idea to use it as a general rule of thumb.
It's probably not an issue when you are dealing with RSS feeds, but if you are fetching a page that may be different for different users, then you would want to use ETags. Because the ETag is kind of like a unique id for the content of the page, and the last modified date is simply a timestamp, the issue arises if there is more than one version of the page. So it's not a big deal with RSS, but in general it's probably a good idea to use them.
The way I understand it (the relationship between the if-modified-since and etag headers) is that the two headers are entirely independent of each other. They are simply two different mechanisms for allowing user agents to cache HTTP responses and perform conditional get requests to check if their cache is outdated.
There is no requirement that they be used together. Therefore, you will likely find user agents and HTTP servers that support one or the other method, or both.
As Pete alluded to in his comment, there are some subtle advantages and disadvantages to the two methods. If-modified-since is an easy method that works well when serving files that have a "modified" date. When dealing with more abstract data, such as dynamic pages, etags might make more sense since you're assigning a unique identifier to the data and don't have to worry about calculating some sort of pseudo-modification date.
Hi Ray,
I was just wondering if you made the cfc you mentioned in this post. I searched for it but I couldn't find it.
Wow, to be honest, I don't think I did write it. Maybe I was planning on it. At minimum, the code for CFB is up on my GitHub account. You can see code using this technique in action there.