Better handling of missing files with your web application


John sent me an interesting topic:

Problem: A user types in the wrong address. Your site generates a 404 error and calls your custom ColdFusion 404 handler.

Solution: Make smarter suggestions for possible page matches. This would work much like a full-text search engine auto-suggesting words: the custom handler would need to match "conatct" with "contact."

I'll bet we could dig into Java to do some sort of dictionary lookup somewhere!

I think this is an absolutely great idea, and it touches on something I've blogged about before. It's pretty trivial to write a 404 handler with Adobe's web application product. The following script will send any CFM request it can't handle to a 404 page:

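component {
	this.name = "missing";

	// Runs whenever a request for a .cfm file that doesn't exist comes in.
	public boolean function onMissingTemplate(string targetpage) {
		// Send the user to the custom 404 page.
		location(url="404.cfm");
		return true;
	}
}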

This by itself would be an improvement to most sites (shoot, even mine). But on its own it misses a lot of opportunities to actually - you know - help users find what they want. For example, I could easily add a quick log:

component {
	this.name = "missing";

	public boolean function onMissingTemplate(string targetpage) {
		// Record the missing page plus any query string in 404.log for later review.
		writelog(file="404", text="#arguments.targetpage#?#cgi.query_string#");
		location(url="404.cfm");
		return true;
	}
}

Then periodically check the log file for common issues. Let's say we see requests like the one John used as an example. We could easily handle it like so:

component {
	this.name = "missing";

	public boolean function onMissingTemplate(string targetpage) {
		// Handle some common ones: compare only the file name, ignoring folders.
		// location() ends the request, so nothing below runs on a match.
		if (listLast(arguments.targetpage, "/") is "conatct.cfm") location(url="contact.cfm");
		writelog(file="404", text="#arguments.targetpage#?#cgi.query_string#");
		location(url="404.cfm");
		return true;
	}
}

Now - what you probably don't want is a giant set of IF statements, or even a switch statement. That can get messy pretty quickly. John suggested a more dynamic approach. You could - in theory - keep a list of files and see if any are "close" to the request (perhaps using levDistance, a Levenshtein edit-distance function). But this is something you would want to cache heavily.
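A minimal sketch of the "close match" idea follows, in CFScript for ColdFusion 9 or later. Both functions are hypothetical helpers written for illustration: `levDistance` here is a plain two-row Levenshtein implementation, not the source of the levDistance function mentioned above, and `closestPage` returns the nearest file name within a small edit budget. To follow the advice about caching, the candidate list would be built once, for example in `onApplicationStart` with something like `application.knownPages = directoryList(expandPath("/"), false, "name", "*.cfm")`, assuming the application lives at the web root.

```cfml
// Hypothetical helper: classic two-row Levenshtein distance, i.e. the minimum
// number of single-character insertions, deletions, and substitutions that
// turn s into t. Comparison is case-insensitive.
numeric function levDistance(required string s, required string t) {
    var a = lCase(arguments.s);
    var b = lCase(arguments.t);
    var prev = [];
    var curr = [];
    var cost = 0;
    var i = 0;
    var j = 0;
    // Row 0: turning "" into the first j characters of b takes j insertions.
    for (j = 0; j <= len(b); j++) {
        arrayAppend(prev, j);
    }
    for (i = 1; i <= len(a); i++) {
        // Column 0: deleting the first i characters of a.
        curr = [i];
        for (j = 1; j <= len(b); j++) {
            cost = compare(mid(a, i, 1), mid(b, j, 1)) == 0 ? 0 : 1;
            // Cheapest of deletion, insertion, substitution (CFML's min() takes two args).
            arrayAppend(curr, min(min(prev[j + 1] + 1, curr[j] + 1), prev[j] + cost));
        }
        prev = curr;
    }
    return prev[len(b) + 1];
}

// Hypothetical helper: returns the entry in pages closest to requested,
// or "" if none is within maxDistance edits.
string function closestPage(required string requested, required array pages, numeric maxDistance = 2) {
    var best = "";
    var bestDistance = arguments.maxDistance + 1;
    var d = 0;
    var i = 0;
    for (i = 1; i <= arrayLen(arguments.pages); i++) {
        d = levDistance(arguments.requested, arguments.pages[i]);
        if (d < bestDistance) {
            bestDistance = d;
            best = arguments.pages[i];
        }
    }
    return best;
}
```

Swapping two adjacent letters costs two edits, so with contact.cfm in the cached list, `closestPage("conatct.cfm", application.knownPages)` would return "contact.cfm" as long as no other page is at least as close. Inside `onMissingTemplate`, one option is to pass that result to 404.cfm (say, as a hypothetical `suggest` URL variable) and let the user click it, rather than redirecting on a guess that might be wrong. Each comparison costs time proportional to the product of the two name lengths, which is negligible for a few hundred file names.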

To me, the critical question is this: Do you have a good understanding of how people are using your site? What are they requesting that isn't being found? Did CNN link to your site and screw up? You'll probably have a lot more success handling that yourself than getting CNN to fix it. What are people searching for on your site? I just searched for xbox360 on Sony.com and the results were pitiful. Why not provide a link to a comparison between the PS3 and the Xbox? Why not show a list of PS3 exclusives? But most of all, is there someone whose job it is to see what's being searched for and actually respond to those requests?

This isn't a code issue at all. (Although certainly code can help us generate and report metrics.) It's a basic "Site Awareness" that far too many of us are lacking in. (To be fair, in some companies you have to beg for basic QA!) As I said, this is something I've blogged about before, and it's something I think about when I can't sleep. I'd love to get some comments from folks who are dealing with this - or at least thinking about dealing with this today.


About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Tim Garver posted on 3/11/2011 at 8:06 PM

This is along the same lines as I was doing with my 404 rewrite http://cf404rewrite.riaforg...

Ray, the only problem here is that you are assuming people are going to ask for a .cfm file, so the handler would not pick up a request for a .htm file, or even a folder request like /xbox360/.

If you setup a custom 404 on your web server, then you could also possibly use Solr to search your content folder for possible results and display a list of suggested pages to the user.

I bet you could even write into each page its possible misspelled name variants, or metadata it could be searched against, and then write a custom search that parses each content page and reads those meta tags for results. That way you could control which pages you want to search against.

Anyway just some ideas there.

Tim

Comment 2 by Raymond Camden posted on 3/11/2011 at 8:08 PM

Tim - yeah - sorry - I shoulda mentioned the code above was CFM only, and it makes sense to handle it web server level too.

The idea of metadata in the file is pretty darn interesting.

Comment 3 by Lola LB posted on 3/11/2011 at 8:10 PM

I suspect this is where user interface experience comes in . . . might well be worth shelling out some $$$ to hire a UX consultant to take a look at your mission-critical site.

Comment 4 by Jim O'Keefe posted on 3/11/2011 at 8:15 PM

Another strategy would be to send them to the site map. Then they can see if what they were looking for exists on the site. I dislike half-baked AI schemes that try to guess what I want. Once you're logging the failed requests, I would use mod_rewrite to redirect the most common ones.

Comment 5 by Geoff posted on 3/11/2011 at 8:15 PM

Make sure you add your statusCode to the location function! You'll want to permanently redirect (301) to the correct page so crawlers, etc., don't hang on to the old, incorrect URL of the missing page.
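Applied to the post's "handle some common ones" code, Geoff's suggestion might look like the sketch below: a struct of known typos replaces the growing pile of IF statements, and matches get a permanent redirect. `redirectTarget` and the example entries are hypothetical; in practice you would fill the struct from what the 404 log shows and cache it in the application scope.

```cfml
// Hypothetical helper: returns the permanent home for a known mistyped
// file name, or "" if the request isn't in the table.
string function redirectTarget(required string targetpage, required struct redirects) {
    // Only the file name matters here, e.g. "/conatct.cfm" -> "conatct.cfm".
    var fileName = listLast(arguments.targetpage, "/");
    // Struct key lookups are case-insensitive in CFML, so "CONATCT.cfm" matches too.
    if (structKeyExists(arguments.redirects, fileName)) {
        return arguments.redirects[fileName];
    }
    return "";
}

// Example table, built with bracket notation so keys can contain dots.
redirects = structNew();
redirects["conatct.cfm"] = "/contact.cfm";
redirects["contatc.cfm"] = "/contact.cfm";

// Inside onMissingTemplate, something like:
//   target = redirectTarget(arguments.targetpage, application.redirects);
//   if (len(target)) location(url=target, addToken=false, statusCode=301);
```

Adding a new typo then means adding a struct entry rather than another IF statement.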

Comment 6 by David Hammond posted on 3/11/2011 at 8:59 PM

One interesting approach that I have been meaning to explore is to use the http referrer to customize the 404 error page based on the source of the bad link. This also allows you to fire off an email if the bad link is internal to the site. This article gives an overview of the different options:
http://www.alistapart.com/a...
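David's referrer idea could start with a small classifier like the hypothetical `referrerType` below. It assumes the browser sent a Referer header, which browsers may omit, and it treats www.example.com and example.com as different hosts. In `onMissingTemplate` you might call it with `cgi.http_referer` and `cgi.server_name` and write internal misses to their own log, since those are broken links you can fix yourself.

```cfml
// Hypothetical helper: "none" (typed, bookmarked, or no Referer sent),
// "internal" (a broken link on this site), or "external" (someone else's link).
string function referrerType(required string referer, required string host) {
    var refHost = "";
    // Need at least a scheme and a host, e.g. "http:" and "example.com".
    if (listLen(arguments.referer, "/") < 2) {
        return "none";
    }
    // "http://example.com:8080/page" -> "example.com:8080" -> "example.com"
    refHost = listFirst(listGetAt(arguments.referer, 2, "/"), ":");
    if (compareNoCase(refHost, arguments.host) == 0) {
        return "internal";
    }
    return "external";
}
```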

Google has a widget that you can put on your 404 page that attempts to find the closest match to the requested page. I haven't used it and I'm not sure if it's really supported any more, but it's an interesting option:
http://googlewebmastercentr...

You can also just go with something fun!
http://mashable.com/2011/01...

Comment 7 by JP posted on 3/11/2011 at 9:23 PM

I cache a site map CFC in my application scope that has a search() method in it. On the web server (I use IIS), I point my 404 errors to 404.cfm, which looks at CGI.query_string to see what was requested. I also call 404.cfm (using cfmodule) from the CF onMissingTemplate method, passing it the requested page as an attribute.

The fun part happens next... I tokenize the request and then use the tokens to search my site map for possible matches. This works pretty well when spelling is correct, but I didn't get any matches for "conatct." A good 404 handler would offer "contact" as a possible match for "conatct."

To solve this, I improved my 404 handler by first asking Google to expand my token list, by returning spelling suggestions for each token, and then doing a sitemap search. This works incredibly well and is fast.

Here's a code example that calls the Google Suggest API and uses jQuery and a CFC I wrote to make calls to Google. It includes a download of all the source files and is very straightforward. Feel free to use it however you want.

http://code.redtopia.com/ex...
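The tokenize-then-search step JP describes could be sketched as below. Both functions are hypothetical, the site map is assumed to be an array of structs with `title` and `link` keys, and scoring is a plain count of tokens found in each title. It does not call Google's suggest service, so, as JP found, a misspelled token such as "conatct" still finds nothing; that is where a spelling or edit-distance step would come in.

```cfml
// Hypothetical: "/products/xbox-360_review.cfm" -> ["products", "xbox", "360", "review"]
array function requestTokens(required string targetpage) {
    // Drop a trailing page extension, then split on slashes, dashes, underscores, and dots.
    var path = reReplaceNoCase(arguments.targetpage, "\.(cfm|html?)$", "");
    return listToArray(lCase(path), "/-_.");
}

// Hypothetical: returns the link of the site map entry whose title contains
// the most tokens, or "" if no title contains any of them.
string function bestSitemapMatch(required array tokens, required array sitemap) {
    var bestLink = "";
    var bestScore = 0;
    var score = 0;
    var i = 0;
    var j = 0;
    for (i = 1; i <= arrayLen(arguments.sitemap); i++) {
        score = 0;
        for (j = 1; j <= arrayLen(arguments.tokens); j++) {
            // findNoCase returns the match position, or 0 when the token isn't found.
            if (findNoCase(arguments.tokens[j], arguments.sitemap[i].title)) {
                score++;
            }
        }
        if (score > bestScore) {
            bestScore = score;
            bestLink = arguments.sitemap[i].link;
        }
    }
    return bestLink;
}
```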

Comment 8 by existdissolve posted on 3/11/2011 at 10:47 PM

"...Adobe's web application product"

Just curious...any particular reason for this choice of words? :)

Comment 9 by Doug posted on 3/15/2011 at 7:54 PM

I seem to recall this happening earlier this year when people turned to their favorite search engine and asked, "What time is the Super Bowl?" The NFL foolishly had not put that info anywhere useful on its website, so it wasn't even in the top ten hits.

Can you guess who got to answer the question at #1? Yep, Huffington Post.

Comment 10 by Doug posted on 3/15/2011 at 7:56 PM

Oops, I should mention that I got that info from a Search Engine Optimization article I read somewhere. Maybe it was the New York Times. It had very useful information about improving your site to attract the attention of Google and other search engines.

(And of course by writing this I'm improving this blog's SOE value by 110%!)

Comment 11 by Doug posted on 3/15/2011 at 7:56 PM

SEO! Dammit. An edit option would be useful. :(

Comment 12 by Matthew Clemente posted on 9/19/2011 at 9:27 PM

Hey Ray,

Sorry for what might be a beginner comment on an older post, but where would I put this if I just wanted to log missing template errors for my application? As a cfscript in my missing template handler? or in my app.cfc?

Comment 13 by Matthew Clemente posted on 9/19/2011 at 9:41 PM

NM,

Found I can just use onMissingTemplate in the app.cfc

Thanks.