About a week or so ago, Dave Ferguson (who has been leading development of the mobile version of BlogCFC) did some work on the "version service" used by BlogCFC installs. Whenever you first enter the BlogCFC administrator, the code does an HTTP ping to a central service to check whether a new version is available. (That by itself is not the point of this blog entry, so if folks want an explanation of it, just ask.) Previously the code was static and I had to update it as part of my release cycle. Dave did some nice work to make it completely automated. Afterwards, though, I began to get an error from it.
I couldn't understand why. His code was simple: check the cache (using cacheGet) and, if it was empty, load some data from RIAForge. Here is a snippet to give you an idea:
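<cfset versionInfo = cacheGet("versionInfo")>
<cfif isNull(versionInfo) or structKeyExists(url, "reinit")>
<cftry>
<cfhttp url="http://www.riaforge.org/index.cfm?event=xml.userprojects&uid=15" timeout="5">
<!--- (rest of the snippet omitted: parse the response and cachePut() it) --->
<cfcatch><!--- (error handling omitted) ---></cfcatch>
</cftry>
</cfif>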
The data in the cache was a structure with 3 keys. The error I received concerned one key that mysteriously did not exist. If I forced a refresh of the cache (you can see the URL hook for that in the snippet), the error immediately went away. Then, sometime later, it would return.
Then it occurred to me: what if versionInfo as a cache name was being used elsewhere? I did a quick search of the code base for blogcfc.com (the marketing site, not the blog engine) and right away discovered that not only was there another bit of code using the same cache name, it was using similar code as well. That's why I ended up with a struct that shared only 2 of its key names with the one I expected.
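A minimal reproduction of the collision, assuming two unrelated code paths on the same server. The struct keys (version, released, downloadURL, projectName) and their values are hypothetical, chosen only so the two structs share 2 of their 3 key names like the ones described above.

```cfml
<cfscript>
// Code path A (e.g. the version service) caches a struct with 3 keys.
dataA = {version = "5.9", released = "2010-11-01", downloadURL = "http://example.com/blogcfc.zip"};
cachePut("versionInfo", dataA);

// Code path B, written separately, reuses the SAME key for a struct
// that shares only 2 of those key names.
dataB = {version = "1.0", released = "2010-11-05", projectName = "other"};
cachePut("versionInfo", dataB);

// When path A reads "its" data back, it gets path B's struct instead,
// so code that expects downloadURL fails with a missing-key error.
versionInfo = cacheGet("versionInfo");
hasDownloadURL = structKeyExists(versionInfo, "downloadURL");
</cfscript>
```

Whichever path writes last wins, which is why a reinit made the error disappear until the other code ran again.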
sigh
So how would you avoid this? At my current job, I like to call some types of code changes "red flag" issues. Basically, if you do X, you need to let the rest of the team know. That includes things like adding or updating JavaScript libraries, modifying Application.cfc, or even adding new Session variables. In a large application, if you use session.name to represent something, are you sure some other part isn't using "name" in the Session scope as well? In this particular case, a more precise naming scheme would have helped. I ended up changing the key to versionSystemInfo, but I could have gone with "remoteSystemUpdateInfo". Yes, that's a bit long, but as you can see, I copy the result into a local variable right away. This is also a case where it may make sense to keep a simple text file in your project directory: every time you make use of the cache, list the key name and possibly the reason.
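Here is a minimal sketch of that advice applied to the snippet above: a more specific key, kept in one variable so it is written only once, with the result copied into a local variable. The variable name cacheKey, the one-hour timespan, and the placeholder data standing in for the RIAForge fetch are assumptions for illustration, not the actual BlogCFC code.

```cfml
<cfscript>
// Write the key once; this is also the name to list in the project's cache text file.
cacheKey = "remoteSystemUpdateInfo";

// Copy the cached value into a local variable right away.
versionInfo = cacheGet(cacheKey);

if (isNull(versionInfo) or structKeyExists(url, "reinit")) {
    // Hypothetical placeholder for the real RIAForge request and parsing (omitted).
    versionInfo = {fetchedAt = now()};
    cachePut(cacheKey, versionInfo, createTimeSpan(0, 1, 0, 0));
}
</cfscript>
```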
What about a runtime utility like cacheGetAllIds? That's helpful, but it's also possible you have a CFM that hasn't been run yet. A branch of your code that runs rarely (and therefore hasn't cached anything yet) will be missed by a call to cacheGetAllIds. (To be clear, though, I strongly recommend using the functions that inspect your cache. They can help you monitor how large your cache is and, more importantly, how effective it is.)
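A short sketch of that kind of inspection, using cacheGetAllIds() and cacheGetMetadata() against the default object cache. The metadata fields differ between versions, so the example just dumps each struct rather than assuming specific keys. Keep in mind the warning in the comments about calling cacheGetAllIds() on very large caches.

```cfml
<cfscript>
// Every id currently in the default object cache (CF returns them upper-cased).
ids = cacheGetAllIds();

for (i = 1; i lte arrayLen(ids); i++) {
    // Per-entry metadata such as hit/miss counts, size, and timestamps.
    writeDump(var = cacheGetMetadata(ids[i]), label = ids[i]);
}
// Ids from code paths that have not run yet are simply not in this list.
</cfscript>
```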
Anyway - I hope this helps.
Archived Comments
Ray, since Ehcache doesn't support namespaces for cache keys, one suggestion I give in my caching talks is to simulate them to help minimize the potential for key collisions like the one you described here. In your example, I'd suggest something like this:
admin_foo
user_foo
comment_foo
entry_foo
...
The idea is to group the types of items you're caching, so you don't have to remember whether a particular key was used anywhere in the program, only within its specific domain.
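A minimal sketch of that convention. The helper name cacheKeyFor() and the sample data are hypothetical; the only rule it encodes is the domain_key pattern from the list above.

```cfml
<cfscript>
// Hypothetical helper: prefix every key with its domain ("admin_", "entry_", ...).
function cacheKeyFor(required string domain, required string key) {
    return arguments.domain & "_" & arguments.key;
}

adminData = {source = "admin"};
entryData = {source = "entry"};

// Same short name in two domains gives two distinct cache entries.
cachePut(cacheKeyFor("admin", "versionInfo"), adminData);
cachePut(cacheKeyFor("entry", "versionInfo"), entryData);

adminInfo = cacheGet(cacheKeyFor("admin", "versionInfo"));
entryInfo = cacheGet(cacheKeyFor("entry", "versionInfo"));
</cfscript>
```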
A second option is to use a different cache (CF 9.0.1 supports custom named cache regions in most of the cache functions as well as in cfcache). There is minimal overhead in having multiple caches, although for something like a small blog, it's probably overkill.
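A minimal sketch of the separate-region option, assuming the CF 9.0.1 signatures cachePut(id, value, timeSpan, idleTime, region) and cacheGet(id, region), in which the region is the fifth argument to cachePut(). The region name blogcfcVersion and the timespans are hypothetical. As noted further down in the thread, a region that isn't defined in ehcache.xml is created on first use with the default settings.

```cfml
<cfscript>
// Hypothetical region name, created on first use with default settings.
region = "blogcfcVersion";
info = {version = "5.9"};

// Assumed 9.0.1 signature: cachePut(id, value, timeSpan, idleTime, region).
cachePut("versionInfo", info, createTimeSpan(0, 1, 0, 0), createTimeSpan(0, 0, 30, 0), region);

// Reads must name the same region; the default cache has its own "versionInfo" slot.
fromRegion = cacheGet("versionInfo", region);
</cfscript>
```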
Good point on the naming scheme, and that applies to Session variables as well.
I keep forgetting about regions. Too bad you have to edit an XML file first. I mean - not that that is a big deal, but it would be cool to be able to define the region in onApplicationStart.
How do I get the "real" cache names through cacheGetAllIDs()?
It always returns the names in uppercase, which is ugly to read. Is there some kind of getMeta() or something?
CF901 added a way to get the underlying Java object. In theory you could inspect it there. To be honest, seems like a lot more work than it's worth. I'd just lcase the keys. :)
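A minimal version of the "just lcase the keys" approach. It only changes how the ids are displayed; the original casing cannot be recovered this way.

```cfml
<cfscript>
// cacheGetAllIds() returns upper-cased ids; build a lower-cased, sorted copy for display.
ids = cacheGetAllIds();
displayIds = [];
for (i = 1; i lte arrayLen(ids); i++) {
    arrayAppend(displayIds, lcase(ids[i]));
}
arraySort(displayIds, "textnocase");
</cfscript>
```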
In CF 9.0.1 you don't need to edit the xml config file (ehcache.xml) in order to create a custom cache/cache region. It's now built into the functions:
cachePut('foo', 'bar', 'myCustomCache')
cacheGet('foo', 'myCustomCache')
@Patrick, cacheGetAllIDs() is evil. It's fine if you only have a few hundred (or thousand) items in the cache, but if you have a cache with hundreds of thousands or millions of entries, you don't want to call that function.
Rob, the docs said this:
"Edit ehCache.xml (cfroot/lib) to set the properties for user-defined caches as shown in the following example"
I read it as required. So are you saying that if I use cache region FOO and don't define it, it will use some default set of properties? And if I want to tweak it, THEN I go to XML? (If so, I'd still argue that it would be cool to define the props via CFML.)
Ray, the docs are wrong. You don't need to define it first in the ehcache.xml file. When you use it in a function the first time, it automatically gets set for you using the default cache settings from ehcache.xml. If you want to use a different cache config from the default, you have two options (cacheSetProperties() doesn't work for custom caches):
1. Hard code the config in ehcache.xml
2. Use my cacheNew() UDF to create the custom cache with customizable properties. I keep forgetting to release it. I'll do a quick blog post when I get to work, then send it over to cflib as well.
Interesting. Does your cacheNew end up updating the XML file or is it 100% virtual? If virtual, would it need to be run in onAppStart?
@Ray, my UDF doesn't write out to the ehcache.xml file. It creates the cache at runtime, just like CF does for the built-in template and object caches. If you were to write a new cache region to the ehcache.xml file, you would have to restart CF for it to take effect.
To answer your 2nd question, the answer is it depends ;-) In the case of blogCFC, that's definitely an option. You could first do a check to see if the cache exists, and if not, create it. It all depends on your use case and how you want to configure it. If you're happy with the values set by default, then you wouldn't need to create a new custom cache on application start; you could just reference your custom cache name when you do your gets/puts. On the other hand, if you do want to tweak the cache config, you can go the hard-coded XML route, which is fine if you have control over your server. However, since blogCFC is really a packaged app, I think it should be as self-contained as possible and shouldn't rely on changes to the XML config file. So in that case, perhaps having the check/initialization on app start makes the most sense.
That sounds pretty reasonable. Thanks Rob.
@Patrick, RE getting the case of the key that you set in the cache: unfortunately, you can't. ColdFusion uppercases your cache keys when it inserts them into Ehcache. This is unfortunate for a number of use cases, but I understand why Adobe did it: ColdFusion is, for the most part, case insensitive. Ehcache itself is case sensitive, so "Foo", "foo", and "FOO" are all unique keys. Adobe, in an effort to keep things easy for *most* ColdFusion developers, converts all keys to upper case when it does a cachePut(). It does the same thing on a cacheGet() as well. I've discussed this with the ColdFusion team in the past and this is something they're looking at for a future release of CF, but there are no commitments as of right now.
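A small illustration of the behavior described above, based on that description of ColdFusion 9.x (other CFML engines may handle key casing differently). Because both cachePut() and cacheGet() upper-case the id, keys that differ only by case collide, which is one more way to hit the problem from the original post.

```cfml
<cfscript>
// Both ids are upper-cased to "FOO" before they reach Ehcache,
// so the second put replaces the first.
cachePut("Foo", "first");
cachePut("foo", "second");

// cacheGet() upper-cases its id too, so any casing finds the entry.
result = cacheGet("FOO");
</cfscript>
```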
I'd love - seriously - love - to hear an Ehcache engineer explain why someone would _want_ Foo and FOO to be 2 unique keys in a caching system. I've yet to find anyone who prefers crap like this anywhere, whether it be code or a file system.
<cf_soapbox>
@Ray, It's not really an ehcache thing, it's more of a Java thing. Everything else in Java is case sensitive, so Ehcache just works the same way (since it's written in Java as well). What I'd like in CF is a simple switch to allow me to turn on/off case sensitivity when dealing with cache keys as well as other parts of CF.
Good point. I'll happily blame Java then. I mean seriously - has anyone ever met anyone who likes case sensitivity for things like this?
And of course, it's probably not a Java thing... Java's roots are in Unix, and Unix is all about case sensitivity... and I never understood why either (even after having worked with it for 15 years)... the best explanation I've found went something like this:
"The primary reason is that programmers in the late 1960's & 70's for C and Unix decided to optimize the compilers and parsers. At the time computers were much slower and it was faster to compare identical strings rather than normalizing the upper/lower case of the strings. Back in 1969 when there was no such thing as personal computers this optimization made a lot of sense." [Greg Raiz, Raizlabs]
One thing that has worked well for me in the past is to set up a cache facade or cache service. Then, when I set my cache values, I actually include a 3rd argument:
setCacheValue(cacheKey, cacheValue, getCurrentTemplatePath())
Then, inside the service, every time I set a value I log the key, the time it was set, and the .cfm or .cfc that set the key. That makes debugging issues much easier: just dump the log.
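A minimal sketch of that facade, written as a plain function for brevity; in a real app it would more likely live in a CFC stored in the application scope. Only the name setCacheValue and its three arguments come from the comment; the log file name, message format, and sample key are assumptions. writeLog() stamps each entry with the date and time, which covers "the time it was set".

```cfml
<cfscript>
// Every cache write goes through here, so each key is logged together
// with the template that set it; writeLog() adds the timestamp.
function setCacheValue(required string cacheKey, required any cacheValue, required string callerPath) {
    writeLog(file = "cacheService", type = "information",
             text = "cache key '#arguments.cacheKey#' set by #arguments.callerPath#");
    cachePut(arguments.cacheKey, arguments.cacheValue);
}

// Usage from any page or component (hypothetical key and value):
info = {version = "5.9"};
setCacheValue("admin_versionInfo", info, getCurrentTemplatePath());
</cfscript>
```

When two pieces of code fight over a key, the cacheService log shows both template paths next to that key.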
Anyway, that's just a technique that has worked well for me.