Transfer Caching and Performance Features

It's been a while since I blogged about Transfer, but I finally got time to look into caching and performance with Transfer. I'm pretty impressed by what I've found. If I had known about some of this back when I was working on CFLib 2008, I probably would have done things quite a bit differently. (In fact, I may take a look at rebuilding things a bit and blogging about the changes.) Here is a quick summary of some of the things you can do to improve performance with Transfer.

First, let me quickly address caching in Transfer. Out of the box, Transfer is pretty smart. Imagine this scenario based on the Employee Directory sample I've used for my other entries. You have an employee that has a benefit. The benefit has an id of 5. If you fetch benefits later and grab the benefit with the primary key of 5, Transfer knows that it already loaded it when it fetched the employee. It will grab the data from its own cache instead of requerying the database.
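Here is a minimal sketch of that scenario. It assumes application.transfer holds the Transfer object (as in the other samples in this post), that the classes are named "employee" and "benefit", and that employee 1 happens to have the benefit with an id of 5. The ids are made up for illustration.

```cfml
<!--- Assumption: employee 1 has the benefit with id 5 (hypothetical data). --->
<cfset emp = application.transfer.get("employee", 1)>
<cfset benefits = emp.getBenefitsArray()>

<!--- Later on, ask for benefit 5 directly. If it was already loaded along with
      the employee, Transfer should hand back the cached object. --->
<cfset benefit = application.transfer.get("benefit", 5)>

<cfoutput>#benefit.getName()#</cfoutput>
```

One way to check this on your own setup is to turn on ColdFusion debugging and watch whether the second get() adds a query to the request.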

Transfer will, by default, cache within itself. By that I mean it will cache everything within its own factory object. This is called "instance" caching. You have several other options for where Transfer will store its cache:

  • instance: Stores the data in the Transfer object. This is the default.
  • application: Stores data in the application scope.
  • session: Stores data in the session scope.
  • transaction: Stores the data in the session scope, but notices changes made to objects across the board and will clear the object from the cache.
  • request: Stores data in the request scope.
  • server: Stores data in the server scope. This is useful for sharing a cache amongst multiple applications. I had a reader ping me about this just this past week.
  • none: No caching. Were you able to guess that?

To configure caching, you edit your transfer.xml and define the cache you want to use. For example:

<objectCache>
    <defaultcache>
        <scope type="none" />
    </defaultcache>
</objectCache>

Transfer can get really sexy here though. Along with specifying a default cache for the entire library, you can also specify caches for different classes of data.

<objectCache>
    <defaultcache>
        <scope type="none" />
    </defaultcache>
    <cache class="employee">
        <scope type="session" />
    </cache>
</objectCache>

This says: Turn off caching by default, but employee objects will be stored in the session scope. Note that mixing caching types for objects that are related (like benefits to employees) could cause, according to the docs, "indeterminate behaviour". (That sounds like a euphemism for 'your code will take a crap and die.')

Now let's take a look at another feature: lazy loading. If you remember the Employee object used in the earlier blog posts, there were relations defined to benefits, departments, and positions. That means each time you grab an Employee object, Transfer has to load all that related data. That could, over time, slow down Transfer's retrieval of data. Benefits is a perfect example of something that, due to its very nature, you probably won't need to display very often. You certainly don't want the whole company to know that Bob is using his hair replacement benefit. It takes all of two seconds to modify the XML and tell Transfer to be lazy with benefits:

<manytomany name="benefits" table="employees_benefits" lazy="true">
    <link to="employee" column="employeeidfk"/>
    <link to="benefit" column="benefitidfk"/>
    <collection type="array">
        <order property="name" order="asc"/>
    </collection>
</manytomany>

Compare the memento dump and how it changes. With lazy not enabled, benefits is returned automatically:

With lazy=true, the object is now slimmer:

What's awesome though is that as soon as you need the data, Transfer has no trouble getting it. Consider:

<cfset emp = application.transfer.get("employee", 1)>
<cfdump var="#emp.getMemento()#" label="Employee Memento">

<cfif structKeyExists(url, "showbenefits")>
    <cfdump var="#emp.getBenefitsArray()#" label="Benefits">
</cfif>
<cfabort/>

Running this template, if I simply add showbenefits=paris to the URL, Transfer will fetch the benefits data. A more real world example would be code that checks the current user's security level. If they were an administrator, it could show the employee's benefits, otherwise the data is hidden.
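That security check could look something like the sketch below. Note that session.isAdmin is a hypothetical flag I made up for this example (set it however your login code works, and session management has to be enabled). It is not part of Transfer.

```cfml
<!--- session.isAdmin is a hypothetical flag set at login; it is not a Transfer feature. --->
<cfparam name="session.isAdmin" default="false">

<cfset emp = application.transfer.get("employee", 1)>

<cfif session.isAdmin>
    <!--- Only at this point is the lazy benefits relationship asked for. --->
    <cfset benefits = emp.getBenefitsArray()>
    <cfloop index="i" from="1" to="#arrayLen(benefits)#">
        <cfoutput>#benefits[i].getName()#<br/></cfoutput>
    </cfloop>
<cfelse>
    <cfoutput>Benefit details are restricted.</cfoutput>
</cfif>
```

For non-administrators getBenefitsArray() is never called, so the benefits data should never be fetched for that request.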

Ok, so this works well and is fairly easy to add to an application. There is one small problem though. As soon as you decide to get benefit data, Transfer is going to load all of it. In a scenario with users and orders where one user could have thousands of orders, lazy loading alone won't cut it. (And this is exactly the issue I ran into with CFLib.) For users on ColdFusion 8, Transfer adds a feature called object proxies.

What are object proxies? Taking our Employee object, we can turn the positions data into proxies by simply adding proxied="true" to the XML:

<onetomany name="positions" lazy="true" proxied="true">
    <link to="position" column="employeeidfk" />
    <collection type="array">
        <order property="startdate" order="desc" />
    </collection>
</onetomany>

Now something interesting happens. When we get positions (getPositionsArray) from an Employee object, Transfer returns an array of TransferProxy objects. These objects contain the ID of each position, but nothing else. The object will remain "shallow" until we actually use it. A good example of this is pagination. Imagine getting 2000 Order objects back. We could page through these orders using normal pagination code and only fetch information for objects in the current "slice" of data that we care about. Here is a simpler example. While an employee may have a long history of positions, we will keep things simple by just displaying his last position. The following code sample shows this in action, along with some debug information about the other positions:

<cfset emp = application.transfer.get("employee", 1)>
<cfdump var="#emp.getMemento()#" label="Employee Memento">

<cfset positions = emp.getPositionsArray()>
<cfoutput>Total positions: #arrayLen(positions)#<br/></cfoutput>
<cfloop index="x" from="1" to="#arrayLen(positions)#">
    <cfif x is 1>
        <cfoutput>Current Job: #positions[x].getName()#<br/></cfoutput>
    </cfif>
    <cfoutput>position id = #positions[x].getID()#, isLoaded? #positions[x].getIsLoaded()#<br/></cfoutput>
</cfloop>

Notice that we only run getName() on the first position returned. For every position we show the ID and report if it is loaded. This outputs:

Notice how the first object is reported as loaded while the second is not. Again, this is exactly the problem I ran into at CFLib.
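The pagination idea mentioned earlier could be sketched along these lines. This reuses the positions relationship rather than a real Orders example, and url.page, perPage, startRow, and endRow are names I picked for the sketch, so treat it as a rough outline rather than tested production code.

```cfml
<!--- url.page and perPage are hypothetical names used only for this sketch. --->
<cfparam name="url.page" default="1" type="numeric">
<cfset perPage = 10>
<cfset page = max(1, int(url.page))>

<cfset emp = application.transfer.get("employee", 1)>
<cfset positions = emp.getPositionsArray()>

<!--- Work out which slice of the proxy array belongs to the requested page. --->
<cfset total = arrayLen(positions)>
<cfset startRow = ((page - 1) * perPage) + 1>
<cfset endRow = min(startRow + perPage - 1, total)>

<cfoutput>Total positions: #total#<br/></cfoutput>

<!--- getName() is only called on proxies inside the slice, so only those
      should need to be loaded. Nothing runs if startRow is greater than endRow. --->
<cfloop index="x" from="#startRow#" to="#endRow#">
    <cfoutput>#positions[x].getName()#<br/></cfoutput>
</cfloop>
```

Calling getIsLoaded() on items outside the slice, as in the earlier sample, is a quick way to confirm they were left alone.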

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos.

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Tim Garver posted on 12/28/2008 at 3:19 AM

Hi Ray,
Another great article!

This really helps me understand it much better than the docs.

I am wondering about stability though. In my own application I am writing, for some reason transfer seems to reload itself quite often. I have my cache setup under the Server scope and share it among 3 applications on the box.

I was also wondering about hundreds of transactions at the same time and how or if transfer uses some sort of locking mechanism? How does it handle multiple transactions on the same record?

Thanks again
Tim

Comment 2 by Raymond Camden posted on 12/30/2008 at 8:38 PM

Tim, I've not done "deep" Transfer testing. CFLib uses it, although the traffic there isn't too heavy. Broadchoice uses it for the previous version of our project, and if I remember right, there was a bit of a memory leak issue. I'm going to ping Sean to see if he could please speak to this a bit more.

Comment 3 by John Whish posted on 12/31/2008 at 1:30 AM

Object proxies = awesome! I didn't know about that feature of Transfer - very useful. Thanks for posting Ray.

Comment 4 by Sean Corfield posted on 1/12/2009 at 3:53 AM

There definitely are some gotchas with caching. The default is to cache everything forever. Naturally, if you have a large data set, you can fill the JVM heap with cached objects. What happens at that point (as far as I can tell) is that both Java *and* Transfer both start to perform their own garbage collection processes. Java is just doing its own regular thing, trying to find unreferenced objects to throw away to get more space. Naturally, Transfer's cache is all still considered "referenced". Meanwhile Transfer is busy trying to expire items from its cache in order to make room for newly added items. The fix for this is to ensure that you specify a cache expiry time (in minutes) for the default cache behavior. We use 10 minutes. However, we override the default for all of our "lookup table" objects (such as status, state etc) and allow those to be cached forever since they are never updated.

The other big gotcha is the cross-scope issue that Ray alluded to (indeterminate behavior). What tends to happen if you have related objects cached in different scopes is that one end of the relationship expires and may be reloaded but the other end of the relationship still points to the "old" instance in memory. At that point, changes to the object will either update the old in-memory instance or the new in-cache instance depending on how you reference the object.

That's why you should never put a Transfer Object in session scope (or application scope). Put the PK in a shared scope by all means, but always get the object by fetching it via Transfer (if the object is in cache, it won't hit the database anyway).

As to the memory leak, Mark Mandel, Mike Brunt and I have been trying to track that down for a long time. We usually can't reproduce it on a test server, even under load, but we see it on our production servers. And, yes, we've tried all the obvious approaches and several non-obvious ones and we've even reconfigured a number of things that seemed to be unique to production in an effort to eliminate every possibility. It's quite intriguing (and rather annoying). It's also not clear whether Transfer is the culprit, to be honest. If we ever figure it out, we'll blog about it :)

Comment 5 by Josh Nathanson posted on 3/5/2009 at 4:59 AM

I am seeing absolutely massive memory leak issues in my local installation. I get maybe 10 objects and I see my memory usage go up by approx. 100MB and not release on GC. I tried setting the objects to cache in the request scope and it didn't make any difference.

What's really frustrating is my coworker has the exact same codebase and data on his sandbox and he is not seeing the same issue. Meanwhile my stack blows up and I get "500 out of memory" errors when I try to retrieve 50 transfer objects.

Sounds a lot like the issue you guys have been seeing. We are continuing to troubleshoot so if we find anything out I'll post back over here.