Posted in ColdFusion | Posted on 12-27-2008
It's been a while since I blogged about Transfer, but I finally got time to look into caching and performance with Transfer. I'm pretty impressed by what I've found. If I had known about some of this back when I was working on CFLib 2008, I probably would have done things quite a bit differently. (In fact, I may take a look at rebuilding things a bit and blogging about the changes.) Here is a quick summary of some of the things you can do to improve Transfer's performance.
First, let me quickly address caching in Transfer. Out of the box, Transfer is pretty smart. Imagine this scenario, based on the Employee Directory sample I've used in my other entries: you have an employee who has a benefit, and that benefit has an ID of 5. If you later fetch benefits and grab the benefit with the primary key of 5, Transfer knows that it already loaded that benefit when it fetched the employee. It will grab the data from its own cache instead of querying the database again.
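A minimal sketch of that scenario follows. It assumes the Transfer object is stored in application.transfer and that employee 1 (a hypothetical ID) has the benefit with ID 5; the class names employee and benefit are the ones used in the configuration later in this entry.

```cfml
<!--- Assumption: Transfer lives in application.transfer; employee ID 1 is hypothetical --->
<cfset emp = application.transfer.get("employee", 1)>

<!--- If the benefits relationship is not lazy, loading the employee
      also loaded its benefits, including the one with ID 5 --->
<cfset benefit = application.transfer.get("benefit", 5)>

<!--- With caching enabled, the get() above is answered from Transfer's cache --->
<cfoutput>#benefit.getName()#</cfoutput>
```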
By default, Transfer caches within itself. By that I mean it keeps everything inside its own factory object. This is called "instance" caching. You have several other options for where Transfer stores its cache:
- instance: Stores the data in the Transfer object. This is the default.
- application: Stores data in the application scope.
- session: Stores data in the session scope.
- transaction: Stores the data in the session scope, but notices changes made to objects across the board and will clear the object from the cache.
- request: Stores data in the request scope.
- server: Stores data in the server scope. This is useful for sharing a cache amongst multiple applications. I had a reader ping me about this just this past week.
- none: No caching. Were you able to guess that?
To configure caching, you edit your transfer.xml and define the cache you want to use. For example:
<objectCache>
    <defaultcache>
        <scope type="none" />
    </defaultcache>
</objectCache>
Transfer can get really sexy here though. Along with specifying a default cache for the entire library, you can also specify caches for different classes of data.
<objectCache>
    <defaultcache>
        <scope type="none" />
    </defaultcache>
    <cache class="employee">
        <scope type="session" />
    </cache>
</objectCache>
This says: turn off caching by default, but store employee objects in the session scope. Note that mixing cache types for objects that are related (like benefits to employees) could cause, according to the docs, "indeterminate behaviour". (That sounds like a euphemism for 'your code will take a crap and die.')
Now let's take a look at another feature: lazy loading. If you remember the Employee object used in the earlier blog posts, it had relationships defined to benefits, departments, and positions. That means each time you grab an Employee object, Transfer has to load all of that related data, and that extra work can add up and slow down retrieval. Benefits are a perfect example: by their very nature, they are probably something you won't need to display very often. You certainly don't want the whole company to know that Bob is using his hair replacement benefit. It takes all of two seconds to modify the XML and tell Transfer to be lazy with benefits:
<manytomany name="Benefits" table="..." lazy="true">
    <link to="employee" column="employeeidfk"/>
    <link to="benefit" column="benefitidfk"/>
    <collection type="array">
        <order property="name" order="asc"/>
    </collection>
</manytomany>
Compare how the memento dump changes. Without lazy loading, benefits are returned automatically:

With lazy=true, the object is now slimmer:

What's awesome though is that as soon as you need the data, Transfer has no trouble getting it. Consider:
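<!--- Assumes Transfer is stored in application.transfer; employee ID 1 is illustrative --->
<cfset emp = application.transfer.get("employee", 1)>
<cfdump var="#emp.getMemento()#" label="Employee Memento">

<cfif structKeyExists(url, "showbenefits")>
    <!--- The lazy Benefits collection is loaded only at this point --->
    <cfdump var="#emp.getBenefitsArray()#" label="Benefits">
</cfif>
<cfabort/>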
Running this template, if I simply add showbenefits=paris (any value works) to the URL, Transfer will fetch the benefits data. A more real-world example would check the current user's security level: if the user is an administrator, show the employee's benefits; otherwise, keep the data hidden.
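The security-level variation could look roughly like the sketch below. It is an illustration, not code from the sample application: session.isAdmin is a hypothetical flag standing in for whatever your application uses to track roles, session management is assumed to be enabled, and application.transfer is an assumed location for the Transfer object.

```cfml
<!--- Assumptions: Transfer lives in application.transfer; session.isAdmin is a hypothetical role flag --->
<cfset emp = application.transfer.get("employee", 1)>

<cfif structKeyExists(session, "isAdmin") and session.isAdmin>
    <!--- Only administrators trigger the lazy load of the Benefits collection --->
    <cfdump var="#emp.getBenefitsArray()#" label="Benefits">
<cfelse>
    <p>Benefit details are only visible to administrators.</p>
</cfif>
```

For everyone else the Benefits collection is never touched, so lazy loading means Transfer never queries for it.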
Ok, so this works well and is fairly easy to add to an application. There is one small problem, though. As soon as you ask for benefit data, Transfer loads all of it. In a scenario with users and orders, where one user could have thousands of orders, lazy loading alone won't cut it. (And this is exactly the issue I ran into with CFLib.) For users on ColdFusion 8, Transfer offers a feature called object proxies.
What are object proxies? Taking our Employee object, we can turn the positions data into proxies by simply adding proxied="true" to the XML:
<onetomany name="Positions" proxied="true">
    <link to="position" column="employeeidfk" />
    <collection type="array">
        <order property="startdate" order="desc" />
    </collection>
</onetomany>
Now something interesting happens. When we get positions (getPositionsArray) from an Employee object, Transfer returns an array of TransferProxy objects. These objects contain the ID of each position, but nothing else. Each object remains "shallow" until we actually use it. Pagination is a good example of where this helps. Imagine getting 2000 Order objects back. We could page through these orders using normal pagination code, loading full data only for the objects in the current "slice" we care about. Here is a simpler example. While an employee may have a long history of positions, we will just display the current one; since positions are ordered by start date descending, that is the first element. The following code sample shows this in action, along with some debug information about the other positions:
<!--- Assumes Transfer is stored in application.transfer; employee ID 1 is illustrative --->
<cfset emp = application.transfer.get("employee", 1)>
<cfdump var="#emp.getMemento()#" label="Employee Memento">

<cfset positions = emp.getPositionsArray()>
<cfoutput>Total positions: #arrayLen(positions)#<br/></cfoutput>
<cfloop index="x" from="1" to="#arrayLen(positions)#">
    <cfif x is 1>
        <!--- Calling getName() forces this one proxy to load its data --->
        <cfoutput>Current Job: #positions[x].getName()#<br/></cfoutput>
    </cfif>
    <cfoutput>position id = #positions[x].getID()#, isLoaded? #positions[x].getIsLoaded()#<br/></cfoutput>
</cfloop>
Notice that we only call getName() on the first position returned. For every position, we show the ID and report whether it is loaded. This outputs:

Notice how the first object is reported as loaded while the second is not. Again, this is exactly what I needed for the problem I ran into at CFLib.
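The pagination idea mentioned above can be sketched with the same positions collection. This is an untested outline, not code from CFLib: it assumes application.transfer and employee ID 1 as before, and url.page, page, and perPage are hypothetical names. Only the proxies inside the current page have a getter called on them, so, going by the isLoaded behavior shown above, only those should be loaded from the database.

```cfml
<!--- Assumptions: Transfer lives in application.transfer; url.page and perPage are hypothetical --->
<cfparam name="url.page" default="1">
<cfset perPage = 10>
<!--- Sanitize the requested page number so it is a whole number of at least 1 --->
<cfset page = max(1, int(val(url.page)))>

<cfset emp = application.transfer.get("employee", 1)>
<!--- Proxied collection: every element holds only its ID until used --->
<cfset positions = emp.getPositionsArray()>

<cfset startRow = (page - 1) * perPage + 1>
<cfset endRow = min(startRow + perPage - 1, arrayLen(positions))>

<cfoutput>
<!--- If startRow is past the end, from is greater than to and the loop body never runs --->
<cfloop index="x" from="#startRow#" to="#endRow#">
    <!--- getName() loads just this proxy; proxies outside the slice stay shallow --->
    #positions[x].getID()#: #positions[x].getName()#<br/>
</cfloop>
</cfoutput>
```

The array itself still holds one proxy per position, but each proxy carries only an ID, which is what keeps a user with thousands of orders from turning into thousands of fully loaded objects.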


Another great article!
This really helps me understand it much better than the docs.
I am wondering about stability, though. In the application I am writing, Transfer seems to reload itself quite often for some reason. I have my cache set up in the server scope and share it among 3 applications on the box.
I was also wondering what happens with hundreds of transactions at the same time. Does Transfer use some sort of locking mechanism? How does it handle multiple transactions on the same record?
Thanks again
Tim
The other big gotcha is the cross-scope issue that Ray alluded to (indeterminate behavior). What tends to happen if you have related objects cached in different scopes is that one end of the relationship expires and may be reloaded but the other end of the relationship still points to the "old" instance in memory. At that point, changes to the object will either update the old in-memory instance or the new in-cache instance depending on how you reference the object.
That's why you should never put a Transfer Object in session scope (or application scope). Put the PK in a shared scope by all means, but always get the object by fetching it via Transfer (if the object is in cache, it won't hit the database anyway).
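A minimal sketch of that pattern, assuming the Transfer object is stored in application.transfer, session management is enabled, and the employee's primary key property is named id (so its getter is getID()); session.employeeID is a hypothetical variable name.

```cfml
<!--- At login: emp is an employee already fetched through Transfer.
      Keep only its primary key in the shared scope, never the object itself. --->
<cfset session.employeeID = emp.getID()>

<!--- On later requests: ask Transfer for the object each time.
      If it is still in Transfer's cache, no database query is made. --->
<cfset emp = application.transfer.get("employee", session.employeeID)>
```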
As to the memory leak, Mark Mandel, Mike Brunt and I have been trying to track that down for a long time. We usually can't reproduce it on a test server, even under load, but we see it on our production servers. And, yes, we've tried all the obvious approaches and several non-obvious ones and we've even reconfigured a number of things that seemed to be unique to production in an effort to eliminate every possibility. It's quite intriguing (and rather annoying). It's also not clear whether Transfer is the culprit, to be honest. If we ever figure it out, we'll blog about it :)
What's really frustrating is that my coworker has the exact same codebase and data on his sandbox and is not seeing the issue. Meanwhile, my stack blows up with "500 out of memory" errors when I try to retrieve 50 Transfer objects.
Sounds a lot like the issue you guys have been seeing. We are continuing to troubleshoot so if we find anything out I'll post back over here.