Open Discussion - Organizing 19000 Queries

This post is more than 2 years old.

This one is a doozy. I thought my readers would enjoy it. From a reader, Doug: (And note - I removed the angle brackets from his code because I'm too lazy to replace them out with escaped entities.)

Our 12 year old source code pretty much hasn't been updated since CF6. While trash and start over comes to mind, we do not have the liberty to do so. Our plan is to make the code more maintainable by separating the display, business logic and database code into a pseudo framework with CFCs. We started with database queries since cfquery tags are easy to find, and we also have the immediate goal (requirement) of making the code database independent (currently Oracle, and they use a lot of built in functions, and proprietary join syntax). With ~34% of the queries duplicated among the CFMs (so far), we figured eliminating duplicates might be good start for the 19000+ queries in the whole project. We ran into a snag when we began to stuff these functions (10000+ lines) into a single CFC as it crashed some earlier versions of Dreamweaver we have on our outdated hardware. Our solution was to split our functions into separate CFCs, but alas, we had made a design blunder. We were using a single object to call all of our functions:

cfset application.database = createObject("component", "coldfusn.cfcs.database").init(settings)

cfset queryresults = application.database.function1(blah1, blah2)

cfset queryresults = application.database.function45(blah1)

cfset queryresults = application.database.function943(blah1, blah2, blah3)

One solution was to create a database object for each new CFC we used and change each function call to match:

cfset application.database = createObject("component", "coldfusn.cfcs.database").init(settings)
cfset application.database1 = createObject("component", "coldfusn.cfcs.database1").init(settings)
cfset application.database2 = createObject("component", "coldfusn.cfcs.database2").init(settings)

cfset queryresults = application.database.function1(blah1, blah2)

cfset queryresults = application.database1.function45(blah1)

cfset queryresults = application.database2.function943(blah1, blah2, blah3)

But none of us liked the prospect of figuring out which functions were in which files (since they were pretty much split arbitrarily). The solution we've been going with was to "chain" our CFCs together so any function can be called from a single object:

database.cfc
cfcomponent extends="database1"

database1.cfc
cfcomponent extends="database2"

database2.cfc

cfcomponent add extends if more files are needed

Then our existing CFM code works with unlimited CFCs. It's an abuse of class inheritance, but I didn't know of a better way to do it... until onMissingMethod().

So, the question is, would a solution involving onMissingMethod and isCustomFunction() to dynamically call the correct CFC be more or less elegant/efficient than inheritance chaining? Know of a better idea involving none of the above?

First and foremost - and I say this every time - but it bears repeating. Do not forget that there are always multiple ways to solve problems like this. I'm not saying that because ColdFusion is so versatile - it is - but this applies to all problems of this nature. In my discussions with Doug I know he gets this, but I try to remind my readers of this every time. There is no one way.

Ok - so I think the goal of moving into an MVC system makes sense. I'd probably look into an established framework over a peusdo one that I'm assuming you are rolling your own. Doug said he wasn't interested in a full rewrite, but there's going to be a lot of work involved in updating/creating these CFCs anyway. Something nice and simple like FW/1. I always tend to suggest an established framework over a homegrown one as it makes it easier to bring in new developers. Regardless of what framework you use though let's move on to your ideas on handling your queries.

Normally I'd say that any IDE crashing is not an excuse to change your code. That's like letting your 4 year old dictate your eating habits. However, 10K+ lines in a file is probably a bad idea anyway.

I think your biggest mistake though was to break them up numerically. Some in cfc 1, some in cfc 2, etc. Your workarounds may or may not be effective, but the root of your organization, in my opinion, is critically wrong. I'd look into dividing your queries into "areas of concerns." I'm not sure if there is a more appropriate OO way to describe that, but I tend to break up web sites into their AOCs to help me organize my CFCs. So if my site handled products, users, blog entries, and beer, I'd consider each of them a separate area of concern and would group my CFCs around them. At the simplest level, I could have a product CFC, a user CFC, a blog CFC, and a beer CFC. I'd imagine Doug's site has a way to group his CFCs as well. It may not be so evenly divided. Take Amazon for example. While they have users and reviews, products are obviously their biggest area. Based on that - you could further break up products into more specific product types.

Thoughts? I promised Doug some feedback on this so don't let me down folks. ;)

Raymond Camden's Picture

About Raymond Camden

Raymond is a senior developer evangelist for Adobe. He focuses on document services, JavaScript, and enterprise cat demos. If you like this article, please consider visiting my Amazon Wishlist or donating via PayPal to show your support. You can even buy me a coffee!

Lafayette, LA https://www.raymondcamden.com

Archived Comments

Comment 1 by Kerr posted on 12/22/2010 at 9:24 PM

CFC or method names like "function45" certainly don't explain anything about what's going on, so at a minimum that practice should be changed. I would go mad within minutes of encountering a system like this. Humans recall things in an associative manner, and utilizing numbers in this case runs counter to that tendency.

Doug basically wants to go to an MVC structure, yet not utilize any of the established frameworks out there. My experience in writing my own MVC'ish framework a few years ago tells me this is not a sound strategy. They will likely spend a lot of time on that piece alone, and in the end will expend more effort for less payoff in stability and longevity. There was a lengthy discussion on the FW/1 forum about framework choices, etc. It devolved to a philosophical and religious debate, but one of the more salient points made was that CF'ers tend to reinvent the wheel as opposed to contributing to pre-existing and viable community efforts. I would like to see this practice reversed and the community be more cohesive in its efforts. We would all be beneficiaries, plus CF would perhaps be seen as less "dead" by the world at large. ;)

I'm curious as to how many developer hours have been spent on the project thus far, and what is budgeted for the total project. Boiled down, whether they are willing to admit it or not, Doug's team is basically doing a rewrite. :)

Comment 2 by Peter Boughton posted on 12/22/2010 at 9:27 PM

Definitely agree having database4.function123 is bad.

For function names, I would probably instead go with "getQuery_[existing-query-name]" (until they can be manually fixed).

Before even considering areas of concern, I would split into individual CFCs - and to determine names:

1. Perform a regex search on the code for "FROM\s+[^\s,]+" and export results to a spreadsheet.
2. Group and sort these results, to find the most common primary table names.
3. Create individual CFCs for the top 90% and a misc.cfc for the rest (where 90% is increased if misc.cfc is too large).

That's then a better place to move forward from with further improvements.

Something important to remember:

Refactoring doesn't need to be done in one pass.

Comment 3 by Peter Boughton posted on 12/22/2010 at 9:34 PM

Also, whilst I fully agree that using an established framework is the best way forward, particularly for a huge codebase like this, I don't need it needs to be a first step.

I'd say it's fine to create a non-framework MVC structure initially, just by moving code about (i.e. not actually creating a custom framework, just cut and paste code/files into different locations and adding/updating cfincludes).

Then, once that's completed (and tested), the codebase is in a better position to be ported across to an actual MVC framework as a second step.

Again - it's all about incremental steps and not biting off too large chunks in one go that end up making you choke. :)

Comment 4 by Adrian J. Moreno posted on 12/22/2010 at 9:37 PM

DO NOT, DO NOT, DO NOT do the inheritance thing.

DO NOT use CFINCLUDE in your CFCs to beat the java compilation limitations or to reduce the amount of code you have in a single file.

I think you're missing the biggest opportunity you've got right now: stored procedures. Convert as many of the queries as you can into stored procedures and have your CFFUNCTIONs call them instead of inline queries.

Stored procedures get compiled on the database server and their execution plans are stored so they can run faster than a plain inline query. If you've got a lot of conditional CF code to create a WHERE clause, you can write that same functionality in plain SQL.

Also, you want a single reference to your CFCs? Break up your CFCs into groups like Ray suggests and you can then wire up how they're called. Look into the Factory design pattern or better yet, take a look at ColdSpring. It's a little more effort, but you'll be able to wire up how the CFCs need to be created and then have a single line of code to reference whenever you need to call a CFC and function.

http://www.robgonda.com/blo...
http://www.coldspringframew...

Using the factory patterm, your code ends up looking like this:

application.factory.getBean('someCFC').someFunction()

Comment 5 by jalpino posted on 12/22/2010 at 9:49 PM

I also agree that using a numerical structure to organize is a bad idea, along with having a colossal CFC for all queries.

Ray's suggestion of organizing the queries into logical groupings, even if it feels more difficult to find and reuse them later, is probably the best solution. I mean, no matter how you slice 19,000 queries it's going to be difficult to find an use the appropriate one, but by breaking them up logically it will help ease that pain provide a more maintainable solution.

Having said that, if Doug and his team are set on having one giant cfc I would recommend using a combination of @Peters naming recommendation (getQuery_[existing query name]) and mix-ins rather than inheritence to tie it all together. Doug is right, extending a ton of CFC's simply to bring them together is definitely the wrong use of object inheritence if the issue is just a matter of the IDE crashing.

Comment 6 by Kerr posted on 12/22/2010 at 10:15 PM

RE: stored procedures -- Given Doug's requirement of making the query code database agnostic, I'm not sure that stored procedures are a realistic end goal. The benefits of reusable execution plans can still be realized via the use of cfqueryparam.

Comment 7 by Doug (yeah, that one) posted on 12/22/2010 at 11:05 PM

All,
The function names "function1, function2" were examples. Rest assured the names are far more descriptive. Also, our CFCs were already broken up by module (or what Ray would call AOCs). I didn't mention this in the original post, but one of the drivers behind this is to make the code database independent.

The other problem we were intending to solve was the duplication problem since so far ~34% have been duplicates, just in different CFMs. So with our largest module being 5000+ queries, we would hope to only need ~3300 functions.

@Kerr - I would love an established framework. In fact, one project on this team already uses FW/1. The politics of my project are different however. With the project using FW/1, the developer was able to start using it from scratch. My project, not only has 4K+ files and 19K+ queries, but also has a VERY resistant manager to using a framework. The reasons are ego related, I suspect, but should be assumed to be non-negotiable.

My thought is that if we can start using something framework-like (without setting off framework alarm bells), our manager (and more importantly the other developers) will begin to see the value of a framework opening the possibility of going to a full blown existing MVC fw.

"Boiled down, whether they are willing to admit it or not, Doug's team is basically doing a rewrite."
I am fully aware, and that IS the intention, but unfortunately it is in a underhanded way. Resistance to change from management is the largest single obstacle.

@Peter - Agree completely with your incremental steps approach. I also like your ideas about how to split individual CFCs. Our approach had been to move queries first, then analyze what we had an organize appropriately, but grouping by table names makes a lot of sense.

@Adrian - Yeah, I totally agree, not best practice... As far as stored procedures go... I'm not sold. I think some views may be in order, but with the variance of stored procedure syntax between MySQL, MSSQL Server and Oracle, I'm not sure I want to commit. But then again, I asked for advice, so I will take it under advisement.

I am using a factory, but the problem still is getting the "beans" down to size. I apologize for not including this information in my email. I guess I thought I was helping the problem understanding by boiling it down, but the scale of the problem was lost.

Since we have something like this:
application.database.getInstance("library").functionXYZ()
Since the library CFC is too big, we needed to break it up into smaller files (like how Peter was suggesting, though we were doing it poorly). But that meant (because I'm was stupid) that we would need to modify our function calls to look a little like this:
application.database.getInstance("library").getInstance("addFiles").functionXYZ()

Conclusion:
The solution is to break our CFCs into even smaller AOCs and stop being a little girl and re-write the function calls. Right?

Comment 8 by Raymond Camden posted on 12/22/2010 at 11:45 PM

@Doug: You actually did mention in the email to me you were looking for DB independence - I didn't quote that above and I'm sorry if that breaks some context. I'm curious as to why you worry about that. I use ORM not because I can go from MySQL to SQL Server- I've done that maybe 2-3 times in the past 10 years - but to make working with my objects easier. Do you really see your org switching to a new DB anytime soon?

Finally - don't call yourself stupid. I've got plenty of code examples out there _right now_ which makes me shake my head. Old code is what it is. We all grow. If you can't look at your code from a year ago and find issues, then you've got a bigger problem. ;)

Comment 9 by Rob Brooks-Bilson posted on 12/22/2010 at 11:53 PM

This doesn't help you immediately, but in your future planning this is one case where ORM would help you immensely. The nice thing about Hibernate is that it abstracts database specific code (in most cases) allowing you to write your code once and have it work across multiple databases.

Comment 10 by Doug posted on 12/23/2010 at 12:03 AM

@Ray - We currently have Oracle, but we work for a client who has no problem paying $$$. Within the next 3 months (supposedly) we will have a new client who requires SQL Server (who knows). And simultaneously we are trying to market a low cost option using MySQL to yet more clients. So yes, database independence is a MUST, this other stuff is just gravy. I figured though, if I'm going to have to rewrite every outer join from "column1 = column2(+)" to standard "LEFT JOIN ... ON" syntax, I ought to get rid of the duplicates first, and in the process improve the code base.

Oh, and if you read what I said "because *I'm* was stupid" you would realize, I must be stupid. Who says "I'm was"?!?!

Comment 11 by Raymond Camden posted on 12/23/2010 at 12:04 AM

Ah - so you have a product - not just a general web site. I can then see how wanting to support N DBs makes sense.

Comment 12 by Doug posted on 12/23/2010 at 12:08 AM

@Rob & @Ray: We looked at ORM initially but -
1) Too much to re-write in 3 months
2) The section of the database we tried it on didn't map correctly to ORM. We've got some relationships that make me scream, like one that joins two tables like this:

substr(firstname, 1, 1) || substr(middlename, 1, 1) || lastname = username

One table has a PK of user_id the other has a PK of membId...

Comment 13 by Kerr posted on 12/23/2010 at 12:26 AM

@Doug - I hear you regarding management resistance. I'm going to take a wild guess and surmise that perhaps there is a manager who is/was once performing technical duties and therefore "knows" the best answer to everything. I'm lucky in that I'm not currently in that scenario.

Frameworks aside, I've successfully written large, RDBMS agnostic apps using ANSI compliant SQL without issue. The problem comes about when you get into DDL and having to write DB specific scripts due to inconsistencies between MySQL/Oracle/MSSQL/etc. As Rob says, this is an item where an ORM like CF9's Hibernate based solution may help you.

I'm still cracking up at "...and stop being a little girl and re-write the function calls."

Sometimes when I've met strong resistance, I have gone a dual route where I do what "they" think is right and then also pursue a parallel path with, say in this case a framework. Once I could demonstrate positive effects, usually buy-in was a breeze. This strategy obviously involves more personal effort on your part, and YMMV with your current political situation.

Comment 14 by Kerr posted on 12/23/2010 at 12:28 AM

"substr(firstname, 1, 1) || substr(middlename, 1, 1) || lastname = username"

Yeah, you'll definitely have trouble with ORM in this case. That must be a very fast performing join!

In recent years, and only because I have the complete trust of my management, I have pursued a DB level reorganization in cases like these. Your app will have a very difficult time eclipsing poor database design choices.

Comment 15 by Kevin Marino posted on 12/23/2010 at 1:01 AM

Even though this is an application, very much reminds me of a gov't job I was on. Same scenario and Oracle also. We did two things one incremental approach using a psuedo framework and at the same time planned for a new framework. In our case the entire app was going to re-build so had the opportunity to build from scratch.

In our case our team had various skill levels so the pseudo framework allowed a very basic grasp of MVC and then when we implemented Coldbox was easier for them to grasp.

Also frameworks are not the be all end all. Code organization is. Even a psuedo framework well organized will be able to be followed.

I have not delved into ORM yet and again it is no panacea either, after all you go to get some of those crazy Oracle queries in agnostic format first and traditional SQL functions fine.

Good Luck.

Comment 16 by RogerTheGeek posted on 12/23/2010 at 1:48 AM

Reminds me of a time I had to abstract an old application. The biggest issue for me was organizing the CFCs and UDFs so I could make sure I wasn't duplicating anything. It was a issue of bookkeeping / documentation. In the end, it was worth the extra work to document and organize things well. (How often do you hear that refrain?)

Comment 17 by Travis posted on 12/23/2010 at 1:52 AM

Ray,
"Old code is what it is. We all grow. If you can't look at your code from a year ago and find issues, then you've got a bigger problem." That needs to be a Facebook status. While you're there approve me. :)

Doug,
I'm curious how you're going to handle every db unique "feature". Joins seem to be the least of your problems. ROWNUM, DECODE, NVL, and other Oracle specific syntax have multiple solutions for different databases (assuming you're using any of them now).

Comment 18 by Steve Bryant posted on 12/23/2010 at 1:59 AM

With apologies for pushing my own (FOSS) wares, I think DataMgr is an excellent solution to add database independence to an existing app and simplify code. It returns queries by default, just like hand-written SQL queries, but produces database-independent code with little effort.

Much better (IMO) than ColdFusion ORM:
http://www.bryantwebconsult...

Comment 19 by Rachel posted on 12/23/2010 at 2:10 AM

Perhaps a good way to explain how to split up the functions by area of concern would be "by database table." Not that you necessarily need one DAO per table, and in some queries the "primary" table may be hard to recognize, but generally if you go about it with that approach you will be separating by "areas of concern". HTH!

Comment 20 by Doug posted on 12/23/2010 at 3:08 AM

@Travis: We've got a set of database independent functions which we call in lieu of the function. We've largely been able to find/replace our way out of that one, but the joins must be done manually.

The functions are actually quite like what is implemented in DataMgr:
if_else, case_when, concat, substring, length, trim, Lpad, Rpad, dateToString, replaceNullValue, substringStartIndex

You bring up an interesting point about rownum. Currently, none of our interfaces have paging (1 2 3 Next >). I know that I'll need to wrap a query in another query to accommodate offsets in Oracle and SQL Server, but what's the best way to unobtrusively do so? Not an immediate need, but it would be nice to know.

Comment 21 by Steve Bryant posted on 12/23/2010 at 3:26 AM

Doug,

Oracle and SQL Server 2005 can both handle offsets (Oracle with rownum, SQL Server with nasty nested SQL ROWNUM) - DataMgr abstracts both with a simple "offset" argument.

Rachel makes a good point about using primary table as a rough guideline for splitting up CFCs. Peter's regex should help with that.

Comment 22 by Doug posted on 12/23/2010 at 4:03 AM

Steve,

select *
from tbl
where rownum > 10 and rownum < 20

The above query (which I believe is what you are saying Oracle can do) will return 0 results. It can't work because rownum counts the records as they are displayed. For the same reason, you can't do an order by with rownum. You need subselects for it to work.

Comment 23 by Travis posted on 12/23/2010 at 5:02 AM

Oracle ranges are done like:
select col1, col2, col3
from (select rowNum r, col1, col2, col3
from tableName)
where r > 10 and r < 20

I believe they're 0 indexed too, but I can't remember 100% off the top of my head.

Comment 24 by Travis posted on 12/23/2010 at 5:50 AM

Ok I had time to look it up, it is NOT 0 indexed.

Comment 25 by Steve Bryant posted on 12/23/2010 at 10:14 AM

Doug,

Good to know. I will look at my code later. It could be I need to change my implementation up a bit. I rarely get a chance to test on Oracle lately so I may have missed testing that bit.

Comment 26 by Tim Cunningham posted on 12/23/2010 at 4:37 PM

I would think that ColdBox would be a good use case here. You could put all your current code into the views folder and the code would still work "as-is" ColdBox's feature of "virtual events" was created exactly for a case like this. Then you could go through each file one at a time move it out to the model and proper controllers, gradually. Maybe Luis Majano could describe it better than I can, but I have seen it done.

Comment 27 by Doug posted on 12/23/2010 at 6:27 PM

Some conclusions:
-function1() and functionX() are bad example function names
-managers can be slow to change
-don't abuse inheritance (cmon, I thought it was clever)
-CFCs need to be sub broken up by module, then by sub-module AOCs (possibly by database table)
-DB redesign would be an even better first step (but would probably get less traction than using a framework, see bullet 2)
-Move to a pseudo framework or use a framework without using a framework (@Tim) then sell the benefits
-Documentation of CFC organization is crucial
-People who say "I'm was"... well, you know

Thanks for all your input so far. I will certainly be sharing these "expert" opinions with my team and hopefully we can reach a consensus.

Comment 28 by Tim Cunningham posted on 12/23/2010 at 6:51 PM

The moving to a "pseudo-framework" first and then selling the benefits, will only work if the the move to the "pseduo-framework" goes smoothly. If it doesn't you may have just poisoned the "Framework Well" in the higher-ups mind and then any further mention of Frameworks will go poorly.

If you could drop your code as-is into a framework and it still work and then gradually start pulling in the benefits a given framework allows you, then the framework idea may not seem so intrusive. Plus you have already established that you are using a real framework already, and the "should we use a Framework" battle is already behind you.

I imagine you are having more than technical challenges, the biggest hurdles I find are the political fights.

Comment 29 by Luis Majano posted on 12/23/2010 at 9:37 PM

Just like Tim mentioned we implemented this specific use case in order to implement legacy applications into ColdBox. We have had great success with this feature and migrations have been done very very fast.

Here is a video to show it: http://vimeo.com/15260399

Comment 30 by Aaron Neff posted on 12/28/2010 at 3:35 PM

"Old code is what it is. We all grow. If you can't look at your code from a year ago and find issues, then you've got a bigger problem."

Love it! :) So true.

Comment 31 by Dan Fredericks posted on 12/28/2010 at 5:49 PM

Good Morning all,
Just wanted to try to get this going again after the holidays. I work with Doug and know we are fighting an uphill battle here in trying to implement anything newer than CF 6 type concepts. Management seemed to stop understanding CF around that time, and still runs things based around that. I implemented FW/1 for a project when I started, and management still had developers just drop code in the main folder...as we all know with frameworks, that does not work. So, trying to get anything "new" implemented is a real chore, so Doug thought maybe trying to just get a "framework like" setup could help us push our old school code into a better format and maybe that could get us moving towards a framework...if we mention a framework structure is a good way to go, that gets shot down right away with the my way is the right way and I am the manager and you will follow what I know approach. It is an uphill battle.
I really appreciate everything you all have posted here to help Doug and myself out. I think we have learned a lot of stuff here that we can use to try to organize our cfc's.
Did we really get a good sense of a great way to Extend all these cfc's we will create from a 10,000+ line cfc with 100+ function in it? we were just using extends to extend the group of cfc's...I am not sure if we really got a lot of info on a good way to do this. I am glad you are all helping us with our CFC breakup...I just hope we can convince the management and the "butt" kisser developers about these ideas.

Again, I'd really like to thank everyone especially Ray for posting this, and taking time out to help Doug and myself.

Hope everyone had a great Christmas weekend.
Dan