Jay asked an interesting question:
I have a question about pulling data from a database using CF. I need to run a VERY large query overnight and then have the data somewhere I can either query against the next day to get at it in smaller chunks, or look at in a more manageable way. Do you have any thoughts on how that might best be accomplished?
Obviously the very first thing you want to do is make sure the initial query is as optimized as possible. I find it hard to believe that one query would take so long. I don't mean to belittle your SQL skills, but I know mine aren't that great, and this is exactly the case where I'd bring in an experienced DBA.
That being said - let's assume that there really isn't anything you can do. I can think of a few ways to handle this.
The easiest solution would be to take your resultant query and insert it into a database table. (Or even better, change your initial query so that it doesn't return the result, but simply saves the results to the new table.) Once the data is in your database, you can perform queries against it.
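As a rough sketch of that second approach, the database can write the result set straight into a new table without ever sending it back to CF. All table and column names here are placeholders, assuming SQL Server:

```sql
-- SQL Server: save the overnight result directly into a new table
-- (table/column names are hypothetical)
SELECT customer_id, SUM(order_total) AS total_spent
INTO nightly_report              -- SELECT ... INTO creates the table
FROM orders
GROUP BY customer_id;

-- The Oracle equivalent would be:
-- CREATE TABLE nightly_report AS SELECT ...;
```

The next day, queries run against the small `nightly_report` table instead of the big source tables.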
Another option is to store the result as text. This text could be read by another program to perform stats. This will definitely be slower than using the database, but depending on how you want to use the data, it might be an option as well.
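A minimal CFML sketch of the text-file approach, writing the result out as CSV row by row (the query name, columns, and path are all placeholders):

```cfml
<!--- Sketch: stream a query out to a CSV file, one row at a time.
      "bigQuery" and its columns are hypothetical. --->
<cfset csvPath = expandPath("./nightly.csv")>

<!--- Write the header, then append each row --->
<cffile action="write" file="#csvPath#" output="id,name,total">
<cfloop query="bigQuery">
    <cffile action="append" file="#csvPath#"
            output="#bigQuery.id#,#bigQuery.name#,#bigQuery.total#">
</cfloop>
```

Appending per row keeps memory use flat for a huge result, at the cost of lots of small disk writes; buffering batches of rows would be faster.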
Archived Comments
If the resultset isn't large, just the dataset that the query has to sort through, then perhaps you can use the cfquery cachedwithin attribute to tell CF to cache the results until the query is to be run again the next night. And it should be possible to use a Query of Queries to further filter/sort the results as needed.
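That combination might look something like this (datasource, query names, and columns are placeholders):

```cfml
<!--- Cache the big result in memory for 24 hours --->
<cfquery name="bigResult" datasource="myDSN"
         cachedwithin="#createTimeSpan(1, 0, 0, 0)#">
    SELECT id, region, total
    FROM   nightly_report
</cfquery>

<!--- Slice the cached result with a Query of Queries ---
      no round trip to the database --->
<cfquery name="eastOnly" dbtype="query">
    SELECT id, total
    FROM   bigResult
    WHERE  region = 'EAST'
    ORDER BY total DESC
</cfquery>
```

Subsequent requests with the same SQL hit the cached copy, and each page can run its own Query of Queries against it.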
What kind of SQL back-end are you using? If it's a resultset into the millions or billions or even trillions -- for sure dump the query into tables. I'm guessing it's taking so long because you're aggregating data ... If you're not aggregating data, then you probably have a design issue at hand (something is not indexed correctly, or you're getting a table scan somewhere). If you're not aggregating, then you could always split up the query by IDs and dump the pieces into subsequent tables. Another possibility is to dump the non-aggregated data into a table first, then do the aggregate and update that same table.
(it could be hardware related too?)
One thing that was already stated, but I can't stress enough: look at the columns you are using to join and to filter against, and make sure there are indexes that match them. Most of the time when I am asked to look at slow-running queries, there will be poorly thought-out indexes.
You could also look at a series of queries in a stored procedure where some of the big filtering is done and then inserted into temp tables and then join the smaller sets together. A well thought out stored procedure can really save some processing cycles.
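A sketch of that temp-table pattern in T-SQL (procedure, table, and column names are all hypothetical):

```sql
-- SQL Server sketch: filter each big table into a temp table first,
-- then join the much smaller sets together
CREATE PROCEDURE dbo.BuildNightlyReport
AS
BEGIN
    -- Big filter #1: only recent orders
    SELECT order_id, customer_id, order_total
    INTO #recent_orders
    FROM orders
    WHERE order_date >= DATEADD(day, -30, GETDATE());

    -- Big filter #2: only active customers
    SELECT customer_id, region
    INTO #active_customers
    FROM customers
    WHERE status = 'ACTIVE';

    -- Join the two pre-filtered sets instead of the full tables
    SELECT c.region, SUM(o.order_total) AS total
    FROM #recent_orders o
    INNER JOIN #active_customers c
        ON c.customer_id = o.customer_id
    GROUP BY c.region;
END
```

Each intermediate step works on a fraction of the rows, which is where the savings come from.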
If possible, use a materialized view on the database side, and avoid doing this in ColdFusion at all.
Building on what others have already stated.
Be sure that you look at the execution plan of the query. When dealing with a slow query and building indexes, the execution plan can tell you whether your indexes are optimal or not. The more "Index Seeks" and "Index Scans" you have, the better optimized the query. "Table Scans" are really bad EXCEPT when the table is very small. Now granted, this is really the poor man's query optimization advice I'm giving here. I could literally talk to you on the phone for about 4 hours giving advice on optimizing queries.
Two more little tidbits. A lot of people never put indexes on their foreign keys (why, I have no idea) -- make sure you do this. The rule of thumb is that any column you filter against in the WHERE clause SHOULD (in most cases) have an index on it, or be part of an index (except for BIT, TEXT, IMAGE, and BLOB columns).
Another good tip is to physically look at which columns you're hitting against the most on the tables and put the CLUSTERED INDEX on them. Case in point: with BannerBoxes, we have a click table that stores the clicks we get. We have a column named PK that is a GUID for making each record unique; HOWEVER, I have the clustered index on the CREATIONDATE column, which is a TIMESTAMP field, sorted DESC. Now why is this, you might ask? Because almost all my queries hitting against this table are like this: return the clicks for the past day (week, month, whatever). So by putting the CLUSTERED INDEX on the CREATIONDATE column and sorting it DESC, it tells SQL Server to PHYSICALLY put the records of the table in that order, thus making those queries lightning fast.
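Both tidbits above can be sketched in T-SQL; the table and column names below loosely mirror the click-table example but are placeholders:

```sql
-- Index the foreign key that joins/filters hit
-- (table/column names are hypothetical)
CREATE NONCLUSTERED INDEX IX_clicks_banner_id
    ON clicks (banner_id);

-- Clustered index on the date column, newest first, so
-- "clicks for the past N days" reads a contiguous range of pages
CREATE CLUSTERED INDEX IX_clicks_creationdate
    ON clicks (creationdate DESC);
```

A table gets only one clustered index, since it dictates the physical row order, so it should go on the column your most common range queries filter on.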
Sorry for the long comment Ray. I'm done now.
Tony (and others), do not hold back. This is what I want. :)
I run into this quite a bit with one of my larger clients. Typically we need to run a query against someone else's database where all they have granted us is select access, and sometimes only ODBC access. For the latter, we end up using CF to run the query and parse the results into our tables. This is a horrible solution for us, though, because the bigger the result set, the higher the likelihood of the CF server running out of resources -- at least this is the case on the MX boxes, but not on CF5. Where we have select access, what we do is set up a database link and use a stored procedure to grab the data we need and parse it into our local tables; sometimes it needs no parsing and we just select/insert what we need.
Our bigger problem is that the owners of those other databases are always unwilling to invoke any changes on their end to help us query their data.
It may sound ridiculously simple and patronizing, but sometimes it helps to just stand back and ask yourself WHY you're returning such a large recordset to begin with. I've found (on more than one occasion) that multiple smaller sets and queries, or just a general change in the perception of your data, can make all the difference in the world.
@Aaron.
If you're using MSSQL, then I think the better way to go is with a DTS package. I used to do a lot of stuff like that with CF when I first started. Since learning DTS, I've moved everything over to it. It's a lot easier to maintain and so much more stable. Plus it's fast!
One of these days I need to do a blog post on query optimization and migrating CF stuff like this to DTS packages. I just wish I could get the time.
Tony, I do use DTS on one of them because it is pulling from Oracle and going into MSSQL. All the others though are Oracle to Oracle, one exception being from LDAP to Oracle.
If you're doing Oracle to Oracle, can you set up a database link and build a materialized view on the remote data? This builds a snapshot of the remote data on the local database (however you define remote and local in this case) and it's automatically refreshed at whatever time interval you specify when you create the materialized view.
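A rough Oracle sketch of that setup; the link name, credentials, TNS alias, and table names are all placeholders:

```sql
-- Link to the remote database (credentials/alias are hypothetical)
CREATE DATABASE LINK remote_db
    CONNECT TO report_user IDENTIFIED BY secret
    USING 'REMOTE_TNS_ALIAS';

-- Snapshot the remote data locally, refreshed once a day
CREATE MATERIALIZED VIEW local_orders
    BUILD IMMEDIATE
    REFRESH COMPLETE
    START WITH SYSDATE NEXT SYSDATE + 1   -- refresh every 24 hours
AS
    SELECT * FROM orders@remote_db;
```

Queries then hit `local_orders` like any local table, and Oracle handles the nightly refresh with no ColdFusion involvement.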
I agree and recommend views and stored procedures...
Or create a csv file that you can create a tool to parse thru the next day...
I have thought about looking into materialized views the next time one of these comes up; it has just been almost a year since an Oracle-to-Oracle one has come up. Right now, of the projects I currently support, only one has the need for this, and it has just been chugging along with its already defined method. It has an LDAP-to-Oracle feed (which randomly has issues, since CFM is used for it), an Oracle-to-Oracle feed (which honestly I do not think is even needed any longer), and an ODBC-to-Oracle feed (which even more randomly has issues, since CFM is used for that too).
We have a couple search queries that run long, not overnight, but 10-30 seconds; longer than most web users like to wait for a page. We've done two things to work around the problem.
1) Create a Verity collection out of the data (cfindex w/ query=). Not always the best, but it works for one of our situations.
2) On a couple of other searches, we've used CF's asynchronous, query-of-query, and query caching abilities. On the search start page we kick off an asynchronous page that runs the long query without any parameters (filters 200,000 records down to the 5000 that might apply). When the actual search starts, we run that query again, and usually it is cached by then. The search criteria are then applied to that as a query of a query. Again, not always the best solution, but it works in our case.
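For option 1 above, indexing a query's rows into a Verity collection might look like this (the collection name and column names are placeholders):

```cfml
<!--- Index the overnight result into a Verity collection.
      "bigdata", "bigResult", and the columns are hypothetical. --->
<cfindex collection="bigdata"
         action="refresh"
         type="custom"
         query="bigResult"
         key="id"
         title="name"
         body="description">

<!--- Next day: search the collection instead of re-running the query --->
<cfsearch collection="bigdata"
          name="results"
          criteria="widget">
```

The collection is assumed to already exist (created once in the CF Administrator or via cfcollection).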
Ray - not sure if a fella is allowed to plug a book on your blog [feel free to delete if breaking the rules], but I just finished "The Art of SQL" by Stéphane Faroult (published by O'Reilly), which I can't recommend highly enough. Quite often, and especially as procedural programmers (like myself) start using OO techniques, the power and limitations of the back-end db seem to take a back seat. This book has seriously improved my ability to write better, faster queries, and has finally provided me some backup in my resistance to co-workers who quite often say "well, just index that column" when a query runs slow. Depending on Jay's db of choice, and obviously the size of the data, and other things posters here have already pointed out, the book covers all sorts of interesting techniques to not just 'tune' the SQL but explain when and why certain solutions are more applicable. There is a sample chapter online at O'Reilly's site. It's not often you can read a technical book from start to finish, but I did with this one.

Off topic, but CFUnited had a speaker this year about your own personal IPO (__ Coleman?) which I thought was quite interesting, even though not directly related to CF. A discussion about better understanding of SQL writing is something I think more CF programmers could use.
mbeach - no problem at all. In fact, I just added this to my wish list.
Hello All,
Does anyone know how to insert an image into an Oracle database using ColdFusion? This is the code I am using:
<cfif isdefined("form.filename") and #form.filename# NEQ "">
<cffile
action = "upload"
fileField = "filename"
destination = "#uploaddirectory#"
nameConflict = "makeunique">
<cffile
action = "readBinary"
file = "#file.serverdirectory#/#file.serverfile#"
variable = "thefile" >
</cfif>
<cfquery name="upl"
datasource="ccz1">
INSERT INTO bestStaffImage
VALUES (
bestStaffImageSeq.Nextval,
'#form.staffName#',
'#toBase64(thefile)#',
'#file.serverfile#'
)
</cfquery>
It uploads to the specified directory OK, but it doesn't insert into the database. I get this error message: Error Executing Database Query.
[Macromedia][Oracle JDBC Driver][Oracle]ORA-01704: string literal too long
The data type I use for the image column is BLOB. I have tried CLOB, LONG, and even VARCHAR2(4000), but to no avail.
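For the record, ORA-01704 is thrown because the Base64-encoded file is being inlined as a string literal, and Oracle caps string literals at 4000 characters. Binding the raw bytes as a parameter avoids the literal entirely. A sketch, assuming the image column really is a BLOB:

```cfml
<!--- Sketch: bind the binary directly instead of inlining Base64.
      Assumes "thefile" holds the bytes from cffile action="readBinary"
      and the third column is a BLOB. --->
<cfquery name="upl" datasource="ccz1">
    INSERT INTO bestStaffImage
    VALUES (
        bestStaffImageSeq.Nextval,
        <cfqueryparam value="#form.staffName#" cfsqltype="cf_sql_varchar">,
        <cfqueryparam value="#thefile#" cfsqltype="cf_sql_blob">,
        <cfqueryparam value="#file.serverfile#" cfsqltype="cf_sql_varchar">
    )
</cfquery>
```

Using cfqueryparam also closes the SQL injection hole that the raw `#form.staffName#` interpolation in the original code leaves open.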
Hi Ray,
I am still expecting a reply to my previous message. Please help!
Leo, you may want to try cf-talk. This question really isn't exactly on topic for this blog post.
Very funny, Ray! Guess I'd look for another forum.
Sorry Leo - it just isn't on topic for this thread.
No sweat, Ray. In fact, I just resolved it. Thanks!