I think it's somewhat easy to fall into assumptions. Everyone knows X is faster than Y. Everyone knows the so and so function should be avoided. Etc. But it doesn't to double check those assumptions from time to time. You may be surprised. Case in point - MAXROWS. The maxrows attribute for cfquery is a simple way to return only the top N rows of a query. While you can do this in SQL, if you don't know the syntax (and every database is a bit different) than this can be a much simpler way of handling it.
In the early days of my ColdFusion development, I tried to avoid using this feature. It was my understanding that if you used this, you were still retrieving a large set of data and that ColdFusion simply performed some work to "chop" the query into your desired size. So if your query returned two million records (it could happen) and you used MAXROWS of 10, your still moving a heck of a lot of data between the database and ColdFusion.
When MX came out and we switched to nice JDBC drivers, I thought that this was much improved. I knew that ColdFusion was passing the MAXROWS to the driver. I had thought that the driver passed this on to the database and the database "did the right thing" - whatever that was for the database in question.
I was speaking with Mark Mandel a bit about this and he expressed some doubt. We followed up on a private listserv and both Jochem van Dieten and Rupesh Kumar schooled me on how incorrect I was.
Instead of the MAXROWS attribute being passed to the database, it only goes to the JDBC driver. So imagine that I'm the ColdFusion server, Bob is the driver, and Sally is MySQL. When I make a request for: "select * from pants" with "maxrows=10", Bob gets the request along with the maxrows but just passes the SQL along. Sally returns my one million rows, and Bob handles 'the cut' and returns 10.
In other words - don't be lazy and rely on MAXROWS. Is it the most horrible, inefficient thing you can do in ColdFusion? Probably not. But it is certainly worth the time to research how to do this in your preferred SQL flavor.
Jochem also made another very interesting point. If you use audit triggers in the database to record how many rows were selected, then you would have a mismatch between the records and what your application shows to the end user.
Archived Comments
It may even be more complicated then that. For instance, the buffering of the resultset could happen both serverside and clientside. In the MS SQL Server JDBC driver that is determined by the select method (cursor vs. direct, with direct (client) as the default). In PostgreSQL that depends on the protocol version between server and driver, FEBE 2 is buffered on the client, FEBE 3 may be on both sides.
It does pay to experiment a bit with this if you want to find out how your database and driver work exactly. For some SQL to test whether the SQL runs fully on the server see my blog entry <a href="http://jochem.vandieten.net...">Does cfquery’s maxrows work on the ColdFusion side or the database side?</a> If you want to know more about the buffering behaviour of your driver you are probably going to need a packetsniffer.
When I'm done with my apps I always run through them while running MS SQL Profiler (which imo they made harder to find for novices from 2005 onwards) on the MS SQL servers. That has given me A LOT of eye openers up through the years.
It's really worthwhile and I can't recommend it enough. I assume the other major engines have something similar.
I don't believe I ever used maxrows for exactly the reason you stated. I had the feeling that it was "trimming" was done by CF and not by the DB and my tests (way back when) confirmed my feelings and never looked at the attribute again
I'm not sure if I buy this 100%. I use MaxRows in testing sometimes when I'm working with large data sets, and the performance is more inline with select top 20 (of ~200k). It takes longer to run the full query in query analyzer than to run the CF script with maxrows=20.
Maybe it gets data back in chunks and stops requesting the next chunk after maxrows is passed? I dunno. But it works great for testing, when you don't really need to sit around and wait for large datasets to parse
I use maxrows but only on query of queries when I am already displaying the main query on the page, but need to get the top one for one column for some other purpose.
MAXROWS is the suspenders, and "select top N" is the belt. I try to never run a query in a db without a top N in the select, and always use the MAXROWS to get a cut of that. Most of my SQLs I cap at 1000 (or whatever the sanity length is for the query), then cut down from there with maxrows because more often than not, somewhere down the page I'm going to need the rows that I cut and I don't want to have to re-run the query again.
I use Maxrows for query of queries, since CF doesn't allow you to pass something like "LIMIT 20" in this way. This does make sense, since a query of queries is simplay getting data from something that has already been to the database.
It has its place, but yes, for queries direct to the database, I wouldn't use it.
I'm not using maxrows since I ran into a strange problem with cached queries. Even when changing maxrows from say 10 to 5 rows, you will again get 10 rows back if your query is still in cache with the initial value. While LIMIT or TOP will work as expected.
I don't find that surprising at all. The cached query should ignore maxrows and just return what was cached.
Hmm, I was suspicious this would be the case so I created a dummy table for a remote datasource with 10k rows and ran three queries to compare performance. To my surprise I found absolutely no speed difference between using maxrows or TOP.
testNoLimit (Time=3299ms, Records=10000)
SELECT id, stringValue
FROM test
testMaxRows (Time=64ms, Records=10)
SELECT id, stringValue
FROM test
testTOP (Time=65ms, Records=10)
SELECT TOP 10 id, stringValue
FROM test
Interesting. Well - as I said, I was only passing along what I was told. I wonder if this is something with your driver versus other drivers?
I've also tried it out several queries and saw something funny. Yes, the TOP 10 is faster the first 2 or 3 times, but as Andreas descibes the ColdFusion Query Caching kicks in. The TOP 10 Query however is not being cached and therefore takes longer to run.
So, in short.
TOP 10 seems 3 to 4 times faster, but Cached Queries are faster than TOP 10. Can anyone else confirm this?
I think it is a mistake to bring up cached queries. You can cache a query using top 10, so it is a moot point.
I just found something interesting today.
Using Maxrows in a query, using CF9 to talk to MariaDB over the normal mysql driver...
Gives a big
Error Executing Database Query.
Unknown system variable 'OPTION'
So not only is it not good to use with DB, it doesn't play nice with MariaDB.
Haven't seen any other glitches with MariaDB yet, but there is 1.