A reader asks:
Hiya Ray -
Are there any guidelines as to how long a query should take to process? For example, I have a query on a category column that returns just 10 rows, but my entire database has 60,000+ records and it takes about 1100ms. Is this bad?
It depends. (Tired of me saying that yet? ;) I'd always be wary of saying, "A query should only take X seconds," but I can say 1100 ms is probably way too much. I am not a good DBA. It is something I'm working on improving. But I'd definitely recommend looking into your SQL and trying to figure out why it is taking so long. If you want (and if you still read this blog after I took so long to respond), you can post your code and my readers and I can help debug it.
So what would I use as a general suggestion for the maximum amount of time a query should take? I'd say anything over 50ms should be cause for alarm. (Well, not alarm, but I'd check it out.)
By the way - Starfish can help you check how long your queries are taking.
Archived Comments
I've always lived by the creed "Anything less than instantaneous is unacceptable".
Of all the answers that start with 'It depends', this one is by far the most appropriate.
Also keep in mind that fewer records does not necessarily mean faster queries. If your queries are complex, doing a lot of heavy computation, or if your DB is not indexed appropriately, the milliseconds can pile up.
Scott, that is a darn fine point and one I should have made. You are absolutely correct. Thanks for adding that!
When using MS SQL, ColdFusion tends to show execution times in rough multiples of 16.66 milliseconds (the server measures time in coarse clock ticks, roughly 60 per second, rather than exact milliseconds, so a nice round number isn't feasible).
I've found that simple queries take 16 or 31 milliseconds to run. Longer, obviously, for more complex queries or those with bigger result sets.
Here is a query against a 15+ column Oracle table with 80k rows. Execution time of 31ms, 30 rows returned:
SELECT * FROM hremp
WHERE position IN ('01g0834','013r827','019m898','0a10569','020b009')
   OR spv_pos IN ('01g0834','013r827','019m898','0a10569','020b009')
ORDER BY spv_pos, position
Here is one I didn't write that takes longer than I think it should to run (Time=453ms, Records=2348; the tables have 10k and 13k+ records in them):
SELECT DISTINCT P.Part_ID, P.Part_Name, P.Active, AsmtCount.Counter
FROM Participants P
LEFT OUTER JOIN Part_Asmt_Levels PAL
    ON P.Part_ID = PAL.Part_ID
LEFT OUTER JOIN (
    SELECT Part_ID, COUNT(DISTINCT Assessment_ID) AS Counter
    FROM Part_Asmt_Levels
    WHERE Level_ID = 0
    GROUP BY Part_ID
) AsmtCount
    ON P.Part_ID = AsmtCount.Part_ID
WHERE PAL.Assessment_ID = 4
  AND PAL.Level_ID = 0
ORDER BY P.Part_Name, P.Part_ID
Paste the query into Query Analyzer,
hit CTRL + K, and look at the execution plan:
index seek -- good
index scan -- not as good
table scan -- bad
So you want to avoid table scans by using sargable arguments (predicates the optimizer can match against an index; see the sketch below).
You also want to minimize I/O by listing only the columns that you need instead of SELECT * FROM....
Take a look here to see what else you can do: http://sqlservercode.blogsp...
Denis
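To illustrate Denis's point about sargable arguments, here's a minimal sketch; the Orders table, its columns, and the index name are hypothetical:

-- Hypothetical index on the date column:
-- CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate)

-- Not sargable: wrapping the indexed column in a function hides it
-- from the index, forcing a scan of every row.
SELECT OrderID, OrderDate
FROM Orders
WHERE YEAR(OrderDate) = 2006

-- Sargable: a bare column compared against a range lets the
-- optimizer do an index seek instead.
SELECT OrderID, OrderDate
FROM Orders
WHERE OrderDate >= '2006-01-01' AND OrderDate < '2007-01-01'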
As a tuning method, or just to check the major points of a SQL statement:
1) If you are joining 2 tables, make sure that whatever field(s) you are joining on are indexed in both tables. Nothing like a non-indexed join to slow down your day.
2) If you are running similar queries over and over, consider creating a view. A view is like a SQL query sitting on top of a table (or tables) that you can run queries against! Whenever you can let your DB server do the hard work, it will be faster =) Views are really easy too; there's a sketch right after this list, and a simple example here: http://www.faqs.org/docs/pp...
3) Order your WHERE clause so that the major limiting factors come first. For example, if you have a table containing orders for the past 10 years, and the first thing in your WHERE clause is WHERE Origin = 'MI', you are first finding every order from MI for the entire life of your table! Put date limiters first in your WHERE clause, and it should filter down your results quicker (provided you have an index on date!). In the same vein, put indexed fields first in your WHERE clause as well, and if all else fails, consider creating either a new index or an indexed view =)
Hope these help!
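For tip 2, here's a minimal sketch of a view (SQL Server syntax), borrowing the articles table that comes up later in this thread; the view name is made up:

-- A view is just a stored SELECT that you can query like a table.
CREATE VIEW article_summary AS
SELECT title, pdate, category
FROM articles

-- Then query it as if it were a table:
SELECT title, pdate
FROM article_summary
WHERE category = 'news'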
> index seek -- good
Mongo like index seek.
Yes, it certainly does depend. 1100 msec is a long time to look up a phone number, but a perfectly reasonable time if the query is, say, predicting the GDP of a Western African nation over the next 10 years based upon rainfall forecasts.
In addition to the comments made above about the "slow" query listed, left outer joins are considerably more expensive than inner joins. The two outer joins in that query could be a big contributor to its run time.
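One related observation: since the WHERE clause filters on PAL columns, any row where the first LEFT OUTER JOIN found no match gets discarded anyway, so that join is effectively an inner join. A sketch of the equivalent rewrite, which states that intent explicitly and may give the optimizer more freedom:

SELECT DISTINCT P.Part_ID, P.Part_Name, P.Active, AsmtCount.Counter
FROM Participants P
INNER JOIN Part_Asmt_Levels PAL
    ON P.Part_ID = PAL.Part_ID
    AND PAL.Assessment_ID = 4
    AND PAL.Level_ID = 0
LEFT OUTER JOIN (
    SELECT Part_ID, COUNT(DISTINCT Assessment_ID) AS Counter
    FROM Part_Asmt_Levels
    WHERE Level_ID = 0
    GROUP BY Part_ID
) AsmtCount
    ON P.Part_ID = AsmtCount.Part_ID
ORDER BY P.Part_Name, P.Part_ID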
Hi, Ray - I'm the original questioner and wanted to say thank you for answering my question, and thank you to the other posters here as well for their advice. I think my indexing might be the problem, but anyway, here is my long-running query:
SELECT title, pdate, category
FROM articles
WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
AND (category = '#a[x]#')
ORDER BY pdate DESC
I've got an array of six categories that I loop over. For each array item (news, sports, etc.) I run a query for articles matching the liveDate (the MAX date in the DB) and shoot the results into a structure. My liveDate query only runs once.
But if a query comes back empty for an array item, I run the above query to locate the most recent articles for that category, going back 12 days. Sometimes it'll be a week or more without an article appearing for a given category. It's totally random.
Please be gentle, my kevlar is thin...
Why not try making your category a list and running the query once, kinda like this:
SELECT title, pdate, category
FROM articles
WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
AND category in ('#myList#')
ORDER BY pdate DESC
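One gotcha with the IN ('#myList#') approach: if myList is a plain ColdFusion list like news,sports, the generated SQL is IN ('news,sports'), a single string rather than six values. A sketch of one workaround using the standard ListQualify and PreserveSingleQuotes functions (the query and datasource names are made up):

<cfset quotedList = ListQualify(myList, "'")>
<cfquery name="getArticles" datasource="myDSN">
    SELECT title, pdate, category
    FROM articles
    WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
    AND category IN (#PreserveSingleQuotes(quotedList)#)
    ORDER BY pdate DESC
</cfquery>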
I think I may have misunderstood...
I'd probably go with DateDiff in your WHERE clause, but I don't know that it would make too big a difference.
If you run the same query many, many times in succession (inside a loop, for example) parameterize it with <cfqueryparam>. This enables the DB to compile the query statement a single time and re-use it for all subsequent calls. If you just plop your data directly into the SQL statement then you basically force the server to re-analyze and re-compile the statement each time. [This also guards against SQL injection attacks!]
This won't show you a noticeable performance increase on a single query, but it's a great habit to get into and will help if you have a high volume app that does the same query over and over again.
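For example, here's a sketch of the original query with <cfqueryparam> (the variable names come from sudo-name's post; the query and datasource names are made up, and cf_sql_date assumes pdate is a date column):

<cfquery name="getCategoryArticles" datasource="myDSN">
    SELECT title, pdate, category
    FROM articles
    WHERE pdate >= <cfqueryparam cfsqltype="cf_sql_date" value="#dateAdd('d', -12, liveDate)#">
    AND category = <cfqueryparam cfsqltype="cf_sql_varchar" value="#a[x]#">
    ORDER BY pdate DESC
</cfquery>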
Some other random performance-enhancing tactics:
1) Indexes are your friend. In my experience most web apps perform far more selects than inserts or updates, so liberal use of indexes is a good rule of thumb.
2) Properly normalize your database! This reduces redundant data, which reduces disk I/O operations, which increases performance.
3) Learn how to use a query analyzer tool to see the execution plan of a given query. Use the plan to determine what indexes you might create that will speed up the query.
This has been said before in this list, but the absolute best way to speed queries is to index the columns that you are using in joins or using in the WHERE clause. Any other tricks to speed up SQL queries generally pale in comparison to having properly indexed tables.
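For the article query in this thread, that would mean something like the following (index names made up; SQL Server syntax). The composite index matches the WHERE clause: equality on category first, then the pdate range.

-- Supports WHERE category = ... AND pdate >= ... ORDER BY pdate
CREATE INDEX IX_articles_category_pdate
    ON articles (category, pdate)

-- For the Participants query earlier in the thread, index the join key:
CREATE INDEX IX_Part_Asmt_Levels_Part_ID
    ON Part_Asmt_Levels (Part_ID)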
Instead of checking if a query comes back empty for that array item and then running another query to locate the most recent articles for that category going back 12 days... it looks like you want to use the TOP 1 limitation instead.
SELECT TOP 1 title, pdate, category
FROM articles
WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
AND (category = '#a[x]#')
ORDER BY pdate DESC
This will always give you the most recent article.
I agree with Todd also: If you can combine your 6 queries into 1 with category in ('#myList#'), that would be much better.
It's also worth noting that database engines may choose not to use an index even if it exists. If the selectivity of an index is low ("many" rows in the table match the selection criteria) then it is more efficient simply to scan the table. An index on a unique value, say, username or SSN, will almost certainly be used, while an index on gender (two values: M or F) may be ignored. The reason for this is that collecting the row identifiers from the gender index and then using that list to pick out the matching rows in the table is slower and more disk intensive than simply scanning the table top-to-bottom to find the desired values.
The point being: you should let the database do all the hard work...
I have a simple philosophy: if the tables have too many records or the joins are too complex, then it becomes a view. Then I am only bringing across the exact number of records I need...
The key is how much data you want, and then what the best method is for bringing it to the ColdFusion page you need it for.
Other than having the CF documentation bookmarked, I also have the MS SQL 2000 Books Online bookmarked for easy reference.
Really helps to easily look up any T-SQL or other SQL matter...
http://doc.ddart.net/mssql/...
These are good posts, and I'm learning a lot from the generic responses.
sudo-name: I posted a question similar to yours on Experts-Exchange.com
http://www.experts-exchange...
and they came up with a solution like this:
SELECT category, MAX(pdate) AS latest
FROM articles
WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
GROUP BY category
Simple!
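Note that the grouped query returns only the latest date per category, not the title. A sketch of joining back to pick up the matching row (assuming category plus pdate uniquely identifies an article):

SELECT a.title, a.pdate, a.category
FROM articles a
INNER JOIN (
    SELECT category, MAX(pdate) AS latest
    FROM articles
    WHERE pdate >= '#DateFormat(dateAdd('d','-12',liveDate),'yyyy-mm-dd')#'
    GROUP BY category
) m ON a.category = m.category AND a.pdate = m.latest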
Experts Exchange is a really great resource...
But for that DateFormat query, I'd tend to convert that into a SQL function... just a personal preference...
I made a quick (4-minute) Flash video showing the basics of how to use the Index Tuning Wizard to automagically figure out indexes from your queries:
http://rickosborne.org/blog...
HTH
SELECT t1.id AS Start, t2.id AS End, t2.id - t1.id + 1 AS Result
FROM nede t1, nede t2
WHERE t1.id < t2.id
  AND NOT EXISTS (
      SELECT *
      FROM nede t3
      WHERE (t3.no6 = 1 AND t3.id BETWEEN t1.id AND t2.id)
         OR (t3.id = t1.id - 1 AND t3.no6 = 0)
         OR (t3.id = t2.id + 1 AND t3.no6 = 0)
  )
ORDER BY t2.id - t1.id DESC
LIMIT 1
/*********/
There are 300 records, and only 1s and 0s in column no6.
Can you please give me some idea of why it takes so long? I am aware that what it finds is an unbroken series of values that are all equal. It takes about 20 seconds to execute the query.
Hoping for a quick response.
Best regards, Kjell