Posted in Development | Posted on 12-21-2009 | 3,103 views
Did you know I have the smartest readers in the world? I'm not just saying that. You guys are truly intelligent. And beautiful too. Did I say that? All of my readers are intelligent and strikingly beautiful as well. I'm not just saying that because I really want some help with a query. Honest. Ok, maybe I am.
Let me give some background here. I've noticed for a while now that when I post a comment to my blog, the code takes a good 4-5 seconds to respond. Normal page requests run much quicker. However - when a comment is posted I clear the cache for the "Recent Comments" list you see below. I extracted the SQL for it and ran it within MySQL's Query Browser and confirmed - it was a dog. For the lift of me I can't figure out why.
The query simply asks for the last 5 comments. It joins against the entries table to get the entry title as well. Here is the SQL. The value next to date_add is dynamic as is the blog in the where clause. That value is set with a cfqueryparam.
2e.title,
3c.id,
4c.entryidfk,
5c.name,
6c.comment,
7date_add(c.posted, interval -1 hour) as posted
8from tblblogcomments c
9inner join tblblogentries e on c.entryidfk = e.id
10where blog = 'Default'
11order by c.posted desc
12limit 5
The issue seems to be the join. If I just get the last 5 comments it runs instantly. However, from what little SQL skills I have, a join like this shouldn't be so slow. I've got an index on comment.id and comment.entryidfk. Entry has an index on it's ID and the blog column. Here is the result of the EXPLAIN:

Unfortunately nothing really here makes sense to me. This seems to only be an issue with a BlogCFC install with a lot of data as I don't see it on my test version. If folks are bored and want to recreate this locally, just download BlogCFC and fill it with some random data. (I'd be willing to give folks an export, but I'd need to prune out the email addresses from my commenters.)
If worse comes to worse, I'm going to mod my own copy to get just the comment data and follow it up with a loop to get the entry data.


- I'm wondering about "limit 5" versus "select top 5" Does MySQL not support "top?" If it *does* support "top," does that change make a difference?
- Does MySQL support database-enforced foreign key relationships? If so, do you have one established for "c.entryidfk = e.id" If not, does establishing one make a difference? If it's helpful, here's the SQL to do this in SQL Server: ALTER TABLE tblblogcomments ADD CONSTRAINT FK_tblblogcomments_tblblogentries FOREIGN KEY (entryidfk) REFERENCES tblblogentries (id) ON UPDATE NO ACTION ON DELETE CASCADE
Is there also an index on tblblogcomments.posted ?
not sure what, if any, difference it will make - it's something I'd traditionally put a clustered index on in MS SQL Server.
Can you post the results of "SHOW CREATE TABLE x" for the tables involved?
Also i noticed you need a "e." on your where statement.
where e.blog = 'Default'
I recently used the MS SQL Database Engine Tuning Advisor to figure out why one of my queries was running really slow and it suggested adding an index to a table and my queries went from 4 seconds to less then a second.
Multiple: More than one said to add an index on tblblogcomments.posted. I did. I didn't think indexes helped with dates. It doesn't seem to have helped though. In fact, the explain now shows a larger # of rows in that first entry. But it is in now.
@Yaron - ashamed to say - how do I get the EP?
@Peter: These results are for a local copy of my blog (it's my blog from Sep 09, so still 'big'). Here is the result for tblblogentries:
Table=
74626c626c6f67656e7472696573
Create Table=
435245415445205441424c45206074626c626c6f67656e74726965736020280a20206069646020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a2020607469746c656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060626f647960206c6f6e67746578742c0a202060706f7374656460206461746574696d652064656661756c74204e554c4c2c0a2020606d6f7265626f647960206c6f6e67746578742c0a202060616c6961736020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060757365726e616d656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060626c6f676020766172636861722835302920636861726163746572207365742075746638204e4f54204e554c4c2c0a202060616c6c6f77636f6d6d656e7473602074696e79696e742834292064656661756c74204e554c4c2c0a202060656e636c6f737572656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20206066696c6573697a656020646563696d616c2831382c30292064656661756c74204e554c4c2c0a2020606d696d65747970656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20206076696577736020696e74283130292064656661756c74204e554c4c2c0a20206072656c6561736564602074696e79696e742834292064656661756c74204e554c4c2c0a2020606d61696c6564602074696e79696e742834292064656661756c74204e554c4c2c0a20206073756d6d6172796020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020607375627469746c656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606b6579776f7264736020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606475726174696f6e60207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20205052494d415259204b455920202860696460292c0a20204b45592060626c6f6760202860626c6f6760290a2920454e47494e453d496e6e6f44422044454641554c5420434841525345543d6c6174696e31
Here is the result for tblblogcomments
Table=74626c626c6f67636f6d6d656e7473
Create Table=435245415445205441424c45206074626c626c6f67636f6d6d656e74736020280a20206069646020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a202060656e7472796964666b6020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a2020606e616d6560207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060656d61696c60207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060636f6d6d656e7460206c6f6e67746578742c0a202060706f7374656460206461746574696d652064656661756c74204e554c4c2c0a202060737562736372696265602074696e79696e742834292064656661756c74204e554c4c2c0a202060776562736974656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606d6f64657261746564602074696e79696e742834292064656661756c74204e554c4c2c0a2020607375627363726962656f6e6c7960206269742831292064656661756c74204e554c4c2c0a2020606b696c6c636f6d6d656e74602076617263686172283335292064656661756c74204e554c4c2c0a20205052494d415259204b455920202860696460292c0a20204b45592060656e7472796964666b60202860656e7472796964666b60290a2920454e47494e453d496e6e6f44422044454641554c5420434841525345543d6c6174696e31
http://dev.mysql.com/doc/refman/5.1/en/optimize-ta...
Have you tried removing "date_add(c.posted, interval -1 hour) as posted" to see if that makes any difference. I bet it does. Anyway if you are just looking for the last 5 comments why not do "ORDER BY c.id DESC" Sorry I'm not shouting there I just like to write my sql commands in caps and my sql variables in camelCase.
I've got a couple of idea's, however they depend on the schema your using for your two tables, do you have a sample script so that I can build the table structure up, then I can provide a solution for you.
Thanks,
As a workaround alternative to your proposed workaround, you might use subqueries with the select portion of the statement to get the entry info (I think MYSQL supports that?):
select
(SELECT id FROM tblblogentries e where c.entryidfk = e.id) as entryID,
(SELECT title FROM tblblogentries e where c.entryidfk = e.id) AS title,
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
where blog = 'Default'
order by c.posted desc
limit 5
I would give that a try in any case. It seems like the problem with your query is that the mysql is not properly optimizing the query, and putting the subquery in the select clause would tell it plainly that you only need the entry information for the 5 latest comments.
Add "EXPLAIN EXTENDED" just before your query (it's explained here: http://dev.mysql.com/doc/refman/5.0/en/explain.htm...)
Hope it helps, if not, it's something new.
In MSSQL
MSSQL has a nice execution plan visual display. Query-->Include Actual Execution Plan.
where blog = 'Default' and c.entryIDFK in (select c.entryIDFK from tblblogcomments c order by c.posted desc limit 5)
The WHERE is where you should limit the query.
Thanks for the schema, you could try this (this is a MS SQL script though sorry, but you get the idea).
In my test this produces an equal 50%, however this is with no actual records, with a large amount of records I'd **expect** this to be more effecient.
SELECT E.EntryID,
E.Title,
C.id,
C.entryidfk,
C.name,
C.comment,
DATEADD(hh,-1,c.posted) as posted
FROM (
SELECT TOP 5 e.id AS EntryID,
e.title
FROM tblblogentries e
WHERE blog = 'Default'
) AS E
INNER JOIN tblblogcomments C ON e.EntryID = C.entryidfk
ORDER BY Posted
Sorry ignore my last post, im talking rubbish, it wont give you a better solution.
One thing I tried was a new index as your currently getting a clustered index scan on the tblBlogComment table. I'd try replacing this with a new index, I dont know if My SQL can do includes in it's index creation but the MSSQL version would be something like this;
create index ix_test on tblblogcomments (entryidfk asc,posted desc) include (id,name,comment)
However, you wont be able to make this as the comment datatype is not supported in an index as it's ntext, can you change this, you may get something more out of it as the optimizer will start using the new index, and not be using the generic clustered index (which covers all the columns)
Just an idea.
if you have this:
select * from blah b
inner join other_blah o on o.id = b.id
you might see difference when doing
select * from blah b
inner join other_blah o on b.id = o.id
I can't speak for mysql in this case but we have seen this in oracle in the past.
Try adding index to posted date column.
My vote is for the table scan occurring due to the order by on c.posted too. An index on c.posted should help or an index on (c.posted, c.entryidfk) should be even better. This would stop the db having to fetch anything from the table. Joins are usually best when they fully use indicies.
@Andrew - yep, added an index to posted.
http://www.coldfusionjedi.com/downloads/ee.html
FYI, I'm still seeing what I think to be an improvement. I've added a bunch of comments and in general, it seems to be responding quicker.
select c.id, c.entryidfk, c.name, c.comment, c.posted,
e.id as entryID, e.title
from tblblogentries e
(select id, entryidfk, name, comment, posted from tblblogcomments order by posted desc limit 5) as c
inner join e.id = c.entryidfk
select e.id as entryID,
e.title,
c.id,
c.entryidfk,
c.name,
c.comment,
c.posted
from (
select
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
order by c.posted desc
limit 500
) c,
inner join tblblogentries e on c.entryidfk = e.id
where blog = 'Default'
limit 5
You are limiting the number of comments you join to 500, instead of the entire table. This has it's faults in that you might not get any results if you have 2 blogs, where one has a lot of comments, and the other has fewer.
this query was rewritten based on the principles found here:
http://www.youtube.com/watch?v=ZVisY-fEoMw&fea...
The index on posted should help, but make sure it is ordered desc, as asc (or default) will require the entire index to be scanned.
select (select id from tblblogentries e where c.entryidfk = e.id) as entryID,
(select title from tblblogentries e where c.entryidfk = e.id) as title,
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
where c.entryidfk in (select id from tblblogentries where blog = 'Default')
order by c.posted desc
limit 5
You have an index on CommentID? Is this value unique for every row? If so, I thought such columns were not good candidates for indexes, and can slow down query time?
Ok - I agree. I now have to look up why, other than key fields, it's not a good idea to index columns that have unique values for every record... Maybe I dreamed that.
Ray, somebody above mentioned WITH (NOLOCK), which is used in SQL Server SELECT statements to ignore table locking. If there is a similar statement in MySQL, you should look into it, because the fact that you saw intermittent slowness is a big red flag to me that some of your reads are being done while the table is being written to (and is thus locked temporarily).
I use NOLOCKS on every SELECT across all of my applications, because data is constantly being inserted into my tables. I'm not dealing with financial data, so I don't really care about the occasional dirty read--the performance boost is far more important.
@Mike: No idea on the with(nolock) - will try to google.
in sql server you can do explain plans, there might be similar options in mysql. But my guess is because of the join and date functions, might be better to do it as a view..
Not sure if this was posted before, but a "first" fix before moving to indexing / plan issues is to put the where clause in the join to become:
inner join tblblogentries e on c.entryidfk = e.id AND blog = 'Default'
Causes the where clause to restrict the records joined instead of filtering records post-join. According to Dr. Seuss - much, much, ever-so-muchly more faster on large data sets.
SELECT e.id as entryID, e.title, c.id,c.entryidfk,c.name,c.comment,date_add(c.posted, interval -1 hour) as posted
FROM tblblogentries e
LEFT OUTER JOIN tblblogcomments c ON (c.entryidfk = e.id)
where blog = 'Default'
order by c.posted desc
limit 5
LOCK TABLES tblblogcomments c, tblblogentries e READ [LOCAL];
{existing SQL here}
UNLOCK TABLES;
I'm not sure from the docs whether or not it'll need to be READ LOCAL. I'm also not sure if this will play nicely with a cfquery tag; I think we can do the multiple-statements thing to trick SQL Server into coughing up identity values, so perhaps it's as simple as the MySQL docs make it sound.
Anything else I can picture seems already to have been covered by a previous commenter ...
If you are considering design changes, would it not be better to have a separate table of blog names, with an id that both the entries table and the comments table could use? It might be a lot easier to join to the blog name table, than its been with to the entries table, at least for this purpose.
** application.learning_center_assets_list is the query on the application scope
<cfquery name="getAssets" dbtype="query">
select c.asset_title, c.asset_id, ls.S_section_name, c.asset_description, c.file_name, c.replay_media, c.asset_subtype, c.date_header, c.time_duration,
c.asset_type, c.asset_location_link,c.asset_created, lso.S_solution_name, lso.solution_ID, c.replay_pdf, c.asset_post_date, c.language_id, c.file_size
from
(
select * from application.learning_center_assets_list
) c
join LCSECTION ls on ( c.asset_type = ls.section_ID )
join LCSOLUTION lso on ( c.asset_solution = lso.solution_ID )
where c.asset_type = ls.section_ID and
c.asset_solution = lso.solution_ID and
c.asset_type <> 5
</cfquery>
[Add Comment] [Subscribe to Comments]