Did you know I have the smartest readers in the world? I'm not just saying that. You guys are truly intelligent. And beautiful too. Did I say that? All of my readers are intelligent and strikingly beautiful as well. I'm not just saying that because I really want some help with a query. Honest. Ok, maybe I am.
Let me give some background here. I've noticed for a while now that when I post a comment to my blog, the code takes a good 4-5 seconds to respond. Normal page requests run much quicker. However - when a comment is posted I clear the cache for the "Recent Comments" list you see below. I extracted the SQL for it and ran it within MySQL's Query Browser and confirmed - it was a dog. For the lift of me I can't figure out why.
The query simply asks for the last 5 comments. It joins against the entries table to get the entry title as well. Here is the SQL. The value next to date_add is dynamic as is the blog in the where clause. That value is set with a cfqueryparam.
select e.id as entryID, e.title, c.id, c.entryidfk, c.name, c.comment, date_add(c.posted, interval -1 hour) as posted from tblblogcomments c inner join tblblogentries e on c.entryidfk = e.id where blog = 'Default' order by c.posted desc limit 5
The issue seems to be the join. If I just get the last 5 comments it runs instantly. However, from what little SQL skills I have, a join like this shouldn't be so slow. I've got an index on comment.id and comment.entryidfk. Entry has an index on it's ID and the blog column. Here is the result of the EXPLAIN:

Unfortunately nothing really here makes sense to me. This seems to only be an issue with a BlogCFC install with a lot of data as I don't see it on my test version. If folks are bored and want to recreate this locally, just download BlogCFC and fill it with some random data. (I'd be willing to give folks an export, but I'd need to prune out the email addresses from my commenters.)
If worse comes to worse, I'm going to mod my own copy to get just the comment data and follow it up with a loop to get the entry data.
Archived Comments
Ray, two (potentially unhelpful) questions:
- I'm wondering about "limit 5" versus "select top 5" Does MySQL not support "top?" If it *does* support "top," does that change make a difference?
- Does MySQL support database-enforced foreign key relationships? If so, do you have one established for "c.entryidfk = e.id" If not, does establishing one make a difference? If it's helpful, here's the SQL to do this in SQL Server: ALTER TABLE tblblogcomments ADD CONSTRAINT FK_tblblogcomments_tblblogentries FOREIGN KEY (entryidfk) REFERENCES tblblogentries (id) ON UPDATE NO ACTION ON DELETE CASCADE
Strange, it's not particularly complex.
Is there also an index on tblblogcomments.posted ?
not sure what, if any, difference it will make - it's something I'd traditionally put a clustered index on in MS SQL Server.
Hi Ray, I'm not a MYSQL guy, MSSQL, but it looks like it is doing a rollup of the 5 for each blog entry. notice the 1826 rows for blog then the 5 for each entry (1826 * 5). the more you have the longer it will take. If this is for the actual blog entry, try limiting the where clause to the actual blog entry where entry = some id. if this is for all entries, it is going to take longer the more entries you have. maybe a different approach in this case.
Perhaps it's getting stuck on "blog = 'Default' " ? What does your execution plan say? If it has to iterate all resuls searching for blog = '*' it might be slowing it down.
I'm a SQL Server guy, but the basics should hold true--you typically want indexes to cover your WHERE and ORDER BY clauses, as well as your keys. That ORDER BY statement is probably your culprit--I would try creating a multi-column index on entryidfk and posted (desc).
@Mike Kaplan : out of interest - what's the reasoning behind creating a multicolumn index instead of having two different indexes? I know nothing about how mysql uses them. will it use two different indexes within a query (or for the where clause and one for the order by)?
The join being fast for low numbers but slow for large numbers does suggest an index issue.
Can you post the results of "SHOW CREATE TABLE x" for the tables involved?
If c.posted is not an index, it surely should be done so. In my experience i saw that adding the right indexes may speed up the process as much as up and over 50 times when it's about VERY BIG amount of data and records. Also i am not sure if this is valid for MySQL as it is for MS SQL, but using "WITH (NOLOCK)" may help too.
I agree with everyone else you need another index to speed up the query. Add an index to the column e.blog. Give that a shot and see what kind of improvment you get.
Also i noticed you need a "e." on your where statement.
where e.blog = 'Default'
I recently used the MS SQL Database Engine Tuning Advisor to figure out why one of my queries was running really slow and it suggested adding an index to a table and my queries went from 4 seconds to less then a second.
John: LIMIT is how mysql does TOP. It's actually better as you can do LIMIT X,Y for pagination. I have NOT made a FK relationship. Before I add that - are you sure? (I guess no big deal to roll back, right?)
Multiple: More than one said to add an index on tblblogcomments.posted. I did. I didn't think indexes helped with dates. It doesn't seem to have helped though. In fact, the explain now shows a larger # of rows in that first entry. But it is in now.
@Yaron - ashamed to say - how do I get the EP?
@Peter: These results are for a local copy of my blog (it's my blog from Sep 09, so still 'big'). Here is the result for tblblogentries:
Table=
74626c626c6f67656e7472696573
Create Table=
435245415445205441424c45206074626c626c6f67656e74726965736020280a20206069646020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a2020607469746c656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060626f647960206c6f6e67746578742c0a202060706f7374656460206461746574696d652064656661756c74204e554c4c2c0a2020606d6f7265626f647960206c6f6e67746578742c0a202060616c6961736020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060757365726e616d656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060626c6f676020766172636861722835302920636861726163746572207365742075746638204e4f54204e554c4c2c0a202060616c6c6f77636f6d6d656e7473602074696e79696e742834292064656661756c74204e554c4c2c0a202060656e636c6f737572656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20206066696c6573697a656020646563696d616c2831382c30292064656661756c74204e554c4c2c0a2020606d696d65747970656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20206076696577736020696e74283130292064656661756c74204e554c4c2c0a20206072656c6561736564602074696e79696e742834292064656661756c74204e554c4c2c0a2020606d61696c6564602074696e79696e742834292064656661756c74204e554c4c2c0a20206073756d6d6172796020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020607375627469746c656020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606b6579776f7264736020766172636861722831303029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606475726174696f6e60207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a20205052494d415259204b455920202860696460292c0a20204b45592060626c6f6760202860626c6f6760290a2920454e47494e453d496e6e6f44422044454641554c5420434841525345543d6c6174696e31
Here is the result for tblblogcomments
Table=74626c626c6f67636f6d6d656e7473
Create Table=435245415445205441424c45206074626c626c6f67636f6d6d656e74736020280a20206069646020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a202060656e7472796964666b6020766172636861722833352920636861726163746572207365742075746638204e4f54204e554c4c2c0a2020606e616d6560207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060656d61696c60207661726368617228353029206368617261637465722073657420757466382064656661756c74204e554c4c2c0a202060636f6d6d656e7460206c6f6e67746578742c0a202060706f7374656460206461746574696d652064656661756c74204e554c4c2c0a202060737562736372696265602074696e79696e742834292064656661756c74204e554c4c2c0a202060776562736974656020766172636861722832353529206368617261637465722073657420757466382064656661756c74204e554c4c2c0a2020606d6f64657261746564602074696e79696e742834292064656661756c74204e554c4c2c0a2020607375627363726962656f6e6c7960206269742831292064656661756c74204e554c4c2c0a2020606b696c6c636f6d6d656e74602076617263686172283335292064656661756c74204e554c4c2c0a20205052494d415259204b455920202860696460292c0a20204b45592060656e7472796964666b60202860656e7472796964666b60290a2920454e47494e453d496e6e6f44422044454641554c5420434841525345543d6c6174696e31
Have you tried to run an OPTIMIZE on the table?
http://dev.mysql.com/doc/re...
hmmm. Guys... I thin the index/posted seems to have done something nice. No scientific tests, but some results are coming back much quicker.
Chad - there is an index on e.blog.
I ran optimize on both. Msg said it did not support optimize, but it did recreate... and then the damn MySQL query thing wouldn't let me see more. ;)
Oh - the docs for optimize say this is expected for the type of table I have. -sigh- Things seem slow again on update.
Err... and fast. -boggle- So now it seems to be random - which I guess is better than _always_ bad. ;)
Hi Ray
Have you tried removing "date_add(c.posted, interval -1 hour) as posted" to see if that makes any difference. I bet it does. Anyway if you are just looking for the last 5 comments why not do "ORDER BY c.id DESC" Sorry I'm not shouting there I just like to write my sql commands in caps and my sql variables in camelCase.
@Martin: I removed it but it didn't seem to help any. As for just checking c.id desc - I need the last 5 comments posted and my IDs aren't auto #s. They are UUIDs.
Hi,
I've got a couple of idea's, however they depend on the schema your using for your two tables, do you have a sample script so that I can build the table structure up, then I can provide a solution for you.
Thanks,
Hi, Ray. As with many of the responders I have worked with MSSQL a lot more than MySQL. When I have a query that is inexplicably slow, sometimes refreshing the statistics on the tables helps (not sure if that it an option in MYSQL).
As a workaround alternative to your proposed workaround, you might use subqueries with the select portion of the statement to get the entry info (I think MYSQL supports that?):
select
(SELECT id FROM tblblogentries e where c.entryidfk = e.id) as entryID,
(SELECT title FROM tblblogentries e where c.entryidfk = e.id) AS title,
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
where blog = 'Default'
order by c.posted desc
limit 5
I would give that a try in any case. It seems like the problem with your query is that the mysql is not properly optimizing the query, and putting the subquery in the select clause would tell it plainly that you only need the entry information for the 5 latest comments.
@Someonewhocares - the BlogCFC install (blogcfc.riaforge.org) would have it - but obviously I've modified it since then to add the index on posted.
In MySQL
Add "EXPLAIN EXTENDED" just before your query (it's explained here: http://dev.mysql.com/doc/re...
Hope it helps, if not, it's something new.
In MSSQL
MSSQL has a nice execution plan visual display. Query-->Include Actual Execution Plan.
The LIMIT clause is applied after all other query operations, so you're running a join on many more records than you need. Perhaps using a subquery in the where clause to reduce the number of joined records would be faster?
where blog = 'Default' and c.entryIDFK in (select c.entryIDFK from tblblogcomments c order by c.posted desc limit 5)
Yes Ray,
The WHERE is where you should limit the query.
Hi,
Thanks for the schema, you could try this (this is a MS SQL script though sorry, but you get the idea).
In my test this produces an equal 50%, however this is with no actual records, with a large amount of records I'd **expect** this to be more effecient.
SELECT E.EntryID,
E.Title,
C.id,
C.entryidfk,
C.name,
C.comment,
DATEADD(hh,-1,c.posted) as posted
FROM (
SELECT TOP 5 e.id AS EntryID,
e.title
FROM tblblogentries e
WHERE blog = 'Default'
) AS E
INNER JOIN tblblogcomments C ON e.EntryID = C.entryidfk
ORDER BY Posted
Hey Ray, can you send me the schema (and maybe some sample data) for the 2 tables in question? I've had to do a bit of MySQL tuning on an application here I've worked on. Indexing is most likely the issue as described by others here.
hey Ray, with this type of quality feedback you're likely to have the most responsive blog on the web ;-)
Hi Ray,
Sorry ignore my last post, im talking rubbish, it wont give you a better solution.
Hi Ray,
One thing I tried was a new index as your currently getting a clustered index scan on the tblBlogComment table. I'd try replacing this with a new index, I dont know if My SQL can do includes in it's index creation but the MSSQL version would be something like this;
create index ix_test on tblblogcomments (entryidfk asc,posted desc) include (id,name,comment)
However, you wont be able to make this as the comment datatype is not supported in an index as it's ntext, can you change this, you may get something more out of it as the optimizer will start using the new index, and not be using the generic clustered index (which covers all the columns)
Just an idea.
you might also want to look at the ordering in your join... while you may think the = is commutative it's not always when it comes to a join in sql.
if you have this:
select * from blah b
inner join other_blah o on o.id = b.id
you might see difference when doing
select * from blah b
inner join other_blah o on b.id = o.id
I can't speak for mysql in this case but we have seen this in oracle in the past.
You have a sort by posted column, so in order to retrieve just top 5, it still has to process all records and sort them by posted column.
Try adding index to posted date column.
Hi Ray,
My vote is for the table scan occurring due to the order by on c.posted too. An index on c.posted should help or an index on (c.posted, c.entryidfk) should be even better. This would stop the db having to fetch anything from the table. Joins are usually best when they fully use indicies.
You're may need to add the "blog" field into the index too...
@All: Sorry for being quiet - had to babysit a friend's baby. Still doing so but got a quick break.
@Andrew - yep, added an index to posted.
To all -yes - there is an index on the tblblogentries.blog.
@Tom - the issue I have though is that BlogCFC supports N blogs per db. Hence the need for "where blog=....". Therefore I can't just get the last 5 comments.
@Yaron, and all, here is the result of the EXPLAIN EXTENDED
http://www.coldfusionjedi.c...
FYI, I'm still seeing what I think to be an improvement. I've added a bunch of comments and in general, it seems to be responding quicker.
Have you tried moving the blog='Default' to the join clause? I've found that MySQL can sometimes behave differently in that case. Run an explain with that change to see if it attempts to use the index differently. Also make sure that if you have an index with multiple items in it, you are joining and/or WHERE'ing them in the same order as specified in the index.
my sql is a tad rusty, but I'd do a sub-query first to get your 5 rows, then join them to get the other necessary fields. So something like...
select c.id, c.entryidfk, c.name, c.comment, c.posted,
e.id as entryID, e.title
from tblblogentries e
(select id, entryidfk, name, comment, posted from tblblogcomments order by posted desc limit 5) as c
inner join e.id = c.entryidfk
ok, missed your comment on needing the where clause as I was reading and then coming up with a slick answer. Is blog in the e or c table?
@Chuck - the blog col is in the entries table.
Adam - can you give me a demo of that change please?
Ray, I would say this is a bit of a design issue. You are not able to practically limit your request for comments (which probably has more entries). It would appear that the result is joining both tables in order to do the limit. For the future, I would suggest adding a blog field to the comments table in order to more effectively utilize your limit. In the interim, I would suggest using this particular query:
select e.id as entryID,
e.title,
c.id,
c.entryidfk,
c.name,
c.comment,
c.posted
from (
select
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
order by c.posted desc
limit 500
) c,
inner join tblblogentries e on c.entryidfk = e.id
where blog = 'Default'
limit 5
You are limiting the number of comments you join to 500, instead of the entire table. This has it's faults in that you might not get any results if you have 2 blogs, where one has a lot of comments, and the other has fewer.
this query was rewritten based on the principles found here:
http://www.youtube.com/watc...
The index on posted should help, but make sure it is ordered desc, as asc (or default) will require the entire index to be scanned.
Here's my attempt and modifying your query. Unfortunately, I don't have a mysql install with blogcfc installed to test with and I'm more comfortable with oracle syntax.
select (select id from tblblogentries e where c.entryidfk = e.id) as entryID,
(select title from tblblogentries e where c.entryidfk = e.id) as title,
c.id,
c.entryidfk,
c.name,
c.comment,
date_add(c.posted, interval -1 hour) as posted
from tblblogcomments c
where c.entryidfk in (select id from tblblogentries where blog = 'Default')
order by c.posted desc
limit 5
Hi.
You have an index on CommentID? Is this value unique for every row? If so, I thought such columns were not good candidates for indexes, and can slow down query time?
@Danny S, the index on CommentID is probably because it is a primary key and that is how the DB enforces the PK uniqueness requirement.
Brian -
Ok - I agree. I now have to look up why, other than key fields, it's not a good idea to index columns that have unique values for every record... Maybe I dreamed that.
@Aidan Kane: Can't speak for MySQL, but SQL Server will only pick one index per table to use in a query, so you'll get the most bang for your buck if your index covers as many of the pertinent columns as possible.
Ray, somebody above mentioned WITH (NOLOCK), which is used in SQL Server SELECT statements to ignore table locking. If there is a similar statement in MySQL, you should look into it, because the fact that you saw intermittent slowness is a big red flag to me that some of your reads are being done while the table is being written to (and is thus locked temporarily).
I use NOLOCKS on every SELECT across all of my applications, because data is constantly being inserted into my tables. I'm not dealing with financial data, so I don't really care about the occasional dirty read--the performance boost is far more important.
@Brian: Yeah, having blog in both columns seems bad, BUT, if it creates a good practical result, I can see living with it. As it stands, I don't support folks being able to change their blog name and the app 'fixes' data. I may consider it for the future.
@Mike: No idea on the with(nolock) - will try to google.
Have you considered, doing this as a view, part of the problem is the join, the other is any functions you are doing to the data.
in sql server you can do explain plans, there might be similar options in mysql. But my guess is because of the join and date functions, might be better to do it as a view..
@Craig, A view wouldn't help unless the problem is with the interpreter parsing the SQL. The SQL isn't complex, so it is doubtful that would help, imho.
Again, from an MSSQL standpoint.
Not sure if this was posted before, but a "first" fix before moving to indexing / plan issues is to put the where clause in the join to become:
inner join tblblogentries e on c.entryidfk = e.id AND blog = 'Default'
Causes the where clause to restrict the records joined instead of filtering records post-join. According to Dr. Seuss - much, much, ever-so-muchly more faster on large data sets.
Jason, I tested both queries in the MySQL query tool and both seemed to run in about the same time. I also did an EXPLAIN on em and both had the same results.
sorry if I'm way off here, but are the IDs in your comment table sequential numbers or something else? because if they are just numbers and sequential, is there any reason why you can't just ORDER BY c.id DESC LIMIT 5?
NVM - answered my own question - shutting up now ;P
Have you tried turning it into a stored procedure. I know they do offer some performances benefits as they compile the query.
Just to be wild and different have you tried running this?
SELECT e.id as entryID, e.title, c.id,c.entryidfk,c.name,c.comment,date_add(c.posted, interval -1 hour) as posted
FROM tblblogentries e
LEFT OUTER JOIN tblblogcomments c ON (c.entryidfk = e.id)
where blog = 'Default'
order by c.posted desc
limit 5
..Also if you hadn't already ensuring the column tblblogcomments.entryidfk and tblblogentries.blog are indexed can't hurt.
..and another thing.. if you don't end up being able to optimize that query, have you looked into creating a view out of it?
I wonder if this might help with conflict issues:
LOCK TABLES tblblogcomments c, tblblogentries e READ [LOCAL];
{existing SQL here}
UNLOCK TABLES;
I'm not sure from the docs whether or not it'll need to be READ LOCAL. I'm also not sure if this will play nicely with a cfquery tag; I think we can do the multiple-statements thing to trick SQL Server into coughing up identity values, so perhaps it's as simple as the MySQL docs make it sound.
Anything else I can picture seems already to have been covered by a previous commenter ...
Ray ref your #49,
If you are considering design changes, would it not be better to have a separate table of blog names, with an id that both the entries table and the comments table could use? It might be a lot easier to join to the blog name table, than its been with to the entries table, at least for this purpose.
I'm not considering any major changes for 5.x. 6.x, sure.
@Dave - didn't work.
@Tim - tried the query - didn't seem to help. (My 'scientific' testing is to run a few times and compare to current SQL.)
Umm .... keep the last 5 posts in an array in the application scope? Yeah.... I know it's cheating ;)
It's not the last 5 posts - it's the last 5 _comments_. ;)
Interesting. I am working with Gary Funk off thread and he suggested changing my comments/entries tables from InnoDB to MyISAM. I knew they were different but didn't really have a great knowledge of how. Anyway, I made the switch (after backing up the DB), executed the SQL, and it returned instantly.
Hey guys. Looking this article I've decided to try to right a query of query having my main query been stored on the application scope. I can't make this query to work, any ideas?
** application.learning_center_assets_list is the query on the application scope
<cfquery name="getAssets" dbtype="query">
select c.asset_title, c.asset_id, ls.S_section_name, c.asset_description, c.file_name, c.replay_media, c.asset_subtype, c.date_header, c.time_duration,
c.asset_type, c.asset_location_link,c.asset_created, lso.S_solution_name, lso.solution_ID, c.replay_pdf, c.asset_post_date, c.language_id, c.file_size
from
(
select * from application.learning_center_assets_list
) c
join LCSECTION ls on ( c.asset_type = ls.section_ID )
join LCSOLUTION lso on ( c.asset_solution = lso.solution_ID )
where c.asset_type = ls.section_ID and
c.asset_solution = lso.solution_ID and
c.asset_type <> 5
</cfquery>