Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/1291145/

Time: 2020-08-25 01:54:32  Source: igfitidea

MySQL Slow on join. Any way to speed up

Tags: php, mysql, date, join

Asked by kayem

I have 2 tables: music and listenTrack. listenTrack tracks the unique plays of each song. I am trying to get the most popular songs of the month. I am getting my results, but they take too long. Below are my tables and query.

listentrack: 430,000 rows

CREATE TABLE `listentrack` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `sessionId` varchar(50) NOT NULL,
    `url` varchar(50) NOT NULL,
    `date_created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    `ip` varchar(150) NOT NULL,
    `user_id` int(11) DEFAULT NULL,
     PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=731306 DEFAULT CHARSET=utf8

music: 12,500 rows

CREATE TABLE `music` (
   `music_id` int(11) NOT NULL AUTO_INCREMENT,
   `user_id` int(11) NOT NULL,
   `title` varchar(50) DEFAULT NULL,
   `artist` varchar(50) DEFAULT NULL,
   `description` varchar(255) DEFAULT NULL,
   `genre` int(4) DEFAULT NULL,
   `file` varchar(255) NOT NULL,
   `url` varchar(50) NOT NULL,
   `allow_download` int(2) NOT NULL DEFAULT '1',
   `plays` bigint(20) NOT NULL,
   `downloads` bigint(20) NOT NULL,
   `faved` bigint(20) NOT NULL,
   `dateadded` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
   PRIMARY KEY (`music_id`)
) ENGINE=MyISAM AUTO_INCREMENT=15146 DEFAULT CHARSET=utf8


SELECT COUNT(listenTrack.url) AS total, listenTrack.url 
FROM listenTrack
LEFT JOIN music ON music.url = listenTrack.url
WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
GROUP BY listenTrack.url
ORDER BY total DESC
LIMIT 0,10

I don't think this query is very complex, and the rows aren't too large.

Is there any way to speed this up? Or can you suggest a better solution? This is going to be a cron job at the beginning of every month, but I would also like to produce per-day results.

By the way, when I run this locally it takes over 4 minutes, but on production it takes about 45 seconds.

Answered by Jeff Siver

I'm more of a SQL Server guy but these concepts should apply.

I'd add indexes:

  1. On ListenTrack, add an index on url and date_created
  2. On Music, add an index on url

These indexes should speed the query up tremendously (I originally had the table names mixed up - fixed in the latest edit).
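
As a minimal sketch, the two indexes listed above could be created as follows. The index names are made up for illustration, and the table names are written in lower case to match the CREATE TABLE statements in the question (table names can be case-sensitive on some platforms).

```sql
-- Composite index on the play log: url first, then date_created
-- (idx_listentrack_url_date is a hypothetical name)
ALTER TABLE listentrack
  ADD INDEX idx_listentrack_url_date (url, date_created);

-- Single-column index on the join column of music
-- (idx_music_url is a hypothetical name)
ALTER TABLE music
  ADD INDEX idx_music_url (url);
```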

Answered by Cody Caughlan

For the most part you should also index any column that is used in a JOIN. In your case, you should index both listentrack.url and music.url.

@jeff s - An index on date_created wouldn't help with the query as written, because the column is run through a function first, so MySQL cannot use an index on it. Often you can rewrite a query so that the indexed column is compared directly against constants, for example:

DATEDIFF(DATE(date_created),'2009-08-15') = 0

becomes

date_created >= '2009-08-15' and date_created < '2009-08-16'

This will filter down records that are from 2009-08-15 and allow any indexes on that column to be candidates. Note that MySQL might NOT use that index, it depends on other factors.
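
Applied to the query from the question, the rewrite might look like the sketch below. It assumes the goal is all rows created on 2009-08-15, uses a half-open range (from midnight on the 15th up to, but not including, midnight on the 16th), and writes the table name in lower case as in the CREATE TABLE statement.

```sql
SELECT COUNT(listentrack.url) AS total, listentrack.url
FROM listentrack
LEFT JOIN music ON music.url = listentrack.url
-- The bare column is compared with constants, so an index on
-- date_created becomes a candidate for the optimizer
WHERE listentrack.date_created >= '2009-08-15 00:00:00'
  AND listentrack.date_created <  '2009-08-16 00:00:00'
GROUP BY listentrack.url
ORDER BY total DESC
LIMIT 0, 10;
```

For the monthly report the same pattern should work with '2009-08-01' and '2009-09-01' as the two bounds.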

Your best bet is to make a two-column index on listentrack(url, date_created) and then another index on music.url.

These 2 indexes will cover this particular query.

Note that if you run EXPLAIN on this query you are still going to get a "Using filesort", because it has to write the records to a temporary table on disk to do the ORDER BY.

In general you should always run your query under EXPLAIN to get an idea of how MySQL will execute it, and then go from there. See the EXPLAIN documentation:

http://dev.mysql.com/doc/refman/5.0/en/using-explain.html
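
For illustration, EXPLAIN is simply prefixed to the SELECT. The statement below is the query from the question, unchanged apart from the lower-case table name. What the plan shows depends on the MySQL version, the indexes present, and the data, so no output is given here.

```sql
-- Ask MySQL for the execution plan instead of running the query
EXPLAIN
SELECT COUNT(listentrack.url) AS total, listentrack.url
FROM listentrack
LEFT JOIN music ON music.url = listentrack.url
WHERE DATEDIFF(DATE(date_created), '2009-08-15') = 0
GROUP BY listentrack.url
ORDER BY total DESC
LIMIT 0, 10;
```

In the result, the key column shows which index (if any) was chosen for each table, rows is the estimated number of rows examined, and Extra carries notes such as "Using filesort" or "Using temporary".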

Answered by VoteyDisciple

Try creating an index that will help with the join:

CREATE INDEX idx_url ON music (url);

Answered by TheJacobTaylor

I think I might have missed the obvious before. Why are you joining the music table at all? You do not appear to be using any data from that table, and the left join you are performing is not required, right? I think having this table in the query makes it much slower without adding any value. Take all references to music out, unless you need to restrict the results to urls that exist in music, in which case you need an inner join (rather than a left join) so that rows without a matching value are excluded.
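
A sketch of the query with the music table removed, assuming only the url and its play count are needed:

```sql
-- Same aggregation as the original query, without the join to music.
-- url is declared NOT NULL, so COUNT(*) matches COUNT(url).
SELECT COUNT(*) AS total, url
FROM listentrack
WHERE DATEDIFF(DATE(date_created), '2009-08-15') = 0
GROUP BY url
ORDER BY total DESC
LIMIT 0, 10;
```

If music.url is not unique, the original join would have multiplied the counts, so the two versions could return different totals in that case.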

I would add new indexes, as the others mention. Specifically, I would add an index on music (url) and an index on listentrack (date_created, url).

This will improve your join a ton.

Then I would look at the query itself. You are forcing the system to perform work on each row of the table. It would be better to rephrase the date restriction as a range.

Not sure of the syntax off the top of my head, but something like: where date_created >= '2009-08-15 00:00:00' and date_created < '2009-08-16 00:00:00'

That should allow it to rapidly use the index to locate the appropriate records. The combined two-column index on listentrack should allow it to find the records based on the date and URL. You should experiment; the columns might work better in the other order in the index (url, date_created).

The explain plan for this query should say "Using index" in the right-hand (Extra) column for both tables. That means it will not have to read the data in the table to calculate your counts.
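
One way to try both column orders and look for "Using index" is sketched below. It assumes both indexes are created side by side (the index names are hypothetical), that the join to music has been dropped, and that each index is forced in turn so the two plans can be compared.

```sql
-- Two candidate composite indexes on the same columns, in opposite orders
CREATE INDEX idx_lt_date_url ON listentrack (date_created, url);
CREATE INDEX idx_lt_url_date ON listentrack (url, date_created);

-- Plan when the (date_created, url) index is forced
EXPLAIN
SELECT COUNT(*) AS total, url
FROM listentrack FORCE INDEX (idx_lt_date_url)
WHERE date_created >= '2009-08-15 00:00:00'
  AND date_created <  '2009-08-16 00:00:00'
GROUP BY url
ORDER BY total DESC
LIMIT 0, 10;

-- Plan when the (url, date_created) index is forced
EXPLAIN
SELECT COUNT(*) AS total, url
FROM listentrack FORCE INDEX (idx_lt_url_date)
WHERE date_created >= '2009-08-15 00:00:00'
  AND date_created <  '2009-08-16 00:00:00'
GROUP BY url
ORDER BY total DESC
LIMIT 0, 10;
```

Once the better order is known, the other index can be removed with DROP INDEX ... ON listentrack so that inserts do not have to maintain both.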

I would also check the memory settings that you have configured for MySQL. It sounds like you do not have enough memory allocated. Be very careful about the difference between server-wide settings and per-thread settings. A 10MB cache for the whole server is pretty small, whereas a 10MB cache per thread can use up a lot of memory quickly.
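
As a starting point, the current values can be inspected from any MySQL client. The variables below are ones commonly relevant to MyISAM tables and to sorting and grouping; which of them matters for this query is an assumption to verify, not a diagnosis.

```sql
-- Server-wide: cache for MyISAM index blocks, shared by all connections
SHOW VARIABLES LIKE 'key_buffer_size';

-- Allocated per connection (thread) when needed, so large values multiply
SHOW VARIABLES LIKE 'sort_buffer_size';
SHOW VARIABLES LIKE 'read_buffer_size';
SHOW VARIABLES LIKE 'join_buffer_size';

-- Limits on in-memory temporary tables (used for GROUP BY / ORDER BY)
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
```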

Jacob

Answered by TheJacobTaylor

Pre-grouping and then joining makes things a lot faster with MySQL/MyISAM. (I suspect this is less necessary with other databases.)

This should perform about as fast as the non-joined version:

SELECT
   total, a.url, title
FROM
(
  SELECT COUNT(*) as total, url
  from listenTrack
  WHERE DATEDIFF(DATE(date_created),'2009-08-15') = 0
  GROUP BY url
  ORDER BY total DESC
  LIMIT 0,10
) as a
LEFT JOIN music ON music.url = a.url
;

P.S. - Mapping between the two tables with an id instead of a url is sound advice.

Answered by kyoryu

Why are you repeating the url in both tables?

Have listentrack hold a music_id instead, and join on that. This gets rid of the text comparison as well as the extra index.

Besides, it's arguably more correct. You're tracking the times that a particular track was listened to, not the url. What if the url changes?
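
A rough sketch of that change follows. It assumes every listentrack.url matches at most one music.url, the index name is hypothetical, and it is best tried on a copy of the data first.

```sql
-- Add the new reference column (nullable until it has been backfilled)
ALTER TABLE listentrack ADD COLUMN music_id INT(11) DEFAULT NULL;

-- Backfill music_id by matching on the existing url columns.
-- date_created is declared ON UPDATE CURRENT_TIMESTAMP, so it is set
-- to itself here to keep the original play times.
UPDATE listentrack
JOIN music ON music.url = listentrack.url
SET listentrack.music_id = music.music_id,
    listentrack.date_created = listentrack.date_created;

-- Index the new column together with the date
ALTER TABLE listentrack
  ADD INDEX idx_listentrack_music_date (music_id, date_created);

-- Popular tracks for one day, joined on the integer key
SELECT COUNT(*) AS total, m.music_id, m.title
FROM listentrack AS lt
JOIN music AS m ON m.music_id = lt.music_id
WHERE lt.date_created >= '2009-08-15 00:00:00'
  AND lt.date_created <  '2009-08-16 00:00:00'
GROUP BY m.music_id, m.title
ORDER BY total DESC
LIMIT 0, 10;
```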

Answered by mson

You might want to add an index to the url field of both tables.

Having said that, when I converted from MySQL to SQL Server 2008, with the same queries and the same database structures, the queries ran 1-3 orders of magnitude faster.

I think some of it had to do with the RDBMS (MySQL's optimizer is not so good...) and some of it might have had to do with how each RDBMS reserves system resources. That said, the comparisons were made on production systems where only the database was running.

Answered by James Black

After you add indexes, you may want to explore adding a new column that stores date_created as a Unix timestamp, which will make math operations quicker.

I am not certain why you have the diff function though, as it appears you are looking for all rows that were updated on a particular date.

You may want to look at your query as it seems to have an error.

If you use unit tests then you can compare the results of your query and a query using a unix timestamp instead.

Answered by JTHouseCat

The following would probably speed up the query.

CREATE INDEX music_url_index ON music (url) USING BTREE;
CREATE INDEX listenTrack_url_index ON listenTrack (url) USING BTREE;

You really need to know the total number of comparisons and row scans that are happening. To get that answer, look at the code here, which shows how to do it using EXPLAIN: http://www.siteconsortium.com/h/p1.php?id=mysql002