Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me) on StackOverflow. Original question: http://stackoverflow.com/questions/1244555/

Time: 2020-08-31 13:49:21  Source: igfitidea

How can I optimize MySQL's ORDER BY RAND() function?

mysql random performance

Asked by fabrik

I'd like to optimize my queries, so I looked into mysql-slow.log.

Most of my slow queries contain ORDER BY RAND(). I cannot find a real solution to this problem. There is a possible solution at MySQLPerformanceBlog, but I don't think it is enough. On poorly optimized (or frequently updated, user-managed) tables it doesn't work, or I need to run two or more queries before I can select my PHP-generated random row.

Is there any solution for this issue?

A dummy example:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
ORDER BY
        RAND()
LIMIT 1

Accepted answer by Quassnoi

Try this:

SELECT  *
FROM    (
        SELECT  @cnt := COUNT(*) + 1,
                @lim := 10
        FROM    t_random
        ) vars
STRAIGHT_JOIN
        (
        SELECT  r.*,
                @lim := @lim - 1
        FROM    t_random r
        WHERE   (@cnt := @cnt - 1)
                AND RAND(20090301) < @lim / @cnt
        ) i

This is especially efficient on MyISAM (since the COUNT(*) is instant), but even in InnoDB it's 10 times more efficient than ORDER BY RAND().

The main idea here is that we don't sort, but instead keep two variables and calculate the running probability of a row being selected at the current step.
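
The running-probability idea can be sketched outside SQL. The Python below only illustrates the general technique (sequential selection sampling) and is not a line-by-line translation of the query; the function name `sample_rows` and its parameters are hypothetical. It tracks how many rows are left to scan and how many are still needed, and selects the current row with probability `to_pick / remaining`, so one pass yields exactly `k` rows with no sort.

```python
import random

def sample_rows(rows, k, rng=None):
    """Pick k rows in a single pass, without sorting (hypothetical helper)."""
    rng = rng or random.Random()
    rows = list(rows)
    remaining = len(rows)         # rows not yet examined, including the current one
    to_pick = min(k, remaining)   # rows still needed
    picked = []
    for row in rows:
        # Select the current row with the running probability to_pick / remaining.
        if rng.random() < to_pick / remaining:
            picked.append(row)
            to_pick -= 1
        remaining -= 1
    return picked

print(sample_rows(range(1, 101), 10))
```

Once `to_pick` equals `remaining` every later row is taken, and once `to_pick` reaches zero none is, which is why the result always has the requested size.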

See this article in my blog for more detail:

Update:

If you need to select but a single random record, try this:

SELECT  aco.*
FROM    (
        SELECT  minid + FLOOR((maxid - minid) * RAND()) AS randid
        FROM    (
                SELECT  MAX(ac_id) AS maxid, MIN(ac_id) AS minid
                FROM    accomodation
                ) q
        ) q2
JOIN    accomodation aco
ON      aco.ac_id =
        COALESCE
        (
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_id > randid
                AND ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        ),
        (
        SELECT  accomodation.ac_id
        FROM    accomodation
        WHERE   ac_status != 'draft'
                AND ac_images != 'b:0;'
                AND NOT EXISTS
                (
                SELECT  NULL
                FROM    accomodation_category
                WHERE   acat_id = ac_category
                        AND acat_slug = 'vendeglatohely'
                )
        ORDER BY
                ac_id
        LIMIT   1
        )
        )

This assumes your ac_id values are distributed more or less evenly.
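
To see why even distribution matters, here is a small Python sketch of a simplified model of the query above. It ignores the WHERE filters and assumes RAND() is uniform, so randid is equally likely to be any integer from minid to maxid - 1, and the row returned is the first one with ac_id > randid. The function name `pick_probabilities` and the sample ids are hypothetical.

```python
from fractions import Fraction

def pick_probabilities(ids):
    """Exact chance of each id being returned in the simplified model."""
    ids = sorted(ids)
    span = ids[-1] - ids[0]          # number of possible randid values
    probs = {ids[0]: Fraction(0)}    # no randid lies below the smallest id
    for prev, cur in zip(ids, ids[1:]):
        # cur is returned for every randid in [prev, cur - 1]
        probs[cur] = Fraction(cur - prev, span)
    return probs

# ids 1, 2, 3 are consecutive; 10 sits after a gap
for ac_id, p in pick_probabilities([1, 2, 3, 10]).items():
    print(ac_id, p)
```

In this model an id's chance is proportional to the gap in front of it, so a row that follows a long run of deleted ids is returned far more often than its neighbours.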

Answer by DisgruntledGoat

It depends on how random you need to be. The solution you linked works pretty well IMO. Unless you have large gaps in the ID field, it's still pretty random.

However, you should be able to do it in one query using this (for selecting a single value):

SELECT [fields] FROM [table] AS t JOIN (SELECT FLOOR(RAND() * MAX(id)) AS randid FROM [table]) AS r ON t.id >= r.randid ORDER BY t.id LIMIT 1

Other solutions:

  • Add a permanent float field called rnd to the table and fill it with random numbers. You can then generate a random number in PHP and do "SELECT ... WHERE rnd > $random"
  • Grab the entire list of IDs and cache them in a text file. Read the file and pick a random ID from it.
  • Cache the results of the query as HTML and keep it for a few hours.
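
The first option can be sketched as follows. This is an illustration only: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the table name `items`, the column name `rnd`, and the helper `random_row` are hypothetical. The ORDER BY, the LIMIT 1, and the wrap-around fallback are assumptions added here so that the lookup returns exactly one row and can use an index on the random column.

```python
import random
import sqlite3

def random_row(conn, rng=random):
    """Return one row using the precomputed random column (hypothetical helper)."""
    r = rng.random()
    row = conn.execute(
        "SELECT id, name FROM items WHERE rnd >= ? ORDER BY rnd LIMIT 1", (r,)
    ).fetchone()
    if row is None:
        # r was above every stored value: wrap around to the smallest one
        row = conn.execute(
            "SELECT id, name FROM items ORDER BY rnd LIMIT 1"
        ).fetchone()
    return row

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, rnd REAL)")
conn.execute("CREATE INDEX items_rnd ON items (rnd)")
# Fill the permanent column once, when the rows are inserted.
conn.executemany(
    "INSERT INTO items (name, rnd) VALUES (?, ?)",
    [(f"item {i}", random.random()) for i in range(1000)],
)
print(random_row(conn))
```

As with id-based lookups, the pick is only as uniform as the spacing of the stored random values.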

Answer by Bill Karwin

Here's how I'd do it:

-- FLOOR keeps the offset within 0 .. COUNT(*) - 1
SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*)
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != 'draft'
        AND c.acat_slug != 'vendeglatohely'
        AND a.ac_images != 'b:0;')));

SET @sql := CONCAT('
  SELECT  a.ac_id,
        a.ac_status,
        a.ac_name,
        a.ac_status,
        a.ac_images
  FROM    accomodation a
  JOIN    accomodation_category c
    ON (a.ac_category = c.acat_id)
  WHERE   a.ac_status != ''draft''
        AND c.acat_slug != ''vendeglatohely''
        AND a.ac_images != ''b:0;''
  LIMIT ', @r, ', 1');

PREPARE stmt1 FROM @sql;

EXECUTE stmt1;

Answer by Rick James

(Yeah, I will get dinged for not having enough meat here, but can't you be a vegan for one day?)

Case: Consecutive AUTO_INCREMENT without gaps, 1 row returned
Case: Consecutive AUTO_INCREMENT without gaps, 10 rows
Case: AUTO_INCREMENT with gaps, 1 row returned
Case: Extra FLOAT column for randomizing
Case: UUID or MD5 column

Those 5 cases can be made very efficient for large tables. See my blog for the details.

Answer by Karl Mikko

This gives you a single subquery that uses the index to get a random id; the outer query then fetches the matching row from your joined tables.

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
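        AND accomodation.ac_id IN (
                -- wrapped in a derived table because MySQL rejects LIMIT directly inside an IN subquery
                SELECT ac_id FROM (
                        SELECT accomodation.ac_id FROM accomodation ORDER BY RAND() LIMIT 1
                ) AS random_id
        )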

Answer by tereško

The solution for your dummy-example would be:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation
        JOIN 
            accomodation_category 
            ON accomodation.ac_category = accomodation_category.acat_id
        JOIN 
            ( 
               SELECT CEIL(RAND()*(SELECT MAX(ac_id) FROM accomodation)) AS ac_id
            ) AS Choices 
            USING (ac_id)
WHERE   accomodation.ac_id >= Choices.ac_id 
        AND accomodation.ac_status != 'draft'
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
LIMIT 1

To read more about alternatives to ORDER BY RAND(), you should read this article.

Answer by lawrenceshen

I am optimizing a lot of existing queries in my project. Quassnoi's solution has helped me speed up the queries a lot! However, I find it hard to incorporate the said solution in all queries, especially for complicated queries involving many subqueries on multiple large tables.

So I am using a less optimized solution. Fundamentally it works the same way as Quassnoi's solution.

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / [accomodation_table_row_count]
LIMIT $size

$size * $factor / [accomodation_table_row_count] works out the probability of picking a random row. rand() generates a random number, and the row is selected if rand() is smaller than or equal to the probability. This effectively performs a random selection to limit the table size. Since there is a chance it will return fewer rows than the defined limit, we need to increase the probability to ensure we select enough rows. Hence we multiply $size by a $factor (I usually set $factor = 2, which works in most cases). Finally we apply LIMIT $size.
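
How much headroom $factor buys can be estimated. Assuming each of n candidate rows passes the rand() test independently with probability p = $size * $factor / n, the number of rows that pass follows a binomial distribution. The Python sketch below computes the chance that at least $size rows pass; the function name `prob_enough_rows` and the example numbers are hypothetical.

```python
from math import comb

def prob_enough_rows(n, size, factor):
    """P(at least `size` of `n` rows pass), each passing with p = size*factor/n."""
    p = min(1.0, size * factor / n)
    # P(X >= size) = 1 - P(X <= size - 1) for X ~ Binomial(n, p);
    # k never exceeds n, since more than n rows cannot pass.
    too_few = sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(min(size, n + 1))
    )
    return 1.0 - too_few

for factor in (1, 2, 3):
    print(factor, round(prob_enough_rows(1000, 10, factor), 4))
```

Under this model a larger $factor raises the chance of filling the limit, at the price of letting more rows through the filter than LIMIT will keep.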

The problem now is working out accomodation_table_row_count. If we know the table size, we could hard-code it. This would run the fastest, but obviously it is not ideal. If you are using MyISAM, getting the table count is very efficient. Since I am using InnoDB, I am just doing a simple count + selection. In your case, it would look like this:

SELECT  accomodation.ac_id,
        accomodation.ac_status,
        accomodation.ac_name,
        accomodation.ac_status,
        accomodation.ac_images
FROM    accomodation, accomodation_category
WHERE   accomodation.ac_status != 'draft'
        AND accomodation.ac_category = accomodation_category.acat_id
        AND accomodation_category.acat_slug != 'vendeglatohely'
        AND ac_images != 'b:0;'
        AND rand() <= $size * $factor / (select (SELECT count(*) FROM `accomodation`) * (SELECT count(*) FROM `accomodation_category`))
LIMIT $size

The tricky part is working out the right probability. As you can see, the code above only calculates a rough temp table size (in fact, too rough!): (select (SELECT count(*) FROM accomodation) * (SELECT count(*) FROM accomodation_category)). But you can refine this logic to give a closer approximation of the table size. Note that it is better to over-select than to under-select rows, i.e. if the probability is set too low, you risk not selecting enough rows.

This solution runs slower than Quassnoi's solution since we need to recalculate the table size. However, I find this coding a lot more manageable. This is a trade-off between accuracy + performance vs coding complexity. Having said that, on large tables this is still far faster than ORDER BY RAND().

Note: If the query logic permits, perform the random selection as early as possible before any join operations.

Answer by Rokhayakebe

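// Assumes $db is a PDO connection and `table` stands for your table name
// (neither is specified in the original).
function getRandomRow(PDO $db) {
    $id = rand(0, NUM_OF_ROWS_OR_CLOSE_TO_IT);
    $res = getRowById($db, $id);
    if (!empty($res)) {
        return $res;
    }
    // no row with that id: try again with another random id
    return getRandomRow($db);
}

// rowid is a key on table
function getRowById(PDO $db, $rowid) {
    $stmt = $db->prepare('SELECT * FROM `table` WHERE rowid = ?');
    $stmt->execute([$rowid]);
    return $stmt->fetch(PDO::FETCH_ASSOC);
}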