MySQL:ORDER BY RAND() 的替代方案

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/1823306/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 14:38:27  来源:igfitidea点击:

MySQL: Alternatives to ORDER BY RAND()

sqlmysqlrandomsql-order-by

提问by Tony

I've read about a few alternatives to MySQL's ORDER BY RAND()function, but most of the alternatives apply only to where on a single random result is needed.

我已经阅读了 MySQLORDER BY RAND()函数的一些替代方案,但大多数替代方案仅适用于需要单个随机结果的地方。

Does anyone have any idea how to optimize a query that returns multiple random results, such as this:

有谁知道如何优化返回多个随机结果的查询,例如:

   SELECT u.id, 
          p.photo 
     FROM users u, profiles p 
    WHERE p.memberid = u.id 
      AND p.photo != '' 
      AND (u.ownership=1 OR u.stamp=1) 
 ORDER BY RAND() 
    LIMIT 18 

回答by Neoaptt

UPDATE 2016

2016 年更新

This solution works best using an indexed column.

此解决方案使用索引列效果最佳。

Here is a simple example of and optimized query bench marked with 100,000 rows.

这是一个简单的例子,优化的查询台标有 100,000 行。

OPTIMIZED: 300ms

优化:300ms

SELECT 
    g.*
FROM
    table g
        JOIN
    (SELECT 
        id
    FROM
        table
    WHERE
        RAND() < (SELECT 
                ((4 / COUNT(*)) * 10)
            FROM
                table)
    ORDER BY RAND()
    LIMIT 4) AS z ON z.id= g.id

note about limit ammount: limit 4 and 4/count(*). The 4s need to be the same number. Changing how many you return doesn't effect the speed that much. Benchmark at limit 4 and limit 1000 are the same. Limit 10,000 took it up to 600ms

关于限制数量的注意事项:限制 4 和 4/count(*)。4s 必须是相同的数字。更改返回的数量不会对速度产生太大影响。限制 4 和限制 1000 的基准是相同的。限制 10,000 花了 600 毫秒

note about join: Randomizing just the id is faster than randomizing a whole row. Since it has to copy the entire row into memory then randomize it. The join can be any table that is linked to the subquery Its to prevent tablescans.

关于 join 的注意事项:仅随机化 id 比随机化整行更快。由于它必须将整行复制到内存中,然后对其进行随机化。联接可以是链接到子查询的任何表,以防止表扫描。

note where clause: The where count limits down the ammount of results that are being randomized. It takes a percentage of the results and sorts them rather than the whole table.

注意 where 子句: where 计数限制了被随机化的结果的数量。它需要一定比例的结果并对它们而不是整个表格进行排序。

note sub query: The if doing joins and extra where clause conditions you need to put them both in the subquery and the subsubquery. To have an accurate count and pull back correct data.

注意子查询:如果执行连接和额外的 where 子句条件,您需要将它们同时放在子查询和子子查询中。有一个准确的计数和拉回正确的数据。

UNOPTIMIZED: 1200ms

未优化:1200ms

SELECT 
    g.*
FROM
    table g
ORDER BY RAND()
LIMIT 4

PROS

优点

4x faster than order by rand(). This solution can work with any table with a indexed column.

比 快 4 倍order by rand()。此解决方案适用于任何带有索引列的表。

CONS

缺点

It is a bit complex with complex queries. Need to maintain 2 code bases in the subqueries

复杂的查询有点复杂。需要在子查询中维护 2 个代码库

回答by OMG Ponies

Here's an alternative, but it is still based on using RAND():

这是一个替代方案,但它仍然基于使用 RAND():

  SELECT u.id, 
         p.photo,
         ROUND(RAND() * x.m_id) 'rand_ind'
    FROM users u, 
         profiles p,
         (SELECT MAX(t.id) 'm_id'
            FROM USERS t) x
   WHERE p.memberid = u.id 
     AND p.photo != '' 
     AND (u.ownership=1 OR u.stamp=1) 
ORDER BY rand_ind
   LIMIT 18

This is slightly more complex, but gave a better distribution of random_ind values:

这稍微复杂一些,但提供了更好的 random_ind 值分布:

  SELECT u.id, 
         p.photo,
         FLOOR(1 + RAND() * x.m_id) 'rand_ind'
    FROM users u, 
         profiles p,
         (SELECT MAX(t.id) - 1 'm_id'
            FROM USERS t) x
   WHERE p.memberid = u.id 
     AND p.photo != '' 
     AND (u.ownership=1 OR u.stamp=1) 
ORDER BY rand_ind
   LIMIT 18

回答by Adlaran

It is not the fastest, but faster then common ORDER BY RAND()way:

它不是最快的,但比普通ORDER BY RAND()方法更快:

ORDER BY RAND()is not so slow, when you use it to find only indexed column. You can take all your ids in one query like this:

ORDER BY RAND()不是那么慢,当您使用它来查找仅索引列时。您可以在一个查询中获取所有 ID,如下所示:

SELECT id
FROM testTable
ORDER BY RAND();

to get a sequence of random ids, and JOINthe result to another query with other SELECT or WHERE parameters:

获取一系列随机 ID,并将JOIN结果发送到另一个带有其他 SELECT 或 WHERE 参数的查询:

SELECT t.*
FROM testTable t
JOIN
    (SELECT id
    FROM `testTable`
    ORDER BY RAND()) AS z ON z.id= t.id   
WHERE t.isVisible = 1
LIMIT 100; 

in your case it would be:

在您的情况下,它将是:

SELECT u.id, p.photo 
FROM users u, profiles p 
JOIN
    (SELECT id
    FROM users
    ORDER BY RAND()) AS z ON z.id = u.id   
WHERE p.memberid = u.id 
  AND p.photo != '' 
  AND (u.ownership=1 OR u.stamp=1) 
LIMIT 18 

It's very blunt method and it can be not proper with very big tables, but still it's faster than common RAND(). I got 20 times faster execution time searching 3000 random rows in almost 400000.

这是非常生硬的方法,对于非常大的表可能不合适,但它仍然比 common 快RAND()。我在近 400000 行中搜索 3000 条随机行的执行时间快了 20 倍。

回答by Peth? Jonatán

Create a column or join to a select with random numbers (generated in for example php) and order by this column.

创建一列或加入带有随机数的选择(例如在 php 中生成)并按此列排序。

回答by tonio

Order by rand()is very slow on large tables,

Order by rand()在大桌子上很慢,

I found the following workaround in a php script:

我在 php 脚本中找到了以下解决方法:

Select min(id) as min, max(id) as max from table;

Then do random in php

然后在php中做随机

$rand = rand($min, $max);

Then

然后

'Select * from table where id>'.$rand.' limit 1';

Seems to be quite fast....

好像还挺快的……

回答by Joe T

I ran into this today and was trying to use 'DISTINCT' along with JOINs, but was getting duplicates I assume because the RAND was making each JOINed row distinct. I muddled around a bit and found a solution that works, like this:

我今天遇到了这个问题,并试图将 'DISTINCT' 与 JOIN 一起使用,但我认为是重复的,因为 RAND 使每个 JOINed 行不同。我摸索了一下,找到了一个有效的解决方案,如下所示:

SELECT DISTINCT t.id, 
                t.photo 
       FROM (SELECT  u.id, 
                     p.photo,
                     RAND() as rand
                FROM users u, profiles p 
                 WHERE p.memberid = u.id 
                  AND p.photo != '' 
                  AND (u.ownership=1 OR u.stamp=1)
                ORDER BY rand) t
       LIMIT 18

回答by lawrenceshen

The solution I am using is also posted in the link below: How can i optimize MySQL's ORDER BY RAND() function?

我正在使用的解决方案也发布在下面的链接中: 如何优化 MySQL 的 ORDER BY RAND() 函数?

I am assuming your users table is going to be larger than your profiles table, if not then it's 1 to 1 cardinality.

我假设您的用户表将比您的配置文件表大,如果不是,那么它是 1 比 1 的基数。

If so, I would first do a random selection on user table before joining with profile table.

如果是这样,我会先在用户表上进行随机选择,然后再加入配置文件表。

First do selection:

首先做选择:

SELECT *
FROM users
WHERE users.ownership = 1 OR users.stamp = 1

Then from this pool, pick out random rows through calculated probability. If your table has M rows and you want to pick out N random rows, the probability of random selection should be N/M. Hence:

然后从这个池中,通过计算的概率挑选出随机的行。如果你的表有M行,你想随机挑出N行,随机选择的概率应该是N/M。因此:

SELECT *
FROM
(
    SELECT *
    FROM users
    WHERE users.ownership = 1 OR users.stamp = 1
) as U
WHERE 
    rand() <= $limitCount / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)

Where N is $limitCount and M is the subquery that calculates the table row count. However, since we are working on probability, it is possible to have LESS than $limitCount of rows returned. Therefore we should multiply N by a factor to increase the random pool size.

其中 N 是 $limitCount,M 是计算表行数的子查询。但是,由于我们正在研究概率,因此返回的行数可能少于 $limitCount。因此我们应该将 N 乘以一个因子来增加随机池的大小。

i.e:

IE:

SELECT*
FROM
(
    SELECT *
    FROM users
    WHERE users.ownership = 1 OR users.stamp = 1
) as U
WHERE 
    rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)

I usually set $factor = 2. You can set the factor to a lower value to further reduce the random pool size (e.g. 1.5).

我通常设置 $factor = 2。您可以将因子设置为较低的值以进一步减小随机池大小(例如 1.5)。

At this point, we would have already limited a M size table down to roughly 2N size. From here we can do a JOIN then LIMIT.

在这一点上,我们已经将 M 大小的表限制为大约 2N 大小。从这里我们可以先加入然后限制。

SELECT * 
FROM
(
       SELECT *
        FROM
        (
            SELECT *
            FROM users
            WHERE users.ownership = 1 OR users.stamp = 1
        ) as U
        WHERE 
            rand() <= $limitCount * $factor / (SELECT count(*) FROM users WHERE users.ownership = 1 OR users.stamp = 1)
) as randUser
JOIN profiles
ON randUser.id = profiles.memberid AND profiles.photo != ''
LIMIT $limitCount

On a large table, this query will outperform a normal ORDER by RAND() query.

在大表上,此查询的性能将优于普通 ORDER by RAND() 查询。

Hope this helps!

希望这可以帮助!