MySQL: Quickly selecting a random row from a large MySQL table

Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must also follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/211329/

Date: 2020-08-31 12:14:46  Source: igfitidea

Quick selection of a random row from a large table in MySQL

Tags: mysql, sql, select, random

Question by lajos

What is a fast way to select a random row from a large MySQL table?

I'm working in PHP, but I'm interested in any solution, even if it's in another language.

Answer by Lasse V. Karlsen

Grab all the IDs, pick a random one from them, and retrieve the full row.
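As a rough sketch of this first approach, here it is in Python, with the standard-library `sqlite3` module standing in for MySQL. The table `items(id, name)` and the function name are hypothetical. Reading every ID is only practical while the ID list fits comfortably in memory, but the pick is uniform no matter how many holes the IDs have.

```python
import random
import sqlite3


def random_row_via_ids(conn: sqlite3.Connection):
    """Fetch all IDs, pick one uniformly, then load just that row.

    `items(id, name)` is a hypothetical table used for illustration.
    """
    ids = [row[0] for row in conn.execute("SELECT id FROM items")]
    if not ids:
        return None  # empty table: nothing to pick
    chosen = random.choice(ids)  # every existing ID is equally likely
    return conn.execute(
        "SELECT id, name FROM items WHERE id = ?", (chosen,)
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
    # IDs with holes (3, 4, 6, 7, 8 missing) are fine: we only pick real IDs.
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [(1, "a"), (2, "b"), (5, "e"), (9, "i")],
    )
    print(random_row_via_ids(conn))
```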

If you know the IDs are sequential with no holes, you can just grab the max and calculate a random ID.
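When the IDs really do run 1..MAX(id) with no holes, the random ID can be computed directly. Here is a minimal Python sketch that assumes the IDs start at 1. The function name is hypothetical.

```python
import random


def random_contiguous_id(max_id: int) -> int:
    """Return a uniformly random ID, assuming IDs run 1..max_id with no holes.

    max_id would come from SELECT MAX(id); the row is then fetched with
    WHERE id = <returned value>. (Hypothetical helper for illustration.)
    """
    if max_id < 1:
        raise ValueError("empty table: there is no ID to pick")
    # random.randint includes both endpoints, so 1 and max_id are both possible.
    return random.randint(1, max_id)


if __name__ == "__main__":
    print(random_contiguous_id(1000))
```

Getting the range endpoints right matters here. An expression like `FLOOR(RAND() * N)` yields 0..N-1, not 1..N.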

If there are holes here and there but the values are mostly sequential, and you don't mind slightly skewed randomness, grab the max value, calculate an ID, and select the first row with an ID equal to or above the one you calculated. The skew arises because IDs that directly follow a hole have a higher chance of being picked than IDs that directly follow another ID.
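To make the skew concrete, here is a small Python model. A sorted list stands in for the indexed `id` column, and `bisect` plays the role of `WHERE id >= target ORDER BY id LIMIT 1`. The function names are hypothetical, and the target is assumed to be drawn uniformly from 1..MAX(id).

```python
import bisect
import random
from collections import Counter


def first_id_at_or_above(sorted_ids, target):
    """Model of `SELECT ... WHERE id >= target ORDER BY id LIMIT 1` on an index."""
    i = bisect.bisect_left(sorted_ids, target)
    return sorted_ids[i] if i < len(sorted_ids) else None


def skewed_random_id(sorted_ids):
    """Pick a target uniformly in 1..MAX(id), then take the first ID >= target."""
    target = random.randint(1, sorted_ids[-1])
    return first_id_at_or_above(sorted_ids, target)


if __name__ == "__main__":
    ids = [1, 2, 3, 10]  # IDs 4..9 were deleted
    counts = Counter(skewed_random_id(ids) for _ in range(10_000))
    # ID 10 absorbs the hole: it wins for any target in 4..10 (7 of 10 values),
    # while 1, 2 and 3 each win only for their own value (1 of 10 each).
    print(sorted(counts.items()))
```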

If you order by random (ORDER BY RAND()), you're going to have a terrible table scan on your hands, and the word "quick" doesn't apply to such a solution.

Don't do that, and don't order by a GUID either; it has the same problem.

Answer by Vinko Vrsalovic

I knew there had to be a way to do it in a single query in a fast way. And here it is:

A fast way that doesn't involve any external code; credit to

http://jan.kneschke.de/projects/mysql/order-by-rand/

SELECT name
  FROM random AS r1 JOIN
       (SELECT (RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1;

Answer by CesarB

MediaWiki uses an interesting trick (for Wikipedia's Special:Random feature): the table with the articles has an extra column with a random number (generated when the article is created). To get a random article, generate a random number and get the article with the next larger or smaller (don't recall which) value in the random number column. With an index, this can be very fast. (And MediaWiki is written in PHP and developed for MySQL.)

This approach can cause a problem if the resulting numbers are badly distributed; IIRC, this has been fixed on MediaWiki, so if you decide to do it this way you should take a look at the code to see how it's currently done (probably they periodically regenerate the random number column).
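The random-column idea can be sketched in Python, with a sorted list of keys standing in for the indexed column. The answer does not settle whether MediaWiki takes the next larger or the next smaller value. Choosing "next larger, wrapping around to the smallest" is therefore an assumption of this sketch, as are the class and method names.

```python
import bisect
import random


class RandomKeyIndex:
    """Hypothetical model of a 'random key' column with an index on it.

    Each row gets a random key once, when it is inserted. Picking the next
    larger key (with wrap-around) is an assumption; MediaWiki's actual code
    may differ.
    """

    def __init__(self):
        self._keys = []  # sorted random keys: the stand-in for the index
        self._rows = {}  # key -> row

    def insert(self, row):
        key = random.random()  # generated once, at creation time
        bisect.insort(self._keys, key)
        self._rows[key] = row

    def pick(self, r=None):
        if not self._keys:
            return None
        if r is None:
            r = random.random()
        i = bisect.bisect_left(self._keys, r)  # first key >= r
        if i == len(self._keys):
            i = 0  # no key >= r: wrap around to the smallest key
        return self._rows[self._keys[i]]


if __name__ == "__main__":
    idx = RandomKeyIndex()
    for title in ["Apple", "Banana", "Cherry"]:
        idx.insert(title)
    print(idx.pick())
```

Each row's chance of being picked equals the size of the gap between its key and the previous key. Those gaps are uneven, which is the distribution problem mentioned above and one reason to regenerate the keys from time to time.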

Answer by Bill Karwin

Here's a solution that runs fairly quickly, and it gets a better random distribution without depending on id values being contiguous or starting at 1.

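-- FLOOR (not ROUND) keeps @r in 0 .. COUNT(*)-1, so the offset never points past the last row.
SET @r := (SELECT FLOOR(RAND() * (SELECT COUNT(*) FROM mytable)));
SET @sql := CONCAT('SELECT * FROM mytable LIMIT ', @r, ', 1');
PREPARE stmt1 FROM @sql;
EXECUTE stmt1;
DEALLOCATE PREPARE stmt1;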
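Here is the same offset idea in Python, with `sqlite3` standing in for MySQL and a hypothetical `mytable(id, name)`. `random.randrange(count)` plays the role of `FLOOR(RAND() * COUNT(*))` and always produces an offset from 0 to count-1. The `ORDER BY id` is an addition of this sketch, so that offsets map onto a stable order. The database generally still has to step past `offset` rows, so deep offsets on a very large table are not free.

```python
import random
import sqlite3


def random_row_by_offset(conn: sqlite3.Connection):
    """Count the rows, pick a random 0-based offset, then skip to that row.

    Uniform regardless of ID gaps. `mytable(id, name)` is hypothetical.
    """
    (count,) = conn.execute("SELECT COUNT(*) FROM mytable").fetchone()
    if count == 0:
        return None
    offset = random.randrange(count)  # 0 .. count-1, never past the last row
    return conn.execute(
        "SELECT id, name FROM mytable ORDER BY id LIMIT 1 OFFSET ?", (offset,)
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (?, ?)", [(3, "c"), (7, "g"), (40, "x")]
    )
    print(random_row_by_offset(conn))
```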

Answer by davr

Maybe you could do something like:

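-- The derived table r is computed once, so RAND() yields a single value (in a WHERE clause it would be re-evaluated per row); 1 + ... because IDs are assumed to start at 1.
SELECT t.* FROM `table` AS t
  JOIN (SELECT FLOOR(1 + RAND() *
               (SELECT COUNT(*) FROM `table`)
              ) AS id
       ) AS r ON t.id = r.id;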

This is assuming your ID numbers are all sequential with no gaps.

Answer by Rob

Add a column containing a precomputed random value to each row, index it, and use it in the ordering clause, limiting the selection to one result. This is faster than the table scan that ORDER BY RAND() causes.

Update: Of course, you still need to calculate some random value before issuing the SELECT statement at retrieval time, e.g.

SELECT * FROM `foo` WHERE `foo_rand` >= {some random value} ORDER BY `foo_rand` LIMIT 1
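Here is a runnable sketch of this pattern in Python, using `sqlite3` as a stand-in for MySQL. The table `foo`, its columns, and the function names are hypothetical. In this sketch, `ORDER BY foo_rand` makes `LIMIT 1` return the next value up rather than an arbitrary matching row. The second query is also an addition of this sketch: it handles the case where the random value lands above every stored one.

```python
import random
import sqlite3


def create_foo_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: foo_rand holds a random value fixed at insert time.
    conn.execute(
        "CREATE TABLE foo (id INTEGER PRIMARY KEY, name TEXT, foo_rand REAL)"
    )
    # The index is what lets the lookup below avoid a full scan.
    conn.execute("CREATE INDEX foo_rand_idx ON foo (foo_rand)")


def insert_foo(conn: sqlite3.Connection, name: str) -> None:
    conn.execute(
        "INSERT INTO foo (name, foo_rand) VALUES (?, ?)", (name, random.random())
    )


def random_foo(conn: sqlite3.Connection, r=None):
    """Return (id, name) of the row with the smallest foo_rand >= r."""
    if r is None:
        r = random.random()  # the random value computed before the SELECT
    row = conn.execute(
        "SELECT id, name FROM foo WHERE foo_rand >= ? ORDER BY foo_rand LIMIT 1",
        (r,),
    ).fetchone()
    if row is None:
        # r was larger than every stored value: wrap around to the smallest.
        row = conn.execute(
            "SELECT id, name FROM foo ORDER BY foo_rand LIMIT 1"
        ).fetchone()
    return row


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_foo_table(conn)
    for name in ["alpha", "beta", "gamma"]:
        insert_foo(conn, name)
    print(random_foo(conn))
```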

Answer by Rob

For selecting multiple random rows from a given table (say 'words'), our team came up with this beauty:

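-- Replace n with the number of rows you want. Note: MySQL re-evaluates RAND() in a WHERE clause for every row.
SELECT * FROM
`words` AS r1 JOIN 
(SELECT  MAX(`WordID`) as wid_c FROM `words`) as tmp1
WHERE r1.WordID >= (SELECT (RAND() * tmp1.wid_c) AS id) LIMIT n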

Answer by Ilan Hazan

There is another way to produce random rows using only a query and without ORDER BY RAND(). It involves user-defined variables. See "How to produce random rows from a table".

Answer by parm.95

If you never delete rows from this table, the most efficient way is:

(If you already know the minimum ID, you can skip computing it.)

SELECT MIN(id) AS minId, MAX(id) AS maxId FROM table WHERE 1

$randId=mt_rand((int)$row['minId'], (int)$row['maxId']);

SELECT id,name,... FROM table WHERE id=$randId LIMIT 1

Answer by Sagi Bron

To select random rows from a table, don't use ORDER BY RAND(): it forces MySQL to do a full filesort and only then retrieve the number of rows given by LIMIT. To avoid this full filesort, use the RAND() function only in the WHERE clause; the query then stops as soon as it has reached the required number of rows. See http://www.rndblog.com/how-to-select-random-rows-in-mysql/
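Below is a Python model of the WHERE-clause idea, written under stated assumptions. It does not reproduce the exact query from the linked post, and the oversampling factor and function name are hypothetical. Each row passes the filter with probability p, as `WHERE RAND() < p` would do, and the scan stops once k rows have matched, as `LIMIT k` would.

```python
import random


def sample_by_probability(rows, k, oversample=2.0):
    """Scan rows once, keep each with probability p, stop after k hits.

    Models `SELECT ... WHERE RAND() < p LIMIT k` with p = oversample * k / N.
    The oversample factor (an assumption of this sketch) makes it less likely
    that the scan ends with fewer than k rows. The trade-off: when the limit is
    reached early, rows near the start of the scan order are favoured.
    """
    n = len(rows)
    if n == 0 or k <= 0:
        return []
    p = min(1.0, oversample * k / n)
    picked = []
    for row in rows:
        if random.random() < p:
            picked.append(row)
            if len(picked) == k:
                break  # like LIMIT k: stop scanning once enough rows match
    return picked


if __name__ == "__main__":
    print(sample_by_probability(list(range(1000)), 5))
```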
