MySQL 快速从 600K 行中随机选择 10 行

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/4329396/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 17:52:19  来源:igfitidea点击:

MySQL select 10 random rows from 600K rows fast

mysqlsql

提问by Francisc

How can I best write a query that selects 10 rows randomly from a total of 600k?

我怎样才能最好地编写一个从总共 600k 行中随机选择 10 行的查询?

采纳答案by Riedsio

A great post handling several cases, from simple, to gaps, to non-uniform with gaps.

处理多种情况的出色帖子,从简单到间隙,再到不均匀的间隙。

http://jan.kneschke.de/projects/mysql/order-by-rand/

http://jan.kneschke.de/projects/mysql/order-by-rand/

For most general case, here is how you do it:

对于大多数一般情况,以下是您的操作方法:

SELECT name
  FROM random AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(id)
                        FROM random)) AS id)
        AS r2
 WHERE r1.id >= r2.id
 ORDER BY r1.id ASC
 LIMIT 1

This supposes that the distribution of ids is equal, and that there can be gaps in the id list. See the article for more advanced examples

这假设 id 的分布是相等的,并且 id 列表中可能存在间隙。有关更高级的示例,请参阅文章

回答by Preetam Purbia

SELECT column FROM table
ORDER BY RAND()
LIMIT 10

Not the efficient solution but works

不是有效的解决方案,但有效

回答by Ali

Simple query that has excellent performance and works with gaps:

简单的查询,具有出色的性能和差距工作

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id

This query on a 200K table takes 0.08sand the normal version (SELECT * FROM tbl ORDER BY RAND() LIMIT 10) takes 0.35son my machine.

这个对 200K 表的查询需要0.08 秒,而普通版本 (SELECT * FROM tbl ORDER BY RAND() LIMIT 10)在我的机器上需要0.35 秒

This is fast because the sort phase only uses the indexed ID column. You can see this behaviour in the explain:

这很快,因为排序阶段只使用索引 ID 列。您可以在解释中看到这种行为:

SELECT * FROM tbl ORDER BY RAND() LIMIT 10: Simple Explain

SELECT * FROM tbl ORDER BY RAND() LIMIT 10: 简单解释

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id enter image description here

SELECT * FROM tbl AS t1 JOIN (SELECT id FROM tbl ORDER BY RAND() LIMIT 10) as t2 ON t1.id=t2.id 在此处输入图片说明

Weighted Version: https://stackoverflow.com/a/41577458/893432

加权版本https: //stackoverflow.com/a/41577458/893432

回答by snippetsofcode

I am getting fast queries(around 0.5 seconds) with a slow cpu, selecting 10 random rows in a 400K registers MySQL database non-cached 2Gb size. See here my code: Fast selection of random rows in MySQL

我正在使用慢速 cpu 进行快速查询(大约 0.5 秒),在 400K 寄存器 MySQL 数据库非缓存 2Gb 大小中选择 10 个随机行。在这里查看我的代码:快速选择 MySQL 中的随机行

<?php
$time= microtime_float();

$sql='SELECT COUNT(*) FROM pages';
$rquery= BD_Ejecutar($sql);
list($num_records)=mysql_fetch_row($rquery);
mysql_free_result($rquery);

$sql="SELECT id FROM pages WHERE RAND()*$num_records<20
   ORDER BY RAND() LIMIT 0,10";
$rquery= BD_Ejecutar($sql);
while(list($id)=mysql_fetch_row($rquery)){
    if($id_in) $id_in.=",$id";
    else $id_in="$id";
}
mysql_free_result($rquery);

$sql="SELECT id,url FROM pages WHERE id IN($id_in)";
$rquery= BD_Ejecutar($sql);
while(list($id,$url)=mysql_fetch_row($rquery)){
    logger("$id, $url",1);
}
mysql_free_result($rquery);

$time= microtime_float()-$time;

logger("num_records=$num_records",1);
logger("$id_in",1);
logger("Time elapsed: <b>$time segundos</b>",1);
?>

回答by Muhammad Azeem

Its very simple and single line query.

它非常简单和单行查询。

SELECT * FROM Table_Name ORDER BY RAND() LIMIT 0,10;

回答by zloctb

From book :

从书:

Choose a Random Row Using an Offset

使用偏移量选择随机行

Still another technique that avoids problems found in the preceding alternatives is to count the rows in the data set and return a random number between 0 and the count. Then use this number as an offset when querying the data set

避免上述替代方案中发现的问题的另一种技术是对数据集中的行进行计数并返回一个介于 0 和计数之间的随机数。然后在查询数据集时使用这个数字作为偏移量

<?php
$rand = "SELECT ROUND(RAND() * (SELECT COUNT(*) FROM Bugs))";
$offset = $pdo->query($rand)->fetch(PDO::FETCH_ASSOC);
$sql = "SELECT * FROM Bugs LIMIT 1 OFFSET :offset";
$stmt = $pdo->prepare($sql);
$stmt->execute( $offset );
$rand_bug = $stmt->fetch();

Use this solution when you can't assume contiguous key values and you need to make sure each row has an even chance of being selected.

当您不能假设连续的键值并且需要确保每一行都有均匀的被选中机会时,请使用此解决方案。

回答by The Surrican

Well if you have no gaps in your keys and they are all numeric you can calculate random numbers and select those lines. but this will probably not be the case.

好吧,如果您的键中没有间隙并且它们都是数字,您可以计算随机数并选择这些行。但情况可能并非如此。

So one solution would be the following:

因此,一种解决方案如下:

SELECT * FROM table WHERE key >= FLOOR(RAND()*MAX(id)) LIMIT 1

which will basically ensure that you get a random number in the range of your keys and then you select the next best which is greater. you have to do this 10 times.

这将基本上确保您在密钥范围内获得一个随机数,然后您选择下一个更大的最佳值。你必须这样做10次。

however this is NOT really random because your keys will most likely not be distributed evenly.

但这并不是真正随机的,因为您的密钥很可能不会均匀分布。

It's really a big problem and not easy to solve fulfilling all the requirements, MySQL's rand() is the best you can get if you really want 10 random rows.

这真的是一个大问题,要解决满足所有要求并不容易,如果你真的想要 10 个随机行,MySQL 的 rand() 是你能得到的最好的。

There is however another solution which is fast but also has a trade off when it comes to randomness, but may suit you better. Read about it here: How can i optimize MySQL's ORDER BY RAND() function?

然而,还有另一种解决方案,它速度很快,但在随机性方面也需要权衡,但可能更适合您。在这里阅读:如何优化 MySQL 的 ORDER BY RAND() 函数?

Question is how random do you need it to be.

问题是你需要它有多随机。

Can you explain a bit more so I can give you a good solution.

你能解释一下吗,这样我就可以给你一个很好的解决方案。

For example a company I worked with had a solution where they needed absolute randomness extremely fast. They ended up with pre-populating the database with random values that were selected descending and set to different random values afterwards again.

例如,我合作过的一家公司有一个解决方案,他们需要极快的绝对随机性。他们最终使用随机值预先填充数据库,这些随机值是降序选择的,然后再次设置为不同的随机值。

If you hardly ever update you could also fill an incrementing id so you have no gaps and just can calculate random keys before selecting... It depends on the use case!

如果你几乎不更新,你也可以填充一个递增的 id,这样你就没有间隙,并且可以在选择之前计算随机键......这取决于用例!

回答by user1931858

How to select random rows from a table:

如何从表中选择随机行:

From here: Select random rows in MySQL

从这里: 在 MySQL 中选择随机行

A quick improvement over "table scan" is to use the index to pick up random ids.

对“表扫描”的快速改进是使用索引来获取随机 ID。

SELECT *
FROM random, (
        SELECT id AS sid
        FROM random
        ORDER BY RAND( )
        LIMIT 10
    ) tmp
WHERE random.id = tmp.sid;

回答by bogdan

I used this http://jan.kneschke.de/projects/mysql/order-by-rand/posted by Riedsio (i used the case of a stored procedure that returns one or more random values):

我使用了 Riedsio 发布的这个http://jan.kneschke.de/projects/mysql/order-by-rand/(我使用了返回一个或多个随机值的存储过程的情况):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        INSERT INTO rands
           SELECT r1.id
             FROM random AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT MAX(id)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.id >= r2.id
            ORDER BY r1.id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article he solves the problem of gapsin ids causing not so random resultsby maintaining a table (using triggers, etc...see the article); I'm solving the problem by adding another column to the table, populated with contiguous numbers, starting from 1 (edit:this column is added to the temporary table created by the subquery at runtime, doesn't affect your permanent table):

在文章中,他通过维护一个表(使用触发器等...参见文章)解决了ids 中的差距导致不那么随机的结果问题;我通过向表中添加另一列来解决这个问题,用连续数字填充,从 1 开始(编辑:此列在运行时添加到子查询创建的临时表中,不会影响您的永久表):

   DROP TEMPORARY TABLE IF EXISTS rands;
   CREATE TEMPORARY TABLE rands ( rand_id INT );

    loop_me: LOOP
        IF cnt < 1 THEN
          LEAVE loop_me;
        END IF;

        SET @no_gaps_id := 0;

        INSERT INTO rands
           SELECT r1.id
             FROM (SELECT id, @no_gaps_id := @no_gaps_id + 1 AS no_gaps_id FROM random) AS r1 JOIN
                  (SELECT (RAND() *
                                (SELECT COUNT(*)
                                   FROM random)) AS id)
                   AS r2
            WHERE r1.no_gaps_id >= r2.id
            ORDER BY r1.no_gaps_id ASC
            LIMIT 1;

        SET cnt = cnt - 1;
      END LOOP loop_me;

In the article i can see he went to great lengths to optimize the code; i have no ideea if/how much my changes impact the performance but works very well for me.

在文章中我可以看到他不遗余力地优化代码;我不知道我的更改是否/有多少影响性能,但对我来说效果很好。

回答by user2406626

I needed a query to return a large number of random rows from a rather large table. This is what I came up with. First get the maximum record id:

我需要一个查询来从一个相当大的表中返回大量随机行。这就是我想出的。首先获取最大记录ID:

SELECT MAX(id) FROM table_name;

Then substitute that value into:

然后将该值替换为:

SELECT * FROM table_name WHERE id > FLOOR(RAND() * max) LIMIT n;

Where max is the maximum record id in the table and n is the number of rows you want in your result set. The assumption is that there are no gaps in the record id's although I doubt it would affect the result if there were (haven't tried it though). I also created this stored procedure to be more generic; pass in the table name and number of rows to be returned. I'm running MySQL 5.5.38 on Windows 2008, 32GB, dual 3GHz E5450, and on a table with 17,361,264 rows it's fairly consistent at ~.03 sec / ~11 sec to return 1,000,000 rows. (times are from MySQL Workbench 6.1; you could also use CEIL instead of FLOOR in the 2nd select statement depending on your preference)

其中 max 是表中的最大记录 ID,n 是您想要的结果集中的行数。假设是记录 id 中没有间隙,尽管我怀疑它是否会影响结果(虽然还没有尝试过)。我还创建了这个更通用的存储过程;传入表名和要返回的行数。我在 Windows 2008、32GB、双 3GHz E5450 上运行 MySQL 5.5.38,在一个有 17,361,264 行的表上,它在 ~.03 秒/~11 秒相当一致,返回 1,000,000 行。(时间来自 MySQL Workbench 6.1;您也可以根据自己的喜好在第二个选择语句中使用 CEIL 而不是 FLOOR)

DELIMITER $$

USE [schema name] $$

DROP PROCEDURE IF EXISTS `random_rows` $$

CREATE PROCEDURE `random_rows`(IN tab_name VARCHAR(64), IN num_rows INT)
BEGIN

SET @t = CONCAT('SET @max=(SELECT MAX(id) FROM ',tab_name,')');
PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

SET @t = CONCAT(
    'SELECT * FROM ',
    tab_name,
    ' WHERE id>FLOOR(RAND()*@max) LIMIT ',
    num_rows);

PREPARE stmt FROM @t;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
END
$$

then

然后

CALL [schema name].random_rows([table name], n);