SQL 如何使用纯SQL选择N个随机行?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/396943/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
How to select N random rows using pure SQL?
提问by Gili
How do we combine How to request a random row in SQL?and Multiple random values in SQL Server 2005to select N random rows using a single pure-SQL query? Ideally, I'd like to avoid the use of stored procedures if possible. Is this even possible?
我们如何结合SQL中如何请求随机行?和SQL Server 2005 中的多个随机值以使用单个纯 SQL 查询选择 N 个随机行?理想情况下,如果可能,我想避免使用存储过程。这甚至可能吗?
CLARIFICATIONS:
说明:
- Pure SQL refers to as close as possible to the ANSI/ISO standard.
- The solution should be "efficient enough". Granted ORDER BY RAND() might work but as others have pointed out this isn't feasible for medium-sized tables.
- 纯 SQL 是指尽可能接近 ANSI/ISO 标准。
- 解决方案应该“足够有效”。授予 ORDER BY RAND() 可能会起作用,但正如其他人指出的那样,这对于中型表是不可行的。
采纳答案by user12861
I don't know about pure ANSI, and it's not simple, but you can check out my answer to a similar question here: Simple Random Samples from a Sql database
我不了解纯 ANSI,这并不简单,但您可以在此处查看我对类似问题的回答: 来自 Sql 数据库的简单随机样本
回答by Lasse V. Karlsen
The answer to your question is in the second link there:
您的问题的答案在第二个链接中:
SELECT * FROM table ORDER BY RAND() LIMIT 1
Just change the limit, and/or rewrite for SQL Server:
只需更改限制,和/或重写 SQL Server:
SELECT TOP 1 * FROM table ORDER BY newid()
Now, this strictlyanswers your question, but you really shouldn't be using this solution. Just try it on a large table and you'll see what I mean.
现在,这严格回答了您的问题,但您确实不应该使用此解决方案。只要在一张大桌子上试一试,你就会明白我的意思。
If your key-space is sequential, either without holes, or with very few holes, and if it has very few holes, you're not too concerned that some rows have a slightly higher chance of being picked than others, then you can use a variation where you calculate which key you want to retrieve randomly, ranging from 1 to the highest key in your table, and then retrieve the first row that has a key equal to or higher than the number you calculated. You only need the "higher than" part if your key-space has holes.
如果您的键空间是连续的,要么没有孔,要么孔很少,如果孔很少,您就不会太担心某些行被选中的机会比其他行略高,那么您可以使用一种变体,您计算要随机检索的键,范围从 1 到表中的最高键,然后检索键等于或大于您计算的数字的第一行。如果您的键空间有孔,您只需要“高于”部分。
This SQL is left as an excercise for the reader.
此 SQL 留给读者作为练习。
Edit: Note, a comment to another answer here mentions that perhaps pure SQLmeans ANSI standard SQL. If that is the case, then there is no way, since there is no standardized random function, nor does every database engine treat the random number function the same way. At least one engine I've seen "optimizes" the call by calling it once and just repeating the calculated value for all rows.
编辑:请注意,此处对另一个答案的评论提到可能纯 SQL意味着 ANSI 标准 SQL。如果是这样,那就没有办法了,因为没有标准化的随机函数,也不是每个数据库引擎都以相同的方式对待随机数函数。我见过的至少一个引擎通过调用一次并重复所有行的计算值来“优化”调用。
回答by mtillberg
Here's a potential solution, that would let you balance the risk of getting less than N rows against a sampling bias from the "front" of the table. This assumes that N is small compared to the size of the table:
这是一个潜在的解决方案,它可以让您平衡少于 N 行的风险与来自表“前面”的抽样偏差。这假设 N 与表的大小相比较小:
select * from table where random() < (N / (select count(1) from table)) limit N;
This will generally sample most of the table, but can return less than N rows. If having some bias is acceptable, the numerator can be changed from N to 1.5*N or 2*N to make it very likely that N rows will be returned. Additionally, if it's necessary to randomize the row order, not just select a random subset:
这通常会对大部分表进行采样,但可以返回少于 N 行。如果存在一些偏差是可以接受的,则可以将分子从 N 更改为 1.5*N 或 2*N,以使其很有可能返回 N 行。此外,如果需要随机化行顺序,而不仅仅是选择一个随机子集:
select * from (select * from table
where random() < (N / (select count(1) from table)) limit N)
order by mod(tableid,1111);
The downside of this solution is that, at least in PostgreSQL, it uses a sequential scan of the table. A larger numerator will speed up the query.
该解决方案的缺点是,至少在 PostgreSQL 中,它使用表的顺序扫描。较大的分子将加快查询速度。
回答by dipi evil
That's may help you:
这可能会帮助你:
SELECT TOP 3 * FROM TABLE ORDER BY NEWID()
回答by Mrityunjay Malliya
Using below code you can achieve the same you are looking for..
使用下面的代码,您可以实现您正在寻找的相同内容..
select top 1 * from student1 order by newid()
change value of N where top 1 so you will receive that number of random records.
更改 N 的值,其中 top 1 这样您将收到该数量的随机记录。