SQL 从查询结果中选择随机结果样本

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/733652/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-01 01:39:32  来源:igfitidea点击:

Select a random sample of results from a query result

sqloraclerandom-sample

提问by Jeremy French

This questionasks about getting a random(ish) sample of records on SQL Server and the answer was to use TABLESAMPLE. Is there an equivalent in Oracle 10?

这个问题询问在 SQL Server 上获取随机(ish)记录样本,答案是使用TABLESAMPLE. Oracle 10 中是否有等价物?

If there isn't, is there a standard way to get a random sample of results from a query set? For example how can one get 1,000 random rows from a query that will return millions normally?

如果没有,是否有一种标准方法可以从查询集中获取随机结果样本?例如,如何从正常返回数百万的查询中获取 1,000 条随机行?

回答by Quassnoi

SELECT  *
FROM    (
        SELECT  *
        FROM    mytable
        ORDER BY
                dbms_random.value
        )
WHERE rownum <= 1000

回答by grokster

The SAMPLE clausewill give you a random sample percentage of all rows in a table.

样本条款会给你表中的所有行的随机抽样百分比。

For example, here we obtain 25% of the rows:

例如,这里我们获得了 25% 的行:

SELECT * FROM emp SAMPLE(25)

The following SQL (using one of the analytical functions) will give you a random sample of a specific number of each occurrence of a particular value (similar to a GROUP BY) in a table.

以下 SQL(使用其中一个分析函数)将为您提供表中每次出现特定值(类似于 GROUP BY)的特定数量的随机样本。

Here we sample 10 of each:

在这里,我们对每个样本进行了 10 次采样:

SELECT * FROM (
SELECT job, sal, ROW_NUMBER()
OVER (
PARTITION BY job ORDER BY job
) SampleCount FROM emp
)
WHERE SampleCount <= 10

回答by Arturo Hernandez

This in not a perfect answer but will get much better performance.

这不是一个完美的答案,但会获得更好的性能。

SELECT  *
FROM    (
    SELECT  *
    FROM    mytable sample (0.01)
    ORDER BY
            dbms_random.value
    )
WHERE rownum <= 1000

Sample will give you a percent of your actual table, if you really wanted a 1000 rows you would need to adjust that number. More often I just need an arbitrary number of rows anyway so I don't limit my results. On my database with 2 million rows I get 2 seconds vs 60 seconds.

Sample 将为您提供实际表的百分比,如果您真的想要 1000 行,则需要调整该数字。更多时候我只需要任意数量的行,所以我不会限制我的结果。在我有 200 万行的数据库上,我得到 2 秒 vs 60 秒。

select * from mytable sample (0.01)

回答by Hors2force

SELECT * FROM TABLE_NAME SAMPLE(1)

Will give you olny an approximate 1% share rather than exactly 1/100 of the number of observations. The likely reason is than Oracle generates a random flag for each observation as to whether include in in the sample that it generates. The argument 1 (1%) in such a generation process takes the role of probability of each observation's being selected into the sample.

会给你 olny 大约 1% 的份额,而不是观察数量的 1/100。可能的原因是 Oracle 为每个观察生成一个随机标志,以确定是否包含在它生成的样本中。在这样的生成过程中,参数 1 (1%) 扮演了每个观察被选入样本的概率的角色。

If this is true, the actual distribution of sample sizes will be binomial.

如果这是真的,样本大小的实际分布将是二项式的。

回答by Thomas Tschernich

I know this has already been answered, but seeing so many visits here I'd like to add one version that uses the SAMPLE clause but still allows to filter the rows first:

我知道这已经得到了回答,但是在这里看到如此多的访问,我想添加一个使用 SAMPLE 子句但仍允许首先过滤行的版本:

with cte1 as (
    select *
    from t_your_table
    where your_column = 'ABC'
)
select * from cte1 sample (5)

Note however that the base select needs a ROWIDcolumn, which means it may not work for some views for example.

但是请注意,基本选择需要一ROWID列,这意味着它可能不适用于某些视图。

回答by Ankit Bajpai

Sample function is used for sample data in ORACLE. So you can try like this:-

Sample 函数用于ORACLE 中的样本数据。所以你可以这样尝试:-

SELECT * FROM TABLE_NAME SAMPLE(50);

Here 50 is the percentage of data contained by the table. So if you want 1000 rows from 100000. You can execute a query like:

这里的 50 是表中包含的数据的百分比。因此,如果您想要 100000 行中的 1000 行。您可以执行如下查询:

SELECT * FROM TABLE_NAME SAMPLE(1);

Hope this can help you.

希望这可以帮到你。

回答by ChrisNZak

We were given and assignment to select only two records from the list of agents..i.e 2 random records for each agent over the span of a week etc.... and below is what we got and it works

我们被要求只从代理列表中选择两条记录......即在一周内每个代理的 2 个随机记录等等......下面是我们得到的并且它有效

with summary as (
Select Dbms_Random.Random As Ran_Number,
             colmn1,
             colm2,
             colm3
             Row_Number() Over(Partition By col2 Order By Dbms_Random.Random) As Rank
    From table1, table2
 Where Table1.Id = Table2.Id
 Order By Dbms_Random.Random Asc)
Select tab1.col2,
             tab1.col4,
             tab1.col5,
    From Summary s
 Where s.Rank <= 2;

回答by BASMA SHAWKY

Something like this should work:

这样的事情应该工作:

SELECT * 
FROM table_name
WHERE primary_key IN (SELECT primary_key 
                      FROM
                      (
                        SELECT primary_key, SYS.DBMS_RANDOM.RANDOM 
                        FROM table_name 
                        ORDER BY 2
                      )
                      WHERE rownum <= 10 );