How to use SELECT DISTINCT with RANDOM() function in PostgreSQL?
Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original: http://stackoverflow.com/questions/11401229/
Asked by Marcio Mazzucato
I am trying to run a SQL query to get four random items. As the table product_filter has more than one tuple per product, I have to use DISTINCT in the SELECT, so I get this error:
for SELECT DISTINCT, ORDER BY expressions must appear in select list
But if I put RANDOM() in my SELECT list, it defeats the DISTINCT, because every row then carries a different random value.
Does anyone know how to use DISTINCT with the RANDOM() function? Below is my problematic query.
SELECT DISTINCT
p.id,
p.title
FROM
product_filter pf
JOIN product p ON pf.cod_product = p.cod
JOIN filters f ON pf.cod_filter = f.cod
WHERE
p.visible = TRUE
LIMIT 4
ORDER BY RANDOM();
Accepted answer by Erwin Brandstetter
You can simplify your query to avoid the problem a priori:
SELECT p.cod, p.title
FROM   product p
WHERE  p.visible
AND    EXISTS (
   SELECT 1
   FROM   product_filter pf
   JOIN   filters f ON f.cod = pf.cod_filter
   WHERE  pf.cod_product = p.cod
   )
ORDER  BY random()
LIMIT  4;
Major points:
You have only columns from table product in the result; other tables are only checked for the existence of a matching row. For a case like this, the EXISTS semi-join is likely the fastest and simplest solution. Using it does not multiply rows from the base table product, so you don't need to remove them again with DISTINCT.
LIMIT has to come last, after ORDER BY.
I simplified WHERE p.visible = 't' to p.visible, because this should be a boolean column.
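To make the first point concrete, here is a minimal sketch that can be pasted into a scratch PostgreSQL session. The table and column names come from the question; the column types and the sample rows are assumptions made up for illustration.

```sql
-- Hypothetical schema and sample data (types and rows are assumed, not from the question).
CREATE TEMP TABLE product (cod int PRIMARY KEY, title text, visible boolean);
CREATE TEMP TABLE filters (cod int PRIMARY KEY);
CREATE TEMP TABLE product_filter (cod_product int, cod_filter int);

INSERT INTO product VALUES (1, 'A', true), (2, 'B', true), (3, 'C', false);
INSERT INTO filters VALUES (10), (20);
INSERT INTO product_filter VALUES (1, 10), (1, 20), (2, 10);

-- Plain join: product 1 matches two filters, so it is returned twice (three rows in total).
SELECT p.cod, p.title
FROM   product_filter pf
JOIN   product p ON pf.cod_product = p.cod
JOIN   filters f ON pf.cod_filter = f.cod
WHERE  p.visible;

-- EXISTS semi-join: each visible product with at least one filter is returned once (two rows),
-- so ORDER BY random() works without DISTINCT.
SELECT p.cod, p.title
FROM   product p
WHERE  p.visible
AND    EXISTS (
   SELECT 1
   FROM   product_filter pf
   JOIN   filters f ON f.cod = pf.cod_filter
   WHERE  pf.cod_product = p.cod
   )
ORDER  BY random()
LIMIT  4;
```

With only two qualifying products, LIMIT 4 simply returns both, in random order.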
Answer by LSerni
You can either use a subquery:
SELECT * FROM (
    SELECT DISTINCT p.cod, p.title ... JOIN... WHERE
) AS t ORDER BY RANDOM() LIMIT 4;
or try grouping by those same fields:
SELECT p.cod, p.title, MIN(RANDOM()) AS o FROM ... JOIN ...
WHERE ... GROUP BY p.cod, p.title ORDER BY o LIMIT 4;
Which of the two expressions will evaluate faster depends on table structure and indexing. With proper indexing on cod and title, the subquery version should run faster: cod and title can be read from the index, and cod is the only key needed for the JOIN. So if you index by title, cod and visible (used in the WHERE), it is likely that the physical table will not be accessed at all.
I am not so sure whether this would happen with the second expression too.
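Since the outcome depends on the actual tables, one way to compare the two forms is to look at their plans with EXPLAIN. The following is only a sketch: the column types and the index name product_title_cod_visible_idx are assumptions, and the filters join is left out for brevity. On empty temp tables the plans say little, so run the two EXPLAIN statements against real data.

```sql
-- Hypothetical schema; only the table and column names come from the question.
CREATE TEMP TABLE product (cod int PRIMARY KEY, title text, visible boolean);
CREATE TEMP TABLE product_filter (cod_product int, cod_filter int);

-- The index suggested above, covering title, cod and visible (index name is made up).
CREATE INDEX product_title_cod_visible_idx ON product (title, cod, visible);

-- Plan for the subquery form.
EXPLAIN
SELECT *
FROM (
    SELECT DISTINCT p.cod, p.title
    FROM   product_filter pf
    JOIN   product p ON pf.cod_product = p.cod
    WHERE  p.visible
) AS t
ORDER BY random()
LIMIT 4;

-- Plan for the GROUP BY form.
EXPLAIN
SELECT p.cod, p.title, min(random()) AS o
FROM   product_filter pf
JOIN   product p ON pf.cod_product = p.cod
WHERE  p.visible
GROUP  BY p.cod, p.title
ORDER  BY o
LIMIT  4;
```

Whether the planner actually uses the index, or avoids touching the table, depends on table size, statistics and PostgreSQL version, so treat the plans as evidence rather than a guarantee.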
Answer by embulldogs99
Use a subquery. Don't forget the table alias, t. LIMIT comes after ORDER BY.
SELECT *
FROM (SELECT DISTINCT a, b, c
FROM datatable WHERE a = 'hello'
) t
ORDER BY random()
LIMIT 10;
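The same pattern can be tried without any tables. The sketch below uses a made-up VALUES list in place of real data, first showing why random() in the select list does not help, then the subquery form.

```sql
-- random() in the select list makes every row different,
-- so DISTINCT in practice keeps all five rows, duplicates included.
SELECT DISTINCT x, random() AS r
FROM (VALUES (1), (1), (2), (3), (3)) AS v(x);

-- Deduplicate first in a subquery, then shuffle and limit the distinct rows.
SELECT *
FROM (
    SELECT DISTINCT x
    FROM (VALUES (1), (1), (2), (3), (3)) AS v(x)
) AS t
ORDER BY random()
LIMIT 2;
```

The second query returns two of the three distinct values 1, 2 and 3; which two varies from run to run.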
Answer by Gordon Linoff
I think you need a subquery:
select *
from (select DISTINCT p.cod, p.title
      from product_filter pf join
           product p
           on pf.cod_product = p.cod
      where p.visible = 't'
     ) t
order by RANDOM()
LIMIT 4
Calculate the distinct values first, and then do the limit.
Do note, this does have performance implications, because this query does a distinct on everything before selecting what you want. Whether this matters depends on the size of your table and how you are using the query.
Answer by Holger Brandt
SELECT DISTINCT U.* FROM
(
    SELECT p.cod, p.title FROM product_filter pf
    JOIN product p on pf.cod_product = p.cod
    JOIN filters f on pf.cod_filter = f.cod
    WHERE p.visible = 't'
    ORDER BY RANDOM()
) AS U
LIMIT 4
This does the RANDOM first then the LIMIT afterwards.