Postgres: Distinct but only for one column

Question

Asked by NovumCoder

I have a table in pgsql with names (more than 1 million rows), but I also have many duplicates. I select 3 fields: id, name, metadata.

I want to select them randomly with ORDER BY RANDOM() and LIMIT 1000, so I do this in many steps to save some memory in my PHP script.

But how can I do that so that it only gives me a list with no duplicate names?

For example, [1,"Michael Fox","2003-03-03,34,M,4545"] should be returned, but not also [2,"Michael Fox","1989-02-23,M,5633"]. The name field is the most important: it must be unique in the list every time I do the select, and the selection must be random.

I tried GROUP BY name, but then it expects me to have id and metadata in the GROUP BY as well, or in an aggregate function, and I don't want them filtered in any way.

Does anyone know how to fetch many columns but do a distinct on only one column?

Answer 1

Answered by Clodoaldo Neto

To do a distinct on only one (or n) column(s):

select distinct on (name)
    name, col1, col2
from names

This returns one row per name, but which of that name's rows you get is not defined. If you want to control which of the rows will be returned, you need to order:

select distinct on (name)
    name, col1, col2
from names
order by name, col1

For each name, this returns the first row when ordered by col1.
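
To make the semantics concrete, here is a small Python model of what DISTINCT ON (name) ... ORDER BY name, col1 does. It sorts by the DISTINCT ON expression first, then by the remaining ORDER BY keys, and keeps the first row of each run of equal names. This is only an illustration, not how PostgreSQL implements it. The function name distinct_on is hypothetical. The first two sample rows come from the question, and the "Jane Doe" row is made up.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order_by):
    """Keep the first row per key after sorting by (key, order_by).

    Hypothetical helper modelling:
    SELECT DISTINCT ON (key) ... ORDER BY key, order_by
    """
    # The DISTINCT ON expression is the leftmost sort key, so rows
    # sharing a key end up next to each other.
    ordered = sorted(rows, key=lambda r: (key(r), order_by(r)))
    # groupby() splits the sorted rows into runs of equal keys;
    # next(group) takes the first row of each run.
    return [next(group) for _, group in groupby(ordered, key=key)]


# Rows 1 and 2 are from the question; row 3 is a hypothetical extra.
rows = [
    (1, "Michael Fox", "2003-03-03,34,M,4545"),
    (2, "Michael Fox", "1989-02-23,M,5633"),
    (3, "Jane Doe", "1970-01-01,F,1234"),
]

# Like: SELECT DISTINCT ON (name) id, name, metadata FROM names ORDER BY name, id
print(distinct_on(rows, key=itemgetter(1), order_by=itemgetter(0)))
# → [(3, 'Jane Doe', '1970-01-01,F,1234'), (1, 'Michael Fox', '2003-03-03,34,M,4545')]
```

Passing order_by=lambda r: -r[0] instead mimics ORDER BY name, id DESC. That keeps id 2 for "Michael Fox", which shows how the extra ORDER BY expressions decide which row of each group survives.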

distinct on:

SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the “first row” of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
The DISTINCT ON expression(s) must match the leftmost ORDER BY expression(s). The ORDER BY clause will normally contain additional expression(s) that determine the desired precedence of rows within each DISTINCT ON group.

Answer 2

Answered by Craig Ringer

Does anyone know how to fetch many columns but do a distinct on only one column?

You want the DISTINCT ON clause.

You didn't provide sample data or a complete query so I don't have anything to show you. You want to write something like:

SELECT DISTINCT ON (name) id, name, metadata FROM the_table;

This will return an unpredictable (but not "random") set of rows. If you want to make it predictable, add an ORDER BY as in Clodoaldo's answer. If you want to make it truly random, you'll want ORDER BY name, random(), because the DISTINCT ON expression has to stay the leftmost ORDER BY expression.
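
The original goal is a random sample of up to 1000 rows with unique names. That needs two levels of randomness: a random row per name, then a random choice of names. If you put LIMIT 1000 directly on a DISTINCT ON query ordered by name, random(), you get the alphabetically first 1000 names. The sampling therefore has to happen in an outer query. In PostgreSQL, one way to write it is:

SELECT * FROM (SELECT DISTINCT ON (name) id, name, metadata FROM names ORDER BY name, random()) AS t ORDER BY random() LIMIT 1000

The sketch below uses Python's sqlite3 module as a runnable stand-in. SQLite has no DISTINCT ON, so the sketch substitutes the ROW_NUMBER() window function, which needs SQLite 3.25 or later. The query text itself should also run on PostgreSQL. The table name names and its rows are hypothetical.

```python
import sqlite3

# Hypothetical in-memory stand-in for the asker's table; the columns
# (id, name, metadata) come from the question, the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO names (id, name, metadata) VALUES (?, ?, ?)",
    [
        (1, "Michael Fox", "2003-03-03,34,M,4545"),
        (2, "Michael Fox", "1989-02-23,M,5633"),
        (3, "Jane Doe", "1970-01-01,F,1234"),
        (4, "Jane Doe", "1985-05-05,F,9999"),
        (5, "John Roe", "1990-09-09,M,4321"),
    ],
)

# Inner query: number each name's rows in random order, so rn = 1 is a
# random row per name. Outer query: keep those rows, shuffle the names,
# and take at most 1000 of them.
RANDOM_UNIQUE_NAMES = """
    SELECT id, name, metadata
    FROM (
        SELECT id, name, metadata,
               ROW_NUMBER() OVER (PARTITION BY name ORDER BY random()) AS rn
        FROM names
    ) AS t
    WHERE rn = 1
    ORDER BY random()
    LIMIT 1000
"""

sample = conn.execute(RANDOM_UNIQUE_NAMES).fetchall()
for row in sample:
    print(row)
```

Each run prints one row for each of the three names, in a different order and possibly with a different id for the duplicated names.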

Answer 3

Answered by David Jashi

SELECT NAME,MAX(ID) as ID,MAX(METADATA) as METADATA 
from SOMETABLE
GROUP BY NAME
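
This approach has a caveat. Each MAX() is computed independently within the group, so ID and METADATA can come from different rows. That is the kind of filtering the question wanted to avoid. The sketch below reproduces the problem with the two "Michael Fox" rows from the question. It uses Python's sqlite3 module as a stand-in for PostgreSQL. For these particular values, MAX should behave the same way there: numerically for id and as text for metadata. The table name sometable is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (id INTEGER, name TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?, ?)",
    [
        (1, "Michael Fox", "2003-03-03,34,M,4545"),
        (2, "Michael Fox", "1989-02-23,M,5633"),
    ],
)

# Each MAX() picks its own maximum within the group, so id and metadata
# may be taken from different source rows.
row = conn.execute(
    "SELECT name, MAX(id) AS id, MAX(metadata) AS metadata "
    "FROM sometable GROUP BY name"
).fetchone()
print(row)
# → ('Michael Fox', 2, '2003-03-03,34,M,4545')
```

The result pairs id 2 with the metadata of id 1, a combination that does not exist in the table. DISTINCT ON, by contrast, keeps each returned row intact.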
