Original question: http://stackoverflow.com/questions/36524450/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
How can I SUM distinct records in a Postgres database where there are duplicate records?
Asked by Katie F
Imagine a table that looks like this:
The SQL to get this data was just SELECT *. The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).
I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the second entry for an order even though the order ID is the same. That makes my numbers larger than if I select distinct(id), total, export to Excel, and then sum the values manually.
So my question is: how can I SUM over just the distinct order IDs so that I get the same revenue as if I had exported every distinct order ID row to Excel?
Thanks in advance!
Answered by Bohemian
Easy - just divide by the count:
select id, sum(total) / count(id)
from orders
group by id
This also handles any level of duplication (e.g. triplicates), provided every duplicate row of an order carries the same total.
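To check the divide-by-count idea on the rows from the question, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for Postgres (my assumption, chosen only so the example runs without a server); the table name orders comes from the answer, and the column types are guesses.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (row_id integer, id integer, total real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(6395, 1509, 112), (22986, 1509, 112),
     (1393, 3284, 40.37), (24360, 3284, 40.37)],
)

# A plain SUM counts every duplicate row of an order.
naive = dict(conn.execute("select id, sum(total) from orders group by id"))

# Dividing by the number of rows per id undoes the duplication,
# as long as all rows of one id carry the same total.
per_order = dict(conn.execute(
    "select id, sum(total) / count(id) from orders group by id"
))

print("naive:", naive)
print("per order:", per_order)
```

Note that sum(total) / count(id) is just the average per id, so it only gives the right revenue when the duplicates of an order are exact copies of each other.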
Answered by zedfoxus
You can try something like this (using your example):
Table
create table test (
row_id int,
id int,
total decimal(15,2)
);
insert into test values
(6395, 1509, 112), (22986, 1509, 112),
(1393, 3284, 40.37), (24360, 3284, 40.37);
Query
with distinct_records as (
select distinct id, total from test
)
select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
on a.id = b.id
group by a.id, b.actual_total
Result
| id | actual_total | row_ids |
|------|--------------|------------|
| 1509 | 112 | 6395,22986 |
| 3284 | 40.37 | 1393,24360 |
Explanation
We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced by the with ... phrase, we get the distinct id and total.
Below the CTE, we use this distinct data to compute the totals. We join id in the original table with the aggregation over the distinct values. Then we collect the row_ids of each order into one comma-separated value so that the output looks cleaner.
SQLFiddle example
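If what you need is the overall revenue rather than one row per order, the same CTE can feed a single SUM. A rough sketch, again run through Python with an in-memory SQLite database standing in for Postgres (an assumption on my side; the table and data are the ones from this answer):

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; it accepts the same CTE syntax.
conn = sqlite3.connect(":memory:")
conn.execute("create table test (row_id int, id int, total decimal(15,2))")
conn.executemany(
    "insert into test values (?, ?, ?)",
    [(6395, 1509, 112), (22986, 1509, 112),
     (1393, 3284, 40.37), (24360, 3284, 40.37)],
)

query = """
with distinct_records as (
    select distinct id, total from test  -- one row per (id, total) pair
)
select sum(total) from distinct_records
"""
(revenue,) = conn.execute(query).fetchone()
print("revenue over distinct orders:", revenue)
```

Because the CTE deduplicates on the pair (id, total), an order whose duplicate rows carry different totals would still be counted once per distinct total.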
Answered by Mike Kruk
You can use DISTINCT in your aggregate functions:
SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id
Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
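One thing to watch: the GROUP BY id matters here. A small sketch of why, using Python with an in-memory SQLite database in place of Postgres (an assumption made so it runs anywhere). Order 4000 is a hypothetical extra row I added; it is a different order that happens to have the same total as order 1509.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres; column types are guessed.
conn = sqlite3.connect(":memory:")
conn.execute("create table orders (row_id integer, id integer, total real)")
conn.executemany(
    "insert into orders values (?, ?, ?)",
    [(6395, 1509, 112), (22986, 1509, 112),
     (1393, 3284, 40.37), (24360, 3284, 40.37),
     (30001, 4000, 112)],  # hypothetical order, same total as order 1509
)

# Per order: DISTINCT collapses the duplicate rows inside each id.
per_order = dict(conn.execute(
    "select id, sum(distinct total) from orders group by id"
))

# Without GROUP BY, DISTINCT also merges different orders with equal totals.
grand_wrong = conn.execute(
    "select sum(distinct total) from orders"
).fetchone()[0]

# Summing the per-order results keeps order 4000 in the revenue.
grand_ok = conn.execute(
    "select sum(t) from ("
    " select id, sum(distinct total) as t from orders group by id"
    ") as per_order"
).fetchone()[0]

print(per_order, grand_wrong, grand_ok)
```

So for a grand total, aggregate per id first and then sum those results, rather than running SUM(DISTINCT total) over the whole table.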
Answered by PaulZi
In difficult cases:
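select
  id,
  (
    -- jsonb_object_agg keeps one value per key, so a row_id that shows up
    -- several times (e.g. fanned out by a join) is counted only once
    SELECT SUM(value::numeric)  -- numeric, because int4 rejects totals like 40.37
    FROM jsonb_each_text(jsonb_object_agg(row_id, total))
  ) as total
from orders
group by id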
Answered by Jaques Rheeder
I would suggest just using a subquery:
SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"
The above gives you the total for each id.
Use the query below if you want the grand total with every duplicate removed:
SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
Answered by scottjustin5000
If we can trust that the total for one order really belongs on one row, we can eliminate the duplicates in a subquery by selecting the MAX of the PK id column. An example:
CREATE TABLE test2 (id int, order_id int, total int);
insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);
select order_id, sum(total)
from test2 t
join (
select max(id) as id
from test2
group by order_id) as sq
on t.id = sq.id
group by order_id
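To see what that returns, here is a small sketch that runs the statements above through Python. It uses an in-memory SQLite database instead of Postgres (my assumption, so that it runs without a server); the table and rows are exactly the ones from this answer.

```python
import sqlite3

# Assumption: SQLite stands in for Postgres for this plain join/aggregate.
conn = sqlite3.connect(":memory:")
conn.execute("create table test2 (id int, order_id int, total int)")
conn.executemany(
    "insert into test2 values (?, ?, ?)",
    [(1, 1, 50), (2, 1, 50), (5, 1, 50), (3, 2, 100), (4, 2, 100)],
)

# Plain SUM: every duplicate row is counted.
naive = dict(conn.execute(
    "select order_id, sum(total) from test2 group by order_id"
))

# Keep only the row with the highest id per order, then sum.
deduped = dict(conn.execute("""
    select order_id, sum(total)
    from test2 t
    join (
        select max(id) as id
        from test2
        group by order_id) as sq
      on t.id = sq.id
    group by order_id
"""))

print("naive:", naive)
print("deduped:", deduped)
```

Picking MAX(id) means the most recently inserted row wins, assuming ids grow over time; MIN(id) would keep the earliest row instead.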