PostgreSQL: How can I SUM distinct records in a Postgres database that contains duplicate records?

Question

Asked by Katie F

Imagine a table that looks like this:

[Image: table with duplicate data]

The SQL to get this data was just SELECT *. The first column is "row_id", the second is "id" (the order ID), and the third is "total" (the revenue).

I'm not sure why there are duplicate rows in the database, but when I do a SUM(total), it includes the duplicate entry even though the order ID is the same. That makes my numbers larger than if I select distinct(id), total, export to Excel, and then sum the values manually.

So my question is: how can I SUM on just the distinct order IDs, so that I get the same revenue as if I had exported every distinct order ID row to Excel?

Thanks in advance!

Answer 1

Answered by Bohemian

Easy - just divide by the count:

select id, sum(total) / count(id)
from orders
group by id

This also handles any level of duplication, e.g. triplicates.
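
As a quick check of the query above, here is a minimal sketch that runs it against the sample rows from the question. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for Postgres (an assumption made so the example is self-contained). The table name orders follows the answer, and the helper name per_order_totals is hypothetical. Note that sum(total) / count(id) is the per-id average, so it equals the real order total only when every duplicate row of an order carries the same total.

```python
import sqlite3

def per_order_totals(rows):
    """Return {id: sum(total) / count(id)} for (row_id, id, total) rows."""
    # In-memory SQLite is an assumption here, standing in for Postgres.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = dict(conn.execute(
        "SELECT id, SUM(total) / COUNT(id) FROM orders GROUP BY id"
    ).fetchall())
    conn.close()
    return result

# Sample rows taken from the question: each order appears twice.
sample = [
    (6395, 1509, 112), (22986, 1509, 112),
    (1393, 3284, 40.37), (24360, 3284, 40.37),
]
totals = per_order_totals(sample)

# The question asks for one revenue figure, so the per-order values are added up.
grand_total = sum(totals.values())
print(totals, grand_total)
```

The query itself returns one row per order; the single revenue figure still needs an outer sum, done here in Python for brevity.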

Answer 2

Answered by zedfoxus

You can try something like this (with your example):

Table

create table test (
  row_id int,
  id int,
  total decimal(15,2)
);

insert into test values 
(6395, 1509, 112), (22986, 1509, 112), 
(1393, 3284, 40.37), (24360, 3284, 40.37);

Query

with distinct_records as (
  select distinct id, total from test
)

select a.id, b.actual_total, array_agg(a.row_id) as row_ids
from test a
inner join (select id, sum(total) as actual_total from distinct_records group by id) b
  on a.id = b.id
group by a.id, b.actual_total

Result

|   id | actual_total |    row_ids |
|------|--------------|------------|
| 1509 |          112 | 6395,22986 |
| 3284 |        40.37 | 1393,24360 |

Explanation

We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced by the with ... clause, we get the distinct id and total pairs.

Below the CTE, we total this distinct data. We join id in the original table with the aggregation over the distinct values. Then we collect the row_ids for each order into a single comma-separated value so that the information looks cleaner.
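
If only the single revenue figure from the question is needed, the distinct (id, total) pairs from the CTE can be summed directly. The sketch below illustrates that idea under stated assumptions: it runs the SQL through Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres, reuses the test table from this answer, and the helper name distinct_grand_total is hypothetical.

```python
import sqlite3

def distinct_grand_total(rows):
    """Sum total over the distinct (id, total) pairs of (row_id, id, total) rows."""
    # In-memory SQLite is an assumption here, standing in for Postgres.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (row_id INTEGER, id INTEGER, total REAL)")
    conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
    (grand_total,) = conn.execute(
        """
        WITH distinct_records AS (
          SELECT DISTINCT id, total FROM test
        )
        SELECT SUM(total) FROM distinct_records
        """
    ).fetchone()
    conn.close()
    return grand_total

# Sample rows from this answer: every order is stored twice.
sample = [
    (6395, 1509, 112), (22986, 1509, 112),
    (1393, 3284, 40.37), (24360, 3284, 40.37),
]
revenue = distinct_grand_total(sample)

# Rows that share an id but differ in total are both kept by DISTINCT.
mixed = distinct_grand_total([(1, 10, 5.0), (2, 10, 7.0)])
print(revenue, mixed)
```

Because DISTINCT works on the (id, total) pair, an order that appears with two different totals contributes both of them, which may or may not be what is wanted.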

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

Answer 3

Answered by Mike Kruk

You can use DISTINCT in your aggregate functions:

SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id

Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
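
One thing worth checking with this approach is that the GROUP BY id matters: SUM(DISTINCT total) removes duplicate values, not duplicate orders, so without the grouping two different orders that happen to have the same total would be counted once. The sketch below illustrates the difference. It is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for Postgres, the sample rows are made up for illustration, and the helper name compare_distinct_sums is hypothetical.

```python
import sqlite3

def compare_distinct_sums(rows):
    """Return (sum of per-id SUM(DISTINCT total), ungrouped SUM(DISTINCT total))."""
    # In-memory SQLite is an assumption here, standing in for Postgres.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (row_id INTEGER, id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    # Deduplicate within each order first, then add the orders together.
    (grouped,) = conn.execute(
        """
        SELECT SUM(order_total)
        FROM (SELECT id, SUM(DISTINCT total) AS order_total
              FROM orders
              GROUP BY id) AS per_order
        """
    ).fetchone()
    # Deduplicate across the whole table: equal totals collapse across orders.
    (ungrouped,) = conn.execute(
        "SELECT SUM(DISTINCT total) FROM orders"
    ).fetchone()
    conn.close()
    return grouped, ungrouped

# Hypothetical data: order 100 is duplicated, order 200 has the same total.
rows = [(1, 100, 50), (2, 100, 50), (3, 200, 50)]
grouped, ungrouped = compare_distinct_sums(rows)
print(grouped, ungrouped)
```

With these rows the grouped form yields 100 (two orders of 50 each), while the ungrouped form yields 50.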

Answer 4

Answered by PaulZi

In difficult cases:

select
  id,
  (
    -- cast to numeric rather than int4 so fractional totals such as 40.37 do not fail
    SELECT SUM(value::numeric)
    FROM jsonb_each_text(jsonb_object_agg(row_id, total))
  ) as total
from orders
group by id

Answer 5

Answered by Jaques Rheeder

I would suggest just using a subquery:

SELECT "a"."id", SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"
GROUP BY "a"."id"

The above will give you the total for each id.

Use the query below if you want the grand total with every duplicate removed:

SELECT SUM("a"."total")
FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a"

Answer 6

Answered by scottjustin5000

If we can trust that the total for one order is actually one row, we could eliminate the duplicates in a subquery by selecting the MAX of the PK id column. An example:

CREATE TABLE test2 (id int, order_id int, total int);

insert into test2 values (1,1,50);
insert into test2 values (2,1,50);
insert into test2 values (5,1,50);
insert into test2 values (3,2,100);
insert into test2 values (4,2,100);

select t.order_id, sum(t.total)
from test2 t
join (
  select max(id) as id
  from test2
  group by order_id
) as sq
  on t.id = sq.id
group by t.order_id
