SQL 使用 CASE 和 GROUP BY 旋转的动态替代方案
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/15506199/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Dynamic alternative to pivot with CASE and GROUP BY
提问by flipflop99
I have a table that looks like this:
我有一张看起来像这样的表:
id feh bar
1 10 A
2 20 A
3 3 B
4 4 B
5 5 C
6 6 D
7 7 D
8 8 D
And I want it to look like this:
我希望它看起来像这样:
bar val1 val2 val3
A 10 20
B 3 4
C 5
D 6 7 8
I have this query that does this:
我有这样的查询:
SELECT bar,
MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
FROM
(
SELECT bar, feh, row_number() OVER (partition by bar) as row
FROM "Foo"
) abc
GROUP BY bar
This is a very make-shifty approach and gets unwieldy if there are a lot of new columns to be created. I was wondering if the CASE
statements can be made better to make this query more dynamic? Also, I'd love to see other approaches to doing this.
这是一种非常灵活的方法,如果要创建大量新列,则它会变得笨拙。我想知道是否CASE
可以更好地使该查询更具动态性?另外,我很想看到其他方法来做到这一点。
回答by Erwin Brandstetter
If you have not installed the additional module tablefunc, run this command onceper database:
如果您尚未安装附加模块tablefunc,请为每个数据库运行一次此命令:
CREATE EXTENSION tablefunc;
Answer to question
回答问题
A very basic crosstab solution for your case:
适用于您的案例的非常基本的交叉表解决方案:
SELECT * FROM crosstab(
'SELECT bar, 1 AS cat, feh
FROM tbl_org
ORDER BY bar, feh')
AS ct (bar text, val1 int, val2 int, val3 int); -- more columns?
The special difficultyhere is, that there is no category(cat
) in the base table. For the basic 1-parameter formwe can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.
这里的特殊困难在于,基表中没有类别( cat
)。对于基本的1 参数形式,我们可以只提供一个带有虚拟值作为类别的虚拟列。无论如何都会忽略该值。
This is one of the rare caseswhere the second parameterfor the crosstab()
function is not needed, because all NULL
values only appear in dangling columns to the right by definition of this problem. And the order can be determined by the value.
这是一个罕见的情况下,其中第二个参数为crosstab()
的功能并不需要,因为所有的NULL
值只出现在这一问题的界定,晃来晃去的列到右边。并且顺序可以由值确定。
If we had an actual categorycolumn with names determining the order of values in the result, we'd need the 2-parameter formof crosstab()
. Here I synthesize a category column with the help of the window function row_number()
, to base crosstab()
on:
如果我们有一个实际的类别与名称确定结果值的顺序列,我们需要的2参数形式的crosstab()
。这里我借助窗函数合成了一个类别列row_number()
,以crosstab()
:
SELECT * FROM crosstab(
$$
SELECT bar, val, feh
FROM (
SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
FROM tbl_org
) x
ORDER BY 1, 2
$$
, $$VALUES ('val1'), ('val2'), ('val3')$$ -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int); -- more columns?
The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.
其余的几乎都是普通的。在这些密切相关的答案中找到更多解释和链接。
Basics:
Read this first if you are not familiar with the crosstab()
function!
基础知识:
如果您不熟悉该crosstab()
功能,请先阅读此内容!
Advanced:
先进的:
Proper test setup
正确的测试设置
That's how you should provide a test case to begin with:
这就是您应该如何提供测试用例开始的方式:
CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
(1, 10, 'A')
, (2, 20, 'A')
, (3, 3, 'B')
, (4, 4, 'B')
, (5, 5, 'C')
, (6, 6, 'D')
, (7, 7, 'D')
, (8, 8, 'D');
Dynamic crosstab?
动态交叉表?
Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there areways around it - with some limitations.
正如@Clodoaldo 评论的那样,还不是很有活力。使用 plpgsql 很难实现动态返回类型。但是有一些方法可以解决 -有一些限制。
So not to further complicate the rest, I demonstrate with a simplertest case:
所以为了不进一步复杂化其余部分,我用一个更简单的测试用例来演示:
CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
('A', 'val1', 10)
, ('A', 'val2', 20)
, ('B', 'val1', 3)
, ('B', 'val2', 4)
, ('C', 'val1', 5)
, ('D', 'val3', 8)
, ('D', 'val1', 6)
, ('D', 'val2', 7);
Call:
称呼:
SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);
Returns:
返回:
row_name | val1 | val2 | val3
----------+------+------+------
A | 10 | 20 |
B | 3 | 4 |
C | 5 | |
D | 6 | 7 | 8
Built-in feature of tablefunc
module
tablefunc
模块内置功能
The tablefunc module provides a simple infrastructure for generic crosstab()
calls without providing a column definition list. A number of functions written in C
(typically very fast):
tablefunc 模块为通用crosstab()
调用提供了一个简单的基础结构,而无需提供列定义列表。编写的许多函数 C
(通常非常快):
crosstabN()
crosstab1()
- crosstab4()
are pre-defined. One minor point: they require and return all text
. So we need to cast our integer
values. But it simplifies the call:
crosstab1()
-crosstab4()
是预先定义的。一个小问题:他们需要并返回所有text
. 所以我们需要投射我们的integer
价值观。但它简化了调用:
SELECT * FROM crosstab4('SELECT row_name, attrib, val::text -- cast!
FROM tbl ORDER BY 1,2')
Result:
结果:
row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
A | 10 | 20 | |
B | 3 | 4 | |
C | 5 | | |
D | 6 | 7 | 8 |
Custom crosstab()
function
自定义crosstab()
功能
For more columnsor other data types, we create our own composite typeand function(once).
Type:
对于更多列或其他数据类型,我们创建自己的复合类型和函数(一次)。
类型:
CREATE TYPE tablefunc_crosstab_int_5 AS (
row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);
Function:
功能:
CREATE OR REPLACE FUNCTION crosstab_int_5(text)
RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;
Call:
称呼:
SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val -- no cast!
FROM tbl ORDER BY 1,2');
Result:
结果:
row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
A | 10 | 20 | | |
B | 3 | 4 | | |
C | 5 | | | |
D | 6 | 7 | 8 | |
Onepolymorphic, dynamic function for all
一种适用于所有人的多态动态函数
This goes beyond what's covered by the tablefunc
module.
To make the return type dynamic I use a polymorphic type with a technique detailed in this related answer:
这超出了tablefunc
模块所涵盖的范围。
为了使返回类型动态化,我使用多态类型和此相关答案中详述的技术:
1-parameter form:
1-参数形式:
CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
(SELECT format('SELECT * FROM crosstab(%L) t(%s)'
, _qry
, string_agg(quote_ident(attname) || ' ' || atttypid::regtype
, ', ' ORDER BY attnum))
FROM pg_attribute
WHERE attrelid = pg_typeof(_rowtype)::text::regclass
AND attnum > 0
AND NOT attisdropped);
END
$func$ LANGUAGE plpgsql;
Overload with this variant for the 2-parameter form:
使用此变体重载 2 参数形式:
CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
RETURNS SETOF anyelement AS
$func$
BEGIN
RETURN QUERY EXECUTE
(SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
, _qry, _cat_qry
, string_agg(quote_ident(attname) || ' ' || atttypid::regtype
, ', ' ORDER BY attnum))
FROM pg_attribute
WHERE attrelid = pg_typeof(_rowtype)::text::regclass
AND attnum > 0
AND NOT attisdropped);
END
$func$ LANGUAGE plpgsql;
pg_typeof(_rowtype)::text::regclass
: There is a row type defined for every user-defined composite type, so that attributes (columns) are listed in the system catalog pg_attribute
. The fast lane to get it: cast the registered type (regtype
) to text
and cast this text
to regclass
.
pg_typeof(_rowtype)::text::regclass
:为每个用户定义的复合类型定义了一个行类型,以便在系统目录中列出属性(列)pg_attribute
。获得它的快速通道:将注册的类型 ( regtype
) 转换为text
并将其转换text
为regclass
。
Create composite types once:
一次创建复合类型:
You need to define once every return type you are going to use:
您需要定义要使用的每个返回类型:
CREATE TYPE tablefunc_crosstab_int_3 AS (
row_name text, val1 int, val2 int, val3 int);
CREATE TYPE tablefunc_crosstab_int_4 AS (
row_name text, val1 int, val2 int, val3 int, val4 int);
...
For ad-hoc calls, you can also just create a temporary tableto the same (temporary) effect:
对于临时调用,您还可以创建一个临时表以达到相同(临时)效果:
CREATE TEMP TABLE temp_xtype7 AS (
row_name text, x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);
Or use the type of an existing table, view or materialized view if available.
或者使用现有表、视图或物化视图的类型(如果可用)。
Call
称呼
Using above row types:
使用上述行类型:
1-parameter form (no missing values):
1-参数形式(无缺失值):
SELECT * FROM crosstab_n(
'SELECT row_name, attrib, val FROM tbl ORDER BY 1,2'
, NULL::tablefunc_crosstab_int_3);
2-parameter form (some values can be missing):
2 参数形式(某些值可能会丢失):
SELECT * FROM crosstab_n(
'SELECT row_name, attrib, val FROM tbl ORDER BY 1'
, $$VALUES ('val1'), ('val2'), ('val3')$$
, NULL::tablefunc_crosstab_int_3);
This one functionworks for all return types, while the crosstabN()
framework provided by the tablefunc
module needs a separate function for each.
If you have named your types in sequence like demonstrated above, you only have to replace the bold number. To find the maximum number of categories in the base table:
这一个函数适用于所有返回类型,而模块提供的框架需要为每个返回一个单独的函数。
如果您已按照上面演示的顺序命名您的类型,则只需替换粗体数字。要在基表中查找最大类别数:crosstabN()
tablefunc
SELECT max(count(*)) OVER () FROM tbl -- returns 3
GROUP BY row_name
LIMIT 1;
That's about as dynamic as this gets if you want individual columns. Arrays like demonstrated by @Clocoaldoor a simple text representation or the result wrapped in a document type like json
or hstore
can work for any number of categories dynamically.
如果您想要单独的列,这将是动态的。阵列喜欢由@Clocoaldo证明或一个简单的文本表示或结果包裹在文档类型像json
或hstore
可以为任何数量的类别的工作动态。
Disclaimer:
It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).
免责声明:将
用户输入转换为代码时总是存在潜在危险。确保这不能用于 SQL 注入。不要接受来自不受信任用户的输入(直接)。
Call for original question:
呼吁原始问题:
SELECT * FROM crosstab_n('SELECT bar, 1, feh FROM tbl_org ORDER BY 1,2'
, NULL::tablefunc_crosstab_int_3);
回答by Damian C. Rossney
Although this is an old question, I would like to add another solution made possible by recent improvements in PostgreSQL. This solution achieves the same goal of returning a structured result from a dynamic data set without using the crosstab function at all.In other words, this is a good example of re-examining unintentional and implicit assumptions that prevent us from discovering new solutions to old problems. ;)
虽然这是一个老问题,但我想添加另一个解决方案,这是由于 PostgreSQL 最近的改进而成为可能的。此解决方案实现了相同的目标,即从动态数据集返回结构化结果,而根本不使用交叉表函数。换句话说,这是一个很好的例子,它重新检查了无意和隐含的假设,这些假设阻止了我们发现旧问题的新解决方案。;)
To illustrate, you asked for a method to transpose data with the following structure:
为了说明这一点,您要求一种方法来转置具有以下结构的数据:
id feh bar
1 10 A
2 20 A
3 3 B
4 4 B
5 5 C
6 6 D
7 7 D
8 8 D
into this format:
变成这种格式:
bar val1 val2 val3
A 10 20
B 3 4
C 5
D 6 7 8
The conventional solution is a clever (and incredibly knowledgeable) approach to creating dynamic crosstab queries that is explained in exquisite detail in Erwin Brandstetter's answer.
传统的解决方案是一种创建动态交叉表查询的聪明(而且知识渊博)的方法,在 Erwin Brandstetter 的回答中详细解释了这一点。
However, if your particular use case is flexible enough to accept a slightly different result format, then another solution is possible that handles dynamic pivots beautifully. This technique, which I learned of here
但是,如果您的特定用例足够灵活,可以接受稍微不同的结果格式,那么另一种解决方案可以很好地处理动态枢轴。这种技术,我在这里学到的
uses PostgreSQL's new jsonb_object_agg
function to construct pivoted data on the fly in the form of a JSON object.
使用 PostgreSQL 的新jsonb_object_agg
函数以 JSON 对象的形式动态构建透视数据。
I will use Mr. Brandstetter's "simpler test case" to illustrate:
我将使用 Brandstetter 先生的“更简单的测试用例”来说明:
CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
('A', 'val1', 10)
, ('A', 'val2', 20)
, ('B', 'val1', 3)
, ('B', 'val2', 4)
, ('C', 'val1', 5)
, ('D', 'val3', 8)
, ('D', 'val1', 6)
, ('D', 'val2', 7);
Using the jsonb_object_agg
function, we can create the required pivoted result set with this pithy beauty:
使用该jsonb_object_agg
函数,我们可以创建所需的具有这种简洁美的旋转结果集:
SELECT
row_name AS bar,
json_object_agg(attrib, val) AS data
FROM tbl
GROUP BY row_name
ORDER BY row_name;
Which outputs:
哪些输出:
bar | data
-----+----------------------------------------
A | { "val1" : 10, "val2" : 20 }
B | { "val1" : 3, "val2" : 4 }
C | { "val1" : 5 }
D | { "val3" : 8, "val1" : 6, "val2" : 7 }
As you can see, this function works by creating key/value pairs in the JSON object from the attrib
and value
columns in the sample data, all grouped by row_name
.
如您所见,此函数的工作原理是从示例数据中的attrib
和value
列在JSON 对象中创建键/值对,所有这些都按 分组row_name
。
Although this result set obviously looks different, I believe it will actually satisfy many (if not most) real world use cases, especially those where the data requires a dynamically-generated pivot, or where resulting data is consumed by a parent application (e.g., needs to be re-formatted for transmission in a http response).
虽然这个结果集看起来明显不同,但我相信它实际上会满足许多(如果不是大多数)现实世界的用例,尤其是那些数据需要动态生成的数据透视表,或者结果数据被父应用程序消耗的情况(例如,需要重新格式化以在 http 响应中传输)。
Benefits of this approach:
这种方法的好处:
Cleaner syntax.I think everyone would agree that the syntax for this approach is far cleaner and easier to understand than even the most basic crosstab examples.
Completely dynamic.No information about the underlying data need be specified beforehand. Neither the column names nor their data types need be known ahead of time.
Handles large numbers of columns.Since the pivoted data is saved as a single jsonb column, you will not run up against PostgreSQL's column limit (≤1,600 columns, I believe). There is still a limit, but I believe it is the same as for text fields: 1 GB per JSON object created (please correct me if I am wrong). That's a lot of key/value pairs!
Simplified data handling.I believe that the creation of JSON data in the DB will simplify (and likely speed up) the data conversion process in parent applications. (You will note that the integer data in our sample test case was correctly stored as such in the resulting JSON objects. PostgreSQL handles this by automatically converting its intrinsic data types to JSON in accordance with the JSON specification.) This will effectively eliminate the need to manually cast data passed to parent applications: it can all be delegated to the application's native JSON parser.
更简洁的语法。我想每个人都会同意这种方法的语法比最基本的交叉表示例更清晰、更容易理解。
完全动态。无需事先指定有关基础数据的信息。列名及其数据类型都不需要提前知道。
处理大量列。由于透视数据保存为单个 jsonb 列,因此您不会遇到 PostgreSQL 的列限制(我相信≤1,600 列)。仍然有一个限制,但我相信它与文本字段相同:创建的每个 JSON 对象 1 GB(如果我错了,请纠正我)。这是很多键/值对!
简化的数据处理。我相信在 DB 中创建 JSON 数据将简化(并可能加速)父应用程序中的数据转换过程。(您会注意到我们的示例测试用例中的整数数据已正确存储在结果 JSON 对象中。PostgreSQL 通过根据 JSON 规范自动将其内部数据类型转换为 JSON 来处理此问题。)这将有效地消除需要手动转换传递给父应用程序的数据:它可以全部委托给应用程序的本机 JSON 解析器。
Differences (and possible drawbacks):
差异(和可能的缺点):
It looks different.There's no denying that the results of this approach look different. The JSON object is not as pretty as the crosstab result set; however, the differences are purely cosmetic. The same information is produced--and in a format that is probably morefriendly for consumption by parent applications.
Missing keys.Missing values in the crosstab approach are filled in with nulls, while the JSON objects are simply missing the applicable keys. You will have to decide for your self if this is an acceptable trade off for your use case. It seems to me that any attempt to address this problem in PostgreSQL will greatly complicate the process and likely involve some introspection in the form of additional queries.
Key order is not preserved.I don't know if this can be addressed in PostgreSQL, but this issue is mostly cosmetic also, since any parent applications are either unlikely to rely on key order, or have the ability to determine proper key order by other means. The worst case will probably only require an addition query of the database.
它看起来不一样。不可否认,这种方法的结果看起来不同。JSON 对象不如交叉表结果集漂亮;然而,差异纯粹是表面上的。生成相同的信息——并且其格式可能更适合父应用程序使用。
缺少钥匙。交叉表方法中的缺失值用空值填充,而 JSON 对象只是缺少适用的键。您必须自己决定这对于您的用例是否是可以接受的折衷。在我看来,在 PostgreSQL 中解决这个问题的任何尝试都会使这个过程大大复杂化,并且可能涉及以附加查询的形式进行的一些内省。
不保留密钥顺序。我不知道这是否可以在 PostgreSQL 中解决,但这个问题也主要是表面问题,因为任何父应用程序要么不太可能依赖键顺序,要么有能力通过其他方式确定正确的键顺序。最坏的情况可能只需要对数据库进行附加查询。
Conclusion
结论
I am very curious to hear the opinions of others (especially @ErwinBrandstetter's) on this approach, especially as it pertains to performance. When I discovered this approach on Andrew Bender's blog, it was like getting hit in the side of the head. What a beautiful way to take a fresh approach to a difficult problem in PostrgeSQL. It solved my use case perfectly, and I believe it will likewise serve many others as well.
我很想听听其他人(尤其是@ErwinBrandstetter)对这种方法的意见,尤其是与性能有关的意见。当我在 Andrew Bender 的博客上发现这种方法时,感觉就像被击中了头部。对 PostrgeSQL 中的难题采取全新的方法是多么美妙的方式啊。它完美地解决了我的用例,我相信它也将同样适用于许多其他用例。
回答by Clodoaldo Neto
This is to complete @Damiangood answer. I have already suggested the JSON approach in other answers before the 9.6's handy json_object_agg
function. It just takes more work with the previous tool set.
这是完成@Damian 的好答案。在 9.6 的方便json_object_agg
功能之前,我已经在其他答案中建议了 JSON 方法。使用以前的工具集需要做更多的工作。
Two of the cited possible drawbacks are really not. The random key order is trivially corrected if necessary. The missing keys, if relevant, takes an almost trivial amount of code to be addressed:
所引用的两个可能的缺点实际上并非如此。如有必要,随机密钥顺序会被简单地纠正。缺少的键(如果相关)需要处理几乎微不足道的代码:
select
row_name as bar,
json_object_agg(attrib, val order by attrib) as data
from
tbl
right join
(
(select distinct row_name from tbl) a
cross join
(select distinct attrib from tbl) b
) c using (row_name, attrib)
group by row_name
order by row_name
;
bar | data
-----+----------------------------------------------
a | { "val1" : 10, "val2" : 20, "val3" : null }
b | { "val1" : 3, "val2" : 4, "val3" : null }
c | { "val1" : 5, "val2" : null, "val3" : null }
d | { "val1" : 6, "val2" : 7, "val3" : 8 }
For a final query consumer which understands JSON there are no drawbacks. The only one is that it can not be consumed as a table source.
对于理解 JSON 的最终查询使用者来说,没有任何缺点。唯一的一点是不能作为表源消费。
回答by Clodoaldo Neto
In your case I guess an array is good. SQL Fiddle
在你的情况下,我想一个数组是好的。SQL小提琴
select
bar,
feh || array_fill(null::int, array[c - array_length(feh, 1)]) feh
from
(
select bar, array_agg(feh) feh
from foo
group by bar
) s
cross join (
select count(*)::int c
from foo
group by bar
order by c desc limit 1
) c(c)
;
bar | feh
-----+---------------
A | {10,20,NULL}
B | {3,4,NULL}
C | {5,NULL,NULL}
D | {6,7,8}
回答by vsinceac
I'm sorry about returning in the past, but the solution "Dynamic Crosstab" returns erroneous result table. Thus, the valN values are erroneously "aligned to the left" and they don't correspond to the column names. When the input table has "holes" in the values, e.g. "C" has val1 and val3 but not val2. This produces an error: val3 value will be ranged in the column val2 (i.e. the next free column) in the final table.
我很抱歉过去返回,但解决方案“动态交叉表”返回错误的结果表。因此,valN 值错误地“向左对齐”并且它们不对应于列名称。当输入表的值中有“漏洞”时,例如“C”有 val1 和 val3 但没有 val2。这会产生一个错误:val3 值将在最终表的 val2 列(即下一个空闲列)中排列。
CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES ('C', 'val1', 5) ('C', 'val3', 7);
SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl
ORDER BY 1,2') AS ct (row_name text, val1 int, val2 int, val3 int);
row_name|val1|val2|val3
C | 5| 7 |
In order to return correct cells with "holes" in the right column, the crosstab query requires a 2nd SELECT in the crosstab, something like this "crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2', 'select distinct row_name from tbl order by 1')"
为了在右列中返回带有“孔”的正确单元格,交叉表查询需要交叉表中的第二个 SELECT,类似这样 "crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2', 'select distinct row_name from tbl order by 1')"