SQL: Pivot for a Redshift database

Question

Asked by ankitkhanduri

I know this question has been asked before, but none of the answers met my requirements, so I am asking it again in a new thread.

In Redshift, how can I pivot the data into a form with one row per unique dimension set, e.g.:

id         Name               Category         count
8660     Iced Chocolate         Coffees         105
8660     Iced Chocolate         Milkshakes      10
8662     Old Monk               Beer            29
8663     Burger                 Snacks          18

to

id        Name              Coffees Milkshakes  Beer  Snacks
8660    Iced Chocolate       105       10        0      0
8662    Old Monk             0         0        29      0
8663    Burger               0         0         0      18

The categories listed above keep changing. Redshift does not support the PIVOT operator, and a CASE expression would not be of much help (if it would, please suggest how to do it).

How can I achieve this result in Redshift?

(The above is just an example; we would have 1000+ categories, and these categories keep changing.)

Answer 1

Answered by Sami Yabroudi

We do a lot of pivoting at Ro, so we built a Python-based tool for autogenerating pivot queries. This tool allows for the same basic options as what you'd find in Excel, including specifying aggregation functions as well as whether you want overall aggregates.
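To illustrate the idea of autogenerating a pivot query, here is a minimal sketch. It is not the tool described above; the function name `build_pivot_query` and its parameters are hypothetical, and it assumes the category list has already been fetched (for example with `select distinct category from my_table`). It only handles `sum`, because the `else 0` default is only safe for summation.

```python
def build_pivot_query(table, group_cols, pivot_col, value_col, categories):
    """Return a SQL string with one sum(case ...) column per category.

    All names here are illustrative assumptions, not part of any real tool.
    """
    def quote_literal(value):
        # Escape single quotes so a category value cannot break the string literal.
        return "'" + value.replace("'", "''") + "'"

    def quote_ident(name):
        # Double-quote the alias so category names with spaces stay valid.
        return '"' + name.replace('"', '""') + '"'

    select_parts = list(group_cols)
    for cat in categories:
        select_parts.append(
            f"sum(case when {pivot_col} = {quote_literal(cat)} "
            f"then {value_col} else 0 end) as {quote_ident(cat)}"
        )
    select_list = ",\n       ".join(select_parts)
    # Group by the leading columns using their ordinal positions.
    group_by = ", ".join(str(i) for i in range(1, len(group_cols) + 1))
    return f"select {select_list}\nfrom {table}\ngroup by {group_by}"


categories = ["Coffees", "Milkshakes", "Beer", "Snacks"]
query = build_pivot_query("my_table", ["id", "name"], "category", "count", categories)
print(query)
```

Because the category list is an input, the query can be regenerated whenever the set of categories changes.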

Answer 2

Answered by user3600910

I don't think there is an easy way to do that in Redshift.

Also, you say you have more than 1000 categories and the number is growing. You need to take into account that there is a limit of 1600 columns per table;

see the linked documentation: http://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_usage.html

You can use CASE, but then you need to write a CASE expression for each category:

-- "else 0" makes an id with no rows in a category show 0 instead of NULL
select id,
       name,
       sum(case when Category='Coffees' then count else 0 end) as Coffees,
       sum(case when Category='Milkshakes' then count else 0 end) as Milkshakes,
       sum(case when Category='Beer' then count else 0 end) as Beer,
       sum(case when Category='Snacks' then count else 0 end) as Snacks
from my_table
group by 1,2

Another option is to load the table into R, for example, and then use a cast function, such as:

cast(data, name~ category)

and then upload the data back to S3 or Redshift.
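The same reshape-outside-the-database idea can be sketched in Python with only the standard library. This is an illustration rather than the R approach itself: the `pivot` function is a hypothetical helper, and the rows are hardcoded from the question's example, whereas in practice they would be unloaded from Redshift first.

```python
from collections import defaultdict

# Long-format rows copied from the question: (id, name, category, count).
rows = [
    (8660, "Iced Chocolate", "Coffees", 105),
    (8660, "Iced Chocolate", "Milkshakes", 10),
    (8662, "Old Monk", "Beer", 29),
    (8663, "Burger", "Snacks", 18),
]


def pivot(rows):
    """Turn long rows into a header plus one wide row per (id, name)."""
    categories = {}  # dict used as an ordered set of category names
    totals = defaultdict(lambda: defaultdict(int))
    for id_, name, category, count in rows:
        categories.setdefault(category, None)
        totals[(id_, name)][category] += count
    header = ["id", "Name"] + list(categories)
    body = [
        [id_, name] + [cells.get(cat, 0) for cat in categories]
        for (id_, name), cells in totals.items()
    ]
    return header, body


header, body = pivot(rows)
print(header)
for line in body:
    print(line)
```

Categories are discovered from the data, so no column list has to be maintained by hand; missing combinations are filled with 0.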

Answer 3

Answered by systemHyman

If you typically want to query specific subsets of the categories from the pivot table, a workaround based on the approach linked in the comments might work.

You can populate your "pivot_table" from the original table like so:

insert into pivot_table (id, Name, json_cats) (
    select id, Name,
        '{' || listagg(quote_ident(Category) || ':' || count, ',')
               within group (order by Category) || '}' as json_cats
    from to_pivot
    group by id, Name
)

And access specific categories this way:

select id, Name,
    nvl(json_extract_path_text(json_cats, 'Snacks')::int, 0) Snacks,
    nvl(json_extract_path_text(json_cats, 'Beer')::int, 0) Beer
from pivot_table

Using varchar(max) for the JSON column type will give 65535 bytes, which should be room for a couple thousand categories.
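The size estimate can be checked roughly with a short Python sketch. The category names below are made up for illustration, and the helper `json_cats_size` is hypothetical; real names and counts will be longer or shorter, so treat the result as an order-of-magnitude check only.

```python
import json

VARCHAR_MAX_BYTES = 65535  # size quoted in the answer for varchar(max)


def json_cats_size(categories_to_counts):
    """Return the byte length of the compact JSON object for one row."""
    # Compact separators mirror the '"name":count' pairs joined by commas.
    text = json.dumps(categories_to_counts, separators=(",", ":"))
    return len(text.encode("utf-8"))


# 2000 made-up categories with 13-character names and 3 to 4 digit counts.
sample = {f"Category {i:04d}": 100 + i for i in range(2000)}
size = json_cats_size(sample)
print(size, size <= VARCHAR_MAX_BYTES)
```

With names of this length, 2000 categories fit comfortably; much longer category names would reduce how many fit in a single row.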

Answer 4

Answered by Anshul Tak

@user3600910 is right about the approach; however, 'END' is required, otherwise a '500310' invalid operation error occurs.

select id,
       name,
       sum(case when Category='Coffees' then count END) as Coffees,
       sum(case when Category='Milkshakes' then count END) as Milkshakes,
       sum(case when Category='Beer' then count END) as Beer,
       sum(case when Category='Snacks' then count END) as Snacks
from my_table
group by 1,2
