Pandas dataframe: how to group by values in a column and create new columns out of grouped values

Disclaimer: this page is adapted from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/34556427/

python pandas dataframe

Asked by foebu

I have a dataframe with two columns:

x y
0 1
1 1
2 2
0 5
1 6
2 8
0 1
1 8
2 4
0 1
1 7
2 3

What I want is:

x val1 val2 val3 val4
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3

I know that each value in column x is repeated N times.

Answered by unutbu

You could use groupby/cumcount to assign column numbers and then call pivot:

import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

df['columns'] = df.groupby('x')['y'].cumcount()
#     x  y  columns
# 0   0  1        0
# 1   1  1        0
# 2   2  2        0
# 3   0  5        1
# 4   1  6        1
# 5   2  8        1
# 6   0  1        2
# 7   1  8        2
# 8   2  4        2
# 9   0  1        3
# 10  1  7        3
# 11  2  3        3

result = df.pivot(index='x', columns='columns')
print(result)

yields

         y         
columns  0  1  2  3
x                  
0        1  5  1  1
1        1  6  8  7
2        2  8  4  3
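
The result above has a two-level column index (y, then the counter), whereas the question asked for flat columns named val1 to val4. Here is a minimal sketch of one way to get there, assuming the same df as above. Passing values='y' to pivot and the val-prefix renaming are choices made here for illustration, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

# Number each occurrence of an x value: 0 for the first, 1 for the second, ...
df['columns'] = df.groupby('x').cumcount()

# values='y' keeps a single-level column index (0, 1, 2, 3)
flat = df.pivot(index='x', columns='columns', values='y')

# Rename 0..3 to val1..val4 and turn x back into an ordinary column
flat.columns = [f'val{i + 1}' for i in flat.columns]
flat = flat.reset_index()
print(flat)
```

After this, flat has the columns x, val1, val2, val3, val4, one row per distinct x value.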


Or, if you can really rely on the values in x being repeated in the same order every time,

N = 3  # length of one repeating cycle of x (the number of distinct x values)
result = pd.DataFrame(df['y'].values.reshape(-1, N).T)
print(result)

yields

   0  1  2  3
0  1  5  1  1
1  1  6  8  7
2  2  8  4  3

Using reshape is quicker than calling groupby/cumcount and pivot, but it is less robust since it relies on the values in y appearing in the right order.
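
Because the reshape approach gives wrong results without any error when the ordering assumption fails, it may be worth checking that assumption first. Below is a rough sketch, assuming the same df as above. The helper name reshape_if_cyclic is hypothetical, and deriving the cycle length from the number of distinct x values is an assumption made here, not something the answer prescribes:

```python
import pandas as pd

def reshape_if_cyclic(df):
    """Reshape y into one row per x value, but only if x is a clean repeating cycle."""
    x = df['x'].to_numpy()
    n = df['x'].nunique()  # assumed cycle length: each x value appears once per cycle
    if len(x) % n != 0:
        raise ValueError('length of x is not a multiple of the cycle length')
    cycle = x[:n]
    # Every block of n rows must repeat the first block exactly
    if len(set(cycle)) != n or not (x.reshape(-1, n) == cycle).all():
        raise ValueError('x does not repeat in a fixed order')
    return pd.DataFrame(df['y'].to_numpy().reshape(-1, n).T, index=cycle)

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})
result = reshape_if_cyclic(df)
print(result)
```

The check adds some overhead, so this trades part of the speed advantage of reshape for an explicit error when the rows are out of order.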