Pandas dataframe: how to group by values in a column and create new columns out of grouped values
Notice: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license, cite the original URL, and attribute it to the original authors: StackOverFlow
Original URL: http://stackoverflow.com/questions/34556427/
Asked by foebu
I have a dataframe with two columns:
x y
0 1
1 1
2 2
0 5
1 6
2 8
0 1
1 8
2 4
0 1
1 7
2 3
What I want is:
x val1 val2 val3 val4
0 1 5 1 1
1 1 6 8 7
2 2 8 4 3
I know that each value in column x is repeated N times.
Answered by unutbu
You could use groupby/cumcount to assign column numbers and then call pivot:
import pandas as pd
df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})
df['columns'] = df.groupby('x')['y'].cumcount()
#     x  y  columns
# 0   0  1        0
# 1   1  1        0
# 2   2  2        0
# 3   0  5        1
# 4   1  6        1
# 5   2  8        1
# 6   0  1        2
# 7   1  8        2
# 8   2  4        2
# 9   0  1        3
# 10  1  7        3
# 11  2  3        3
result = df.pivot(index='x', columns='columns')
print(result)
yields
         y
columns  0  1  2  3
x
0        1  5  1  1
1        1  6  8  7
2        2  8  4  3
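The pivot result above carries a y level over the column numbers, while the question asked for flat columns named val1 to val4. A minimal sketch of one way to get that layout, assuming the val names from the question and passing values='y' to pivot so the columns stay flat (this is an addition, not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

# Number each occurrence of an x value: 0 for the first, 1 for the second, ...
df['columns'] = df.groupby('x').cumcount()

# values='y' keeps the column index flat instead of a ('y', n) MultiIndex.
wide = df.pivot(index='x', columns='columns', values='y')

# Rename 0, 1, 2, ... to val1, val2, val3, ... (names taken from the question).
wide.columns = [f'val{i + 1}' for i in wide.columns]

# Turn the x index back into an ordinary column.
wide = wide.reset_index()
print(wide)
```

If some x values occurred fewer times than others, the missing cells would come out as NaN and the columns would become floats; the sample data has no such gaps.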
Or, if you can really rely on the values in x repeating in the same order, with N distinct values per cycle,
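# N is the number of distinct x values in each cycle (here 0, 1, 2)
N = 3
# Each row of the reshaped array is one cycle; transpose so each row is one x value
result = pd.DataFrame(df['y'].values.reshape(-1, N).T)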
yields
   0  1  2  3
0  1  5  1  1
1  1  6  8  7
2  2  8  4  3
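The reshaped frame above has only positional labels. A possible variation, sketched here under the same assumption that x cycles through its values in a fixed order, derives the cycle length from the data and restores the x labels and val names; n_groups, cycles and reshaped are illustrative names, not from the original answer:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

# Assumption: x cycles through the same values in the same order, with no gaps.
n_groups = df['x'].nunique()  # number of distinct x values per cycle

# One row per cycle, one column per x value.
cycles = df['y'].to_numpy().reshape(-1, n_groups)

# Transpose so each row is one x value, then label rows and columns.
reshaped = pd.DataFrame(
    cycles.T,
    index=pd.Index(df['x'].to_numpy()[:n_groups], name='x'),
    columns=[f'val{i + 1}' for i in range(cycles.shape[0])],
)
print(reshaped)
```

The row labels are simply the first n_groups values of x, so they are only meaningful when the ordering assumption actually holds.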
Using reshape is quicker than calling groupby/cumcount and pivot, but it is less robust since it relies on the values in y appearing in the right order.
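To illustrate the robustness trade-off, the sketch below runs both approaches on the original rows and on the same rows sorted by x, so that x no longer cycles 0, 1, 2. The helper names pivot_wide and reshape_wide and the occurrence column are hypothetical, introduced only for this comparison:

```python
import pandas as pd

df = pd.DataFrame({'x': [0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2],
                   'y': [1, 1, 2, 5, 6, 8, 1, 8, 4, 1, 7, 3]})

def pivot_wide(frame):
    """groupby/cumcount followed by pivot, as in the answer."""
    numbered = frame.assign(occurrence=frame.groupby('x').cumcount())
    return numbered.pivot(index='x', columns='occurrence', values='y')

def reshape_wide(frame, n):
    """reshape shortcut, as in the answer; assumes x cycles in order."""
    return pd.DataFrame(frame['y'].to_numpy().reshape(-1, n).T)

# Same rows, stably sorted by x: the order within each x group is preserved,
# but x now reads 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2.
sorted_df = df.sort_values('x', kind='stable').reset_index(drop=True)

same_pivot = (pivot_wide(df).to_numpy().tolist()
              == pivot_wide(sorted_df).to_numpy().tolist())
same_reshape = (reshape_wide(df, 3).to_numpy().tolist()
                == reshape_wide(sorted_df, 3).to_numpy().tolist())
print(same_pivot, same_reshape)
```

On this data the pivot-based result is unchanged by the reordering, while the reshape-based result is not, because reshape never looks at x at all.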