Python Pandas groupby multiple columns
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/42642001/
Asked by Kelvin Ng
Thank you for your help.
I have data that looks like this:
city, room_type
A, X
A, Y
A, Z
B, X
B, Y
B, Y
I want my end result to look like this:
city, count(X), count(Y), count(Z)
A, 1, 1, 1
B, 1, 2, 0
I am grouping by city and I want to show the count of each room_type in each city.
Is there any way to do this with Python pandas? Thank you.
I learned SQL years ago and think this would have been possible there. I'm sure Python can do the same. Thanks!
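For readers who want to run the answers, the sample data from the question can be rebuilt as a DataFrame. This is a minimal sketch: the question does not say how the data is loaded, so building it inline is an assumption, and the name df is chosen only because the answers use it.

```python
import pandas as pd

# Sample rows copied from the question; constructing them inline is an
# assumption, since the question does not show how the data is loaded.
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})
print(df)
```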
Answered by jezrael
You can use crosstab and then rename the columns:
import pandas as pd

# df is the original DataFrame with the columns city and room_type
df = pd.crosstab(df.city, df.room_type).rename(columns=lambda x: 'count({})'.format(x))
print(df)
room_type  count(X)  count(Y)  count(Z)
city
A                 1         1         1
B                 1         2         0
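The crosstab result keeps city as the index and room_type as the name of the column axis. If the flat layout from the question is wanted (city as an ordinary column), one possible follow-up is sketched below. The sample DataFrame and the names counts and flat are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df is created).
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

counts = pd.crosstab(df["city"], df["room_type"])

# Assigning a plain list replaces the column Index, which also drops the
# 'room_type' name from the column axis.
counts.columns = ["count({})".format(c) for c in counts.columns]

# Move city out of the index into a regular column.
flat = counts.reset_index()
print(flat)
```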
Other solutions use groupby with size or value_counts; unstack is used to reshape the result:
# Parentheses are needed so the chained call can continue on the next line
df = (df.groupby(['city', 'room_type']).size().unstack(fill_value=0)
        .rename(columns=lambda x: 'count({})'.format(x)))
print(df)
room_type  count(X)  count(Y)  count(Z)
city
A                 1         1         1
B                 1         2         0
# Same idea, counting room_type values within each city
df = (df.groupby('city')['room_type'].value_counts().unstack(fill_value=0)
        .rename(columns=lambda x: 'count({})'.format(x)))
print(df)
room_type  count(X)  count(Y)  count(Z)
city
A                 1         1         1
B                 1         2         0
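The three approaches in this answer should produce the same table of counts. The sketch below checks that on the sample data from the question. The helper normalized and the variable names are hypothetical, and each approach starts from the original df rather than a reassigned one.

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about how df is created).
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

via_crosstab = pd.crosstab(df["city"], df["room_type"])
via_size = df.groupby(["city", "room_type"]).size().unstack(fill_value=0)
via_value_counts = (
    df.groupby("city")["room_type"].value_counts().unstack(fill_value=0)
)


def normalized(table):
    # Sort both axes so label order cannot affect the comparison.
    return table.sort_index(axis=0).sort_index(axis=1)


tables = [normalized(t) for t in (via_crosstab, via_size, via_value_counts)]

# Compare the raw counts of every table against the first one.
same = all(t.values.tolist() == tables[0].values.tolist() for t in tables)
print(same)
```

Keeping the original df untouched and storing each result under its own name makes it possible to run the alternatives side by side.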
Answered by piRSquared
A solution jezrael didn't give ;-)
# Count each (city, room_type) row as a tuple
s = pd.value_counts([tuple(i) for i in df.values.tolist()])
# Turn the tuple index into a two-level MultiIndex so it can be unstacked
s.index = pd.MultiIndex.from_tuples(s.index.values, names=['city', None])
s.unstack(fill_value=0).rename(columns='count({})'.format).reset_index()
  city  count(X)  count(Y)  count(Z)
0    A         1         1         1
1    B         1         2         0
A more involved approach:
import numpy as np

cities = pd.unique(df.city)
room_types = pd.unique(df.room_type)
# Start from a table of zeros: one row per city, one column per room type
d1 = pd.DataFrame(
    np.zeros((len(cities), len(room_types)), dtype=int),
    cities,
    room_types
)
# set_value/get_value were removed in pandas 1.0; .at is the scalar accessor
for r, c in df.values:
    d1.at[r, c] += 1
d1.rename(columns='count({})'.format).rename_axis('city').reset_index()
Variation of the first solution:
from collections import Counter
pd.Series(
    Counter(map(tuple, df.values.tolist()))
).unstack(fill_value=0).rename(
    columns='count({})'.format
).rename_axis('city').reset_index()
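To see why the Counter variation works, it can help to look at the intermediate objects. This is a sketch on the sample data from the question. The names pair_counts, s and table are illustrative and do not appear in the original answer.

```python
from collections import Counter

import pandas as pd

# Sample data rebuilt from the question (an assumption about how df is created).
df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# Each row becomes a (city, room_type) tuple; Counter tallies identical tuples.
pair_counts = Counter(map(tuple, df.values.tolist()))
print(pair_counts[("B", "Y")])

# A dict-like with tuple keys becomes a Series with a two-level MultiIndex.
s = pd.Series(pair_counts)
print(s.index.nlevels)

# unstack moves the second level into columns; missing pairs are filled with 0.
table = s.unstack(fill_value=0)
print(table)
```

The pair (B, Z) never occurs in the data, so it is absent from the Counter and only appears in the final table through fill_value=0.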