Python Pandas groupby multiple columns

Question

Asked by Kelvin Ng

Thank you for your help.

I have data that looks like this:

city,  room_type
A, X
A, Y
A, Z
B, X
B, Y
B, Y
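
To experiment with the answers below, the sample rows can be loaded into a DataFrame. This is a minimal sketch that assumes the data arrives as comma-separated text with a space after each comma; the variable name raw is illustrative only.

```python
import io

import pandas as pd

# The question's rows as CSV text (assumed format: header line, then one row per line)
raw = """city, room_type
A, X
A, Y
A, Z
B, X
B, Y
B, Y
"""

# skipinitialspace=True ignores the space after each comma,
# so the columns are "city"/"room_type" and the values are "A", "X", ...
df = pd.read_csv(io.StringIO(raw), skipinitialspace=True)
print(df)
```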

I want my end result to look like this:

city, count(X), count(Y), count(Z)
A,  1, 1, 1
B,  1, 2, 0

I am grouping by city and I want to show the count of each room_type in each city.

Is there a way to do this with Python pandas? Thank you.

I learned SQL years ago and think this was possible there. I'm sure Python can do the same. Thanks!
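
For comparison with the SQL the asker remembers, here is a minimal sketch of the same count in SQL, run through Python's built-in sqlite3 module against an in-memory database. The table name rooms and the count_x/count_y/count_z aliases are hypothetical. The technique is conditional aggregation: each SUM(CASE ...) counts one room type per city. Unlike the pandas answers, every room type has to be listed by hand.

```python
import sqlite3

rows = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y"), ("B", "Y")]

# Hypothetical table "rooms" in a throwaway in-memory database
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE rooms (city TEXT, room_type TEXT)")
con.executemany("INSERT INTO rooms VALUES (?, ?)", rows)

# Conditional aggregation: one SUM(CASE ...) per room type, grouped by city
query = """
SELECT city,
       SUM(CASE WHEN room_type = 'X' THEN 1 ELSE 0 END) AS count_x,
       SUM(CASE WHEN room_type = 'Y' THEN 1 ELSE 0 END) AS count_y,
       SUM(CASE WHEN room_type = 'Z' THEN 1 ELSE 0 END) AS count_z
FROM rooms
GROUP BY city
ORDER BY city
"""
result = con.execute(query).fetchall()
print(result)  # [('A', 1, 1, 1), ('B', 1, 2, 0)]
con.close()
```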

Answer 1

Answered by jezrael

You can use crosstab with rename for the columns:

import pandas as pd

df = pd.crosstab(df.city, df.room_type).rename(columns=lambda x: 'count({})'.format(x))
print(df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0
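
The crosstab result keeps city as the index and carries a room_type label on the columns. If you want the flat layout from the question (city as an ordinary column), a minimal self-contained sketch, with the sample data rebuilt inline, drops the column-axis name and resets the index:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# crosstab counts every (city, room_type) pair; pairs that never occur become 0
table = pd.crosstab(df["city"], df["room_type"])

# Rename the columns, drop the "room_type" axis label, and move "city" back into a column
flat = (
    table.rename(columns="count({})".format)
    .rename_axis(None, axis=1)
    .reset_index()
)
print(flat)
```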

Other solutions use groupby with size or value_counts, then reshape the result with unstack:

df = (df.groupby(['city', 'room_type']).size()
        .unstack(fill_value=0)
        .rename(columns=lambda x: 'count({})'.format(x)))
print(df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0
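
Why fill_value=0 matters: size() only produces entries for (city, room_type) pairs that actually occur, so the reshape has a hole for (B, Z). This sketch inspects the intermediate long-form Series and compares unstack with and without fill_value:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# size() returns a Series indexed by the (city, room_type) pairs that occur
pairs = df.groupby(["city", "room_type"]).size()
print(pairs)

# ("B", "Z") never occurs, so it is absent from the long form
has_bz = ("B", "Z") in pairs.index

# Plain unstack() leaves NaN in the missing cell; fill_value=0 writes 0 instead
wide_nan = pairs.unstack()
wide = pairs.unstack(fill_value=0)
print(wide)
```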

df = (df.groupby('city')['room_type'].value_counts()
        .unstack(fill_value=0)
        .rename(columns=lambda x: 'count({})'.format(x)))
print(df)
room_type  count(X)  count(Y)  count(Z)
city                                   
A                 1         1         1
B                 1         2         0
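
On pandas 1.1 or later, DataFrame.value_counts can count the (city, room_type) rows directly without a groupby. A minimal sketch, assuming such a pandas version; since the counts come back sorted by frequency, the result is re-sorted by label to get a stable layout:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# Counts each unique (city, room_type) row; result is a Series with a
# (city, room_type) MultiIndex, ordered by count rather than by label
counts = df.value_counts(subset=["city", "room_type"])

wide = (
    counts.unstack(fill_value=0)   # missing pairs such as (B, Z) become 0
    .sort_index()                  # rows in label order: A, B
    .sort_index(axis=1)            # columns in label order: X, Y, Z
    .rename(columns="count({})".format)
)
print(wide)
```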

Answer 2

Answered by piRSquared

A solution jezrael didn't give ;-)

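import pandas as pd

# Count each (city, room_type) row as a tuple (top-level pd.value_counts is deprecated)
s = pd.Series([tuple(i) for i in df.values.tolist()]).value_counts()
s.index = pd.MultiIndex.from_tuples(s.index.values, names=['city', None])
s.unstack(fill_value=0).rename(columns='count({})'.format).reset_index()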

  city  count(X)  count(Y)  count(Z)
0    A         1         1         1
1    B         1         2         0
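
In the tuple-counting approach, the zero for (B, Z) again comes from unstack(fill_value=0). As an alternative of our own (not part of the answer above), this sketch makes the zeros explicit by reindexing the counts onto every city/room-type combination built with pd.MultiIndex.from_product:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# Count each (city, room_type) tuple, as in the answer
s = pd.Series([tuple(row) for row in df.values.tolist()]).value_counts()
s.index = pd.MultiIndex.from_tuples(s.index, names=["city", "room_type"])

# Every combination of city and room type, in sorted label order
full = pd.MultiIndex.from_product(
    [sorted(df["city"].unique()), sorted(df["room_type"].unique())],
    names=["city", "room_type"],
)

# Combinations that never occurred get an explicit 0
s_full = s.reindex(full, fill_value=0)
print(s_full)
```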

More involved

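import numpy as np
import pandas as pd

cities = pd.unique(df.city)
room_types = pd.unique(df.room_type)
# Zero-filled count table: one row per city, one column per room type
d1 = pd.DataFrame(
    np.zeros((len(cities), len(room_types)), dtype=int),
    index=cities,
    columns=room_types,
)
# Walk the rows and increment the matching cell
# (set_value/get_value were removed in pandas 1.0; .at replaces them)
for r, c in df.values:
    d1.at[r, c] += 1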

d1.rename(columns='count({})'.format).rename_axis('city').reset_index()
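
The row-by-row loop can also be vectorized with the same zero-matrix idea. This sketch swaps in pd.factorize to turn labels into integer codes and numpy's np.add.at to increment all cells at once; it is an illustration of the technique, not code from the answer:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "B"],
    "room_type": ["X", "Y", "Z", "X", "Y", "Y"],
})

# factorize maps each label to an integer code, in order of first appearance
city_codes, cities = pd.factorize(df["city"])
room_codes, room_types = pd.factorize(df["room_type"])

# Zero-filled count matrix; np.add.at is unbuffered, so a repeated
# pair such as (B, Y) is incremented once per occurrence
counts = np.zeros((len(cities), len(room_types)), dtype=int)
np.add.at(counts, (city_codes, room_codes), 1)

d2 = pd.DataFrame(counts, index=cities, columns=room_types)
result = (
    d2.rename(columns="count({})".format)
    .rename_axis("city")
    .reset_index()
)
print(result)
```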

Variation of the first solution

from collections import Counter

pd.Series(
    Counter(map(tuple, df.values.tolist()))
).unstack(fill_value=0).rename(
    columns='count({})'.format
).rename_axis('city').reset_index()
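
The Counter idea also works without pandas. A minimal dependency-free sketch, assuming the rows are available as (city, room_type) tuples; a Counter returns 0 for keys it has never seen, which supplies the zero for (B, Z):

```python
from collections import Counter

rows = [("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X"), ("B", "Y"), ("B", "Y")]

# Count each (city, room_type) pair
pair_counts = Counter(rows)

# Distinct labels, sorted, so every city gets a column for every room type
cities = sorted({city for city, _ in rows})
room_types = sorted({room for _, room in rows})

header = ["city"] + ["count({})".format(r) for r in room_types]
# Indexing a Counter with a missing key returns 0 without inserting it
table = [[c] + [pair_counts[(c, r)] for r in room_types] for c in cities]

print(header)
for line in table:
    print(line)
```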
