pandas: count occurrences of a value based on another column

Question

Asked by Niche.P

I have a question about creating a pandas dataframe from the number of times each value occurs in a column of another dataframe.

For example, I have this dataframe:

Country   | Accident
England     Car
England     Car
England     Car
USA         Car
USA         Bike
USA         Plane
Germany     Car
Thailand    Plane

I want to build another dataframe that holds the total number of accidents per country. The type of accident should be ignored; only the number of rows for each country matters.

My desired dataframe would look like this:

Country   | Sum of Accidents
England     3
USA         3
Germany     1
Thailand    1

Answer 1

Answered by piRSquared

Option 1
Use value_counts


df.Country.value_counts().reset_index(name='Sum of Accidents')

Option 2
Use groupby then size

df.groupby('Country').size().sort_values(ascending=False) \
  .reset_index(name='Sum of Accidents')
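
To try both options end to end, here is a minimal sketch that rebuilds the sample data from the question. The variable names by_value_counts and by_size are only illustrative, and the rename_axis call is an addition of mine (not part of the answer) so that the first column is named Country regardless of the pandas version in use.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Country": ["England", "England", "England", "USA",
                "USA", "USA", "Germany", "Thailand"],
    "Accident": ["Car", "Car", "Car", "Car",
                 "Bike", "Plane", "Car", "Plane"],
})

# Option 1: value_counts counts the rows per distinct country and
# returns them sorted from most to least frequent.
# rename_axis is an assumption added here to pin the column name.
by_value_counts = (
    df["Country"]
    .value_counts()
    .rename_axis("Country")
    .reset_index(name="Sum of Accidents")
)

# Option 2: groupby + size counts the rows in each group; the result
# is sorted explicitly because groupby orders by key, not by count.
by_size = (
    df.groupby("Country")
    .size()
    .sort_values(ascending=False)
    .reset_index(name="Sum of Accidents")
)

print(by_value_counts)
print(by_size)
```

Both results hold the same counts per country. The relative order of countries with equal counts (England and USA, Germany and Thailand) may differ between the two.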

Answer 2

Answered by Kamehameha

You can use the groupby method. Note that the example below uses lowercase column names (country and accident).

Example -


In [36]: df.groupby(["country"]).count().sort_values(["accident"], ascending=False).rename(columns={"accident" : "Sum of accidents"}).reset_index()
Out[36]:
    country  Sum of accidents
0   England                 3
1       USA                 3
2   Germany                 1
3  Thailand                 1

Explanation -


(
    df.groupby(["country"])                              # Group the rows by country
      .count()                                           # Count the non-null values in each remaining column, per country
      .sort_values(["accident"], ascending=False)        # Sort by the count, largest first
      .rename(columns={"accident": "Sum of accidents"})  # Rename the count column
      .reset_index()                                     # Move country out of the index; without this it stays as the index
)
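
A runnable sketch of the chain above, assuming the question's data with the lowercase column names used in this answer. The second half is a hypothetical variation (the missing value is not in the original data) that shows one way count() and size() can differ: count() skips missing values, while size() counts every row.

```python
import pandas as pd

# Assumed sample data: the question's table with lowercase column names.
df = pd.DataFrame({
    "country": ["England", "England", "England", "USA",
                "USA", "USA", "Germany", "Thailand"],
    "accident": ["Car", "Car", "Car", "Car",
                 "Bike", "Plane", "Car", "Plane"],
})

result = (
    df.groupby("country")
    .count()                                  # non-null accident values per country
    .sort_values("accident", ascending=False)
    .rename(columns={"accident": "Sum of accidents"})
    .reset_index()
)
print(result)

# Hypothetical variation: blank out the accident type of one England row.
df_missing = df.copy()
df_missing.loc[0, "accident"] = None

by_count = df_missing.groupby("country")["accident"].count()  # ignores the missing value
by_size = df_missing.groupby("country").size()                # counts every row

print(by_count["England"], by_size["England"])
```

If the column being counted can contain missing values and every row should count as an accident, size() is the safer choice.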
