pandas: aggregating data and getting sums and counts

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/34475239/

Time: 2020-09-14 00:25:58  Source: igfitidea

Aggregating data and getting sum and counts

Tags: python, pandas, group-by, aggregate

Asked by John Doe

I have an object in Python with a lot of rows:

INPUT:

    Team1     Player1     idTrip13     133
    Team2     Player333   idTrip10     18373
    Team3     Player22    idTrip12     17338899
    Team2     Player293   idTrip02     17656
    Team3     Player20    idTrip11     1883
    Team1     Player1     idTrip19     19393

and I need to aggregate this data (like a pivot table).

OUTPUT I am working toward:

Team1   Player1 : 2 trips : sum(133+19393)
Team2   Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3   Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883

Could someone suggest the appropriate object in Python to use such that I could have the following output?

print(team, player, trips, time)

Answered by ilyas patanam

Use the groupby() function on pandas DataFrames.

  1. Put your data into a list of lists; each inner list becomes a row in the DataFrame.

    In[1]:
    
    import pandas as pd
    
    mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
    ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
    
    df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])
    
    df
    Out[1]:
         team    player       trips      time
    0   Team1   Player1     idTrip13    133
    1   Team2   Player333   idTrip10    18373
    2   Team3   Player22    idTrip12    17338899
    3   Team2   Player293   idTrip02    17656
    4   Team3   Player20    idTrip11    1883
    5   Team1   Player1     idTrip19    19393
    
  2. Call groupby(), pass the column you wish to use as your grouper, and apply a function to the groups.


Examples

Ex. 1: Find the number of trips each team went on. team is the grouper, and we apply the function count() on column ['trips'].

    In[2]:
    trip_count = df.groupby(by=['team'])['trips'].count()

    trip_count
    Out[2]:
    team
    Team1    2
    Team2    2
    Team3    2
    Name: trips, dtype: int64
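One detail worth knowing about Ex. 1: count() counts only non-null values in the selected column, while size() counts every row in the group. The sketch below uses a small hypothetical frame (df_missing, not part of the original data) in which one trip id is missing, to show where the two differ.

```python
import pandas as pd

# Hypothetical data for illustration: the second Team1 row has no trip id.
df_missing = pd.DataFrame({
    'team': ['Team1', 'Team1', 'Team2'],
    'trips': ['idTrip13', None, 'idTrip10'],
})

grouped = df_missing.groupby('team')['trips']

non_null_trips = grouped.count()  # skips the row whose trip id is missing
all_rows = grouped.size()         # counts every row in each group

print(non_null_trips)
print(all_rows)
```

With the data from the question the two give the same result, because no trip id is missing.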

Ex. 2 (multiple columns): Find the total time each player on a team spent traveling. We use two columns, ['team', 'player'], as the grouper, and apply the function sum() on column ['time'].

    In[3]:
    trip_time = df.groupby(by=['team', 'player'])['time'].sum()

    trip_time
    Out[3]:
    team   player
    Team1  Player1         19526
    Team2  Player293       17656
           Player333       18373
    Team3  Player20         1883
           Player22     17338899
    Name: time, dtype: int64

Ex. 3 (multiple functions): For each player on a team, find the total number of trips and the total time spent traveling. We pass agg() a dict that maps each column to the function to apply to it.

    In[4]:
    player_total = df.groupby(by=['team', 'player']).agg({'trips': 'count', 'time': 'sum'})

    player_total
    Out[4]:
                     trips  time
    team    player
    Team1   Player1     2   19526
    Team2   Player293   1   17656
            Player333   1   18373
    Team3   Player20    1   1883
            Player22    1   17338899
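To get closer to the line-by-line output the question asks for (team, player, trips, time), one option is to flatten the grouped result and iterate over its rows. This is a minimal sketch, not part of the original answer: it assumes pandas 0.25 or later for the named-aggregation syntax, and the names summary and format_row are illustrative only.

```python
import pandas as pd

# Same sample rows as in the question.
mydata = [
    ['Team1', 'Player1', 'idTrip13', 133],
    ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899],
    ['Team2', 'Player293', 'idTrip02', 17656],
    ['Team3', 'Player20', 'idTrip11', 1883],
    ['Team1', 'Player1', 'idTrip19', 19393],
]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: output column = (input column, function).
# reset_index() turns the (team, player) index back into ordinary columns.
summary = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()
)

def format_row(row):
    # Illustrative helper: one text line per (team, player) pair.
    return f"{row.team} {row.player} : {row.trips} trips : {row.time}"

lines = [format_row(row) for row in summary.itertuples(index=False)]
for line in lines:
    print(line)
```

The first printed line is "Team1 Player1 : 2 trips : 19526", which matches the 133 + 19393 sum in the question. groupby() sorts the group keys by default, so the rows come out ordered by team and then by player.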