pandas: aggregating data and getting sums and counts
Note: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/34475239/
Aggregating data and getting sum and counts
Asked by John Doe
I have an object in Python with a lot of rows:
INPUT :
Team1 Player1 idTrip13 133
Team2 Player333 idTrip10 18373
Team3 Player22 idTrip12 17338899
Team2 Player293 idTrip02 17656
Team3 Player20 idTrip11 1883
Team1 Player1 idTrip19 19393
and I need to aggregate this data (like a pivot table).
OUTPUT I am working on:
Team1 Player1 : 2 trips : sum(133+19393)
Team2 Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3 Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883
Could someone suggest the appropriate object in Python to use such that I could have the following output?
print team, player, trips, time
Answered by ilyas patanam
Use the groupby function for pandas DataFrames
Put your data into a list of lists; each inner list will be a row in the DataFrame.
In[1]:
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133],
          ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899],
          ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883],
          ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
df
Out[1]:
    team     player     trips      time
0  Team1    Player1  idTrip13       133
1  Team2  Player333  idTrip10     18373
2  Team3   Player22  idTrip12  17338899
3  Team2  Player293  idTrip02     17656
4  Team3   Player20  idTrip11      1883
5  Team1    Player1  idTrip19     19393
Call groupby(), pass the column you wish to use as your grouper, and apply a function to the groups.
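To see what the grouper does before any function is applied, here is a minimal sketch that iterates over the groups produced by groupby('team'). It rebuilds the same DataFrame so it can run on its own; the names team_sizes and rows are only illustrative and not part of the original answer.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133],
          ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899],
          ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883],
          ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Iterating over a GroupBy object yields (group key, sub-DataFrame) pairs.
# Grouping by a single column name gives a plain string key.
team_sizes = {}  # illustrative name: number of rows per team
for team, rows in df.groupby('team'):
    team_sizes[team] = len(rows)
    print(team, list(rows['player']))
```

Each sub-DataFrame holds only the rows for one team, which is what count(), sum(), or agg() then operate on.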
Examples
Ex. 1: Find the number of trips each team went on. team is the grouper, and we apply the function count() on column ['trips'].
In[2]:
trip_count = df.groupby(by = ['team'])['trips'].count()
trip_count
Out[2]:
team
Team1 2
Team2 2
Team3 2
Name: trips, dtype: int64
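One detail worth knowing about count(): it counts only non-missing values in the selected column, while size() counts rows regardless of missing values. The sketch below uses a small hypothetical table (not the data from the question) in which one trip id is missing, so the two results differ.

```python
import pandas as pd

# Hypothetical data: the Team2 row has a missing trip id.
rows = [['Team1', 'idTrip13'], ['Team1', 'idTrip19'], ['Team2', None]]
sample = pd.DataFrame(rows, columns=['team', 'trips'])

# count() skips missing values in 'trips'.
non_null_trips = sample.groupby('team')['trips'].count()

# size() counts every row in each group.
all_rows = sample.groupby('team').size()

print(non_null_trips)
print(all_rows)
```

With the data from the question there are no missing values, so either call would give the same counts.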
Ex. 2 (multiple columns): Find the total time each player on a team spent traveling. We use 2 columns ['team', 'player'] as the grouper, and apply the function sum() on column ['time'].
In[3]:
trip_time = df.groupby(by = ['team', 'player'])['time'].sum()
trip_time
Out[3]:
team player
Team1 Player1 19526
Team2 Player293 17656
Player333 18373
Team3 Player20 1883
Player22 17338899
Name: time, dtype: int64
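The result of grouping by two columns is a Series indexed by (team, player). As a rough sketch, assuming the same data as in In[1], this is one way to read a single value from it and to flatten it back into ordinary columns; player1_time and flat are illustrative names.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133],
          ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899],
          ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883],
          ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

trip_time = df.groupby(by=['team', 'player'])['time'].sum()

# Look up one (team, player) pair in the two-level index.
player1_time = trip_time.loc[('Team1', 'Player1')]

# Turn the index levels back into regular columns: team, player, time.
flat = trip_time.reset_index()
print(player1_time)
print(flat)
```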
Ex. 3 (multiple functions): For each player on a team, find the total number of trips and total time spent traveling.
In[4]:
player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})
player_total
Out[4]:
trips time
team player
Team1 Player1 2 19526
Team2 Player293 1 17656
Player333 1 18373
Team3 Player20 1 1883
Player22 1 17338899
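To get closer to the "print team, player, trips, time" output asked for in the question, one possible sketch is shown below. It uses named aggregation, which needs pandas 0.25 or later, and then walks over the flattened result; summary and lines are illustrative names, not part of the original answer.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133],
          ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899],
          ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883],
          ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: output column = (input column, function).
summary = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()
)

# One line per (team, player): team, player, number of trips, total time.
lines = []
for row in summary.itertuples(index=False):
    lines.append(f"{row.team} {row.player} {row.trips} {row.time}")
print("\n".join(lines))
```

Named aggregation also fixes the order of the output columns, which a plain dict passed to agg() may not do on older pandas versions.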