Note: this page is a translation of a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Original question: http://stackoverflow.com/questions/20431717/
pandas dataframe groupby: sum/count of only positive numbers
Asked by Alexis Eggermont
I have a dataframe ('frame') on which I want to aggregate by Country and Date:
aggregated = pd.DataFrame(frame.groupby(['Country', 'Date']).CaseID.count())
aggregated["Total duration"] = frame.groupby(['Country', 'Date']).Hours.sum()
aggregated["Mean duration"] = frame.groupby(['Country', 'Date']).Hours.mean()
I want to compute the above figures (total duration, mean duration, etc.) only for the positive 'Hours' numbers in 'frame'. How can I do that?
Thanks!
Sample "frame"
import pandas as pd
Line1 = {"Country": "USA", "Date": "01 jan", "Hours": 4}
Line2 = {"Country": "USA", "Date": "01 jan", "Hours": 3}
Line3 = {"Country": "USA", "Date": "01 jan", "Hours": -999}
Line4 = {"Country": "Japan", "Date": "01 jan", "Hours": 3}
frame = pd.DataFrame([Line1, Line2, Line3, Line4])
Answered by alko
Not as elegant as the other answer, but it handles some corner cases differently. Here df stands for frame from the original question.
>>> df.groupby(['Country','Date']).agg(lambda x: x[x>0].mean())
                Hours
Country Date
Japan   01 jan    3.0
USA     01 jan    3.5
>>> df.loc[3, 'Hours'] = -1
>>> df.groupby(['Country','Date']).agg(lambda x: x[x>0].mean())
                Hours
Country Date
Japan   01 jan    NaN
USA     01 jan    3.5
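The corner case in the second call is a group with no positive values at all: it stays in the result with NaN instead of disappearing. A minimal sketch of the same idea without a lambda, assuming it is acceptable to mask non-positive values with Series.where before grouping (the names positive_hours and masked are illustrative and not from the original answer):

```python
import pandas as pd

# Sample data from the question, with Japan's only value made negative
# to reproduce the corner case shown above.
df = pd.DataFrame(
    [
        {"Country": "USA", "Date": "01 jan", "Hours": 4},
        {"Country": "USA", "Date": "01 jan", "Hours": 3},
        {"Country": "USA", "Date": "01 jan", "Hours": -999},
        {"Country": "Japan", "Date": "01 jan", "Hours": -1},
    ]
)

# Replace non-positive values with NaN instead of dropping the rows,
# so every (Country, Date) group is still present after grouping.
positive_hours = df["Hours"].where(df["Hours"] > 0)

masked = (
    df.assign(Hours=positive_hours)
    .groupby(["Country", "Date"])["Hours"]
    .agg(["count", "sum", "mean"])
)
print(masked)
```

With this approach the Japan group is kept: its count is 0, its mean is NaN, and its sum is 0 because the grouped sum skips NaN values by default.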
Answered by kgu87
How about:
frame[frame["Hours"] > 0].groupby(['Country','Date'])
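The line above only builds the grouped object, so an aggregation step still has to follow. A sketch of how the filter could be combined with the figures from the question, assuming the count is taken over the Hours column because the sample frame has no CaseID column (the names positive and aggregated, and the "Count" label, are illustrative):

```python
import pandas as pd

# Sample data from the question.
frame = pd.DataFrame(
    [
        {"Country": "USA", "Date": "01 jan", "Hours": 4},
        {"Country": "USA", "Date": "01 jan", "Hours": 3},
        {"Country": "USA", "Date": "01 jan", "Hours": -999},
        {"Country": "Japan", "Date": "01 jan", "Hours": 3},
    ]
)

# Keep only the rows with positive Hours, then aggregate per group.
positive = frame[frame["Hours"] > 0]
aggregated = (
    positive.groupby(["Country", "Date"])["Hours"]
    .agg(["count", "sum", "mean"])
    .rename(
        columns={
            "count": "Count",
            "sum": "Total duration",
            "mean": "Mean duration",
        }
    )
)
print(aggregated)
```

Because the rows are filtered before grouping, a group whose values are all non-positive drops out of the result entirely, which is where this approach differs from the lambda-based answer.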

