Pandas - Get value as frequency in groupby
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors: StackOverFlow
Original URL: http://stackoverflow.com/questions/37604496/
Asked by jwillis0720
Can someone help me with a (possible) groupby in pandas?
Here is the df:
    easy_donor  v_fam     count
0   donor_1_NS  IGHV1   5202376
1   donor_1_NS  IGHV2   1955547
2   donor_1_NS  IGHV3  70426272
3   donor_1_NS  IGHV4    452367
4   donor_1_NS  IGHV5   4842145
5   donor_1_NS  IGHV6    490142
6   donor_1_NS  IGHV7     19708
24  donor_2_NS  IGHV1  31258603
25  donor_2_NS  IGHV2   5295899
26  donor_2_NS  IGHV3  47286417
27  donor_2_NS  IGHV4  44553802
I want each count expressed as a frequency, that is, as a fraction of the summed counts for its donor.
For example, the per-donor sums are:
df.groupby('easy_donor').sum()['count']
easy_donor
donor_1_NS     83394639
donor_2_NS    129191591
donor_3_HS    220549762
donor_3_NS    104821016
donor_4_HS    200444923
donor_4_NS    121287306
Each count in the original data frame should then be divided by the groupby sum whose easy_donor value matches its row. Do I have to join the sums back onto the original dataframe?
Answered by piRSquared
Try:
df.groupby('easy_donor')["count"].apply(lambda x: x / x.sum())
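To make the one-liner above concrete, here is a minimal sketch on a four-row subset of the question's data (so the resulting fractions differ from those of the full table). The freq column name is an assumption chosen for illustration. Passing group_keys=False is also an addition, not part of the original answer: in recent pandas versions it keeps the result on the original row index so it can be assigned straight back to df.

```python
import pandas as pd

# Four rows taken from the question's table (illustrative subset only).
df = pd.DataFrame({
    "easy_donor": ["donor_1_NS", "donor_1_NS", "donor_2_NS", "donor_2_NS"],
    "v_fam": ["IGHV1", "IGHV2", "IGHV1", "IGHV2"],
    "count": [5202376, 1955547, 31258603, 5295899],
})

# Divide every count by the total of its own donor group.
# group_keys=False (an assumption added here) keeps the original index,
# so the result aligns row by row with df.
freq = (
    df.groupby("easy_donor", group_keys=False)["count"]
      .apply(lambda x: x / x.sum())
)

# "freq" is a hypothetical column name, not from the original post.
df["freq"] = freq
print(df)
```

Within each donor the freq values add up to 1, so no explicit join against the grouped sums is needed.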
Answered by cinqS
FORGET THIS ANSWER!!! THIS IS JUST AN IDEA. NOT VIABLE
Note that using pandas apply is unbearably slow. Instead, try using native broadcasting.
df.groupby(by='easy_donor')['count'] * 1. / df.groupby(by='easy_donor').sum()
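As the author's own warning suggests, the expression above is not expected to run: a groupby object is not a Series and, as far as I know, does not support arithmetic such as multiplication or division. One vectorized alternative, which is not from the original answer, is transform("sum"): it returns the per-donor total repeated on every row of the original index, so an ordinary column division works. The sample rows and the freq column name below are assumptions for illustration.

```python
import pandas as pd

# Four rows taken from the question's table (illustrative subset only).
df = pd.DataFrame({
    "easy_donor": ["donor_1_NS", "donor_1_NS", "donor_2_NS", "donor_2_NS"],
    "v_fam": ["IGHV1", "IGHV2", "IGHV1", "IGHV2"],
    "count": [5202376, 1955547, 31258603, 5295899],
})

# transform("sum") broadcasts each group's total back onto its rows,
# giving a Series with the same index and length as df.
totals = df.groupby("easy_donor")["count"].transform("sum")

# Plain element-wise division; no Python-level lambda per group.
# "freq" is a hypothetical column name, not from the original post.
df["freq"] = df["count"] / totals
print(df)
```

This avoids both apply and an explicit join, since the totals are already aligned with the original rows.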