Original source: http://stackoverflow.com/questions/21725983/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Pandas calculating percentage in data summary
Asked by user3084006
Let's say I have a dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
                   'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A']})
Where S is sealed bids, N is people who did not bid, NaN is not present, and O is open bid.
I want to calculate the percentage of bidders, where the equation would be (E+A)/(E+A+N). Is the best way to build a pivot table and then apply the equation?
df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
                 'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
pt = df.pivot_table(index='Location', columns='Bid', aggfunc='size', fill_value=0)
pt['Percentage'] = (pt.A + pt.E) / (pt.A + pt.E + pt.N)
print(pt)
>>> 
Bid       A  E  N  Percentage
Location                     
Ala       0  3  1    0.750000
SS        0  1  2    0.333333
TXE       1  0  0    1.000000
[3 rows x 4 columns]
Is this the best way to calculate the percentage, or is there a better way than pivot tables?
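For comparison, here is a minimal sketch of the same (E+A)/(E+A+N) calculation using `pd.crosstab` instead of `pivot_table`. It assumes the sample data from the question; the name `ct` is only illustrative and does not come from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Count each bid type per location; rows whose bid is NaN do not
# contribute to any count with the default settings.
ct = pd.crosstab(df['Location'], df['Bid'])

# (E + A) / (E + A + N), computed row by row.
ct['Percentage'] = (ct['A'] + ct['E']) / (ct['A'] + ct['E'] + ct['N'])
print(ct)
```

With this data the Percentage column should match the pivot table output above (0.75, about 0.333, and 1.0).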
Answered by jmz
Perhaps this isn't general enough, but you can get the percentages with
counts = df['Bid'].value_counts(normalize=True)
Then finding (E+A) as a percentage of all bids is as simple as
counts.E + counts.A
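As a quick worked check, assuming the sample bids from the question, the sketch below computes that overall share. It uses `counts['E']` rather than `counts.E`, which is just a stylistic choice here; the names `bids` and `bidder_share` are illustrative.

```python
import numpy as np
import pandas as pd

bids = pd.Series(['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'])

# Proportion of each bid type among the non-missing bids
# (4 x E, 3 x N, 1 x A out of 8).
counts = bids.value_counts(normalize=True)

# Share of bidders across all locations: (E + A) / (E + A + N).
bidder_share = counts['E'] + counts['A']
print(bidder_share)  # 0.625
```

Note that this is a single figure for the whole column, not a per-location breakdown.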
If you don't want to include NaN bids in the percentage calculation, then
counts = df['Bid'].dropna().value_counts(normalize=True)
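One detail worth checking: `value_counts` has a `dropna` parameter that defaults to `True`, so the plain call already leaves NaN out of the denominator and the explicit `dropna()` should give the same proportions. The sketch below, on the question's sample bids, compares the two and shows `dropna=False` as one way to count missing bids instead. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

bids = pd.Series(['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'])

# Default behaviour: NaN is skipped, so proportions are out of 8 bids.
without_nan = bids.value_counts(normalize=True)

# Same result with an explicit dropna() first.
explicit_drop = bids.dropna().value_counts(normalize=True)

# dropna=False keeps NaN as its own category, so proportions are out of 10.
with_nan = bids.value_counts(normalize=True, dropna=False)

print(without_nan['E'] + without_nan['A'])  # share among non-missing bids
print(with_nan['E'] + with_nan['A'])        # share among all rows, NaN included
```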
and, if there are other bid types you need to exclude,
all_allowable = df['Bid'].isin(['E', 'A', 'N'])
counts = df[all_allowable]['Bid'].value_counts(normalize=True)
To split by location:
all_allowable = df['Bid'].isin(['E', 'A', 'N'])
df[all_allowable].groupby('Location')['Bid'].value_counts(normalize=True)
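The grouped `value_counts` returns one proportion per (Location, Bid) pair, so E and A still need to be added together for each location. A minimal sketch of one way to do that, assuming the question's sample data, is to unstack the bid level into columns; `shares` and `bidder_pct` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Keep only the bid types that belong in the denominator.
all_allowable = df['Bid'].isin(['E', 'A', 'N'])

# Proportion of each bid type within each location, with bid types as
# columns; combinations that never occur are filled with 0.
shares = (
    df[all_allowable]
    .groupby('Location')['Bid']
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# (E + A) / (E + A + N) per location.
bidder_pct = shares['E'] + shares['A']
print(bidder_pct)
```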
Answered by LondonRob
Your answer looks pretty good to me. It's very readable, which is obviously important.
If you want an alternative, you could look at groupby, but, as I said, I think your own answer looks great:
>>> df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
...                  'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
>>> df = df.set_index('Location')
>>> ean = df.groupby(level='Location').count()
>>> ea = df[df != 'N'].groupby(level='Location').count()
>>> ea.astype(float) / ean
               Bid
Location          
Ala       0.750000
SS        0.333333
TXE       1.000000
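Another way to express the same groupby idea, offered only as a sketch: flag each non-missing bid as bidder or non-bidder and take the mean of that flag per location, since the mean of a boolean column is the fraction of True values. This assumes the question's sample data and that E and A are the only bidder codes; `valid`, `is_bidder`, and `pct` are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Drop rows with a missing bid so they do not count in the denominator.
valid = df.dropna(subset=['Bid'])

# True for bidders (E or A), False for non-bidders (N).
is_bidder = valid['Bid'].isin(['E', 'A'])

# Fraction of bidders per location.
pct = is_bidder.groupby(valid['Location']).mean()
print(pct)
```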

