Pandas: calculating percentages in a data summary

Question

Asked by user3084006

Let's say I have a dataframe:

df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
                 'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})

Where S is sealed bids, N is people who did not bid, NaN means not present, and O is open bid.

I want to calculate the percentage of bidders, where the formula is (E+A)/(E+A+N). Is the best way to build a pivot table and then apply the formula?

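import numpy as np
import pandas as pd

df = pd.DataFrame({'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
                   'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A']})

# Count rows per (Location, Bid) pair; NaN bids do not get a column.
# Older pandas spelled these arguments rows=/cols=; current releases use index=/columns=.
pt = df.pivot_table(index='Location', columns='Bid', aggfunc='size', fill_value=0)

# Share of bidders: (E + A) / (E + A + N)
pt['Percentage'] = (pt.A + pt.E) / (pt.A + pt.E + pt.N)
print(pt)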

>>> 
Bid       A  E  N  Percentage
Location                     
Ala       0  3  1    0.750000
SS        0  1  2    0.333333
TXE       1  0  0    1.000000

[3 rows x 4 columns]

Is this the best way to calculate the percentage, or is there a better way than pivot tables?

Answer 1

Answered by jmz

Perhaps this isn't general enough, but you can get the percentages with the following (df3 here is the dataframe from the question):

counts = df3['Bid'].value_counts(normalize=True)

Then finding (E+A) as a fraction of all bids is as simple as

counts.E + counts.A
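
As a minimal sketch of the two lines above on the question's sample data: the name df3 follows this answer, and the use of Series.get with a default of 0 is an addition here (not part of the original answer) so that the sum does not fail when a bid type is absent from the data.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the name df3 follows this answer.
df3 = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Relative frequency of each bid type (NaN is skipped by default).
counts = df3['Bid'].value_counts(normalize=True)

# Assumption: .get(label, 0) is used instead of attribute access so a
# missing label contributes 0 rather than raising an error.
bidder_share = counts.get('E', 0) + counts.get('A', 0)
print(bidder_share)  # 4 E + 1 A out of 8 non-missing bids -> 0.625
```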

value_counts already skips NaN by default (dropna=True), but if you want to make the exclusion of NaN bids explicit in the percentage calculation, then

counts = df3['Bid'].dropna().value_counts(normalize=True)

and, if there are other bid types you need to exclude:

all_allowable = df3['Bid'].isin(['E', 'A', 'N'])
counts = df3[all_allowable]['Bid'].value_counts(normalize=True)

To split by location:

all_allowable = df3['Bid'].isin(['E', 'A', 'N'])    
df3[all_allowable].groupby('Location')['Bid'].value_counts(normalize=True)
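
The per-location result above is a long Series of shares per (Location, Bid) pair. One possible way to collapse it into the (E+A)/(E+A+N) figure the question asks for is sketched below. The unstack and reindex steps, and the name bidder_share, are illustrative additions rather than part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the name df3 follows this answer.
df3 = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Keep only the bid types that enter the formula.
all_allowable = df3['Bid'].isin(['E', 'A', 'N'])

# Share of each bid type within each location, pivoted to one column per type.
shares = (
    df3[all_allowable]
    .groupby('Location')['Bid']
    .value_counts(normalize=True)
    .unstack(fill_value=0)
)

# Sum the E and A shares; reindex guards against a type missing entirely.
bidder_share = shares.reindex(columns=['E', 'A'], fill_value=0).sum(axis=1)
print(bidder_share)
```

On the sample data this should agree with the Percentage column of the pivot table in the question.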

Answer 2

Answered by LondonRob

Your answer looks pretty good to me. It's very readable, which is obviously important.

If you want an alternative, you could look at groupby, but, as I said, I think your own answer looks great:

>>> df=pd.DataFrame({'Location': [ 'Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
...                  'Bid': ['E','N','E','N','N','E', 'E',np.nan,np.nan,'A']})
>>> df = df.set_index('Location')
>>> ean = df.groupby(level='Location').count()
>>> ea = df[df != 'N'].groupby(level='Location').count()
>>> ea.astype(float) / ean
               Bid
Location          
Ala       0.750000
SS        0.333333
TXE       1.000000
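
A variation on the same groupby idea, sketched here under the assumption that E, A and N are the only non-missing bid codes: the mean of a boolean "is a bidder" flag per location is the fraction (E+A)/(E+A+N). The names valid, is_bidder and pct are illustrative and do not come from the answer above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Location': ['Ala', 'SS', 'Ala', 'Ala', 'SS', 'Ala', 'SS', 'TXE', 'TXE', 'TXE'],
    'Bid': ['E', 'N', 'E', 'N', 'N', 'E', 'E', np.nan, np.nan, 'A'],
})

# Drop rows with no bid information so they do not enter the denominator.
valid = df.dropna(subset=['Bid'])

# True for E or A, False otherwise (assumed to mean N in this data).
is_bidder = valid['Bid'].isin(['E', 'A'])

# The mean of a boolean Series is the fraction of True values per group.
pct = is_bidder.groupby(valid['Location']).mean()
print(pct)
```

If other bid codes can appear, they would count as non-bidders here, so filter them out first if that is not what you want.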
