Pandas Percentage count on a DataFrame groupby
Source: http://stackoverflow.com/questions/32122300/ (Stack Overflow). This content is provided under the CC BY-SA 4.0 license; you are free to use and share it, but you must attribute it to the original authors.
Asked by MikG
I have a DataFrame (mydf) along the lines of the following:
Index  Feature  ID  Stuff1  Stuff2
1      True     1   23      12
2      True     1   54      12
3      False    0   45      67
4      True     0   38      29
5      False    1   32      24
6      False    1   59      39
7      True     0   37      32
8      False    0   76      65
9      False    1   32      12
10     True     0   23      15
..n    True     1   21      99
I am trying to calculate the True and False percentages of Feature for each ID (0 or 1), and I am looking for two outputs, one for each ID:
Feature  ID  Percent
True     1   20%
False    1   30%

Feature  ID  Percent
True     0   30%
False    0   20%
I have made a few attempts, but I end up with counts for all columns and then a percentage for all columns.
Here's my bad attempt:
percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()
print((percentageID0 / fullcount) * 100)
print((percentageID1 / fullcount) * 100)
I think I am getting mixed up with the groupby/index format.
Answered by CT Zhu
Could be just this:
In [73]:
print(pd.DataFrame({'Percentage': df.groupby(['ID', 'Feature']).size() / len(df)}))
            Percentage
ID Feature
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
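A self-contained sketch of the same idea, assuming Python 3 and a reasonably recent pandas. The sample frame is rebuilt from the ten rows shown in the question, and the names `share` and `result` are only illustrative. It counts rows per (ID, Feature) pair, divides by the total row count, and flattens the result into a table with a Percent column:

```python
import pandas as pd

# Sample data copied from the first ten rows shown in the question.
mydf = pd.DataFrame({
    'Feature': [True, True, False, True, False, False, True, False, False, True],
    'ID':      [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# Count rows per (ID, Feature) pair and divide by the total number of rows.
share = mydf.groupby(['ID', 'Feature']).size() / len(mydf)

# Convert the fractions to percentages and flatten the MultiIndex into columns.
result = share.mul(100).rename('Percent').reset_index()
print(result)
```

Filtering `result` on the ID column (for example `result[result['ID'] == 0]`) should give the separate per-ID tables the question asks for.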
Answered by TheBlackCat
You can use pd.crosstab:
>>> newdf = pd.crosstab(index=mydf['Feature'], columns=mydf['ID']).stack()/len(mydf)
>>> print(newdf)
Feature  ID
False    0     0.2
         1     0.3
True     0     0.3
         1     0.2
dtype: float64
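As a possible variant, not part of the original answer, `pd.crosstab` also accepts a `normalize` argument that can do the division itself. The sketch below assumes a pandas version that supports `normalize`, and again rebuilds the sample frame from the question. `normalize='all'` divides by the total row count, while `normalize='columns'` gives the share of each Feature value within each ID:

```python
import pandas as pd

# Sample data copied from the first ten rows shown in the question.
mydf = pd.DataFrame({
    'Feature': [True, True, False, True, False, False, True, False, False, True],
    'ID':      [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# Fraction of all rows that fall into each (Feature, ID) cell.
overall = pd.crosstab(index=mydf['Feature'], columns=mydf['ID'], normalize='all')

# Fraction of each ID's rows that have each Feature value (each column sums to 1).
within_id = pd.crosstab(index=mydf['Feature'], columns=mydf['ID'], normalize='columns')

print(overall)
print(within_id)
```

Which normalization is appropriate depends on whether the percentages should be relative to the whole frame, as in the question's expected output, or relative to each ID group.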
Answered by tomp
You could also use the tableone package for this. Create the sample dataframe:
import pandas as pd

# Create df with 10 rows.
df = pd.DataFrame({'Feature': [True,True,False,True,False,False,True,False,False,True],
                   'ID': [1,1,0,0,1,1,0,0,1,0],
                   'Stuff1': [23,54,45,38,32,59,37,76,32,23],
                   'Stuff2': [12,12,67,29,24,39,32,65,12,15]})
Input:
# Import the tableone package (v0.5.18)
from tableone import TableOne
# Create the table, specifying feature and id as categorical
TableOne(df, columns=['Feature','ID'],
         categorical=['Feature','ID'],
         label_suffix=True)
Output: (the resulting table was shown as an image, which is not preserved here)