Pandas percentage by value in a column
Source: http://stackoverflow.com/questions/50558458/
Note: this content comes from Stack Overflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.
Asked by SANM2009
I want to get the percentage of each value in a df column. Say I have a df with the columns (col1, col2, col3, gender), where the gender column holds the values M or F. I want to get the percentage of M and F values in the df.
I have tried the following, which gives me the number of M and F instances, but I want these as a percentage of the total number of values in the df.
df.groupby('gender').size()
Can someone help?
Answered by cs95
Use value_counts with normalize=True:
df['gender'].value_counts(normalize=True) * 100
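As a minimal sketch of the call above, assuming a small made-up DataFrame (the data and the col1 values are illustrative, only the gender column name comes from the question):

```python
import pandas as pd

# Hypothetical sample data; only the 'gender' column matters here
df = pd.DataFrame({
    "col1": [1, 2, 3, 4, 5],
    "gender": ["M", "F", "F", "M", "F"],
})

# normalize=True returns proportions instead of counts; multiply by 100 for percentages
percentages = df["gender"].value_counts(normalize=True) * 100
print(percentages.round(2))
```

The result is a Series indexed by the distinct values (here F and M), so a single share can be read with percentages["F"].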
Answered by student
If you only need the M and F values from the gender column, you can try value_counts() together with count(), as follows:
import pandas as pd

df = pd.DataFrame({'gender': ['M', 'M', 'F', 'F', 'F']})
# Percentage calculation: counts per value divided by the number of non-missing values
(df['gender'].value_counts() / df['gender'].count()) * 100
Result:
F 60.0
M 40.0
Name: gender, dtype: float64
Or, using groupby:
(df.groupby('gender').size()/df['gender'].count())*100
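The two variants in this answer should agree, including when the column has missing values. The sketch below assumes a made-up column with one missing entry to illustrate that point; the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F", None]})

# count() only counts non-missing values, and both value_counts() and
# groupby() skip missing keys by default, so shares are of non-missing rows
via_counts = df["gender"].value_counts() / df["gender"].count() * 100
via_groupby = df.groupby("gender").size() / df["gender"].count() * 100

print(via_counts.sort_index())
print(via_groupby.sort_index())

# To use every row as the denominator instead, divide by len(df)
share_of_all_rows = df["gender"].value_counts() / len(df) * 100
print(share_of_all_rows.sort_index())
```

Which denominator is appropriate depends on whether missing entries should count toward the total.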
Answered by Rohith Gunda
Let's say there are 200 values, of which 120 are categorized as M and 80 as F.
1)
df['gender'].value_counts()
output:
M=120
F=80
2)
df['gender'].value_counts(normalize=True)
output:
M=0.60
F=0.40
3)
df['gender'].value_counts(normalize=True) * 100  # converts the proportions to percentages
output:
M=60
F=40
Answered by Harshal SG
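# Assumes df['Gender'] is encoded as 0 (male) and 1 (female), so the mean is the female share
print('(Gender Male = 0):\n{}%'.format(round(100 - df['Gender'].mean() * 100, 2)))
print('(Gender Female = 1):\n{}%'.format(round(df['Gender'].mean() * 100, 2)))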
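This answer only works when the column is numeric with values 0 and 1, because the mean of such a column equals the share of ones. A small sketch under that assumption (the data and the 0 = male, 1 = female encoding are assumed, not stated in the answer):

```python
import pandas as pd

# Hypothetical data: Gender encoded as 0 = male, 1 = female (an assumption)
df = pd.DataFrame({"Gender": [0, 0, 1, 1, 1]})

# The mean of a 0/1 column is the fraction of ones
female_pct = round(df["Gender"].mean() * 100, 2)
male_pct = round(100 - female_pct, 2)
print(f"Male (0): {male_pct}%")
print(f"Female (1): {female_pct}%")

# For label columns such as 'M'/'F', compare first to get booleans, then take the mean
labels = pd.Series(["M", "M", "F", "F", "F"])
female_pct_from_labels = round((labels == "F").mean() * 100, 2)
print(female_pct_from_labels)
```

For columns with more than two distinct values, value_counts(normalize=True) is the more general choice.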
Answered by Ayyasamy
Find the percentage of each target value to check whether the classes are imbalanced.
g = data[Target_col_Y]
df = pd.concat([g.value_counts(),
                g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))
print(df)
   counts  percentage
0   36548   88.734583
1    4640   11.265417
Find the largest difference in the percentage column to check how much imbalance there is.
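df1 = df.diff(periods=1, axis=0)
# Rows are sorted by descending count, so the row-to-row differences are negative; use the absolute value
difvalue = df1['percentage'].abs().max()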
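The steps in this answer can be put together as a self-contained sketch. It assumes a made-up binary target built to match the counts in the output shown in this answer; the names target, summary and gap are hypothetical, and the gap is computed as max minus min rather than with diff, which is a substitution for illustration.

```python
import pandas as pd

# Hypothetical binary target reproducing the counts from the answer's output
target = pd.Series([0] * 36548 + [1] * 4640, name="y")

# Counts and percentages side by side; keys become the column names
summary = pd.concat(
    [target.value_counts(), target.value_counts(normalize=True).mul(100)],
    axis=1,
    keys=("counts", "percentage"),
)
print(summary)

# Gap between the largest and smallest class share, in percentage points
gap = summary["percentage"].max() - summary["percentage"].min()
print(round(gap, 2))
```

For a two-class target this gap equals the absolute row-to-row difference; with more classes, max minus min gives the spread between the most and least frequent class.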