Python: percentage of each value in a pandas column

Question

Asked by SANM2009

I want to get the percentage of a particular value in a df column. Say I have a df with columns (col1, col2, col3, gender), where the gender column has values of M or F. I want to get the percentage of M and F values in the df.

I have tried this, which gives me the number of M and F instances, but I want these as a percentage of the total number of values in the df.

df.groupby('gender').size()

Can someone help?

Answer 1

Answered by cs95

Use value_counts with normalize=True:

df['gender'].value_counts(normalize=True) * 100
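
As a minimal sketch of how this behaves, the example below uses a small made-up DataFrame (the sample values are an assumption, not data from the question). It also shows the `dropna=False` option of `value_counts`, which keeps missing values in the denominator instead of ignoring them.

```python
import pandas as pd

# Hypothetical sample data; None marks a missing gender value
df = pd.DataFrame({"gender": ["M", "F", "F", "M", "F", None]})

# Share of each value among the non-missing entries, as percentages
pct = df["gender"].value_counts(normalize=True) * 100
print(pct)

# Count missing values as their own category so they stay in the total
pct_with_missing = df["gender"].value_counts(normalize=True, dropna=False) * 100
print(pct_with_missing.round(2))
```

By default the missing row is dropped, so the percentages describe only the rows that have a gender value. Which denominator is appropriate depends on what "total number of values" should mean for your data.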

Answer 2

Answered by student

If you only need to look at the M and F values in the gender column, you can try using value_counts() and count() as follows:

import pandas as pd

df = pd.DataFrame({'gender': ['M', 'M', 'F', 'F', 'F']})
# Percentage calculation: per-value counts divided by the number of non-null values
(df['gender'].value_counts() / df['gender'].count()) * 100

Result:

F    60.0
M    40.0
Name: gender, dtype: float64

Or, using groupby:

(df.groupby('gender').size()/df['gender'].count())*100
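
A quick way to convince yourself that these variants agree is to compute all of them on the same data and compare. The sketch below reuses the five-row sample from this answer; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F"]})

# 1) Built-in normalization
via_normalize = df["gender"].value_counts(normalize=True).mul(100)

# 2) Counts divided by the number of non-null values
via_count = (df["gender"].value_counts() / df["gender"].count()).mul(100)

# 3) Group sizes divided by the number of non-null values
via_groupby = (df.groupby("gender").size() / df["gender"].count()).mul(100)

# Series arithmetic aligns on the index labels (F, M), so row order does not matter
same = (
    (via_normalize - via_count).abs().max() < 1e-9
    and (via_normalize - via_groupby).abs().max() < 1e-9
)
print(same)
```

Note that `count()` ignores missing values and `groupby` drops missing keys by default, so all three use the same denominator when the column contains NaN.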

Answer 3

Answered by Rohith Gunda

Let's say there are 200 values, of which 120 are categorized as M and 80 as F.

1)

df['gender'].value_counts()

 output:

 M=120
 F=80

2)

df['gender'].value_counts(normalize=True)

  output:

  M=0.60
  F=0.40

3)

df['gender'].value_counts(normalize=True) * 100  # converts the proportions to percentages

  output:

  M=60
  F=40
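
The three steps above can be reproduced with a small sketch. The DataFrame here is constructed to match the 120/80 split described in this answer, and the `summary` table is just an illustrative way to view the results side by side. Keep in mind that the keyword argument is the lowercase `normalize`.

```python
import pandas as pd

# 200 values: 120 M and 80 F, matching the example in this answer
df = pd.DataFrame({"gender": ["M"] * 120 + ["F"] * 80})

counts = df["gender"].value_counts()               # step 1: raw counts
shares = df["gender"].value_counts(normalize=True)  # step 2: proportions
percentages = shares * 100                          # step 3: percentages

# Combine the three views into one table, aligned on the gender labels
summary = pd.DataFrame({"count": counts, "share": shares, "percent": percentages})
print(summary)
```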

Answer 4

Answered by Harshal SG

# Assumes df['Gender'] is encoded as 0 (male) and 1 (female), so its mean is the female share
print('(Gender Male=0):\n{}%'.format(100 - round(df['Gender'].mean() * 100, 2)))
print('(Gender Female=1):\n{}%'.format(round(df['Gender'].mean() * 100, 2)))
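
The snippet above only works if the column is already encoded as 0 and 1. If the column holds the labels M and F, as in the question, the same idea can be sketched with a boolean comparison, since the mean of a boolean Series is the fraction of True values. The sample data below is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data with string labels instead of a 0/1 encoding
df = pd.DataFrame({"gender": ["M", "M", "F", "F", "F"]})

# Comparing to a value gives a boolean Series; its mean is the share of True
male_pct = round((df["gender"] == "M").mean() * 100, 2)
female_pct = round((df["gender"] == "F").mean() * 100, 2)

print(f"Male: {male_pct}%")
print(f"Female: {female_pct}%")
```

Unlike `value_counts(normalize=True)`, this divides by the total number of rows, so rows with a missing gender still count in the denominator.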

Answer 5

Answered by Ayyasamy

Find the percentage of each value of the target variable to check whether the classes are imbalanced.

import pandas as pd

# data is the DataFrame; Target_col_Y is the name of its target column
g = data[Target_col_Y]
df = pd.concat([g.value_counts(), g.value_counts(normalize=True).mul(100)],
               axis=1, keys=('counts', 'percentage'))

print(df)

   counts  percentage
0   36548   88.734583
1    4640   11.265417

Find the maximum row-to-row difference in the percentage column to check how large the imbalance is.

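# Row-to-row difference; the first row is NaN, which max() skips
df1 = df.diff(periods=1, axis=0)
# Maximum difference in the last column (percentage)
difvalue = df1[[list(df1.columns)[-1]]].max()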
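
As an alternative sketch of the same imbalance check, the gap and the ratio between the most and least frequent class can be computed directly from the percentages. The `target` Series below is rebuilt from the counts printed above (36548 and 4640); the names `target`, `gap`, and `ratio` are illustrative and not part of the original answer.

```python
import pandas as pd

# Hypothetical target column rebuilt from the counts shown above
target = pd.Series([0] * 36548 + [1] * 4640, name="target")

pct = target.value_counts(normalize=True).mul(100)

# Gap between the most and least frequent class, in percentage points
gap = pct.max() - pct.min()

# How many times more frequent the majority class is than the minority class
ratio = pct.max() / pct.min()

print(f"gap: {gap:.2f} points, ratio: {ratio:.2f}x")
```

Using `max()` minus `min()` always gives a non-negative gap, whereas `diff()` returns a signed value that depends on the row order.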
