Pandas percentage count on a DataFrame groupby

Question

Asked by MikG

I have a DataFrame (mydf) along the lines of the following:

Index   Feature ID  Stuff1  Stuff2
1       True    1   23      12
2       True    1   54      12
3       False   0   45      67
4       True    0   38      29
5       False   1   32      24
6       False   1   59      39
7       True    0   37      32
8       False   0   76      65
9       False   1   32      12
10      True    0   23      15
..n     True    1   21      99

I am trying to calculate the True and False percentages of Feature for each ID (0 or 1), and I am looking for two outputs, one for each ID:

Feature ID  Percent
True    1   20%
False   1   30%

Feature ID  Percent
True    0   30%
False   0   20%

I have made a few attempts, but I end up with counts for all columns and then a percentage for all columns.

Here's my bad attempt:

percentageID0 = mydf[ mydf['ID']==0 ].set_index(['Feature']).count()
percentageID1 = mydf[ mydf['ID']==1 ].set_index(['Feature']).count()
fullcount = (mydf.groupby(['ID']).count()).sum()

print((percentageID0 / fullcount) * 100)
print((percentageID1 / fullcount) * 100)

I think I am getting mixed up with the groupby/index format.

Answer 1

Answered by CT Zhu

Could be just this:

In [73]:

print(pd.DataFrame({'Percentage': df.groupby(['ID', 'Feature']).size() / len(df)}))
            Percentage
ID Feature            
0  False           0.2
   True            0.3
1  False           0.3
   True            0.2
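
Note that the result above is each (ID, Feature) pair's share of all rows. If the percentages should instead be relative to each ID group, a minimal sketch might look like the following. It assumes the ten sample rows shown in the question; the names overall and within_id are only illustrative.

```python
import pandas as pd

# Sample data: the first ten rows shown in the question (Stuff columns omitted).
mydf = pd.DataFrame({
    'Feature': [True, True, False, True, False, False, True, False, False, True],
    'ID': [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# Share of each (ID, Feature) pair in the whole frame, as a percentage.
overall = mydf.groupby(['ID', 'Feature']).size() * 100 / len(mydf)

# Share of True/False within each ID group; each group sums to 100.
within_id = mydf.groupby('ID')['Feature'].value_counts(normalize=True) * 100

print(overall)
print(within_id)
```

size() counts rows per group once, whereas count() returns a non-null count for every remaining column, which is probably why the attempt in the question produced values for all columns.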

Answer 2

Answered by TheBlackCat

You can use pd.crosstab:

>>> newdf = pd.crosstab(index=mydf['Feature'], columns=mydf['ID']).stack()/len(mydf)
>>> print(newdf)
Feature  ID
False    0     0.2
         1     0.3
True     0     0.3
         1     0.2
dtype: float64
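
As a possible variation, pd.crosstab also accepts a normalize argument, which avoids dividing by len(mydf) by hand. The sketch below assumes the ten sample rows from the question; overall and per_id are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample data: the first ten rows shown in the question (Stuff columns omitted).
mydf = pd.DataFrame({
    'Feature': [True, True, False, True, False, False, True, False, False, True],
    'ID': [1, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# normalize='all' divides every cell by the total number of rows.
overall = pd.crosstab(index=mydf['Feature'], columns=mydf['ID'], normalize='all')

# normalize='columns' divides each cell by its column (ID) total,
# so each ID column sums to 1.
per_id = pd.crosstab(index=mydf['Feature'], columns=mydf['ID'], normalize='columns')

print(overall * 100)
print(per_id * 100)
```

Which of the two is wanted depends on whether the percentages should add up to 100 across the whole frame or within each ID.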

Answer 3

Answered by tomp

You could also use the tableone package for this. Create the sample dataframe:

import pandas as pd

# Create df with 10 rows.
df = pd.DataFrame({'Feature': [True,True,False,True,False,False,True,False,False,True],
    'ID': [1,1,0,0,1,1,0,0,1,0],
    'Stuff1': [23,54,45,38,32,59,37,76,32,23],
    'Stuff2': [12,12,67,29,24,39,32,65,12,15]})

Input:

# Import the tableone package (v0.5.18)
from tableone import TableOne

# Create the table, specifying feature and id as categorical
TableOne(df, columns=['Feature','ID'], 
    categorical=['Feature','ID'],
    label_suffix=True)

Output:
