
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/31481803/

Date: 2020-09-13 23:38:13  Source: igfitidea

Compute percentage for each row in pandas dataframe

python pandas

Asked by user308827

                  country_name  country_code  val_code  \
   United States of America           231                     1   
   United States of America           231                     2   
   United States of America           231                     3   
   United States of America           231                     4   
   United States of America           231                     5   

      y191      y192      y193      y194      y195  \
   47052179  43361966  42736682  43196916  41751928   
   1187385   1201557   1172941   1176366   1192173   
   28211467  27668273  29742374  27543836  28104317   
   179000    193000    233338    276639    249688   
   12613922  12864425  13240395  14106139  15642337 

In the data frame above, I would like to compute, for each row, the percentage of the overall total accounted for by that val_code, resulting in the following data frame.

I.e., sum up each row and divide by the total of all rows.

                  country_name  country_code  val_code  \
   United States of America           231                     1   
   United States of America           231                     2   
   United States of America           231                     3   
   United States of America           231                     4   
   United States of America           231                     5  

      perc   
  50.14947129
  1.363631254
  32.48344744
  0.260213146
  15.74323688

Right now I am doing this, but it is not working:

grp_df = df.groupby(['country_name', 'val_code']).agg()

pct_df = grp_df.groupby(level=0).apply(lambda x: 100*x/float(x.sum()))

Accepted answer by EdChum

Get the total for all the columns of interest and then add the percentage column:

In [35]:
total = np.sum(df.ix[:,'y191':].values)
df['percent'] = df.ix[:,'y191':].sum(axis=1)/total * 100
df

Out[35]:
               country_name  country_code  val_code      y191      y192  \
0  United States of America           231         1  47052179  43361966   
1  United States of America           231         1   1187385   1201557   
2  United States of America           231         1  28211467  27668273   
3  United States of America           231         1    179000    193000   
4  United States of America           231         1  12613922  12864425   

       y193      y194      y195    percent  
0  42736682  43196916  41751928  50.149471  
1   1172941   1176366   1192173   1.363631  
2  29742374  27543836  28104317  32.483447  
3    233338    276639    249688   0.260213  
4  13240395  14106139  15642337  15.743237  

So np.sum will sum all the values:

In [32]:
total = np.sum(df.ix[:,'y191':].values)
total

Out[32]:
434899243

We then call .sum(axis=1)/total * 100 on the columns of interest to sum row-wise, divide by the total, and multiply by 100 to get a percentage.
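
The .ix indexer used in this answer was deprecated in pandas 0.20 and removed in pandas 1.0, so the snippets above fail on current versions. The following is a minimal sketch of the same computation using the label-based .loc indexer instead; the data is copied from the question, and the variable name values is only an illustration, not part of the original answer.

```python
import pandas as pd

# Data copied from the question; the y-columns hold the values to aggregate.
df = pd.DataFrame({
    "country_name": ["United States of America"] * 5,
    "country_code": [231] * 5,
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
    "y193": [42736682, 1172941, 29742374, 233338, 13240395],
    "y194": [43196916, 1176366, 27543836, 276639, 14106139],
    "y195": [41751928, 1192173, 28104317, 249688, 15642337],
})

# Label-based slice: every column from 'y191' through the last column.
values = df.loc[:, "y191":]

# Grand total over all rows and all y-columns.
total = values.to_numpy().sum()

# Row sums divided by the grand total, expressed as a percentage.
df["percent"] = values.sum(axis=1) / total * 100
print(df[["val_code", "percent"]])
```

Note that the slice is taken before the percent column is added; slicing df.loc[:, "y191":] afterwards would also pick up percent, because it then sits to the right of y195.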

Answer by Alexander

You can get each row's share of every column's total (as fractions of 1) using a lambda function as follows:

>>> df.iloc[:, 3:].apply(lambda x: x / x.sum())
       y191      y192      y193      y194      y195
0  0.527231  0.508411  0.490517  0.500544  0.480236
1  0.013305  0.014088  0.013463  0.013631  0.013713
2  0.316116  0.324405  0.341373  0.319164  0.323259
3  0.002006  0.002263  0.002678  0.003206  0.002872
4  0.141342  0.150833  0.151969  0.163455  0.179920
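
The same column-wise shares can probably be obtained without apply, since dividing a DataFrame by a Series aligns the Series index with the column names. The sketch below uses only two of the question's y-columns to keep it short, and the names ycols and col_pct are illustrative assumptions.

```python
import pandas as pd

# Two of the y-columns from the question, enough to show the idea.
df = pd.DataFrame({
    "val_code": [1, 2, 3, 4, 5],
    "y191": [47052179, 1187385, 28211467, 179000, 12613922],
    "y192": [43361966, 1201557, 27668273, 193000, 12864425],
})
ycols = ["y191", "y192"]

# df[ycols].sum() is a Series indexed by column name, so the division
# is applied column by column; multiplying by 100 gives percentages.
col_pct = df[ycols] / df[ycols].sum() * 100
print(col_pct)
```

Each column of col_pct sums to 100, which is a different quantity from the single per-row percentage the question asks for.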

Your example does not have any duplicate values for val_code, so I'm unsure how you want your data to appear (i.e., percent of the column total vs. percent of the total for each val_code group).
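
If val_code did repeat, one possible reading is "each val_code group's share of its country's total". The sketch below illustrates that reading under stated assumptions: the data is hypothetical (not from the question), and the names ycols, group_totals, country_totals and perc are made up for the example.

```python
import pandas as pd

# Hypothetical data in which val_code repeats, unlike the question's example.
df = pd.DataFrame({
    "country_name": ["A", "A", "A", "B", "B"],
    "val_code": [1, 1, 2, 1, 2],
    "y191": [10, 20, 30, 5, 15],
    "y192": [10, 20, 10, 5, 15],
})
ycols = ["y191", "y192"]

# Sum the y-columns within each (country, val_code) group, then across columns,
# giving one total per group.
group_totals = df.groupby(["country_name", "val_code"])[ycols].sum().sum(axis=1)

# Total per country, broadcast back to every group of that country.
country_totals = group_totals.groupby(level="country_name").transform("sum")

# Each group's share of its country's total, as a percentage.
perc = group_totals / country_totals * 100
print(perc)
```

Unlike the attempt in the question, this passes an explicit aggregation (sum) to the groupby, and it divides by a per-country total rather than by the grand total of the whole frame.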