Python pandas DataFrame: create new columns and fill them with values calculated from the same df

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/18504967/

Time: 2020-08-19 10:55:17  Source: igfitidea

python pandas calculated-columns

Asked by jonas

Here is a simplified example of my df:

import numpy as np
import pandas as pd

ds = pd.DataFrame(np.abs(np.random.randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])
ds
      A         B         C         D
1  1.099679  0.042043  0.083903  0.410128
2  0.268205  0.718933  1.459374  0.758887
3  0.680566  0.538655  0.038236  1.169403

I would like to sum the columns row-wise:

ds['sum']=ds.sum(axis=1)
ds
      A         B         C         D       sum
1  0.095389  0.556978  1.646888  1.959295  4.258550
2  1.076190  2.668270  0.825116  1.477040  6.046616
3  0.245034  1.066285  0.967124  0.791606  3.070049

Now, here comes my question! I would like to create 4 new columns, each holding that column's share of the row total (sum). So the first value in the first new column should be (0.095389/4.258550), the first value in the second new column should be (0.556978/4.258550), and so on. Help please.

Accepted answer by joris

You can easily do this manually for each column:

ds['A_perc'] = ds['A']/ds['sum']
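
The per-column assignment above can be repeated in a loop. The following is a minimal sketch, assuming the four value columns A to D from the question; the fixed sample values and the name `value_cols` are made up for illustration (the question uses random data).

```python
import pandas as pd

# Fixed sample values, assumed for illustration only
ds = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0],
        "B": [1.0, 2.0, 1.0],
        "C": [2.0, 4.0, 4.0],
        "D": [4.0, 8.0, 8.0],
    },
    index=[1, 2, 3],
)

value_cols = ["A", "B", "C", "D"]  # hypothetical helper name

# Row-wise total over the value columns only
ds["sum"] = ds[value_cols].sum(axis=1)

# One new column per value column: its share of the row total
for col in value_cols:
    ds[col + "_perc"] = ds[col] / ds["sum"]

print(ds)
```

Iterating over an explicit list of value columns keeps the `sum` column itself out of the calculation.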


If you want to do this in one step for all columns, you can use the div method (http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior):

ds.div(ds['sum'], axis=0)
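
The `axis=0` argument tells `div` to match the Series against the row index, so each row is divided by its own total. With the default axis, pandas would instead try to match the Series index against the column labels. The sketch below, using made-up values and a hypothetical `row_total` name, compares the result with plain NumPy broadcasting to show what the call computes.

```python
import numpy as np
import pandas as pd

# Small made-up frame, assumed for illustration only
ds = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [3.0, 6.0, 1.0]}, index=[1, 2, 3])
row_total = ds.sum(axis=1)  # one total per row, indexed 1, 2, 3

# axis=0: align row_total on the row index, divide every column by it
shares = ds.div(row_total, axis=0)

# The same arithmetic written as NumPy broadcasting over a column vector
manual = ds.to_numpy() / row_total.to_numpy()[:, None]

print(shares)
print(np.array_equal(shares.to_numpy(), manual))
```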

And if you want the result added to the same dataframe in one step:

>>> ds.join(ds.div(ds['sum'], axis=0), rsuffix='_perc')
          A         B         C         D       sum    A_perc    B_perc  \
1  0.151722  0.935917  1.033526  0.941962  3.063127  0.049532  0.305543   
2  0.033761  1.087302  1.110695  1.401260  3.633017  0.009293  0.299283   
3  0.761368  0.484268  0.026837  1.276130  2.548603  0.298739  0.190013   

     C_perc    D_perc  sum_perc  
1  0.337409  0.307517         1  
2  0.305722  0.385701         1  
3  0.010530  0.500718         1  
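
The output above also contains a `sum_perc` column that is always 1, because `sum` is divided by itself. If that column is not wanted, one possible variation (a sketch, not part of the original answer) is to divide only the value columns and rename them with `add_suffix` before joining. The sample values and the name `value_cols` are assumptions for illustration.

```python
import pandas as pd

# Fixed sample values, assumed for illustration only
ds = pd.DataFrame(
    {
        "A": [1.0, 2.0, 3.0],
        "B": [1.0, 2.0, 1.0],
        "C": [2.0, 4.0, 4.0],
        "D": [4.0, 8.0, 8.0],
    },
    index=[1, 2, 3],
)
ds["sum"] = ds.sum(axis=1)

value_cols = ["A", "B", "C", "D"]  # hypothetical helper name

# Divide only the value columns, then rename them to A_perc, B_perc, ...
perc = ds[value_cols].div(ds["sum"], axis=0).add_suffix("_perc")

# The column names no longer overlap, so join needs no suffix argument
result = ds.join(perc)
print(result)
```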

Answer by waitingkuo

In [56]: df = pd.DataFrame(np.abs(np.random.randn(3, 4)), index=[1,2,3], columns=['A','B','C','D'])

In [57]: df.divide(df.sum(axis=1), axis=0)
Out[57]: 
          A         B         C         D
1  0.319124  0.296653  0.138206  0.246017
2  0.376994  0.326481  0.230464  0.066062
3  0.036134  0.192954  0.430341  0.340571
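
This version computes the row totals on the fly instead of storing a `sum` column first. A quick sanity check is that the shares in each row add up to 1. The sketch below assumes a seeded NumPy generator so the run is repeatable; the seed and the names `shares`, `row_check`, and `percent` are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seed assumed, only for repeatability
df = pd.DataFrame(
    np.abs(rng.standard_normal((3, 4))),
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)

# Divide each row by its own total without adding a 'sum' column
shares = df.divide(df.sum(axis=1), axis=0)

# Each row of shares should add up to 1 (within floating-point tolerance)
row_check = shares.sum(axis=1)
print(np.allclose(row_check, 1.0))

# Multiply by 100 if percentages rather than fractions are wanted
percent = shares * 100
```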