pandas dataframe: create new columns and fill with calculated values from same df
Source: Stack Overflow, http://stackoverflow.com/questions/18504967/
License: CC BY-SA 4.0. You are free to use and share this content, but you must attribute it to the original authors on Stack Overflow.
Asked by jonas
Here is a simplified example of my df:
import numpy as np
import pandas as pd

ds = pd.DataFrame(np.abs(np.random.randn(3, 4)), index=[1, 2, 3], columns=['A', 'B', 'C', 'D'])
ds
          A         B         C         D
1  1.099679  0.042043  0.083903  0.410128
2  0.268205  0.718933  1.459374  0.758887
3  0.680566  0.538655  0.038236  1.169403
I would like to sum the data in the columns row-wise:
ds['sum'] = ds.sum(axis=1)
ds
          A         B         C         D       sum
1  0.095389  0.556978  1.646888  1.959295  4.258550
2  1.076190  2.668270  0.825116  1.477040  6.046616
3  0.245034  1.066285  0.967124  0.791606  3.070049
Now, here comes my question! I would like to create 4 new columns that hold each value's share of its row total (sum). So the first value in the first new column should be (0.095389/4.258550), the first value in the second new column should be (0.556978/4.258550), and so on. Help please.
Accepted answer by joris
You can easily do this manually for each column like this:
ds['A_perc'] = ds['A'] / ds['sum']
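Repeating that line for every column can be written as a short loop. The following is only a sketch: the fixed values and the helper name `value_cols` are assumptions chosen for illustration, not the random data from the question.

```python
import pandas as pd

# Fixed illustrative values (assumption), so the result is easy to check by hand.
ds = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0], "B": [1.0, 2.0, 1.0], "C": [2.0, 4.0, 4.0]},
    index=[1, 2, 3],
)

# Remember the original columns before adding 'sum', so that the loop
# below does not also create a 'sum_perc' column.
value_cols = list(ds.columns)
ds["sum"] = ds[value_cols].sum(axis=1)

# One new column per original column: value divided by the row total.
for col in value_cols:
    ds[col + "_perc"] = ds[col] / ds["sum"]

print(ds)
```

Capturing the column list before adding `sum` keeps the total itself out of the loop.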
If you want to do this in one step for all columns, you can use the div method (http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior):
ds.div(ds['sum'], axis=0)
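The axis=0 argument is what makes this work: it matches the index of ds['sum'] against the row labels, so each row is divided by its own total. With the default axis, the Series would be matched against the column labels instead. A small sketch with made-up values (an assumption, not the question's data) that compares the result with the equivalent NumPy broadcasting:

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame (assumption); both rows sum to 4.0.
ds = pd.DataFrame({"A": [1.0, 3.0], "B": [3.0, 1.0]}, index=[1, 2])
ds["sum"] = ds.sum(axis=1)

# axis=0 aligns the Series index (1, 2) with the row labels of ds.
shares = ds.div(ds["sum"], axis=0)

# The same arithmetic in plain NumPy: divide each row by its own total.
expected = ds.to_numpy() / ds["sum"].to_numpy()[:, None]

print(shares)
print(np.allclose(shares.to_numpy(), expected))
```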
And if you want the result added to the same dataframe in one step:
>>> ds.join(ds.div(ds['sum'], axis=0), rsuffix='_perc')
          A         B         C         D       sum    A_perc    B_perc  \
1  0.151722  0.935917  1.033526  0.941962  3.063127  0.049532  0.305543
2  0.033761  1.087302  1.110695  1.401260  3.633017  0.009293  0.299283
3  0.761368  0.484268  0.026837  1.276130  2.548603  0.298739  0.190013

     C_perc    D_perc  sum_perc
1  0.337409  0.307517         1
2  0.305722  0.385701         1
3  0.010530  0.500718         1
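Note that the joined result also contains a sum_perc column that is always 1, because the sum column is divided by itself. If that column is not wanted, one possible variation (a sketch, not part of the original answer; the values and the names `value_cols`, `perc` and `result` are illustrative assumptions) is to divide only the value columns and rename them with add_suffix before joining:

```python
import pandas as pd

# Illustrative values (assumption), not the question's random data.
ds = pd.DataFrame({"A": [1.0, 2.0], "B": [3.0, 6.0]}, index=[1, 2])
value_cols = ["A", "B"]
ds["sum"] = ds[value_cols].sum(axis=1)

# Divide only the value columns, then rename them to A_perc, B_perc.
perc = ds[value_cols].div(ds["sum"], axis=0).add_suffix("_perc")

# No column names overlap any more, so join needs no suffix argument.
result = ds.join(perc)
print(result)
```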
Answer by waitingkuo
In [56]: df = pd.DataFrame(np.abs(np.random.randn(3, 4)), index=[1, 2, 3], columns=['A', 'B', 'C', 'D'])
In [57]: df.divide(df.sum(axis=1), axis=0)
Out[57]:
          A         B         C         D
1  0.319124  0.296653  0.138206  0.246017
2  0.376994  0.326481  0.230464  0.066062
3  0.036134  0.192954  0.430341  0.340571
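This version computes the row totals on the fly, so no sum column has to be stored first. A quick sanity check is that every row of the result adds up to 1. The sketch below assumes a seeded NumPy generator (the seed is arbitrary) so the run is repeatable, and multiplies by 100 in case actual percentages rather than fractions are wanted:

```python
import numpy as np
import pandas as pd

# Seeded generator (arbitrary seed, an assumption) for a repeatable example.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    np.abs(rng.standard_normal((3, 4))),
    index=[1, 2, 3],
    columns=["A", "B", "C", "D"],
)

# Each value divided by its row total; df itself is left unchanged.
fractions = df.divide(df.sum(axis=1), axis=0)

# Scale to percentages if values between 0 and 100 are preferred.
percent = fractions * 100

# Every row of fractions should add up to 1.
print(np.allclose(fractions.sum(axis=1), 1.0))
```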