Python: normalize rows of a pandas DataFrame by their sums

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/35678874/

Date: 2020-08-19 16:48:27  Source: igfitidea

Normalize rows of pandas data frame by their sums

python pandas

Asked by rba

I have a pandas DataFrame containing spectral data and metadata. The columns are labeled with a MultiIndex so that df['wvl'] gives the spectra and df['meta'] gives the metadata. Within df['wvl'], the column labels are the wavelength values for the spectrometer channels.

What I want to do is normalize each row of df['wvl'] by the sum of that row, so that the values in each row add up to 1.0.

Here's what one row of the dataframe looks like:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

But when I do something like:

df['wvl'].iloc[0]=df['wvl'].iloc[0]/df['wvl'].iloc[0].sum()

Nothing happens! I get the exact same values:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

If I create a temporary variable to hold the row, I can do the normalization just fine:

temp=df['wvl'].iloc[0]

temp=temp/temp.sum()

temp
246.050003    0.000027
246.102005    0.000022
246.156006    0.000021
                ...   
848.697205    0.000026
848.896423    0.000027
849.095703    0.000026
Name: 0, dtype: float64

But if I try to replace the dataframe row with the normalized temporary variable, nothing happens:

df['wvl'].iloc[0]=temp

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
                 ...     
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

I'm obviously missing something here, but I can't figure out what and it's driving me insane. Help? Thanks in advance!

Answered by Ami Tavory

You can use

df.div(df.sum(axis=1), axis=0)

df.sum(axis=1) sums each row; df.div(..., axis=0) then divides each row by its own sum.
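
The axis=0 argument matters because, by default, pandas aligns a Series operand with the frame's column labels rather than its row labels. The sketch below uses a small made-up frame to contrast the two; the names row_sums, wrong, and right and the row labels 'r1' and 'r2' are only for illustration:

```python
import pandas as pd

# Toy data; the values and labels are arbitrary and only for illustration.
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]}, index=['r1', 'r2'])
row_sums = df.sum(axis=1)  # one total per row, indexed by row label

# Plain division aligns row_sums' index ('r1', 'r2') with the column
# labels ('a', 'b'); nothing matches, so every resulting cell is NaN.
wrong = df / row_sums

# axis=0 aligns row_sums with the row index instead, so each row is
# divided by its own total.
right = df.div(row_sums, axis=0)
```

With axis=0, every row of right adds up to 1.0, whereas wrong contains only NaN.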

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.div(df.sum(axis=1), axis=0)
          a         b
0  0.250000  0.750000
1  0.333333  0.666667
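
For the frame described in the question, the same idea can be applied to the 'wvl' block only. The original attempt most likely had no effect because df['wvl'] returns a new frame, so df['wvl'].iloc[0] = ... writes into that temporary copy (chained assignment) rather than into df. Below is a minimal sketch, assuming a two-level column index like the one described; the wavelengths, intensities, and the 'sample' metadata column are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the asker's data: three spectrometer channels
# plus one metadata column (all values are made up).
spectra = pd.DataFrame(
    [[1.0, 3.0, 4.0], [2.0, 2.0, 4.0]],
    columns=[246.05, 246.10, 246.16],
)
meta = pd.DataFrame({'sample': ['s1', 's2']})

# Passing a dict to concat builds the two-level columns: 'wvl' and 'meta'.
df = pd.concat({'wvl': spectra, 'meta': meta}, axis=1)

# Normalize every spectrum by its row sum, then assign the whole block
# back in a single step instead of writing through df['wvl'].iloc[...].
wvl = df['wvl']
df['wvl'] = wvl.div(wvl.sum(axis=1), axis=0)
```

After this, df['wvl'].sum(axis=1) should be 1.0 for every row, and df['meta'] is left untouched.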