Python: Normalize rows of pandas data frame by their sums
Original URL: http://stackoverflow.com/questions/35678874/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Question by rba
I have a pandas dataframe containing spectral data and metadata. The columns are labeled with a MultiIndex so that df['wvl'] gives the spectra and df['meta'] gives the metadata. Within df['wvl'], the column labels are the wavelength values for the spectrometer channels.
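To make the layout described above reproducible, here is a hypothetical toy frame (made-up values, three of the question's wavelength labels, and an invented metadata column named target) with two-level columns. Selecting a top-level key returns a sub-frame whose column labels are the second level.

```python
import pandas as pd

# Hypothetical toy frame: three wavelength channels (labels taken from the
# question's output) plus one made-up metadata column named 'target'.
columns = pd.MultiIndex.from_tuples(
    [("wvl", 246.050003), ("wvl", 246.102005), ("wvl", 246.156006), ("meta", "target")]
)
df = pd.DataFrame(
    [[1.0, 3.0, 4.0, "sample_a"], [2.0, 2.0, 4.0, "sample_b"]],
    columns=columns,
)

spectra = df["wvl"]    # sub-frame: one column per wavelength
metadata = df["meta"]  # sub-frame: the metadata column(s)

print(spectra.columns.tolist())   # [246.050003, 246.102005, 246.156006]
print(metadata.columns.tolist())  # ['target']
```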
What I want to do is normalize each row of df['wvl'] by the sum of that row, so that adding up the values in the row gives a total of 1.0.
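Written out, with x_ij the intensity of spectrum (row) i in channel j and n the number of wavelength channels, the requested normalization is the following. It assumes every row sum is non-zero; a zero-sum row would divide by zero.

```latex
\tilde{x}_{ij} = \frac{x_{ij}}{\sum_{k=1}^{n} x_{ik}},
\qquad
\sum_{j=1}^{n} \tilde{x}_{ij} = 1
```

The question's temp/temp.sum() computes exactly this for a single row.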
Here's what one row of the dataframe looks like:
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
But when I do something like:
df['wvl'].iloc[0]=df['wvl'].iloc[0]/df['wvl'].iloc[0].sum()
Nothing happens! I get the exact same values:
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
If I create a temporary variable to hold the row, I can do the normalization just fine:
temp=df['wvl'].iloc[0]
temp=temp/temp.sum()
temp
246.050003 0.000027
246.102005 0.000022
246.156006 0.000021
...
848.697205 0.000026
848.896423 0.000027
849.095703 0.000026
Name: 0, dtype: float64
But if I try to replace the dataframe row with the normalized temporary variable, nothing happens:
df['wvl'].iloc[0]=temp
df['wvl'].iloc[0]
246.050003 128.533035
246.102005 102.756321
246.156006 99.930775
...
848.697205 121.313347
848.896423 127.011662
849.095703 123.234168
Name: 0, dtype: float64
I'm obviously missing something here, but I can't figure out what and it's driving me insane. Help? Thanks in advance!
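A likely cause, offered as a hedged reading rather than a confirmed diagnosis: with MultiIndex columns, df['wvl'] constructs a new DataFrame, so df['wvl'].iloc[0] = ... is chained assignment that writes into that intermediate object. Whether the change reaches df is not guaranteed (pandas may flag the pattern with SettingWithCopyWarning, and under Copy-on-Write it never propagates). The temp approach fails only at the write-back step, for the same reason. The sketch below, built on a hypothetical toy frame with the question's layout, writes one normalized row back with a single .loc call on df itself. It uses .to_numpy() so the values are assigned by position instead of pandas attempting to align temp's wavelength index with the ('wvl', wavelength) column tuples.

```python
import pandas as pd

# Hypothetical toy frame with the question's layout: float wavelength labels
# under 'wvl' and one metadata column under 'meta' (values are made up).
columns = pd.MultiIndex.from_tuples(
    [("wvl", 246.050003), ("wvl", 246.102005), ("wvl", 246.156006), ("meta", "target")]
)
df = pd.DataFrame(
    [[1.0, 3.0, 4.0, "sample_a"], [2.0, 2.0, 4.0, "sample_b"]],
    columns=columns,
)

# Reading through df["wvl"] is fine; it is *writing* through it that is unreliable.
temp = df["wvl"].iloc[0]
temp = temp / temp.sum()

# Write back with ONE indexing call on df itself (row label, top-level key).
# .to_numpy() drops temp's wavelength index so the values are assigned by
# position instead of being aligned against the ('wvl', wavelength) tuples.
df.loc[df.index[0], "wvl"] = temp.to_numpy()

print(df["wvl"].iloc[0].tolist())  # [0.125, 0.375, 0.5]
```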
Answer by Ami Tavory
You can use
df.div(df.sum(axis=1), axis=0)
df.sum(axis=1) sums up each row; df.div(..., axis=0) then divides each row by its own sum (axis=0 matches the sums against the row index).
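The axis=0 argument is what makes this work: it tells div to match the row-sum Series against the DataFrame's row labels. The small sketch below uses hypothetical string row labels r0/r1 so the mismatch is easy to see, and contrasts div with the plain / operator, which aligns a Series against the columns instead.

```python
import pandas as pd

# Same numbers as the answer's example, but with string row labels
# (a hypothetical choice) so row labels and column labels differ visibly.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["r0", "r1"])

row_sums = df.sum(axis=1)  # Series indexed by row label: r0 -> 4, r1 -> 6

# axis=0: match row_sums' index against df's row labels, so each row is
# divided by its own total (each row of by_row sums to 1, up to rounding).
by_row = df.div(row_sums, axis=0)

# Plain `/` matches a Series against the *columns*; 'a', 'b' never meet
# 'r0', 'r1', so the result gains extra columns and every cell is NaN.
misaligned = df / row_sums

print(by_row)
print(misaligned.isna().to_numpy().all())  # True
```

In short, whenever the divisor is indexed by rows, pass axis=0 (equivalently axis='index') to div.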
Example:
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.div(df.sum(axis=1), axis=0)
          a         b
0  0.250000  0.750000
1  0.333333  0.666667
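Applied to the question's MultiIndex frame, the same one-liner can normalize the df['wvl'] block, which is then assigned back in a single step with df['wvl'] = .... This is a sketch on a hypothetical toy frame. It relies on DataFrame.__setitem__ matching the sub-frame's columns to the second level of the ('wvl', wavelength) columns, as recent pandas versions do, so it is worth checking against the version you run.

```python
import pandas as pd

# Hypothetical toy frame with the question's layout (made-up values).
columns = pd.MultiIndex.from_tuples(
    [("wvl", 246.050003), ("wvl", 246.102005), ("wvl", 246.156006), ("meta", "target")]
)
df = pd.DataFrame(
    [[1.0, 3.0, 4.0, "sample_a"], [2.0, 2.0, 4.0, "sample_b"]],
    columns=columns,
)

spectra = df["wvl"]
# Divide every spectrum (row) by its own total, as in the answer's one-liner.
normalized = spectra.div(spectra.sum(axis=1), axis=0)

# Assign the whole block back with one __setitem__ on df itself; pandas
# matches normalized's wavelength columns to the ('wvl', wavelength) columns.
df["wvl"] = normalized

print(df["wvl"].sum(axis=1).tolist())  # [1.0, 1.0]
print(df["meta"]["target"].tolist())   # ['sample_a', 'sample_b']
```

Because every write targets df directly, this avoids the chained-assignment pattern from the question, and the metadata columns are left untouched.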