Python: Normalize rows of a pandas DataFrame by their sum

Question

Asked by rba

I have a pandas dataframe containing spectral data and metadata. The columns are labeled with a MultiIndex so that df['wvl'] gives the spectra and df['meta'] gives the metadata. Within df['wvl'] the column labels are the wavelength values for the spectrometer channels.

What I want to do is normalize each row of df['wvl'] by the sum of that row, so that adding up the values in the row gives a total of 1.0.

Here's what one row of the dataframe looks like:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

But when I do something like:

df['wvl'].iloc[0]=df['wvl'].iloc[0]/df['wvl'].iloc[0].sum()

Nothing happens! I get the exact same values:

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
...    
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

If I create a temporary variable to hold the row, I can do the normalization just fine:

temp=df['wvl'].iloc[0]

temp=temp/temp.sum()

temp
246.050003    0.000027
246.102005    0.000022
246.156006    0.000021
                ...   
848.697205    0.000026
848.896423    0.000027
849.095703    0.000026
Name: 0, dtype: float64

But if I try to replace the dataframe row with the normalized temporary variable, nothing happens:

df['wvl'].iloc[0]=temp

df['wvl'].iloc[0]
246.050003     128.533035
246.102005     102.756321
246.156006      99.930775
                 ...     
848.697205     121.313347
848.896423     127.011662
849.095703     123.234168
Name: 0, dtype: float64

I'm obviously missing something here, but I can't figure out what and it's driving me insane. Help? Thanks in advance!

Answer 1

Answered by Ami Tavory

You can use

df.div(df.sum(axis=1), axis=0)

df.sum(axis=1) sums each row; df.div(..., axis=0) then divides each row by its own sum, matching the sums to rows by index label.

Example:

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
>>> df.div(df.sum(axis=1), axis=0)
          a         b
0  0.250000  0.750000
1  0.333333  0.666667
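
For the MultiIndex layout described in the question, the same idea can be applied to the df['wvl'] sub-frame only. The attempt in the question most likely fails because df['wvl'] hands back a new object, so df['wvl'].iloc[0] = ... writes into that temporary object rather than into df (chained assignment). The sketch below sidesteps this by normalizing the whole sub-frame and rebuilding the frame. The names spectra, meta and target, and all the numbers, are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's data: spectra under 'wvl',
# metadata under 'meta', combined into two-level (MultiIndex) columns.
spectra = pd.DataFrame(
    [[1.0, 3.0, 4.0], [2.0, 2.0, 6.0]],
    columns=[246.05, 246.10, 246.16],
)
meta = pd.DataFrame({'target': ['A', 'B']})
df = pd.concat({'wvl': spectra, 'meta': meta}, axis=1)

# Normalize every spectral row by its own sum in one vectorized step.
wvl = df['wvl']
normalized = wvl.div(wvl.sum(axis=1), axis=0)

# Rebuild the frame from the normalized spectra and the untouched metadata,
# instead of assigning through df['wvl'].iloc[...].
result = pd.concat({'wvl': normalized, 'meta': df['meta']}, axis=1)

# Each spectral row should now add up to 1.0.
row_sums = result['wvl'].sum(axis=1)
print(row_sums)
```

Rebuilding with pd.concat is only one option. Assigning the normalized sub-frame back with df['wvl'] = normalized should also work in recent pandas versions, but I have not checked it against the asker's data.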
