Python: Normalizing a pandas DataFrame by row

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18594469/

Time: 2020-08-19 11:11:18  Source: igfitidea

Normalizing a pandas DataFrame by row

python pandas normalization dataframe

Asked by ChrisB

What is the most idiomatic way to normalize each row of a pandas DataFrame? Normalizing the columns is easy, so one (very ugly!) option is:

(df.T / df.T.sum()).T

Pandas broadcasting rules prevent df / df.sum(axis=1) from doing this.
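
To illustrate why the plain division fails, here is a minimal sketch on a made-up two-row frame (the data and the labels r0, r1, a, b are assumptions for illustration, not from the question). When a DataFrame is divided by a Series, pandas matches the Series index against the DataFrame's columns by default, so the row sums never line up with the rows:

```python
import pandas as pd

# Hypothetical toy frame; string labels make the misalignment easy to see.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 6.0]}, index=["r0", "r1"])

row_sums = df.sum(axis=1)  # Series indexed by the row labels r0, r1

# The Series index (r0, r1) is matched against the columns (a, b).
# No labels overlap, so the result has the union of both label sets
# as columns and contains only NaN.
naive = df / row_sums
print(naive)
```

With a default integer index and integer column labels, some labels could overlap by accident, which would give partially filled but still wrong results.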

Accepted answer by joris

To overcome the broadcasting issue, you can use the div method:

df.div(df.sum(axis=1), axis=0)

See http://pandas.pydata.org/pandas-docs/stable/basics.html#matching-broadcasting-behavior
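
As a small worked example of the div approach, the sketch below uses a made-up frame (the values and labels are assumptions for illustration) and checks that it agrees with the transpose-based version from the question:

```python
import pandas as pd

# Hypothetical toy frame for illustration.
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 6.0]}, index=["r0", "r1"])

# axis=0 aligns the row sums with the row index instead of the columns.
normalized = df.div(df.sum(axis=1), axis=0)

# The transpose-based version from the question gives the same result.
via_transpose = (df.T / df.T.sum()).T

print(normalized)
print(normalized.sum(axis=1))  # every row should now sum to 1
```

Note that a row whose sum is zero would be divided by zero, so it would come out as NaN or inf and may need separate handling.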

Answer by Rafa

I would suggest using the scikit-learn preprocessing module and transposing your DataFrame as required:

'''
Created on 05/11/2015

@author: rafaelcastillo
'''

import matplotlib.pyplot as plt
import pandas
import random
import numpy as np
from sklearn import preprocessing

def create_cos(number_graphs,length,amp):
    # This function is used to generate cos-kind graphs for testing
    # number_graphs: to plot
    # length: number of points included in the x axis
    # amp: Y domain modifications to draw different shapes
    x = np.arange(length)
    amp = np.pi*amp
    xx = np.linspace(np.pi*0.3*amp, -np.pi*0.3*amp, length)
    for i in range(number_graphs):
        # Cosine curve with a little uniform noise added to every point
        iterable = (2*np.cos(v) + random.random()*0.1 for v in xx)
        y = np.fromiter(iterable, float)
        if i == 0: 
            yfinal =  y
            continue
        yfinal = np.vstack((yfinal,y))
    return x,yfinal

x,y = create_cos(70,24,3)
data = pandas.DataFrame(y)

x_values = data.columns.values
num_rows = data.shape[0]

fig, ax = plt.subplots()
for i in range(num_rows):
    ax.plot(x_values, data.iloc[i])
ax.set_title('Raw data')
plt.show() 

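# MinMaxScaler rescales each column to [0, 1]; transposing first turns the
# original rows into columns, so every row of data is scaled independently.
std_scale = preprocessing.MinMaxScaler().fit(data.transpose())
df_std = std_scale.transform(data.transpose())
# Transpose back so that each row is again one curve.
data = pandas.DataFrame(np.transpose(df_std))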


fig, ax = plt.subplots()
for i in range(num_rows):
    ax.plot(x_values, data.iloc[i])
ax.set_title('Data Normalized')
plt.show()
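
For comparison, the same kind of row-wise min-max scaling can probably be written in plain pandas without transposing. This is only a sketch under the assumption that the target range is [0, 1]; the toy data and the names row_min, row_range, and scaled are made up for illustration and are not part of the answer above:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row is one series to rescale.
data = pd.DataFrame(np.array([[1.0, 2.0, 3.0],
                              [10.0, 20.0, 40.0]]))

row_min = data.min(axis=1)
row_range = data.max(axis=1) - row_min

# Subtract each row's minimum, then divide by its range; axis=0 aligns on rows.
scaled = data.sub(row_min, axis=0).div(row_range, axis=0)
print(scaled)
```

A constant row has a range of zero, so in this sketch it would come out as NaN and would need to be handled separately.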