Note: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it under the same license, but you must cite the original URL and attribute the content to the original authors. Original: http://stackoverflow.com/questions/45899613/

Time: 2020-09-14 04:20:05  Source: igfitidea

Divide certain columns by another column in pandas

Tags: python, pandas, dataframe

Asked by user1179317

I was wondering whether there is a more efficient way to divide multiple columns by a certain column. For example, say I have:

prev    open    close   volume
20.77   20.87   19.87   962816
19.87   19.89   19.56   668076
19.56   19.96   20.1    578987
20.1    20.4    20.53   418597

And I would like to get:

prev    open    close   volume
20.77   1.0048  0.9567  962816
19.87   1.0010  0.9844  668076
19.56   1.0204  1.0276  578987
20.1    1.0149  1.0214  418597

Basically, columns 'open' and 'close' have been divided by the value in column 'prev'.

I was able to do this with:

df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))

I was wondering if there is a simpler way, especially when there are, say, 10 columns to be divided by the same column.

Accepted answer by Scott Boston

df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)

Output:

    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
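For the "10 columns" case raised in the question, the same `.div(..., axis=0)` call can take a longer column list. The following is a minimal sketch, assuming the sample data from the question; the rule used to build `cols` (everything except 'prev' and 'volume') is a hypothetical choice for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "prev": [20.77, 19.87, 19.56, 20.10],
    "open": [20.87, 19.89, 19.96, 20.40],
    "close": [19.87, 19.56, 20.10, 20.53],
    "volume": [962816, 668076, 578987, 418597],
})

# Hypothetical selection rule: every column except the divisor and 'volume'.
cols = [c for c in df.columns if c not in ("prev", "volume")]

# axis=0 matches the divisor to the rows, so each row is divided by its own 'prev'.
df[cols] = df[cols].div(df["prev"], axis=0)
print(df)
```

Building the list programmatically avoids typing every column name, and the division itself stays a single vectorized call.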

Answer by DYZ

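columns_to_divide = ['open', 'close']
# A plain `/` would align df['prev'] with the column labels and give NaN; axis=0 divides row by row.
df[columns_to_divide] = df[columns_to_divide].div(df['prev'], axis=0)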
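When a DataFrame is divided by a Series, pandas by default matches the Series index against the DataFrame's column labels, not its rows. The sketch below, using a small made-up frame (the numbers are illustrative assumptions, not data from the question), contrasts the `/` operator with `.div(..., axis=0)`.

```python
import pandas as pd

# Small illustrative frame; values are made up for this sketch.
df = pd.DataFrame({"prev": [2.0, 4.0], "open": [4.0, 8.0], "close": [6.0, 2.0]})
cols = ["open", "close"]

# The operator aligns the index of df['prev'] (0, 1) with the column labels
# ('open', 'close'); no labels match, so every cell comes out as NaN.
by_operator = df[cols] / df["prev"]

# axis=0 aligns on the row index instead, dividing each row by its own 'prev'.
by_rows = df[cols].div(df["prev"], axis=0)

print(by_operator)
print(by_rows)
```

This is also why the accepted answer passes `axis=0` explicitly.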

Answer by Divakar

For performance, I would suggest working on the underlying array data and using array slicing, since the two columns to be modified are adjacent and a slice gives a view into the array -

a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]

To elaborate a bit more on the array-slicing part: using a[:,[1,2]] would have forced a copy there and slowed things down. a[:,[1,2]] on the dataframe side is equivalent to df[['open','close']], and that, I am guessing, is slowing things down too. df.iloc[:,1:3] thus improves upon it.
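The view-versus-copy distinction described above can be checked directly in NumPy. This is a small sketch on a made-up array (not the question's data) using `np.shares_memory`; it also shows the shape produced by the `a[:,0,None]` indexing that makes the division broadcast across columns.

```python
import numpy as np

# Illustrative 3x4 array; the values are arbitrary.
a = np.arange(12, dtype=float).reshape(3, 4)

basic = a[:, 1:3]      # basic slicing: a view onto a's memory
fancy = a[:, [1, 2]]   # integer-list indexing: a new copy

print(np.shares_memory(a, basic))  # True
print(np.shares_memory(a, fancy))  # False

# Indexing with None adds an axis, so the divisor has shape (3, 1)
# and broadcasts against the (3, 2) slice.
divisor = a[:, 0, None]
print(divisor.shape)               # (3, 1)
```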

Sample run -

In [64]: df
Out[64]: 
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597

In [65]: a = df.values
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    ...: 

In [66]: df
Out[66]: 
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597

Runtime test

Approaches -

def numpy_app(df): # Proposed in this post
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df

Timings -

In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
    ...: df2 = df1.copy()
    ...: 

In [45]: %timeit pandas_app1(df1)
    ...: %timeit numpy_app(df2)
    ...: 
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 μs per loop
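The `%timeit` session above needs IPython. A rough equivalent as a plain script could look like the sketch below; it assumes the standard-library `timeit` module in place of the `%timeit` magic and a seeded `default_rng` generator in place of `np.random.randint`, and it first checks that both approaches return the same values. Absolute timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd


def numpy_app(df):
    # Divide columns 1..2 by column 0 on the underlying array.
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df


def pandas_app1(df):
    # Same division through DataFrame.div along the rows.
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df


# Assumed setup: same shape and value range as the original test, seeded.
rng = np.random.default_rng(0)
data = rng.integers(15, 25, (100_000, 4)).astype(float)
base = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

# Sanity check: both approaches should agree on fresh copies.
out_np = numpy_app(base.copy())
out_pd = pandas_app1(base.copy())
same = np.allclose(out_np.values, out_pd.values)
print("results match:", same)

# Time each approach on its own copy; report the best average per call.
for fn in (pandas_app1, numpy_app):
    df = base.copy()
    best = min(timeit.repeat(lambda: fn(df), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.3f} ms per call")
```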