pandas: Divide certain columns by another column in pandas

Question

Asked by user1179317

I was wondering if there is a more efficient way of dividing multiple columns by a certain column. For example, say I have:

prev    open    close   volume
20.77   20.87   19.87   962816
19.87   19.89   19.56   668076
19.56   19.96   20.1    578987
20.1    20.4    20.53   418597

And I would like to get:

prev    open    close   volume
20.77   1.0048  0.9567  962816
19.87   1.0010  0.9844  668076
19.56   1.0204  1.0276  578987
20.1    1.0149  1.0214  418597

Basically, columns 'open' and 'close' have been divided by the value from column 'prev.'

I was able to do this with:

df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))

Is there a simpler way, especially if there are, say, 10 columns to be divided by the same column?

Answer 1

Accepted answer by Scott Boston

df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)

Output:

    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
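For the ten-column case raised in the question, the same call works with a longer column list. The following is a minimal sketch, assuming the frame from the question and assuming that every column other than prev and volume should be divided; the name cols and that selection rule are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "prev":   [20.77, 19.87, 19.56, 20.10],
    "open":   [20.87, 19.89, 19.96, 20.40],
    "close":  [19.87, 19.56, 20.10, 20.53],
    "volume": [962816, 668076, 578987, 418597],
})

# Hypothetical selection rule: divide everything except 'prev' and 'volume'.
cols = df.columns.difference(["prev", "volume"])

# axis=0 lines the divisor up with the rows,
# so each row is divided by its own 'prev' value.
df[cols] = df[cols].div(df["prev"], axis=0)

print(df.round(4))
```

With ten price columns, only the selection of cols changes; the division line stays the same.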

Answer 2

Answer by DYZ

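columns_to_divide = ['open', 'close']
# Use .div with axis=0 so the divisor aligns on rows; a plain "/" would align df['prev'] with the column labels.
df[columns_to_divide] = df[columns_to_divide].div(df['prev'], axis=0)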
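A note on alignment when dividing a DataFrame by a Series: the plain / operator matches the Series index against the DataFrame's column labels, whereas .div(..., axis=0) matches on the row index. A small sketch of the difference, using a two-row cut of the question's data (the names by_columns and by_rows are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "prev":  [20.77, 19.87],
    "open":  [20.87, 19.89],
    "close": [19.87, 19.56],
})
cols = ["open", "close"]

# Plain "/" matches the Series index (0, 1) against the column labels
# ('open', 'close'); nothing matches, so every cell comes back NaN.
by_columns = df[cols] / df["prev"]

# .div(..., axis=0) matches on the row index instead.
by_rows = df[cols].div(df["prev"], axis=0)

print(by_columns.isna().all().all())
print(by_rows)
```

Passing df['prev'].values, as in the accepted answer, sidesteps index alignment entirely and relies on position.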

Answer 3

Answer by Divakar

For performance, I would suggest using the underlying array data and array slicing: the two columns to be modified are adjacent, so a slice gives a view into the array rather than a copy -

a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]

To elaborate a bit more on the array-slicing part: using a[:,[1,2]] would have forced a copy there and slowed things down. On the dataframe side, a[:,[1,2]] is equivalent to df[['open','close']], which I am guessing slows things down too. df.iloc[:,1:3] thus improves upon it.
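The view-versus-copy distinction can be checked directly in NumPy. This is a small sketch on a made-up 4x3 array rather than the question's data; np.shares_memory reports whether two arrays use the same buffer.

```python
import numpy as np

# Made-up data; starts at 1 so the first column contains no zeros.
a = np.arange(1, 13, dtype=float).reshape(4, 3)

sliced = a[:, 1:3]     # basic slice: a view on the same buffer
fancy = a[:, [1, 2]]   # integer-list indexing: a freshly allocated array

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# a[:, 0, None] keeps the first column as shape (4, 1),
# so it broadcasts across the two sliced columns.
ratio = a[:, 1:3] / a[:, 0, None]
print(ratio.shape)  # (4, 2)
```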

Sample run -

In [64]: df
Out[64]: 
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597

In [65]: a = df.values
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    ...: 

In [66]: df
Out[66]: 
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597

Runtime test

Approaches -

def numpy_app(df): # Proposed in this post
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df

Timings -

In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
    ...: df2 = df1.copy()
    ...: 

In [45]: %timeit pandas_app1(df1)
    ...: %timeit numpy_app(df2)
    ...: 
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 μs per loop
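To repeat the comparison outside IPython, something like the following script should work. It is a rough sketch: it uses the standard-library timeit module in place of %timeit, the repeat counts are arbitrary, and the numbers will differ by machine and pandas version. It also checks first that the two approaches agree.

```python
import timeit

import numpy as np
import pandas as pd


def numpy_app(df):
    a = df.values
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df


def pandas_app1(df):
    df[['open', 'close']] = df[['open', 'close']].div(df['prev'].values, axis=0)
    return df


data = np.random.randint(15, 25, (100000, 4)).astype(float)
df1 = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])
df2 = df1.copy()

# Check that both approaches produce the same values before timing them.
expected = pandas_app1(df1.copy())
assert np.allclose(numpy_app(df1.copy()).values, expected.values)

# Both functions modify the frame in place, so later calls divide
# already-divided values; the amount of work per call is unchanged.
for name, func, frame in [("pandas_app1", pandas_app1, df1),
                          ("numpy_app", numpy_app, df2)]:
    best = min(timeit.repeat(lambda: func(frame), number=10, repeat=3)) / 10
    print(f"{name}: {best * 1e3:.3f} ms per call")
```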
