Original question: http://stackoverflow.com/questions/45899613/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Divide certain columns by another column in pandas
Asked by user1179317
I was wondering if there is a more efficient way of dividing multiple columns by a certain column. For example, say I have:
prev    open    close   volume
20.77   20.87   19.87   962816
19.87   19.89   19.56   668076
19.56   19.96   20.1    578987
20.1    20.4    20.53   418597
And I would like to get:
prev    open    close   volume
20.77   1.0048  0.9567  962816
19.87   1.0010  0.9844  668076
19.56   1.0204  1.0276  578987
20.1    1.0149  1.0214  418597
Basically, columns 'open' and 'close' have been divided by the value from column 'prev'.
I was able to do this with:
df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))
Is there a simpler way, especially if there are something like 10 columns to be divided by the same column?
Accepted answer by Scott Boston
df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)
Output:
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
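For anyone who wants to reproduce this from scratch, here is a minimal, self-contained sketch built on the sample data from the question. The name ratio_cols is introduced only for illustration, and the Series is passed directly rather than via .values; with a matching index the two should behave the same.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "prev":   [20.77, 19.87, 19.56, 20.10],
    "open":   [20.87, 19.89, 19.96, 20.40],
    "close":  [19.87, 19.56, 20.10, 20.53],
    "volume": [962816, 668076, 578987, 418597],
})

# Hypothetical name: the columns that should be divided by 'prev'.
ratio_cols = ["open", "close"]

# axis=0 matches the divisor to the rows, so each row is divided by its own 'prev'.
df[ratio_cols] = df[ratio_cols].div(df["prev"], axis=0)

print(df.round(4))
```

Extending this to 10 columns should only require a longer ratio_cols list; the division line stays the same.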
Answer by DYZ
columns_to_divide = ['open', 'close']
# axis=0 divides row by row; a plain `/` would align df['prev'] with the column labels
df[columns_to_divide] = df[columns_to_divide].div(df['prev'], axis=0)
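One point worth checking when dividing a DataFrame by a Series is which axis pandas aligns on. As far as I know, the `/` operator matches the Series index against the DataFrame's column labels, while .div(..., axis=0) matches it against the row index. A small sketch of the difference follows; the names by_columns and by_rows are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "prev":  [20.77, 19.87],
    "open":  [20.87, 19.89],
    "close": [19.87, 19.56],
})
cols = ["open", "close"]

# `/` aligns the Series index (0, 1) with the column labels ('open', 'close').
# No labels match, so every cell of the result is NaN.
by_columns = df[cols] / df["prev"]

# .div(..., axis=0) aligns on the row index instead, which is the intended division.
by_rows = df[cols].div(df["prev"], axis=0)

print(by_columns.isna().all().all())
print(by_rows)
```

If the first print shows True on your pandas version, the operator form is not doing a row-wise division, and the .div form is the safer choice.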
Answer by Divakar
For performance, I would suggest working on the underlying array data and using array slicing. Because the two columns to be modified are adjacent, a slice gives a view into the array instead of a copy:
a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
To elaborate a bit more on the array-slicing part: indexing with a[:,[1,2]] would have forced a copy there and slowed things down. On the DataFrame side, a[:,[1,2]] is equivalent to df[['open','close']], which I am guessing slows things down too. df.iloc[:,1:3] thus improves on it.
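The view-versus-copy distinction can be checked directly in NumPy. This is a small sketch on a made-up 3x4 array (not the question's data), using np.shares_memory to see whether the result still points at the original buffer.

```python
import numpy as np

# Made-up data; values start at 1 so the first column contains no zeros.
a = np.arange(1, 13, dtype=float).reshape(3, 4)

sliced = a[:, 1:3]     # basic slicing: a view onto a's memory
fancy = a[:, [1, 2]]   # integer-list ("fancy") indexing: a new array

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# a[:, 0, None] keeps the first column with shape (3, 1),
# so it broadcasts across both sliced columns.
ratio = sliced / a[:, 0, None]
print(ratio.shape)  # (3, 2)
```

The `None` in a[:,0,None] is what makes the division row-wise: without it the divisor would have shape (3,) and could not broadcast against a (3, 2) block in the intended way.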
Sample run:
In [64]: df
Out[64]:
    prev   open  close  volume
0  20.77  20.87  19.87  962816
1  19.87  19.89  19.56  668076
2  19.56  19.96  20.10  578987
3  20.10  20.40  20.53  418597

In [65]: a = df.values
    ...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    ...:

In [66]: df
Out[66]:
    prev      open     close  volume
0  20.77  1.004815  0.956668  962816
1  19.87  1.001007  0.984399  668076
2  19.56  1.020450  1.027607  578987
3  20.10  1.014925  1.021393  418597
Runtime test
Approaches:
import numpy as np
import pandas as pd

def numpy_app(df): # Proposed in this post
    a = df.values
    # Columns 1:3 are 'open' and 'close'; column 0 is 'prev'
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df
Timings:
In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
    ...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
    ...: df2 = df1.copy()
    ...:

In [45]: %timeit pandas_app1(df1)
    ...: %timeit numpy_app(df2)
    ...:
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 μs per loop
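Outside IPython, the comparison above can be approximated with the standard timeit module. The following is only a sketch: the helper names numpy_div and pandas_div are hypothetical, .to_numpy() is used in place of .values, and each call works on a fresh copy so that repeated runs do not keep dividing already-divided data. Absolute numbers depend on the machine and the pandas/NumPy versions, so none are quoted here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
data = rng.integers(15, 25, size=(100_000, 4)).astype(float)
base = pd.DataFrame(data, columns=["prev", "open", "close", "volume"])

def numpy_div(df):
    # Positional slice 1:3 selects 'open' and 'close'; column 0 is 'prev'.
    a = df.to_numpy()
    df.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return df

def pandas_div(df):
    cols = ["open", "close"]
    df[cols] = df[cols].div(df["prev"], axis=0)
    return df

# Sanity check: both approaches should agree on fresh copies of the same data.
assert np.allclose(numpy_div(base.copy()).to_numpy(),
                   pandas_div(base.copy()).to_numpy())

runs = 20
for fn in (pandas_div, numpy_div):
    seconds = timeit.timeit(lambda: fn(base.copy()), number=runs)
    print(f"{fn.__name__}: {seconds / runs * 1e3:.3f} ms per call (copy included)")
```

Because the copy is timed along with the division, the gap between the two approaches may look smaller here than in the %timeit figures above.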