Note: this page is a StackOverflow question with its answers, shared under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow original: http://stackoverflow.com/questions/24202110/

Time: 2020-09-13 22:09:42  Source: igfitidea

pandas apply function to multiple columns and multiple rows

python, pandas

Asked by yemu

I have a DataFrame with consecutive pixel coordinates in its rows, stored in the columns 'xpos' and 'ypos', and I want to calculate the angle in degrees of each path between consecutive pixels. Currently I have the solution presented below, which works fine and is fast enough for the size of my file, but iterating through all the rows does not seem to be the pandas way to do it. I know how to apply a function to different columns, and how to apply functions to different rows of columns, but I can't figure out how to combine the two.

Here's my code:

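import math

import numpy as np
import pandas as pd

df = pd.read_csv('fixations_out.csv')

# calculate the saccade angle
temp_list = []
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        # previous point; .loc replaces the removed .ix indexer, and with the
        # default RangeIndex count-1 == -1 raises KeyError on the first row
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        # the first row has no previous point
        temp_list.append(np.nan)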

and then I insert temp_list into df as a new column.
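For reference, the loop computes the following for each row i, with row i-1 as the previous point (the first row has no predecessor, so its angle is NaN):

```latex
\theta_i = \left| \frac{180}{\pi} \arctan\left( \frac{y_{i-1} - y_i}{x_{i-1} - x_i} \right) \right|, \qquad i \ge 1
```

Because of the absolute value, each θ_i lies between 0° and 90°, and a purely vertical step (x_{i-1} = x_i) corresponds to 90° in the limit.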

EDIT: after implementing the tip from the comment I have:

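import math

# previous row minus current row, as in the loop above
df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']

def calc_angle(x):
    # x is one row; if diff_x is a NumPy float, dividing by zero gives inf
    # (with a RuntimeWarning) rather than raising ZeroDivisionError
    try:
        a = abs(180/math.pi * math.atan((x.diff_y)/(x.diff_x)))
        return a
    except ZeroDivisionError:
        return 0

df['angle_degrees'] = df.apply(calc_angle, axis=1)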
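A small sketch of what happens on a vertical step with the apply version, assuming all columns are floats (the three-point path below is hypothetical). Each row passed to calc_angle holds NumPy floats, so a zero diff_x yields ±inf instead of raising ZeroDivisionError, and math.atan(±inf) is ±π/2. In that case the except branch does not fire and the step comes out as 90°, which is the geometrically expected angle.

```python
import math

import numpy as np
import pandas as pd

# Hypothetical three-point path: the second step is purely vertical (x does not change)
df = pd.DataFrame({'xpos': [0.0, 0.0, 2.0], 'ypos': [0.0, 3.0, 3.0]})
df['diff_x'] = df['xpos'].shift() - df['xpos']
df['diff_y'] = df['ypos'].shift() - df['ypos']


def calc_angle(x):
    # x.diff_y / x.diff_x are NumPy floats: a zero step gives +/-inf instead of raising
    # ZeroDivisionError, and math.atan(+/-inf) is +/-pi/2, i.e. 90 degrees after abs()
    return abs(180 / math.pi * math.atan(x.diff_y / x.diff_x))


# Silence the divide-by-zero RuntimeWarning for the vertical step
with np.errstate(divide='ignore', invalid='ignore'):
    df['angle_degrees'] = df.apply(calc_angle, axis=1)

print(df['angle_degrees'].round(6).tolist())  # [nan, 90.0, 0.0]
```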

I compared the timings of three solutions on my df (about 6k rows): the iteration is almost 9 times slower than apply, and about 1500 times slower than doing it without apply:

Execution time of the solution with iteration, including inserting the new column back into df: 1.51 s

Execution time of the solution with apply (no explicit iteration): 0.17 s

Execution time of the accepted answer by EdChum using diff() (no iteration, no apply): 0.001 s

Suggestion: avoid iteration and apply, and always try to use a vectorized calculation ;) it is not only faster but also more readable.
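A minimal sketch for reproducing this kind of comparison, assuming a synthetic random path (hypothetical data) in place of fixations_out.csv; angles_loop and angles_vectorized are hypothetical names. Both divide the y step by the x step as in the question, and the sketch checks that they agree before timing them. Timings depend on the machine, so no numbers are shown.

```python
import math
import timeit

import numpy as np
import pandas as pd

# Hypothetical synthetic path standing in for fixations_out.csv (not the asker's data)
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({'xpos': rng.integers(0, 1024, n).astype(float),
                   'ypos': rng.integers(0, 768, n).astype(float)})


def angles_loop(df):
    # Row-by-row version from the question (.loc instead of the removed .ix)
    out = []
    for count, row in df.iterrows():
        try:
            dx = df['xpos'].loc[count - 1] - row['xpos']
            dy = df['ypos'].loc[count - 1] - row['ypos']
            out.append(abs(180 / math.pi * math.atan(dy / dx)))
        except KeyError:
            # first row: no previous point
            out.append(np.nan)
    return pd.Series(out, index=df.index)


def angles_vectorized(df):
    # Whole-column version: diff() subtracts the previous row from each row
    return np.abs(np.degrees(np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1)))))


# Ignore warnings from zero steps (inf) and repeated points (0/0 -> nan)
with np.errstate(divide='ignore', invalid='ignore'):
    loop_result = angles_loop(df)
    vec_result = angles_vectorized(df)
    t_loop = timeit.timeit(lambda: angles_loop(df), number=1)
    t_vec = timeit.timeit(lambda: angles_vectorized(df), number=1)

print(f'loop: {t_loop:.3f} s, vectorized: {t_vec:.5f} s')
```

Comparing with np.allclose(..., equal_nan=True) rather than exact equality allows for last-digit differences between math.atan and np.arctan.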

Answer by EdChum

You can do this with the following vectorized method. I compared it with your loop on a 10,000-row DataFrame, and it is over 1000 times faster, even though the loop's timing doesn't include adding the list back as a new column!

In [108]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/np.pi * np.arctan((df['ypos'].shift() - df['ypos']) / (df['xpos'].shift() - df['xpos'])))

1000 loops, best of 3: 1.27 ms per loop

In [100]:

%%timeit
temp_list=[]
for count, row in df.iterrows():
    x1 = row['xpos']
    y1 = row['ypos']
    try:
        x2 = df['xpos'].loc[count-1]
        y2 = df['ypos'].loc[count-1]
        a = abs(180/math.pi * math.atan((y2-y1)/(x2-x1)))
        temp_list.append(a)
    except KeyError:
        temp_list.append(np.nan)
1 loops, best of 3: 1.29 s per loop

Also, avoid apply where possible, as it operates row-wise; if you can find a vectorised method that works on the entire Series or DataFrame, always prefer it.

UPDATE

Seeing as you are just subtracting the previous row, there is a built-in method for this, diff(), which results in even faster code:

In [117]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/np.pi * np.arctan(df['ypos'].diff(1)/df['xpos'].diff(1)))

1000 loops, best of 3: 1.01 ms per loop
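A short sketch of how diff(1) relates to the shift() subtraction used in the question, on hypothetical coordinates. diff(1) is the current row minus the previous one, so it is the question's "previous minus current" with the sign flipped. Since both the x and y steps flip, their ratio, and therefore the angle, is unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates for a four-point path
xpos = pd.Series([10.0, 12.0, 15.0, 11.0])
ypos = pd.Series([5.0, 9.0, 9.0, 3.0])

# The question's version: previous row minus current row
dx_shift = xpos.shift() - xpos
dy_shift = ypos.shift() - ypos

# diff(1): current row minus previous row, i.e. the same steps with the sign flipped
dx_diff = xpos.diff(1)
dy_diff = ypos.diff(1)

print(dx_shift.tolist())  # [nan, -2.0, -3.0, 4.0]
print(dx_diff.tolist())   # [nan, 2.0, 3.0, -4.0]

# Both components flip sign, so the ratio fed to arctan is unchanged
ratio_shift = dy_shift / dx_shift
ratio_diff = dy_diff / dx_diff
```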

Another update

There is also a built-in method for Series and DataFrame division, div(); this shaves off even more time, and I get below 1 ms:

In [9]:

%%timeit
import numpy as np
df['angle'] = np.abs(180/np.pi * np.arctan(df['ypos'].diff(1).div(df['xpos'].diff(1))))

1000 loops, best of 3: 951 μs per loop
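A sketch that wraps the diff()/div() approach in a reusable function, dividing the y step by the x step as in the question's atan((y2-y1)/(x2-x1)). The helper name saccade_angles and the four-point path are hypothetical, and np.degrees is used here in place of multiplying by 180/π.

```python
import numpy as np
import pandas as pd


def saccade_angles(df):
    """Absolute angle (degrees) of each step from the previous (xpos, ypos) point.

    Hypothetical helper combining the answer's diff() and div() steps;
    the first row has no predecessor and comes out as NaN.
    """
    dx = df['xpos'].diff(1)
    dy = df['ypos'].diff(1)
    # dy / dx; a vertical step (dx == 0) gives +/-inf, and arctan(+/-inf) is +/-pi/2
    slope = dy.div(dx)
    return np.degrees(np.arctan(slope)).abs()


# Hypothetical path: a diagonal step, a vertical step, then a horizontal step
path = pd.DataFrame({'xpos': [0.0, 1.0, 1.0, 3.0],
                     'ypos': [0.0, 1.0, 2.0, 2.0]})
path['angle'] = saccade_angles(path)
print(path['angle'].round(6).tolist())  # [nan, 45.0, 90.0, 0.0]
```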