Exception handling in the Python Pandas .apply() function

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/22847304/

Date: 2020-08-19 01:48:23  Source: igfitidea

Tags: python, exception-handling, pandas

Asked by RukTech

If I have a DataFrame:

from pandas import DataFrame
myDF = DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

This gives the following DataFrame (I'm just starting out on Stack Overflow and don't have enough reputation to post an image of it):

   | A  | B  |
0  | 11 | 11 |
1  | 22 | 2A |
2  | 33 | 33 |

If I want to convert column B to int values and drop the values that can't be converted (here, by replacing them with None), I have to do:

def convertToInt(cell):
    try:
        return int(cell)
    except (ValueError, TypeError):  # cell can't be parsed as an integer
        return None
myDF['B'] = myDF['B'].apply(convertToInt)
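A minimal sketch of the full round trip, assuming the goal is to actually drop the rows that fail (the function above only replaces them with None). apply() stores that None as a missing value, so dropna() can remove those rows and astype() can restore an integer dtype afterwards. Treating only ValueError and TypeError as "can't be converted" is my assumption.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

def convertToInt(cell):
    """Return int(cell), or None when the cell cannot be parsed as an integer."""
    try:
        return int(cell)
    except (ValueError, TypeError):
        return None

myDF['B'] = myDF['B'].apply(convertToInt)  # the None for '2A' becomes a missing value

# Drop the rows that failed, then restore an integer dtype for column B.
cleaned = myDF.dropna(subset=['B']).astype({'B': int})
print(cleaned)
```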

If I only do:

myDF['B'].apply(int)

the error obviously is:

C:\WinPython-32bit-2.7.5.3\python-2.7.5\lib\site-packages\pandas\lib.pyd in pandas.lib.map_infer (pandas\lib.c:42840)()

ValueError: invalid literal for int() with base 10: '2A'
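For context, a sketch of why a try/except placed around the apply() call itself is not enough: the first bad cell raises, apply() stops, and no partial result survives, so per-cell handling has to live inside the function being applied. The variable names here are illustrative only.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

result = None
try:
    # Fails as soon as int() reaches the '2A' cell.
    result = myDF['B'].apply(int)
except ValueError as exc:
    print("apply() aborted:", exc)

# result is still None: the cells that would have converted (11 and 33) are lost too.
print(result)
```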

Is there a way to add exception handling to myDF['B'].apply()?
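One possible general pattern, sketched under the assumption that returning a default value for failed cells is acceptable: a small wrapper (the name safe is hypothetical, not part of pandas) that turns any function into one that catches the listed exceptions and returns a default instead of raising.

```python
import pandas as pd

def safe(func, default=None, exceptions=(Exception,)):
    """Hypothetical helper: wrap func so it returns `default` instead of raising."""
    def wrapper(value):
        try:
            return func(value)
        except exceptions:
            return default
    return wrapper

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Only ValueError/TypeError are swallowed; any other error still surfaces.
converted = myDF['B'].apply(safe(int, exceptions=(ValueError, TypeError)))
print(converted)
```

Because except accepts a tuple of exception classes, the caller decides how wide the safety net is.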

Thank you in advance!

Accepted answer by Jeff

It's much better/faster to do:

In [1]: myDF = DataFrame(data=[[11,11],[22,'2A'],[33,33]], columns = ['A','B'])

In [2]: myDF.convert_objects(convert_numeric=True)
Out[2]: 
    A   B
0  11  11
1  22 NaN
2  33  33

[3 rows x 2 columns]

In [3]: myDF.convert_objects(convert_numeric=True).dtypes
Out[3]: 
A      int64
B    float64
dtype: object

This is a vectorized method of doing just this. The convert_numeric=True (coerce) flag says to mark as nan anything that cannot be converted to numeric.
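DataFrame.convert_objects has since been deprecated and removed from pandas, so the session above will not run on current versions. A sketch of the same whole-DataFrame coercion using pd.to_numeric with errors='coerce', assuming a recent pandas; DataFrame.apply forwards the extra errors keyword to pd.to_numeric for each column.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Column-wise: each column is passed to pd.to_numeric(col, errors='coerce'),
# which turns anything non-numeric (here '2A') into NaN instead of raising.
numeric = myDF.apply(pd.to_numeric, errors='coerce')
print(numeric)
print(numeric.dtypes)  # A stays integer; B becomes float64 because of the NaN
```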

You can of course do this to a single column if you'd like.
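For a single column, a sketch assuming current pandas, where pd.to_numeric(..., errors='coerce') stands in for the removed convert_objects call. The optional last step uses pandas' nullable 'Int64' dtype (available in recent versions) so the values can stay integers with a missing marker instead of becoming float64.

```python
import pandas as pd

myDF = pd.DataFrame(data=[[11, 11], [22, '2A'], [33, 33]], columns=['A', 'B'])

# Coerce only column B: '2A' becomes NaN, so B becomes float64.
myDF['B'] = pd.to_numeric(myDF['B'], errors='coerce')

# Optional: nullable integer dtype keeps whole numbers as ints, with <NA> for the gap.
b_int = myDF['B'].astype('Int64')
print(b_int)
```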

Answer by Amit Verma

A way to achieve that with a lambda:

myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)

For your input:

>>> myDF
    A   B
0  11  11
1  22  2A
2  33  33

[3 rows x 2 columns]


>>> myDF['B'].apply(lambda x: int(x) if str(x).isdigit() else None)
0    11
1   NaN
2    33
Name: B, dtype: float64
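A caveat on the isdigit check, shown as a sketch with two hypothetical helper names: str.isdigit() rejects strings that int() would happily parse, such as a leading minus sign or surrounding whitespace, so '-5' and ' 7 ' become NaN. A try/except version only rejects what int() itself rejects.

```python
import pandas as pd

def to_int_isdigit(x):
    # Same check as the lambda above: only all-digit strings pass.
    return int(x) if str(x).isdigit() else None

def to_int_try(x):
    # Accepts whatever int() accepts, e.g. '-5' or ' 7 '.
    try:
        return int(x)
    except (ValueError, TypeError):
        return None

s = pd.Series([11, '-5', ' 7 ', '2A'])
via_isdigit = s.apply(to_int_isdigit)
via_try = s.apply(to_int_try)
print(pd.DataFrame({'isdigit': via_isdigit, 'try': via_try}))
```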

Answer by atkat12

I had the same question, but for a more general case where it was hard to tell if the function would generate an exception (i.e. you couldn't explicitly check this condition with something as straightforward as isdigit).

After thinking about it for a while, I came up with the solution of embedding the try/except syntax in a separate function. I'm posting a toy example in case it helps anyone.

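import pandas as pd
import numpy as np

# np.array turns every element into a string, so column 0 holds 'a' and '1'
df = pd.DataFrame(np.array([['a', 'a'], [1, 2]]))

def augment(x):
    try:
        return int(x) + 1
    except Exception:  # the general case: we can't predict which error occurs
        return 'error:' + str(x)

df[0].apply(augment)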
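A variation on the same idea, sketched with a hypothetical attempt() helper: instead of mixing results and 'error:...' strings in one column, the wrapper returns a (result, error message) pair, and the pairs are split into a numeric column and an error column afterwards. Catching the broad Exception is deliberate here, since the applied function is assumed to be arbitrary.

```python
import numpy as np
import pandas as pd

def augment(value):
    # Same toy transformation as above: parse as int and add 1.
    return int(value) + 1

def attempt(func):
    """Hypothetical helper: wrap func so it returns (result, error_message)."""
    def wrapper(value):
        try:
            return (func(value), None)
        except Exception as exc:  # deliberately broad: func is arbitrary
            return (np.nan, f"{type(exc).__name__}: {exc}")
    return wrapper

df = pd.DataFrame(np.array([['a', 'a'], [1, 2]]))
pairs = df[0].apply(attempt(augment))

# Split the tuples into a numeric column and an error column.
report = pd.DataFrame(pairs.tolist(), index=pairs.index, columns=['value', 'error'])
print(report)
```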