Python pandas: how to remove nan and -inf values

Source: a Stack Overflow question, original address http://stackoverflow.com/questions/45745085/. The content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors and keep the same license.

Tags: python, python-3.x, pandas, numpy, dataframe

Asked by piRSquared

I have the following dataframe:

           time       X    Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
0         0.002876    0   10     0       NaN   NaN       NaN   NaN       NaN
1         0.002986    0   10     0       NaN     0       NaN   NaN       NaN
2         0.037367    1   10     1  1.000000     0       NaN     0       NaN
3         0.037374    2   10     2  0.500000     1  1.000000     0       NaN
4         0.037389    3   10     3  0.333333     2  0.500000     1  1.000000
5         0.037393    4   10     4  0.250000     3  0.333333     2  0.500000

....
1030308   9.962213  256  268   256  0.000000   256  0.003906   255  0.003922
1030309  10.041799    0  268     0      -inf   256  0.000000   256  0.003906
1030310  10.118960    0  268     0       NaN     0      -inf   256  0.000000

I tried the following:

df.dropna(inplace=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

X_train = X_train.drop('time', axis=1)
X_train = X_train.drop('X_t1', axis=1)
X_train = X_train.drop('X_t2', axis=1)
X_test = X_test.drop('time', axis=1)
X_test = X_test.drop('X_t1', axis=1)
X_test = X_test.drop('X_t2', axis=1)
X_test.fillna(X_test.mean(), inplace=True)
X_train.fillna(X_train.mean(), inplace=True)
y_train.fillna(y_train.mean(), inplace=True)

However, whenever I try to fit a regression model with fit(X_train, y_train), I still get this error: ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

How can we remove both the NaN and -inf values at the same time?
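To see why the attempt above can still fail, here is a minimal sketch on a hypothetical one-column frame (the values are made up for illustration, not taken from the data above). It suggests that dropna and fillna only act on NaN, so -inf survives both, and that a column mean computed over -inf is itself -inf.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column containing both NaN and -inf
df = pd.DataFrame({"X_tp0": [1.0, 0.5, np.nan, -np.inf]})

# dropna removes the NaN row but leaves the -inf row in place
after_dropna = df.dropna()

# The mean skips NaN but not -inf, so it becomes -inf
col_mean = df["X_tp0"].mean()

# fillna then writes that -inf mean into the NaN slot
filled = df.fillna(df.mean())

print(np.isinf(after_dropna["X_tp0"]).any())
print(col_mean)
print(np.isfinite(filled["X_tp0"]).all())
```

Under these assumptions, infinite values need to be handled explicitly before (or instead of) the NaN handling.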

Answered by piRSquared

Use pd.DataFrame.isin to flag the unwanted values, then pd.DataFrame.any to find the rows that contain any of them. Finally, use the negated boolean mask to slice the dataframe.

df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

             time    X    Y  X_t0     X_tp0   X_t1     X_tp1   X_t2     X_tp2
4        0.037389    3   10     3  0.333333    2.0  0.500000    1.0  1.000000
5        0.037393    4   10     4  0.250000    3.0  0.333333    2.0  0.500000
1030308  9.962213  256  268   256  0.000000  256.0  0.003906  255.0  0.003922
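The isin approach above can be tried on a small frame. This is a sketch with hypothetical column names a and b; it builds the row mask step by step so the intermediate result can be inspected.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: row 1 has NaN, row 2 has -inf
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [0.5, 2.0, -np.inf, 1.5],
})

# True for every row that holds at least one NaN, inf or -inf
bad_rows = df.isin([np.nan, np.inf, -np.inf]).any(axis=1)

# Keep only the rows that are not flagged
clean = df[~bad_rows]
print(clean)
```

Passing axis=1 by keyword keeps the call working on pandas versions where any() no longer accepts positional arguments.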

Answered by Alexander

You can replace inf and -inf with NaN, and then select non-null rows.

df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)]  # .astype(np.float64) ?

or

df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)  # axis=0 drops rows; axis=1 would drop whole columns instead

Check the dtypes of your columns via df.info() to make sure they are all as expected (e.g. np.float32/64).
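A minimal sketch of the replace-then-drop idea combined with a dtype check, assuming a hypothetical two-column frame. The non_float list is an illustrative helper, not part of the answer; it simply names any column whose dtype is not a float type.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one inf and one NaN
df = pd.DataFrame({"x": [1.0, np.inf, 3.0], "y": [np.nan, 2.0, 4.0]})

# Turn infinities into NaN, then drop every row that contains NaN
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)

# Illustrative helper: list columns that are not stored as floats
non_float = [
    col for col in cleaned.columns
    if not pd.api.types.is_float_dtype(cleaned[col])
]

cleaned.info()
print(non_float)
```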

Answered by Maria Wollestonecraft

df = df.replace([np.inf, -np.inf], np.nan)  # replace returns a new frame, so assign it back

df.dropna(inplace=True)

Answered by DougR

Instead of dropping rows that contain any nulls or infinite numbers, it is more succinct to reverse the logic and return the rows where all cells are finite numbers. The numpy isfinite function does this, and .all(axis=1) returns True only if all cells in a row are finite.

df = df[np.isfinite(df).all(axis=1)]
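np.isfinite expects numeric input, so a frame that also holds text columns may need the check restricted to its numeric part. The following is a sketch under that assumption; the label column is hypothetical and only there to show a non-numeric dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with a text column next to the numeric ones
df = pd.DataFrame({
    "time": [0.1, 0.2, 0.3],
    "X_tp0": [np.nan, -np.inf, 0.25],
    "label": ["a", "b", "c"],
})

# Run the finiteness check on numeric columns only
numeric = df.select_dtypes(include="number")
mask = np.isfinite(numeric).all(axis=1)

# Apply the row mask to the full frame, text column included
clean = df[mask]
print(clean)
```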

Answered by Sanjeev Mishra

df.replace only replaces the first occurrence of the value, hence the error.

df = list(filter(lambda x: x != inf, df)) would remove all occurrences of inf, after which the drop function can be used.
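A small sketch to check the filter idea on toy data (all values here are hypothetical). As far as I can tell, iterating over a DataFrame yields its column labels rather than its cells, so the filter is applied to a plain list of values, and the last lines count how many inf cells replace converts on a toy column.

```python
import math

import numpy as np
import pandas as pd

# Filtering a plain list of values removes every infinite entry
values = [1.0, math.inf, 2.0, math.inf, -math.inf]
finite_only = list(filter(math.isfinite, values))

# Iterating a DataFrame gives column labels, not cell values
df = pd.DataFrame({"a": [np.inf, 1.0, np.inf]})
labels = list(df)

# Count how many inf cells replace converts in this toy column
replaced = df.replace(np.inf, np.nan)
converted = int(replaced["a"].isna().sum())

print(finite_only, labels, converted)
```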

Answered by mrkbutty

I prefer to set the option so that inf values are treated as NaN:

import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])
s1/s2
# Outputs:
# 0.0
# 1.0
# inf
# dtype: float64

pd.set_option('mode.use_inf_as_na', True)
s1/s2
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64

Note that you can also use a context manager:

with pd.option_context('mode.use_inf_as_na', True):
    print(s1/s2)
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64
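As far as I know, the mode.use_inf_as_na option was deprecated in pandas 2.1, so on newer versions an explicit conversion may be the safer route. A sketch of that alternative, reusing the s1 and s2 series from the answer; the names ratio and usable are illustrative only.

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])

# Convert infinities produced by the division into NaN explicitly
ratio = (s1 / s2).replace([np.inf, -np.inf], np.nan)

# From here the usual NaN tools (dropna, fillna, isna) apply
usable = ratio.dropna()
print(ratio)
print(usable)
```

This keeps the conversion local to the one result instead of changing a global pandas setting.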