Python pandas: how to remove NaN and -inf values

Question

Asked by piRSquared

I have the following dataframe:

           time       X    Y  X_t0     X_tp0  X_t1     X_tp1  X_t2     X_tp2
0         0.002876    0   10     0       NaN   NaN       NaN   NaN       NaN
1         0.002986    0   10     0       NaN     0       NaN   NaN       NaN
2         0.037367    1   10     1  1.000000     0       NaN     0       NaN
3         0.037374    2   10     2  0.500000     1  1.000000     0       NaN
4         0.037389    3   10     3  0.333333     2  0.500000     1  1.000000
5         0.037393    4   10     4  0.250000     3  0.333333     2  0.500000

....
1030308   9.962213  256  268   256  0.000000   256  0.003906   255  0.003922
1030309  10.041799    0  268     0      -inf   256  0.000000   256  0.003906
1030310  10.118960    0  268     0       NaN     0      -inf   256  0.000000

I tried the following:

df.dropna(inplace=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

X_train = X_train.drop('time', axis=1)
X_train = X_train.drop('X_t1', axis=1)
X_train = X_train.drop('X_t2', axis=1)
X_test = X_test.drop('time', axis=1)
X_test = X_test.drop('X_t1', axis=1)
X_test = X_test.drop('X_t2', axis=1)
X_test.fillna(X_test.mean(), inplace=True)
X_train.fillna(X_train.mean(), inplace=True)
y_train.fillna(y_train.mean(), inplace=True)

However, I still get the error ValueError: Input contains NaN, infinity or a value too large for dtype('float32'). whenever I try to fit a regression model with fit(X_train, y_train).

How can we remove both the NaN and -inf values at the same time?

Answer 1

Answered by piRSquared

Use pd.DataFrame.isin to flag the cells that are NaN, inf or -inf, and check for rows that contain any such cell with pd.DataFrame.any. Finally, use the negated boolean array to slice the dataframe.

df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]

             time    X    Y  X_t0     X_tp0   X_t1     X_tp1   X_t2     X_tp2
4        0.037389    3   10     3  0.333333    2.0  0.500000    1.0  1.000000
5        0.037393    4   10     4  0.250000    3.0  0.333333    2.0  0.500000
1030308  9.962213  256  268   256  0.000000  256.0  0.003906  255.0  0.003922
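The steps above can be sketched on a small frame. The data below is a hypothetical miniature of the question's dataframe, and the intermediate names bad_cells and bad_rows are introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the question's dataframe.
df = pd.DataFrame({
    "X": [0, 1, 2, 0],
    "X_tp0": [np.nan, 1.0, 0.5, -np.inf],
})

# True wherever a cell is NaN, +inf or -inf.
bad_cells = df.isin([np.nan, np.inf, -np.inf])

# True for every row that contains at least one such cell.
bad_rows = bad_cells.any(axis=1)

# Keep only the rows without any bad cell.
clean = df[~bad_rows]
print(clean)
```

Splitting the one-liner into named steps makes it easy to inspect which rows are being discarded before committing to the filter.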

Answer 2

Answered by Alexander

You can replace inf and -inf with NaN, and then select non-null rows.

df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)]  # .astype(np.float64) ?

or

df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)

Check the dtypes of your columns via df.info() to make sure they are all as expected (e.g. np.float32/64).
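As a minimal sketch of the replace-then-drop approach, the snippet below uses a hypothetical three-row frame and shows how the axis argument of dropna changes what is removed: axis=0 removes rows that contain a null, while axis=1 removes whole columns.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with one -inf value.
df = pd.DataFrame({
    "X": [0, 1, 2],
    "X_tp0": [-np.inf, 1.0, 0.5],
})

# Turn +inf / -inf into NaN so that dropna can see them.
as_nan = df.replace([np.inf, -np.inf], np.nan)

rows_kept = as_nan.dropna(axis=0)  # drops the row holding the NaN
cols_kept = as_nan.dropna(axis=1)  # drops the column holding the NaN

print(rows_kept)
print(cols_kept)
print(df.dtypes)  # quick dtype check, similar in spirit to df.info()
```

For a model-fitting workflow like the one in the question, dropping rows is usually what is wanted, since dropping columns would remove features entirely.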

Answer 3

Answered by Maria Wollestonecraft

df = df.replace([np.inf, -np.inf], np.nan)

df.dropna(inplace=True)

Answer 4

Answered by DougR

Instead of dropping rows that contain any nulls or infinite numbers, it is more succinct to reverse the logic and return the rows where all cells are finite numbers. The numpy isfinite function does this, and '.all(axis=1)' returns True only if all cells in a row are finite.

df = df[np.isfinite(df).all(axis=1)]
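np.isfinite only works on numeric data, so a frame with string or object columns needs the check restricted to its numeric columns. The sketch below assumes a hypothetical frame with a non-numeric label column and uses select_dtypes to build the mask; the column names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one text column plus two numeric columns.
df = pd.DataFrame({
    "label": ["a", "b", "c"],
    "X": [0, 1, 2],
    "X_tp0": [np.nan, 0.5, -np.inf],
})

# Restrict the finiteness check to numeric columns.
numeric = df.select_dtypes(include=[np.number])

# True only for rows whose numeric cells are all finite
# (NaN, +inf and -inf all count as non-finite).
finite_rows = np.isfinite(numeric).all(axis=1)

clean = df[finite_rows]
print(clean)
```

The mask is computed on the numeric subset but applied to the full frame, so non-numeric columns are carried along unchanged.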

Answer 5

Answered by Sanjeev Mishra

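df.replace returns a new dataframe rather than modifying df in place, so unless the result is assigned back the inf values remain, and thus the error.

For a flat list of values, values = list(filter(lambda x: x != inf, values)) would remove all occurrences of inf (with inf imported from math or numpy). Applied directly to a dataframe, filter iterates over the column labels, so for a dataframe assign the result of df.replace back and then use the dropna function.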
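A quick way to check how df.replace behaves is to count the infinite values before and after the call. The sketch below uses a hypothetical two-column frame; n_inf_before and n_inf_after are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical frame containing four infinite values.
df = pd.DataFrame({
    "a": [np.inf, 1.0, np.inf, 3.0],
    "b": [-np.inf, np.inf, 2.0, 4.0],
})

# replace returns a new frame; df itself is left unchanged.
replaced = df.replace([np.inf, -np.inf], np.nan)

n_inf_before = int(np.isinf(df.to_numpy()).sum())
n_inf_after = int(np.isinf(replaced.to_numpy()).sum())
print(n_inf_before, n_inf_after)

# Rows with any NaN can now be dropped.
clean = replaced.dropna()
print(clean)
```

Every matching value is replaced in the returned frame, while the original frame still holds its infinite values until the result is assigned back.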

Answer 6

Answered by mrkbutty

I prefer to set the option so that inf values are treated as NaN (note that the mode.use_inf_as_na option was deprecated in pandas 2.1):

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])
s1/s2
# Outputs:
# 0.0
# 1.0
# inf
# dtype: float64

pd.set_option('mode.use_inf_as_na', True)
s1/s2
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64

Note that you can also use a context manager:

with pd.option_context('mode.use_inf_as_na', True):
    print(s1/s2)
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64
