Python pandas: how to remove nan and -inf values
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/45745085/
Asked by piRSquared
I have the following dataframe
time X Y X_t0 X_tp0 X_t1 X_tp1 X_t2 X_tp2
0 0.002876 0 10 0 NaN NaN NaN NaN NaN
1 0.002986 0 10 0 NaN 0 NaN NaN NaN
2 0.037367 1 10 1 1.000000 0 NaN 0 NaN
3 0.037374 2 10 2 0.500000 1 1.000000 0 NaN
4 0.037389 3 10 3 0.333333 2 0.500000 1 1.000000
5 0.037393 4 10 4 0.250000 3 0.333333 2 0.500000
....
1030308 9.962213 256 268 256 0.000000 256 0.003906 255 0.003922
1030309 10.041799 0 268 0 -inf 256 0.000000 256 0.003906
1030310 10.118960 0 268 0 NaN 0 -inf 256 0.000000
I tried with the following
df.dropna(inplace=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)
X_train = X_train.drop('time', axis=1)
X_train = X_train.drop('X_t1', axis=1)
X_train = X_train.drop('X_t2', axis=1)
X_test = X_test.drop('time', axis=1)
X_test = X_test.drop('X_t1', axis=1)
X_test = X_test.drop('X_t2', axis=1)
X_test.fillna(X_test.mean(), inplace=True)
X_train.fillna(X_train.mean(), inplace=True)
y_train.fillna(y_train.mean(), inplace=True)
However, whenever I try to fit a regression model with fit(X_train, y_train), I still get this error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
How can we remove both the NaN and -inf values at the same time?
Answer by piRSquared
Use pd.DataFrame.isin to flag the NaN and infinite cells, and check for rows that contain any of them with pd.DataFrame.any. Finally, use the negated boolean array to slice the dataframe.
df[~df.isin([np.nan, np.inf, -np.inf]).any(axis=1)]
time X Y X_t0 X_tp0 X_t1 X_tp1 X_t2 X_tp2
4 0.037389 3 10 3 0.333333 2.0 0.500000 1.0 1.000000
5 0.037393 4 10 4 0.250000 3.0 0.333333 2.0 0.500000
1030308 9.962213 256 268 256 0.000000 256.0 0.003906 255.0 0.003922
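The same idea can be tried on a small frame. This is only a sketch: the data and column values below are made up for illustration and are not the asker's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; the values are illustrative only.
df = pd.DataFrame({
    "time": [0.1, 0.2, 0.3, 0.4],
    "X_t0": [np.nan, 1.0, -np.inf, 0.5],
    "X_t1": [2.0, 3.0, 4.0, np.inf],
})

# True wherever a cell is NaN, +inf or -inf.
bad_cells = df.isin([np.nan, np.inf, -np.inf])

# Keep only the rows in which no cell is flagged.
clean = df[~bad_cells.any(axis=1)]
print(clean)
```

Passing axis as a keyword keeps the call working on pandas versions where the arguments of any() are keyword-only.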
Answer by Alexander
You can replace inf and -inf with NaN, and then select non-null rows.
df[df.replace([np.inf, -np.inf], np.nan).notnull().all(axis=1)] # .astype(np.float64) ?
or
df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)  # axis=0 drops rows; axis=1 would drop columns
Check the dtypes of your columns via df.info() to make sure they are all as expected (e.g. np.float32/64).
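A minimal sketch of the two steps on made-up data (the frame and the variable names are assumptions for illustration). It also shows why the axis argument of dropna matters: axis=0 removes rows, while axis=1 removes every column that contains a NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame.
df = pd.DataFrame({
    "a": [1.0, np.inf, 3.0, np.nan],
    "b": [0.5, 0.25, -np.inf, 1.0],
})

# Step 1: turn +/-inf into NaN so one missing-value check covers everything.
as_nan = df.replace([np.inf, -np.inf], np.nan)

# Step 2a: boolean mask of rows whose cells are all non-null.
rows_mask = as_nan.notnull().all(axis=1)
by_mask = df[rows_mask]

# Step 2b: the same rows via dropna; axis=0 (the default) drops rows.
by_dropna = as_nan.dropna(axis=0)

# For comparison, axis=1 drops every column that contains a NaN.
cols_dropped = as_nan.dropna(axis=1)
```

Here both columns contain at least one NaN after the replacement, so cols_dropped keeps the four rows but no columns at all.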
Answer by Maria Wollestonecraft
# replace() returns a new frame, so assign the result back
df = df.replace([np.inf, -np.inf], np.nan)
df.dropna(inplace=True)
Answer by DougR
Instead of dropping rows that contain any nulls or infinite numbers, it is more succinct to reverse the logic and return the rows where all cells are finite numbers. The numpy isfinite function does this, and '.all(axis=1)' returns True only if all cells in a row are finite.
df = df[np.isfinite(df).all(axis=1)]
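One caveat worth sketching: np.isfinite only accepts numeric input, so a frame that also holds string columns needs the check restricted to its numeric columns. The frame below is hypothetical, and selecting columns with select_dtypes is one possible way to do this, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one non-numeric column.
df = pd.DataFrame({
    "label": ["a", "b", "c"],
    "x": [1.0, np.nan, 3.0],
    "y": [0.5, 2.0, -np.inf],
})

# Restrict the check to numeric columns, since np.isfinite rejects strings.
numeric = df.select_dtypes(include="number")

# True for rows whose numeric cells are all finite (neither NaN nor +/-inf).
finite_rows = np.isfinite(numeric).all(axis=1)

clean = df[finite_rows]
```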
Answer by Sanjeev Mishra
df.replace only replaces the first occurrence of the value, hence the error.
df = list(filter(lambda x: x != inf, df))
would remove all occurrences of inf, and then the drop function can be used.
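A quick check of both claims on a made-up frame suggests otherwise, at least in current pandas: DataFrame.replace substitutes every matching cell, and iterating over a DataFrame yields its column labels, so the filter call returns column names rather than cleaned data. The frame and variable names below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with several infinite cells per column.
df = pd.DataFrame({"a": [np.inf, 1.0, np.inf], "b": [-np.inf, np.inf, 2.0]})

# replace() substitutes every matching cell, not only the first one.
replaced = df.replace([np.inf, -np.inf], np.nan)
remaining_inf = int(np.isinf(replaced.to_numpy()).sum())

# Iterating over a DataFrame yields its column labels, so filter() sees
# "a" and "b" rather than the cell values.
filtered = list(filter(lambda x: x != np.inf, df))
```

If replace appears to have no effect, a likelier cause is that its return value was not assigned back to df.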
Answer by mrkbutty
I prefer to set the option so that inf values are treated as NaN:
s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])
s1/s2
# Outputs:
# 0.0
# 1.0
# inf
# dtype: float64
pd.set_option('mode.use_inf_as_na', True)
s1/s2
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64
Note you can also use a context manager:
with pd.option_context('mode.use_inf_as_na', True):
    print(s1/s2)
# Outputs:
# 0.0
# 1.0
# NaN
# dtype: float64
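To the best of my knowledge the mode.use_inf_as_na option is deprecated from pandas 2.1 onward, so on newer versions it may be safer to convert the infinities explicitly. A minimal sketch, reusing the s1 and s2 series from above (the other variable names are made up):

```python
import numpy as np
import pandas as pd

s1 = pd.Series([0, 1, 2])
s2 = pd.Series([2, 1, 0])

# The last element is 2/0, which pandas evaluates to inf.
ratio = s1 / s2

# Convert +/-inf to NaN explicitly instead of relying on a global option.
ratio_clean = ratio.replace([np.inf, -np.inf], np.nan)

# dropna() now removes the former inf entry as well.
finite_only = ratio_clean.dropna()
```

Unlike the option, this changes the stored values, so the result no longer depends on a global setting.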