Dropping infinite values from DataFrames in pandas?

Source: Stack Overflow, http://stackoverflow.com/questions/17477979/. This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not to this page) and keep it under the same license.

Time: 2020-08-19 08:18:50

Tags: python, numpy, scipy, pandas

Question

What is the quickest/simplest way to drop nan and inf/-inf values from a pandas DataFrame without resetting mode.use_inf_as_null? I'd like to be able to use the subset and how arguments of dropna, except with inf values considered missing, like:

df.dropna(subset=["col1", "col2"], how="all", with_inf=True)

Is this possible? Is there a way to tell dropna to include inf in its definition of missing values?

Accepted answer by Andy Hayden

The simplest way would be to first replace infs with NaN:

df.replace([np.inf, -np.inf], np.nan)

and then use dropna:

df.replace([np.inf, -np.inf], np.nan).dropna(subset=["col1", "col2"], how="all")

For example:

In [11]: df = pd.DataFrame([1, 2, np.inf, -np.inf])

In [12]: df.replace([np.inf, -np.inf], np.nan)
Out[12]:
    0
0   1
1   2
2 NaN
3 NaN

The same method works for a Series.
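
As a minimal sketch of how this answers the question, the replace-then-dropna chain can be run on a small frame. The data below is made up for illustration; only the column names col1 and col2 come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; col1/col2 are the column names used in the question.
df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, 3.0, np.nan],
})

# Turn +/-inf into NaN, then drop rows where both target columns are missing.
cleaned = df.replace([np.inf, -np.inf], np.nan).dropna(
    subset=["col1", "col2"], how="all"
)
print(cleaned)
```

Because replace returns a new object, df itself still holds its inf values; only cleaned has them converted to NaN.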

Answer by has2k1

The above solution also modifies infs that are not in the target columns. To remedy that, restrict the replacement to those columns:

lst = [np.inf, -np.inf]
to_replace = {v: lst for v in ['col1', 'col2']}
df.replace(to_replace, np.nan)
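
A variation on the same idea, not taken from the answer itself, is to select the target columns, replace within that selection, and assign the result back. The sketch below assumes a made-up frame in which "other" is a hypothetical column whose infs should be left untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical data; "other" is a column whose infs should be left alone.
df = pd.DataFrame({
    "col1": [1.0, np.inf],
    "col2": [-np.inf, 2.0],
    "other": [np.inf, 3.0],
})

target_cols = ["col1", "col2"]
# Replace only within the target columns and write the result back.
df[target_cols] = df[target_cols].replace([np.inf, -np.inf], np.nan)
print(df)
```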

Answer by Alexander

Here is another method, using .loc to replace inf with nan on a Series:

s.loc[(~np.isfinite(s)) & s.notnull()] = np.nan

So, in response to the original question:

df = pd.DataFrame(np.ones((3, 3)), columns=list('ABC'))

for i in range(3): 
    df.iat[i, i] = np.inf

df
          A         B         C
0       inf  1.000000  1.000000
1  1.000000       inf  1.000000
2  1.000000  1.000000       inf

df.sum()
A    inf
B    inf
C    inf
dtype: float64

df.apply(lambda s: s[np.isfinite(s)].dropna()).sum()
A    2
B    2
C    2
dtype: float64
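
The .loc approach from the start of this answer can be sketched on a small Series. The values are made up; the point is that the mask picks out inf and -inf but not values that are already NaN.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.inf, np.nan, -np.inf, 5.0])

# True where the value is +/-inf: not finite, but also not already NaN.
is_inf = (~np.isfinite(s)) & s.notnull()
s.loc[is_inf] = np.nan

print(s.dropna().tolist())
```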

Answer by ayhan

With an option context, this is possible without permanently setting use_inf_as_na. For example:

with pd.option_context('mode.use_inf_as_na', True):
    df = df.dropna(subset=['col1', 'col2'], how='all')

Of course, it can also be set to treat inf as NaN permanently with:

pd.set_option('use_inf_as_na', True)


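For older versions, replace use_inf_as_na with use_inf_as_null. In recent versions (pandas 2.1 and later) the option is deprecated, and the release notes recommend converting inf to NaN beforehand instead.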

Answer by Ted Petrou

Yet another solution would be to use the isin method. Use it to determine whether each value is infinite or missing, and then chain the all method to determine whether all the values in a row are infinite or missing.

Finally, use the negation of that result to select, via boolean indexing, the rows that do not consist entirely of infinite or missing values.

all_inf_or_nan = df.isin([np.inf, -np.inf, np.nan]).all(axis='columns')
df[~all_inf_or_nan]
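
A self-contained sketch of these steps, using made-up data, might look like the following. The intermediate name bad is introduced here only for readability.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, np.nan, 5.0],
})

# Per-cell test: is the value +inf, -inf, or NaN?
bad = df.isin([np.inf, -np.inf, np.nan])
# Per-row test: is every cell in the row bad?
all_inf_or_nan = bad.all(axis="columns")

kept = df[~all_inf_or_nan]
print(kept)
```

To mimic the subset argument from the question, the isin check could be applied to df[["col1", "col2"]] instead of the whole frame.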

Answer by jpp

You can use pd.DataFrame.mask with np.isinf. You should first ensure that all of your DataFrame's series are of type float. Then use dropna with your existing logic.

print(df)

       col1      col2
0 -0.441406       inf
1 -0.321105      -inf
2 -0.412857  2.223047
3 -0.356610  2.513048

df = df.mask(np.isinf(df))

print(df)

       col1      col2
0 -0.441406       NaN
1 -0.321105       NaN
2 -0.412857  2.223047
3 -0.356610  2.513048
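
Putting the steps together, a sketch of the full sequence (cast to float, mask, then dropna with the arguments from the question) could look like this. The data is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [-0.44, np.inf, -0.41],
    "col2": [np.inf, -np.inf, 2.22],
})

# np.isinf needs numeric data, so cast to float first.
df = df.astype(float)
# mask() replaces cells where the condition is True with NaN by default.
masked = df.mask(np.isinf(df))
result = masked.dropna(subset=["col1", "col2"], how="all")
print(result)
```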

Answer by Markus Dutschke

Use (fast and simple):

df = df[np.isfinite(df).all(axis=1)]

This answer is based on DougR's answer in another question. Here is some example code:

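import pandas as pd
import numpy as np

# One column containing NaN, inf, and -inf among ordinary values
df = pd.DataFrame([1, 2, 3, np.nan, 4, np.inf, 5, -np.inf, 6])
print('Input:\n', df, sep='')
# Keep only the rows in which every value is finite
df = df[np.isfinite(df).all(axis=1)]
print('\nDropped:\n', df, sep='')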

Result:

Input:
    0
0  1.0000
1  2.0000
2  3.0000
3     NaN
4  4.0000
5     inf
6  5.0000
7    -inf
8  6.0000

Dropped:
     0
0  1.0
1  2.0
2  3.0
4  4.0
6  5.0
8  6.0
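
The one-liner above drops a row if any column holds a non-finite value. As a sketch of how the same np.isfinite test could be narrowed to the question's subset and how arguments, the example below checks only two columns and shows both variants. The data and the label column are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1.0, np.inf, np.nan, 4.0],
    "col2": [np.inf, -np.inf, 3.0, 5.0],
    "label": ["a", "b", "c", "d"],  # non-numeric column, left out of the check
})

# Boolean frame: True where the checked cell is neither NaN nor +/-inf.
finite = np.isfinite(df[["col1", "col2"]])

# Like how="any": keep rows where every checked column is finite.
drop_any = df[finite.all(axis=1)]
# Like how="all": keep rows where at least one checked column is finite.
drop_all = df[finite.any(axis=1)]

print(drop_any)
print(drop_all)
```

Selecting the numeric columns first matters here, since np.isfinite is not defined for object columns such as label.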