Original source: http://stackoverflow.com/questions/41938549/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
How to replace all non-numeric entries with NaN in a pandas dataframe?
Asked by user6566438
I have various CSV files that I import as DataFrames. The problem is that many files use different symbols for missing values. Some use nan, others NaN, ND, None, missing, etc., or just leave the entry empty. Is there a way to replace all of these values with np.nan? In other words, any non-numeric value in the dataframe should become np.nan. Thank you for the help.
Accepted answer by instant
I found what I think is a relatively elegant but also robust method:
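def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

# Cells where the boolean mask is False come back as NaN
df[df.applymap(isnumber)]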
In case it's not clear: you define a function that returns True only if the input can be converted to a float. You then filter df with that boolean dataframe, which automatically assigns NaN to the cells you didn't filter for.
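To make the filtering step concrete, here is a minimal sketch on a made-up dataframe. The column names and values are my own illustration, not from the question. It also assumes that newer pandas versions offer DataFrame.map as the replacement for applymap (which I believe was deprecated in pandas 2.1), so it picks whichever one is available.

```python
import pandas as pd


def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical sample data mixing numbers, numeric strings and placeholders
df = pd.DataFrame({"a": [1, "ND", 3.5], "b": ["missing", "7", None]})

# Assumption: use DataFrame.map where it exists, otherwise fall back to applymap
elementwise = df.map if hasattr(df, "map") else df.applymap

mask = elementwise(isnumber)   # boolean dataframe, same shape as df
filtered = df[mask]            # cells where mask is False become NaN

print(filtered)
```

Note that "7" survives the filter but is still a string at this point; only the entries that float() rejects are replaced.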
Another solution I tried was to define isnumber as
import numbers

def isnumber(x):
    # True only for real numeric types, not for numeric strings
    return isinstance(x, numbers.Number)
but what I liked less about that approach is that a number can accidentally be stored as a string, in which case you would mistakenly filter it out. This is also a sneaky error, since the dataframe displays the string "99" the same as the number 99.
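The difference between the two definitions is easy to see on a handful of values. This is only a sketch; the names isnumber_strict and isnumber_lenient are hypothetical labels I use here to keep the two versions apart.

```python
import numbers


def isnumber_strict(x):
    # Type-based check: numeric strings such as "99" are rejected
    return isinstance(x, numbers.Number)


def isnumber_lenient(x):
    # Conversion-based check: anything float() accepts counts as a number
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


values = [99, "99", 1.5, "ND", None]

strict = [isnumber_strict(v) for v in values]
lenient = [isnumber_lenient(v) for v in values]

print(strict)   # [True, False, True, False, False]
print(lenient)  # [True, True, True, False, False]
```

Only the second entry differs: the string "99" is dropped by the type-based check but kept by the conversion-based one.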
EDIT:
In your case you probably still need to run df = df.applymap(float) after filtering. The reason is that float accepts every capitalization of 'nan', so those entries pass the filter, but until you explicitly convert them they are still strings in the dataframe.
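A small sketch of why the conversion step matters, again on invented sample data. As before, it assumes DataFrame.map is the newer spelling of applymap and falls back to applymap when map is not available.

```python
import pandas as pd


def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


def elementwise(frame, func):
    # Assumption: prefer DataFrame.map, fall back to applymap on older pandas
    apply = frame.map if hasattr(frame, "map") else frame.applymap
    return apply(func)


# Hypothetical data: 'nan' and 'NaN' are strings that float() happily accepts
df = pd.DataFrame({"a": ["1", "nan", "ND"], "b": [2, "NaN", "missing"]})

filtered = df[elementwise(df, isnumber)]
# After filtering, "nan" is still a string, so isna() does not flag it
print(filtered.isna())

converted = elementwise(filtered, float)
# After conversion, every 'nan' spelling is a real NaN
print(converted.isna())
```

After the conversion, "1" has become 1.0 and both 'nan' spellings are reported as missing, alongside the cells that were filtered out.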