How to replace all non-numeric entries with NaN in a pandas DataFrame?

Question

Asked by user6566438

I have various csv files and I import them as a DataFrame. The problem is that many files use different symbols for missing values. Some use nan, others NaN, ND, None, missing etc., or just leave the entry empty. Is there a way to replace all these values with np.nan? In other words, any non-numeric value in the dataframe becomes np.nan. Thank you for the help.

Answer 1

Accepted answer by instant

I found what I think is a relatively elegant but also robust method:

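def isnumber(x):
    # True only if x can be converted to a float
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False

# Cells where isnumber returns False become NaN.
# In pandas 2.1+, DataFrame.map is the preferred name for applymap.
df[df.applymap(isnumber)]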

In case it's not clear: You define a function that returns True only if whatever input you have can be converted to a float. You then filter df with that boolean dataframe, which automatically assigns NaN to the cells you didn't filter for.
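
To make the masking step concrete, here is a minimal sketch on a made-up frame (the column names and values are hypothetical, not from the question). It builds the boolean mask column by column with Series.map instead of applymap, since applymap is deprecated in recent pandas releases; the result should be the same element-wise mask.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical sample data: numbers, numeric strings, and missing-value markers.
df = pd.DataFrame({
    "a": [1, "2.5", "ND", None],
    "b": ["missing", 4.0, "nan", ""],
})

# Element-wise boolean mask, built one column at a time.
mask = df.apply(lambda col: col.map(isnumber))

# Indexing with a boolean DataFrame keeps cells where the mask is True
# and puts NaN everywhere else.
filtered = df[mask]
print(filtered)
```

Note that the string "nan" survives the filter, because float("nan") succeeds. It is still a string at this point, not a real NaN.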

Another solution I tried was to define isnumber as

import numbers

def isnumber(x):
    # True only for actual numeric objects (int, float, ...), not strings
    return isinstance(x, numbers.Number)

but what I liked less about that approach is that a number can accidentally be stored as a string, so you would mistakenly filter it out. This is also a sneaky error, since the dataframe displays the string "99" the same as the number 99.
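
A small sketch of that pitfall, using a hypothetical two-element Series and an illustrative helper name (isnumber_strict is not from the answer). The type-based check rejects the string "99" even though it prints just like the number 99.

```python
import numbers

import pandas as pd


def isnumber_strict(x):
    """Return True only for real numeric objects, not numeric-looking strings."""
    return isinstance(x, numbers.Number)


# Hypothetical column holding the number 99 and the string "99".
s = pd.Series([99, "99"], dtype=object)

# Both entries display as 99 when the Series is printed.
print(s)

# The type-based check keeps the int but rejects the string.
strict_mask = s.map(isnumber_strict)
print(strict_mask.tolist())
```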

EDIT:

In your case you probably still need to run df = df.applymap(float) after filtering, because float accepts every capitalization of 'nan', but until you explicitly convert them those values are still stored as strings in the dataframe.
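
The filter-then-convert sequence might look like the sketch below, on hypothetical data. It uses astype(float) for the conversion rather than applymap(float), which is an assumption on my part that the two behave the same here. For comparison, it also shows pd.to_numeric with errors="coerce", a different technique from the one in this answer that seems to reach the same result in one step.

```python
import pandas as pd


def isnumber(x):
    """Return True if x can be converted to a float."""
    try:
        float(x)
        return True
    except (TypeError, ValueError):
        return False


# Hypothetical data: numeric strings, an "ND" marker, and 'nan' in two spellings.
df = pd.DataFrame({"a": [1, "2.5", "ND"], "b": ["NaN", "nan", 3]})

# Step 1: mask out everything float() cannot parse.
mask = df.apply(lambda col: col.map(isnumber))

# Step 2: convert what is left, so "2.5" becomes 2.5 and "NaN"/"nan" become real NaN.
as_float = df[mask].astype(float)
print(as_float)

# Alternative (not the answer's method): coerce unparseable values to NaN directly.
coerced = df.apply(pd.to_numeric, errors="coerce")
print(coerced)
```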
