Python pandas: how to replace ? with NaN - handling non-standard missing values

Question

Asked by swati saoji

I am new to pandas and am trying to load a CSV file into a DataFrame. Missing values in my data are represented as ?, and I am trying to replace them with the standard missing value, NaN.

Kindly help me with this. I have tried reading through the pandas docs, but I am not able to follow them.

import pandas as pd

def readData(filename):
   DataLabels = ["age", "workclass", "fnlwgt", "education", "education-num", "marital-status",
                 "occupation", "relationship", "race", "sex", "capital-gain",
                 "capital-loss", "hours-per-week", "native-country", "class"]

   # ==== trying to replace ? with NaN using na_values
   rawfile = pd.read_csv(filename, header=None, names=DataLabels, na_values=["?"])
   age = rawfile["age"]
   print(age)
   print(rawfile[25:40])

   # ======== trying to replace ? (the ? values are still there afterwards)
   rawfile.replace("?", "NaN")
   print(rawfile[25:40])

Snapshot of the data (image not available)

Answer 1

Accepted answer by EdChum

You can replace it for just that column using replace (it returns a new object, so assign the result back):

import numpy as np
df['workclass'] = df['workclass'].replace('?', np.nan)

or for the whole df:

df = df.replace('?', np.nan)
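
One thing worth noting about the replace calls above: replace returns a new object and leaves the original untouched unless the result is assigned back. A minimal sketch on a small made-up frame (the values are illustrative, not the asker's data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; "?" marks a missing value.
df = pd.DataFrame({
    "age": [39, 54, 28],
    "workclass": ["State-gov", "?", "Private"],
})

# replace() returns a new DataFrame; df itself is not modified.
cleaned = df.replace("?", np.nan)

print(df["workclass"].tolist())              # original still contains "?"
print(cleaned["workclass"].isna().tolist())  # one missing value in the copy
```

This is probably why calling rawfile.replace(...) on its own line appeared to do nothing in the question.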

UPDATE

OK, I figured out your problem: by default, if you don't pass a separator, read_csv uses a comma ',' as the separator.

Your data, and in particular this example of a problematic line:

54, ?, 180211, Some-college, 10, Married-civ-spouse, ?, Husband, Asian-Pac-Islander, Male, 0, 0, 60, South, >50K

in fact uses a comma followed by a space as the separator, so when you passed na_values=['?'] it didn't match, because every value has a leading space character in front of it that is easy to overlook.

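If you change your line to this (the raw string avoids an invalid-escape warning for \s, and a regex separator is handled by the Python parsing engine):

rawfile = pd.read_csv(filename, header=None, names=DataLabels, sep=r',\s', engine='python', na_values=["?"])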

then you should find that it all works:

27      54               NaN  180211  Some-college             10
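
As an alternative to a regex separator, read_csv also accepts skipinitialspace=True, which drops the spaces that follow each delimiter so that na_values=["?"] can match. The sketch below uses a two-row in-memory sample (a shortened, made-up version of the data with only four columns) instead of the real file:

```python
import io

import pandas as pd

# Shortened, hypothetical sample: values are separated by a comma plus a space.
SAMPLE = "54, ?, 180211, Some-college\n38, Private, 215646, HS-grad\n"
COLUMNS = ["age", "workclass", "fnlwgt", "education"]

# Default parsing keeps the leading space, so " ?" does not match "?".
naive = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS,
                    na_values=["?"])

# skipinitialspace=True strips the space after each comma before matching.
fixed = pd.read_csv(io.StringIO(SAMPLE), header=None, names=COLUMNS,
                    na_values=["?"], skipinitialspace=True)

print(repr(naive.loc[0, "workclass"]))
print(fixed["workclass"].isna().tolist())
```

Either approach should work; skipinitialspace keeps the default C parser, which may matter for large files.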

Answer 2

Answered by Liam Foley

Use numpy.nan

Numpy - Replace a number with NaN

import numpy as np
# On pandas 2.1+, applymap is deprecated in favor of DataFrame.map
df = df.applymap(lambda x: np.nan if x == '?' else x)

Answer 3

Answered by swati saoji

Okay, I got it working with:

# ======== trying to replace ?
newraw = rawfile.replace('[?]', np.nan, regex=True)
print(newraw[25:40])

Answer 4

Answered by Nishanth

Sometimes the ? is surrounded by whitespace in files generated by systems like Informatica or HANA.

First, you need to strip the whitespace in the DataFrame:

temp_df_trimmed = temp_df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

Then apply a function to replace the data:

temp_df_trimmed['RC'] = temp_df_trimmed['RC'].map(lambda x: np.nan if x == "?" else x)
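
The two steps above can also be combined and applied to the whole frame. A rough sketch, assuming every string column should be stripped; the sample data and the helper name strip_and_mark_missing are hypothetical:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_string_dtype

def strip_and_mark_missing(frame, marker="?"):
    """Strip whitespace from string columns, then turn `marker` into NaN."""
    trimmed = frame.apply(
        lambda col: col.str.strip() if is_string_dtype(col) else col
    )
    return trimmed.replace(marker, np.nan)

# Hypothetical data: "RC" holds padded markers, "count" is numeric.
temp_df = pd.DataFrame({
    "RC": [" ? ", "A1", "?"],
    "count": [1, 2, 3],
})

result = strip_and_mark_missing(temp_df)
print(result)
```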
