In Pandas, how do I use fillna to fill a whole column with a string if the column is originally empty?
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/16067144/
Question by waitingkuo
My table:
In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """
In [18]: df = pd.read_csv(io.StringIO(csv), header=None)
In [19]: df
Out[19]:
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
I want to fill the empty columns with 'UNKNOWN':
In [20]: df.fillna({2:'UNKNOWN'})
Got the error:
ValueError: could not convert string to float: UNKNOWN
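For anyone who wants to reproduce this, here is a self-contained sketch of the session above (the imports are my addition; the CSV is the question's). It shows the likely cause: a column in which every field is empty is parsed as float64, because NaN is a float. Newer pandas releases may no longer raise on the fillna call itself, so the sketch only inspects the dtypes.

```python
import io

import pandas as pd

# The CSV from the question: the third field of every row is empty.
csv = """a,a,,a
b,b,,b
c,c,,c
"""

df = pd.read_csv(io.StringIO(csv), header=None)

# Column 2 holds only missing values, so pandas infers float64 for it.
print(df.dtypes)

# Labels of the columns in which every value is missing.
empty_cols = df.columns[df.isna().all()]
print(list(empty_cols))
```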
Accepted answer by DSM
Your column 2 probably has a float dtype:
>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object
Hence the problem. If you don't mind converting the whole frame to object, you could:
>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c
Depending on whether there's non-string data you might want to be more selective about converting column dtypes, and/or specify the dtypes on read, but the above should work, anyhow.
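A minimal sketch of the two more selective options mentioned above, using the question's CSV: declare column 2 as object when reading, or read normally and convert only that column with astype before filling. Both the dict form of the dtype argument to read_csv and the dict form of astype are standard pandas; which one fits depends on whether you control the read_csv call.

```python
import io

import pandas as pd

csv = """a,a,,a
b,b,,b
c,c,,c
"""

# Option 1: declare the dtype up front, so column 2 is read as object
# instead of being inferred as float64.
df_read = pd.read_csv(io.StringIO(csv), header=None, dtype={2: object})
filled_read = df_read.fillna({2: "UNKNOWN"})

# Option 2: read normally, then convert only column 2 before filling;
# the other columns keep whatever dtype pandas inferred for them.
df = pd.read_csv(io.StringIO(csv), header=None)
filled = df.astype({2: object}).fillna({2: "UNKNOWN"})
```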
Update: if you have dtype information you want to preserve, then rather than converting everything to object and switching it back afterwards, I'd go the other way and fill only the columns you want to, either with a loop over fillna:
>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...     df[col] = df[col].astype(object).fillna("UNKNOWN")
...
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object
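If this comes up often, the loop can be wrapped in a small helper. fill_empty_columns below is a hypothetical name, not a pandas function; it works on a copy and only touches columns in which every value is missing, so the integer column keeps its dtype. The six-column frame is rebuilt by hand to match the one above.

```python
import numpy as np
import pandas as pd


def fill_empty_columns(df, fill_value="UNKNOWN"):
    """Return a copy of df with every all-missing column filled with fill_value.

    Hypothetical helper for illustration (not part of pandas). Columns that
    hold at least one real value are not touched, so they keep their dtype.
    """
    out = df.copy()
    for col in out.columns[out.isna().all()]:
        # Cast just this column to object so it can hold strings, then fill it.
        out[col] = out[col].astype(object).fillna(fill_value)
    return out


# Hand-built version of the six-column frame from the answer.
df = pd.DataFrame({
    0: [0, 1, 2],
    1: ["a", "b", "c"],
    2: ["a", "b", "c"],
    3: [np.nan] * 3,
    4: ["a", "b", "c"],
    5: [np.nan] * 3,
})

result = fill_empty_columns(df)
print(result.dtypes)
```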
Or, since you are only picking out columns that are entirely null (with all), you may not need fillna at all:
>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.loc[:, pd.isnull(df).all()] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
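On current pandas, where the `.ix` indexer no longer exists (it was removed in pandas 1.0), the same fill-only-the-empty-columns idea can be sketched with plain column assignment. Assigning with `df[col] = ...` replaces each all-missing column outright instead of writing strings into the existing float64 column. The frame is again rebuilt by hand to match the one above.

```python
import numpy as np
import pandas as pd

# Rebuild the six-column frame from the example above by hand.
df = pd.DataFrame({
    0: [0, 1, 2],
    1: ["a", "b", "c"],
    2: ["a", "b", "c"],
    3: [np.nan] * 3,
    4: ["a", "b", "c"],
    5: [np.nan] * 3,
})

# Compute the all-missing column labels once, then replace each of those columns.
empty_cols = list(df.columns[df.isna().all()])
for col in empty_cols:
    df[col] = "UNKNOWN"

print(df)
```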
Answer by Jeff
As a workaround, just set the column directly. The fillna upconversion should work; that it doesn't is a bug.
In [8]: df = pd.read_csv(io.StringIO(csv), header=None)
In [9]: df
Out[9]:
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
In [10]: df.loc[:, 2] = 'foo'
In [11]: df
Out[11]:
   0  1    2  3
0  a  a  foo  a
1  b  b  foo  b
2  c  c  foo  c
In [12]: df.dtypes
Out[12]:
0    object
1    object
2    object
3    object
dtype: object
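One caveat about setting the column directly: it overwrites every value, so it is only equivalent to fillna when the column is entirely empty, as it is in the question. A small sketch with a made-up, partially filled column y shows the difference between overwriting and filling.

```python
import numpy as np
import pandas as pd

# Made-up frame: column "y" is only partially empty.
df = pd.DataFrame({"x": ["a", "b", "c"], "y": [np.nan, 1.5, np.nan]})

# Direct assignment replaces every value in "y", including the real 1.5.
overwritten = df.copy()
overwritten["y"] = "UNKNOWN"

# Casting to object and filling keeps the existing value and fills only the gaps.
filled = df.copy()
filled["y"] = filled["y"].astype(object).fillna("UNKNOWN")

print(overwritten["y"].tolist())
print(filled["y"].tolist())
```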

