Note: this page reproduces a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me) on Stack Overflow. Original: http://stackoverflow.com/questions/16067144/

In Pandas, how do I use fillna to fill a whole column with a string if the column is originally empty?

Tags: python, pandas

Asked by waitingkuo

My table:

In [13]: import io

In [14]: import pandas as pd

In [15]: csv=u"""a,a,,a
   ....: b,b,,b
   ....: c,c,,c
   ....: """

In [18]: df = pd.read_csv(io.StringIO(csv), header=None)

I want to fill the empty column with 'UNKNOWN':

In [19]: df
Out[19]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [20]: df.fillna({2:'UNKNOWN'})

But I got this error:

ValueError: could not convert string to float: UNKNOWN

Accepted answer by DSM

Your column 2 probably has a float dtype:

>>> df
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c
>>> df.dtypes
0     object
1     object
2    float64
3     object
dtype: object

Hence the problem. If you don't mind converting the whole frame to object, you could do:

>>> df.astype(object).fillna("UNKNOWN")
   0  1        2  3
0  a  a  UNKNOWN  a
1  b  b  UNKNOWN  b
2  c  c  UNKNOWN  c

Depending on whether there's non-string data, you might want to be more selective about which column dtypes you convert, and/or specify the dtypes on read, but the above should work anyhow.
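
As a rough illustration of the "specify the dtypes on read" option, the sketch below asks read_csv for an object dtype on column 2 only. The sample CSV mirrors the question's data; choosing column 2 and the fill value "UNKNOWN" are assumptions taken from the question, not something the answer spells out.

```python
import io

import pandas as pd

csv = "a,a,,a\nb,b,,b\nc,c,,c\n"

# Assumption: request object dtype for column 2 up front, so the
# all-empty column is not inferred as float64 by read_csv.
df = pd.read_csv(io.StringIO(csv), header=None, dtype={2: object})

print(df.dtypes)

# With an object column, fillna accepts a string fill value.
filled = df.fillna({2: "UNKNOWN"})
print(filled)
```

This leaves the dtypes of the other columns to normal inference, which may be preferable to casting the whole frame.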



Update: if you have dtype information you want to preserve, then rather than converting everything and switching it back, I'd go the other way and fill only the columns you want to, either using a loop with fillna:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.dtypes
0      int64
1     object
2     object
3    float64
4     object
5    float64
dtype: object
>>> for col in df.columns[pd.isnull(df).all()]:
...     df[col] = df[col].astype(object).fillna("UNKNOWN")
...
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
>>> df.dtypes
0     int64
1    object
2    object
3    object
4    object
5    object
dtype: object
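
The loop above could be wrapped in a small function, sketched here under some assumptions: fill_empty_columns is a hypothetical helper name (it is not part of pandas), it works on a copy rather than in place, and the sample frame is made up to include a partly empty numeric column that should be left alone.

```python
import numpy as np
import pandas as pd


def fill_empty_columns(df, value="UNKNOWN"):
    """Return a copy in which every all-NaN column is filled with `value`.

    Hypothetical helper for illustration; not a pandas function.
    """
    out = df.copy()
    for col in out.columns[out.isna().all()]:
        # Cast to object first so a string fill value is accepted.
        out[col] = out[col].astype(object).fillna(value)
    return out


df = pd.DataFrame({
    0: [0, 1, 2],
    1: ["a", "b", "c"],
    2: [np.nan, np.nan, np.nan],  # entirely empty: gets filled
    3: [1.5, np.nan, 2.5],        # partly empty: left untouched
})

result = fill_empty_columns(df)
print(result)
print(result.dtypes)
```

Only the columns where every value is missing are converted, so the integer and float columns keep their dtypes.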

Or (if you're using all anyway), maybe don't use fillna at all:

>>> df
   0  1  2   3  4   5
0  0  a  a NaN  a NaN
1  1  b  b NaN  b NaN
2  2  c  c NaN  c NaN
>>> df.loc[:, pd.isnull(df).all()] = "UNKNOWN"
>>> df
   0  1  2        3  4        5
0  0  a  a  UNKNOWN  a  UNKNOWN
1  1  b  b  UNKNOWN  b  UNKNOWN
2  2  c  c  UNKNOWN  c  UNKNOWN
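
A variant of the same idea, sketched with plain per-column assignment instead of a single indexer call. The column names x, y and z are made up for the example; the assumption is that assigning a scalar to df[col] replaces the column outright, so the old float64 dtype does not matter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": ["a", "b", "c"],
    "y": [np.nan, np.nan, np.nan],
    "z": [np.nan, np.nan, np.nan],
})

# Boolean mask over the columns: True where every value is missing.
all_empty = df.isna().all()

for col in df.columns[all_empty]:
    # Plain column assignment swaps in a new column,
    # so no float-to-string conversion is attempted.
    df[col] = "UNKNOWN"

print(df)
```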

Answer by Jeff

As a workaround, just set the column directly. The fillna upconversion should work; the fact that it doesn't is a bug.

In [8]: df = pd.read_csv(io.StringIO(csv), header=None)

In [9]: df
Out[9]: 
   0  1   2  3
0  a  a NaN  a
1  b  b NaN  b
2  c  c NaN  c

In [10]: df.loc[:,2] = 'foo'

In [11]: df
Out[11]: 
   0  1    2  3
0  a  a  foo  a
1  b  b  foo  b
2  c  c  foo  c

In [12]: df.dtypes
Out[12]: 
0    object
1    object
2    object
3    object
dtype: object