Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/31295740/

Date: 2020-09-13 23:35:28  Source: igfitidea

How to replace None only with empty string using pandas?

Tags: python, pandas

Asked by Hymanson Tale

The code below generates a df:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None (a real Python None, not the string 'None') in it with '' (an empty string).

The expected df is:

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+


What I did is:

df = df.replace([None], [''], regex=True)

But I got:

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+


  1. All the dates become big numbers.
  2. Even NaT and NaN are replaced, which I don't want.

How can I achieve that correctly and efficiently?

Answered by EdChum

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

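In [126]:
# Boolean frame: True only where the cell is the real None object
# (pandas 2.1 and later name this method DataFrame.map)
mask = df.applymap(lambda x: x is None)
# Columns that contain at least one None
cols = df.columns[mask.any()]
for col in cols:
    df.loc[mask[col], col] = ''
df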

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we generate a mask of the None values using applymap, then iterate over each column that contains a None and use the boolean mask to set those values.
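The mask idea above can also be sketched as a small helper that works column by column. This is only an illustration under assumptions: the function name replace_none_with_empty and the sample frame are made up for this example, and Series.map is used per column so the sketch does not depend on whether the installed pandas calls the element-wise frame method applymap or map.

```python
from datetime import datetime as dt

import numpy as np
import pandas as pd


def replace_none_with_empty(frame):
    """Return a copy of frame in which real None objects become ''.

    Hypothetical helper for illustration; NaN and NaT are left alone.
    """
    out = frame.copy()
    for col in out.columns:
        # Only object columns can hold a real Python None.
        if out[col].dtype != object:
            continue
        # Identity check: True for None only, False for NaN and strings.
        is_none = out[col].map(lambda x: x is None).astype(bool)
        # Keep values where the cell is not None, otherwise use ''.
        out[col] = out[col].where(~is_none, '')
    return out


# Sample data (assumed); dtype=object is forced so None is stored as-is.
df = pd.DataFrame({
    'A': pd.Series(['A', 'B', None, 'C'], dtype=object),
    'B': [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None],
    'E': [1.0, 2.0, np.nan, 4.0],
    'F': pd.Series(['x', np.nan, None, 'y'], dtype=object),
})

result = replace_none_with_empty(df)
print(result)
```

Because the check is `x is None`, a NaN stored inside an object column (column F here) is kept, which is the difference from a fillna-based approach.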

Answered by Ricky McMaster

Since the columns you want to alter all have the object dtype, you can select them by their dtype attribute (for completeness I also included string and unicode) and use fillna.

So:

for c in df:
    if str(df[c].dtype) in ('object', 'string_', 'unicode_'):
        # assign back rather than calling fillna(inplace=True) on a column selection
        df[c] = df[c].fillna(value='')

This will leave numeric and date columns unaffected. Note that fillna also replaces any NaN stored in those object columns, not only None.

To see the data types for all columns:

df.dtypes
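
As a rough sketch of the same dtype-based idea without an explicit loop, the object columns can be picked with select_dtypes and filled in one assignment. The frame named sample below is made up for illustration, and the caveat is an assumption worth checking against your data: fillna treats None and NaN alike, so a NaN stored in an object column would also become ''.

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame; dtype=object is forced so None is stored as-is.
sample = pd.DataFrame({
    'A': pd.Series(['A', 'B', None, 'C'], dtype=object),
    'D': pd.Series(['A', 'B', 'C', None], dtype=object),
    'E': [1.0, 2.0, np.nan, 4.0],
})

# Pick the object columns by dtype instead of testing each column in a loop.
obj_cols = sample.select_dtypes(include=['object']).columns

# Fill missing values in those columns only; the float column E is not touched.
sample[obj_cols] = sample[obj_cols].fillna('')

print(sample.dtypes)
print(sample)
```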