pandas: How to replace only None with an empty string using pandas?

Question

Asked by Hymanson Tale

The code below generates a DataFrame, df:

import pandas as pd
from datetime import datetime as dt
import numpy as np

dates = [dt(2014, 1, 2, 2), dt(2014, 1, 2, 3), dt(2014, 1, 2, 4), None]
strings1 = ['A', 'B',None, 'C']
strings2 = [None, 'B','C', 'C']
strings3 = ['A', 'B','C', None]
vals = [1.,2.,np.nan, 4.]
df = pd.DataFrame(dict(zip(['A','B','C','D','E'],
                           [strings1, dates, strings2, strings3, vals])))



+---+------+---------------------+------+------+-----+
|   |  A   |          B          |  C   |  D   |  E  |
+---+------+---------------------+------+------+-----+
| 0 | A    | 2014-01-02 02:00:00 | None | A    | 1   |
| 1 | B    | 2014-01-02 03:00:00 | B    | B    | 2   |
| 2 | None | 2014-01-02 04:00:00 | C    | C    | NaN |
| 3 | C    | NaT                 | C    | None | 4   |
+---+------+---------------------+------+------+-----+

I would like to replace every None inside it (the real Python None, not a string) with '' (an empty string).

The expected df is:

+---+---+---------------------+---+---+-----+
|   | A |          B          | C | D |  E  |
+---+---+---------------------+---+---+-----+
| 0 | A | 2014-01-02 02:00:00 |   | A | 1   |
| 1 | B | 2014-01-02 03:00:00 | B | B | 2   |
| 2 |   | 2014-01-02 04:00:00 | C | C | NaN |
| 3 | C | NaT                 | C |   | 4   |
+---+---+---------------------+---+---+-----+

What I tried:

df = df.replace([None], [''], regex=True)

But I got:

+---+---+---------------------+---+------+---+
|   | A |          B          | C |  D   | E |
+---+---+---------------------+---+------+---+
| 0 | A | 1388628000000000000 |   | A    | 1 |
| 1 | B | 1388631600000000000 | B | B    | 2 |
| 2 |   | 1388635200000000000 | C | C    |   |
| 3 | C |                     | C |      | 4 |
+---+---+---------------------+---+------+---+

All the dates became big numbers, and even NaT and NaN were replaced, which I don't want.
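The big numbers are not random. They match the datetimes' underlying int64 values, counted in nanoseconds since the Unix epoch. The quick check below assumes only pandas itself; it shows what the numbers are, not why replace converted the column in the first place.

```python
import pandas as pd

# Integers copied from the broken "B" column in the output above.
raw = [1388628000000000000, 1388631600000000000, 1388635200000000000]

# pd.to_datetime treats plain integers as nanoseconds since the Unix epoch
# by default (unit="ns"), so this recovers the original timestamps.
restored = pd.to_datetime(raw)
print(restored)
```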

How can I achieve this correctly and efficiently?
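Part of the difficulty is that pandas' missing-value checks do not distinguish between the three markers in this frame. The sketch below shows that pd.isna reports None, NaN and NaT alike as missing, whereas an identity test (x is None) singles out the real Python None. Any approach built on isna/fillna alone therefore cannot tell them apart without extra information such as the column dtype.

```python
import numpy as np
import pandas as pd

values = [None, np.nan, pd.NaT, "A"]

# pd.isna treats None, NaN and NaT alike as "missing" ...
missing = [bool(pd.isna(v)) for v in values]
# ... whereas an identity test matches only the real Python None.
is_none = [v is None for v in values]

print(missing)  # [True, True, True, False]
print(is_none)  # [True, False, False, False]
```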

Answer 1

Answered by EdChum

It looks like None is being promoted to NaN, so you cannot use replace as usual. The following works:

In [126]:
mask = df.applymap(lambda x: x is None)   # True where the cell is a real None
cols = df.columns[mask.any()]             # columns containing at least one None
for col in cols:
    df.loc[mask[col], col] = ''           # blank out only those cells
df

Out[126]:
   A                   B  C  D   E
0  A 2014-01-02 02:00:00     A   1
1  B 2014-01-02 03:00:00  B  B   2
2    2014-01-02 04:00:00  C  C NaN
3  C                 NaT  C      4

So we generate a mask of the None values using applymap, then use this mask to iterate over each column of interest and set the values through the boolean mask.
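A self-contained version of the same mask-and-assign idea is sketched below. It makes two assumptions not in the answer: the string columns are built with an explicit dtype=object so that None survives even where pandas would otherwise infer a dedicated string dtype, and the mask is built column by column with Series.map instead of DataFrame.applymap, which pandas 2.1 deprecated in favour of DataFrame.map.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Rebuild the question's frame; dtype=object on the string columns is an assumption.
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt.datetime(2014, 1, 2, 2), dt.datetime(2014, 1, 2, 3),
          dt.datetime(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

# Element-wise identity test: True only where the cell is the real Python None.
mask = df.apply(lambda col: col.map(lambda x: x is None))

# Touch only the columns that actually contain a None.
for col in mask.columns[mask.any().to_numpy()]:
    df.loc[mask[col], col] = ""

print(df)
```

NaT in B and NaN in E are never matched by the identity test, so those columns keep their datetime and float dtypes.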

Answer 2

Answered by Ricky McMaster

Since the columns you want to alter all have the object dtype, you can select them by checking the dtype attribute (for completeness I also included string and unicode) and then use fillna.

So:

for c in df:
    if str(df[c].dtype) in ('object', 'string_', 'unicode_'):
        # assign back instead of df[c].fillna(inplace=True), which is chained assignment
        df[c] = df[c].fillna(value='')

This will leave numeric and date columns unaffected.

To see the data types for all columns:

df.dtypes
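The same dtype-based idea can be sketched with DataFrame.select_dtypes, which picks the object columns without comparing str(dtype) by hand. This is an alternative to the loop above, not the answer's own code, and it assumes the string columns are built with dtype=object. As with the loop, fillna would also fill any NaN stored inside an object column, not only None.

```python
import datetime as dt

import numpy as np
import pandas as pd

# Same frame as in the question; string columns built with dtype=object (assumption).
df = pd.DataFrame({
    "A": pd.Series(["A", "B", None, "C"], dtype=object),
    "B": [dt.datetime(2014, 1, 2, 2), dt.datetime(2014, 1, 2, 3),
          dt.datetime(2014, 1, 2, 4), None],
    "C": pd.Series([None, "B", "C", "C"], dtype=object),
    "D": pd.Series(["A", "B", "C", None], dtype=object),
    "E": [1.0, 2.0, np.nan, 4.0],
})

print(df.dtypes)

# Select the object-dtype columns; datetime and float columns are left out.
obj_cols = df.select_dtypes(include="object").columns

# Assign the filled columns back rather than using inplace=True on df[c].
df[obj_cols] = df[obj_cols].fillna("")
print(df)
```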
