pandas dataframe replace blanks with NaN
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/30392720/
Asked by Wannes Dermauw
I have a dataframe with empty cells and would like to replace these empty cells with NaN. A solution previously proposed at this forum works, but only if the cell contains a space:
df.replace(r'\s+',np.nan,regex=True)
This code does not work when the cell is empty. Does anyone have a suggestion for pandas code that replaces empty cells?
Wannes
Answered by EdChum
I think the easiest thing here is to do the replace twice:
In [117]:
df = pd.DataFrame({'a':['',' ','asasd']})
df
Out[117]:
       a
0
1
2  asasd
In [118]:
df.replace(r'\s+',np.nan,regex=True).replace('',np.nan)
Out[118]:
       a
0    NaN
1    NaN
2  asasd
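The session above leaves out its imports. A minimal self-contained sketch of the same two-step replace, assuming pandas and NumPy are installed; the sample frame and the names `cleaned` and `blank_mask` are illustrative, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Hypothetical sample frame mirroring the session above: an empty string,
# a single space, and ordinary text (object dtype keeps plain Python strings).
df = pd.DataFrame({'a': ['', ' ', 'asasd']}, dtype=object)

# First pass: cells containing whitespace become NaN.
# Second pass: cells that are exactly the empty string become NaN.
cleaned = df.replace(r'\s+', np.nan, regex=True).replace('', np.nan)

# isna() marks the replaced cells, which is easier to check than the printout.
blank_mask = cleaned['a'].isna()
print(cleaned)
print(blank_mask.tolist())
```

replace() returns a new frame here and leaves `df` untouched, so assign the result back if the change should persist.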
Answered by UNagaswamy
How about this?
df.replace(r'\s+|^$', np.nan, regex=True)
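The pattern has two branches: `\s+` finds any run of whitespace and `^$` finds the empty string. To my understanding, when the replacement is a non-string such as NaN, pandas swaps out the whole cell whenever the pattern is found anywhere in it, so a cell like 'two words' would probably be blanked as well. A sketch with the standard `re` module, using made-up cell values and an anchored pattern `^\s*$` as an assumed alternative that only accepts cells that are blank from start to end:

```python
import re

# Hypothetical cell values; 'two words' has whitespace only in the middle.
cells = ['', ' ', 'asasd', 'two words']

alternation = re.compile(r'\s+|^$')  # pattern from the answer above
anchored = re.compile(r'^\s*$')      # assumed alternative: whole cell must be blank

def found(pattern, cell):
    """Return True when the pattern is found anywhere in the cell."""
    return pattern.search(cell) is not None

alternation_hits = [found(alternation, c) for c in cells]
anchored_hits = [found(anchored, c) for c in cells]
print(alternation_hits)
print(anchored_hits)
```

If text with inner spaces must survive, the anchored pattern can be passed to `df.replace(..., regex=True)` in place of the alternation.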
Answered by Guido
Neither of the other answers takes into account all characters in a string. This is better:
df.replace(r'\s+( +\.)|#',np.nan,regex=True).replace('',np.nan)
More docs on: Replacing blank values (white space) with NaN in pandas
Answered by deepgeek
As you've already seen, if you do the obvious thing and call replace() with None, it throws an error:
df.replace('', None)
TypeError: cannot replace [''] with method pad on a DataFrame
The solution seems to be to simply replace the empty string with numpy's NaN.
import numpy as np
df.replace('', np.nan)
While I'm not 100% sure that pd.NaN is treated in exactly the same way as np.NaN across all edge cases, I've not had any problems. fillna() works, persisting NULLs to database in place of np.NaN works, persisting NaN to csv works.
(pandas version 0.18.1)
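The fillna() and CSV claims in this answer can be checked with a short sketch. It assumes a made-up two-row frame and uses the lowercase `np.nan` spelling, since the `np.NaN` alias was removed in NumPy 2.0; the placeholder strings 'missing' and 'NULL' are illustrative choices, not something the answer specifies:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one empty cell and one ordinary value.
df = pd.DataFrame({'a': ['', 'x']}, dtype=object)

# Replace exact empty strings with NaN (no regex involved).
cleaned = df.replace('', np.nan)

# fillna() now sees the former empty cell as missing.
filled = cleaned.fillna('missing')

# With no path given, to_csv() returns the CSV text; na_rep controls how NaN is written.
csv_text = cleaned.to_csv(index=False, na_rep='NULL')

print(filled['a'].tolist())
print(csv_text)
```

Leaving `na_rep` at its default writes NaN as an empty field instead.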