How to treat NULL as a normal string with pandas?
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/50683765/
How to treat NULL as a normal string with pandas?
Asked by piripiri
I have a csv file with a column of strings and I want to read it with pandas. In this file the string null occurs as an actual value and should not be regarded as a missing value.
Example:
import pandas as pd
from io import StringIO
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
print(pd.read_csv(StringIO(data)))
This gives the following output:
strings numbers
0 foo 1
1 bar 2
2 NaN 3
What can I do to get the value null as it is (and not as NaN) into the DataFrame? The file can be assumed to not contain any actually missing values.
Answered by cs95
You can specify a converters argument for the strings column.
pd.read_csv(StringIO(data), converters={'strings' : str})
strings numbers
0 foo 1
1 bar 2
2 null 3
This will bypass pandas' automatic parsing.
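The converter is simply a callable that receives the raw text of each cell in that column, so it can also do light clean-up on the way in. A minimal sketch (the stray space in the sample data is added here purely for illustration):
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\n null,3'
# str.strip keeps 'null' as literal text and trims the stray space
print(pd.read_csv(StringIO(data), converters={'strings': str.strip}))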
Another option is setting na_filter=False:
pd.read_csv(StringIO(data), na_filter=False)
strings numbers
0 foo 1
1 bar 2
2 null 3
This works for the entire DataFrame, so use with caution. I recommend the first option if you want to surgically apply this to select columns instead.
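A quick sketch of that caveat, with one genuinely empty field added to the sample data for illustration: under na_filter=False a truly missing value comes back as an empty string rather than NaN, which may surprise code further downstream.
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3\n,4'
df = pd.read_csv(StringIO(data), na_filter=False)
print(df)                           # last row shows '' in the strings column
print(df['strings'].eq('').sum())   # 1 -- an empty string, not NaN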
Answered by EdChum
The reason this happens is that the string 'null' is treated as NaN on parsing; you can turn this off by passing keep_default_na=False in addition to @coldspeed's answer:
In[49]:
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
df = pd.read_csv(io.StringIO(data), keep_default_na=False)
df
Out[49]:
strings numbers
0 foo 1
1 bar 2
2 null 3
The full list is:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
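Since na_values also accepts a per-column dict, one way to combine this with keep_default_na=False is to switch the defaults off globally and re-add only the sentinels you still want, per column. A small sketch, with an extra N/A row added to the sample data for illustration:
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3\nN/A,4'
# keep 'null' as literal text, but still treat 'N/A' in this column as missing
df = pd.read_csv(StringIO(data), keep_default_na=False,
                 na_values={'strings': ['N/A']})
print(df)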
Answered by MaxU
UPDATE 2020-03-23, for pandas 1+:
Many thanks to @aiguofer for the adapted solution:
na_vals = pd.io.parsers.STR_NA_VALUES.difference({'NULL','null'})
df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)
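STR_NA_VALUES is not part of the public API and has moved between modules across pandas versions, so if you would rather not rely on internals at all, an alternative sketch is to spell the sentinel list out yourself, mirroring the default list quoted above minus 'NULL' and 'null' (newer pandas versions add a few more defaults, e.g. 'None' and '<NA>'):
import pandas as pd
import io

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
# the documented default NA strings, minus the two we want to keep as text
na_vals = ['', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
           '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NaN', 'n/a', 'nan']
df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)
print(df)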
Old answer:
We can dynamically exclude 'NULL' and 'null' from the set of default _NA_VALUES:
In [4]: na_vals = pd.io.common._NA_VALUES.difference({'NULL','null'})
In [5]: na_vals
Out[5]:
{'',
'#N/A',
'#N/A N/A',
'#NA',
'-1.#IND',
'-1.#QNAN',
'-NaN',
'-nan',
'1.#IND',
'1.#QNAN',
'N/A',
'NA',
'NaN',
'n/a',
'nan'}
and use it in read_csv():
df = pd.read_csv(io.StringIO(data), na_values=na_vals)
Answered by Acccumulation
Other answers are better for reading in a csv without "null" being interpreted as NaN, but if you have a DataFrame that you want "fixed", this code will do so: df = df.fillna('null')