How to treat NULL as a normal string with pandas?
Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/50683765/
How to treat NULL as a normal string with pandas?
Asked by piripiri
I have a csv file with a column of strings and I want to read it with pandas. In this file the string null occurs as an actual value and should not be regarded as a missing value.
Example:
import pandas as pd
from io import StringIO
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
print(pd.read_csv(StringIO(data)))
This gives the following output:
strings numbers
0 foo 1
1 bar 2
2 NaN 3
What can I do to get the value null as it is (and not as NaN) into the DataFrame? The file can be assumed to not contain any actually missing values.
Answered by cs95
You can specify a converters argument for the strings column.
pd.read_csv(StringIO(data), converters={'strings' : str})
strings numbers
0 foo 1
1 bar 2
2 null 3
This will bypass pandas' automatic parsing.
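The converter is simply a callable that receives the raw text of each cell in that column, so it can also do light clean-up on the way in. A minimal sketch (the stray space in the sample data is added here purely for illustration):
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\n null,3'
# str.strip keeps 'null' as literal text and trims the stray space
print(pd.read_csv(StringIO(data), converters={'strings': str.strip}))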
Another option is setting na_filter=False:
pd.read_csv(StringIO(data), na_filter=False)
strings numbers
0 foo 1
1 bar 2
2 null 3
This works for the entire DataFrame, so use with caution. I recommend the first option if you want to surgically apply this to select columns instead.
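A quick sketch of that caveat, with one genuinely empty field added to the sample data for illustration: under na_filter=False a truly missing value comes back as an empty string rather than NaN, which may surprise code further downstream.
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3\n,4'
df = pd.read_csv(StringIO(data), na_filter=False)
print(df)                           # last row shows '' in the strings column
print(df['strings'].eq('').sum())   # 1 -- an empty string, not NaN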
Answered by EdChum
The reason this happens is that the string 'null' is treated as NaN on parsing; you can turn this off by passing keep_default_na=False in addition to @coldspeed's answer:
In[49]:
data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
df = pd.read_csv(io.StringIO(data), keep_default_na=False)
df
Out[49]:
strings numbers
0 foo 1
1 bar 2
2 null 3
The full list is:
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific per-column NA values. By default the following values are interpreted as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'n/a', 'nan', 'null'.
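Since na_values also accepts a per-column dict, one way to combine this with keep_default_na=False is to switch the defaults off globally and re-add only the sentinels you still want, per column. A small sketch, with an extra N/A row added to the sample data for illustration:
import pandas as pd
from io import StringIO

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3\nN/A,4'
# keep 'null' as literal text, but still treat 'N/A' in this column as missing
df = pd.read_csv(StringIO(data), keep_default_na=False,
                 na_values={'strings': ['N/A']})
print(df)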
Answered by MaxU
UPDATE 2020-03-23, for pandas 1+:
Many thanks to @aiguofer for the adapted solution:
na_vals = pd.io.parsers.STR_NA_VALUES.difference({'NULL','null'})
df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)
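STR_NA_VALUES is not part of the public API and has moved between modules across pandas versions, so if you would rather not rely on internals at all, an alternative sketch is to spell the sentinel list out yourself, mirroring the default list quoted above minus 'NULL' and 'null' (newer pandas versions add a few more defaults, e.g. 'None' and '<NA>'):
import pandas as pd
import io

data = u'strings,numbers\nfoo,1\nbar,2\nnull,3'
# the documented default NA strings, minus the two we want to keep as text
na_vals = ['', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan',
           '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NaN', 'n/a', 'nan']
df = pd.read_csv(io.StringIO(data), na_values=na_vals, keep_default_na=False)
print(df)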
Old answer:
We can dynamically exclude 'NULL' and 'null' from the set of default _NA_VALUES:
In [4]: na_vals = pd.io.common._NA_VALUES.difference({'NULL','null'})
In [5]: na_vals
Out[5]:
{'',
'#N/A',
'#N/A N/A',
'#NA',
'-1.#IND',
'-1.#QNAN',
'-NaN',
'-nan',
'1.#IND',
'1.#QNAN',
'N/A',
'NA',
'NaN',
'n/a',
'nan'}
and use it in read_csv():
df = pd.read_csv(io.StringIO(data), na_values=na_vals)
Answered by Acccumulation
Other answers are better for reading in a csv without "null" being interpreted as NaN, but if you have a DataFrame that you want "fixed", this code will do so: df = df.fillna('null')