Find mixed types in Pandas columns

Question

Asked by K.-Michael Aye

Quite often I get this warning when parsing data files:

WARNING:py.warnings:/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.0_12_gdcc7431-py3.4-linux-x86_64.egg/pandas/io/parsers.py:1164: DtypeWarning: Columns (0,2,14,20) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
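
The warning names its own two remedies. Below is a minimal sketch of both, using a made-up in-memory CSV (the column names and values are illustrative only). You can pass an explicit dtype for the suspect column, or you can set low_memory=False so that read_csv infers each column's type from the whole file rather than chunk by chunk. Chunk-by-chunk inference is what can leave a large file's column with numbers from some chunks and strings from others. Either way, the "value" column here ends up holding strings only.

```python
import io

import pandas as pd

# Hypothetical CSV: the "value" column holds both numbers and text
csv_text = "id,value\n1,10\n2,abc\n3,30\n"

# Remedy 1: declare the column's type up front, so every cell is read as a string
df_str = pd.read_csv(io.StringIO(csv_text), dtype={"value": str})

# Remedy 2: parse the file in one pass, so each column's type is inferred once
df_whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

print(df_str["value"].tolist())
print(df_whole["value"].tolist())
```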

But if the data is large (I have 50k rows), how can I find WHERE in the data the change of dtype occurs?

Answer 1

Answered by DSM

I'm not entirely sure what you're after, but it's easy enough to find the rows containing elements whose type differs from the type of the corresponding element in the first row. For example:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": np.arange(500), "B": np.arange(500.0)})
>>> df.loc[321, "A"] = "Fred"
>>> df.loc[325, "B"] = True
>>> weird = (df.applymap(type) != df.iloc[0].apply(type)).any(axis=1)
>>> df[weird]
        A     B
321  Fred   321
325   325  True
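
Since the question asks where the type changes, it can help to apply the same "differs from the first row" test one column at a time and report the first offending index label per column. The sketch below does this under that assumption. first_type_change is a hypothetical helper name, not a pandas function. It uses Series.apply rather than DataFrame.applymap, which pandas 2.1 deprecated in favour of DataFrame.map.

```python
import pandas as pd


def first_type_change(df: pd.DataFrame) -> dict:
    """Map each column to the index label of the first cell whose Python type
    differs from the type of that column's first cell. Columns holding a
    single type are left out."""
    changes = {}
    if len(df) == 0:
        return changes                                # no first row to compare against
    for col in df.columns:
        types = df[col].apply(type)                   # Python type of every cell
        first = types.iloc[0]                         # reference type from the first row
        differs = types.apply(lambda t: t is not first)
        if differs.any():
            changes[col] = differs[differs].index[0]  # first label where the type differs
    return changes


# Hypothetical data: column "A" has a string among integers
df = pd.DataFrame({"A": [0, 1, "Fred", 3], "B": [0.0, 1.0, 2.0, 3.0]})
print(first_type_change(df))
```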

Answer 2

Answered by K.-Michael Aye

In addition to DSM's answer, with a dataframe that has many columns it can be helpful to find the columns that change type, like so:

for col in df.columns:
    # A column changes type if its cells hold more than one Python type
    if df[col].apply(type).nunique() > 1:
        print(col)
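
Once a column is flagged, the next question is usually which types it mixes and how many cells of each it holds. Below is a minimal sketch that counts cells per Python type with value_counts. type_counts is a hypothetical helper name. Returning plain dicts avoids indexing a Series by a type object such as int, which pandas would treat as a callable rather than a label.

```python
import pandas as pd


def type_counts(df: pd.DataFrame) -> dict:
    """For every column holding more than one Python type, return a
    {type: number of cells} mapping; single-type columns are left out."""
    result = {}
    for col in df.columns:
        counts = df[col].apply(type).value_counts()  # cells per Python type
        if len(counts) > 1:
            result[col] = counts.to_dict()
    return result


# Hypothetical data: column "A" mixes int and str
df = pd.DataFrame({"A": [0, 1, "Fred", 3], "B": [0.0, 1.0, 2.0, 3.0]})
print(type_counts(df))
```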

Answer 3

Answered by Acumenus

This approach uses pandas.api.types.infer_dtype to find the columns that have mixed dtypes. It was tested with Pandas 1 under Python 3.8.

Note that this answer uses assignment expressions (the := operator) in several places, and these work only with Python 3.8 or newer. It can, however, be trivially modified not to use them.

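import pandas as pd
# {column: inferred dtype} for every column whose inferred dtype starts with "mixed"
if mixed_dtypes := {c: dtype for c in df.columns if (dtype := pd.api.types.infer_dtype(df[c])).startswith("mixed")}:
    raise TypeError(f"Dataframe has one or more mixed dtypes: {mixed_dtypes}")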
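
The startswith("mixed") test works because every label infer_dtype uses for a mixture begins with "mixed". The inputs below are taken from the pandas documentation for infer_dtype. Note that "mixed-integer-float" (ints and floats together) is also caught, which may or may not be wanted. Also, skipna defaults to True in pandas 1.0 and later, so NaN alone does not make a string column "mixed".

```python
import numpy as np
from pandas.api.types import infer_dtype

# Labels infer_dtype reports for a few small inputs
labels = {
    "strings": infer_dtype(["foo", "bar"]),
    "string + int": infer_dtype(["a", 1]),
    "int + float": infer_dtype([1, 2, 3.5]),
    "strings + NaN, skipna=True": infer_dtype(["a", np.nan, "b"], skipna=True),
    "strings + NaN, skipna=False": infer_dtype(["a", np.nan, "b"], skipna=False),
}
for name, label in labels.items():
    print(f"{name}: {label}")
```

With pandas 1.x or later this should report string, mixed-integer, mixed-integer-float, string and mixed, respectively.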

This approach does not, however, find the row where the dtype changes.
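
To get the rows as well, one option is to use infer_dtype only to shortlist the columns and then check the cells of those columns. The sketch below rests on one assumption: a cell counts as out of place when its Python type differs from the most common type in its column. It does not compare against the first row as DSM's answer does, so a bad first row is still reported correctly. If two types are equally common, which one counts as the majority is arbitrary. mixed_rows is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def mixed_rows(df: pd.DataFrame) -> dict:
    """For each column whose inferred dtype starts with "mixed", return the
    index labels of cells whose Python type is not the column's most common type."""
    rows = {}
    for col in df.columns:
        if not infer_dtype(df[col]).startswith("mixed"):
            continue                                  # column holds one kind of value
        types = df[col].apply(type)                   # Python type of every cell
        majority = types.value_counts().index[0]      # most frequent Python type
        odd = types.apply(lambda t: t is not majority)
        rows[col] = list(odd[odd].index)
    return rows


# Hypothetical data: the odd value in column "A" sits in the first row
df = pd.DataFrame({"A": ["x", 1, 2, 3], "B": [0.0, 1.0, 2.0, 3.0]})
print(mixed_rows(df))
```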