Original source: http://stackoverflow.com/questions/29376026/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Find mixed types in Pandas columns
Asked by K.-Michael Aye
Every so often I get this warning when parsing data files:
WARNING:py.warnings:/usr/local/python3/miniconda/lib/python3.4/site-packages/pandas-0.16.0_12_gdcc7431-py3.4-linux-x86_64.egg/pandas/io/parsers.py:1164: DtypeWarning: Columns (0,2,14,20) have mixed types. Specify dtype option on import or set low_memory=False.
  data = self._reader.read(nrows)
But if the data is large (I have 50k rows), how can I find WHERE in the data the change of dtype occurs?
Answered by DSM
I'm not entirely sure what you're after, but it's easy enough to find the rows which contain elements which don't share the type of the first row. For example:
>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame({"A": np.arange(500), "B": np.arange(500.0)})
>>> df.loc[321, "A"] = "Fred"
>>> df.loc[325, "B"] = True
>>> weird = (df.applymap(type) != df.iloc[0].apply(type)).any(axis=1)
>>> df[weird]
        A     B
321  Fred   321
325   325  True
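To answer the "where" part of the question per column rather than per row, the same first-row comparison can be applied one column at a time. The following is only a sketch: the function name `find_type_changes` and the sample data are hypothetical, and it uses `Series.map(type)` instead of `applymap` so that it does not depend on a method that newer pandas releases have deprecated.

```python
import pandas as pd


def find_type_changes(df):
    """Map each column name to the index labels whose Python type
    differs from the type of that column's first element."""
    changes = {}
    for col in df.columns:
        types = df[col].map(type)        # Series of Python type objects
        odd = types != types.iloc[0]     # True where the type differs
        if odd.any():
            changes[col] = df.index[odd.to_numpy()].tolist()
    return changes


# Hypothetical sample data; dtype=object keeps the original Python objects.
df = pd.DataFrame({
    "A": pd.Series([0, 1, "Fred", 3], dtype=object),
    "B": pd.Series([0.0, True, 2.0, 3.0], dtype=object),
    "C": [1.5, 2.5, 3.5, 4.5],
})
print(find_type_changes(df))
```

Columns whose elements all share one type, such as "C" here, are left out of the result.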
Answered by K.-Michael Aye
In addition to DSM's answer, with a many-column dataframe it can be helpful to find the columns that change type like so:
for col in df.columns:
    weird = (df[[col]].applymap(type) != df[[col]].iloc[0].apply(type)).any(axis=1)
    if len(df[weird]) > 0:
        print(col)
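When a column is flagged, it may also help to see which types it contains and how often each occurs. A minimal sketch, assuming a hypothetical helper named `type_counts` and made-up sample data:

```python
import pandas as pd


def type_counts(df):
    """Return {column: {type name: count}} for every column that
    holds more than one Python type."""
    report = {}
    for col in df.columns:
        counts = df[col].map(lambda v: type(v).__name__).value_counts()
        if len(counts) > 1:
            report[col] = counts.to_dict()
    return report


# Hypothetical sample data; dtype=object keeps the original Python objects.
df = pd.DataFrame({
    "A": pd.Series([0, 1, "Fred", 3], dtype=object),
    "B": [0.0, 1.0, 2.0, 3.0],
})
print(type_counts(df))
```

A column dominated by one type with only a few stragglers usually points to a handful of bad cells in the input file.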
Answered by Acumenus
This approach uses pandas.api.types.infer_dtype to find the columns which have mixed dtypes. It was tested with Pandas 1 under Python 3.8.
Note that this answer has multiple uses of assignment expressions, which work only with Python 3.8 or newer. It can, however, trivially be modified to not use them.
if mixed_dtypes := {c: dtype for c in df.columns if (dtype := pd.api.types.infer_dtype(df[c])).startswith("mixed")}:
    raise TypeError(f"Dataframe has one or more mixed dtypes: {mixed_dtypes}")
This approach doesn't however find a row with the changed dtype.
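One way to close that gap is to run infer_dtype first and then, only for the flagged columns, look up the rows whose type differs from the column's most common type. This is a sketch under stated assumptions, not part of the original answer: the name `locate_mixed` and the sample data are hypothetical, and "most common type" is just one possible definition of the expected type. It avoids assignment expressions, so it does not require Python 3.8.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def locate_mixed(df):
    """Return {column: (inferred dtype, [index labels of minority-type rows])}
    for every column that infer_dtype reports as mixed."""
    result = {}
    for col in df.columns:
        kind = infer_dtype(df[col])
        if not kind.startswith("mixed"):
            continue
        types = df[col].map(type)
        # value_counts sorts by frequency, so the first entry is the majority type.
        majority = types.value_counts().index[0]
        odd = (types != majority).to_numpy()
        result[col] = (kind, df.index[odd].tolist())
    return result


# Hypothetical sample data; dtype=object keeps the original Python objects.
df = pd.DataFrame({
    "A": pd.Series([0, 1, "Fred", 3], dtype=object),
    "B": ["w", "x", "y", "z"],
})
print(locate_mixed(df))
```

If two types are equally common in a column, which one counts as the majority is arbitrary, so the reported rows should be read as a starting point for inspection.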

