Original question: http://stackoverflow.com/questions/26924904/
This content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Check if dataframe column is Categorical
Asked by Marius
I can't seem to get a simple dtype check working with Pandas' improved Categoricals in v0.15+. Basically I just want something like is_categorical(column) -> True/False.
import pandas as pd
import numpy as np
import random
df = pd.DataFrame({
    'x': np.linspace(0, 50, 6),
    'y': np.linspace(0, 20, 6),
    'cat_column': random.sample('abcdef', 6)
})
df['cat_column'] = pd.Categorical(df['cat_column'])
We can see that the dtype for the categorical column is 'category':
df.cat_column.dtype
Out[20]: category
And normally we can do a dtype check by just comparing to the name of the dtype:
df.x.dtype == 'float64'
Out[21]: True
But this doesn't seem to work when trying to check if the x column is categorical:
df.x.dtype == 'category'
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-22-94d2608815c4> in <module>()
----> 1 df.x.dtype == 'category'
TypeError: data type "category" not understood
Is there any way to do these types of checks in pandas v0.15+?
Accepted answer by Jeff Tratner
Use the name property to do the comparison instead. It should always work because it's just a string:
>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4])
>>> arr.dtype.name
'int64'
>>> import pandas as pd
>>> cat = pd.Categorical(['a', 'b', 'c'])
>>> cat.dtype.name
'category'
So, to sum up, you can end up with a simple, straightforward function:
def is_categorical(array_like):
    return array_like.dtype.name == 'category'
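As a rough illustration of how this helper might be used, the sketch below applies it to every column of a small DataFrame. The sample data and the `categorical_flags` name are assumptions made for this example and are not part of the original answer.

```python
import pandas as pd

def is_categorical(array_like):
    # The dtype's name is a plain string, so this comparison never raises.
    return array_like.dtype.name == 'category'

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})

# Map each column name to whether that column is categorical.
categorical_flags = {col: is_categorical(df[col]) for col in df.columns}
print(categorical_flags)
```

Because the helper only reads `.dtype.name`, it should also accept a NumPy array or a bare `pd.Categorical`, as in the interpreter session above.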
Answer by joris
First, the string representation of the dtype is 'category' and not 'categorical', so this works:
In [41]: df.cat_column.dtype == 'category'
Out[41]: True
But indeed, as you noticed, this comparison raises a TypeError for other dtypes, so you would have to wrap it in a try .. except .. block.
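A minimal sketch of such a wrapper might look like the following. The helper name `is_category_dtype` and the sample data are hypothetical. The function treats a TypeError from the comparison as "not categorical", so it gives the same answer whether the installed NumPy raises or simply returns False.

```python
import pandas as pd

def is_category_dtype(dtype):
    # Hypothetical helper: compare against the string 'category' and
    # treat a TypeError from the comparison as "not categorical".
    try:
        return bool(dtype == 'category')
    except TypeError:
        return False

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})

print(is_category_dtype(df['cat_column'].dtype))
print(is_category_dtype(df['x'].dtype))
```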
Other ways to check using pandas internals:
In [42]: isinstance(df.cat_column.dtype, pd.api.types.CategoricalDtype)
Out[42]: True
In [43]: pd.api.types.is_categorical_dtype(df.cat_column)
Out[43]: True
For non-categorical columns, those statements will return False instead of raising an error. For example:
In [44]: pd.api.types.is_categorical_dtype(df.x)
Out[44]: False
For much older versions of pandas, replace pd.api.types in the above snippets with pd.core.common.
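The isinstance check extends naturally to a whole DataFrame. The sketch below collects the names of all categorical columns. The sample data and the `categorical_columns` name are assumptions for illustration. It uses the isinstance form rather than `is_categorical_dtype`, since, as far as I know, recent pandas releases emit a deprecation warning for the latter.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'y': [1, 2, 3],
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})

# Keep only the columns whose dtype is a CategoricalDtype instance.
categorical_columns = [
    col for col in df.columns
    if isinstance(df[col].dtype, pd.api.types.CategoricalDtype)
]
print(categorical_columns)
```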
Answer by jorijnsmit
Just putting this here because pandas.DataFrame.select_dtypes() is what I was actually looking for:
df['column'].name in df.select_dtypes(include='category').columns
Thanks to @Jeff.
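For a self-contained picture of the select_dtypes() approach, here is a small sketch. The sample data and variable names are assumptions for illustration. The membership test on the resulting Index checks column labels.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    'x': [0.0, 10.0, 20.0],
    'cat_column': pd.Categorical(['a', 'b', 'c']),
})

# select_dtypes returns a DataFrame restricted to the requested dtypes;
# its .columns attribute holds the labels of the categorical columns.
categorical_columns = df.select_dtypes(include='category').columns

print('cat_column' in categorical_columns)
print('x' in categorical_columns)
```

This builds an intermediate DataFrame, so it is probably most convenient when the full list of categorical columns is needed, not a check on a single column.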
Answer by DieterDP
In my pandas version (v1.0.3), a shorter version of joris' answer is available.
df = pd.DataFrame({'noncat': [1, 2, 3], 'categ': pd.Categorical(['A', 'B', 'C'])})
print(isinstance(df.noncat.dtype, pd.CategoricalDtype)) # False
print(isinstance(df.categ.dtype, pd.CategoricalDtype)) # True
print(pd.CategoricalDtype.is_dtype(df.noncat)) # False
print(pd.CategoricalDtype.is_dtype(df.categ)) # True

