Python: check which columns in a DataFrame are categorical

Disclaimer: This page is a Chinese-English translation of a popular StackOverflow question, published under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/29803093/

Time: 2020-08-19 05:02:05  Source: igfitidea


python pandas

Question by pds

I am new to Pandas... I want a simple and generic way to find which columns are categorical in my DataFrame when I don't manually specify each column type, unlike in this SO question. The df is created with:


import pandas as pd
df = pd.read_csv("test.csv", header=None)

e.g.


           0         1         2         3        4
0   1.539240  0.423437 -0.687014   Chicago   Safari
1   0.815336  0.913623  1.800160    Boston   Safari
2   0.821214 -0.824839  0.483724  New York   Safari

...

UPDATE (2018/02/04): The question assumes numerical columns are NOT categorical; @Zero's accepted answer solves this.


BE CAREFUL: as @Sagarkar's comment points out, that's not always true. The difficulty is that data types and categorical/ordinal/nominal types are orthogonal concepts, so mapping between them isn't straightforward. @Jeff's answer below specifies the precise manner to achieve the manual mapping.
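To illustrate why dtype and measurement scale are orthogonal, here is a minimal sketch with hypothetical columns (`rating`, `zip_code`, `income` are not from the question). Integers can encode an ordinal or a nominal variable, and pandas only treats them as categorical once you declare it, here via `pd.CategoricalDtype`.

```python
import pandas as pd

# Hypothetical data: "rating" is stored as integers but is ordinal,
# "zip_code" looks numeric but is nominal, "income" is truly numeric.
df = pd.DataFrame({
    "rating": [1, 3, 2, 3],
    "zip_code": [60601, 2108, 10001, 60601],
    "income": [52.0, 61.5, 48.2, 70.1],
})

# The integer dtype alone cannot tell us these columns are categorical,
# so the mapping is declared explicitly.
df["rating"] = df["rating"].astype(pd.CategoricalDtype(categories=[1, 2, 3], ordered=True))
df["zip_code"] = df["zip_code"].astype("category")  # nominal: unordered

print(df.select_dtypes(include="category").columns.tolist())  # ['rating', 'zip_code']
print(df["rating"].cat.ordered, df["zip_code"].cat.ordered)    # True False
```

Once declared, the ordered flag is what distinguishes ordinal from nominal columns; the dtype of the stored values never changes that on its own.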


Accepted answer by Zero

You could use df._get_numeric_data() to get the numeric columns and then work out the categorical columns as the remaining ones:


In [66]: cols = df.columns

In [67]: num_cols = df._get_numeric_data().columns

In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')

In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
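`_get_numeric_data()` is a private method (note the leading underscore), so it may change without notice. A sketch of the same idea with the public `select_dtypes(include="number")`, using the question's sample rows as assumed data; the two approaches may treat boolean columns differently.

```python
import pandas as pd

# The question's sample rows, with integer column labels as from header=None.
df = pd.DataFrame({
    0: [1.539240, 0.815336, 0.821214],
    1: [0.423437, 0.913623, -0.824839],
    2: [-0.687014, 1.800160, 0.483724],
    3: ["Chicago", "Boston", "New York"],
    4: ["Safari", "Safari", "Safari"],
})

# Public counterpart of df._get_numeric_data().columns
num_cols = df.select_dtypes(include="number").columns

# Everything else, keeping the original column order
# (a set difference, as above, does not preserve order).
cat_cols = [c for c in df.columns if c not in num_cols]
print(cat_cols)  # [3, 4]
```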

Answer by Liam Foley

Use .dtypes


In [10]: df.dtypes
Out[10]: 
0    float64
1    float64
2    float64
3     object
4     object
dtype: object

Answer by pds

The way I found was updating to Pandas v0.16.0, then excluding number dtypes with:


df.select_dtypes(exclude=["number","bool_","object_"])

This works, provided no types are changed and no more are added to NumPy. @Jeff's suggestion in the question's comments, include=["category"], didn't seem to work.
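A likely reason `include=["category"]` appeared not to work: `read_csv` does not infer the pandas `category` dtype, so text columns come back with a string/object dtype and nothing matches until they are converted. A sketch under that assumption, using the question's sample rows in place of test.csv:

```python
import io

import pandas as pd

# The question's sample rows, standing in for test.csv (assumption).
csv_text = (
    "1.539240,0.423437,-0.687014,Chicago,Safari\n"
    "0.815336,0.913623,1.800160,Boston,Safari\n"
    "0.821214,-0.824839,0.483724,New York,Safari\n"
)
df = pd.read_csv(io.StringIO(csv_text), header=None)

# read_csv does not infer the 'category' dtype, so nothing matches yet.
before = df.select_dtypes(include=["category"]).columns.tolist()
print(before)  # []

# Convert the non-numeric columns explicitly; now 'category' finds them.
for col in df.select_dtypes(exclude="number").columns:
    df[col] = df[col].astype("category")

after = df.select_dtypes(include=["category"]).columns.tolist()
print(after)  # [3, 4]
```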


NumPy types: link

[Image: Numpy Types]

Answer by Jeff

For posterity: the canonical method to select dtypes is .select_dtypes. You can specify an actual NumPy dtype (or something convertible to one), or 'category', which is not a NumPy dtype.


# Assumes: import numpy as np; from pandas import DataFrame, Series
In [1]: df = DataFrame({'A' : Series(range(3)).astype('category'), 'B' : range(3), 'C' : list('abc'), 'D' : np.random.randn(3) })

In [2]: df
Out[2]: 
   A  B  C         D
0  0  0  a  0.141296
1  1  1  b  0.939059
2  2  2  c -2.305019

In [3]: df.select_dtypes(include=['category'])
Out[3]: 
   A
0  0
1  1
2  2

In [4]: df.select_dtypes(include=['object'])
Out[4]: 
   C
0  a
1  b
2  c

In [5]: df.select_dtypes(include=['object']).dtypes
Out[5]: 
C    object
dtype: object

In [6]: df.select_dtypes(include=['category','int']).dtypes
Out[6]: 
A    category
B       int64
dtype: object

In [7]: df.select_dtypes(include=['category','int','float']).dtypes
Out[7]: 
A    category
B       int64
D     float64
dtype: object

Answer by ankit2saxena

This gives an array of the names of all object-dtype ('O') columns in a DataFrame, which usually hold the categorical variables.


dataset.select_dtypes(include=['O']).columns.values
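Note that `'O'` matches only object-dtype columns; a column already converted to the pandas `category` dtype is not selected. A small sketch with hypothetical column names showing the difference, and that adding `'category'` covers both:

```python
import pandas as pd

# Hypothetical data: one object column, one category column, one float column.
df = pd.DataFrame({
    "name": pd.Series(["Dog", "Cat", "Dog"], dtype=object),
    "colour": pd.Series(["red", "blue", "red"]).astype("category"),
    "weight": [4.2, 3.1, 5.0],
})

# 'O' matches object dtype only, so the category column is missed.
object_only = df.select_dtypes(include=["O"]).columns.tolist()
print(object_only)  # ['name']

# Including 'category' as well picks up both kinds of column.
both = df.select_dtypes(include=["O", "category"]).columns.tolist()
print(both)  # ['name', 'colour']
```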

Answer by Sudhir Tiwari

numeric_var = [col for col, dtype in df.dtypes.items()
               if dtype in ['float64', 'float32', 'int32', 'int64']]  # Numeric variables

cat_var = [col for col, dtype in df.dtypes.items()
           if dtype == 'object']  # Categorical variables
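The code above depends on a hard-coded list of dtype names, which misses other widths such as `int8` or `uint16`. A sketch using `pandas.api.types.is_numeric_dtype` instead, with hypothetical columns; note that `is_numeric_dtype` also returns `True` for boolean columns, so treat those separately if needed.

```python
import numpy as np
import pandas as pd

# Hypothetical data with numeric widths the hard-coded list does not cover.
df = pd.DataFrame({
    "small": np.array([1, 2, 3], dtype=np.int8),
    "count": np.array([10, 20, 30], dtype=np.uint16),
    "ratio": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    "city": pd.Series(["Chicago", "Boston", "Paris"], dtype=object),
})

# is_numeric_dtype covers every integer and float width.
numeric_var = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
cat_var = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]
print(numeric_var)  # ['small', 'count', 'ratio']
print(cat_var)      # ['city']
```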

Answer by Shikhar Omar

You can get the list of categorical columns using this code:


dfName.select_dtypes(exclude=['int', 'float']).columns

And, analogously, for numerical columns:


dfName.select_dtypes(include=['int', 'float']).columns

Hope that helps.
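`exclude=['int', 'float']` reports every other dtype as "categorical", including booleans and datetimes, and on some platforms `'int'` may not map to `int64`. A sketch using the `'number'` alias instead, which covers all integer and float widths (hypothetical data). Boolean columns still land on the non-numeric side, so handle them explicitly if that matters.

```python
import pandas as pd

# Hypothetical data: text, integer, float and boolean columns.
df = pd.DataFrame({
    "city": pd.Series(["Chicago", "Boston", "Paris"], dtype=object),
    "count": [1, 2, 3],
    "price": [1.5, 2.5, 3.5],
    "flag": [True, False, True],
})

# 'number' matches every integer and float width, but not bool.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols)  # ['count', 'price']
print(other_cols)    # ['city', 'flag']
```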


Answer by Hamza Chennaq

# Import packages
import numpy as np
import pandas as pd

# Data
df = pd.DataFrame({"Country" : ["France", "Spain", "Germany", "Spain", "Germany", "France"], 
                   "Age" : [34, 27, 30, 32, 42, 30], 
                   "Purchased" : ["No", "Yes", "No", "No", "Yes", "Yes"]})
df

Out[1]:
  Country Age Purchased
0  France  34        No
1   Spain  27       Yes
2 Germany  30        No
3   Spain  32        No
4 Germany  42       Yes
5  France  30       Yes

# Checking data type
df.dtypes

Out[2]: 
Country      object
Age           int64
Purchased    object
dtype: object

# Saving CATEGORICAL Variables
cat_col = [c for c in df.columns if df[c].dtype == object]  # np.object was removed in NumPy 1.24
cat_col
Out[3]: ['Country', 'Purchased']

Answer by dCrystal

Use pandas.DataFrame.select_dtypes. Categorical dtypes can be selected with the 'category' flag. For strings, you might use the NumPy object dtype.


More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html


Example:


import pandas as pd
df = pd.DataFrame({'Integer': [1, 2] * 3,'Bool': [True, False] * 3,'Float': [1.0, 2.0] * 3,'String': ['Dog', 'Cat'] * 3})
df

Out[1]:    
    Integer Bool    Float   String
0   1       True    1.0     Dog
1   2       False   2.0     Cat
2   1       True    1.0     Dog
3   2       False   2.0     Cat
4   1       True    1.0     Dog
5   2       False   2.0     Cat

df.select_dtypes(include=['category', object]).columns

Out[2]:
Index(['String'], dtype='object')

Answer by Gucci148

Select categorical column names:


cat_features=[i for i in df.columns if df.dtypes[i]=='object']
