Python: checking which columns in a DataFrame are categorical

Question

Asked by pds

I am new to Pandas... I want a simple and generic way to find which columns are categorical in my DataFrame, when I don't manually specify each column type, unlike in this SO question. The df is created with:

import pandas as pd
df = pd.read_csv("test.csv", header=None)

e.g.

           0         1         2         3        4
0   1.539240  0.423437 -0.687014   Chicago   Safari
1   0.815336  0.913623  1.800160    Boston   Safari
2   0.821214 -0.824839  0.483724  New York   Safari

UPDATE (2018/02/04): The question assumes numerical columns are NOT categorical; @Zero's accepted answer solves this.

BE CAREFUL - As @Sagarkar's comment points out, that's not always true. The difficulty is that data types and categorical/ordinal/nominal types are orthogonal concepts, so mapping between them isn't straightforward. @Jeff's answer below specifies the precise manner to achieve the manual mapping.
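To illustrate the caveat that a numeric column can still be categorical, here is a rough heuristic sketch. It is not from any answer in this thread: the helper name `guess_categorical` and the `max_unique` threshold are assumptions made for illustration, and any such cutoff has to be tuned per dataset.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_object_dtype, is_string_dtype


def guess_categorical(df, max_unique=10):
    """Heuristically guess which columns are categorical (hypothetical helper)."""
    guessed = []
    for col in df.columns:
        s = df[col]
        if isinstance(s.dtype, pd.CategoricalDtype):
            # Explicitly declared categorical columns.
            guessed.append(col)
        elif is_object_dtype(s) or is_string_dtype(s):
            # String-like columns are treated as categorical.
            guessed.append(col)
        elif is_integer_dtype(s) and s.nunique() <= max_unique:
            # Integer columns with few distinct values are *assumed* to be codes.
            guessed.append(col)
    return guessed


demo = pd.DataFrame({
    "price": [1.5, 0.8, 0.8, 2.1],
    "rating": [1, 3, 3, 2],
    "city": ["Chicago", "Boston", "New York", "Boston"],
})
print(guess_categorical(demo))
```

Here `rating` is integer-typed but has only three distinct values, so the heuristic flags it alongside `city`; whether that is the right call depends on what the column means.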

Answer 1

Accepted answer by Zero

You could use df._get_numeric_data() to get the numeric columns and then find the categorical columns:

In [66]: cols = df.columns

In [67]: num_cols = df._get_numeric_data().columns

In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')

In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
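Since `_get_numeric_data` is a private method (note the leading underscore), the same idea can probably be expressed with the public `select_dtypes` API. The sketch below is an assumption about an equivalent approach, not part of the accepted answer; the sample frame is rebuilt by hand from the question's data, and a list comprehension replaces the set difference so that column order is preserved.

```python
import pandas as pd

# Sample data reconstructed from the question (string column labels assumed).
df = pd.DataFrame({
    "0": [1.539240, 0.815336, 0.821214],
    "1": [0.423437, 0.913623, -0.824839],
    "2": [-0.687014, 1.800160, 0.483724],
    "3": ["Chicago", "Boston", "New York"],
    "4": ["Safari", "Safari", "Safari"],
})

# Public-API way to pick the numeric columns.
num_cols = df.select_dtypes(include=["number"]).columns

# Everything that is not numeric, in the original column order.
non_num_cols = [c for c in df.columns if c not in num_cols]
print(non_num_cols)
```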

Answer 2

Answered by Liam Foley

Use .dtypes

In [10]: df.dtypes
Out[10]: 
0    float64
1    float64
2    float64
3     object
4     object
dtype: object
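`.dtypes` returns a Series indexed by column label, so it can be iterated to collect column names rather than just inspected. A minimal sketch, assuming a small stand-in frame and using `pandas.api.types.is_numeric_dtype` as the test (that choice is mine, not the answer's):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Small stand-in for the question's DataFrame (integer labels, as with header=None).
df = pd.DataFrame({
    0: [1.539240, 0.815336],
    3: ["Chicago", "Boston"],
    4: ["Safari", "Safari"],
})

# Iterate over (column label, dtype) pairs and keep the non-numeric ones.
non_numeric = [col for col, dtype in df.dtypes.items() if not is_numeric_dtype(dtype)]
print(non_numeric)
```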

Answer 3

Answered by pds

The way I found was updating to Pandas v0.16.0, then excluding number dtypes with:

df.select_dtypes(exclude=["number","bool_","object_"])

Which works, provided no types are changed and no more are added to NumPy. The suggestion by @Jeff in the question's comments, include=["category"], didn't seem to work.

NumPy Types: link

Answer 4

Answered by Jeff

For posterity: the canonical method to select dtypes is .select_dtypes. You can specify an actual numpy dtype or something convertible to one, or 'category', which is not a numpy dtype.

In [1]: df = DataFrame({'A' : Series(range(3)).astype('category'), 'B' : range(3), 'C' : list('abc'), 'D' : np.random.randn(3) })

In [2]: df
Out[2]: 
   A  B  C         D
0  0  0  a  0.141296
1  1  1  b  0.939059
2  2  2  c -2.305019

In [3]: df.select_dtypes(include=['category'])
Out[3]: 
   A
0  0
1  1
2  2

In [4]: df.select_dtypes(include=['object'])
Out[4]: 
   C
0  a
1  b
2  c

In [5]: df.select_dtypes(include=['object']).dtypes
Out[5]: 
C    object
dtype: object

In [6]: df.select_dtypes(include=['category','int']).dtypes
Out[6]: 
A    category
B       int64
dtype: object

In [7]: df.select_dtypes(include=['category','int','float']).dtypes
Out[7]: 
A    category
B       int64
D     float64
dtype: object
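One likely reason `include=['category']` appears not to work on freshly loaded data is that it only matches columns whose dtype is actually `category`; string columns are not categorical until they are converted. The sketch below uses a made-up two-column frame to show the difference before and after `astype('category')`.

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["Chicago", "Boston", "Boston"],
    "value": [1.0, 2.0, 3.0],
})

# Before conversion: no column has the category dtype, so nothing is selected.
before = list(df.select_dtypes(include=["category"]).columns)

# Convert the string column explicitly, then select again.
df["city"] = df["city"].astype("category")
after = list(df.select_dtypes(include=["category"]).columns)

print(before, after)
```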

Answer 5

Answered by ankit2saxena

This will give an array with the names of all the categorical (object-dtype) columns in a DataFrame.

dataset.select_dtypes(include=['O']).columns.values

Answer 6

Answered by Sudhir Tiwari

numeric_var = [key for key in dict(df.dtypes)
                   if dict(df.dtypes)[key]
                       in ['float64','float32','int32','int64']] # Numeric variables

cat_var = [key for key in dict(df.dtypes)
             if dict(df.dtypes)[key] in ['object'] ] # Categorical variables

Answer 7

Answered by Shikhar Omar

You can get the list of categorical columns using this code:

dfName.select_dtypes(exclude=['int', 'float']).columns

And, analogously, for numerical columns:

dfName.select_dtypes(include=['int', 'float']).columns

Hope that helps.

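One thing to watch with the exclude-based approach: everything that is not numeric is returned, which includes boolean and datetime columns as well as strings. A small sketch with invented column names, using the `'number'`, `'bool'` and `'datetime'` selectors that `select_dtypes` accepts:

```python
import pandas as pd

df = pd.DataFrame({
    "age": [34, 27, 30],
    "score": [0.5, 1.5, 2.5],
    "member": [True, False, True],
    "joined": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "country": ["France", "Spain", "Germany"],
})

# Excluding only numbers leaves bool and datetime columns in the result.
leftover = list(df.select_dtypes(exclude=["number"]).columns)

# Excluding those as well narrows the result to the string column.
narrowed = list(df.select_dtypes(exclude=["number", "bool", "datetime"]).columns)

print(leftover, narrowed)
```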

Answer 8

Answered by Hamza Chennaq

# Import packages
import numpy as np
import pandas as pd

# Data
df = pd.DataFrame({"Country" : ["France", "Spain", "Germany", "Spain", "Germany", "France"], 
                   "Age" : [34, 27, 30, 32, 42, 30], 
                   "Purchased" : ["No", "Yes", "No", "No", "Yes", "Yes"]})
df

Out[1]:
  Country Age Purchased
0  France  34        No
1   Spain  27       Yes
2 Germany  30        No
3   Spain  32        No
4 Germany  42       Yes
5  France  30       Yes

# Checking data type
df.dtypes

Out[2]: 
Country      object
Age           int64
Purchased    object
dtype: object

# Saving CATEGORICAL Variables
# np.object was removed in NumPy 1.24; compare against the builtin object instead
cat_col = [c for c in df.columns if df[c].dtype == object]
cat_col
Out[3]: ['Country', 'Purchased']

Answer 9

Answered by dCrystal

Use pandas.DataFrame.select_dtypes. Categorical dtypes can be selected with the 'category' flag. For strings you might use the numpy object dtype.

More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html

Example:

import pandas as pd
df = pd.DataFrame({'Integer': [1, 2] * 3,'Bool': [True, False] * 3,'Float': [1.0, 2.0] * 3,'String': ['Dog', 'Cat'] * 3})
df

Out[1]:    
    Integer Bool    Float   String
0   1       True    1.0     Dog
1   2       False   2.0     Cat
2   1       True    1.0     Dog
3   2       False   2.0     Cat
4   1       True    1.0     Dog
5   2       False   2.0     Cat

df.select_dtypes(include=['category', object]).columns

Out[2]:
Index(['String'], dtype='object')

Answer 10

Answered by Gucci148

Select categorical column names:

cat_features=[i for i in df.columns if df.dtypes[i]=='object']
