Python: Check which columns in a DataFrame are categorical
Original URL: http://stackoverflow.com/questions/29803093/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Question by pds
I am new to pandas. I want a simple and generic way to find which columns in my DataFrame are categorical, when I don't manually specify each column type, unlike in this SO question. The df is created with:
import pandas as pd
df = pd.read_csv("test.csv", header=None)
e.g.
          0         1         2         3       4
0  1.539240  0.423437 -0.687014   Chicago  Safari
1  0.815336  0.913623  1.800160    Boston  Safari
2  0.821214 -0.824839  0.483724  New York  Safari
...
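To reproduce the setup without the original file, here is a minimal sketch that reads the three sample rows above from an in-memory string (the CSV text is a hypothetical stand-in for test.csv). It shows that with header=None the columns are labelled with integers, and that the text columns come back with a non-numeric dtype (object in pandas 2.x; newer releases may use a dedicated string dtype) rather than pandas' category dtype.

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv, mirroring the sample rows above.
csv_text = """1.539240,0.423437,-0.687014,Chicago,Safari
0.815336,0.913623,1.800160,Boston,Safari
0.821214,-0.824839,0.483724,New York,Safari
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)

# With header=None, pandas labels the columns with the integers 0..4.
print(df.columns.tolist())

# Columns 0-2 are float64; columns 3-4 hold text and are not the "category" dtype
# (object in pandas 2.x, possibly a string dtype in newer versions).
print(df.dtypes)
```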
UPDATE (2018/02/04): The question assumes numerical columns are NOT categorical; @Zero's accepted answer solves this.
BE CAREFUL: as @Sagarkar's comment points out, that's not always true. The difficulty is that data types and categorical/ordinal/nominal types are orthogonal concepts, so mapping between them isn't straightforward. @Jeff's answer below specifies the precise way to achieve the manual mapping.
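A minimal sketch of that manual mapping, using hypothetical data: an integer column of store codes is numeric as far as its dtype is concerned, so it only counts as categorical once it is converted explicitly with astype("category").

```python
import pandas as pd

# Hypothetical data: "store_id" holds integer codes that are really labels.
df = pd.DataFrame({
    "store_id": [101, 205, 101, 307],
    "sales": [12.5, 8.0, 15.25, 9.75],
})

# Judged by dtype alone, store_id looks numeric.
print(df.select_dtypes(include="number").columns.tolist())  # ['store_id', 'sales']

# The mapping has to be done by hand: mark the column as categorical.
df["store_id"] = df["store_id"].astype("category")

print(df.select_dtypes(include="number").columns.tolist())    # ['sales']
print(df.select_dtypes(include="category").columns.tolist())  # ['store_id']
```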
Accepted answer by Zero
You could use df._get_numeric_data() to get the numeric columns and then find the categorical columns:
In [66]: cols = df.columns
In [67]: num_cols = df._get_numeric_data().columns
In [68]: num_cols
Out[68]: Index([u'0', u'1', u'2'], dtype='object')
In [69]: list(set(cols) - set(num_cols))
Out[69]: ['3', '4']
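A sketch of the same idea using only public API: select_dtypes(include="number") in place of the private _get_numeric_data() (the leading underscore marks it as internal), and a list comprehension instead of a set difference so the original column order is kept. The DataFrame below is a hypothetical copy of the sample in the question. One difference worth checking in your own data: select_dtypes(include="number") does not select bool columns.

```python
import pandas as pd

# Hypothetical copy of the question's sample data.
df = pd.DataFrame({
    0: [1.539240, 0.815336, 0.821214],
    1: [0.423437, 0.913623, -0.824839],
    2: [-0.687014, 1.800160, 0.483724],
    3: ["Chicago", "Boston", "New York"],
    4: ["Safari", "Safari", "Safari"],
})

# Public-API replacement for df._get_numeric_data().columns
num_cols = df.select_dtypes(include="number").columns

# Everything that is not numeric, in the original column order
# (a set difference would lose that order).
non_num_cols = [c for c in df.columns if c not in num_cols]
print(non_num_cols)  # [3, 4]
```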
Answer by Liam Foley
Use .dtypes
In [10]: df.dtypes
Out[10]:
0 float64
1 float64
2 float64
3 object
4 object
dtype: object
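Building on .dtypes, a minimal sketch (hypothetical data, not part of the original answer) that filters the dtypes Series with pandas.api.types.is_numeric_dtype to list the non-numeric columns. Note that is_numeric_dtype treats bool as numeric, so boolean columns end up on the numeric side.

```python
import pandas as pd

# Hypothetical data with one numeric and two text columns.
df = pd.DataFrame({
    "price": [1.5, 2.0, 3.25],
    "city": ["Chicago", "Boston", "New York"],
    "browser": ["Safari", "Safari", "Chrome"],
})

dtypes = df.dtypes  # Series mapping column label -> dtype

# True where the column's dtype is numeric (int, float, complex, bool).
is_num = dtypes.map(pd.api.types.is_numeric_dtype)

# Labels of the columns whose dtype is not numeric.
non_numeric_cols = dtypes[~is_num].index.tolist()
print(non_numeric_cols)  # ['city', 'browser']
```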
Answer by pds
The way I found was to update to pandas v0.16.0 and then exclude the numeric dtypes with:
df.select_dtypes(exclude=["number","bool_","object_"])
This works, provided no types are changed and none are added to NumPy. @Jeff's suggestion in the question's comments, include=["category"], didn't seem to work.
NumPy types: link
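One plausible reason include=["category"] returned nothing (an assumption; the answer doesn't say) is that read_csv stores text columns with a plain object/string dtype rather than pandas' category dtype unless you ask for it, for example via astype("category") or read_csv's dtype argument. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: "city" is plain text, not yet the category dtype.
df = pd.DataFrame({
    "city": ["Chicago", "Boston", "New York"],
    "value": [1.5, 0.8, 0.2],
})

# Plain string columns are not "category", so nothing is selected yet.
print(df.select_dtypes(include=["category"]).columns.tolist())  # []

# After an explicit conversion, the same call finds the column.
df["city"] = df["city"].astype("category")
print(df.select_dtypes(include=["category"]).columns.tolist())  # ['city']
```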
Answer by Jeff
For posterity: the canonical method to select dtypes is .select_dtypes. You can specify an actual NumPy dtype (or anything convertible to one), or 'category', which is not a NumPy dtype.
In [1]: df = pd.DataFrame({'A': pd.Series(range(3)).astype('category'), 'B': range(3), 'C': list('abc'), 'D': np.random.randn(3)})
In [2]: df
Out[2]:
A B C D
0 0 0 a 0.141296
1 1 1 b 0.939059
2 2 2 c -2.305019
In [3]: df.select_dtypes(include=['category'])
Out[3]:
A
0 0
1 1
2 2
In [4]: df.select_dtypes(include=['object'])
Out[4]:
C
0 a
1 b
2 c
In [5]: df.select_dtypes(include=['object']).dtypes
Out[5]:
C object
dtype: object
In [6]: df.select_dtypes(include=['category','int']).dtypes
Out[6]:
A category
B int64
dtype: object
In [7]: df.select_dtypes(include=['category','int','float']).dtypes
Out[7]:
A category
B int64
D float64
dtype: object
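If you need a per-column test rather than a filtered frame, one option (a sketch on the same data, not part of the original answer) is to check each column's dtype against pd.CategoricalDtype; it should find the same columns as select_dtypes(include=['category']).

```python
import numpy as np
import pandas as pd

# Same frame as in the session above.
df = pd.DataFrame({
    "A": pd.Series(range(3)).astype("category"),
    "B": range(3),
    "C": list("abc"),
    "D": np.random.randn(3),
})

# Per-column check for pandas' own categorical dtype.
category_cols = [c for c in df.columns if isinstance(df[c].dtype, pd.CategoricalDtype)]
print(category_cols)  # ['A']
```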
Answer by ankit2saxena
This gives an array with the names of all object-dtype ('O') columns in a DataFrame, i.e. the text columns that are usually treated as categorical:
dataset.select_dtypes(include=['O']).columns.values
Answer by Sudhir Tiwari
numeric_var = [key for key in dict(df.dtypes)
               if dict(df.dtypes)[key] in ['float64', 'float32', 'int32', 'int64']]  # numeric variables
cat_var = [key for key in dict(df.dtypes)
           if dict(df.dtypes)[key] in ['object']]  # categorical variables
Answer by Shikhar Omar
You can get the list of categorical columns using this code:
dfName.select_dtypes(exclude=['int', 'float']).columns
And, correspondingly, for numerical columns:
dfName.select_dtypes(include=['int', 'float']).columns
Hope that helps.
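A caveat for the hardcoded dtype lists in this and the previous answer: a fixed list only matches the widths it spells out (an int8 column, for example, is missed), and 'int' resolves to NumPy's default integer type, which may depend on the platform. A minimal sketch with hypothetical data comparing a hardcoded list with the "number" selector:

```python
import numpy as np
import pandas as pd

# Hypothetical data with several numeric widths and one text column.
df = pd.DataFrame({
    "small_int": np.array([1, 2, 3], dtype="int8"),
    "mid_int": np.array([4, 5, 6], dtype="int32"),
    "single": np.array([0.5, 1.5, 2.5], dtype="float32"),
    "label": ["a", "b", "c"],
})

# A fixed list of dtype names misses widths it does not spell out.
hardcoded = [c for c in df.columns
             if df[c].dtype in ["float64", "float32", "int32", "int64"]]
print(hardcoded)  # ['mid_int', 'single']  (int8 is missed)

# "number" matches every NumPy numeric width (but not bool).
numeric = df.select_dtypes(include="number").columns.tolist()
non_numeric = df.select_dtypes(exclude="number").columns.tolist()
print(numeric)      # ['small_int', 'mid_int', 'single']
print(non_numeric)  # ['label']
```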
Answer by Hamza Chennaq
# Import packages
import numpy as np
import pandas as pd
# Data
df = pd.DataFrame({"Country" : ["France", "Spain", "Germany", "Spain", "Germany", "France"],
"Age" : [34, 27, 30, 32, 42, 30],
"Purchased" : ["No", "Yes", "No", "No", "Yes", "Yes"]})
df
Out[1]:
Country Age Purchased
0 France 34 No
1 Spain 27 Yes
2 Germany 30 No
3 Spain 32 No
4 Germany 42 Yes
5 France 30 Yes
# Checking data type
df.dtypes
Out[2]:
Country object
Age int64
Purchased object
dtype: object
# Saving CATEGORICAL Variables
cat_col = [c for c in df.columns if df[c].dtype == object]  # np.object was removed in NumPy 1.24; use the builtin object
cat_col
Out[3]: ['Country', 'Purchased']
Answer by dCrystal
Use pandas.DataFrame.select_dtypes. Columns with the categorical dtype can be selected with the 'category' flag; for strings, you might use the NumPy object dtype.
More Info: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.select_dtypes.html
Example:
import pandas as pd
df = pd.DataFrame({'Integer': [1, 2] * 3,'Bool': [True, False] * 3,'Float': [1.0, 2.0] * 3,'String': ['Dog', 'Cat'] * 3})
df
Out[1]:
Integer Bool Float String
0 1 True 1.0 Dog
1 2 False 2.0 Cat
2 1 True 1.0 Dog
3 2 False 2.0 Cat
4 1 True 1.0 Dog
5 2 False 2.0 Cat
df.select_dtypes(include=['category', object]).columns
Out[2]:
Index(['String'], dtype='object')
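The example above has no column with the category dtype, so only the object part of include=['category', object] is exercised. A small extension with a hypothetical Kind column shows the category part being picked up too:

```python
import pandas as pd

# Hypothetical data: one integer, one text, and one true categorical column.
df = pd.DataFrame({
    "Integer": [1, 2] * 3,
    "String": ["Dog", "Cat"] * 3,
    "Kind": pd.Series(["pet", "stray"] * 3, dtype="category"),
})

cols = df.select_dtypes(include=["category", object]).columns.tolist()
print(cols)  # ['String', 'Kind'] when strings are stored with the object dtype
```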
Answer by Gucci148
Select the categorical column names:
cat_features=[i for i in df.columns if df.dtypes[i]=='object']