pandas: find the type of data in each column of a DataFrame
Original URL: http://stackoverflow.com/questions/36822580/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Find type of data in each column of dataframe
Asked by xyz
I have read this link: Check which columns in DataFrame are Categorical
I have a DataFrame where each salary has a $ prepended to it, and that column is also being shown as categorical data.
Moreover, suppose my nominal data is not in the form of strings such as 'F', 'M', etc. How do we then classify which columns are numeric, categorical (with strings), and nominal?
Say my data looks like this:
ID  Gender  Salary  HasPet
1   M       0       0
2   F       00      0
3   M       00      1
Answered by MaxU
You are confusing the categorical data type with strings (pandas shows strings as object).
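To make the distinction concrete, here is a minimal sketch on a small made-up Series (the values and variable names are illustrative, not from the question). The `dtype=object` argument is passed explicitly because the default dtype inferred for strings can differ between pandas versions.

```python
import pandas as pd

# Made-up sample values; dtype=object is requested explicitly so the
# result does not depend on how a given pandas version infers strings.
strings = pd.Series(["F", "M", "F"], dtype=object)

# Converting to the categorical dtype stores each distinct value once
# and keeps small integer codes that point into that list of categories.
cats = strings.astype("category")

print(strings.dtype)               # plain Python strings: object
print(cats.dtype)                  # category
print(list(cats.cat.categories))   # the distinct values
print(list(cats.cat.codes))        # the integer code behind each row
```

A column only gets the category dtype when it is converted (or read) that way; string data does not become categorical on its own.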
Numbers can't contain a $ dollar sign by their nature, and because of that pandas treats the Salary column as strings, which is the correct behavior!
You can easily convert your salary column to integer/float if you want though:
In [180]: df
Out[180]:
   Gender Salary
0       F    $83
1       M    $58
2       F    $21
3       F    $32
4       M    $98
5       F    $75
6       F    $10
7       M    $73
8       F    $82
9       M    $15
10      F    $58
11      F    $31
12      M    $74
13      F    $61
14      M    $12

In [181]: df.dtypes
Out[181]:
Gender    object
Salary    object
dtype: object
Let's remove the leading $ and convert Salary to int:
In [182]: df.Salary = df.Salary.str.lstrip('$').astype(int)

In [183]: df.dtypes
Out[183]:
Gender    object
Salary     int32
dtype: object
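If the salaries can contain thousands separators, decimals, or unparseable entries, a float conversion may be more forgiving than `astype(int)`. The following is a sketch under those assumptions, using made-up values and `pd.to_numeric` with `errors="coerce"`; it is not part of the original answer.

```python
import pandas as pd

# Made-up raw values: a thousands separator, a plain amount, and junk.
raw = pd.Series(["$1,250.50", "$83", "n/a"], dtype=object)

# Strip dollar signs and commas, then let pandas parse the remainder.
# errors="coerce" turns anything unparseable into NaN instead of raising.
cleaned = raw.str.replace(r"[$,]", "", regex=True)
salary = pd.to_numeric(cleaned, errors="coerce")

print(salary)
print(salary.dtype)
```

Because NaN is a float, the result here is a float column; `astype(int)` as in the session above works when every value is a clean integer string.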
You can also convert your Gender column to categorical:
In [186]: df.Gender = df.Gender.astype('category')

In [187]: df.dtypes
Out[187]:
Gender    category
Salary       int32
dtype: object
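Once the dtypes are set, the columns can be grouped by type with `DataFrame.select_dtypes`. The sketch below uses a small made-up frame shaped like the one in the question (the Salary figures are invented). Flagging number-coded nominal columns such as HasPet by a low count of distinct values is only a heuristic, and the `max_levels` threshold is an assumption to tune for your data.

```python
import pandas as pd

# Made-up frame shaped like the question's data; Salary values are invented.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": pd.Series(["M", "F", "M"], dtype="category"),
    "Salary": [100, 200, 300],
    "HasPet": [0, 0, 1],
})

# Group column names by dtype family.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(include="category").columns.tolist()

# Heuristic (an assumption, not a pandas feature): a numeric column with
# very few distinct values may really be a coded nominal variable.
max_levels = 2
coded_nominal = [c for c in numeric_cols if df[c].nunique() <= max_levels]

print(numeric_cols)      # ['ID', 'Salary', 'HasPet']
print(categorical_cols)  # ['Gender']
print(coded_nominal)     # ['HasPet']
```

pandas cannot know that 0/1 in HasPet is a label rather than a quantity, so columns found this way still need a manual check before converting them with `astype('category')`.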