pandas: find the data type of each column in a DataFrame

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36822580/

Date: 2020-09-14 01:06:49  Source: igfitidea

Find type of data in each column of dataframe

Tags: python, pandas, dataframe, categorical-data

Asked by xyz

I have read this link: Check which columns in DataFrame are Categorical

I have a DataFrame in which salaries are written with a leading $. That column is also shown as categorical data.

Moreover, suppose my nominal data is not in the form of strings such as 'F', 'M', etc. How do we then determine which columns are numeric, categorical (with strings), and nominal?

Say my data looks like this:

ID    Gender   Salary   HasPet  
1      M       0       0
2      F       00      0
3      M       00      1  

Answered by MaxU

You are confusing the categorical data type with strings (pandas shows string columns as object).

Numbers, by their nature, can't contain a $ dollar sign, so pandas treats the Salary column as strings, and this is correct behavior!
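As a rough illustration of that point, the sketch below builds a small DataFrame and checks which columns pandas treats as numeric. The sample values are made up for this example and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data: the values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["M", "F", "M"],
    "Salary": ["$100", "$200", "$300"],
    "HasPet": [0, 0, 1],
})

# True for columns holding real numbers, False for text such as "$100".
is_numeric = {col: pd.api.types.is_numeric_dtype(df[col]) for col in df.columns}
print(is_numeric)
```

Only HasPet should come back as numeric here; Salary stays a text column until the $ is removed.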

You can easily convert your Salary column to integer/float if you want, though:

In [180]: df
Out[180]:
   Gender Salary
0       F  83
1       M  58
2       F  21
3       F  32
4       M  98
5       F  75
6       F  10
7       M  73
8       F  82
9       M  15
10      F  58
11      F  31
12      M  74
13      F  61
14      M  12

In [181]: df.dtypes
Out[181]:
Gender    object
Salary    object
dtype: object

Let's remove the leading $ and convert Salary to int:

In [182]: df.Salary = df.Salary.str.lstrip('$').astype(int)

In [183]: df.dtypes
Out[183]:
Gender    object
Salary     int32
dtype: object
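If some salary entries might be malformed, astype(int) would raise an error. A possible alternative (not part of the answer above) is pd.to_numeric with errors="coerce", which turns unparseable entries into NaN. The sample values below are hypothetical.

```python
import pandas as pd

# Hypothetical salaries; the last entry is deliberately malformed.
salary = pd.Series(["$100", "$2500", "n/a"])

# Strip the leading "$", then parse. Entries that cannot be parsed
# become NaN instead of raising an exception.
cleaned = pd.to_numeric(salary.str.lstrip("$"), errors="coerce")
print(cleaned)
```

Because NaN is a floating-point value, the result is a float column rather than an integer one.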

You can also convert your Gender column to categorical:

In [186]: df.Gender = df.Gender.astype('category')

In [187]: df.dtypes
Out[187]:
Gender    category
Salary       int32
dtype: object
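Once the dtypes are set, select_dtypes can list the columns of each kind, which is one way to approach the original question. The sketch below uses made-up data with the question's column names. The 0/1 check for nominal flag columns is an assumed heuristic, not something the answer proposes.

```python
import pandas as pd

# Hypothetical data mirroring the question's columns; values are invented.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["M", "F", "M"],
    "Salary": ["$100", "$200", "$300"],
    "HasPet": [0, 0, 1],
})

# Apply the two conversions from the answer.
df["Salary"] = df["Salary"].str.lstrip("$").astype(int)
df["Gender"] = df["Gender"].astype("category")

# Group column names by dtype.
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(include="category").columns.tolist()

# Assumed heuristic: numeric columns containing only 0 and 1 are
# probably nominal flags rather than quantities.
flag_cols = [c for c in numeric_cols if set(df[c].unique()) <= {0, 1}]

print(numeric_cols, categorical_cols, flag_cols)
```

pandas cannot tell on its own that an integer column such as HasPet is nominal, so a rule like this (or an explicit astype('category')) has to come from knowledge of the data.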