Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/46406720/

Date: 2020-08-19 17:37:12  Source: igfitidea

LabelEncoder: TypeError: '>' not supported between instances of 'float' and 'str'

python pandas scikit-learn

Asked by pceccon

I'm getting this error for multiple variables, even after treating missing values. For example:

from sklearn import preprocessing

le = preprocessing.LabelEncoder()
# Names of all object-dtype columns in df
categorical = list(df.select_dtypes(include=['object']).columns.values)
for cat in categorical:
    print(cat)
    df[cat].fillna('UNK', inplace=True)
    df[cat] = le.fit_transform(df[cat])
#     print(le.classes_)
#     print(le.transform(le.classes_))


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-24-424a0952f9d0> in <module>()
      4     print(cat)
      5     df[cat].fillna('UNK', inplace=True)
----> 6     df[cat] = le.fit_transform(df[cat].fillna('UNK'))
      7 #     print(le.classes_)
      8 #     print(le.transform(le.classes_))

C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\sklearn\preprocessing\label.py in fit_transform(self, y)
    129         y = column_or_1d(y, warn=True)
    130         _check_numpy_unicode_bug(y)
--> 131         self.classes_, y = np.unique(y, return_inverse=True)
    132         return y
    133 

C:\Users\paula.ceccon.ribeiro\AppData\Local\Continuum\Anaconda3\lib\site-packages\numpy\lib\arraysetops.py in unique(ar, return_index, return_inverse, return_counts)
    209 
    210     if optional_indices:
--> 211         perm = ar.argsort(kind='mergesort' if return_index else 'quicksort')
    212         aux = ar[perm]
    213     else:

TypeError: '>' not supported between instances of 'float' and 'str'

Checking the variable that led to the error gives:

df['CRM do Médico'].isnull().sum()
0

Besides NaN values, what could be causing this error?

Answered by sgDysregulation

This happens because the series df[cat] contains elements of differing data types (e.g. strings and floats). That can come from the way the data was read (numbers read as floats and text as strings), or the column's dtype was float and changed after the fillna operation.

In other words:

The pandas dtype 'object' can hold values of mixed types; it does not guarantee that every element is a str.

So casting the column to str before encoding should help:

df[cat] = le.fit_transform(df[cat].astype(str))
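A minimal sketch of the situation described above, assuming a hypothetical column `s` that mixes str and float values and contains no NaN (the real data is not shown in the question). It calls `np.unique` directly, since that is the call that fails in the traceback, and shows that the sort succeeds once every element has been cast to str:

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing str and float values, with no NaN present
s = pd.Series(["A123", 4567.0, "UNK", "B890"], dtype=object)
print(s.isnull().sum())  # no missing values, yet the types are mixed

# Count the Python types actually stored in the object column
type_counts = s.map(type).value_counts().to_dict()
print(type_counts)

# np.unique sorts its input, and str cannot be compared with float
try:
    np.unique(s.to_numpy(), return_inverse=True)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# After casting to str, every element is comparable and can be encoded
classes, codes = np.unique(s.astype(str).to_numpy(), return_inverse=True)
print(classes)
print(codes)
```

Note that `astype(str)` turns the float `4567.0` into the label `'4567.0'`, so numeric-looking values end up as string labels after the cast.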

Answered by Rutuja

Because strings have variable length, pandas stores them as object dtype by default. I also faced this problem after treating missing values. Converting all those columns to type 'category' before label encoding worked in my case.

df[cat] = df[cat].astype('category')

Then check df.dtypes and perform the label encoding.
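As an illustration of what the category conversion produces, here is a small sketch assuming a hypothetical all-string column `s` whose missing values were already filled with 'UNK'. It is not necessarily what this answer's author ran; it only shows the dtype after the cast and the integer codes that a categorical column carries via `.cat.codes`:

```python
import pandas as pd

# Hypothetical all-string column after missing values were filled
s = pd.Series(["red", "UNK", "blue", "red"])

cat_col = s.astype("category")
print(cat_col.dtype)

# Categories are the distinct values, sorted when they are comparable
categories = list(cat_col.cat.categories)
print(categories)

# Each value is represented by the position of its category
codes = cat_col.cat.codes.tolist()
print(codes)
```

The categories sort as plain Python strings, so uppercase 'UNK' comes before the lowercase labels.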

Answered by Max Kleiner

Or cast to str and split, so that all elements have a uniform str type:

unique, counts = numpy.unique(str(a).split(), return_counts=True)