pandas: How to convert string labels to numeric values

Disclaimer: This page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original: http://stackoverflow.com/questions/44496057/

Time: 2020-09-14 03:46:11  Source: igfitidea

How to convert string labels to numeric values

Tags: python, python-2.7, csv, pandas

Asked by T T

I have a CSV file (delimiter = ,) containing the following fields:

filename labels
xyz.png  cat
pqz.png  dog
abc.png  mouse           

There is a list containing all the classes:

data_classes = ["cat", "dog", "mouse"]

Question: How do I replace the string labels in the CSV with the index of each label in data_classes (i.e. if label == cat, then the label should change to 0) and save the result to a CSV file?

Answered by EdChum

Assuming that all classes are present in your list, you can do this using apply and calling index on the list to return the ordinal position of the class in the list:

In[5]:
df['labels'].apply(data_classes.index)

Out[5]: 
0    0
1    1
2    2
Name: labels, dtype: int64
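
For readers who want to run the apply approach end to end, here is a minimal sketch. The in-memory csv_text string is a hypothetical stand-in for the CSV file in the question, and it assumes every label appears in data_classes.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file described in the question.
csv_text = "filename,labels\nxyz.png,cat\npqz.png,dog\nabc.png,mouse\n"
data_classes = ["cat", "dog", "mouse"]

df = pd.read_csv(io.StringIO(csv_text))

# list.index returns the position of each label within data_classes.
df["labels"] = df["labels"].apply(data_classes.index)

labels_as_list = df["labels"].tolist()
print(labels_as_list)
```

Note that list.index scans the list on every call, so this is convenient for a handful of classes but does more work per row as the class list grows.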

However, IMO it is better to define a dict of your mapping and pass it to map, as this is cython-ised and so should be faster:

In[7]:
d = dict(zip(data_classes, range(0,3)))
d

Out[7]: {'cat': 0, 'dog': 1, 'mouse': 2}

In[8]:
df['labels'].map(d, na_action='ignore')

Out[8]: 
0    0
1    1
2    2
Name: labels, dtype: int64
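
The question also asks for the result to be saved back to CSV. A rough sketch of the full round trip with the dict approach might look like the following; csv_text and the in-memory output buffer are assumptions made so the example is self-contained, and in practice you would pass file paths to read_csv and to_csv instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file described in the question.
csv_text = "filename,labels\nxyz.png,cat\npqz.png,dog\nabc.png,mouse\n"
data_classes = ["cat", "dog", "mouse"]

df = pd.read_csv(io.StringIO(csv_text))

# Build the label -> index mapping from the list itself, so the range
# does not have to be hard-coded to the number of classes.
label_to_index = {name: i for i, name in enumerate(data_classes)}
df["labels"] = df["labels"].map(label_to_index)

# Write to an in-memory buffer here; use a path such as "out.csv" for a real file.
out = io.StringIO()
df.to_csv(out, index=False)
csv_out = out.getvalue()
print(csv_out)
```

Passing index=False keeps the row index out of the output, so the file keeps the same two columns as the input.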

If there are classes not present in the mapping, then NaN is returned for them.
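
To illustrate the difference in behaviour for unknown labels, here is a small sketch. The "bird" label is a made-up example of a class missing from data_classes: map yields NaN for it, whereas apply with list.index raises ValueError.

```python
import pandas as pd

data_classes = ["cat", "dog", "mouse"]
label_to_index = {name: i for i, name in enumerate(data_classes)}

# "bird" is a hypothetical label that is not in data_classes.
labels = pd.Series(["cat", "bird", "mouse"], name="labels")

# map leaves NaN where the dict has no entry for the label.
mapped = labels.map(label_to_index)
missing = labels[mapped.isna()].tolist()
print(missing)

# apply with list.index raises ValueError on the unknown label instead.
try:
    labels.apply(data_classes.index)
    apply_failed = False
except ValueError:
    apply_failed = True
print(apply_failed)
```

Checking mapped.isna() before saving is one way to catch unexpected labels rather than silently writing empty cells to the CSV.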