pandas: how to convert string labels to numeric values
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/44496057/
How to convert string labels to numeric values
Asked by T T
I have a csv file (delimiter=,) containing the following fields:
filename labels
xyz.png cat
pqz.png dog
abc.png mouse
There is a list containing all the classes:
data_classes = ["cat", "dog", "mouse"]
Question: How do I replace the string labels in the csv with the index of each label in data_classes (i.e. if label == cat, the label should change to 0) and save the result to a csv file?
Answered by EdChum
Assuming that all classes are present in your list, you can do this using apply and calling index on the list to return the ordinal position of the class in the list:
In[5]:
df['labels'].apply(data_classes.index)
Out[5]:
0 0
1 1
2 2
Name: labels, dtype: int64
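For readers who want to run the apply approach end to end, here is a minimal sketch. It assumes the csv from the question has a header row of filename,labels; the csv_text string is a hypothetical in-memory stand-in for the real file, and label_ids is a name introduced only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the csv file described in the question.
csv_text = "filename,labels\nxyz.png,cat\npqz.png,dog\nabc.png,mouse\n"
df = pd.read_csv(io.StringIO(csv_text))

data_classes = ["cat", "dog", "mouse"]

# list.index returns the position of each label in data_classes.
# It raises ValueError for any label that is not in the list.
label_ids = df["labels"].apply(data_classes.index)
print(label_ids.tolist())
```

Because list.index scans the list for every row, this is probably best kept for small inputs where every label is known to be in the list.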
However, it will be faster to define a dict of your mapping and pass it to map. IMO this should be faster, as map is cython-ised:
In[7]:
d = dict(zip(data_classes, range(0,3)))
d
Out[7]: {'cat': 0, 'dog': 1, 'mouse': 2}
In[8]:
df['labels'].map(d, na_action='ignore')
Out[8]:
0 0
1 1
2 2
Name: labels, dtype: int64
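The question also asks for the result to be saved back to csv, which the transcript above does not show. The sketch below is one way it might be done, assuming a filename,labels header row. The csv_text string, the label_to_id name, and the in-memory output buffer are all illustrative assumptions; in practice a file path would be passed to read_csv and to_csv instead.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the csv file described in the question.
csv_text = "filename,labels\nxyz.png,cat\npqz.png,dog\nabc.png,mouse\n"
df = pd.read_csv(io.StringIO(csv_text))

data_classes = ["cat", "dog", "mouse"]

# Build the label -> index mapping from the list itself,
# so it stays correct if classes are added or reordered.
label_to_id = {name: i for i, name in enumerate(data_classes)}

# Replace the string labels with their numeric index.
df["labels"] = df["labels"].map(label_to_id)

# Write the result as csv; a buffer is used here so the sketch has no side effects.
out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```

Passing index=False keeps the pandas row index out of the output, so the file keeps the same two columns as the input.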
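If a label is not present in the mapping, NaN is returned for that row (and the resulting column becomes floating point, since NaN cannot be stored in an int64 column).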
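To illustrate the missing-class case, the sketch below maps a Series that contains one label absent from the class list. The label "bird" is a hypothetical example, and filling unknown labels with -1 is only one possible convention, not something the original answer prescribes.

```python
import pandas as pd

data_classes = ["cat", "dog", "mouse"]
label_to_id = {name: i for i, name in enumerate(data_classes)}

# "bird" is a hypothetical label that is missing from data_classes.
labels = pd.Series(["cat", "bird", "mouse"], name="labels")

# Labels without an entry in the dict come back as NaN.
mapped = labels.map(label_to_id)
print(mapped.isna().tolist())  # True marks rows with no matching class

# Assumed convention: mark unknown labels with -1 and restore an integer dtype.
filled = mapped.fillna(-1).astype("int64")
print(filled.tolist())
```

Checking mapped.isna().any() before saving is a cheap way to catch labels that were misspelled or left out of the class list.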