Disclaimer: This page is a translation of a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. If you use it, you must follow the same CC BY-SA license, cite the original address and authors, and attribute it to the original authors (not me): StackOverflow. Original address: http://stackoverflow.com/questions/22500108/

Date: 2020-09-13 21:49:24, source: igfitidea

Label a Variable in pandas?

python, pandas

Asked by Christian Sauer

I am fairly new to pandas and come from a statistics background, and I am struggling with a conceptual problem: pandas has columns that contain values, but sometimes values have a special meaning, which statistical programs like SPSS or R call a "label".

Imagine a column "rain" with two values, 0 ("no rain") and 1 ("raining"). Is there a way to assign these labels to the values in the column?

Is there a way to do this in pandas, too? I need it mainly for plotting and visualisation.

Accepted answer by cd98

There's no need to use a map anymore. Since version 0.15, pandas supports a categorical data type for columns. The stored data typically takes less space (especially when a few string values repeat many times), some operations on it are faster, and you can use labels.
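To check the space claim on your own data, here is a minimal sketch that compares the memory footprint of the same repeated labels stored as plain strings and as a categorical. The sample column is made up for illustration; the savings depend on how few distinct values there are, and a column with mostly unique values may not shrink at all.

```python
import pandas as pd

# Made-up data: two distinct labels repeated many times (low cardinality).
labels = pd.Series(["no rain", "raining"] * 50_000)

# deep=True counts the Python string objects, not just the pointers to them.
as_object = labels.memory_usage(deep=True)
as_category = labels.astype("category").memory_usage(deep=True)

print(f"object dtype:   {as_object} bytes")
print(f"category dtype: {as_category} bytes")
```

The categorical version stores one small integer code per row plus a single copy of each distinct label, which is where the saving comes from.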

I'm taking an example from the pandas docs:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
# Recast grade as a categorical variable
df["grade"] = df["raw_grade"].astype("category")

print(df["grade"])

# Gives this:
0    a
1    b
2    b
3    a
4    a
5    e
Name: grade, dtype: category
Categories (3, object): [a, b, e]
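Applied to the asker's "rain" column, one way to attach the labels (a sketch, assuming the column holds only the integers 0 and 1) is to treat the stored values as codes into a list of labels with pd.Categorical.from_codes. The sample values and the new column name rain_label are made up for illustration.

```python
import pandas as pd

# Made-up sample of the asker's 0/1 "rain" column.
df = pd.DataFrame({"rain": [0, 1, 1, 0, 1]})

# Interpret each stored value as a position in the categories list:
# code 0 -> "no rain", code 1 -> "raining".
df["rain_label"] = pd.Categorical.from_codes(df["rain"], categories=["no rain", "raining"])

print(df)
# The integer codes are still available for computation.
print(df["rain_label"].cat.codes.tolist())
```

from_codes requires every code to be a valid position in the categories list (or -1 for missing), so it raises an error on unexpected values instead of silently mislabelling them.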

You can also rename categories (for example, to give them readable labels) and add missing categories.
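A minimal sketch of both operations on the grade example, using the .cat accessor methods rename_categories and add_categories. The label texts and the extra categories "medium" and "bad" are made up for illustration. The new names are matched to the existing categories in their current order (a, b, e).

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6], "raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# Replace the codes a, b, e with readable labels, in category order.
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Add categories that do not occur in the data yet.
df["grade"] = df["grade"].cat.add_categories(["medium", "bad"])

print(df["grade"].cat.categories.tolist())
# For plotting: counts in category order, including categories with no rows.
print(df["grade"].value_counts(sort=False))
```

Keeping unused categories is handy for visualisation, because a bar chart of the counts then shows every category, even ones that happen to be empty.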

Answer by grasshopper

You could have a separate dictionary which maps values to labels:

 d = {0: "no rain", 1: "raining"}

and then you could access the labelled data with:

 df.rain_column.apply(lambda x: d[x])
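The same dictionary can also be passed straight to Series.map, which is the map approach the accepted answer refers to. A minimal sketch, with made-up data that deliberately includes an unmapped value (2) to show the difference: map turns values without a dictionary entry into NaN, whereas the lambda with d[x] would raise a KeyError.

```python
import pandas as pd

# Made-up "rain" data; the value 2 has no label in d on purpose.
df = pd.DataFrame({"rain": [0, 1, 1, 0, 2]})
d = {0: "no rain", 1: "raining"}

# Look up each value in d; values with no entry become NaN.
labelled = df["rain"].map(d)
print(labelled)
```

Either way the labels live in a new Series and the original 0/1 column is left unchanged.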