Labeling variables in pandas?

Question

Asked by Christian Sauer

I am fairly new to pandas, come from a statistics background, and am struggling with a conceptual problem: pandas has columns, which contain values. But sometimes values have a special meaning, which statistical programs like SPSS or R call a "label".

Imagine a column "rain" with two values, 0 ("no rain") and 1 ("raining"). Is there a way to attach these labels to the column?

Is there a way to do this in pandas, too? It's mainly for plotting and visualisation purposes.

Answer 1

Accepted answer by cd98

There's no need to use a map anymore. Since version 0.15, pandas supports a categorical data type for its columns. For columns with few distinct values, the stored data takes less space, many operations on it are faster, and you can use labels.
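
One way to check the space claim is to compare memory_usage(deep=True) before and after the conversion. This is a minimal sketch assuming a made-up column whose two labels repeat many times; exact byte counts depend on the pandas version, so none are shown.

```python
import pandas as pd

# Hypothetical column with many repeated string labels
raw = pd.Series(["no rain", "raining"] * 5000)
cat = raw.astype("category")

# deep=True also counts the string payloads, not just the per-row references
raw_bytes = raw.memory_usage(deep=True)
cat_bytes = cat.memory_usage(deep=True)
print(raw_bytes, cat_bytes)

# The categorical stores one small integer code per row and each label only once
print(cat.cat.codes.dtype)  # int8
```

The saving comes from the codes: with fewer than 128 categories pandas uses one byte per row, whereas the plain column keeps a full string (or a reference to one) per row.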

不再需要使用 amap了。从 0.15 版本开始，Pandas 允许其列使用分类数据类型。存储的数据占用更少的空间，对它的操作更快，您可以使用标签。

I'm taking an example from the pandas docs:

import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ['a', 'b', 'b', 'a', 'a', 'e']})
# Recast grade as a categorical variable
df["grade"] = df["raw_grade"].astype("category")

df["grade"]

# Gives this (exact formatting varies by pandas version):
# 0    a
# 1    b
# 2    b
# 3    a
# 4    a
# 5    e
# Name: grade, dtype: category
# Categories (3, object): [a, b, e]

You can also rename categories and add categories that do not (yet) occur in the data.
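
Applied to the rain example from the question, a minimal sketch (the DataFrame and the rain_label column are hypothetical) converts the 0/1 column to a categorical, renames the categories with cat.rename_categories, and adds an unused category with cat.add_categories. Passing a dict to rename_categories assumes a pandas version newer than 0.15; a list of new names in category order also works.

```python
import pandas as pd

# Hypothetical data: the "rain" column from the question, coded 0/1
df = pd.DataFrame({"rain": [0, 1, 1, 0, 1]})

# Convert to categorical; the categories are the sorted unique values [0, 1]
df["rain_label"] = df["rain"].astype("category")

# Rename categories via a mapping from old category to new label
df["rain_label"] = df["rain_label"].cat.rename_categories({0: "no rain", 1: "raining"})

# Add a category that does not occur in the data (its count will be 0)
df["rain_label"] = df["rain_label"].cat.add_categories(["unknown"])

print(df["rain_label"].cat.categories.tolist())  # ['no rain', 'raining', 'unknown']
print(df["rain_label"].cat.codes.tolist())       # [0, 1, 1, 0, 1]
```

The integer codes stay 0 and 1, while displays, value counts and plots built from the column show the labels.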

Answer 2

Answer by grasshopper

You could have a separate dictionary which maps values to labels:

d = {0: "no rain", 1: "raining"}

and then you could access the labelled data by doing:

df.rain_column.apply(lambda x: d[x])
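
A self-contained version of this approach, as a sketch with a made-up rain column. It swaps in Series.map, a common alternative to apply with a lambda: values missing from the dictionary become NaN instead of raising a KeyError. The resulting counts could be passed to a plotting call such as counts.plot.bar() (which requires matplotlib).

```python
import pandas as pd

# Hypothetical data; the code 2 has no label in the dictionary
df = pd.DataFrame({"rain": [0, 1, 1, 0, 1, 2]})

# Value-to-label dictionary, as in the answer
d = {0: "no rain", 1: "raining"}

# Series.map looks each value up in d; values without a label become NaN
# (apply(lambda x: d[x]) would raise KeyError on the 2 instead)
labels = df["rain"].map(d)

# Count rows per label (NaN is dropped by default), e.g. to feed a bar chart
counts = labels.value_counts()
print(counts)
```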