Categorical variables in a pandas DataFrame?

Question

Asked by Anton

I am working my way through Wes's Python For Data Analysis, and I've run into a strange problem that is not addressed in the book.

In the code below, based on page 199 of his book, I create a dataframe and then use pd.cut() to create cat_obj. According to the book, cat_obj is

"a special Categorical object. You can treat it like an array of strings indicating the bin name; internally it contains a levels array indicating the distinct category names along with a labeling for the ages data in the labels attribute"

Awesome! However, if I use the exact same pd.cut() code (In [5] below) to create a new column of the dataframe (called df['cat']), that column is not treated as a special categorical variable but simply as a regular pandas Series.

How, then, do I create a column in a dataframe that is treated as a categorical variable?

In [4]:

import pandas as pd

raw_data = {'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'], 
        'score': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['name', 'score'])

bins = [0, 25, 50, 75, 100]
group_names = ['Low', 'Okay', 'Good', 'Great']

In [5]:
cat_obj = pd.cut(df['score'], bins, labels=group_names)
df['cat'] = pd.cut(df['score'], bins, labels=group_names)
In [7]:

type(cat_obj)
Out[7]:
pandas.core.categorical.Categorical
In [8]:

type(df['cat'])
Out[8]:
pandas.core.series.Series

Answer 1

Answered by xrage

This might be happening because of how the setter behaves.

A sample getter and setter:

class A:
    x = 1

    @property
    def p(self):
        # The getter always returns an int, whatever was stored.
        return int(self.x)

    @p.setter
    def p(self, v):
        self.x = v

t = 1.32
obj = A()
obj.p = 1.32

print(type(t))      # <class 'float'>
print(type(obj.p))  # <class 'int'>

For now, df only accepts Series data, and its setter converts Categorical data into a Series. Categorical support for df is due in the next pandas release.

Answer 2

Answered by undershock

From http://pandas-docs.github.io/pandas-docs-travis/categorical.html, available from pandas 0.15 onwards:

Specify dtype="category" when constructing a Series:

In [1]: s = pd.Series(["a","b","c","a"], dtype="category")

In [2]: s
Out[2]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

You can then add this Series to an existing DataFrame as a column.
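
As a minimal sketch of that step, assuming pandas 0.15 or later: the frame `df` and the column names `n` and `letter` below are made up for illustration. Assigning the categorical Series as a column should keep its category dtype, provided the indexes line up.

```python
import pandas as pd

# A Series built directly with the category dtype.
s = pd.Series(["a", "b", "c", "a"], dtype="category")

# A hypothetical existing frame with the same default index (0..3).
df = pd.DataFrame({"n": [1, 2, 3, 4]})

# Column assignment aligns on the index and keeps the category dtype.
df["letter"] = s

print(df.dtypes)
print(df["letter"].cat.categories.tolist())
```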

Or convert an existing Series or column to a category dtype:

In [3]: df = pd.DataFrame({"A":["a","b","c","a"]})

In [4]: df["B"] = df["A"].astype('category')

In [5]: df
Out[5]: 
   A  B
0  a  a
1  b  b
2  c  c
3  a  a
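
Applied to the pd.cut() case from the question, a hedged sketch (again assuming pandas 0.15 or later, with a shortened, made-up score list): the column is still a Series object, so type() alone does not reveal anything, but its dtype and the .cat accessor show that it is categorical.

```python
import pandas as pd

# Shortened, illustrative data; bins and labels follow the question.
df = pd.DataFrame({"score": [25, 94, 57, 62, 70]})
bins = [0, 25, 50, 75, 100]
group_names = ["Low", "Okay", "Good", "Great"]

df["cat"] = pd.cut(df["score"], bins, labels=group_names)

print(type(df["cat"]))                    # still a pandas Series
print(df["cat"].dtype)                    # category
print(df["cat"].cat.categories.tolist())  # the bin labels
print(df["cat"].cat.codes.tolist())       # integer code per row
```

Checking the dtype rather than the Python type is the more reliable test for whether a column is categorical.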

Answer 3

Answered by jmxp

Right now, you can't have categorical data in a Series or DataFrame object, but this functionality will be implemented in pandas 0.15 (due in September).
