pandas: assign a category based on where a value falls within a range

Question

Asked by Johnny Metz

I have the following ranges and a pandas DataFrame:

x >= 0        # success
-10 <= x < 0  # warning
x < -10       # danger

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

I'd like to categorize the values in the DataFrame based on where they fall within the defined ranges. So I'd like the final DF to look something like this:

    x    y    z    x_cat    y_cat    z_cat
0   2   -7  -30  success  warning   danger
1   1   -5  -20  success  warning   danger

I've tried using the category datatype, but it doesn't appear that I can define a range anywhere.

for category_column, value_column in zip(['x_cat', 'y_cat', 'z_cat'], ['x', 'y', 'z']):
    df[category_column] = df[value_column].astype('category')

Can I use the category datatype? If not, what can I do here?

Answer 1

Answered by piRSquared

pandas.cut

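import numpy as np
import pandas as pd

# Stack to a single 1-d Series so one pd.cut call covers every column.
# right=False closes each bin on the left, so 0 is 'success' and -10 is 'warning'.
c = pd.cut(
    df.stack(),
    [-np.inf, -10, 0, np.inf],
    right=False,
    labels=['danger', 'warning', 'success']
)
df.join(c.unstack().add_suffix('_cat'))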

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
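
On the question of whether the category datatype can be used: the result of pd.cut is already categorical, and the bin edges are where the ranges get defined. Here is a minimal sketch of that, assuming right=False so the bins are closed on the left as in the question. The names sample, bins, labels and cats are only illustrative.

```python
import numpy as np
import pandas as pd

# Illustrative values, including the two boundary values -10 and 0.
sample = pd.Series([-30, -10, -5, 0, 2])

bins = [-np.inf, -10, 0, np.inf]
labels = ['danger', 'warning', 'success']

# right=False makes each bin closed on the left: [-inf, -10), [-10, 0), [0, inf)
cats = pd.cut(sample, bins, right=False, labels=labels)

print(cats.dtype)     # a categorical dtype with ordered categories
print(cats.tolist())  # ['danger', 'warning', 'warning', 'success', 'success']
```

With the default right=True the two boundary values would move down one category (-10 to danger, 0 to warning).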

numpy

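v = df.values
cats = np.array(['danger', 'warning', 'success'])
# side='right' sends a value equal to an edge to the upper bin,
# so -10 maps to 'warning' and 0 maps to 'success'.
code = np.searchsorted([-10, 0], v.ravel(), side='right').reshape(v.shape)
cdf = pd.DataFrame(cats[code], df.index, df.columns)
df.join(cdf.add_suffix('_cat'))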

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
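
The side argument of np.searchsorted only matters for values that sit exactly on an edge. A small sketch of the difference, using the same edges and labels as above (the names edges, boundary, left and right are illustrative):

```python
import numpy as np

edges = [-10, 0]
cats = np.array(['danger', 'warning', 'success'])

# Only the two values that coincide with an edge are checked here.
boundary = np.array([-10, 0])

left = cats[np.searchsorted(edges, boundary, side='left')]
right = cats[np.searchsorted(edges, boundary, side='right')]

print(left.tolist())   # ['danger', 'warning']
print(right.tolist())  # ['warning', 'success']
```

The ranges in the question (x >= 0 is success, -10 <= x < 0 is warning) correspond to side='right'.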

Answer 2

Answered by plasmon360

You could use assign to make new columns. For each new column, use apply to label the values in the series. Note that each lambda below tests only the one range expected for its column, so a value outside that range is labeled None.

df.assign(x_cat = lambda v: v.x.apply(lambda x: 'success' if x >= 0 else None),
          y_cat = lambda v: v.y.apply(lambda x: 'warning' if -10 <= x < 0 else None),
          z_cat = lambda v: v.z.apply(lambda x: 'danger' if x < -10 else None))

will result in

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
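
If every column should be checked against all three ranges, one option is to build the keyword arguments for assign from a single classifier. This is only a sketch: classify, new_columns and result are hypothetical names, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

def classify(value):
    """Hypothetical helper: map one number to the label of its range."""
    if value >= 0:
        return 'success'
    if value >= -10:
        return 'warning'
    return 'danger'

# Build one keyword argument per column, e.g. x_cat=<Series of labels>.
new_columns = {f'{col}_cat': df[col].apply(classify) for col in df.columns}

# assign returns a new DataFrame; df itself is left unchanged.
result = df.assign(**new_columns)
print(result)
```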

Answer 3

Answered by FLab

You can use pandas cut, but you need to apply it column by column (just because the function operates on 1-d input):

labels = df.apply(lambda x: pd.cut(x, [-np.inf, -10, 0, np.inf], right=False, labels=['danger', 'warning', 'success']))

          x        y       z
0  success  warning  danger
1  success  warning  danger

So you can do:

pd.concat([df, labels.add_suffix('_cat')], axis = 1)

   x  y   z    x_cat    y_cat   z_cat
0  2 -7 -30  success  warning  danger
1  1 -5 -20  success  warning  danger
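
The 1-d restriction can be checked directly. This sketch assumes, as far as I know for current pandas versions, that pd.cut raises a ValueError for 2-d input. The names bins, whole_frame_ok and per_column are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})
bins = [-np.inf, -10, 0, np.inf]

# Passing the whole 2-d DataFrame is expected to be rejected ...
try:
    pd.cut(df, bins)
    whole_frame_ok = True
except ValueError:
    whole_frame_ok = False

# ... while each column, being a 1-d Series, is accepted.
per_column = {col: pd.cut(df[col], bins, right=False) for col in df.columns}

print(whole_frame_ok, list(per_column))
```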

Answer 4

Answered by Woody Pride

You could write a little function and then pass each series to the function using apply:

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

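def cat(x):
    # Check the lowest range first; anything not below 0 is a success.
    if x < -10:
        return "danger"
    if x < 0:
        return "warning"
    return "success"

# Write the labels to new *_cat columns so the original values are kept.
for col in list(df.columns):
    df[col + '_cat'] = df[col].apply(cat)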
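
One thing to watch with a function like this: a missing value (NaN) fails both comparisons and falls through to the last return. Here is a sketch of a variant that leaves missing values unlabeled. cat_with_missing is a hypothetical name, and the lowercase labels are an assumption taken from the question.

```python
import numpy as np
import pandas as pd

def cat_with_missing(x):
    """Hypothetical variant of cat() that leaves missing values unlabeled."""
    if pd.isna(x):
        return None
    if x < -10:
        return 'danger'
    if x < 0:
        return 'warning'
    return 'success'

sample = pd.Series([-30.0, -5.0, np.nan, 2.0])
labels = sample.apply(cat_with_missing)

# The NaN entry stays missing instead of being labeled 'success'.
print(labels)
```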

Answer 5

Answered by Quentin

Here's a ternary method for this type of thing.

filter_method = lambda x: 'success' if x >= 0 else 'warning' if -10 <= x < 0 else 'danger' if x < -10 else None
for category_column, value_column in zip(['x_cat', 'y_cat', 'z_cat'], ['x', 'y', 'z']):
    df[category_column] = df[value_column].apply(filter_method)
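
For larger frames, the same three conditions can be evaluated on whole columns at once rather than element by element. This is a sketch using numpy.select, which is a different technique from the ternary above. The 'unknown' default label is an assumption added for values, such as NaN, that match no condition.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [2, 1], 'y': [-7, -5], 'z': [-30, -20]})

choices = ['success', 'warning', 'danger']

for value_column in ['x', 'y', 'z']:
    values = df[value_column]
    # np.select uses the first condition that is True for each element.
    conditions = [values >= 0, values >= -10, values < -10]
    # 'unknown' is an assumed fallback label for values matching no condition.
    df[value_column + '_cat'] = np.select(conditions, choices, default='unknown')

print(df)
```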
