Can Pandas perform row-wise min() and max() functions?

Question

Asked by stevendesu

In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:

  a  b
0 10 90
1 20 150
2 30 -30

I want to get:

  a  b   c
0 10 90  90
1 20 150 100
2 30 -30 0

I know that in Pandas certain arithmetic operations work across columns. For instance, I could double every number in column b like so:

>>> df["c"] = df["b"] * 2
>>> df
  a  b   c
0 10 90  180
1 20 150 300
2 30 -30 -60

However, this doesn't work for built-in functions like min and max:

>>> df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there some way to accomplish what I want efficiently?

Answer 1

Answered by jezrael

You can use Series.clip:

df['c'] = df['b'].clip(0, 100)
print(df)
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
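
For reference, here is a minimal self-contained sketch of the same idea. It assumes the three-row frame from the question. The keywords lower and upper are the parameters of Series.clip, and passing only one of them bounds a single side. The names floor_only and ceiling_only are illustrative only.

```python
import pandas as pd

# Sample frame assumed from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Bound column b to the closed interval [0, 100]
df["c"] = df["b"].clip(lower=0, upper=100)

# One-sided variants: only a floor, or only a ceiling
floor_only = df["b"].clip(lower=0)
ceiling_only = df["b"].clip(upper=100)

print(df)
```

Because clip is vectorized, it avoids the ambiguous truth-value comparison that the built-in min and max attempt on a whole Series.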

Answer 2

Answered by Dat Chu

You can use the Pandas min and max functions across an axis (axis=1 reduces each row) and then combine them.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html

For example:

df.max(axis=1)

But it looks like you want to clip the values rather than take a row-wise min or max.
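
To make the axis-based approach concrete, the sketch below shows row-wise reductions on the frame from the question, and one possible way to emulate min(100, max(0, b)) with them. The helper columns lower and upper are hypothetical names introduced only for this illustration, and this is not necessarily what the answer's author had in mind.

```python
import pandas as pd

# Sample frame assumed from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Row-wise extremes across the existing columns a and b
row_max = df.max(axis=1)
row_min = df.min(axis=1)

# Emulate min(100, max(0, b)) with axis-wise reductions:
# pair b with a constant column, then reduce each row.
floored = df[["b"]].assign(lower=0).max(axis=1)
clipped = floored.to_frame("b").assign(upper=100).min(axis=1)

print(clipped.tolist())
```

This works, but it builds temporary frames for each bound, so clip remains the more direct choice for this task.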

Answer 3

Answered by piRSquared

A numpy view. Not as elegant as clip.

Option 1

df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

Option 2

b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
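
As a quick sanity check, the sketch below confirms that both options produce the same values as Series.clip on the frame from the question. It also includes np.clip, NumPy's own clipping function, which is not part of the answer above and is shown only as a further alternative. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame assumed from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})
b = df["b"].to_numpy()

via_clip = df["b"].clip(0, 100).to_numpy()                 # pandas Series.clip
via_minmax = np.minimum(np.maximum(b, 0), 100)             # Option 1
via_where = np.where(b > 100, 100, np.where(b < 0, 0, b))  # Option 2
via_np_clip = np.clip(b, 0, 100)                           # NumPy alternative, not in the answer

# All four approaches should agree element by element
assert (via_clip == via_minmax).all()
assert (via_clip == via_where).all()
assert (via_clip == via_np_clip).all()
```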

Timing
Code below. Each row is divided by its fastest time, so 1.0 marks the fastest method at that size.

res.div(res.min(axis=1), axis=0)

            pir1  pir2       jez1
10     30.895514   1.0  75.210427
30     28.611177   1.0  49.913498
100    20.658307   1.0  50.823106
300    19.842134   1.0  39.162901
1000   14.078159   1.0  25.148937
3000    8.767133   1.0  15.066847
10000   4.377849   1.0   8.849138
30000   2.634263   1.0   4.653956

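from timeit import timeit

import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({'a': [10, 20, 30], 'b': [90, 150, -30]})

res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns=['pir1', 'pir2', 'jez1'],
    dtype=float
)

# Each candidate clips column b of the frame it receives to [0, 100]
# and assigns the result to a new column c.
# The ratio table above was recorded on the author's machine;
# a rerun will give different numbers.
jez1 = lambda d: d.assign(c=d.b.clip(0, 100))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
pir2 = lambda d: d.assign(
    c=(lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)
)

for i in res.index:
    # Build a test frame from i copies of the 3-row sample
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=10)

res.plot(loglog=True)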
