Can Pandas perform row-wise min() and max() functions?

Notice: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/45966493/

Time: 2020-09-14 04:21:14  Source: igfitidea

python pandas dataframe

Asked by stevendesu

In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:

    a    b
0  10   90
1  20  150
2  30  -30

I want to get:

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0

I know that in Pandas certain arithmetic operations work element-wise on whole columns. For instance, I could double every number in column b like so:

>>> df["c"] = df["b"] * 2
>>> df
    a    b    c
0  10   90  180
1  20  150  300
2  30  -30  -60

However, this doesn't work for built-in functions like min and max:

>>> df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Is there some way to accomplish what I want efficiently?
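
As a brief illustration of where the error above comes from, here is a minimal sketch (the sample frame is rebuilt from the question; the variable names are only for illustration). The built-in max has to decide which of its arguments is larger, so it compares the Series with 0 and then asks for a single truth value, which a multi-element Series cannot provide:

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Comparing a Series with a scalar is element-wise: the result is a
# boolean Series, not a single True/False.
comparison = df["b"] > 0
print(comparison.tolist())  # [True, True, False]

# Built-in max()/min() need one truth value from that comparison.
# Asking a multi-element Series for one raises the ValueError.
error_message = None
try:
    bool(comparison)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```
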

Answered by jezrael

You can use Series.clip:

df['c'] = df['b'].clip(0, 100)
print(df)
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
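
For readers who want the bounds spelled out, the same call can be written with the lower and upper keyword arguments of Series.clip, and either bound can be left off. A small sketch, assuming the sample frame from the question (the variable names are only for illustration):

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Both bounds, named explicitly
both = df["b"].clip(lower=0, upper=100)

# One-sided clipping: only a floor, or only a cap
floor_only = df["b"].clip(lower=0)
cap_only = df["b"].clip(upper=100)

print(both.tolist())        # [90, 100, 0]
print(floor_only.tolist())  # [90, 150, 0]
print(cap_only.tolist())    # [90, 100, -30]
```
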

Answered by Dat Chu

You can use the Pandas min function along an axis, then combine it with min/max.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html

For example:

df.max(axis=1)

But it looks like you want to clip the values instead of min/max.
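
One possible reading of "combine it with min/max" is sketched below. It is not code from the answer: the helper columns named upper and lower are hypothetical, added only so that the row-wise min and max have a constant to compare against.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Row-wise min of (b, 100): caps each value at 100.
# "upper" is a hypothetical helper column holding the constant bound.
capped = df[["b"]].assign(upper=100).min(axis=1)

# Row-wise max of (capped, 0): raises each value to at least 0.
# "lower" is likewise a hypothetical helper column.
clipped = capped.to_frame("capped").assign(lower=0).max(axis=1)

df["c"] = clipped
print(df["c"].tolist())  # [90, 100, 0]
```

This builds two temporary frames, so Series.clip is likely the simpler choice when the bounds are constants.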

Answered by piRSquared

A numpy view. Not as elegant as clip.

Option 1

df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0


Option 2

b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))

    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
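
As a quick sanity check, the sketch below recomputes both options on the question's sample frame and confirms they agree. It also includes numpy's own np.clip for comparison; that third variant is an addition here, not something the answer proposed.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})
b = df["b"].to_numpy()

# Option 1: nested element-wise maximum/minimum
opt1 = np.minimum(np.maximum(b, 0), 100)

# Option 2: nested np.where
opt2 = np.where(b > 100, 100, np.where(b < 0, 0, b))

# Extra variant (not from the answer): numpy's built-in clip
opt3 = np.clip(b, 0, 100)

# All three produce the same array
same = np.array_equal(opt1, opt2) and np.array_equal(opt1, opt3)
print(same)  # True

result = df.assign(c=opt3)
print(result["c"].tolist())  # [90, 100, 0]
```
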


Timing
Each row is divided by the fastest time at that size, so 1.0 marks the fastest method. Code below.

res.div(res.min(axis=1), axis=0)

            pir1  pir2       jez1
10     30.895514   1.0  75.210427
30     28.611177   1.0  49.913498
100    20.658307   1.0  50.823106
300    19.842134   1.0  39.162901
1000   14.078159   1.0  25.148937
3000    8.767133   1.0  15.066847
10000   4.377849   1.0   8.849138
30000   2.634263   1.0   4.653956

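import numpy as np
import pandas as pd
from timeit import timeit

# Sample frame from the question
df = pd.DataFrame({'a': [10, 20, 30], 'b': [90, 150, -30]})

# One row per size multiplier, one column per method
res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns=['pir1', 'pir2', 'jez1'],
    dtype=float
)

# Each candidate clips column b of the frame it is given to [0, 100]
jez1 = lambda d: d.assign(c=d.b.clip(0, 100))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
# Note: pir2 returns the bare array and skips the assign step
pir2 = lambda d: (lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)

for i in res.index:
    # Repeat the sample frame i times to scale the input
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=10)

res.plot(loglog=True)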