Can Pandas perform row-wise min() and max() functions?
Source: http://stackoverflow.com/questions/45966493/
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by stevendesu
In my DataFrame I wish to clip the value of a particular column between 0 and 100. For instance, given the following:
    a    b
0  10   90
1  20  150
2  30  -30
I want to get:
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
I know that in Pandas certain arithmetic operations work across columns. For instance, I could double every number in column b like so:
>>> df["c"] = df["b"] * 2
>>> df
    a    b    c
0  10   90  180
1  20  150  300
2  30  -30  -60
However, this doesn't work for built-in functions like min and max:
>>> df["c"] = min(100, max(0, df["b"]))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Is there some way to accomplish what I want efficiently?
Answered by jezrael
You can use Series.clip:
df['c'] = df['b'].clip(0, 100)
print(df)
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
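As a minimal sketch of the same call, assuming the sample frame from the question: clip also accepts the keyword arguments lower and upper, so a bound can be applied on one side only. The column names floor_only and cap_only are illustrative and do not come from the original answer.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Clip on both sides, same result as df["b"].clip(0, 100)
df["c"] = df["b"].clip(lower=0, upper=100)

# One-sided variants (illustrative column names)
df["floor_only"] = df["b"].clip(lower=0)    # only raise values below 0
df["cap_only"] = df["b"].clip(upper=100)    # only lower values above 100

print(df)
```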
Answered by Dat Chu
You can use the Pandas min and max functions across an axis (axis=1 reduces each row) and combine them.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.min.html
For example:
df.max(axis=1)
But it looks like you want to clip the values instead of min/max.
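To make the row-wise idea concrete, here is a rough sketch, assuming the sample frame from the question. It first takes a plain row-wise maximum, then emulates min(100, max(0, b)) by reducing over helper columns holding the bounds. The names lo, hi, lower_bounded and clipped are hypothetical; in practice clip is the simpler route.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# Row-wise maximum across the existing columns a and b
row_max = df[["a", "b"]].max(axis=1)

# Helper frame with constant bound columns (lo and hi are hypothetical names)
bounds = pd.DataFrame({"b": df["b"], "lo": 0, "hi": 100})

# max(0, b) per row, then min(100, ...) per row
lower_bounded = bounds[["b", "lo"]].max(axis=1)
clipped = pd.concat([lower_bounded, bounds["hi"]], axis=1).min(axis=1)

print(row_max.tolist())
print(clipped.tolist())
```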
Answered by piRSquared
A numpy view. Not as elegant as clip.
Option 1
df.assign(c=np.minimum(np.maximum(df.b.values, 0), 100))
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
Option 2
b = df.b.values
df.assign(c=np.where(b > 100, 100, np.where(b < 0, 0, b)))
    a    b    c
0  10   90   90
1  20  150  100
2  30  -30    0
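NumPy also offers np.clip, which is not one of the options in the original answer. The sketch below assumes the sample frame from the question and checks that np.clip matches Option 1 on it. The variable names via_np_clip and via_option_1 are illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"a": [10, 20, 30], "b": [90, 150, -30]})

# np.clip bounds every element of the array to the range [0, 100]
via_np_clip = df.assign(c=np.clip(df["b"].to_numpy(), 0, 100))

# Option 1 from the answer, for comparison
via_option_1 = df.assign(c=np.minimum(np.maximum(df["b"].to_numpy(), 0), 100))

print(via_np_clip)
```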
Timing (code below)
res.div(res.min(axis=1), axis=0)
            pir1  pir2       jez1
10     30.895514   1.0  75.210427
30     28.611177   1.0  49.913498
100    20.658307   1.0  50.823106
300    19.842134   1.0  39.162901
1000   14.078159   1.0  25.148937
3000    8.767133   1.0  15.066847
10000   4.377849   1.0   8.849138
30000   2.634263   1.0   4.653956
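import numpy as np
import pandas as pd
from timeit import timeit

# Sample frame from the question
df = pd.DataFrame({'a': [10, 20, 30], 'b': [90, 150, -30]})

# One row per frame-size multiplier, one column per method
res = pd.DataFrame(
    index=[10, 30, 100, 300, 1000, 3000, 10000, 30000],
    columns=['pir1', 'pir2', 'jez1'],
    dtype=float
)

jez1 = lambda d: d.assign(c=d.b.clip(0, 100))
pir1 = lambda d: d.assign(c=np.minimum(np.maximum(d.b.values, 0), 100))
# pir2 returns only the clipped array; it does not call assign
pir2 = lambda d: (lambda b: np.where(b > 100, 100, np.where(b < 0, 0, b)))(d.b.values)

for i in res.index:
    # Repeat the sample frame i times to scale the input
    d = pd.concat([df] * i, ignore_index=True)
    for j in res.columns:
        stmt = '{}(d)'.format(j)
        setp = 'from __main__ import d, {}'.format(j)
        res.at[i, j] = timeit(stmt, setp, number=10)

res.plot(loglog=True)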