Return max of zero or value for a pandas DataFrame column
Notice: this page reproduces a popular Stack Overflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/17068269/
Asked by bjornarneson
I want to replace negative values in a pandas DataFrame column with zero.
Is there a more concise way to construct this expression?
df['value'][df['value'] < 0] = 0
Accepted answer by Jeff
Here is the canonical way of doing it. While not necessarily more concise, it is more flexible (in that you can apply it to arbitrary columns):
# Assumes: from pandas import DataFrame; from numpy.random import randn
In [39]: df = DataFrame(randn(5,1),columns=['value'])
In [40]: df
Out[40]:
value
0 0.092232
1 -0.472784
2 -1.857964
3 -0.014385
4 0.301531
In [41]: df.loc[df['value']<0,'value'] = 0
In [42]: df
Out[42]:
value
0 0.092232
1 0.000000
2 0.000000
3 0.000000
4 0.301531
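To illustrate the "arbitrary columns" point, here is a minimal sketch. The frame and its column names `a` and `b` are made up for illustration and are not from the original answer. It applies the same `.loc` pattern to each column in turn.

```python
import pandas as pd

# Hypothetical example data; the column names are assumptions for illustration.
df = pd.DataFrame({"a": [1.5, -2.0, 0.0], "b": [-0.5, 3.0, -4.0]})

# Apply the boolean-mask assignment to each column of interest.
for col in ["a", "b"]:
    df.loc[df[col] < 0, col] = 0

print(df)
```

Because `.loc` takes the row mask and the column label together, the assignment writes into the original frame rather than into a temporary copy.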
Answer by unutbu
You could use the clip method:
import pandas as pd
import numpy as np
df = pd.DataFrame({'value': np.arange(-5,5)})
df['value'] = df['value'].clip(0, None)
print(df)
yields
value
0 0
1 0
2 0
3 0
4 0
5 0
6 1
7 2
8 3
9 4
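A slightly more explicit spelling passes the bound by keyword; `clip(lower=0)` should behave the same as `clip(0, None)`. The sketch below uses made-up data (the `other` column is an assumption for illustration) and also shows that `DataFrame.clip` can handle several columns at once.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"value": [-2, -1, 0, 1, 2], "other": [3, -3, 5, -5, 0]})

# Keyword form of the same call: only a lower bound, no upper bound.
df["value"] = df["value"].clip(lower=0)

# DataFrame.clip applies the bound to every selected column.
clipped = df[["value", "other"]].clip(lower=0)
print(clipped)
```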
Answer by Dorian B.
Another possibility is numpy.maximum(). In my opinion, this is more straightforward to read.
import pandas as pd
import numpy as np
df['value'] = np.maximum(df.value, 0)
In the timings below (two million integers), it is also faster than the other methods tested, most notably the .loc assignment:
df_orig = pd.DataFrame({'value': np.arange(-1000000, 1000000)})
df = df_orig.copy()
%timeit df['value'] = np.maximum(df.value, 0)
# 100 loops, best of 3: 8.36 ms per loop
df = df_orig.copy()
%timeit df['value'] = np.where(df.value < 0, 0, df.value)
# 100 loops, best of 3: 10.1 ms per loop
df = df_orig.copy()
%timeit df['value'] = df.value.clip(0, None)
# 100 loops, best of 3: 14.1 ms per loop
df = df_orig.copy()
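# Note: clip_lower was deprecated in pandas 0.24 and removed in 1.0; clip(lower=0) is the replacement.
%timeit df['value'] = df.value.clip_lower(0)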
# 100 loops, best of 3: 14.2 ms per loop
df = df_orig.copy()
%timeit df.loc[df.value < 0, 'value'] = 0
# 10 loops, best of 3: 62.7 ms per loop
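Before comparing speed, it may be worth confirming that the approaches give the same result. The following is a small sketch on made-up data, not part of the original benchmark; it covers only the calls available in current pandas, so `clip_lower` is left out.

```python
import numpy as np
import pandas as pd

df_orig = pd.DataFrame({"value": np.arange(-5, 5)})

results = {
    "maximum": np.maximum(df_orig["value"], 0),
    # np.where returns a plain ndarray, so wrap it for a uniform comparison.
    "np.where": pd.Series(np.where(df_orig["value"] < 0, 0, df_orig["value"])),
    "clip": df_orig["value"].clip(lower=0),
}

# .loc assigns in place, so work on a copy.
df = df_orig.copy()
df.loc[df["value"] < 0, "value"] = 0
results["loc"] = df["value"]

expected = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
for name, result in results.items():
    assert result.tolist() == expected, name
```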
Answer by Max Ghenis
For completeness, np.where is also a possibility, and it is faster than most answers here. The np.maximum answer is the best approach though, as it's faster and more concise than this.
df['value'] = np.where(df.value < 0, 0, df.value)
Answer by U10-Forward
Or use where to check:
>>> import pandas as pd,numpy as np
>>> df = pd.DataFrame(np.random.randn(5,1),columns=['value'])
>>> df
value
0 1.193313
1 -1.011003
2 -0.399778
3 -0.736607
4 -0.629540
>>> df['value']=df['value'].where(df['value']>0,0)
>>> df
value
0 1.193313
1 0.000000
2 0.000000
3 0.000000
4 0.000000
>>>
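`Series.where` keeps the values where the condition holds and substitutes the second argument elsewhere; `Series.mask` is its inverse and replaces the values where the condition holds. A short sketch on made-up data showing both spellings:

```python
import pandas as pd

# Hypothetical example data.
s = pd.Series([1.5, -1.0, 0.0, -0.25], name="value")

# where: keep values satisfying the condition, use 0 for the rest.
kept = s.where(s > 0, 0)

# mask: replace values satisfying the condition, keep the rest.
masked = s.mask(s < 0, 0)

print(kept.tolist())
print(masked.tolist())
```

Both give the same result here, so the choice is mostly about which condition reads more naturally.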
Answer by Coolkau
Let's keep only the values greater than zero, which leaves the negative ones as NaN (this works with DataFrames, not with Series, where boolean indexing drops the non-matching rows instead), then fill the NaNs with zero.
df[df > 0].fillna(0)
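The difference between frames and series mentioned above can be seen in a small sketch on made-up data. Note that `fillna(0)` would also overwrite any NaN that was already present in the data, so this approach assumes the column has no missing values to begin with.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"value": [2.0, -1.0, 3.0, -4.0]})

# Boolean indexing on a DataFrame keeps the shape and puts NaN where the mask is False.
frame_result = df[df > 0].fillna(0)

# The same indexing on a Series drops the non-matching rows instead.
series_result = df["value"][df["value"] > 0]

print(frame_result["value"].tolist())
print(len(series_result))
```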

