Return max of zero or value for a pandas DataFrame column

Question

Asked by bjornarneson

I want to replace negative values in a pandas DataFrame column with zero.

Is there a more concise way to construct this expression?

df['value'][df['value'] < 0] = 0

Answer 1

Accepted answer by Jeff

Here is the canonical way of doing it. While not necessarily more concise, it is more flexible (in that you can apply it to arbitrary columns):

In [39]: df = DataFrame(randn(5,1),columns=['value'])

In [40]: df
Out[40]: 
      value
0  0.092232
1 -0.472784
2 -1.857964
3 -0.014385
4  0.301531

In [41]: df.loc[df['value']<0,'value'] = 0

In [42]: df
Out[42]: 
      value
0  0.092232
1  0.000000
2  0.000000
3  0.000000
4  0.301531
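
To illustrate the "arbitrary columns" point, here is a minimal sketch that applies the same boolean-mask assignment to several columns in a loop. The frame and its column names (`value`, `other`, `label`) are made up for the example and are not from the answer.

```python
import pandas as pd

# Hypothetical example frame; the column names are illustrative only.
df = pd.DataFrame({
    "value": [0.09, -0.47, -1.86, -0.01, 0.30],
    "other": [-1.0, 2.0, -3.0, 4.0, 5.0],
    "label": ["a", "b", "c", "d", "e"],
})

# Apply the same mask-and-assign step to each numeric column of interest.
for col in ["value", "other"]:
    df.loc[df[col] < 0, col] = 0

print(df)
```

Because the row mask and the column label are passed to `.loc` together, the assignment is a single indexing operation on `df` itself rather than on a temporary copy, which is what the chained form in the question risks.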

Answer 2

Answered by unutbu

You could use the clip method:

import pandas as pd
import numpy as np
df = pd.DataFrame({'value': np.arange(-5,5)})
df['value'] = df['value'].clip(0, None)
print(df)

yields

   value
0      0
1      0
2      0
3      0
4      0
5      0
6      1
7      2
8      3
9      4
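
As a small variation, the lower bound can also be passed by keyword, and `DataFrame.clip` applies the same bound to every column at once. The sketch below assumes an all-numeric frame; the second frame (`frame`, with columns `a` and `b`) is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"value": np.arange(-5, 5)})

# Keyword form: only a lower bound is given, so there is no upper bound.
clipped = df["value"].clip(lower=0)

# DataFrame.clip applies the bound to every column (hypothetical data).
frame = pd.DataFrame({"a": [-1, 2, -3], "b": [4, -5, 6]})
frame_clipped = frame.clip(lower=0)

print(clipped.tolist())
print(frame_clipped)
```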

Answer 3

Answered by Dorian B.

Another possibility is numpy.maximum(). In my opinion, this is more straightforward to read.

import pandas as pd
import numpy as np
df['value'] = np.maximum(df.value, 0)

In the timings below, it is also the fastest of the methods compared, and much faster than the .loc assignment.

df_orig = pd.DataFrame({'value': np.arange(-1000000, 1000000)})

df = df_orig.copy()
%timeit df['value'] = np.maximum(df.value, 0)
# 100 loops, best of 3: 8.36 ms per loop

df = df_orig.copy()
%timeit df['value'] = np.where(df.value < 0, 0, df.value)
# 100 loops, best of 3: 10.1 ms per loop

df = df_orig.copy()
%timeit df['value'] = df.value.clip(0, None)
# 100 loops, best of 3: 14.1 ms per loop

df = df_orig.copy()
%timeit df['value'] = df.value.clip_lower(0)
# 100 loops, best of 3: 14.2 ms per loop

df = df_orig.copy()
%timeit df.loc[df.value < 0, 'value'] = 0
# 10 loops, best of 3: 62.7 ms per loop

(notebook)
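
Before comparing speed, it can be worth confirming that the approaches agree. This is a minimal sketch on a smaller frame than the benchmark uses; the variable names are chosen for the example, and `clip_lower` is left out because it is not available in recent pandas versions.

```python
import numpy as np
import pandas as pd

df_orig = pd.DataFrame({"value": np.arange(-1000, 1000)})

# np.maximum on a Series returns a Series with the same index.
via_maximum = np.maximum(df_orig["value"], 0)

# np.where returns a NumPy array, so wrap it to compare like with like.
via_where = pd.Series(
    np.where(df_orig["value"] < 0, 0, df_orig["value"]),
    index=df_orig.index,
)

via_clip = df_orig["value"].clip(lower=0)

# The .loc assignment modifies in place, so work on a copy.
df_loc = df_orig.copy()
df_loc.loc[df_loc["value"] < 0, "value"] = 0
via_loc = df_loc["value"]

results = [via_maximum, via_where, via_clip, via_loc]
all_equal = all(
    (r.to_numpy() == via_maximum.to_numpy()).all() for r in results
)
print(all_equal)
```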

Answer 4

Answered by Max Ghenis

For completeness, np.where is also a possibility, and it is faster than most answers here. The np.maximum answer is the best approach though, as it's faster and more concise than this.

df['value'] = np.where(df.value < 0, 0, df.value)
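
One detail to keep in mind with this approach: `np.where` returns a plain NumPy array rather than a Series. The sketch below, using a made-up frame with string index labels, shows that assigning the array back to a column of the same length still works, since the values are placed by position.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a non-default index.
df = pd.DataFrame({"value": [-2.0, 3.0, -1.0]}, index=["x", "y", "z"])

result = np.where(df["value"] < 0, 0, df["value"])

# The result is an ndarray, so it carries no index labels of its own.
print(type(result).__name__)

# Assigning an array of matching length fills the column by position.
df["value"] = result
print(df)
```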

Answer 5

Answered by Max Ghenis

df.value.clip_lower(0, inplace=True) is most concise, and is just about as fast as np.maximum, certainly faster than other methods here (notebook). Note that clip_lower was deprecated in pandas 0.24 and removed in pandas 1.0; on current versions, clip(lower=0) performs the same clipping.

Answer 6

Answered by U10-Forward

Or use where to check:

>>> import pandas as pd,numpy as np
>>> df = pd.DataFrame(np.random.randn(5,1),columns=['value'])
>>> df
      value
0  1.193313
1 -1.011003
2 -0.399778
3 -0.736607
4 -0.629540
>>> df['value']=df['value'].where(df['value']>0,0)
>>> df
      value
0  1.193313
1  0.000000
2  0.000000
3  0.000000
4  0.000000
>>>
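
`Series.where` keeps values where the condition is True and substitutes the second argument elsewhere; `Series.mask` is its mirror image and replaces where the condition is True. A minimal sketch with made-up values:

```python
import pandas as pd

s = pd.Series([1.19, -1.01, 0.0, -0.74])  # hypothetical values

# Keep entries where the condition holds; use 0 everywhere else.
kept = s.where(s > 0, 0)

# Replace entries where the condition holds; keep everything else.
masked = s.mask(s < 0, 0)

print(kept.tolist())
print(masked.tolist())
```

Which of the two reads better depends on whether it is more natural to state the condition for the values to keep or for the values to replace.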

Answer 7

Answered by Coolkau

Let's keep only the values greater than zero, leaving the negative ones as NaN (this works with DataFrames, not with Series), then fill the NaNs with zero.

df[df > 0].fillna(0)
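
Two properties of this expression are easy to miss, so here is a short sketch on a hypothetical two-column frame: the expression returns a new frame and leaves `df` untouched, and because `fillna(0)` cannot tell the two kinds of NaN apart, any NaN already present in the data also becomes zero.

```python
import numpy as np
import pandas as pd

# Hypothetical data, including one pre-existing NaN.
df = pd.DataFrame({"a": [1.0, -2.0, np.nan], "b": [-1.0, 0.5, 3.0]})

# Boolean-frame indexing turns non-matching cells into NaN, then fills them.
result = df[df > 0].fillna(0)

# For comparison: clip leaves a pre-existing NaN as NaN.
clipped = df.clip(lower=0)

print(result)
print(clipped)
```

If missing values should stay missing, the clip-based variant may be the safer choice.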
