Python: How to replace negative numbers in a Pandas DataFrame with zero

Notice: this page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/27759084/

Date: 2020-08-19 02:13:35  Source: igfitidea

How to replace negative numbers in Pandas Data Frame by zero

python pandas dataframe replace negative-number

Asked by Hangon

I would like to know if there is some way of replacing all negative numbers in a DataFrame with zeros.

Accepted answer by Lev Levitsky

If all your columns are numeric, you can use boolean indexing:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [3]: df
Out[3]: 
   a  b
0  0 -3
1 -1  2
2  2  1

In [4]: df[df < 0] = 0

In [5]: df
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1


For the more general case, this answer shows the private method _get_numeric_data:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                           'c': ['foo', 'goo', 'bar']})

In [3]: df
Out[3]: 
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar

In [4]: num = df._get_numeric_data()

In [5]: num[num < 0] = 0

In [6]: df
Out[6]: 
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
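
Because _get_numeric_data is private, it may change between pandas versions. A minimal sketch of the same idea using only public API, assuming DataFrame.select_dtypes and DataFrame.clip are acceptable substitutes (the variable name num_cols is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Names of the numeric columns, found through the public API.
num_cols = df.select_dtypes(include='number').columns

# Clip only those columns at zero and assign the result back;
# the string column 'c' is left untouched.
df[num_cols] = df[num_cols].clip(lower=0)

print(df)
```

Unlike the in-place boolean assignment above, this builds a clipped copy of the numeric columns and writes it back explicitly.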


With timedelta type, boolean indexing seems to work on separate columns, but not on the whole DataFrame. So you can do:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df
Out[3]: 
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days

In [4]: for k, v in df.items():
   ...:     v[v < 0] = 0
   ...:     

In [5]: df
Out[5]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days


Update: comparison with a pd.Timedelta works on the whole DataFrame:

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})

In [3]: df[df < pd.Timedelta(0)] = 0

In [4]: df
Out[4]: 
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
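
As a non-mutating variant of the comparison above, here is a hedged sketch that uses DataFrame.mask with an explicit pd.Timedelta(0) as the replacement value rather than the integer 0. The names zero and result are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], unit='D'),
                   'b': pd.to_timedelta([-3, 2, 1], unit='D')})

zero = pd.Timedelta(0)

# mask replaces the entries where the condition is True,
# so every negative duration becomes a zero-length Timedelta.
result = df.mask(df < zero, zero)

print(result)
```

Using a Timedelta as the fill value keeps the replacement the same kind of object as the rest of the column; df itself is not modified.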

Answer by aus_lacy

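Perhaps you could use pandas.DataFrame.where like so. Note that where keeps the values for which the condition is True and replaces the rest, so the condition has to select the non-negative values:

data_frame = data_frame.where(data_frame >= 0, 0)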
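
To make the direction of the condition concrete, the sketch below runs where and its mirror image mask on a small made-up frame that also contains a NaN. The frame and the names kept_nonneg and masked_neg are assumptions for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [0.0, -1.0, np.nan], 'b': [-3.0, 2.0, 1.0]})

# where: keep entries where the condition is True, replace the others.
# NaN fails `>= 0`, so the missing value is replaced by 0 as well.
kept_nonneg = df.where(df >= 0, 0)

# mask: replace entries where the condition is True, keep the others.
# NaN fails `< 0`, so the missing value stays NaN.
masked_neg = df.mask(df < 0, 0)

print(kept_nonneg)
print(masked_neg)
```

The two calls agree on ordinary numbers and differ only in how they treat missing values, which may matter when choosing between them.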

Answer by follyroof

Another succinct way of doing this is pandas.DataFrame.clip.

For example:

import pandas as pd

In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})

In [21]: df
Out[21]: 
     a
0   -1
1  100
2   -2

In [22]: df.clip(lower=0)
Out[22]: 
     a
0    0
1  100
2    0

Older pandas versions also had df.clip_lower(0). It was deprecated in pandas 0.24 and removed in 1.0, so prefer df.clip(lower=0).

Answer by MarKo9

If you are dealing with a large DataFrame (40m x 700 in my case), iterating over the columns works much faster and uses less memory, with something like:

for col in df.columns:
    # Use .loc rather than chained indexing so the assignment reliably writes to df
    df.loc[df[col] < 0, col] = 0
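
A column loop like this fails on string columns, because comparing strings with 0 raises a TypeError. The following is a small sketch, assuming a mixed-type frame, that skips non-numeric columns with pandas.api.types.is_numeric_dtype (the example frame is made up):

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

for col in df.columns:
    # Only numeric columns can be compared with 0.
    if is_numeric_dtype(df[col]):
        # One boolean mask per column keeps peak memory low on wide frames.
        df.loc[df[col] < 0, col] = 0

print(df)
```

Whether this is actually faster than a whole-frame operation depends on the data and the pandas version, so it is worth timing on your own frame.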

Answer by Michael Conlin

Another clean option that I have found useful is pandas.DataFrame.mask, which will "replace values where the condition is true."

Create the DataFrame:

In [2]: import pandas as pd

In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

In [4]: df
Out[4]: 
   a  b
0  0 -3
1 -1  2
2  2  1

Replace negative numbers with 0:

In [5]: df.mask(df < 0, 0)
Out[5]: 
   a  b
0  0  0
1  0  2
2  2  1

Or, replace negative numbers with NaN, which I frequently need:

In [7]: df.mask(df < 0)
Out[7]: 
     a    b
0  0.0  NaN
1  NaN  2.0
2  2.0  1.0
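
One possible reason to prefer NaN over 0, sketched here as an assumption about the use case rather than something the answer states: pandas aggregations skip NaN by default, so the masked values drop out of statistics such as the mean instead of pulling them toward zero.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# Negative entries become NaN; the columns are upcast to float.
masked = df.mask(df < 0)

# mean() ignores NaN by default, so only the non-negative values count.
means = masked.mean()

print(means)
```

Here column a averages 0 and 2, and column b averages 2 and 1, so the negative entries have no influence on the result.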