How to replace negative numbers in a Pandas DataFrame with zero (Python)
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/27759084/
Asked by Hangon
I would like to know if there is some way of replacing all negative numbers in a DataFrame with zeros.
Accepted answer by Lev Levitsky
If all your columns are numeric, you can use boolean indexing:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [3]: df
Out[3]:
   a  b
0  0 -3
1 -1  2
2  2  1
In [4]: df[df < 0] = 0
In [5]: df
Out[5]:
   a  b
0  0  0
1  0  2
2  2  1
For the more general case, this answer shows the private method _get_numeric_data:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
   ...:                    'c': ['foo', 'goo', 'bar']})
In [3]: df
Out[3]:
   a  b    c
0  0 -3  foo
1 -1  2  goo
2  2  1  bar
In [4]: num = df._get_numeric_data()
In [5]: num[num < 0] = 0
In [6]: df
Out[6]:
   a  b    c
0  0  0  foo
1  0  2  goo
2  2  1  bar
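Since _get_numeric_data is private, and the in-place change above relies on num sharing memory with df, a version built only on public API may be safer. The following is a minimal sketch, not part of the original answer; it assumes select_dtypes(include='number') picks out the columns you want, and the name num_cols is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1],
                   'c': ['foo', 'goo', 'bar']})

# Names of the numeric columns, found through the public select_dtypes API.
num_cols = df.select_dtypes(include='number').columns

# Replace negatives with 0 in those columns only; column 'c' is left alone.
df[num_cols] = df[num_cols].mask(df[num_cols] < 0, 0)

print(df)
```

The result is assigned back to df explicitly, so it does not depend on whether the selected columns are a view or a copy.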
With the timedelta type, boolean indexing seems to work on separate columns, but not on the whole DataFrame. So you can do:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df
Out[3]:
        a       b
0  0 days -3 days
1 -1 days  2 days
2  2 days  1 days
In [4]: for col in df.columns:
   ...:     df.loc[df[col] < pd.Timedelta(0), col] = pd.Timedelta(0)
   ...:
In [5]: df
Out[5]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Update: comparison with a pd.Timedelta works on the whole DataFrame:
In [1]: import pandas as pd
In [2]: df = pd.DataFrame({'a': pd.to_timedelta([0, -1, 2], 'd'),
   ...:                    'b': pd.to_timedelta([-3, 2, 1], 'd')})
In [3]: df[df < pd.Timedelta(0)] = pd.Timedelta(0)
In [4]: df
Out[4]:
       a      b
0 0 days 0 days
1 0 days 2 days
2 2 days 1 days
Answer by aus_lacy
Perhaps you could use pandas.DataFrame.where. It keeps the values where the condition is True and replaces the rest, so the condition has to select the non-negative values:
data_frame = data_frame.where(data_frame >= 0, 0)
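Because DataFrame.where keeps the entries where the condition is True and replaces everything else, the direction of the comparison decides which values survive. A small sketch on made-up data (the variable names are illustrative only) shows both directions side by side:

```python
import pandas as pd

data_frame = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# Condition is True for non-negative values, so negatives become 0.
kept_non_negative = data_frame.where(data_frame >= 0, 0)

# Condition is True for negative values, so everything else becomes 0.
kept_negative = data_frame.where(data_frame < 0, 0)

print(kept_non_negative)
print(kept_negative)
```

Only the first call replaces the negative numbers with zero. Note that a NaN fails both comparisons, so either call would also turn NaN into 0.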
Answer by follyroof
Another succinct way of doing this is pandas.DataFrame.clip.
For example:
In [19]: import pandas as pd
In [20]: df = pd.DataFrame({'a': [-1, 100, -2]})
In [21]: df
Out[21]:
     a
0   -1
1  100
2   -2
In [22]: df.clip(lower=0)
Out[22]:
     a
0    0
1  100
2    0
Older pandas versions also had df.clip_lower(0), but it was deprecated in pandas 0.24 and removed in 1.0, so use df.clip(lower=0) instead.
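clip returns a new DataFrame rather than changing df, so the result needs to be assigned to keep it. A short sketch, assuming an all-numeric frame with a made-up float column b that contains a missing value:

```python
import pandas as pd

df = pd.DataFrame({'a': [-1, 100, -2],
                   'b': [-0.5, 2.5, float('nan')]})

# clip(lower=0) raises every value below 0 up to 0 and returns a new frame.
clipped = df.clip(lower=0)

print(clipped)
print(df)  # the original frame still holds the negative values
```

Missing values pass through clip unchanged, which differs from a comparison such as df >= 0, where NaN counts as False.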
Answer by MarKo9
If you are dealing with a large df (40m x 700 in my case), iterating over the columns is much faster and uses less memory, with something like:
for col in df.columns:
    df.loc[df[col] < 0, col] = 0
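The loop above assumes that every column can be compared with 0. As a hedged sketch for frames that also hold text columns, one option is to check each column with pandas.api.types.is_numeric_dtype first; the example data is made up.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3.5, 2.0, 1.0],
                   'c': ['foo', 'goo', 'bar']})

for col in df.columns:
    # Skip columns where a comparison with 0 makes no sense.
    if is_numeric_dtype(df[col]):
        # A single .loc assignment avoids chained indexing.
        df.loc[df[col] < 0, col] = 0

print(df)
```

Each pass builds a boolean mask for one column only, which is what keeps the peak memory use low on very wide frames.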
Answer by Michael Conlin
Another clean option that I have found useful is pandas.DataFrame.mask, which will "replace values where the condition is true."
Create the DataFrame:
In [2]: import pandas as pd
In [3]: df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})
In [4]: df
Out[4]:
   a  b
0  0 -3
1 -1  2
2  2  1
Replace negative numbers with 0:
In [5]: df.mask(df < 0, 0)
Out[5]:
   a  b
0  0  0
1  0  2
2  2  1
Or, replace negative numbers with NaN, which I frequently need:
In [7]: df.mask(df < 0)
Out[7]:
     a    b
0  0.0  NaN
1  NaN  2.0
2  2.0  1.0
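The output above shows the integer columns turning into floats, because NaN cannot be stored in a plain int64 column. If integers should be preserved, one possible approach (a sketch, not part of the original answer) is to convert to the nullable Int64 dtype before masking:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, -1, 2], 'b': [-3, 2, 1]})

# Plain mask: missing values force the columns to float64.
masked = df.mask(df < 0)
print(masked.dtypes)

# Nullable integers: masked entries become <NA> and the dtype stays Int64.
nullable = df.astype('Int64').mask(df < 0)
print(nullable.dtypes)
print(nullable)
```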