How to replace a range of values with NaN in a Pandas data-frame?

Notice: this page reproduces a popular StackOverFlow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/40159763/

Date: 2020-09-14 02:15:25  Source: igfitidea

Tags: python, pandas, dataframe

Asked by Mat_python

I have a huge data-frame. How should I replace a range of values (-200, -100) with NaN?

Answered by jpp

dataframe

You can use pd.DataFrame.mask:

df.mask((df >= -200) & (df <= -100), inplace=True)

This method replaces the elements identified by True values in a Boolean array with a specified value, defaulting to NaN if no value is specified.
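
As a minimal sketch of the behaviour described above, the following uses a small made-up frame (the column names and values are illustrative assumptions, not data from the question) and the non-in-place form of mask:

```python
import pandas as pd

# Hypothetical sample data, chosen so that some values fall inside [-200, -100].
df = pd.DataFrame({"a": [-250, -150, 10], "b": [-100, -99, -200]})

# Boolean frame: True where a value lies in the closed interval [-200, -100].
in_range = (df >= -200) & (df <= -100)

# mask() replaces the True positions with NaN, its default replacement value.
# Without inplace=True it returns a new frame and leaves df untouched.
masked = df.mask(in_range)
print(masked)
```

Because NaN is a floating-point value, integer columns that receive a NaN are converted to float in the result.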

Equivalently, use pd.DataFrame.where with the inverse condition:

df.where((df < -200) | (df > -100), inplace=True)
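
To check that the two forms agree, here is a short sketch on a hypothetical frame (the data is an assumption made for illustration). where keeps the values for which the condition is True, so it needs the negated range test:

```python
import pandas as pd

# Hypothetical sample data for the comparison.
df = pd.DataFrame({"a": [-250, -150, 10], "b": [-100, -99, -200]})

# mask: replace values INSIDE [-200, -100] with NaN.
via_mask = df.mask((df >= -200) & (df <= -100))

# where: keep values OUTSIDE [-200, -100]; everything else becomes NaN.
via_where = df.where((df < -200) | (df > -100))

# DataFrame.equals treats NaNs in the same positions as equal.
same = via_mask.equals(via_where)
print(same)
```

For this data the comparison should print True.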

series

As with many methods, Pandas helpfully includes versions which work with a series rather than an entire dataframe. So, for a column df['A'], you can use pd.Series.mask with pd.Series.between:

df['A'].mask(df['A'].between(-200, -100), inplace=True)

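For chaining, note that inplace=False by default, so you can also assign the result back. This form is also the more robust choice, since in recent pandas versions an in-place call on df['A'] may not propagate to df: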

df['A'] = df['A'].mask(df['A'].between(-200, -100))
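
A small sketch of the single-column approach, again on made-up data (the frame and its second column "B" are assumptions for illustration). Series.between includes both endpoints by default, so -200 and -100 themselves are replaced as well:

```python
import pandas as pd

# Hypothetical frame: only column "A" should be modified.
df = pd.DataFrame({
    "A": [-200, -150, -100, -99],
    "B": [-150, -150, -150, -150],
})

# between() is inclusive on both ends by default.
in_range = df["A"].between(-200, -100)

# Assign the masked column back instead of relying on inplace=True.
df["A"] = df["A"].mask(in_range)
print(df)
```

Column "B" is left unchanged even though its values fall inside the range, because the mask is applied to "A" only.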

Answered by MaxU

You can do it this way:

In [145]: df = pd.DataFrame(np.random.randint(-250, 50, (10, 3)), columns=list('abc'))

In [146]: df
Out[146]:
     a    b    c
0 -188  -63 -228
1  -59  -70  -66
2 -110   39 -146
3  -67 -228 -232
4  -22 -180 -140
5 -191 -136 -188
6  -59  -30 -128
7 -201 -244 -195
8 -248  -30  -25
9   11    1   20

In [148]: df[(df>=-200) & (df<=-100)] = np.nan

In [149]: df
Out[149]:
       a      b      c
0    NaN  -63.0 -228.0
1  -59.0  -70.0  -66.0
2    NaN   39.0    NaN
3  -67.0 -228.0 -232.0
4  -22.0    NaN    NaN
5    NaN    NaN    NaN
6  -59.0  -30.0    NaN
7 -201.0 -244.0    NaN
8 -248.0  -30.0  -25.0
9   11.0    1.0   20.0