Python Pandas Dataframe 替换低于阈值的值

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/43894927/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 03:34:50  来源:igfitidea点击:

Python Pandas Dataframe replace values below treshold

pythonpandasdataframe

提问by J-H

How can I apply a function element-wise to a pandas DataFrame and pass a column-wise calculated value (e.g. quantile of column)? For example, what if I want to replace all elements in a DataFrame (with NaN) where the value is lower than the 80th percentile of the column?

如何将函数逐元素应用于 Pandas DataFrame 并传递逐列计算值(例如列的分位数)?例如,如果我想替换 DataFrame 中的所有元素(用NaN),其中值低于列的第 80 个百分位怎么办?

def _deletevalues(x, quantile):
if x < quantile:
    return np.nan
else:
    return x

df.applymap(lambda x: _deletevalues(x, x.quantile(0.8)))

Using applymaponly allows one to access each value individually and throws (of course) an AttributeError: ("'float' object has no attribute 'quantile'

Using applymaponly 允许单独访问每个值并抛出(当然)一个AttributeError: ("'float' object has no attribute 'quantile'

Thank you in advance.

先感谢您。

采纳答案by MaxU

In [139]: df
Out[139]:
   a  b  c
0  1  7  3
1  1  2  6
2  3  0  5
3  8  2  1
4  7  3  5
5  6  7  2
6  0  2  1
7  8  4  1
8  5  0  6
9  7  7  6

for allcolumns:

对于所有列:

In [145]: df.apply(lambda x: np.where(x < x.quantile(),np.nan,x))
Out[145]:
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0

or

或者

In [149]: df[df < df.quantile()] = np.nan

In [150]: df
Out[150]:
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0

回答by jezrael

Use DataFrame.mask:

使用DataFrame.mask

df = df.mask(df < df.quantile())
print (df)
     a    b    c
0  NaN  7.0  NaN
1  NaN  NaN  6.0
2  NaN  NaN  5.0
3  8.0  NaN  NaN
4  7.0  3.0  5.0
5  6.0  7.0  NaN
6  NaN  NaN  NaN
7  8.0  4.0  NaN
8  NaN  NaN  6.0
9  7.0  7.0  6.0