Python Pandas Dataframe 替换低于阈值的值
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/43894927/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Python Pandas Dataframe replace values below treshold
提问by J-H
How can I apply a function element-wise to a pandas DataFrame and pass a column-wise calculated value (e.g. quantile of column)? For example, what if I want to replace all elements in a DataFrame (with NaN
) where the value is lower than the 80th percentile of the column?
如何将函数逐元素应用于 Pandas DataFrame 并传递逐列计算值(例如列的分位数)?例如,如果我想替换 DataFrame 中的所有元素(用NaN
),其中值低于列的第 80 个百分位怎么办?
def _deletevalues(x, quantile):
if x < quantile:
return np.nan
else:
return x
df.applymap(lambda x: _deletevalues(x, x.quantile(0.8)))
Using applymap
only allows one to access each value individually and throws (of course) an AttributeError: ("'float' object has no attribute 'quantile'
Using applymap
only 允许单独访问每个值并抛出(当然)一个AttributeError: ("'float' object has no attribute 'quantile'
Thank you in advance.
先感谢您。
采纳答案by MaxU
In [139]: df
Out[139]:
a b c
0 1 7 3
1 1 2 6
2 3 0 5
3 8 2 1
4 7 3 5
5 6 7 2
6 0 2 1
7 8 4 1
8 5 0 6
9 7 7 6
for allcolumns:
对于所有列:
In [145]: df.apply(lambda x: np.where(x < x.quantile(),np.nan,x))
Out[145]:
a b c
0 NaN 7.0 NaN
1 NaN NaN 6.0
2 NaN NaN 5.0
3 8.0 NaN NaN
4 7.0 3.0 5.0
5 6.0 7.0 NaN
6 NaN NaN NaN
7 8.0 4.0 NaN
8 NaN NaN 6.0
9 7.0 7.0 6.0
or
或者
In [149]: df[df < df.quantile()] = np.nan
In [150]: df
Out[150]:
a b c
0 NaN 7.0 NaN
1 NaN NaN 6.0
2 NaN NaN 5.0
3 8.0 NaN NaN
4 7.0 3.0 5.0
5 6.0 7.0 NaN
6 NaN NaN NaN
7 8.0 4.0 NaN
8 NaN NaN 6.0
9 7.0 7.0 6.0
回答by jezrael
Use DataFrame.mask
:
df = df.mask(df < df.quantile())
print (df)
a b c
0 NaN 7.0 NaN
1 NaN NaN 6.0
2 NaN NaN 5.0
3 8.0 NaN NaN
4 7.0 3.0 5.0
5 6.0 7.0 NaN
6 NaN NaN NaN
7 8.0 4.0 NaN
8 NaN NaN 6.0
9 7.0 7.0 6.0