Original question: http://stackoverflow.com/questions/21608228/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Conditional Replace Pandas
Asked by BMichell
I'm probably doing something very stupid, but I'm stumped.
I have a dataframe, and I want to replace the values in a particular column that exceed a given value with zero. I had thought this would achieve it:
df[df.my_channel > 20000].my_channel = 0
If I copy the channel into a new data frame it's simple:
df2 = df.my_channel
df2[df2 > 20000] = 0
This does exactly what I want, but it does not seem to work with the channel as part of the original dataframe.
Accepted answer by lmiguelvargasf
The .ix indexer works for pandas versions prior to 0.20.0, but it has been deprecated since pandas 0.20.0 (and was removed in pandas 1.0), so you should avoid it. Instead, use the .loc or .iloc indexers. You can solve this problem with:
mask = df.my_channel > 20000
column_name = 'my_channel'
df.loc[mask, column_name] = 0
Or, in one line,
df.loc[df.my_channel > 20000, 'my_channel'] = 0
mask selects the rows in which df.my_channel > 20000 is True, while df.loc[mask, column_name] = 0 sets the value 0 in those rows for the column named column_name.
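To make the explanation above concrete, here is a minimal, self-contained sketch. The sample values and the extra column `other` are assumptions for illustration; only `my_channel` and the 20000 threshold come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from the question.
df = pd.DataFrame({
    "my_channel": [100, 25000, 15000, 30000],
    "other": ["a", "b", "c", "d"],
})

# Boolean Series: True where the value exceeds the threshold.
mask = df["my_channel"] > 20000

# Rows are chosen by the mask, the column by its label.
df.loc[mask, "my_channel"] = 0

print(df["my_channel"].tolist())
```

Only the masked rows of `my_channel` change; the other rows and columns are left as they were.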
Update: In this case, you should use loc, because if you pass this mask to iloc you will get a NotImplementedError telling you that iLocation based boolean indexing on an integer type is not available.
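If positional indexing is needed anyway, one possible workaround (not part of the original answer) is to hand iloc a plain Boolean array instead of a Boolean Series, together with the integer position of the column. The sample data and the name `col_pos` below are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 15000, 30000]})

mask = df["my_channel"] > 20000

# Integer position of the column, since iloc works with positions, not labels.
col_pos = df.columns.get_loc("my_channel")

# iloc accepts a plain Boolean NumPy array for the row selection.
df.iloc[mask.to_numpy(), col_pos] = 0

print(df["my_channel"].tolist())
```

For this task loc remains the more direct choice, since it takes the mask and the column label as they are.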
Answer by lowtech
Try:
df.loc[df.my_channel > 20000, 'my_channel'] = 0
Note: Since v0.20.0, ix has been deprecated in favour of loc / iloc.
Answer by seeiespi
Answer by cyber-math
I would use a lambda function on a Series of a DataFrame like this:
f = lambda x: 0 if x>100 else 1
df['my_column'] = df['my_column'].map(f)
I do not assert that this is an efficient way, but it works fine.
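Note that the lambda in this answer maps every value to either 0 or 1. To apply the same idea to the question, the else branch would need to return the value itself. A minimal sketch, with hypothetical sample data and the made-up helper name `zero_if_large`:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 15000, 30000]})

# Return 0 above the threshold, otherwise keep the original value.
zero_if_large = lambda x: 0 if x > 20000 else x

# map calls the function once per element and returns a new Series.
df["my_channel"] = df["my_channel"].map(zero_if_large)

print(df["my_channel"].tolist())
```

Because the function is called once per element in Python, this is likely to be slower on large columns than the vectorised loc or mask forms.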
Answer by jpp
Your original dataframe does not update because chained indexing may cause you to modify a copy rather than a view of your dataframe. The docs give this advice:
When setting values in a pandas object, care must be taken to avoid what is called chained indexing.
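The effect can be sketched as follows. The chained expression is split into two statements here so the intermediate object is visible; the sample data and the names `subset`, `unchanged` and `updated` are assumptions for illustration.

```python
import warnings

import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 15000, 30000]})

# Two-step form: the Boolean selection produces a new object, so the
# assignment lands on that object and never reaches df.
with warnings.catch_warnings():
    # Some pandas versions emit SettingWithCopyWarning at this point.
    warnings.simplefilter("ignore")
    subset = df[df["my_channel"] > 20000]
    subset["my_channel"] = 0

unchanged = df["my_channel"].tolist()

# Single-step form: one loc call both selects and assigns on df itself.
df.loc[df["my_channel"] > 20000, "my_channel"] = 0
updated = df["my_channel"].tolist()

print(unchanged)
print(updated)
```

The first list still holds the original values, while the second reflects the replacement.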
You have a few alternatives:
loc + Boolean indexing
loc may be used for setting values and supports Boolean masks:
df.loc[df['my_channel'] > 20000, 'my_channel'] = 0
mask + Boolean indexing
You can assign to your series:
df['my_channel'] = df['my_channel'].mask(df['my_channel'] > 20000, 0)
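Or you can update your series in place (with Copy-on-Write, the default from pandas 3.0, an in-place call on df['my_channel'] does not propagate to df, so the assignment form above is the safer choice):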
df['my_channel'].mask(df['my_channel'] > 20000, 0, inplace=True)
np.where + Boolean indexing
You can use NumPy by assigning your original series when your condition is not satisfied; however, the first two solutions are cleaner since they explicitly change only the specified values.
import numpy as np
df['my_channel'] = np.where(df['my_channel'] > 20000, 0, df['my_channel'])
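As a quick check that the three alternatives agree, the sketch below runs each of them on the same hypothetical data. The names `cond`, `via_loc`, `via_mask`, `via_where` and `same` are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"my_channel": [100, 25000, 15000, 30000]})
cond = df["my_channel"] > 20000

# 1. loc + Boolean mask, applied to a copy so df stays available for comparison.
via_loc = df.copy()
via_loc.loc[cond, "my_channel"] = 0

# 2. Series.mask replaces values where the condition is True.
via_mask = df["my_channel"].mask(cond, 0)

# 3. np.where picks 0 where the condition is True, else the original value.
#    It returns a NumPy array rather than a Series.
via_where = np.where(cond, 0, df["my_channel"])

same = via_loc["my_channel"].tolist() == via_mask.tolist() == via_where.tolist()
print(same)
```

Since np.where returns a plain array, assigning it back to the column relies on positional alignment rather than on the index.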
Answer by R. Shams
Try this:
df.my_channel = df.my_channel.where(df.my_channel <= 20000, other=0)
or
df.my_channel = df.my_channel.mask(df.my_channel > 20000, other=0)
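The two forms are complementary: where keeps values for which the condition is True, and mask replaces them. They give the same result on complete data, but they can differ when the column contains missing values, because any comparison with NaN is False. A small sketch with hypothetical data:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series([100.0, np.nan, 30000.0])

# where: keep values where the condition is True, replace the rest.
# NaN <= 20000 is False, so the NaN is replaced by 0 as well.
kept_small = s.where(s <= 20000, other=0)

# mask: replace values where the condition is True, keep the rest.
# NaN > 20000 is False, so the NaN is kept.
zeroed_big = s.mask(s > 20000, other=0)

print(kept_small.tolist())
print(zeroed_big.tolist())
```

Which behaviour is preferable depends on whether missing values should be treated as exceeding the threshold.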

