Notice: This page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/29608183/

Date: 2020-09-13 23:11:42  Source: igfitidea

Pandas indexing by both boolean `loc` and subsequent `iloc`

python pandas

Asked by tsawallis

I want to index a Pandas dataframe using a boolean mask, then set a value in a subset of the filtered dataframe based on an integer index, and have this value reflected in the dataframe. That is, I would be happy if this worked on a view of the dataframe.

Example:

In [293]:

import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})

mask = (df['a'] < 7) & (df['b'] == 2)
df.loc[mask, 'c']

Out[293]:
2    0
3    0
6    0
Name: c, dtype: int64

Now I would like to set the values of the first two elements returned in the filtered dataframe. Chaining an `iloc` onto the `loc` call above works for indexing:

In [294]:

df.loc[mask, 'c'].iloc[0: 2]

Out[294]:

2    0
3    0
Name: c, dtype: int64

But not to assign:

In [295]:

df.loc[mask, 'c'].iloc[0: 2] = 1

print(df)

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  0
3  3  2  0
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

Making the assigned value the same length as the slice (i.e. `= [1, 1]`) also doesn't work. Is there a way to assign these values?
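
A likely reason the chained form fails is that selecting with a boolean mask returns a new object rather than a view, so the `iloc` assignment lands on a temporary copy. The sketch below illustrates this under that assumption; the names `subset`, `subset_values` and `original_values` are introduced here for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)

# Boolean-mask selection yields a separate Series, not a view of df
subset = df.loc[mask, 'c']

# The assignment changes only that separate object
subset.iloc[0:2] = 1

subset_values = subset.tolist()      # the modified selection
original_values = df['c'].tolist()   # column 'c' of df, left untouched
```

Splitting the chain into two statements makes it visible that only the intermediate selection changes, which is why the answers build a single indexer and assign through one `loc` or `iloc` call on `df` itself.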

Accepted answer by EdChum

This does work but is a little ugly. Basically, we use the index generated from the mask and make an additional call to `loc`:

In [57]:

df.loc[df.loc[mask,'c'].iloc[0:2].index, 'c'] = 1
df
Out[57]:
   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0

So breaking the above down:

In [60]:
# take the index from the mask and iloc
df.loc[mask, 'c'].iloc[0: 2]
Out[60]:
2    0
3    0
Name: c, dtype: int64
In [61]:
# call loc using this index, we can now use this to select column 'c' and set the value
df.loc[df.loc[mask,'c'].iloc[0:2].index]
Out[61]:
   a  b  c
2  2  2  0
3  3  2  0

Answer by JoeCondron

How about:

ix = df.index[mask][:2]
df.loc[ix, 'c'] = 1

Same idea as EdChum's, but more elegant, as suggested in the comments.

EDIT: You have to be a little careful with this one, as it may give unwanted results with a non-unique index, since multiple rows could be indexed by either of the labels in `ix` above. If the index is non-unique and you only want the first 2 (or n) rows that satisfy the boolean key, it would be safer to use `.iloc` with integer indexing, with something like:

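import numpy as np

ix = np.where(mask)[0][:2]  # integer positions of the first two matches
df.iloc[ix, df.columns.get_loc('c')] = 1  # iloc needs the column's position, not its label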
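
To make the non-unique index caveat concrete, here is a minimal sketch that reuses the question's data with a duplicated index. The index values and the names `by_label`, `by_position` and `pos` are assumptions chosen for illustration. It contrasts label-based assignment with position-based assignment.

```python
import numpy as np
import pandas as pd

# Same data as the question, but with an assumed non-unique index
df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]},
                  index=[0, 1, 2, 3, 0, 1, 2, 3])
mask = ((df['a'] < 7) & (df['b'] == 2)).to_numpy()

# Label-based: labels 2 and 3 each occur twice, so four rows are set
by_label = df.copy()
ix = by_label.index[mask][:2]
by_label.loc[ix, 'c'] = 1

# Position-based: exactly the first two matching rows are set
by_position = df.copy()
pos = np.flatnonzero(mask)[:2]
by_position.iloc[pos, by_position.columns.get_loc('c')] = 1
```

With this index, the label-based version also touches the last two rows because they share labels with the first two matches, while the positional version changes only the rows at positions 2 and 3.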

Answer by JohnE

I don't know if this is any more elegant, but it's a little different:

# keep a True entry only while the running count of True values is below 3
mask = mask & (mask.cumsum() < 3)

df.loc[mask, 'c'] = 1

   a  b  c
0  0  5  0
1  1  5  0
2  2  2  1
3  3  2  1
4  4  5  0
5  5  5  0
6  6  2  0
7  7  2  0
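
The cumulative-sum trick may be easier to follow with the intermediate values spelled out. The sketch below is one way to generalize it to the first `n` matches; `n`, `running` and `first_n` are hypothetical names added here, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4, 5, 6, 7],
                   'b': [5, 5, 2, 2, 5, 5, 2, 2],
                   'c': [0, 0, 0, 0, 0, 0, 0, 0]})
mask = (df['a'] < 7) & (df['b'] == 2)

n = 2  # number of leading matches to keep (hypothetical parameter)

# Running count of True values seen so far, row by row
running = mask.cumsum()

# Keep a True entry only while the running count has not exceeded n
first_n = mask & (running <= n)

# A single loc call on df itself, so the assignment is applied to df
df.loc[first_n, 'c'] = 1
```

Because the result is still a boolean mask aligned with `df`, this approach works position by position and does not rely on index labels.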