Boolean Series key will be reindexed to match DataFrame index
Source: http://stackoverflow.com/questions/41710789/
Note: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me), including the source address and author information.
Asked by Cheng
Here is how I encountered the error:
df.loc[a_list][df.a_col.isnull()]
The type of a_list is Int64Index; it contains a list of row indexes. All of these row indexes belong to df.
The df.a_col.isnull() part is a condition I need for filtering.
If I execute the following commands individually, I do not get any warnings:
df.loc[a_list]
df[df.a_col.isnull()]
But if I put them together as df.loc[a_list][df.a_col.isnull()], I get the warning message (but I can see the result):
Boolean Series key will be reindexed to match DataFrame index
What does this warning message mean? Does it affect the result that is returned?
Answered by IanS
Your approach will work despite the warning, but it's best not to rely on implicit, unclear behavior.
Solution 1, make the selection of indices in a_list a boolean mask:
df[df.index.isin(a_list) & df.a_col.isnull()]
Solution 2, do it in two steps:
df2 = df.loc[a_list]
df2[df2.a_col.isnull()]
Solution 3, if you want a one-liner, use a trick found here:
df.loc[a_list].query('a_col != a_col')
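To see that the three solutions agree, here is a minimal sketch on made-up data. The values in df and a_list are assumptions chosen for illustration; only the names a_col and a_list come from the question. The trick in Solution 3 relies on the fact that NaN compares unequal to itself, so 'a_col != a_col' is true exactly on the rows where a float a_col is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 1 and 3 have a missing a_col.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]})
a_list = pd.Index([0, 1, 3])  # row labels to keep, all present in df

# Solution 1: express both conditions as masks over the full frame.
s1 = df[df.index.isin(a_list) & df.a_col.isnull()]

# Solution 2: select the rows first, then build the mask from the subset.
df2 = df.loc[a_list]
s2 = df2[df2.a_col.isnull()]

# Solution 3: NaN != NaN is True, so this keeps the rows where a_col is NaN.
s3 = df.loc[a_list].query("a_col != a_col")

print(s1.index.tolist(), s2.index.tolist(), s3.index.tolist())
```

One difference to keep in mind: Solution 1 returns rows in the order of df, while Solutions 2 and 3 follow the order of a_list. With this sorted a_list the results are identical.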
The warning comes from the fact that the boolean vector df.a_col.isnull() has the length of df, while df.loc[a_list] has the length of a_list, i.e. it is shorter. Therefore, some indices in df.a_col.isnull() are not in df.loc[a_list].
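The mismatch can be checked directly. This is a small sketch on hypothetical data (the values of df and a_list are assumptions), comparing the length and the index of the mask with those of the selected rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; only the names a_col and a_list come from the question.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]})
a_list = pd.Index([0, 1, 3])

mask = df.a_col.isnull()  # one boolean per row of df
subset = df.loc[a_list]   # one row per label in a_list

# The mask is longer than the frame it is applied to.
print(len(mask), len(subset))

# Labels that exist in the mask but not in the selected rows.
missing = mask.index.difference(subset.index).tolist()
print(missing)
```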
What pandas does is reindex the boolean series on the index of the calling dataframe. In effect, it gets from df.a_col.isnull() the values corresponding to the indices in a_list. This works, but the behavior is implicit, and could easily change in the future, so that's what the warning is about.
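The implicit step can also be written out by hand. The sketch below, again on hypothetical data, reindexes the mask explicitly with Series.reindex and compares the result with the chained form from the question. Whether and how the warning is emitted may depend on the pandas version, so the code only records any warnings rather than assuming one.

```python
import warnings

import numpy as np
import pandas as pd

# Hypothetical toy data; only the names a_col and a_list come from the question.
df = pd.DataFrame({"a_col": [1.0, np.nan, 3.0, np.nan, 5.0]})
a_list = pd.Index([0, 1, 3])

subset = df.loc[a_list]
mask = df.a_col.isnull()

# Explicit version: align the mask to the subset's index before filtering.
aligned_mask = mask.reindex(subset.index)
explicit = subset[aligned_mask]

# Implicit version from the question: pandas aligns the mask itself.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    implicit = subset[mask]

print(explicit.equals(implicit))
print([str(w.message) for w in caught])
```

Writing the reindex explicitly gives the same rows and makes the alignment visible to the reader, which is the point of the three solutions above.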