Picking elements based on the complement of indices in Python pandas

Question

Asked by Zelazny7

I have a dataframe from which I pick two subset dataframes, df_a and df_b. For example, in the iris dataset:

df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

What's the best way to get all elements of iris that are in neither df_a nor df_b? I prefer not to refer to the original conditions that defined df_a and df_b. I just assume that df_a and df_b are subsets of iris, so I'd like to pull out elements from iris based on the indices of df_a and df_b. Basically, assume that:

df_a = get_a_subset(iris)
df_b = get_b_subset(iris)
# retrieve the subset of iris whose rows
# are in neither df_a nor df_b
# ...

EDIT: here is a solution that seems inefficient and inelegant, and I'm sure pandas has a better way:

# get subset of iris that is not in a nor in b
df_rest = iris[[(x not in df_a.index) and (x not in df_b.index) for x in iris.index]]

And a second one:

df_rest = iris.ix[iris.index - df_a.index - df_b.index]

How can this be done most efficiently/elegantly in pandas? Thanks.

Answer 1

Answered by Zelazny7

This seems a bit faster than your second solution. There's a bit more overhead when indexing with .ix:

iris[~iris.index.isin(df_a.index.union(df_b.index))]
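
In current pandas releases `.ix` has been removed and arithmetic operators on an Index no longer act as set operations, so the snippets in the question need explicit `union`/`difference` calls. The following is a minimal sketch of both approaches, not a definitive implementation. The small `iris` frame is made-up stand-in data (only the Name column matters), and the names `taken`, `rest_mask` and `rest_diff` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; values are invented for illustration.
iris = pd.DataFrame({
    "SepalLength": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Name": ["Iris-setosa", "Iris-setosa", "Iris-versicolor",
             "Iris-versicolor", "Iris-virginica", "Iris-virginica"],
})

df_a = iris[iris.Name == "Iris-setosa"]
df_b = iris[iris.Name == "Iris-virginica"]

# Index labels covered by either subset (explicit set union).
taken = df_a.index.union(df_b.index)

# Option 1: boolean mask over the original index.
rest_mask = iris[~iris.index.isin(taken)]

# Option 2: set difference on the index, then label-based selection.
rest_diff = iris.loc[iris.index.difference(taken)]

print(rest_mask)
```

The mask version keeps the rows in their original order and keeps duplicate index labels, while `Index.difference` returns sorted, unique labels, so the two can differ when the index is unsorted or not unique.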
