Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/16704782/

Date: 2020-09-13 20:51:00  Source: igfitidea

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

python pandas

Asked by DMML

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.

Let's say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]), I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined with df2

OR

2) After df2 has been created, remove the rows (difference?) from df1 which df2 is composed of

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be created that looks like this:

df2=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 not in df2. Thanks!

Answered by DSM

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
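
The session above assumes df already exists. As a minimal, self-contained sketch, the script below rebuilds the question's data and repeats the isin mask. It also shows an alternative that is not part of the original answer: if the subset was taken from df1 by filtering and still carries df1's index labels, dropping by that index should give the same result. The names mask and remaining are chosen here for illustration.

```python
import pandas as pd

# Rebuild the example data from the question.
df1 = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Boolean mask: True for rows whose zip is in `keep`.
mask = df1["zip"].isin(keep)
df_yes = df1[mask]    # rows to pull out
df_no = df1[~mask]    # everything else

# Alternative (assumption: df2 was filtered from df1 and keeps
# df1's index labels, and those labels are unique).
df2 = df1[mask]
remaining = df1.drop(df2.index)

print(df_no)
print(remaining)
```

Dropping by index only works while the subset's index still matches df1; after a reset_index or a reload from disk, the mask approach is the safer choice.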

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
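
The transcript above comes from Python 2.7, where dict.values() returns a list. A sketch of the same groupby split as a standalone script for Python 3 with a recent pandas might look like the following; the names parts and frames are illustrative, and the values view is wrapped in list() to get a list back.

```python
import pandas as pd

# Same example data as in the question.
df = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Group by the boolean mask itself: one group for True, one for False.
grouped = df.groupby(df["zip"].isin(keep))

df_yes = grouped.get_group(True)    # rows whose zip is in `keep`
df_no = grouped.get_group(False)    # the remaining rows

# Collect both pieces at once; keys are sorted, so False comes first.
parts = dict(list(grouped))
frames = list(parts.values())       # list() needed on Python 3

print(df_no)
```

One caveat worth checking in your own data: if no row (or every row) matches, only one group exists, and asking get_group for the missing key raises a KeyError.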

Which makes most sense depends upon the context, but I think you get the idea.
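
Both options rely on knowing the zip values that defined the subset. If only the subset itself is available and its index no longer matches df1, one possible approach, which is not from the original answer and needs a much newer pandas than 0.11.0, is a left merge with indicator=True followed by keeping the "left_only" rows. The sketch assumes both frames share the same columns; merged and remaining are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# A subset whose index has been reset, so it no longer lines up with df1.
df2 = df1[df1["zip"].isin([123, 133])].reset_index(drop=True)

# Left merge on all shared columns; `_merge` records where each row came from.
# drop_duplicates() keeps repeated rows in df2 from multiplying rows of df1.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep rows found only in df1, then drop the helper column.
remaining = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(remaining)
```

Note that this compares whole rows by value, so any row of df1 that is identical to a row in df2 is removed, including exact duplicates within df1.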
