Python Pandas - Removing rows from a DataFrame based on a previously obtained subset

Question

Asked by DMML

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.

Let's say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]), I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined with df2

OR

2) After df2 has been created, remove from df1 the rows that df2 is composed of (a difference?)

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be created that looks like this:

df3=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 not in df2. Thanks!

Answer 1

Answered by DSM

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
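
For anyone who wants to run the mask approach end to end, here is a minimal self-contained sketch. Building the frame from a dict is an assumption made for illustration (the question does not show how df1 was created); the only pandas features used are Series.isin and boolean negation with ~.

```python
import pandas as pd

# Rebuild the example data from the question (construction is illustrative).
df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Boolean mask: True where the row's zip is one of the values to keep.
mask = df["zip"].isin(keep)

df_yes = df[mask]    # rows whose zip is in `keep`
df_no = df[~mask]    # the complement: every other row

print(df_yes)
print(df_no)
```

Computing the mask once and reusing it for both halves avoids evaluating isin twice, and the two results always partition the original rows.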

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
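
The groupby variant can be sketched as a standalone script too. This is only an illustration under the same assumptions as the session above (the frame is rebuilt by hand, and the name parts is hypothetical); it collects both halves into a dict keyed by True/False.

```python
import pandas as pd

# Rebuild the example data from the question (construction is illustrative).
df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Group rows by whether their zip is in `keep`; the group keys are True/False.
grouped = df.groupby(df["zip"].isin(keep))

# Collect both pieces in one pass; bool() normalizes the key type.
parts = {bool(key): group for key, group in grouped}

df_yes = parts[True]     # rows whose zip is in `keep`
df_no = parts[False]     # all remaining rows

print(df_yes)
print(df_no)
```

One caveat: if every row matches, or none does, only one group exists, so looking up the missing key (or calling get_group on it) raises a KeyError.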

Which makes most sense depends upon the context, but I think you get the idea.
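
If the subset already exists and only its complement is needed (option 2 in the question), one further possibility is to drop the subset's index labels from the original frame. This is a sketch rather than part of the approaches above, and it rests on two assumptions: df1 has unique index labels, and df2 was produced by filtering df1 so that it still carries those labels. Here df2 is built with isin instead of the join shown in the question.

```python
import pandas as pd

# Rebuild the example data from the question (construction is illustrative).
df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# Assumed way of obtaining the subset: filtering keeps df1's index labels.
df2 = df1[df1["zip"].isin([123, 133])]

# Drop the rows of df1 whose index labels appear in df2.
# This is only safe when df1's index labels are unique.
df3 = df1.drop(df2.index)

print(df3)
```

DataFrame.drop returns a new frame, so df1 itself is left unchanged.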
