How to drop duplicates based on two or more subset criteria in a Pandas DataFrame

Question

Asked by logic8

Let's say this is my DataFrame:

import pandas as pd

df = pd.DataFrame({'bio': ['1', '1', '1', '4'],
                   'center': ['one', 'one', 'two', 'three'],
                   'outcome': ['f', 't', 'f', 'f']})

It looks like this:

  bio center outcome
0   1    one       f
1   1    one       t
2   1    two       f
3   4  three       f

I want to drop row 1 because it has the same bio and center as row 0. I want to keep row 2 because it has the same bio as row 0 but a different center.

Something like this won't work, given how drop_duplicates takes its arguments, but it shows what I am trying to do:

df.drop_duplicates(subset = 'bio' & subset = 'center' )

Any suggestions?

Edit: changed df slightly to match the example in the accepted answer.

Answer 1

Answered by Gustavo Bezerra

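Your syntax is wrong: subset takes a single list of column labels, not one keyword argument per column. To compare rows on bio and center only:

df.drop_duplicates(subset=['bio', 'center'])

Note that listing all three columns, or calling df.drop_duplicates() with no arguments, compares every column. With the data above that would keep row 1, because its outcome differs from row 0.

With subset=['bio', 'center'], the call returns the following: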

  bio center outcome
0   1    one       f
2   1    two       f
3   4  three       f
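
To check this against the question's sample data, here is a minimal sketch. The variable names deduped and all_columns are only illustrative. It contrasts comparing on bio and center with comparing on every column.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Compare rows on bio and center only; the first occurrence is kept.
deduped = df.drop_duplicates(subset=['bio', 'center'])
print(deduped)

# Comparing every column finds no duplicates here, because rows 0 and 1
# differ in outcome.
all_columns = df.drop_duplicates()
print(len(all_columns))  # 4
```

drop_duplicates returns a new DataFrame and leaves df unchanged, so assign the result if you want to keep it.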

Take a look at the df.drop_duplicates documentation for syntax details. subset should be a sequence of column labels.
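
Beyond subset, the keep parameter controls which row of each duplicate group survives, and df.duplicated returns the underlying boolean mask. The sketch below, again on the question's sample data with illustrative variable names, shows both.

```python
import pandas as pd

df = pd.DataFrame({
    'bio': ['1', '1', '1', '4'],
    'center': ['one', 'one', 'two', 'three'],
    'outcome': ['f', 't', 'f', 'f'],
})

# Boolean mask: True marks rows that repeat an earlier (bio, center) pair.
mask = df.duplicated(subset=['bio', 'center'])
print(mask.tolist())  # [False, True, False, False]

# keep='last' retains the last row of each duplicate group instead of the first.
last_kept = df.drop_duplicates(subset=['bio', 'center'], keep='last')
print(last_kept.index.tolist())  # [1, 2, 3]

# keep=False drops every row that belongs to a duplicate group.
none_kept = df.drop_duplicates(subset=['bio', 'center'], keep=False)
print(none_kept.index.tolist())  # [2, 3]
```

Inspecting the mask first is a cheap way to confirm which rows a given subset would remove before actually dropping them.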
