pandas: Fast method for removing duplicate columns in pandas.DataFrame

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/32041245/

Date: 2020-09-13 23:46:58 | Source: igfitidea

Fast method for removing duplicate columns in pandas.Dataframe

python, pandas

Asked by Peter Klauke

So, by using

df_ab = pd.concat([df_a, df_b], axis=1, join='inner')

I get a DataFrame that looks like this:

    A    A    B    B
0   5    5   10   10
1   6    6   19   19

and I want to remove the duplicated columns:

    A     B
0   5    10
1   6    19

Because df_a and df_b are subsets of the same DataFrame, I know that columns with the same name contain identical values. I have a working solution:

df_ab = df_ab.T.drop_duplicates().T

but I have a large number of rows, so this is very slow. Does anyone have a faster solution? I would prefer one that doesn't require explicit knowledge of the column names.

Accepted answer by behzad.nouri

You may use np.unique to get the indices of the unique columns, and then use .iloc:

>>> df
   A  A   B   B
0  5  5  10  10
1  6  6  19  19
>>> _, i = np.unique(df.columns, return_index=True)
>>> df.iloc[:, i]
   A   B
0  5  10
1  6  19
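
A self-contained version of this approach might look like the sketch below; the DataFrame construction is assumed for illustration, mirroring the example in the question. Note that np.unique sorts the labels, so the surviving columns come back in sorted order rather than in their original position.

import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed values).
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]], columns=['A', 'A', 'B', 'B'])

# np.unique returns the sorted unique labels and the index of the first
# occurrence of each; .iloc then keeps only those columns.
_, i = np.unique(df.columns, return_index=True)
df_unique = df.iloc[:, i]
print(df_unique)
#    A   B
# 0  5  10
# 1  6  19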

Answer by Prayson W. Daniel

The easiest way is:

df = df.loc[:,~df.columns.duplicated()]

One line of code can change everything
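
As a rough sketch (the example frame below is assumed, mirroring the question), this keeps the first occurrence of each column label and, unlike the np.unique approach, preserves the original column order:

import pandas as pd

# Rebuild the example frame from the question (assumed values).
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]], columns=['A', 'A', 'B', 'B'])

# columns.duplicated() flags every repeated label after its first occurrence;
# negating it keeps only the first column with each name.
df = df.loc[:, ~df.columns.duplicated()]
print(df)
#    A   B
# 0  5  10
# 1  6  19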

Answer by unutbu

Perhaps you would be better off avoiding the problem altogether by using pd.merge instead of pd.concat:

df_ab = pd.merge(df_a, df_b, how='inner')

This will merge df_a and df_b on all the columns they share.
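
A minimal sketch of this alternative, assuming df_a and df_b really do hold identical values in their shared columns as the question states:

import pandas as pd

# Hypothetical subsets standing in for the OP's df_a and df_b.
df_a = pd.DataFrame({'A': [5, 6], 'B': [10, 19]})
df_b = pd.DataFrame({'A': [5, 6], 'B': [10, 19]})

# Without an `on=` argument, pd.merge joins on every column name the two
# frames have in common, so shared columns never get duplicated.
df_ab = pd.merge(df_a, df_b, how='inner')
print(df_ab)
#    A   B
# 0  5  10
# 1  6  19

One caveat: because the join uses every common column as the key, duplicate rows in those columns would be multiplied in the result.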

Answer by James Wright

For those who skip the question and look straight at the answers, the simplest way for me is to use the OP's solution (assuming you don't run into the same performance issues he did): transpose the DataFrame, use drop_duplicates, and then transpose it again:

df.T.drop_duplicates().T
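
Put together as a runnable sketch (again with the question's example data assumed), along with a note on why it can be slow on large frames:

import pandas as pd

# Rebuild the example frame from the question (assumed values).
df = pd.DataFrame([[5, 5, 10, 10], [6, 6, 19, 19]], columns=['A', 'A', 'B', 'B'])

# Transposing turns columns into rows, drop_duplicates drops the repeated
# rows (i.e. the repeated columns), and the second transpose restores the
# original orientation. drop_duplicates has to compare whole columns, so
# this gets slow when the frame has many rows.
df_unique = df.T.drop_duplicates().T
print(df_unique)
#    A   B
# 0  5  10
# 1  6  19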