pandas: how to concatenate horizontally and then remove the redundant columns

Question

Asked by Jun Jang

Say I have two dataframes.

DF1: col1, col2, col3

DF2: col2, col4, col5

How do I concatenate the two dataframes horizontally so that the result has col1, col2, col3, col4, and col5? Right now I am using pd.concat([DF1, DF2], axis = 1), but the result ends up with two col2 columns. Assuming all the values in the two col2 columns are the same, I want to keep only one of them.

Answer 1

Answered by Allen

Dropping duplicates should work. Because drop_duplicates removes duplicate rows, not duplicate columns, we need to transpose the DF, drop the duplicates, and transpose it back.

pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T
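
The transpose approach compares column values, so it depends on the two col2 columns really being identical, and transposing a frame with mixed dtypes may turn its columns into object dtype. As an alternative sketch (not part of the answer above), repeated column labels can be dropped by name with Index.duplicated. The sample values below are made up; only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF1 = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "col3": [0.1, 0.2]})
DF2 = pd.DataFrame({"col2": [10, 20], "col4": ["a", "b"], "col5": [True, False]})

combined = pd.concat([DF1, DF2], axis=1)

# Keep the first occurrence of every column label and drop later repeats.
deduped = combined.loc[:, ~combined.columns.duplicated()]
print(list(deduped.columns))
```

This keeps the first col2 without inspecting its values, so it is only appropriate when the duplicated columns are known to hold the same data.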

Answer 2

Answered by jezrael

Use difference to get the columns of DF2 that are not in DF1, and simply select them with []:

DF1 = pd.DataFrame(columns=['col1', 'col2', 'col3'])
DF2 = pd.DataFrame(columns=['col2', 'col4', 'col5'])


DF2 = DF2[DF2.columns.difference(DF1.columns)]
print (DF2)
Empty DataFrame
Columns: [col4, col5]
Index: []

print (pd.concat([DF1, DF2], axis = 1))
Empty DataFrame
Columns: [col1, col2, col3, col4, col5]
Index: []
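
One detail worth knowing: by default Index.difference tries to sort its result, so the remaining DF2 columns may come back in a different order than they had in DF2. The sketch below uses hypothetical column names (z_col, a_col), chosen only so that the sorted order differs from the original order, and passes sort=False to keep DF2's order.

```python
import pandas as pd

# Hypothetical column names, picked so sorted order differs from original order.
DF1 = pd.DataFrame(columns=["col1", "col2", "col3"])
DF2 = pd.DataFrame(columns=["col2", "z_col", "a_col"])

# Default behaviour: the result is sorted where possible.
sorted_cols = DF2.columns.difference(DF1.columns)

# sort=False: the result keeps the order the labels have in DF2.
ordered_cols = DF2.columns.difference(DF1.columns, sort=False)

print(list(sorted_cols))
print(list(ordered_cols))
```

With the question's column names (col4, col5) both variants give the same order, so this only matters when the column order of DF2 is not already alphabetical.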

Timings:

import numpy as np
import pandas as pd

np.random.seed(123)

N = 1000
DF1 = pd.DataFrame(np.random.rand(N,3), columns=['col1', 'col2', 'col3'])
DF2 = pd.DataFrame(np.random.rand(N,3), columns=['col2', 'col4', 'col5'])

DF2['col2'] = DF1['col2']

In [408]: %timeit (pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T)
10 loops, best of 3: 122 ms per loop

In [409]: %timeit (pd.concat([DF1, DF2[DF2.columns.difference(DF1.columns)]], axis = 1))
1000 loops, best of 3: 979 μs per loop

N = 10000:
In [411]: %timeit (pd.concat([DF1, DF2], axis = 1).T.drop_duplicates().T)
1 loop, best of 3: 1.4 s per loop

In [412]: %timeit (pd.concat([DF1, DF2[DF2.columns.difference(DF1.columns)]], axis = 1))
1000 loops, best of 3: 1.12 ms per loop

Answer 3

Answered by YOBEN_S

DF2.drop(DF2.columns[DF2.columns.isin(DF1.columns)],axis=1,inplace=True)

Then,

pd.concat([DF1, DF2], axis = 1)
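
Note that inplace=True modifies DF2 itself, so col2 is gone from DF2 afterwards. If DF2 should stay intact, a possible variant (a sketch with made-up values, not the answerer's code) is to select the non-overlapping columns with the same isin mask instead of dropping them:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
DF1 = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "col3": [3, 4]})
DF2 = pd.DataFrame({"col2": [10, 20], "col4": [5, 6], "col5": [7, 8]})

# Boolean mask marking the DF2 columns that already exist in DF1.
overlap = DF2.columns.isin(DF1.columns)

# Select only the non-overlapping columns; DF2 itself is left unchanged.
result = pd.concat([DF1, DF2.loc[:, ~overlap]], axis=1)
print(result)
```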

Answer 4

Answered by maria_g

To avoid duplicating columns while combining two data frames, use the ignore_index argument. This appends the rows of one frame below the other rather than placing the frames side by side.

pd.concat([df1, df2], ignore_index=True, sort=False)

But use it only if you wish to append them and ignore the fact that they may have overlapping indexes.
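
A small sketch of what that call produces, using made-up values: because axis is left at its default of 0, the rows are stacked, columns are aligned by name, and cells with no matching column are filled with NaN.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({"col1": [1, 2], "col2": [10, 20], "col3": [3, 4]})
df2 = pd.DataFrame({"col2": [10, 20], "col4": [5, 6], "col5": [7, 8]})

# Default axis=0: rows of df2 are appended below df1 and the index is renumbered.
stacked = pd.concat([df1, df2], ignore_index=True, sort=False)
print(stacked.shape)
print(list(stacked.columns))
```

The result has a single col2, but four rows instead of two, which is not the horizontal concatenation the question asks for.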
