Pandas: merge two DataFrames without some columns

Question

Asked by snoob dogg

Context

I'm trying to merge two big CSV files together.

Problem

Let's say I've one Pandas DataFrame like the following...

EntityNum    foo   ...
------------------------
1001.01      100
1002.02       50
1003.03      200

And another one like this...

EntityNum    a_col    b_col
-----------------------------------
1001.01      alice        7  
1002.02        bob        8
1003.03        777        9

I'd like to join them like this:

EntityNum    foo    a_col
----------------------------
1001.01      100    alice
1002.02       50      bob
1003.03      200      777

Keep in mind that I don't want b_col in the final result. How do I accomplish this with Pandas?

In SQL, I would probably have written something like:

SELECT t1.*, t2.a_col FROM table_1 as t1
                      LEFT JOIN table_2 as t2
                      ON t1.EntityNum = t2.EntityNum;

Search

I know it is possible to use merge. This is what I've tried:

import pandas as pd

df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')

But I'm stuck on how to keep the unwanted columns out of the final DataFrame.

Answer 1

Answered by Alexander

You can first select the relevant columns of each DataFrame by label (e.g. df_a[['EntityNum', 'foo']]) and then join those.

df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')

Note that the default behavior of merge is an inner join, so how='left' has to be passed explicitly to match the SQL LEFT JOIN.
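
As a rough, self-contained sketch of this approach, the snippet below builds small in-memory frames modeled on the question's sample tables instead of reading the CSV files. The extra 1004.04 row in df_a is a hypothetical addition, included only to show how a left join differs from the default inner join.

```python
import pandas as pd

# Sample data modeled on the question; the 1004.04 row is hypothetical
# and has no match in df_b.
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03, 1004.04],
    "foo": [100, 50, 200, 300],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Select only the needed columns from each side, then join on the key.
left = df_a[["EntityNum", "foo"]].merge(
    df_b[["EntityNum", "a_col"]], on="EntityNum", how="left"
)

# Same call without `how`: the default inner join drops the unmatched row.
inner = df_a[["EntityNum", "foo"]].merge(
    df_b[["EntityNum", "a_col"]], on="EntityNum"
)

print(left)
print(inner)
```

With the left join, the unmatched row is kept and its a_col value is missing (NaN); with the inner join, that row disappears.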

Answer 2

Answered by DYZ

Note how in SQL, you first do the join and then select the columns that you want. In the same spirit, you can join the complete DataFrames in Pandas and then select the wanted columns.
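
A minimal sketch of the join-then-select order, assuming small in-memory frames shaped like the question's sample tables (the data is copied from the question; the variable names wanted and result are illustrative):

```python
import pandas as pd

# Sample data copied from the question's tables.
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "foo": [100, 50, 200],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Step 1: join the complete frames (all columns from both sides).
merged = df_a.merge(df_b, on="EntityNum", how="left")

# Step 2: keep every column of df_a plus a_col, mirroring
# "SELECT t1.*, t2.a_col" from the SQL in the question.
wanted = list(df_a.columns) + ["a_col"]
result = merged[wanted]

print(result)
```

Building the column list from df_a.columns avoids having to name every column of the first file by hand, which may matter when the CSV is wide.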

Alternatively, join the complete DataFrames and del the columns you do not want.
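
One way this could look, again assuming in-memory frames copied from the question's sample tables. The del statement modifies the merged frame in place; DataFrame.drop is shown as a non-mutating alternative that is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's tables.
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "foo": [100, 50, 200],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Join everything, then delete the unwanted column in place.
merged = df_a.merge(df_b, on="EntityNum", how="left")
del merged["b_col"]

# Alternative: drop() returns a new frame and leaves its input untouched.
dropped = df_a.merge(df_b, on="EntityNum", how="left").drop(columns=["b_col"])

print(merged)
```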

Finally, you can first select the columns that you want and then do the join.
