Merging two datasets in Pandas

Question

Asked by freddy888

I have previously worked with Stata and am now trying to do the same in Python. However, I am having trouble with the merge command; I must be missing something. The two dataframes I want to merge look like this:

 df1:
 Date id Market_Cap
 2000 1  400
 2000 2  200
 2001 1  410
 2001 2  220

 df2:
 id Ticker
 1   Shell
 2   ExxonMobil

My aim now is to get the following dataset:

Date id Market_Cap  Ticker
2000 1  400        Shell 
2000 2  200        ExxonMobil 
2001 1  410        Shell 
2001 2  220        ExxonMobil

I tried the following command:

merged= pd.merge(df1, df2, how="left", on="id")

This merges the datasets, but gives me only NaNs in the Ticker column. I looked at several sources, and maybe I am mistaken, but isn't how="left" the right thing to do for my purpose? I also tried "right" and "outer"; they don't give the result I want, and "inner" does not seem to work here at all.

Am I missing something crucial?

Answer 1

Answered by jezrael

The problem is that the id column is object (evidently string) in one DataFrame and int in the other, so the keys never match and you get NaN.

If both id columns have the same dtype, the merge works:

print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64

merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
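
For reference, here is a self-contained sketch of the working case, with both frames rebuilt from the data in the question and integer ids on both sides. The validate="many_to_one" argument is my addition, not part of the original command; it asks pandas to check that each id appears only once in df2 and should raise a MergeError otherwise.

```python
import pandas as pd

# Frames rebuilt from the question; both id columns hold integers.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": [1, 2, 1, 2],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Left merge keeps every row of df1 and looks up the Ticker by id.
# validate is optional: it checks that df2 has at most one row per id.
merged = pd.merge(df1, df2, how="left", on="id", validate="many_to_one")
print(merged)
```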

Another solution, if you only need to add one new column, is map:

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

To reproduce your problem, give the two id columns different dtypes:

print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date id  Market_Cap Ticker
0  2000  1         400    NaN
1  2000  2         200    NaN
2  2001  1         410    NaN
3  2001  2         220    NaN
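
A runnable version of this reproduction might look like the sketch below. It assumes the ids in df1 were read in as strings (for example from a CSV or a Stata export), which the question does not confirm. It uses map rather than merge because, as far as I know, recent pandas versions refuse to merge a string key with an integer key and raise a ValueError instead of returning NaN.

```python
import pandas as pd

# Assumption: df1's ids arrived as strings, df2's as integers.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Compare the key dtypes first; a mismatch here explains the NaNs.
print(df1["id"].dtype, df2["id"].dtype)

# The string "1" is never found in an integer index, so nothing matches.
lookup = df2.set_index("id")["Ticker"]
ticker = df1["id"].map(lookup)
all_missing = bool(ticker.isna().all())
print(all_missing)
```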

The solution is to convert id in df1 to int with astype (or to convert the id column in df2 to str):

df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
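
The same conversion fixes the original pd.merge call. Below is a minimal sketch, again assuming the ids in df1 started out as strings. The indicator=True argument is my addition: it adds a temporary _merge column, which makes it easy to count rows that found no partner in df2.

```python
import pandas as pd

# Assumption: df1's ids arrived as strings, df2's as integers.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Align the key dtype with df2 before merging.
df1["id"] = df1["id"].astype("int64")

# indicator=True marks each row as "both", "left_only" or "right_only".
merged = pd.merge(df1, df2, how="left", on="id", indicator=True)
unmatched = int((merged["_merge"] == "left_only").sum())
merged = merged.drop(columns="_merge")

print(unmatched)
print(merged)
```

If unmatched is greater than zero after the conversion, some ids in df1 are genuinely absent from df2 rather than merely stored with a different type.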
