在 Pandas 中合并两个数据集
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/43909479/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Merge two datasets in Pandas
提问by freddy888
I have previously worked with Stata and am now trying to get the same done with Python. However, I have troubles with the merge command. Somehow I must be missing something. My two dataframes I want to merge look like this:
我以前曾与 Stata 合作过,现在正在尝试用 Python 完成同样的工作。但是,我在使用合并命令时遇到了麻烦。不知何故,我一定错过了一些东西。我要合并的两个数据框如下所示:
df1:
Date id Market_Cap
2000 1 400
2000 2 200
2001 1 410
2001 2 220
df2:
id Ticker
1 Shell
2 ExxonMobil
My aim now is to get the following dataset:
我现在的目标是获得以下数据集:
Date id Market_Cap Ticker
2000 1 400 Shell
2000 2 200 ExxonMobil
2001 1 410 Shell
2001 2 220 ExxonMobil
I tried the following command:
我尝试了以下命令:
merged= pd.merge(df1, df2, how="left", on="id")
This merges the datasets, but gives me only nan's in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't the "left" command the right thing do to for my purpose? I also tried "right" and "outer". They don't get the result I want to and "inner" does not seem to work here in general.
这合并了数据集,但在 Ticker 列中只给了我 nan。我查看了几个来源,也许我弄错了,但是“左”命令不是为了我的目的而做的正确的事情吗?我也试过“右”和“外”。他们没有得到我想要的结果,“内在”似乎在这里不起作用。
Am I missing something crucial?
我错过了一些重要的东西吗?
回答by jezrael
Thyere is problem your column id
in one df is object
(obviously string
) and another int
, so no match and get NaN
.
Thyere 的问题是你id
在一个 df 中的列是object
(显然string
)和另一个int
,所以没有 match 和 get NaN
。
If have same dtypes
:
如果有相同的dtypes
:
print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64
merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Another solution if need add only one new column is map
:
如果只需要添加一个新列,另一种解决方案是map
:
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Simulate your problem:
模拟你的问题:
print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 NaN
1 2000 2 200 NaN
2 2001 1 410 NaN
3 2001 2 220 NaN
And solution is convert to int
by astype
(or column id
in df2
to str
):
并将解决方案转换为int
by astype
(或列id
in df2
to str
):
df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil