Python: merge two dataframes by index

Question

Asked by brucezepplin

Hi, I have the following dataframes:

> df1
  id begin conditional confidence discoveryTechnique  
0 278    56       false        0.0                  1   
1 421    18       false        0.0                  1 

> df2
   concept 
0  A  
1  B

How do I merge on the indices to get:

  id begin conditional confidence discoveryTechnique   concept 
0 278    56       false        0.0                  1  A 
1 421    18       false        0.0                  1  B

I ask because it is my understanding that merge(), i.e. df1.merge(df2), uses columns to do the matching. In fact, doing this I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 4618, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 58, in merge
    copy=copy, indicator=indicator)
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 491, in __init__
    self._validate_specification()
  File "/usr/local/lib/python2.7/dist-packages/pandas/tools/merge.py", line 812, in _validate_specification
    raise MergeError('No common columns to perform merge on')
pandas.tools.merge.MergeError: No common columns to perform merge on

Is it bad practice to merge on index? Is it impossible? If so, how can I shift the index into a new column called "index"?

Thanks

Answer 1

Answered by jezrael

Use merge, which performs an inner join by default:

pd.merge(df1, df2, left_index=True, right_index=True)

Or join, which performs a left join by default:

df1.join(df2)

Or concat, which performs an outer join by default:

pd.concat([df1, df2], axis=1)

Samples:

import pandas as pd

df1 = pd.DataFrame({'a':range(6),
                    'b':[5,3,6,9,2,4]}, index=list('abcdef'))

print (df1)
   a  b
a  0  5
b  1  3
c  2  6
d  3  9
e  4  2
f  5  4

df2 = pd.DataFrame({'c':range(4),
                    'd':[10,20,30, 40]}, index=list('abhi'))

print (df2)
   c   d
a  0  10
b  1  20
h  2  30
i  3  40

#default inner join
df3 = pd.merge(df1, df2, left_index=True, right_index=True)
print (df3)
   a  b  c   d
a  0  5  0  10
b  1  3  1  20

#default left join
df4 = df1.join(df2)
print (df4)
   a  b    c     d
a  0  5  0.0  10.0
b  1  3  1.0  20.0
c  2  6  NaN   NaN
d  3  9  NaN   NaN
e  4  2  NaN   NaN
f  5  4  NaN   NaN

#default outer join
df5 = pd.concat([df1, df2], axis=1)
print (df5)
     a    b    c     d
a  0.0  5.0  0.0  10.0
b  1.0  3.0  1.0  20.0
c  2.0  6.0  NaN   NaN
d  3.0  9.0  NaN   NaN
e  4.0  2.0  NaN   NaN
f  5.0  4.0  NaN   NaN
h  NaN  NaN  2.0  30.0
i  NaN  NaN  3.0  40.0
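
The join types above are only defaults. As a minimal sketch, reusing the same sample frames, each of the three functions takes a parameter that overrides its default (how for merge and join, join for concat); the variable names below are just for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({"a": range(6), "b": [5, 3, 6, 9, 2, 4]}, index=list("abcdef"))
df2 = pd.DataFrame({"c": range(4), "d": [10, 20, 30, 40]}, index=list("abhi"))

# merge defaults to inner; how="outer" keeps every index label from both frames
outer_merge = pd.merge(df1, df2, left_index=True, right_index=True, how="outer")

# join defaults to left; how="inner" keeps only the labels present in both frames
inner_join = df1.join(df2, how="inner")

# concat defaults to outer; join="inner" keeps only the shared labels
inner_concat = pd.concat([df1, df2], axis=1, join="inner")

print(outer_merge.index.tolist())
print(inner_join.index.tolist())
print(inner_concat.index.tolist())
```

So the choice between the three is mostly about convenience, since any of them can be asked for the join type you need.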

Answer 2

Answered by MaxU

You can use concat([df1, df2, ...], axis=1) to concatenate two or more DataFrames aligned by index:

pd.concat([df1, df2, df3, ...], axis=1)

or merge for joining on custom fields / indexes:

# join by _common_ columns: `col1`, `col3`
pd.merge(df1, df2, on=['col1','col3'])

# join by: `df1.col1 == df2.index`
pd.merge(df1, df2, left_on='col1', right_index=True)

or join for joining by index:

df1.join(df2)
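
To make the mixed column/index case concrete, here is a small sketch. The frames, the column name col1 and the values are made up for illustration; only the merge call itself comes from the snippet above.

```python
import pandas as pd

# Hypothetical frames: df1 carries the key in a regular column "col1",
# df2 carries the same key values in its index.
df1 = pd.DataFrame({"col1": ["x", "y", "z"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"label": ["X", "Y"]}, index=["x", "y"])

# Inner join (the default): only rows whose col1 value appears in df2.index survive
merged = pd.merge(df1, df2, left_on="col1", right_index=True)

# Left join: every row of df1 is kept, unmatched rows get NaN in "label"
merged_left = pd.merge(df1, df2, left_on="col1", right_index=True, how="left")

print(merged)
print(merged_left)
```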

Answer 3

Answered by vicpal

By default:
join is a column-wise left join
pd.merge is a column-wise inner join
pd.concat is a row-wise outer join

pd.concat:
takes an iterable as its first argument, so DataFrames cannot be passed directly as separate arguments (use [df, df2])
The DataFrames should share labels along the other axis; with the default outer join, labels missing from one frame are filled with NaN

join and pd.merge:
can take DataFrame arguments directly
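
A short sketch of the concat points, using two made-up frames: passing the frames as separate arguments raises a TypeError, the default (axis=0) stacks rows, and axis=1 places the frames side by side aligned on the index.

```python
import pandas as pd

# Hypothetical frames that share only the label "y"
df1 = pd.DataFrame({"a": [1, 2]}, index=["x", "y"])
df2 = pd.DataFrame({"b": [3, 4]}, index=["y", "z"])

# concat expects a list (or other iterable) of frames as its first argument
try:
    pd.concat(df1, df2)
    raised = False
except TypeError:
    raised = True

rows = pd.concat([df1, df2])          # default axis=0: stacks rows, unions the columns
cols = pd.concat([df1, df2], axis=1)  # axis=1: aligns on the index, unions the labels

print(raised)
print(rows)
print(cols)
```

In both results the cells that have no source value are filled with NaN, which is the outer-join behaviour described above.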

Answer 4

Answered by Stephen Morrell

A silly bug that got me: the joins failed because the index dtypes differed. This was not obvious, as both tables were pivot tables of the same original table. After reset_index, the indices looked identical in Jupyter. It only came to light when saving to Excel...

Fixed with: df1[['key']] = df1[['key']].apply(pd.to_numeric)

Hopefully this saves somebody an hour!
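
A rough way to check for this kind of mismatch before joining is to compare the index dtypes directly. The frames below are invented for illustration (string labels on one side, integer labels on the other), and the conversion is applied to the index rather than to a key column:

```python
import pandas as pd

# Hypothetical frames whose indexes look the same when printed
df1 = pd.DataFrame({"x": [1.0, 2.0]}, index=["1", "2"])  # string labels
df2 = pd.DataFrame({"y": [10, 20]}, index=[1, 2])        # integer labels

# The dtypes differ, and no label is shared between the two indexes
print(df1.index.dtype, df2.index.dtype)
overlap = set(df1.index) & set(df2.index)
print(overlap)

# Convert the string labels to numbers, then join on the index
df1.index = pd.to_numeric(df1.index)
fixed = df1.join(df2)
print(fixed)
```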

Answer 5

Answered by vignesh babu

If you want to join two dataframes in pandas, you can simply use the available functions such as merge or concat. For example, if I have two dataframes df1 and df2, I can join them by:

newdataframe = pd.merge(df1, df2, left_index=True, right_index=True)
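
As for the fallback mentioned in the question, moving the index into a column called "index": reset_index does that when the index has no name. A minimal sketch with cut-down versions of the question's frames (only some of the columns are reproduced here):

```python
import pandas as pd

# Cut-down versions of the frames from the question
df1 = pd.DataFrame({"id": [278, 421], "begin": [56, 18]})
df2 = pd.DataFrame({"concept": ["A", "B"]})

# reset_index moves an unnamed index into a new column named "index"
left = df1.reset_index()
right = df2.reset_index()

# The frames now share a column, so a plain column-based merge works
merged = left.merge(right, on="index")
print(merged)
```

This is not needed for the merge itself, since left_index=True and right_index=True give the same rows without the extra column.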
