Python Pandas Merge (pd.merge): how to set the index and join

Question

Asked by user1911092

I have two pandas dataframes: dfLeft and dfRight with the date as the index.

dfLeft:

            cusip    factorL
date  
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
....
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
....

dfRight:

            idc__id    factorR
date  
2012-01-03    XXXX      5.0
2012-01-03    YYYY      6.0
....
2012-01-04    XXXX      5.1
2012-01-04    YYYY      6.2

Both have a shape close to (121900, 3).

I tried the following merge:

test = pd.merge(dfLeft, dfRight, left_index=True, right_index=True, left_on='cusip', right_on='idc__id', how = 'inner')

This gave test a shape of (60643500, 6).

Any recommendations on what is going wrong here? I want it to merge on both date and cusip/idc__id. Note: in this example the cusips line up row for row, but in reality that may not be the case.

Thanks.

Expected output (test):

             cusip    factorL    factorR
date  
2012-01-03    XXXX      4.5          5.0
2012-01-03    YYYY      6.2          6.0
....
2012-01-04    XXXX      4.7          5.1
2012-01-04    YYYY      6.1          6.2

Answer 1

Accepted answer by Andy Hayden

You could append 'cusip' and 'idc__id' as extra index levels to your DataFrames before you join (here's how it would work on the first couple of rows):

In [10]: dfLeft
Out[10]: 
            cusip  factorL
date                      
2012-01-03   XXXX      4.5
2012-01-03   YYYY      6.2

In [11]: dfL1 = dfLeft.set_index('cusip', append=True)

In [12]: dfR1 = dfRight.set_index('idc__id', append=True)

In [13]: dfL1
Out[13]: 
                  factorL
date       cusip         
2012-01-03 XXXX       4.5
           YYYY       6.2

In [14]: dfL1.join(dfR1)
Out[14]: 
                  factorL  factorR
date       cusip                  
2012-01-03 XXXX       4.5        5
           YYYY       6.2        6
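
As a self-contained sketch of the same idea, the snippet below rebuilds the sample rows from the question and joins them on a two-level index. The `rename_axis` step and the final `reset_index` are additions, not part of the answer: giving both indexes identical level names is an assumption made here so there is no ambiguity about which levels are matched, and `reset_index("cusip")` only restores the layout the question asked for.

```python
import pandas as pd

# Sample rows copied from the question; 'date' is the index of both frames.
dates = pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-04"])
dfLeft = pd.DataFrame(
    {"cusip": ["XXXX", "YYYY", "XXXX", "YYYY"], "factorL": [4.5, 6.2, 4.7, 6.1]},
    index=pd.Index(dates, name="date"),
)
dfRight = pd.DataFrame(
    {"idc__id": ["XXXX", "YYYY", "XXXX", "YYYY"], "factorR": [5.0, 6.0, 5.1, 6.2]},
    index=pd.Index(dates, name="date"),
)

# Append the identifier column as a second index level on each side.
dfL1 = dfLeft.set_index("cusip", append=True)
# Assumption: rename the right-hand levels so both indexes are named (date, cusip).
dfR1 = dfRight.set_index("idc__id", append=True).rename_axis(["date", "cusip"])

# Join on the full (date, cusip) index.
joined = dfL1.join(dfR1, how="inner")

# Move 'cusip' back to a column so only 'date' remains in the index.
result = joined.reset_index("cusip")
print(result)
```

Because each (date, cusip) pair is unique on both sides, the join returns one row per pair instead of one row per combination of rows sharing a date.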

Answer 2

Answered by Theodros Zelleke

Reset the indices and then merge on multiple (column-)keys:

dfLeft.reset_index(inplace=True)
dfRight.reset_index(inplace=True)
dfMerged = pd.merge(dfLeft, dfRight,
              left_on=['date', 'cusip'],
              right_on=['date', 'idc__id'],
              how='inner')

You can then reset 'date' as an index:

dfMerged.set_index('date', inplace=True)
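
The steps above can also be chained without `inplace=True`, which leaves the original frames untouched. This is a minimal sketch under stated assumptions: the sample rows are copied from the question, the right-hand rows are shuffled on purpose to mimic identifiers that do not line up, and both `validate="one_to_one"` and dropping the redundant 'idc__id' column are additions that the answer does not use.

```python
import pandas as pd

# Left frame: rows from the question, indexed by date.
dfLeft = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-04"]),
    "cusip": ["XXXX", "YYYY", "XXXX", "YYYY"],
    "factorL": [4.5, 6.2, 4.7, 6.1],
}).set_index("date")

# Right frame: same rows as the question, deliberately in a different order.
dfRight = pd.DataFrame({
    "date": pd.to_datetime(["2012-01-04", "2012-01-03", "2012-01-04", "2012-01-03"]),
    "idc__id": ["YYYY", "XXXX", "XXXX", "YYYY"],
    "factorR": [6.2, 5.0, 5.1, 6.0],
}).set_index("date")

# reset_index() returns copies with 'date' as an ordinary column.
merged = pd.merge(
    dfLeft.reset_index(),
    dfRight.reset_index(),
    left_on=["date", "cusip"],
    right_on=["date", "idc__id"],
    how="inner",
    validate="one_to_one",  # raise if a (date, id) pair is duplicated on either side
)

# 'idc__id' now duplicates 'cusip'; drop it and restore 'date' as the index.
merged = merged.drop(columns="idc__id").set_index("date")
print(merged)
```

If a (date, identifier) pair occurs more than once in either frame, `validate="one_to_one"` makes the merge fail with an error rather than silently multiplying rows.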

Here's an example:

raw1 = '''
2012-01-03    XXXX      4.5
2012-01-03    YYYY      6.2
2012-01-04    XXXX      4.7
2012-01-04    YYYY      6.1
'''

raw2 = '''
2012-01-03    XYXX      45.
2012-01-03    YYYY      62.
2012-01-04    XXXX      -47.
2012-01-05    YYYY      61.
'''

from io import StringIO

import pandas as pd

# Parse the whitespace-separated text; the leading blank line of each
# raw string is skipped by default.
df1 = pd.read_csv(StringIO(raw1), header=None,
                  sep=r'\s+', parse_dates=[0])
df2 = pd.read_csv(StringIO(raw2), header=None,
                  sep=r'\s+', parse_dates=[0])

df1.columns = ['date', 'cusip', 'factorL']
df2.columns = ['date', 'idc__id', 'factorL']

# Inner merge on both keys: only (date, id) pairs present in both frames survive.
print(pd.merge(df1, df2,
               left_on=['date', 'cusip'],
               right_on=['date', 'idc__id'],
               how='inner'))

which gives

                  date cusip  factorL_x idc__id  factorL_y
0  2012-01-03 00:00:00  YYYY        6.2    YYYY         62
1  2012-01-04 00:00:00  XXXX        4.7    XXXX        -47
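
For contrast, here is a hedged sketch of what happens when the frames are matched on the date index alone. It is one plausible reading of the row explosion reported in the question, not a reproduction of the original call: every left row for a date pairs with every right row for that date, so the row count grows with the square of the number of identifiers per date. The sample rows are copied from the question; the variable names are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(["2012-01-03", "2012-01-03", "2012-01-04", "2012-01-04"])
left = pd.DataFrame(
    {"cusip": ["XXXX", "YYYY", "XXXX", "YYYY"], "factorL": [4.5, 6.2, 4.7, 6.1]},
    index=pd.Index(dates, name="date"),
)
right = pd.DataFrame(
    {"idc__id": ["XXXX", "YYYY", "XXXX", "YYYY"], "factorR": [5.0, 6.0, 5.1, 6.2]},
    index=pd.Index(dates, name="date"),
)

# Matching on the (non-unique) date index only: 2 x 2 pairings per date.
by_date_only = pd.merge(left, right, left_index=True, right_index=True, how="inner")

# Matching on date and identifier: one row per (date, id) pair.
by_date_and_id = pd.merge(
    left.reset_index(),
    right.reset_index(),
    left_on=["date", "cusip"],
    right_on=["date", "idc__id"],
    how="inner",
)

print(len(by_date_only))    # 8
print(len(by_date_and_id))  # 4
```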
