Python: merge on a single level of a MultiIndex

Question

Asked by Johann Hibschman

Is there any way to merge on a single level of a MultiIndex without resetting the index?

I have a "static" table of time-invariant values, indexed by an ObjectID, and I have a "dynamic" table of time-varying fields, indexed by ObjectID+Date. I'd like to join these tables together.

Right now, the best I can think of is:

dynamic.reset_index().merge(static, left_on=['ObjectID'], right_index=True)

However, the dynamic table is very big, and I don't want to have to muck around with its index in order to combine the values.

Answer 1

Accepted answer by joelostblom

Yes. Since pandas 0.14.0, it is possible to merge a singly-indexed DataFrame with one level of a multi-indexed DataFrame using .join.

df1.join(df2, how='inner') # how='outer' keeps all records from both data frames
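A minimal sketch of this for the question's setup, using hypothetical dynamic and static tables (the ObjectID/Date values and the price/sector columns are made up for illustration). pandas matches the single index against the MultiIndex level that has the same name, so the static table's index must be named ObjectID:

```python
import pandas as pd

# Hypothetical "dynamic" table: time-varying values indexed by ObjectID + Date
dynamic = pd.DataFrame(
    {"price": [10.0, 11.0, 20.0, 21.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, "2020-01-01"), (1, "2020-01-02"),
         (2, "2020-01-01"), (2, "2020-01-02")],
        names=["ObjectID", "Date"],
    ),
)

# Hypothetical "static" table: time-invariant values indexed by ObjectID only
static = pd.DataFrame(
    {"sector": ["tech", "energy"]},
    index=pd.Index([1, 2], name="ObjectID"),
)

# The single index is aligned with the MultiIndex level of the same name;
# the result keeps dynamic's (ObjectID, Date) MultiIndex, no reset_index needed
result = dynamic.join(static, how="inner")
print(result)
```

If the index names do not overlap, pandas refuses the join instead of guessing a level, so renaming the static index is the usual first step.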

The pandas 0.14 docs describe this as equivalent to, but more memory-efficient and faster than:

pd.merge(df1.reset_index(),
         df2.reset_index(),
         on=['index1'],
         how='inner'
        ).set_index(['index1','index2'])
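To check the claimed equivalence on a small example, the sketch below builds two hypothetical tables (a frame indexed by ObjectID + Date and one indexed by ObjectID; the names and values are made up) and compares the .join result with the reset_index / merge / set_index round trip. pd.testing.assert_frame_equal raises an AssertionError if the two differ:

```python
import pandas as pd

# Hypothetical two-level table (ObjectID, Date) and single-index table (ObjectID)
dynamic = pd.DataFrame(
    {"price": [10.0, 11.0, 20.0]},
    index=pd.MultiIndex.from_tuples(
        [(1, "2020-01-01"), (1, "2020-01-02"), (2, "2020-01-01")],
        names=["ObjectID", "Date"],
    ),
)
static = pd.DataFrame({"sector": ["tech", "energy"]},
                      index=pd.Index([1, 2], name="ObjectID"))

# Level join: the (potentially large) left frame's index is never reset
via_join = dynamic.join(static, how="inner")

# Round trip: flatten both indexes, merge on the key column, rebuild the MultiIndex
via_merge = pd.merge(
    dynamic.reset_index(),
    static.reset_index(),
    on=["ObjectID"],
    how="inner",
).set_index(["ObjectID", "Date"])

# Raises if values, dtypes, or indexes differ
pd.testing.assert_frame_equal(via_join, via_merge)
```

The results match here; the difference claimed by the docs is in the intermediate copies the round trip creates, which this small check does not measure.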

The docs also mention that .join cannot be used to merge two multi-indexed DataFrames on a single level, and judging from the GitHub tracker discussion of the earlier issue, implementing this does not seem to be a priority:

So I merged in the single join (see #6363), along with some docs on how to do a multi-multi join. That's fairly complicated to actually implement, and IMHO not worth the effort, as it really doesn't change the memory usage/speed that much at all.

However, there is a GitHub conversation about this with some recent development: https://github.com/pydata/pandas/issues/6360. It is also possible to achieve this by resetting the indices, as mentioned earlier and as described in the docs.

Update for pandas >= 0.24.0

It is now possible to join two multi-indexed DataFrames with each other on their overlapping index levels. From the release notes:

import pandas as pd

index_left = pd.MultiIndex.from_tuples([('K0', 'X0'), ('K0', 'X1'),
                                        ('K1', 'X2')],
                                        names=['key', 'X'])

left = pd.DataFrame({'A': ['A0', 'A1', 'A2'],
                     'B': ['B0', 'B1', 'B2']}, index=index_left)

index_right = pd.MultiIndex.from_tuples([('K0', 'Y0'), ('K1', 'Y1'),
                                        ('K2', 'Y2'), ('K2', 'Y3')],
                                        names=['key', 'Y'])

right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']}, index=index_right)

left.join(right)

Out:

            A   B   C   D
key X  Y                 
K0  X0 Y0  A0  B0  C0  D0
    X1 Y0  A1  B1  C0  D0
K1  X2 Y1  A2  B2  C1  D1

[3 rows x 4 columns]

Answer 2

Answer by closedloop

I get around this by reindexing the DataFrame being merged in so that it carries the full MultiIndex, which makes a left join possible.

import pandas as pd

# Create the left data frame
# (MultiIndex's labels= keyword was renamed to codes= in pandas 0.24; labels= no longer works in pandas 1.0+)
idx = pd.MultiIndex(levels=[['a','b'],['c','d']], codes=[[0,0,1,1],[0,1,0,1]], names=['lvl1','lvl2'])
df = pd.DataFrame([1,2,3,4], index=idx, columns=['data'])

# Create the factor to join to the left data frame
newFactor = pd.DataFrame(['fact:'+str(x) for x in df.index.levels[0]], index=df.index.levels[0], columns=['newFactor'])

Then join on the sub-index by reindexing the newFactor DataFrame to the index of the left data frame:

df.join(newFactor.reindex(df.index,level=0))
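A self-contained sketch of this approach that also keeps the intermediate step visible. It builds the same two-level index with MultiIndex.from_product (a swap for readability, not the answer's original constructor). reindex(..., level=0) broadcasts each lvl1 value of newFactor onto every row of the MultiIndex that shares it, so the final join is an ordinary join on identical indexes:

```python
import pandas as pd

# Left frame with a two-level index (lvl1, lvl2)
idx = pd.MultiIndex.from_product([["a", "b"], ["c", "d"]], names=["lvl1", "lvl2"])
df = pd.DataFrame({"data": [1, 2, 3, 4]}, index=idx)

# Factor keyed only by the first level (its index is named "lvl1")
newFactor = pd.DataFrame(
    {"newFactor": ["fact:" + x for x in df.index.levels[0]]},
    index=df.index.levels[0],
)

# Broadcast newFactor across lvl2 so it carries the full (lvl1, lvl2) index
aligned = newFactor.reindex(df.index, level=0)

# Both frames now share the same index, so join simply lines up the rows
out = df.join(aligned)
print(out)
```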

Answer 3

Answer by Andor

For a single column, I would use a mapping:

df1['newcol'] = df1.index.get_level_values(-1).map(lambda x: df2.newcol[x])
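A runnable sketch of the mapping approach with hypothetical df1 and df2 (the level names, keys, and values are made up). It uses a variant of the one-liner: Index.map also accepts a Series, which does the lookup without a Python-level lambda, and keys missing from df2 come back as NaN instead of raising KeyError as df2.newcol[x] would:

```python
import pandas as pd

# Hypothetical two-level frame; the last level ("obj") is the lookup key
df1 = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.MultiIndex.from_tuples(
        [("2020-01-01", "x"), ("2020-01-01", "y"), ("2020-01-02", "x")],
        names=["date", "obj"],
    ),
)

# Hypothetical lookup table indexed by the key
df2 = pd.DataFrame({"newcol": [10, 20]}, index=pd.Index(["x", "y"], name="obj"))

# Map each row's last-level value through the df2["newcol"] Series
df1["newcol"] = df1.index.get_level_values(-1).map(df2["newcol"])
print(df1)
```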

Answer 4

Answer by Muthu

This works for me!

gData.columns = gData.columns.droplevel(0)

grpData = gData.reset_index()
grpData

pd.merge(grpData,cusData,how='inner')

Here gData is a multi-index DataFrame with two levels, and cusData is a single-index DataFrame.
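A runnable sketch of this recipe under stated assumptions: gData is taken to be a groupby aggregation whose columns have two levels, and cusData is assumed to hold the key as an ordinary column, because pd.merge without on= joins on the column names the two frames share. The tx and cust_id names and all values are hypothetical:

```python
import pandas as pd

# Hypothetical transactions aggregated per customer; a dict of lists in .agg
# produces two-level columns such as ("amount", "sum")
tx = pd.DataFrame({"cust_id": [1, 1, 2], "amount": [5.0, 7.0, 3.0]})
gData = tx.groupby("cust_id").agg({"amount": ["sum", "count"]})

# Hypothetical customer table with cust_id as a regular column
cusData = pd.DataFrame({"cust_id": [1, 2], "name": ["Ann", "Bob"]})

# Keep only the second column level ("sum", "count")
gData.columns = gData.columns.droplevel(0)
# Turn the cust_id row index into a column
grpData = gData.reset_index()

# With no on=, merge joins on the shared column "cust_id"
merged = pd.merge(grpData, cusData, how="inner")
print(merged)
```

Unlike the .join approaches, this discards the index and merges on columns, so it is closest to the reset_index workaround from the question.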
