pandas 理解熊猫合并中的“left_index”和“right_index”参数

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/51814612/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-09-14 05:55:51  来源:igfitidea点击:

Understanding the "left_index" and "right_index" arguments in pandas merge

pythonpandasjoinmerge

提问by ste_kwr

I am really struggling to understand the "left_index" and "right_index" arguments in pandas.merge. I read the documentation, searched around, experimented with various setting and tried to understand but I am still confused. Consider this example:

我真的很难理解 pandas.merge 中的“left_index”和“right_index”参数。我阅读了文档,四处搜索,尝试了各种设置并试图理解,但我仍然感到困惑。考虑这个例子:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'], 
                 'key2': ['K0', 'K1', 'K0', 'K1'],
                 'A': ['A0', 'A1', 'A2', 'A3'],
                 'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'E': [1,2,3,4]})

Now, when I run the following command:

现在,当我运行以下命令时:

pd.merge(left, right, left_on=['key2', 'key1'], right_on=['key1', 'key2'], how='outer', indicator=True, left_index=True)

I get:

我得到:

  key1_x key2_x    A    B key1_y key2_y    C    D    E      _merge
0     K0     K0   A0   B0     K0     K0   C0   D0  1.0        both
1     K0     K1   A1   B1     K1     K0   C1   D1  2.0        both
2     K0     K1   A1   B1     K1     K0   C2   D2  3.0        both
3     K1     K0   A2   B2    NaN    NaN  NaN  NaN  NaN   left_only
3     K2     K1   A3   B3    NaN    NaN  NaN  NaN  NaN   left_only
3    NaN    NaN  NaN  NaN     K2     K0   C3   D3  4.0  right_only

However, running the same with right_index=Truegives an error. Same if I introduce both. More interestingly, running the following merge gives a very unexpected result

但是,运行相同的 withright_index=True会出错。如果我同时介绍两者,则相同。更有趣的是,运行以下合并给出了一个非常意想不到的结果

pd.merge(left, right,  on=['key1', 'key2'],how='outer', validate = 'one_to_many', indicator=True, left_index = True, right_index = True)

Result is:

结果是:

  key1 key2   A   B   C   D  E _merge
0   K0   K0  A0  B0  C0  D0  1   both
1   K0   K1  A1  B1  C1  D1  2   both
2   K1   K0  A2  B2  C2  D2  3   both
3   K2   K1  A3  B3  C3  D3  4   both

As you can see, all information for right frame for key1and key2is completely lost.

正如你所看到的,所有的信息,右侧框架key1key2完全丧失。

Please help me understand the purpose and function of these arguments. Thank you.

请帮助我理解这些参数的目的和功能。谢谢你。

回答by Tam Le

Merging happens in a couple of ways:

合并有两种方式:

Column-Column Merge:Use left_on, right_on and how.

列列合并:使用 left_on、right_on 和 how。

Example:

例子:

# Gives same answer
pd.merge(left, right, left_on=['key2', 'key1'], right_on=['key1', 'key2'], how = 'outer')
pd.merge(left, right, on=['key1', 'key2'], how='outer', indicator=True)

Index-Index Merge:Set left_index and right_index to True or use on and use how.

索引索引合并:将 left_index 和 right_index 设置为 True 或使用 on 并使用 how。

Example:

例子:

pd.merge(left, right, how = 'inner', right_index = True, left_index = True)
# If you make matching unique multi-indexes for both data frames you can do
# pd.merge(left, right, how = 'inner', on = ['indexname1', 'indexname2'])
# In your data frames, you're keys duplicate values so you can't do this
# In general, a column with duplicate values does not make a good key

Column-Index Merge:Use left_on + right_index or left_index + right_on and how.

列索引合并:使用 left_on + right_index 或 left_index + right_on 以及如何使用。

Note: Both the values in index and left_on must match. If you're index is a integer and you're left_on is a string, you get error. Also, number of indexing levels must match.

注意: index 和 left_on 中的值都必须匹配。如果你的 index 是一个整数而你的 left_on 是一个字符串,你会得到错误。此外,索引级别的数量必须匹配。

Example:

例子:

# If how not specified, inner join is used
pd.merge(left, right, right_on=['E'], left_index = True, how = 'outer')  

# Gives error because left_on is string and right_index is integer
pd.merge(left, right, left_on=['key1'], right_index = True, how = 'outer')

# This gave you error because left_on has indexing level of 2 but right_index only has indexing level of 1.
pd.merge(left, right, left_on=['key2', 'key1'], right_on=['key1', 'key2'], how='outer', indicator=True, right_index=True)

You kind of mix up the different types of merges which gave weird results. If you can't see how the merging is going to happen conceptually, chances are a computer isn't going to do any better.

你有点混淆了不同类型的合并,这会产生奇怪的结果。如果您无法从概念上看出合并将如何发生,那么计算机很可能不会做得更好。

回答by gyoza

If I understand the behavior of mergecorrectly, you should pick only one option for leftand rightrespectively (i.e. You should notpick left_on=['x']and left_index=Trueat the same time). Otherwise, strange thing can happen in arbitrary way since it confuses mergeas to which keyshould be actually used as you have shown in current implementation of merge(I have not checked the pandas source in detail, but the behavior can change for different implementations in each version). Here is a small experiment.

如果我理解merge正确的行为,您应该分别为left和选择一个选项right(即您不应该同时选择left_on=['x']left_index=True)。否则,奇怪的事情可以在任意的方式发生,因为它混淆merge哪个key应该被实际使用,你在目前执行的证明merge(我没有检查详细大Pandas源,但行为可以在每个版本的不同实现改变) . 这是一个小实验。

>>> left
  key1 key2   A   B
0   K0   K0  A0  B0
1   K0   K1  A1  B1
2   K1   K0  A2  B2
3   K2   K1  A3  B3

>>> right
  key1 key2   C   D  E
0   K0   K0  C0  D0  1
1   K1   K0  C1  D1  2
2   K1   K0  C2  D2  3
3   K2   K0  C3  D3  4

(1) mergeusing ['key1', 'key2']

(1)merge使用['key1', 'key2']

>>> pd.merge(left, right, on=['key1', 'key2'], how='outer')

  key1 key2    A    B    C    D    E
0   K0   K0   A0   B0   C0   D0  1.0
1   K0   K1   A1   B1  NaN  NaN  NaN
2   K1   K0   A2   B2   C1   D1  2.0
3   K1   K0   A2   B2   C2   D2  3.0
4   K2   K1   A3   B3  NaN  NaN  NaN
5   K2   K0  NaN  NaN   C3   D3  4.0

(2) Set ['key1', 'key2']as leftindex and mergeit using the index and keys

(2) 设置['key1', 'key2']left索引并merge使用索引和键

>>> left = left.set_index(['key1', 'key2'])
>>> pd.merge(left, right, left_index=True, right_on=['key1', 'key2'], how='outer').reset_index(drop=True)

     A    B key1 key2    C    D    E
0   A0   B0   K0   K0   C0   D0  1.0
1   A1   B1   K0   K1  NaN  NaN  NaN
2   A2   B2   K1   K0   C1   D1  2.0
3   A2   B2   K1   K0   C2   D2  3.0
4   A3   B3   K2   K1  NaN  NaN  NaN
5  NaN  NaN   K2   K0   C3   D3  4.0

(3) Further set ['key1', 'key2']as rightindex and mergeit using the index

(3) 进一步设置['key1', 'key2']right索引并merge使用索引

>>> right = right.set_index(['key1', 'key2'])
>>> pd.merge(left, right, left_index=True, right_index=True, how='outer').reset_index()

  key1 key2    A    B    C    D    E
0   K0   K0   A0   B0   C0   D0  1.0
1   K0   K1   A1   B1  NaN  NaN  NaN
2   K1   K0   A2   B2   C1   D1  2.0
3   K1   K0   A2   B2   C2   D2  3.0
4   K2   K0  NaN  NaN   C3   D3  4.0
5   K2   K1   A3   B3  NaN  NaN  NaN

Please note that (1)(2)(3) above are showing the same results, and even if ['key1', 'key2']are set as index, you can still use left_on = ['key1', 'key2']instead of left_index=True.

请注意,上面的 (1)(2)(3) 显示的结果相同,即使['key1', 'key2']设置为索引,您仍然可以使用left_on = ['key1', 'key2']代替left_index=True

Now, if you really want to merge using both ['key1', 'key2']with index, one way to achieve this is:

现在,如果您真的想同时使用['key1', 'key2']with index,实现此目的的一种方法是:

>>> pd.merge(left.reset_index(), right.reset_index(), on=['index', 'key1', 'key2'], how='outer')

   index key1 key2    A    B    C    D    E
0      0   K0   K0   A0   B0   C0   D0  1.0
1      1   K0   K1   A1   B1  NaN  NaN  NaN
2      2   K1   K0   A2   B2   C2   D2  3.0
3      3   K2   K1   A3   B3  NaN  NaN  NaN
4      1   K1   K0  NaN  NaN   C1   D1  2.0
5      3   K2   K0  NaN  NaN   C3   D3  4.0

If you read down to here, I'm pretty sure now you know how to achieve above using multiple different ways. Hope this helps.

如果你读到这里,我很确定现在你知道如何使用多种不同的方式实现上述目标。希望这可以帮助。