Notice: this page is a translated copy of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/20206615/

Time: 2020-08-18 19:53:09  Source: igfitidea

How can a pandas merge preserve order?

Tags: python, pandas

Asked by user2543623

I have two DataFrames in pandas and am trying to merge them, but pandas keeps changing the order. I've tried setting indexes and resetting them, but no matter what I do, I can't get the returned output to have the rows in the same order. Is there a trick? Note that we start out with the loans in the order 'a,b,c', but after the merge the order is 'a,c,b'.

import pandas
loans = [  'a',  'b', 'c' ]
states = [  'OR',  'CA', 'OR' ]
x = pandas.DataFrame({ 'loan' : loans, 'state' : states })
y = pandas.DataFrame({ 'state' : [ 'CA', 'OR' ], 'value' : [ 1, 2]})
z = x.merge(y, how='left', on='state')

But now the order is no longer the original 'a,b,c'. Any ideas? I'm using pandas version 0.11.

Answered by abarnert

Hopefully someone will provide a better answer, but in case no one does, this will definitely work, so…

Zeroth, I'm assuming you don't want to just end up sorted on loan, but to preserve whatever original order was in x, which may or may not have anything to do with the order of the loan column. (Otherwise, the problem is easier, and less interesting.)

First, you're asking it to sort based on the join keys. As the docs explain, that's the default when you don't pass a sort argument.
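To see what sorting on the join keys does to the rows, here is a minimal sketch that passes sort=True explicitly. It reuses the question's data; the name sorted_by_key is only illustrative. Note that in recent pandas releases the sort argument of merge defaults to False, so the explicit flag is needed to reproduce the key-sorted result there.

```python
import pandas as pd

x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})

# sort=True asks merge to order the result by the join key ('state'),
# so the rows no longer follow the original order of x.
sorted_by_key = x.merge(y, how="left", on="state", sort=True)
print(sorted_by_key)
```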

Second, if you don't sort based on the join keys, the rows will end up grouped together, such that two rows that merged from the same source row end up next to each other, which means you're still going to get a, c, b.

You can work around this by getting the rows grouped together in the order they appear in the original x by just merging again with x (on either side, it doesn't really matter), or by reindexing based on x if you prefer. Like this:

x.merge(x.merge(y, how='left', on='state', sort=False))


Alternatively, you can cram an x-index in there with reset_index, then just sort on that, like this:

# DataFrame.sort() was removed in later pandas releases; sort_values() replaces it
x.reset_index().merge(y, how='left', on='state', sort=False).sort_values('index')


Either way obviously seems a bit wasteful, and clumsy… so, as I said, hopefully there's a better answer that I'm just not seeing at the moment. But if not, that works.
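For reference, the reset_index approach can be sketched end to end as below. This is only an illustration on the question's data, assuming x has the default integer index (so reset_index() produces a column named 'index'); the names merged and z are illustrative, and the final drop/reset steps are an optional cleanup that the answer above does not include.

```python
import pandas as pd

x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})

# reset_index() copies x's row positions into a column named 'index'.
merged = x.reset_index().merge(y, how="left", on="state", sort=False)

# Sorting on that column restores x's original row order, whatever
# order merge returned; then the helper column is dropped again.
z = merged.sort_values("index").drop(columns="index").reset_index(drop=True)
print(z)
```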

Answered by KCzar

The fastest way I've found to merge and restore order - if you are merging "left" - is to include the original order as a column in the left dataframe before merging, then use that to restore the order after merging:

import pandas
loans = [  'a',  'b', 'c' ]
states = [  'OR',  'CA', 'OR' ]
x = pandas.DataFrame({ 'loan' : loans, 'state' : states })
y = pandas.DataFrame({ 'state' : [ 'CA', 'OR' ], 'value' : [ 1, 2]})

import numpy as np
x["Order"] = np.arange(len(x))

# .ix was removed in pandas 1.0; .loc does the same label-based lookup here
z = x.merge(y, how='left', on='state').set_index("Order").loc[np.arange(len(x)), :]

This method is faster than sorting. Here it is as a function:

def mergeLeftInOrder(x, y, on=None):
    x = x.copy()
    x["Order"] = np.arange(len(x))
    # .ix was removed in pandas 1.0; .loc does the same label-based lookup here
    z = x.merge(y, how='left', on=on).set_index("Order").loc[np.arange(len(x)), :]
    return z
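One case worth checking is a right-hand frame with repeated keys, where a single row of x expands into several merged rows. The sketch below is a variant of the same order-column idea, not the function above: the name merge_left_in_order and the duplicated 'CA' row in y are made up for illustration, and it restores order with a stable sort rather than the index lookup, so it does not carry the speed claim.

```python
import numpy as np
import pandas as pd


def merge_left_in_order(x, y, on=None):
    """Left-merge y into x, then put the rows back in x's original order."""
    x = x.copy()
    x["Order"] = np.arange(len(x))  # remember each row's position in x
    z = x.merge(y, how="left", on=on)
    # mergesort is stable, so rows that share an Order value stay together
    # in the order merge produced them.
    z = z.sort_values("Order", kind="mergesort")
    return z.drop(columns="Order").reset_index(drop=True)


x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
# Hypothetical data: 'CA' appears twice, so loan 'b' yields two rows.
y = pd.DataFrame({"state": ["CA", "CA", "OR"], "value": [1, 10, 2]})

z = merge_left_in_order(x, y, on="state")
print(z)
```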

Answered by Claygirl

Pandas v0.8.0 introduced a new merge function that takes order into consideration, ordered_merge, so your solution is now as simple as:

z = pandas.ordered_merge(x, y, on='state')

Answered by filup

Use pd.merge_ordered(), documentation here.

For your example,

z = pd.merge_ordered(x, y, how='left', on='state')

EDIT: Just wanted to point out that the default behavior of this function is an outer merge, unlike the default behavior of the more common .merge().
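The difference in defaults is easy to see once the right-hand frame contains a key that the left one lacks. In this small sketch the 'WA' row is made up for illustration and the variable names are arbitrary; it only compares which rows survive, not their ordering.

```python
import pandas as pd

x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
# Hypothetical extra key: 'WA' appears only in y.
y = pd.DataFrame({"state": ["CA", "OR", "WA"], "value": [1, 2, 3]})

outer_default = pd.merge_ordered(x, y, on="state")          # how='outer' by default
left_only = pd.merge_ordered(x, y, on="state", how="left")  # keeps only x's keys
inner_default = x.merge(y, on="state")                      # how='inner' by default

print(len(outer_default), len(left_only), len(inner_default))
```

With the default outer merge, the unmatched 'WA' row comes through with a missing loan value, which is why passing how='left' matters for the question's use case.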

Answered by Laurent T

I might have a much simpler solution:

df_z = df_x.join(df_y.set_index('state'), on = 'state')
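Spelled out on the question's data, that line might look like the sketch below. It assumes df_x and df_y are the question's x and y, and that the 'state' values in df_y are unique so each row of df_x matches at most one row.

```python
import pandas as pd

df_x = pd.DataFrame({"loan": ["a", "b", "c"], "state": ["OR", "CA", "OR"]})
df_y = pd.DataFrame({"state": ["CA", "OR"], "value": [1, 2]})

# join() defaults to how='left': every row of df_x is kept, and 'value'
# is looked up through df_y's 'state' index.
df_z = df_x.join(df_y.set_index("state"), on="state")
print(df_z)
```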

Hope it helps
