Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18838274/

Time: 2020-09-13 21:09:42  Source: igfitidea

Merge a list of DataFrames on a column?

Tags: python, filter, merge, pandas, dataframe

Asked by alokv28

I am having trouble combining an array of DataFrames into a single DataFrame, merged on a specific column.

I have a list of DataFrames called data, with each element, data[i], looking like this:

     Rank  Name
2400    1 name1
2401    2 name2
2402    3 name3
2403    4 name4
2404    5 name5

Each DataFrame contains a Top 5 list for a given month, and the list contains the monthly results for a year.

I would like the final, merged DataFrame to look like this:

     Rank  Name_month1 Name_month2 Name_month3 ...
2400    1        name1       name1       name1 ...
2401    2        name2       name2       name2 ...
2402    3        name3       name3       name3 ...
2403    4        name4       name4       name4 ...
2404    5        name5       name5       name5 ...

where each column, after the first, corresponds to a monthly rank.

I have no problem merging 2 DataFrames from the list, data:

pandas.merge(data[0], data[1], on='Rank', suffixes=['_month1', '_month2'])

But when I try to use filter() or chain .merge calls, I keep running into trouble.

Any thoughts? Thanks!

Accepted answer by Viktor Kerkez

The problem is that the first merge changes the column names by adding suffixes, so there is no name collision on the second merge and its suffixes are never used. The solution is to rename the columns manually after each merge.
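
A quick way to see the missing collision is to print the column names after each merge. This is only a sketch: the three-row frame is a hypothetical stand-in for one month of data, and the suffix values in the second merge are arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for one monthly Top-N frame.
df = pd.DataFrame({"Rank": [1, 2, 3], "Name": ["name1", "name2", "name3"]})

# First merge: both sides have a "Name" column, so the suffixes are applied.
first = df.merge(df, on="Rank", suffixes=("_month1", "_month2"))

# Second merge: the left side no longer has a plain "Name" column,
# so nothing collides and the suffixes argument goes unused.
second = first.merge(df, on="Rank", suffixes=("_month2", "_month3"))

print(list(first.columns))   # ['Rank', 'Name_month1', 'Name_month2']
print(list(second.columns))  # ['Rank', 'Name_month1', 'Name_month2', 'Name']
```

The third column comes back as plain Name, which is why it has to be renamed by hand.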

In [2]: df
Out[2]:       Rank   Name
        2400     1  name1
        2401     2  name2
        2402     3  name3
        2403     4  name4
        2404     5  name5
In [3]: df.merge(
            df, on='Rank', suffixes=['_month1', '_month2']
        ).merge(df, on='Rank').rename(
            columns={'Name': 'Name_month3'}
        ).merge(df, on='Rank').rename(
            columns={'Name': 'Name_month4'}
        )
Out[3]:    Rank Name_month1 Name_month2 Name_month3 Name_month4
        0     1       name1       name1       name1       name1
        1     2       name2       name2       name2       name2
        2     3       name3       name3       name3       name3
        3     4       name4       name4       name4       name4
        4     5       name5       name5       name5       name5

If you have a list of DataFrames just do:

In [4]: data = [df, df, df, df]
        current = data[0].rename(columns={'Name': 'Name_month1'})
        for i, frame in enumerate(data[1:], 2):
            current = current.merge(frame, on='Rank').rename(
                         columns={'Name': 'Name_month%d' % i})
        current
Out[4]:    Rank Name_month1 Name_month2 Name_month3 Name_month4
        0     1       name1       name1       name1       name1
        1     2       name2       name2       name2       name2
        2     3       name3       name3       name3       name3
        3     4       name4       name4       name4       name4
        4     5       name5       name5       name5       name5
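
The same idea can also be written without an explicit loop. The sketch below is a variant, not the answer's own code: it renames every frame before merging, so no collision can occur, and then folds the list with functools.reduce. The helper name merge_on_rank and its key and value parameters are hypothetical, and the sample frame is made up to match the shape shown in the question.

```python
from functools import reduce

import pandas as pd


def merge_on_rank(frames, key="Rank", value="Name"):
    """Merge frames on `key`, renaming `value` to `<value>_month<i>` first."""
    # Rename up front so every frame contributes a uniquely named column.
    renamed = [
        frame.rename(columns={value: f"{value}_month{i}"})
        for i, frame in enumerate(frames, 1)
    ]
    # Fold the list pairwise: ((f1 merge f2) merge f3) merge ...
    return reduce(lambda left, right: left.merge(right, on=key), renamed)


# Made-up sample data; a real list would hold one frame per month.
df = pd.DataFrame({
    "Rank": [1, 2, 3, 4, 5],
    "Name": ["name1", "name2", "name3", "name4", "name5"],
})
merged = merge_on_rank([df, df, df, df])
print(merged.columns.tolist())
```

Renaming first keeps the month number tied to each frame's position in the list rather than to the order in which merges happen. Note that reduce raises a TypeError on an empty list, so the helper assumes at least one frame.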