Python Pandas concatenate/merge DataFrames using index or column identifiers

Note: this page is taken from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/24712647/

Date: 2020-09-13 22:15:39  Source: igfitidea

Tags: python, pandas, merge, dataframe, concat

Asked by tryptofame

I would like to use the 'pandas.concat' method to merge two DataFrames, but I don't fully understand all 'pandas.concat' arguments. I've got two DataFrames, which have the same identifying variables in the columns, but differ in one single column.

import pandas as pd
dict_data = {'Treatment': ['C', 'C', 'C'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'mz':[500.0, 500.5, 501.0]}
df_a = pd.DataFrame(dict_data)
dict_data = {'Treatment': ['C', 'C', 'C'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'inte':[1100.0, 1050.0, 1010.0]}
df_b = pd.DataFrame(dict_data)

df_a

        AAseq Biorep  Techrep Treatment     mz
0  ELVISLIVES      A        1         C  500.0
1  ELVISLIVES      A        1         C  500.5
2  ELVISLIVES      A        1         C  501.0

df_b

        AAseq Biorep  Techrep Treatment  inte
0  ELVISLIVES      A        1         C  1100
1  ELVISLIVES      A        1         C  1050
2  ELVISLIVES      A        1         C  1010

I can add the column the following way:

df_m = df_a.copy()
df_m['inte'] = df_b['inte']

        AAseq Biorep  Techrep Treatment     mz  inte
0  ELVISLIVES      A        1         C  500.0  1100
1  ELVISLIVES      A        1         C  500.5  1050
2  ELVISLIVES      A        1         C  501.0  1010

My real data is much more complex, and I'm afraid that the method above could put values in the wrong rows (especially since I want to use 'pandas.melt' beforehand).
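
On the ordering concern: assigning a Series to a DataFrame column aligns on index labels rather than on row position, so a reordered `df_b` alone does not scramble the values as long as both frames still share the same index labels. A minimal sketch, using small hypothetical frames (`df_b_shuffled` and `df_p` are illustrative names, not part of the original data):

```python
import pandas as pd

# Small hypothetical frames; both start with the default index 0, 1, 2.
df_a = pd.DataFrame({'Treatment': ['A', 'B', 'C'], 'mz': [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({'Treatment': ['A', 'B', 'C'], 'inte': [1100.0, 1050.0, 1010.0]})

# Reorder the rows of df_b; each row keeps its original index label.
df_b_shuffled = df_b.iloc[[2, 0, 1]]

# Assigning a Series aligns on index labels, so row order does not matter.
df_m = df_a.copy()
df_m['inte'] = df_b_shuffled['inte']

# Assigning a plain array is positional, so the shuffled order leaks through.
df_p = df_a.copy()
df_p['inte'] = df_b_shuffled['inte'].to_numpy()

print(df_m)
print(df_p)
```

The assignment is only as reliable as the index, though: if a step such as melt or reset_index rebuilds the index, the labels no longer identify the original rows, and matching on the identifier columns is the safer route.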

When using:

dfm = pd.concat([df_a, df_b])

        AAseq Biorep  Techrep Treatment  inte     mz
0  ELVISLIVES      A        1         C   NaN  500.0
1  ELVISLIVES      A        1         C   NaN  500.5
2  ELVISLIVES      A        1         C   NaN  501.0
0  ELVISLIVES      A        1         C  1100    NaN
1  ELVISLIVES      A        1         C  1050    NaN
2  ELVISLIVES      A        1         C  1010    NaN

The concatenated DataFrame stacks the rows of both frames, which leaves NaN values in the columns that exist in only one of them.

Question: How can I achieve the same result as the column assignment above (df_m) using 'concat'?

Thank you for your support!

Answered by furas

Using

print(pd.concat((df_a, df_b['inte']), axis=1))

you can get

        AAseq Biorep  Techrep Treatment     mz  inte
0  ELVISLIVES      A        1         C  500.0  1100
1  ELVISLIVES      A        1         C  500.5  1050
2  ELVISLIVES      A        1         C  501.0  1010

Is this what you expected?

Or maybe your data is more complicated, like this (note the different values in the Treatment column):

        AAseq Biorep  Techrep Treatment     mz
0  ELVISLIVES      A        1         A  500.0
1  ELVISLIVES      A        1         B  500.5
2  ELVISLIVES      A        1         C  501.0

        AAseq Biorep  Techrep Treatment  inte
0  ELVISLIVES      A        1         C  1100
1  ELVISLIVES      A        1         B  1050
2  ELVISLIVES      A        1         A  1010

and you need to match the rows using the values in the columns AAseq, Biorep, Techrep and Treatment. In that case, use merge:

import pandas as pd
dict_data = {
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['A', 'B', 'C'],
    'mz':[500.0, 500.5, 501.0]
}
df_a = pd.DataFrame(dict_data)

dict_data = {
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['C', 'B', 'A'],
    'inte':[1100.0, 1050.0, 1010.0]
}
df_b = pd.DataFrame(dict_data)

print(pd.merge(left=df_a, right=df_b, on=['AAseq', 'Biorep', 'Techrep', 'Treatment']))

result:

        AAseq Biorep  Techrep Treatment     mz  inte
0  ELVISLIVES      A        1         A  500.0  1010
1  ELVISLIVES      A        1         B  500.5  1050
2  ELVISLIVES      A        1         C  501.0  1100
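
Since the question asked specifically about concat, here is one possible sketch of the same alignment without merge. It assumes the identifier columns together form a unique key in each frame, and moves them into the index so that concat with axis=1 lines the rows up on them. The names `keys`, `left`, `right` and `df_c` are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# Same data as in the merge example above.
df_a = pd.DataFrame({
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['A', 'B', 'C'],
    'mz': [500.0, 500.5, 501.0],
})
df_b = pd.DataFrame({
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['C', 'B', 'A'],
    'inte': [1100.0, 1050.0, 1010.0],
})

# Assumption: these columns uniquely identify a row in both frames.
keys = ['AAseq', 'Biorep', 'Techrep', 'Treatment']

# Move the identifiers into the index so concat aligns on them, not on position.
left = df_a.set_index(keys)
right = df_b.set_index(keys)

# axis=1 places the frames side by side, matching rows by index label.
df_c = pd.concat([left, right], axis=1).reset_index()
print(df_c)
```

If the key combination is not unique, as in the question's first example where every identifier value is identical, alignment on those keys is ambiguous and this approach is not a good fit.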