Python Pandas: concatenate/merge DataFrames using index or column identifiers

Question

Asked by tryptofame

I would like to use the 'pandas.concat' method to merge two DataFrames, but I don't fully understand all 'pandas.concat' arguments. I've got two DataFrames, which have the same identifying variables in the columns, but differ in one single column.

import pandas as pd
dict_data = {'Treatment': ['C', 'C', 'C'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'mz':[500.0, 500.5, 501.0]}
df_a = pd.DataFrame(dict_data)
dict_data = {'Treatment': ['C', 'C', 'C'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'inte':[1100.0, 1050.0, 1010.0]}
df_b = pd.DataFrame(dict_data)

df_a

        AAseq   Biorep  Techrep Treatment   mz
0    ELVISLIVES  A   1   C   500.0
1    ELVISLIVES  A   1   C   500.5
2    ELVISLIVES  A   1   C   501.0

df_b

        AAseq   Biorep  Techrep Treatment   inte
0    ELVISLIVES  A   1   C   1100
1    ELVISLIVES  A   1   C   1050
2    ELVISLIVES  A   1   C   1010

I can add the column the following way:

df_m = df_a.copy()
df_m['inte'] = df_b['inte']

        AAseq   Biorep  Techrep Treatment   mz      inte
0    ELVISLIVES  A   1   C   500.0   1100
1    ELVISLIVES  A   1   C   500.5   1050
2    ELVISLIVES  A   1   C   501.0   1010

My real data is much more complex, and I'm afraid that the method above could put values in the wrong rows (especially since I want to use 'pandas.melt' beforehand).

When using:

dfm = pd.concat([df_a, df_b])

     AAseq  Biorep  Techrep Treatment   inte    mz
0    ELVISLIVES  A   1   C   NaN     500.0
1    ELVISLIVES  A   1   C   NaN     500.5
2    ELVISLIVES  A   1   C   NaN     501.0
0    ELVISLIVES  A   1   C   1100    NaN
1    ELVISLIVES  A   1   C   1050    NaN
2    ELVISLIVES  A   1   C   1010    NaN

The concatenated DataFrame stacks the two inputs row-wise, which leaves NaN values in the columns that each input lacks.

Question: How can I achieve the same result as the column assignment (shown above) using 'concat'?

Thank you for your support!

Answer 1

Answered by furas

Using

print(pd.concat((df_a, df_b['inte']), axis=1))

you can get

        AAseq Biorep  Techrep Treatment     mz  inte
0  ELVISLIVES      A        1         C  500.0  1100
1  ELVISLIVES      A        1         C  500.5  1050
2  ELVISLIVES      A        1         C  501.0  1010

Is this what you expected?
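
One thing worth knowing about the call above: with axis=1, concat pairs rows by index label, not by position. The sketch below is only an illustration with made-up, simplified frames (the reversed index on df_b is an assumption, chosen to make the effect visible); it is not the asker's real data.

```python
import pandas as pd

# Simplified stand-ins for the frames in the question (hypothetical data).
df_a = pd.DataFrame({"Treatment": ["A", "B", "C"], "mz": [500.0, 500.5, 501.0]})
# Same number of rows, but the index labels run in reverse order.
df_b = pd.DataFrame({"inte": [1010.0, 1050.0, 1100.0]}, index=[2, 1, 0])

# axis=1 aligns on index labels: label 0 of df_a meets label 0 of df_b.
by_label = pd.concat([df_a, df_b["inte"]], axis=1)
print(by_label)

# Dropping the old index first forces a purely positional pairing instead.
by_position = pd.concat([df_a, df_b["inte"].reset_index(drop=True)], axis=1)
print(by_position)
```

So if an earlier step such as melt, sorting or filtering changed the index, it is worth checking which of the two pairings you actually want.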

Or maybe your data is more complicated, like this (note the different values in the Treatment column)

        AAseq Biorep  Techrep Treatment     mz
0  ELVISLIVES      A        1         A  500.0
1  ELVISLIVES      A        1         B  500.5
2  ELVISLIVES      A        1         C  501.0

        AAseq Biorep  Techrep Treatment  inte
0  ELVISLIVES      A        1         C  1100
1  ELVISLIVES      A        1         B  1050
2  ELVISLIVES      A        1         A  1010

and you need to match rows using the values in the columns AAseq, Biorep, Techrep and Treatment. Then use merge:

import pandas as pd
dict_data = {
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['A', 'B', 'C'],
    'mz':[500.0, 500.5, 501.0]
}
df_a = pd.DataFrame(dict_data)

dict_data = {
    'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
    'Biorep': ['A', 'A', 'A'],
    'Techrep': [1, 1, 1],
    'Treatment': ['C', 'B', 'A'],
    'inte':[1100.0, 1050.0, 1010.0]
}
df_b = pd.DataFrame(dict_data)

print(pd.merge(left=df_a, right=df_b, on=['AAseq', 'Biorep', 'Techrep', 'Treatment']))

result:

        AAseq Biorep  Techrep Treatment     mz  inte
0  ELVISLIVES      A        1         A  500.0  1010
1  ELVISLIVES      A        1         B  500.5  1050
2  ELVISLIVES      A        1         C  501.0  1100
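
If you would still rather use concat, one possible approach is to move the identifier columns into the index first, so that concat aligns on them. This is a minimal sketch, assuming the identifier columns uniquely identify each row (that holds for the example with different Treatment values, but not for the original data in the question, where all three rows share the same identifiers). The variable names keys, combined and checked are only for illustration.

```python
import pandas as pd

keys = ["AAseq", "Biorep", "Techrep", "Treatment"]

df_a = pd.DataFrame({
    "AAseq": ["ELVISLIVES", "ELVISLIVES", "ELVISLIVES"],
    "Biorep": ["A", "A", "A"],
    "Techrep": [1, 1, 1],
    "Treatment": ["A", "B", "C"],
    "mz": [500.0, 500.5, 501.0],
})
df_b = pd.DataFrame({
    "AAseq": ["ELVISLIVES", "ELVISLIVES", "ELVISLIVES"],
    "Biorep": ["A", "A", "A"],
    "Techrep": [1, 1, 1],
    "Treatment": ["C", "B", "A"],
    "inte": [1100.0, 1050.0, 1010.0],
})

# With the identifiers as index, concat(axis=1) pairs rows by identifier values.
combined = pd.concat([df_a.set_index(keys), df_b.set_index(keys)], axis=1).reset_index()
print(combined)

# merge can also verify the uniqueness assumption: validate="one_to_one"
# raises MergeError if the keys are duplicated on either side.
checked = pd.merge(df_a, df_b, on=keys, validate="one_to_one")
print(checked)
```

With duplicated identifiers neither approach can tell the rows apart, so in that case you need an extra column (for example a measurement counter) to make each row unique.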
