Python Pandas concatenate/merge DataFrames using index or column identifiers
Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/24712647/
Question by tryptofame
I would like to use the 'pandas.concat' method to merge two DataFrames, but I don't fully understand all of its arguments. I have two DataFrames that share the same identifying columns but differ in one single column.
import pandas as pd
# Keys listed in the column order shown in the printed output below
dict_data = {'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'Treatment': ['C', 'C', 'C'], 'mz': [500.0, 500.5, 501.0]}
df_a = pd.DataFrame(dict_data)
dict_data = {'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'], 'Biorep': ['A', 'A', 'A'], 'Techrep': [1, 1, 1], 'Treatment': ['C', 'C', 'C'], 'inte': [1100.0, 1050.0, 1010.0]}
df_b = pd.DataFrame(dict_data)
df_a
        AAseq  Biorep  Techrep  Treatment     mz
0  ELVISLIVES       A        1          C  500.0
1  ELVISLIVES       A        1          C  500.5
2  ELVISLIVES       A        1          C  501.0
df_b
        AAseq  Biorep  Techrep  Treatment  inte
0  ELVISLIVES       A        1          C  1100
1  ELVISLIVES       A        1          C  1050
2  ELVISLIVES       A        1          C  1010
I can add the column the following way:
df_m = df_a.copy()
df_m['inte'] = df_b['inte']
        AAseq  Biorep  Techrep  Treatment     mz  inte
0  ELVISLIVES       A        1          C  500.0  1100
1  ELVISLIVES       A        1          C  500.5  1050
2  ELVISLIVES       A        1          C  501.0  1010
My real data is much more complex, and I'm afraid that the method above could put the values into the wrong rows (especially since I want to use 'pandas.melt' beforehand).
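A minimal sketch of why that worry is justified, assuming the question's df_a and df_b: an assignment like `df_m['inte'] = df_b['inte']` matches rows by index label, not by row position and not by the identifier columns. The reordered copy `df_b_shuffled` is hypothetical, made up only to show the effect.

```python
import pandas as pd

# The question's first pair of DataFrames (identifier values identical in every row).
df_a = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'mz': [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'inte': [1100.0, 1050.0, 1010.0]})

# Hypothetical reordering: the rows move, but each keeps its original index label.
df_b_shuffled = df_b.iloc[[2, 0, 1]]

aligned = df_a.copy()
# Assignment matches on index labels, so every value returns to its original row.
aligned['inte'] = df_b_shuffled['inte']

misaligned = df_a.copy()
# reset_index(drop=True) relabels the shuffled rows 0, 1, 2,
# so the values now land in the wrong rows.
misaligned['inte'] = df_b_shuffled.reset_index(drop=True)['inte']

print(aligned['inte'].tolist())
print(misaligned['inte'].tolist())
```

Because every row here has identical identifier values, the index is the only thing linking a row of df_a to a row of df_b; if an earlier step renumbers the rows, matching on identifier columns that actually differ between rows is safer.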
When using:
dfm = pd.concat([df_a, df_b])
        AAseq  Biorep  Techrep  Treatment  inte     mz
0  ELVISLIVES       A        1          C   NaN  500.0
1  ELVISLIVES       A        1          C   NaN  500.5
2  ELVISLIVES       A        1          C   NaN  501.0
0  ELVISLIVES       A        1          C  1100    NaN
1  ELVISLIVES       A        1          C  1050    NaN
2  ELVISLIVES       A        1          C  1010    NaN
The concatenated DataFrame stacks the rows of df_b below the rows of df_a, so every row has NaN in the column that only the other DataFrame has.
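A short sketch reproducing this, assuming the question's data: concat's default axis=0 appends rows, keeps the original index labels (so they repeat), and fills the columns missing from either frame with NaN.

```python
import pandas as pd

df_a = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'mz': [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'inte': [1100.0, 1050.0, 1010.0]})

# Default axis=0: the rows of df_b are appended below the rows of df_a.
stacked = pd.concat([df_a, df_b])

print(stacked.shape)           # 6 rows, union of the column names
print(stacked.index.tolist())  # original index labels are kept, so they repeat
# Each frame lacks one of these columns, so its rows get NaN there.
print(stacked[['mz', 'inte']].isna().sum().tolist())
```

Passing ignore_index=True would renumber the rows 0 to 5, but it still does not pair them up; combining the frames side by side needs axis=1 or a merge.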
Question: How can I get the same result as df_m above using 'concat'?
Thank you for your support!
Answer by furas
Using
print(pd.concat((df_a, df_b['inte']), axis=1))
you can get
        AAseq  Biorep  Techrep  Treatment     mz  inte
0  ELVISLIVES       A        1          C  500.0  1100
1  ELVISLIVES       A        1          C  500.5  1050
2  ELVISLIVES       A        1          C  501.0  1010
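A small sketch, assuming the question's first pair of DataFrames, of why only df_b['inte'] is passed: with axis=1, concat places the objects side by side and matches rows on index labels, so concatenating the whole df_b would repeat the four identifier columns.

```python
import pandas as pd

df_a = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'mz': [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C'] * 3,
                     'inte': [1100.0, 1050.0, 1010.0]})

# Whole frames side by side: every identifier column appears twice.
both = pd.concat([df_a, df_b], axis=1)
print(both.shape)
print(both.columns[both.columns.duplicated()].tolist())

# Only the new column from df_b: no duplicates; rows are still matched by index label.
combined = pd.concat([df_a, df_b['inte']], axis=1)
print(combined.columns.tolist())
```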
Is this what you expected?
Or maybe you have more complicated data like this (note the different values in the Treatment column):
        AAseq  Biorep  Techrep  Treatment     mz
0  ELVISLIVES       A        1          A  500.0
1  ELVISLIVES       A        1          B  500.5
2  ELVISLIVES       A        1          C  501.0

        AAseq  Biorep  Techrep  Treatment  inte
0  ELVISLIVES       A        1          C  1100
1  ELVISLIVES       A        1          B  1050
2  ELVISLIVES       A        1          A  1010
and you need to match rows using the values in the columns AAseq, Biorep, Techrep and Treatment; then use merge:
import pandas as pd
dict_data = {
'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
'Biorep': ['A', 'A', 'A'],
'Techrep': [1, 1, 1],
'Treatment': ['A', 'B', 'C'],
'mz':[500.0, 500.5, 501.0]
}
df_a = pd.DataFrame(dict_data)
dict_data = {
'AAseq': ['ELVISLIVES', 'ELVISLIVES', 'ELVISLIVES'],
'Biorep': ['A', 'A', 'A'],
'Techrep': [1, 1, 1],
'Treatment': ['C', 'B', 'A'],
'inte':[1100.0, 1050.0, 1010.0]
}
df_b = pd.DataFrame(dict_data)
print(pd.merge(left=df_a, right=df_b, on=['AAseq', 'Biorep', 'Techrep', 'Treatment']))
result:
        AAseq  Biorep  Techrep  Treatment     mz  inte
0  ELVISLIVES       A        1          A  500.0  1010
1  ELVISLIVES       A        1          B  500.5  1050
2  ELVISLIVES       A        1          C  501.0  1100
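If you want to stay with concat, one possible sketch (assuming this second pair of DataFrames, where each identifier combination is unique) is to move the identifier columns into the index with set_index, concatenate with axis=1 so rows are aligned on those identifiers, and restore them with reset_index. The variable names are illustrative. The merge call adds validate='one_to_one', which raises a MergeError if a key combination is repeated on either side.

```python
import pandas as pd

keys = ['AAseq', 'Biorep', 'Techrep', 'Treatment']

df_a = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['A', 'B', 'C'],
                     'mz': [500.0, 500.5, 501.0]})
df_b = pd.DataFrame({'AAseq': ['ELVISLIVES'] * 3, 'Biorep': ['A'] * 3,
                     'Techrep': [1, 1, 1], 'Treatment': ['C', 'B', 'A'],
                     'inte': [1100.0, 1050.0, 1010.0]})

# The identifier columns become a MultiIndex, so concat aligns rows on them.
# join='inner' keeps only key combinations present in both frames, like merge's default.
via_concat = pd.concat([df_a.set_index(keys), df_b.set_index(keys)],
                       axis=1, join='inner').reset_index()

# Same pairing with merge; validate raises MergeError if any key combination repeats.
via_merge = pd.merge(df_a, df_b, on=keys, validate='one_to_one')

# Row order may differ between the two approaches, so sort before comparing.
print(via_concat.sort_values(keys).reset_index(drop=True))
print(via_merge.sort_values(keys).reset_index(drop=True))
```

With the question's first pair of DataFrames this cannot work: all three rows share the same identifiers, so merge on those keys would return all 3 × 3 = 9 combinations, and validate='one_to_one' would reject it.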

