Python: How do I combine two data frames?
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/12850345/
How do I combine two data frames?
Asked by MKoosej
I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:
A = D[D.label == k]
B = D[D.label != k]
Then I change the label in A and B:
A.label = 1
B.label = -1
I want to combine A and B into a single data frame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.
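The situation in the question can be reproduced with a small sketch. The frame `D`, its `value` column, and `k = 2` are made-up stand-ins, since the real data is not shown. The `.copy()` calls are an addition, not part of the question: they make `A` and `B` independent frames so that assigning to them does not raise pandas' `SettingWithCopyWarning`.

```python
import pandas as pd

# Hypothetical stand-in for D; the real data is not shown in the question.
D = pd.DataFrame({"value": [10, 20, 30, 40], "label": [1, 2, 2, 3]})
k = 2

# .copy() gives independent frames, so the assignments below
# do not act on a view of D.
A = D[D.label == k].copy()
B = D[D.label != k].copy()

A["label"] = 1
B["label"] = -1

# Both pieces still carry the row labels they had in D.
print(A.index.tolist())  # [1, 2]
print(B.index.tolist())  # [0, 3]
```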
Accepted answer by Joran Beasley
I believe you can use the append method:
bigdata = data1.append(data2, ignore_index=True)
To keep their indexes, just don't use the ignore_index keyword.
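A note for readers on recent pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the call above only runs on older versions. The sketch below uses `pd.concat`, which takes the same `ignore_index` flag, to show the difference between renumbering and keeping the original indexes. The frames and their index values are made up for illustration.

```python
import pandas as pd

# Made-up frames whose indexes look like leftovers from a larger frame.
data1 = pd.DataFrame({"x": [1, 2]}, index=[5, 7])
data2 = pd.DataFrame({"x": [3, 4]}, index=[6, 8])

# ignore_index=True discards the old labels and numbers the rows 0..n-1.
renumbered = pd.concat([data1, data2], ignore_index=True)

# Without it, each row keeps the label it had before.
kept = pd.concat([data1, data2])

print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(kept.index.tolist())        # [5, 7, 6, 8]
```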
Answer by ostrokach
You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:
bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
Answer by pelumi
Thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows, which is:
df_row_merged = pd.concat([df_a, df_b], ignore_index=True)
To merge across columns, you can use the following syntax:
df_col_merged = pd.concat([df_a, df_b], axis=1)
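One detail worth keeping in mind with `axis=1` is that rows are matched by index label, not by position. A minimal sketch, with made-up frames `df_a` and `df_b` whose indexes only partly overlap:

```python
import pandas as pd

df_a = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"b": [10, 30]}, index=[0, 2])

# Columns are placed side by side; rows are aligned on the index,
# and labels missing from one frame are filled with NaN.
df_col_merged = pd.concat([df_a, df_b], axis=1)

print(df_col_merged.shape)                 # (3, 2)
print(df_col_merged["b"].isna().tolist())  # [False, True, False]
```

If the two frames are meant to line up by position instead, resetting both indexes first (for example with `reset_index(drop=True)`) avoids the NaN rows.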
Answer by martin-martin
There's another solution for the case where you are working with big data and need to concatenate multiple datasets. concat can get performance-intensive, so if you don't want to create a new df each time, you can instead use a list comprehension:
frames = [process_file(f) for f in dataset_files]
result = pd.concat(frames)
(as pointed out in the docs, at the bottom of the section):
Note: It is worth noting, however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
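The two patterns contrasted in that note can be sketched side by side. `process_chunk` is a hypothetical stand-in for `process_file(f)` that builds a small frame in memory, so the example runs without any files.

```python
import pandas as pd

def process_chunk(i):
    # Hypothetical stand-in for process_file(f): returns a small frame.
    return pd.DataFrame({"chunk": [i, i], "value": [i * 10, i * 10 + 1]})

# Collect every piece first, then copy the data once.
frames = [process_chunk(i) for i in range(3)]
result = pd.concat(frames, ignore_index=True)

# The pattern the note warns about: each iteration copies
# everything accumulated so far into a new frame.
slow = process_chunk(0)
for i in range(1, 3):
    slow = pd.concat([slow, process_chunk(i)], ignore_index=True)

print(result.shape)  # (6, 2)
print(result.equals(slow))
```

Both approaches produce the same frame; the difference is only in how many times the data is copied along the way.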
Answer by Mohsin Mahmood
If you want to update/replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps:
Step 1: Set the index of the first dataframe (df1)
df1 = df1.set_index('id')
Step 2: Set the index of the second dataframe (df2)
df2 = df2.set_index('id')
Finally, update the dataframe using the following snippet:
df1.update(df2)
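Put together, the steps might look like the sketch below. The `id` and `score` columns and their values are made up. Note that `set_index` returns a new frame rather than modifying the original, so its result has to be assigned back for `update` to align rows on `id`.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [25.0, 35.0]})

# set_index returns a new frame, so the result is assigned back.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update modifies df1 in place (it returns None), overwriting values
# wherever df2 has a non-NaN entry for the same index and column.
df1.update(df2)

print(df1["score"].tolist())  # [10.0, 25.0, 35.0]
```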
Answer by Harish Kumawat
First dataframe:
train.shape
Result:
(31962, 3)
Second dataframe:
test.shape
Result:
(17197, 2)
Combine:
new_data = train.append(test, ignore_index=True)
Check:
new_data.shape
Result:
(49159, 3)
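The combined frame keeps all three columns even though `test` only has two, because rows coming from the narrower frame get NaN in the column they lack. A small sketch of the same effect, using `pd.concat` (since `DataFrame.append` was removed in pandas 2.0) and made-up column names, as the real ones are not shown:

```python
import pandas as pd

# Hypothetical miniature versions of train (3 columns) and test (2 columns).
train = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"], "label": [0, 1, 0]})
test = pd.DataFrame({"id": [4, 5], "text": ["d", "e"]})

new_data = pd.concat([train, test], ignore_index=True)

print(new_data.shape)                  # (5, 3)
print(new_data["label"].isna().sum())  # 2
```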

