Python: How do I combine two data frames?

Notice: This page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/12850345/

Time: 2020-08-18 12:00:11  Source: igfitidea

How do I combine two data frames?

python pandas

Asked by MKoosej

I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:

A = D[D.label == k]
B = D[D.label != k]

Then I change the label in A and B:

A.label = 1
B.label = -1

I want to combine A and B so I can have them as one data frame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.

Accepted answer by Joran Beasley

I believe you can use the append method:

bigdata = data1.append(data2, ignore_index=True)

To keep their indexes, just don't use the ignore_index keyword.
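
For readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the sketch below reproduces the same two behaviours with pd.concat instead. The contents of D and the value of k are made up for illustration, and the .copy() calls are an addition that is not in the question; they make A and B independent frames so that relabelling them does not raise pandas' SettingWithCopyWarning.

```python
import pandas as pd

# Hypothetical stand-in for D from the question
D = pd.DataFrame({"label": ["k", "x", "k", "y"], "value": [10, 20, 30, 40]})
k = "k"

# .copy() gives independent frames that are safe to modify
A = D[D.label == k].copy()
B = D[D.label != k].copy()
A["label"] = 1
B["label"] = -1

# Without ignore_index the rows keep their index labels from D
kept = pd.concat([A, B])

# With ignore_index=True the rows are renumbered 0..n-1
renumbered = pd.concat([A, B], ignore_index=True)

print(kept.index.tolist())        # [0, 2, 1, 3]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Since the question says the order of the data does not matter, either form works; the only difference is whether the original row labels survive.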

Answer by ostrokach

You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
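
As a small illustration of the "more than two dataframes" case, here is a sketch with three made-up frames whose columns are the same but listed in a different order. The frame contents and the name data3 are assumptions for the example. pd.concat matches columns by name, and with sort=False the result should keep the column order in which the names first appear rather than sorting them alphabetically.

```python
import pandas as pd

# Three hypothetical frames; note the differing column order
data1 = pd.DataFrame({"b": [1, 2], "a": ["x", "y"]})
data2 = pd.DataFrame({"a": ["z"], "b": [3]})
data3 = pd.DataFrame({"b": [4], "a": ["w"]})

# Columns are aligned by name, rows are stacked and renumbered
bigdata = pd.concat([data1, data2, data3], ignore_index=True, sort=False)

print(bigdata)
```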

Answer by pelumi

I thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows, which is:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To merge across columns, you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
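
One thing worth knowing about axis=1 is that rows are paired by index label, not by position. The sketch below uses two made-up frames whose indexes only partly overlap to show the effect; the reset_index step is an optional extra, not part of the answer above, for when you do want positional pairing.

```python
import pandas as pd

# Hypothetical frames with partially overlapping index labels
df_a = pd.DataFrame({"x": [1, 2, 3]}, index=[0, 1, 2])
df_b = pd.DataFrame({"y": ["a", "b", "c"]}, index=[1, 2, 3])

# Rows are matched on index label; labels missing on one side become NaN
df_col_merged = pd.concat([df_a, df_b], axis=1)
print(df_col_merged.shape)  # (4, 2)

# Resetting both indexes first pairs the rows by position instead
df_by_position = pd.concat(
    [df_a.reset_index(drop=True), df_b.reset_index(drop=True)], axis=1
)
print(df_by_position.shape)  # (3, 2)
```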

Answer by martin-martin

There's another solution for the case where you are working with big data and need to concatenate multiple datasets. concat can get performance-intensive, so if you don't want to create a new df each time, you can instead use a list comprehension:

frames = [process_file(f) for f in dataset_files]
result = pd.concat(frames)

(as pointed out in the docs, at the bottom of the section):

Note: It is worth noting, however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
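
To make the difference concrete, here is a rough sketch comparing the two patterns. The body of process_file and the contents of dataset_files are hypothetical stand-ins (the answer does not show them); a real version would read and clean a file.

```python
import pandas as pd


def process_file(n):
    """Hypothetical stand-in: builds a small frame instead of reading a file."""
    return pd.DataFrame({"source": [n, n], "value": [n * 10, n * 10 + 1]})


dataset_files = [1, 2, 3]  # placeholders for real file paths

# Slower pattern: each iteration copies every row accumulated so far
result_loop = process_file(dataset_files[0])
for f in dataset_files[1:]:
    result_loop = pd.concat([result_loop, process_file(f)], ignore_index=True)

# Pattern from the answer: collect the pieces, then concatenate once
frames = [process_file(f) for f in dataset_files]
result = pd.concat(frames, ignore_index=True)

print(result.equals(result_loop))  # True
```

Both give the same frame; the single concat just avoids re-copying the growing result on every pass.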

Answer by Mohsin Mahmood

If you want to update/replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps:

Step 1: Set the index of the first dataframe (df1)

df1 = df1.set_index('id')

Step 2: Set the index of the second dataframe (df2)

df2 = df2.set_index('id')

Finally, update the dataframe using the following snippet:

df1.update(df2)
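
Put together, the steps might look like the sketch below. The 'id' column comes from the answer; the 'score' column and all the values are invented for illustration. Note that set_index returns a new frame by default, so its result needs to be assigned back, while update modifies df1 in place and returns None.

```python
import pandas as pd

# Hypothetical data sharing an 'id' column
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [25.0, 35.0]})

# set_index returns a new frame, so assign the result back
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update works in place; rows are matched on the index (here, id)
df1.update(df2)

print(df1["score"].tolist())  # [10.0, 25.0, 35.0]
```

Rows of df1 whose id does not appear in df2 are left untouched, and update does not add new rows for ids that exist only in df2.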

Answer by Harish Kumawat

1st dataFrame

train.shape

Result:

(31962, 3)

2nd dataFrame

test.shape

Result:

(17197, 2)

Combine

new_data = train.append(test, ignore_index=True)

Check

new_data.shape

Result:

(49159, 3)
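
The shapes above follow a simple rule: the row counts add up (31962 + 17197 = 49159) and the columns are the union of both frames, with the column missing from test filled with NaN. A scaled-down sketch of the same situation is below; the column names and values are made up, and it uses pd.concat so that it also runs on pandas versions where DataFrame.append is no longer available.

```python
import pandas as pd

# Hypothetical miniature versions: train has 3 columns, test has 2
train = pd.DataFrame({"id": [1, 2, 3], "text": ["a", "b", "c"], "label": [0, 1, 0]})
test = pd.DataFrame({"id": [4, 5], "text": ["d", "e"]})

new_data = pd.concat([train, test], ignore_index=True)

print(new_data.shape)                  # (5, 3)
print(new_data["label"].isna().sum())  # 2, one NaN per row that came from test
```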