Python: How do I combine two data frames?

Question

Asked by MKoosej

I'm using Pandas data frames. I have an initial data frame, say D. I extract two data frames from it like this:

A = D[D.label == k]
B = D[D.label != k]

Then I change the label in A and B:

A.label = 1
B.label = -1

I want to combine A and B so I can have them as one data frame, something like a union operation. The order of the data is not important. However, when we sample A and B from D, they retain their indexes from D.

Answer 1

Accepted answer by Joran Beasley

I believe you can use the append method:

bigdata = data1.append(data2, ignore_index=True)

To keep their original indexes, just don't use the ignore_index keyword.
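
One caveat for readers on newer pandas: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the call above only works on older versions. A minimal sketch of the same two behaviours with pd.concat follows; the contents of D and the value "k" are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical stand-in for the original frame D from the question.
D = pd.DataFrame({"value": [10, 20, 30, 40], "label": ["k", "x", "k", "y"]})

# .copy() makes A and B independent frames, so the assignments below
# do not trigger pandas' SettingWithCopyWarning.
A = D[D.label == "k"].copy()
B = D[D.label != "k"].copy()
A["label"] = 1
B["label"] = -1

# Counterpart of A.append(B): the row labels inherited from D are kept.
kept = pd.concat([A, B])

# Counterpart of A.append(B, ignore_index=True): rows are renumbered from 0.
renumbered = pd.concat([A, B], ignore_index=True)
```

Here kept still carries D's row labels, while renumbered gets a fresh 0..n-1 index.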

Answer 2

Answer by ostrokach

You can also use pd.concat, which is particularly helpful when you are joining more than two dataframes:

bigdata = pd.concat([data1, data2], ignore_index=True, sort=False)
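
To illustrate the "more than two dataframes" case, here is a small sketch with three frames whose columns only partly overlap. The frame names and values are invented for the example; sort=False asks pandas not to sort the combined column labels.

```python
import pandas as pd

# Three hypothetical frames with partially overlapping columns.
df1 = pd.DataFrame({"b": [1], "a": [2]})
df2 = pd.DataFrame({"a": [3], "c": [4]})
df3 = pd.DataFrame({"b": [5]})

# One call stacks all three; columns missing from a frame are filled with NaN.
combined = pd.concat([df1, df2, df3], ignore_index=True, sort=False)
```

The result has one row per input row and the union of all columns.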

Answer 3

Answer by pelumi

I thought I'd add this here in case someone finds it useful. @ostrokach already mentioned how you can merge the data frames across rows, which is:

df_row_merged = pd.concat([df_a, df_b], ignore_index=True)

To merge across columns, you can use the following syntax:

df_col_merged = pd.concat([df_a, df_b], axis=1)
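
One thing worth knowing about axis=1 is that rows are matched by index label, not by position. A minimal sketch, with df_a and df_b filled with made-up values:

```python
import pandas as pd

# Hypothetical frames: df_b only has rows for index labels 0 and 2.
df_a = pd.DataFrame({"x": [1, 2, 3]})
df_b = pd.DataFrame({"y": [10, 20]}, index=[0, 2])

# Columns are placed side by side; rows are aligned on the index,
# so label 1 has no match in df_b and gets NaN in column "y".
df_col_merged = pd.concat([df_a, df_b], axis=1)
```

If the two frames should be paired up by position instead, resetting both indexes first (reset_index(drop=True)) is one way to do it.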

Answer 4

Answer by martin-martin

There's another solution for the case where you are working with big data and need to concatenate multiple datasets. concat can get performance-intensive, so if you don't want to create a new df each time, you can instead build the frames with a list comprehension and concatenate them once:

frames = [process_file(f) for f in dataset_files]
result = pd.concat(frames)

(as pointed out in the docs, at the bottom of the section):

Note: It is worth noting, however, that concat (and therefore append) makes a full copy of the data, and that constantly reusing this function can create a significant performance hit. If you need to use the operation over several datasets, use a list comprehension.
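
A runnable sketch of that pattern, in case it helps. process_file here is a hypothetical placeholder that builds a small frame from a number instead of reading a real file, and dataset_files is just a list of stand-in values:

```python
import pandas as pd


def process_file(n):
    """Hypothetical loader: returns a tiny frame instead of reading a file."""
    return pd.DataFrame({"source": [n, n], "value": [n * 10, n * 10 + 1]})


dataset_files = [1, 2, 3]  # stand-ins for real file paths

# Build every frame first, then copy the data once in a single concat,
# rather than growing a frame inside a loop (which copies on each step).
frames = [process_file(f) for f in dataset_files]
result = pd.concat(frames, ignore_index=True)
```

The point is only that concat is called once at the end, however many files there are.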

Answer 5

Answer by Mohsin Mahmood

If you want to update/replace the values of the first dataframe df1 with the values of the second dataframe df2, you can do it with the following steps:

Step 1: Set the index of the first dataframe (df1). set_index returns a new dataframe rather than modifying df1 in place, so assign the result back:

df1 = df1.set_index('id')

Step 2: Set the index of the second dataframe (df2):

df2 = df2.set_index('id')

Finally, update the dataframe using the following snippet:

df1.update(df2)
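
Putting the steps together, a minimal end-to-end sketch might look like this. The column name score and all values are invented for the example; only id comes from the answer above.

```python
import pandas as pd

# Hypothetical data: df2 carries newer scores for ids 2 and 3 only.
df1 = pd.DataFrame({"id": [1, 2, 3], "score": [10.0, 20.0, 30.0]})
df2 = pd.DataFrame({"id": [2, 3], "score": [99.0, 77.0]})

# set_index returns a new frame, so the result is assigned back.
df1 = df1.set_index("id")
df2 = df2.set_index("id")

# update modifies df1 in place (it returns None): rows are matched on the
# index, and only non-NA values from df2 overwrite df1.
df1.update(df2)
```

Rows of df1 whose id does not appear in df2 (here id 1) are left untouched.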

Answer 6

Answer by Harish Kumawat

First dataframe:

train.shape

Result:

(31962, 3)

Second dataframe:

test.shape

Result:

(17197, 2)

Combine:

new_data = train.append(test, ignore_index=True)

Check:

new_data.shape

Result:

(49159, 3)
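
The row counts add up (31962 + 17197 = 49159) and the column count stays at 3 because the result takes the union of the columns. A scaled-down sketch of the same effect, using pd.concat since DataFrame.append is gone in pandas 2.0; the column names and values are assumptions, as the answer does not show them:

```python
import pandas as pd

# Hypothetical miniature versions of train (3 columns) and test (2 columns).
train = pd.DataFrame({"id": [1, 2, 3], "label": [0, 1, 0], "text": ["a", "b", "c"]})
test = pd.DataFrame({"id": [4, 5], "text": ["d", "e"]})

# Rows are stacked; the column missing from test ("label") is filled with NaN.
new_data = pd.concat([train, test], ignore_index=True)
```

So the rows that came from test end up with missing values in the column they did not have.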
