pandas: How to split a dataframe into multiple dataframes where each dataframe contains equal but random data

Question

Asked by Anil K

How do I split a dataframe into multiple dataframes where each dataframe contains equal but random data? It is not based on a specific column.

For instance, I have a dataframe with 100 rows and 30 columns. I want to divide this data into 5 lots, so that each dataframe has 20 records with the same 30 columns, there is no duplication across the 5 lots, and the rows are picked at random. I don't want the random picking to be based on a single column.

One way I thought of is to use the index and numpy to divide the rows into lots, and then use those lots to split the dataframe. I wanted to see if someone has an easy, pandas way of doing it.

Answer 1

Answered by Patrick Hingston

If you do not care about the new dataframes potentially containing some of the same rows, you could use df.sample, where the frac argument specifies the fraction of the dataframe that you want:

df1 = df.sample(frac=0.5) # df1 is now a random sample of half the dataframe

EDIT:

If you want to avoid duplicates, you can use shuffle from sklearn and then take consecutive slices:

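from sklearn.utils import shuffle

df = shuffle(df)    # returns a copy with the rows in random order
df1 = df.iloc[0:3]  # first three shuffled rows
df2 = df.iloc[3:6]  # next three rows, so no overlap with df1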
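
For the case in the question (100 rows into 5 lots of 20), the same shuffle-then-slice idea can be written without sklearn. The following is only a sketch: the helper name split_random and its seed parameter are made up for illustration, and the 100 x 30 frame is dummy data. It permutes the row positions with numpy and cuts them into equal chunks, so every row ends up in exactly one lot.

```python
import numpy as np
import pandas as pd


def split_random(df, n_parts, seed=None):
    """Hypothetical helper: split df into n_parts random, non-overlapping pieces."""
    rng = np.random.default_rng(seed)
    # Random ordering of the row positions 0 .. len(df) - 1
    positions = rng.permutation(len(df))
    # Cut the shuffled positions into n_parts chunks of (near-)equal size
    chunks = np.array_split(positions, n_parts)
    # Select each chunk by position; all columns are kept
    return [df.iloc[chunk] for chunk in chunks]


# Dummy data standing in for the 100-row, 30-column frame from the question
df = pd.DataFrame(np.arange(3000).reshape(100, 30))
lots = split_random(df, 5, seed=0)
```

If the row count is not divisible by n_parts, np.array_split makes the first few chunks one row longer instead of raising an error.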

Answer 2

Answered by SimplySnee

Depending on your need, you could use pandas.DataFrame.sample() to randomly sample your original data frame, df.

df1 = df.sample(n=3) 
df2 = df.sample(n=3)

This gives you two subsets of 3 rows each, so the subsets are equal in size and randomly chosen.
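
Note that two separate sample calls draw independently, so the same row can show up in both subsets. If the subsets must not overlap, one possible variation (a sketch, assuming the index labels of df are unique; the small frame and the random_state values are only for illustration) is to drop the rows already drawn before sampling again:

```python
import pandas as pd

# Small illustrative frame; any dataframe with a unique index would do
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# First random subset of 3 rows
df1 = df.sample(n=3, random_state=0)

# Remove the rows already taken, then sample from what is left
df2 = df.drop(df1.index).sample(n=3, random_state=1)
```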
