Pandas: Remove rows at random without shuffling the dataset

Question

Asked by Black

I've got a dataset from which I need to omit a few rows while preserving the order of the remaining rows. My idea was to use a mask built from random numbers between 0 and the length of my dataset, but I'm not sure how to set up such a mask without shuffling the rows around, i.e. something similar to sampling a dataset but with the original order kept.

Example: Dataset has 5 rows and 2 columns and I would like to remove a row at random.

Col1 | Col2
  A  |  1
  B  |  2 
  C  |  5     
  D  |  4
  E  |  0

transforms to:

Col1 | Col2
  A  |  1
  B  |  2   
  D  |  4
  E  |  0

with the third row (Col1='C') omitted by a random choice.

How should I go about this?

Answer 1

Answered by cel

The following should work for you. Here I sample remove_n random row ids from df's index. After that, df.drop removes those rows from the data frame and returns a new data frame containing the remaining rows in their original order.

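import pandas as pd
import numpy as np
np.random.seed(10)  # fix the seed so the example is reproducible

remove_n = 1  # number of rows to drop
df = pd.DataFrame({"a":[1,2,3,4], "b":[5,6,7,8]})
# pick remove_n distinct index labels at random
drop_indices = np.random.choice(df.index, remove_n, replace=False)
# drop returns a new data frame; the remaining rows keep their order
df_subset = df.drop(drop_indices)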

DataFrame df:

DataFrame df_subset:

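The mask idea from the question can also be made to work. The following is only a sketch, not part of the original answer: the helper name drop_random_rows and its seed parameter are made up for illustration, and it assumes NumPy's Generator API (np.random.default_rng) is available. It marks randomly chosen row positions as False in a boolean mask and keeps everything else, so the surviving rows stay in their original order.

```python
import numpy as np
import pandas as pd


def drop_random_rows(df, remove_n, seed=None):
    """Hypothetical helper: drop remove_n randomly chosen rows, keeping row order."""
    rng = np.random.default_rng(seed)
    # Start with a mask that keeps every row
    keep = np.ones(len(df), dtype=bool)
    # Choose remove_n distinct row positions (not index labels) to drop
    drop_positions = rng.choice(len(df), size=remove_n, replace=False)
    keep[drop_positions] = False
    # Boolean indexing filters rows without reordering them
    return df[keep]


# The example data from the question
df = pd.DataFrame({"Col1": list("ABCDE"), "Col2": [1, 2, 5, 4, 0]})
df_subset = drop_random_rows(df, remove_n=1, seed=10)
print(df_subset)
```

Because the mask works on row positions rather than index labels, this variant should also behave sensibly when the index contains duplicate labels, where df.drop would remove every row sharing a chosen label.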
