Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/28556942/

Date: 2020-09-13 22:57:18  Source: igfitidea

Pandas: Remove rows at random without shuffling dataset

python pandas

Asked by Black

I've got a dataset from which I need to omit a few rows whilst preserving the order of the remaining rows. My idea was to use a mask built from a random number between 0 and the length of my dataset, but I'm not sure how to set up a mask without shuffling the rows around, i.e. a method similar to sampling a dataset.

Example: Dataset has 5 rows and 2 columns and I would like to remove a row at random.

Col1 | Col2
  A  |  1
  B  |  2 
  C  |  5     
  D  |  4
  E  |  0

transforms to:

Col1 | Col2
  A  |  1
  B  |  2   
  D  |  4
  E  |  0

with the third row (Col1='C') omitted by a random choice.

How should I go about this?

Answered by cel

The following should work for you. Here I sample remove_n random row IDs from df's index. After that, df.drop removes those rows from the data frame and returns a new data frame that holds the remaining rows in their original order.

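import pandas as pd
import numpy as np
np.random.seed(10)  # fix the seed so the example is reproducible

remove_n = 1  # number of rows to remove
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [5, 6, 7, 8]})
# Sample remove_n distinct index labels, then drop those rows.
drop_indices = np.random.choice(df.index, remove_n, replace=False)
df_subset = df.drop(drop_indices)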

DataFrame df:

    a   b
0   1   5
1   2   6
2   3   7
3   4   8

DataFrame df_subset:

    a   b
0   1   5
1   2   6
3   4   8
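
The question mentions building a mask. As a rough sketch of that idea (not part of the original answer), the rows to drop can be chosen by position and turned into a boolean mask, which also works when the index labels are not unique. The helper name drop_random_rows and its seed parameter are made up for illustration, and the example frame is taken from the question.

```python
import numpy as np
import pandas as pd


def drop_random_rows(df, remove_n, seed=None):
    """Return a new frame with remove_n random rows removed, order preserved.

    Hypothetical helper for illustration; not from the original answer.
    """
    rng = np.random.default_rng(seed)
    # Choose row positions (not index labels) without replacement.
    drop_positions = rng.choice(len(df), size=remove_n, replace=False)
    # Boolean mask: True for rows to keep, False for rows to drop.
    keep = np.ones(len(df), dtype=bool)
    keep[drop_positions] = False
    # Boolean indexing keeps the surviving rows in their original order.
    return df[keep]


df = pd.DataFrame({"Col1": list("ABCDE"), "Col2": [1, 2, 5, 4, 0]})
df_subset = drop_random_rows(df, remove_n=1, seed=10)
print(df_subset)
```

When the index labels are unique, df.drop(df.sample(n=remove_n).index) is another way to get the same kind of result, since df.drop leaves the remaining rows in place.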