pandas: How to sample on a condition?

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/32683083/

Time: 2020-09-13 23:54:29  Source: igfitidea

How to sample on condition with pandas?

python pandas sampling

Asked by Bob

I have a dataframe df like the following:

   Col1      Col2
0  1         T
1  1         B 
2  3         S
3  2         A
4  1         C
5  2         A
etc...

I would like to create two dataframes: df1 is a random sample of 10 rows such that Col2=='T'. df2 is df minus the rows in df1.

Answered by DSM

Assuming you have a unique-indexed dataframe (and if you don't, you can simply do .reset_index(), apply this, and then set_index after the fact), you could use DataFrame.sample. [Actually, you should be able to use sample even if the frame didn't have a unique index, but you couldn't use the below method to get df2.]

Note that I'm using A instead of T in this example because A is the only repeated value of Col2 in the example you gave, and I'll only select 1 randomly rather than 10.

>>> df1 = df[df.Col2 == "A"].sample(1)
>>> df2 = df[~df.index.isin(df1.index)]
>>> df1
   Col1 Col2
3     2    A
>>> df2
   Col1 Col2
0     1    T
1     1    B
2     3    S
4     1    C
5     2    A
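
For the case actually asked about (10 random rows where Col2 == 'T'), the same idea can be sketched as a self-contained script. The 60-row frame, the random_state value, and the names df_dup, flat, s1 and s2 are assumptions made up for illustration; they are not from the question. The second half sketches the reset_index() workaround mentioned above for a frame whose index is not unique.

```python
import pandas as pd

# Hypothetical data (not from the question): 60 rows, 20 of which have Col2 == "T".
df = pd.DataFrame({
    "Col1": range(60),
    "Col2": list("TBS") * 20,
})

# Sample 10 of the rows where Col2 == "T".
# random_state is only there to make the run repeatable.
df1 = df[df["Col2"] == "T"].sample(n=10, random_state=0)

# Everything else: drop the sampled index labels.
# This relies on the index being unique, like the isin() approach above.
df2 = df.drop(df1.index)

print(len(df1), len(df2))  # 10 50

# Non-unique index: move the index into a column, split, then restore it.
df_dup = df.copy()
df_dup.index = [0, 1] * 30           # deliberately non-unique labels
flat = df_dup.reset_index()          # old labels land in a column named "index"
s1 = flat[flat["Col2"] == "T"].sample(n=10, random_state=0)
s2 = flat.drop(s1.index)
s1 = s1.set_index("index")
s2 = s2.set_index("index")
```

Note that sample raises a ValueError if fewer than 10 rows match the condition, unless replace=True is passed.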