Python: Random selection per group

Question

Asked by Plug4

Say that I have a dataframe that looks like:

Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3 
YYH  3

How could I randomly select one (or more) rows for each Group_Id? Say I want one random draw per Group_Id; I would get:

Name Group_Id
AAA  1
XYZ  2
DEF  3

Answer 1

Accepted answer by behzad.nouri

import numpy as np

size = 2        # sample size
replace = True  # with replacement
# Draw `size` index labels from each group and select those rows.
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
df.groupby('Group_Id', as_index=False).apply(fn)
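
For reference, here is a self-contained sketch of the same idea on the question's sample data. The DataFrame construction, the seed, and the names `rng`, `parts` and `sampled` are assumptions added for illustration. It also swaps in a NumPy Generator and a plain loop over the groups in place of the `np.random.choice` and `apply` calls above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

size = 2        # rows to draw per group
replace = True  # draw with replacement, so a row may appear twice
rng = np.random.default_rng(0)  # seed chosen arbitrarily for repeatability

# Draw index labels within each group, then collect the matching rows.
parts = [
    group.loc[rng.choice(group.index, size=size, replace=replace)]
    for _, group in df.groupby('Group_Id')
]
sampled = pd.concat(parts).reset_index(drop=True)
print(sampled)
```

With `replace = False`, `size` must not exceed the smallest group, otherwise the draw raises a ValueError.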

Answer 2

Answer by gravetii

Using random.choice, you can do something like this:

import random
name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}

names = list(name_group)  # create a list out of the keys in the name_group dict

first_name = random.choice(names)
first_group = name_group[first_name]
print(first_name, first_group)

random.choice(seq)

Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
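
The snippet in this answer draws a single name from the whole dictionary. To get one name per group with the same function, one option is to bucket the names by group first. This is a minimal sketch; `names_by_group` and `picks` are hypothetical names introduced here.

```python
import random
from collections import defaultdict

name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}

# Collect the names that belong to each group id.
names_by_group = defaultdict(list)
for name, group in name_group.items():
    names_by_group[group].append(name)

# random.choice needs a non-empty sequence; every bucket holds at least one name.
picks = {group: random.choice(names) for group, names in names_by_group.items()}

for group, name in sorted(picks.items()):
    print(name, group)
```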

Answer 3

Answer by YS-L

You can use a combination of pandas.groupby, pandas.concat and random.sample:

import pandas as pd
import random

df = pd.DataFrame({
        'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
        'Group_ID': [1,1,2,2,3,3]
     })

grouped = df.groupby('Group_ID')
# Pick one random index label per group and select that row with .loc.
df_sampled = pd.concat([d.loc[random.sample(list(d.index), 1)] for _, d in grouped]).reset_index(drop=True)
print(df_sampled)

Output:

   Group_ID Name
0         1  AAA
1         2  XYZ
2         3  DEF

Answer 4

Answer by grasshopper

Using groupby and random.choice in an elegant one-liner:

import random
df.groupby('Group_Id').apply(lambda x: x.iloc[random.choice(range(len(x)))])

Answer 5

Answer by Zero

From pandas 0.16.x onwards, pd.DataFrame.sample provides a way to return a random sample of items from an axis of the object.

In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
Out[664]:
  Name  Group_Id
0  ABC         1
1  XYZ         2
2  DEF         3
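
In later pandas releases (1.1.0 onwards, as far as I know) the groupby object has its own sample method, so the apply call is not needed. This is a minimal sketch assuming pandas >= 1.1; the DataFrame construction, the `random_state` value, and the name `one_per_group` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

# One random row per group; pass n=2 or frac=0.5 to change how much is drawn.
one_per_group = df.groupby('Group_Id').sample(n=1, random_state=42)
print(one_per_group.reset_index(drop=True))
```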

Answer 6

Answer by ihadanny

To randomly select just one row per group, try df.sample(frac=1.0).groupby('Group_Id').head(1)
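
This works because sample(frac=1.0) returns every row in shuffled order, and head(1) then keeps the first row it meets in each group. Here is a small sketch of the two steps on the question's data; the seed and the names `shuffled` and `picked` are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

shuffled = df.sample(frac=1.0, random_state=7)  # all rows, in random order
picked = shuffled.groupby('Group_Id').head(1)   # first row seen in each group
print(picked.sort_values('Group_Id').reset_index(drop=True))
```

Replacing head(1) with head(k) gives up to k rows per group, drawn without replacement.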

Answer 7

Answer by mikkokotila

There are two ways to do this very simply, one without using anything except basic pandas syntax:

df[['x','y']].groupby('x').agg(pd.DataFrame.sample)

This takes 14.4 ms on a 50k-row dataset.

The other, slightly faster method uses numpy.

df[['x','y']].groupby('x').agg(np.random.choice)

This takes 10.9 ms on the same 50k-row dataset.

Generally speaking, when using pandas, it's preferable to stick with its native syntax, especially for beginners.

Answer 8

Answer by Selah

A very pandas-ish way:

n = 1  # number of rows to draw from each group
takesamp = lambda d: d.sample(n)
df = df.groupby('Group_Id').apply(takesamp)
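
One thing to watch with a fixed n: sample raises a ValueError when a group has fewer than n rows and replacement is off. Here is a hedged sketch that caps n at the group size. The extra 'ZZZ' row, the value of `n`, and the name `result` are hypothetical, and it swaps in a plain loop over the groups with pd.concat in place of apply.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH', 'ZZZ'],
    'Group_Id': [1, 1, 2, 2, 3, 3, 4],  # group 4 has a single row (made up)
})

n = 2  # requested rows per group

# Cap n at the group size so small groups return all of their rows.
result = pd.concat(
    [d.sample(min(n, len(d))) for _, d in df.groupby('Group_Id')]
)
print(result.reset_index(drop=True))
```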
