Original question: http://stackoverflow.com/questions/22472213/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Python: Random selection per group
Asked by Plug4
Say that I have a dataframe that looks like:
Name Group_Id
AAA  1
ABC  1
CCC  2
XYZ  2
DEF  3 
YYH  3
How can I randomly select one (or more) rows for each Group_Id? For example, with one random draw per Group_Id, I would get:
Name Group_Id
AAA  1
XYZ  2
DEF  3
Accepted answer by behzad.nouri
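import numpy as np

size = 2        # number of rows to draw per group
replace = True  # sample with replacement
# draw `size` index labels from each group and select those rows
fn = lambda obj: obj.loc[np.random.choice(obj.index, size, replace), :]
df.groupby('Group_Id', as_index=False).apply(fn)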
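For anyone who wants to run this end to end, here is a minimal sketch of the same idea (draw index labels with numpy, then select them with .loc), written as an explicit loop over the groups instead of apply. The DataFrame is the one from the question; the seeded generator `rng` and the names `pieces` and `sampled` are my own additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
size = 2                        # rows to draw per group

pieces = []
for _, group in df.groupby('Group_Id'):
    # draw `size` index labels from this group, with replacement
    labels = rng.choice(group.index.to_numpy(), size=size, replace=True)
    pieces.append(group.loc[labels])

sampled = pd.concat(pieces).reset_index(drop=True)
print(sampled)
```

Because the draw is with replacement, the same row can appear twice within a group. Set replace=False if you want distinct rows, keeping in mind that size must then not exceed the smallest group.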
Answer by gravetii
Using random.choice, you can do something like this:
import random
name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}
names = list(name_group)  # list of the keys in the name_group dict
first_name = random.choice(names)
first_group = name_group[first_name]
print(first_name, first_group)
random.choice(seq): Return a random element from the non-empty sequence seq. If seq is empty, raises IndexError.
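Note that the snippet in this answer draws a single name from the whole dict, not one per group. A rough sketch of how random.choice could be applied per group in plain Python follows; the names `names_by_group` and `picks` are hypothetical helpers, not part of the original answer.

```python
import random
from collections import defaultdict

name_group = {'AAA': 1, 'ABC': 1, 'CCC': 2, 'XYZ': 2, 'DEF': 3, 'YYH': 3}

# invert the mapping: group id -> list of names in that group
names_by_group = defaultdict(list)
for name, group in name_group.items():
    names_by_group[group].append(name)

# one random name per group
picks = {group: random.choice(names) for group, names in names_by_group.items()}
print(picks)
```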
Answer by YS-L
You can use a combination of pandas.groupby, pandas.concat and random.sample:
import pandas as pd
import random
df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_ID': [1, 1, 2, 2, 3, 3]
})
grouped = df.groupby('Group_ID')
# draw one index label per group, then stack the selected rows
df_sampled = pd.concat([d.loc[random.sample(list(d.index), 1)] for _, d in grouped]).reset_index(drop=True)
print(df_sampled)
Output (one possible result; the draw is random):
   Group_ID Name
0         1  AAA
1         2  XYZ
2         3  DEF
Answer by grasshopper
Using groupby and random.choice in an elegant one-liner:
import random
df.groupby('Group_Id').apply(lambda x: x.iloc[random.choice(range(len(x)))])
Answer by Zero
From pandas 0.16.x onwards, pd.DataFrame.sample provides a way to return a random sample of items from an axis of the object.
In [664]: df.groupby('Group_Id').apply(lambda x: x.sample(1)).reset_index(drop=True)
Out[664]:
  Name  Group_Id
0  ABC         1
1  XYZ         2
2  DEF         3
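In more recent pandas releases (1.1 and later, as far as I know), the groupby object has its own sample method, so the apply step is not needed. A minimal sketch on the question's data; the random_state value is an arbitrary choice so that the result is repeatable.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

# one random row per group; pass n=2, or frac=..., for larger samples
sampled = df.groupby('Group_Id').sample(n=1, random_state=0)
print(sampled)
```

The result keeps the original index labels; add .reset_index(drop=True) if you want a fresh 0..n-1 index.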
Answer by ihadanny
To randomly select just one row per group, try df.sample(frac=1.0).groupby('Group_Id').head(1). This shuffles all rows first and then keeps the first row of each group.
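Spelled out on the question's data, the shuffle-then-head approach might look like the sketch below. The intermediate names `shuffled` and `one_per_group` and the random_state value are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
    'Group_Id': [1, 1, 2, 2, 3, 3],
})

# step 1: shuffle every row of the frame
shuffled = df.sample(frac=1.0, random_state=42)

# step 2: keep the first row that appears for each group
one_per_group = shuffled.groupby('Group_Id').head(1)
print(one_per_group)
```

Using head(k) instead of head(1) gives up to k distinct rows per group, which amounts to sampling without replacement.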
Answer by mikkokotila
There are two ways to do this very simply, one without using anything except basic pandas syntax:
df[['x','y']].groupby('x').agg(pd.DataFrame.sample)
This takes 14.4 ms with a 50k-row dataset.
The other, slightly faster method involves numpy:
df[['x','y']].groupby('x').agg(np.random.choice)
This takes 10.9 ms with the same 50k-row dataset.
Generally speaking, when using pandas, it's preferable to stick with its native syntax, especially for beginners.
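The timings above are the answer author's own. For reference, here is a small sketch of the numpy variant on a toy frame, using the answer's column names x and y with made-up values. I wrap a seeded generator in a lambda purely for reproducibility; passing np.random.choice directly, as in the answer, follows the same pattern.

```python
import numpy as np
import pandas as pd

# Toy data: 'x' is the grouping column, 'y' the value column.
df = pd.DataFrame({
    'x': [1, 1, 2, 2, 3, 3],
    'y': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF', 'YYH'],
})

rng = np.random.default_rng(0)  # arbitrary seed

# agg calls the function once per group with that group's 'y' values
picked = df.groupby('x')['y'].agg(lambda s: rng.choice(s.to_numpy()))
print(picked)
```

One thing to keep in mind: agg works column by column, so with several value columns each column would be drawn independently and the resulting cells may come from different rows of the same group.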
Answer by Selah
A very pandas-ish way:
n = 1  # number of rows to sample per group (example value)
takesamp = lambda d: d.sample(n)
df = df.groupby('Group_Id').apply(takesamp)
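One caveat with d.sample(n): if a group has fewer than n rows, sample raises a ValueError unless replace=True is passed. A possible workaround, sketched here with hypothetical data in which group 3 has only one row, is to cap the sample size at the group size. The loop with pd.concat stands in for apply, and the names `pieces` and `capped` are my own.

```python
import pandas as pd

# Hypothetical data: group 3 has a single row.
df = pd.DataFrame({
    'Name': ['AAA', 'ABC', 'CCC', 'XYZ', 'DEF'],
    'Group_Id': [1, 1, 2, 2, 3],
})
n = 2  # requested rows per group

# take n rows where possible, otherwise the whole group
pieces = [
    group.sample(min(n, len(group)), random_state=0)
    for _, group in df.groupby('Group_Id')
]
capped = pd.concat(pieces)
print(capped)
```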

