Python: sample each group after pandas groupby

Question

Asked by gongzhitaao

I know this must have been answered somewhere, but I just could not find it.

Problem: Sample each group after groupby operation.

import pandas as pd

df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})

grouped = df.groupby('b')

# now sample from each group, e.g., I want 30% of each group

Answer 1

Answered by EdChum

Apply a lambda and call sample with the frac parameter:

In [2]:
df = pd.DataFrame({'a': [1,2,3,4,5,6,7],
                   'b': [1,1,1,0,0,0,0]})
grouped = df.groupby('b')
grouped.apply(lambda x: x.sample(frac=0.3))

Out[2]:
     a  b
b        
0 6  7  0
1 2  3  1
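
Because the draw is random, the rows shown above will differ from run to run. The following is a minimal sketch of the same lambda approach with a fixed seed; the random_state argument and the per-group count check are additions for illustration and are not part of the original answer. With frac=0.3, groups of 3 and 4 rows each contribute one row, since the requested fraction is rounded to a whole number of rows.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# random_state is added only so that repeated runs pick the same rows.
sampled = df.groupby('b').apply(lambda x: x.sample(frac=0.3, random_state=0))

# Index level 0 of the result is the group key, so this counts
# how many rows were drawn from each group.
rows_per_group = sampled.groupby(level=0).size()
print(rows_per_group)
```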

Answer 2

Answered by cs95

Sample a fraction of each group

You can use GroupBy.apply with sample. You do not need a lambda; apply accepts keyword arguments and forwards them to the function:

df.groupby('b').apply(pd.DataFrame.sample, frac=.3)
     a  b
b        
0 6  7  0
1 0  1  1

If the MultiIndex is not required, you can pass group_keys=False to groupby:

df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, frac=.3)

   a  b
6  7  0
2  3  1
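
As an alternative not mentioned in the original answer, recent pandas releases (1.1 and later, which is assumed here) expose a sample method directly on the groupby object. The sketch below draws the same 30% per group without apply and keeps the original row index; random_state is again added only for repeatability.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

# Assumes pandas >= 1.1, where DataFrameGroupBy.sample is available.
# Rows are sampled within each group and keep their original index labels.
sampled = df.groupby('b').sample(frac=0.3, random_state=0)
print(sampled)
```

To draw a fixed number of rows per group instead, pass n in place of frac.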

Sample `N` rows from each group

apply is slow. If your use case is to sample a fixed number of rows, you can shuffle the DataFrame beforehand, then use GroupBy.head.

df.sample(frac=1).groupby('b').head(2)

   a  b
2  3  1
5  6  0
1  2  1
4  5  0

This is the same as df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, n=N), but faster:

%timeit df.groupby('b', group_keys=False).apply(pd.DataFrame.sample, n=2)  # 3.19 ms ± 90.5 μs
%timeit df.sample(frac=1).groupby('b').head(2)                             # 1.56 ms ± 103 μs
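
One difference between the two approaches is worth checking before swapping them: head(N) simply returns every row of a group that has fewer than N rows, whereas sample(n=N) without replace=True raises an error in that case. The sketch below illustrates the shuffle-then-head behaviour on the example data; the names N, shuffled, picked, and picked_large, as well as random_state, are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6, 7],
                   'b': [1, 1, 1, 0, 0, 0, 0]})

N = 2

# Shuffle all rows once, then keep the first N rows encountered per group.
shuffled = df.sample(frac=1, random_state=0)
picked = shuffled.groupby('b').head(N)
print(picked.groupby('b').size())

# Asking for more rows than a group holds does not fail with head:
# group 1 has only 3 rows, so it contributes all 3 of them.
picked_large = shuffled.groupby('b').head(4)
print(picked_large.groupby('b').size())
```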
