Numpy: Get random set of rows from 2D array
Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/14262654/
Asked by gha
I have a very large 2D array which looks something like this:
a=
[[a1, b1, c1],
[a2, b2, c2],
...,
[an, bn, cn]]
Using numpy, is there an easy way to get a new 2D array with, e.g., 2 random rows from the initial array a (without replacement)?
e.g.
b=
[[a4, b4, c4],
[a99, b99, c99]]
Accepted answer by Daniel
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
[3, 2, 0],
[0, 2, 1],
[1, 1, 4],
[3, 2, 2],
[0, 1, 0],
[1, 3, 1],
[0, 4, 1],
[2, 4, 2],
[3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
[1, 3, 1]])
Putting it together for a general case:
A[np.random.randint(A.shape[0], size=2), :]
For sampling without replacement (numpy 1.7.0+):
A[np.random.choice(A.shape[0], 2, replace=False), :]
I do not believe there is a good way to generate a random list of indices without replacement before 1.7. Perhaps you can write a small function that ensures the two values are not the same.
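On newer NumPy releases (1.17 and later, as far as I know), the same draw can also be written with the Generator interface. A minimal sketch, assuming a small stand-in array A and a fixed seed, neither of which comes from the original answer:

```python
import numpy as np

# Assumptions for illustration: a small stand-in array and a fixed seed.
A = np.arange(30).reshape(10, 3)
rng = np.random.default_rng(0)

# Draw 2 distinct rows directly; axis=0 tells choice to sample whole rows.
sample = rng.choice(A, size=2, replace=False, axis=0)
print(sample.shape)  # (2, 3)
```

Sampling the rows directly avoids the separate index array, at the cost of requiring the newer API.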
Answer by Hezi Resheff
This is an old post, but this is what works best for me:
A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]
Change replace=False to replace=True to get the same thing, but with replacement.
Answer by isosceleswheel
Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:
# generate random boolean mask the length of data
# use p 0.75 for False and 0.25 for True
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])
Now you can call data_arr[mask] and get back roughly 25% of the rows, randomly sampled.
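Note that the mask keeps each row independently, so the number of rows returned varies from run to run. A rough sketch contrasting that with an exact 25% draw, assuming a hypothetical 1000-row data_arr and a fixed seed:

```python
import numpy as np

rng = np.random.default_rng(0)                 # seed fixed only for repeatability
data_arr = np.arange(4000).reshape(1000, 4)    # hypothetical data set

# Mask approach: each row is kept with probability 0.25, so the count is approximate.
mask = rng.random(len(data_arr)) < 0.25
approx = data_arr[mask]

# Exact approach: pick exactly a quarter of the row indices without replacement.
n_keep = len(data_arr) // 4
exact = data_arr[rng.choice(len(data_arr), size=n_keep, replace=False)]

print(approx.shape[0], exact.shape[0])  # first number varies, second is always 250
```

The mask version preserves the original row order; the index version returns rows in the order they were drawn.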
Answer by Ankit Agrawal
If you want the rows themselves unchanged, just a random sample of them, then:
import random
import numpy as np
# random.sample needs a sequence, so convert the array to a list of rows first
new_array = np.array(random.sample(list(old_array), x))
Here x must be an int giving the number of rows you want to pick at random.
Answer by orli
I see permutation has been suggested. In fact it can be made into one line:
>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]
array([[0, 3, 0],
[3, 1, 2]])
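np.random.permutation(A) builds a shuffled copy of the whole array before the slice is taken, which may matter for a very large array. One possible variation, sketched here with a small stand-in array that is not from the original answer, permutes only the row indices:

```python
import numpy as np

A = np.arange(30).reshape(10, 3)   # hypothetical stand-in for the large array

# Shuffle the row indices only, keep the first two, then index A once.
idx = np.random.permutation(A.shape[0])[:2]
rows = A[idx]

print(rows.shape)  # (2, 3)
```

Because a permutation never repeats an index, the two rows are always distinct, and A itself is left untouched.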
Answer by Ben
If you want to generate multiple random subsets of rows, for example if you're doing RANSAC:
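import numpy as np
num_pop = 10        # number of rows in the population
num_samples = 2     # number of independent subsets to draw
pop_in_sample = 3   # rows per subset
rows_to_sample = np.random.random([num_pop, 5])
# argsort of random numbers gives one random permutation of row indices per subset
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]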
Answer by CB Madsen
This is similar to the answer Hezi Resheff provided, but simplified so that newer Python users understand what's going on (I noticed many new data science students fetch random samples in the weirdest ways because they don't know what they are doing in Python).
You can get a number of random indices from your array by using:
indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)
You can then index your numpy array with them to get the samples at those indices:
A[indices]
This will get you the specified number of random samples from your data.
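The two steps can be wrapped in a small helper. This is only a sketch: sample_rows and its seed parameter are hypothetical names, not part of numpy or of the original answer, and it uses the newer Generator interface. It also makes explicit that asking for more rows than exist cannot work without replacement:

```python
import numpy as np

def sample_rows(arr, n, seed=None):
    """Hypothetical helper: return n distinct rows of arr, chosen at random."""
    if n > arr.shape[0]:
        raise ValueError("cannot sample more rows than the array has without replacement")
    rng = np.random.default_rng(seed)
    # Step 1: pick n distinct row indices.
    indices = rng.choice(arr.shape[0], size=n, replace=False)
    # Step 2: index the array with them.
    return arr[indices]
```

For example, sample_rows(A, 2) would return a new array with two distinct rows of A.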

