Python NumPy: get a random set of rows from a 2D array

Question

Asked by gha

I have a very large 2D array which looks something like this:

a=
[[a1, b1, c1],
 [a2, b2, c2],
 ...,
 [an, bn, cn]]

Using numpy, is there an easy way to get a new 2D array containing, e.g., 2 random rows from the initial array a (without replacement)?

e.g.

b=
[[a4,  b4,  c4],
 [a99, b99, c99]]

Answer 1

Accepted answer by Daniel

>>> import numpy as np
>>> A = np.random.randint(5, size=(10,3))
>>> A
array([[1, 3, 0],
       [3, 2, 0],
       [0, 2, 1],
       [1, 1, 4],
       [3, 2, 2],
       [0, 1, 0],
       [1, 3, 1],
       [0, 4, 1],
       [2, 4, 2],
       [3, 3, 1]])
>>> idx = np.random.randint(10, size=2)
>>> idx
array([7, 6])
>>> A[idx,:]
array([[0, 4, 1],
       [1, 3, 1]])

Putting it together for a general case:

A[np.random.randint(A.shape[0], size=2), :]

To sample without replacement (numpy 1.7.0+):

A[np.random.choice(A.shape[0], 2, replace=False), :]
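
On newer NumPy releases (1.17 and later, as far as I know) the same draw can also be written with the Generator API. This is only a minimal sketch: the example array `A` and the seed value are assumptions made for illustration, not part of the original answer.

```python
import numpy as np

# Hypothetical 10x3 example array; any 2D array works the same way.
A = np.arange(30).reshape(10, 3)

# Seeded generator so the draw is repeatable (the seed value is arbitrary).
rng = np.random.default_rng(0)

# Draw 2 distinct row indices, then index the array with them.
idx = rng.choice(A.shape[0], size=2, replace=False)
sample = A[idx]

# Alternative: let choice pick whole rows directly along axis 0.
sample_direct = rng.choice(A, size=2, replace=False, axis=0)
```

Keeping the indices around (the first form) is handy if you also need to know which rows were picked.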

I do not believe there is a good way to generate a random list of indices without replacement before numpy 1.7. Perhaps you can set up a small function that ensures the two values are not the same.
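
One possible workaround, sketched here under the assumption that using the standard library is acceptable, is to draw the indices with `random.sample`, which never repeats a value. The example array `A` is made up for illustration.

```python
import random

import numpy as np

# Hypothetical 10x3 example array.
A = np.arange(30).reshape(10, 3)

# random.sample draws without replacement, so the two indices are distinct.
idx = random.sample(range(A.shape[0]), 2)

# Index the array with the list of row indices.
rows = A[idx]
```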

Answer 2

Answered by Hezi Resheff

This is an old post, but this is what works best for me:

A[np.random.choice(A.shape[0], num_rows_2_sample, replace=False)]

Change replace=False to replace=True to get the same thing, but with replacement.

Answer 3

Answered by isosceleswheel

Another option is to create a random mask if you just want to down-sample your data by a certain factor. Say I want to down-sample to 25% of my original data set, which is currently held in the array data_arr:

import numpy

# generate a random boolean mask the length of the data
# use p=0.75 for False and p=0.25 for True
mask = numpy.random.choice([False, True], len(data_arr), p=[0.75, 0.25])

Now you can call data_arr[mask] and get back roughly 25% of the rows, randomly sampled.
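
Because each row is kept independently with probability 0.25, the number of rows returned varies from run to run. The sketch below illustrates that and, for comparison, draws exactly a quarter of the rows. The array contents, the seed, and the use of the Generator API are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for repeatability only

# Hypothetical data set with 1000 rows and 4 columns.
data_arr = np.arange(4000).reshape(1000, 4)

# Random mask: each row is kept independently with probability 0.25.
mask = rng.choice([False, True], size=len(data_arr), p=[0.75, 0.25])
subset = data_arr[mask]
kept = int(mask.sum())  # close to 250, but not guaranteed to equal it

# If an exact 25% is required, draw that many distinct row indices instead.
n_exact = len(data_arr) // 4
exact = data_arr[rng.choice(len(data_arr), size=n_exact, replace=False)]
```

Note that boolean masking keeps the selected rows in their original order, whereas indexing with drawn indices returns them in random order.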

Answer 4

Answered by Ankit Agrawal

If you just need a random sample of the existing rows, then:

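import random
# random.sample expects a sequence such as a list, so a numpy array
# is converted to a list of rows first; the result is a list of rows
new_array = random.sample(list(old_array), x)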

Here x must be an int giving the number of rows you want to pick at random.

Answer 5

Answered by orli

I see permutation has been suggested. In fact it can be made into one line:

>>> A = np.random.randint(5, size=(10,3))
>>> np.random.permutation(A)[:2]
array([[0, 3, 0],
       [3, 1, 2]])
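
Note that `np.random.permutation(A)` returns a shuffled copy of the whole array, which can be costly when the array is very large. A minimal sketch of a lighter variant, assuming the same kind of example array, shuffles only the row indices and keeps the first two:

```python
import numpy as np

# Example array in the same style as the snippet above.
A = np.random.randint(5, size=(10, 3))

# Shuffle the row indices rather than the rows themselves,
# then keep the first 2 indices (distinct by construction).
idx = np.random.permutation(A.shape[0])[:2]
rows = A[idx]
```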

Answer 6

Answered by Ben

If you want to generate multiple random subsets of rows at once, for example if you're doing RANSAC:

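import numpy as np

num_pop = 10        # number of rows in the population
num_samples = 2     # number of subsets to draw
pop_in_sample = 3   # number of rows in each subset
rows_to_sample = np.random.random([num_pop, 5])
# one row of random keys per subset; argsort turns each row into a random permutation
random_numbers = np.random.random([num_samples, num_pop])
samples = np.argsort(random_numbers, axis=1)[:, :pop_in_sample]
# will be shape [num_samples, pop_in_sample, 5]
row_subsets = rows_to_sample[samples, :]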
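
The trick works because sorting a row of random keys yields a random permutation of the row indices, so the first few entries of each permutation are distinct. The sketch below wraps the same idea in a helper; the function name `sample_row_subsets` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def sample_row_subsets(rows, num_samples, subset_size):
    """Return num_samples subsets, each holding subset_size distinct rows.

    Hypothetical helper wrapping the argsort trick shown above.
    """
    # One row of random keys per subset.
    keys = np.random.random((num_samples, rows.shape[0]))
    # argsort gives a random permutation per row; keep the first subset_size indices.
    idx = np.argsort(keys, axis=1)[:, :subset_size]
    # Indexing with a 2D index array yields shape (num_samples, subset_size, n_cols).
    return rows[idx], idx


rows_to_sample = np.random.random((10, 5))
row_subsets, idx = sample_row_subsets(rows_to_sample, num_samples=2, subset_size=3)
```

Rows are unique within each subset, but the same row may appear in more than one subset.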

Answer 7

Answered by CB Madsen

This is similar to the answer Hezi Resheff provided, but simplified so that newer Python users can see what's going on (I've noticed that many new data science students fetch random samples in the weirdest ways because they don't know what they are doing in Python).

You can get a number of random indices from your array by using:

indices = np.random.choice(A.shape[0], amount_of_samples, replace=False)

You can then index your numpy array with those indices (integer array indexing, often called fancy indexing) to get the samples:

A[indices]

This will get you the specified number of random samples from your data.
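
One thing to watch for: with `replace=False`, asking for more samples than there are rows raises a `ValueError`. The sketch below shows that and one possible way to cap the request. The helper name `sample_rows` and the capping behaviour are assumptions for illustration, not part of the answer above.

```python
import numpy as np

# Hypothetical small array with only 4 rows.
A = np.arange(12).reshape(4, 3)


def sample_rows(arr, amount_of_samples):
    """Hypothetical helper: sample rows without replacement, capped at the row count."""
    n = min(amount_of_samples, arr.shape[0])
    indices = np.random.choice(arr.shape[0], n, replace=False)
    return arr[indices]


# Requesting 10 distinct indices out of 4 fails.
try:
    np.random.choice(A.shape[0], 10, replace=False)
    raised = False
except ValueError:
    raised = True

# The capped helper returns all 4 rows in random order instead.
capped = sample_rows(A, 10)
```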
