Python: Remove duplicate rows of a numpy array
Note: this page is a translation of a popular StackOverFlow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must keep the same license and attribute it to the original authors (not me): StackOverFlow
Original question: http://stackoverflow.com/questions/31097247/
Remove duplicate rows of a numpy array
Asked by thunder
How can I remove duplicate rows of a 2-dimensional numpy array?
data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])
The answer should be as follows:
ans = array([[1,8,3,3,4],
             [1,8,9,9,4]])
If there are two rows that are the same, then I would like to remove one "duplicate" row.
Accepted answer by ThePredator
You can use numpy's np.unique. Since you want the unique rows, we need to put them into tuples:
import numpy as np
data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])
Just applying np.unique to the data array results in this:
>>> uniques = np.unique(data)
>>> uniques
array([1, 3, 4, 8, 9])
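This returns the unique elements of the flattened array rather than the unique rows. So put the rows into tuples and, on NumPy 1.13 or later, pass axis=0 so that whole rows are compared (without it, np.unique flattens the tuples again):
new_array = [tuple(row) for row in data]
uniques = np.unique(new_array, axis=0)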
which prints:
>>> uniques
array([[1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4]])
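In NumPy 1.13 and later, np.unique accepts an axis argument, so it can also be applied to the array directly without building tuples. The result comes back sorted row-wise; if the original row order matters, return_index can help. A minimal sketch, assuming a reasonably recent NumPy (the names first_seen and in_original_order are only illustrative):

```python
import numpy as np

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

# axis=0 treats each row as one item; the result is sorted row-wise
uniques = np.unique(data, axis=0)

# return_index gives the index of the first occurrence of each unique row;
# sorting those indices restores the order in which rows first appeared
_, first_seen = np.unique(data, axis=0, return_index=True)
in_original_order = data[np.sort(first_seen)]

print(uniques)
print(in_original_order)
```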
Answer by omerbp
A simple solution can be:
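import numpy as np

def unique_rows(a):
    # View each row as a single structured element so that
    # np.unique compares whole rows instead of single values
    a = np.ascontiguousarray(a)
    unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    # Convert the structured result back to a plain 2-D array
    return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))

data = np.array([[1,8,3,3,4],
                 [1,8,9,9,4],
                 [1,8,3,3,4]])
print(unique_rows(data))
# prints:
# [[1 8 3 3 4]
#  [1 8 9 9 4]]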
You can check this for many more solutions to this problem.
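The view-based function works by reinterpreting each row as one structured record, so np.unique compares whole rows rather than individual numbers. The intermediate step can be sketched on its own as follows (row_dtype, as_records and unique_records are illustrative names, not part of the answer):

```python
import numpy as np

data = np.ascontiguousarray([[1, 8, 3, 3, 4],
                             [1, 8, 9, 9, 4],
                             [1, 8, 3, 3, 4]])

# One unnamed field per column; NumPy auto-names them f0, f1, ...
row_dtype = [('', data.dtype)] * data.shape[1]

# Each row of 5 integers becomes a single record: shape (3, 5) -> (3, 1)
as_records = data.view(row_dtype)

# np.unique flattens to a 1-D array of records and drops repeated records
unique_records = np.unique(as_records)

# Reinterpret the records as plain integers and restore the column count
unique_rows = unique_records.view(data.dtype).reshape(-1, data.shape[1])

print(as_records.shape)
print(unique_rows)
```

The array has to be C-contiguous for the view to line up with whole rows, which is why the answer calls np.ascontiguousarray first.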
Answer by Divakar
One approach with lex-sorting -
# Perform lex sort and get sorted data
sorted_idx = np.lexsort(data.T)
sorted_data = data[sorted_idx,:]
# Get unique row mask
row_mask = np.append([True],np.any(np.diff(sorted_data,axis=0),1))
# Get unique rows
out = sorted_data[row_mask]
Sample run -
In [199]: data
Out[199]:
array([[1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4],
       [1, 8, 3, 3, 4],
       [1, 8, 3, 3, 4],
       [1, 8, 0, 3, 4],
       [1, 8, 9, 9, 4]])
In [200]: sorted_idx = np.lexsort(data.T)
     ...: sorted_data = data[sorted_idx,:]
     ...: row_mask = np.append([True],np.any(np.diff(sorted_data,axis=0),1))
     ...: out = sorted_data[row_mask]
     ...:
In [201]: out
Out[201]:
array([[1, 8, 0, 3, 4],
       [1, 8, 3, 3, 4],
       [1, 8, 9, 9, 4]])
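To see why the mask keeps exactly one row per group: np.lexsort treats the last key as the primary one, so passing data.T sorts by the last column first, and identical rows end up next to each other. np.diff along axis 0 is then all zeros exactly where a row repeats the row before it. A small step-by-step sketch on the question's data (the name changed is introduced here for illustration):

```python
import numpy as np

data = np.array([[1, 8, 3, 3, 4],
                 [1, 8, 9, 9, 4],
                 [1, 8, 3, 3, 4]])

# Stable indirect sort; the last column is the primary key
sorted_idx = np.lexsort(data.T)
sorted_data = data[sorted_idx]

# A row differs from its predecessor if any column difference is non-zero
changed = np.any(np.diff(sorted_data, axis=0), axis=1)

# The first sorted row has no predecessor, so it is always kept
row_mask = np.append([True], changed)

print(sorted_idx)
print(row_mask)
print(sorted_data[row_mask])
```

Note that the output rows follow the lexsort order, which is not necessarily the order in which the rows appeared in the input.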
Runtime tests -
This section times all approaches proposed in the solutions presented thus far.
In [34]: data = np.random.randint(0,10,(10000,10))
In [35]: def tuple_based(data):
    ...:     new_array = [tuple(row) for row in data]
    ...:     return np.unique(new_array)
    ...:
    ...: def lexsort_based(data):
    ...:     sorted_data = data[np.lexsort(data.T),:]
    ...:     row_mask = np.append([True],np.any(np.diff(sorted_data,axis=0),1))
    ...:     return sorted_data[row_mask]
    ...:
    ...: def unique_based(a):
    ...:     a = np.ascontiguousarray(a)
    ...:     unique_a = np.unique(a.view([('', a.dtype)]*a.shape[1]))
    ...:     return unique_a.view(a.dtype).reshape((unique_a.shape[0], a.shape[1]))
    ...:
In [36]: %timeit tuple_based(data)
10 loops, best of 3: 63.1 ms per loop
In [37]: %timeit lexsort_based(data)
100 loops, best of 3: 8.92 ms per loop
In [38]: %timeit unique_based(data)
10 loops, best of 3: 29.1 ms per loop
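These timings depend on the machine and the NumPy version, so they may not carry over to other setups. A rough sketch for repeating the comparison outside IPython with the standard timeit module follows. Here axis_based is an extra candidate that was not timed in the answer and assumes NumPy 1.13 or later, as_row_set is a hypothetical helper for comparing results, and np.random.default_rng assumes NumPy 1.17 or later:

```python
import timeit
import numpy as np

def lexsort_based(data):
    sorted_data = data[np.lexsort(data.T), :]
    row_mask = np.append([True], np.any(np.diff(sorted_data, axis=0), 1))
    return sorted_data[row_mask]

def axis_based(data):
    # Not part of the original timings; requires NumPy >= 1.13
    return np.unique(data, axis=0)

def as_row_set(rows):
    # Order-independent form of a result, since the two approaches
    # return the unique rows in different sort orders
    return {tuple(r) for r in rows.tolist()}

rng = np.random.default_rng(0)
data = rng.integers(0, 10, size=(10000, 10))

for fn in (lexsort_based, axis_based):
    seconds = timeit.timeit(lambda: fn(data), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

Comparing the results as sets of tuples is a quick way to confirm that both functions find the same rows before trusting the timings.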