Python numpy.unique preserving order

Question

Asked by siamii

['b','b','b','a','a','c','c']

numpy.unique gives

['a','b','c']

How can I preserve the original order?

['b','a','c']

Great answers. Bonus question: why do none of these methods work with this dataset? http://www.uploadmb.com/dw.php?id=1364341573 Here's the question: "numpy sort wierd behavior"

Answer 1

Accepted answer by HYRY

unique() is slow, O(N log(N)), but you can do this with the following code:

import numpy as np
a = np.array(['b','a','b','b','d','a','a','c','c'])
_, idx = np.unique(a, return_index=True)
print(a[np.sort(idx)])

output:

['b' 'a' 'd' 'c']
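
The same idea can be wrapped in a small helper. This is only a minimal sketch: the name unique_in_order is hypothetical (it is not a NumPy function), and the input is flattened to 1-D because np.unique reports indices into the flattened array.

```python
import numpy as np

def unique_in_order(values):
    """Return unique elements in order of first appearance.

    Hypothetical helper name, not part of NumPy.
    """
    arr = np.asarray(values).ravel()
    # For each sorted unique value, return_index=True gives the index of its
    # first occurrence in the input.
    _, first_idx = np.unique(arr, return_index=True)
    # Sorting those indices restores the original order of appearance.
    return arr[np.sort(first_idx)]

result = unique_in_order(['b', 'b', 'b', 'a', 'a', 'c', 'c'])
print(result)
```

Like the code above, this still costs O(N log(N)) because np.unique sorts internally.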

pandas.unique() is much faster for big arrays, O(N), and it returns the values in order of first appearance:

import numpy as np
import pandas as pd

a = np.random.randint(0, 1000, 10000)
# %timeit is an IPython/Jupyter magic, not plain Python
%timeit np.unique(a)
%timeit pd.unique(a)

1000 loops, best of 3: 644 us per loop
10000 loops, best of 3: 144 us per loop

Answer 2

Answer by YXD

import numpy as np

a = ['b','b','b','a','a','c','c']
# take the first-occurrence indices, sort them, and index back into a
[a[i] for i in sorted(np.unique(a, return_index=True)[1])]

Answer 3

Answer by Fred Foo

Use the return_index functionality of np.unique. That returns the indices at which the elements first occurred in the input. Then argsort those indices.

>>> u, ind = np.unique(['b','b','b','a','a','c','c'], return_index=True)
>>> u[np.argsort(ind)]
array(['b', 'a', 'c'], 
      dtype='|S1')
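
To see why argsort is the right call here, it may help to compare it with sorting the indices and indexing the original input instead. The sketch below uses the same data; the variable names via_argsort and via_sort are illustrative only.

```python
import numpy as np

a = np.array(['b', 'b', 'b', 'a', 'a', 'c', 'c'])
u, ind = np.unique(a, return_index=True)

# u is sorted: ['a', 'b', 'c']; ind holds first-occurrence positions: [3, 0, 5].
via_argsort = u[np.argsort(ind)]  # reorder the sorted unique values
via_sort = a[np.sort(ind)]        # pick the first occurrences from the input

print(via_argsort, via_sort)
```

Both expressions select the same elements: argsort reorders the unique values, while sort picks positions in the original array.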

Answer 4

Answer by Jan Spurny

If you're trying to remove duplicates from an already sorted iterable, you can use the itertools.groupby function:

>>> from itertools import groupby
>>> a = ['b','b','b','a','a','c','c']
>>> [x[0] for x in groupby(a)]
['b', 'a', 'c']

This works like the Unix 'uniq' command: it only collapses consecutive duplicates, so it assumes the list is already sorted (or at least grouped). On an unsorted list you get something like this:

>>> b = ['b','b','b','a','a','c','c','a','a']
>>> [x[0] for x in groupby(b)]
['b', 'a', 'c', 'a']
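
For unsorted input like b, where groupby leaves the second run of 'a' in place, one pure-Python alternative is to track what has already been seen. This is a sketch rather than part of the original answer: dedupe_keep_first is a hypothetical helper name, and it assumes the items are hashable.

```python
def dedupe_keep_first(items):
    """Yield each item the first time it is seen (hypothetical helper)."""
    seen = set()
    for item in items:
        if item not in seen:  # assumes items are hashable
            seen.add(item)
            yield item

b = ['b', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a']
deduped = list(dedupe_keep_first(b))
print(deduped)
```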

Answer 5

Answer by Albert

If you want to delete repeated entries, like the Unix tool uniq, this is a solution:

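import numpy as np

def uniq(seq):
  """
  Like the Unix tool uniq: removes consecutive repeated entries.
  Relies on subtraction, so seq must have a numeric dtype.
  :param seq: 1-D numeric numpy.array
  :return: seq with consecutive repeats removed
  """
  diffs = np.ones_like(seq)         # first element is always kept
  diffs[1:] = seq[1:] - seq[:-1]    # zero wherever an entry equals its predecessor
  idx = diffs.nonzero()             # positions where the value changes
  return seq[idx]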
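
Because this relies on subtraction, it would not work on the string array from the question. A variant that compares neighbours with != instead might look like the sketch below; uniq_any_dtype is a hypothetical name, and the input is assumed to be 1-D.

```python
import numpy as np

def uniq_any_dtype(seq):
    """Drop consecutive repeats, like Unix uniq (hypothetical helper name)."""
    seq = np.asarray(seq)  # assumed to be 1-D
    keep = np.ones(seq.shape[0], dtype=bool)  # first element is always kept
    # Keep an entry only when it differs from the one just before it.
    keep[1:] = seq[1:] != seq[:-1]
    return seq[keep]

letters = uniq_any_dtype(['b', 'b', 'b', 'a', 'a', 'c', 'c', 'a', 'a'])
numbers = uniq_any_dtype([1, 1, 2, 2, 1])
print(letters, numbers)
```

As with the Unix tool, a value that reappears later in a separate run is kept again.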

Answer 6

Answer by DanGoodrick

Use an OrderedDict (faster than a list comprehension):

from collections import OrderedDict  
a = ['b','a','b','a','a','c','c']
list(OrderedDict.fromkeys(a))
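
On Python 3.7 and later, plain dicts also preserve insertion order, so the same trick should work without the import. A minimal sketch, assuming the elements are hashable:

```python
a = ['b', 'a', 'b', 'a', 'a', 'c', 'c']
# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_ordered = list(dict.fromkeys(a))
print(unique_ordered)
```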
