Python: is it possible to use argsort in descending order?

Note: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, include the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/16486252/

Date: 2020-08-18 22:47:43 Source: igfitidea

Tags: python, numpy

Asked by shn

Consider the following code:

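import numpy as np
avgDists = np.array([1, 8, 6, 9, 4])
n = 3  # how many indices to return (example value)
ids = avgDists.argsort()[:n]  # indices of the n smallest elements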

This gives me the indices of the n smallest elements. Is it possible to use this same argsort in descending order to get the indices of the n highest elements?

Accepted answer by wim

If you negate an array, the lowest elements become the highest elements and vice versa. Therefore, the indices of the n highest elements are:

(-avgDists).argsort()[:n]

Another way to reason about this, as mentioned in the comments, is to observe that the big elements come last in the argsort. So you can read from the tail of the argsort to find the n highest elements:

avgDists.argsort()[::-1][:n]
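A minimal sketch wrapping both strategies in small helpers, assuming a 1-D array with distinct values (the names top_n_negate and top_n_reverse are hypothetical, not NumPy API), to show they return the same indices:

```python
import numpy as np

def top_n_negate(a, n):
    """Indices of the n largest elements, via argsort of the negated array (hypothetical helper)."""
    return (-a).argsort()[:n]

def top_n_reverse(a, n):
    """Indices of the n largest elements, by reading the ascending argsort backwards (hypothetical helper)."""
    return a.argsort()[::-1][:n]

avgDists = np.array([1, 8, 6, 9, 4])
print(top_n_negate(avgDists, 3))   # [3 1 2]
print(top_n_reverse(avgDists, 3))  # [3 1 2]
```

With tied values the two can differ in how equal items are ordered, which the next paragraph discusses.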

Both methods are O(n log n) in time complexity, because the argsort call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you're working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you're working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

Note that these methods do not always give equivalent results: if a stable sort is requested from argsort, e.g. by passing the keyword argument kind='mergesort', then the first strategy will preserve sort stability, but the second strategy will break it (i.e. the relative order of equal items will be reversed).
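A small sketch of the stability difference, using an example array with ties chosen for illustration and the kind='mergesort' option mentioned above:

```python
import numpy as np

# Two pairs of tied values: 5 at indices 0 and 2, 7 at indices 1 and 3.
a = np.array([5, 7, 5, 7])

# Strategy 1: a stable sort of the negated array keeps tied items in original order.
neg = (-a).argsort(kind='mergesort')
# Strategy 2: reversing a stable ascending sort also reverses the order of tied items.
rev = a.argsort(kind='mergesort')[::-1]

print(neg)  # [1 3 0 2]
print(rev)  # [3 1 2 0]
```

Both results list the 7s before the 5s, but only the first keeps index 1 ahead of index 3 (and 0 ahead of 2).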

Example timings:

Using a small array of 100 floats and a tail of length 30, the view method was about 15% faster:

>>> avgDists = np.random.rand(100)
>>> n = 30
>>> %timeit (-avgDists).argsort()[:n]
1.93 μs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit avgDists.argsort()[::-1][:n]
1.64 μs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> %timeit avgDists.argsort()[-n:][::-1]
1.64 μs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

For larger arrays, the argsort is dominant and there is no significant timing difference:

>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> %timeit (-avgDists).argsort()[:n]
21.9 μs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit avgDists.argsort()[::-1][:n]
21.7 μs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit avgDists.argsort()[-n:][::-1]
21.9 μs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations only stride a view of the array differently and do not actually copy data.
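A quick check of this claim, as a sketch on a seeded random array (the seed and sizes are arbitrary example values): both orderings of slicing give the same indices, and both results are views into the argsort output rather than copies.

```python
import numpy as np

rng = np.random.default_rng(0)
avgDists = rng.random(1000)
n = 300
order = avgDists.argsort()            # the only step here that allocates a new array

reverse_then_cut = order[::-1][:n]    # reverse the view, then take its first n entries
cut_then_reverse = order[-n:][::-1]   # take the last n entries, then reverse that view

# Both give the same indices...
print(np.array_equal(reverse_then_cut, cut_then_reverse))  # True
# ...and both only view `order`, so no index data was copied.
print(np.shares_memory(reverse_then_cut, order))  # True
print(np.shares_memory(cut_then_reverse, order))  # True
```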

Answer by dawg

Just like with Python lists, [::-1] reverses the array returned by argsort(), and [:n] then gives the first n elements of that reversed array (i.e. the last n elements of the original argsort):

>>> avgDists=np.array([1, 8, 6, 9, 4])
>>> n=3
>>> ids = avgDists.argsort()[::-1][:n]
>>> ids
array([3, 1, 2])

The advantage of this method is that ids is a view of the array returned by argsort() rather than a new copy (it is not a view of avgDists itself):

>>> ids.flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

(The 'OWNDATA' being False indicates this is a view, not a copy)
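A short sketch that makes the relationship explicit with np.shares_memory, naming the intermediate argsort result order (a name introduced here for illustration): ids shares memory with that index array, not with avgDists.

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3
order = avgDists.argsort()   # a new integer array of indices
ids = order[::-1][:n]        # a view into `order`

print(ids.flags['OWNDATA'])             # False: ids is a view
print(np.shares_memory(ids, order))     # True: it views the argsort result
print(np.shares_memory(ids, avgDists))  # False: it does not view avgDists
```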

Another way to do this is something like:

(-avgDists).argsort()[:n]

The problem is that this works by creating the negative of each element in the array:

>>> (-avgDists)
array([-1, -8, -6, -9, -4])

And it creates a copy to do so:

>>> (-avgDists).flags['OWNDATA']
True

So if you time each one with this very small data set:

>>> import timeit
>>> timeit.timeit('(-avgDists).argsort()[:3]', setup="from __main__ import avgDists")
4.2879798610229045
>>> timeit.timeit('avgDists.argsort()[::-1][:3]', setup="from __main__ import avgDists")
2.8372560259886086

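The view method is substantially faster here (and, because it skips the negated copy, it allocates roughly half as much temporary memory when the data and the indices are both 8-byte types).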

Answer by MentholBonbon

You could create a copy of the array and then multiply each element by -1.
As a result, the previously largest elements would become the smallest.
The indices of the n smallest elements in the copy are the indices of the n greatest elements in the original.
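A minimal sketch of this approach, assuming a signed integer or floating-point dtype (negating unsigned integers wraps around instead of flipping the order):

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3

flipped = avgDists * -1          # new array: the largest values are now the smallest
ids = np.argsort(flipped)[:n]    # indices of the n smallest values in the copy
print(ids)                       # [3 1 2]
print(avgDists[ids])             # [9 8 6]
```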

Answer by MSeifert

Instead of using np.argsort you could use np.argpartition if you only need the indices of the lowest/highest n elements.

That doesn't require sorting the whole array, only partitioning it so that the part you need ends up on the correct side. Note, however, that the "order inside your partition" is undefined, so while it gives the correct indices, they might not be in sorted order:

>>> avgDists = [1, 8, 6, 9, 4]
>>> np.array(avgDists).argpartition(2)[:2]  # indices of lowest 2 items
array([0, 4], dtype=int64)

>>> np.array(avgDists).argpartition(-2)[-2:]  # indices of highest 2 items
array([1, 3], dtype=int64)
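A sketch of how one might check that argpartition selects the same set of indices as a full descending argsort; because the order inside the partition is unspecified, it compares sorted lists rather than the raw arrays:

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 2

# Indices of the n largest values, in no guaranteed order.
top = np.argpartition(avgDists, -n)[-n:]
# The same selection via a full (more expensive) descending argsort.
full = avgDists.argsort()[::-1][:n]

print(sorted(top.tolist()) == sorted(full.tolist()))  # True
```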

Answer by Kanmani

You can use the flip functions numpy.flipud() or numpy.fliplr() to get the indices in descending order after sorting with argsort (note that fliplr requires an array with at least two dimensions). That's what I usually do.
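A sketch of both functions: flipud reverses axis 0, so it works on the 1-D output of argsort, while fliplr reverses axis 1 and therefore suits per-row argsort results of a 2-D array (the 2-D array rows below is an example chosen for illustration):

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
ids = np.flipud(np.argsort(avgDists))   # reverse the 1-D argsort output
print(ids)  # [3 1 2 4 0]

# For a 2-D array, argsort each row, then flip left-right to get descending order per row.
rows = np.array([[3, 1, 2],
                 [0, 5, 4]])
per_row_desc = np.fliplr(np.argsort(rows, axis=1))

# fliplr on a 1-D array raises ValueError, because it has no axis 1.
try:
    np.fliplr(np.argsort(avgDists))
except ValueError as err:
    print("fliplr failed:", err)
```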

Answer by Biswajit Ghoshal

Another way is to use just a '-' in the argument to argsort, as in: "df[np.argsort(-df[:, 0])]", provided df is a 2-D NumPy array and you want to sort its rows by the first column (column index 0). Change the column index as appropriate. Of course, the column has to be numeric.
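A minimal sketch of this row sort, assuming df is a small 2-D float array made up for the example (a pandas DataFrame does not accept df[:, 0] indexing; there, DataFrame.sort_values with ascending=False is the usual route):

```python
import numpy as np

# Hypothetical 2-D numeric array standing in for `df`.
df = np.array([[2.0, 10.0],
               [5.0, 20.0],
               [1.0, 30.0]])

# Negating column 0 turns the ascending argsort into a descending row order.
rows_desc = df[np.argsort(-df[:, 0])]
print(rows_desc)
```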

Answer by Alexey Antonenko

With your example:

avgDists = np.array([1, 8, 6, 9, 4])
n = 4

Obtain the indices of the n largest values:

ids = np.argpartition(avgDists, -n)[-n:]

Sort them in descending order:

ids = ids[np.argsort(avgDists[ids])[::-1]]

Obtain results (for n=4):

>>> avgDists[ids]
array([9, 8, 6, 4])
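The two steps above can be combined into one helper, sketched here under the assumption of a 1-D array and 1 <= n <= len(a) (top_n_desc is a hypothetical name). The partition step is roughly linear in the array length, and only the n selected values are then fully sorted:

```python
import numpy as np

def top_n_desc(a, n):
    """Indices of the n largest elements of 1-D array a, largest first (hypothetical helper)."""
    part = np.argpartition(a, -n)[-n:]       # the n largest, in no particular order
    return part[np.argsort(a[part])[::-1]]   # sort just those n values, descending

avgDists = np.array([1, 8, 6, 9, 4])
ids = top_n_desc(avgDists, 4)
print(ids)            # [3 1 2 4]
print(avgDists[ids])  # [9 8 6 4]
```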

Answer by Adam Erickson

As @Kanmani hinted, an easier-to-interpret implementation may use numpy.flip, as in the following:

import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
ids = np.flip(np.argsort(avgDists))
print(ids)

Calling functions rather than chaining member functions (methods) arguably makes the order of operations easier to read.
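A short sketch showing that, on a 1-D array, np.flip gives the same result as the [::-1] slice used in the other answers and likewise returns a view rather than a copy:

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
order = np.argsort(avgDists)

flipped = np.flip(order)                     # reverses the single axis of a 1-D array
print(np.array_equal(flipped, order[::-1]))  # True
print(np.shares_memory(flipped, order))      # True: a view, not a copy
```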