Is there any numpy group by function?

Disclaimer: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/38013778/

Date: 2020-08-19 20:13:25 Source: igfitidea

Tags: python, arrays, numpy

Question by John Dow

Is there any function in numpy to group the array below by its first column?

I couldn't find any good answer on the internet.

>>> a
array([[  1, 275],
       [  1, 441],
       [  1, 494],
       [  1, 593],
       [  2, 679],
       [  2, 533],
       [  2, 686],
       [  3, 559],
       [  3, 219],
       [  3, 455],
       [  4, 605],
       [  4, 468],
       [  4, 692],
       [  4, 613]])

Wanted output:

array([[[275, 441, 494, 593]],
       [[679, 533, 686]],
       [[559, 219, 455]],
       [[605, 468, 692, 613]]], dtype=object)

Answer by Vincent J

Inspired by Eelco Hoogendoorn's library, but without using it, and relying on the fact that the first column of your array is sorted (non-decreasing).

>>> np.split(a[:, 1], np.cumsum(np.unique(a[:, 0], return_counts=True)[1])[:-1])
[array([275, 441, 494, 593]),
 array([679, 533, 686]),
 array([559, 219, 455]),
 array([605, 468, 692, 613])]
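
To see why the one-liner works, here it is broken into its intermediate steps on the question's array (the variable names are only for illustration). Like the answer, it assumes the rows are already sorted by the first column: np.unique reports the counts in sorted-key order, while np.split cuts the array in its existing row order.

```python
import numpy as np

# The question's array, already sorted by the first column
a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])
keys, values = a[:, 0], a[:, 1]

# Step 1: size of each group (np.unique returns the keys in sorted order)
uniq, counts = np.unique(keys, return_counts=True)  # uniq -> [1 2 3 4], counts -> [4 3 3 4]

# Step 2: running totals give the end offset of each group
ends = np.cumsum(counts)                            # [ 4  7 10 14]

# Step 3: the last offset is just len(values), so drop it and split at the rest
groups = np.split(values, ends[:-1])                # cuts before rows 4, 7 and 10
```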

I didn't benchmark it with timeit, but this is probably the fastest way to do it:

  • No python native loop
  • Result lists are numpy arrays, in case you need to make other numpy operations on them, no new conversion will be needed
  • Complexity of O(n log n), dominated by the sort inside np.unique

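The split trick only works when equal keys are already adjacent. If the first column is not sorted, one way to keep the approach vectorized is to apply a stable argsort first, which groups equal keys together while preserving each group's original row order. This is a sketch under that assumption; `group_split` is a hypothetical helper name, not part of NumPy, and returning a dict keyed by group is a design choice for convenience.

```python
import numpy as np

def group_split(keys, values):
    """Hypothetical helper: group `values` by `keys`, sorted or not.

    A stable argsort puts equal keys next to each other (keeping the
    original order inside each group); then the np.unique/np.split trick applies.
    """
    keys = np.asarray(keys)
    values = np.asarray(values)
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    sorted_values = values[order]
    # Index of the first occurrence of each distinct key in the sorted keys
    uniq, starts = np.unique(sorted_keys, return_index=True)
    groups = np.split(sorted_values, starts[1:])
    return dict(zip(uniq.tolist(), groups))

# The question's rows, shuffled so the first column is no longer sorted
a = np.array([[3, 559], [1, 275], [4, 605], [2, 679], [1, 441],
              [3, 219], [4, 468], [2, 533], [1, 494], [4, 692],
              [2, 686], [3, 455], [1, 593], [4, 613]])
by_key = group_split(a[:, 0], a[:, 1])
```

The extra argsort keeps the overall cost at O(n log n), the same order as the sort already done inside np.unique.
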
PS: I wrote a similar line because I needed to "group by" the results of np.nonzero:

>>> indexes, values = np.nonzero(...)
>>> np.split(values, np.cumsum(np.unique(indexes, return_counts=True)[1])[:-1])
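
A concrete run of that pattern, on a small hypothetical matrix `m`, grouping the column indices of the nonzero entries by row. np.nonzero returns the row indices in ascending order, so the sortedness assumption holds automatically. Note that a row with no nonzero entries simply gets no group, so the result can have fewer entries than `m` has rows.

```python
import numpy as np

# Hypothetical example matrix
m = np.array([[0, 3, 0],
              [5, 0, 7],
              [0, 0, 9]])

# Row indices come back in ascending order, so they are already grouped
rows, cols = np.nonzero(m)                        # rows -> [0 1 1 2], cols -> [1 0 2 2]

# Column indices of the nonzero entries, grouped by row
counts = np.unique(rows, return_counts=True)[1]   # [1 2 1]
per_row = np.split(cols, np.cumsum(counts)[:-1])  # [array([1]), array([0, 2]), array([2])]
```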

Answer by Eelco Hoogendoorn

The numpy_indexed package (disclaimer: I am its author) aims to fill this gap in numpy. All operations in numpy-indexed are fully vectorized, and no O(n^2) algorithms were harmed during the making of this library.

import numpy_indexed as npi
npi.group_by(a[:, 0]).split(a[:, 1])

Note that it is usually more efficient to compute the relevant properties directly over such groups (e.g., group_by(keys).mean(values)) rather than first splitting the values into a list or jagged array.
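
For comparison, here is one plain-NumPy way to get a per-group mean directly, without building a jagged list. This is a sketch using np.unique(..., return_inverse=True) and np.bincount; it is not how numpy_indexed works internally. Unlike the np.split trick, it does not require the keys to be sorted.

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])
keys, values = a[:, 0], a[:, 1]

# `inverse` maps each row to the position of its key in `uniq`
uniq, inverse = np.unique(keys, return_inverse=True)

# Per-group sums and sizes, then their ratio
sums = np.bincount(inverse, weights=values)  # float64 because of the weights
sizes = np.bincount(inverse)
means = sums / sizes
```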

Answer by Piotr

Numpy is not very handy here because the desired output is not an array of integers (it is an array of list objects).

I suggest either the pure Python way...

from collections import defaultdict

%%timeit
d = defaultdict(list)
for key, val in a:
    d[key].append(val)
# 10.7 μs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# result:
defaultdict(list,
        {1: [275, 441, 494, 593],
         2: [679, 533, 686],
         3: [559, 219, 455],
         4: [605, 468, 692, 613]})

...or the pandas way:

import pandas as pd

%%timeit
df = pd.DataFrame(a, columns=["key", "val"])
df.groupby("key").val.apply(pd.Series.tolist)
# 979 μs ± 3.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# result:
key
1    [275, 441, 494, 593]
2         [679, 533, 686]
3         [559, 219, 455]
4    [605, 468, 692, 613]
Name: val, dtype: object

Answer by Gioelelm

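n = np.unique(a[:, 0])
# dtype=object is needed for ragged groups; NumPy >= 1.24 raises a ValueError without it
np.array([list(a[a[:, 0] == i, 1]) for i in n], dtype=object)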

outputs:

array([[275, 441, 494, 593], [679, 533, 686], [559, 219, 455],
       [605, 468, 692, 613]], dtype=object)

Answer by ns63sr

Simplifying Vincent J's answer, one can use return_index=True instead of return_counts=True and get rid of the cumsum:

np.split(a[:, 1], np.unique(a[:, 0], return_index=True)[1][1:])

Output

[array([275, 441, 494, 593]),
 array([679, 533, 686]),
 array([559, 219, 455]),
 array([605, 468, 692, 613])]
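
A quick check of why the two variants are interchangeable, on the first column of the question's array and assuming, as both answers do, that the keys are sorted: the first row of each group is exactly where the previous group ends.

```python
import numpy as np

keys = np.array([1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4])  # first column of `a`

# First row of each group...
starts = np.unique(keys, return_index=True)[1]               # [ 0  4  7 10]
# ...versus the cumulative group sizes used in Vincent J's answer
ends = np.cumsum(np.unique(keys, return_counts=True)[1])     # [ 4  7 10 14]

# On sorted keys, starts[1:] and ends[:-1] are the same split points
assert np.array_equal(starts[1:], ends[:-1])                 # both [ 4  7 10]
```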

Answer by Guido Mocha

Given X as the array of items you want to group and y (a 1D array) as the corresponding group labels, the following function does the grouping with numpy:

import numpy as np

def groupby(X, y):
    y = np.asarray(y)
    X = np.asarray(X)
    y_uniques = np.unique(y)
    return [X[y==yi] for yi in y_uniques]

So, groupby(a[:,1], a[:,0]) returns [array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]

Answer by user2251346

I used np.unique() followed by np.extract()

unique = np.unique(a[:, 0:1])
answer = []
for element in unique:
    present = a[:,0]==element
    answer.append(np.extract(present,a[:,-1]))
print (answer)

[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]

Answer by javadba

We might also find it useful to generate a dict:

def groupby(X): 
    X = np.asarray(X) 
    x_uniques = np.unique(X) 
    return {xi:X[X==xi] for xi in x_uniques} 

Let's try it out:

X=[1,1,2,2,3,3,3,3,4,5,6,7,7,8,9,9,1,1,1]
groupby(X)                                                                                                      
Out[9]: 
{1: array([1, 1, 1, 1, 1]),
 2: array([2, 2]),
 3: array([3, 3, 3, 3]),
 4: array([4]),
 5: array([5]),
 6: array([6]),
 7: array([7, 7]),
 8: array([8]),
 9: array([9, 9])}

Note that this by itself is not super compelling, but if we make X an object or namedtuple and then provide a groupby function, it becomes more interesting. Will put that in later.
