Python: Is there a numpy group-by function?

Question

Asked by John Dow

Is there any function in numpy to group the array below by its first column?

I couldn't find a good answer on the internet.

>>> a
array([[  1, 275],
       [  1, 441],
       [  1, 494],
       [  1, 593],
       [  2, 679],
       [  2, 533],
       [  2, 686],
       [  3, 559],
       [  3, 219],
       [  3, 455],
       [  4, 605],
       [  4, 468],
       [  4, 692],
       [  4, 613]])

Desired output:

array([[[275, 441, 494, 593]],
       [[679, 533, 686]],
       [[559, 219, 455]],
       [[605, 468, 692, 613]]], dtype=object)

Answer 1

Answered by Vincent J

Inspired by Eelco Hoogendoorn's library, but without using it, and relying on the fact that the first column of your array is always increasing.

>>> np.split(a[:, 1], np.cumsum(np.unique(a[:, 0], return_counts=True)[1])[:-1])
[array([275, 441, 494, 593]),
 array([679, 533, 686]),
 array([559, 219, 455]),
 array([605, 468, 692, 613])]

I didn't time it, but this is probably one of the faster ways to answer the question:

No native Python loop
The results are numpy arrays, so no conversion is needed if you want to apply further numpy operations to them
Complexity close to O(n) (np.unique sorts internally, so strictly O(n log n))

PS: I wrote a similar line because I needed to "group by" the results of np.nonzero:

>>> indexes, values = np.nonzero(...)
>>> np.split(values, np.cumsum(np.unique(indexes, return_counts=True)[1])[:-1])
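
The one-liner above relies on the keys already being sorted. As a minimal sketch of how the same idea could be extended to unsorted keys, the hypothetical helper `group_values` below (not part of numpy or of the original answer) sorts the keys with a stable argsort first, so values keep their original order inside each group:

```python
import numpy as np

def group_values(keys, values):
    """Group `values` by `keys`; keys need not be sorted (illustrative helper)."""
    keys = np.asarray(keys)
    values = np.asarray(values)
    # A stable sort keeps the original order of values inside each group.
    order = np.argsort(keys, kind="stable")
    sorted_keys = keys[order]
    # return_index gives the first position of each unique key in sorted_keys.
    unique_keys, first_idx = np.unique(sorted_keys, return_index=True)
    # Split at every group start except the first one (index 0).
    return unique_keys, np.split(values[order], first_idx[1:])

a = np.array([[3, 559], [1, 275], [2, 679], [1, 441], [3, 219], [2, 533]])
unique_keys, groups = group_values(a[:, 0], a[:, 1])
for key, group in zip(unique_keys, groups):
    print(key, group)
```

The extra sort makes this O(n log n), which is the price of not assuming sorted input.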

Answer 2

Answered by Eelco Hoogendoorn

The numpy_indexed package (disclaimer: I am its author) aims to fill this gap in numpy. All operations in numpy-indexed are fully vectorized, and no O(n^2) algorithms were harmed during the making of this library.

import numpy_indexed as npi
npi.group_by(a[:, 0]).split(a[:, 1])

Note that it is usually more efficient to compute the relevant properties directly over such groups (e.g., group_by(keys).mean(values)) rather than first splitting into a list / jagged array.
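
To illustrate the point about computing per-group properties directly, here is a rough sketch in plain numpy. It is not how numpy_indexed is implemented; it only shows, under the assumption that a per-group mean is what you need, that no intermediate jagged array is required:

```python
import numpy as np

a = np.array([[1, 275], [1, 441], [1, 494], [1, 593],
              [2, 679], [2, 533], [2, 686],
              [3, 559], [3, 219], [3, 455],
              [4, 605], [4, 468], [4, 692], [4, 613]])

keys, values = a[:, 0], a[:, 1]
# inverse[i] is the position of keys[i] within unique_keys.
unique_keys, inverse = np.unique(keys, return_inverse=True)
# Per-group sum and per-group count, each in one vectorized pass.
group_sums = np.bincount(inverse, weights=values)
group_counts = np.bincount(inverse)
group_means = group_sums / group_counts

for key, mean in zip(unique_keys, group_means):
    print(key, mean)
```

Unlike the np.split approach, this does not require the keys to be sorted.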

Answer 3

Answered by Piotr

Numpy is not very handy here because the desired output is not an array of integers (it is an array of list objects).

I suggest either the pure Python way...

from collections import defaultdict

%%timeit
d = defaultdict(list)
for key, val in a:
    d[key].append(val)
10.7 μs ± 156 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

# result:
defaultdict(list,
        {1: [275, 441, 494, 593],
         2: [679, 533, 686],
         3: [559, 219, 455],
         4: [605, 468, 692, 613]})

...or the pandas way:

import pandas as pd

%%timeit
df = pd.DataFrame(a, columns=["key", "val"])
df.groupby("key").val.apply(pd.Series.tolist)
979 μs ± 3.3 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# result:
key
1    [275, 441, 494, 593]
2         [679, 533, 686]
3         [559, 219, 455]
4    [605, 468, 692, 613]
Name: val, dtype: object

Answer 4

Answered by Gioelelm

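n = np.unique(a[:, 0])
# dtype=object is needed for ragged rows; NumPy >= 1.24 raises ValueError without it
np.array([list(a[a[:, 0] == i, 1]) for i in n], dtype=object)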

Output:

array([[275, 441, 494, 593], [679, 533, 686], [559, 219, 455],
       [605, 468, 692, 613]], dtype=object)
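
One caveat with building an object array from nested lists: if every group happens to have the same length, np.array(..., dtype=object) produces a 2-D array instead of a 1-D array of lists. A hedged sketch of one way around that, using a hypothetical helper `to_object_array` that fills a pre-allocated object array element by element:

```python
import numpy as np

def to_object_array(groups):
    """Pack per-group sequences into a 1-D object array of lists (illustrative helper)."""
    out = np.empty(len(groups), dtype=object)
    # Assign element by element so numpy never tries to turn
    # equal-length groups into a 2-D array.
    for i, group in enumerate(groups):
        out[i] = list(group)
    return out

ragged = to_object_array([[275, 441, 494, 593], [679, 533, 686]])
regular = to_object_array([[1, 2], [3, 4]])
print(ragged.shape, regular.shape)  # both arrays are 1-D with two elements
```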

Answer 5

Answered by ns63sr

Simplifying Vincent J's answer, one can use return_index=True instead of return_counts=True and get rid of the cumsum:

np.split(a[:, 1], np.unique(a[:, 0], return_index=True)[1][1:])

Output:

[array([275, 441, 494, 593]),
 array([679, 533, 686]),
 array([559, 219, 455]),
 array([605, 468, 692, 613])]

Answer 6

Answered by Guido Mocha

Given X as an array of items to be grouped and y (a 1D array) as the corresponding group labels, the following function does the grouping with numpy:

def groupby(X, y):
    y = np.asarray(y)
    X = np.asarray(X)
    y_uniques = np.unique(y)
    return [X[y==yi] for yi in y_uniques]

So, groupby(a[:,1], a[:,0]) returns [array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]

Answer 7

Answered by user2251346

I used np.unique() followed by np.extract():

unique = np.unique(a[:, 0:1])
answer = []
for element in unique:
    present = a[:,0]==element
    answer.append(np.extract(present,a[:,-1]))
print (answer)

[array([275, 441, 494, 593]), array([679, 533, 686]), array([559, 219, 455]), array([605, 468, 692, 613])]

Answer 8

Answered by javadba

We might also find it useful to generate a dict:

def groupby(X): 
    X = np.asarray(X) 
    x_uniques = np.unique(X) 
    return {xi:X[X==xi] for xi in x_uniques}

Let's try it out:

X = [1, 1, 2, 2, 3, 3, 3, 3, 4, 5, 6, 7, 7, 8, 9, 9, 1, 1, 1]
groupby(X)
Out[9]: 
{1: array([1, 1, 1, 1, 1]),
 2: array([2, 2]),
 3: array([3, 3, 3, 3]),
 4: array([4]),
 5: array([5]),
 6: array([6]),
 7: array([7, 7]),
 8: array([8]),
 9: array([9, 9])}

Note that this by itself is not super compelling, but if we make X an object or namedtuple and then provide a groupby function, it becomes more interesting. Will put that in later.
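
The promised follow-up is not part of this answer, so the following is only one possible reading of the idea: grouping namedtuple records by a key function. The `Reading` type, the `groupby_key` helper, and the sample data are all hypothetical and used purely for illustration:

```python
from collections import defaultdict, namedtuple

# Hypothetical record type, used only for illustration.
Reading = namedtuple("Reading", ["sensor", "value"])

def groupby_key(records, key):
    """Group arbitrary records into a dict keyed by key(record)."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)

readings = [Reading(1, 275), Reading(1, 441), Reading(2, 679), Reading(2, 533)]
by_sensor = groupby_key(readings, key=lambda r: r.sensor)
for sensor, group in by_sensor.items():
    print(sensor, [r.value for r in group])
```

This is plain Python rather than numpy, since arbitrary objects do not benefit from vectorized comparison.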
