Fastest way to calculate the centroid of a set of coordinate tuples in Python without numpy

Question

Asked by user3002473

I've been working on a project that is incredibly time sensitive (that unfortunately has to be in python) and one of the functions that is used extensively is a function that calculates the centroid of a list of (x, y) tuples. To illustrate:

def centroid(*points):
    x_coords = [p[0] for p in points]
    y_coords = [p[1] for p in points]
    _len = len(points)
    centroid_x = sum(x_coords)/_len
    centroid_y = sum(y_coords)/_len
    return [centroid_x, centroid_y]

For example:

>>> centroid((0, 0), (10, 0), (10, 10), (0, 10))
[5.0, 5.0]

This function runs fairly quickly: the above example completes in an average of 1.49e-05 seconds on my system. Still, I'm looking for the fastest way to calculate the centroid. Do you have any ideas?

One of the other solutions I had was to do the following (where l is the list of tuples):

map(len(l).__rtruediv__, map(sum, zip(*l)))

This runs in between 9.6e-06 and 1.01e-05 seconds, but unfortunately converting the result to a list (by wrapping the whole statement in list( ... )) nearly doubles the computation time.
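A likely reason for the apparent doubling: in Python 3, map returns a lazy iterator, so timing the bare expression only measures building the iterator, not the sums and divisions. The sketch below assumes Python 3 and uses `points` as a stand-in name for `l`; it shows where the work actually happens and that unpacking the result avoids building a list at all.

```python
points = [(0, 0), (10, 0), (10, 10), (0, 10)]
n = len(points)

# Building the nested map objects does not compute anything yet:
# both map objects are lazy iterators in Python 3.
lazy = map(n.__rtruediv__, map(sum, zip(*points)))

# The sums and divisions run only when the iterator is consumed.
result = list(lazy)
print(result)  # [5.0, 5.0]

# If only the two coordinates are needed, unpacking consumes the
# iterator directly, without an intermediate list.
cx, cy = map(n.__rtruediv__, map(sum, zip(*points)))
print(cx, cy)  # 5.0 5.0
```

So a fair comparison with the original centroid function has to include the cost of consuming the iterator.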

EDIT: Suggestions are welcome in pure python BUT NOT numpy.

EDIT2: Just found out that if a separate variable is kept for the length of the list of tuples, then my above implementation with map runs reliably under 9.2e-06 seconds, but there's still the problem of converting back to a list.

EDIT3: Now I'm only accepting answers in pure python, NOT in numpy (sorry to those that already answered in numpy!)

Answer 1

Accepted answer by Retozi

import numpy as np

data = np.random.randint(0, 10, size=(100000, 2))

This is fast:

def centeroidnp(arr):
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x/length, sum_y/length

%timeit centeroidnp(data)
10000 loops, best of 3: 181 μs per loop

Surprisingly, this is much slower:

%timeit data.mean(axis=0)
1000 loops, best of 3: 1.75 ms per loop

numpy seems very quick to me...

For completeness:

def centeroidpython(data):
    x, y = zip(*data)
    l = len(x)
    return sum(x) / l, sum(y) / l

# Take the data conversion out of the timing to be fair!
data = list(tuple(i) for i in data)

%timeit centeroidpython(data)
10 loops, best of 3: 57 ms per loop
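%timeit is an IPython magic. To reproduce this kind of measurement in a plain script, a minimal sketch with the standard-library timeit module might look like the following. The data size and repeat counts are arbitrary choices, and no timings are claimed here since they depend on the machine.

```python
import random
import timeit


def centeroidpython(data):
    x, y = zip(*data)
    l = len(x)
    return sum(x) / l, sum(y) / l


# Hypothetical test data: 10,000 random integer points (size chosen arbitrarily).
random.seed(0)
data = [(random.randint(0, 9), random.randint(0, 9)) for _ in range(10_000)]

# Three trials of 20 calls each; keep the fastest trial to reduce noise.
calls = 20
best = min(timeit.repeat(lambda: centeroidpython(data), number=calls, repeat=3))
per_call = best / calls
print(f"{per_call:.2e} s per call")
```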

Answer 2

Answered by RemcoGerlich

This is a naive numpy implementation. I can't time it here, so I wonder how it performs:

import numpy as np

def centroid(points):
    # points: a list (or other sequence) of (x, y) pairs
    arr = np.asarray(points)
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x / length, sum_y / length

You pass the points to centroid() as separate parameters, which are then packed into a single tuple by *points. It would be faster to just pass in a list or iterator of points.
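A minimal pure-Python sketch of that suggestion, where the function takes one sequence of points instead of unpacked arguments. The name centroid_list is made up for illustration, and whether the single accumulating loop beats the zip/sum variants is untested here and would need to be timed.

```python
def centroid_list(points):
    """Centroid of a non-empty sequence of (x, y) pairs."""
    sum_x = sum_y = 0
    # One pass over the data, accumulating both coordinate sums.
    for x, y in points:
        sum_x += x
        sum_y += y
    n = len(points)  # requires a sequence, not a one-shot iterator
    return sum_x / n, sum_y / n


print(centroid_list([(0, 0), (10, 0), (10, 10), (0, 10)]))  # (5.0, 5.0)
```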

Answer 3

Answered by Pau B

Just for completeness, I modified Retozi's function so it accepts a vector of any dimension:

def centeroidnp(arr):
    length, dim = arr.shape
    return np.array([np.sum(arr[:, i])/length for i in range(dim)])
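Since the question asks for pure Python, the same generalization can be sketched without numpy. The function name centroid_nd is hypothetical, and the sketch assumes a non-empty sequence of equal-length tuples.

```python
def centroid_nd(points):
    """Centroid of a non-empty sequence of equal-length coordinate tuples."""
    n = len(points)
    # zip(*points) regroups the data by axis: all x values, all y values, ...
    return tuple(sum(axis) / n for axis in zip(*points))


print(centroid_nd([(0, 0, 0), (2, 4, 6)]))  # (1.0, 2.0, 3.0)
```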

Answer 4

Answered by Bobak Hashemi

In Cartesian coordinates, the centroid is just the mean of the components:

>>> data = ((0,0), (1,1), (2,2))
>>> np.mean(data, axis=0)
array([1., 1.])
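The same per-component mean can be written without numpy. This sketch assumes Python 3.8 or later, where statistics.fmean is available; it is not claimed to be faster than the sum-based versions.

```python
from statistics import fmean

data = ((0, 0), (1, 1), (2, 2))

# zip(*data) yields one tuple per axis; fmean averages each axis as floats.
center = tuple(map(fmean, zip(*data)))
print(center)  # (1.0, 1.0)
```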
