Fastest way to calculate the centroid of a set of coordinate tuples in python without numpy
Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/23020659/
Asked by user3002473
I've been working on a project that is incredibly time sensitive (and unfortunately has to be in Python), and one of the functions used extensively calculates the centroid of a list of (x, y) tuples. To illustrate:
def centroid(*points):
    x_coords = [p[0] for p in points]
    y_coords = [p[1] for p in points]
    _len = len(points)
    centroid_x = sum(x_coords)/_len
    centroid_y = sum(y_coords)/_len
    return [centroid_x, centroid_y]
where
>>> centroid((0, 0), (10, 0), (10, 10), (0, 10))
[5, 5]
This function runs fairly quickly: the above example completes in an average of 1.49e-05 seconds on my system. Still, I'm looking for the fastest way to calculate the centroid. Do you have any ideas?
One of the other solutions I had was to do the following (where l is the list of tuples):
map(len(l).__rtruediv__, map(sum, zip(*l)))
This runs in between 9.6e-06 and 1.01e-05 seconds, but unfortunately converting the result to a list (by wrapping the whole statement in list( ... )) nearly doubles the computation time.
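To make the one-liner easier to follow, here is a minimal sketch of what each stage computes, assuming Python 3. The sample data and the variable names (points, n, sums, lazy, result, float_case) are hypothetical. Under that assumption map is lazy, so the division only happens once the result is consumed, which may explain part of the gap between the two timings. The last line shows a caveat of calling int.__rtruediv__ directly.

```python
# Sketch of what the map-based one-liner computes (assumes Python 3).
points = [(0, 0), (10, 0), (10, 10), (0, 10)]  # hypothetical sample data
n = len(points)

# zip(*points) groups the x values and the y values; sum adds up each group.
sums = list(map(sum, zip(*points)))   # [20, 20]

# n.__rtruediv__(s) evaluates s / n, so this divides every sum by the count.
lazy = map(n.__rtruediv__, sums)      # a lazy iterator, nothing divided yet
result = list(lazy)                   # the divisions actually run here

# Caveat: int.__rtruediv__ only handles int operands. Given a float sum it
# returns NotImplemented instead of a number.
float_case = n.__rtruediv__(20.0)

print(result, float_case)
```

If the coordinates can be floats, a plain division (s / n) avoids that caveat.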
EDIT: Suggestions are welcome in pure python BUT NOT numpy.
EDIT2: Just found out that if a separate variable is kept for the length of the list of tuples, then my above implementation with map runs reliably under 9.2e-06 seconds, but there's still the problem of converting back to a list.
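One way to sidestep the conversion step is to build the list directly with a comprehension. This is only a sketch: the name centroid_cached is hypothetical, it assumes Python 3 and a non-empty sequence of equal-length tuples, and it has not been timed against the versions above.

```python
def centroid_cached(points):
    """Centroid of a non-empty sequence of coordinate tuples, as a list."""
    n = len(points)  # length kept in a separate variable, computed once
    # zip(*points) yields one tuple per axis; each is summed and divided by n.
    return [sum(axis) / n for axis in zip(*points)]


print(centroid_cached([(0, 0), (10, 0), (10, 10), (0, 10)]))  # [5.0, 5.0]
```

Because it uses ordinary division, it works for float coordinates as well as ints.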
EDIT3: Now I'm only accepting answers in pure Python, NOT in numpy (sorry to those who already answered in numpy!)
Accepted answer by Retozi
import numpy as np
data = np.random.randint(0, 10, size=(100000, 2))
This is fast:
def centeroidnp(arr):
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x/length, sum_y/length
%timeit centeroidnp(data)
10000 loops, best of 3: 181 μs per loop
Surprisingly, this is much slower:
%timeit data.mean(axis=0)
1000 loops, best of 3: 1.75 ms per loop
numpy seems very quick to me...
For completeness:
def centeroidpython(data):
    x, y = zip(*data)
    l = len(x)
    return sum(x) / l, sum(y) / l
#take the data conversion out to be fair!
data = list(tuple(i) for i in data)
%timeit centeroidpython(data)
10 loops, best of 3: 57 ms per loop
Answer by RemcoGerlich
This is a naive numpy implementation. I can't time it here, so I wonder how it performs:
import numpy as np

def centroid(points):
    # points: a list (or other sequence) of (x, y) tuples
    arr = np.asarray(points)
    length = arr.shape[0]
    sum_x = np.sum(arr[:, 0])
    sum_y = np.sum(arr[:, 1])
    return sum_x / length, sum_y / length
You pass the points to centroid() as separate parameters, which are then put into a single tuple with *points. It would be faster to just pass in a list or iterator of points.
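A rough way to check this claim on your own machine is to time both calling conventions with the standard timeit module. The sketch below is illustrative only: the function names, the sample data and the repeat count are all hypothetical, and the numbers will vary by system and Python version.

```python
import timeit


def centroid_varargs(*points):
    # Points arrive as separate arguments and are packed into a tuple.
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def centroid_seq(points):
    # Points arrive as one existing sequence, with no repacking.
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


pts = [(i, 2 * i) for i in range(1000)]  # hypothetical sample data

t_varargs = timeit.timeit(lambda: centroid_varargs(*pts), number=200)
t_seq = timeit.timeit(lambda: centroid_seq(pts), number=200)
print(f"varargs: {t_varargs:.4f}s  sequence: {t_seq:.4f}s")
```

Both functions return the same result, so any difference in the printed timings comes from how the points are passed in.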
Answer by Pau B
Just for completeness, I modified Retozi's function so it accepts a vector of any dimension:
import numpy as np

def centeroidnp(arr):
    length, dim = arr.shape
    return np.array([np.sum(arr[:, i])/length for i in range(dim)])
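Since the question asks for pure Python, the same any-dimension idea can also be sketched without numpy. The version below is an assumption-laden illustration, not a tuned implementation: centroid_stream is a hypothetical name, it makes a single pass so it also accepts iterators, and it assumes every point has the same number of components.

```python
def centroid_stream(points):
    """Centroid of an iterable of equal-length coordinate tuples, in one pass."""
    it = iter(points)
    first = next(it, None)
    if first is None:
        raise ValueError("centroid of an empty set of points is undefined")
    sums = list(first)  # running per-axis totals, seeded with the first point
    count = 1
    for point in it:
        for i, coord in enumerate(point):
            sums[i] += coord
        count += 1
    return [s / count for s in sums]


print(centroid_stream(iter([(0, 0, 0), (2, 4, 6)])))  # [1.0, 2.0, 3.0]
```

The nested Python loop is likely slower than the zip-based approaches on large inputs; the trade-off is that it never needs the whole data set in memory at once.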
Answer by Bobak Hashemi
In Cartesian coordinates, the centroid is just the mean of the components:
>>> import numpy as np
>>> data = ((0,0), (1,1), (2,2))
>>> np.mean(data, axis=0)
array([1., 1.])
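The same "mean of the components" idea can be written without numpy using the standard library. This is a sketch assuming Python 3.8 or later, where statistics.fmean is available; the variable name centroid is only illustrative, and no claim is made here about how its speed compares with the other approaches.

```python
from statistics import fmean

data = ((0, 0), (1, 1), (2, 2))

# zip(*data) yields one tuple per axis; fmean returns each axis mean as a float.
centroid = [fmean(axis) for axis in zip(*data)]
print(centroid)  # [1.0, 1.0]
```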