Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/12923586/

Date: 2020-08-18 12:11:32  Source: igfitidea

Nearest Neighbor Search: Python

Tags: python, numpy, nearest-neighbor, kdtree, closest-points

Asked by Dlinet

I have a 2-dimensional array:

MyArray = array([[6588252.24, 1933573.3, 212.79, 0, 0],
                 [6588253.79, 1933602.89, 212.66, 0, 0],
                 etc...])

The first two elements, MyArray[0] and MyArray[1], are the X and Y coordinates of the points.

For every element in the array, I would like to find the quickest way to return its single nearest neighbor within a radius of X units. We are assuming this is in 2D space.

Let's say that for this example X = 6.

I have solved the problem by comparing every element to every other element, but this takes 15 minutes or so when the list is 22k points long. We hope to eventually run this on lists of about 30 million points.
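
For reference, the all-pairs comparison described above might look roughly like the sketch below. The function name, the sample points, and the convention of returning -1 and inf when no neighbor lies within the radius are assumptions made for illustration, not the asker's actual code. Each point is compared against every other point, so the work grows quadratically with the number of points, which is consistent with the slowdown reported for 22k points.

```python
import numpy as np

def nearest_within_radius_bruteforce(xy, radius):
    """For each point, find its nearest *other* point within `radius`.

    Returns (indices, distances). When no other point lies within the
    radius, the index is -1 and the distance is inf (a convention chosen
    for this sketch).
    """
    xy = np.asarray(xy, dtype=float)
    n = len(xy)
    nearest_idx = np.full(n, -1, dtype=int)
    nearest_dist = np.full(n, np.inf)
    for i in range(n):
        # Euclidean distance from point i to every point in the array
        d = np.hypot(xy[:, 0] - xy[i, 0], xy[:, 1] - xy[i, 1])
        d[i] = np.inf  # exclude the point itself
        j = np.argmin(d)
        if d[j] <= radius:
            nearest_idx[i] = j
            nearest_dist[i] = d[j]
    return nearest_idx, nearest_dist

# Hypothetical sample points: the first two are 5 units apart,
# the third is far away from both
points = np.array([[0.0, 0.0], [3.0, 4.0], [100.0, 100.0]])
idx, dist = nearest_within_radius_bruteforce(points, radius=6)
```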

I have read about K-d trees and understand the basic concept, but have had trouble understanding how to script them.

Accepted answer by Dlinet

Thanks to John Vinyard for suggesting scipy. After some good research and testing, here is the solution to this question:

Prerequisites: Install NumPy and SciPy

  1. Import the SciPy and NumPy modules.

  2. Make a copy of the 5-column array that includes just the X and Y values.

  3. Create an instance of a cKDTree as such:

    YourTreeName = scipy.spatial.cKDTree(YourArray, leafsize=100)
    #Play with the leafsize to get the fastest result for your dataset
    
  4. Query the cKDTree for the nearest neighbor within 6 units as such:

    for item in YourArray:
        TheResult = YourTreeName.query(item, k=1, distance_upper_bound=6)
    

    For each item in YourArray, TheResult will be a tuple containing the distance between the two points and the index of the matched point in YourArray.
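
The four steps above can be sketched end to end as follows. The sample rows (the third row in particular) and the variable names are made up for illustration. One assumption differs from the snippet in step 4: because every queried point is itself stored in the tree, a query with k=1 returns the point itself at distance 0, so this sketch asks for k=2 and reads the second column to get the nearest other point. It also passes the whole array to query in one call instead of looping in Python.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical sample data: the first two columns are X and Y
my_array = np.array([
    [6588252.24, 1933573.3, 212.79, 0, 0],
    [6588253.79, 1933602.89, 212.66, 0, 0],
    [6588254.00, 1933575.00, 212.70, 0, 0],
])

# Step 2: copy out just the X and Y columns
xy = my_array[:, :2].copy()

# Step 3: build the tree (leafsize is a tuning knob, as noted above)
tree = cKDTree(xy, leafsize=100)

# Step 4: query all points at once. With k=2, column 0 is normally the
# query point itself (distance 0) and column 1 is the nearest other point.
# If the data contains exact duplicate points, column 0 may be the
# duplicate rather than the point itself.
dist, idx = tree.query(xy, k=2, distance_upper_bound=6)
nearest_dist = dist[:, 1]
nearest_idx = idx[:, 1]

# SciPy marks a missing neighbor with an infinite distance and an index
# equal to the number of points in the tree (tree.n).
has_neighbor = np.isfinite(nearest_dist)
```

Rows where has_neighbor is False have no other point within 6 units, so their nearest_idx value (equal to tree.n) should not be used to index back into the array.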
