postgresql PostGIS中的K-最近邻查询

Question

提问by Abhishek Sagar

I am using the following Nearest Neighbor Query in PostGIS :

我在 PostGIS 中使用以下最近邻查询：

SELECT g1.gid g2.gid FROM points as g1, polygons g2   
WHERE g1.gid <> g2.gid
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT k;

Now, that I have created indexes on the_geom as well as gid column on both the tables, this query is taking much more time than other spatial queries involving spatial joins b/w two tables.

现在，我已经在两个表上的 the_geom 和 gid 列上创建了索引，这个查询比其他涉及空间连接 b/w 两个表的空间查询花费的时间要多得多。

Is there any better way to find K-nearest neighbors? I am using PostGIS.

有没有更好的方法来找到K-最近的邻居？我正在使用 PostGIS。

And, another query which is taking a unusually long time despite creating indexes on geometry column is:

而且，尽管在几何列上创建了索引，但另一个需要花费异常长时间的查询是：

select g1.gid , g2.gid from polygons as g1 , polygons as g2
where st_area(g1.the_geom) > st_area(g2.the_geom) ;

I believe, these queries arent benefited by gist indexes, but why?

我相信，这些查询并没有从 gist 索引中受益，但为什么呢？

Whereas this query:

而这个查询：

select a.polyid , sum(length(b.the_geom)) from polygon as a , roads as b  
where st_intersects(a.the_geom , b.the_geom);

returns result after some time despite involving "roads" table which is much bigger than polygons or points table and also involve more complex spatial operators.

尽管涉及比多边形或点表大得多的“道路”表，并且还涉及更复杂的空间运算符，但一段时间后仍会返回结果。

Answer 1

采纳答案by Thilo

Just a few thoughts on your problem:

关于你的问题的一些想法：

st_distance as well as st_area are not able to use indices. This is because both functions can not be reduced to questions like "Is a within b?" or "Do a and b overlap?". Even more concrete: GIST-indices can only operate on the bounding boxes of two objects.

st_distance 和 st_area 不能使用索引。这是因为这两个函数都不能简化为“a 在 b 中吗？”这样的问题。或“a 和 b 重叠吗？”。更具体的是：GIST-indices 只能对两个对象的边界框进行操作。

For more information on this you just could look in the postgis manual, which states an example with st_distance and how the query could be improved to perform better.

有关这方面的更多信息，您只需查看postgis 手册，其中说明了 st_distance 的示例以及如何改进查询以更好地执行。

However, this does not solve your k-nearest-neighbour-problem. For that, right now I do not have a good idea how to improve the performance of the query. The only chance I see would be assuming that the k nearest neighbors are always in a distance of below x meters. Then you could use a similar approach as done in the postgis manual.

但是，这并不能解决您的 k-最近邻问题。为此，现在我不知道如何提高查询的性能。我看到的唯一机会是假设 k 个最近的邻居总是在 x 米以下的距离内。然后您可以使用与 postgis 手册中所做的类似的方法。

Your second query could be speeded up a bit. Currently, you compute the area for each object in table 1 as often as table has rows - the strategy is first to join the data and then select based on that function. You could reduce the count of area computations significantly be precomputing the area:

您的第二个查询可以加快一点。目前，您计算表 1 中每个对象的面积与表有行的频率一样 - 策略是首先连接数据，然后根据该函数进行选择。您可以通过预先计算面积来显着减少面积计算的数量：

WITH polygonareas AS (
    SELECT gid, the_geom, st_area(the_geom) AS area
    FROM polygons
)
SELECT g1.gid, g2.gid
FROM polygonareas as g1 , polygonareas as g2 
WHERE g1.area > g2.area;

Your third query can be significantly optimized using bounding boxes: When the bounding boxes of two objects do not overlap, there is no way the objects do. This allows the usage of a given index and thus a huge performance gain.

您的第三个查询可以使用边界框进行显着优化：当两个对象的边界框不重叠时，对象无法重叠。这允许使用给定的索引，从而获得巨大的性能提升。

Answer 2

回答by natevw

Since late September 2011, PostGIS has supported indexed nearest neighbor queries via a special operator(s) usable in the ORDER BY clause:

自2011 年 9 月下旬以来，PostGIS 通过 ORDER BY 子句中可用的特殊运算符支持索引最近邻查询：

SELECT name, gid
FROM geonames
ORDER BY geom <-> st_setsrid(st_makepoint(-90,40),4326)
LIMIT 10;

...will return the 10 objects whose geomis nearest -90,40in a scalable way. A few more details (options and caveats) are in that announcement postand use of the <->and the <#> operatorsis also now documented in the official PostGIS 2.0 reference. (The main difference between the two is that <->compares the shape centroids and <#>compares their boundaries —?no difference for points, other shapes choose what is appropriate for your queries.)

...将以可扩展的方式返回geom最近的 10 个对象-90,40。更多细节（选项和注意事项）在该公告帖子中，并且<->和<#> 运算符的使用现在也记录在官方 PostGIS 2.0 参考中。（两者的主要区别在于<->比较形状质心和<#>比较它们的边界——点没有区别，其他形状选择适合您的查询的形状。）

Answer 3

回答by Grzegorz Grabek

You can do it with KNN index and lateral join.

您可以使用 KNN 索引和横向连接来实现。

SELECT v.gid, v2.gid,st_distance(v.the_geom, v2.the_geom)
  FROM geonames v, 
       lateral(select * 
                 from geonames v2
                where v2.id<>v.id
                ORDER BY v.the_geom <-> v2.the_geom LIMIT 10) v2
where v.gid in (...) - or other filtering condition

Answer 4

回答by Stefan

What you may need is the KNN index which is hopefully available soon in PostGIS 2.x and PostgreSQL 9.1: See http://blog.opengeo.org/tag/knn/

您可能需要的是 KNN 索引，它有望很快在 PostGIS 2.x 和 PostgreSQL 9.1 中可用：请参阅http://blog.opengeo.org/tag/knn/

Answer 5

回答by raphael

Assuming you have p point and g polygons, your original query:

假设您有 p 个点和 g 个多边形，您的原始查询：

SELECT g1.gid, g2.gid FROM points as g1, polygons g2   
WHERE g1.gid <> g2.gid
ORDER BY g1.gid, ST_Distance(g1.the_geom,g2.the_geom)
LIMIT k;

Is returning the k nearest neighbours in the p x g set. The query may be using indexes, but it still has to order the entire p x g set to find the k rows with the smallest distance. What you instead want is the following:

返回 pxg 集中的 k 个最近邻。查询可能正在使用索引，但它仍然必须对整个 pxg 集进行排序以找到距离最小的 k 行。你想要的是以下内容：

SELECT g1.gid, 
      (SELECT g2.gid FROM polygons g2   
       --prevents you from finding every nearest neighbour twice
       WHERE g1.gid < g2.gid 
       --ORDER BY gid is erroneous if you want to limit by the distance
       ORDER BY ST_Distance(g1.the_geom,g2.the_geom)
       LIMIT k)
FROM points as g1;

postgresql PostGIS中的K-最近邻查询

提问by Abhishek Sagar

采纳答案by Thilo

回答by natevw

回答by Grzegorz Grabek

回答by Stefan

回答by raphael

相关推荐

最近更新

标签

postgresql PostGIS中的K-最近邻查询

提问by Abhishek Sagar

采纳答案by Thilo

回答by natevw

回答by Grzegorz Grabek

回答by Stefan

回答by raphael

相关推荐

postgresql 在字符串值中导入 CSV 和逗号

postgresql 最初定义为 DEFERRABLE 的约束仍然是 DEFERRED？

postgresql 如何将数据库中的所有列转换为不区分大小写

PostgreSQL 在 plpgsql 函数中创建临时表

相关推荐

最近更新

标签