MySQL Database: Best performance way to query geo location data?
Original question: http://stackoverflow.com/questions/1813460/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by HankW
I have a MySQL database. I store homes in it and run literally just one query against it, but I need that query to be super fast: return all homes within a bounding box of latitude and longitude.
SELECT * FROM homes
WHERE geolat BETWEEN ??? AND ???
AND geolng BETWEEN ??? AND ???
What is the best way to store my geo data so that this query, which displays all homes within the geolocation box, runs as quickly as possible?
Basically:
- Am I using the best SQL statement to perform this query the quickest?
- Does any other method exist, maybe not even using a database, that would return the homes within a geolocation bounding box faster?
In case it helps, I've included my database table schema below:
CREATE TABLE IF NOT EXISTS `homes` (
`home_id` int(10) unsigned NOT NULL auto_increment,
`address` varchar(128) collate utf8_unicode_ci NOT NULL,
`city` varchar(64) collate utf8_unicode_ci NOT NULL,
`state` varchar(2) collate utf8_unicode_ci NOT NULL,
`zip` mediumint(8) unsigned NOT NULL,
`price` mediumint(8) unsigned NOT NULL,
`sqft` smallint(5) unsigned NOT NULL,
`year_built` smallint(5) unsigned NOT NULL,
`geolat` decimal(10,6) default NULL,
`geolng` decimal(10,6) default NULL,
PRIMARY KEY (`home_id`),
KEY `geolat` (`geolat`),
KEY `geolng` (`geolng`)
) ENGINE=InnoDB ;
UPDATE
I understand that spatial extensions factor in the curvature of the earth, but I'm most interested in returning geo data the FASTEST. Unless these spatial database packages somehow return data faster, please don't recommend spatial extensions. Thanks
UPDATE 2
Please note, no one below has truly answered the question. I'm really looking forward to any assistance I might receive. Thanks in advance.
Answered by Igor Zevaka
There is a good paper on MySQL geolocation performance here.
EDIT: Pretty sure this is using a fixed radius. Also, I am not 100% certain the algorithm for calculating distance is the most advanced (i.e. it'll "drill" through the Earth).
What's significant is that the algorithm is a cheap way to put a ballpark limit on the number of rows that need the proper distance calculation.
The algorithm pre-filters by taking candidates in a square around the source point, then calculating the distance in miles.
Pre-calculate this, or use a stored procedure as the source suggests:
# Pseudo code
# user_lon and user_lat are the source longitude and latitude
# radius is the radius where you want to search
lon_distance = radius / abs(cos(radians(user_lat))*69);
min_lon = user_lon - lon_distance;
max_lon = user_lon + lon_distance;
min_lat = user_lat - (radius / 69);
max_lat = user_lat + (radius / 69);
SELECT dest.*,
3956 * 2 * ASIN(
SQRT(
POWER(
SIN(
(user_lat - dest.lat) * pi() / 180 / 2
), 2
) + COS(
user_lat * pi() / 180
) * COS(
dest.lat * pi() / 180
) * POWER(
SIN(
(user_lon - dest.lon) * pi() / 180 / 2
), 2
)
)
) as distance
FROM dest
WHERE
dest.lon between min_lon and max_lon AND
dest.lat between min_lat and max_lat
HAVING distance < radius
ORDER BY distance
LIMIT 10
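The two-step idea above (cheap square pre-filter, then exact distance) can also be sketched outside the database. The following is a minimal Python sketch, not the linked paper's code: the helper names `bounding_box`, `haversine_miles`, and `nearby`, and the `(id, lat, lon)` row format, are assumptions made for illustration. The constants 69 and 3956 are taken from the pseudo code and SQL above.

```python
import math

EARTH_RADIUS_MILES = 3956.0    # same constant as in the SQL above
MILES_PER_DEGREE = 69.0        # rough miles per degree, as in the pseudo code


def bounding_box(user_lat, user_lon, radius):
    """Return (min_lat, max_lat, min_lon, max_lon) for a radius in miles."""
    lat_delta = radius / MILES_PER_DEGREE
    # A degree of longitude shrinks with cos(latitude), so widen accordingly.
    lon_delta = radius / abs(math.cos(math.radians(user_lat)) * MILES_PER_DEGREE)
    return (user_lat - lat_delta, user_lat + lat_delta,
            user_lon - lon_delta, user_lon + lon_delta)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles (the same formula as the SQL expression)."""
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return EARTH_RADIUS_MILES * 2 * math.asin(math.sqrt(a))


def nearby(rows, user_lat, user_lon, radius, limit=10):
    """rows is an iterable of (id, lat, lon); returns [(distance, id), ...]."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(user_lat, user_lon, radius)
    hits = []
    for row_id, lat, lon in rows:
        # Step 1: cheap square pre-filter (the BETWEEN clauses in the SQL).
        if not (min_lat <= lat <= max_lat and min_lon <= lon <= max_lon):
            continue
        # Step 2: exact distance only for the surviving candidates (HAVING).
        distance = haversine_miles(user_lat, user_lon, lat, lon)
        if distance < radius:
            hits.append((distance, row_id))
    hits.sort()
    return hits[:limit]
```

In the database version the pre-filter is what lets an index do the work; the Python loop here only illustrates the order of the two steps.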
Answered by Evert
Answered by tosh
If you really need performance, you can define bounding boxes for your data, map the precomputed bounding boxes to your objects on insertion, and use them later for queries.
If the result sets are reasonably small, you can still do accuracy corrections in the application logic (which is easier to scale horizontally than a database) and so still serve accurate results.
Take a look at Bret Slatkin's geobox.py, which contains great documentation for the approach.
I would still recommend checking out PostgreSQL and PostGIS in comparison to MySQL if you intend to do more complex queries in the foreseeable future.
Answered by Peter Lindqvist
The indices you are using are indeed B-tree indices and support the BETWEEN keyword in your query. This means that the optimizer is able to use your indices to find the homes within your "box". It does not mean, however, that it will always use them. If you specify a range that contains too many "hits", the indices will not be used.
Answered by Ben
Sticking with your current approach, there is one change you should make: rather than indexing geolat and geolng separately, you should have a composite index:
KEY `geolat_geolng` (`geolat`, `geolng`),
Currently your query will only be taking advantage of one of the two indexes.
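Whether an index is actually picked for the bounding-box query is something to check rather than assume. As a minimal sketch, the snippet below uses Python's built-in sqlite3 module as a stand-in (an assumption made so the example is self-contained; SQLite's planner is not MySQL's, and the trimmed-down `homes` table and sample data are made up). On MySQL the equivalent check is to prefix the query with EXPLAIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down stand-in for the homes table from the question.
conn.execute(
    "CREATE TABLE homes (home_id INTEGER PRIMARY KEY, geolat REAL, geolng REAL)"
)
# The composite index suggested above.
conn.execute("CREATE INDEX geolat_geolng ON homes (geolat, geolng)")
# Made-up sample data: 1000 homes along a diagonal line.
conn.executemany(
    "INSERT INTO homes (geolat, geolng) VALUES (?, ?)",
    [(36.0 + i * 0.001, -120.0 - i * 0.001) for i in range(1000)],
)

query = (
    "SELECT home_id FROM homes "
    "WHERE geolat BETWEEN ? AND ? AND geolng BETWEEN ? AND ?"
)
box = (36.1005, 36.1995, -120.1995, -120.1005)  # min_lat, max_lat, min_lng, max_lng

# Each plan row is (id, parent, notused, detail); detail names the index used.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, box)]
homes_in_box = [row[0] for row in conn.execute(query, box)]

for line in plan:
    print(line)
print(len(homes_in_box), "homes in the box")
```

If the plan does not mention the index for a realistic box size, that matches the earlier point that the optimizer may skip an index when the range matches too many rows.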
Answered by jalogar
A very good alternative is MongoDB with its geospatial indexing.
Answered by Ben
A trick I've used with some success is to create round-off regions. That is to say, if you have a location at 36.12345,-120.54321 and you want to group it with other locations that fall within the same (approximately) half-mile grid box, you can call its region 36.12x-120.54, and all other locations with the same round-off region will fall in the same box.
Obviously, that doesn't get you a clean radius, e.g. when the location you're looking at is closer to one edge of its box than another. However, with this sort of set-up, it's easy enough to calculate the eight boxes that surround your main location's box. To wit:
[36.13x-120.55][36.13x-120.54][36.13x-120.53]
[36.12x-120.55][36.12x-120.54][36.12x-120.53]
[36.11x-120.55][36.11x-120.54][36.11x-120.53]
Pull all the locations with matching round-off labels and then, once you've got them out of the database, you can do your distance calculations to determine which ones to use.
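The labelling scheme above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are made up, and the label format simply mirrors the 36.12x-120.54 example with two decimal places.

```python
CELLS_PER_DEGREE = 100  # two decimal places, as in the 36.12x-120.54 example


def cell(lat, lng):
    """Round a coordinate to integer grid indices (hundredths of a degree)."""
    return round(lat * CELLS_PER_DEGREE), round(lng * CELLS_PER_DEGREE)


def label(cell_lat, cell_lng):
    """Format grid indices as a round-off region label."""
    return f"{cell_lat / CELLS_PER_DEGREE:.2f}x{cell_lng / CELLS_PER_DEGREE:.2f}"


def region_and_neighbours(lat, lng):
    """Labels of the location's own box plus the eight boxes around it,
    in the same row and column order as the grid shown above."""
    cell_lat, cell_lng = cell(lat, lng)
    return [
        label(cell_lat + d_lat, cell_lng + d_lng)
        for d_lat in (1, 0, -1)
        for d_lng in (-1, 0, 1)
    ]


print(label(*cell(36.12345, -120.54321)))
print(region_and_neighbours(36.12345, -120.54321))
```

Assuming the label is stored in its own indexed column (hypothetically named `region`), the nine labels could then be used in a `WHERE region IN (...)` clause, with the exact distance check done afterwards in application code.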
Answered by Anak1
Since MySQL 5.7, MySQL can use spatial indexes together with functions like ST_Distance_Sphere() and ST_Contains(), which improves performance.
Answered by novalis
Answered by Clayton Stewart
You might consider creating a separate table 'GeoLocations' that has a primary key of ('geolat','geolng') and has a column that holds the home_id if that particular geolocation happens to have a home. This should allow the optimizer to search for a range of geo locations that will be sorted on disk for a list of home_ids. You could then perform a join with your 'homes' table to find information about those home_ids.
CREATE TABLE IF NOT EXISTS `GeoLocations` (
`geolat` decimal(10,6) NOT NULL,
`geolng` decimal(10,6) NOT NULL,
`home_id` int(10) NULL,
PRIMARY KEY (`geolat`,`geolng`)
);
SELECT GL.home_id
FROM GeoLocations GL
INNER JOIN homes H
ON GL.home_id = H.home_id
WHERE GL.geolat BETWEEN min_lat AND max_lat
AND GL.geolng BETWEEN min_lng AND max_lng