pandas: Writing a GeoDataFrame to a SQL database

Question

Asked by Jhonny

I hope my question is not ridiculous. Surprisingly, as far as I can tell, it has not really been asked on the popular websites yet.

I have several CSV files containing more than 1 million observations in total. Each observation contains, among other things, a postal address. I plan to read all files into a single GeoDataFrame, geocode the addresses, perform a spatial join against a shapefile, and save some information from the matching polygon for each row. Quite standard, I suppose. This is part of a one-time data-cleaning process.

My goal is to set up a database with this final dataset, because that makes it easy to share and search the data and, for example, to plot some observations on a website. It also makes it easy to select observations based on some criteria and then run analyses on them.

My problem is that inserting a GeoDataFrame into a database does not seem to be implemented yet, apparently because GeoPandas is meant to be a substitute for a database ("GeoPandas enables you to easily do operations in Python that would otherwise require a spatial database such as PostGIS").

Of course, I could iterate through each row and insert each data point "manually", but I am looking for the best solution here. With any workaround, I would also worry that the data types might conflict with those of the database. Is there a best way to do this?

Thanks for your help.

Answer 1

Accepted answer by Kartik

So, I just implemented this for a PostGIS database, and I can paste my method here. For MySQL, you'll have to adapt the code.

The first step is to convert the geometry columns into WKB hex strings, because I use SQLAlchemy with an engine based on psycopg2, and neither package understands geo-types natively. The next step is to write the data to the SQL database as usual (note that all geometry columns should be converted to text columns holding the WKB hex string). Finally, change the type of those columns to Geometry by executing a query. Refer to the following code, where the table name, schema, SRID and connection URL are placeholders:

# Imports
import pandas as pd
import sqlalchemy as sal

# Placeholders: replace with your own table, schema and SRID
table_name = 'my_table'
schema_name = 'public'
srid = 4326

# Function to generate WKB hex from a shapely geometry
def wkb_hexer(geom):
    return geom.wkb_hex

# Convert the `'geom'` column of GeoDataFrame `gdf` to WKB hex strings.
# Work on a plain pandas DataFrame copy, because after this step the column
# no longer holds geometries. It is assumed that the `'geom'` column holds
# shapely geometries (LINESTRINGs in this example).
df = pd.DataFrame(gdf)
df['geom'] = df['geom'].apply(wkb_hexer)

# Create SQL connection engine (psycopg2 driver)
engine = sal.create_engine('postgresql+psycopg2://username:password@host:port/database')

# Open a connection with a transaction that commits on success
with engine.begin() as conn:
    # Note use of the regular pandas `to_sql()` method
    df.to_sql(table_name, con=conn, schema=schema_name,
              if_exists='append', index=False)
    # Convert the `'geom'` column back to the Geometry datatype, from text
    sql = sal.text(f"""ALTER TABLE {schema_name}.{table_name}
               ALTER COLUMN geom TYPE Geometry(LINESTRING, {srid})
                 USING ST_SetSRID(geom::Geometry, {srid})""")
    conn.execute(sql)
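Since all geometry columns have to be written as text and then converted, the last step generalises to one `ALTER TABLE` statement per geometry column. The following is a minimal sketch, assuming PostgreSQL's double-quote identifier quoting; the helper names, the type whitelist and the example column names are hypothetical and not part of the original answer. Each generated string would then be run with `conn.execute(sal.text(stmt))` inside the same transaction.

```python
# Hypothetical whitelist of PostGIS geometry type names accepted by the helper
ALLOWED_TYPES = {"GEOMETRY", "POINT", "LINESTRING", "POLYGON", "MULTIPOINT",
                 "MULTILINESTRING", "MULTIPOLYGON", "GEOMETRYCOLLECTION"}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def alter_geometry_sql(schema: str, table: str, column: str,
                       geom_type: str, srid: int) -> str:
    """Build the ALTER TABLE statement that turns a WKB-hex text column
    into a PostGIS Geometry column with the given type and SRID."""
    geom_type = geom_type.upper()
    if geom_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported geometry type: {geom_type}")
    srid = int(srid)  # coerce to int so only a number is interpolated
    col = quote_ident(column)
    return (f"ALTER TABLE {quote_ident(schema)}.{quote_ident(table)} "
            f"ALTER COLUMN {col} TYPE Geometry({geom_type}, {srid}) "
            f"USING ST_SetSRID({col}::Geometry, {srid})")


# Hypothetical table with two geometry columns stored as WKB hex text
geometry_columns = {"geom": "LINESTRING", "centroid": "POINT"}
for name, gtype in geometry_columns.items():
    print(alter_geometry_sql("public", "my_table", name, gtype, 4326))
```

Quoting the identifiers keeps mixed-case or unusual column names intact, and the whitelist avoids interpolating arbitrary text into the type clause.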

Answer 2

Answer by Hamri Said

As mentioned before, @Kartik's answer only works for a single call: when appending data, it raises a DataError, because the geom column then expects the geometry to have an SRID. You can use GeoAlchemy to handle all the cases:

# Imports
import pandas as pd
from geoalchemy2 import Geometry, WKTElement
from sqlalchemy import create_engine

# Placeholders: replace with your own table name and SRID
table_name = 'my_table'
srid = 4326

# Use GeoAlchemy's WKTElement to create a geom with SRID
def create_wkt_element(geom):
    return WKTElement(geom.wkt, srid=srid)

# Work on a plain pandas copy, since the column will hold WKTElements
df = pd.DataFrame(geodataframe)
df['geom'] = df['geom'].apply(create_wkt_element)

db_url = 'postgresql+psycopg2://username:password@host:port/database'
engine = create_engine(db_url, echo=False)

# Use 'dtype' to specify the column's type
# For the geom column, we will use GeoAlchemy's type 'Geometry'
df.to_sql(table_name, engine, if_exists='append', index=False,
          dtype={'geom': Geometry('POINT', srid=srid)})
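Why the SRID matters can be seen at the byte level. This standard-library sketch (hypothetical helper names; 2D points only, little-endian) compares plain WKB, the format of shapely's `wkb_hex` used in the accepted answer, with PostGIS's extended WKB (EWKB), which sets an SRID flag in the type field and stores the SRID before the coordinates. Plain WKB has no slot for an SRID, which is why the accepted answer needs `ST_SetSRID`, and why appending SRID-less geometries to a column declared with an SRID fails.

```python
import struct

WKB_POINT = 1                # OGC geometry type code for Point
EWKB_SRID_FLAG = 0x20000000  # PostGIS EWKB bit meaning "an SRID follows"


def point_wkb_hex(x: float, y: float) -> str:
    """Plain little-endian WKB for a 2D POINT: byte order, type, X, Y."""
    return struct.pack("<BIdd", 1, WKB_POINT, x, y).hex().upper()


def point_ewkb_hex(x: float, y: float, srid: int) -> str:
    """PostGIS EWKB for a 2D POINT: SRID flag set, uint32 SRID before X and Y."""
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG,
                       srid, x, y).hex().upper()


print(point_wkb_hex(1.0, 2.0))
# 0101000000000000000000F03F0000000000000040
print(point_ewkb_hex(1.0, 2.0, 4326))
# 0101000020E6100000000000000000F03F0000000000000040
```

PostGIS should accept either string when cast with `::geometry`, but only the EWKB one arrives with SRID 4326 already set; the plain WKB one gets SRID 0.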

EDIT: fixed syntax error on 2020-03-16
