Generate SQL statements from a Pandas DataFrame
Warning: this content comes from a popular Stack Overflow question and is provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): Stack Overflow
Original source: http://stackoverflow.com/questions/31071952/
Asked by Jorick Spitzen
I am loading data from various sources (csv, xls, json etc...) into Pandas dataframes and I would like to generate statements to create and fill a SQL database with this data. Does anyone know of a way to do this?
I know pandas has a to_sql function, but that only works on a database connection; it cannot generate a string.
Example
What I would like is to take a dataframe like so:
import pandas as pd
import numpy as np
dates = pd.date_range('20130101',periods=6)
df = pd.DataFrame(np.random.randn(6,4),index=dates,columns=list('ABCD'))
And a function that would generate this (this example is PostgreSQL but any would be fine):
CREATE TABLE data
(
index timestamp with time zone,
"A" double precision,
"B" double precision,
"C" double precision,
"D" double precision
)
Answered by joris
If you only want the 'CREATE TABLE' SQL code (and not the insert of the data), you can use the get_schema function of the pandas.io.sql module:
In [10]: print(pd.io.sql.get_schema(df.reset_index(), 'data'))
CREATE TABLE "data" (
"index" TIMESTAMP,
"A" REAL,
"B" REAL,
"C" REAL,
"D" REAL
)
Some notes:
- I had to use reset_index because otherwise the index was not included.
- If you provide an SQLAlchemy engine of a certain database flavor, the result will be adjusted to that flavor (e.g. the data type names).
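To illustrate the first note, here is a minimal sketch that compares the schema generated with and without reset_index, and then runs the statement against a throwaway in-memory SQLite database to confirm it is valid SQL. It assumes no connection is passed to get_schema, in which case pandas falls back to SQLite-style type names; the sqlite3 check and the variable names are only for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

# Without reset_index() the index is left out of the generated schema.
schema_without_index = pd.io.sql.get_schema(df, "data")
# With reset_index() the index becomes a regular column named "index".
schema_with_index = pd.io.sql.get_schema(df.reset_index(), "data")

# Execute the statement in an in-memory SQLite database (illustration only)
# and read the created column names back.
conn = sqlite3.connect(":memory:")
conn.execute(schema_with_index)
created_columns = [row[1] for row in conn.execute("PRAGMA table_info(data)")]
conn.close()

print(schema_with_index)
print(created_columns)
```

For another database flavor, the second note applies: the engine would be passed through the con argument of get_schema.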
Answered by Jansen Simanullang
GENERATE SQL CREATE STATEMENT FROM DATAFRAME
SOURCE = df
TARGET = 'data'
import pandas as pd

def SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    # SOURCE: source dataframe
    # TARGET: name of the target table to be created in the database
    # reset_index() turns the index into a regular column so it appears in the schema
    sql_text = pd.io.sql.get_schema(SOURCE.reset_index(), TARGET)
    return sql_text
Check the SQL CREATE TABLE statement string
sql_text = SQL_CREATE_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
print(sql_text)
GENERATE SQL INSERT STATEMENT FROM DATAFRAME
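def SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET):
    sql_texts = []
    for index, row in SOURCE.iterrows():
        # tolist() converts NumPy scalars to plain Python values before formatting.
        # This simple rendering only suits numeric data: strings, dates and NaN
        # are not converted to valid SQL literals, and the index is not included.
        sql_texts.append('INSERT INTO ' + TARGET + ' (' + ', '.join(SOURCE.columns) + ') VALUES ' + str(tuple(row.tolist())))
    return sql_texts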
Check the SQL INSERT INTO statement strings
sql_texts = SQL_INSERT_STATEMENT_FROM_DATAFRAME(SOURCE, TARGET)
print('\n\n'.join(sql_texts))
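The INSERT generator in this answer formats each row as a Python tuple, which only works for plain numeric values. As a rough sketch of one way to handle strings, timestamps and missing values, the hypothetical helpers sql_literal and insert_statements below (names not from the original answer) quote text values, double embedded single quotes, and emit NULL for missing data. This is an illustration under those assumptions, not a substitute for parameterized queries when a live connection is available.

```python
import numbers

import numpy as np
import pandas as pd


def sql_literal(value):
    """Render one scalar value as a SQL literal (hypothetical helper)."""
    if pd.isna(value):  # None, NaN and NaT all become NULL
        return "NULL"
    if isinstance(value, (bool, np.bool_)):
        return "TRUE" if value else "FALSE"
    if isinstance(value, numbers.Number):
        return str(value)
    # Everything else (str, Timestamp, ...) is quoted, with ' doubled.
    return "'" + str(value).replace("'", "''") + "'"


def insert_statements(frame, table):
    """Build one INSERT statement per row of `frame` (hypothetical helper)."""
    columns = ", ".join('"%s"' % name for name in frame.columns)
    statements = []
    for row in frame.itertuples(index=False, name=None):
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(
            'INSERT INTO "%s" (%s) VALUES (%s);' % (table, columns, values)
        )
    return statements


frame = pd.DataFrame(
    {"name": ["O'Brien", None], "score": [1.5, float("nan")]},
    index=pd.to_datetime(["2013-01-01", "2013-01-02"]),
)
# reset_index() so that the index is written as a regular column too.
statements = insert_statements(frame.reset_index(), "data")
print("\n".join(statements))
```

The TRUE/FALSE literals and the double-quoted identifiers follow PostgreSQL conventions; other databases may need different spellings.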
Answered by Delforge
If you want to write the file yourself, you can also retrieve the column names and dtypes and build a dictionary to convert pandas data types to SQL data types.
As an example:
import os
import pandas as pd
import numpy as np

dates = pd.date_range('20130101', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
tableName = 'table'
columnNames = df.columns.values.tolist()
# A list (not a lazy map object) so it can be assigned to a column below
columnTypes = [dtype.name for dtype in df.dtypes.values]
# Storing column names and dtypes in a dataframe
tableDef = pd.DataFrame(index=range(len(df.columns) + 1), columns=['cols', 'dtypes'])
tableDef.iloc[0] = ['index', df.index.dtype.name]
tableDef.loc[1:, 'cols'] = columnNames
tableDef.loc[1:, 'dtypes'] = columnTypes
# Defining a dictionary to convert dtypes
conversion = {'datetime64[ns]': 'timestamp with time zone', 'float64': 'double precision'}
# Writing sql in a file ('yourdir' is a placeholder and must already exist)
with open(os.path.join('yourdir', '%s.sql' % tableName), 'w') as f:
    f.write('CREATE TABLE %s\n' % tableName)
    f.write('(\n')
    for i, row in tableDef.iterrows():
        sep = ",\n" if i < tableDef.index[-1] else "\n"
        f.write('\t"%s" %s%s' % (row['cols'], conversion[row['dtypes']], sep))
    f.write(')')
You can populate your table with INSERT INTO statements in the same way.
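Since the question asks for a string rather than a file, the same dtype-to-SQL mapping idea can be sketched as a function that returns the statement. The mapping CONVERSION, the function name create_table_sql and the "text" fallback are assumptions made for illustration; the PostgreSQL type names follow the question's example and should be extended for other dtypes or databases.

```python
import numpy as np
import pandas as pd

# Assumed mapping from pandas dtype names to PostgreSQL type names.
CONVERSION = {
    "float64": "double precision",
    "int64": "bigint",
    "bool": "boolean",
    "object": "text",
}


def create_table_sql(frame, table_name, fallback="text"):
    """Return a CREATE TABLE statement for `frame`, including its index."""
    flat = frame.reset_index()  # the index becomes an ordinary first column
    lines = []
    for name, dtype in flat.dtypes.items():
        if pd.api.types.is_datetime64_any_dtype(dtype):
            # Same type as in the question's example, whatever the resolution.
            sql_type = "timestamp with time zone"
        else:
            # Unknown dtypes fall back to a generic text column.
            sql_type = CONVERSION.get(dtype.name, fallback)
        lines.append('    "%s" %s' % (name, sql_type))
    return 'CREATE TABLE "%s"\n(\n%s\n)' % (table_name, ",\n".join(lines))


dates = pd.date_range("20130101", periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
statement = create_table_sql(df, "data")
print(statement)
```

Checking for datetime columns by kind rather than by the exact dtype name avoids a KeyError when the frame uses a different datetime resolution than datetime64[ns].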

