Pandas to_sql change column type from varchar to text

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/43631181/

Date: 2020-09-14 03:28:49  Source: igfitidea

Tags: python, sql-server, pandas

Asked by O. San

I wrote a little script to copy a table between SQL servers. It works, but one of the columns changed type from varchar to text. How do I make it copy the table with the same column types?

import pymssql
import pandas as pd
from sqlalchemy import create_engine

# Destination server. Raw strings keep the backslash in the instance name
# literal; in a normal string "\r" would be read as a carriage return.
db_server = r"1.2.3.4\r2"
db_database = "Test_DB"
db_user = "vaf"
db_password = "1234"

# Source server
local_db_server = r"1.1.1.1\r2"
local_db_database = "Test_DB"
local_db_user = "vaf"
local_db_password = "1234"

some_query = """
select * from some_table
"""


def main():
    # Read the source table into a DataFrame
    conn = pymssql.connect(server=local_db_server, user=local_db_user,
                           password=local_db_password,
                           database=local_db_database, charset='UTF-8')
    data = pd.read_sql(some_query, conn)

    # Write the DataFrame to the destination server (re-creates the table)
    connection_string = 'mssql+pymssql://{}:{}@{}/{}'.format(
        db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)
    data.to_sql(name="some_table", con=engine, if_exists='replace', index=False)


if __name__ == "__main__":
    main()

Thanks


Answered by Parfait

Consider three approaches:


SPECIFY TYPES (proactive, as it anticipates the problem ahead of time)

Using the dtype argument of pandas.DataFrame.to_sql, pass a dictionary that maps column names to sqlalchemy types.

import sqlalchemy

data.to_sql(name="some_table", con=engine, if_exists='replace', index=False,
            dtype={'datefld': sqlalchemy.DateTime(),
                   'intfld': sqlalchemy.types.INTEGER(),
                   'strfld': sqlalchemy.types.VARCHAR(length=255),
                   'floatfld': sqlalchemy.types.Float(precision=3, asdecimal=True),
                   'booleanfld': sqlalchemy.types.Boolean})
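
To make the effect of dtype visible without a SQL Server instance, here is a minimal sketch that runs against an in-memory SQLite database instead. The helper varchar_dtypes, the padding parameter, and the table names are hypothetical and not part of the original answer; sizing each VARCHAR from the longest value in the data is just one possible policy. It writes the same frame twice, once with pandas' default mapping and once with an explicit mapping, and reflects the resulting column types.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect
from sqlalchemy.types import VARCHAR


def varchar_dtypes(df, padding=0):
    """Hypothetical helper: map each string column to VARCHAR(longest value + padding)."""
    dtype_map = {}
    for col in df.columns:
        if pd.api.types.is_string_dtype(df[col]):
            lengths = df[col].dropna().astype(str).str.len()
            longest = int(lengths.max()) if len(lengths) else 1
            dtype_map[col] = VARCHAR(length=max(longest, 1) + padding)
    return dtype_map


def column_types(engine, table):
    """Return {column name: reflected SQL type as a string} for a table."""
    return {c["name"]: str(c["type"]) for c in inspect(engine).get_columns(table)}


# Assumption: in-memory SQLite stands in for the real target server.
engine = create_engine("sqlite://")
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})

# Without dtype, pandas picks a generic text type for string columns.
df.to_sql("default_types", con=engine, if_exists="replace", index=False)
default_types = column_types(engine, "default_types")

# With dtype, the string column is created with the requested VARCHAR length.
dtype_map = varchar_dtypes(df)
df.to_sql("explicit_types", con=engine, if_exists="replace", index=False,
          dtype=dtype_map)
explicit_types = column_types(engine, "explicit_types")

print(default_types["name"], explicit_types["name"])
```

In a real migration the lengths would more likely come from the source table's schema than from the data itself, since data-derived lengths can be too short for rows that arrive later.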

DELETE DATA (proactive, as it anticipates the problem ahead of time)

Clean out the table with a DELETE action query. Then migrate only the data from pandas to SQL Server, without structurally changing the table; by contrast, the if_exists='replace' argument of to_sql re-creates the table. This approach assumes the DataFrame is always consistent with the database table (no new columns or changed types).

from sqlalchemy import text

def main():
    connection_string = 'mssql+pymssql://{}:{}@{}/{}'\
                          .format(db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)

    # IMPORT DATA INTO DATA FRAME
    data = pd.read_sql(some_query, engine)

    # SQL DELETE (CLEAN OUT TABLE) VIA TRANSACTION
    # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
    with engine.begin() as conn:
        conn.execute(text("DELETE FROM some_table"))

    # MIGRATE DATA FRAME INTO SQL SERVER (APPEND NOT REPLACE)
    data.to_sql(name='some_table', con=engine, if_exists='append', index=False)
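
The delete-then-append pattern, together with the consistency assumption it relies on, can be sketched as follows. This is an illustration under stated assumptions rather than the answer's own code: it uses in-memory SQLite in place of SQL Server, and the helper refresh_some_table and its column check are hypothetical additions. The table is created once with an explicit VARCHAR(255) column, and the reflected type is read back after the refresh.

```python
import pandas as pd
from sqlalchemy import create_engine, inspect, text

# Assumption: in-memory SQLite stands in for the real target server.
engine = create_engine("sqlite://")

# The schema is created once, by hand, with the intended column types.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE some_table (id INTEGER, name VARCHAR(255))"))
    conn.execute(text("INSERT INTO some_table VALUES (1, 'old row')"))


def refresh_some_table(df, engine):
    """Hypothetical helper: empty some_table, then append df without re-creating it."""
    table_columns = [c["name"] for c in inspect(engine).get_columns("some_table")]
    # Refuse to load a frame whose columns differ from the existing table.
    if set(df.columns) != set(table_columns):
        raise ValueError("DataFrame columns {} do not match table columns {}"
                         .format(list(df.columns), table_columns))
    with engine.begin() as conn:
        conn.execute(text("DELETE FROM some_table"))
    df.to_sql("some_table", con=engine, if_exists="append", index=False)


data = pd.DataFrame({"id": [10, 20], "name": ["alice", "bob"]})
refresh_some_table(data, engine)

# Reflect the column type and read the rows back after the refresh.
name_type = {c["name"]: str(c["type"])
             for c in inspect(engine).get_columns("some_table")}["name"]
with engine.connect() as conn:
    rows = [tuple(r) for r in
            conn.execute(text("SELECT id, name FROM some_table ORDER BY id"))]
print(name_type, rows)
```

The column check only compares names; a stricter version could also compare types, which is left out here to keep the sketch short.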

MODIFY COLUMN (reactive, as it fixes the problem after the fact)

Alter the column after migration with a DDL SQL statement.


from sqlalchemy import text

def main():
    connection_string = 'mssql+pymssql://{}:{}@{}/{}'\
                          .format(db_user, db_password, db_server, db_database)
    engine = create_engine(connection_string)

    # IMPORT DATA INTO DATA FRAME
    data = pd.read_sql(some_query, engine)

    # MIGRATE DATA FRAME INTO SQL SERVER (RE-CREATES THE TABLE)
    data.to_sql(name="some_table", con=engine, if_exists='replace', index=False)

    # ALTER COLUMN TYPE (ASSUMING USER HAS RIGHTS/PRIVILEGES)
    # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
    with engine.begin() as conn:
        conn.execute(text("ALTER TABLE some_table ALTER COLUMN mytextcolumn VARCHAR(255);"))
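
If several columns need the same fix, the ALTER statements could be generated rather than typed by hand. The sketch below only builds the T-SQL strings and does not touch a database; the helpers quote_ident and alter_varchar_sql are hypothetical names introduced for illustration. Identifiers are wrapped in square brackets with any closing bracket doubled, and the length is limited to 1 through 8000, the range SQL Server accepts for a sized VARCHAR.

```python
def quote_ident(name):
    """Bracket-quote a SQL Server identifier, doubling any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def alter_varchar_sql(table, column, length=255):
    """Hypothetical helper: build the T-SQL that sets a column to VARCHAR(length)."""
    if not 1 <= length <= 8000:
        raise ValueError("VARCHAR length must be between 1 and 8000")
    return "ALTER TABLE {} ALTER COLUMN {} VARCHAR({});".format(
        quote_ident(table), quote_ident(column), length)


# One statement per column that should become VARCHAR(255).
statements = [alter_varchar_sql("some_table", col, 255)
              for col in ["mytextcolumn"]]
print(statements[0])
```

Each generated string can then be run inside the same `with engine.begin() as conn:` block as above, wrapped in `text()`. Whether SQL Server accepts the conversion depends on the column's existing type and on the data already stored in it, so it is worth trying on a copy of the table first.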

I recommend the second approach, as I believe databases should be agnostic to application code such as Python and pandas. Hence, the initial build or rebuild of a table schema should be a planned, manual process, and scripts should only interact with data, never structurally change a database on the fly.