Original URL: http://stackoverflow.com/questions/34771256/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me): StackOverflow
How does Pandas to_sql determine what dataframe column is placed into what database field?
Question by Alexander Moore
I'm currently using Pandas to_sql to place a large dataframe into an SQL database. I'm using SQLAlchemy to connect to the database, and part of that process is defining the columns of the database tables.
My question is, when I'm running to_sql on a dataframe, how does it know what column from the dataframe goes into what field in the database? Is it looking at column names in the dataframe and looking for the same fields in the database? Is it the order that the variables are in?
Here's some example code to facilitate discussion:
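import pandas as pd
from sqlalchemy import Column, Integer, MetaData, Numeric, Table, create_engine

csv_path = 'store_data.csv'  # placeholder: the input file path is not given in the question

engine = create_engine('sqlite:///store_data.db')
meta = MetaData()
# Define the target table explicitly before loading any data
table_pop = Table('xrf_str_geo_ta4_1511', meta,
    Column('TDLINX', Integer, nullable=True, index=True),
    Column('GEO_ID', Integer, nullable=True),
    Column('PERCINCL', Numeric, nullable=True),
)
meta.create_all(engine)

# Append the CSV chunk by chunk into the table defined above; the old flavor=
# argument has been removed from to_sql, and index=False skips the DataFrame index
for df in pd.read_csv(csv_path, chunksize=50000, iterator=True, encoding='utf-8', sep=','):
    df.to_sql('xrf_str_geo_ta4_1511', engine, if_exists='append', index=False)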
The dataframe in question has three columns: TDLINX, GEO_ID, and PERCINCL.
Answer by joris
The answer is indeed what you suggest: it is looking at the column names. So matching column names is important; the order does not matter.
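A quick way to confirm this is to append a DataFrame whose columns are listed in a different order than the table's. The sketch below assumes an in-memory SQLite database opened with Python's built-in `sqlite3` module (instead of the SQLAlchemy engine from the question); the table name `demo` and the values are made up for illustration.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# Table columns in the order TDLINX, GEO_ID, PERCINCL
conn.execute("CREATE TABLE demo (TDLINX INTEGER, GEO_ID INTEGER, PERCINCL REAL)")

# Same three columns, but in reverse order in the DataFrame
df = pd.DataFrame({"PERCINCL": [0.5, 0.25], "GEO_ID": [10, 20], "TDLINX": [1, 2]})
df.to_sql("demo", conn, if_exists="append", index=False)

# Read back in the table's own column order
result = pd.read_sql("SELECT TDLINX, GEO_ID, PERCINCL FROM demo ORDER BY TDLINX", conn)
print(result)
```

If values were matched by position, `TDLINX` would have received the `PERCINCL` values; instead each value lands in the column with the same name.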
To be fully correct, pandas will not actually check this. What to_sql does under the hood is execute an INSERT statement in which the data to insert is provided as a dict keyed by column name, and then it is just up to the database driver to handle this.
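The dict-based insert can be approximated by hand with `sqlite3` named parameters. This is a simplified analogue written for illustration, not pandas' actual internal code; the table `demo` and the sample values are hypothetical.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (TDLINX INTEGER, GEO_ID INTEGER, PERCINCL REAL)")

# Columns deliberately listed in a different order than in the table
df = pd.DataFrame({"GEO_ID": [10, 20], "PERCINCL": [0.5, 0.25], "TDLINX": [1, 2]})

# Build an INSERT that names every DataFrame column, e.g.
# INSERT INTO demo (GEO_ID, PERCINCL, TDLINX) VALUES (:GEO_ID, :PERCINCL, :TDLINX)
cols = list(df.columns)
sql = "INSERT INTO demo ({}) VALUES ({})".format(
    ", ".join(cols), ", ".join(":" + c for c in cols)
)

# One dict per row, keyed by column name; astype(object) turns NumPy scalars
# into plain Python ints/floats that sqlite3 can bind
records = df.astype(object).to_dict("records")
conn.executemany(sql, records)

rows = conn.execute(
    "SELECT TDLINX, GEO_ID, PERCINCL FROM demo ORDER BY TDLINX"
).fetchall()
print(rows)  # → [(1, 10, 0.5), (2, 20, 0.25)]
```

Because the driver binds each value to the placeholder with the same name, the column order of the DataFrame never enters into it.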
This also means that pandas will not check the dtypes or the number of columns (e.g. if not all fields of the database are present as columns in the dataframe, those fields will be filled with the database's default value for these rows).
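The default-filling behavior can be observed by appending a DataFrame that carries only some of the table's columns. This sketch again assumes an in-memory SQLite database via `sqlite3`; the `DEFAULT -1` on `GEO_ID` is a hypothetical choice that makes the default visible, while `PERCINCL` has no default and therefore falls back to NULL.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
# GEO_ID has an explicit DEFAULT; PERCINCL has none, so its default is NULL
conn.execute(
    "CREATE TABLE demo (TDLINX INTEGER, GEO_ID INTEGER DEFAULT -1, PERCINCL REAL)"
)

# The DataFrame carries only one of the three table columns
df = pd.DataFrame({"TDLINX": [1, 2]})
df.to_sql("demo", conn, if_exists="append", index=False)

rows = conn.execute(
    "SELECT TDLINX, GEO_ID, PERCINCL FROM demo ORDER BY TDLINX"
).fetchall()
print(rows)  # → [(1, -1, None), (2, -1, None)]
```

pandas raises no error here; the INSERT simply names `TDLINX` only, and the database fills in the rest.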