How does Pandas to_sql determine which dataframe column goes into which database field?

Question

Asked by Alexander Moore

I'm currently using Pandas to_sql in order to place a large dataframe into an SQL database. I'm using sqlalchemy in order to connect with the database and part of that process is defining the columns of the database tables.

My question is, when I'm running to_sql on a dataframe, how does it know what column from the dataframe goes into what field in the database? Is it looking at column names in the dataframe and looking for the same fields in the database? Is it the order that the variables are in?

Here's some example code to facilitate discussion:

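import pandas as pd
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, Numeric

engine = create_engine('sqlite:///store_data.db')
meta = MetaData()

# Define the target table and its columns up front
table_pop = Table('xrf_str_geo_ta4_1511', meta,
    Column('TDLINX', Integer, nullable=True, index=True),
    Column('GEO_ID', Integer, nullable=True),
    Column('PERCINCL', Numeric, nullable=True)
)

meta.create_all(engine)

# file is the path to the source CSV; read it in chunks of 50,000 rows
for df in pd.read_csv(file, chunksize=50000, encoding='utf-8', sep=','):
    # Append each chunk to the table defined above, without writing the dataframe index
    df.to_sql('xrf_str_geo_ta4_1511', engine, if_exists='append', index=False)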

The dataframe in question has three columns: TDLINX, GEO_ID, and PERCINCL.

Answer 1

Answered by joris

The answer is indeed what you suggest: it looks at the column names. So matching column names is important; the order does not matter.
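
As a quick check of the name-based matching, here is a minimal sketch. It assumes an in-memory SQLite database opened with the standard-library sqlite3 module (rather than the SQLAlchemy engine from the question) and a hypothetical table called demo; the dataframe deliberately lists its columns in a different order than the table does.

```python
import sqlite3

import pandas as pd

# Assumption: a throwaway in-memory database and a hypothetical table "demo"
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (TDLINX INTEGER, GEO_ID INTEGER, PERCINCL REAL)")

# The dataframe columns are in a different order than the table columns
df = pd.DataFrame({"PERCINCL": [0.5], "GEO_ID": [222], "TDLINX": [111]})
df.to_sql("demo", conn, if_exists="append", index=False)

# Read the row back in table order to see where each value landed
row = conn.execute("SELECT TDLINX, GEO_ID, PERCINCL FROM demo").fetchone()
print(row)
```

Each value should end up in the field that shares its column name, regardless of its position in the dataframe.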

To be fully correct, pandas does not actually check this itself. What to_sql does under the hood is execute an insert statement in which the data to insert is provided as a dict, and it is then up to the database driver to handle it.
This also means that pandas will not check the dtypes or the number of columns (e.g. if not all fields of the database are present as columns in the dataframe, those fields will be filled with the database's default value for these rows).
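
The missing-column case can be tried out with a small sketch like the one below. It again assumes an in-memory SQLite database via sqlite3 and a hypothetical table demo; the DEFAULT 1.0 on PERCINCL and the EXTRA column are made up for illustration. The second part goes one step beyond the answer and shows what is likely to happen in the opposite case: since pandas does no checking of its own, a dataframe column that the table lacks should only fail once the database rejects the insert.

```python
import sqlite3

import pandas as pd

# Assumption: hypothetical table "demo" with an illustrative default on PERCINCL
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (TDLINX INTEGER, GEO_ID INTEGER, PERCINCL REAL DEFAULT 1.0)"
)

# The dataframe lacks PERCINCL, so the database fills in the column default
partial = pd.DataFrame({"TDLINX": [111], "GEO_ID": [222]})
partial.to_sql("demo", conn, if_exists="append", index=False)
rows = conn.execute("SELECT TDLINX, GEO_ID, PERCINCL FROM demo").fetchall()
print(rows)

# A column the table does not have is not caught by pandas up front;
# the error comes from the database when the insert runs
extra = pd.DataFrame({"TDLINX": [333], "EXTRA": [1]})
error = None
try:
    extra.to_sql("demo", conn, if_exists="append", index=False)
except Exception as exc:  # exact exception type depends on driver and pandas version
    error = exc
print(type(error).__name__, error)
```

If no default is declared for a column, the missing field is typically left as NULL, provided the column is nullable.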
