Import CSV file into SQL Server using Python

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse it, you must follow the same CC BY-SA license and attribute it to the original authors (not me): StackOverflow

Original question: http://stackoverflow.com/questions/39899088/
Asked by Cesar
I am having trouble uploading a CSV file into a table in MS SQL Server. The CSV file has 25 columns, and its header has the same names as the SQL table, which also has 25 columns. When I run the script it throws the error:
params arg (<class 'list'>) can be only a tuple or a dictionary
What is the best way to import this data into MS SQL? Both the CSV and SQL table have the exact same column names.
Here is the code:
import csv
import pymssql

conn = pymssql.connect(
    server="xx.xxx.xx.90",
    port=2433,
    user='SQLAdmin',
    password='xxxxxxxx',
    database='NasrWeb'
)

cursor = conn.cursor()
customer_data = csv.reader('cleanNVG.csv')  # 25 columns with same header as SQL

for row in customer_data:
    cursor.execute('INSERT INTO zzzOracle_Extract([Customer Name]\
        ,[Customer #]\
        ,[Account Name]\
        ,[Identifying Address Flag]\
        ,[Address1]\
        ,[Address2]\
        ,[Address3]\
        ,[Address4]\
        ,[City]\
        ,[County]\
        ,[State]\
        ,[Postal Code]\
        ,[Country]\
        ,[Category ]\
        ,[Class]\
        ,[Reference]\
        ,[Party Status]\
        ,[Address Status]\
        ,[Site Status]\
        ,[Ship To or Bill To]\
        ,[Default Warehouse]\
        ,[Default Order Type]\
        ,[Default Shipping Method]\
        ,[Optifacts Customer Number]\
        ,[Salesperson])'
        'VALUES(%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,%s,)', row)

conn.commit()
cursor.close()
print("Done")
conn.close()
This is what the first few rows of the CSV file look like:
Answered by citynorman
Try d6tstack, which has fast pandas-to-SQL functionality because it uses native DB import commands. It works for Postgres and MySQL; MS SQL support is experimental. Comment or raise an issue if it doesn't work.
import pandas as pd
import d6tstack.utils

df = pd.read_csv('cleanNVG.csv')
uri_mssql = 'mssql+pymssql://usr:pwd@localhost/db'
d6tstack.utils.pd_to_mssql(df, uri_mssql, 'table', 'schema')  # experimental
It is also useful for importing multiple CSVs with data-schema changes, and/or for preprocessing with pandas before writing to the database; see further down in the examples notebook:
import glob
import d6tstack.combine_csv

# apply_fun is a user-supplied pandas preprocessing function
d6tstack.combine_csv.CombinerCSV(glob.glob('*.csv'),
    apply_after_read=apply_fun).to_mssql_combine(uri_mssql, 'table')
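As a point of comparison, plain pandas can also write the frame itself via DataFrame.to_sql. A minimal sketch, using an in-memory SQLite database and a hypothetical two-row frame so it stays self-contained (for MS SQL you would instead pass a SQLAlchemy engine built from the mssql+pymssql URI above):

```python
import sqlite3
import pandas as pd

# Hypothetical two-row frame standing in for pd.read_csv('cleanNVG.csv');
# an in-memory SQLite connection stands in for the MS SQL engine.
df = pd.DataFrame({'Customer Name': ['Acme', 'Globex'],
                   'City': ['Cairo', 'Giza']})

conn = sqlite3.connect(':memory:')
df.to_sql('zzzOracle_Extract', conn, index=False, if_exists='append')
print(conn.execute('SELECT COUNT(*) FROM zzzOracle_Extract').fetchone()[0])  # 2
```

This routes every row through individual INSERT statements, which is exactly the slowness d6tstack's native import commands are meant to avoid, but it needs no extra dependency beyond pandas.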
Answered by Gord Thompson
You are using csv.reader incorrectly. The first argument to .reader is not the path to the CSV file; it is

[an] object which supports the iterator protocol and returns a string each time its __next__() method is called — file objects and list objects are both suitable.
Hence, according to the example in the documentation, you should be doing something like this:
import csv

with open('cleanNVG.csv', newline='') as csvfile:
    customer_data = csv.reader(csvfile)
    for row in customer_data:
        cursor.execute(sql, tuple(row))  # sql is the INSERT statement; pymssql wants a tuple, not a list
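To see both fixes in one place (a file object for csv.reader, and a tuple for the params argument), here is a self-contained sketch using a hypothetical two-column version of the question's 25-column INSERT, with an in-memory file standing in for cleanNVG.csv:

```python
import csv
import io

# Hypothetical two-column cut of the real zzzOracle_Extract INSERT statement.
sql = 'INSERT INTO zzzOracle_Extract([Customer Name],[Customer #]) VALUES (%s,%s)'

csvfile = io.StringIO('Customer Name,Customer #\nAcme,1001\n')
customer_data = csv.reader(csvfile)   # an open file object, not a path string
next(customer_data)                   # skip the header row
rows = [tuple(row) for row in customer_data]  # tuples, as pymssql requires
print(rows)  # [('Acme', '1001')]
```

Each tuple in rows is what would be passed as the second argument to cursor.execute, which resolves the "params arg can be only a tuple or a dictionary" error from the question.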
Answered by Gord Thompson
Check the data types on the table, and the sizes of each field as well. If it is varchar(10) and your data is 20 characters long, it will throw an error.
Also,
Consider building the query dynamically to ensure the number of placeholders matches your table and CSV file format. Then it's just a matter of ensuring your table and CSV file are correct, instead of checking that you typed enough ? placeholders in your code.
The following example assumes
CSV file contains column names in the first line
Connection is already built
File name is test.csv
Table name is MyTable
Python 3
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    columns = next(reader)
    query = 'insert into MyTable({0}) values ({1})'
    query = query.format(','.join(columns), ','.join('?' * len(columns)))
    cursor = connection.cursor()
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
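To make the string manipulation concrete, here is what the two format() calls above produce for a hypothetical three-column header:

```python
# Expanding the dynamic query for a hypothetical 3-column CSV header.
columns = ['Name', 'City', 'State']
query = 'insert into MyTable({0}) values ({1})'
query = query.format(','.join(columns), ','.join('?' * len(columns)))
print(query)  # insert into MyTable(Name,City,State) values (?,?,?)
```

Because the placeholder list is derived from len(columns), the query always matches the file's width, which is the point of building it dynamically.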
If column names are not included in the file:
...
with open('test.csv', 'r') as f:
    reader = csv.reader(f)
    data = next(reader)
    query = 'insert into dbo.Test values ({0})'
    query = query.format(','.join('?' * len(data)))
    cursor = connection.cursor()
    cursor.execute(query, data)
    for data in reader:
        cursor.execute(query, data)
    cursor.commit()
Basically, though, your code looks fine. Here is a generic sample.
cur = cnxn.cursor()                             # get the cursor
csv_data = csv.reader(open('Samplefile.csv'))   # read the csv
for rows in csv_data:                           # iterate through csv
    cur.execute("INSERT INTO MyTable(Col1,Col2,Col3,Col4) VALUES (?,?,?,?)", rows)
cnxn.commit()
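If performance matters for a large file, the same row-by-row loop can usually be collapsed into a single executemany() call. A sketch against in-memory SQLite so it stays runnable here (pyodbc cursors expose the same method and the same ? placeholder style):

```python
import csv
import io
import sqlite3

# sqlite3 stands in for the pyodbc connection purely to keep this self-contained.
cnxn = sqlite3.connect(':memory:')
cur = cnxn.cursor()
cur.execute('CREATE TABLE MyTable (Col1 TEXT, Col2 TEXT)')

# In-memory file standing in for Samplefile.csv.
csv_data = csv.reader(io.StringIO('a,1\nb,2\n'))
cur.executemany('INSERT INTO MyTable(Col1,Col2) VALUES (?,?)', csv_data)
cnxn.commit()
print(cur.execute('SELECT COUNT(*) FROM MyTable').fetchone()[0])  # 2
```

executemany() lets the driver prepare the statement once and reuse it for every row, instead of round-tripping one execute() per row.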