How to import a very large csv file into an existing SQL Server table?

Note: the content below is from StackOverflow and is provided under the CC BY-SA 4.0 license. If you use or share it, you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/8328477/

How to import a very large csv file into an existing SQL Server table?

Tags: sql, sql-server, csv, bulkinsert, bcp

Asked by Micky Walia

I have a very large csv file with ~500 columns, ~350k rows, which I am trying to import into an existing SQL Server table.

I have tried BULK INSERT; I get "Query executed successfully, 0 rows affected". Interestingly, BULK INSERT worked, in a matter of seconds, for a similar operation on a much smaller csv file: fewer than 50 columns, ~77k rows.

I have also tried bcp; I get "Unexpected EOF encountered in BCP data-file. BCP copy in failed."

The task is simple - it shouldn't be hard to the point of pure frustration. Any ideas or suggestions? Are there any other tools or utilities that you have successfully used to accomplish a bulk import operation or something similar? Thanks.

-- BULK INSERT

USE myDb;
BULK INSERT myTable
FROM 'C:\Users\myFile.csv'
WITH
(
    FIRSTROW = 2,
    -- DATAFILETYPE = 'char',
    -- MAXERRORS = 100,
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);
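
A variant worth a try here, assuming SQL Server 2017 or later and a standard quoted CSV file: the FORMAT = 'CSV' option parses quoted fields properly, and writing the row terminator as the hex code 0x0a explicitly asks for a bare line feed, which matters if the file's rows do not end in \r\n. A sketch using the same myDb/myTable/myFile.csv names as above:

USE myDb;
BULK INSERT myTable
FROM 'C:\Users\myFile.csv'
WITH
(
    FORMAT = 'CSV',          -- SQL Server 2017+: treat the file as an RFC 4180 CSV (quoted fields)
    FIELDQUOTE = '"',        -- quote character used around fields
    FIRSTROW = 2,            -- skip the header row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '0x0a'   -- bare line feed; use '\r\n' if the file has Windows line endings
);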

-- bcp

bcp myDb.dbo.myTable in 'C:\Users\myFile.csv' -T -t, -c

UPDATE
I have now changed course. I've decided to join the csv files, which was my goal to begin with, outside of SQL Server so that I don't have to upload the data to a table for now. However, it'll be interesting to try to upload (BULK INSERT or 'bcp') only 1 record (~490 cols.) from the csv file, which otherwise failed, and see if it works.
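
For that single-record test, BULK INSERT has a LASTROW option, so a sketch like the following (same names and assumptions as the statement above, with the header on row 1) should attempt only the first data row:

-- Sanity check: load only the first data row
BULK INSERT myTable
FROM 'C:\Users\myFile.csv'
WITH
(
    FIRSTROW = 2,            -- skip the header
    LASTROW  = 2,            -- stop after the first data row
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n'
);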

Answered by Jimbo

Check your file for an EOF character where it shouldn't be - BCP is telling you there is a problem with the file.

Notepad ++ may be able to load the file for you to view and search.

Answered by alzaimar

Most likely the last line lacks a \n. Also, there is a limitation on the row size (8060 bytes) in SQL Server, although T-SQL should have mentioned this. However, check this link:
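
As a rough check against that 8060-byte in-row limit, the declared column sizes of the target table can be summed from the catalog views; a sketch, assuming the table from the question is dbo.myTable (columns declared as varchar(max)/nvarchar(max) report max_length = -1 and are excluded here because they can be stored off-row):

-- Approximate maximum declared in-row size of the target table
SELECT COUNT(*) AS column_count,
       SUM(CASE WHEN c.max_length = -1 THEN 0 ELSE c.max_length END) AS declared_bytes
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID('dbo.myTable');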

My advice: Start with one row and get it to work. Then the rest.

Answered by m.swiss

It is probably not the solution you're expecting, but with Python you could create a table out of the csv very easily (I just uploaded a 1 GB CSV file this way):

import pandas as pd
import psycopg2
from sqlalchemy import create_engine

# Read the csv to a dataframe
df = pd.read_csv('path_to_csv_file', index_col='name_of_index_column',  sep=",") 

# Connect and upload
engine = create_engine('postgresql+psycopg2://db_user_name:db_password@localhost:5432/' + 'db_name', client_encoding='utf8')
df.to_sql('table_name', engine, if_exists='replace', index =True, index_label='name_of_index_column')
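
Note that, as written, this connects to PostgreSQL rather than SQL Server; the same pandas to_sql pattern should also work against the SQL Server table in the question if the engine URL is switched to SQL Server's SQLAlchemy dialect (for example mssql+pyodbc, assuming a suitable ODBC driver is installed).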

Answered by Dharmendar Kumar 'DK'

How are you mapping the fields in the file to the columns in the table? Is the number of columns in the table the same as the number of fields in the file? Or are you using a format file to specify the column mapping? If so, is the format file formatted correctly?

If you are using a format file and the "Number of columns" parameter in it is wrong, it will cause the "Unexpected end of file" error. See this for some other errors/issues with bulk uploading.
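
For completeness, a format file can also be handed to BULK INSERT directly; a sketch, assuming a hypothetical C:\Users\myFile.fmt whose declared column count matches the fields actually present in the csv:

BULK INSERT myTable
FROM 'C:\Users\myFile.csv'
WITH
(
    FORMATFILE = 'C:\Users\myFile.fmt',  -- format file describing the field-to-column mapping
    FIRSTROW = 2
);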
