Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/1029384/

Date: 2020-09-01 02:30:15  Source: igfitidea

SQL Bulk Insert with FIRSTROW parameter skips the following line

sql  sql-server-2005  bulkinsert

Asked by gibbo

I can't seem to figure out how this is happening.

Here's an example of the file that I'm attempting to bulk insert into SQL Server 2005:

***A NICE HEADER HERE***
0000001234|SSNV|00013893-03JUN09
0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09

Here's my bulk insert statement:

BULK INSERT sometable
FROM 'E:\filefromabove.txt'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '|',
    ROWTERMINATOR = '\n'
)

But, for some reason the only output I can get is:

0000005678|ABCD|00013893-03JUN09
0000009112|0000|00013893-03JUN09
0000009112|0000|00013893-03JUN09

The first record always gets skipped, unless I remove the header altogether and don't use the FIRSTROW parameter. How is this possible?

Thanks in advance!

Answered by Cade Roux

I don't think you can skip rows in a different format with BULK INSERT/BCP.

When I run this:

TRUNCATE TABLE so1029384

BULK INSERT so1029384
FROM 'C:\Data\test\so1029384.txt'
WITH
(
--FIRSTROW = 2,
FIELDTERMINATOR= '|',
ROWTERMINATOR = '\n'
)

SELECT * FROM so1029384

I get:

col1                                               col2                                               col3
-------------------------------------------------- -------------------------------------------------- --------------------------------------------------
***A NICE HEADER HERE***
0000001234               SSNV                                               00013893-03JUN09
0000005678                                         ABCD                                               00013893-03JUN09
0000009112                                         0000                                               00013893-03JUN09
0000009112                                         0000                                               00013893-03JUN09

It looks like the '|' is required even in the header row: the parser reads up to the first '|' to fill the first column, so the header's newline (and the first field of the next line) is swallowed into that column. Evidently, once you specify a field terminator, every row must contain it.

You could strip the row with a pre-processing step. Another possibility is to select only complete rows, then process them (excluding the header). Or use a tool which can handle this, like SSIS.
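
The pre-processing option can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `strip_header` and the output file name are made up for the example, and the demo writes a throwaway copy of the sample data rather than touching 'E:\filefromabove.txt'.

```python
import os
import tempfile


def strip_header(src_path, dst_path, header_lines=1):
    """Copy src_path to dst_path, dropping the first header_lines lines.

    strip_header is a hypothetical helper name, not part of any library.
    """
    # newline="" leaves the original line endings untouched on read and write
    with open(src_path, "r", newline="") as src, \
         open(dst_path, "w", newline="") as dst:
        for index, line in enumerate(src):
            if index >= header_lines:
                dst.write(line)


# Demo on a temporary copy of the sample data from the question
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "filefromabove.txt")
    dst = os.path.join(tmp, "filefromabove_noheader.txt")  # assumed output name
    with open(src, "w", newline="") as f:
        f.write("***A NICE HEADER HERE***\n"
                "0000001234|SSNV|00013893-03JUN09\n"
                "0000005678|ABCD|00013893-03JUN09\n")
    strip_header(src, dst)
    with open(dst, newline="") as f:
        cleaned = f.read()

print(cleaned)
```

The cleaned file could then be loaded with BULK INSERT without FIRSTROW, which matches the asker's observation that the import works once the header is removed.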

Answered by Marc Gravell

Maybe check that the header has the same line-ending as the actual data rows (as specified in ROWTERMINATOR)?
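
One way to run that check is to count the line endings in the raw bytes of the file. The sketch below is an assumption-laden illustration: `count_line_endings` is a made-up helper, and the sample bytes deliberately mix a CRLF header with LF data rows to show what a mismatch would look like; the question does not say which endings the real file uses.

```python
from collections import Counter


def count_line_endings(raw: bytes) -> Counter:
    """Count how many lines end in CRLF, LF, CR, or no terminator at all."""
    counts = Counter()
    for line in raw.splitlines(keepends=True):
        if line.endswith(b"\r\n"):
            counts["CRLF"] += 1
        elif line.endswith(b"\n"):
            counts["LF"] += 1
        elif line.endswith(b"\r"):
            counts["CR"] += 1
        else:
            counts["none"] += 1
    return counts


# Hypothetical sample: header ends in CRLF, data rows end in LF
sample = (b"***A NICE HEADER HERE***\r\n"
          b"0000001234|SSNV|00013893-03JUN09\n"
          b"0000005678|ABCD|00013893-03JUN09\n")
endings = count_line_endings(sample)
print(dict(endings))
```

For a real file, the bytes would come from `open(path, "rb").read()`. More than one key in the result means the file mixes line endings.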

Update: from MSDN:

The FIRSTROW attribute is not intended to skip column headers. Skipping headers is not supported by the BULK INSERT statement. When skipping rows, the SQL Server Database Engine looks only at the field terminators, and does not validate the data in the fields of skipped rows.
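
To see why this costs the asker the first data record, here is a rough Python model of a parser that delimits rows purely by consuming terminated fields. It is a simplified sketch, not SQL Server's actual implementation; `parse_rows` and its parameters are names invented for the example.

```python
def parse_rows(text, field_term="|", row_term="\n", n_cols=3):
    """Simplified model: a row is n_cols fields, each read up to its terminator.

    The last field is terminated by row_term, all others by field_term.
    """
    rows, pos = [], 0
    while pos < len(text):
        row = []
        for col in range(n_cols):
            term = row_term if col == n_cols - 1 else field_term
            end = text.find(term, pos)
            if end == -1:
                # Ran out of data mid-row: keep the remainder as the last field
                row.append(text[pos:])
                pos = len(text)
                break
            row.append(text[pos:end])
            pos = end + len(term)
        rows.append(row)
    return rows


data = ("***A NICE HEADER HERE***\n"
        "0000001234|SSNV|00013893-03JUN09\n"
        "0000005678|ABCD|00013893-03JUN09\n"
        "0000009112|0000|00013893-03JUN09\n"
        "0000009112|0000|00013893-03JUN09\n")

rows = parse_rows(data)
first_row = 2                    # same value as FIRSTROW in the question
loaded = rows[first_row - 1:]    # rows that would survive the skip

print(repr(rows[0][0]))
for row in loaded:
    print("|".join(row))
```

In this model the header has no '|', so the first field runs through the header's newline and up to the '|' on the next line. The header and the first data record together form "row 1", and skipping one row discards both, which matches the three rows the asker reports.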

Answered by norlando

I found it easiest to just read the entire line into one column then parse out the data using XML.

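-- Stage each raw line of the file in a single-column temp table
IF (OBJECT_ID('tempdb..#data') IS NOT NULL) DROP TABLE #data
CREATE TABLE #data (data VARCHAR(MAX))

-- No field terminator is given, so each line is loaded whole; FIRSTROW = 2 is meant to skip the header line
BULK INSERT #data FROM 'E:\filefromabove.txt' WITH (FIRSTROW = 2, ROWTERMINATOR = '\n')

IF (OBJECT_ID('tempdb..#dataXml') IS NOT NULL) DROP TABLE #dataXml
CREATE TABLE #dataXml (ID INT NOT NULL IDENTITY(1,1) PRIMARY KEY CLUSTERED, data XML)

-- Turn 'a|b|c' into <r><d>a</d><d>b</d><d>c</d></r>
-- Note: the CAST fails if the data contains XML special characters such as & or <
INSERT #dataXml (data)
SELECT CAST('<r><d>' + REPLACE(data, '|', '</d><d>') + '</d></r>' AS XML)
FROM #data

-- Pull each <d> element out by position
SELECT  d.data.value('(/r//d)[1]', 'varchar(max)') AS col1,
        d.data.value('(/r//d)[2]', 'varchar(max)') AS col2,
        d.data.value('(/r//d)[3]', 'varchar(max)') AS col3
FROM #dataXml d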

Answered by suresh kumar

You can use the snippet below:

BULK INSERT TextData
FROM 'E:\filefromabove.txt'
WITH
(
    FIRSTROW = 2,
    FIELDTERMINATOR = '|',  -- CSV field delimiter
    ROWTERMINATOR = '\n',   -- row delimiter (newline)
    ERRORFILE = 'E:\ErrorRows.csv',
    TABLOCK
)

Answered by Keith MacDonald

Given how mangled some data can look after BCP importing into SQL Server from non-SQL data sources, I'd suggest doing all the BCP import into some scratch tables first.

For example:

truncate table Address_Import_tbl

BULK INSERT dbo.Address_Import_tbl FROM 'E:\external\SomeDataSource\Address.csv' WITH ( FIELDTERMINATOR = '|', ROWTERMINATOR = '\n', MAXERRORS = 10 )

Make sure all the columns in Address_Import_tbl are nvarchar(), to make it as agnostic as possible, and avoid type conversion errors.

Then apply whatever fixes you need to Address_Import_tbl, such as deleting the unwanted header.

Then run an INSERT ... SELECT query to copy from Address_Import_tbl to Address_tbl, applying any datatype conversions you need, for example casting imported dates to SQL DATETIME.
