Note: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). Stack Overflow original: http://stackoverflow.com/questions/10935219/

Date: 2020-08-31 13:43:59  Source: igfitidea

MySQL LOAD DATA INFILE: works, but unpredictable line terminator

Tags: mysql, csv, delimiter, end-of-line

Asked by SDC

MySQL has a nice CSV import function LOAD DATA INFILE.

I have a large dataset that needs to be imported from CSV on a regular basis, so this feature is exactly what I need. I've got a working script that imports my data perfectly.

.....except.... I don't know in advance what the end-of-line terminator will be.

My SQL code currently looks something like this:

LOAD DATA INFILE '{fileName}'
 INTO TABLE {importTable}
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
 LINES TERMINATED BY '\n'
 IGNORE 1 LINES
( {fieldList} );

This works great for some import files.

However, the import data is coming from multiple sources. Some of them have the \n terminator; others have \r\n. I can't predict which one I'll have.

Is there a way using LOAD DATA INFILE to specify that my lines may be terminated with either \n or \r\n? How do I deal with this?

Accepted answer by codemonkey

I'd just pre-process it. A global search/replace to change \r\n to \n done from a command line tool as part of the import process should be simple and performant.
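
As a rough illustration of that pre-processing step, here is a minimal Python sketch that rewrites CRLF line endings to LF before the file is handed to LOAD DATA INFILE. The function names and file paths are hypothetical and not part of the original answer, and the sketch would also rewrite a CRLF that sits inside a quoted field.

```python
from typing import Iterable, Iterator


def normalize_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each line with a trailing CRLF rewritten to LF."""
    for line in lines:
        if line.endswith(b"\r\n"):
            line = line[:-2] + b"\n"
        yield line


def normalize_file(src_path: str, dst_path: str) -> None:
    """Copy src_path to dst_path, converting CRLF line endings to LF."""
    # Binary mode: iterating the file splits on b"\n" only and performs
    # no newline translation, so the bytes are seen exactly as stored.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(normalize_lines(src))
```

The converted file can then always be imported with LINES TERMINATED BY '\n'.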

Answer by Devart

You can specify line separator as '\n' and remove trailing '\r' separators if necessary from the last field during loading.

For example -

Suppose we have the 'entries.txt' file. The line separator is '\r\n', except after the line ITEM2 | CLASS3 | DATE2, where the separator is '\n':

COL1  | COL2   | COL3
ITEM1 | CLASS1 | DATE1
ITEM2 | CLASS3 | DATE2
ITEM3 | CLASS1 | DATE3
ITEM4 | CLASS2 | DATE4

CREATE TABLE statement:

CREATE TABLE entries(
  column1 VARCHAR(255) DEFAULT NULL,
  column2 VARCHAR(255) DEFAULT NULL,
  column3 VARCHAR(255) DEFAULT NULL
);

Our LOAD DATA INFILE query:

LOAD DATA INFILE 'entries.txt' INTO TABLE entries
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(column1, column2, @var)
SET column3 = TRIM(TRAILING '\r' FROM @var);

Show results:

SELECT * FROM entries;
+---------+----------+---------+
| column1 | column2  | column3 |
+---------+----------+---------+
| ITEM1   |  CLASS1  |  DATE1  |
| ITEM2   |  CLASS3  |  DATE2  |
| ITEM3   |  CLASS1  |  DATE3  |
| ITEM4   |  CLASS2  |  DATE4  |
+---------+----------+---------+
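
To see why this copes with mixed terminators, the following Python sketch imitates the same idea outside MySQL: split on '\n' only, then strip any trailing '\r' from the last field. It only illustrates the logic; the function name is hypothetical and this is not how MySQL implements LOAD DATA.

```python
from typing import List


def parse_mixed(data: str, sep: str = "|") -> List[List[str]]:
    """Split on LF only, then drop a trailing CR from the last field."""
    rows = []
    # Skip the header line, like IGNORE 1 LINES.
    for line in data.split("\n")[1:]:
        if not line:
            continue  # ignore the empty piece after the final terminator
        fields = line.split(sep)
        # Same effect as TRIM(TRAILING '\r' FROM @var) on the last column.
        fields[-1] = fields[-1].rstrip("\r")
        rows.append(fields)
    return rows
```

Lines that ended in '\r\n' and lines that ended in '\n' produce the same field values, because the leftover '\r' only ever appears at the end of the last field.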

Answer by Prathap

I'm assuming you need to do this through MySQL only, not through any programming language. Before using LOAD DATA, convert the file to Windows format \r\n (CR LF), for example with Notepad++ if you have it. Then run the LOAD DATA query, making sure it uses LINES TERMINATED BY '\r\n'.

Edit:

Editors are often unsuitable for converting larger files. For larger files, the following commands are often used on Windows and Linux:

1) To convert to Windows format on Windows:

TYPE [unix_file] | FIND "" /V > dos_file

2) To convert to Windows format on Linux:

unix2dos  [file]

Other commands are also available.

A Windows-format file can be converted to Unix format by simply removing all ASCII CR (\r) characters: tr -d '\r' < inputfile > outputfile

grep -L $'\r$' myfile.txt # lists the file if no line ends in CR (Unix format, LF terminated)
grep -l $'\r$' myfile.txt # lists the file if some line ends in CR (Windows format, CRLF terminated)

On Linux/Unix, the file command detects the type of end-of-line (EOL) used, so the file type can also be checked with that command.

Answer by BRM

You could also look into one of the data integration packages out there. Talend Open Studio has very flexible data input routines. For example you could process the file with one set of delimiters and catch the rejects and process them another way.

Answer by Breland

If the first load has 0 rows, do the same statement with the other line terminator. This should be do-able with some basic counting logic.

At least it all stays in SQL, and if it works the first time, you win. It could also cause less headache than re-scanning all the rows and removing a particular character.
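
The retry logic could be driven from a small script along these lines. This is only a sketch: run_load is a hypothetical callable, supplied by the caller, that executes the LOAD DATA statement with the given LINES TERMINATED BY value and returns the number of rows loaded. It assumes '\r\n' is tried first, since a '\n' attempt on a CRLF file would still load rows, just with a stray '\r' in the last field.

```python
from typing import Callable, Sequence


def load_with_fallback(
    run_load: Callable[[str], int],
    terminators: Sequence[str] = (r"\r\n", r"\n"),
) -> str:
    """Try each terminator in order; return the first that loads any rows."""
    for terminator in terminators:
        # run_load is hypothetical: it should run LOAD DATA INFILE with
        # LINES TERMINATED BY '<terminator>' and return the row count.
        if run_load(terminator) > 0:
            return terminator
    raise ValueError("no line terminator produced any rows")
```

With the question's statement, a '\r\n' attempt on an LF-only file should treat the whole file as a single line, which IGNORE 1 LINES then skips, so the row count comes back as 0 and the '\n' attempt runs next.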

Answer by Tim

Why not first just take a peek at how the lines end?

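<?php
// Print the tail of the first few lines and report which end-of-line bytes each contains.
$handle = fopen('inputFile.csv', 'r');

$i = 0;
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        // fgets() keeps the line ending, so the last bytes show the terminator.
        $s = substr($buffer, -50);

        echo $s;
        echo preg_match('/\r/', $s) ? 'cr ' : '-- ';
        echo preg_match('/\n/', $s) ? 'nl<br>' : '--<br>';

        // Stop once seven lines have been inspected.
        if ($i++ > 5) {
            break;
        }
    }

    fclose($handle);
}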
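
The same peek can be turned into a decision the import script acts on. The Python sketch below counts CRLF and bare LF terminators in a sample from the start of the file; the function names and the 64 KiB sample size are assumptions for illustration, not part of the original answer.

```python
def classify_sample(sample: bytes) -> str:
    """Return 'crlf', 'lf', 'mixed', or 'unknown' for a chunk of file bytes."""
    crlf = sample.count(b"\r\n")
    lf_only = sample.count(b"\n") - crlf  # LF bytes not preceded by CR
    if crlf and lf_only:
        return "mixed"
    if crlf:
        return "crlf"
    if lf_only:
        return "lf"
    return "unknown"


def detect_terminator(path: str, sample_size: int = 65536) -> str:
    """Classify the line endings found in the first sample_size bytes."""
    with open(path, "rb") as f:
        return classify_sample(f.read(sample_size))
```

A result of 'crlf' or 'lf' suggests which LINES TERMINATED BY value to use; 'mixed' suggests falling back to pre-processing or trimming the trailing '\r' during the load.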

Answer by Ivan Abramenko

You can use LINES STARTING BY to distinguish ordinary line endings inside the text from the start of a new row:

LOAD DATA LOCAL INFILE '/home/laptop/Downloads/field3-utf8.csv' 
IGNORE INTO TABLE Field FIELDS 
TERMINATED BY ';' 
OPTIONALLY ENCLOSED BY '^' 
LINES STARTING BY '^' 
TERMINATED BY '\r\n' 
(Id, Form_id, Name, Value)

For usual CSV files with " enclosing chars, it will be:

...
LINES STARTING BY '"' 
...