MySQL LOAD DATA INFILE: works, but unpredictable line terminator
Original question: http://stackoverflow.com/questions/10935219/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): Stack Overflow
Asked by SDC
MySQL has a nice CSV import function, LOAD DATA INFILE.
I have a large dataset that needs to be imported from CSV on a regular basis, so this feature is exactly what I need. I've got a working script that imports my data perfectly.
Except that I don't know in advance what the end-of-line terminator will be.
My SQL code currently looks something like this:
LOAD DATA INFILE '{fileName}'
INTO TABLE {importTable}
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
( {fieldList} );
This works great for some import files.
However, the import data is coming from multiple sources. Some of them have the \n terminator; others have \r\n. I can't predict which one I'll have.
Is there a way, using LOAD DATA INFILE, to specify that my lines may be terminated with either \n or \r\n? How do I deal with this?
Accepted answer by codemonkey
I'd just pre-process it. A global search/replace that changes \r\n to \n, run from a command-line tool as part of the import process, should be simple and performant.
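A minimal sketch of that pre-processing step, assuming a POSIX shell with sed available. The function name normalize_eol and the file names in the usage line are hypothetical, chosen only for illustration.

```shell
# Strip one trailing carriage return from each line, so CRLF becomes LF.
# Lines that already end in a bare LF pass through unchanged, and a CR
# in the middle of a line is left alone.
normalize_eol() {
    cr=$(printf '\r')        # a literal CR, built portably
    sed "s/${cr}\$//"        # delete a CR only when it sits at end of line
}

# Usage (hypothetical file names):
#   normalize_eol < import.csv > import.lf.csv
```

After this step the import can always use LINES TERMINATED BY '\n', whichever source the file came from.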
Answer by Devart
You can specify the line separator as '\n' and, if necessary, remove the trailing '\r' from the last field during loading.
For example:
Suppose we have the 'entries.txt' file. The line separator is '\r\n', except after the line ITEM2 | CLASS3 | DATE2, where the separator is '\n':
COL1 | COL2 | COL3
ITEM1 | CLASS1 | DATE1
ITEM2 | CLASS3 | DATE2
ITEM3 | CLASS1 | DATE3
ITEM4 | CLASS2 | DATE4
CREATE TABLE statement:
CREATE TABLE entries(
column1 VARCHAR(255) DEFAULT NULL,
column2 VARCHAR(255) DEFAULT NULL,
column3 VARCHAR(255) DEFAULT NULL
);
Our LOAD DATA INFILE query:
LOAD DATA INFILE 'entries.txt' INTO TABLE entries
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(column1, column2, @var)
SET column3 = TRIM(TRAILING '\r' FROM @var);
Show results:
SELECT * FROM entries;
+---------+----------+---------+
| column1 | column2  | column3 |
+---------+----------+---------+
| ITEM1   | CLASS1   | DATE1   |
| ITEM2   | CLASS3   | DATE2   |
| ITEM3   | CLASS1   | DATE3   |
| ITEM4   | CLASS2   | DATE4   |
+---------+----------+---------+
Answer by Prathap
I'm assuming you need to do this through MySQL alone, not through a programming language. If you have Notepad++, convert the file to Windows format (\r\n, CR LF) before running LOAD DATA, then run the query with LINES TERMINATED BY '\r\n'.
Edit:
Editors are often unsuitable for converting larger files. For larger files, the following commands are commonly used on Windows and Linux.
1) To convert to Windows format on Windows:
TYPE [unix_file] | FIND "" /V > dos_file
2) To convert to Windows format on Linux:
unix2dos [file]
Other commands are also available.
A Windows-format file can be converted to Unix format by removing all ASCII CR (\r) characters: tr -d '\r' < inputfile > outputfile
grep -PL '\r$' myfile.txt # lists the file if it is UNIX format (LF terminated)
grep -Pl '\r$' myfile.txt # lists the file if it is WINDOWS format (CRLF terminated)
On Linux/Unix, the file command detects the type of end-of-line (EOL) terminator used, so the file type can be checked with this command.
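As a rough sketch of how such a check could feed the import script, the function below classifies a file by counting lines that end in CR. It assumes a POSIX shell and grep. The name detect_eol and its output labels are made up for this example and are not part of any tool mentioned above.

```shell
# Print "crlf", "lf", "mixed" or "empty" for the file given as $1.
detect_eol() {
    cr=$(printf '\r')                              # a literal CR
    total=$(grep -c '' "$1" || true)               # every line in the file
    crlf=$(grep -c "${cr}\$" "$1" || true)         # lines whose last byte is CR
    if [ "$total" -eq 0 ]; then
        echo empty
    elif [ "$crlf" -eq 0 ]; then
        echo lf
    elif [ "$crlf" -eq "$total" ]; then
        echo crlf
    else
        echo mixed                                 # both styles present
    fi
}
```

A caller could then choose LINES TERMINATED BY '\r\n' for "crlf", '\n' for "lf", and fall back to normalizing the file first when the result is "mixed".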
Answer by BRM
You could also look into one of the data integration packages out there. Talend Open Studio has very flexible data input routines. For example, you could process the file with one set of delimiters, catch the rejects, and process them another way.
Answer by Breland
If the first load has 0 rows, run the same statement with the other line terminator. This should be doable with some basic counting logic.
At least it all stays in SQL, and if it works the first time, you win. It could also cause less headache than re-scanning all the rows and removing a particular character.
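The counting logic could be wrapped in a small driver along these lines. This is only a sketch: the loader argument is a hypothetical command (not shown here) that is assumed to run the LOAD DATA INFILE statement with the given terminator and print the number of rows it loaded. The sketch tries '\r\n' first, on the assumption that an LF-only file loaded with '\r\n' and IGNORE 1 LINES yields 0 rows, whereas a CRLF file loaded with '\n' would still load rows (with a stray \r in the last field) and so would not trigger the fallback.

```shell
# load_with_fallback LOADER FILE
# LOADER is a hypothetical command or function: LOADER FILE TERMINATOR
# must run the import and print the number of rows loaded.
load_with_fallback() {
    loader=$1
    file=$2
    # Terminators are passed as the literal SQL escapes, not as control bytes.
    for terminator in '\r\n' '\n'; do
        rows=$("$loader" "$file" "$terminator")
        if [ "${rows:-0}" -gt 0 ]; then
            # Report which terminator worked and how many rows came in.
            printf '%s %s\n' "$terminator" "$rows"
            return 0
        fi
    done
    return 1   # neither terminator loaded any rows
}
```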
Answer by Tim
Why not first just take a peek at how the lines end?
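// Print the tail of the first few lines and flag which terminators they contain.
$handle = fopen('inputFile.csv', 'r');
$i = 0;
if ($handle) {
    while (($buffer = fgets($handle)) !== false) {
        $s = substr($buffer, -50);  // last 50 bytes of the line, terminator included
        echo $s;
        echo preg_match('/\r/', $s) ? 'cr ' : '-- ';
        echo preg_match('/\n/', $s) ? 'nl<br>' : '--<br>';
        if ($i++ > 5) {
            break;  // a handful of lines is enough to see the pattern
        }
    }
    fclose($handle);
}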
Answer by Ivan Abramenko
You can use LINES STARTING BY to distinguish ordinary line endings inside the text from the start of a new row:
LOAD DATA LOCAL INFILE '/home/laptop/Downloads/field3-utf8.csv'
IGNORE INTO TABLE Field FIELDS
TERMINATED BY ';'
OPTIONALLY ENCLOSED BY '^'
LINES STARTING BY '^'
TERMINATED BY '\r\n'
(Id, Form_id, Name, Value)
For ordinary CSV files that use " as the enclosing character, it would be:
...
LINES STARTING BY '"'
...