MySQL 从 CSV 文件加载数据,其中双引号用作转义字符

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/17053530/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-31 17:55:13  来源:igfitidea点击:

LOAD DATA from CSV file where doublequote was used as the escape character

mysqlsqlcsvload-data-infilerfc4180

提问by rampion

I have a bunch of CSV data that I need to load into a MySQL database. Well, CSV-ish, perhaps. (edit: actually, it looks like the stuff described in RFC 4180)

我有一堆 CSV 数据需要加载到 MySQL 数据库中。好吧,也许是 CSV 式的。(编辑实际上,它看起来像 RFC 4180 中描述的东西

Each row is a list of comma-separated doublequoted strings. To escape any doublequotes that appear within a column value, double doublequotes are used. Backslashes are allowed to represent themselves.

每行都是一个逗号分隔的双引号字符串列表。要转义列值中出现的任何双引号,请使用双双引号。允许反斜杠代表自己。

For example, the line:

例如,该行:

"", "\wave\", ""hello,"" said the vicar", "what are ""scare-quotes"" good for?", "I'm reading ""Bossypants"""

if parsed into JSON should be:

如果解析成 JSON 应该是:

[ "", "\wave\", "\"hello,\" said the vicar", "what are \"scare-quotes\" good for?", "I'm reading \"Bossypants\"" ]

I'm trying to use the LOAD DATAto read the CSV in, but I'm running into some weird behaviour.

我正在尝试使用LOAD DATA来读取 CSV,但我遇到了一些奇怪的行为。



As an example, consider if I have a simple two column table

例如,考虑我是否有一个简单的两列表

shell% mysql exampledb -e "describe person"
+-------+-----------+------+-----+---------+-------+
| Field | Type      | Null | Key | Default | Extra |
+-------+-----------+------+-----+---------+-------+
| ID    | int(11)   | YES  |     | NULL    |       |
| UID   | char(255) | YES  |     | NULL    |       |
+-------+-----------+------+-----+---------+-------+
shell%

If the first non-header line of my input file ends on "":

如果我的输入文件的第一个非标题行结束于""

shell% cat temp-1.csv
"ID","UID"
"9",""
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"10",""
"5","Simon"
"6","Sonny"
"7","Wat\"

I can either load every non-header line but the first:

我可以加载每个非标题行,但第一个:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 9 rows affected (0.00 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

Or I can load all lines including the header:

或者我可以加载包括标题在内的所有行:

mysql> DELETE FROM person;
Query OK, 9 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-1.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 11 rows affected, 1 warning (0.01 sec)
Records: 11  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    9 |                        |
|    0 | Steve the Pirate       |
|   10 |                        |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
11 rows in set (0.00 sec)

If no lines of my input file end on "":

如果我的输入文件没有任何行结束于""

shell% cat temp-2.csv
"ID","UID"
"0","Steve the Pirate"
"1","\Alpha"
"2","Hoban ""Wash"" Washburne"
"3","Pastor Veal"
"4","Tucker"
"5","Simon"
"6","Sonny"
"7","Wat\"

then I can either load no lines:

然后我可以不加载任何行:

mysql> DELETE FROM person;
Query OK, 11 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 1 LINES
       ;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Deleted: 0  Skipped: 0  Warnings: 0

mysql> SELECT * FROM person;
Empty set (0.00 sec)

Or I can load all the lines including the header:

或者我可以加载包括标题在内的所有行:

mysql> DELETE FROM person;
Query OK, 0 rows affected (0.00 sec)

mysql> LOAD DATA
          LOCAL INFILE 'temp-2.csv'
          INTO TABLE person
          FIELDS
            TERMINATED BY ','
            ENCLOSED BY '"'
            ESCAPED BY '"'
          LINES
            TERMINATED BY '\n'
          IGNORE 0 LINES
       ;
Query OK, 9 rows affected, 1 warning (0.03 sec)
Records: 9  Deleted: 0  Skipped: 0  Warnings: 1

mysql> show warnings;
+---------+------+--------------------------------------------------------+
| Level   | Code | Message                                                |
+---------+------+--------------------------------------------------------+
| Warning | 1366 | Incorrect integer value: 'ID' for column 'ID' at row 1 |
+---------+------+--------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM person;
+------+------------------------+
| ID   | UID                    |
+------+------------------------+
|    0 | UID                    |
|    0 | Steve the Pirate       |
|    1 | \Alpha                 |
|    2 | Hoban "Wash" Washburne |
|    3 | Pastor Veal            |
|    4 | Tucker                 |
|    5 | Simon                  |
|    6 | Sonny                  |
|    7 | Wat\                   |
+------+------------------------+
9 rows in set (0.00 sec)

So now that I've discovered many ways to do it wrong, how can I use LOAD DATAto import the data from these files into my database?

既然我发现了很多方法都做错了,我该如何LOAD DATA将这些文件中的数据导入到我的数据库中呢?

回答by rampion

According to the documentation for LOAD DATA, treating doubled double quotes as a double quote is the default:

根据的文档LOAD DATA,将双引号视为双引号是默认值

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

如果字段以 ENCLOSED BY 字符开头,则该字符的实例仅在后跟字段或行 TERMINATED BY 序列时才被识别为终止字段值。为避免歧义,字段值中 ENCLOSED BY 字符的出现次数可以加倍,并被解释为该字符的单个实例。例如,如果指定了 ENCLOSED BY '"',引号的处理如下所示:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

So all I need to do is disable interpreting \as an escape character, by using ESCAPED BY ''.

所以我需要做的就是禁用解释\为转义字符,通过使用ESCAPED BY ''.

LOAD DATA
  LOCAL INFILE 'temp-1.csv'
  INTO TABLE person
  FIELDS
    TERMINATED BY ','
    ENCLOSED BY '"'
    ESCAPED BY ''
  LINES
    TERMINATED BY '\n'
  IGNORE 1 LINES
;