Note: this page reproduces a popular StackOverflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/20402696/

Time: 2020-10-21 01:13:18  Source: igfitidea

Is it possible to turn off quote processing in the Postgres COPY command with CSV format?

Tags: postgresql, csv, import

Asked by Tom De Leu

I have CSV files, tab-separated, fields not wrapped in quotes, where field data can contain characters like single quotes, double quotes, pipes and backslashes.

Sample data can look like this:

1       2       "ba$aR\eR\       18

I want to import this data into Postgres using the COPY statement.

When I try to import this using

COPY <tablename> FROM <filename> NULL AS '';

I get the error "psql:-:1: ERROR: missing data for column", because Postgres treats the backslash + tab as an "escaped tab" instead of a backslash followed by the field separator.
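
To make the field-count mismatch concrete, here is a rough shell sketch using the sample row from the question. It only approximates text-format parsing by treating a backslash followed by a tab as data rather than as a delimiter; it is not how Postgres is implemented, and the variable names are made up for illustration.

```shell
# Sample row from the question: four tab-separated fields,
# the third of which ends in a backslash.
row=$(printf '1\t2\t"ba$aR\\eR\\\t18')

# Naive split on every tab: the layout the file's author intends.
naive=$(printf '%s\n' "$row" | awk -F'\t' '{ print NF }')

# Rough emulation (an assumption, not Postgres code): replace each
# backslash-tab pair with a placeholder so it no longer splits fields,
# then count the fields that remain.
escaped=$(printf '%s\n' "$row" | awk '{ gsub(/\\\t/, "X"); print split($0, parts, "\t") }')

echo "naive split: $naive fields, with backslash-tab escapes: $escaped fields"
```

The naive split should report 4 fields and the emulated one 3, which matches the "missing data for column" complaint: one delimiter has been swallowed as an escape.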

So I switched to using the "CSV format" of the COPY operator, like so:

COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' NULL AS '';

Now there's a new error: psql:-:1: ERROR: value too long for type character varying(254)

Apparently because it's interpreting the double-quote at the start of field 3 as the field wrapping character.

How can I specify that my data is NOT quoted at all?

Answered by Tom De Leu

Workaround (thanks to this comment!)

COPY <tablename> FROM <filename> WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS '';

So basically specifying a quote character that should never be in the text, but that's pretty ugly.

I'd much prefer it if there was in fact a way to turn off quote processing altogether.

Answered by Kyle Barron

(Added as a new answer since I don't have the reputation yet to comment.)

For the record, since I've been struggling with the same issue: you can use tr to remove \b, instead of just hoping it's not anywhere in your text.

tr -d '\010' < filename.csv > newfile.csv

(This relies on \010 being the octal representation of \b.)
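
As a quick sanity check of that tr invocation, the sketch below runs it on a made-up row (not data from the question) that contains one stray backspace, and compares the lengths before and after.

```shell
# Hypothetical input: one tab-separated row with a stray backspace
# (octal 010) in the middle of the second field.
dirty=$(printf 'a\tb\010c\td')

# tr -d deletes every occurrence of the listed byte;
# '\010' is the octal escape for backspace.
clean=$(printf '%s' "$dirty" | tr -d '\010')

# Exactly one character should disappear; the tabs are left alone.
printf 'before: %s chars, after: %s chars\n' "${#dirty}" "${#clean}"
```

Only the backspace is removed, so the tab delimiters and the number of fields stay the same.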

Since COPY supports reading from STDIN, you can ease the I/O impact by piping tr's output:

tr -d '\010' < filename.csv | psql -c "COPY <tablename> FROM STDIN WITH CSV DELIMITER E'\t' QUOTE E'\b' NULL AS ''"

Answered by nitro2k01

The mode you want to use for data formatted as you describe is the default text mode. It will pass most characters unhindered into the database. It does not have quote processing, and it's using tabs as delimiters. Using CSV mode will just cause you trouble because you're introducing quoting that you have to work around.

Text mode will pass dollar characters, single and double quotes, pipes, and even backspaces (even though that was not mentioned in the question) right in. The one thing in the example that's not passed through is backslashes. But that's as simple as escaping them, for example with this sed command:

sed -e 's/\\/\\\\/g' < source.txt > processed.txt

Then the processed file should be importable without any additional options:

\copy sometable from processed.txt
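
To see what the escaping step does before loading anything, this small sketch applies the same sed substitution to the sample row from the question. The variable names are illustrative only, and no database is involved.

```shell
# Sample row from the question: the third field contains two
# backslashes, one of them directly before the tab delimiter.
row=$(printf '1\t2\t"ba$aR\\eR\\\t18')

# Double every backslash so that text-format COPY reads each pair
# back as a single literal backslash.
processed=$(printf '%s\n' "$row" | sed -e 's/\\/\\\\/g')

printf '%s\n' "$processed"
```

In the processed row every backslash appears twice, so the backslash before the tab no longer escapes the delimiter and the row keeps its four fields.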