Note: this page is a Chinese-English translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/1347646/

Time: 2020-09-10 22:20:39  Source: igfitidea

Postgres error on insert - ERROR: invalid byte sequence for encoding "UTF8": 0x00

postgresql

Asked by ScArcher2

I get the following error when inserting data from mysql into postgres.

Do I have to manually remove all null characters from my input data? Is there a way to get postgres to do this for me?

ERROR: invalid byte sequence for encoding "UTF8": 0x00

Accepted answer by Magnus Hagander

PostgreSQL doesn't support storing NULL (0x00) characters in text fields (this is obviously different from the database NULL value, which is fully supported).

Source: http://www.postgresql.org/docs/9.1/static/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

If you need to store the NULL character, you must use a bytea field - which should store anything you want, but won't support text operations on it.

Given that PostgreSQL doesn't support it in text values, there's no good way to get it to remove it. You could import your data into bytea and later convert it to text using a special function (in perl or something, maybe?), but it's likely going to be easier to do that in preprocessing before you load it.
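
The preprocessing step mentioned above could look roughly like the following Python sketch. It is only an illustration: the `rows` list is made-up sample data, and `strip_nul` and `clean_row` are hypothetical helper names, not part of any PostgreSQL or MySQL client library.

```python
def strip_nul(value: str) -> str:
    """Return value with every U+0000 character removed."""
    return value.replace("\x00", "")


def clean_row(row):
    """Strip NUL characters from the string fields of one row; other types pass through."""
    return tuple(strip_nul(v) if isinstance(v, str) else v for v in row)


# Hypothetical rows as they might arrive from the MySQL side.
rows = [(1, "abc\x00def"), (2, None), (3, "clean")]
cleaned = [clean_row(r) for r in rows]
print(cleaned)
```

Running each row through `clean_row` before handing it to the insert statement keeps the database NULL values (`None` here) untouched while dropping the 0x00 characters.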

Answer by hicham

Just regex out null bytes:

s/\x00//g;

Answer by David Dal Busco

If you are using Java, you could just replace the \u0000 characters before the insert, like this:

myValue = myValue.replaceAll("\u0000", "");

The solution was provided and explained by Csaba in the following post:

https://www.postgresql.org/message-id/1171970019.3101.328.camel%40coppola.muc.ecircle.de

Quoting the relevant part:

in Java you can actually have a "0x0" character in your string, and that's valid unicode. So that's translated to the character 0x0 in UTF8, which in turn is not accepted because the server uses null terminated strings... so the only way is to make sure your strings don't contain the character '\u0000'.
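
The same point can be reproduced outside Java. The short Python sketch below (an illustration only; `contains_nul` is a hypothetical helper name) shows that a string holding U+0000 is valid Unicode and encodes to the single byte 0x00 in UTF-8, which is the byte the server rejects.

```python
text = "abc\u0000def"            # a valid Unicode string containing U+0000
encoded = text.encode("utf-8")   # U+0000 becomes the single byte 0x00


def contains_nul(value: str) -> bool:
    """Return True if the string holds at least one U+0000 character."""
    return "\x00" in value


print(encoded)
print(contains_nul(text))
```

A check like `contains_nul` can be used to reject or log offending values before they reach the insert.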

Answer by techkuz

Only this regex worked for me:

sed 's/\0//g'

So as you get your data, do this: $ get_data | sed 's/\\0//g' which will output your data without 0x00.

Answer by Raido

You can first insert the data into a blob (bytea) field and then copy it to a text field with the following function:

CREATE OR REPLACE FUNCTION blob2text() RETURNS void AS $$
DECLARE
    ref record;
    i integer;
BEGIN
    -- my_table stands in for the real table name
    FOR ref IN SELECT id, blob_field FROM my_table LOOP

        -- find each 0x00 byte and replace it with a space (0x20 = 32)
        i := position('\x00'::bytea in ref.blob_field);
        WHILE i > 0 LOOP
            -- position() is 1-based, set_byte() is 0-based
            ref.blob_field := set_byte(ref.blob_field, i - 1, 32);
            i := position('\x00'::bytea in ref.blob_field);
        END LOOP;

        UPDATE my_table SET field = encode(ref.blob_field, 'escape') WHERE id = ref.id;
    END LOOP;

END; $$ LANGUAGE plpgsql;

-- run the conversion
SELECT blob2text();

Answer by İsmail Yavuz

If you need to keep null characters in text fields and don't want to switch to a data type other than text, you can also follow my solution:

Before insert:

myValue = myValue.replaceAll("\u0000", "SomeVerySpecialText")

After select:

myValue = myValue.replaceAll("SomeVerySpecialText","\u0000")

I've used "null" as my SomeVerySpecialText, since I am sure that no "null" string will ever appear in my values.
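
As a rough illustration of this placeholder round trip, here is a Python sketch. The helper names `encode_nul` and `decode_nul` are hypothetical, and the guard against values that already contain the placeholder is an added assumption, since the scheme is only reversible when the marker never occurs in real data.

```python
PLACEHOLDER = "SomeVerySpecialText"  # marker name taken from the answer above


def encode_nul(value: str) -> str:
    """Swap each U+0000 for the placeholder before inserting into a text column."""
    if PLACEHOLDER in value:
        # Assumption: refuse ambiguous input instead of corrupting it on decode.
        raise ValueError("value already contains the placeholder")
    return value.replace("\x00", PLACEHOLDER)


def decode_nul(stored: str) -> str:
    """Restore the original U+0000 characters after selecting the value."""
    return stored.replace(PLACEHOLDER, "\x00")


original = "a\x00b"
stored = encode_nul(original)
print(stored)
print(decode_nul(stored) == original)
```

A short marker such as "null" is more likely to collide with real values than a long, unusual one, so the guard matters more the shorter the placeholder is.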

Answer by Steve Chávez

This kind of error can also happen when using COPY with an escaped string containing NULL values (00), such as:

"H\x00\x00\x00tj\xA8\x9E#D\x98+\xCA\xF0\xA7\xBBl\xC5\x19\xD7\x8D\xB6\x18\xEDJ\x1En"

If you use COPY without specifying format 'CSV', Postgres will assume format 'text' by default. The text format handles backslashes differently; see the COPY text format documentation.

If you're using COPY or file_fdw, make sure to specify format 'CSV' to avoid this kind of error.
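
For illustration, here is a small Python sketch of preparing such a value for a CSV load. It assumes the value holds literal backslash sequences (shortened from the sample above) and that the resulting file would be loaded with format 'CSV'; the `csv` module writes the backslashes through unchanged, so no real 0x00 byte ends up in the file.

```python
import csv
import io

# Literal backslashes followed by "x00", not actual NUL characters.
value = r"H\x00\x00\x00tj"

buffer = io.StringIO()
csv.writer(buffer).writerow([1, value])  # one row: id, escaped string
csv_line = buffer.getvalue()

print(repr(csv_line))
```

In format 'text', COPY would interpret the same `\x00` sequences as escapes, which is where the 0x00 bytes come from.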
