postgresql Postgres 中不存在函数 convert_from(character variables, unknown)

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/23659685/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-10-21 01:26:59  来源:igfitidea点击:

function convert_from(character varying, unknown) does not exist in Postgres

postgresql

提问by Mega4alik

When i'm trying to convert from unicode to utf8 in the code below "function convert_from(character varying, unknown) does not exist"error occurs.

当我试图从Unicode转换为UTF-8在下面的代码 “功能convert_from(字符改变,未知)不存在”发生错误。

select convert_from(artists, 'UTF8') from songs where     
to_tsvector('simple',convert_from(artists, 'UTF8')) 
  @@ plainto_tsquery('simple','alizee') 
limit 100

Column "artists" has "TEXT" type.

列“艺术家”具有“文本”类型。

But when I'm running

但是当我跑步的时候

select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');

it works well.

它运作良好。

How can I resolve this problem? I would appreciate any help. Thanks

我该如何解决这个问题?我将不胜感激任何帮助。谢谢

回答by klin

From documentation: convert_from(string bytea, src_encoding name). So cast artists to bytea:

文档convert_from(string bytea, src_encoding name)。所以将艺术家投给 bytea:

select convert_from(artists::bytea, 'UTF8') from songs where     
to_tsvector('simple',convert_from(artists, 'UTF8')) 
  @@ plainto_tsquery('simple','alizee') 
limit 100

回答by Craig Ringer

It looks to me like you've taken data in the form of 7-bit ascii with 2-byte unicode hex escapes (\uxxxx) and stored it in a varchartyped field.

在我看来,您已经以 7 位 ascii 的形式获取了带有 2 字节 unicode 十六进制转义符 ( \uxxxx) 的数据,并将其存储在varchar类型化字段中。

convert_fromis entirely the wrong way to handle that data. It isn't utf-8 except in that 7-bit ASCII is a subset of utf-8. If you forced it through convert_from(..., 'utf-8')you'd get exactly what you started with, blah\u0123etc.

convert_from是处理这些数据的完全错误的方式。它不是 utf-8,只是 7 位 ASCII 是 utf-8 的子集。如果你强迫它通过convert_from(..., 'utf-8')你会得到你开始的东西,blah\u0123等等。

The reason that select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');appearsto work is that the convert_fromdoes nothing. Here's what happens:

select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');似乎有效的原因是convert_from什么都不做。这是发生的事情:

  • PostgreSQL sees the literal value E'\u0422\u0438\u043c\u0430\u0442\u0438'and sees that convert_fromwants byteainput.

  • PostgreSQL parses the escape-string format of the string literal, decoding the unicode escapes to produce the utf-8 string Тимати. At this point it's as if you wrote:

    SELECT convert_from('Тимати', 'utf-8')
    
  • Because convert_fromwants byteainput, PostgreSQL implicitly casts the data to bytea, converting the text string to utf-8 encodied binary because that's your database server's text encoding. Now it's as if you wrote:

    SELECT convert_from( convert_to('Тимати','utf-8'), 'utf-8')
    

    which when the first conversion is done gets turned into:

    SELECT convert_from(BYTEA '\xd0a2d0b8d0bcd0b0d182d0b8', 'utf-8');
    
  • PostgreSQL的看到文字值E'\u0422\u0438\u043c\u0430\u0442\u0438'并看到convert_from希望bytea的输入。

  • PostgreSQL 解析字符串文字的转义字符串格式,解码 unicode 转义以生成 utf-8 字符串Тимати。在这一点上,就好像你写了:

    SELECT convert_from('Тимати', 'utf-8')
    
  • 因为convert_from想要bytea输入,PostgreSQL的隐式数据投射到bytea,转换文本字符串为UTF-8 encodied二进制,因为这是你的数据库服务器的文字编码。现在就好像你写道:

    SELECT convert_from( convert_to('Тимати','utf-8'), 'utf-8')
    

    当第一次转换完成时,它会变成:

    SELECT convert_from(BYTEA '\xd0a2d0b8d0bcd0b0d182d0b8', 'utf-8');
    

So effectively your convert_fromis a very slow and inefficient way to do nothing at all.

因此,您convert_from实际上是一种非常缓慢且低效的方法,什么都不做。

It doesn't work that way for a column value instead of a literal, because PostgreSQL will implicitly cast unknown-typed literals in ways it won't implicitly cast known-type column values like varcharcolumn. That's a type-safety thing.

它不适用于列值而不是文字,因为 PostgreSQL 将隐式转换未知类型的文字,它不会隐式转换已知类型的列值,如varcharcolumn.txt 。这是一个类型安全的事情。

So, to correctlyconvert your data, you need to decode those \uescapes. convert_fromis notwhat you want for that, because it's designed for converting binary representations of encoded text into the local database text encoding.

因此,要正确转换数据,您需要解码这些\u转义。convert_from不是你想要的,因为它是专为转换编码文本的二进制表示到本地数据库中的文本编码

PostgreSQL supports that for string literals, as we've determined by seeing what it's doing for E''literals. I'm currently looking for a function that exposes that decoding to the user, though. decode(...)doesn't support \uescapes, only octal escapes...

PostgreSQL 支持字符串字面量,正如我们通过查看它对E''字面量所做的事情而确定的那样。不过,我目前正在寻找一个向用户公开该解码的函数。decode(...)不支持\u转义,只支持八进制转义...