postgresql Postgres 中不存在函数 convert_from(character variables, unknown)
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/23659685/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
function convert_from(character varying, unknown) does not exist in Postgres
提问by Mega4alik
When i'm trying to convert from unicode to utf8 in the code below "function convert_from(character varying, unknown) does not exist"error occurs.
当我试图从Unicode转换为UTF-8在下面的代码 “功能convert_from(字符改变,未知)不存在”发生错误。
select convert_from(artists, 'UTF8') from songs where
to_tsvector('simple',convert_from(artists, 'UTF8'))
@@ plainto_tsquery('simple','alizee')
limit 100
Column "artists" has "TEXT" type.
列“艺术家”具有“文本”类型。
But when I'm running
但是当我跑步的时候
select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');
it works well.
它运作良好。
How can I resolve this problem? I would appreciate any help. Thanks
我该如何解决这个问题?我将不胜感激任何帮助。谢谢
回答by klin
From documentation: convert_from(string bytea, src_encoding name)
.
So cast artists to bytea:
从文档:convert_from(string bytea, src_encoding name)
。所以将艺术家投给 bytea:
select convert_from(artists::bytea, 'UTF8') from songs where
to_tsvector('simple',convert_from(artists, 'UTF8'))
@@ plainto_tsquery('simple','alizee')
limit 100
回答by Craig Ringer
It looks to me like you've taken data in the form of 7-bit ascii with 2-byte unicode hex escapes (\uxxxx
) and stored it in a varchar
typed field.
在我看来,您已经以 7 位 ascii 的形式获取了带有 2 字节 unicode 十六进制转义符 ( \uxxxx
) 的数据,并将其存储在varchar
类型化字段中。
convert_from
is entirely the wrong way to handle that data. It isn't utf-8 except in that 7-bit ASCII is a subset of utf-8. If you forced it through convert_from(..., 'utf-8')
you'd get exactly what you started with, blah\u0123
etc.
convert_from
是处理这些数据的完全错误的方式。它不是 utf-8,只是 7 位 ASCII 是 utf-8 的子集。如果你强迫它通过convert_from(..., 'utf-8')
你会得到你开始的东西,blah\u0123
等等。
The reason that select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');
appearsto work is that the convert_from
does nothing. Here's what happens:
select convert_from(E'\u0422\u0438\u043c\u0430\u0442\u0438', 'UTF8');
似乎有效的原因是convert_from
什么都不做。这是发生的事情:
PostgreSQL sees the literal value
E'\u0422\u0438\u043c\u0430\u0442\u0438'
and sees thatconvert_from
wantsbytea
input.PostgreSQL parses the escape-string format of the string literal, decoding the unicode escapes to produce the utf-8 string
Тимати
. At this point it's as if you wrote:SELECT convert_from('Тимати', 'utf-8')
Because
convert_from
wantsbytea
input, PostgreSQL implicitly casts the data tobytea
, converting the text string to utf-8 encodied binary because that's your database server's text encoding. Now it's as if you wrote:SELECT convert_from( convert_to('Тимати','utf-8'), 'utf-8')
which when the first conversion is done gets turned into:
SELECT convert_from(BYTEA '\xd0a2d0b8d0bcd0b0d182d0b8', 'utf-8');
PostgreSQL的看到文字值
E'\u0422\u0438\u043c\u0430\u0442\u0438'
并看到convert_from
希望bytea
的输入。PostgreSQL 解析字符串文字的转义字符串格式,解码 unicode 转义以生成 utf-8 字符串
Тимати
。在这一点上,就好像你写了:SELECT convert_from('Тимати', 'utf-8')
因为
convert_from
想要bytea
输入,PostgreSQL的隐式数据投射到bytea
,转换文本字符串为UTF-8 encodied二进制,因为这是你的数据库服务器的文字编码。现在就好像你写道:SELECT convert_from( convert_to('Тимати','utf-8'), 'utf-8')
当第一次转换完成时,它会变成:
SELECT convert_from(BYTEA '\xd0a2d0b8d0bcd0b0d182d0b8', 'utf-8');
So effectively your convert_from
is a very slow and inefficient way to do nothing at all.
因此,您convert_from
实际上是一种非常缓慢且低效的方法,什么都不做。
It doesn't work that way for a column value instead of a literal, because PostgreSQL will implicitly cast unknown-typed literals in ways it won't implicitly cast known-type column values like varchar
column. That's a type-safety thing.
它不适用于列值而不是文字,因为 PostgreSQL 将隐式转换未知类型的文字,它不会隐式转换已知类型的列值,如varchar
column.txt 。这是一个类型安全的事情。
So, to correctlyconvert your data, you need to decode those \u
escapes. convert_from
is notwhat you want for that, because it's designed for converting binary representations of encoded text into the local database text encoding.
因此,要正确转换数据,您需要解码这些\u
转义。convert_from
是不是你想要的,因为它是专为转换编码文本的二进制表示到本地数据库中的文本编码。
PostgreSQL supports that for string literals, as we've determined by seeing what it's doing for E''
literals. I'm currently looking for a function that exposes that decoding to the user, though. decode(...)
doesn't support \u
escapes, only octal escapes...
PostgreSQL 支持字符串字面量,正如我们通过查看它对E''
字面量所做的事情而确定的那样。不过,我目前正在寻找一个向用户公开该解码的函数。decode(...)
不支持\u
转义,只支持八进制转义...