显示来自 oracle varchar2 的字符串的十六进制值?
声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow
原文地址: http://stackoverflow.com/questions/18701984/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me):
StackOverFlow
Displaying the hex value of a string from a oracle varchar2?
提问by Ray Kiddy
We are having problems with text that is encoded in some different ways but kept in a single column in a table. Long story. On MySQL, I can do "select hex(str) from table where" and I see the bytes of the string exactly as I set them.
我们遇到了以某些不同方式编码但保存在表格中的单个列中的文本问题。很长的故事。在 MySQL 上,我可以执行“select hex(str) from table where”,我看到的字符串字节与我设置的完全一样。
On Oracle, I have a string which starts with the Turkish character ?, which is the Unicode character 0x0130 "LATIN CAPITAL LETTER WITH DOT ABOVE". This is in my printed copy of the Unicode Version 2.0 book. In UTF-8, this character is 0xc4b0.
在 Oracle 上,我有一个以土耳其语字符 ? 开头的字符串,它是 Unicode 字符 0x0130“带有点上方的拉丁文大写字母”。这是我印刷的 Unicode 2.0 版书籍。在 UTF-8 中,这个字符是 0xc4b0。
We have very old client apps we need to support. They would send us this text in "windows-1254". We used to just close our eyes, store it, and hand it back later. Now we need the Unicode, or are being given the Unicode.
我们需要支持非常旧的客户端应用程序。他们会在“windows-1254”中向我们发送此文本。我们过去只是闭上眼睛,将它存放起来,然后再交还。现在我们需要Unicode,或者正在被赋予Unicode。
So I have:
所以我有:
SQL> select id, name from table where that thing;
ID NAME
------ ------------------------
746 Y
This makes sense because the "?" is 0xdd in windows-1254 and 0xdd in wondows-1252 is "Y". My terminal is presumably set to the usual windows-1252.
这是有道理的,因为“?” 在 windows-1254 中为 0xdd,而在 windows-1252 中为 0xdd 为“Y”。我的终端大概设置为通常的 windows-1252。
But:
但:
SQL> select id, rawtohex(name) from table where that thing;
ID RAWTOHEX(NAME)
------ ------------------------
746 C39D
There seems to be no equivalent to the hex(name) function in MySQL. But I mustbe missing something. What am I missing here?
MySQL 中似乎没有与 hex(name) 函数等效的函数。但我一定错过了一些东西。我在这里缺少什么?
My java code has to take the utf8 that I am supplied and save a utf8 copy and a windows-1252 copy. The java code gives me:
我的 java 代码必须采用我提供的 utf8 并保存一个 utf8 副本和一个 windows-1252 副本。java代码给了我:
bytes (utf8): c4 b0
bytes (1254): dd
Yet, when I save it, the client does not get the correct character. And when I try to see what Oracle has actually stored, i get the garbage seen above. I have noidea where the C39D is coming from. Any suggestions?
然而,当我保存它时,客户端没有得到正确的字符。当我尝试查看 Oracle 实际存储的内容时,我得到了上面看到的垃圾。我不知道 C39D 是从哪里来的。有什么建议?
We have ojdbc14.jar built into all of our applications and we are connecting to a database that says it is "Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production".
我们在所有应用程序中都内置了 ojdbc14.jar,并且我们正在连接到一个数据库,上面写着“Oracle Database 11g Enterprise Edition Release 11.2.0.2.0 - 64bit Production”。
回答by Vincent Malgrat
Use the dump
function to see how Oracle stores data internally.
使用该dump
函数可以查看 Oracle 如何在内部存储数据。
You seem to have a misunderstanding on how Oracle treats VARCHAR2
characters set conversions: you can't influence how Oracle stores its data physically. (Also if you haven't already, it's helpful to read: The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets).
您似乎对 Oracle 如何处理VARCHAR2
字符集转换存在误解:您无法影响 Oracle 如何物理存储其数据。(另外,如果您还没有,阅读:每个软件开发人员绝对、肯定必须了解 Unicode 和字符集的绝对最低要求会很有帮助。)
Your client speaks to Oracle only in binary. In fact all systems exchange information in binary only. To understand each others, it is necessary that both systems know what language (character set) is being used.
您的客户仅以二进制形式与 Oracle 对话。事实上,所有系统都只以二进制形式交换信息。为了相互理解,两个系统都必须知道正在使用什么语言(字符集)。
In your case we can reconstruct what happens:
在您的情况下,我们可以重建发生的情况:
- Your client sends the byte
dd
to Oracle and says it iswindows-1252
(instead of1254)
. - Oracle looks up its character set table and sees that this data is translated to the symbol
Y
in this character set. - Oracle logicallystores this information in its table.
Since Oracle is setup in
UTF-8
, it converts this data to theUTF-8
binary reprensentation ofY
:SQL> SELECT rawtohex('Y') FROM dual; RAWTOHEX('Y') -------------- C39D
Oracle stores
C39D
internally.
- 您的客户端将字节发送
dd
给 Oracle 并说它是windows-1252
(而不是1254)
. - Oracle 查找其字符集表并看到此数据已转换
Y
为此字符集中的符号。 - Oracle 在逻辑上将此信息存储在其表中。
由于 Oracle 是在 中设置的
UTF-8
,它会将这些数据转换为 的UTF-8
二进制表示Y
:SQL> SELECT rawtohex('Y') FROM dual; RAWTOHEX('Y') -------------- C39D
Oracle 在
C39D
内部存储。
As you can see, the problem comes from the first step: there is a problem of setup. As long as you don't fix this, the systems won't be able to successfully dialogue.
可以看到,问题出在第一步:设置有问题。只要你不解决这个问题,系统就无法成功对话。
The conversion is automaticwhen you use VARCHAR2
because this datatype is a logical text symbol interface (you have next to no control over forcing the actual binary data being stored).
使用时转换是自动的,VARCHAR2
因为此数据类型是逻辑文本符号接口(您几乎无法控制强制存储的实际二进制数据)。
回答by Ray Kiddy
I have bytes in UTF-8 to begin.
我有 UTF-8 字节开始。
String strFromUTF8 = new String(bytes, "UTF8");
byte[] strInOldStyle = strFromUTF8.getBytes("Cp1254");
With MySQL, I am done. I takes these bytes, turn them into a hex string and do an update with unhex(hexStr). This allows me to put the legacy bytes into a varchar column.
有了 MySQL,我就完成了。我获取这些字节,将它们转换为十六进制字符串并使用 unhex(hexStr) 进行更新。这允许我将旧字节放入 varchar 列中。
With Oracle, I must do:
使用 Oracle,我必须这样做:
String again = new String(strInOldStyle, "Cp1254");
byte[] nextOldBytes = again.getBytes("UTF8");
Now, I can do an update and get the bytes into a varchar2 column with:
现在,我可以进行更新并将字节放入 varchar2 列中:
update table set colName = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('hexStr')) where ...
Strange, no? I am sure I have made this more complex than it needed to be.
很奇怪吧?我相信我已经使这比它需要的更复杂。
What we see is this, though,
然而我们看到的是
"?" in UTF-8 == 0xc4d0
"?" in Cp1254 == 0xdd == "Y" in Cp1252
"Y" in UTF-8 == 0xc3d9
So, if I get the string "?" and do:
那么,如果我得到字符串“?” 并做:
update table set name = UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C3D9')) where ...
Then our legacy client gives us a "?". Yep. It works.
然后我们的旧客户给了我们一个“?”。是的。有用。