原文地址: http://stackoverflow.com/questions/1222529/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
Stack Overflow
Trouble using/displaying special characters from Oracle db in .Net app
Asked by RobLinx
I have a C#.Net application that accesses data from a commercial application backed by an Oracle 10 db. A couple of fields in the commercial app's database (declared as varchar2(n)) contain special characters. The "smart quote" apostrophe, for example. The commercial client app displays these characters correctly, but my application is displaying them as an inverted question mark. The Oracle character set is "WE8ISO8859P1".
My application reads the commercial database using System.Data.OracleClient.OracleDataAdapter, and the result is converted into a table via DataSet.Tables. The table rows are converted into objects, and the fields in question are stored as strings.
If I examine (in the debugger) the data in the DataSet immediately after reading it from the db, the special characters are already displayed incorrectly. I can't figure out how to examine the data as hex bytes to see what's really there, nor am I certain what I should be looking for.
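For readers with the same question about inspecting the data, here is a minimal sketch of one way to look at a .NET string as hex code points rather than glyphs. The class and method names (`StringInspector`, `ToCodePoints`) are hypothetical and not part of the original post. Note that it shows what the string holds after the provider has already decoded it, not the bytes stored in Oracle.

```csharp
using System;
using System.Linq;

public static class StringInspector
{
    // Hypothetical helper: renders each UTF-16 code unit of the string
    // as a four-digit hex value so invisible or substituted characters stand out.
    public static string ToCodePoints(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        return string.Join(" ", value.Select(c => ((int)c).ToString("X4")));
    }
}
```

An inverted question mark shows up as 00BF. If that value is already present right after the DataSet is filled, the substitution most likely happened before the data reached the application.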
I have also noted that Toad displays the characters as inverted question marks as well.
One aspect of our application writes these records to a separate table in our own database; when that occurs the special characters get modified, and subsequently display as boxes instead of upside-down question marks.
I can provide further information if needed. Thank you for any and all help!
Accepted answer by RobLinx
Postscript for anyone browsing this thread:
Bogdan was very helpful in getting me to the "answer" (such as it is) but as he points out, you might not have identical circumstances.
We communicated with the team responsible for using the commercial software. They had been copying/pasting from Word and Excel, which is how the special characters had been getting inserted.
The problem occurred in the translation of the characters between the remote database and our database. The host database uses character set WE8ISO8859P1, whereas ours uses WE8MSWIN1252. Due to corporate-level concerns, modifying either character set is not feasible right now.
I used SYS.UTL_RAW.CAST_TO_RAW(fieldname) to convert the source field to raw bytes, then searched for 'BF' (the hex code for an inverted question mark in our character set). This at least let me identify the problem record and character. HOWEVER, many different special characters on the remote records would/could be translated to BF. For example, Word's hyphens are not simple "dash" characters, and also get translated to the inverted question mark.
dump(fieldname) somehow converts to decimal character codes BEFORE the translation, UNLESS I also used the SYS.UTL_RAW.CAST_TO_RAW in the same query. This caused amazing headaches. dump() by itself could be useful in identifying specific pretranslated characters from the source db.
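The 'BF' search described above can also be sketched on the application side. This is only an illustration under assumptions: it supposes the raw value has reached the application as a hex string (two hex digits per byte), and the names `RawHexScanner` and `FindByteOffsets` are hypothetical. It compares whole bytes, because a plain substring search for "BF" would also match across a byte boundary, for example inside "ABF0".

```csharp
using System;
using System.Collections.Generic;

public static class RawHexScanner
{
    // Hypothetical helper: returns the zero-based byte offsets at which
    // targetByte occurs in a hex string, checking one aligned byte at a time.
    public static List<int> FindByteOffsets(string hex, byte targetByte)
    {
        if (hex == null) throw new ArgumentNullException("hex");
        if (hex.Length % 2 != 0)
            throw new ArgumentException("Hex string must have an even number of digits.", "hex");

        var offsets = new List<int>();
        for (int i = 0; i < hex.Length; i += 2)
        {
            // Parse two hex digits as one byte value.
            byte b = Convert.ToByte(hex.Substring(i, 2), 16);
            if (b == targetByte) offsets.Add(i / 2);
        }
        return offsets;
    }
}
```

For example, `RawHexScanner.FindByteOffsets("4974BF73", 0xBF)` reports offset 2, while "ABF0" reports no match.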
Best solution would be to use the same character set on both dbs. Since that's not possible for us, we have manually replaced all occurrences of the special character on the source (remote) db with non-special equivalents (regular apostrophe or hyphen). However, since the commercial software doesn't correct or flag special characters, we may run into this problem in the future. So, our update application will scan for the inverted question mark and send a notification to the system owner with the ID of the bad record. This, like so many other corporate situations, will have to do. ;-)
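The scan step could look roughly like the sketch below. It is not the code from the original post: the record shape (an integer ID paired with a text field) and the names `BadRecordScanner` and `FindSuspectIds` are assumptions made for illustration, and the notification itself is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BadRecordScanner
{
    // U+00BF is the inverted question mark that the character set
    // conversion substitutes for characters it cannot map.
    private const char InvertedQuestionMark = '\u00BF';

    // Hypothetical helper: returns the IDs of records whose text
    // contains the substitution character.
    public static List<int> FindSuspectIds(IEnumerable<KeyValuePair<int, string>> records)
    {
        if (records == null) throw new ArgumentNullException("records");
        return records
            .Where(r => r.Value != null && r.Value.IndexOf(InvertedQuestionMark) >= 0)
            .Select(r => r.Key)
            .ToList();
    }
}
```

One caveat with this approach: text that legitimately contains an inverted question mark (Spanish, for instance) would be flagged as well, so the result is a list of records to review rather than a list of confirmed errors.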
Thanks again, Bogdan!
Answer by Bogdan_Ch
Certain characters in the WE8ISO8859P1 character set have a different binary representation than the same character in UTF8.
I suggest two possible approaches:
1) Try using Oracle's native data provider for .NET (ODP.NET). There may be a bug/feature in Microsoft's System.Data.OracleClient library such that this adapter does not automatically support converting WE8ISO8859P1 to Unicode. Here is a link to ODP.NET
I hope that ODP supports this encoding (but to be honest I never checked this, it is only a suggestion).
2) Workaround: in the DataSet, create a binary field (mapped to the original table field) and a String field (not mapped to the database). When you load data into the DataSet, iterate over each row and perform the conversion from byte array to string.
Code should be something like this
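// Requires: using System.Data; using System.Text;
Encoding e = Encoding.GetEncoding("iso-8859-1");
foreach (DataRow row in dataset.Tables["MyTable"].Rows)
{
    // Decode the raw bytes into the unmapped string column.
    if (!row.IsNull("MyByteArrayField"))
        row["MyStringField"] = e.GetString((byte[])row["MyByteArrayField"]);
}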
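One possible refinement, offered as a sketch rather than as part of the original answer: characters pasted from Word are usually Windows-1252 punctuation, whose byte values (0x80 to 0x9F) are control codes in ISO-8859-1. If the stored bytes really are Windows-1252 values (an assumption that should be checked against the actual data), decoding them as iso-8859-1 yields invisible control characters instead of quotes and dashes. The hypothetical helper below decodes as ISO-8859-1 and then remaps a handful of those control codes to the punctuation Windows-1252 assigns to them. The table is deliberately partial.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Cp1252Fallback
{
    // Partial map from ISO-8859-1 C1 control codes to the characters
    // that Windows-1252 assigns to the same byte values.
    private static readonly Dictionary<char, char> C1ToPunctuation = new Dictionary<char, char>
    {
        { '\u0085', '\u2026' }, // horizontal ellipsis
        { '\u0091', '\u2018' }, // left single quotation mark
        { '\u0092', '\u2019' }, // right single quotation mark (smart apostrophe)
        { '\u0093', '\u201C' }, // left double quotation mark
        { '\u0094', '\u201D' }, // right double quotation mark
        { '\u0096', '\u2013' }, // en dash
        { '\u0097', '\u2014' }  // em dash
    };

    // Hypothetical helper: decodes bytes as ISO-8859-1, then replaces the
    // mapped control codes with their Windows-1252 punctuation.
    public static string Decode(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException("bytes");
        string latin1 = Encoding.GetEncoding("iso-8859-1").GetString(bytes);

        var result = new StringBuilder(latin1.Length);
        foreach (char c in latin1)
        {
            char mapped;
            result.Append(C1ToPunctuation.TryGetValue(c, out mapped) ? mapped : c);
        }
        return result.ToString();
    }
}
```

In the loop above, `Cp1252Fallback.Decode((byte[])row["MyByteArrayField"])` could stand in for the `e.GetString(...)` call. Remapping by hand avoids depending on a Windows-1252 encoding object, which is not available by default on every .NET runtime.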