Original question: http://stackoverflow.com/questions/923876/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Why are my Chinese characters not displayed correctly in a C# string
Asked by Sean
I am storing Chinese and English text in an SQL Server 2005 database and displaying it on a webpage, but the Chinese is not being displayed correctly. I have been reading about the subject and have done the following:
- used N before the text in my INSERT statement
- set the field type to nvarchar
- set the charset of the page to UTF-8
Chinese characters are displayed correctly when I put them directly in the page, i.e. when they do not come from the database.
These are the characters that should be displayed: 全澳甲流确诊病例已破100
This is what is displayed when the text is retrieved from the database: ?…¨??3?”2?μ???èˉ??—…????·2? ′1001
This seems to be related to how strings are handled in C#, because the Chinese text is retrieved and displayed correctly in classic ASP.
Is there anything else I need to do to get the data out of the database, into a string and output correctly on an aspx page?
Answered by Will Charczuk
This is definitely an encoding problem at some point in the round trip from the database to the C# string, but from the sound of it you're doing everything correctly.
For our database we store Unicode data in NVARCHAR() columns and then read them out into normal C# strings; no text encoding changes were necessary. What kind of data objects are you using (e.g. DataSets, just a DataReader, LINQ to SQL)?
In our application we read the results of the stored procedure using FetchDataSet, and then do a DataBinder.Eval() to assign the string that is eventually the text of a label.
Answered by russau
How are the characters getting into the database? Are you entering them via a stored proc? Make sure the parameters on your stored proc are nvarchar, and that the parameters on the command object you call the proc from are nvarchar as well.
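To make the point about command-object parameters concrete, here is a minimal sketch of a parameterized INSERT, not the asker's actual code. The table and column names (TestTable, zhtext) and the length of 100 are assumptions for illustration, and System.Data.SqlClient is assumed to be available (it ships with .NET Framework; newer .NET needs it added as a package).

```csharp
using System.Data;
using System.Data.SqlClient;

public static class UnicodeInsert
{
    // Builds a parameterized INSERT. TestTable, zhtext and the size 100 are
    // hypothetical; adjust them to the real schema.
    public static SqlCommand BuildInsert(SqlConnection connection, string text)
    {
        SqlCommand command = new SqlCommand(
            "INSERT INTO TestTable (zhtext) VALUES (@zhtext)", connection);

        // Declaring the parameter as NVarChar sends the value as Unicode,
        // so no N'...' literal or string concatenation is needed.
        command.Parameters.Add("@zhtext", SqlDbType.NVarChar, 100).Value = text;
        return command;
    }
}
```

With an open connection, calling ExecuteNonQuery() on the returned command performs the insert. Declaring the parameter type explicitly avoids relying on the provider to infer it.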
Update: the consensus on the thread is that the database doesn't have properly encoded NVARCHAR content. Here's my latest theory: the database holds the UTF-8 bytes. Classic ASP outputs these bytes untouched, whereas ASP.NET takes the UTF-8 bytes and interprets them as single-byte characters.
Try getting the bytes out of the database and decoding them as UTF-8, e.g.:
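// Note: the cast succeeds only if zhtext comes back as binary data (e.g. varbinary);
// an nvarchar column is returned as a string and the cast would throw.
SqlCommand command = new SqlCommand("SELECT zhtext FROM TestTable", connection);
byte[] byteArray = (byte[])command.ExecuteScalar();
lblText.Text = Encoding.UTF8.GetString(byteArray);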
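The theory above can be checked without a database. The following is a small sketch that reproduces the suspected fault and reverses it. It assumes ISO-8859-1 as the single-byte encoding because it maps every byte to exactly one character; the encoding actually involved on the asker's system may differ (for example a Windows code page), so treat the class and method names as illustrative only.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same value,
    // so it round-trips arbitrary bytes without loss (an assumption of this sketch).
    private static readonly Encoding SingleByte = Encoding.GetEncoding("ISO-8859-1");

    // Simulates the suspected fault: UTF-8 bytes read as one character per byte.
    public static string Garble(string text)
    {
        return SingleByte.GetString(Encoding.UTF8.GetBytes(text));
    }

    // Reverses it: turn each character back into its byte, then decode as UTF-8.
    public static string Repair(string garbled)
    {
        return Encoding.UTF8.GetString(SingleByte.GetBytes(garbled));
    }

    public static void Main()
    {
        string original = "全澳甲流确诊病例已破100";
        string garbled = Garble(original);
        Console.WriteLine(garbled.Length);              // one char per UTF-8 byte
        Console.WriteLine(Repair(garbled) == original); // True
    }
}
```

If a string read from the database comes back intact after a Repair-style conversion, that would support the idea that the column holds UTF-8 bytes stored one byte per character.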
Answered by Ali Shafai
Have you installed the "support for eastern languages" in Windows? Is it XP? If so, your data might be fine and SQL Server Management Studio simply doesn't show it properly. (All TrueType fonts display OK even without the "support for Chinese", but system fonts don't.)
Answered by yinyueyouge
So far the information is:
- You are using a direct SQL INSERT script to insert into the database.
- The data appears broken in the database.
The problem might lie in two places:
In your INSERT statement, did you prefix the insert value with N?
INSERT INTO #tmp VALUES (N'全澳甲流确诊病例已破100')
If you prefix the value with N, does the String object hold the correct data?
String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";
Here I assume value is a String object.
Does this String object hold the correct Chinese characters?
Try printing out its value to check.
Update:
Let's assume the INSERT query is constructed as below:
String sql = "INSERT INTO #tmp VALUES (N'" + value + "')";
I assume value holds the Chinese characters.
Did you assign the Chinese characters to value directly, like this?
String value = "全澳甲流确诊病例已破100";
The above code should work. However, any intermediate processing of the string can cause problems.
I once worked on a localized TC project where the previous architect had added several encoding conversions that were necessary in ASP but cause problems in .NET:
String value = "全澳甲流确诊病例已破100";
Encoding tc = Encoding.GetEncoding("BIG5");
byte[] bytes = tc.GetBytes(value);
value = Encoding.Unicode.GetString(bytes);
The above conversions are unnecessary. In .NET, a simple direct assignment works:
String value = "全澳甲流确诊病例已破100";
That is because String constants and the String object itself are Unicode compliant.
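To illustrate why such a conversion corrupts the text, here is a minimal sketch that encodes a string with one encoding and decodes the bytes with another. It swaps in UTF-8 for BIG5 as an assumption, because BIG5 may not be available on newer .NET runtimes unless the code pages provider is registered; the failure mode is the same. The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class EncodingMismatchDemo
{
    // Encodes with one encoding and decodes with another,
    // mirroring the shape of the legacy conversion.
    public static string Reinterpret(string text, Encoding encodeWith, Encoding decodeWith)
    {
        byte[] bytes = encodeWith.GetBytes(text);
        return decodeWith.GetString(bytes);
    }

    public static void Main()
    {
        string value = "全澳甲流确诊病例已破100";

        // Bytes produced by one encoding, read back as UTF-16: the text is lost.
        string mismatched = Reinterpret(value, Encoding.UTF8, Encoding.Unicode);

        // Same encoding on both sides: the text survives.
        string matched = Reinterpret(value, Encoding.UTF8, Encoding.UTF8);

        Console.WriteLine(mismatched == value); // False
        Console.WriteLine(matched == value);    // True
    }
}
```

The string only survives when the bytes are decoded with the encoding that produced them, which is why leaving the String untouched is the safer choice.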
When framework libraries such as File IO read a file that is not encoded in Unicode, they convert the foreign encoding to Unicode; in other words, the framework does this dirty job for you. Most of the time you do not need to perform manual encoding conversion.
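As a rough illustration of the framework doing the conversion, the sketch below writes the same text to a temporary file in several encodings and reads it back into an ordinary string each time. The helper name RoundTrip is hypothetical, and the choice of encodings is only an example.

```csharp
using System;
using System.IO;
using System.Text;

public static class FileDecodingDemo
{
    // Writes text in the given encoding, then reads it back as a .NET string.
    public static string RoundTrip(string text, Encoding fileEncoding)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, fileEncoding);
            // The reader decodes the file bytes into the string's Unicode form.
            return File.ReadAllText(path, fileEncoding);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string value = "全澳甲流确诊病例已破100";
        Console.WriteLine(RoundTrip(value, Encoding.UTF8) == value);             // True
        Console.WriteLine(RoundTrip(value, Encoding.Unicode) == value);          // True
        Console.WriteLine(RoundTrip(value, Encoding.BigEndianUnicode) == value); // True
    }
}
```

Whatever the on-disk encoding, the caller ends up with the same String and never has to touch the bytes.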
Update: I understand that ASP is used to insert the data into SQL Server.
I have written a small piece of ASP that inserts some Chinese characters into a SQL Server database, and it works.
I have a database named "trans" and I created a table "temp" inside. The ASP page is encoded in UTF-8.
<html>
<head>
<title>Untitled</title>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
<script language="vbscript" runat="server">
If Request.Form("Button1") = "Submit" Then
SqlQuery = "INSERT INTO trans..temp VALUES (N'" + Request.Form("Text1") + "')"
Set cn = Server.CreateObject("ADODB.Connection")
cn.Provider = "sqloledb"
cn.Properties("Data Source").Value = *********
cn.Properties("Initial Catalog").Value = "TRANS"
cn.Properties("User ID").Value = "sa"
cn.Properties("Password").Value = **********
cn.Properties("Persist Security Info").Value = False
cn.Open
cn.Execute(SqlQuery)
cn.Close
Set cn = Nothing
Response.Write SqlQuery
End If
</script>
<form name="form1" method="post" action="input.asp">
<input name="Text1" type="text" />
<input name="Button1" value="Submit" type="submit" />
</form>
</body>
</html>
The table is defined as follows in my database:
create table temp (data NVARCHAR(100))
After submitting the ASP page several times, my table contains proper Chinese data:
select * from trans..temp
data
----------------
test
测试
全澳甲流确诊病例已破100
Hope this helps.
Answered by devio
My summary of the thread:
- characters displayed correctly in ASP
- characters displayed garbled in SSMS
- characters displayed garbled in ASP.Net
Conclusion: the data in the database is not encoded correctly, and you need to migrate it to Unicode to work with it in C#, just as Ryan sketched.