Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/942277/

Time: 2020-08-06 03:52:58  Source: igfitidea

MySQL C# Text Encoding Problems

Tags: c#, mysql, unicode, utf-8

Asked by Peter

I have an old MySQL database with encoding set to UTF-8. I am using the ADO.NET Entity Framework to connect to it.

The strings that I retrieve from it contain strange characters where characters like ë are expected.

For example: "ë" is "Ã«".

I thought I could get this right by converting from UTF8 to UTF16.

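    // Requires: using System.Text;
    // The method name is assumed; the original snippet omitted its signature.
    private static string Utf8ToUtf16(string utf8)
    {
        return Encoding.Unicode.GetString(
            Encoding.Convert(
                Encoding.UTF8,
                Encoding.Unicode,
                Encoding.UTF8.GetBytes(utf8)));
    }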

This, however, doesn't change a thing.

How could I get the data from this database in proper form?

Answered by RobV

Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:

  1. Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8.
  2. Whenever you insert anything into the field, you must prefix it with the N character to indicate Unicode data, as shown in the examples below.
  3. Whenever you select based on Unicode data, ensure you use the N prefix again.

MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");

MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");

If the database wasn't configured correctly, or the data was inserted without using the N prefix, it won't be possible to get the correct data out, since it will have been downcast into the Latin-1/ASCII character set.

Answered by erenon

Try setting the encoding with a "SET NAMES utf8" query. You can set this parameter in the MySQL config too.

Answered by JJJ

As others have said, this could be a database issue, but it could also be caused by using an old version of the .NET MySQL connector.

What I actually wanted to comment on was the UTF-8 to UTF-16 conversion. The string you are trying to convert is actually already Unicode encoded, so your "Ã«" characters actually take up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that, I think you would have to encode your utf8 string as an old-style 1-byte-per-character string, using a codepage where the byte values of Ã and « actually represent the UTF-8 byte sequence of ë, and then treat the bytes of this new string as a UTF-8 string. Fun stuff.
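
The re-encoding idea described above can be sketched roughly as follows. This is only an illustration, not the answerer's own code: the class and method names (MojibakeDemo, RoundTrip, Repair) are made up for this example, and it assumes the text was mis-decoded as ISO-8859-1 (Latin-1). If the bytes were actually read as a different single-byte codepage such as Windows-1252, that codepage would have to be used instead. The sample input is the two-character string U+00C3 U+00AB, whose Latin-1 byte values 0xC3 0xAB are the UTF-8 encoding of U+00EB.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // The conversion from the question: string -> UTF-8 bytes -> UTF-16 bytes -> string.
    // It only re-encodes the same characters, so the result equals the input.
    public static string RoundTrip(string text)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);
        return Encoding.Unicode.GetString(utf16Bytes);
    }

    // Assumption: the garbled text came from UTF-8 bytes that were decoded as ISO-8859-1.
    // Encoding it back to single bytes recovers the original UTF-8 byte sequence,
    // which is then decoded properly.
    public static string Repair(string garbled)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1");
        byte[] originalBytes = latin1.GetBytes(garbled);
        return Encoding.UTF8.GetString(originalBytes);
    }

    public static void Main()
    {
        string garbled = "\u00C3\u00AB"; // two characters, byte values 0xC3 0xAB in Latin-1

        Console.WriteLine(RoundTrip(garbled) == garbled); // True: the round trip changes nothing
        Console.WriteLine(Repair(garbled) == "\u00EB");   // True: a single character is recovered
    }
}
```

A repair like this only helps with strings that are already garbled; fixing the connection or column character set is still what prevents the problem in the first place.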

Answered by satnhak

There are two things that you need to do to support UTF-8 in the ADO.NET Entity Framework (or in general when using the MySQL .NET Connector):

  1. Ensure that the collation of your database or table is a UTF-8 collation (i.e. utf8_general_ci or one of its relations).
  2. Add Charset=utf8; to your connection string.

    "Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
    
I'm not certain, but the encoding may be case sensitive; I found that CharSet=UTF8; did not work for me.

Answered by f.n174

Thank you, The Mouth of a Cow, your solution works, but we still need to convert characters. I think this is your problem :) For converting characters you can use this code:

 // Requires: using System.Text;
 Encoding utf8 = Encoding.UTF8;

 string s = "unicode";

 // string to UTF-8 bytes
 byte[] utf = utf8.GetBytes(s);

 // UTF-8 bytes back to string
 string s2 = utf8.GetString(utf);