Original question: http://stackoverflow.com/questions/497782/
Notice: this content comes from StackOverflow and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on StackOverflow and include the original URL.
How to convert (transliterate) a string from UTF-8 to ASCII (single byte) in C#?
Asked by Geo
I have a string object
"with multiple characters and even special characters"
I am trying to use
UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();
objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which is haunting my afternoon?
EDIT 1: What we are trying to accomplish is getting rid of special characters, such as some of the special Windows apostrophes. The code that I posted below as an answer will not take care of that. Basically,
O’Brian will become O?Brian, where ’ is one of the special apostrophes.
Accepted answer by Mark Brackett
This was in response to your other question, which looks like it's been deleted... the point still stands.
Looks like a classic Unicode to ASCII issue. The trick would be to find where it's happening.
.NET works fine with Unicode, assuming it's told it's Unicode to begin with (or left at the default).
My guess is that your receiving app can't handle it. So, I'd probably use the ASCIIEncoding with an EncoderReplacementFallback with String.Empty:
using System.IO;
using System.Text;
string inputString = GetInput();
// Encoding.ASCII is read-only, so clone it before changing the fallback
Encoding asciiEncoding = (Encoding)Encoding.ASCII.Clone();
// Replace unencodable characters with nothing instead of the default '?'
asciiEncoding.EncoderFallback = new EncoderReplacementFallback(string.Empty);
byte[] bAsciiString = asciiEncoding.GetBytes(inputString);
// Do something with bytes...
// can write to a file as is
File.WriteAllBytes(FILE_NAME, bAsciiString);
// or turn back into a "clean" string
string cleanString = asciiEncoding.GetString(bAsciiString);
// since the offending bytes have been removed, can use default encoding as well
Assert.AreEqual(cleanString, Encoding.Default.GetString(bAsciiString));
Of course, in the old days, we'd just loop through and remove any chars greater than 127... well, those of us in the US at least. ;)
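For reference, the old-style loop mentioned above could look roughly like the following. This is only a minimal sketch: the class name AsciiFilter and the method name StripNonAscii are made up for illustration and are not part of .NET.

```csharp
using System;
using System.Text;

public static class AsciiFilter
{
    // Hypothetical helper: keeps only chars in the 7-bit ASCII range (0..127)
    // and silently drops everything else.
    public static string StripNonAscii(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

Calling AsciiFilter.StripNonAscii("O\u2019Brian") drops the curly apostrophe and returns "OBrian". Like the fallback approach, this removes characters; it does not transliterate them.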
Answer by Geo
I was able to figure it out. In case someone wants to know, below is the code that worked for me:
ASCIIEncoding ascii = new ASCIIEncoding();
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);
Let me know if there is a simpler way of doing it.
Answer by Peter Drier
For anyone who likes extension methods, this one does the trick for us.
using System.Text;
namespace System
{
    public static class StringExtension
    {
        private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();
        public static string ToAscii(this string dirty)
        {
            byte[] bytes = asciiEncoding.GetBytes(dirty);
            string clean = asciiEncoding.GetString(bytes);
            return clean;
        }
    }
}
(System namespace so it's available pretty much automatically for all of our strings.)
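Worth noting: a default ASCIIEncoding replaces every character it cannot encode with '?', which is the O?Brian behavior described in the question. The sketch below contrasts that with an empty-string fallback. The class name FallbackDemo and both method names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Text;

public static class FallbackDemo
{
    // Default ASCIIEncoding: unencodable characters become '?'.
    public static string WithDefaultFallback(string s)
    {
        var ascii = new ASCIIEncoding();
        return ascii.GetString(ascii.GetBytes(s));
    }

    // ASCII encoding with an empty replacement: unencodable characters are dropped.
    public static string WithEmptyFallback(string s)
    {
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderExceptionFallback());
        return ascii.GetString(ascii.GetBytes(s));
    }
}
```

With the input "O\u2019Brian", the first method gives "O?Brian" and the second gives "OBrian".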
Answer by tonycoupland
Based on Mark's answer above (and Geo's comment), I created a two-liner version that removes from a string all characters that cannot be represented in ASCII. Provided for people searching for this answer (as I did).
using System.Text;
// Create an ASCII encoding whose encoder fallback replaces unencodable characters with an empty string
var encoding = Encoding.GetEncoding("us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());
string cleanString = encoding.GetString(encoding.GetBytes(dirtyString));
Answer by Rapeapach Suwasri
If you want an 8-bit representation of characters that are used in many encodings, this may help you.
You must change the variable targetEncoding to whatever encoding you want.
Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;
var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);