How to convert (transliterate) a string from UTF-8 to ASCII (single byte) in C#?

Disclaimer: This page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original URL: http://stackoverflow.com/questions/497782/

Time: 2020-08-04 05:43:26  Source: igfitidea

How to convert (transliterate) a string from utf8 to ASCII (single byte) in c#?

c# encoding utf-8 ascii transliteration

Asked by Geo

I have a string object

"with multiple characters and even special characters"

I am trying to use

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which has been haunting my afternoon?

EDIT 1: What we are trying to accomplish is getting rid of special characters, such as some of the special Windows apostrophes. The code that I posted below as an answer will not take care of that. Basically:

O'Brian will become O?Brian, where ' is one of the special apostrophes.

Accepted answer by Mark Brackett

This was in response to your other question, which looks like it's been deleted... the point still stands.

Looks like a classic Unicode to ASCII issue. The trick would be to find where it's happening.

.NET works fine with Unicode, assuming it's told it's Unicode to begin with (or left at the default).

My guess is that your receiving app can't handle it. So, I'd probably use an ASCII encoding with an EncoderReplacementFallback of String.Empty:

using System.IO;
using System.Text;

string inputString = GetInput();
// Encoding.ASCII is read-only, so request an ASCII encoding whose encoder
// fallback replaces unmappable characters with an empty string (drops them)
Encoding ascii = Encoding.GetEncoding("us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());

byte[] bAsciiString = ascii.GetBytes(inputString);

// Do something with bytes...
// can write to a file as is
File.WriteAllBytes(FILE_NAME, bAsciiString);
// or turn back into a "clean" string
string cleanString = ascii.GetString(bAsciiString);
// since the offending bytes have been removed, can use default encoding as well
Assert.AreEqual(cleanString, Encoding.Default.GetString(bAsciiString));

Of course, in the old days, we'd just loop through and remove any chars greater than 127... well, those of us in the US at least. ;)
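The "old days" loop mentioned above could look roughly like the following. This is only a minimal sketch: the class name `AsciiFilter` and the method name `StripNonAscii` are made up for illustration and do not come from the answer.

```csharp
using System.Text;

public static class AsciiFilter
{
    // Hypothetical helper: keeps only chars in the 7-bit ASCII range (0..127)
    // and silently drops everything else.
    public static string StripNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, `AsciiFilter.StripNonAscii("O\u2019Brian")` returns `"OBrian"`: the typographic apostrophe is removed rather than replaced with `?`.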

Answer by Geo

I was able to figure it out. In case someone wants to know, below is the code that worked for me:

ASCIIEncoding ascii = new ASCIIEncoding();
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);

Let me know if there is a simpler way of doing it.

Answer by Peter Drier

For anyone who likes Extension methods, this one does the trick for us.

using System.Text;

namespace System
{
    public static class StringExtension
    {
        private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();

        public static string ToAscii(this string dirty)
        {
            byte[] bytes = asciiEncoding.GetBytes(dirty);
            string clean = asciiEncoding.GetString(bytes);
            return clean;
        }
    }
}

(System namespace so it's available pretty much automatically for all of our strings.)

Answer by tonycoupland

Based on Mark's answer above (and Geo's comment), I created a two-liner version to remove all ASCII exception cases from a string. Provided for people searching for this answer (as I did).

using System.Text;

// Create encoder with a replacing encoder fallback
var encoder = ASCIIEncoding.GetEncoding("us-ascii", 
    new EncoderReplacementFallback(string.Empty), 
    new DecoderExceptionFallback());

string cleanString = encoder.GetString(encoder.GetBytes(dirtyString)); 
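Note that the snippets so far either drop unmappable characters or turn them into `?`; none of them maps the typographic apostrophe from the question's EDIT 1 to a plain `'`. One possible approach, sketched here under assumptions, is to decompose accented letters with `Normalize(NormalizationForm.FormD)` and then apply a small hand-written replacement table before discarding whatever is left. The class name `AsciiTransliterator`, the method name `ToAsciiWithMap`, and the (deliberately incomplete) table are illustrative only and not part of any answer.

```csharp
using System.Collections.Generic;
using System.Text;

public static class AsciiTransliterator
{
    // Illustrative, incomplete table of typographic punctuation (an assumption;
    // extend it with whatever characters actually occur in your data).
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        { '\u2018', "'" },   // left single quotation mark
        { '\u2019', "'" },   // right single quotation mark (typographic apostrophe)
        { '\u201C', "\"" },  // left double quotation mark
        { '\u201D', "\"" },  // right double quotation mark
        { '\u2013', "-" },   // en dash
        { '\u2014', "-" },   // em dash
        { '\u2026', "..." }, // horizontal ellipsis
    };

    public static string ToAsciiWithMap(string input)
    {
        // FormD splits letters such as 'é' into 'e' plus a combining accent;
        // the accent is non-ASCII and is dropped by the loop below.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
            else if (Map.TryGetValue(c, out string replacement))
            {
                sb.Append(replacement);
            }
            // Any other non-ASCII char is dropped.
        }
        return sb.ToString();
    }
}
```

With this sketch, `AsciiTransliterator.ToAsciiWithMap("O\u2019Brian")` gives `"O'Brian"`. Characters with no decomposition and no table entry are still lost, so this is not a general transliteration solution.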

Answer by Rapeapach Suwasri

If you want an 8-bit representation of characters, as used in many encodings, this may help you.

You must change the variable targetEncoding to whatever encoding you want.

Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;

var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);
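One caveat when trying the snippet above on .NET Core or .NET 5+: Windows code pages such as 874 are not available by default there, and `Encoding.GetEncoding(874)` typically throws until `CodePagesEncodingProvider.Instance` has been registered. The sketch below wraps the same steps in a hypothetical helper (`CodePageExample.ToSingleByteString` is an invented name). It assumes the System.Text.Encoding.CodePages functionality is available to the project, and it uses the name "iso-8859-1" for Latin-1.

```csharp
using System.Text;

public static class CodePageExample
{
    // Hypothetical helper: converts the string to the given code page and
    // returns the resulting bytes as a Latin-1 string, one char per byte.
    public static string ToSingleByteString(string name, int codePage)
    {
        // Makes legacy code pages (for example 874) resolvable on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding targetEncoding = Encoding.GetEncoding(codePage);
        Encoding utf8 = Encoding.UTF8;

        byte[] stringBytes = utf8.GetBytes(name);
        byte[] stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);

        // Latin-1 maps every byte value 0..255 to the char with the same code.
        return Encoding.GetEncoding("iso-8859-1").GetString(stringTargetBytes);
    }
}
```

For example, `CodePageExample.ToSingleByteString("abc", 874)` returns `"abc"`, and a single Thai letter such as `"\u0E01"` comes back as a one-character string.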