How do I convert (transliterate) a string from UTF-8 to ASCII (single byte) in C#?

Question

Asked by Geo

I have a string object

"with multiple characters and even special characters"

I am trying to use

UTF8Encoding utf8 = new UTF8Encoding();
ASCIIEncoding ascii = new ASCIIEncoding();

objects in order to convert that string to ASCII. Could someone shed some light on this simple task, which is haunting my afternoon?

EDIT 1: What we are trying to accomplish is getting rid of special characters, such as some of the special Windows apostrophes. The code that I posted below as an answer does not take care of that. Basically,

O’Brian will become O?Brian, where ’ is one of the special apostrophes.

Answer 1

Accepted answer by Mark Brackett

This was in response to your other question, which looks like it's been deleted... the point still stands.

Looks like a classic Unicode to ASCII issue. The trick would be to find where it's happening.

.NET works fine with Unicode, assuming it's told it's Unicode to begin with (or left at the default).

My guess is that your receiving app can't handle it. So, I'd probably use an ASCII Encoder with an EncoderReplacementFallback of String.Empty:

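using System.IO;
using System.Text;

string inputString = GetInput();
// GetEncoder() is an instance method, so start from Encoding.ASCII
Encoder encoder = Encoding.ASCII.GetEncoder();
// Replace every character ASCII cannot represent with nothing (i.e. drop it)
encoder.Fallback = new EncoderReplacementFallback(string.Empty);

// Encoder works on char arrays rather than strings
char[] chars = inputString.ToCharArray();
byte[] bAsciiString = new byte[encoder.GetByteCount(chars, 0, chars.Length, true)];
encoder.GetBytes(chars, 0, chars.Length, bAsciiString, 0, true);

// Do something with bytes...
// can write to a file as is
File.WriteAllBytes(FILE_NAME, bAsciiString);
// or turn back into a "clean" string
string cleanString = Encoding.ASCII.GetString(bAsciiString);
// since the offending bytes have been removed, can use default encoding as well
Assert.AreEqual(cleanString, Encoding.Default.GetString(bAsciiString));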

Of course, in the old days, we'd just loop through and remove any chars greater than 127... well, those of us in the US at least. ;)
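For anyone curious what that old-school loop might look like, here is a minimal sketch. The class and method names (AsciiFilter, StripNonAscii) are made up for illustration and are not part of any library.

```csharp
using System.Text;

public static class AsciiFilter
{
    // Hypothetical helper: keeps only chars in the 7-bit ASCII range (0..127)
    // and silently drops everything else, including surrogate pairs.
    public static string StripNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 127)
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, AsciiFilter.StripNonAscii("O\u2019Brian") returns "OBrian": the curly apostrophe is removed rather than replaced.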

Answer 2

Answered by Geo

I was able to figure it out. In case someone wants to know, below is the code that worked for me:

using System.Text;

ASCIIEncoding ascii = new ASCIIEncoding();
// Get the UTF-8 bytes of the original string
byte[] byteArray = Encoding.UTF8.GetBytes(sOriginal);
// Re-encode those bytes as ASCII
byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
string finalString = ascii.GetString(asciiArray);

Let me know if there is a simpler way of doing it.
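As a small sketch of what this conversion does to the apostrophe case from EDIT 1: the default ASCII encoding substitutes "?" for anything it cannot represent. The wrapper name ConvertViaUtf8 is hypothetical and only exists to make the snippet self-contained.

```csharp
using System.Text;

public static class DefaultAsciiDemo
{
    // Hypothetical wrapper around the same UTF-8 -> ASCII conversion steps.
    public static string ConvertViaUtf8(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        // Encoding.ASCII uses a replacement fallback of "?" by default
        byte[] asciiBytes = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, utf8Bytes);
        return Encoding.ASCII.GetString(asciiBytes);
    }
}
```

Here DefaultAsciiDemo.ConvertViaUtf8("O\u2019Brian") gives "O?Brian", which matches the behavior described in the question.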

Answer 3

Answered by Peter Drier

For anyone who likes Extension methods, this one does the trick for us.

using System.Text;

namespace System
{
    public static class StringExtension
    {
        private static readonly ASCIIEncoding asciiEncoding = new ASCIIEncoding();

        public static string ToAscii(this string dirty)
        {
            byte[] bytes = asciiEncoding.GetBytes(dirty);
            string clean = asciiEncoding.GetString(bytes);
            return clean;
        }
    }
}

(It lives in the System namespace, so it's available pretty much automatically for all of our strings.)

Answer 4

Answered by tonycoupland

Based on Mark's answer above (and Geo's comment), I created a two-liner version to remove all non-ASCII characters from a string. Provided for people searching for this answer (as I did).

using System.Text;

// Create an ASCII encoding whose encoder fallback replaces unmappable chars with nothing
var asciiEncoding = Encoding.GetEncoding("us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());

string cleanString = asciiEncoding.GetString(asciiEncoding.GetBytes(dirtyString));
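If the goal is to keep a plain apostrophe rather than lose it (the O’Brian case from the question), one possible variation is to map the common typographic quotes to their ASCII counterparts first and only then strip what is left. This is just a sketch under that assumption: AsciiCleaner and ToPlainAscii are made-up names, and the list of mapped characters is illustrative, not exhaustive.

```csharp
using System.Text;

public static class AsciiCleaner
{
    // ASCII encoding whose encoder drops anything it cannot represent.
    private static readonly Encoding StrictAscii = Encoding.GetEncoding(
        "us-ascii",
        new EncoderReplacementFallback(string.Empty),
        new DecoderExceptionFallback());

    // Hypothetical helper: maps a few typographic quotes to ASCII,
    // then removes every remaining non-ASCII character.
    public static string ToPlainAscii(string dirty)
    {
        string mapped = dirty
            .Replace('\u2018', '\'')   // left single quotation mark
            .Replace('\u2019', '\'')   // right single quotation mark
            .Replace('\u201C', '"')    // left double quotation mark
            .Replace('\u201D', '"');   // right double quotation mark
        return StrictAscii.GetString(StrictAscii.GetBytes(mapped));
    }
}
```

With this sketch, AsciiCleaner.ToPlainAscii("O\u2019Brian") returns "O'Brian", while characters that are not in the mapping (for example é) are still dropped.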

Answer 5

Answered by Rapeapach Suwasri

If you want an 8-bit representation of characters that are used in many encodings, this may help you.

You must change the variable targetEncoding to whatever encoding you want.

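using System.Text;

// Note: on .NET Core / .NET 5+, code page 874 is only available after
// calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
Encoding targetEncoding = Encoding.GetEncoding(874); // Your target encoding
Encoding utf8 = Encoding.UTF8;

// Name is the input string to convert
var stringBytes = utf8.GetBytes(Name);
var stringTargetBytes = Encoding.Convert(utf8, targetEncoding, stringBytes);
// Read each target byte back as one Latin-1 char to get an 8-bit representation
var ascii8BitRepresentAsCsString = Encoding.GetEncoding("Latin1").GetString(stringTargetBytes);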
