How to detect the SUB character and remove it from a text file in C#?

Disclaimer: This page is a Chinese-English side-by-side translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must comply with the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/12013201/

Time: 2020-08-09 20:04:46  Source: igfitidea

How to detect SUB character and remove it from a text file in C#?

c# string text

Asked by user981848

I am writing a program to process special text files. Some of these text files end with a SUB character (the substitute character, which may be 0x1A). How do I detect this character and remove it from the text file using C#?

Accepted answer by Jon Skeet

If it's really 0x1A in the binary data, and if you're reading it as an ASCII or UTF-8 file, it should end up as U+001A when read in .NET. So you may be able to write something like:

string text = File.ReadAllText("file.txt");
text = text.Replace("\u001a", "");
File.WriteAllText("file.txt", text);

Note that the "\u001a" part is a string consisting of a single character: \uxxxx is an escape sequence for a single UTF-16 code unit with the given Unicode value expressed in hex.
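If the SUB only ever appears as the very last byte of the file, it may be possible to detect and remove it without reading the rest of the file. The following is a minimal sketch under that assumption; the class and method names (SubTrimmer, TrimTrailingSub) are hypothetical, and it assumes an encoding in which SUB is the single byte 0x1A (ASCII, UTF-8, or a typical ANSI code page), not UTF-16.

```csharp
using System.IO;

public static class SubTrimmer
{
    private const byte Sub = 0x1A;

    // Hypothetical helper: returns true if the file's last byte was SUB
    // and has been cut off; returns false and leaves the file alone otherwise.
    public static bool TrimTrailingSub(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.ReadWrite))
        {
            if (fs.Length == 0)
                return false;

            // Look at the final byte only; the rest of the file is never read.
            fs.Seek(-1, SeekOrigin.End);
            if (fs.ReadByte() != Sub)
                return false;

            // Shorten the file by one byte, dropping the trailing SUB.
            fs.SetLength(fs.Length - 1);
            return true;
        }
    }
}
```

Calling SubTrimmer.TrimTrailingSub("file.txt") would then both answer the "detect" part (through the return value) and do the removal. It removes one trailing SUB per call, so a file ending in several of them would need repeated calls.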

Answer by KeithS

The easiest answer would probably be a Regex:

using System.Text.RegularExpressions;

public static class StringExtensions
{
    public static string RemoveAll(this string input, char toRemove)
    {
        // Produces a pattern like "\u001a+", which matches any run
        // of one or more of the character with that hex value.
        // \u takes exactly four hex digits, so "x4" covers every char.
        var pattern = @"\u" + ((int)toRemove).ToString("x4") + "+";

        return Regex.Replace(input, pattern, String.Empty);
    }
}

// usage (from calling code)
var cleanString = dirtyString.RemoveAll((char)0x1a);

Yes, you could just pass in the int, but that requires knowing the integer value of the character. Using a char as the parameter lets you pass a literal or a char variable with less muck.
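Since the question describes files that end with SUB, a variant that only touches the end of the string may be closer to what is needed. This is a sketch, not part of the answer: RemoveTrailing is a hypothetical name, and it builds the pattern with Regex.Escape so the character is matched literally, then anchors the run to the end of the input with \z.

```csharp
using System.Text.RegularExpressions;

public static class TrailingCharExtensions
{
    // Hypothetical helper: removes a run of `toRemove` only at the very end of the input.
    public static string RemoveTrailing(this string input, char toRemove)
    {
        // Regex.Escape keeps the character literal even if it is a regex metacharacter;
        // "+" matches a run of it and \z anchors that run to the end of the string.
        var pattern = Regex.Escape(toRemove.ToString()) + @"+\z";
        return Regex.Replace(input, pattern, string.Empty);
    }
}
```

Usage would look like text.RemoveTrailing('\u001a'). For this particular case the built-in text.TrimEnd('\u001a') gives the same result without a regex; the regex form is shown only because it parallels the approach in this answer.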

Answer by MethodMan

You could also try something like this; it should work:

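string text;
// Verbatim string (@"...") so that "\f" is not read as a form-feed escape.
using (FileStream f = File.OpenRead(@"path\file")) // your filename + extension
using (StreamReader sr = new StreamReader(f))
{
    text = sr.ReadToEnd();
}

// Strip every SUB character, then write the result back once the reader is closed.
text = text.Replace("\u001a", string.Empty);
File.WriteAllText(@"path\file", text);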
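The snippet above reads the whole file into memory. For large files, the same filtering could be done in chunks. The following is a rough sketch, assuming a hypothetical helper named CopyWithoutSub that works on any TextReader/TextWriter pair; nothing here comes from the answer itself.

```csharp
using System.IO;

public static class SubFilter
{
    // Hypothetical helper: copies text from reader to writer, dropping every U+001A.
    public static void CopyWithoutSub(TextReader reader, TextWriter writer)
    {
        var buffer = new char[4096];
        int read;
        while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
        {
            // Compact the characters we keep to the front of the buffer.
            int kept = 0;
            for (int i = 0; i < read; i++)
            {
                if (buffer[i] != '\u001a')
                    buffer[kept++] = buffer[i];
            }
            writer.Write(buffer, 0, kept);
        }
    }
}
```

One way to use it would be to open a StreamReader on the source file and a StreamWriter on a temporary file, call SubFilter.CopyWithoutSub(reader, writer), and replace the original file after both are closed. Reading and writing the same path at the same time is not safe, which is why the temporary file is suggested.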

Answer by donners45

C# has a method to detect control characters (including SUB). See MSDN: https://msdn.microsoft.com/en-us/library/9s05w2k9(v=vs.110).aspx
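The answer does not name the method. Assuming it means char.IsControl, which returns true for U+001A, a sketch of using it might look like the following. The class and method names are hypothetical, and keeping tab, carriage return, and line feed is an assumption about what a text file should retain, since those are control characters too.

```csharp
using System.Linq;

public static class ControlCharScanner
{
    // Hypothetical helper: true if the text contains any control character
    // other than tab, carriage return, or line feed.
    public static bool HasUnexpectedControlChars(string text)
    {
        return text.Any(IsUnexpectedControl);
    }

    // Hypothetical helper: drops those characters (SUB included) from the text.
    public static string StripControlChars(string text)
    {
        return new string(text.Where(c => !IsUnexpectedControl(c)).ToArray());
    }

    private static bool IsUnexpectedControl(char c)
    {
        // Keep the usual whitespace controls found in ordinary text files.
        return char.IsControl(c) && c != '\t' && c != '\r' && c != '\n';
    }
}
```

This is broader than the question asks for: it removes every control character outside the three kept ones, not just SUB. If only SUB should go, a plain Replace on "\u001a" is the narrower choice.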