C#: Removing hidden characters from strings

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverFlow. Original URL: http://stackoverflow.com/questions/15259275/

Time: 2020-08-10 14:43:25  Source: igfitidea

Removing hidden characters from within strings

Tags: c#, .net, string, hidden-characters

Asked by bradley4

My problem:

I have a .NET application that sends out newsletters via email. When the newsletters are viewed in Outlook, Outlook displays a question mark in place of a hidden character it can't recognize. These hidden characters come from end users who copy and paste the HTML that makes up the newsletters into a form and submit it. A C# Trim() removes these hidden characters if they occur at the beginning or end of the string. When the newsletter is viewed in Gmail, Gmail does a good job of ignoring them. When I paste these hidden characters into a Word document and turn on the "show paragraph marks and hidden symbols" option, the symbols appear as one rectangle inside a bigger rectangle. The text that makes up the newsletters can be in any language, so accepting Unicode characters is a must. I've tried looping through the string to detect the character, but the loop doesn't recognize it and passes over it. Asking the end user to paste the HTML into Notepad before submitting it is out of the question.

My question:
How can I detect and eliminate these hidden characters using C#?

Accepted answer by Yannick Blondeau

You can remove all control characters from your input string with something like this:

// Requires: using System.Linq;
// input is your input string.
string output = new string(input.Where(c => !char.IsControl(c)).ToArray());

Here is the documentation for the IsControl() method.

Or, if you want to keep letters and digits only, you can use the IsLetter and IsDigit functions:

string output = new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray());

Answer by aush

You can do this:

var hChars = new char[] {...};
var result = new string(yourString.Where(c => !hChars.Contains(c)).ToArray());
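
To fill in a list like hChars, you first need to know which characters are actually hiding in the input. The following is a minimal sketch of one way to find out: it prints every character as a code point together with its Unicode category. The class name HiddenCharInspector and the method Describe are hypothetical names chosen for illustration, not part of the original answer.

```csharp
using System;
using System.Linq;

static class HiddenCharInspector
{
    // Hypothetical helper: renders each char as "U+XXXX(Category)" so that
    // invisible characters become visible in a log or debugger.
    public static string Describe(string text) =>
        string.Join(" ", text.Select(c =>
            $"U+{(int)c:X4}({char.GetUnicodeCategory(c)})"));
}
```

For example, HiddenCharInspector.Describe("a\u200Eb") yields "U+0061(LowercaseLetter) U+200E(Format) U+0062(LowercaseLetter)", which exposes the left-to-right mark sitting between the two letters.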

Answer by SimSimY

It has been a while, but this hasn't been answered yet.

How do you include the HTML content in the sending code? If you are reading it from a file, check the file encoding. If you are using UTF-8 with signature (the name varies slightly between editors), this may cause the weird character at the beginning of the mail.
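
If the stray character does turn out to be the UTF-8 signature, it shows up in a .NET string as U+FEFF at index 0. A minimal sketch of removing it, assuming the content is already in a string; BomStripper and StripLeadingBom are hypothetical names used only for this illustration:

```csharp
using System;

static class BomStripper
{
    private const char ByteOrderMark = '\uFEFF';

    // Removes one leading U+FEFF if present; everything else is left untouched.
    // The first char is compared directly, which avoids a culture-sensitive
    // string comparison.
    public static string StripLeadingBom(string text) =>
        !string.IsNullOrEmpty(text) && text[0] == ByteOrderMark
            ? text.Substring(1)
            : text;
}
```

Calling BomStripper.StripLeadingBom("\uFEFF<html>") returns "<html>", while a string without the mark is returned unchanged.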

Answer by Mubashar

I usually use this regular expression to replace all non-printable characters.

By the way, most people consider tab, line feed and carriage return to be non-printable characters, but for me they are not.

So here is the expression:

string output = Regex.Replace(input, @"[^\u0009\u000A\u000D\u0020-\u007E]", "*");
  • [^...] matches any character that is not one of the following:
  • \u0009 is tab
  • \u000A is line feed
  • \u000D is carriage return
  • \u0020-\u007E is everything from space to ~, that is, every printable ASCII character.

See an ASCII table if you want to make changes. Remember that this pattern also matches every non-ASCII character.
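
Since the question requires keeping Unicode text, a variant of the same idea can target Unicode categories instead of an ASCII whitelist. This is a sketch under assumptions, not the answer author's method: it assumes the unwanted characters belong to the Control (Cc) or Format (Cf) categories, and the names UnicodeAwareCleaner and Clean are hypothetical.

```csharp
using System.Text.RegularExpressions;

static class UnicodeAwareCleaner
{
    // \p{Cc} = control characters, \p{Cf} = format characters
    // (for example U+200E, U+200B, U+FEFF).
    // "-[\t\n\r]" is .NET character class subtraction: tab, LF and CR are kept.
    private static readonly Regex Hidden =
        new Regex(@"[\p{Cc}\p{Cf}-[\t\n\r]]", RegexOptions.Compiled);

    // Removes the matched characters instead of replacing them with "*".
    public static string Clean(string input) => Hidden.Replace(input, string.Empty);
}
```

One design caveat: the Format category also contains the zero-width joiner and non-joiner (U+200D, U+200C), which carry meaning in some scripts and in emoji sequences, so you may want to exclude them from the pattern if your newsletters use such text.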

To test the expression above, you can build a string yourself like this:

    string input = string.Empty;

    for (int i = 0; i < 255; i++)
    {
        input += (char)(i);
    }

Answer by Niraj Kheria

string output = new string(input.Where(c => !char.IsControl(c)).ToArray());

This should solve the problem. I had a non-printable substitute character (ASCII 26) in a string that was causing my app to break, and this line of code removed it.

Answer by Igor Meszaros

What worked best for me is:

string result = new string(value.Where(c => char.IsLetterOrDigit(c) || (c >= ' ' && c <= byte.MaxValue)).ToArray());

Here I keep the character if it is any letter or digit, so that non-English letters are not dropped. If it is not a letter or digit, I keep it only when it lies between space and byte.MaxValue (255), which drops the control characters below space while making sure punctuation is preserved.

Some suggest using IsControl to check whether a character is non-printable, but IsControl returns false for the left-to-right mark, for example, so a filter based on it leaves that character in the string.
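
The left-to-right mark (U+200E) has the Unicode category Format rather than Control, which is why char.IsControl does not flag it. A minimal sketch of a LINQ filter that checks both categories follows; it assumes those two categories cover the characters you want gone, and FormatCharFilter, IsHidden and Strip are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;

static class FormatCharFilter
{
    // Treats Control (Cc) and Format (Cf) characters as hidden,
    // except tab, line feed and carriage return, which are kept.
    public static bool IsHidden(char c)
    {
        if (c == '\t' || c == '\n' || c == '\r') return false;
        UnicodeCategory category = char.GetUnicodeCategory(c);
        return category == UnicodeCategory.Control
            || category == UnicodeCategory.Format;
    }

    // Letters, digits and punctuation of any language pass through unchanged.
    public static string Strip(string input) =>
        new string(input.Where(c => !IsHidden(c)).ToArray());
}
```

With this, FormatCharFilter.Strip("a\u200Eb\u0001c\n") returns "abc\n", whereas the IsControl-only filter would return "a\u200Eb" followed by "c" with the mark still in place.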

Answer by shanmuga raja

new string(input.Where(c => !char.IsControl(c)).ToArray());

IsControl misses some invisible characters such as the left-to-right mark (LRM), which commonly hides in a string after a copy and paste. If you are sure that your string contains only letters and digits, you can use IsLetterOrDigit:

new string(input.Where(c => char.IsLetterOrDigit(c)).ToArray());

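If your string has special characters, you can restrict it to the ASCII range instead (note that this drops every non-ASCII character and keeps ASCII control characters):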

new string(input.Where(c => c < 128).ToArray())