Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/11395775/

Time: 2020-08-09 17:42:46  Source: igfitidea

Clean the string? is there any better way of doing it?

Tags: c#, asp.net, string, linq

Asked by patel.milanb

I am using this method to clean the string:

public static string CleanString(string dirtyString)
{
    string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
    string result = dirtyString;

    foreach (char c in removeChars)
    {
        result = result.Replace(c.ToString(), string.Empty);
    }

    return result;
}

This method works fine, but it has a performance problem. Every time I pass in a string, every character goes through the loop, so a large string takes too long to return.

Is there a better way of doing the same thing, for example with LINQ or jQuery/JavaScript?

Any suggestions would be appreciated.

Accepted answer by sloth

OK, consider the following test:


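using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class CleanString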
{
    //by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
    public static string UseRegex(string strIn)
    {
        // Replace invalid characters with empty strings.
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }

    // by Paolo Tedesco
    public static String UseStringBuilder(string strIn)
    {
        const string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by Paolo Tedesco, but using a HashSet
    public static String UseStringBuilderWithHashSet(string strIn)
    {
        var hashSet = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by SteveDog
    public static string UseStringBuilderWithHashSet2(string dirtyString)
    {
        HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
        StringBuilder result = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
            if (!removeChars.Contains(c)) // keep only characters that are not in the removal set
                result.Append(c);
        return result.ToString();
    }

    // original by patel.milanb
    public static string UseReplace(string dirtyString)
    {
        string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        string result = dirtyString;

        foreach (char c in removeChars)
        {
            result = result.Replace(c.ToString(), string.Empty);
        }

        return result;
    }

    // by L.B
    public static string UseWhere(string dirtyString)
    {
        return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
    }
}

static class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main()
    {
        var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f";
        var sw = new Stopwatch();

        var iterations = 50000;

        sw.Start();
        for (var i = 0; i < iterations; i++)
            CleanString.<SomeMethod>(dirtyString);
        sw.Stop();
        Debug.WriteLine("CleanString.<SomeMethod>: " + sw.ElapsedMilliseconds.ToString());
        sw.Reset();

        ....
        <repeat>
        ....       
    }
}
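
The placeholder loop above could be filled in along the following lines. This is only a sketch: the `Time` helper and the `UseWhere` stand-in are hypothetical names introduced here for illustration, and in practice you would pass each `CleanString` method in turn.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class TimingSketch
{
    // Hypothetical helper: calls one cleaning method `iterations` times
    // and returns the elapsed wall-clock time in milliseconds.
    public static long Time(Func<string, string> clean, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            clean(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // Stand-in for CleanString.UseWhere so the sketch compiles on its own.
    public static string UseWhere(string s) =>
        new string(s.Where(c => char.IsLetterOrDigit(c)).ToArray());

    public static void Main()
    {
        const string dirty = "sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f";
        // Repeat this line once per method you want to compare.
        Console.WriteLine("UseWhere: " + Time(UseWhere, dirty, 50000) + " ms");
    }
}
```

Timings from a loop like this depend heavily on the machine, the runtime, and JIT warm-up, so treat them as rough comparisons only.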


Output


CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233


Conclusion


It probably does not matter which method you use.

The difference in time between the fastest (UseWhere: 233ms) and the slowest (UseStringBuilder: 2805ms) method is 2572ms when called 50000(!) times in a row. You probably don't need to care about it if you don't run the method that often.

But if you do, use the UseWhere method (written by L.B); but also note that it is slightly different: it keeps only letters and digits instead of removing a fixed list of characters.
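
To make that difference concrete, here is a small sketch comparing the two behaviours on the same input. The class and method names are made up for this example, and the removal list is the one from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class DifferenceSketch
{
    // Same idea as UseWhere: keep only letters and digits.
    public static string KeepLettersAndDigits(string s) =>
        new string(s.Where(c => char.IsLetterOrDigit(c)).ToArray());

    // Same idea as the HashSet variants: drop a fixed set of characters.
    public static string RemoveListed(string s)
    {
        var remove = new HashSet<char>(" ?&^$#@!()+-,:;<>'-_*");
        var sb = new StringBuilder(s.Length);
        foreach (var c in s)
            if (!remove.Contains(c))
                sb.Append(c);
        return sb.ToString();
    }

    public static void Main()
    {
        const string input = "file.name/v2 (final)";
        Console.WriteLine(KeepLettersAndDigits(input)); // filenamev2final
        Console.WriteLine(RemoveListed(input));         // file.name/v2final
    }
}
```

Characters such as `.` and `/` are not in the removal list, so they survive the list-based approach but not the letters-and-digits filter.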

Answer by burning_LEGION

Use the regex [?&^$#@!()+-,:;<>'\'-_*] and replace matches with an empty string.
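
One caveat when trying this: inside a character class an unescaped `-` between two characters defines a range, so `'-_` in the pattern above matches everything from `'` to `_`, which includes digits and upper-case letters. A minimal sketch with the hyphen escaped follows, assuming the intended set is the removal list from the question (space included); the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSketch
{
    // The hyphen is escaped so it is a literal character, not a range operator.
    private static readonly Regex Unwanted =
        new Regex(@"[ ?&^$#@!()+\-,:;<>'_*]", RegexOptions.Compiled);

    public static string Clean(string s) => Unwanted.Replace(s, string.Empty);

    public static void Main()
    {
        Console.WriteLine(Clean("A-B_9 (x&y)")); // AB9xy
    }
}
```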

Answer by Steven Doggart

If it's purely speed and efficiency you are after, I would recommend doing something like this:


public static string CleanString(string dirtyString)
{
    HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
    StringBuilder result = new StringBuilder(dirtyString.Length);
    foreach (char c in dirtyString)
        if (!removeChars.Contains(c)) // prevent dirty chars
            result.Append(c);
    return result.ToString();
}

RegEx is certainly an elegant solution, but it adds extra overhead. By specifying the starting capacity of the StringBuilder, it only needs to allocate memory once (and a second time for the ToString at the end). This cuts down on memory usage and increases speed, especially on longer strings.

However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
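
As a rough illustration of what encoding does, compared with stripping characters, here is a short sketch. It uses `WebUtility.HtmlEncode` from `System.Net` rather than `HttpUtility.HtmlEncode`, purely so that the example needs no reference to System.Web; the class and method names are made up.

```csharp
using System;
using System.Net;

static class EncodeSketch
{
    // Encodes characters that are special in HTML instead of deleting them.
    public static string Encode(string raw) => WebUtility.HtmlEncode(raw);

    public static void Main()
    {
        Console.WriteLine(Encode("<b>Tom & Jerry</b>")); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
    }
}
```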

Answer by Paolo Tedesco

I don't know if, performance-wise, using a Regex or LINQ would be an improvement.
Something that could be useful would be to create the new string with a StringBuilder instead of using string.Replace each time:

using System.Linq;
using System.Text;

static class Program {
    static void Main(string[] args) {
        const string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        string result = "x&y(z)";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(result.Length);
        foreach (char x in result.Where(c => !removeChars.Contains(c))) {
            sb.Append(x);
        }
        result = sb.ToString();
    }
}

Answer by atlaste

Perhaps it helps to first explain the 'why' and then the 'what'. You are getting slow performance because strings in C# are immutable, so each call to Replace copies the string into a new one. In my experience, using Regex in .NET isn't always better, although in most scenarios (I think including this one) it will probably work just fine.

If I really need performance I usually don't leave it up to luck and just tell the compiler exactly what I want: create a buffer with the upper-bound number of characters and copy in all the chars that you need. It's also possible to replace the HashSet with a switch/case or an array, in which case you might end up with a jump table or array lookup, which is even faster.

The 'pragmatic', but still fast, solution is:

char[] data = new char[dirtyString.Length];
int ptr = 0;
HashSet<char> hs = new HashSet<char>() { /* all your excluded chars go here */ };
foreach (char c in dirtyString)
    if (!hs.Contains(c))
        data[ptr++] = c;
return new string(data, 0, ptr);

BTW: this solution is incorrect when you want to process Unicode characters that are encoded as surrogate pairs, but it can easily be adapted to handle them.
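
The array lookup mentioned earlier in this answer might look something like the sketch below. It assumes the removal list from the question and trades about 64K `bool` entries of memory for a plain index operation per character; the class name and `BuildTable` helper are hypothetical.

```csharp
using System;

static class LookupSketch
{
    // One flag per UTF-16 code unit; true means "remove this character".
    private static readonly bool[] Remove = BuildTable(" ?&^$#@!()+-,:;<>'-_*");

    private static bool[] BuildTable(string chars)
    {
        var table = new bool[char.MaxValue + 1];
        foreach (char c in chars)
            table[c] = true;
        return table;
    }

    public static string Clean(string dirtyString)
    {
        // Upper bound: the result is never longer than the input.
        var data = new char[dirtyString.Length];
        int ptr = 0;
        foreach (char c in dirtyString)
            if (!Remove[c])
                data[ptr++] = c;
        return new string(data, 0, ptr);
    }

    public static void Main()
    {
        Console.WriteLine(Clean("x&y(z)")); // xyz
    }
}
```

Whether this actually beats the HashSet version would need measuring on your own data.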

-Stefan.


Answer by gd73

This one is even faster!
Use:

string dirty=@"tfgtf$@$%gttg%$% 664%$";
string clean = dirty.Clean();


    public static string Clean(this String name)
    {
        var namearray = new Char[name.Length];

        var newIndex = 0;
        for (var index = 0; index < namearray.Length; index++)
        {
            var letter = (Int32)name[index];

            if (!((letter > 96 && letter < 123) || (letter > 64 && letter < 91) || (letter > 47 && letter < 58)))
                continue;

            namearray[newIndex] = (Char)letter;
            ++newIndex;
        }

        // Keep only the filled part of the buffer; TrimEnd() would not strip the unused '\0' slots.
        return new String(namearray, 0, newIndex);
    }

Answer by user2623295

I am not able to spend time on acid testing this, but this line did not actually clean slashes as desired:

HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");

I had to add the slashes individually and escape the backslash:

HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>''-_*");
removeChars.Add('/');
removeChars.Add('\\');
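
The reason the original list misses the backslash can be checked with a small sketch: in a regular C# string literal, `\'` is simply an escape for a single quote. The class and helper names here are invented for the example.

```csharp
using System;

static class EscapeSketch
{
    public static bool ContainsBackslash(string s) => s.IndexOf('\\') >= 0;

    public static void Main()
    {
        // \' is an escaped single quote, so this string holds no backslash at all.
        string original = " ?&^$#@!()+-,:;<>'\'-_*";
        Console.WriteLine(ContainsBackslash(original)); // False

        // Writing \\ puts a real backslash into the list.
        string withBackslash = " ?&^$#@!()+-,:;<>'\\-_*";
        Console.WriteLine(ContainsBackslash(withBackslash)); // True
    }
}
```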

Answer by Manuel.B

I use this in my current project and it works fine. It takes a sentence, removes all non-alphanumeric characters, and returns the sentence with the first letter of every word in upper case and everything else in lower case. Maybe I should call it SentenceNormalizer. Naming is hard :)

internal static string StringSanitizer(string whateverString)
{
    whateverString = whateverString.Trim().ToLower();
    Regex cleaner = new Regex("(?:[^a-zA-Z0-9 ])", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    var listOfWords = (cleaner.Replace(whateverString, string.Empty).Split(' ', StringSplitOptions.RemoveEmptyEntries)).ToList();
    string cleanString = string.Empty;
    foreach (string word in listOfWords)
    {
        cleanString += $"{word.First().ToString().ToUpper() + word.Substring(1)} ";
    }
    return cleanString.TrimEnd(); // drop the trailing space added after the last word
}