C# clean string? Is there a better way?

Question

Asked by patel.milanb

I am using this method to clean a string:

public static string CleanString(string dirtyString)
{
    string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
    string result = dirtyString;

    foreach (char c in removeChars)
    {
        result = result.Replace(c.ToString(), string.Empty);
    }

    return result;
}

This method works fine, but it has a performance problem. Every time I pass a string, every character in removeChars goes through the loop, so with a large string it takes too long to return the result.

Is there a better way of doing the same thing, for example with LINQ or jQuery/JavaScript?

Any suggestions would be appreciated.

Answer 1

Accepted answer by sloth

OK, consider the following test:

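using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public class CleanString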
{
    //by MSDN http://msdn.microsoft.com/en-us/library/844skk0h(v=vs.71).aspx
    public static string UseRegex(string strIn)
    {
        // Replace invalid characters with empty strings.
        return Regex.Replace(strIn, @"[^\w\.@-]", "");
    }

    // by Paolo Tedesco
    public static String UseStringBuilder(string strIn)
    {
        const string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !removeChars.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by Paolo Tedesco, but using a HashSet
    public static String UseStringBuilderWithHashSet(string strIn)
    {
        var hashSet = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(strIn.Length);
        foreach (char x in strIn.Where(c => !hashSet.Contains(c)))
        {
            sb.Append(x);
        }
        return sb.ToString();
    }

    // by SteveDog
    public static string UseStringBuilderWithHashSet2(string dirtyString)
    {
        HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
        StringBuilder result = new StringBuilder(dirtyString.Length);
        foreach (char c in dirtyString)
            if (!removeChars.Contains(c)) // keep only characters that are not in the removal set
                result.Append(c);
        return result.ToString();
    }

    // original by patel.milanb
    public static string UseReplace(string dirtyString)
    {
        string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        string result = dirtyString;

        foreach (char c in removeChars)
        {
            result = result.Replace(c.ToString(), string.Empty);
        }

        return result;
    }

    // by L.B
    public static string UseWhere(string dirtyString)
    {
        return new String(dirtyString.Where(Char.IsLetterOrDigit).ToArray());
    }
}

static class Program
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main()
    {
        var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f////fj()=/§(§&/(\"&sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f";
        var sw = new Stopwatch();

        var iterations = 50000;

        sw.Start();
        for (var i = 0; i < iterations; i++)
            CleanString.<SomeMethod>(dirtyString);
        sw.Stop();
        Debug.WriteLine("CleanString.<SomeMethod>: " + sw.ElapsedMilliseconds.ToString());
        sw.Reset();

        ....
        <repeat>
        ....       
    }
}
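
The timing block above is repeated once per method. One way to avoid that repetition is sketched below. It assumes a hypothetical helper named Measure that takes the method as a Func<string, string>; neither the helper nor the shortened input string comes from the original test, and the lambda is only a stand-in for CleanString.UseWhere.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

static class Benchmark
{
    // Hypothetical helper: runs `method` `iterations` times and returns the elapsed milliseconds.
    public static long Measure(Func<string, string> method, string input, int iterations)
    {
        method(input); // warm-up call so that JIT compilation is not part of the timing
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
            method(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        var dirtyString = "sdfdf.dsf8908()=(=(sadfJJLef@ssydsd?f";
        var candidates = new Dictionary<string, Func<string, string>>
        {
            // Stand-in for CleanString.UseWhere; add the other methods the same way.
            ["UseWhere"] = s => new string(s.Where(char.IsLetterOrDigit).ToArray()),
        };

        foreach (var pair in candidates)
            Console.WriteLine(pair.Key + ": " + Measure(pair.Value, dirtyString, 50000) + " ms");
    }
}
```

Absolute numbers from a loop like this depend on the machine, the runtime, and the build configuration, so only the relative order is meaningful.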

Output

CleanString.UseReplace: 791
CleanString.UseStringBuilder: 2805
CleanString.UseStringBuilderWithHashSet: 521
CleanString.UseStringBuilderWithHashSet2: 331
CleanString.UseRegex: 1700
CleanString.UseWhere: 233

Conclusion

It probably does not matter which method you use.

The difference in time between the fastest (UseWhere: 233 ms) and the slowest (UseStringBuilder: 2805 ms) method is 2572 ms when called 50000(!) times in a row. You probably do not need to care about it if you don't run the method that often.

But if you do, use the UseWhere method (written by L.B); note, however, that it behaves slightly differently.
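
To make the difference concrete: UseWhere keeps only letters and digits (a whitelist), while the other methods remove a fixed list of characters (a blacklist). The following is a minimal sketch; the class name, method names, and sample input are made up for illustration.

```csharp
using System;
using System.Linq;

static class WhitelistVsBlacklist
{
    // Same character list as in the question.
    const string RemoveChars = " ?&^$#@!()+-,:;<>'\'-_*";

    // Removes only the listed characters; everything else survives.
    public static string Blacklist(string s) =>
        new string(s.Where(c => !RemoveChars.Contains(c)).ToArray());

    // Keeps only letters and digits; everything else is dropped.
    public static string Whitelist(string s) =>
        new string(s.Where(char.IsLetterOrDigit).ToArray());

    static void Main()
    {
        const string input = "a.b_c/d é";
        Console.WriteLine(Blacklist(input)); // a.bc/dé
        Console.WriteLine(Whitelist(input)); // abcdé
    }
}
```

The dot and the slash are not in the removal list, so the blacklist keeps them, while the whitelist drops them. Both versions keep the non-ASCII letter.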

Answer 2

Answered by burning_LEGION

Use the regex [?&^$#@!()+-,:;<>'\'-_*] and replace the matches with an empty string.
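
One caveat when turning the character list into a regex: inside a character class, an unescaped hyphen between two characters defines a range, so '-_ in the pattern above would also match digits and uppercase letters. The sketch below assumes the intent is to remove exactly the characters from the question; it escapes the hyphen and lists each character once. The class and field names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexCleaner
{
    // The hyphen is escaped so that it is a literal character, not a range operator.
    static readonly Regex RemoveCharsRegex =
        new Regex(@"[ ?&^$#@!()+,:;<>'_*\-]", RegexOptions.Compiled);

    public static string Clean(string dirtyString) =>
        RemoveCharsRegex.Replace(dirtyString, string.Empty);

    static void Main()
    {
        Console.WriteLine(Clean("x&y(z) A-1_b")); // xyzA1b
    }
}
```

Keeping the Regex in a static field avoids rebuilding it on every call.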

Answer 3

Answered by Stuart.Sklinar

Give this a try: http://msdn.microsoft.com/en-us/library/xwewhkd1.aspx

Answer 4

Answered by Steven Doggart

If it's purely speed and efficiency you are after, I would recommend doing something like this:

public static string CleanString(string dirtyString)
{
    HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");
    StringBuilder result = new StringBuilder(dirtyString.Length);
    foreach (char c in dirtyString)
        if (!removeChars.Contains(c)) // prevent dirty chars
            result.Append(c);
    return result.ToString();
}

RegEx is certainly an elegant solution, but it adds extra overhead. When you specify the starting capacity of the StringBuilder, it only needs to allocate memory once (and a second time for the ToString at the end). This cuts down on memory usage and increases speed, especially on longer strings.

However, as L.B. said, if you are using this to properly encode text that is bound for HTML output, you should be using HttpUtility.HtmlEncode instead of doing it yourself.
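
For illustration, here is a minimal sketch of HTML encoding. It uses System.Net.WebUtility.HtmlEncode as a stand-in, because that class is available without a reference to System.Web; in an ASP.NET project HttpUtility.HtmlEncode is called the same way. The class and method names are made up.

```csharp
using System;
using System.Net;

static class EncodeExample
{
    // Encodes the characters that have a special meaning in HTML instead of removing them.
    public static string ForHtml(string text) => WebUtility.HtmlEncode(text);

    static void Main()
    {
        Console.WriteLine(ForHtml("<b>Tom & Jerry</b>")); // &lt;b&gt;Tom &amp; Jerry&lt;/b&gt;
    }
}
```

Unlike the cleaning methods, encoding loses no information: the browser shows the original text.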

Answer 5

Answered by Paolo Tedesco

I don't know if, performance-wise, using a Regex or LINQ would be an improvement.
Something that could be useful is to create the new string with a StringBuilder instead of calling string.Replace each time:

using System.Linq;
using System.Text;

static class Program {
    static void Main(string[] args) {
        const string removeChars = " ?&^$#@!()+-,:;<>'\'-_*";
        string result = "x&y(z)";
        // specify capacity of StringBuilder to avoid resizing
        StringBuilder sb = new StringBuilder(result.Length);
        foreach (char x in result.Where(c => !removeChars.Contains(c))) {
            sb.Append(x);
        }
        result = sb.ToString();
    }
}

Answer 6

Answered by atlaste

Perhaps it helps to first explain the 'why' and then the 'what'. The reason you're getting slow performance is that C# copies the whole string for each replacement. In my experience, using Regex in .NET isn't always better, although in most scenarios (I think including this one) it'll probably work just fine.

If I really need performance I usually don't leave it up to luck and just tell the compiler exactly what I want, that is: create a char array with the upper-bound number of characters and copy into it all the chars that you need. It's also possible to replace the HashSet with a switch/case or an array, in which case you might end up with a jump table or array lookup, which is even faster.

The 'pragmatic', but still fast, solution is:

char[] data = new char[dirtyString.Length];
int ptr = 0;
HashSet<char> hs = new HashSet<char>() { /* all your excluded chars go here */ };
foreach (char c in dirtyString)
    if (!hs.Contains(c))
        data[ptr++] = c;
return new string(data, 0, ptr);

BTW: this solution is incorrect when you want to process Unicode characters that are encoded as surrogate pairs, but it can easily be adapted to handle them.
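
The array lookup mentioned above could be sketched as follows. This is an assumption about what such a variant might look like, not code from the answer: it uses one bool per UTF-16 code unit, filled from the character list in the question. All names are made up.

```csharp
using System;

static class LookupCleaner
{
    // Same character list as in the question.
    const string RemoveChars = " ?&^$#@!()+-,:;<>'\'-_*";

    // One flag per UTF-16 code unit; true means "remove this character".
    static readonly bool[] Remove = BuildTable();

    static bool[] BuildTable()
    {
        var table = new bool[char.MaxValue + 1];
        foreach (char c in RemoveChars)
            table[c] = true;
        return table;
    }

    public static string Clean(string dirtyString)
    {
        // Upper bound: the result can never be longer than the input.
        var data = new char[dirtyString.Length];
        int ptr = 0;
        foreach (char c in dirtyString)
            if (!Remove[c])
                data[ptr++] = c;
        return new string(data, 0, ptr);
    }

    static void Main()
    {
        Console.WriteLine(Clean("x&y(z)")); // xyz
    }
}
```

The table costs 64 KB of memory once, in exchange for a lookup that needs no hashing.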

-Stefan.

Answer 7

Answered by gd73

This one is even faster! Usage:

string dirty=@"tfgtf$@$%gttg%$% 664%$";
string clean = dirty.Clean();


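public static class StringExtensions
{
    // Keeps only ASCII letters and digits.
    public static string Clean(this String name)
    {
        var namearray = new Char[name.Length];

        var newIndex = 0;
        for (var index = 0; index < name.Length; index++)
        {
            var letter = (Int32)name[index];

            // a-z is 97..122, A-Z is 65..90, 0-9 is 48..57
            if (!((letter > 96 && letter < 123) || (letter > 64 && letter < 91) || (letter > 47 && letter < 58)))
                continue;

            namearray[newIndex] = (Char)letter;
            ++newIndex;
        }

        // Copy only the filled part; TrimEnd() would leave the unused '\0' slots in the result.
        return new String(namearray, 0, newIndex);
    }
}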

Answer 8

Answered by user2623295

I am not able to spend time on acid testing this but this line did not actually clean slashes as desired.

HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>'\'-_*");

I had to add the slashes individually and escape the backslash:

HashSet<char> removeChars = new HashSet<char>(" ?&^$#@!()+-,:;<>''-_*");
removeChars.Add('/');
removeChars.Add('\\');

Answer 9

Answered by Manuel.B

I use this in my current project and it works fine. It takes a sentence, removes all non-alphanumeric characters, and then returns the sentence with the first letter of every word in upper case and everything else in lower case. Maybe I should call it SentenceNormalizer. Naming is hard :)

internal static string StringSanitizer(string whateverString)
{
    whateverString = whateverString.Trim().ToLower();
    Regex cleaner = new Regex("(?:[^a-zA-Z0-9 ])", RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Compiled);
    var listOfWords = (cleaner.Replace(whateverString, string.Empty).Split(' ', StringSplitOptions.RemoveEmptyEntries)).ToList();
    string cleanString = string.Empty;
    foreach (string word in listOfWords)
    {
        cleanString += $"{word.First().ToString().ToUpper() + word.Substring(1)} ";
    }
    return cleanString;
}
