.NET - How can you split a "caps" delimited string into an array?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me), citing the original source: http://stackoverflow.com/questions/155303/

Time: 2020-09-03 10:10:38  Source: igfitidea

Tags: .net, regex, algorithm, string, pascalcasing

Asked by Matias Nino

How do I go from this string: "ThisIsMyCapsDelimitedString"

...to this string: "This Is My Caps Delimited String"

Fewest lines of code in VB.net is preferred but C# is also welcome.

Cheers!

Answer by Markus Jarderot

I made this a while ago. It matches each component of a CamelCase name.

/([A-Z]+(?=$|[A-Z][a-z])|[A-Z]?[a-z]+)/g

For example:

"SimpleHTTPServer" => ["Simple", "HTTP", "Server"]
"camelCase" => ["camel", "Case"]

To just insert spaces between the words instead:

Regex.Replace(s, "([a-z](?=[A-Z])|[A-Z](?=[A-Z][a-z]))", "$1 ")


If you need to handle digits:

/([A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+)/g

Regex.Replace(s, "([a-z](?=[A-Z]|[0-9])|[A-Z](?=[A-Z][a-z]|[0-9])|[0-9](?=[^0-9]))", "$1 ")
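
Since the question asks for an array, the matching pattern above can also be fed to Regex.Matches rather than Regex.Replace. The following is only a sketch: the class name CamelCaseSplitter and the method name SplitWords are made up for illustration, and the pattern is the digit-aware one from this answer with the JavaScript-style /.../g delimiters dropped.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class CamelCaseSplitter // hypothetical name, not from the answer
{
    // Digit-aware matching pattern from the answer above.
    private static readonly Regex WordPattern = new Regex(
        "[A-Z]+(?=$|[A-Z][a-z]|[0-9])|[A-Z]?[a-z]+|[0-9]+");

    // Returns each matched component as one array element.
    public static string[] SplitWords(string input)
    {
        return WordPattern.Matches(input)
            .Cast<Match>()          // MatchCollection is non-generic on older frameworks
            .Select(m => m.Value)
            .ToArray();
    }
}
```

With this, string.Join(" ", CamelCaseSplitter.SplitWords("ThisIsMyCapsDelimitedString")) should give "This Is My Caps Delimited String", and SplitWords("SimpleHTTPServer") should give the three elements "Simple", "HTTP" and "Server".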

Answer by Wayne

Regex.Replace("ThisIsMyCapsDelimitedString", @"(\B[A-Z])", " $1")

Answer by JoshL

Great answer, MizardX! I tweaked it slightly to treat numerals as separate words, so that "AddressLine1" would become "Address Line 1" instead of "Address Line1":

Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ")
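
To see what the tweak does, the pattern can be wrapped in a small helper. This is only an illustration: NumeralSplitSketch and Separate are hypothetical names, and the pattern is copied from this answer.

```csharp
using System.Text.RegularExpressions;

public static class NumeralSplitSketch // hypothetical name
{
    // A lower-case letter followed by an upper-case letter or a digit ends a word,
    // as does an upper-case letter followed by an upper-case + lower-case pair.
    public static string Separate(string s)
    {
        return Regex.Replace(s, "([a-z](?=[A-Z0-9])|[A-Z](?=[A-Z][a-z]))", "$1 ");
    }
}
```

Separate("AddressLine1") should return "Address Line 1" and Separate("SimpleHTTPServer") should return "Simple HTTP Server". Note that the pattern only breaks before a digit, not after one, so an input such as "Line1Address" comes back as "Line 1Address".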

Answer by Troy Howard

Just for a little variety... Here's an extension method that doesn't use a regex.

using System;
using System.Collections.Generic;
using System.Linq;

public static class CamelSpaceExtensions
{
    public static string SpaceCamelCase(this String input)
    {
        return new string(Enumerable.Concat(
            input.Take(1), // No space before initial cap
            InsertSpacesBeforeCaps(input.Skip(1))
        ).ToArray());
    }

    private static IEnumerable<char> InsertSpacesBeforeCaps(IEnumerable<char> input)
    {
        foreach (char c in input)
        {
            if (char.IsUpper(c)) 
            { 
                yield return ' '; 
            }

            yield return c;
        }
    }
}

Answer by Pseudo Masochist

Grant Wagner's excellent comment aside:

Dim s As String = RegularExpressions.Regex.Replace("ThisIsMyCapsDelimitedString", "([A-Z])", " $1")

Answer by Dan Malcolm

I needed a solution that supports acronyms and numbers. This Regex-based solution treats the following patterns as individual "words":

  • A capital letter followed by lowercase letters
  • A sequence of consecutive numbers
  • Consecutive capital letters (interpreted as acronyms) - a new word can begin using the last capital, e.g. HTMLGuide => "HTML Guide", "TheATeam" => "The A Team"

You could do it as a one-liner:

Regex.Replace(value, @"(?<!^)((?<!\d)\d|(?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z]))", " $1")

A more readable approach might be better:

using System.Text.RegularExpressions;

namespace Demo
{
    public class IntercappedStringHelper
    {
        private static readonly Regex SeparatorRegex;

        static IntercappedStringHelper()
        {
            const string pattern = @"
                (?<!^) # Not start
                (
                    # Digit, not preceded by another digit
                    (?<!\d)\d 
                    |
                    # Upper-case letter, followed by lower-case letter if
                    # preceded by another upper-case letter, e.g. 'G' in HTMLGuide
                    (?(?<=[A-Z])[A-Z](?=[a-z])|[A-Z])
                )";

            var options = RegexOptions.IgnorePatternWhitespace | RegexOptions.Compiled;

            SeparatorRegex = new Regex(pattern, options);
        }

        public static string SeparateWords(string value, string separator = " ")
        {
            return SeparatorRegex.Replace(value, separator + "$1");
        }
    }
}

Here's an extract from the (XUnit) tests:

[Theory]
[InlineData("PurchaseOrders", "Purchase-Orders")]
[InlineData("purchaseOrders", "purchase-Orders")]
[InlineData("2Unlimited", "2-Unlimited")]
[InlineData("The2Unlimited", "The-2-Unlimited")]
[InlineData("Unlimited2", "Unlimited-2")]
[InlineData("222Unlimited", "222-Unlimited")]
[InlineData("The222Unlimited", "The-222-Unlimited")]
[InlineData("Unlimited222", "Unlimited-222")]
[InlineData("ATeam", "A-Team")]
[InlineData("TheATeam", "The-A-Team")]
[InlineData("TeamA", "Team-A")]
[InlineData("HTMLGuide", "HTML-Guide")]
[InlineData("TheHTMLGuide", "The-HTML-Guide")]
[InlineData("TheGuideToHTML", "The-Guide-To-HTML")]
[InlineData("HTMLGuide5", "HTML-Guide-5")]
[InlineData("TheHTML5Guide", "The-HTML-5-Guide")]
[InlineData("TheGuideToHTML5", "The-Guide-To-HTML-5")]
[InlineData("TheUKAllStars", "The-UK-All-Stars")]
[InlineData("AllStarsUK", "All-Stars-UK")]
[InlineData("UKAllStars", "UK-All-Stars")]

Answer by Robert Paulson

For more variety, using plain old C# objects, the following produces the same output as @MizardX's excellent regular expression.

public string FromCamelCase(string camel)
{   // omitted checking camel for null
    StringBuilder sb = new StringBuilder();
    int upperCaseRun = 0;
    foreach (char c in camel)
    {   // append a space only if we're not at the start
        // and we're not already in an all caps string.
        if (char.IsUpper(c))
        {
            if (upperCaseRun == 0 && sb.Length != 0)
            {
                sb.Append(' ');
            }
            upperCaseRun++;
        }
        else if( char.IsLower(c) )
        {
            if (upperCaseRun > 1) //The first new word will also be capitalized.
            {
                sb.Insert(sb.Length - 1, ' ');
            }
            upperCaseRun = 0;
        }
        else
        {
            upperCaseRun = 0;
        }
        sb.Append(c);
    }

    return sb.ToString();
}

Answer by Brantley Blanchard

Below is a prototype that converts the following to Title Case:

  • snake_case
  • camelCase
  • PascalCase
  • sentence case
  • Title Case (keep current formatting)

Obviously you would only need the "ToTitleCase" method yourself.

using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public class Program
{
    public static void Main()
    {
        var examples = new List<string> { 
            "THEQuickBrownFox",
            "theQUICKBrownFox",
            "TheQuickBrownFOX",
            "TheQuickBrownFox",
            "the_quick_brown_fox",
            "theFOX",
            "FOX",
            "QUICK"
        };

        foreach (var example in examples)
        {
            Console.WriteLine(ToTitleCase(example));
        }
    }

    private static string ToTitleCase(string example)
    {
        var fromSnakeCase = example.Replace("_", " ");
        var lowerToUpper = Regex.Replace(fromSnakeCase, @"(\p{Ll})(\p{Lu})", "$1 $2");
        var sentenceCase = Regex.Replace(lowerToUpper, @"(\p{Lu}+)(\p{Lu}\p{Ll})", "$1 $2");
        return new CultureInfo("en-US", false).TextInfo.ToTitleCase(sentenceCase);
    }
}

The console output would be as follows:

THE Quick Brown Fox
The QUICK Brown Fox
The Quick Brown FOX
The Quick Brown Fox
The Quick Brown Fox
The FOX
FOX
QUICK

Blog Post Referenced

Answer by Zar Shardan

Regex is about 10-12 times slower than a simple loop:

    public static string CamelCaseToSpaceSeparated(this string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str;
        }

        var res = new StringBuilder();

        res.Append(str[0]);
        for (var i = 1; i < str.Length; i++)
        {
            if (char.IsUpper(str[i]))
            {
                res.Append(' ');
            }
            res.Append(str[i]);

        }
        return res.ToString();
    }
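
The 10-12x figure is this answer's own measurement; the actual ratio will depend on the runtime, the regex options and the input. A rough way to check it yourself is sketched below. All names here (SplitTimingSketch, WithLoop, WithRegex, Time) are hypothetical, the loop is a compacted copy of the method above, and the regex variant is an assumed stand-in for "the regex approach", since the answer does not say which pattern was timed.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SplitTimingSketch // hypothetical name
{
    // Assumed regex variant: a capital that is not at a word boundary.
    private static readonly Regex BeforeCap = new Regex(@"(\B[A-Z])");

    public static string WithLoop(string str)
    {
        if (string.IsNullOrEmpty(str)) return str;
        var res = new StringBuilder(str.Length * 2);
        res.Append(str[0]);
        for (var i = 1; i < str.Length; i++)
        {
            if (char.IsUpper(str[i])) res.Append(' ');
            res.Append(str[i]);
        }
        return res.ToString();
    }

    public static string WithRegex(string str)
    {
        return BeforeCap.Replace(str, " $1");
    }

    // Returns the elapsed milliseconds for `iterations` calls of `split`.
    public static long Time(Func<string, string> split, string input, int iterations)
    {
        split(input); // warm-up call so one-time JIT cost is not included
        var sw = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++) split(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }
}
```

For example, compare SplitTimingSketch.Time(SplitTimingSketch.WithLoop, "ThisIsMyCapsDelimitedString", 1000000) with the same call using WithRegex. A Stopwatch loop like this is only a coarse measurement; a dedicated benchmarking tool would give more trustworthy numbers.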

Answer by Ferruccio

string s = "ThisIsMyCapsDelimitedString";
string t = Regex.Replace(s, "([A-Z])", " $1").Substring(1);
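
Substring(1) assumes the input is non-empty and starts with a capital letter: an empty string makes it throw ArgumentOutOfRangeException, and an input like "camelCase" loses its first character. A possible variant, sketched here under a hypothetical name, uses TrimStart so the leading space is removed only when there is one.

```csharp
using System.Text.RegularExpressions;

public static class LeadingSpaceSketch // hypothetical name
{
    public static string Separate(string s)
    {
        // Same replacement as above; TrimStart drops the leading space if present.
        return Regex.Replace(s, "([A-Z])", " $1").TrimStart(' ');
    }
}
```

One trade-off: TrimStart also removes any spaces the input itself started with.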