C#: Add spaces before capital letters

Question

Asked by Bob

Given the string "ThisStringHasNoSpacesButItDoesHaveCapitals", what is the best way to add spaces before the capital letters? The end string would be "This String Has No Spaces But It Does Have Capitals".

Here is my attempt with a RegEx:

System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0")

Answer 1

Accepted answer by Binary Worrier

The regexes will work fine (I even voted up Martin Brown's answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse).

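This function

string AddSpacesToSentence(string text, bool preserveAcronyms)
{
    if (string.IsNullOrWhiteSpace(text))
        return string.Empty;
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        if (char.IsUpper(text[i]))
            // Add a space before a capital that follows a non-space, non-capital
            // character, or (when preserving acronyms) before a capital that
            // follows a capital and is itself followed by a non-capital.
            if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
                (preserveAcronyms && char.IsUpper(text[i - 1]) &&
                 i < text.Length - 1 && !char.IsUpper(text[i + 1])))
                newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}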

will do it 100,000 times in 2,968,750 ticks; the regex takes 25,000,000 ticks (and that's with the regex compiled).

It's better, for a given value of better (i.e. faster); however, it's more code to maintain. "Better" is often a compromise between competing requirements.

Hope this helps :)

Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).

On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand-coded function 4,517,177 ticks and the regex below 59,435,719 ticks, so the hand-coded function runs in 7.6% of the time the regex takes.
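A comparison like the one above could be reproduced with a small harness along these lines. This is only a sketch under assumptions: the answer does not say which regex was timed, so the compiled `([a-z])([A-Z])` pattern from elsewhere in this thread is used here, and the class and method names (`SpacingBenchmark`, `Time`, `BuildInput`) are made up for illustration. Tick counts will vary by machine and runtime.

```csharp
using System;
using System.Diagnostics;
using System.Text;
using System.Text.RegularExpressions;

public static class SpacingBenchmark
{
    // Assumption: this is not necessarily the regex the answer timed.
    // It is built once so construction cost stays out of the timed loop.
    private static readonly Regex CapitalAfterLower =
        new Regex("([a-z])([A-Z])", RegexOptions.Compiled);

    // Hand-coded version: the simple method from this answer (no acronym handling).
    public static string AddSpacesByHand(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
            return "";
        var newText = new StringBuilder(text.Length * 2);
        newText.Append(text[0]);
        for (int i = 1; i < text.Length; i++)
        {
            if (char.IsUpper(text[i]) && text[i - 1] != ' ')
                newText.Append(' ');
            newText.Append(text[i]);
        }
        return newText.ToString();
    }

    // Regex version: insert a space between a lowercase letter and the capital after it.
    public static string AddSpacesByRegex(string text)
    {
        return CapitalAfterLower.Replace(text, "$1 $2");
    }

    // Returns the elapsed Stopwatch ticks for `iterations` calls of `convert` on `input`.
    public static long Time(Func<string, string> convert, string input, int iterations)
    {
        convert(input); // warm-up call so JIT compilation is not measured
        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            convert(input);
        watch.Stop();
        return watch.ElapsedTicks;
    }

    // Builds the test string: 'Abbbbbbbbb' repeated 100 times (1,000 characters).
    public static string BuildInput()
    {
        var sb = new StringBuilder(1000);
        for (int i = 0; i < 100; i++)
            sb.Append("Abbbbbbbbb");
        return sb.ToString();
    }
}
```

Calling `SpacingBenchmark.Time(SpacingBenchmark.AddSpacesByHand, input, 100000)` and the same with `AddSpacesByRegex` gives two tick counts to compare. On this particular input both functions return the same string, so the comparison is like for like.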

Update 2
Will it take acronyms into account? It will now! The logic of the if statement is fairly obscure, as you can see expanding it to this ...

if (char.IsUpper(text[i]))
    if (char.IsUpper(text[i - 1]))
        if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
            newText.Append(' ');
        else ;
    else if (text[i - 1] != ' ')
        newText.Append(' ');

... doesn't help at all!

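Here's the original simple method that doesn't worry about acronyms:

string AddSpacesToSentence(string text)
{
    if (string.IsNullOrWhiteSpace(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        // Add a space before every capital that is not already preceded by one.
        if (char.IsUpper(text[i]) && text[i - 1] != ' ')
            newText.Append(' ');
        newText.Append(text[i]);
    }
    return newText.ToString();
}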

Answer 2

Answer by Bill the Lizard

What you have works perfectly. Just remember to reassign value to the return value of this function.

value = System.Text.RegularExpressions.Regex.Replace(value, "[A-Z]", " $0");

Answer 3

Answer by Martin Brown

Your solution has an issue in that it puts a space before the first letter T, so you get

" This String..." instead of "This String..."

To get around this, look for the lower case letter preceding it as well and then insert the space in the middle:

newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");

Edit 1:

If you use @"(\p{Ll})(\p{Lu})" it will pick up accented characters as well.

Edit 2:

If your strings can contain acronyms you may want to use this:

newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");

So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible".
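To see how the Unicode-aware pattern from Edit 1 and the acronym-aware pattern differ, the two can be wrapped side by side as below. The patterns are the ones quoted in this answer. The class and method names (`CamelCaseRegexDemo`, `SplitSimple`, `SplitWithAcronyms`) are hypothetical and only there to make the sketch self-contained.

```csharp
using System.Text.RegularExpressions;

public static class CamelCaseRegexDemo
{
    // Edit 1 pattern: a lowercase letter followed by an uppercase letter,
    // rewritten with a space between the two captured characters.
    public static string SplitSimple(string value)
    {
        return Regex.Replace(value, @"(\p{Ll})(\p{Lu})", "$1 $2");
    }

    // Acronym-aware pattern: matches an uppercase letter preceded by a lowercase
    // letter, or an uppercase letter (not at the start) followed by a lowercase
    // letter, and puts a space in front of the whole match ($0).
    public static string SplitWithAcronyms(string value)
    {
        return Regex.Replace(
            value,
            @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))",
            " $0");
    }
}
```

For "DriveIsSCSICompatible", `SplitSimple` gives "Drive Is SCSICompatible" because no lowercase letter precedes the final "C", while `SplitWithAcronyms` gives "Drive Is SCSI Compatible". The second pattern also turns "FromAStart" into "From A Start".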

Answer 4

Answer by Cory Foy

Here's mine:

private string SplitCamelCase(string s)
{
    // Each match is one capital letter followed by any run of lowercase letters.
    Regex upperCaseRegex = new Regex(@"[A-Z]{1}[a-z]*");
    MatchCollection matches = upperCaseRegex.Matches(s);
    List<string> words = new List<string>();
    foreach (Match match in matches)
    {
        words.Add(match.Value);
    }
    return String.Join(" ", words.ToArray());
}

Answer 5

Answer by Richard Priddy

Binary Worrier, I have used your suggested code, and it is rather good. I have just one minor addition to it:

public static string AddSpacesToSentence(string text)
{
    if (string.IsNullOrEmpty(text))
        return "";
    StringBuilder newText = new StringBuilder(text.Length * 2);
    newText.Append(text[0]);
    for (int i = 1; i < text.Length; i++)
    {
        // A capital that follows a non-capital starts a new word.
        if (char.IsUpper(text[i]) && !char.IsUpper(text[i - 1]))
        {
            newText.Append(' ');
        }
        // Guard the look-ahead so text[i + 1] is never read past the end.
        else if (i < text.Length - 1)
        {
            if (char.IsUpper(text[i]) && !char.IsUpper(text[i + 1]))
                newText.Append(' ');
        }
        newText.Append(text[i]);
    }
    return newText.ToString();
}

I have added a condition !char.IsUpper(text[i - 1]). This fixed a bug that would cause something like 'AverageNOX' to be turned into 'Average N O X', which is obviously wrong, as it should read 'Average NOX'.

Sadly this still has the bug that if you have the text 'FromAStart', you would get 'From AStart' out.

Any thoughts on fixing this?

Answer 6

Answer by EtienneT

Didn't test performance, but here it is in one line with LINQ:

var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');

Answer 7

Answer by Justin Morgan

Make sure you aren't putting spaces at the beginning of the string, but you are putting them between consecutive capitals. Some of the answers here don't address one or both of those points. There are other ways than regex, but if you prefer to use that, try this:

Regex.Replace(value, @"\B[A-Z]", " $0")

The \B is a negated \b, so it represents a non-word-boundary. It means the pattern matches "Y" in XYzabc, but not in Yzabc or X Yzabc. As a little bonus, you can use this on a string with spaces in it and it won't double them.
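The three cases described above can be checked with a small wrapper around the same pattern. This is a minimal sketch; the names `NonBoundaryDemo` and `AddSpaces` are hypothetical and not from the answer.

```csharp
using System.Text.RegularExpressions;

public static class NonBoundaryDemo
{
    // \B succeeds only at a position that is NOT a word boundary. Since the next
    // character must be [A-Z] (a word character), the previous character must
    // also be a word character. A capital at the start of the string or right
    // after a space therefore never matches.
    public static string AddSpaces(string value)
    {
        return Regex.Replace(value, @"\B[A-Z]", " $0");
    }
}
```

With this, "XYzabc" becomes "X Yzabc", while "Yzabc" and "X Yzabc" come back unchanged. Note that consecutive capitals are separated too, so "AverageNOX" becomes "Average N O X".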

Answer 8

Answer by tchrist

Welcome to Unicode

All these solutions are essentially wrong for modern text. You need to use something that understands case. Since Bob asked for other languages, I'll give a couple for Perl.

I provide four solutions, ranging from worst to best. Only the best one is always right. The others have problems. Here is a test run to show you what works and what doesn't, and where. I've used underscores so that you can see where the spaces have been put, and I've marked as wrong anything that is, well, wrong.

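Testing TheLoneRanger
               Worst:    The_Lone_Ranger
               Ok:       The_Lone_Ranger
               Better:   The_Lone_Ranger
               Best:     The_Lone_Ranger
Testing MountMᶜKinleyNationalPark
     [WRONG]   Worst:    Mount_MᶜKinley_National_Park
     [WRONG]   Ok:       Mount_MᶜKinley_National_Park
     [WRONG]   Better:   Mount_MᶜKinley_National_Park
               Best:     Mount_Mᶜ_Kinley_National_Park
Testing ElÁlamoTejano
     [WRONG]   Worst:    ElÁlamo_Tejano
               Ok:       El_Álamo_Tejano
               Better:   El_Álamo_Tejano
               Best:     El_Álamo_Tejano
Testing TheÆvarArnfjörðBjarmason
     [WRONG]   Worst:    TheÆvar_ArnfjörðBjarmason
               Ok:       The_Ævar_Arnfjörð_Bjarmason
               Better:   The_Ævar_Arnfjörð_Bjarmason
               Best:     The_Ævar_Arnfjörð_Bjarmason
Testing IlCaffèMacchiato
     [WRONG]   Worst:    Il_CaffèMacchiato
               Ok:       Il_Caffè_Macchiato
               Better:   Il_Caffè_Macchiato
               Best:     Il_Caffè_Macchiato
Testing MisterǅenanǈubovićMister
     [WRONG]   Worst:    MisterǅenanǈubovićMister
     [WRONG]   Ok:       MisterǅenanǈubovićMister
               Better:   Mister_ǅenan_ǈubović_Mister
               Best:     Mister_ǅenan_ǈubović_Mister
Testing OleKingHenryⅧ
     [WRONG]   Worst:    Ole_King_HenryⅧ
     [WRONG]   Ok:       Ole_King_HenryⅧ
     [WRONG]   Better:   Ole_King_HenryⅧ
               Best:     Ole_King_Henry_Ⅷ
Testing CarlosⅤºElEmperador
     [WRONG]   Worst:    CarlosⅤºEl_Emperador
     [WRONG]   Ok:       CarlosⅤº_El_Emperador
     [WRONG]   Better:   CarlosⅤº_El_Emperador
               Best:     Carlos_Ⅴº_El_Emperador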

BTW, almost everyone here has selected the first way, the one marked "Worst". A few have selected the second way, marked "OK". But no one else before me has shown you how to do either the "Better" or the "Best" approach.

Here is the test program with its four methods:

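#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

# First I'll prove these are fine variable names:
my (
    $TheLoneRanger              ,
    $MountMᶜKinleyNationalPark  ,
    $ElÁlamoTejano              ,
    $TheÆvarArnfjörðBjarmason   ,
    $IlCaffèMacchiato           ,
    $MisterǅenanǈubovićMister   ,
    $OleKingHenryⅧ              ,
    $CarlosⅤºElEmperador        ,
);

# Now I'll load up some string with those values in them:
my @strings = qw{
    TheLoneRanger
    MountMᶜKinleyNationalPark
    ElÁlamoTejano
    TheÆvarArnfjörðBjarmason
    IlCaffèMacchiato
    MisterǅenanǈubovićMister
    OleKingHenryⅧ
    CarlosⅤºElEmperador
};

my($new, $best, $ok);
my $mask = "  %10s   %-8s  %s\n";

for my $old (@strings) {
    print "Testing $old\n";
    ($best = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;

    ($new = $old) =~ s/(?<=[a-z])(?=[A-Z])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Worst:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=\p{Lu})/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Ok:", $new;

    ($new = $old) =~ s/(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Better:", $new;

    ($new = $old) =~ s/(?<=\p{Lowercase})(?=[\p{Uppercase}\p{Lt}])/_/g;
    $ok = ($new ne $best) && "[WRONG]";
    printf $mask, $ok, "Best:", $new;
}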

When you can score the same as the "Best" on this dataset, you'll know you've done it correctly. Until then, you haven't. No one else here has done better than "Ok", and most have done it "Worst". I look forward to seeing someone post the correct C# code.

I notice that StackOverflow's highlighting code is miserably stoopid again. They're making all the same old lame mistakes that (most but not all) of the rest of the poor approaches mentioned here have made. Isn't it long past time to put ASCII to rest? It doesn't make sense anymore, and pretending it's all you have is simply wrong. It makes for bad code.
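As a partial response in C#, the "Better" pattern carries over directly, because .NET regular expressions support the general categories `\p{Ll}`, `\p{Lu}` and `\p{Lt}`. The sketch below is not the "Best" solution: as far as I know, .NET's regex engine does not accept the binary properties `\p{Lowercase}` and `\p{Uppercase}` that the Perl version relies on. The names `UnicodeCaseSplitter` and `Split` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeCaseSplitter
{
    // Zero-width position that comes after a lowercase letter (Ll) and before an
    // uppercase (Lu) or titlecase (Lt) letter. This mirrors the "Better" Perl
    // pattern, not the "Best" one.
    private static readonly Regex Boundary =
        new Regex(@"(?<=\p{Ll})(?=[\p{Lu}\p{Lt}])");

    // Inserts a single space at every boundary found.
    public static string Split(string value)
    {
        return Boundary.Replace(value, " ");
    }
}
```

This handles accented and titlecase letters, so "IlCaffèMacchiato" becomes "Il Caffè Macchiato". It still leaves "OleKingHenryⅧ" as "Ole King HenryⅧ", because the Roman numeral is a letter number (Nl) rather than an uppercase letter (Lu), which is exactly the gap between "Better" and "Best" described above.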

Answer 9

Answer by Randyaa

replaceAll("(?<=[^^\p{Uppercase}])(?=[\p{Uppercase}])"," ");

Answer 10

Answer by Daryl

In addition to Martin Brown's answer, I had an issue with numbers as well. For example, "Location2" or "Jan22" should be "Location 2" and "Jan 22" respectively.

Here is my regular expression for doing that, using Martin Brown's answer:

"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})"

Here are a couple of great sites for figuring out what each part means as well:

Java Based Regular Expression Analyzer (but works for most .NET regexes)

Action Script Based Analyzer

The above regex won't work on the Action Script site unless you replace all of the \p{Ll} with [a-z], the \p{Lu} with [A-Z], and the \p{Nd} with [0-9].
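The digit-aware pattern can be tried out with a wrapper like the one below. The pattern is the one given in this answer. The `" $0"` replacement string is carried over from Martin Brown's answer, and the names `LabelFormatter` and `AddSpaces` are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LabelFormatter
{
    // The first two alternatives are Martin Brown's acronym-aware pattern. The
    // third matches a digit that follows a letter, and the fourth matches an
    // uppercase letter that follows a digit.
    private const string Pattern =
        @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))|((?<=[\p{Ll}\p{Lu}])\p{Nd})|((?<=\p{Nd})\p{Lu})";

    // Puts one space in front of every match ($0 is the whole match).
    public static string AddSpaces(string value)
    {
        return Regex.Replace(value, Pattern, " $0");
    }
}
```

`LabelFormatter.AddSpaces("Location2")` gives "Location 2" and `LabelFormatter.AddSpaces("Jan22")` gives "Jan 22". Only the first digit of a run gets a space, since the second digit follows a digit rather than a letter.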
