Does anyone have a good Proper Case algorithm?


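Does anyone have a trusted Proper Case or PCase algorithm (similar to UCase or Upper)? I'm looking for something that takes a value such as "GEORGE BURDELL" or "george burdell" and turns it into "George Burdell".

I have a simple one that handles the simple cases. Ideally I would have something that can also handle "O'REILLY" and turn it into "O'Reilly", but I know that is harder.

I am mainly focused on English, if that simplifies things.

Update: I'm using C# as the language, but I can convert from almost anything (assuming similar functionality exists).

I agree that the McDonald's scenario is a tough one. I meant to mention it along with my O'Reilly example, but did not in the original post.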

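Solutions

Answer

What programming language are you using? Many languages allow a callback function for regular expression matches, which makes it easy to transform each match. The regular expression needed is quite simple; you just have to match every run of word characters, like so: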

/\w+/

Alternatively, you can capture the first character as a separate group:

/(\w)(\w*)/

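Now you can access the first character and the remaining characters of each match separately. The callback function then simply returns the two parts concatenated, with the first one uppercased. In pseudo-Python (I don't actually know Python):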

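def make_proper(match):
    # group 1 is the first character, group 2 the rest (lowercased so all-caps input works)
    return match.group(1).upper() + match.group(2).lower()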

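Incidentally, this also handles the O'Reilly case, because "O" and "Reilly" are matched separately and both get capitalized. There are, however, other special cases the algorithm does not handle well, e.g. McDonald's, or more generally any word with an apostrophe followed by a suffix. For the latter the algorithm produces Mcdonald'S. Special handling for the apostrophe could be added, but it would interfere with the first case. A theoretically perfect solution is not possible. In practice, it may help to look at the length of the part after the apostrophe.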
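The regex-callback idea above, plus the "length of the part after the apostrophe" heuristic, can be sketched in real Python roughly as follows. The function names and the `min_len=3` threshold are assumptions made for illustration; they are not part of the answer, and the threshold is a guess rather than a rule.

```python
import re

# One word = first character (group 1) plus the remaining characters (group 2)
WORD = re.compile(r"(\w)(\w*)")


def proper_case(text):
    """Capitalize every run of word characters, as the answer describes."""
    def fix(match):
        return match.group(1).upper() + match.group(2).lower()
    return WORD.sub(fix, text)


def proper_case_apostrophe(text, min_len=3):
    """Same, but leave short fragments after an apostrophe in lower case.

    min_len is a hypothetical cut-off: "s" in "McDonald's" is shorter than 3
    characters and stays lower case, "Reilly" in "O'Reilly" is not.
    """
    def fix(match):
        start = match.start()
        word = match.group(0)
        after_apostrophe = (
            start >= 2
            and text[start - 1] in "'\u2019"
            and text[start - 2].isalnum()
        )
        if after_apostrophe and len(word) < min_len:
            return word.lower()
        return match.group(1).upper() + match.group(2).lower()
    return WORD.sub(fix, text)


print(proper_case("GEORGE BURDELL"))
print(proper_case("mcdonald's"))
print(proper_case_apostrophe("MCDONALD'S"))
print(proper_case_apostrophe("O'REILLY"))
```

Neither variant restores the internal capital in McDonald; that still needs a separate rule or a lookup table.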

Answer

A simple way to capitalize the first letter of each word (separated by spaces):

$words = explode(" ", $string);
$result = "";
for ($i = 0; $i < count($words); $i++) {
    // Lowercase the whole word, then uppercase its first character
    $s = strtolower($words[$i]);
    $s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
    $result .= "$s ";
}
$string = trim($result);

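As for catching the "O'REILLY" example you gave, splitting the string on both spaces and ' would not work, because it would capitalize any letter that appears after an apostrophe, e.g. the s in Fred's.

So I would probably try something like: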

$words = explode(" ", $string);
$result = "";
for ($i = 0; $i < count($words); $i++) {
    $s = strtolower($words[$i]);
    if (substr($s, 0, 2) === "o'") {
        // Uppercase the "o", the apostrophe, and the letter that follows it
        $s = substr_replace($s, strtoupper(substr($s, 0, 3)), 0, 3);
    } else {
        $s = substr_replace($s, strtoupper(substr($s, 0, 1)), 0, 1);
    }
    $result .= "$s ";
}
$string = trim($result);

This should catch O'Reilly, O'Clock, O'Donnell, etc. Hope it helps.

Please note that this code is untested.
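For comparison, here is a minimal Python sketch of the same space-splitting approach with the "o'" special case. The function name `proper_case_words` is made up for this example. Unlike the PHP version it splits on any run of whitespace, so repeated spaces are collapsed.

```python
def proper_case_words(text):
    """Capitalize each whitespace-separated word, special-casing an "o'" prefix."""
    words = []
    for word in text.split():
        word = word.lower()
        if word.startswith("o'") and len(word) > 2:
            # Uppercase the O and the letter right after the apostrophe
            word = "O'" + word[2].upper() + word[3:]
        else:
            word = word[:1].upper() + word[1:]
        words.append(word)
    return " ".join(words)


print(proper_case_words("GEORGE BURDELL"))
print(proper_case_words("FRED'S o'clock"))
```

Because only a leading "o'" is special-cased, the s in Fred's stays lower case, but names such as D'Angelo would need their own rule.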

Answer

You did not mention which language you want the solution in, so here is some pseudocode.

Loop through each character
    If the previous character was an alphabet letter
        Make the character lower case
    Otherwise
        Make the character upper case
End loop
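The pseudocode above translates almost line for line into Python. This is only a sketch, and `proper_case_chars` is a name chosen for the example:

```python
def proper_case_chars(text):
    """Upper-case a character unless the previous character was a letter."""
    out = []
    prev_is_letter = False
    for ch in text:
        out.append(ch.lower() if prev_is_letter else ch.upper())
        prev_is_letter = ch.isalpha()
    return "".join(out)


print(proper_case_chars("GEORGE BURDELL"))
print(proper_case_chars("o'reilly"))
print(proper_case_chars("fred's"))
```

Since an apostrophe is not a letter, this handles O'Reilly but also capitalizes the s in Fred's, the same trade-off discussed in the other answers.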

Answer

There is also this neat Perl script for title-casing text.

http://daringfireball.net/2008/08/title_case_update

#!/usr/bin/perl

#     This filter changes all words to Title Caps, and attempts to be clever
# about *un*capitalizing small words like a/an/the in the input.
#
# The list of "small words" which are not capped comes from
# the New York Times Manual of Style, plus 'vs' and 'v'. 
#
# 10 May 2008
# Original version by John Gruber:
# http://daringfireball.net/2008/05/title_case
#
# 28 July 2008
# Re-written and much improved by Aristotle Pagaltzis:
# http://plasmasturm.org/code/titlecase/
#
#   Full change log at __END__.
#
# License: http://www.opensource.org/licenses/mit-license.php
#

use strict;
use warnings;
use utf8;
use open qw( :encoding(UTF-8) :std );

my @small_words = qw( (?<!q&)a an and as at(?!&t) but by en for if in of on or the to v[.]? via vs[.]? );
my $small_re = join '|', @small_words;

my $apos = qr/ (?: ['’] [[:lower:]]* )? /x;

while ( <> ) {
  s{\A\s+}{}, s{\s+\z}{};

  $_ = lc $_ if not /[[:lower:]]/;

  s{
      \b (_*) (?:
          ( (?<=[ ][/\\]) [[:alpha:]]+ [-_[:alpha:]/\\]+ |   # file path or
            [-_[:alpha:]]+ [@.:] [-_[:alpha:]@.:/]+ $apos )  # URL, domain, or email
          |
          ( (?i: $small_re ) $apos )                         # or small word (case-insensitive)
          |
          ( [[:alpha:]] [[:lower:]'’()\[\]{}]* $apos )       # or word w/o internal caps
          |
          ( [[:alpha:]] [[:alpha:]'’()\[\]{}]* $apos )       # or some other word
      ) (_*) \b
  }{
      $1 . (
        defined $2 ? $2         # preserve URL, domain, or email
      : defined $3 ? "\L$3"     # lowercase small word
      : defined $4 ? "\u\L$4"   # capitalize word w/o internal caps
      : $5                      # preserve other kinds of word
      ) . $6
  }xeg;

  # Exceptions for small words: capitalize at start and end of title
  s{
      (  \A [[:punct:]]*         # start of title...
      |  [:.;?!][ ]+             # or of subsentence...
      |  [ ]['"“‘(\[][ ]*     )  # or of inserted subphrase...
      ( $small_re ) \b           # ... followed by small word
  }{$1\u\L$2}xig;

  s{
      \b ( $small_re )      # small word...
      (?= [[:punct:]]* \Z   # ... at the end of the title...
      |   ['"’”)\]] [ ] )   # ... or of an inserted subphrase?
  }{\u\L$1}xig;

  # Exceptions for small words in hyphenated compound words
  ## e.g. "in-flight" -> In-Flight
  s{
      \b
      (?<! -)                 # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (in-flight)
      ( $small_re )
      (?= -[[:alpha:]]+)      # lookahead for "-someword"
  }{\u\L$1}xig;

  ## # e.g. "Stand-in" -> "Stand-In" (Stand is already capped at this point)
  s{
      \b
      (?<!…)                  # Negative lookbehind for a hyphen; we don't want to match man-in-the-middle but do want (stand-in)
      ( [[:alpha:]]+- )       # $1 = first word and hyphen, should already be properly capped
      ( $small_re )           # ... followed by small word
      (?! - )                 # Negative lookahead for another '-'
  }{$1\u$2}xig;

  print "$_";
}

__END__

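But it sounds like by proper case you mean... for people's names only.

Answer

Unless I've misunderstood your question, I don't think you need to roll your own; the TextInfo class can do it for you.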

using System.Globalization;

CultureInfo.InvariantCulture.TextInfo.ToTitleCase("GeOrGE bUrdEll")

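This will return "George Burdell". If some special rules are involved, you can use your own culture.

Update: Michael (in a comment on this answer) pointed out that this will not work if the input is all caps, since the method will assume it is an acronym. The naive workaround is to call .ToLower() on the text before passing it to ToTitleCase.

Answer

This might be a naive implementation: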

public class ProperCaseHelper {
  public string ToProperCase(string input) {
    string ret = string.Empty;

    var words = input.Split(' ');

    for (int i = 0; i < words.Length; ++i) {
      ret += wordToProperCase(words[i]);
      if (i < words.Length - 1) ret += " ";
    }

    return ret;
  }

  private string wordToProperCase(string word) {
    if (string.IsNullOrEmpty(word)) return word;

    // Standard case
    string ret = capitaliseFirstLetter(word);

    // Special cases:
    ret = properSuffix(ret, "'");
    ret = properSuffix(ret, ".");
    ret = properSuffix(ret, "Mc");
    ret = properSuffix(ret, "Mac");

    return ret;
  }

  private string properSuffix(string word, string prefix) {
    if(string.IsNullOrEmpty(word)) return word;

    string lowerWord = word.ToLower(), lowerPrefix = prefix.ToLower();
    if (!lowerWord.Contains(lowerPrefix)) return word;

    int index = lowerWord.IndexOf(lowerPrefix);

    // If the search string is at the end of the word ignore.
    if (index + prefix.Length == word.Length) return word;

    return word.Substring(0, index) + prefix +
      capitaliseFirstLetter(word.Substring(index + prefix.Length));
  }

  private string capitaliseFirstLetter(string word) {
    return char.ToUpper(word[0]) + word.Substring(1).ToLower();
  }
}

Answer

Kronoz, thank you. I found that in your function the line:

if (!lowerWord.Contains(lowerPrefix)) return word;

should read

if (!lowerWord.StartsWith(lowerPrefix)) return word;

so that "información" is not changed to "InforMacIón".

Best,

Enrique
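The difference between the Contains check and the StartsWith check can be sketched in Python as below. The helper names and the `anywhere` flag are assumptions for illustration and do not appear in either answer. The flag is there because applying StartsWith to every rule would appear to stop the "'" rule from matching inside O'Reilly, where the apostrophe is not at the start of the word.

```python
def capitalise_first(word):
    """Upper-case the first character and lower-case the rest."""
    return word[:1].upper() + word[1:].lower()


def apply_prefix(word, prefix, anywhere=False):
    """Capitalize the letter that follows prefix.

    anywhere=True mimics the original Contains check; the default mimics
    the StartsWith fix and only matches at the start of the word.
    """
    lower_word, lower_prefix = word.lower(), prefix.lower()
    if anywhere:
        index = lower_word.find(lower_prefix)
    else:
        index = 0 if lower_word.startswith(lower_prefix) else -1
    # No match, or the prefix sits at the very end of the word: leave as is
    if index < 0 or index + len(prefix) == len(word):
        return word
    rest = word[index + len(prefix):]
    return word[:index] + prefix + capitalise_first(rest)


print(apply_prefix("Información", "Mac", anywhere=True))
print(apply_prefix("Información", "Mac"))
print(apply_prefix("Mcdonald", "Mc"))
print(apply_prefix("O'reilly", "'", anywhere=True))
```

Even with the start-of-word check, a blanket "Mac" rule still turns Macy into MacY, so a list of exceptions is probably needed for real data.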

Answer

I use this as the TextChanged event handler of text boxes. It supports entry of "McDonald".

Public Shared Function DoProperCaseConvert(ByVal str As String, Optional ByVal allowCapital As Boolean = True) As String
    Dim strCon As String = ""
    Dim wordbreak As String = " ,.1234567890;/\-()#$%^&*!~+=@"
    Dim nextShouldBeCapital As Boolean = True

    'Improve to recognize all caps input
    'If str.Equals(str.ToUpper) Then
    '    str = str.ToLower
    'End If

    For Each s As Char In str.ToCharArray

        If allowCapital Then
            ' Keep any capitals the user typed (e.g. the D in McDonald)
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s.ToString)
        Else
            strCon = strCon & If(nextShouldBeCapital, s.ToString.ToUpper, s.ToString.ToLower)
        End If

        ' A word-break character means the next character starts a new word
        nextShouldBeCapital = wordbreak.Contains(s.ToString)
    Next

    Return strCon
End Function