Python: Titlecasing a string with exceptions

Disclaimer: this page is a Chinese-English parallel translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must also follow CC BY-SA, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/3728655/

Time: 2020-08-18 12:27:10  Source: igfitidea

Titlecasing a string with exceptions

python, string, title-case

Asked by yassin

Is there a standard way in Python to titlecase a string (i.e. words start with uppercase characters, all remaining cased characters are lowercase) but leaving articles like and, in, and of lowercased?


Accepted answer by dheerosaur

There are a few problems with this. If you use split() without an argument and then join(), runs of whitespace are collapsed into single spaces. The built-in capitalize() and title() methods preserve whitespace.


>>> 'There     is a way'.title()
'There     Is A Way'
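To see the whitespace problem concretely, the sketch below compares str.split() with no argument, which treats any run of whitespace as one separator, against split(' '), which splits on every single space and keeps empty strings for the extra ones:

```python
s = 'There     is a way'

# split() with no argument collapses runs of whitespace,
# so joining the pieces back with ' ' loses the extra spaces.
collapsed = ' '.join(s.split())

# split(' ') produces empty strings between consecutive spaces,
# so ' '.join() restores the original spacing exactly.
preserved = ' '.join(s.split(' '))

print(repr(collapsed))  # 'There is a way'
print(repr(preserved))  # 'There     is a way'
```

This is why the code below splits on a single space rather than calling split() bare.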

If a sentence starts with an article, you still want the first word of the title capitalized.


Keeping these in mind:


import re

def title_except(s, exceptions):
    # Split on single spaces so runs of spaces survive as empty strings.
    word_list = re.split(' ', s)
    # Always capitalize the first word, even if it is an exception.
    final = [word_list[0].capitalize()]
    for word in word_list[1:]:
        final.append(word if word in exceptions else word.capitalize())
    return " ".join(final)

articles = ['a', 'an', 'of', 'the', 'is']
print(title_except('there is a    way', articles))
# There is a    Way
print(title_except('a whim   of an elephant', articles))
# A Whim   of an Elephant

Answer by nosklo

There are these methods:


>>> mytext = 'i am a foobar bazbar'
>>> print(mytext.capitalize())
I am a foobar bazbar
>>> print(mytext.title())
I Am A Foobar Bazbar

Neither method has an option to keep articles lowercase. You'd have to code that yourself, probably by using a list of the articles you want to keep in lowercase.
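A minimal sketch of that do-it-yourself approach follows. The helper name title_with_exceptions and the SMALL_WORDS set are hypothetical, chosen for illustration. It uses re.sub with a replacement function so that the original whitespace is never touched, and it compares words case-insensitively so that an input word like "Of" is lowered too:

```python
import re

# Illustrative set of words to keep lowercase (an assumption; adjust as needed).
SMALL_WORDS = {'a', 'an', 'and', 'in', 'of', 'the'}


def title_with_exceptions(text, small_words=SMALL_WORDS):
    """Hypothetical helper: title-case `text`, keeping small words lowercase.

    Only runs of letters (with an optional apostrophe part, as in "don't")
    are rewritten, so spacing and punctuation pass through unchanged.
    """
    is_first = True

    def fix(match):
        nonlocal is_first
        word = match.group(0)
        first, is_first = is_first, False
        # The first word is capitalized even if it is a small word.
        if not first and word.lower() in small_words:
            return word.lower()
        return word.capitalize()

    # re.sub calls fix() for each match, left to right.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?", fix, text)


print(title_with_exceptions('i am a foobar bazbar'))  # I Am a Foobar Bazbar
```

Because str.capitalize() lowercases the rest of each word, acronyms such as "CDC" would come out as "Cdc" with this sketch.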


Answer by pyfunc

capitalize(word)

This should do it, though I read the question differently.


>>> mytext = 'i am a foobar bazbar'
>>> mytext.capitalize()
'I am a foobar bazbar'

OK, as said in the reply above, you have to write a custom capitalize function:


mytext = 'i am a foobar bazbar'

def xcapitalize(word):
    # Words that should stay lowercase
    skipList = ['a', 'an', 'the', 'am']
    if word not in skipList:
        return word.capitalize()
    return word

k = mytext.split(" ")
l = map(xcapitalize, k)
print(" ".join(l))

This outputs


I am a Foobar Bazbar

Answer by Tony Veijalainen

not_these = ['a', 'the', 'of']
thestring = 'the secret of a disappointed programmer'
print(' '.join(word
               if word in not_these
               else word.title()
               for word in thestring.capitalize().split(' ')))
# Output:
# The Secret of a Disappointed Programmer

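Because the whole string is passed through capitalize() first, the title starts with the capitalized word "The", which does not match the lowercase article "the" in not_these, so the first word stays capitalized.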


Answer by Etienne

Use the titlecase.py module! It works only for English.


>>> from titlecase import titlecase
>>> titlecase('i am a foobar bazbar')
'I Am a Foobar Bazbar'

GitHub: https://github.com/ppannuto/python-titlecase


Answer by BioGeek

Stuart Colville has made a Python port of a Perl script written by John Gruber that converts strings into title case while avoiding capitalizing small words, based on rules from the New York Times Manual of Style, as well as catering for several special cases.


Some of the cleverness of these scripts:


  • they keep small words like if, in, of, on, etc. lowercase, and will un-capitalize them if they're erroneously capitalized in the input.

  • the scripts assume that words with capitalized letters other than the first character are already correctly capitalized. This means they will leave a word like “iTunes” alone, rather than mangling it into “ITunes” or, worse, “Itunes”.

  • they skip over any words with inline dots; “example.com” and “del.icio.us” will remain lowercase.

  • they have hard-coded hacks specifically to deal with odd cases, like “AT&T” and “Q&A”, both of which contain small words (at and a) which normally should be lowercase.

  • The first and last word of the title are always capitalized, so input such as “Nothing to be afraid of” will be turned into “Nothing to Be Afraid Of”.

  • A small word after a colon will be capitalized.
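To make a few of the rules above concrete, here is a small sketch. It is not the Gruber/Colville code: the function name simple_title and the SMALL word set are hypothetical, and only some of the listed rules are covered (inner capitals, inline dots, un-capitalizing small words, first/last word, word after a colon):

```python
import re

# Illustrative subset of small words (an assumption, not the scripts' exact list).
SMALL = {'a', 'an', 'and', 'as', 'at', 'but', 'by', 'for', 'if', 'in',
         'of', 'on', 'or', 'the', 'to', 'via', 'vs'}


def simple_title(text):
    """Hypothetical sketch of a few of the rules above (not Gruber's code).

    - words with inner capitals (iTunes, AT&T) or inline dots
      (example.com) are left alone;
    - small words are lowercased, even if the input capitalized them;
    - the first word, the last word and a word after a colon are
      always capitalized.
    """
    # Keep the whitespace runs as separate items so spacing is preserved.
    parts = re.split(r'(\s+)', text.strip())
    words = parts[0::2]  # words sit at the even indices
    result = []
    after_colon = False
    for i, word in enumerate(words):
        edge = i == 0 or i == len(words) - 1
        if re.search(r'\w\.\w', word) or any(c.isupper() for c in word[1:]):
            new = word  # assume it is already cased correctly
        elif word.lower() in SMALL and not (edge or after_colon):
            new = word.lower()
        else:
            new = word[:1].upper() + word[1:]
        result.append(new)
        after_colon = word.endswith(':')
    parts[0::2] = result
    return ''.join(parts)


print(simple_title('Nothing To Be Afraid Of'))  # Nothing to Be Afraid Of
```

Treating any word with an inner capital as already correct is what keeps “iTunes” and “AT&T” intact, at the cost of also leaving an all-caps small word such as “OF” untouched.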


You can download it here.


Answer by boatcoder

Python 2.7's title method has a flaw in it (str.title() in Python 3 behaves the same way).


value.title()

will return Carpenter'S Assistant when value is Carpenter's Assistant
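The flaw comes from str.title() treating every non-letter, including the apostrophe, as a word boundary. Two standard-library workarounds are sketched below: string.capwords(), which splits on whitespace (and therefore collapses runs of spaces), and a regular-expression version adapted from the recipe in the Python documentation for str.title(). The name titlecase_apostrophes is hypothetical:

```python
import re
import string

value = "carpenter's assistant"

print(value.title())           # Carpenter'S Assistant
print(string.capwords(value))  # Carpenter's Assistant


def titlecase_apostrophes(s):
    # Treat a run of letters plus an optional 'suffix as a single word.
    return re.sub(r"[A-Za-z]+('[A-Za-z]+)?",
                  lambda mo: mo.group(0).capitalize(),
                  s)


print(titlecase_apostrophes(value))  # Carpenter's Assistant
```

Neither workaround knows about small words, so they would still need to be combined with an exception list.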


The best solution is probably the one from @BioGeek using titlecase from Stuart Colville, which is the same solution proposed by @Etienne.


Answer by user7297223

One-liner using list comprehension and the ternary operator


result = " ".join([word.title() if word not in {"the", "a", "on", "in", "of", "an"} else word for word in "Wow, a python one liner for titles".split(" ")])
print(result)

Breakdown:

for word in "Wow, a python one liner for titles".split(" ") splits the string into a list and starts a for loop (inside the list comprehension).

word.title() if word not in {"the", "a", "on", "in", "of", "an"} else word uses the built-in title() method to title-case the word if it is not an article. The exceptions are kept in a set: testing membership against the plain string "the a on in of an" would be a substring check, so a word such as "he" or "o" would wrongly be treated as an article.

" ".join joins the list elements with a single space as the separator.

Answer by August West

One important case that is not being considered is acronyms (the python-titlecase solution can handle acronyms if you explicitly provide them as exceptions). I prefer instead to simply avoid down-casing. With this approach, acronyms that are already upper case remain in upper case. The following code is a modification of that originally provided by dheerosaur.


# This is an attempt to provide an alternative to ''.title() that works with 
# acronyms.
# There are several tricky cases to worry about in typical order of importance:
# 0. Upper case first letter of each word that is not a 'minor' word.
# 1. Always upper case first word.
# 2. Do not down case acronyms
# 3. Quotes
# 4. Hyphenated words: drive-in
# 5. Titles within titles: 2001 A Space Odyssey
# 6. Maintain leading spacing
# 7. Maintain given spacing: This is a test.  This is only a test.

# The following code addresses 0-3 & 7.  It was felt that addressing the others 
# would add considerable complexity.


def titlecase(
    s,
    exceptions = (
        'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
        'for', 'in', 'of', 'on', 'per', 'to'
    )
):
    # Remove leading and trailing spaces (needed for first-word casing), then
    # split on single spaces to maintain the given word spacing.
    words = s.strip().split(' ')

    def upper(s):
        if s:
            if s[0] in '‘“"??' + "'":
                return s[0] + upper(s[1:])
            return s[0].upper() + s[1:]
        return ''

    # always capitalize the first word
    first = upper(words[0])

    return ' '.join([first] + [
        word if word.lower() in exceptions else upper(word)
        for word in words[1:]
    ])


cases = '''
    CDC warns about "aggressive" rats as  shuts down restaurants
    L.A. County opens churches, stores, pools, drive-in theaters
    UConn senior accused of killing two men was looking for young woman
    Giant asteroid that killed the dinosaurs slammed into Earth at ‘deadliest possible angle,' study reveals
    Maintain given spacing: This is a test.  This is only a test.
'''.strip().splitlines()

for case in cases:
    print(titlecase(case))

When run, it produces the following:


CDC Warns About "Aggressive" Rats as  Shuts Down Restaurants
L.A. County Opens Churches, Stores, Pools, Drive-in Theaters
UConn Senior Accused of Killing Two Men Was Looking for Young Woman
Giant Asteroid That Killed the Dinosaurs Slammed Into Earth at ‘Deadliest Possible Angle,' Study Reveals
Maintain Given Spacing: This Is a Test.  This Is Only a Test.
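Item 4 in the code comments (hyphenated words such as "drive-in") is left unaddressed by this answer. One possible extension, sketched here as an assumption rather than as part of the answer, is to apply the same "uppercase the first letter only" rule to each hyphen-separated part. The helpers upper_first and upper_hyphenated are hypothetical, and keeping minor words lowercase after a hyphen is just one common convention:

```python
# Illustrative minor-word set, mirroring the answer's exceptions tuple.
MINOR = {'and', 'or', 'nor', 'but', 'a', 'an', 'the', 'as', 'at', 'by',
         'for', 'in', 'of', 'on', 'per', 'to'}


def upper_first(s):
    # Uppercase only the first character so existing capitals (acronyms) survive.
    return s[:1].upper() + s[1:]


def upper_hyphenated(word, minor=MINOR):
    """Hypothetical helper: capitalize each hyphen-separated part of `word`.

    The first part is always capitalized; later parts that are minor
    words stay lowercase.
    """
    head, *rest = word.split('-')
    parts = [upper_first(head)]
    parts += [p if p.lower() in minor else upper_first(p) for p in rest]
    return '-'.join(parts)


for w in ['long-term', 'self-driving', 'drive-in', 'COVID-related']:
    print(upper_hyphenated(w))
```

To use it inside the answer's titlecase function, one could call it in place of upper(word) for non-exception words, although the leading-quote handling in upper would then have to be merged in as well.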