Original URL: http://stackoverflow.com/questions/28982305/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Python: Find the longest word in a string
Asked by Mmt
I'm preparing for an exam but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length, but I appreciate your answers to the original question! It helps me learn more. Thank you.
For example: string = "Hello I like cookies". My program should then return "cookies" and the length 7.
Now the thing is that, for a full score, I am not allowed to use any function from the class String, and I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem), and the solution shouldn't have too many for and while statements. The strings contain only letters and blanks, and words are separated by a single blank.
Any suggestions? I'm lost, i.e. I don't have any code yet.
Thanks.
EDIT: I'm sorry, I misread the exam question. You only have to return the length of the longest word it seems, not the length + the word.
EDIT2: Okay, with your help I think I'm onto something...
def longestword(x):
    alist = []
    length = 0
    for letter in x:
        if letter != " ":
            length += 1
        else:
            alist.append(length)
            length = 0
    return alist
But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
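The fix described above might look roughly like the sketch below. The asker did not post the corrected code, so this is only one plausible reading of it: the function name longest_word_length is made up here, and the extra append after the loop is an assumption about how the last word was recorded.

```python
def longest_word_length(sentence):
    """Return the length of the longest blank-separated word (hypothetical helper)."""
    lengths = []
    length = 0
    for letter in sentence:
        if letter != " ":
            length += 1
        else:
            # A blank ends the current word: store its length and start over.
            lengths.append(length)
            length = 0
    # The last word is not followed by a blank, so record it after the loop.
    lengths.append(length)
    return max(lengths)


print(longest_word_length("Hello I like cookies"))  # 7
```

The list is only needed to reproduce the [5, 1, 4, 7] result from the question; keeping a running maximum instead would avoid storing every length.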
I suppose using lists but not .split() is okay? The question only said that functions from "String" weren't allowed. Or do lists count as part of strings?
Accepted answer by Francis Colas
Finding a max in one pass is easy:
current_max = 0
for v in values:
    if v > current_max:
        current_max = v
But in your case, you need to find the words. Remember this quote (attributed to J. Zawinski):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Instead of using regular expressions, you can simply check whether each character is a letter. A first approach is to go through the string character by character and detect the start or end of each word:
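import string

# mystring is assumed to hold the sentence to scan
current_word = ''
current_longest = ''
for c in mystring:
    if c in string.ascii_letters:
        current_word += c
    else:
        # a non-letter ends the current word
        if len(current_word) > len(current_longest):
            current_longest = current_word
        current_word = ''
else:
    # the loop finished: check the last word, which has no trailing blank
    if len(current_word) > len(current_longest):
        current_longest = current_word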
A final way is to split words in a generator and find the max of what it yields (here I used the max function):
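import string

def split_words(mystring):
    current = []
    for c in mystring:
        if c in string.ascii_letters:
            current.append(c)
        else:
            if current:
                yield ''.join(current)
                current = []  # start collecting the next word
    if current:
        # yield the last word, which has no trailing separator
        yield ''.join(current)

max(split_words(mystring), key=len)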
Answer by Jon Surrell
I can imagine a few different alternatives. Regular expressions can probably do much of the word splitting you need. This could be a simple option if you understand regexes.
An alternative is to treat the string as a list: iterate over it while keeping track of your index, and look at each character to see whether you're ending a word. Then you just need to keep the longest word (the largest index difference) and you have your answer.
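A minimal sketch of this index-tracking idea follows. It is an illustration rather than code from the answer: the function name longest_word_by_index is hypothetical, it assumes words are separated by single blanks as the question states, and it uses slicing once at the end to extract the word.

```python
def longest_word_by_index(sentence):
    """Return (word, length) of the longest blank-separated word (hypothetical helper)."""
    best_start, best_end = 0, 0  # index range of the longest word seen so far
    start = 0                    # index where the current word begins
    for i, ch in enumerate(sentence):
        if ch == " ":
            # The current word spans sentence[start:i].
            if i - start > best_end - best_start:
                best_start, best_end = start, i
            start = i + 1
    # The last word ends at the end of the string, not at a blank.
    if len(sentence) - start > best_end - best_start:
        best_start, best_end = start, len(sentence)
    return sentence[best_start:best_end], best_end - best_start


print(longest_word_by_index("Hello I like cookies"))  # ('cookies', 7)
```

Because the comparison is strict, the first of several equally long words is kept.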
Answer by Alexandre
You can try to use regular expressions:
import re

string = "Hello I like cookies"
word_pattern = r"\w+"  # raw string, so \w reaches the regex engine unchanged
regex = re.compile(word_pattern)
words_found = regex.findall(string)
if words_found:
    longest_word = max(words_found, key=len)
    print(longest_word)
Answer by Omid
Regular expressions seem to be your best bet. First use re to split the sentence:
>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)
\S+ matches each run of non-whitespace characters, and findall puts the matches in a list:
>>> string
['Hello', 'I', 'like', 'cookies']
Now you can find the length of the longest word in the list and then use a list comprehension to retrieve the word itself:
>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']
Answer by Brionius
This method uses only one for loop, doesn't use any methods of the String class, and accesses each character strictly once. You may have to modify it depending on which characters count as part of a word.
s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s + ' ':  # the appended blank flushes the last word
    if c == ' ':
        if len(word) > maxLen:
            maxLen = len(word)  # remember the new record length
            maxWord = word
        word = ''
    else:
        word += c
print("Longest word:", maxWord)
print("Length:", len(maxWord))
Answer by Malik Brahimi
Just search for groups of non-whitespace characters, then find the maximum by length:
import re
longest = len(max(re.findall(r'\S+', string), key=len))
Answer by spectras
Given that you are not allowed to use string.split(), I guess using a regexp to do the exact same thing should be ruled out as well.
I do not want to solve your exercise for you, but here are a few pointers:
- Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?
- Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?
- Now, you only have to intertwine both logics so computed word lengths are compared as you go through the string.
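One way the pointers above could fit together is sketched below. This is not the answerer's code (the answer deliberately gives none); the function name max_word_length is hypothetical, and the sketch assumes, as the question states, that the string contains only letters and single blanks.

```python
def max_word_length(sentence):
    """Return the length of the longest word in one pass (hypothetical helper)."""
    longest = 0  # best word length seen so far
    current = 0  # length of the word currently being read
    for ch in sentence:
        if ch == " ":
            current = 0  # a blank starts a new word
        else:
            current += 1
            # Compare while counting, so the last word needs no special case.
            if current > longest:
                longest = current
    return longest


print(max_word_length("Hello I like cookies"))  # 7
```

Updating the maximum on every letter, rather than at each blank, is a design choice that removes the need to handle the end of the string separately.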
Answer by Jerome Vacher
My proposal:
import re

def longer_word(sentence):
    word_list = re.findall(r"\w+", sentence)
    # longest first; Python 3 replacement for sort(cmp=...)
    word_list.sort(key=len, reverse=True)
    longer_word = word_list[0]
    print("The longer word is '" + longer_word + "' with a size of", len(longer_word), "characters.")

longer_word("Hello I like cookies")
Answer by Nishant Goutham kumar
It's quite simple:
def long_word(s):
    # key=len compares by length; plain max() would compare alphabetically
    n = max(s.split(), key=len)
    return n
In [48]: long_word('a bb ccc dddd')
Out[48]: 'dddd'
Answer by Aneesh Kumar
For Python 3. If several words in the sentence share the maximum length, it returns the one that appears first.
def findMaximum(word):
    li = word.split()
    op = []
    for i in li:
        op.append(len(i))
    l = op.index(max(op))
    print(li[l])

findMaximum(input("Enter your word:"))