Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/17268958/

Time: 2020-08-19 00:55:20  Source: igfitidea

Finding occurrences of a word in a string in Python 3

python string count match

Asked by lost9123193

I'm trying to find the number of occurrences of a word in a string.

word = "dog"
str1 = "the dogs barked"

I used the following to count the occurrences:

count = str1.count(word)

The issue is I want an exact match. So the count for this sentence would be 0. Is that possible?

Accepted answer by Amber

If you're going for efficiency:

import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))

This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.

It also has the benefit of working correctly with punctuation: it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex assertion, which matches at word boundaries (transitions between \w, which for ASCII text is [a-zA-Z0-9_], and anything else).

If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication, and in many other cases setting the unicode and/or locale flags for the regex would suffice. (In Python 3, str patterns are Unicode-aware by default, so \w and \b already take non-ASCII letters into account unless re.ASCII is set.)
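
As a minimal sketch of the approach described in this answer, the regex can be wrapped in a small helper. The name count_whole_word is hypothetical and not part of the original answer; the sample strings are the ones from the question and from this answer.

```python
import re

def count_whole_word(word, text):
    """Count whole-word occurrences of `word` in `text` (hypothetical helper)."""
    # re.escape guards against regex metacharacters in `word`;
    # \b on both sides restricts matches to word boundaries.
    pattern = r'\b%s\b' % re.escape(word)
    # finditer yields matches lazily, so no intermediate list is built.
    return sum(1 for _ in re.finditer(pattern, text))

print(count_whole_word("dog", "the dogs barked"))  # 0
print(count_whole_word("dog", "Mike saw a dog."))  # 1
```

Note that \b only behaves as expected when the word itself starts and ends with a word character, which is the case for ordinary words like "dog".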

Answer by TerryA

Use a generator expression:

>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(w == word for w in str1.split())
0

>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(w == word for w in str1.split())
1

split() returns a list of all the words in a sentence. Then we use a generator expression to count how many times the word appears in the sentence.

Answer by grc

You can use str.split() to convert the sentence to a list of words:

a = 'the dogs barked'.split()

This will create the list:

['the', 'dogs', 'barked']

You can then count the number of exact occurrences using list.count():

a.count('dog')  # 0
a.count('dogs') # 1

If it needs to work with punctuation, you can use regular expressions. For example:

import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1
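
One detail worth knowing, shown here as a small sketch: splitting on single non-word characters can leave empty strings in the list (for example after the trailing period). That does not affect count() for a real word, but collecting runs of word characters with re.findall avoids the empty entries. The variable names below are illustrative only.

```python
import re

text = 'the dogs barked.'

# Splitting on each non-word character leaves an empty string after the final '.'.
print(re.split(r'\W', text))  # ['the', 'dogs', 'barked', '']

# Collecting runs of word characters yields only the words themselves.
words = re.findall(r'\w+', text)
print(words)                  # ['the', 'dogs', 'barked']
print(words.count('dogs'))    # 1
print(words.count('dog'))     # 0
```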

Answer by Lennart Regebro

You need to split the sentence into words. For your example you can do that with just:

words = str1.split()

But for real-world usage you need something more advanced that also handles punctuation. For most Western languages you can get away with replacing all punctuation with spaces before doing str1.split().

This will work for English as well in simple cases, but note that "I'm" will be split into two words: "I" and "m", and it should in fact be split into "I" and "am". But this may be overkill for this application.

For other cases, such as Asian languages or actual real-world usage of English, you might want to use a library that does the word splitting for you.

Then you have a list of words, and you can do

count = words.count(word)
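
The punctuation-to-spaces idea described above could be sketched roughly as follows. The helper name split_words is hypothetical, and using string.punctuation (ASCII punctuation only) is an assumption of this sketch, not something the answer specifies.

```python
import string

def split_words(text):
    """Split text into words after replacing ASCII punctuation with spaces
    (hypothetical helper)."""
    # Map every character in string.punctuation to a space.
    table = str.maketrans(string.punctuation, ' ' * len(string.punctuation))
    return text.translate(table).split()

words = split_words("The dog barked. I'm sure the dog, not the dogs, barked!")
print(words.count("dog"))       # 2
print(split_words("I'm here"))  # ['I', 'm', 'here']
```

The second call shows the "I'm" caveat from this answer: the apostrophe is treated as punctuation, so the contraction is split into "I" and "m".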

Answer by Aaron

import re

word = "dog"
text = "the dogs barked"
# Note: without \b anchors this counts substring matches, so it prints 1 here.
print(len(re.findall(re.escape(word), text)))

Answer by abhay goyan

Below is a simple example where we replace the desired word with a new string, for a desired number of occurrences:

def censor(text, word):
    # Replace every occurrence of word with "+" repeated len(word) times.
    new_string = text.replace(word, "+" * len(word), text.count(word))
    return new_string

print(censor("hey hey hey", "hey"))

The output will be: +++ +++ +++

The first argument to str.replace() is the string to search for. The second is the new string that replaces it. The third and last, which is optional, is the maximum number of occurrences to replace.
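
A short illustration of the optional third argument of str.replace(), using an illustrative sample string. It also shows that replace() works on substrings, not whole words, which matters for the exact-match question asked here.

```python
text = "hey hey hey"

# With no count, every occurrence is replaced.
print(text.replace("hey", "+++"))            # +++ +++ +++

# With a count of 2, only the first two occurrences are replaced.
print(text.replace("hey", "+++", 2))         # +++ +++ hey

# replace() matches substrings, so "they" is affected too.
print("they say hey".replace("hey", "+++"))  # t+++ say +++
```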

Answer by Maxx Selva K

Let us consider the example s = "suvotisuvojitsuvo". If you want to count "suvo" and "suvojit" as distinct words, you can use the count() method. Distinct here means that the "suvo" inside "suvojit" is not counted as "suvo"; only the standalone "suvo" is counted.

suvocount = s.count("suvo")        # output: 3
suvojitcount = s.count("suvojit")  # output: 1

To find the standalone "suvo" count, subtract the suvojit count from the suvo count.

lonelysuvo = suvocount - suvojitcount  # output: 3 - 1 -> 2

Answer by roger

This would be my solution with the help of the comments:

word = input("Type the French word chiens in English: ")
str1 = "dogs"
times = str1.count(word)
if times >= 1:
    print("dogs is correct")
else:
    print("you're wrong")

Answer by Eng.Boniphace Udoya

# Count the number of times a word occurs in the text
def count_word(text, word):
    """
    Function that takes the text, splits it into words,
    and counts the number of occurrences of the given word
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurrence in answer:
        if word == occurrence:
            count = count + 1
    return count

sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence,word_count))

Output:
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2

Create a function that takes two inputs: the text of a sentence and a word. Split the text into a list of words, then compare each word against the word to be counted and return the number of matches.

Answer by HaSeeB MiR

If you don't need a regular expression, you can use this neat trick:

word = " is "  # Add a space on both the leading and trailing sides.
input_string = "This is some random text and this is str which is mutable"
print("Word count : ", input_string.count(word))
# Output: Word count :  3
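
One limitation of the space-padding trick is that a word at the very start or end of the string has no surrounding space and is missed. A possible workaround, sketched here with a hypothetical helper named padded_count, is to pad the text as well as the word. This is an assumption-laden sketch and still ignores punctuation.

```python
def padded_count(text, word):
    """Count space-delimited occurrences of `word` (hypothetical helper)."""
    # Pad both the text and the word with spaces so that words at the
    # very start or end of the text can also match.
    return (" " + text + " ").count(" " + word + " ")

print("is this it is".count(" is "))        # 0, both occurrences sit at the edges
print(padded_count("is this it is", "is"))  # 2

# Caveat: str.count() skips overlapping matches, so adjacent repeats
# share a space and are undercounted.
print(padded_count("is is is", "is"))       # 2, not 3
```

For text with punctuation or repeated adjacent words, the \b-based regular expression from the accepted answer is the more robust choice.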