Python Duplicate Words
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/25798674/
Asked by Erwy Lionel
I have a problem where I have to count the duplicate words in Python (v3.4.1) and put them in a sentence. I used Counter, but I don't know how to get the output in the following order. The input is:
mysentence = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
I made this into a list and sorted it.
The output is supposed to be this:
"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
This is as far as I have come:
x=input ('Enter your sentence :')
y=x.split()
y.sort()
for y in sorted(y):
    print (y)
Accepted answer by sberry
I can see where you are going with sort: in a sorted list you can reliably tell when you have hit a new word, and so keep track of the count for each unique word. However, what you really want to do is use a hash (dictionary) to keep track of the counts, as dictionary keys are unique. For example:
words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1
That will give you a dictionary where the key is the word and the value is the number of times it appears. You can simplify this by using collections.defaultdict(int), so you can just add to the value:
import collections
counts = collections.defaultdict(int)  # missing keys start at 0
for word in words:
    counts[word] += 1
But there is something even better than that: collections.Counter, which will take your list of words and turn it into a dictionary (an extension of dictionary, actually) containing the counts.
counts = collections.Counter(words)
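As a quick sanity check, the three approaches above should produce identical counts. The following is a minimal sketch, assuming the sentence from the question; the helper names count_with_dict and count_with_defaultdict are made up for illustration.

```python
import collections

sentence = ("As far as the laws of mathematics refer to reality they are not "
            "certain as far as they are certain they do not refer to reality")
words = sentence.split()


def count_with_dict(words):
    """Plain dict: initialise missing keys by hand."""
    counts = {}
    for word in words:
        if word not in counts:
            counts[word] = 0
        counts[word] += 1
    return counts


def count_with_defaultdict(words):
    """defaultdict(int): missing keys start at 0 automatically."""
    counts = collections.defaultdict(int)
    for word in words:
        counts[word] += 1
    return counts


plain = count_with_dict(words)
default = count_with_defaultdict(words)
counter = collections.Counter(words)

# dict subclasses compare equal to a plain dict holding the same items
print(plain == default == counter)
print(counter["as"], counter["they"], counter["As"])
```

Whichever variant you pick, counts ends up mapping each word to its number of occurrences, so the sorting and printing step is the same.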
From there you want the list of words in sorted order with their counts so you can print them. items() will give you the (word, count) tuples (a list in Python 2, a view object in Python 3), and sorted will sort them (by default) by the first item of each tuple (the word in this case), which is exactly what you want.
import collections
sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
Output:
"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
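For comparison, the sort-based idea mentioned at the start of this answer (walk the sorted list and notice when a new word begins) could look roughly like the sketch below. It is only an illustration of that idea; the names results and current are chosen here and do not come from the question.

```python
sentence = ("As far as the laws of mathematics refer to reality they are not "
            "certain as far as they are certain they do not refer to reality")

words = sorted(sentence.split())

results = []     # (word, count) pairs, already in sorted order
current = None   # the word whose run is currently being counted
count = 0
for word in words:
    if word == current:
        count += 1
    else:
        # A new word starts: store the finished run, then reset the counter.
        if current is not None:
            results.append((current, count))
        current = word
        count = 1
# Store the final run, which the loop never closes on its own.
if current is not None:
    results.append((current, count))

for word, count in results:
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
```

This works, but the bookkeeping around the first and last run is easy to get wrong, which is one reason the dictionary-based versions are usually preferable.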
Answer by isamert
Here is a very bad example of doing this using nothing but lists:
x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()
words_copied = x.split(" ")
words_copied.sort()
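for word in words:
    count = 0
    # Remove every remaining occurrence of this word, counting as we go.
    while True:
        try:
            index = words_copied.index(word)
            count += 1
            del words_copied[index]
        except ValueError:
            # Nothing left to remove; repeated entries in words arrive here with count 0.
            if count != 0:
                print(word + " is repeated " + str(count) + " times.")
            break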
EDIT: Here is a much better way:
x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()
last_word = ""
for word in words:
    if word != last_word:
        count = [i for i, w in enumerate(words) if w == word]
        print(word + " is repeated " + str(len(count)) + " times.")
    last_word = word
Answer by jfs
To print the duplicate words from a string in sorted order:
from itertools import groupby 
mysentence = ("As far as the laws of mathematics refer to reality "
              "they are not certain as far as they are certain "
              "they do not refer to reality")
words = mysentence.split() # get a list of whitespace-separated words
for word, duplicates in groupby(sorted(words)): # sort and group duplicates
    count = len(list(duplicates)) # count how many times the word occurs
    print('"{word}" is repeated {count} time{s}'.format(
            word=word, count=count,  s='s'*(count > 1)))
Output:
"As" is repeated 1 time
"are" is repeated 2 times
"as" is repeated 3 times
"certain" is repeated 2 times
"do" is repeated 1 time
"far" is repeated 2 times
"laws" is repeated 1 time
"mathematics" is repeated 1 time
"not" is repeated 2 times
"of" is repeated 1 time
"reality" is repeated 2 times
"refer" is repeated 2 times
"the" is repeated 1 time
"they" is repeated 3 times
"to" is repeated 2 times
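The sorted() call in the code above matters: itertools.groupby only groups consecutive equal items, so unsorted input yields one group per run rather than one group per word. A small sketch of the difference, using a shortened sentence made up for illustration:

```python
from itertools import groupby

words = "as far as they are certain as far".split()

# Without sorting, each run of equal neighbours becomes its own group.
unsorted_groups = [(word, len(list(group))) for word, group in groupby(words)]

# After sorting, equal words are adjacent, so each word yields exactly one group.
sorted_groups = [(word, len(list(group)))
                 for word, group in groupby(sorted(words))]

print(unsorted_groups)
print(sorted_groups)
```

In the unsorted case "as" shows up three separate times with a count of 1 each; after sorting it appears once with a count of 3.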
Answer by HimanshuGahlot
Hey, I tried this on Python 2.7 (Mac), since that is the version I have, so try to get hold of the logic:
from collections import Counter
mysentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
mysentence = dict(Counter(mysentence.split()))
for i in sorted(mysentence.keys()):
    print ('"'+i+'" is repeated '+str(mysentence[i])+' time.')
I hope this is what you are looking for. If not, ping me; I'm happy to learn something new.
Output:
"As" is repeated 1 time.
"are" is repeated 2 time.
"as" is repeated 3 time.
"certain" is repeated 2 time.
"do" is repeated 1 time.
"far" is repeated 2 time.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 time.
"of" is repeated 1 time.
"reality" is repeated 2 time.
"refer" is repeated 2 time.
"the" is repeated 1 time.
"they" is repeated 3 time.
"to" is repeated 2 time.

