Python: Counting the number of unique words in a list

Notice: this page is derived from a popular StackOverflow question and is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/33726361/

Time: 2020-08-19 13:51:06  Source: igfitidea

Counting the number of unique words in a list

python python-3.x

Asked by Matt Swift

Using the following code from https://stackoverflow.com/a/11899925, I am able to find out whether a word is unique or not (by checking whether it was used once or more than once):

helloString = ['hello', 'world', 'world']
count = {}
for word in helloString :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

But, if I were to have a string with hundreds of words, how would I be able to count the number of unique words within that string?

For example, my code has:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in words :
   if word in count :
      count[word] += 1
   else:
      count[word] = 1

How would I be able to set uniqueWordCount to 6? Usually, I am really good at solving these types of algorithmic puzzles, but I have been unsuccessful in figuring this one out. I feel as if it is right under my nose.

Accepted answer by Matthew MacGregor

The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:

unique = set([ 'one', 'two', 'two']) 
len(unique) # is 2

You can use a set from the outset, adding words to it as you go:

unique.add('three')

This will throw out any duplicates as they are added. Or, you can collect all the elements in a list and pass the list to the set() function, which will remove the duplicates at that time. The example I provided above shows this pattern:

unique = set([ 'one', 'two', 'two'])
unique.add('three')

# unique now contains {'one', 'two', 'three'}
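
As a minimal sketch of the "add words as you go" approach, assuming the words arrive one at a time from some iterable (the names `words` and `seen` are illustrative and not part of the original answer):

```python
# Hypothetical input; any iterable of words would work here.
words = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

seen = set()
for word in words:
    # Adding a word that is already present leaves the set unchanged.
    seen.add(word)

unique_word_count = len(seen)
print(unique_word_count)  # 7
```

The loop and `set(words)` give the same result; the loop is mainly useful when the words are produced incrementally rather than collected up front.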

Read more about sets in Python.

Answer by kay - SE is evil

In your current code you can either increment uniqueWordCount in the else case, where you already set count[word], or just look up the number of keys in the dictionary: len(count).

If you only want to know the number of unique elements, take the length of the set of elements: len(set(helloString))
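
A sketch of the first option, incrementing the counter in the else branch. It reuses the question's variable names and assumes the loop iterates over helloString:

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

uniqueWordCount = 0
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        # First time this word is seen: record it and count it as new.
        count[word] = 1
        uniqueWordCount += 1

print(uniqueWordCount)  # 7
print(len(count))       # 7, the same number via the dictionary's keys
```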

Answer by Ben Aubin

You have several options here. I recommend a set, but you can also use a Counter, which counts how many times each element shows up, or you can look at the number of keys in the dictionary you made.

Set

You can convert the list to a set, where all elements have to be unique. Duplicate elements are discarded:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
helloSet = set(helloString) #=> {'doing', 'how', 'are', 'world', 'you', 'hello', 'today'} (order is arbitrary)
uniqueWordCount = len(set(helloString)) #=> 7

Here's a link to further reading on sets

Counter

You can also use a Counter, which additionally tells you how often each word was used, if you still need that information.

from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
len(counter) #=> 7
counter["world"] #=> 2

Loop

At the end of your loop, you can check the len of count. Also, you mistyped helloString as words:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:
   if word in count :
      count[word] += 1
   else:
      count[word] = 1
len(count) #=> 7

Answer by NotAnAmbiTurner

I would do this using a set.

def stuff(helloString):
    hello_set = set(helloString)
    return len(hello_set)
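
The question mentions a string of words, while the function above takes a list. A minimal sketch for the string case, assuming words are separated by whitespace and that case and punctuation do not need normalizing (`count_unique_words` is a hypothetical helper name):

```python
def count_unique_words(text):
    # str.split() with no arguments splits on any run of whitespace.
    return len(set(text.split()))

print(count_unique_words("hello world world how are you doing today"))  # 7
```

With this sketch, "World" and "world," would count as different words from "world"; lowercasing or stripping punctuation first would be a separate design choice.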

Answer by Paul Rooney

You can use collections.Counter:

helloString = ['hello', 'world', 'world']

from collections import Counter

c = Counter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)

I know the question doesn't specifically ask for this, but to preserve the order in which the words first appear:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

c = OrderedCounter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for k, v in c.items():
    print(k)
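
If only the ordered unique words are needed and not their counts, a shorter alternative sketch is possible. It assumes Python 3.7 or later, where plain dict objects keep insertion order; this is a swapped-in technique, not the OrderedCounter approach above:

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# dict keys are unique and, from Python 3.7 on, keep insertion order.
ordered_unique = list(dict.fromkeys(helloString))

print("There are {} unique words".format(len(ordered_unique)))  # There are 7 unique words
print(ordered_unique)
# ['hello', 'world', 'how', 'are', 'you', 'doing', 'today']
```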

Answer by Trey Hunner

I may be misreading the question but I believe the goal is to find all elements which only occur one time in the list.

from collections import Counter
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
uniques = [value for value, count in counter.items() if count == 1]

This will give us 6 items because "world" occurs twice in our list:

>>> uniques
['you', 'are', 'doing', 'how', 'today', 'hello']
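
To make the two readings of "unique" explicit, here is a small sketch that computes both numbers from the same Counter (the names `distinct_count` and `once_only_count` are illustrative):

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Reading 1: number of distinct words, duplicates collapsed.
distinct_count = len(counter)

# Reading 2: number of words that occur exactly once.
once_only_count = sum(1 for n in counter.values() if n == 1)

print(distinct_count)   # 7
print(once_only_count)  # 6
```

The second number matches the 6 that the question asks for, which suggests the asker may have meant words that appear only once.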