Python: counting the number of unique words in a list
Original question: http://stackoverflow.com/questions/33726361/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Counting the number of unique words in a list
Asked by Matt Swift
Using the following code from https://stackoverflow.com/a/11899925, I am able to find out whether a word is unique (by checking whether it was used once or more than once):
helloString = ['hello', 'world', 'world']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
But, if I were to have a string with hundreds of words, how would I be able to count the number of unique words within that string?
For example, my code has:
uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in words:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
How would I be able to set uniqueWordCount to 6? Usually, I am really good at solving these types of algorithmic puzzles, but I have been unsuccessful in figuring this one out. I feel as if it is right beneath my nose.
Accepted answer by Matthew MacGregor
The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:
unique = set([ 'one', 'two', 'two'])
len(unique) # is 2
You can use a set from the outset, adding words to it as you go:
unique.add('three')
This will throw out any duplicates as they are added. Or, you can collect all the elements in a list and pass the list to the set() function, which will remove the duplicates at that time. The example I provided above shows this pattern:
unique = set([ 'one', 'two', 'two'])
unique.add('three')
# unique now contains {'one', 'two', 'three'}
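For the case in the question where the words start out as one long string, a minimal sketch might split the text first and then build the set as it goes. The sample text and the function name count_unique_words are illustrative assumptions, not part of the original answer. Note that str.split() only separates on whitespace, so punctuation and capitalization are not normalized here.

```python
def count_unique_words(text):
    """Return the number of distinct whitespace-separated words in text."""
    unique = set()
    for word in text.split():
        unique.add(word)  # adding a word that is already present has no effect
    return len(unique)


sample = "hello world world how are you doing today"  # hypothetical input
print(count_unique_words(sample))  # 7
```

If the words are already in a list, len(set(words)) does the same thing in one step.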
Answer by kay - SE is evil
In your current code you can either increment uniqueWordCount in the else case where you already set count[word], or just look up the number of keys in the dictionary: len(count).
If you only want to know the number of unique elements, then get the number of elements in the set: len(set(helloString))
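The three options described in this answer can be sketched side by side as follows. This reuses the list and variable names from the question, with the loop iterating over helloString; it is only an illustration of the suggestions above.

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

count = {}
uniqueWordCount = 0
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
        uniqueWordCount += 1  # first time this word is seen

# All three approaches give the number of distinct words.
print(uniqueWordCount, len(count), len(set(helloString)))  # 7 7 7
```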
Answer by Ben Aubin
You have many options for this. I recommend a set, but you can also use a Counter, which counts how many times each element shows up, or you can look at the number of keys in the dictionary you made.
Set
You can convert the list to a set, in which all elements are unique. Duplicate elements are discarded:
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
helloSet = set(helloString) #=> {'doing', 'how', 'are', 'world', 'you', 'hello', 'today'} (a set, so the order may vary)
uniqueWordCount = len(set(helloString)) #=> 7
Here's a link to further reading on sets
Counter
You can also use a counter, which can also tell you how often a word was used, if you still need that information.
from collections import Counter
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
len(counter) #=> 7
counter["world"] #=> 2
Loop
At the end of your loop, you can check the len of count. Also, you mistyped helloString as words:
uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
len(count) #=> 7
Answer by NotAnAmbiTurner
I would do this using a set.
def stuff(helloString):
    hello_set = set(helloString)
    return len(hello_set)
Answer by Paul Rooney
You can use collections.Counter
helloString = ['hello', 'world', 'world']
from collections import Counter
c = Counter(helloString)
print("There are {} unique words".format(len(c)))
print('They are')
for k, v in c.items():
    print(k)
I know the question doesn't specifically ask for this, but to maintain order:
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
from collections import Counter, OrderedDict
class OrderedCounter(Counter, OrderedDict):
    pass
c = OrderedCounter(helloString)
print("There are {} unique words".format(len(c)))
print('They are')
for k, v in c.items():
    print(k)
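As a side note that is not part of the original answer: in Python 3.7 and later, a plain dict preserves insertion order, so if only the ordered unique words are needed (not the per-word counts), a sketch like the following may be enough. The name ordered_unique is an illustrative assumption.

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# dict.fromkeys keeps one key per distinct word, in first-seen order
# (insertion order is guaranteed for dict from Python 3.7 onward).
ordered_unique = list(dict.fromkeys(helloString))

print("There are {} unique words".format(len(ordered_unique)))
print(ordered_unique)
# ['hello', 'world', 'how', 'are', 'you', 'doing', 'today']
```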
Answer by Trey Hunner
I may be misreading the question but I believe the goal is to find all elements which only occur one time in the list.
from collections import Counter
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
uniques = [value for value, count in counter.items() if count == 1]
This will give us 6 items because "world" occurs twice in our list:
>>> uniques
['you', 'are', 'doing', 'how', 'today', 'hello']
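The two readings of "unique" discussed in this thread give different numbers for the same list, which may explain the 6 in the question. A small sketch comparing them, with distinct_words and words_used_once as illustrative names:

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Reading 1: how many different words appear at all.
distinct_words = len(counter)

# Reading 2: how many words appear exactly once.
words_used_once = sum(1 for n in counter.values() if n == 1)

print(distinct_words)   # 7
print(words_used_once)  # 6
```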