Python: count the number of unique words in a list

Question

Asked by Matt Swift

Using the following code from https://stackoverflow.com/a/11899925, I am able to find whether a word is unique or not (by checking whether it was used once or more than once):

helloString = ['hello', 'world', 'world']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

But if I had a string with hundreds of words, how would I count the number of unique words in it?

For example, my code has:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in words:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1

How would I be able to set uniqueWordCount to 6? Usually I am really good at solving these types of algorithmic puzzles, but I have been unsuccessful at figuring this one out. I feel as if it is right under my nose.

Answer 1

Accepted answer by Matthew MacGregor

The best way to solve this is to use the set collection type. A set is a collection in which all elements are unique. Therefore:

unique = set(['one', 'two', 'two'])
len(unique)  # is 2

You can use a set from the outset, adding words to it as you go:

unique.add('three')

This will throw out any duplicates as they are added. Or, you can collect all the elements in a list and pass the list to the set() function, which will remove the duplicates at that point. The example I provided above shows this pattern:

unique = set(['one', 'two', 'two'])
unique.add('three')

# unique now contains {'one', 'two', 'three'}
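
The question mentions a string of hundreds of words rather than a ready-made list. A minimal sketch, assuming words are separated by whitespace and that letter case should be ignored (both are assumptions, not part of the original answer), splits the string first and then applies the same set idea. The helper name count_unique_words is hypothetical:

```python
def count_unique_words(text):
    """Count distinct words in a whitespace-separated string (hypothetical helper)."""
    # str.split() with no arguments splits on any run of whitespace
    words = text.split()
    # Assumption: "Hello" and "hello" should count as the same word
    unique = {word.lower() for word in words}
    return len(unique)


sample = "hello world world how are you doing today"
print(count_unique_words(sample))  # 7
```

Punctuation is not stripped here, so "world" and "world," would count as different words; handling that would need extra preprocessing.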

Answer 2

Answer by kay - SE is evil

In your current code you can either increment uniqueWordCount in the else case, where you already set count[word], or just look up the number of keys in the dictionary: len(count).

If you only want to know the number of unique elements, put the elements in a set and take its length: len(set(helloString))
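
A minimal sketch of the first suggestion, reusing the asker's variable names: the counter is incremented only in the else branch, i.e. the first time a word is seen, so it ends up equal to len(count) and to len(set(helloString)).

```python
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

uniqueWordCount = 0
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        # First occurrence of this word: record it and bump the distinct-word total
        count[word] = 1
        uniqueWordCount += 1

print(uniqueWordCount)        # 7
print(len(count))             # 7, the same value read from the dictionary
print(len(set(helloString)))  # 7, the same value without the loop
```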

Answer 3

Answer by Ben Aubin

You have many options for this. I recommend a set, but you can also use a Counter, which counts how many times each element appears, or you can look at the number of keys in the dictionary you built.

Set

You can also convert the list to a set, in which every element must be unique. Duplicate elements are discarded:

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
helloSet = set(helloString) #=> {'hello', 'world', 'how', 'are', 'you', 'doing', 'today'} (order may vary)
uniqueWordCount = len(helloSet) #=> 7

Here's a link to further reading on sets

Counter

You can also use a Counter, which additionally tells you how often each word was used, if you still need that information.

from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
len(counter) #=> 7
counter["world"] #=> 2
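
If the frequencies matter as well, Counter.most_common() (part of the standard collections.Counter API) returns (word, count) pairs ordered from most to least frequent. A short sketch on the same list; the name repeated is just illustrative:

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Number of distinct words
print(len(counter))            # 7
# The single most frequent word and how often it occurs
print(counter.most_common(1))  # [('world', 2)]
# Words that appear more than once
repeated = [word for word, n in counter.items() if n > 1]
print(repeated)                # ['world']
```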

Loop

At the end of your loop, you can check the len of count. Also, you mistyped helloString as words:

uniqueWordCount = 0
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
count = {}
for word in helloString:
    if word in count:
        count[word] += 1
    else:
        count[word] = 1
uniqueWordCount = len(count) #=> 7

Answer 4

Answer by NotAnAmbiTurner

I would do this using a set.

def stuff(helloString):
    hello_set = set(helloString)
    return len(hello_set)

Answer 5

Answer by Paul Rooney

You can use collections.Counter

from collections import Counter

helloString = ['hello', 'world', 'world']

c = Counter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for word in c:
    print(word)

I know the question doesn't specifically ask for this, but to keep the words in their original order:

from collections import Counter, OrderedDict

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

class OrderedCounter(Counter, OrderedDict):
    pass

c = OrderedCounter(helloString)

print("There are {} unique words".format(len(c)))
print('They are')

for word in c:
    print(word)
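
On Python 3.7 and later, a plain dict remembers insertion order, and Counter (a dict subclass) inherits that behaviour, so the OrderedCounter subclass is mainly useful on older versions. A minimal sketch, assuming Python 3.7+, that gets the unique words in first-seen order with dict.fromkeys, without counting anything:

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']

# dict.fromkeys keeps one key per distinct word, in first-seen order (Python 3.7+)
ordered_unique = list(dict.fromkeys(helloString))
print(len(ordered_unique))  # 7
print(ordered_unique)       # ['hello', 'world', 'how', 'are', 'you', 'doing', 'today']

# A plain Counter also iterates in first-seen order on Python 3.7+
print(list(Counter(helloString)) == ordered_unique)  # True
```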

Answer 6

Answer by Trey Hunner

I may be misreading the question but I believe the goal is to find all elements which only occur one time in the list.

from collections import Counter
helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)
uniques = [value for value, count in counter.items() if count == 1]

This will give us 6 items because "world" occurs twice in our list:

>>> uniques
['you', 'are', 'doing', 'how', 'today', 'hello']
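
The asker's expected value of 6 appears to match this reading: the list has 7 distinct words, but only 6 of them occur exactly once. A sketch computing both numbers from a single Counter (the variable names are illustrative):

```python
from collections import Counter

helloString = ['hello', 'world', 'world', 'how', 'are', 'you', 'doing', 'today']
counter = Counter(helloString)

# Every different word, regardless of how often it appears
distinct_count = len(counter)
# Only words that appear exactly once ("world" is excluded because it appears twice)
once_count = sum(1 for n in counter.values() if n == 1)

print(distinct_count)  # 7
print(once_count)      # 6
```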
