Add only unique values to a list in Python

Question

Asked by Tim Elhajj

I'm trying to learn Python. Here is the relevant part of the exercise:

For each word, check to see if the word is already in a list. If the word is not in the list, add it to the list.

Here is what I've got.

fhand = open('romeo.txt')
output = []

for line in fhand:
    words = line.split()
    for word in words:
        if word is not output:
            output.append(word)

print(sorted(output))

Here is what I get.

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 'through', 'what', 'window', 'with', 'yonder']

Note the duplicates (and, is, sun, etc.).

How do I get only unique values?

Answer 1

Answered by Tony Tannous

To eliminate duplicates from a list, you can maintain an auxiliary list and check each item against it before adding.

myList = ['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'and', 'and', 
     'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'is', 'is', 'kill', 'light', 
     'moon', 'pale', 'sick', 'soft', 'sun', 'sun', 'the', 'the', 'the', 
     'through', 'what', 'window', 'with', 'yonder']

auxiliaryList = []
for word in myList:
    if word not in auxiliaryList:
        auxiliaryList.append(word)

output:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 
  'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick',
  'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']

This is easy to understand, and the code is self-explanatory. However, the simplicity comes at the expense of efficiency: each "not in" check is a linear scan over a growing list, so an algorithm that should be linear degrades to quadratic time.
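To make the quadratic growth concrete, here is a small sketch that counts how many equality comparisons the auxiliary-list approach performs. The helper count_list_comparisons is hypothetical, written only for illustration; it spells out the scan that "word not in auxiliaryList" does internally so each comparison can be counted.

```python
def count_list_comparisons(items):
    """Deduplicate with an auxiliary list, counting equality comparisons.

    Hypothetical helper: it mirrors `item not in auxiliary` with an explicit
    loop so that every comparison can be counted.
    """
    auxiliary = []
    comparisons = 0
    for item in items:
        found = False
        for existing in auxiliary:  # linear scan, like the `in` operator
            comparisons += 1
            if existing == item:
                found = True
                break  # `in` also stops at the first match
        if not found:
            auxiliary.append(item)
    return auxiliary, comparisons


# With n distinct items, the k-th item is compared against the k items
# already stored, so the total is 0 + 1 + ... + (n - 1) = n * (n - 1) / 2.
for n in (10, 100, 1000):
    _, comps = count_list_comparisons(range(n))
    print(n, comps)  # 10 45, then 100 4950, then 1000 499500
```

Multiplying n by 10 multiplies the work by roughly 100, which is the quadratic behaviour described above.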

If the order is not important, you could use set().

A set object is an unordered collection of distinct hashable objects.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
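A quick illustration of what hashability means in practice: strings and tuples of strings can be set members, but lists cannot, because lists are mutable and therefore unhashable. The sample values below are made up for the sketch.

```python
# Strings are hashable, so they can be set members; duplicates collapse.
words = {"sun", "moon", "sun"}
print(sorted(words))  # ['moon', 'sun']

# Lists are mutable and unhashable; putting one in a set raises TypeError.
try:
    {["sun", "moon"]}
except TypeError as exc:
    print("TypeError:", exc)

# Tuples of hashable items are hashable, so they work as set members.
pairs = {("sun", "moon"), ("sun", "moon"), ("east", "west")}
print(len(pairs))  # 2
```

If your items happen to be lists (for example, one list of words per line), converting each one to a tuple first is a common way to make them usable in a set.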

Since membership checking in a hash table takes O(1) time in the average case, using a set is more efficient.

auxiliaryList = list(set(myList))

output (a set is unordered, so the order you see may differ):

['and', 'envious', 'already', 'fair', 'is', 'through', 'pale', 'yonder', 
 'what', 'sun', 'Who', 'But', 'moon', 'window', 'sick', 'east', 'breaks', 
 'grief', 'with', 'light', 'It', 'Arise', 'kill', 'the', 'soft', 'Juliet']
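If you want both hash-based lookups and the original order, one alternative not used in this answer is dict.fromkeys(): dictionary keys are unique, and since Python 3.7 dicts are guaranteed to preserve insertion order. A minimal sketch on a shortened version of myList:

```python
# Shortened sample of the word list above.
myList = ['Arise', 'and', 'and', 'is', 'sun', 'is', 'sun', 'the']

# dict.fromkeys() keeps only the first occurrence of each key, and dicts
# preserve insertion order (Python 3.7+), so the original order survives.
auxiliaryList = list(dict.fromkeys(myList))
print(auxiliaryList)  # ['Arise', 'and', 'is', 'sun', 'the']
```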

Answer 2

Answered by falsetru

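The condition "word is not output" compares object identity: it asks whether word is a different object from the list output, which is always true, so every word gets appended. Instead of the is not operator, you should use the not in operator to check whether the item is in the list: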

if word not in output:
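Putting that fix into the loop from the question gives a working version. So that the sketch runs without the file, it reads two sample lines from io.StringIO (an assumption standing in for romeo.txt); with the real file, use open('romeo.txt') instead.

```python
import io

# Two sample lines standing in for romeo.txt (assumed for this sketch).
fhand = io.StringIO(
    "But soft what light through yonder window breaks\n"
    "It is the east and Juliet is the sun\n"
)
output = []

for line in fhand:
    words = line.split()
    for word in words:
        if word not in output:  # membership test, not identity
            output.append(word)

print(sorted(output))
```

This prints each word once: ['But', 'It', 'Juliet', 'and', 'breaks', 'east', 'is', 'light', 'soft', 'sun', 'the', 'through', 'what', 'window', 'yonder'].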

By the way, using a set is much more efficient (see Time complexity):

with open('romeo.txt') as fhand:
    output = set()
    for line in fhand:
        words = line.split()
        output.update(words)

UPDATE: A set does not preserve the original order. To preserve the order, use the set as an auxiliary data structure:

output = []
seen = set()
with open('romeo.txt') as fhand:
    for line in fhand:
        words = line.split()
        for word in words:
            if word not in seen:  # faster than `word not in output`
                seen.add(word)
                output.append(word)
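Here is the same order-preserving loop wrapped in a function so it accepts any iterable of lines, such as an open file or, in this sketch, an io.StringIO sample. The function name unique_words_in_order is hypothetical.

```python
import io


def unique_words_in_order(lines):
    """Return each word once, in the order it first appears."""
    output = []
    seen = set()  # average-case O(1) membership checks
    for line in lines:
        for word in line.split():
            if word not in seen:
                seen.add(word)
                output.append(word)
    return output


sample = io.StringIO("It is the east and Juliet is the sun\n")
print(unique_words_in_order(sample))
# ['It', 'is', 'the', 'east', 'and', 'Juliet', 'sun']
```

With the real file, the call would be made inside "with open('romeo.txt') as fhand:" as unique_words_in_order(fhand).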

Answer 3

Answered by Advait S

One method is to check whether the word is already in the list before adding it, which is what Tony's answer does. If you want to delete duplicate values after the list has been created, you can use set() to convert the existing list into a set of unique values, and then use list() to convert it back into a list. All in just one line:

list(set(output))

If you want the result sorted alphabetically, wrap the expression in sorted(); since sorted() already returns a list, sorted(set(output)) is enough. Here's the result:

['Arise', 'But', 'It', 'Juliet', 'Who', 'already', 'and', 'breaks', 'east', 'envious', 'fair', 'grief', 'is', 'kill', 'light', 'moon', 'pale', 'sick', 'soft', 'sun', 'the', 'through', 'what', 'window', 'with', 'yonder']
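In the result above, the capitalized words ('Arise' through 'Who') come before the lowercase ones, because sorted() compares strings by code point and uppercase letters sort before lowercase letters. If a case-insensitive alphabetical order is wanted, one option is key=str.lower; the sample list here is made up for the sketch.

```python
words = ['and', 'Arise', 'But', 'already', 'and', 'sun', 'But']

# Default ordering: uppercase letters sort before lowercase ones.
print(sorted(set(words)))                 # ['Arise', 'But', 'already', 'and', 'sun']

# Case-insensitive ordering: compare lowercased copies of each word.
print(sorted(set(words), key=str.lower))  # ['already', 'and', 'Arise', 'But', 'sun']
```

Note that key=str.lower only changes the ordering; words that differ only in case (such as 'The' and 'the') are still kept as separate entries.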

Answer 4

Answered by Mateen Ulhaq

Here's a "one-liner" which uses this implementation of removing duplicates while preserving order:

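def unique(seq):
    """Return the items of seq without duplicates, keeping first-seen order."""
    seen = set()
    seen_add = seen.add  # bind the method once to avoid repeated attribute lookups
    # seen_add(x) returns None (falsy), so new items pass the filter and get recorded
    return [x for x in seq if not (x in seen or seen_add(x))]

with open('romeo.txt') as fhand:
    output = unique([word for line in fhand for word in line.split()])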

The last line flattens fhand into a list of words, and then calls unique() on the resulting list.
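To see what the filter inside unique() does, here it is applied to a small made-up list. set.add() always returns None, so for a new item the expression "x in seen or seen_add(x)" evaluates to None (falsy); the not turns that into True and the item is kept, while the call records it in seen as a side effect. For an item already seen, "x in seen" is True and the item is skipped.

```python
def unique(seq):
    seen = set()
    seen_add = seen.add
    # New item: `x in seen` is False, seen_add(x) records it and returns None,
    # so `not (False or None)` is True and x is kept.
    # Repeated item: `x in seen` is True, so `not True` drops it.
    return [x for x in seq if not (x in seen or seen_add(x))]


print(unique(['sun', 'is', 'sun', 'the', 'is']))  # ['sun', 'is', 'the']
```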
