Notice: this page reproduces a popular Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): Stack Overflow
Original URL: http://stackoverflow.com/questions/20061970/
Python unique values in a list
Asked by Mike Mcmahon
I am new to Python and I am finding set() a bit confusing. Can someone offer some help with finding and creating a new list of unique numbers (in other words, eliminating duplicates)?
import string
import re

def go():
    file = open("C:/Cryptography/Pollard/Pollard/newfile.txt","w")
    filename = "C:/Cryptography/Pollard/Pollard/primeFactors.txt"
    with open(filename, 'r') as f:
        lines = f.read()
    found = re.findall(r'[\d]+[^\d.\d+()+\s]+[^\s]+[\d+\w+\d]+[\d+\^+\d]+[\d+\w+\d]+', lines)
    a = found
    for i in range(5):
        a[i] = str(found[i])
        print(a[i].split('x'))
Now
print(a[i].split('x'))
....gives the following output
['2', '3', '1451', '40591', '258983', '11409589', '8337580729',
'1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['2', '3', '5', '19', '28087', '4947999059',
'2182718359336613102811898933144207']
['3', '5', '53', '293', '31159', '201911', '7511070764480753',
'22798192180727861167']
['2', '164493637239099960712719840940483950285726027116731']
How do I output a list of only non-repeating numbers? I read on the forums that set() can do this, but I have tried it to no avail. Any help is much appreciated!
Accepted answer by Blckknght
A set is a collection (like a list or tuple), but it does not allow duplicates and has very fast membership testing. You can use a list comprehension to filter out values in one list that have already appeared in a previous list:
data = [['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
        ['2897', '514081', '585530047', '108785617538783538760452408483163'],
        ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
        ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
        ['2', '164493637239099960712719840940483950285726027116731']]

seen = set()   # set of seen values, which starts out empty
for lst in data:
    deduped = [x for x in lst if x not in seen]  # filter out previously seen values
    seen.update(deduped)                         # add the new values to the set
    print(deduped)                               # do whatever with deduped list
Output:
['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667']
['2897', '514081', '585530047', '108785617538783538760452408483163']
['5', '19', '28087', '4947999059', '2182718359336613102811898933144207']
['53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']
['164493637239099960712719840940483950285726027116731']
Note that this version does not filter out values that are duplicated within a single list (unless they already duplicate a value in a previous list). You could work around that by replacing the list comprehension with an explicit loop that checks each individual value against the seen set (and adds it if it is new) before appending it to a list for output. Or, if the order of the items in your sub-lists is not important, you could turn them into sets of their own:
seen = set()
for lst in data:
    lst_as_set = set(lst)            # this step eliminates internal duplicates
    deduped_set = lst_as_set - seen  # set subtraction!
    seen.update(deduped_set)
    # now do stuff with deduped_set, which is iterable, but in an arbitrary order
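The explicit-loop variant mentioned above, which also removes duplicates inside a single sub-list while keeping the original order, might look like the following. This is only a minimal sketch: the helper name dedupe_across_lists and the small sample data are made up for illustration and are not part of the original answer.

```python
def dedupe_across_lists(data):
    """Yield each sub-list with every previously seen value removed.

    Hypothetical helper for illustration (not from the original answer).
    """
    seen = set()  # every value encountered so far, across all sub-lists
    for lst in data:
        deduped = []
        for x in lst:
            if x not in seen:  # new value: record it and keep it
                seen.add(x)
                deduped.append(x)
        yield deduped


# Small illustrative input with duplicates both within and across sub-lists
sample = [['2', '3', '2'], ['3', '5', '5'], ['2', '7']]
for deduped in dedupe_across_lists(sample):
    print(deduped)
```

Because each value is added to the seen set as soon as it is first met, a repeat later in the same sub-list is skipped just like a repeat from an earlier sub-list.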
Finally, if the internal sub-lists are a red herring entirely and you simply want to filter a flattened list to get only unique values, that sounds like a job for the unique_everseen recipe from the itertools documentation:
from itertools import filterfalse  # named ifilterfalse in Python 2

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        # filterfalse skips every element already present in seen
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
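As a usage sketch, the nested lists can be flattened with itertools.chain.from_iterable and then passed through such a generator. The block below repeats a simplified version of the recipe (without the key argument) so that it runs on its own. The name unique_in_order and the shortened sample data are illustrative and not part of the original answer.

```python
from itertools import chain


def unique_in_order(iterable):
    """Simplified unique_everseen: yield each element the first time it appears."""
    seen = set()
    for element in iterable:
        if element not in seen:
            seen.add(element)
            yield element


# Shortened sample data (illustrative only)
data = [['2', '3', '1451'], ['2897'], ['2', '3', '5', '19'], ['3', '5', '53']]

flat = chain.from_iterable(data)  # lazily flatten the sub-lists
unique_values = list(unique_in_order(flat))
print(unique_values)
```

Unlike a plain set, this keeps the values in the order in which they first occur.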
Answer by Anthony Kong
set should work in this case.
You can try the following:
# Concat all your lists into a single list
>>> a = ['2', '3', '1451', '40591', '258983', '11409589', '8337580729','1932261797039146667'] +['2897', '514081', '585530047', '108785617538783538760452408483163'] +['2', '3', '5', '19', '28087', '4947999059','2182718359336613102811898933144207'] + ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167']+ ['2', '164493637239099960712719840940483950285726027116731']
>>> len(a)
29
>>> set(a)
set(['514081', '258983', '40591', '201911', '11409589', '585530047', '3', '2', '5',
'108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729', '4947999059', '19',
'2897', '7511070764480753', '53', '28087', '2182718359336613102811898933144207', '1451',
'31159', '1932261797039146667', '293'])
>>> len(set(a))
24
>>>
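A set has no defined order, so the result above comes back shuffled. If a list in ascending numeric order is wanted, one option (an addition here, not part of the original answer) is to sort the set with int as the key, since the values are strings of digits. The sample list is shortened for illustration.

```python
# Shortened sample list (illustrative only)
a = ['2', '3', '1451', '2897', '2', '3', '5', '19', '3', '5', '53']

# set() drops the duplicates; sorted(..., key=int) orders the digit strings
# by numeric value rather than alphabetically.
unique_sorted = sorted(set(a), key=int)
print(unique_sorted)
```

Without key=int the strings would be compared character by character, which would place '1451' before '2'.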
Answer by elucify
If you want the unique values from the flattened list, you can use reduce() to flatten the nested list, then pass the result to the frozenset() constructor to drop the duplicates and convert it back to a list:
>>> from functools import reduce  # reduce is a builtin on Python 2; Python 3 needs this import
>>> data = [
...     ['2', '3', '1451', '40591', '258983', '11409589', '8337580729', '1932261797039146667'],
...     ['2897', '514081', '585530047', '108785617538783538760452408483163'],
...     ['2', '3', '5', '19', '28087', '4947999059', '2182718359336613102811898933144207'],
...     ['3', '5', '53', '293', '31159', '201911', '7511070764480753', '22798192180727861167'],
...     ['2', '164493637239099960712719840940483950285726027116731']]
>>> print(list(frozenset(reduce((lambda a, b: a+b), data))))
['514081', '258983', '40591', '201911', '11409589', '585530047', '3',
'2', '5', '108785617538783538760452408483163', '22798192180727861167',
'164493637239099960712719840940483950285726027116731', '8337580729',
'4947999059', '19', '2897', '7511070764480753', '53', '28087',
'2182718359336613102811898933144207', '1451', '31159',
'1932261797039146667', '293']
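Repeated list concatenation inside reduce() copies the growing list at every step, which gets slow when there are many sub-lists. A possible alternative, not from the original answer, is to let a set absorb all the sub-lists at once with set().union(*data). The data below is a shortened stand-in for the lists above.

```python
# Shortened sample data (illustrative only)
data = [['2', '3', '1451'], ['2897'], ['2', '3', '5', '19'], ['3', '5', '53']]

# union() accepts any number of iterables, so unpacking the outer list
# merges every sub-list into one set without building intermediate lists.
unique_values = set().union(*data)
print(sorted(unique_values, key=int))  # sorted only to make the output stable
```

As with the frozenset() version, the set itself carries no ordering, so sort or otherwise order the result if that matters.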

