List of all unique characters in a Python string?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/13902805/

Time: 2020-08-18 09:53:54  Source: igfitidea

List of all unique characters in a string?

python performance data-structures

Asked by Ali

I want to append characters to a string, but want to make sure all the letters in the final list are unique.


Example: "aaabcabccd" → "abcd"

Now of course I have two solutions in mind. One is to use a list that maps characters to their ASCII codes: whenever I encounter a letter, I set the entry at its index to True. Afterwards I scan the list and append every character whose entry was set. This has a time complexity of O(n).

Another solution would be to use a dict and follow the same procedure. After mapping every character, I do the operation for each key in the dictionary. This has a linear running time as well.
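The two approaches described above might look roughly like this. It is a minimal sketch, not code from the question: the function names `unique_chars_table` and `unique_chars_dict` are hypothetical, and the table version assumes the input contains only ASCII characters (code points below 128).

```python
def unique_chars_table(s):
    """Flag each character in a 128-slot table indexed by its ASCII code."""
    seen = [False] * 128  # assumption: the input is pure ASCII
    for ch in s:
        seen[ord(ch)] = True
    # Scan the table and collect every code that was flagged.
    return ''.join(chr(code) for code, flag in enumerate(seen) if flag)


def unique_chars_dict(s):
    """Use dictionary keys to record which characters occur."""
    seen = {}
    for ch in s:
        seen[ch] = True
    return ''.join(seen)


print(unique_chars_table('aaabcabccd'))  # abcd
print(unique_chars_dict('aaabcabccd'))   # abcd (insertion order, Python 3.7+)
```

The table always holds 128 slots regardless of the input, while the dict grows only with the number of distinct characters and also handles non-ASCII text.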

Since I am a Python newbie, I was wondering which would be more space efficient. Which one could be implemented more efficiently?


PS: Order is not important while creating the list.

Accepted answer by NPE

The simplest solution is probably:


In [10]: ''.join(set('aaabcabccd'))
Out[10]: 'acbd'

Note that this doesn't guarantee the order in which the letters appear in the output, even though the example might suggest otherwise.


You refer to the output as a "list". If a list is what you really want, replace ''.join with list:

In [1]: list(set('aaabcabccd'))
Out[1]: ['a', 'c', 'b', 'd']

As far as performance goes, worrying about it at this stage sounds like premature optimization.


Answer by gefei

If the result does not need to preserve order, you can simply use a set:

>>> ''.join(set("aaabcabccd"))
'acbd'

Answer by Abhijit

Use an OrderedDict. This ensures that the order is preserved:

>>> from collections import OrderedDict
>>> ''.join(OrderedDict.fromkeys("aaabcabccd").keys())
'abcd'
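On Python 3.7 and later, the built-in dict also preserves insertion order as a language guarantee, so a plain dict can stand in for OrderedDict here. A minimal sketch (the variable names are illustrative only):

```python
from collections import OrderedDict

s = "aaabcabccd"

# Keys keep their first-seen order; duplicate characters collapse.
with_ordered_dict = ''.join(OrderedDict.fromkeys(s))
with_plain_dict = ''.join(dict.fromkeys(s))  # relies on Python 3.7+ ordering

print(with_ordered_dict)  # abcd
print(with_plain_dict)    # abcd
```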

PS: I just timed both the OrderedDict and set solutions, and the latter is faster. If order does not matter, set is the natural solution; if order matters, use OrderedDict as shown above.

>>> from timeit import Timer
>>> t1 = Timer(stmt=stmt1, setup="from __main__ import data, OrderedDict")
>>> t2 = Timer(stmt=stmt2, setup="from __main__ import data")
>>> t1.timeit(number=1000)
1.2893918431815337
>>> t2.timeit(number=1000)
0.0632140599081196
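The session above does not show how `stmt1`, `stmt2`, and `data` were defined. A self-contained version of the comparison might look like the following. The input string and both helper functions are assumptions, so the absolute numbers will differ from those quoted.

```python
from collections import OrderedDict
from timeit import timeit

# Assumed test input; the original answer does not show its data.
data = "aaabcabccd" * 100


def with_ordered_dict():
    return ''.join(OrderedDict.fromkeys(data))


def with_set():
    return ''.join(set(data))


# timeit accepts a zero-argument callable as the statement to time.
t_ordered = timeit(with_ordered_dict, number=1000)
t_set = timeit(with_set, number=1000)
print(f"OrderedDict: {t_ordered:.4f}s  set: {t_set:.4f}s")
```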

Answer by martineau

For completeness' sake, here's another recipe that sorts the letters as a byproduct of the way it works:

>>> from itertools import groupby
>>> ''.join(k for k, g in groupby(sorted("aaabcabccd")))
'abcd'
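The `sorted` call is what makes this recipe work: `groupby` only merges adjacent equal items, so all copies of a letter must sit next to each other first. A short sketch of the difference, with `sorted(set(...))` shown as a simpler route to the same sorted result:

```python
from itertools import groupby

s = "aaabcabccd"

# Without sorting, only adjacent runs collapse, so repeats survive.
unsorted_result = ''.join(k for k, _ in groupby(s))
print(unsorted_result)  # abcabcd

# Sorting first puts equal letters side by side.
sorted_result = ''.join(k for k, _ in groupby(sorted(s)))
print(sorted_result)  # abcd

# Same output via a set, without grouping.
via_set = ''.join(sorted(set(s)))
print(via_set)  # abcd
```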

Answer by Brent Pappas

I have an idea. Why not use the ascii_lowercase constant?

For example, running the following code:


# The string module provides the constant ascii_lowercase, which holds all
# the lowercase letters of the English alphabet
import string
# Example value of s, a string
s = 'aaabcabccd'
# Result variable to store the resulting string
result = ''
# Go through each lowercase letter and check whether it appears in s.
# If a letter appears at least once, add it to the result variable
for letter in string.ascii_lowercase:
    if letter in s:
        result += letter

# Optional three lines to convert the result to a list for sorting
# and then back to a string (the loop already yields alphabetical order)
result = list(result)
result.sort()
result = ''.join(result)

print(result)

This prints 'abcd'.

There you go: all duplicates removed and, optionally, sorted. Note that only lowercase ASCII letters are considered.

Answer by dipenparmar12

Store unique characters in a list

Method 1:


unique_chars = list(set('aaabcabccd'))
# e.g. ['a', 'b', 'c', 'd'] (set order is not guaranteed)

Method 2: By loop (more complex)

unique_chars = []
for c in 'aaabcabccd':
    if c not in unique_chars:
        unique_chars.append(c)
print(unique_chars)
# ['a', 'b', 'c', 'd']

Answer by Amit Gupta

string = 'aaabcabccd'  # example input (not defined in the original snippet)
char_seen = []
for char in string:
    if char not in char_seen:
        char_seen.append(char)
print(''.join(char_seen))

This preserves the order in which the characters first appear.

The output will be:

abcd
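Each `char not in char_seen` test scans the list, so the loop above takes time proportional to the input length times the number of distinct characters. For long inputs with many distinct characters, a companion set makes the membership test constant time on average. A minimal sketch; the function name `unique_in_order` is hypothetical:

```python
def unique_in_order(text):
    """Return the distinct characters of text in first-seen order."""
    seen = set()   # fast membership checks
    ordered = []   # remembers first-seen order
    for char in text:
        if char not in seen:
            seen.add(char)
            ordered.append(char)
    return ''.join(ordered)


print(unique_in_order('aaabcabccd'))  # abcd
```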