Original question: http://stackoverflow.com/questions/2286860/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverFlow
Removing duplicates in a string in Python
Asked by SuperString
What is an efficient algorithm for removing all duplicates in a string?
For example: aaaabbbccdbdbcd
Required result: abcd
Answered by cletus
Use a hashtable to store the characters discovered so far (O(1) access) and then loop through the string. If a character is already in the hashtable, discard it. If it isn't, add it to both the hashtable and a result string.
Overall: O(n) time (and space).
The naive solution is to search for each character in the result string as you process it. That is O(n^2).
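To make the contrast between the two approaches concrete, here is a minimal Python sketch. The function names `remove_duplicates_naive` and `remove_duplicates_hashed` are made up for this illustration and do not come from the answer; a Python `set` stands in for the hashtable.

```python
def remove_duplicates_naive(text):
    """Worst case O(n^2): each character is searched for in the result so far."""
    result = ""
    for ch in text:
        if ch not in result:  # linear scan of the result string
            result += ch
    return result


def remove_duplicates_hashed(text):
    """O(n) on average: membership is checked against a hash-based set."""
    seen = set()   # hypothetical stand-in for the hashtable in the answer
    kept = []
    for ch in text:
        if ch not in seen:
            seen.add(ch)
            kept.append(ch)
    return "".join(kept)


print(remove_duplicates_naive("aaaabbbccdbdbcd"))   # abcd
print(remove_duplicates_hashed("aaaabbbccdbdbcd"))  # abcd
```

Both functions keep the first occurrence of each character. They differ only in how the "already seen" check is done.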
Answered by John La Rooy
In Python:
>>> ''.join(set("aaaabbbccdbdbcd"))
'acbd'
If the order needs to be preserved:
>>> q="aaaabbbccdbdbcd" # this one is not
>>> ''.join(sorted(set(q),key=q.index)) # so efficient
'abcd'
or
>>> S=set()
>>> res=""
>>> for c in "aaaabbbccdbdbcd":
...     if c not in S:
...         res+=c
...         S.add(c)
...
>>> res
'abcd'
or
>>> S=set()
>>> L=[]
>>> for c in "aaaabbbccdbdbcd":
...     if c not in S:
...         L.append(c)
...         S.add(c)
...
>>> ''.join(L)
'abcd'
In Python 3.1:
>>> from collections import OrderedDict
>>> ''.join(list(OrderedDict((c,0) for c in "aaaabbbccdbdbcd").keys()))
'abcd'
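On newer interpreters a plain `dict` can play the same role as `OrderedDict`, because regular dictionaries preserve insertion order as a language guarantee from Python 3.7 onward. A minimal sketch, assuming Python 3.7 or later (the function name `remove_duplicates_ordered` is only illustrative):

```python
def remove_duplicates_ordered(text):
    # dict.fromkeys keeps one key per distinct character, in first-seen order
    return "".join(dict.fromkeys(text))


print(remove_duplicates_ordered("aaaabbbccdbdbcd"))  # abcd
```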
Answered by Thomas Jung
This is closely related to the question: Detecting repetition with infinite input.
The hashtable approach may not be optimal, depending on your input. Hashtables carry a certain amount of overhead (buckets, entry objects), which is huge compared to the actual stored char. (If your target environment is Java it is even worse, as the HashMap is of type Map<Character,?>.) The worst-case runtime for a hashtable access is O(n) due to collisions.
You need only 8 KB to represent all 2-byte Unicode characters in a plain BitSet. This may be reduced further if your input character set is more restricted, or by using a compressed BitSet (as long as the BitSet is sparse). The runtime performance is favorable for a BitSet: access is O(1).
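The 8 KB figure follows from 65536 possible 2-byte code points at one bit each (65536 / 8 = 8192 bytes). Python has no built-in BitSet, so the sketch below emulates one with a `bytearray`. The function name and the decision to reject code points above U+FFFF are assumptions made for this illustration, not part of the answer.

```python
def remove_duplicates_bitset(text):
    """Keep first occurrences, tracking seen code points in an 8 KB bit set."""
    seen = bytearray(65536 // 8)  # 8192 bytes: one bit per 16-bit code point
    kept = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # assumption of this sketch: only 2-byte code points are supported
            raise ValueError("code point above U+FFFF")
        byte_index = code >> 3        # which byte holds this code point's bit
        bit_mask = 1 << (code & 7)    # which bit inside that byte
        if not seen[byte_index] & bit_mask:
            seen[byte_index] |= bit_mask
            kept.append(ch)
    return "".join(kept)


print(remove_duplicates_bitset("aaaabbbccdbdbcd"))  # abcd
```

Each lookup and update touches a single byte, which is the O(1) access the answer refers to.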
Answered by RahulKT
You can do this in O(n) if you use a hash table. The code is given below. Note: it assumes that there are 256 possible character values in the input string.
#include <stdio.h>
#include <string.h>

void removeDuplicates(char *str)
{
    size_t len = strlen(str);  // length of the string
    int count[256] = {0};      // occurrence counts, all initialized to zero
    size_t i;
    for (i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char)str[i];  // avoids a negative index when char is signed
        count[c]++;
        if (count[c] == 1)     // first occurrence: print it
            printf("%c", c);
    }
}
Answered by SPWorley
Keep an array of 256 "seen" booleans, one for each possible character. Stream your string. If you haven't seen the character before, output it and set the "seen" flag for that character.
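The 256-flag idea maps directly onto byte strings, where every element is a value from 0 to 255. A minimal sketch, assuming the input is a Python 3 `bytes` object (the function name `remove_duplicate_bytes` is hypothetical):

```python
def remove_duplicate_bytes(data):
    """Return data with repeated byte values removed, keeping first occurrences."""
    seen = [False] * 256   # one "seen" flag per possible byte value
    out = bytearray()
    for value in data:     # iterating over bytes yields ints in 0..255
        if not seen[value]:
            seen[value] = True
            out.append(value)
    return bytes(out)


print(remove_duplicate_bytes(b"aaaabbbccdbdbcd"))  # b'abcd'
```

For text outside a single-byte encoding the flag array would need to be larger, or be replaced by a set.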
Answered by Stano
PHP algorithm, O(n):
function remove_duplicate_chars($str) {
    $len = strlen($str);
    if ($len < 2) {
        return $str;
    }
    $flags = array_fill(0, 256, false);  // one "seen" flag per byte value
    $flags[ord($str[0])] = true;
    $j = 1;                              // next write position
    for ($i = 1; $i < $len; $i++) {
        $ord = ord($str[$i]);
        if (!$flags[$ord]) {
            $str[$j] = $str[$i];
            $j++;
            $flags[$ord] = true;
        }
    }
    if ($j < $i) { // if duplicates were removed
        $str = substr($str, 0, $j);
    }
    return $str;
}
echo remove_duplicate_chars('aaaabbbccdbdbcd'); // result: 'abcd'
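Answered by TheMan
#include <iostream>
#include <string>
using namespace std;
#define MAX_SIZE 256
int main()
{
    bool arr[MAX_SIZE] = {false};  // arr[c] is true once character c has been kept
    string s;
    cin >> s;
    size_t k = 0;                  // next write position
    for (size_t i = 0; i < s.length(); i++)
    {
        // skip characters that have already been seen (bounds check comes first)
        while (i < s.length() && arr[static_cast<unsigned char>(s[i])])
        {
            i++;
        }
        if (i < s.length())
        {
            s[k] = s[i];
            arr[static_cast<unsigned char>(s[k])] = true;
            k++;
        }
    }
    s.resize(k);
    cout << s << endl;
    return 0;
}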
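Answered by user2409054
#include <iostream>
#include <set>
#include <string>
int main()
{
    std::string s = "aaacabbbccdbdbcd";
    std::set<char> set1;
    set1.insert(s.begin(), s.end());
    for (std::set<char>::iterator it = set1.begin(); it != set1.end(); ++it)
        std::cout << *it;
    return 0;
}
std::set takes O(log n) per insertion, and it yields the characters in sorted order rather than in order of first appearance.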
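Because std::set keeps its elements sorted, the C++ program above prints the distinct characters in sorted order, which matches first-appearance order only for some inputs. A small Python sketch of the difference, assuming Python 3.7 or later for the order-preserving variant (both function names are illustrative):

```python
def distinct_sorted(text):
    # mirrors the std::set behaviour: distinct characters in sorted order
    return "".join(sorted(set(text)))


def distinct_first_seen(text):
    # keeps the order in which characters first appear (dict order, Python 3.7+)
    return "".join(dict.fromkeys(text))


print(distinct_sorted("dcbaabcd"))      # abcd
print(distinct_first_seen("dcbaabcd"))  # dcba
```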
Answered by Mourya Gudladona
O(n) solution:
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
void removeDuplicates(char *);
/* Removes repeated characters in place, keeping the first occurrence of each. */
void removeDuplicates(char *inp)
{
    int i = 0, repeat = 0;
    /* 256-bit set, one bit per char value (assumes 8-bit char, 32-bit unsigned int).
       A single int mask cannot hold a bit for every possible character. */
    unsigned int seen[8] = {0};
    while (inp[i] != '\0')
    {
        unsigned char c = (unsigned char)inp[i];
        unsigned int bit = 1u << (c % 32);
        if (seen[c / 32] & bit)
        {
            repeat++;                  /* duplicate: skip it */
        }
        else
        {
            seen[c / 32] |= bit;       /* first occurrence: mark it and keep it */
            inp[i - repeat] = inp[i];
        }
        i++;
    }
    inp[i - repeat] = '\0';
}
int main()
{
    char inp[100] = "aaAABCCDdefgccc";
    //char inp[100] = "ccccc";
    //char *inp = (char *)malloc(sizeof(char)*100);
    printf (" INPUT STRING : %s\n", inp);
    removeDuplicates(inp);
    printf (" OUTPUT STRING : %s:\n", inp);
    return 0;
}
Answered by framontb
Perhaps using built-in Python functions is more efficient than a "self-made" solution. Like this:
======================
NOTE: maintain order
CODE
from functools import reduce  # reduce is not a builtin in Python 3
string = "aaabbbccc"
product = reduce((lambda x, y: x if (y in x) else x + y), string)
print(product)
OUTPUT
abc
======================
NOTE: order neglected
CODE
string = "aaabssabcdsdwa"
str_uniq = ''.join(set(string))
print(str_uniq)
OUTPUT
acbdsw (the order may vary between runs)