Removing duplicates from a string in Python

Question

Asked by SuperString

What is an efficient algorithm for removing all duplicates in a string?

For example: aaaabbbccdbdbcd

Required result: abcd

Answer 1

Answered by cletus

Use a hash table to store the characters discovered so far (O(1) access), then loop through the string. If a character is already in the hash table, discard it. If it isn't, add it to both the hash table and a result string.

Overall: O(n) time (and space).

The naive solution is to search for each character in the result string as you process it. That is O(n²).

Answer 2

Answered by John La Rooy

In Python:

>>> ''.join(set("aaaabbbccdbdbcd"))
'acbd'

If the order needs to be preserved:

>>> q="aaaabbbccdbdbcd"                    # this one is not
>>> ''.join(sorted(set(q),key=q.index))    # so efficient
'abcd'

or

>>> S=set()
>>> res=""
>>> for c in "aaaabbbccdbdbcd":
...  if c not in S:
...   res+=c
...   S.add(c)
... 
>>> res
'abcd'

or

>>> S=set()
>>> L=[]
>>> for c in "aaaabbbccdbdbcd":
...  if c not in S:
...   L.append(c)
...   S.add(c)
... 
>>> ''.join(L)
'abcd'

In Python 3.1 and later:

>>> from collections import OrderedDict
>>> ''.join(list(OrderedDict((c,0) for c in "aaaabbbccdbdbcd").keys()))
'abcd'
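On Python 3.7 and later, a plain dict preserves insertion order as part of the language, so the OrderedDict approach above can also be written with dict.fromkeys. A minimal sketch; the function name dedupe_ordered is only illustrative and does not come from the answer.

```python
def dedupe_ordered(text):
    """Return text with repeated characters removed, keeping first occurrences."""
    # dict.fromkeys builds a dict whose keys are the characters of text;
    # duplicate keys collapse, and insertion order is kept (Python 3.7+).
    return "".join(dict.fromkeys(text))


print(dedupe_ordered("aaaabbbccdbdbcd"))  # prints: abcd
```

This stays O(n) like the explicit set-plus-list loop, but leaves the bookkeeping to the dict.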

Answer 3

Answered by Thomas Jung

This is closely related to the question: Detecting repetition with infinite input.

The hash table approach may not be optimal, depending on your input. Hash tables carry a certain amount of overhead (buckets, entry objects), which is huge compared to the actual stored char. (If your target environment is Java it is even worse, as the HashMap is of type Map<Character,?>.) The worst-case runtime for a hash table access is O(n) due to collisions.

You need only 8 KB to represent all 2-byte Unicode characters in a plain BitSet (65,536 bits). This can be reduced further if your input character set is more restricted, or by using a compressed BitSet (as long as the BitSet is sparse). Runtime performance is also favorable: a BitSet access is O(1).
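The BitSet idea can be sketched in Python with a bytearray of 8192 bytes (65,536 bits, one per 2-byte code point). This is only an illustration, assuming every character lies in the range U+0000 to U+FFFF; the function name and the ValueError raised for larger code points are choices made for this sketch, not part of the answer.

```python
def dedupe_with_bitset(text):
    """Remove repeated characters using one bit per 2-byte code point."""
    bits = bytearray(65536 // 8)  # 8192 bytes = 8 KB, all bits initially 0
    out = []
    for ch in text:
        code = ord(ch)
        if code > 0xFFFF:
            # Assumption of this sketch: input stays within 2-byte code points.
            raise ValueError("code point outside the 2-byte range: %r" % ch)
        byte_index, mask = code >> 3, 1 << (code & 7)
        if not bits[byte_index] & mask:   # bit clear: first time we see ch
            bits[byte_index] |= mask
            out.append(ch)
    return "".join(out)


print(dedupe_with_bitset("aaaabbbccdbdbcd"))  # prints: abcd
```

The memory cost is fixed at 8 KB regardless of input length, which is the trade-off the answer describes against a hash table's per-entry overhead.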

Answer 4

Answered by RahulKT

You can do this in O(n) only if you use a hash table. The code is given below. Note: it assumes that there are at most 256 possible character values in the input string.

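#include <stdio.h>
#include <string.h>

void removeDuplicates(char *str)
{
    size_t len = strlen(str);  // length of the string
    int count[256] = {0};      // occurrence count per character value, all zero initially
    size_t i;
    for (i = 0; i < len; i++)
    {
        unsigned char c = (unsigned char)str[i];  // avoid negative indexes when char is signed
        count[c]++;
        if (count[c] == 1)     // first occurrence: print it
            printf("%c", str[i]);
    }
}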

Answer 5

Answered by SPWorley

Keep an array of 256 "seen" booleans, one for each possible character. Stream your string. If you haven't seen the character before, output it and set the "seen" flag for that character.
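A minimal Python sketch of this, assuming the input is a stream of single bytes (so exactly 256 possible values). The generator name unique_bytes is hypothetical.

```python
def unique_bytes(stream):
    """Yield each byte value from stream the first time it appears."""
    seen = [False] * 256          # one flag per possible byte value
    for b in stream:              # iterating over bytes yields ints 0..255
        if not seen[b]:
            seen[b] = True
            yield b


result = bytes(unique_bytes(b"aaaabbbccdbdbcd"))
print(result.decode("ascii"))  # prints: abcd
```

Because it is a generator, the input is consumed one byte at a time and never needs to be held in memory as a whole.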

Answer 6

Answered by Stano

PHP algorithm, O(n):

function remove_duplicate_chars($str) {
    if (2 > $len = strlen($str)) {
        return $str;
    }
    $flags = array_fill(0,256,false);
    $flags[ord($str[0])]=true;
    $j = 1;
    for ($i=1; $i<$len; $i++) {
        $ord = ord($str[$i]);
        if (!$flags[$ord]) {
            $str[$j] = $str[$i];
            $j++;
            $flags[$ord] = true;
        }
    }
    if ($j<$i) { //if duplicates removed
        $str = substr($str,0,$j);
    }
    return $str;
}

echo remove_duplicate_chars('aaaabbbccdbdbcd'); // result: 'abcd'

Answer 7

Answered by TheMan

#include <iostream>
#include<string>
using namespace std;
#define MAX_SIZE 256

int main()
{
    bool arr[MAX_SIZE] = {false};

    string s;
    cin>>s;
    int k = 0;

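    for(size_t i = 0; i < s.length(); i++)
    {
        // Cast to unsigned char so characters above 127 never index negatively.
        unsigned char c = static_cast<unsigned char>(s[i]);
        if(!arr[c])
        {
            s[k] = s[i];   // compact unique characters to the front
            arr[c] = true;
            k++;
        }
    }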
    s.resize(k);

    cout << s<< endl; 

    return 0;
}

Answer 8

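Answered by user2409054

#include <iostream>
#include <set>
#include <string>

int main()
{
    std::string s = "aaacabbbccdbdbcd";

    // A std::set keeps one copy of each character, in sorted order.
    std::set<char> set1;
    set1.insert(s.begin(), s.end());

    for (std::set<char>::iterator it = set1.begin(); it != set1.end(); ++it)
        std::cout << *it;

    return 0;
}

Each std::set insertion takes O(log n), and the characters are printed in sorted order rather than in order of first occurrence.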

Answer 9

Answered by Mourya Gudladona

O(n) solution:

#include <stdio.h>
#include <string.h>

void removeDuplicates(char *inp);

/* Removes repeated characters in place, keeping the first occurrence of each.
   A 256-bit mask (one bit per possible char value) records which characters
   have been seen; a single int has too few bits for that. */
void removeDuplicates(char *inp)
{
    unsigned char seen[32] = {0};   /* 32 bytes * 8 bits = one bit per character code */
    int i = 0, repeat = 0;

    while (inp[i] != '\0')
    {
        unsigned char c = (unsigned char)inp[i];

        if (seen[c / 8] & (1u << (c % 8)))
        {
            repeat++;                    /* duplicate: skip it */
        }
        else
        {
            seen[c / 8] |= (unsigned char)(1u << (c % 8));
            inp[i - repeat] = inp[i];    /* first occurrence: shift left over the gaps */
        }
        i++;
    }

    inp[i - repeat] = '\0';
}

int main(void)
{
    char inp[100] = "aaAABCCDdefgccc";
    //char inp[100] = "ccccc";
    //char inp[100] = "\0";

    printf(" INPUT STRING : %s\n", inp);

    removeDuplicates(inp);

    printf(" OUTPUT STRING : %s\n", inp);
    return 0;
}

Answer 10

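Answered by framontb

Perhaps Python's built-in functions are more efficient than "self-made" loops. Like this:

=====================

NOTE: order maintained

CODE

from functools import reduce  # reduce is a builtin only in Python 2

string = "aaabbbccc"

# Append y to the accumulated string x only if x does not already contain it.
product = reduce((lambda x, y: x if (y in x) else x + y), string)

print(product)

OUTPUT

abc

=========================

NOTE: order neglected

CODE

string = "aaabssabcdsdwa"

str_uniq = ''.join(set(string))

print(str_uniq)

OUTPUT

acbdsw (set iteration order is arbitrary, so the letters may come out in a different order)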
