Run-Length Encoding in Python

Question

I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out why my code is not working?

Problem:

If any character is repeated more than 4 times, the entire set of repeated characters should be replaced with a slash '/', followed by a 2-digit number which is the length of this run of repeated characters, and the character. For example, "aaaaa" would be encoded as "/05a". Runs of 4 or less characters should not be replaced since performing the encoding would not decrease the length of the string.

My Code:

def runLengthEncode (plainText):
    res=''
    a=''
    for i in plainText:
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
                a=i
    return(res)

Answer 1

Answered by Karoly Horvath

Just observe the behaviour:

>>> runLengthEncode("abcd")
'abc'

The last character is ignored. After the loop, you still have to append what you've collected in a.

>>> runLengthEncode("abbbbbcd")
'a/5b/5b'

Oops, there's a problem after encoding. You should set a=i even if you found a long enough sequence.

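For illustration, here is a minimal sketch of the question's loop with both fixes applied. The names run_length_encode and flush are my own, not from the question, and the zero-padding follows the "/05a" example in the problem statement rather than the original code. Treat it as one possible repair, not the only one.

```python
def run_length_encode(plain_text):
    res = ''
    a = ''  # the run collected so far

    def flush(run):
        # Hypothetical helper: encode runs longer than 4 as "/NNc",
        # copy shorter runs unchanged.
        if len(run) > 4:
            return "/{:02}{}".format(len(run), run[0])
        return run

    for ch in plain_text:
        if a and a[0] == ch:
            a += ch
        else:
            res += flush(a)
            a = ch  # always start a new run, even after an encoded one
    return res + flush(a)  # append the run still pending after the loop


print(run_length_encode("abcd"))      # abcd
print(run_length_encode("abbbbbcd"))  # a/05bcd
```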

Answer 2

Answered by Serdalis

Aside from setting a=i after encoding a sequence and setting a width for your int when it is printed into the string, you could also do the following, which takes advantage of Python's groupby. It's also a good idea to use format when constructing strings.

from itertools import groupby

def runLengthEncode (plainText):
    res = []

    for k,i in groupby(plainText):
        run = list(i)
        if(len(run) > 4):
            res.append("/{:02}{}".format(len(run), k))
        else:
            res.extend(run)

    return "".join(res)
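
The width specifier used in the format call above can be checked in isolation. This is only a small sketch: the variable names are made up for the example, and the problem statement does not say what should happen to runs longer than 99, so the second line just shows what the format string does in that case.

```python
# "{:02}" pads the run length with a leading zero to at least two characters.
short = "/{:02}{}".format(5, "a")
long_run = "/{:02}{}".format(123, "a")

print(short)     # /05a
print(long_run)  # /123a  (the width is a minimum, so this is no longer 2 digits)
```

If runs of 100 or more characters can occur, the encoder would need an extra rule (for example splitting long runs), which is outside what the question specifies.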

Answer 3

Answered by pkacprzak

You can use the groupby() function combined with a generator expression:

from itertools import groupby

# s is the input string; in Python 3 the built-in map replaces itertools.imap
''.join(x if reps <= 4 else "/%02d%s" % (reps, x) for x, reps in map(lambda g: (g[0], len(list(g[1]))), groupby(s)))

Answer 4

Answered by pkacprzak

I know this is not the most efficient solution, but we haven't studied functions like groupby() yet, so here's what I did:

def runLengthEncode (plainText):
    res=''
    a=''
    count = 0
    for i in plainText:
        count+=1
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
                a=i
            else:
                res+=a
                a=i
        if count == len(plainText):
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
    return(res)

Answer 5

Answered by Thomas Ahle

Rosetta Code has a lot of implementations that should be easy to adapt to your use case.

Here is Python code with regular expressions:

from re import sub

def encode(text):
    '''
    Doctest:
        >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
        '12W1B12W3B24W1B14W'
    '''
    return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
               text)

def decode(text):
    '''
    Doctest:
        >>> decode('12W1B12W3B24W1B14W')
        'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
    '''
    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
               text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin
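
As a rough sketch of how the regular-expression approach might be adapted to the format in the question, the pattern can be restricted to runs of five or more characters. The function name run_length_encode is my own choice, and the two-digit padding is taken from the problem statement, not from Rosetta Code.

```python
import re


def run_length_encode(text):
    # (.)\1{4,} matches a character followed by at least four more copies of
    # it, i.e. a run of five or more; shorter runs are left untouched.
    # Note: "." does not match a newline unless re.DOTALL is passed.
    return re.sub(
        r'(.)\1{4,}',
        lambda m: "/{:02}{}".format(len(m.group(0)), m.group(1)),
        text,
    )


sample = "a" * 5 + "b" * 4 + "c" * 12
print(run_length_encode(sample))  # /05abbbb/12c
```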

Answer 6

Answered by Ashwin Joshi

An easy solution to run-length encoding which I can think of:

Encoding a string like "aaaabbbbbccccccddddddd..." into "a4b5c6d7...":

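def encode(s):
    # Count consecutive runs. Using s.count(c) would count every occurrence
    # in the whole string and merge separate runs of the same character.
    runs = []
    for c in s:
        if runs and runs[-1][0] == c:
            runs[-1][1] += 1
        else:
            runs.append([c, 1])
    return "".join(k + str(v) for k, v in runs)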

Decoding, which expands a string like "a4b5c6d7..." back into "aaaabbbbbccccccddddddd...":

def decode(s):
    # Assumes every count is a single digit, so s is a sequence of (char, digit) pairs.
    return "".join(ch * int(n) for ch, n in zip(s[0::2], s[1::2]))

Fairly easy to read and simple.

Answer 7

Answered by Christopher McGeough

text = input("Please enter the string to encode: ")
encoded = []  # list of (character, run length) tuples
index = 0
amount = 1
while index < len(text):
    # A run ends at the last character or where the next character differs.
    if index == len(text) - 1 or text[index] != text[index + 1]:
        encoded.append((text[index], amount))
        amount = 1
    else:
        amount += 1
    index += 1
print(encoded)

Answer 8

Answered by Tom Smith

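chars = list(input("Enter string: "))
chars.append("")  # sentinel so the last real run is flushed by the comparison below
count = 0
for i in range(len(chars) - 1):
    count += 1
    # A run ends where the next element differs (the sentinel ends the last run).
    if chars[i] != chars[i + 1]:
        print(chars[i], count)
        count = 0
print()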

This is much easier and works every time.

Answer 9

Answered by Motez Ouledissa

def RLE_comp_encode(text):
    if not text:
        return ''  # avoid indexing into an empty string
    comp_text, r = '', 1
    for i in range(1, len(text)):
        if text[i] == text[i-1]:
            r += 1
        else:
            comp_text += str(r) + text[i-1]
            r = 1
    # Flush the final run; the loop only emits a run when the next one starts.
    return comp_text + str(r) + text[-1]

This worked for me.

Answer 10

Answered by Hultner

I see many great solutions here, but none that feels very Pythonic to my eyes. So I'm contributing an implementation I wrote myself today for this problem.

from itertools import groupby
from typing import Iterator, Tuple

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))

This will return a generator of tuples, each holding a character and the length of its run, but it can easily be modified to return a string as well. A benefit of doing it this way is that everything is lazily evaluated, so it won't consume more memory or CPU than necessary if you don't need to exhaust the entire input.

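To make the laziness concrete, here is a small sketch that takes only the first two runs with itertools.islice. The chars helper and the consumed list are hypothetical names added purely to record how much of the input gets read; they are not part of the answer.

```python
from itertools import groupby, islice
from typing import Iterator, Tuple


def run_length_encode(data) -> Iterator[Tuple[str, int]]:
    # Same generator as in the answer, repeated so the example is self-contained.
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))


consumed = []


def chars(text):
    # Hypothetical helper: record each character as it is pulled from the input.
    for ch in text:
        consumed.append(ch)
        yield ch


first_two = list(islice(run_length_encode(chars("aaabccccdd")), 2))
print(first_two)          # [('a', 3), ('b', 1)]
print("".join(consumed))  # aaabc
```

Only five of the ten input characters are read: groupby has to look one character past the end of a run to know that the run is over, and nothing beyond that is touched.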

If you still want string encoding, the code can quite easily be modified for that use case like this:

from itertools import groupby

def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))

This does a more generic run-length encoding for runs of all lengths, not just those of more than 4 characters, but it could quite easily be adapted with a conditional on the run length if wanted.

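One way that conditional might look, as a sketch only: the min_run parameter is an assumption of mine, set to 5 so that only runs of more than 4 characters are encoded, and the "/NNc" output format is taken from the question rather than from this answer.

```python
from itertools import groupby


def run_length_encode(data: str, min_run: int = 5) -> str:
    """Encode runs of at least min_run characters as "/NNc"; copy shorter runs."""
    parts = []
    for char, group in groupby(data):
        n = sum(1 for _ in group)  # length of the current run
        parts.append(f"/{n:02}{char}" if n >= min_run else char * n)
    return "".join(parts)


print(run_length_encode("aaaaa"))      # /05a
print(run_length_encode("aaaabbbbb"))  # aaaa/05b
```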
