Original question: http://stackoverflow.com/questions/18948382/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow.
Run length encoding in Python
Question
I'm trying to write a simple Python algorithm to solve this problem. Can you please help me figure out why my code is not working?
Problem:
If any character is repeated more than 4 times, the entire run of repeated characters should be replaced with a slash '/', followed by a 2-digit number giving the length of the run, and then the character. For example, "aaaaa" would be encoded as "/05a". Runs of 4 or fewer characters should not be replaced, since the encoding would not decrease the length of the string.
My code:
def runLengthEncode (plainText):
    res=''
    a=''
    for i in plainText:
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
                a=i
    return(res)
Answer by Karoly Horvath
Just observe the behaviour:
>>> runLengthEncode("abcd")
'abc'
The last character is ignored. You have to append what you've collected once the loop ends.
>>> runLengthEncode("abbbbbcd")
'a/5b/5b'
Oops, there is a problem after encoding. You should set a=i even if you found a long enough sequence.
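Both fixes can be applied directly to the question's loop. The following is a minimal sketch rather than the answerer's own code: it starts a new run after every change of character, appends the final run after the loop, and zero-pads the count to two digits as the problem statement asks. The helper name encode_run is an illustrative choice, not something from the original post.

```python
def runLengthEncode(plainText):
    def encode_run(run):
        # Runs longer than 4 become "/NNc"; shorter runs are copied as-is.
        if len(run) > 4:
            return "/{:02}{}".format(len(run), run[0])
        return run

    res = ''
    a = ''  # the run collected so far
    for i in plainText:
        if a and a[0] == i:
            a += i
        else:
            res += encode_run(a)
            a = i          # always start a new run, even after encoding
    res += encode_run(a)   # append what was collected for the last run
    return res


print(runLengthEncode("abcd"))      # abcd
print(runLengthEncode("abbbbbcd"))  # a/05bcd
```

With these changes the two inputs shown above come back as 'abcd' and 'a/05bcd'.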
Answer by Serdalis
Aside from setting a=i after encoding a sequence and setting a width for your int when it is formatted into the string, you could also do the following, which takes advantage of Python's groupby. It's also a good idea to use format when constructing strings.
from itertools import groupby

def runLengthEncode (plainText):
    res = []
    for k,i in groupby(plainText):
        run = list(i)
        if(len(run) > 4):
            res.append("/{:02}{}".format(len(run), k))
        else:
            res.extend(run)
    return "".join(res)
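To check an encoder like this one, it can help to have the inverse operation at hand. The sketch below is an assumption-laden illustration and not part of the original answer: it assumes the plain text never contains a '/' followed by two digits, and that no run is longer than 99 characters (so every count is exactly two digits). The name runLengthDecode is hypothetical.

```python
import re

def runLengthDecode(encoded):
    # Expand each "/NNc" token into c repeated NN times.
    # Everything else is copied through unchanged.
    return re.sub(
        r'/(\d{2})(.)',
        lambda m: m.group(2) * int(m.group(1)),
        encoded,
        flags=re.DOTALL,  # let "." match a newline so runs of "\n" also decode
    )


print(runLengthDecode("a/05bcd"))  # abbbbbcd
```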
Answer by pkacprzak
I know this is not the most efficient solution, but we haven't studied functions like groupby() yet, so here's what I did:
def runLengthEncode (plainText):
    res=''
    a=''
    count = 0
    for i in plainText:
        count+=1
        if a.count(i)>0:
            a+=i
        else:
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
                a=i
            else:
                res+=a
                a=i
        if count == len(plainText):
            if len(a)>4:
                if len(a)<10:
                    res+="/0"+str(len(a))+a[0][:1]
                else:
                    res+="/" + str(len(a)) + a[0][:1]
            else:
                res+=a
    return(res)
Answer by Thomas Ahle
Rosetta Code has a lot of implementations that should be easy to adapt to your use case.
Here is Python code with regular expressions:
from re import sub

def encode(text):
    '''
    Doctest:
        >>> encode('WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW')
        '12W1B12W3B24W1B14W'
    '''
    # (.)\1* matches one character followed by any number of repeats of it.
    return sub(r'(.)\1*', lambda m: str(len(m.group(0))) + m.group(1),
               text)

def decode(text):
    '''
    Doctest:
        >>> decode('12W1B12W3B24W1B14W')
        'WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW'
    '''
    return sub(r'(\d+)(\D)', lambda m: m.group(2) * int(m.group(1)),
               text)

textin = "WWWWWWWWWWWWBWWWWWWWWWWWWBBBWWWWWWWWWWWWWWWWWWWWWWWWBWWWWWWWWWWWWWW"
assert decode(encode(textin)) == textin
Answer by Ashwin Joshi
An easy solution to run-length encoding that I can think of:
For encoding into a string like "a4b5c6d7...":
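def encode(s):
    # Note: s.count(c) is the total number of occurrences of c, so the result
    # is a true run-length encoding only if each character forms a single run.
    counts = {}
    for c in s:
        if counts.get(c) is None:
            counts[c] = s.count(c)
    return "".join(k+str(v) for k,v in counts.items())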
For decoding back into a string like "aaaaaabbbdddddccccc....":
def decode(s):
    # Pairs each character with the single digit after it, so every count must be 1-9.
    return "".join((map(lambda tup: tup[0] * int(tup[1]), zip(s[0:len(s):2], s[1:len(s):2]))))
Fairly simple and easy to read.
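The same "character followed by count" format can also be produced per run rather than per distinct character, which keeps inputs such as "aabaa" reversible and allows counts above 9. This is a sketch under one stated assumption, namely that the input text contains no digit characters; the names encode_runs and decode_runs are illustrative and do not come from the answer.

```python
from itertools import groupby
import re

def encode_runs(s):
    # One "char + count" token per consecutive run, e.g. "aabaa" -> "a2b1a2".
    return "".join(k + str(len(list(g))) for k, g in groupby(s))

def decode_runs(s):
    # Each token is one non-digit character followed by a count of any length.
    return re.sub(r'(\D)(\d+)', lambda m: m.group(1) * int(m.group(2)), s)


text = "a" * 12 + "bbb" + "ddddd" + "aa"
print(encode_runs(text))  # a12b3d5a2
assert decode_runs(encode_runs(text)) == text
```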
Answer by Christopher McGeough
text=input("Please enter the string to encode")
encoded=[]
index=0
amount=1
while index<=(len(text)-1):
    if index==(len(text)-1) or text[index]!=text[(index+1)]:
        encoded.append((text[index],amount))
        amount=1
    else:
        amount=amount+1
    index=index+1
print(encoded)
Answer by Tom Smith
Split=(list(input("Enter string: ")))
Split.append("")
a = 0
for i in range(len(Split)):
    try:
        if (Split[i] in Split) >0:
            a = a + 1
        if Split[i] != Split[i+1]:
            print(Split[i],a)
            a = 0
    except IndexError:
        print()
This is much easier and works every time.
Answer by Motez Ouledissa
def RLE_comp_encode(text):
    if not text:
        return ''
    comp_text, r = '', 1
    for i in range(1, len(text)):
        if text[i] == text[i-1]:
            r += 1
        else:
            comp_text += str(r) + text[i-1]
            r = 1
    # Emit the final run, which the loop itself never writes out.
    comp_text += str(r) + text[-1]
    return comp_text
This worked for me.
Answer by Hultner
I see many great solutions here, but none that feels very pythonic to my eyes. So I'm contributing an implementation I wrote myself today for this problem.
from itertools import groupby
from typing import Iterator, Tuple

def run_length_encode(data: str) -> Iterator[Tuple[str, int]]:
    """Returns run length encoded Tuples for string"""
    # A memory efficient (lazy) and pythonic solution using generators
    return ((x, sum(1 for _ in y)) for x, y in groupby(data))
This will return a generator of tuples with the character and number of instances, but can easily be modified to return a string as well. A benefit of doing it this way is that it's all lazily evaluated and won't consume more memory or CPU if you don't need to exhaust the entire search space.
If you still want string encoding, the code can quite easily be modified for that use case like this:
from itertools import groupby

def run_length_encode(data: str) -> str:
    """Returns run length encoded string for data"""
    # A memory efficient (lazy) and pythonic solution using generators
    return "".join(f"{x}{sum(1 for _ in y)}" for x, y in groupby(data))
This does a more generic run length encoding for all lengths, not just for runs of more than 4 characters, but it could also quite easily be adapted with a conditional for the string if wanted.
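One way that conditional might look is sketched below. It is an illustration rather than the answerer's code: the min_run parameter and the inner pieces generator are hypothetical names, and the "/NNc" output format is taken from the problem statement in the question.

```python
from itertools import groupby

def run_length_encode(data: str, min_run: int = 5) -> str:
    """Encode only runs of at least min_run characters as "/NNc"."""
    def pieces():
        for char, group in groupby(data):
            length = sum(1 for _ in group)
            if length >= min_run:
                # Long run: slash, zero-padded two-digit count, character.
                yield f"/{length:02}{char}"
            else:
                # Short run: encoding would not save space, so copy it.
                yield char * length
    return "".join(pieces())


print(run_length_encode("aaaaa" + "bbbb" + "c" * 12))  # /05abbbb/12c
```

The default of 5 matches "more than 4 times" in the question; passing a different min_run changes the threshold without touching the loop.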