Notice: this page reproduces a popular StackOverflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute the content to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/18854620/

Time: 2020-08-19 12:04:07  Source: igfitidea

What's the best way to split a string into fixed length chunks and work with them in Python?

python

Asked by LostRob

I am reading in a line from a text file using:

   file = urllib2.urlopen("http://192.168.100.17/test.txt").read().splitlines()

and outputting it to an LCD display, which is 16 characters wide, with a telnetlib.write command. If a line is longer than 16 characters, I want to break it into 16-character sections and push each section out after a certain delay (e.g. 10 seconds). Once that is complete, the code should move on to the next line of the input file and continue.

I've tried searching for various solutions and reading up on itertools etc., but my understanding of Python just isn't sufficient to get anything to work without doing it in a very long-winded way, using a tangled mess of if/then/else statements that's probably going to tie me in knots!

What's the best way for me to do what I want?

Accepted answer by rlms

One solution would be to use this function:

def chunkstring(string, length):
    return (string[0+i:length+i] for i in range(0, len(string), length))

This function returns a generator, using a generator expression. Each item is a slice of the string that starts at a multiple of the chunk length and ends one chunk length later, so chunk number k (counting from 0) is string[k*length:(k+1)*length].

You can iterate over the generator like a list, tuple or string - for i in chunkstring(s,n): - or convert it into a list (for instance) with list(generator). Generators are more memory efficient than lists because they generate their elements as they are needed, not all at once. However, they lack certain features such as indexing.

This generator also yields the shorter leftover chunk at the end, if there is one:

>>> list(chunkstring("abcdefghijklmnopqrstuvwxyz", 5))
['abcde', 'fghij', 'klmno', 'pqrst', 'uvwxy', 'z']
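The generator behaviour described above can be seen in a small sketch. It repeats the chunkstring definition so that it runs on its own; the sample string is arbitrary and only there for illustration.

```python
def chunkstring(string, length):
    # Same generator-based function as in the answer above.
    return (string[0+i:length+i] for i in range(0, len(string), length))

chunks = chunkstring("abcdefghij", 4)

# A generator does not support indexing, so chunks[0] raises TypeError.
try:
    chunks[0]
    indexable = True
except TypeError:
    indexable = False

# Pull values one at a time with next(), or all remaining ones with list().
first = next(chunks)
rest = list(chunks)

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(chunks)
```

If the chunks are needed more than once, or by position, converting to a list up front is probably the simpler choice.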

Example usage:

text = """This is the first line.
           This is the second line.
           The line below is true.
           The line above is false.
           A short line.
           A very very very very very very very very very long line.
           A self-referential line.
           The last line.
        """

lines = (i.strip() for i in text.splitlines())

for line in lines:
    for chunk in chunkstring(line, 16):
        print(chunk)
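For the delayed output the question asks about, the same loop could be wrapped roughly as follows. This is only a sketch: show_lines, send, width, delay and sleep are names made up here, and send stands in for whatever actually writes to the display (for example a wrapper around the telnetlib.write call from the question).

```python
import time

def chunkstring(string, length):
    # Same generator-based function as in the answer above.
    return (string[0+i:length+i] for i in range(0, len(string), length))

def show_lines(lines, send, width=16, delay=10, sleep=time.sleep):
    """Pass each line to `send` in pieces of at most `width` characters,
    pausing `delay` seconds after each piece."""
    for line in lines:
        # An empty line produces no chunks, so it is skipped without a pause.
        for chunk in chunkstring(line.strip(), width):
            send(chunk)
            sleep(delay)

# Example: print each piece immediately (delay=0 keeps the demo fast).
show_lines(["A very very very long line.", "short"], print, delay=0)
```

Passing sleep as a parameter is just a convenience so the pause can be replaced when trying the function out.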

Answer by carl.anderson

My favorite way to solve this problem is with the re module.

import re

def chunkstring(string, length):
  return re.findall('.{%d}' % length, string)

One caveat here is that re.findall will not return a chunk that is shorter than the length value, so any remainder is skipped.

However, if you're parsing fixed-width data, this is a great way to do it.

For example, if I want to parse a block of text that I know is made up of 32-character records (like a header section), I find this very readable and see no need to generalize it into a separate function (as in chunkstring):

for header in re.findall('.{32}', header_data):
  ProcessHeader(header)
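If the remainder mentioned in the caveat above should be kept, one possible variation is to let the pattern match between 1 and length characters. This is not part of the original answer; chunkstring_keep_rest is a name invented for this sketch.

```python
import re

def chunkstring_keep_rest(string, length):
    # {1,n} matches between 1 and n characters, so a shorter final chunk is kept.
    # re.DOTALL lets '.' match newlines as well; without it, newlines are
    # dropped and no chunk spans a line break.
    return re.findall('.{1,%d}' % length, string, flags=re.DOTALL)

alphabet = "abcdefghijklmnopqrstuvwxyz"
exact = re.findall('.{5}', alphabet)            # remainder 'z' is skipped
with_rest = chunkstring_keep_rest(alphabet, 5)  # remainder 'z' is kept
```

The quantifier is greedy, so every chunk except possibly the last one still has the full length.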

Answer by Albert Siersema

I know it's an oldie, but I'd like to add how to chop up a string into variable-length columns:

def chunkstring(string, lengths):
    return (string[pos:pos+length].strip()
            for idx,length in enumerate(lengths)
            for pos in [sum(map(int, lengths[:idx]))])

column_lengths = [10,19,13,11,7,7,15]
fields = list(chunkstring(line, column_lengths))
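As a possible alternative to summing the preceding widths for every column, the running offsets can be computed once with itertools.accumulate. This is a sketch rather than the answer's own method; chunkstring_columns and the sample record are made up for illustration.

```python
from itertools import accumulate

def chunkstring_columns(string, lengths):
    # Running totals of the widths give each column's end offset;
    # the start offset is that end minus the column's own width.
    ends = accumulate(lengths)
    return [string[end - length:end].strip()
            for length, end in zip(lengths, ends)]

# Made-up sample record with column widths 6, 4 and 3.
line = "apple " + "  12" + " ok"
fields = chunkstring_columns(line, [6, 4, 3])
```

As in the version above, each field is stripped, so padding inside a column is discarded.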