Python: how do I clear a StringIO object?

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/4330812/

Date: 2020-08-18 15:18:01  Source: igfitidea

How do I clear a StringIO object?

Tags: python, stringio

Asked by Incognito

I have created a StringIO object and it has some text in it. I'd like to clear its existing contents and reuse it instead of creating it again. Is there any way of doing this?

Accepted answer by Chris Morgan

TL;DR

Don't bother clearing it, just create a new one—it's faster.
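
As a minimal sketch of that advice, assuming Python 3's io.StringIO (the variable name buf is only illustrative), rebinding the name to a fresh object is all that "clearing" takes:

```python
from io import StringIO

buf = StringIO()
buf.write('old contents')

# Instead of clearing, rebind the name to a fresh, empty buffer.
# The old object is garbage-collected once nothing else references it.
buf = StringIO()
buf.write('new contents')

print(repr(buf.getvalue()))
```

Any other reference to the old buffer still sees the old contents, so this only works when the code that reuses the buffer goes through the rebound name.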

The method

Python 2

Here's how I would find such things out:

>>> from StringIO import StringIO
>>> dir(StringIO)
['__doc__', '__init__', '__iter__', '__module__', 'close', 'flush', 'getvalue', 'isatty', 'next', 'read', 'readline', 'readlines', 'seek', 'tell', 'truncate', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method truncate in module StringIO:

truncate(self, size=None) unbound StringIO.StringIO method
    Truncate the file's size.

    If the optional size argument is present, the file is truncated to
    (at most) that size. The size defaults to the current position.
    The current file position is not changed unless the position
    is beyond the new file size.

    If the specified size exceeds the file's current size, the
    file remains unchanged.

So, you want .truncate(0). But it's probably cheaper (and easier) to initialise a new StringIO. See below for benchmarks.

Python 3

(Thanks to tstone2077 for pointing out the difference.)

>>> from io import StringIO
>>> dir(StringIO)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__enter__', '__eq__', '__exit__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__lt__', '__ne__', '__new__', '__next__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__setstate__', '__sizeof__', '__str__', '__subclasshook__', '_checkClosed', '_checkReadable', '_checkSeekable', '_checkWritable', 'close', 'closed', 'detach', 'encoding', 'errors', 'fileno', 'flush', 'getvalue', 'isatty', 'line_buffering', 'newlines', 'read', 'readable', 'readline', 'readlines', 'seek', 'seekable', 'tell', 'truncate', 'writable', 'write', 'writelines']
>>> help(StringIO.truncate)
Help on method_descriptor:

truncate(...)
    Truncate size to pos.

    The pos argument defaults to the current file position, as
    returned by tell().  The current file position is unchanged.
    Returns the new absolute position.

Note the difference: in Python 3, truncate() leaves the current file position unchanged, whereas in Python 2 truncating to size zero also resets the position.
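
The unchanged position can be observed directly with tell(). A small sketch, assuming Python 3's io.StringIO (the names new_size and position are illustrative only):

```python
from io import StringIO

s = StringIO()
s.write('foo')

new_size = s.truncate(0)  # the buffer is now empty; truncate() returns the new size
position = s.tell()       # the position was not moved by truncate()

print(new_size, position, repr(s.getvalue()))
```

The position stays where the earlier write left it, which is why a later write does not start at the beginning of the buffer.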

Thus, for Python 2, you only need

>>> from cStringIO import StringIO
>>> s = StringIO()
>>> s.write('foo')
>>> s.getvalue()
'foo'
>>> s.truncate(0)
>>> s.getvalue()
''
>>> s.write('bar')
>>> s.getvalue()
'bar'

If you do this in Python 3, you won't get the result you expected:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'\x00\x00\x00bar'

So in Python 3 you also need to reset the position:

>>> from io import StringIO
>>> s = StringIO()
>>> s.write('foo')
3
>>> s.getvalue()
'foo'
>>> s.truncate(0)
0
>>> s.seek(0)
0
>>> s.getvalue()
''
>>> s.write('bar')
3
>>> s.getvalue()
'bar'

If you use the truncate method in Python 2 code, it's safer to call seek(0) at the same time (before or after, it doesn't matter) so that the code won't break when you inevitably port it to Python 3. And that's another reason why you should just create a new StringIO object!
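
If you do want to clear in place, the two calls can be wrapped together. This is only a sketch: clear_stringio is a hypothetical helper name, not part of the standard library, and the example assumes Python 3's io.StringIO.

```python
from io import StringIO


def clear_stringio(sio):
    """Empty `sio` in place and rewind it (hypothetical helper)."""
    sio.seek(0)      # move the position back to the start
    sio.truncate(0)  # drop all existing contents
    return sio


s = StringIO()
s.write('foo')
clear_stringio(s)
s.write('bar')
print(repr(s.getvalue()))
```

Keeping seek(0) and truncate(0) in one function makes it harder to forget one of them when the code is moved between Python versions.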

Times

Python 2

>>> from timeit import timeit
>>> def truncate(sio):
...     sio.truncate(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
... 

When empty, with StringIO:

>>> from StringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
3.5194039344787598
>>> timeit(lambda: new(StringIO()))
3.6533868312835693

With 3KB of data in, with StringIO:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
4.3437709808349609
>>> timeit(lambda: new(StringIO('abc' * 1000)))
4.7179079055786133

And the same with cStringIO:

>>> from cStringIO import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.55461597442626953
>>> timeit(lambda: new(StringIO()))
0.51241087913513184
>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
1.0958449840545654
>>> timeit(lambda: new(StringIO('abc' * 1000)))
0.98760509490966797

So, ignoring potential memory concerns (del oldstringio), it's faster to truncate a StringIO.StringIO (3% faster for empty, 8% faster for 3KB of data), but with cStringIO.StringIO it's faster to create a new one, and by a larger margin (8% faster for empty, 10% faster for 3KB of data). So I'd recommend just using the easiest approach: presuming you're working with CPython, use cStringIO and create new ones.

Python 3

The same code, just with seek(0) added.

>>> def truncate(sio):
...     sio.truncate(0)
...     sio.seek(0)
...     return sio
... 
>>> def new(sio):
...     return StringIO()
...

When empty:

>>> from io import StringIO
>>> timeit(lambda: truncate(StringIO()))
0.9706327870007954
>>> timeit(lambda: new(StringIO()))
0.8734330690022034

With 3KB of data in:

>>> timeit(lambda: truncate(StringIO('abc' * 1000)))
3.5271066290006274
>>> timeit(lambda: new(StringIO('abc' * 1000)))
3.3496507499985455

So for Python 3, creating a new one instead of reusing a blank one is 11% faster, and creating a new one instead of reusing a 3KB one is 5% faster. Again, create a new StringIO rather than truncating and seeking.
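
To repeat the comparison on your own machine, the timings can be collected in a script instead of at the prompt. A rough sketch, assuming Python 3: compare is a hypothetical helper, and number is lowered from timeit's default of one million runs only so the script finishes quickly. Absolute figures will differ between machines and Python versions.

```python
from io import StringIO
from timeit import timeit


def truncate(sio):
    sio.truncate(0)
    sio.seek(0)
    return sio


def new(sio):
    return StringIO()


def compare(payload, number=100_000):
    """Time both approaches on buffers pre-filled with `payload` (hypothetical helper)."""
    return {
        'truncate': timeit(lambda: truncate(StringIO(payload)), number=number),
        'new': timeit(lambda: new(StringIO(payload)), number=number),
    }


results = compare('abc' * 1000, number=10_000)
for name, seconds in results.items():
    print(f'{name}: {seconds:.4f}s')
```

As in the sessions above, both lambdas construct a StringIO first, so the difference between the two figures is what matters, not either figure on its own.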

Answer by Erik Kaplun

The way I optimised my processing of many files in sequence (read in chunks, process each chunk, write the processed stream out to a file) was to reuse the same cStringIO.StringIO instance: I always reset() it after use, then write to it, and then truncate(). This way I only truncate the part at the end that I don't need for the current file. This seems to have given me a ~3% performance increase. Anybody with more expertise on this could confirm whether this indeed optimises memory allocation.

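import cStringIO  # Python 2 only

sio = cStringIO.StringIO()
for file in files:
    # Write the processed chunks of this file into the shared buffer.
    read_file_chunks_and_write_to_sio(file, sio)
    # Cut off whatever remains from a longer previous file.
    sio.truncate()
    with open('out.bla', 'w') as f:
        f.write(sio.getvalue())
    # Rewind to the start so the next file overwrites the buffer.
    sio.reset()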
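
cStringIO and its reset() method do not exist in Python 3. A rough equivalent of the same reuse pattern with io.StringIO might look like the sketch below, where seek(0) stands in for reset(); process_all, its sources argument, and the upper() call are hypothetical stand-ins for the real file processing.

```python
from io import StringIO


def process_all(sources):
    """Reuse one buffer across inputs; `sources` is a hypothetical list of strings."""
    sio = StringIO()
    outputs = []
    for text in sources:
        sio.seek(0)              # rewind, the Python 3 stand-in for reset()
        sio.write(text.upper())  # stand-in for the real per-chunk processing
        sio.truncate()           # cut off leftovers from a longer previous input
        outputs.append(sio.getvalue())
    return outputs


print(process_all(['a long first input', 'short']))
```

Calling truncate() with no argument cuts the buffer at the current position, so only the tail left over from a longer earlier input is discarded.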

Answer by tstone2077

There is something important to note (at least with Python 3.2):

seek(0) IS needed before truncate(0). Here is some code without the seek(0):

from io import StringIO
s = StringIO()
s.write('1'*3)
print(repr(s.getvalue()))
s.truncate(0)
print(repr(s.getvalue()))
s.write('1'*3)
print(repr(s.getvalue()))

Which outputs:

'111'
''
'\x00\x00\x00111'

With seek(0) before the truncate, we get the expected output:

'111'
''
'111'
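
The code for the seek(0) variant is not shown above. A minimal sketch of it, assuming Python 3's io.StringIO and capturing the three values in illustrative variables (first, second, third) instead of printing them one by one:

```python
from io import StringIO

s = StringIO()
s.write('1' * 3)
first = s.getvalue()

s.seek(0)      # rewind before truncating
s.truncate(0)
second = s.getvalue()

s.write('1' * 3)
third = s.getvalue()

print(repr(first))
print(repr(second))
print(repr(third))
```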