Python equivalent of Java StringBuffer?
Source: http://stackoverflow.com/questions/19926089/
This page reproduces a Stack Overflow question and its answers under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors on Stack Overflow and keep it under the same license.
Asked by user2902773
Is there anything in Python like Java's StringBuffer? Since strings are immutable in Python too, editing them in loops would be inefficient.
Answered by Kamlesh Arya
This link might be useful for string concatenation in Python:
http://pythonadventures.wordpress.com/2010/09/27/stringbuilder/
Example from the link above:
def g():
    sb = []  # collect the pieces in a list
    for i in range(30):
        sb.append("abcdefg"[i % 7])
    return ''.join(sb)  # join once at the end

print(g())
# abcdefgabcdefgabcdefgabcdefgab
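Answered by unutbu
Perhaps use a bytearray (the session below is Python 2; Python 3 requires bytes or an explicit encoding when constructing a bytearray):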
In [1]: s = bytearray('Hello World')
In [2]: s[:5] = 'Bye'
In [3]: s
Out[3]: bytearray(b'Bye World')
In [4]: str(s)
Out[4]: 'Bye World'
The appeal of using a bytearray is its memory-efficiency and convenient syntax. It can also be faster than using a temporary list:
In [36]: %timeit s = list('Hello World'*1000); s[5500:6000] = 'Bye'; s = ''.join(s)
1000 loops, best of 3: 256 μs per loop
In [37]: %timeit s = bytearray('Hello World'*1000); s[5500:6000] = 'Bye'; str(s)
100000 loops, best of 3: 2.39 μs per loop
Note that much of the difference in speed is attributable to the creation of the container:
In [32]: %timeit s = list('Hello World'*1000)
10000 loops, best of 3: 115 μs per loop
In [33]: %timeit s = bytearray('Hello World'*1000)
1000000 loops, best of 3: 1.13 μs per loop
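For readers on Python 3, the same idea might look like the sketch below. It is not part of the original answer: the 'utf-8' encoding and the variable names are assumptions chosen for illustration, and a bytearray only makes sense when working with encoded bytes is acceptable.

```python
# Python 3 sketch: a bytearray holds bytes, so text is encoded on the way in
# and decoded on the way out. The 'utf-8' encoding is an assumption.
buf = bytearray('Hello World', 'utf-8')
buf[:5] = b'Bye'   # slice assignment may change the length in place
buf += b'!'        # in-place append, similar in spirit to StringBuffer.append
result = buf.decode('utf-8')
print(result)      # Bye World!
```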
Answered by bruno desthuilliers
It depends on what you want to do. If you want a mutable sequence, the builtin list type is your friend, and going from str to list and back is as simple as:
mystring = "abcdef"
mylist = list(mystring)
mystring = "".join(mylist)
If you want to build a large string using a for loop, the Pythonic way is usually to build a list of strings and then join them with the proper separator (a line break or whatever fits).
Otherwise, you can also use a text templating system, a parser, or whatever specialized tool is most appropriate for the job.
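The list-then-join approach described in this answer could be sketched as follows. The function name build_report and the sample rows are hypothetical, chosen only to show the pattern with a line break as the separator.

```python
def build_report(rows):
    """Collect formatted pieces in a list, then join them once at the end."""
    lines = []
    for name, value in rows:
        lines.append("{}: {}".format(name, value))
    # A single join builds the final string in one pass.
    return "\n".join(lines)

report = build_report([("alpha", 1), ("beta", 2)])
print(report)
```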
Answered by georg
"Efficient String Concatenation in Python" is a rather old article, and its main claim, that naive concatenation is far slower than joining, is no longer valid, because this case has been optimized in CPython since then:
CPython implementation detail: If s and t are both strings, some Python implementations such as CPython can usually perform an in-place optimization for assignments of the form s = s + t or s += t. When applicable, this optimization makes quadratic run-time much less likely. This optimization is both version and implementation dependent. For performance sensitive code, it is preferable to use the str.join() method which assures consistent linear concatenation performance across versions and implementations. @ http://docs.python.org/2/library/stdtypes.html
I've adapted their code a bit and got the following results on my machine:
# Python 2 code: backticks are shorthand for repr(), and cStringIO,
# UserString.MutableString and array('c') do not exist in Python 3.
from cStringIO import StringIO
from UserString import MutableString
from array import array
import sys, timeit

def method1():
    # naive concatenation
    out_str = ''
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method2():
    # MutableString
    out_str = MutableString()
    for num in xrange(loop_count):
        out_str += `num`
    return out_str

def method3():
    # character array
    char_array = array('c')
    for num in xrange(loop_count):
        char_array.fromstring(`num`)
    return char_array.tostring()

def method4():
    # list built in a loop, then joined
    str_list = []
    for num in xrange(loop_count):
        str_list.append(`num`)
    out_str = ''.join(str_list)
    return out_str

def method5():
    # in-memory file object
    file_str = StringIO()
    for num in xrange(loop_count):
        file_str.write(`num`)
    out_str = file_str.getvalue()
    return out_str

def method6():
    # join over a list comprehension
    out_str = ''.join([`num` for num in xrange(loop_count)])
    return out_str

def method7():
    # join over a generator expression
    out_str = ''.join(`num` for num in xrange(loop_count))
    return out_str

loop_count = 80000

print sys.version
print 'method1=', timeit.timeit(method1, number=10)
print 'method2=', timeit.timeit(method2, number=10)
print 'method3=', timeit.timeit(method3, number=10)
print 'method4=', timeit.timeit(method4, number=10)
print 'method5=', timeit.timeit(method5, number=10)
print 'method6=', timeit.timeit(method6, number=10)
print 'method7=', timeit.timeit(method7, number=10)
Results:
2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
method1= 0.171155929565
method2= 16.7158739567
method3= 0.420584917068
method4= 0.231794118881
method5= 0.323612928391
method6= 0.120429992676
method7= 0.145267963409
Conclusions:
- join still wins over concat, but marginally
- list comprehensions are faster than loops (when building a list)
- joining generators is slower than joining lists
- other methods are of no use (unless you're doing something special)
py3:
import sys
import timeit
from io import StringIO
from array import array

def test_concat():
    out_str = ''
    for _ in range(loop_count):
        out_str += 'abc'
    return out_str

def test_join_list_loop():
    str_list = []
    for _ in range(loop_count):
        str_list.append('abc')
    return ''.join(str_list)

def test_array():
    char_array = array('b')
    for _ in range(loop_count):
        char_array.frombytes(b'abc')
    # tobytes() replaces tostring(), which was removed in Python 3.9;
    # decode() yields the text, whereas str() on bytes yields its repr.
    return char_array.tobytes().decode('ascii')

def test_string_io():
    file_str = StringIO()
    for _ in range(loop_count):
        file_str.write('abc')
    return file_str.getvalue()

def test_join_list_compr():
    return ''.join(['abc' for _ in range(loop_count)])

def test_join_gen_compr():
    return ''.join('abc' for _ in range(loop_count))

loop_count = 80000

print(sys.version)

# Time every function whose name starts with 'test_', then print fastest first.
res = {}
for k, v in dict(globals()).items():
    if k.startswith('test_'):
        res[k] = timeit.timeit(v, number=10)

for k, v in sorted(res.items(), key=lambda x: x[1]):
    print('{:.5f} {}'.format(v, k))
Results:
3.7.5 (default, Nov 1 2019, 02:16:32)
[Clang 11.0.0 (clang-1100.0.33.8)]
0.03738 test_join_list_compr
0.05681 test_join_gen_compr
0.09425 test_string_io
0.09636 test_join_list_loop
0.11976 test_concat
0.19267 test_array
Answered by rhaertel80
The previously provided answers are almost always best. However, sometimes the string is built up across many method calls and/or loops, so it is not necessarily natural to build a list of lines and then join them. And since there is no guarantee that you are using CPython, or that CPython's optimization will apply, another approach is to just use print!
Here's an example helper class. The class is trivial and probably unnecessary, but it serves to illustrate the approach (Python 3):
import io

class StringBuilder(object):

    def __init__(self):
        self._stringio = io.StringIO()

    def __str__(self):
        return self._stringio.getvalue()

    def append(self, *objects, sep=' ', end=''):
        # print() does the formatting and writes into the in-memory buffer
        print(*objects, sep=sep, end=end, file=self._stringio)

sb = StringBuilder()
sb.append('a')
sb.append('b', end='\n')
sb.append('c', 'd', sep=',', end='\n')
print(sb)  # 'ab\nc,d\n'
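When the text is assembled across several functions, as this answer describes, the buffer can also be passed around directly without a wrapper class. The sketch below is only an illustration: the helper names write_header and write_items and the sample data are hypothetical.

```python
import io

def write_header(out, title):
    # Each helper writes into whatever file-like object it is given.
    print(title, file=out)
    print('-' * len(title), file=out)

def write_items(out, items):
    for item in items:
        print('*', item, file=out)

buf = io.StringIO()
write_header(buf, 'Fruits')
write_items(buf, ['apple', 'pear'])
text = buf.getvalue()  # the full string is materialized once, at the end
print(text, end='')
```

Because the helpers only need a `file` argument for print, the same functions could write to sys.stdout or an open file instead of the in-memory buffer.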
Answered by Roee Gavirel
Just a test I ran on Python 3.6.2, showing that "join" still wins BIG!
from time import time

def _with_format(i):
    _st = ''
    for _ in range(i):
        _st = "{}{}".format(_st, "0")
    return _st

def _with_s(i):
    _st = ''
    for _ in range(i):
        _st = "%s%s" % (_st, "0")
    return _st

def _with_list(i):
    l = []
    for _ in range(i):
        l.append("0")
    return "".join(l)

def _count_time(name, i, func):
    # wall-clock time of a single call to func(i)
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r

iterationCount = 1000000

r1 = _count_time("with format", iterationCount, _with_format)
r2 = _count_time("with s", iterationCount, _with_s)
r3 = _count_time("with list and join", iterationCount, _with_list)

if r1 != r2 or r2 != r3:
    print("Not all results are the same!")
And the output was:
with format done in 17.991968870162964s
with s done in 18.36879801750183s
with list and join done in 0.12142801284790039s
Answered by Martlark
I've added two tests to Roee Gavirel's code; they show conclusively that joining a list into a string is not any faster than s += "something".
Results:
Python 2.7.15rc1
Iterations: 100000
format done in 0.317540168762s
%s done in 0.151262044907s
list+join done in 0.0055148601532s
str cat done in 0.00391721725464s
Python 3.6.7
Iterations: 100000
format done in 0.35594654083251953s
%s done in 0.2868080139160156s
list+join done in 0.005924701690673828s
str cat done in 0.0054128170013427734s
f str done in 0.12870001792907715s
Code:
from time import time

def _with_cat(i):
    _st = ''
    for _ in range(i):
        _st += "0"
    return _st

# f-strings need Python 3.6+; remove this function and its call below
# to run the script on Python 2.
def _with_f_str(i):
    _st = ''
    for _ in range(i):
        _st = f"{_st}0"
    return _st

def _with_format(i):
    _st = ''
    for _ in range(i):
        _st = "{}{}".format(_st, "0")
    return _st

def _with_s(i):
    _st = ''
    for _ in range(i):
        _st = "%s%s" % (_st, "0")
    return _st

def _with_list(i):
    l = []
    for _ in range(i):
        l.append("0")
    return "".join(l)

def _count_time(name, i, func):
    # wall-clock time of a single call to func(i)
    start = time()
    r = func(i)
    total = time() - start
    print("%s done in %ss" % (name, total))
    return r

iteration_count = 100000

print('Iterations: {}'.format(iteration_count))
r1 = _count_time("format ", iteration_count, _with_format)
r2 = _count_time("%s ", iteration_count, _with_s)
r3 = _count_time("list+join", iteration_count, _with_list)
r4 = _count_time("str cat ", iteration_count, _with_cat)
r5 = _count_time("f str ", iteration_count, _with_f_str)

if len({r1, r2, r3, r4, r5}) != 1:
    print("Not all results are the same!")
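A single time() measurement per function can be noisy when the differences are this small. As a hedged alternative (not part of the original answer), the two closest contenders could be compared with timeit.repeat, taking the best of several runs. The function names and the repeat counts below are arbitrary choices for illustration, and results will vary by machine and interpreter.

```python
import timeit

def with_cat(n):
    s = ''
    for _ in range(n):
        s += '0'
    return s

def with_join(n):
    parts = []
    for _ in range(n):
        parts.append('0')
    return ''.join(parts)

n = 100000
assert with_cat(n) == with_join(n)  # both build the same string

for func in (with_cat, with_join):
    # best of 5 runs, each run calling the function 10 times
    best = min(timeit.repeat(lambda: func(n), number=10, repeat=5))
    print('{:10s} {:.5f}s per 10 calls'.format(func.__name__, best))
```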