Concatenate elements of a tuple in a list in Python

Disclaimer: This page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. If you reuse or share it, you must follow the same CC BY-SA license and attribute the original authors (not me). Original question: http://stackoverflow.com/questions/20736917/

Concatenate elements of a tuple in a list in python

python, string, list, tuples, concatenation

Asked by alphacentauri

I have a list of tuples that has strings in it. For instance:

[('this', 'is', 'a', 'foo', 'bar', 'sentences'),
 ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
 ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
 ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
 ('bar', 'sentences', 'and', 'i', 'want', 'to'),
 ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
 ('and', 'i', 'want', 'to', 'ngramize', 'it')]

Now I wish to concatenate the strings in each tuple to create a list of space-separated strings. I used the following method:

NewData = []
for grams in sixgrams:
    # append a space after each word, join, then strip the trailing space
    NewData.append((''.join([w + ' ' for w in grams])).strip())

which is working perfectly fine.

However, the list that I have has over a million tuples. So my question is: is this method efficient enough, or is there a better way to do it? Thanks.

Accepted answer by lvc

For a lot of data, you should consider whether you need to keep it all in a list. If you are processing the strings one at a time, you can create a generator that will yield each joined string, but won't keep them all around taking up memory:

new_data = (' '.join(w) for w in sixgrams)

If you can get the original tuples from a generator as well, then you can avoid having the sixgrams list in memory too.

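As a minimal illustration of that idea (the make_ngrams helper and the tokens sample below are made up for this sketch, not part of the original question), both the tuples and the joined strings can be produced lazily:

def make_ngrams(tokens, n=6):
    # Hypothetical helper: yield successive n-word tuples from a token sequence.
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

tokens = "this is a foo bar sentences and i want to ngramize it".split()

# Both steps are lazy; no full list of tuples or joined strings is built in memory.
sixgrams = make_ngrams(tokens, 6)
joined = (' '.join(w) for w in sixgrams)

for sentence in joined:
    print(sentence)  # consume one joined string at a time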

Answered by falsetru

The list comprehension creates temporary strings. Just use ' '.join instead.

>>> words_list = [('this', 'is', 'a', 'foo', 'bar', 'sentences'),
...               ('is', 'a', 'foo', 'bar', 'sentences', 'and'),
...               ('a', 'foo', 'bar', 'sentences', 'and', 'i'),
...               ('foo', 'bar', 'sentences', 'and', 'i', 'want'),
...               ('bar', 'sentences', 'and', 'i', 'want', 'to'),
...               ('sentences', 'and', 'i', 'want', 'to', 'ngramize'),
...               ('and', 'i', 'want', 'to', 'ngramize', 'it')]
>>> new_list = []
>>> for words in words_list:
...     new_list.append(' '.join(words)) # <---------------
... 
>>> new_list
['this is a foo bar sentences', 
 'is a foo bar sentences and', 
 'a foo bar sentences and i', 
 'foo bar sentences and i want', 
 'bar sentences and i want to', 
 'sentences and i want to ngramize', 
 'and i want to ngramize it']


The above for loop can be expressed as the following list comprehension:

new_list = [' '.join(words) for words in words_list] 

Answered by thefourtheye

You can do this efficiently like this:

joiner = " ".join
print map(joiner, sixgrams)

We can improve the performance a bit further using a list comprehension, like this:

joiner = " ".join
print [joiner(words) for words in sixgrams]

The performance comparison below shows that the list comprehension solution seen above is slightly faster than the other two solutions.

from timeit import timeit

joiner = " ".join

def mapSolution():
    return map(joiner, sixgrams)

def comprehensionSolution1():
    # looks up the join method on the string literal on every iteration
    return [" ".join(words) for words in sixgrams]

def comprehensionSolution2():
    return [joiner(words) for words in sixgrams]

print timeit("mapSolution()", "from __main__ import joiner, mapSolution, sixgrams")
print timeit("comprehensionSolution1()", "from __main__ import sixgrams, comprehensionSolution1, joiner")
print timeit("comprehensionSolution2()", "from __main__ import sixgrams, comprehensionSolution2, joiner")

Output on my machine

1.5691678524
1.66710209846
1.47555398941

The performance gain is most likely because we bind " ".join to joiner once, instead of creating the bound join method from the string literal every time.

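The snippets above use Python 2 syntax (print statements, and map returning a list). A rough Python 3 sketch of the same comparison might look like the following; the sixgrams sample here is made up, so the absolute timings will differ from the numbers above:

from timeit import timeit

# Made-up sample data; the original question has over a million tuples.
sixgrams = [('this', 'is', 'a', 'foo', 'bar', 'sentences'),
            ('and', 'i', 'want', 'to', 'ngramize', 'it')] * 1000

joiner = " ".join

def map_solution():
    # On Python 3, map is lazy, so wrap it in list() to build the full list.
    return list(map(joiner, sixgrams))

def comprehension_solution1():
    return [" ".join(words) for words in sixgrams]

def comprehension_solution2():
    return [joiner(words) for words in sixgrams]

print(timeit(map_solution, number=1000))
print(timeit(comprehension_solution1, number=1000))
print(timeit(comprehension_solution2, number=1000))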

Edit: Though we can improve the performance like this, the most Pythonic way is to go with generators, as in lvc's answer.
