Python: object of type 'generator' has no len()

Disclaimer: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. If you use it, you must likewise follow the CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/36913543/

Time: 2020-08-19 18:32:25 Source: igfitidea

Tags: python, nltk

Asked by Vishal Kharde

I have just started to learn Python. I want to write a program in NLTK that breaks a text into unigrams and bigrams. For example, if the input text is:

"I am feeling sad and disappointed due to errors"

The function should generate text like:

I am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors

I have written code to input text into the program. Here's the function I'm trying:

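import nltk
from nltk.util import ngrams

def gen_bigrams(text):
    token = nltk.word_tokenize(text)
    bigrams = ngrams(token, 2)
    #print Counter(bigrams)
    bigram_list = ""
    # Fails: ngrams() returns a lazy iterator (a generator here), which supports neither len() nor bigrams[x]
    for x in range(0, len(bigrams)):
        words = bigrams[x]
        bigram_list = bigram_list + words[0]+ " " + words[1]+"-->"
    return bigram_list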

The error I'm getting is...

for x in range(0, len(bigrams)):
TypeError: object of type 'generator' has no len()

As the ngrams function returns a generator, I tried using len(list(bigrams)), but it returns 0, so I'm getting the same error. I have looked at other questions on Stack Exchange, but I still can't work out how to resolve this. I am stuck at this error. Any workaround or suggestion?
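
A generator supports neither len() nor indexing, and it can be iterated only once. The sketch below is a plain-Python illustration that assumes a hypothetical make_bigrams generator in place of nltk.ngrams, so it runs without NLTK. The asker does not say why len(list(bigrams)) returned 0; one plausible cause, and only an assumption here, is that the generator had already been consumed, for example by the Counter(bigrams) call that is commented out in the code above.

```python
from collections import Counter

def make_bigrams(tokens):
    # Hypothetical stand-in for nltk.ngrams(tokens, 2): lazily yields consecutive pairs
    for i in range(len(tokens) - 1):
        yield (tokens[i], tokens[i + 1])

tokens = "I am feeling sad".split()

bigrams = make_bigrams(tokens)
print(hasattr(bigrams, "__len__"))  # False: a generator has no len() and no [x] indexing

counts = Counter(bigrams)           # iterating over the generator consumes it
print(len(counts))                  # 3
print(len(list(bigrams)))           # 0: nothing is left to iterate over

# Fix: materialize the pairs once, then reuse the list as often as needed
bigram_pairs = list(make_bigrams(tokens))
print(len(bigram_pairs), bigram_pairs[0])  # 3 ('I', 'am')
print(len(Counter(bigram_pairs)))          # 3
```

Converting to a list costs memory proportional to the number of bigrams, which is negligible for a sentence but worth keeping in mind for large corpora.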

Accepted answer by Ilja Everilä

Constructing strings by concatenating values separated by a separator is best done by str.join:

import nltk

def gen_bigrams(text):
    token = nltk.word_tokenize(text)
    bigrams = nltk.ngrams(token, 2)
    # instead of " ".join, "{} {}".format would also work, but via itertools.starmap so each tuple is unpacked
    return "-->".join(map(" ".join, bigrams))
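
For comparison, here is a minimal NLTK-free sketch of the same str.join approach. It assumes str.split() in place of nltk.word_tokenize and zip(tokens, tokens[1:]) in place of nltk.ngrams(tokens, 2), which is enough for this example sentence but is not a general tokenizer. The hypothetical trailing parameter appends the final "-->", and the second function shows the str.format variant, which needs itertools.starmap because each bigram is a tuple that has to be unpacked into two arguments.

```python
from itertools import starmap

def gen_bigrams_plain(text, trailing=False):
    # Assumption: str.split() stands in for nltk.word_tokenize and
    # zip(tokens, tokens[1:]) stands in for nltk.ngrams(tokens, 2)
    tokens = text.split()
    bigrams = zip(tokens, tokens[1:])  # lazy iterator of consecutive pairs
    result = "-->".join(map(" ".join, bigrams))
    # join puts the separator only between items, so append one if a trailing "-->" is wanted
    return result + "-->" if trailing else result

def gen_bigrams_format(text):
    # Same result with str.format; starmap unpacks each pair into format's two arguments
    tokens = text.split()
    return "-->".join(starmap("{} {}".format, zip(tokens, tokens[1:])))
```

Calling gen_bigrams_plain("I am feeling sad and disappointed due to errors") returns "I am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors", matching the output the asker wanted.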

Note that there will be no trailing "-->", so append one if you need it. This way you don't even have to think about the length of the iterable you're using, and in Python that is almost always the case: to iterate through an iterable, use for x in iterable:, and if you do need the indexes, use enumerate:

for i, x in enumerate(iterable):
    ...
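
A short sketch of the enumerate pattern applied to a lazy iterator of bigrams, again assuming str.split() and zip() as stand-ins for the NLTK calls. The hypothetical numbered_bigrams function takes each index from enumerate, so it never needs len() or bigrams[i].

```python
def numbered_bigrams(text):
    # Assumption: str.split() and zip() stand in for nltk.word_tokenize and nltk.ngrams
    tokens = text.split()
    bigrams = zip(tokens, tokens[1:])  # lazy: no len() or indexing available
    lines = []
    # enumerate supplies the index alongside each (first, second) pair
    for i, (first, second) in enumerate(bigrams):
        lines.append("{}: {} {}".format(i, first, second))
    return lines
```

For example, numbered_bigrams("I am feeling sad") returns ['0: I am', '1: am feeling', '2: feeling sad'].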

Answer by MohitC

bigrams is a generator, not a list, and next(bigrams) (written bigrams.next() in Python 2) is what gives you each tuple of tokens. You can call len() on the tuple returned by next(bigrams), but not on the generator itself. The following code does what you are trying to achieve by calling next() repeatedly until the generator raises StopIteration.

>>> import nltk
>>> review = "i am feeling sad and disappointed due to errors"
>>> token = nltk.word_tokenize(review)
>>> bigrams = nltk.ngrams(token, 2)
>>> output = ""
>>> try:
...   while True:
...     temp = next(bigrams)  # bigrams.next() works only in Python 2
...     output += "%s %s-->" % (temp[0], temp[1])
... except StopIteration:
...   pass
... 
>>> output
'i am-->am feeling-->feeling sad-->sad and-->and disappointed-->disappointed due-->due to-->to errors-->'
>>>
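
The while/next()/StopIteration loop above is what a for loop does internally. This sketch uses a hypothetical make_bigrams generator as a stand-in for nltk.ngrams and builds the same string both ways to show they are equivalent; the for loop is the more idiomatic form.

```python
def make_bigrams(tokens):
    # Hypothetical stand-in for nltk.ngrams(tokens, 2)
    for i in range(len(tokens) - 1):
        yield (tokens[i], tokens[i + 1])

def join_with_next(bigrams):
    # Manual iterator protocol, as in the answer: call next() until StopIteration
    output = ""
    while True:
        try:
            first, second = next(bigrams)
        except StopIteration:
            break
        output += "%s %s-->" % (first, second)
    return output

def join_with_for(bigrams):
    # A for loop calls next() and stops at StopIteration automatically
    output = ""
    for first, second in bigrams:
        output += "%s %s-->" % (first, second)
    return output
```

Both join_with_next(make_bigrams("i am feeling sad".split())) and the same call with join_with_for return "i am-->am feeling-->feeling sad-->".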