Python: chunkize warning while installing gensim

Notice: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow URL: http://stackoverflow.com/questions/41658568/

Time: 2020-08-20 01:22:38  Source: igfitidea

Chunkize warning while installing gensim

python gensim

Asked by user7420652

I have installed gensim (through pip) in Python. After the installation is over I get the following warning:

C:\Python27\lib\site-packages\gensim\utils.py:855: UserWarning: detected Windows; aliasing chunkize to chunkize_serial
  warnings.warn("detected Windows; aliasing chunkize to chunkize_serial")

How can I rectify this?

I am unable to import word2vec from gensim.models due to this warning.

I have the following configurations: Python 2.7, gensim-0.13.4.1, numpy-1.11.3, scipy-0.18.1, pattern-2.6.

Answered by Roland Pihlakas

You can suppress the message with this code before importing gensim:

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

import gensim
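
If ignoring every UserWarning raised from gensim feels too broad, a narrower filter can match on the warning text instead. The following is only a minimal sketch and not part of the original answer: the names PATTERN, install_filter and is_silenced are made up for illustration, and the gensim import is left commented out so the snippet runs without gensim installed.

```python
import warnings

# Regex matched (case-insensitively) against the start of the warning text.
# PATTERN is a hypothetical name chosen for this sketch.
PATTERN = r"detected Windows; aliasing chunkize"


def install_filter():
    """Ignore only the aliasing notice; other UserWarnings stay visible."""
    warnings.filterwarnings("ignore", message=PATTERN, category=UserWarning)


def is_silenced(text):
    """Return True if a UserWarning with this text would be suppressed."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # start from "show everything"
        install_filter()                 # then put the narrow ignore rule on top
        warnings.warn(text, UserWarning)
    return len(caught) == 0


install_filter()
# import gensim  # the import would go here, after the filter is registered
```

The helper is_silenced exists only to check the filter in isolation; in a real script, calling install_filter() before the import should be enough.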

Answered by Dongmin Pete Shin

I think it is not a big problem. Gensim is just letting you know that it will alias chunkize to a different function because you are on a specific OS (Windows).

Check out this code from gensim.utils:

if os.name == 'nt':
    logger.info("detected Windows; aliasing chunkize to chunkize_serial")

    def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
        for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
            yield chunk
else:
    def chunkize(corpus, chunksize, maxsize=0, as_numpy=False):
        """
        Split a stream of values into smaller chunks.
        Each chunk is of length `chunksize`, except the last one which may be smaller.
        A once-only input stream (`corpus` from a generator) is ok, chunking is done
        efficiently via itertools.

        If `maxsize > 1`, don't wait idly in between successive chunk `yields`, but
        rather keep filling a short queue (of size at most `maxsize`) with forthcoming
        chunks in advance. This is realized by starting a separate process, and is
        meant to reduce I/O delays, which can be significant when `corpus` comes
        from a slow medium (like harddisk).

        If `maxsize==0`, don't fool around with parallelism and simply yield the chunksize
        via `chunkize_serial()` (no I/O optimizations).

        >>> for chunk in chunkize(range(10), 4): print(chunk)
        [0, 1, 2, 3]
        [4, 5, 6, 7]
        [8, 9]

        """
        assert chunksize > 0

        if maxsize > 0:
            q = multiprocessing.Queue(maxsize=maxsize)
            worker = InputQueue(q, corpus, chunksize, maxsize=maxsize, as_numpy=as_numpy)
            worker.daemon = True
            worker.start()
            while True:
                chunk = [q.get(block=True)]
                if chunk[0] is None:
                    break
                yield chunk.pop()
        else:
            for chunk in chunkize_serial(corpus, chunksize, as_numpy=as_numpy):
                yield chunk
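
To make the serial path concrete, here is a rough sketch of chunking a stream with itertools, in the spirit of the docstring above. It is not gensim's actual chunkize_serial: the name chunkize_serial_sketch is made up, and the as_numpy option is left out.

```python
import itertools


def chunkize_serial_sketch(iterable, chunksize):
    """Yield lists of up to `chunksize` items; the last one may be shorter.

    Hypothetical stand-in for illustration, not gensim's implementation.
    """
    assert chunksize > 0
    it = iter(iterable)  # works for once-only streams such as generators
    while True:
        chunk = list(itertools.islice(it, chunksize))
        if not chunk:  # the stream is exhausted
            break
        yield chunk


for chunk in chunkize_serial_sketch(range(10), 4):
    print(chunk)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]
```

Everything here runs in a single process, which is roughly what the Windows alias falls back to: the same chunks, just without the background queue that prefetches them.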