How do I download NLTK data in Python?

Note: this page is a Chinese-English side-by-side translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/22211525/

Time: 2020-08-19 00:29:07  Source: igfitidea

How do I download NLTK data?

python nltk

Asked by Q-ximi

Updated answer: NLTK works well with Python 2.7. I had 3.2. I uninstalled 3.2 and installed 2.7, and now it works!

I have installed NLTK and tried to download NLTK Data. I followed the instructions on this site: http://www.nltk.org/data.html

I downloaded NLTK, installed it, and then tried to run the following code:

>>> import nltk
>>> nltk.download()

It gave me the following error message:

Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    nltk.download()
AttributeError: 'module' object has no attribute 'download'
 Directory of C:\Python32\Lib\site-packages

I tried both nltk.download() and nltk.downloader(); both gave me error messages.

Then I used help(nltk) to inspect the package. It shows the following info:

NAME
    nltk

PACKAGE CONTENTS
    align
    app (package)
    book
    ccg (package)
    chat (package)
    chunk (package)
    classify (package)
    cluster (package)
    collocations
    corpus (package)
    data
    decorators
    downloader
    draw (package)
    examples (package)
    featstruct
    grammar
    help
    inference (package)
    internals
    lazyimport
    metrics (package)
    misc (package)
    model (package)
    parse (package)
    probability
    sem (package)
    sourcedstring
    stem (package)
    tag (package)
    test (package)
    text
    tokenize (package)
    toolbox
    tree
    treetransforms
    util
    yamltags

FILE
    c:\python32\lib\site-packages\nltk

I do see downloader there, so I am not sure why it does not work. I am using Python 3.2.2 on Windows Vista.

Answered by Miquel

If you are running a really old version of nltk, then there is indeed no download module available (reference).

Try this:

import nltk
print(nltk.__version__)

As per the reference, any version after 0.9.5 should be fine.
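
The check above can be extended slightly to also report whether the installed package exposes download at all. This is only a sketch; the helper name describe_nltk is made up for illustration and is not part of NLTK.

```python
import importlib


def describe_nltk():
    """Return (version, has_download) for the installed nltk, or None if absent.

    describe_nltk is a hypothetical helper name, not an NLTK function.
    """
    try:
        nltk = importlib.import_module("nltk")
    except ImportError:
        return None
    # Very old releases may lack __version__, so fall back to a placeholder.
    version = getattr(nltk, "__version__", "unknown")
    return version, hasattr(nltk, "download")


info = describe_nltk()
if info is None:
    print("nltk is not installed")
else:
    print("nltk version:", info[0], "| has nltk.download:", info[1])
```

If the second value is False on a recent version, the import is probably not resolving to the real package.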

Answered by user3682157

You can't have a saved Python file called nltk.py, because the interpreter will import that file instead of the actual nltk package.

Rename the file that the Python shell is reading from, then try what you were doing originally:

import nltk and then nltk.download()
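
One way to spot this kind of name clash is to look for a file or folder named after the module in the directory you run Python from. The sketch below uses only the standard library; the function name find_shadowing_files is an assumption made for illustration.

```python
import os


def find_shadowing_files(module_name, directory):
    """List entries in `directory` that could shadow `module_name` on import.

    find_shadowing_files is a hypothetical helper, not part of NLTK.
    """
    # A script (nltk.py), a stale bytecode file (nltk.pyc), or a folder (nltk)
    # in the working directory can all be picked up before the real package.
    candidates = (module_name + ".py", module_name + ".pyc", module_name)
    return [
        os.path.join(directory, name)
        for name in candidates
        if os.path.exists(os.path.join(directory, name))
    ]


# Check the current working directory, where a stray nltk.py usually lives.
for path in find_shadowing_files("nltk", os.getcwd()):
    print("This may shadow the real nltk package:", path)
```

If anything is printed, rename or remove that entry and import nltk again.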

Answered by alvas

TL;DR

To download a particular dataset or model, use the nltk.download() function. For example, to download the punkt sentence tokenizer, use:

$ python3
>>> import nltk
>>> nltk.download('punkt')

If you're unsure of which data/model you need, you can start out with the basic list of data + models with:

>>> import nltk
>>> nltk.download('popular')

It will download a list of "popular" resources, which includes:

<collection id="popular" name="Popular packages">
      <item ref="cmudict" />
      <item ref="gazetteers" />
      <item ref="genesis" />
      <item ref="gutenberg" />
      <item ref="inaugural" />
      <item ref="movie_reviews" />
      <item ref="names" />
      <item ref="shakespeare" />
      <item ref="stopwords" />
      <item ref="treebank" />
      <item ref="twitter_samples" />
      <item ref="omw" />
      <item ref="wordnet" />
      <item ref="wordnet_ic" />
      <item ref="words" />
      <item ref="maxent_ne_chunker" />
      <item ref="punkt" />
      <item ref="snowball_data" />
      <item ref="averaged_perceptron_tagger" />
    </collection>
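
Each ref in this listing is a package id of the kind passed to nltk.download(), as with 'punkt' earlier. As a rough sketch, the ids can be pulled out of such a listing with the standard library; SAMPLE is a shortened copy of the listing used as sample input, and collection_items is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the collection listing, used here only as sample input.
SAMPLE = """
<collection id="popular" name="Popular packages">
  <item ref="stopwords" />
  <item ref="wordnet" />
  <item ref="punkt" />
</collection>
"""


def collection_items(xml_text):
    """Return the package ids referenced by a <collection> element.

    collection_items is a hypothetical helper, not part of NLTK.
    """
    root = ET.fromstring(xml_text.strip())
    return [item.get("ref") for item in root.findall("item")]


print(collection_items(SAMPLE))  # prints ['stopwords', 'wordnet', 'punkt']
```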


EDITED

To avoid errors when downloading larger datasets with nltk (from https://stackoverflow.com/a/38135306/610569):

$ rm /Users/<your_username>/nltk_data/corpora/panlex_lite.zip
$ rm -r /Users/<your_username>/nltk_data/corpora/panlex_lite
$ python

>>> import nltk
>>> dler = nltk.downloader.Downloader()
>>> dler._update_index()
>>> dler._status_cache['panlex_lite'] = 'installed' # Trick the index into treating panlex_lite as if it's already installed.
>>> dler.download('popular')

Updated

From v3.2.5, NLTK has a more informative error message when an nltk_data resource is not found, e.g.:

>>> from nltk import word_tokenize
>>> word_tokenize('x')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/l/alvas/git/nltk/nltk/tokenize/__init__.py", line 128, in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
  File "/Users//alvas/git/nltk/nltk/tokenize/__init__.py", line 94, in sent_tokenize
    tokenizer = load('tokenizers/punkt/{0}.pickle'.format(language))
  File "/Users/alvas/git/nltk/nltk/data.py", line 820, in load
    opened_resource = _open(resource_url)
  File "/Users/alvas/git/nltk/nltk/data.py", line 938, in _open
    return find(path_, path + ['']).open()
  File "/Users/alvas/git/nltk/nltk/data.py", line 659, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')

  Searched in:
    - '/Users/alvas/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - ''
**********************************************************************
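
Because a missing resource surfaces as a LookupError, a script can catch it and download the package only when needed. The following is a minimal sketch, not NLTK's own mechanism: ensure_resource is a hypothetical helper, and the lookup and download functions are passed in so the logic can be tried without a network connection.

```python
def ensure_resource(resource_path, package_id, find, download):
    """Download `package_id` only if `resource_path` cannot be found.

    ensure_resource is a hypothetical helper. `find` must raise LookupError
    when the resource is missing; with NLTK, `find` and `download` would be
    nltk.data.find and nltk.download.
    """
    try:
        find(resource_path)
        return False  # already available, nothing downloaded
    except LookupError:
        download(package_id)
        return True


# With NLTK installed, the call would look like this:
#   import nltk
#   ensure_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)
```

The return value tells the caller whether a download was triggered, which can be useful for logging.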

Answered by victor_gu

I had a similar issue. Check whether you are using a proxy.

If so, set up the proxy before downloading:

nltk.set_proxy('http://proxy.example.com:3128', ('USERNAME', 'PASSWORD'))
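
If you are not sure whether a proxy is in play, Python's standard library can report what it detects from the environment and system settings. This is only a sketch; configured_proxies is a made-up wrapper name, and whether these settings match what your network actually requires is an assumption you should verify.

```python
import urllib.request


def configured_proxies():
    """Return the proxy settings Python detects, as a {scheme: url} dict.

    configured_proxies is a hypothetical wrapper around
    urllib.request.getproxies().
    """
    return urllib.request.getproxies()


proxies = configured_proxies()
if proxies:
    for scheme, url in sorted(proxies.items()):
        print(scheme, "->", url)
else:
    print("No proxy detected")
```

If a proxy URL is listed, it is a candidate value for the nltk.set_proxy call shown above.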

Answered by ADITYA AISHWARY

You should add Python to your PATH when installing Python. After installation, open a command prompt and run pip install nltk. Then go to IDLE, open a new file, and save it as file.py. Open file.py and type the following:

import nltk
nltk.download()

Answered by Touya D. Serdan

Do not name your file nltk.py. I used the same code in a file named nltk and got the same error as you. After I changed the file name, it worked.

Answered by GOKUL JAGANNATH

I think you must have named the file nltk.py (or the folder contains a file with that name), so rename it to anything else and try running it again.

Answered by Henrique Brandão

You may try:

>>> import nltk
>>> nltk.download_shell()
Downloader> d
  Identifier> *name of the package*

Happy NLP'ing.

Answered by Arun Das

It's very simple.

  1. Open PyScripter or any editor.
  2. Create a Python file, e.g. install.py.
  3. Write the code below in it:
import nltk
nltk.download()
  4. Run the file. A pop-up window will appear; click Download.

Answered by B K

Try:

nltk.download('all')

This downloads all the data, so there is no need to download packages individually.