Failed loading english.pickle with nltk.data.load

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/4867197/
Asked by Martin
When trying to load the punkt tokenizer...
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
...a LookupError was raised:
> LookupError:
> *********************************************************************
> Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK Downloader to obtain the resource: nltk.download(). Searched in:
> - 'C:\Users\Martinos/nltk_data'
> - 'C:\nltk_data'
> - 'D:\nltk_data'
> - 'E:\nltk_data'
> - 'E:\Python26\nltk_data'
> - 'E:\Python26\lib\nltk_data'
> - 'C:\Users\Martinos\AppData\Roaming\nltk_data'
> **********************************************************************
Answered by richardr
I had this same problem. Go into a python shell and type:
>>> import nltk
>>> nltk.download()
Then an installation window appears. Go to the 'Models' tab and select 'punkt' from under the 'Identifier' column. Then click Download and it will install the necessary files. Then it should work!
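If the downloader window cannot open (for example on a headless server), a minimal sketch of the non-interactive equivalent, using the same 'punkt' identifier that is selected in the GUI above:

import nltk
import nltk.data

# Download the 'punkt' model without opening the interactive downloader window
nltk.download('punkt')

# Optional sanity check: raises LookupError if the resource still cannot be found
nltk.data.find('tokenizers/punkt')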
Answered by Ashish Singh
I came across this problem when I was trying to do POS tagging in NLTK.
The way I got it right was by making a new directory named "taggers" alongside the corpora directory and copying max_pos_tagger into the taggers directory.
Hope it works for you too. Best of luck with it!
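If you are copying data files around by hand like this, it helps to know where NLTK actually searches. A small sketch that prints the search path (the same list shown in the question's "Searched in:" error message); any manually created "taggers" or "tokenizers" folder must sit under one of these directories:

import nltk.data

# Directories NLTK probes, in order, when looking for corpora, taggers, tokenizers, etc.
for path in nltk.data.path:
    print(path)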
Answered by Naren Yellavula
You can do that like this.
import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize
You can download the tokenizers by passing punkt as an argument to the download function. The word and sentence tokenizers are then available on nltk.
If you want to download everything, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, tokenizers, do not pass any argument, like this:
nltk.download()
See https://www.nltk.org/data.html for more insights.
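If you cannot write to any of the default locations (for example on a shared server), a sketch of downloading into a custom directory instead; the /tmp/nltk_data path here is only an example:

import nltk
import nltk.data
from nltk import sent_tokenize

custom_dir = '/tmp/nltk_data'  # example location; any writable directory works

# Download punkt into the custom directory and tell NLTK to search there as well
nltk.download('punkt', download_dir=custom_dir)
nltk.data.path.append(custom_dir)

print(sent_tokenize("Mr. Green is here. So is Colonel Mustard."))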
Answered by Deepthi Karnam
A simple nltk.download() will not solve this issue. I tried the below and it worked for me:
In the nltk folder, create a tokenizers folder and copy your punkt folder into the tokenizers folder.
This will work! The folder structure needs to be as shown in the picture.
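To confirm the files ended up where NLTK expects them, a quick check; nltk.data.find raises the same LookupError as in the question when the layout is wrong:

import nltk.data

# Prints the resolved location of the punkt tokenizer if the folder structure is correct,
# otherwise raises LookupError listing the directories that were searched
print(nltk.data.find('tokenizers/punkt'))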
Answered by jjinking
This is what worked for me just now:
# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))
sentences_tokenized is a list of lists of tokens:
[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]
The sentences were taken from the example IPython notebook accompanying the book "Mining the Social Web, 2nd Edition".
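If you also need sentence boundaries rather than just words, a short sketch using sent_tokenize, which relies on the same punkt model; note how the first example string is split into two sentences without breaking on the abbreviation "Mr.":

from nltk.tokenize import sent_tokenize

text = "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow."

# punkt knows that the period after "Mr." is not a sentence boundary
print(sent_tokenize(text))
# ['Mr. Green killed Colonel Mustard in the study with the candlestick.',
#  'Mr. Green is not a very nice fellow.']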
Answered by Torrtuga
Check if you have all NLTK libraries.
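One way to do that check is simply to fetch everything; a minimal sketch (the 'all' collection is large; 'popular' is a smaller, commonly used subset):

import nltk

# Downloads every corpus, grammar, model and tokenizer NLTK knows about;
# swap 'all' for 'popular' to get a much smaller subset
nltk.download('all')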
Answered by Jignesh Vasoya
nltk has its own pre-trained tokenizer models. The model is downloaded from internally predefined web sources and stored under the installed nltk package's path when executing either of the following function calls.
E.g. 1 tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
E.g. 2 nltk.download('punkt')
If you call either of the above in your code, make sure you have an internet connection without any firewall blocking it.
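If a proxy or firewall is what blocks the download, NLTK can be pointed at a proxy first; a minimal sketch, where the proxy URL is only a placeholder:

import nltk

# Replace with your actual proxy address before running
nltk.set_proxy('http://proxy.example.com:3128')
nltk.download('punkt')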
I would like to share an alternative way to resolve the above issue, with a deeper understanding.
Please follow the steps below and enjoy English word tokenization using nltk.
Step 1: First download the "english.pickle" model from the web path below.
Go to the link "http://www.nltk.org/nltk_data/" and click "download" at the option "107. Punkt Tokenizer Models".
Step 2: Extract the downloaded "punkt.zip" file, find the "english.pickle" file inside it, and place it on the C drive.
Step 3: Copy and paste the following code and execute it.
from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

# Load the punkt sentence tokenizer directly from the pickle placed on the C drive
tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize

wordToken = []
for sent in sentences:
    subSentToken = []
    # Split into sentences first, then word-tokenize each sentence
    for subSent in tokenizer.tokenize(sent):
        subSentToken.extend([token for token in treebank_word_tokenize(subSent)])
    wordToken.append(subSentToken)

for token in wordToken:
    print(token)
Let me know if you face any problems.
Answered by cgl
From bash command line, run:
$ python -c "import nltk; nltk.download('punkt')"
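An equivalent that avoids the inline -c string is the downloader's own module interface; the -d flag installs into a specific directory (the path below is one of the standard search locations mentioned in the NLTK docs):

$ python -m nltk.downloader punkt
$ # or install into a shared, system-wide location:
$ python -m nltk.downloader -d /usr/local/share/nltk_data punkt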


