Failed loading english.pickle with nltk.data.load

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me): StackOverflow
Original: http://stackoverflow.com/questions/4867197/
Asked by Martin
When trying to load the punkt tokenizer...
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
...a LookupError was raised:
> LookupError:
> *********************************************************************
> Resource 'tokenizers/punkt/english.pickle' not found. Please use the NLTK Downloader to obtain the resource: nltk.download(). Searched in:
> - 'C:\Users\Martinos/nltk_data'
> - 'C:\nltk_data'
> - 'D:\nltk_data'
> - 'E:\nltk_data'
> - 'E:\Python26\nltk_data'
> - 'E:\Python26\lib\nltk_data'
> - 'C:\Users\Martinos\AppData\Roaming\nltk_data'
> **********************************************************************
Answered by richardr
I had this same problem. Go into a python shell and type:
>>> import nltk
>>> nltk.download()
Then an installation window appears. Go to the 'Models' tab and select 'punkt' from under the 'Identifier' column. Then click Download and it will install the necessary files. Then it should work!
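If the downloader window cannot open (for example on a headless server), a minimal sketch of the non-interactive equivalent, using the same 'punkt' identifier that is selected in the GUI above:

import nltk
import nltk.data

# Download the 'punkt' model without opening the interactive downloader window
nltk.download('punkt')

# Optional sanity check: raises LookupError if the resource still cannot be found
nltk.data.find('tokenizers/punkt')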
Answered by Ashish Singh
I came across this problem when I was trying to do POS tagging in NLTK.
The way I got it right was by making a new directory named "taggers" alongside the corpora directory and copying max_pos_tagger into the taggers directory.
Hope it works for you too. Best of luck with it!
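If you are copying data files around by hand like this, it helps to know where NLTK actually searches. A small sketch that prints the search path (the same list shown in the question's "Searched in:" error message); any manually created "taggers" or "tokenizers" folder must sit under one of these directories:

import nltk.data

# Directories NLTK probes, in order, when looking for corpora, taggers, tokenizers, etc.
for path in nltk.data.path:
    print(path)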
Answered by Naren Yellavula
You can do that like this.
import nltk
nltk.download('punkt')
from nltk import word_tokenize, sent_tokenize
You can download the tokenizers by passing punkt as an argument to the download function. The word and sentence tokenizers are then available on nltk.
If you want to download everything, i.e. chunkers, grammars, misc, sentiment, taggers, corpora, help, models, stemmers, tokenizers, do not pass any argument, like this:
nltk.download()
See https://www.nltk.org/data.html for more insights.
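If you cannot write to any of the default locations (for example on a shared server), a sketch of downloading into a custom directory instead; the /tmp/nltk_data path here is only an example:

import nltk
import nltk.data
from nltk import sent_tokenize

custom_dir = '/tmp/nltk_data'  # example location; any writable directory works

# Download punkt into the custom directory and tell NLTK to search there as well
nltk.download('punkt', download_dir=custom_dir)
nltk.data.path.append(custom_dir)

print(sent_tokenize("Mr. Green is here. So is Colonel Mustard."))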
Answered by Deepthi Karnam
A simple nltk.download() will not solve this issue. I tried the below and it worked for me:
In the nltk folder, create a tokenizers folder and copy your punkt folder into the tokenizers folder.
This will work! The folder structure needs to be as shown in the picture.
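To confirm the files ended up where NLTK expects them, a quick check; nltk.data.find raises the same LookupError as in the question when the layout is wrong:

import nltk.data

# Prints the resolved location of the punkt tokenizer if the folder structure is correct,
# otherwise raises LookupError listing the directories that were searched
print(nltk.data.find('tokenizers/punkt'))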
Answered by jjinking
This is what worked for me just now:
# Do this in a separate python interpreter session, since you only have to do it once
import nltk
nltk.download('punkt')

# Do this in your ipython notebook or analysis script
from nltk.tokenize import word_tokenize

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

sentences_tokenized = []
for s in sentences:
    sentences_tokenized.append(word_tokenize(s))
sentences_tokenized is a list of lists of tokens:
[['Mr.', 'Green', 'killed', 'Colonel', 'Mustard', 'in', 'the', 'study', 'with', 'the', 'candlestick', '.', 'Mr.', 'Green', 'is', 'not', 'a', 'very', 'nice', 'fellow', '.'],
['Professor', 'Plum', 'has', 'a', 'green', 'plant', 'in', 'his', 'study', '.'],
['Miss', 'Scarlett', 'watered', 'Professor', 'Plum', "'s", 'green', 'plant', 'while', 'he', 'was', 'away', 'from', 'his', 'office', 'last', 'week', '.']]
The sentences were taken from the example IPython notebook accompanying the book "Mining the Social Web, 2nd Edition".
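If you also need sentence boundaries rather than just words, a short sketch using sent_tokenize, which relies on the same punkt model; note how the first example string is split into two sentences without breaking on the abbreviation "Mr.":

from nltk.tokenize import sent_tokenize

text = "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow."

# punkt knows that the period after "Mr." is not a sentence boundary
print(sent_tokenize(text))
# ['Mr. Green killed Colonel Mustard in the study with the candlestick.',
#  'Mr. Green is not a very nice fellow.']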
Answered by Torrtuga
Check if you have all NLTK libraries.
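One way to do that check is simply to fetch everything; a minimal sketch (the 'all' collection is large; 'popular' is a smaller, commonly used subset):

import nltk

# Downloads every corpus, grammar, model and tokenizer NLTK knows about;
# swap 'all' for 'popular' to get a much smaller subset
nltk.download('all')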
Answered by Jignesh Vasoya
nltk has its own pre-trained tokenizer models. The model is downloaded from internally predefined web sources and stored under the installed nltk package's path when executing either of the following function calls.
E.g. 1 tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
E.g. 2 nltk.download('punkt')
If you call either of the above in your code, make sure you have an internet connection without any firewall blocking it.
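If a proxy or firewall is what blocks the download, NLTK can be pointed at a proxy first; a minimal sketch, where the proxy URL is only a placeholder:

import nltk

# Replace with your actual proxy address before running
nltk.set_proxy('http://proxy.example.com:3128')
nltk.download('punkt')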
I would like to share an alternative way to resolve the above issue, with a deeper understanding.
Please follow the steps below and enjoy English word tokenization using nltk.
Step 1: First download the "english.pickle" model from the web path below.
Go to the link "http://www.nltk.org/nltk_data/" and click "download" at the option "107. Punkt Tokenizer Models".
Step 2: Extract the downloaded "punkt.zip" file, find the "english.pickle" file inside it, and place it on the C drive.
Step 3: Copy and paste the following code and execute it.
from nltk.data import load
from nltk.tokenize.treebank import TreebankWordTokenizer

sentences = [
    "Mr. Green killed Colonel Mustard in the study with the candlestick. Mr. Green is not a very nice fellow.",
    "Professor Plum has a green plant in his study.",
    "Miss Scarlett watered Professor Plum's green plant while he was away from his office last week."
]

# Load the punkt sentence tokenizer directly from the pickle placed on the C drive
tokenizer = load('file:C:/english.pickle')
treebank_word_tokenize = TreebankWordTokenizer().tokenize

wordToken = []
for sent in sentences:
    subSentToken = []
    # Split into sentences first, then word-tokenize each sentence
    for subSent in tokenizer.tokenize(sent):
        subSentToken.extend([token for token in treebank_word_tokenize(subSent)])
    wordToken.append(subSentToken)

for token in wordToken:
    print(token)
Let me know if you face any problems.
Answered by cgl
From bash command line, run:
$ python -c "import nltk; nltk.download('punkt')"
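An equivalent that avoids the inline -c string is the downloader's own module interface; the -d flag installs into a specific directory (the path below is one of the standard search locations mentioned in the NLTK docs):

$ python -m nltk.downloader punkt
$ # or install into a shared, system-wide location:
$ python -m nltk.downloader -d /usr/local/share/nltk_data punkt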


