Python: Resource u'tokenizers/punkt/english.pickle' not found
Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverFlow
Original URL: http://stackoverflow.com/questions/26570944/
Resource u'tokenizers/punkt/english.pickle' not found
Asked by Supreeth Meka
My Code:
import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
Error message:
[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
  File "mapper_local_v1.0.py", line 16, in <module>
    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource u'tokenizers/punkt/english.pickle' not found. Please
  use the NLTK Downloader to obtain the resource:
  >>> nltk.download()
  Searched in:
    - '/home/ec2-user/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
    - u''
I'm trying to run this program on a Unix machine.
Following the error message, I opened a Python shell on my Unix machine and ran the commands below:
import nltk
nltk.download()
Then I downloaded everything available using the d (Download) and l (List) options, but the problem still persists.
I tried my best to find a solution on the internet, but everything I found suggested the same steps I already described above.
Accepted answer by Supreeth Meka
I got the solution:
import nltk
nltk.download()
Once the NLTK Downloader starts:
d) Download l) List u) Update c) Config h) Help q) Quit
Downloader> d
Download which package (l=list; x=cancel)? Identifier> punkt
Answer by eeelnico
The same thing happened to me recently. You just need to download the "punkt" package and it should work.
When you run "list" (l) after having "downloaded all the available things", is every entry marked like the following line?
[*] punkt............... Punkt Tokenizer Models
If you see this line with the star, it means you have it, and nltk should be able to load it.
Answer by alvas
If you only want to download the punkt model:
import nltk
nltk.download('punkt')
If you're unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:
import nltk
nltk.download('popular')
With the above command, there is no need to use the GUI to download the datasets.
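The non-interactive download above can be wrapped so that it only runs when the resource is actually missing. The following is a minimal sketch, not part of the original answer: the helper name `ensure_resource` and its `finder` and `downloader` parameters are hypothetical and exist only for illustration. By default they fall back to `nltk.data.find` and `nltk.download`, assuming NLTK is installed.

```python
def ensure_resource(resource_path, package_id, finder=None, downloader=None):
    """Download `package_id` only when `resource_path` cannot be found.

    Returns True if a download was attempted, False if the resource
    was already present. `finder` and `downloader` are hypothetical
    hooks; they default to nltk.data.find and nltk.download.
    """
    if finder is None or downloader is None:
        import nltk  # imported lazily, only when the defaults are needed
        finder = finder or nltk.data.find
        downloader = downloader or nltk.download
    try:
        # nltk.data.find raises LookupError when the resource is missing
        finder(resource_path)
    except LookupError:
        downloader(package_id)
        return True
    return False
```

For the resource in the question, the call would be `ensure_resource('tokenizers/punkt/english.pickle', 'punkt')`.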
Answer by yprez
To add to alvas' answer, you can download just the punkt corpus:
nltk.download('punkt')
Downloading all sounds like overkill to me, unless that's what you want.
Answer by Raj
My issue was that I called nltk.download('all') as the root user, but the process that eventually used nltk ran as another user who didn't have access to /root/nltk_data, where the content was downloaded.
So I simply copied everything recursively from the download location to one of the paths where NLTK was looking for it, like this:
cp -R /root/nltk_data/ /home/ubuntu/nltk_data
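An alternative to copying after the fact is to download straight into a directory that every user's NLTK searches. The sketch below assumes `/usr/share/nltk_data`, which appears in the "Searched in" list of the question's traceback, and assumes the calling user is allowed to write there. `download_dir` is an existing parameter of `nltk.download`; the function name `download_to_shared_dir` and its `downloader` hook are hypothetical.

```python
def download_to_shared_dir(package_id, shared_dir="/usr/share/nltk_data",
                           downloader=None):
    """Download an NLTK package into a shared directory.

    `shared_dir` is an assumption taken from the search list in the
    question's traceback. `downloader` is a hypothetical hook that
    defaults to nltk.download.
    """
    if downloader is None:
        import nltk  # imported lazily, only when the default is needed
        downloader = nltk.download
    # download_dir overrides the per-user default location (~/nltk_data)
    return downloader(package_id, download_dir=shared_dir)
```

Calling `download_to_shared_dir('punkt')` as root would then place the files where other users' processes can find them, provided the directory is readable by them.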
Answer by Deepthi Karnam
A plain nltk.download() will not solve this issue. I tried the following and it worked for me:
In the nltk folder, create a tokenizers folder and copy your punkt folder into it.
This will work. The resulting folder structure needs to be tokenizers/punkt inside the nltk folder (the original answer showed this in a picture).
Answer by alily
You need to rearrange your folders: move your tokenizers folder into the nltk_data folder.
This doesn't work if your nltk_data folder contains a corpora folder that in turn contains the tokenizers folder.
Answer by Franck Dernoncourt
From the shell you can execute:
sudo python -m nltk.downloader punkt
If you want to install the popular NLTK corpora/models:
sudo python -m nltk.downloader popular
If you want to install all NLTK corpora/models:
sudo python -m nltk.downloader all
To list the resources you have downloaded:
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("corpora")))'
python -c 'import os; import nltk; print(os.listdir(nltk.data.find("tokenizers")))'
Answer by Dharani Manne
Open a Python console by typing
$ python
in your terminal. Then, after importing nltk, type the following 2 commands in your Python shell to install the respective packages:
>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')
This solved the issue for me.
Answer by Camille
For me none of the above worked, so I just downloaded all the files by hand from the web site http://www.nltk.org/nltk_data/ and put them, also by hand, in a "tokenizers" folder inside the "nltk_data" folder. Not a pretty solution, but still a solution.

