Python: Resource 'corpora/wordnet' not found on Heroku
Note: this page reproduces a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and authors, and attribute it to the original authors (not me): StackOverflow
Original URL: http://stackoverflow.com/questions/13965823/
Resource 'corpora/wordnet' not found on Heroku
Asked by user1881006
I'm trying to get NLTK and wordnet working on Heroku. I've already done
heroku run python
nltk.download()
wordnet
pip install -r requirements.txt
But I get this error:
Resource 'corpora/wordnet' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- '/app/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
Yet I've looked in /app/nltk_data and it's there, so I'm not sure what's going on.
Accepted answer by follyroof
I just had this same problem. What ended up working for me is creating an 'nltk_data' directory in the application's folder itself, downloading the corpus to that directory and adding a line to my code that lets the nltk know to look in that directory. You can do this all locally and then push the changes to Heroku.
So, supposing my python application is in a directory called "myapp/"
Step 1: Create the directory
cd myapp/
mkdir nltk_data
Step 2: Download Corpus to New Directory
python -m nltk.downloader
This will pop up the nltk downloader. Set your Download Directory to whatever_the_absolute_path_to_myapp_is/nltk_data/. If you're using the GUI downloader, the download directory is set through a text field at the bottom of the UI. If you're using the command-line one, you set it in the config menu.
Once the downloader knows to point to your newly created nltk_data directory, download your corpus.
Or in one step from Python code:
import nltk
nltk.download("wordnet", "whatever_the_absolute_path_to_myapp_is/nltk_data/")
Step 3: Let nltk Know Where to Look
nltk looks for data, resources, etc. in the locations specified in the nltk.data.path variable. All you need to do is add nltk.data.path.append('./nltk_data/') to the Python file actually using nltk, and it will look for corpora, tokenizers, and such in there in addition to the default paths.
Step 4: Send it to Heroku
git add nltk_data/
git commit -m 'super useful commit message'
git push heroku master
That should work! It did for me anyway. One thing worth noting is that the path from the python file executing nltk stuff to the nltk_data directory may be different depending on how you've structured your application, so just account for that when you do nltk.data.path.append('path_to_nltk_data')
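One way to sidestep the relative-path caveat is to build the path from the location of the Python file rather than from the current working directory. The following is only a minimal sketch under that assumption; the helper names bundled_data_dir, add_search_dir and configure_nltk are hypothetical and not part of nltk, and the only nltk attribute used is nltk.data.path, which the answer already mentions.

```python
from pathlib import Path


def bundled_data_dir(anchor_file, folder="nltk_data"):
    """Absolute path of `folder` located next to `anchor_file`."""
    return str(Path(anchor_file).resolve().parent / folder)


def add_search_dir(directory, search_path):
    """Append `directory` to `search_path` unless it is already listed."""
    if directory not in search_path:
        search_path.append(directory)
    return search_path


def configure_nltk(anchor_file):
    """Register the bundled directory with nltk (assumes nltk is installed)."""
    import nltk  # imported lazily so the helpers above work without it

    return add_search_dir(bundled_data_dir(anchor_file), nltk.data.path)
```

Calling configure_nltk(__file__) near the top of the module that uses nltk would then register myapp/nltk_data regardless of the directory the process was started from. If that module lives in a subdirectory of the app, the folder argument needs adjusting accordingly.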
Answer by Gaurav Anand
I was getting this issue. Those who are not working in a virtual environment will need to download to the following directory on Ubuntu:
/usr/share/nltk_data/corpora/wordnet
Instead of wordnet it could be brown or whatever. You can directly run this command in your terminal if you want to download the corpus.
$ sudo python -m nltk.downloader -d /usr/share/nltk_data wordnet
Again instead of wordnet it could be brown.
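To confirm which of the searched directories actually holds the corpus after downloading, a small standard-library check can help. This is a rough sketch, not nltk's own lookup logic: the function names are hypothetical, the directory list is copied from the error message in the question, and it assumes a resource counts as present when it exists either as an unpacked directory or as a .zip archive of the same name.

```python
from pathlib import Path

# Search locations copied from the error message in the question.
SEARCH_DIRS = [
    "/app/nltk_data",
    "/usr/share/nltk_data",
    "/usr/local/share/nltk_data",
    "/usr/lib/nltk_data",
    "/usr/local/lib/nltk_data",
]


def dirs_containing(resource, search_dirs):
    """Return the search dirs holding `resource` unpacked or as a .zip archive."""
    hits = []
    for base in search_dirs:
        unpacked = Path(base) / resource
        zipped = Path(str(unpacked) + ".zip")
        if unpacked.is_dir() or zipped.is_file():
            hits.append(str(base))
    return hits


def report(resource="corpora/wordnet", search_dirs=SEARCH_DIRS):
    """Print where the resource was found (hypothetical helper)."""
    hits = dirs_containing(resource, search_dirs)
    print(hits if hits else f"{resource} not found in any listed directory")
```

Running report() on the machine in question shows whether the download landed in one of the listed directories; an empty result suggests the corpus went somewhere nltk is not searching.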
Answer by HappyCoding
For Mac OS users only.
python -m nltk.downloader -d /usr/share/nltk_data wordnet
The corpus data can't be downloaded directly to the /usr/share/nltk_data folder; the error reports "no permission". There are two solutions:
- Change the permissions on the Mac system; for details, refer to "Operation Not Permitted when on root El capitan (rootless disabled)". However, I don't want to change a Mac default setting just for this corpus, so I went with the second solution.
- Download the corpora to any directory you have access to: `python -m nltk.downloader -d some_user_accessable_directory wordnet`. Note that this downloads only the required corpora (e.g., wordnet, reuters) instead of the whole collection from nltk. Then add that directory to the nltk path by adding the following lines to your .py file:
import nltk
nltk.data.path.append('nltk_data')
Answer by Michael Godshall
Update
As Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack. Add an nltk.txt file to your root directory and list your corpora inside. See https://devcenter.heroku.com/articles/python-nltk for details.
Original Answer
Here's a cleaner solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.
I used similar steps to install Textblob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 that should work for an NLTK-only installation.
The default heroku buildpack includes a post_compile step that runs after all of the default build steps have been completed:
# post_compile
#!/usr/bin/env bash
if [ -f bin/post_compile ]; then
echo "-----> Running post-compile hook"
chmod +x bin/post_compile
sub-env bin/post_compile
fi
As you can see, it looks in your project directory for your own post_compile file in the bin directory, and it runs it if it exists. You can use this hook to install the nltk data.
- Create the bin directory in the root of your local project.
- Add your own post_compile file to the bin directory.
# bin/post_compile
#!/usr/bin/env bash
if [ -f bin/install_nltk_data ]; then
    echo "-----> Running install_nltk_data"
    chmod +x bin/install_nltk_data
    bin/install_nltk_data
fi
echo "-----> Post-compile done"
- Add your own install_nltk_data file to the bin directory.
# bin/install_nltk_data
#!/usr/bin/env bash
source $BIN_DIR/utils
echo "-----> Starting nltk data installation"
# Assumes NLTK_DATA environment variable is already set
# $ heroku config:set NLTK_DATA='/app/nltk_data'
# Install the nltk data
# NOTE: The following command installs the wordnet corpora,
# so you may want to change for your specific needs.
# See http://www.nltk.org/data.html
python -m nltk.downloader wordnet
# If using Textblob, use this instead:
# python -m textblob.download_corpora lite
# Open the NLTK_DATA directory
cd ${NLTK_DATA}
# Delete all of the zip files
find . -name "*.zip" -type f -delete
echo "-----> Finished nltk data installation"
- Add nltk to your requirements.txt file (or textblob if you are using Textblob).
- Commit all of these changes to your repo.
- Set the NLTK_DATA environment variable on your heroku app.
$ heroku config:set NLTK_DATA='/app/nltk_data'
- Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the nltk download.
I hope you found this helpful! Enjoy!
Answer by Kenneth Reitz
Heroku now officially supports NLTK data, built-in!
Answer by Joolah
This one works:
For Mac OS users.
python -m nltk.downloader -d /usr/local/share/nltk_data wordnet
Answer by Scid
I faced the exact same problem while deploying a chatbot on the Heroku platform. Although the answer from follyroof is a fool-proof solution, in many cases it drastically increases the size of the repository.
So, I used the nltk.download('PACKAGE') in my app.py file. This way whenever app.py is run, the dependencies are automatically downloaded.
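A possible refinement, sketched here as an assumption rather than as what this answer actually did, is to download only when the resource is missing, so that repeated runs of app.py do not fetch it again. The helper name ensure_resource is hypothetical; nltk.data.find (which raises LookupError when a resource is absent) and nltk.download are the only nltk calls involved, and they are looked up lazily so the function can also be exercised with stand-in callables.

```python
def ensure_resource(resource, package, find=None, download=None):
    """Download `package` only when `resource` cannot be found.

    Returns True if a download was triggered, False if it was already present.
    """
    if find is None or download is None:
        import nltk  # assumes nltk is installed

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource)  # e.g. "corpora/wordnet"
        return False
    except LookupError:
        download(package)  # e.g. "wordnet"
        return True
```

In app.py this would be called once at startup, for example ensure_resource("corpora/wordnet", "wordnet"). Since Heroku dynos start with a fresh filesystem, the download will likely still happen on each dyno start, which trades repository size for startup time.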
Answer by Laura Corssac
I could only solve my problem with this solution:
https://github.com/gunthercox/ChatterBot/issues/930#issuecomment-322111087
It is a workaround related to SSL.

