Python can't find resource 'corpora/wordnet' on Heroku

Question

Asked by user1881006

I'm trying to get NLTK and WordNet working on Heroku. So far I have run:

heroku run python
nltk.download()
  wordnet
pip install -r requirements.txt

But I get this error:

Resource 'corpora/wordnet' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
    - '/app/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

Yet when I look in /app/nltk_data, the corpus is there, so I'm not sure what's going on.

Answer 1

Accepted answer by follyroof

I just had this same problem. What ended up working for me was creating an nltk_data directory inside the application's folder, downloading the corpus to that directory, and adding a line to my code that tells NLTK to look there. You can do all of this locally and then push the changes to Heroku.

So, supposing my Python application is in a directory called "myapp/":

Step 1: Create the directory

cd myapp/
mkdir nltk_data

Step 2: Download Corpus to New Directory

python -m nltk.downloader

This will open the NLTK downloader. Set your Download Directory to whatever_the_absolute_path_to_myapp_is/nltk_data/. If you're using the GUI downloader, the download directory is set through a text field at the bottom of the UI. If you're using the command-line downloader, you set it in the config menu.

Once the downloader points to your newly created nltk_data directory, download your corpus.

Or in one step from Python code:

nltk.download("wordnet", "whatever_the_absolute_path_to_myapp_is/nltk_data/")

Step 3: Let nltk Know Where to Look

NLTK looks for data, resources, etc. in the locations listed in the nltk.data.path variable. All you need to do is add nltk.data.path.append('./nltk_data/') to the Python file that actually uses NLTK, and it will look for corpora, tokenizers, and such in there in addition to the default paths.
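A relative path such as './nltk_data/' is resolved against the process's current working directory, which on a server may not be the directory that holds the source file. Here is a minimal sketch of one way around this, assuming the nltk_data directory sits next to the module that uses NLTK. The helper names app_nltk_data_dir and add_search_path are hypothetical and not part of NLTK.

```python
import os


def app_nltk_data_dir(module_file, dirname="nltk_data"):
    """Return the absolute path of `dirname` located next to `module_file`."""
    module_dir = os.path.dirname(os.path.abspath(module_file))
    return os.path.join(module_dir, dirname)


def add_search_path(search_path, directory):
    """Append `directory` to a search-path list unless it is already there."""
    if directory not in search_path:
        search_path.append(directory)
    return search_path


# Intended use inside the application module (requires NLTK to be installed):
#   import nltk
#   add_search_path(nltk.data.path, app_nltk_data_dir(__file__))
```

Building the path from `__file__` keeps the lookup independent of where the process was started.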

Step 4: Send it to Heroku

git add nltk_data/
git commit -m 'super useful commit message'
git push heroku master

That should work! It did for me, anyway. One thing worth noting: the path from the Python file that uses NLTK to the nltk_data directory may differ depending on how you've structured your application, so account for that when you call nltk.data.path.append('path_to_nltk_data').

Answer 2

Answer by Gaurav Anand

I ran into this issue too. If you are not working in a virtual environment, you will need to download the data to the following directory on Ubuntu:

/usr/share/nltk_data/corpora/wordnet

Instead of wordnet it could be brown or any other corpus. To download the corpus, you can run this command directly in your terminal:

$ sudo python -m nltk.downloader -d /usr/share/nltk_data wordnet

Again, replace wordnet with brown or whichever corpus you need.

Answer 3

Answer by HappyCoding

For Mac OS users only.

python -m nltk.downloader -d /usr/share/nltk_data wordnet

The corpus data can't be downloaded directly to the /usr/share/nltk_data folder; the downloader reports a "no permission" error. There are two solutions:

The first is to change the permissions on the Mac system; for details, see "Operation Not Permitted when on root El capitan (rootless disabled)". However, I didn't want to change a Mac default setting just for this corpus, so I went with the second solution, which has two steps:
- Download the corpus to any directory you have access to: `python -m nltk.downloader -d some_user_accessable_directory wordnet`. Note that this downloads only the corpora you name, e.g., wordnet or reuters, rather than the whole NLTK collection.
- Add that directory to the NLTK search path. In your .py file, add the following lines:
  import nltk
  nltk.data.path.append('nltk_data')

Answer 4

Answer by Michael Godshall

Update

As Kenneth Reitz pointed out, a much simpler solution has been added to the heroku-python-buildpack. Add an nltk.txt file to your root directory and list your corpora inside. See https://devcenter.heroku.com/articles/python-nltk for details.

Original Answer

Here's a cleaner solution that allows you to install the NLTK data directly on Heroku without adding it to your git repo.

I used similar steps to install Textblob on Heroku, which uses NLTK as a dependency. I've made some minor adjustments to my original code in steps 3 and 4 so that it should work for an NLTK-only installation.

The default Heroku buildpack includes a post_compile step that runs after all of the default build steps have been completed:

# post_compile
#!/usr/bin/env bash

if [ -f bin/post_compile ]; then
    echo "-----> Running post-compile hook"
    chmod +x bin/post_compile
    sub-env bin/post_compile
fi

As you can see, it looks in your project's bin directory for your own post_compile file, and runs it if it exists. You can use this hook to install the NLTK data.

Create the bin directory in the root of your local project.

Add your own post_compile file to the bin directory.

# bin/post_compile
#!/usr/bin/env bash

if [ -f bin/install_nltk_data ]; then
    echo "-----> Running install_nltk_data"
    chmod +x bin/install_nltk_data
    bin/install_nltk_data
fi

echo "-----> Post-compile done"

Add your own install_nltk_data file to the bin directory.

# bin/install_nltk_data
#!/usr/bin/env bash

source $BIN_DIR/utils

echo "-----> Starting nltk data installation"

# Assumes NLTK_DATA environment variable is already set
# $ heroku config:set NLTK_DATA='/app/nltk_data'

# Install the nltk data
# NOTE: The following command installs the wordnet corpora, 
# so you may want to change for your specific needs.  
# See http://www.nltk.org/data.html
python -m nltk.downloader wordnet

# If using Textblob, use this instead:
# python -m textblob.download_corpora lite

# Open the NLTK_DATA directory
cd ${NLTK_DATA}

# Delete all of the zip files
find . -name "*.zip" -type f -delete

echo "-----> Finished nltk data installation"

Add nltk to your requirements.txt file (or textblob if you are using Textblob).
Commit all of these changes to your repo.
Set the NLTK_DATA environment variable on your Heroku app.
```
$ heroku config:set NLTK_DATA='/app/nltk_data'
```
Deploy to Heroku. You will see the post_compile step trigger at the end of the deployment, followed by the nltk download.
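
To check that the NLTK_DATA value points at a directory that actually exists on the dyno, something like the following could be run there after deployment. It is only a sketch: nltk_data_dirs is a hypothetical helper, and it assumes NLTK_DATA may hold several directories joined by the platform's path separator, which is how NLTK reads that variable.

```python
import os


def nltk_data_dirs(environ=os.environ):
    """Split NLTK_DATA into directories and report whether each one exists."""
    raw = environ.get("NLTK_DATA", "")
    directories = [d for d in raw.split(os.pathsep) if d]
    return [(d, os.path.isdir(d)) for d in directories]


if __name__ == "__main__":
    # Print one line per configured directory.
    for directory, exists in nltk_data_dirs():
        print(directory, "exists" if exists else "missing")
```

An empty result means the variable is unset in that environment, in which case NLTK falls back to its default search locations.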

I hope you found this helpful! Enjoy!

Answer 5

Answer by Kenneth Reitz

Heroku now officially supports NLTK data, built in!

https://devcenter.heroku.com/articles/python-nltk

Answer 6

Answer by Joolah

This one works for Mac OS users:

python -m nltk.downloader -d /usr/local/share/nltk_data wordnet

Answer 7

Answer by Scid

I faced the exact same problem while deploying a chatbot on the Heroku platform. Although the answer from follyroof is a foolproof solution, in many cases it increases the size of the repository drastically.

So, I called nltk.download('PACKAGE') in my app.py file. This way, whenever app.py is run, the required data is downloaded automatically.
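
Calling nltk.download() unconditionally on every start can add a network round trip even when the data is already present. Here is a rough sketch of a guarded variant, assuming the resource path 'corpora/wordnet' and the package name 'wordnet' from the question. ensure_resource is a hypothetical helper, and the lookup and download functions are passed in as arguments so the logic can be tried without NLTK installed.

```python
def ensure_resource(resource, package, find, download):
    """Download `package` only if `find(resource)` raises LookupError.

    Returns True when a download was triggered, False otherwise.
    """
    try:
        find(resource)
    except LookupError:
        download(package)
        return True
    return False


# Intended use near the top of app.py (requires NLTK to be installed):
#   import nltk
#   ensure_resource("corpora/wordnet", "wordnet", nltk.data.find, nltk.download)
```

Because Heroku dynos have an ephemeral filesystem, data downloaded at runtime is not kept across restarts, so the download would still happen once per dyno start.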

Answer 8

Answer by Laura Corssac

I could only solve my problem with this solution:

https://github.com/gunthercox/ChatterBot/issues/930#issuecomment-322111087

It is a workaround related to SSL.
