Python: how to configure the nltk data directory from code?

Notice: this page is a Chinese-English translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/3522372/

Time: 2020-08-18 11:32:50  Source: igfitidea

Tags: python, path, directory, nlp, nltk

Asked by Juanjo Conti

How can I configure the nltk data directory from code?

Answered by Tim McNamara

Just change the items of nltk.data.path; it's a simple list.

Answered by bahlum

I use append, for example:

import nltk
nltk.data.path.append('/libs/nltk_data/')
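
Because the directories are checked in order, append makes the new directory the last one searched. If it should take priority, it can be inserted at the front instead. The sketch below illustrates this on a plain list standing in for nltk.data.path; the helper name add_data_dir and its prefer flag are hypothetical, not part of NLTK.

```python
def add_data_dir(search_path, directory, prefer=False):
    """Add a directory to a search-path list such as nltk.data.path.

    Hypothetical helper: skips duplicates, and puts the directory first
    when prefer=True so that it is searched before the other entries.
    """
    if directory in search_path:
        return search_path
    if prefer:
        search_path.insert(0, directory)
    else:
        search_path.append(directory)
    return search_path


# Demonstration on a plain list standing in for nltk.data.path.
demo_path = ['/usr/share/nltk_data']
add_data_dir(demo_path, '/libs/nltk_data/', prefer=True)
print(demo_path)
```

With NLTK installed, the same call would presumably be add_data_dir(nltk.data.path, '/libs/nltk_data/', prefer=True).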

Answered by alvas

From the code, http://www.nltk.org/_modules/nltk/data.html:

``nltk:path``: Specifies the file stored in the NLTK data
 package at *path*.  NLTK will search for these files in the
 directories specified by ``nltk.data.path``.

Then within the code:

######################################################################
# Search Path
######################################################################

path = []
"""A list of directories where the NLTK data package might reside.
   These directories will be checked in order when looking for a
   resource in the data package.  Note that this allows users to
   substitute in their own versions of resources, if they have them
   (e.g., in their home directory under ~/nltk_data)."""

# User-specified locations:
path += [d for d in os.environ.get('NLTK_DATA', str('')).split(os.pathsep) if d]
if os.path.expanduser('~/') != '~/':
    path.append(os.path.expanduser(str('~/nltk_data')))

if sys.platform.startswith('win'):
    # Common locations on Windows:
    path += [
        str(r'C:\nltk_data'), str(r'D:\nltk_data'), str(r'E:\nltk_data'),
        os.path.join(sys.prefix, str('nltk_data')),
        os.path.join(sys.prefix, str('lib'), str('nltk_data')),
        os.path.join(os.environ.get(str('APPDATA'), str('C:\\')), str('nltk_data'))
    ]
else:
    # Common locations on UNIX & OS X:
    path += [
        str('/usr/share/nltk_data'),
        str('/usr/local/share/nltk_data'),
        str('/usr/lib/nltk_data'),
        str('/usr/local/lib/nltk_data')
    ]
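
The "User-specified locations" line in the excerpt shows that NLTK_DATA may hold several directories, separated by the platform's path separator, and that empty entries are dropped. A minimal standalone sketch of that splitting step follows; the function name dirs_from_env and the example directories are made up for illustration.

```python
import os


def dirs_from_env(value):
    """Split an NLTK_DATA-style string on os.pathsep, dropping empty entries."""
    return [d for d in value.split(os.pathsep) if d]


# Two placeholder directories with an empty entry between them.
example = os.pathsep.join(['/data/nltk_a', '', '/data/nltk_b'])
print(dirs_from_env(example))
```

os.pathsep is ':' on UNIX-like systems and ';' on Windows, so the variable should be written accordingly.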

To modify the path, simply append to the list of possible paths:

import nltk
nltk.data.path.append("/home/yourusername/whateverpath/")

Or on Windows:

import nltk
nltk.data.path.append(r"C:\somewhere\farfar\away\path")  # raw string, so \f and \a are not read as escape sequences

Answered by danyamachine

For those using uwsgi:

I was having trouble because I wanted a uwsgi app (running as a different user than myself) to have access to nltk data that I had previously downloaded. What worked for me was adding the following line to myapp_uwsgi.ini:

env = NLTK_DATA=/home/myuser/nltk_data/

This sets the environment variable NLTK_DATA, as suggested by @schemacs.
You may need to restart your uwsgi process after making this change.
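
Since the app runs as a different user, it may also help to confirm from inside the process that the variable is set and that the directory is readable by that user. This is only a diagnostic sketch using the standard library; check_data_dir is a hypothetical helper name.

```python
import os


def check_data_dir(directory):
    """Describe how the current process user sees a directory."""
    if not os.path.isdir(directory):
        return 'missing or unreachable'
    # Reading files inside a directory needs both read and traverse permission.
    if not os.access(directory, os.R_OK | os.X_OK):
        return 'not readable'
    return 'ok'


# NLTK_DATA may hold several directories separated by os.pathsep.
for entry in os.environ.get('NLTK_DATA', '').split(os.pathsep):
    if entry:
        print(entry, '->', check_data_dir(entry))
```

If nothing is printed, NLTK_DATA is not set in that process.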

Answered by fnjn

Instead of adding nltk.data.path.append('your/path/to/nltk_data') to every script, you can set the NLTK_DATA environment variable, which NLTK reads.

Open ~/.bashrc (or ~/.profile) with a text editor (e.g. nano, vim, gedit) and add the following line:

export NLTK_DATA="your/path/to/nltk_data"

Run source to load the environment variable:

source ~/.bashrc



Test

Open Python and execute the following lines:

import nltk
nltk.data.path

You can see that your nltk data path is already in the list.

Reference: @alvations's answer on nltk/nltk #1997

Answered by Steve

Another solution is to get ahead of it.

Try:

import nltk
nltk.download()

When the window pops up asking whether you want to download the corpus, you can specify the directory it should be downloaded to.
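
On a machine without a display, the same choice can be made without the window, since nltk.download accepts a download_dir argument. A minimal sketch, assuming NLTK is installed and the network is reachable; the helper name download_to, the 'punkt' package and the target path are only placeholders.

```python
def download_to(package, target_dir):
    """Download one NLTK package into target_dir and make it findable."""
    import nltk  # imported here so defining the helper does not itself require NLTK

    nltk.download(package, download_dir=target_dir)
    # The download location is not necessarily on the search path, so add it.
    if target_dir not in nltk.data.path:
        nltk.data.path.append(target_dir)


# Example call (needs network access; both arguments are placeholders):
# download_to('punkt', '/home/yourusername/whateverpath/')
```

The append at the end ties this back to the other answers: downloading to a custom directory only helps if that directory is also on nltk.data.path or listed in NLTK_DATA.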