Python: How to configure the NLTK data directory from code?
Original URL: http://stackoverflow.com/questions/3522372/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Question by Juanjo Conti
How can I configure the NLTK data directory from code?
Answer by Tim McNamara
Just change the items of nltk.data.path; it's a plain Python list.
Answer by bahlum
I use append, for example:
import nltk
nltk.data.path.append('/libs/nltk_data/')
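Since nltk.data.path is searched in order, append adds your directory at the lowest priority: if an earlier directory (say ~/nltk_data) holds a resource with the same name, that copy wins. nltk.data.path.insert(0, ...) puts yours first instead. The sketch below models ordered lookup with the standard library only; find_first is a hypothetical, simplified stand-in for nltk.data.find (no zip archives, no nltk: URLs), not NLTK's implementation.

```python
import os
import tempfile
from typing import List, Optional


def find_first(resource: str, search_path: List[str]) -> Optional[str]:
    """Return the path of resource in the first directory that contains it.

    Hypothetical helper: a simplified model of ordered lookup
    (no zip archives, no URL handling).
    """
    for directory in search_path:
        candidate = os.path.join(directory, resource)
        if os.path.exists(candidate):
            return candidate
    return None


# Two throwaway directories that both contain the same resource file.
system_dir = tempfile.mkdtemp()
custom_dir = tempfile.mkdtemp()
for d in (system_dir, custom_dir):
    os.makedirs(os.path.join(d, "corpora"), exist_ok=True)
    with open(os.path.join(d, "corpora", "demo.txt"), "w") as fh:
        fh.write(d)

search_path = [system_dir]

appended = search_path + [custom_dir]   # like nltk.data.path.append(custom_dir)
prepended = [custom_dir] + search_path  # like nltk.data.path.insert(0, custom_dir)

print(find_first("corpora/demo.txt", appended))   # the copy in system_dir wins
print(find_first("corpora/demo.txt", prepended))  # the copy in custom_dir wins
```

In short: append is enough when your directory holds data that exists nowhere else; use insert(0, ...) when you want your copy to shadow another one.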
Answer by alvas
From the NLTK source code, http://www.nltk.org/_modules/nltk/data.html:
``nltk:path``: Specifies the file stored in the NLTK data package at *path*. NLTK will search for these files in the directories specified by ``nltk.data.path``.
Then within the code:
######################################################################
# Search Path
######################################################################
# (os and sys are imported at the top of nltk/data.py)
import os
import sys

path = []
"""A list of directories where the NLTK data package might reside.
These directories will be checked in order when looking for a
resource in the data package. Note that this allows users to
substitute in their own versions of resources, if they have them
(e.g., in their home directory under ~/nltk_data)."""

# User-specified locations:
path += [d for d in os.environ.get('NLTK_DATA', str('')).split(os.pathsep) if d]
if os.path.expanduser('~/') != '~/':
    path.append(os.path.expanduser(str('~/nltk_data')))

if sys.platform.startswith('win'):
    # Common locations on Windows:
    path += [
        str(r'C:\nltk_data'), str(r'D:\nltk_data'), str(r'E:\nltk_data'),
        os.path.join(sys.prefix, str('nltk_data')),
        os.path.join(sys.prefix, str('lib'), str('nltk_data')),
        os.path.join(os.environ.get(str('APPDATA'), str('C:\\')), str('nltk_data'))
    ]
else:
    # Common locations on UNIX & OS X:
    path += [
        str('/usr/share/nltk_data'),
        str('/usr/local/share/nltk_data'),
        str('/usr/lib/nltk_data'),
        str('/usr/local/lib/nltk_data')
    ]
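To make the resulting order concrete, here is a simplified, Unix-only restatement of the excerpt above as a function that takes the environment and home directory as arguments, so the result can be inspected without touching the real machine. It is a sketch, not NLTK code: it omits the expanduser check and the Windows branch, and build_search_path is a hypothetical name.

```python
import os
from typing import List, Mapping


def build_search_path(environ: Mapping[str, str], home: str,
                      pathsep: str = os.pathsep) -> List[str]:
    """Simplified, Unix-only sketch of how the excerpt fills nltk.data.path."""
    path: List[str] = []
    # 1. Entries from NLTK_DATA come first, in the order given; empty entries are skipped.
    path += [d for d in environ.get("NLTK_DATA", "").split(pathsep) if d]
    # 2. Then the per-user directory (~/nltk_data).
    path.append(os.path.join(home, "nltk_data"))
    # 3. Then the system-wide UNIX / OS X locations.
    path += [
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
    ]
    return path


print(build_search_path({"NLTK_DATA": "/opt/nltk_data:/srv/nltk_data"},
                        home="/home/alice", pathsep=":"))
```

On a POSIX system the printed list starts with /opt/nltk_data and /srv/nltk_data, followed by /home/alice/nltk_data, which is why a directory named in NLTK_DATA takes precedence over ~/nltk_data.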
To modify the path, simply append to the list of possible paths:
import nltk
nltk.data.path.append("/home/yourusername/whateverpath/")
Or on Windows:
import nltk
# raw string, so \f and \a are not treated as escape sequences
nltk.data.path.append(r"C:\somewhere\farfar\away\path")
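Backslashes in Windows paths need care: in an ordinary Python string literal, \f and \a are escape sequences (form feed and bell), so writing C:\somewhere\farfar\away\path without an r prefix does not produce that path. The standard-library sketch below shows what such a literal actually contains and two safe ways to spell the intended path (a raw string, or pathlib's PureWindowsPath).

```python
from pathlib import PureWindowsPath

# What the un-prefixed literal for C:\somewhere\farfar\away\path evaluates to,
# written with explicit escapes: "\f" (form feed) and "\a" (bell) become control
# characters, while the invalid escapes "\s" and "\p" stay as backslash + letter.
broken = "C:\\somewhere\farfar\away\\path"

# Two safe spellings of the intended path.
raw = r"C:\somewhere\farfar\away\path"
joined = str(PureWindowsPath("C:/", "somewhere", "farfar", "away", "path"))

print(repr(broken))   # shows the control characters
print(raw == joined)  # both spell the same Windows path
```

Either raw or joined can be passed to nltk.data.path.append. Python on Windows also generally accepts forward slashes, e.g. "C:/somewhere/farfar/away/path".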
Answer by danyamachine
For those using uwsgi:
I was having trouble because I wanted a uwsgi app (running as a different user than myself) to have access to nltk data that I had previously downloaded. What worked for me was adding the following line to myapp_uwsgi.ini:
env = NLTK_DATA=/home/myuser/nltk_data/
This sets the environment variable NLTK_DATA, as suggested by @schemacs.
You may need to restart your uwsgi process after making this change.
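The uwsgi env line works because NLTK reads NLTK_DATA only once, when nltk is first imported and the module-level code quoted in the previous answer builds nltk.data.path. The same effect can be had from inside a Python entry point, as long as the variable is set before the first import of nltk anywhere in the process. A minimal sketch: the directory names in DATA_DIRS are placeholders, and the ImportError guard exists only so the snippet runs where NLTK is not installed.

```python
import os

# Placeholder directories: replace with wherever your NLTK data actually lives.
DATA_DIRS = ["/opt/nltk_data", os.path.expanduser("~/shared_nltk_data")]

# NLTK_DATA may list several directories separated by os.pathsep
# (":" on Unix, ";" on Windows); NLTK searches them first, in this order.
os.environ["NLTK_DATA"] = os.pathsep.join(DATA_DIRS)

# This must be the first import of nltk in the process: nltk.data.path is computed
# from NLTK_DATA at import time, so changing the variable afterwards has no effect.
try:
    import nltk
except ImportError:  # lets the sketch run where NLTK is not installed
    nltk = None

if nltk is not None:
    print(nltk.data.path[:len(DATA_DIRS)])
```

If some other module imports nltk earlier, this has no effect; in that case modify nltk.data.path directly as the earlier answers describe.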
Answer by fnjn
Instead of adding nltk.data.path.append('your/path/to/nltk_data') to every script, you can set the NLTK_DATA environment variable, which NLTK reads when it builds nltk.data.path.
Open ~/.bashrc (or ~/.profile) in a text editor (e.g. nano, vim, gedit) and add the following line:
export NLTK_DATA="your/path/to/nltk_data"
Run source to load the environment variable into the current shell:
source ~/.bashrc
Test
Open a Python interpreter and run the following lines:
import nltk
nltk.data.path
You should see your NLTK data path in the list.
Reference: @alvations's answer on nltk/nltk #1997
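Beyond eyeballing the list in the Test step, it can help to see which of those directories actually exist, since a mistyped directory in NLTK_DATA is kept in the list without complaint (the source only drops empty entries). describe_search_path is a hypothetical helper, not part of NLTK; it works on any list of directories, so nltk.data.path can be passed straight in. The NLTK import is guarded so the snippet also runs without NLTK.

```python
import os
from typing import List, Tuple


def describe_search_path(search_path: List[str]) -> List[Tuple[str, bool]]:
    """Pair each directory with whether it exists, preserving lookup order."""
    return [(directory, os.path.isdir(directory)) for directory in search_path]


if __name__ == "__main__":
    try:
        import nltk
        paths = list(nltk.data.path)
    except ImportError:  # fall back to a hand-written list when NLTK is absent
        paths = [os.path.expanduser("~/nltk_data"), "/usr/share/nltk_data"]
    for directory, exists in describe_search_path(paths):
        print(("exists   " if exists else "missing  ") + directory)
```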
Answer by Steve
Another solution is to get ahead of it.
Try:
import nltk
nltk.download()
When the downloader window pops up asking whether you want to download a corpus, you can specify which directory it should be downloaded to.
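The same choice can be made without the GUI: nltk.download accepts a download_dir argument. Downloading into a directory NLTK does not already search also means adding that directory to nltk.data.path. The helper below (prepare_data_dir, a hypothetical name) handles the filesystem and list bookkeeping; it is written against a plain list so it can be tried without NLTK or network access.

```python
import os
from typing import List


def prepare_data_dir(data_dir: str, search_path: List[str]) -> str:
    """Create data_dir if needed and put it at the front of search_path.

    Pass nltk.data.path as search_path to affect NLTK's lookup: the list is
    modified in place, and NLTK's lookup reads that same module-level list.
    """
    data_dir = os.path.abspath(os.path.expanduser(data_dir))
    os.makedirs(data_dir, exist_ok=True)
    if data_dir in search_path:
        search_path.remove(data_dir)  # avoid duplicates; re-added at the front below
    search_path.insert(0, data_dir)
    return data_dir
```

With NLTK installed, d = prepare_data_dir("~/project/nltk_data", nltk.data.path) followed by nltk.download("stopwords", download_dir=d) would fetch the stopwords corpus into that directory and make it findable in the same session; "~/project/nltk_data" is only a placeholder path.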

