Python NLTK and Stopwords Fail #lookuperror
Original question: http://stackoverflow.com/questions/26693736/
Note: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by Facundo
I am trying to start a sentiment analysis project, and I will use the stop words method. I did some research and found that NLTK has stop words, but when I execute the command I get an error.
What I do is the following, in order to see which words NLTK uses (like what you can find at http://www.nltk.org/book/ch02.html in section 4.1):
from nltk.corpus import stopwords
stopwords.words('english')
But when I press Enter I get:
---------------------------------------------------------------------------
LookupError Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
66
67 def __getattr__(self, attr):
---> 68 self.__load()
69 # This looks circular, but its not, since __load() changes our
70 # __class__ to something new:
C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
54 except LookupError, e:
55 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56 except LookupError: raise e
57
58 # Load the corpus.
LookupError:
**********************************************************************
Resource 'corpora/stopwords' not found. Please use the NLTK
Downloader to obtain the resource: >>> nltk.download()
Searched in:
- 'C:\Users\Meru/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'C:\Users\Meru\Anaconda\nltk_data'
- 'C:\Users\Meru\Anaconda\lib\nltk_data'
- 'C:\Users\Meru\AppData\Roaming\nltk_data'
**********************************************************************
Because of this problem, code like the following cannot run properly either (it produces the same error):
>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print [i for i in sentence.split() if i not in stop]
Do you know what the problem may be? I must use words in Spanish, so do you recommend another method? I also thought about using the Goslate package with datasets in English.
Thanks for reading!
P.S.: I use Anaconda.
Accepted answer by tttthomasssss
You don't seem to have the stopwords corpus on your computer.
You need to start the NLTK Downloader and download all the data you need.
Open a Python console and do the following:
>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/
In the GUI window that opens, either press the 'Download' button to download all corpora, or go to the 'Corpora' tab and download only the ones you need.
Answer by SVK
If you want to install an NLTK corpus manually:
1) Go to http://www.nltk.org/nltk_data/ and download the NLTK corpus file you want.
2) In a Python shell, check the value of nltk.data.path
3) Choose one of the listed paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.
4) Now you can import the data: from nltk.corpus import stopwords
Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9
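The manual steps above can be sketched with the standard library alone. This is a rough illustration, not part of the original answer: `first_existing_dir` and `install_corpus_zip` are hypothetical helper names, and the sketch assumes the downloaded archive contains a top-level folder named after the corpus (for example `stopwords/`).

```python
import zipfile
from pathlib import Path


def first_existing_dir(candidates):
    """Return the first candidate that exists as a directory, else None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_dir():
            return path
    return None


def install_corpus_zip(zip_path, data_dir):
    """Unzip a downloaded corpus archive into <data_dir>/corpora."""
    target = Path(data_dir) / "corpora"
    # Create the corpora subdirectory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target


# Usage sketch (the zip file name is a placeholder):
#   import nltk
#   data_dir = first_existing_dir(nltk.data.path)
#   install_corpus_zip("stopwords.zip", data_dir)
```

If none of the directories in nltk.data.path exists, `first_existing_dir` returns None, and one of them would have to be created first.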
Answer by Rohit P
import nltk
nltk.download()
Click the Download button when the GUI appears. It worked for me. (nltk.download('stopwords') doesn't work for me.)
Answer by Abu Shoeb
I tried from an Ubuntu terminal, and I don't know why the GUI didn't show up as described in tttthomasssss's answer. So I followed the comment from KLDavenport and it worked. Here is the summary:
Open your terminal/command line, type python, and then:
>>> import nltk
>>> nltk.download("stopwords")
This will store the stopwords corpus under the nltk_data directory. In my case it was /home/myusername/nltk_data/corpora/stopwords.
If you need another corpus, visit the nltk data page and find the corpus by its ID. Then use the ID to download it, as we did for stopwords.
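The download-by-ID step can be wrapped so that it only runs when the corpus is missing. This is a minimal sketch rather than anything from the original answer: `ensure_corpus` is a hypothetical helper name, and it assumes the resource lives under 'corpora/<id>', as the error message in the question shows for stopwords.

```python
def ensure_corpus(corpus_id="stopwords"):
    """Download an NLTK corpus only if it is not already installed.

    Hypothetical helper; assumes the resource path is 'corpora/<corpus_id>'.
    Returns True if the corpus is available afterwards.
    """
    # Imported here so that defining the helper does not itself require NLTK.
    import nltk

    try:
        # nltk.data.find raises LookupError when the resource is missing.
        nltk.data.find("corpora/%s" % corpus_id)
        return True
    except LookupError:
        # Download by ID without opening the GUI.
        return bool(nltk.download(corpus_id))
```

Calling `ensure_corpus("stopwords")` before `from nltk.corpus import stopwords` should avoid the LookupError on a machine with network access.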
Answer by Jishnu Nair
import nltk
nltk.download()
- A GUI pops up; in it, go to the Corpora section and select the required corpus.
- Verified result
Answer by Haseeb
import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
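Once the stop words are in a set, filtering a sentence is a single list comprehension. The sketch below uses Python 3 syntax and a small hand-written stand-in set, so it runs without the corpus. `remove_stopwords` and `SAMPLE_STOPWORDS` are hypothetical names, and the lowercasing step assumes the stop word list is lowercase.

```python
def remove_stopwords(sentence, stop_words):
    """Return the words of `sentence` that are not in `stop_words`."""
    # Lowercase each token for the lookup (assumes a lowercase stop word list).
    return [word for word in sentence.split() if word.lower() not in stop_words]


# Hand-written stand-in list (an assumption for illustration only).
# With the corpus installed, set(stopwords.words('spanish')) could be used instead.
SAMPLE_STOPWORDS = {"this", "is", "a"}

print(remove_stopwords("this is a foo bar sentence", SAMPLE_STOPWORDS))
# prints ['foo', 'bar', 'sentence']
```

Using a set rather than the list returned by stopwords.words() keeps each membership check at constant time, which matters when filtering many documents.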

