Python NLTK and stopwords fail #lookuperror

Question

Asked by Facundo

I am trying to start a sentiment analysis project and I will use the stop words method. I did some research and found that nltk has stopwords, but when I execute the command there is an error.

What I do is the following, in order to see which words nltk uses (like what you may find at http://www.nltk.org/book/ch02.html in section 4.1):

from nltk.corpus import stopwords
stopwords.words('english')

But when I press Enter I get:

---------------------------------------------------------------------------
LookupError                               Traceback (most recent call last)
<ipython-input-6-ff9cd17f22b2> in <module>()
----> 1 stopwords.words('english')

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __getattr__(self, attr)
 66
 67     def __getattr__(self, attr):
---> 68         self.__load()
 69         # This looks circular, but its not, since __load() changes our
 70         # __class__ to something new:

C:\Users\Usuario\Anaconda\lib\site-packages\nltk\corpus\util.pyc in __load(self)
 54             except LookupError, e:
 55                 try: root = nltk.data.find('corpora/%s' % zip_name)
---> 56                 except LookupError: raise e
 57
 58         # Load the corpus.

LookupError:
**********************************************************************
  Resource 'corpora/stopwords' not found.  Please use the NLTK
  Downloader to obtain the resource:  >>> nltk.download()
  Searched in:
- 'C:\Users\Meru/nltk_data'
- 'C:\nltk_data'
- 'D:\nltk_data'
- 'E:\nltk_data'
- 'C:\Users\Meru\Anaconda\nltk_data'
- 'C:\Users\Meru\Anaconda\lib\nltk_data'
- 'C:\Users\Meru\AppData\Roaming\nltk_data'
**********************************************************************

Because of this problem, code like the following cannot run properly either (it produces the same error):

>>> from nltk.corpus import stopwords
>>> stop = stopwords.words('english')
>>> sentence = "this is a foo bar sentence"
>>> print([i for i in sentence.split() if i not in stop])

Do you know what the problem may be? I must use words in Spanish; do you recommend another method? I also thought of using the Goslate package with datasets in English.

Thanks for reading!

P.S.: I use Anaconda

Answer 1

Accepted answer by tttthomasssss

You don't seem to have the stopwords corpus on your computer.

You need to start the NLTK Downloader and download all the data you need.

Open a Python console and do the following:

>>> import nltk
>>> nltk.download()
showing info http://nltk.github.com/nltk_data/

In the GUI window that opens, simply press the 'Download' button to download all corpora, or go to the 'Corpora' tab and download only the ones you need.

Answer 2

Answer by SVK

If you want to install an NLTK corpus manually:

1) Go to http://www.nltk.org/nltk_data/ and download your desired NLTK corpus file.

2) Now in a Python shell check the value of nltk.data.path

3) Choose one of the paths that exists on your machine, and unzip the data files into the corpora subdirectory inside it.

4) Now you can import the data: from nltk.corpus import stopwords

Reference: https://medium.com/@satorulogic/how-to-manually-download-a-nltk-corpus-f01569861da9
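
Steps 2 and 3 above can be sketched in Python as follows. This is only an illustration: the helper names first_existing_dir and install_corpus_zip are hypothetical (they are not part of NLTK), and it assumes the downloaded archive contains a top-level folder named after the corpus, such as stopwords/.

```python
import os
import zipfile


def first_existing_dir(candidates):
    """Return the first candidate path that is an existing directory, else None."""
    for path in candidates:
        if os.path.isdir(path):
            return path
    return None


def install_corpus_zip(zip_path, data_dir):
    """Unzip a downloaded corpus archive into <data_dir>/corpora and return that folder."""
    corpora_dir = os.path.join(data_dir, "corpora")
    os.makedirs(corpora_dir, exist_ok=True)  # create the corpora subdirectory if missing
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(corpora_dir)
    return corpora_dir


# Hypothetical usage (needs NLTK installed and a downloaded stopwords.zip):
# import nltk
# data_dir = first_existing_dir(nltk.data.path)
# install_corpus_zip("stopwords.zip", data_dir)
```

If none of the directories in nltk.data.path exists yet, first_existing_dir returns None; in that case, create one of the listed directories first and pass it as data_dir.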

Answer 3

Answer by Rohit P

import nltk
nltk.download()

Click the Download button when the GUI appears. It worked for me. (nltk.download('stopwords') doesn't work for me.)

Answer 4

Answer by Abu Shoeb

I tried this from an Ubuntu terminal, and for some reason the GUI described in tttthomasssss's answer did not show up. So I followed the comment from KLDavenport and it worked. Here is the summary:

Open your terminal/command line, type python, and then run:

>>> import nltk
>>> nltk.download("stopwords")

This stores the stopwords corpus under nltk_data. In my case it was /home/myusername/nltk_data/corpora/stopwords.

If you need another corpus, visit nltk data and find the corpus by its ID. Then use that ID to download it, as we did for stopwords.
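
A common way to avoid downloading again on every run is to look the resource up first and download it only when the lookup fails. The sketch below assumes nothing beyond nltk.data.find and nltk.download; the helper name ensure_nltk_resource and its find/download parameters are hypothetical, added only so the logic can be tried without NLTK installed.

```python
def ensure_nltk_resource(resource_path, package_id, find=None, download=None):
    """Download package_id only if resource_path cannot be found.

    find and download default to nltk.data.find and nltk.download.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily, only when the defaults are needed
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the resource is missing
    except LookupError:
        download(package_id)
        return True
    return False


# Hypothetical usage (needs NLTK installed and network access):
# ensure_nltk_resource("corpora/stopwords", "stopwords")
```

The first argument is the path NLTK searches for (the one shown in the LookupError message), and the second is the corpus ID used for the download.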

Answer 5

Answer by Jishnu Nair

import nltk

nltk.download()

A GUI pops up; go to the Corpora section and select the required corpus.
Verified result.

Answer 6

Answer by Haseeb

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
STOPWORDS = set(stopwords.words('english'))
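
Since the question mentions Spanish, the same approach should carry over: once the stopwords corpus is downloaded, it includes a 'spanish' list alongside 'english'. The following is a minimal sketch; remove_stopwords and load_stopwords are hypothetical helper names, and the filtering is a plain whitespace split rather than proper tokenization.

```python
def remove_stopwords(sentence, stop_words):
    """Return the words of sentence that are not in stop_words (case-insensitive)."""
    stop_set = {word.lower() for word in stop_words}
    return [word for word in sentence.split() if word.lower() not in stop_set]


def load_stopwords(language):
    """Load NLTK's stop word list for language; requires the stopwords corpus."""
    from nltk.corpus import stopwords  # lazy import: needs NLTK and the downloaded corpus
    return stopwords.words(language)


# Hypothetical usage (needs NLTK and the stopwords corpus):
# remove_stopwords("esta es una frase de ejemplo", load_stopwords("spanish"))
```

Converting the list to a set, as in the answer above, keeps the membership test fast when filtering many sentences.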
