Python: how to use the spacy lemmatizer to convert a word to its basic form

Disclaimer: this page is a translation of a popular StackOverflow question and its answers, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same license and attribute it to the original authors (not me). Original question: http://stackoverflow.com/questions/38763007/

Date: 2020-08-19 21:24:21  Source: igfitidea

how to use spacy lemmatizer to get a word into basic form

python nltk spacy lemmatization

Asked by yi wang

I am new to spacy and I want to use its lemmatizer function, but I don't know how to use it. I would like to pass in a string of words and get back the same string with each word reduced to its basic form.

Examples:

例子:

  • 'words'=> 'word'
  • 'did' => 'do'

Thank you.


Answered by damio

Previous answer is convoluted and can't be edited, so here's a more conventional one.


# make sure you downloaded the English model with "python -m spacy download en"

import spacy
nlp = spacy.load('en')

doc = nlp(u"Apples and oranges are similar. Boots and hippos aren't.")

for token in doc:
    print(token, token.lemma, token.lemma_)

Output:


Apples 6617 apples
and 512 and
oranges 7024 orange
are 536 be
similar 1447 similar
. 453 .
Boots 4622 boot
and 512 and
hippos 98365 hippo
are 536 be
n't 538 not
. 453 .
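In the output above, the middle column (`token.lemma`) is spaCy's internal integer ID for the lemma, and the right column (`token.lemma_`) is the readable string. As a toy illustration of that string-interning idea (a sketch only; spaCy's real `StringStore` uses 64-bit hashes, not sequential IDs):

```python
class ToyStringStore:
    """Maps each distinct string to a stable integer ID, and back."""

    def __init__(self):
        self._ids = {}
        self._strings = []

    def add(self, s):
        # Intern the string on first use; return its integer ID.
        if s not in self._ids:
            self._ids[s] = len(self._strings)
            self._strings.append(s)
        return self._ids[s]

    def __getitem__(self, i):
        # Recover the readable string from its ID.
        return self._strings[i]

store = ToyStringStore()
for lemma in ["be", "orange", "be", "boot"]:
    # "be" gets the same ID both times it appears, like 536 above.
    print(lemma, "->", store.add(lemma))
```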

From the official Lightning tour

Answered by joel

If you want to use just the Lemmatizer, you can do that in the following way:


from spacy.lemmatizer import Lemmatizer
from spacy.lang.en import LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES

lemmatizer = Lemmatizer(LEMMA_INDEX, LEMMA_EXC, LEMMA_RULES)
lemmas = lemmatizer(u'ducks', u'NOUN')
print(lemmas)

Output


['duck']
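The call above consults an exceptions table first and then POS-specific suffix rules. A toy pure-Python sketch of that lookup order (the tables below are illustrative stand-ins, not spaCy's real LEMMA_EXC / LEMMA_RULES data):

```python
# Illustrative stand-ins for spaCy's exception and suffix-rule tables.
LEMMA_EXC = {"VERB": {"did": "do", "was": "be"}}
LEMMA_RULES = {"NOUN": [("s", ""), ("es", "")],
               "VERB": [("ed", ""), ("ing", "")]}

def lemmatize(word, pos):
    """Return candidate lemmas: exceptions first, then suffix rules."""
    word = word.lower()
    # 1. Irregular forms are looked up in the exceptions table.
    exc = LEMMA_EXC.get(pos, {})
    if word in exc:
        return [exc[word]]
    # 2. Otherwise, apply each matching suffix-rewriting rule for this POS.
    forms = []
    for old, new in LEMMA_RULES.get(pos, []):
        if word.endswith(old):
            forms.append(word[: len(word) - len(old)] + new)
    # 3. Fall back to the word itself if no rule matched.
    return forms or [word]

print(lemmatize("ducks", "NOUN"))  # ['duck']
print(lemmatize("did", "VERB"))    # ['do']
```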

Update


Since spacy version 2.2, LEMMA_INDEX, LEMMA_EXC, and LEMMA_RULES have been bundled into a Lookups object:

import spacy
nlp = spacy.load('en')

nlp.vocab.lookups
>>> <spacy.lookups.Lookups object at 0x7f89a59ea810>
nlp.vocab.lookups.tables
>>> ['lemma_lookup', 'lemma_rules', 'lemma_index', 'lemma_exc']

You can still use the lemmatizer directly with a word and a POS (part of speech) tag:


from spacy.lemmatizer import Lemmatizer, ADJ, NOUN, VERB

lemmatizer = nlp.vocab.morphology.lemmatizer
lemmatizer('ducks', NOUN)
>>> ['duck']

You can pass the POS tag as the imported constant like above or as string:


lemmatizer('ducks', 'NOUN')
>>> ['duck']


Answered by RAVI

Code:

# Note: this uses the old spaCy 1.x API (spacy.en); newer versions use spacy.load()
import os
from spacy.en import English, LOCAL_DATA_DIR

data_dir = os.environ.get('SPACY_DATA', LOCAL_DATA_DIR)

nlp = English(data_dir=data_dir)

doc3 = nlp(u"this is spacy lemmatize testing. programming books are more better than others")

for token in doc3:
    print(token, token.lemma, token.lemma_)

Output:

this 496 this
is 488 be
spacy 173779 spacy
lemmatize 1510965 lemmatize
testing 2900 testing
. 419 .
programming 3408 programming
books 1011 book
are 488 be
more 529 more
better 615 better
than 555 than
others 871 others

Example Ref: here


Answered by Syauqi Haris

I use Spacy version 2.x


import spacy
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])
doc = nlp('did displaying words')
print (" ".join([token.lemma_ for token in doc]))

and the output:

do display word

Hope it helps :)
