What are all the possible POS tags in Python NLTK?

Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow. Original URL: http://stackoverflow.com/questions/15388831/

Time: 2020-08-18 19:59:25  Source: igfitidea

What are all possible pos tags of NLTK?

python nltk

Asked by OrangeTux

How do I find a list with all possible pos tags used by the Natural Language Toolkit (nltk)?

Accepted answer by phipsgabler

The NLTK book has a note on how to find help on tag sets, e.g.:

nltk.help.upenn_tagset()

Others are probably similar. (Note: you may first have to download tagsets from the Models section of the download helper for this.)

Answer by Suzana

The tag set depends on the corpus that was used to train the tagger. The default tagger of nltk.pos_tag() uses the Penn Treebank Tag Set.

In NLTK 2, you could check which tagger is the default tagger as follows:

>>> import nltk
>>> nltk.tag._POS_TAGGER
'taggers/maxent_treebank_pos_tagger/english.pickle'

That means that it's a Maximum Entropy tagger trained on the Treebank corpus.

nltk.tag._POS_TAGGER does not exist anymore in NLTK 3, but the documentation states that the off-the-shelf tagger still uses the Penn Treebank tagset.

Answer by Doug Shore

The following can be useful for accessing a dict keyed by tag abbreviations:

>>> from nltk.data import load
>>> tagdict = load('help/tagsets/upenn_tagset.pickle')
>>> tagdict['NN'][0]
'noun, common, singular or mass'
>>> tagdict.keys()
['PRP$', 'VBG', 'VBD', '``', 'VBN', ',', "''", 'VBP', 'WDT', ...
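On Python 3, dict.keys() returns a view rather than a list, so the last line of output above will look different there. Below is a minimal sketch of listing every tag together with its definition. The helper name describe_tags is hypothetical, and sample_tagdict is a small stand-in that only mirrors the shape shown above (the first element of each value is the definition; treating the second element as an example string is an assumption), so the snippet runs without the NLTK data installed. With the data available, passing the tagdict loaded above should presumably work the same way.

```python
def describe_tags(tagdict):
    """Return (tag, definition) pairs sorted by tag abbreviation."""
    # The definition is the first element of each value, as in tagdict['NN'][0].
    return [(tag, tagdict[tag][0]) for tag in sorted(tagdict)]


# Stand-in data (assumption): same shape as the loaded tagset dict,
# limited to three entries so the example is self-contained.
sample_tagdict = {
    "NN": ("noun, common, singular or mass", "cabbage thermostat investment ..."),
    "CC": ("conjunction, coordinating", "and both but either ..."),
    "EX": ("existential there", "there"),
}

for tag, definition in describe_tags(sample_tagdict):
    print(f"{tag}: {definition}")
```

Sorting the keys gives a stable order, which a plain dict view does not promise to match across versions of the data file.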

Answer by binarymax

To save some folks some time, here is a list I extracted from a small corpus. I do not know if it is complete, but it should have most (if not all) of the help definitions from upenn_tagset...

CC: conjunction, coordinating

& 'n and both but either et for less minus neither nor or plus so
therefore times v. versus vs. whether yet

CD: numeral, cardinal

mid-1890 nine-thirty forty-two one-tenth ten million 0.5 one forty-
seven 1987 twenty '79 zero two 78-degrees eighty-four IX '60s .025
fifteen 271,124 dozen quintillion DM2,000 ...

DT: determiner

all an another any both del each either every half la many much nary
neither no some such that the them these this those

EX: existential there

there

IN: preposition or conjunction, subordinating

astride among uppon whether out inside pro despite on by throughout
below within for towards near behind atop around if like until below
next into if beside ...

JJ: adjective or numeral, ordinal

third ill-mannered pre-war regrettable oiled calamitous first separable
ectoplasmic battery-powered participatory fourth still-to-be-named
multilingual multi-disciplinary ...

JJR: adjective, comparative

bleaker braver breezier briefer brighter brisker broader bumper busier
calmer cheaper choosier cleaner clearer closer colder commoner costlier
cozier creamier crunchier cuter ...

JJS: adjective, superlative

calmest cheapest choicest classiest cleanest clearest closest commonest
corniest costliest crassest creepiest crudest cutest darkest deadliest
dearest deepest densest dinkiest ...

LS: list item marker

A A. B B. C C. D E F First G H I J K One SP-44001 SP-44002 SP-44005
SP-44007 Second Third Three Two * a b c d first five four one six three
two

MD: modal auxiliary

can cannot could couldn't dare may might must need ought shall should
shouldn't will would

NN: noun, common, singular or mass

common-carrier cabbage knuckle-duster Casino afghan shed thermostat
investment slide humour falloff slick wind hyena override subhumanity
machinist ...

NNP: noun, proper, singular

Motown Venneboerger Czestochwa Ranzer Conchita Trumplane Christos
Oceanside Escobar Kreisler Sawyer Cougar Yvette Ervin ODI Darryl CTCA
Shannon A.K.C. Meltex Liverpool ...

NNS: noun, common, plural

undergraduates scotches bric-a-brac products bodyguards facets coasts
divestitures storehouses designs clubs fragrances averages
subjectivists apprehensions muses factory-jobs ...

PDT: pre-determiner

all both half many quite such sure this

POS: genitive marker

' 's

PRP: pronoun, personal

hers herself him himself hisself it itself me myself one oneself ours
ourselves ownself self she thee theirs them themselves they thou thy us

PRP$: pronoun, possessive

her his mine my our ours their thy your

RB: adverb

occasionally unabatingly maddeningly adventurously professedly
stirringly prominently technologically magisterially predominately
swiftly fiscally pitilessly ...

RBR: adverb, comparative

further gloomier grander graver greater grimmer harder harsher
healthier heavier higher however larger later leaner lengthier less-
perfectly lesser lonelier longer louder lower more ...

RBS: adverb, superlative

best biggest bluntest earliest farthest first furthest hardest
heartiest highest largest least less most nearest second tightest worst

RP: particle

aboard about across along apart around aside at away back before behind
by crop down ever fast for forth from go high i.e. in into just later
low more off on open out over per pie raising start teeth that through
under unto up up-pp upon whole with you

TO: "to" as preposition or infinitive marker

to

UH: interjection

Goodbye Goody Gosh Wow Jeepers Jee-sus Hubba Hey Kee-reist Oops amen
huh howdy uh dammit whammo shucks heck anyways whodunnit honey golly
man baby diddle hush sonuvabitch ...

VB: verb, base form

ask assemble assess assign assume atone attention avoid bake balkanize
bank begin behold believe bend benefit bevel beware bless boil bomb
boost brace break bring broil brush build ...

VBD: verb, past tense

dipped pleaded swiped regummed soaked tidied convened halted registered
cushioned exacted snubbed strode aimed adopted belied figgered
speculated wore appreciated contemplated ...

VBG: verb, present participle or gerund

telegraphing stirring focusing angering judging stalling lactating
hankerin' alleging veering capping approaching traveling besieging
encrypting interrupting erasing wincing ...

VBN: verb, past participle

multihulled dilapidated aerosolized chaired languished panelized used
experimented flourished imitated reunifed factored condensed sheared
unsettled primed dubbed desired ...

VBP: verb, present tense, not 3rd person singular

predominate wrap resort sue twist spill cure lengthen brush terminate
appear tend stray glisten obtain comprise detest tease attract
emphasize mold postpone sever return wag ...

VBZ: verb, present tense, 3rd person singular

bases reconstructs marks mixes displeases seals carps weaves snatches
slumps stretches authorizes smolders pictures emerges stockpiles
seduces fizzes uses bolsters slaps speaks pleads ...

WDT: WH-determiner

that what whatever which whichever

WP: WH-pronoun

that what whatever whatsoever which who whom whosoever

WRB: Wh-adverb

how however whence whenever where whereby whereever wherein whereof why
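The answer does not say how the list was extracted. As a rough sketch only, here is one way a tag-to-examples listing like the one above might be built from (word, tag) pairs, the form that nltk.pos_tag() returns. The function name collect_tag_examples and the sample pairs are hypothetical; the pairs are written by hand from words in the list above and are not real tagger output.

```python
from collections import defaultdict


def collect_tag_examples(tagged_words, max_examples=5):
    """Map each tag to the first few distinct words seen with it."""
    examples = defaultdict(list)
    for word, tag in tagged_words:
        seen = examples[tag]
        if word not in seen and len(seen) < max_examples:
            seen.append(word)
    return dict(examples)


# Hand-written pairs for illustration only (assumption): a real run would
# pass the output of a tagger over a corpus instead.
sample = [
    ("there", "EX"), ("the", "DT"), ("cabbage", "NN"), ("and", "CC"),
    ("the", "DT"), ("those", "DT"), ("thermostat", "NN"),
]

tag_examples = collect_tag_examples(sample)
for tag in sorted(tag_examples):
    print(f"{tag}: {' '.join(tag_examples[tag])}")
```

A list built this way only contains tags that actually occur in the input, which is one reason a list from a small corpus may be incomplete.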

Answer by mdubez

The reference is available at the official site

Copying and pasting from there:

  • CC | Coordinating conjunction |
  • CD | Cardinal number |
  • DT | Determiner |
  • EX | Existential there |
  • FW | Foreign word |
  • IN | Preposition or subordinating conjunction |
  • JJ | Adjective |
  • JJR | Adjective, comparative |
  • JJS | Adjective, superlative |
  • LS | List item marker |
  • MD | Modal |
  • NN | Noun, singular or mass |
  • NNS | Noun, plural |
  • NNP | Proper noun, singular |
  • NNPS | Proper noun, plural |
  • PDT | Predeterminer |
  • POS | Possessive ending |
  • PRP | Personal pronoun |
  • PRP$ | Possessive pronoun |
  • RB | Adverb |
  • RBR | Adverb, comparative |
  • RBS | Adverb, superlative |
  • RP | Particle |
  • SYM | Symbol |
  • TO | to |
  • UH | Interjection |
  • VB | Verb, base form |
  • VBD | Verb, past tense |
  • VBG | Verb, gerund or present participle |
  • VBN | Verb, past participle |
  • VBP | Verb, non-3rd person singular present |
  • VBZ | Verb, 3rd person singular present |
  • WDT | Wh-determiner |
  • WP | Wh-pronoun |
  • WP$ | Possessive wh-pronoun |
  • WRB | Wh-adverb |

Answer by phanindravarma

You can download the list here: ftp://ftp.cis.upenn.edu/pub/treebank/doc/tagguide.ps.gz. It covers confusing parts of speech, capitalization, and other conventions. Also, Wikipedia has an interesting section similar to this (section: Part-of-speech tags used).

Answer by Sumit Pokhrel

Just run this verbatim.

import nltk
nltk.download('tagsets')
nltk.help.upenn_tagset()

nltk.tag._POS_TAGGER won't work. It will give AttributeError: module 'nltk.tag' has no attribute '_POS_TAGGER'. It's not available in NLTK 3 anymore.

Answer by little_thumb

['LS', 'TO', 'VBN', "''", 'WP', 'UH', 'VBG', 'JJ', 'VBZ', '--', 'VBP', 'NN', 'DT', 'PRP', ':', 'WP$', 'NNPS', 'PRP$', 'WDT', '(', ')', '.', ',', '``', '$', 'RB', 'RBR', 'RBS', 'VBD', 'IN', 'FW', 'RP', 'JJR', 'JJS', 'PDT', 'MD', 'VB', 'WRB', 'NNP', 'EX', 'NNS', 'SYM', 'CC', 'CD', 'POS']

Based on Doug Shore's method, but made more copy-paste friendly.
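The pasted list mixes word-class tags with punctuation and symbol tags such as '--' and '$'. As a small sketch, the two groups can be separated by checking whether a tag starts with a letter. That rule is an assumption that happens to fit this particular list, not something NLTK defines, and the variable names are illustrative.

```python
# The tag list exactly as pasted in the answer above.
tags = ['LS', 'TO', 'VBN', "''", 'WP', 'UH', 'VBG', 'JJ', 'VBZ', '--',
        'VBP', 'NN', 'DT', 'PRP', ':', 'WP$', 'NNPS', 'PRP$', 'WDT', '(',
        ')', '.', ',', '``', '$', 'RB', 'RBR', 'RBS', 'VBD', 'IN', 'FW',
        'RP', 'JJR', 'JJS', 'PDT', 'MD', 'VB', 'WRB', 'NNP', 'EX', 'NNS',
        'SYM', 'CC', 'CD', 'POS']

# Assumption: word-class tags begin with a letter (PRP$ and WP$ included),
# while punctuation and symbol tags do not.
word_tags = sorted(t for t in tags if t[0].isalpha())
punct_tags = [t for t in tags if not t[0].isalpha()]

print(len(tags), len(word_tags), len(punct_tags))
print(word_tags)
print(punct_tags)
```

The word-class group lines up with the CC through WRB reference list quoted in mdubez's answer.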
