Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and authors, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/38392407/

Time: 2020-08-19 20:43:26  Source: igfitidea

TypeError: must be unicode, not str in NLTK

python, nltk, crf

Asked by backtrack

I am using Python 2.7, NLTK 3.2.1 and python-crfsuite 0.8.4. I am following this page: http://www.nltk.org/api/nltk.tag.html?highlight=stanford#nltk.tag.stanford.NERTagger for the nltk.tag.crf module.

To start with, I just ran this:

from nltk.tag import CRFTagger
ct = CRFTagger()
train_data = [[('dfd','dfd')]]
ct.train(train_data,"abc")

I also tried passing an open file object instead of a file name:

f = open("abc","wb")
ct.train(train_data,f)

Either way, I get the following error:

  File "C:\Python27\lib\site-packages\nltk\tag\crf.py", line 129, in <genexpr>
    if all (unicodedata.category(x) in punc_cat for x in token):
TypeError: must be unicode, not str

Answered by tripleee

In Python 2, regular quotes '...' or "..." create byte strings. To get Unicode strings, use a u prefix before the string, like u'dfd'.
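
As a minimal sketch of that fix, the question's training data can be rewritten with u-prefixed literals. The snippet below does not call NLTK; it only repeats the unicodedata.category() check shown in the traceback, on the assumption that this is the call that rejected the byte strings. The variable names token and categories are illustrative.

```python
# -*- coding: utf-8 -*-
import unicodedata

# Training data from the question, with every literal marked as Unicode.
train_data = [[(u'dfd', u'dfd')]]

# The failing line in nltk/tag/crf.py calls unicodedata.category() on each
# character of a token. In Python 2 that function accepts only Unicode
# characters, which is why a plain 'dfd' raised the TypeError.
token = train_data[0][0][0]
categories = [unicodedata.category(ch) for ch in token]
print(categories)
```

With many literals in one module, `from __future__ import unicode_literals` at the top of the file is another way to make plain quotes produce Unicode strings in Python 2.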

To read from a file, you'll want to specify an encoding. See "Backporting Python 3 open(encoding="utf-8") to Python 2" for options; most straightforwardly, replace open() with io.open().
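
A short sketch of the io.open() route, assuming the input is UTF-8 text with whitespace-separated tokens. The file name tokens.txt and the temporary directory are hypothetical and exist only to make the example self-contained.

```python
import io
import os
import tempfile

# Hypothetical input file, created here only so the example can run.
path = os.path.join(tempfile.mkdtemp(), "tokens.txt")

# io.open() in text mode works with Unicode strings on Python 2 and 3.
with io.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"dfd dfd\n")

# Reading with an explicit encoding returns Unicode strings, not bytes.
with io.open(path, "r", encoding="utf-8") as handle:
    tokens = handle.read().split()

print(tokens)
```

Every element of tokens is then a Unicode string, so it can be used to build training tuples without further conversion.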

To convert an existing string, use the built-in unicode() function; though usually, you'll want to use decode() and supply an encoding, too.
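
For illustration, here is a small sketch of decoding an existing byte string. It assumes the bytes are UTF-8 encoded; the sample value is made up for the example.

```python
# -*- coding: utf-8 -*-

# A byte string such as one read from a binary file or a socket.
raw = b'caf\xc3\xa9'

# decode() with an explicit encoding turns the bytes into a Unicode string.
text = raw.decode('utf-8')

# Python 2 only: unicode(raw, 'utf-8') gives the same result, while
# unicode(raw) without an encoding assumes ASCII and fails on these bytes.
print(len(raw), len(text))
```

The byte string has five bytes, but the decoded text has four characters, because the final character is encoded as two bytes in UTF-8.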

For (much) more detail, Ned Batchelder's "Pragmatic Unicode" slides are recommended, if not outright obligatory, reading: http://nedbatchelder.com/text/unipain.html
