str.translate gives TypeError - Translate takes one argument (2 given), worked in Python 2
Original question: http://stackoverflow.com/questions/23175809/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverflow
Asked by carebear
I have the following code
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
lmtzr = nltk.stem.wordnet.WordNetLemmatizer()
def sanitize(wordList):
    answer = [word.translate(None, string.punctuation) for word in wordList]
    answer = [lmtzr.lemmatize(word.lower()) for word in answer]
    return answer
words = []
for filename in json_list:
    words.extend([sanitize(nltk.word_tokenize(' '.join([tweet['text']
        for tweet in json.load(open(filename,READ))])))])
I've tested lines 2-4 in a separate testing.py file when I wrote
import nltk, os, json, csv, string, cPickle
from scipy.stats import scoreatpercentile
wordList= ['\'the', 'the', '"the']
print wordList
wordList2 = [word.translate(None, string.punctuation) for word in wordList]
print wordList2
answer = [lmtzr.lemmatize(word.lower()) for word in wordList2]
print answer
freq = nltk.FreqDist(wordList2)
print freq
and the command prompt returns ['the','the','the'], which is what I wanted (removing punctuation).
However, when I put the exact same code in a different file, Python raises a TypeError stating that
  File "foo.py", line 8, in <module>
    for tweet in json.load(open(filename, READ))])))])
  File "foo.py", line 2, in sanitize
    answer = [word.translate(None, string.punctuation) for word in wordList]
TypeError: translate() takes exactly one argument (2 given)
json_list is a list of all the file paths (I printed it and checked that the list is valid). I'm confused by this TypeError because everything works perfectly fine when I test the same code in a different file.
Answered by Blckknght
I suspect your issue has to do with the differences between str.translate and unicode.translate (these are also the differences between str.translate on Python 2 versus Python 3). I suspect your original code is being sent unicode instances while your test code is using regular 8-bit str instances.
I don't suggest converting Unicode strings back to regular str instances, since unicode is a much better type for handling text data (and it is the future!). Instead, you should just adapt to the new unicode.translate syntax. With regular str.translate (on Python 2), you can pass an optional deletechars argument, and the characters in it are removed from the string. For unicode.translate (and str.translate on Python 3), the extra argument is no longer allowed, but any character whose translation table entry has None as its value is deleted from the output.
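To make the difference concrete, here is a minimal sketch, assuming Python 3 (where str behaves like Python 2's unicode in this respect). The sample word is made up for illustration, and the exact wording of the error message may vary between Python versions.

```python
import string

word = '"the'  # sample token, chosen only for illustration

# The Python 2 byte-string form (table, deletechars) is rejected here.
try:
    word.translate(None, string.punctuation)
except TypeError as exc:
    error_message = str(exc)
else:
    error_message = None

# A table entry that maps an ordinal to None deletes that character instead.
table = {ord('"'): None}
cleaned = word.translate(table)

print(error_message)
print(cleaned)  # the
```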
To solve the problem you'll need to create an appropriate translation table. A translation table is a dictionary mapping from Unicode ordinals (that is, ints) to ordinals, strings or None. A helper function for making them exists in Python 2 as string.maketrans (and in Python 3 as a static method of the str type), but the Python 2 version of it doesn't handle the case we care about (putting None values into the table). You can build an appropriate dictionary yourself with something like {ord(c): None for c in string.punctuation}.
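Applied to the punctuation-stripping step from the question, that dictionary could be used roughly as follows. This is only a sketch: it assumes Python 3 strings (or unicode objects on Python 2.7), the helper name strip_punctuation is hypothetical, and the lemmatization step is left out so the example stays self-contained.

```python
import string

# Map every ASCII punctuation code point to None so translate() drops it.
PUNCT_TABLE = {ord(c): None for c in string.punctuation}

def strip_punctuation(word_list):
    """Return the words with ASCII punctuation removed (hypothetical helper)."""
    return [word.translate(PUNCT_TABLE) for word in word_list]

word_list = ["'the", "the", '"the']
print(strip_punctuation(word_list))  # ['the', 'the', 'the']
```

Building the table once at module level avoids recreating it for every word.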
Answered by drchuck
If all you want is to do in Python 3 the same thing you were doing in Python 2, here is what I was doing in Python 2 to throw away punctuation and numbers:
text = text.translate(None, string.punctuation)
text = text.translate(None, '1234567890')
Here is my Python 3 equivalent:
text = text.translate(str.maketrans('','',string.punctuation))
text = text.translate(str.maketrans('','','1234567890'))
Basically it says 'translate nothing to nothing' (first two parameters) and translate any punctuation or numbers to None (i.e. remove them).
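The two calls above can also be folded into one table, so the string is scanned only once. The following is a small sketch assuming Python 3; the names REMOVE_TABLE and text, and the sample sentence, are illustrative only.

```python
import string

# One table that deletes both punctuation and digits in a single pass.
REMOVE_TABLE = str.maketrans('', '', string.punctuation + string.digits)

text = "Hello, world! It is 2014."
cleaned = text.translate(REMOVE_TABLE)
print(repr(cleaned))  # 'Hello world It is '
```

Note that whitespace is left untouched, so a trailing space remains where the digits and the final period were removed.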
Answered by ChuQuan
Python 3:
text = text.translate(str.maketrans('','','1234567890'))
static str.maketrans(x[, y[, z]])
This static method returns a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or None. Character keys will then be converted to ordinals.
If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
https://docs.python.org/3/library/stdtypes.html?highlight=maketrans#str.maketrans
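The three call forms described in the documentation excerpt can be compared side by side. This is a minimal sketch assuming Python 3; the variable names and the sample characters are illustrative only.

```python
# One argument: a dict whose keys may be characters or ordinals.
one_arg = str.maketrans({'a': '1', 'b': None})

# Two arguments: equal-length strings, mapped position by position.
two_args = str.maketrans('ab', '12')

# Three arguments: every character of the third string is mapped to None.
three_args = str.maketrans('ab', '12', 'c')

print(one_arg)     # {97: '1', 98: None}
print(two_args)    # {97: 49, 98: 50}
print(three_args)  # {97: 49, 98: 50, 99: None}

print('abc'.translate(one_arg))     # 1c
print('abc'.translate(two_args))    # 12c
print('abc'.translate(three_args))  # 12
```

In every case the result is a plain dict keyed by ordinals, which is why a hand-built dictionary works equally well as a translation table.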
Answered by Preeti Duhan
This is how translate works:
yourstring.translate(str.maketrans(fromstr, tostr, deletestr))
This replaces each character in fromstr with the character at the same position in tostr and deletes all characters that are in deletestr. fromstr and tostr can be empty strings, and the deletestr parameter can be omitted.
example:
>>> s = "preetideepak12345aeiou"  # named s to avoid shadowing the built-in str
>>> s.translate(str.maketrans('abcde', '12345', 'p'))
output:
'r55ti4551k1234515iou'
here:
a is translated to 1
b is translated to 2
c is translated to 3 and so on
and p is deleted from the string.