Note: this page reproduces a popular Stack Overflow question under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow URL: http://stackoverflow.com/questions/30478736/

Can't use a string pattern on a bytes-like object - python's re error

Tags: python, regex

Asked by Matchbox2093

I'm doing the Python Challenge and trying to familiarize myself with Python, so without looking at the answers, I tried using Python's URL reader to read the HTML and then find the letters needed. However, the code below gives me an error. Originally the error was about Python 3's urllib.request, but after resolving that I get a new error:

<module>
    print ("".join(re.findall("[A-Za-z]", data)))
  File "C:\Python34\lib\re.py", line 210, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object

I tried looking this error up on Google, but all I found was about JSON, which I shouldn't need. My Python isn't that strong, so maybe I am doing this incorrectly?

#Question 2 - find rare characters

import re
import urllib.request

data = urllib.request.urlopen("http://www.pythonchallenge.com/pc/def/ocr.html")
mess = data.read()
messarr = mess.split("--")

print ("".join(re.findall("[A-Za-z]", data)))

#Question 3 - Find characters in list

page = urllib.request.urlopen("http://www.pythonchallenge.com/pc/def/equality.html")
mess = page.read()
messarr = mess.split("--")
print ("".join(re.findall("[^A-Z]+[A-Z]{3}([a-z])[A-Z]{3}[^A-Z]+", page)))

Accepted answer by wouter bolsterlee

The problem is that you're mixing bytes and text strings. You should either decode your data into a text string (unicode), e.g. data.decode('utf-8'), or use a bytes object for the pattern, e.g. re.findall(b"[A-Za-z]") (note the leading b before the string literal).
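
To make the two options concrete, here is a minimal sketch. It assumes the page body has already been read into a bytes object, which is what urllib.request.urlopen(...).read() returns in Python 3. The sample bytes literal and the variable names letters_from_text and letters_from_bytes are made up for illustration, and UTF-8 is an assumed encoding.

```python
import re

# Stand-in for urllib.request.urlopen(url).read(); in Python 3 that call
# returns bytes, not str. This sample content is invented for the example.
mess = b"%%$@e_$^q__#u)^a)&l!i_t+y]"

# Mixing a str pattern with bytes data raises TypeError.
try:
    re.findall("[A-Za-z]", mess)
    mixed_types_failed = False
except TypeError:
    mixed_types_failed = True

# Option 1: decode the bytes to text, then use an ordinary str pattern.
text = mess.decode("utf-8")  # assumes the data is UTF-8 encoded
letters_from_text = "".join(re.findall("[A-Za-z]", text))

# Option 2: keep the data as bytes and use a bytes pattern (note the b prefix).
# The matches are bytes too, so they are joined with a bytes separator.
letters_from_bytes = b"".join(re.findall(b"[A-Za-z]", mess))

print(letters_from_text)
print(letters_from_bytes)
```

The same rule applies to other calls in the question: mess.split("--") also fails on bytes, so either decode first or pass b"--". Note too that the regex has to be applied to the bytes returned by read() (or to their decoded text), not to the response object returned by urlopen().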
