How do I .decode('string-escape') in Python 3?

Question

Asked by vy32

I have some escaped strings that need to be unescaped. I'd like to do this in Python.

For example, in python2.7 I can do this:

>>> "\\123omething special".decode('string-escape')
'Something special'
>>>

How do I do it in Python3? This doesn't work:

>>> b"\\123omething special".decode('string-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: string-escape
>>>

My goal is to be able to take a string like this:

s\000u\000p\000p\000o\000r\000t\000@\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000

And turn it into:

"support@psiloc.com"

After I do the conversion, I'll probe to see if the string I have is encoded in UTF-8 or UTF-16.
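
The probing step mentioned above could look roughly like the following. This is only a heuristic sketch: the function name `guess_decode` and the NUL-counting threshold are assumptions made for illustration, and the check can misclassify short or non-Latin input.

```python
import codecs

def guess_decode(raw: bytes) -> str:
    """Heuristically decode `raw` as UTF-16 or UTF-8 (illustrative only)."""
    # A byte-order mark, when present, settles the question.
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode('utf-16')
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode('utf-8-sig')
    # Mostly-ASCII UTF-16 text has a NUL in every other byte.
    if raw and len(raw) % 2 == 0:
        if raw[1::2].count(0) > len(raw) // 4:
            return raw.decode('utf-16-le')
        if raw[0::2].count(0) > len(raw) // 4:
            return raw.decode('utf-16-be')
    # Assumption: anything else is treated as UTF-8.
    return raw.decode('utf-8')

print(guess_decode(b's\x00u\x00p\x00'))  # sup
print(guess_decode(b'sup'))              # sup
```

A dedicated detection library would be more robust; the point here is only that NUL bytes in alternating positions are a strong hint of UTF-16.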

Answer 1

Accepted answer by MestreLion

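If you want str-to-str decoding of escape sequences, so that both input and output are Unicode:

def string_escape(s, encoding='utf-8'):
    return (s.encode('latin1')         # To bytes, required by 'unicode-escape'
             .decode('unicode-escape') # Perform the actual octal-escaping decode
             .encode('latin1')         # 1:1 mapping back to bytes
             .decode(encoding))        # Decode original encoding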

Testing:

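>>> string_escape('\\123omething special')
'Something special'

>>> string_escape(r's\000u\000p\000p\000o\000r\000t\000@'
                  r'\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000',
                  'utf-16-le')
'support@psiloc.com'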

Answer 2

Answer by Martijn Pieters

You'll have to use unicode_escape instead:

>>> b"\\123omething special".decode('unicode_escape')
'Something special'

If you start with a str object instead (equivalent to the Python 2.7 unicode type), you'll need to encode to bytes first, then decode with unicode_escape.

If you need bytes as the end result, you'll have to encode again to a suitable encoding (.encode('latin1'), for example, if you need to preserve literal byte values; the first 256 Unicode code points map one-to-one).
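
As a rough sketch of that str-to-bytes chain, the three steps can be wrapped in one helper. The name `unescape_str_to_bytes` is hypothetical, and the sketch assumes the input contains only code points below 256 and that every escape resolves to a code point below 256; anything else raises UnicodeEncodeError.

```python
def unescape_str_to_bytes(text: str) -> bytes:
    """Interpret backslash escapes in `text` and return the raw byte values."""
    # Step 1: str -> bytes (latin1 keeps code points 0-255 unchanged).
    raw = text.encode('latin1')
    # Step 2: interpret escapes such as \123 or \xc3; the result is a str.
    decoded = raw.decode('unicode_escape')
    # Step 3: str -> bytes again, one byte per code point 0-255.
    return decoded.encode('latin1')

print(unescape_str_to_bytes(r'\123omething special'))  # b'Something special'
print(unescape_str_to_bytes(r'caf\xc3\xa9'))           # b'caf\xc3\xa9'
```

The returned bytes can then be decoded with whatever encoding the data really uses, for example `.decode('utf-8')` for the second call.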

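Your example is actually UTF-16 data with escapes. Decode with unicode_escape, encode back to latin1 to preserve the bytes, then decode as utf-16-le (UTF-16 little endian without a BOM):

>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> value.decode('unicode_escape').encode('latin1')  # convert to bytes
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'
>>> _.decode('utf-16-le') # decode from UTF-16-LE
'support@psiloc.com'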

Answer 3

Answer by malthe

You can't use unicode_escape on byte strings (or rather, you can, but it doesn't always return the same thing as string_escape does on Python 2), so beware!
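
A small illustration of where the two can diverge, written as a sketch rather than an exhaustive comparison: unicode_escape reads every input byte as Latin-1 and also expands \u escapes, whereas string_escape worked purely on bytes.

```python
# UTF-8 bytes for U+00E9 (e with acute accent).
utf8_bytes = '\u00e9'.encode('utf-8')          # b'\xc3\xa9'

# Each byte becomes its own Latin-1 character, so the result has two
# code points instead of one.
as_text = utf8_bytes.decode('unicode_escape')
print(ascii(as_text))                          # '\xc3\xa9'

# Encoding back with latin1 recovers the original bytes.
assert as_text.encode('latin1') == utf8_bytes

# A \u escape, however, yields a code point latin1 cannot represent.
euro = rb'\u20ac'.decode('unicode_escape')     # U+20AC, the euro sign
try:
    euro.encode('latin1')
except UnicodeEncodeError as exc:
    print('cannot map back to a single byte:', exc.reason)
```

So the unicode_escape plus latin1 round trip is only byte-preserving when the escaped data contains no \u, \U or \N escapes.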

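This function implements string_escape using a regular expression and custom replacement logic.

import re

def unescape(text):
    # Match a backslash followed by one escape body (or end of input).
    regex = re.compile(b'\\\\(\\\\|[0-7]{1,3}|x.[0-9a-f]?|[\'"abfnrt]|.|$)')
    def replace(m):
        b = m.group(1)
        if len(b) == 0:
            raise ValueError("Invalid character escape: '\\'.")
        i = b[0]
        if i == 120:                 # 'x': hexadecimal escape
            v = int(b[1:], 16)
        elif 48 <= i <= 55:          # '0'-'7': octal escape
            v = int(b, 8)
        elif i == 34: return b'"'
        elif i == 39: return b"'"
        elif i == 92: return b'\\'
        elif i == 97: return b'\a'
        elif i == 98: return b'\b'
        elif i == 102: return b'\f'
        elif i == 110: return b'\n'
        elif i == 114: return b'\r'
        elif i == 116: return b'\t'
        else:
            s = b.decode('ascii')
            raise UnicodeDecodeError(
                'stringescape', text, m.start(), m.end(), "Invalid escape: %r" % s
            )
        return bytes((v, ))
    result = regex.sub(replace, text)
    return result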

Answer 4

Answer by Nathaniel J. Smith

The old "string-escape" codec maps bytestrings to bytestrings, and there's been a lot of debate about what to do with such codecs, so it isn't currently available through the standard encode/decode interfaces.

BUT, the code is still there in the C-API (as PyBytes_En/DecodeEscape), and this is still exposed to Python via the undocumented codecs.escape_encode and codecs.escape_decode.

>>> import codecs
>>> codecs.escape_decode(b"ab\\xff")
(b'ab\xff', 6)
>>> codecs.escape_encode(b"ab\xff")
(b'ab\\xff', 3)

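These functions return the transformed bytes object, plus a number indicating how many bytes were processed; you can just ignore the latter.

>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> codecs.escape_decode(value)[0]
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'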
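
If the count is never needed, the call can be hidden behind a small helper. This is a minimal sketch; the name `string_escape_decode` is hypothetical, and because codecs.escape_decode is undocumented its behavior could change between Python versions.

```python
import codecs

def string_escape_decode(data: bytes) -> bytes:
    """Bytes-to-bytes unescape that drops the consumed-length count."""
    # escape_decode returns a (bytes, length) tuple; keep only the bytes.
    unescaped, _consumed = codecs.escape_decode(data)
    return unescaped

raw = string_escape_decode(rb's\000u\000p\000')
print(raw)                      # b's\x00u\x00p\x00'
print(raw.decode('utf-16-le'))  # sup
```

Keeping the call in one place also makes it easier to swap in another implementation if the undocumented function ever goes away.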

Answer 5

Answer by guettli

At least in my case this was equivalent:

Py2: my_input.decode('string_escape')
Py3: bytes(my_input.decode('unicode_escape'), 'latin1')

convertutils.py:

def string_escape(my_bytes):
    return bytes(my_bytes.decode('unicode_escape'), 'latin1')
