Python: Get the size of a string in bytes

Question

Asked by Iffat Fatima

I have a string that is to be sent over a network. I need to check the total number of bytes it is represented in.

sys.getsizeof(string_name) returns extra bytes. For example, sys.getsizeof("a") returns 22, while one character is represented in only 1 byte in Python. Is there some other method to find this?

Answer 1

Accepted answer by Kris

If you want the number of bytes in a string, this function should do it for you pretty solidly.

def utf8len(s):
    return len(s.encode('utf-8'))
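
As a quick illustration of why the encoding step matters, here is a minimal sketch that compares character count with UTF-8 byte count. The sample strings are made up for this example; the function is the same utf8len as above with a docstring added.

```python
def utf8len(s):
    """Return the number of bytes s occupies when encoded as UTF-8."""
    return len(s.encode('utf-8'))

# Hypothetical sample strings: ASCII, accented Latin, a currency sign, CJK.
samples = ['a', 'ciao', 'é', '€', '你好']

for text in samples:
    # len(text) counts characters (code points); utf8len(text) counts bytes.
    print(repr(text), len(text), utf8len(text))
```

For pure ASCII text the two numbers match, but they diverge as soon as a character needs more than one byte in UTF-8, which is why len(s) alone is not a safe measure of what goes over the network.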

The reason you got weird numbers is that a string carries a bunch of other information besides its characters, because strings are actual objects in Python.

It's interesting, because the solution above calls an 'encode' method on the 's' object (which is a string). The methods themselves are stored once on the str type rather than in each string, but every string object still carries its own bookkeeping (in CPython: a reference count, a pointer to its type, the length, a cached hash and some flags). Hence the higher than expected byte count :).
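
To see the difference between in-memory size and encoded size, a small sketch like the following may help. The variable names are chosen for this example only, and the exact numbers from sys.getsizeof depend on the Python version and platform, so none are asserted here.

```python
import sys

text = 'a'

payload = len(text.encode('utf-8'))  # bytes that would actually be sent
in_memory = sys.getsizeof(text)      # bytes the str object occupies in this interpreter
empty_size = sys.getsizeof('')       # an empty string: object bookkeeping with no text

# The in-memory figures are interpreter-specific; only payload is portable.
print(payload, in_memory, empty_size)
```

Even the empty string reports a non-zero size, which suggests that most of what sys.getsizeof returns for a short string is per-object overhead rather than character data.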

Answer 2

Answer by sboby

There's a caveat to the accepted answer.

For some multi-byte encodings (e.g. utf-16), string.encode will add a Byte Order Mark (BOM) at the start, which is a sequence of special bytes that informs the reader of the byte endianness used. So the length you get is actually len(BOM) + len(encoded_word).

If you don't want to count the BOM bytes, you can use either the little-endian version of the encoding (adding the suffix "-le") or the big-endian version (adding the suffix "-be").

>>> len('ciao'.encode('utf-16'))
10
>>> len('ciao'.encode('utf-16-le'))
8
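
The two extra bytes can be checked directly against the BOM constants in the standard codecs module. This is a minimal sketch; the variable names are illustrative, and the utf-8-sig comparison is an addition not mentioned in the answer, included to show the same effect for UTF-8 with a signature.

```python
import codecs

word = 'ciao'

with_bom = word.encode('utf-16')        # native byte order, BOM prepended
without_bom = word.encode('utf-16-le')  # explicit byte order, no BOM

# codecs.BOM_UTF16 is the BOM for the machine's native byte order.
print(with_bom.startswith(codecs.BOM_UTF16))
print(len(with_bom) - len(without_bom), len(codecs.BOM_UTF16))

# 'utf-8-sig' behaves similarly: it prepends the 3-byte UTF-8 signature.
sig_extra = len(word.encode('utf-8-sig')) - len(word.encode('utf-8'))
print(sig_extra, len(codecs.BOM_UTF8))
```

Whichever variant is used for counting, it should be the same encoding that is actually used when the bytes are sent.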
