JavaScript: Compare JSON and BSON
Notice: this page is a translated copy of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me): StackOverflow
Original source: http://stackoverflow.com/questions/12601890/
Compare JSON and BSON
Asked by Ronald
I am comparing JSON and BSON for serializing objects. These objects contain several arrays with a large number of integers. In my test, the object I am serializing contains about 12,000 integers in total. I am only interested in how the sizes of the serialized results compare. I am using JSON.NET as the serialization library. I am using JSON because I also want to be able to work with the data in JavaScript.
The JSON string is about 43 KB and the BSON result is 161 KB, a difference of about a factor of 4. This is not what I expected: I looked at BSON because I thought it would store data more efficiently.
So my question is: why is BSON not efficient here, and can it be made more efficient? Or is there another way of serializing data with arrays containing a large number of integers that can be easily handled in JavaScript?
Below you find the code to test the JSON/BSON serialization.
using System.IO;
using Newtonsoft.Json;
using Newtonsoft.Json.Bson;

// Read the file which contains the JSON string (ReadFile is the asker's own helper)
string _jsonString = ReadFile();
object _object = JsonConvert.DeserializeObject(_jsonString);
// File.Create truncates an existing file, so stale bytes cannot inflate the measured size
using (FileStream _fs = File.Create("BsonFileName"))
using (BsonWriter _bsonWriter = new BsonWriter(_fs) { CloseOutput = false })
{
    JsonSerializer _jsonSerializer = new JsonSerializer();
    _jsonSerializer.Serialize(_bsonWriter, _object);
    _bsonWriter.Flush();
}
Edit:
Here are the resulting files https://skydrive.live.com/redir?resid=9A6F31F60861DD2C!362&authkey=!AKU-ZZp8C_0gcR0
Answered by saml
Whether JSON or BSON is more compact depends on the size of the integers you're storing. There is a crossover point where ASCII text takes fewer bytes than a fixed-width integer type. Your BSON document appears to store each value as a 64-bit integer, which takes 8 bytes. Your numbers are all less than 10,000, so each one fits in at most 4 bytes of ASCII (one byte per digit, up through 9999). In fact, most of your data looks like it's less than 1000, meaning it can be stored in 3 or fewer bytes. Of course, parsing that text back into numbers takes time and isn't cheap, but it saves space. Furthermore, JavaScript represents all numbers as 64-bit values, so writing them to BSON unchanged costs 8 bytes each; if you converted each integer to a more appropriate data format (for example a 32-bit integer) before writing, your BSON file could be much smaller.
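To make the byte counting concrete, here is a rough Python sketch that hand-encodes an integer array the way the BSON spec lays out an array (each element is a type byte, its index as a NUL-terminated key, then the fixed-width value) and compares it with compact JSON text. The sample data and the `encode_int_array` helper are made up for illustration; this is not the asker's file and not what JSON.NET does internally, and the enclosing document would add a few more bytes.

```python
import json
import struct


def encode_int_array(values, wide=True):
    """Hand-rolled BSON array document: int64 elements if wide, else int32."""
    fmt, type_byte = ("<q", b"\x12") if wide else ("<i", b"\x10")
    body = b"".join(
        # type byte + index as NUL-terminated key + fixed-width little-endian value
        type_byte + str(index).encode("ascii") + b"\x00" + struct.pack(fmt, value)
        for index, value in enumerate(values)
    )
    # int32 total length (4 bytes for itself + body + 1 terminator), body, NUL terminator
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


# Hypothetical stand-in for the question's data: 12,000 integers below 1000.
sample = [i % 1000 for i in range(12000)]

json_size = len(json.dumps(sample, separators=(",", ":")))
bson64_size = len(encode_int_array(sample, wide=True))
bson32_size = len(encode_int_array(sample, wide=False))

print("JSON bytes:      ", json_size)
print("BSON int64 bytes:", bson64_size)
print("BSON int32 bytes:", bson32_size)
```

With this sample the 64-bit encoding comes out several times larger than the JSON text, and the 32-bit one is still larger than JSON. Note that the per-element key (the array index written as text) costs about as much as the digits JSON would have needed, on top of the fixed-width value.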
According to the spec, BSON contains a lot of metadata that JSON doesn't. This metadata is mostly length prefixes so that you can skip through data you aren't interested in. For example, take the following data:
["hello there, this is an necessarily long string. It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
"oh man. here's another string you still don't care about. You really just want the third element in the array. How long are the first two elements? JSON won't tell you",
"data_you_care_about"]
Now, if you're using JSON, you have to parse the entirety of the first two strings to find out where the third one is. If you use BSON, you'll get something more like the following (not the actual encoding; I'm making this markup up for the sake of example):
[175 "hello there, this is an necessarily long string. It's especially long, but you don't care about it. You're just trying to get to the next element. But I keep going on and on.",
169 "oh man. here's another string you still don't care about. You really just want the third element in the array. How long are the first two elements? JSON won't tell you",
19 "data_you_care_about"]
So now, you can read '175', know to skip forward 175 bytes, then read '169', skip forward 169 bytes, and then read '19' and copy the next 19 bytes to your string. That way you don't even have to parse the strings for delimiters.
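The skipping idea can be sketched in a few lines of Python. This uses a simplified, invented layout (a 4-byte little-endian length followed by the UTF-8 bytes of each string), in the same spirit as the made-up markup above; it is not the real BSON encoding, and `pack_strings` and `read_nth` are hypothetical helper names.

```python
import struct


def pack_strings(strings):
    """Write each string as a 4-byte length prefix followed by its UTF-8 bytes."""
    out = bytearray()
    for text in strings:
        data = text.encode("utf-8")
        out += struct.pack("<I", len(data)) + data
    return bytes(out)


def read_nth(buffer, n):
    """Return the n-th string (0-based) without scanning the earlier payloads."""
    offset = 0
    for _ in range(n):
        (length,) = struct.unpack_from("<I", buffer, offset)
        offset += 4 + length  # jump over the prefix and the payload it describes
    (length,) = struct.unpack_from("<I", buffer, offset)
    start = offset + 4
    return buffer[start:start + length].decode("utf-8")


packed = pack_strings([
    "a long string you don't care about",
    "another string you still don't care about",
    "data_you_care_about",
])
print(read_nth(packed, 2))
```

Reaching the third element costs two length reads and two jumps, regardless of how long the first two strings are. The price is the extra 4 bytes per element, which is the same space-for-speed trade the answer describes.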
Using one versus the other is very dependent on what your needs are. If you're going to be storing enormous documents that you've got all the time in the world to parse, but your disk space is limited, use JSON because it's more compact and space efficient. If you're going to be storing documents, but reducing wait time (perhaps in a server context) is more important to you than saving some disk space, use BSON.
Another thing to consider in your choice is human readability. If you need to debug a crash report that contains BSON, you'll probably need a utility to decode it. You probably can't read BSON at a glance, but you can read JSON directly.