Original question: http://stackoverflow.com/questions/31339150/
Warning: these are provided under the CC BY-SA 4.0 license. You are free to use/share them, but you must attribute them to the original authors (not me):
StackOverFlow
What are the pros or cons of storing json as text vs blob in cassandra?
Asked by pinkpanther
One problem with blob for me is that, in Java, `ByteBuffer` (which maps to `blob` in Cassandra) is not `Serializable` and hence does not work well with EJBs.
Considering the JSON is fairly large, which is the better type for storing JSON in Cassandra: text or blob?
Does the size of the JSON matter when deciding between blob and text?
If it were any other database, like Oracle, it would be common to use blob/clob. But in Cassandra, where each cell can hold as much as 2 GB, does it matter?
Please treat this question as a choice between text and blob for this case, rather than resorting to suggestions about whether to use a single column for the JSON.
Answered by aroth
I don't think there's any benefit to storing the literal JSON data as a `blob` in Cassandra. At best your storage costs are identical, and in general the APIs are less convenient for working with `blob` types than they are for working with strings/text.
For instance, if you're using their Java API, then in order to store the data as a `blob` using a parameterized `PreparedStatement`, you first need to load it all into a `ByteBuffer`, for instance by packing your JSON data into an `InputStream`.
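To illustrate that packing step, here is a minimal JDK-only sketch of the round trip between a JSON string and the `ByteBuffer` a `blob` column expects; the actual `PreparedStatement` binding and any driver/session setup are omitted:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BlobPacking {
    // Wrap a JSON string into the ByteBuffer used for a blob column.
    static ByteBuffer toBlob(String json) {
        return ByteBuffer.wrap(json.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse step when reading the blob back out.
    static String fromBlob(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.duplicate().get(bytes); // duplicate() leaves the original buffer's position untouched
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

None of this buys you anything over binding the string directly to a text column; it just shows the extra hop through `ByteBuffer` that the `blob` route requires.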
Unless you're dealing with very large JSON snippets that force you to stream your data anyway, that's a fair bit of extra work to get access to the `blob` type. And what would you gain from it? Essentially nothing.
However, I think there's some merit in asking 'Should I store JSON as text, or gzip it and store the compressed data as a `blob`?'.
And the answer to that comes down to how you've configured Cassandra and your table. In particular, as long as you're using Cassandra version 1.1 or later your tables have compression enabled by default. That may be adequate, particularly if your JSON data is fairly uniform across each row.
However, Cassandra's built-in compression is applied table-wide, rather than to individual rows. So you may get a better compression ratio by manually compressing your JSON data before storage, writing the compressed bytes into a `ByteBuffer`, and then shipping the data into Cassandra as a `blob`.
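A minimal sketch of that manual-compression approach, using only `java.util.zip` from the JDK (the Cassandra write itself is omitted; `readAllBytes` assumes Java 9+):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class JsonGzip {
    // Compress a JSON string into the bytes you would ship as a blob.
    static byte[] compress(String json) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return out.toByteArray();
    }

    // Decompress on the way back out of Cassandra.
    static String decompress(byte[] data) throws IOException {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Whether this beats the table-wide compression depends on your data; for small, uniform rows the built-in compression often does just as well with none of the application-side bookkeeping.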
So it essentially comes down to a tradeoff in terms of storage space vs. programming convenience vs. CPU usage. I would decide the matter as follows:
1. Is minimizing the amount of storage consumed your biggest concern?
   - If yes, compress the JSON data and store the compressed bytes as a `blob`;
   - Otherwise, proceed to #2.
2. Is Cassandra's built-in compression available and enabled for your table?
   - If no (and if you can't enable the compression), compress the JSON data and store the compressed bytes as a `blob`;
   - Otherwise, proceed to #3.
3. Is the data you'll be storing relatively uniform across each row?
   - Probably for JSON data the answer is 'yes', in which case you should store the data as text and let Cassandra handle the compression;
   - Otherwise, proceed to #4.
4. Do you want efficiency, or convenience?
   - Efficiency: compress the JSON data and store the compressed bytes as a `blob`.
   - Convenience: compress the JSON data, base64 the compressed data, and then store the base64-encoded data as text.
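The 'convenience' option in #4 above can be sketched like this, again with only JDK classes; how you then bind the resulting string to a text column is left to your driver:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class JsonTextPacking {
    // Gzip the JSON, then Base64-encode it so it can live in a text column.
    static String packForTextColumn(String json) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }

    // Reverse: Base64-decode, then gunzip back to the original JSON.
    static String unpack(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gz.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Note the trade-off: Base64 inflates the compressed bytes by roughly a third, which is the price of keeping everything in a text column.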
Answered by Astrogat
Since the data is not binary, there is really little reason to use a Binary Large OBject. Sure, you can do it, but why? Text is easier for humans to read, and there isn't really a speed/size difference.
Even in other DBs you can often store JSON as text. E.g. even MySQL has text fields that can handle quite a bit of text (LONGTEXT = 4 GB). Yeah, Oracle is behind, but hopefully they will also get a reasonably long text field someday.
But why do you want to store a whole JSON object as text? The JSON should really be normalized and stored as multiple fields in the DB.
Answered by Jonathan
I would definitely say that text would be better than a blob for storing JSON. JSON is ultimately text, so this type makes sense, but there may also be extra overhead for blobs, as some of the drivers seem to require that they be converted to hex before inserting them. Also, blobs show up as base64-encoded strings when using cqlsh, so you wouldn't be able to easily check what JSON was actually stored if you needed to for testing purposes. I'm not sure exactly how blobs are stored on disk, but I'd imagine it's very similar to how text is.
With that said, storing large entries can cause problems and is not recommended. This can cause issues with sharding and consume a lot of memory. Although the FAQ refers to files over 64MB, from experience even files a few megabytes each on average can cause performance issues when you start storing a large number of them. If possible, it would be better to use an object store if you expect the JSON to be in the megabytes in size and store references to that store in Cassandra instead.
Answered by Fredrik LS
In the upcoming 2.2 release there is also native support in Cassandra for JSON. http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-2-json-support