Note: This page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must do so under the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original StackOverflow question: http://stackoverflow.com/questions/1951843/

Date: 2020-09-08 07:33:27  Source: igfitidea

Row count of a column family in Cassandra

Tags: database, count, cassandra, rowcount

Asked by Henri Liljeroos

Is there a way to get a row count (key count) of a single column family in Cassandra? get_count can only be used to get the column count.

For instance, if I have a column family containing users and want to get the number of users, how could I do it? Each user is its own row.

Answered by Justin DeMaris

If you are working on a large data set and are okay with a pretty good approximation, I highly recommend using the command:

nodetool --host <hostname> cfstats

This will dump out a list for each column family looking like this:

Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Space used (total): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
Memtable Data Size: 150297312
Memtable Switch Count: 434
Read Count: 9716802
Read Latency: 0.036 ms.
Write Count: 9716806
Write Latency: 0.024 ms.
Pending Tasks: 0
Bloom Filter False Postives: 10428
Bloom Filter False Ratio: 1.00000
Bloom Filter Space Used: 18216448
Compacted row minimum size: 771
Compacted row maximum size: 263210
Compacted row mean size: 1634

The "Number of Keys (estimate)" row is a good guess across the cluster and the performance is a lot faster than explicit count approaches.
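
To use the estimate from a script, the dump can be parsed line by line. The following is a minimal sketch, assuming the output keeps the "Column Family:" and "Number of Keys (estimate):" labels shown above; the function name estimated_keys is made up for this example, and the sample text is an abbreviated copy of the output above.

```python
import re

# Abbreviated copy of the cfstats output shown above.
SAMPLE = """\
Column Family: widgets
SSTable count: 11
Space used (live): 4295810363
Number of Keys (estimate): 9709824
Memtable Columns Count: 99008
"""


def estimated_keys(cfstats_text):
    """Map each column family name to its 'Number of Keys (estimate)' value."""
    estimates = {}
    current = None  # name of the column family whose block is being read
    for raw_line in cfstats_text.splitlines():
        line = raw_line.strip()
        name = re.match(r"Column Family: (\S+)", line)
        if name:
            current = name.group(1)
            continue
        keys = re.match(r"Number of Keys \(estimate\): (\d+)", line)
        if keys and current is not None:
            estimates[current] = int(keys.group(1))
    return estimates


print(estimated_keys(SAMPLE))  # {'widgets': 9709824}
```

In practice the text could come from running the nodetool command above through subprocess.run(..., capture_output=True, text=True) and reading its stdout. Label names may differ between Cassandra versions, so the two patterns may need adjusting.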

Answered by ajjain

I found an excellent article on this here: http://www.planetcassandra.org/blog/post/counting-keys-in-cassandra

select count(*) from cf limit 1000000

The above statement can be used if an approximate upper bound is known beforehand. I found this useful for my case.
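
One thing to watch with this approach: if the limit caps how many rows are counted (an assumption here, suggested by the need for an upper bound), a result equal to the limit may be an undercount. A rough sketch of that check follows; both helper names are hypothetical, and no database driver is involved.

```python
def build_count_query(column_family, limit):
    """Build the count statement shown above for a given column family and limit."""
    return f"select count(*) from {column_family} limit {limit}"


def interpret_limited_count(count, limit):
    """Return (count, is_exact) for the result of a count(*) query run with a limit.

    Assumption: the limit caps the number of rows counted, so a result
    that reaches the limit cannot be trusted as the full count.
    """
    if count > limit:
        raise ValueError("count cannot exceed the limit under this assumption")
    return count, count < limit


print(build_count_query("cf", 1000000))
print(interpret_limited_count(734211, 1000000))   # below the limit: exact
print(interpret_limited_count(1000000, 1000000))  # hit the limit: raise it and rerun
```

The count 734211 is an invented value for illustration only.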

Answered by jbellis

If you are using an order-preserving partitioner, you can do this with get_range_slice or get_key_range.

If you are not, you will need to store your user ids in a special row.
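
With ordered keys, the count can be built up by paging through the key range. The sketch below shows only the paging logic: fetch_keys is a hypothetical in-memory stand-in for a range call such as get_range_slice (it is not a real client API), and the key set is invented. It assumes the start key of each range is inclusive, so every page after the first repeats one key.

```python
from bisect import bisect_left

# Invented data standing in for a column family read in key order.
SORTED_KEYS = sorted(f"user{i:04d}" for i in range(2500))


def fetch_keys(start_key, limit):
    """Hypothetical helper: up to `limit` keys >= start_key, in key order."""
    begin = bisect_left(SORTED_KEYS, start_key)
    return SORTED_KEYS[begin:begin + limit]


def count_rows(page_size=1000):
    """Count keys page by page, restarting each page at the last key seen."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    total = 0
    start = ""        # the empty string sorts before every key
    last_seen = None
    while True:
        page = fetch_keys(start, page_size)
        # The start key is inclusive, so drop the key already counted.
        if page and page[0] == last_seen:
            page = page[1:]
        if not page:
            return total
        total += len(page)
        last_seen = start = page[-1]


print(count_rows())  # 2500
```

This still reads every key once, so it is only practical when an exact count is worth a full scan.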

Answered by Ben Burns

[Edit: This answer is out of date as of Cassandra 0.8.1 -- please see the Counters entry in the Cassandra Wiki for the correct way to handle Counter Columns in Cassandra.]

I'm new to Cassandra, but I have messed around a lot with Google's App Engine. If no other solution presents itself, you may consider keeping a separate counter in a platform that supports atomic increment operations like memcached. I know that Cassandra is working on atomic counter increment/decrement functionality, but it's not yet ready for prime time.

I can only post one hyperlink because I'm new, so for progress on counter support see the link in my comment below.

Note that this thread suggests ZooKeeper, memcached, and redis as possible solutions. My personal preference would be memcached.

http://www.mail-archive.com/[email protected]/msg03965.html
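
The separate-counter idea can be sketched as follows. Everything here is an in-memory stand-in: RowCounter imitates a store with atomic increment and decrement (it is not the memcached client API), and the users dict is a hypothetical placeholder for the column family. The counter moves only when a row is created or removed.

```python
import threading


class RowCounter:
    """In-memory stand-in for an external store with atomic incr/decr."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def incr(self, delta=1):
        with self._lock:
            self._value += delta
            return self._value

    def decr(self, delta=1):
        with self._lock:
            # Clamped at zero; a design choice made for this sketch.
            self._value = max(0, self._value - delta)
            return self._value

    def value(self):
        with self._lock:
            return self._value


user_count = RowCounter()
users = {}  # hypothetical placeholder for the users column family


def add_user(user_id, data):
    """Write a user row and bump the counter only if the row is new."""
    is_new = user_id not in users
    users[user_id] = data
    if is_new:
        user_count.incr()


def remove_user(user_id):
    """Delete a user row and lower the counter only if the row existed."""
    if users.pop(user_id, None) is not None:
        user_count.decr()


add_user("alice", {"name": "Alice"})
add_user("bob", {"name": "Bob"})
add_user("alice", {"name": "Alice B."})  # overwrite, not a new row
remove_user("bob")
print(user_count.value())  # 1
```

The existence check followed by a write is not atomic in this sketch, so concurrent writers could make the counter drift. Only the increment and decrement themselves are atomic.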

Answered by Dean Hiller

There is always map/reduce, but that probably goes without saying. If you have that with Hive or Pig, you can do it for any table across the cluster. I am not sure, though, that tasktrackers know about Cassandra locality, so it may have to stream the whole table across the network: you get task trackers on Cassandra nodes, but the data they receive may come from another Cassandra node :(. I would love to hear if anyone knows for sure.

NOTE: We are setting up map/reduce on cassandra mainly because if we want an index later, we can map/reduce one into cassandra.

Answered by Philip Schlump

I have been getting the counts like this after I convert the data into a hash in PHP.
