PHP: How can I get the total number of items in a DynamoDB table?

Notice: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). Original Stack Overflow question: http://stackoverflow.com/questions/12499822/

Time: 2020-08-25 03:41:21

php, amazon-dynamodb

Asked by Tom

I want to know how many items are in my DynamoDB table. From the API guide, one way to do it is using a scan as follows:

<?php
$dynamodb = new AmazonDynamoDB();

$scan_response = $dynamodb->scan(array(
    'TableName' => 'ProductCatalog' 
));

echo "Total number of items: ".count($scan_response->body->Items)."\n";

However, this has to fetch all items and store them in an array in memory which isn't feasible in most cases I would presume. Is there a way to get the total item count more efficiently?

I have already checked, and this data is not available in the AWS DynamoDB web console. (At first it looks like it is shown alongside the pagination buttons, but it turns out the figure grows as you go to the next page of items.)

Answered by Hanny

I can think of three options to get the total number of items in a DynamoDB table.

  1. The first option is using the scan, but the scan operation is inefficient and is in general a bad practice, especially for production tables or tables with heavy reads.

  2. The second option is what was mentioned by Atharva:

    A better solution that comes to my mind is to maintain the total number of item counts for such tables in a separate table, where each item will have the table name as its hash key and the total number of items in that table as its non-key attribute. You can then keep this table, possibly named "TotalNumberOfItemsPerTable", updated by making atomic update operations to increment/decrement the total item count for a particular table.

    The only problem with this is that increment operations are not idempotent. If a write fails or you write more than once, this will be reflected in the count. If you need pinpoint accuracy, use a conditional update instead.

  3. The simplest solution is DescribeTable, which returns ItemCount. The only issue is that the count isn't up to date: it is updated approximately every six hours.

http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DescribeTable.html
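
For the counter-table option, a conditional update could look roughly like the sketch below. It only builds the parameter array for DynamoDB's UpdateItem operation; the hash key attribute "TableName", the attribute "ItemCount", and the helper name buildConditionalCountUpdate are assumptions made for illustration, not something from the answer above. The idea is that the write only succeeds if the counter still holds the value you last read, so a retried write cannot be counted twice.

```php
<?php
// Sketch: UpdateItem parameters that move the counter from an expected value
// to a new one. If the stored value no longer equals :expected, DynamoDB
// rejects the write instead of applying it a second time.
// Assumed names: hash key "TableName", counter attribute "ItemCount".
function buildConditionalCountUpdate(string $countedTable, int $expected, int $delta): array
{
    return [
        'TableName' => 'TotalNumberOfItemsPerTable',
        'Key' => ['TableName' => ['S' => $countedTable]],
        'UpdateExpression' => 'SET #count = :new',
        'ConditionExpression' => '#count = :expected',
        'ExpressionAttributeNames' => ['#count' => 'ItemCount'],
        'ExpressionAttributeValues' => [
            // DynamoDB numbers are sent as strings under the "N" type.
            ':new'      => ['N' => (string) ($expected + $delta)],
            ':expected' => ['N' => (string) $expected],
        ],
    ];
}

$params = buildConditionalCountUpdate('ProductCatalog', 41, 1);
echo $params['ExpressionAttributeValues'][':new']['N'] . "\n"; // prints 42
```

When the condition fails, the caller would re-read the counter and decide whether the change was already applied before trying again.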

Answered by Jeremy Lindblom

The Count option is definitely what you want, but you also have to take into account that there may be more than one "page" of results in your Scan result. The Scan operation only scans 1MB of data in your table at a time, so the value of Count in the result only reflects the count for the first 1MB of the table. You will need to make subsequent requests using the value of LastEvaluatedKey in the result (if it is there). Here is some sample code for doing something like that:

<?php

$dynamo_db = new AmazonDynamoDB();

$total = 0;
$start_key = null;
$params = array(
    'TableName' => 'my-table',
    'Count'     => true
);

do {
    if ($start_key) {
        $params['ExclusiveStartKey'] = $start_key->getArrayCopy();
    }

    $response = $dynamo_db->scan($params);

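    if ($response->isOK()) {
        // Each response carries the count for one page (up to 1MB scanned).
        $total += (int) (string) $response->body->Count;

        // Continue from LastEvaluatedKey while more pages remain.
        if ($response->body->LastEvaluatedKey) {
            $start_key = $response->body->LastEvaluatedKey->to_array();
        } else {
            $start_key = null;
        }
    }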
} while ($start_key);

echo "Count: {$total}";
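
The same paging logic can be sketched independently of any particular SDK version. In the example below, countItems and the $scanPage callable are hypothetical names introduced for illustration: $scanPage takes Scan parameters and returns one page as an array with 'Count' and, when more pages remain, 'LastEvaluatedKey'. A fake callable stands in for a real client so the sketch runs on its own.

```php
<?php
// Sketch: sum the per-page Count values of a paginated Scan.
function countItems(callable $scanPage, string $tableName): int
{
    $params = [
        'TableName' => $tableName,
        'Select'    => 'COUNT', // ask for counts only, not the items themselves
    ];
    $total = 0;

    do {
        $page = $scanPage($params);
        $total += (int) ($page['Count'] ?? 0);

        // A missing or empty LastEvaluatedKey means the last page was reached.
        $lastKey = $page['LastEvaluatedKey'] ?? null;
        if (!empty($lastKey)) {
            $params['ExclusiveStartKey'] = $lastKey;
        }
    } while (!empty($lastKey));

    return $total;
}

// Stand-in for a real client: three fake pages, selected by the start key.
$fakeScan = function (array $params): array {
    $pages = [
        ''  => ['Count' => 2, 'LastEvaluatedKey' => ['Id' => ['S' => 'b']]],
        'b' => ['Count' => 2, 'LastEvaluatedKey' => ['Id' => ['S' => 'd']]],
        'd' => ['Count' => 1],
    ];
    $start = $params['ExclusiveStartKey']['Id']['S'] ?? '';
    return $pages[$start];
};

echo "Count: " . countItems($fakeScan, 'my-table') . "\n"; // prints "Count: 5"
```

Assuming version 3 of the AWS SDK for PHP, it should be possible to pass something like `fn (array $p) => $client->scan($p)->toArray()` as $scanPage, where $client is a DynamoDbClient. Every page still consumes read capacity, so this does not make the scan any cheaper.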

Answered by Tom

Aha, there is a Count option in the scan API, see http://docs.amazonwebservices.com/AWSSDKforPHP/latest/#m=AmazonDynamoDB/scan

<?php
$dynamodb = new AmazonDynamoDB();

// Ask the scan to return only a count instead of the items themselves.
$scan_response = $dynamodb->scan(array(
    'TableName' => 'ProductCatalog',
    'Count'     => true,
));

echo "Count: ".$scan_response->body->Count."\n";

Answered by Atharva

If you are interested in using the total number of items in a table in your application's logic, that means you are going to query for the total count pretty frequently. One way to achieve this is the scan operation. But remember that a scan literally reads through the whole table and therefore consumes a lot of throughput, so other operations on the table are likely to be throttled while it runs. On top of that, each scan request only covers 1MB of data, so you will have to make repeated scan requests to get the actual number of items if the table is very large. This requires writing custom query logic and handling the inevitable throttling.

A better solution that comes to my mind is to maintain the total number of item counts for such tables in a separate table, where each item will have the table name as its hash key and the total number of items in that table as its non-key attribute. You can then keep this table, possibly named "TotalNumberOfItemsPerTable", updated by making atomic update operations to increment/decrement the total item count for a particular table.

This way there is no issue with throttling or the 1MB limit.

Furthermore, you can take this concept to a finer granularity, for example by maintaining the total number of items matching some hash key, or any arbitrary criteria that you can encode as a string, as an entry in a table named something like "TotalNumberOfItemsInSomeCollection" or "TotalNumberOfItemsMatchingSomeCriteria". These tables can then contain entries for the number of items per table, per collection, or per set of criteria.
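
As a rough sketch of the atomic update described above, the helper below builds the parameter array for DynamoDB's UpdateItem operation using an ADD expression. The hash key attribute "TableName", the counter attribute "ItemCount", and the helper name buildAtomicCountUpdate are assumptions made for illustration.

```php
<?php
// Sketch: UpdateItem parameters for an atomic counter change.
// Assumed names: hash key "TableName", counter attribute "ItemCount".
function buildAtomicCountUpdate(string $countedTable, int $delta): array
{
    return [
        'TableName' => 'TotalNumberOfItemsPerTable',
        'Key' => ['TableName' => ['S' => $countedTable]],
        // ADD applies the (possibly negative) number to the attribute and
        // starts from 0 if the attribute does not exist yet.
        'UpdateExpression' => 'ADD #count :delta',
        'ExpressionAttributeNames' => ['#count' => 'ItemCount'],
        // DynamoDB numbers are sent as strings under the "N" type.
        'ExpressionAttributeValues' => [':delta' => ['N' => (string) $delta]],
    ];
}

$onPut    = buildAtomicCountUpdate('ProductCatalog', 1);  // after adding an item
$onDelete = buildAtomicCountUpdate('ProductCatalog', -1); // after deleting an item

echo $onPut['ExpressionAttributeValues'][':delta']['N'] . " "
    . $onDelete['ExpressionAttributeValues'][':delta']['N'] . "\n"; // prints "1 -1"
```

The counter is only as accurate as the application's discipline in issuing one such update per successful put or delete.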

Answered by Nizam Mohideen

An approximate item count value (supposedly updated every six hours) is available in the AWS console for DynamoDB. Just select the table and look under the Details tab; the last entry is Item Count. If this works for you, then you can avoid consuming your table throughput to do the count.

Answered by Golda Velez

This is now available in the AWS table overview screen under the section 'Table details', field 'Item count'. It appears to be just a dump of DescribeTable, and it notes that the value is updated roughly every six hours.

Answered by jc mannem

Here's how I get the exact item count on my billion-record DynamoDB table:

hive>

set dynamodb.throughput.write.percent = 1;
set dynamodb.throughput.read.percent = 1;
set hive.execution.engine = mr;
set mapreduce.reduce.speculative=false;
set mapreduce.map.speculative=false;

CREATE EXTERNAL TABLE dynamodb_table (
    `ID` STRING,
    `DateTime` STRING,
    `ReportedbyName` STRING,
    `ReportedbySurName` STRING,
    `Company` STRING,
    `Position` STRING,
    `Country` STRING,
    `MailDomain` STRING
)
STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler'
TBLPROPERTIES (
    "dynamodb.table.name" = "BillionData",
    "dynamodb.column.mapping" = "ID:ID,DateTime:DateTime,ReportedbyName:ReportedbyName,ReportedbySurName:ReportedbySurName,Company:Company,Position:Position,Country:Country,MailDomain:MailDomain"
);

SELECT count(*) FROM dynamodb_table;

* You need an EMR cluster, which comes with Hive and the DynamoDB storage handler installed.
* With this command, the DynamoDB handler in Hive issues parallel scans, with multiple MapReduce mappers (workers) working on different partitions to get the count. This is much more efficient and faster than normal scans.
* You must be willing to raise the read capacity very high for a period of time.
* On a decent-sized (20-node) cluster with 10000 RCU, it took approximately 15 minutes to count a billion records.
* New writes to this DynamoDB table during this period will make the count inconsistent.

Answered by Sajid Ali

Please do not use the scan method of DynamoDB, because it reads all the data in the table. In this case you only need the count, so use this instead:

$scan_response = $dynamodb->describeTable(array(
    'TableName' => 'ProductCatalog'
));

Then print $scan_response['Table']['ItemCount'] to get the result.