How to mass delete multiple rows in hbase? (Java)
Original question: http://stackoverflow.com/questions/32598003/
Notice: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me): StackOverFlow
Asked by Rolando
I have rows with the following keys in the HBase table "mytable":
user_1
user_2
user_3
...
user_9999999
I want to use the HBase shell to delete the rows from:
user_500 to user_900
I know there is no built-in way to delete a range of rows, but is there a way I could use the "BulkDeleteProcessor" to do this?
I see here:
I want to just paste in imports and then paste this into the shell, but have no idea how to go about this. Does anyone know how I can use this endpoint from the jruby hbase shell?
Table ht = TEST_UTIL.getConnection().getTable("my_table");
long noOfDeletedRows = 0L;
Batch.Call<BulkDeleteService, BulkDeleteResponse> callable =
    new Batch.Call<BulkDeleteService, BulkDeleteResponse>() {
      ServerRpcController controller = new ServerRpcController();
      BlockingRpcCallback<BulkDeleteResponse> rpcCallback =
          new BlockingRpcCallback<BulkDeleteResponse>();

      public BulkDeleteResponse call(BulkDeleteService service) throws IOException {
        Builder builder = BulkDeleteRequest.newBuilder();
        builder.setScan(ProtobufUtil.toScan(scan));
        builder.setDeleteType(deleteType);
        builder.setRowBatchSize(rowBatchSize);
        if (timeStamp != null) {
          builder.setTimestamp(timeStamp);
        }
        service.delete(controller, builder.build(), rpcCallback);
        return rpcCallback.get();
      }
    };
Map<byte[], BulkDeleteResponse> result = ht.coprocessorService(BulkDeleteService.class,
    scan.getStartRow(), scan.getStopRow(), callable);
for (BulkDeleteResponse response : result.values()) {
  noOfDeletedRows += response.getRowsDeleted();
}
ht.close();
If there is no way to do this through JRuby, a Java solution or any other way to quickly delete multiple rows is fine.
Accepted answer by Vikram Singh Chandel
Do you really want to do this in the shell? There are several better ways. One is to use the native Java API:
- Construct an array list of Delete objects
- Pass this list to the Table.delete method
Method 1: If you already know the range of keys.
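public void massDelete(byte[] tableName) throws IOException {
    // hbasePool is not defined in the answer; presumably an HTablePool field of the enclosing class
    HTable table = (HTable) hbasePool.getTable(tableName);
    String tablePrefix = "user_";
    int startRange = 500;
    int endRange = 900; // inclusive; the question asks for user_500 to user_900
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    // Build one Delete per row key in the range
    for (int i = startRange; i <= endRange; i++) {
        String key = tablePrefix + i;
        Delete d = new Delete(Bytes.toBytes(key));
        listOfBatchDelete.add(d);
    }
    try {
        table.delete(listOfBatchDelete);
    } finally {
        // Always hand the table back to the pool
        if (hbasePool != null && table != null) {
            hbasePool.putTable(table);
        }
    }
}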
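The key-building loop in Method 1 can be checked without a cluster. The following is a minimal sketch that uses only the JDK; the class name RowKeyRange and its keys method are made up for illustration and are not part of HBase. In real code each generated key would be wrapped as new Delete(Bytes.toBytes(key)) before the list is passed to table.delete.

```java
import java.util.ArrayList;
import java.util.List;

public class RowKeyRange {
    /** Builds prefix + i for every i in [start, end], both ends inclusive. */
    public static List<String> keys(String prefix, int start, int end) {
        List<String> result = new ArrayList<>();
        for (int i = start; i <= end; i++) {
            result.add(prefix + i);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> keys = keys("user_", 500, 900);
        // prints "401 keys, from user_500 to user_900"
        System.out.println(keys.size() + " keys, from " + keys.get(0)
                + " to " + keys.get(keys.size() - 1));
    }
}
```

For the range in the question (user_500 to user_900, both inclusive) this yields 401 row keys.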
Method 2: If you want to do a batch delete on the basis of a scan result.
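public void bulkDelete(final HTable table) throws IOException {
    Scan s = new Scan();
    List<Delete> listOfBatchDelete = new ArrayList<Delete>();
    // Restrict the scan here, e.g. with s.setFilter(...);
    // without any restriction every row in the table is scanned and deleted
    // try-with-resources closes the scanner once all rows are collected
    try (ResultScanner scanner = table.getScanner(s)) {
        for (Result rr : scanner) {
            Delete d = new Delete(rr.getRow());
            listOfBatchDelete.add(d);
        }
    }
    try {
        table.delete(listOfBatchDelete);
    } catch (Exception e) {
        // assumes LOGGER is a java.util.logging.Logger
        LOGGER.log(Level.SEVERE, "Batch delete failed", e);
    }
}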
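One caveat if the scan in Method 2 is narrowed with a start row and a stop row instead of a filter: HBase orders row keys as byte strings, not as numbers. The sketch below uses only the JDK to mimic that ordering, with the start row inclusive and the stop row exclusive as in a default scan. The class LexicographicRange and its methods are hypothetical helpers written for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LexicographicRange {
    /** Unsigned byte-wise comparison; a shorter key sorts before any key it is a prefix of. */
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Returns the keys k with start <= k < stop, in their input order. */
    public static List<String> inRange(List<String> keys, String start, String stop) {
        byte[] lo = start.getBytes(StandardCharsets.UTF_8);
        byte[] hi = stop.getBytes(StandardCharsets.UTF_8);
        List<String> matched = new ArrayList<>();
        for (String key : keys) {
            byte[] k = key.getBytes(StandardCharsets.UTF_8);
            if (compareBytes(k, lo) >= 0 && compareBytes(k, hi) < 0) {
                matched.add(key);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("user_50", "user_500", "user_5000",
                "user_600", "user_9", "user_900", "user_1000");
        // prints "[user_500, user_5000, user_600, user_9]"
        System.out.println(inRange(keys, "user_500", "user_900"));
    }
}
```

With unpadded numeric suffixes, a byte-ordered range from user_500 to user_900 also picks up keys such as user_5000 and user_9, and leaves out user_900 itself. Generating the exact keys as in Method 1, or zero-padding the numeric part when the keys are designed, avoids the surprise.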
Now, on using a coprocessor: my only advice is "DON'T USE a coprocessor" unless you are an expert in HBase. Coprocessors have many built-in issues; if you need, I can provide a detailed description. Secondly, when you delete anything from HBase, it is never removed immediately. A tombstone marker is attached to the record, and the data is physically removed later, during a major compaction. So there is no need to use a coprocessor, which is highly resource-intensive.
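Modified code to support batch operation (this replaces the loop in Method 1).
int batchSize = 50;
int batchCounter = 0;
for (int i = startRange; i <= endRange; i++) {
    String key = tablePrefix + i;
    Delete d = new Delete(Bytes.toBytes(key));
    listOfBatchDelete.add(d);
    batchCounter++;
    // Send a full batch to HBase, then start collecting the next one
    if (batchCounter == batchSize) {
        table.delete(listOfBatchDelete);
        listOfBatchDelete.clear();
        batchCounter = 0;
    }
}
// Send the final, partially filled batch
if (!listOfBatchDelete.isEmpty()) {
    table.delete(listOfBatchDelete);
}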
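A batched loop like this also has to send the last, partially filled batch, since 401 keys do not divide evenly into batches of 50. The sketch below isolates that logic using only the JDK. BatchFlusher and forEachBatch are hypothetical names, and the Consumer passed as sink stands in for a call such as table.delete(batch).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchFlusher {
    /**
     * Feeds items to the sink in chunks of at most batchSize,
     * including a final partial chunk. Returns the number of sink calls.
     */
    public static <T> int forEachBatch(List<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int calls = 0;
        List<T> batch = new ArrayList<>(batchSize);
        for (T item : items) {
            batch.add(item);
            if (batch.size() == batchSize) {
                sink.accept(new ArrayList<>(batch)); // copy, so the sink may modify its list
                batch.clear();
                calls++;
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(new ArrayList<>(batch)); // final partial batch
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 500; i <= 900; i++) {
            keys.add("user_" + i);
        }
        List<Integer> sizes = new ArrayList<>();
        int calls = forEachBatch(keys, 50, batch -> sizes.add(batch.size()));
        // prints "9 batches, last has 1 keys"
        System.out.println(calls + " batches, last has " + sizes.get(sizes.size() - 1) + " keys");
    }
}
```

Without the final flush, the ninth batch (here a single key, user_900) would never be sent.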
Creating the HBase configuration and getting a table instance:
Configuration hConf = HBaseConfiguration.create(conf);
hConf.set("hbase.zookeeper.quorum", "Zookeeper IP");
hConf.set("hbase.zookeeper.property.clientPort", ZookeeperPort);
HTable hTable = new HTable(hConf, tableName);
Answer by Prasad Khode
If you already know the rowkeys of the records that you want to delete from the HBase table, you can use the following approach:
1. First, create a List of Delete objects from these rowkeys
for (int rowKey = 1; rowKey <= 10; rowKey++) {
    deleteList.add(new Delete(Bytes.toBytes(rowKey + "")));
}
2. Then get the Table object from the HBase Connection
Table table = connection.getTable(TableName.valueOf(tableName));
3. Once you have the Table object, call delete() and pass it the list
table.delete(deleteList);
The complete code looks like this:
Configuration config = HBaseConfiguration.create();
config.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
config.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
String tableName = "users";
// try-with-resources closes the table and the connection, even if delete() throws
try (Connection connection = ConnectionFactory.createConnection(config);
     Table table = connection.getTable(TableName.valueOf(tableName))) {
    List<Delete> deleteList = new ArrayList<Delete>();
    // One Delete per row key, user_500 through user_900 inclusive
    for (int rowKey = 500; rowKey <= 900; rowKey++) {
        deleteList.add(new Delete(Bytes.toBytes("user_" + rowKey)));
    }
    table.delete(deleteList);
}