Note: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/12392768/

Converting bytes[] to string in HBase

java hadoop byte hbase

Asked by fanbondi

I have the following row stored in an HBase table:

 DIEp(^o^)q3    column=DIE:ID, timestamp=1346194191174, value=\x00\x00\x00\x01

I am trying to access the value and convert it to its string representation, which should be 1, but I don't get the right string representation when I cat this file (where my output is redirected to):

cat /hadoop/logs/userlogs/job_201209121654_0027/attempt_201209121654_0027_m_000000_0/stdout

I get garbage like this: NUL NUL NUL SOH

Below is the code fragment that I am using:

byte[] result1 = value.getValue("DIE".getBytes(), "ID".getBytes());
String myresult = Bytes.toString(result1);
System.out.println(myresult);

Accepted answer by Jon Skeet

Firstly, I'd avoid using String.getBytes() without specifying an encoding. What encoding does the code actually expect? Specify it explicitly when you call "DIE".getBytes() and "ID".getBytes().
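
As a minimal sketch of specifying the encoding explicitly: the example below assumes UTF-8 is the encoding the table was written with, which the question does not state. The class name ExplicitCharsetExample and the helper utf8 are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ExplicitCharsetExample {
    // Encode a column family or qualifier name with a fixed charset
    // instead of relying on the platform default.
    // Assumption: UTF-8 is the encoding the data was written with.
    static byte[] utf8(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] family = utf8("DIE");
        byte[] qualifier = utf8("ID");
        System.out.println(Arrays.toString(family));    // [68, 73, 69]
        System.out.println(Arrays.toString(qualifier)); // [73, 68]
    }
}
```

For plain ASCII names such as these, most common charsets produce the same bytes, but naming the charset removes the dependency on the machine's default.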

Next, it looks like you should be converting the 4 bytes into an integer first - then convert that integer into a string. For example:

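byte[] valueAsBytes = ...;
// Combine the four bytes in big-endian order (most significant byte first).
// The & 0xff mask stops sign extension when a byte is widened to int.
int valueAsInt = ((valueAsBytes[0] & 0xff) << 24) |
                 ((valueAsBytes[1] & 0xff) << 16) |
                 ((valueAsBytes[2] & 0xff) << 8) |
                 (valueAsBytes[3] & 0xff);
String valueAsString = String.valueOf(valueAsInt);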

There may well be something in the Java API to do the bit manipulation directly, but I can't think of it right now. (There's DataInputStream, but that would require wrapping the byte array in a ByteArrayInputStream first, and then you'd need to check the endianness...)
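
One standard-library candidate for this is java.nio.ByteBuffer, which reads in big-endian order by default. The sketch below is not part of the original answer; it assumes the value is exactly the four bytes shown in the question, and the class name BytesToIntExample and the helper toInt are hypothetical.

```java
import java.nio.ByteBuffer;

final class BytesToIntExample {
    // Interpret the first four bytes as a big-endian int
    // (big-endian is ByteBuffer's default byte order).
    static int toInt(byte[] valueAsBytes) {
        return ByteBuffer.wrap(valueAsBytes).getInt();
    }

    public static void main(String[] args) {
        // The stored value from the question: \x00\x00\x00\x01
        byte[] valueAsBytes = {0x00, 0x00, 0x00, 0x01};
        String valueAsString = String.valueOf(toInt(valueAsBytes));
        System.out.println(valueAsString); // prints 1
    }
}
```

Note that getInt() throws BufferUnderflowException if the array holds fewer than four bytes, so a length check may be worthwhile when the column can be empty.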

Your current code is doing exactly what you ask it to - admittedly with the default encoding of the platform. You've basically got "\u0000\u0000\u0000\u0001".
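
To illustrate why the output looks like NUL NUL NUL SOH, here is a small sketch that decodes the same four bytes as text. It assumes UTF-8 decoding; the class name RawDecodeExample and the helper decode are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class RawDecodeExample {
    // Decode the bytes as UTF-8 text, treating each byte as character data.
    // Assumption: UTF-8 stands in for whatever charset the real code uses.
    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(new byte[] {0, 0, 0, 1});
        System.out.println(s.length());                           // 4
        System.out.println(s.equals("\u0000\u0000\u0000\u0001")); // true
        System.out.println(s.equals("1"));                        // false
    }
}
```

The result is a four-character string of control characters (three NUL and one SOH), not the digit 1, which matches what cat shows.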

Answer by David

The standard HBase way of converting strings is Bytes.toBytes(string) and Bytes.toString(bytes). But Jon Skeet is correct that you need to consider how you put the data into the column in the first place. If you used Bytes.toBytes(int), then you need to convert the bytes back into an integer before converting to a string.
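
The point about matching the read side to the write side can be sketched with the standard library alone. The helpers below are hypothetical stand-ins, not HBase calls: intToBytes and bytesToInt assume a 4-byte big-endian layout like the \x00\x00\x00\x01 value in the question, and textToBytes and bytesToText assume UTF-8. In HBase itself, the read-side counterpart of Bytes.toBytes(int) would presumably be Bytes.toInt(byte[]).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class StorageRoundTripExample {
    // Stand-in for writing an int as 4 big-endian bytes.
    static byte[] intToBytes(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Stand-in for reading those 4 bytes back as an int.
    static int bytesToInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    // Stand-in for writing a value as text (assumes UTF-8).
    static byte[] textToBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for reading a text value back (assumes UTF-8).
    static String bytesToText(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] asInt = intToBytes(1);     // 4 bytes: 00 00 00 01
        byte[] asText = textToBytes("1"); // 1 byte: 0x31
        System.out.println(asInt.length + " " + bytesToInt(asInt));    // 4 1
        System.out.println(asText.length + " " + bytesToText(asText)); // 1 1
    }
}
```

Both paths end with the string 1, but only when the decoder matches the encoder that wrote the cell.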

Answer by vikas

We have simply used new String(byte[]), where byte[] comes from org.apache.hadoop.hbase.KeyValue.getValue(), to parse the bytes from an HBase column as a string, and it works fine for our projects. :) Sorry if I missed something in the question. Hope this helps.