java: Converting byte[] to String in HBase

Question

Asked by fanbondi

I have the following row stored in an HBase table:

 DIEp(^o^)q3    column=DIE:ID, timestamp=1346194191174, value=\x00\x00\x00\x01

I am trying to access the value and convert it to its string representation, which should be 1, but I don't get the right string when I cat this file (where my output is redirected):

cat /hadoop/logs/userlogs/job_201209121654_0027/attempt_201209121654_0027_m_000000_0/stdout

I get garbage like this: NUL NUL NUL SOH

Below is the code fragment that I am using:

byte[] result1 = value.getValue("DIE".getBytes(), "ID".getBytes());
String myresult = Bytes.toString(result1);
System.out.println(myresult);

Answer 1

Accepted answer by Jon Skeet

Firstly, I'd avoid using String.getBytes() without specifying an encoding. What encoding does the code actually expect? Specify it explicitly when you call "DIE".getBytes() and "ID".getBytes().
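
As a minimal sketch of that advice, java.nio.charset.StandardCharsets lets you name the encoding at the call site. UTF-8 is an assumption here, not something the question states; use whichever encoding the table was actually written with. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class ExplicitCharset {
    // Encode a column family or qualifier name with a named charset
    // instead of the platform default. UTF-8 is an assumption.
    static byte[] encode(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] family = encode("DIE");
        byte[] qualifier = encode("ID");
        System.out.println(family.length + " " + qualifier.length);
    }
}
```

The resulting arrays could then be passed wherever the original code passes "DIE".getBytes() and "ID".getBytes().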

Next, it looks like you should be converting the 4 bytes into an integer first, then converting that integer into a string. For example:

byte[] valueAsBytes = ...;
int valueAsInt = ((valueAsBytes[0] & 0xff) << 24) |
                 ((valueAsBytes[1] & 0xff) << 16) |
                 ((valueAsBytes[2] & 0xff) << 8) |
                 (valueAsBytes[3] & 0xff);
String valueAsString = String.valueOf(valueAsInt);

There may well be something in the Java API to do the bit manipulation directly, but I can't think of it right now. (There's DataInputStream, but that would require wrapping the byte array in a ByteArrayInputStream first, and then you'd need to check the endianness...)
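
One standard-library candidate is java.nio.ByteBuffer, which reads in big-endian order by default and so should agree with the manual shifts. This is a sketch rather than part of the original answer; the class and method names are hypothetical, and it assumes the value is exactly 4 bytes long.

```java
import java.nio.ByteBuffer;

class BigEndianInt {
    // Interpret the first 4 bytes as a big-endian int.
    // Throws BufferUnderflowException if fewer than 4 bytes are supplied.
    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        byte[] stored = {0, 0, 0, 1}; // the value shown in the question
        System.out.println(String.valueOf(toInt(stored))); // prints 1
    }
}
```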

Your current code is doing exactly what you ask it to - admittedly with the default encoding of the platform. You've got "\u0000\u0000\u0000\u0001" basically.
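
To see why the output looks like NUL NUL NUL SOH, here is a small sketch that decodes the same four bytes as text and prints each character's code point. UTF-8 is assumed for the decoding step, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;

class RawDecode {
    // Decode the raw bytes as text; each byte below 0x80 becomes one character.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = decode(new byte[] {0, 0, 0, 1});
        // Prints the numeric value of each character: three NULs, then SOH.
        for (char c : s.toCharArray()) {
            System.out.println((int) c);
        }
    }
}
```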

Answer 2

Answered by David

The standard HBase way of string conversion is Bytes.toBytes(string) and Bytes.toString(bytes). But Jon Skeet is correct in that you need to consider how you put the data into the column in the first place. If you used Bytes.toBytes(int), then you need to convert your bytes back into an integer before you convert to a string.
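
The point that the read side must mirror the write side can be sketched in plain Java. The helpers below are hypothetical stand-ins so the example runs without HBase on the classpath; in real code the corresponding pairs would be Bytes.toBytes(int) with Bytes.toInt(byte[]), and Bytes.toBytes(String) with Bytes.toString(byte[]).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class EncodingRoundTrip {
    // Stand-in for writing an int column: 4 bytes, big-endian.
    static byte[] intToBytes(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    // Stand-in for writing a string column: UTF-8 text bytes.
    static byte[] stringToBytes(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Read an int column: bytes -> int -> string.
    static String decodeIntColumn(byte[] bytes) {
        return String.valueOf(ByteBuffer.wrap(bytes).getInt());
    }

    // Read a string column: bytes -> string directly.
    static String decodeStringColumn(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeIntColumn(intToBytes(1)));
        System.out.println(decodeStringColumn(stringToBytes("1")));
    }
}
```

Both paths yield the text 1, but the stored bytes differ: four bytes for the int and a single byte for the string, which is why decoding with the wrong method produces garbage.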

Answer 3

Answered by vikas

We simply used new String(byte[]), where the byte[] comes from org.apache.hadoop.hbase.KeyValue.getValue(), to parse the bytes from an HBase column as a string, and it is working fine for our projects. :) Sorry if I missed something in the question. Hope this helps.
