How to split a byte array around a byte sequence in Java?

Question

Asked by Ori Popowski

How do I split a byte[] around a byte sequence in Java? Something like a byte[] version of String#split(regex).

Example


Let's take this byte array:
[11 11 FF FF 22 22 22 FF FF 33 33 33 33]


and let's choose the delimiter to be
[FF FF]


Then the split will result in these three parts:
[11 11]
[22 22 22]
[33 33 33 33]


EDIT:


Please note that you cannot convert the byte[] to a String, split it, and then convert the pieces back, because of encoding issues. When you do such a conversion on byte arrays, the resulting byte[] will be different. Please refer to this: Conversion of byte[] into a String and then back to a byte[]

Answer 1

Accepted answer by slim

Note that you can reliably convert from byte[] to String and back, with a one-to-one mapping of chars to bytes, if you use the encoding "iso8859-1".

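As a rough illustration of that claim, the sketch below round-trips the question's example bytes through a String with both ISO-8859-1 and UTF-8, and then splits through an ISO-8859-1 String. The class name Latin1Split and its split helper are made up for this example; the use of Pattern.quote (so the delimiter is taken literally rather than as a regex) and of a -1 limit (so trailing empty pieces are kept) are assumptions of this sketch, not something the answer specifies.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answer.
public class Latin1Split {
    // Splits input around delimiter by going through an ISO-8859-1 String.
    static List<byte[]> split(byte[] input, byte[] delimiter) {
        String text = new String(input, StandardCharsets.ISO_8859_1);
        String delim = new String(delimiter, StandardCharsets.ISO_8859_1);
        List<byte[]> parts = new ArrayList<>();
        // Pattern.quote makes the delimiter literal; limit -1 keeps trailing empty pieces.
        for (String piece : text.split(Pattern.quote(delim), -1)) {
            parts.add(piece.getBytes(StandardCharsets.ISO_8859_1));
        }
        return parts;
    }

    public static void main(String[] args) {
        byte[] input = {0x11, 0x11, (byte) 0xFF, (byte) 0xFF, 0x22, 0x22, 0x22,
                        (byte) 0xFF, (byte) 0xFF, 0x33, 0x33, 0x33, 0x33};
        byte[] delimiter = {(byte) 0xFF, (byte) 0xFF};

        // Round trip through a String with each charset.
        byte[] viaLatin1 = new String(input, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] viaUtf8 = new String(input, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(input, viaLatin1));
        System.out.println(Arrays.equals(input, viaUtf8));

        for (byte[] part : split(input, delimiter)) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```

Run as written, this should print true for the ISO-8859-1 round trip and false for the UTF-8 one (the byte FF is not valid UTF-8, so it is replaced during decoding), followed by the three pieces [17, 17], [34, 34, 34] and [51, 51, 51, 51] in decimal.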

However, it's still an ugly solution.


I think you'll need to roll your own.


I suggest solving it in two stages:


1. Work out how to find the index of each occurrence of the separator. Search for "Knuth-Morris-Pratt" for an efficient algorithm, although a more naive algorithm will be fine for short delimiters.
2. Each time you find an index, use Arrays.copyOfRange() to get the piece you need and add it to your output list.

Here it is, using a naive pattern-finding algorithm. KMP becomes worthwhile if the delimiters are long, because it avoids backtracking while still not missing a delimiter that starts inside a partial match that fails at its end.

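import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

// Returns true if pattern occurs in input starting at index pos.
public static boolean isMatch(byte[] pattern, byte[] input, int pos) {
    // Not enough bytes left for a full match; this also avoids reading past the end.
    if (pos + pattern.length > input.length) {
        return false;
    }
    for (int i = 0; i < pattern.length; i++) {
        if (pattern[i] != input[pos + i]) {
            return false;
        }
    }
    return true;
}

public static List<byte[]> split(byte[] pattern, byte[] input) {
    // An empty pattern matches everywhere and would never advance.
    if (pattern.length == 0) {
        throw new IllegalArgumentException("pattern must not be empty");
    }
    List<byte[]> l = new LinkedList<byte[]>();
    int blockStart = 0;
    for (int i = 0; i < input.length; i++) {
        if (isMatch(pattern, input, i)) {
            l.add(Arrays.copyOfRange(input, blockStart, i));
            blockStart = i + pattern.length;
            // The loop's i++ runs next, so step back one to resume exactly at blockStart.
            i = blockStart - 1;
        }
    }
    // Whatever follows the last delimiter (possibly empty) is the final piece.
    l.add(Arrays.copyOfRange(input, blockStart, input.length));
    return l;
}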
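For long delimiters, the Knuth-Morris-Pratt search mentioned above could replace the naive scan. The following is only a sketch under a few assumptions: the class name KmpSplit and the helper failureTable are invented here, an empty pattern is rejected, and matches are taken left to right without overlapping, which is meant to mirror the naive version rather than to be a definitive implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; a sketch of KMP-based splitting.
public class KmpSplit {
    // failure[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(byte[] pattern) {
        int[] failure = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = failure[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            failure[i] = k;
        }
        return failure;
    }

    static List<byte[]> split(byte[] pattern, byte[] input) {
        if (pattern.length == 0) {
            throw new IllegalArgumentException("pattern must not be empty");
        }
        int[] failure = failureTable(pattern);
        List<byte[]> pieces = new ArrayList<>();
        int blockStart = 0;
        int matched = 0; // number of pattern bytes matched so far
        for (int i = 0; i < input.length; i++) {
            // On a mismatch, fall back in the pattern instead of rewinding i.
            while (matched > 0 && input[i] != pattern[matched]) {
                matched = failure[matched - 1];
            }
            if (input[i] == pattern[matched]) {
                matched++;
            }
            if (matched == pattern.length) {
                int delimiterStart = i - pattern.length + 1;
                pieces.add(Arrays.copyOfRange(input, blockStart, delimiterStart));
                blockStart = i + 1;
                matched = 0; // restart so that matches never overlap
            }
        }
        // Whatever follows the last delimiter (possibly empty) is the final piece.
        pieces.add(Arrays.copyOfRange(input, blockStart, input.length));
        return pieces;
    }
}
```

The input index i never moves backwards, so the scan is linear in the input length plus the pattern length. For a two-byte delimiter like FF FF the naive version is likely just as fast in practice.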

Answer 2

Answer by devmtl

Refer to the Javadoc for String.

You can construct a String object from a byte array. I guess you know the rest.

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public static byte[][] splitByteArray(byte[] bytes, byte[] regex, Charset charset) {
    String str = new String(bytes, charset);
    // Quote the delimiter so bytes such as '.' or '|' are not treated as regex syntax.
    // Note that String.split drops trailing empty pieces.
    String[] split = str.split(Pattern.quote(new String(regex, charset)));
    byte[][] byteSplit = new byte[split.length][];
    for (int i = 0; i < split.length; i++) {
        byteSplit[i] = split[i].getBytes(charset);
    }
    return byteSplit;
}

public static void main(String[] args) {
    // ISO-8859-1 maps every byte to exactly one char, so arbitrary bytes survive
    // the round trip; UTF-8 is only safe here because the sample data is ASCII.
    Charset charset = StandardCharsets.ISO_8859_1;
    byte[] bytes = {
        '1', '1', ' ', '1', '1',
        'F', 'F', ' ', 'F', 'F',
        '2', '2', ' ', '2', '2', ' ', '2', '2',
        'F', 'F', ' ', 'F', 'F',
        '3', '3', ' ', '3', '3', ' ', '3', '3', ' ', '3', '3'
    };
    byte[] regex = {'F', 'F', ' ', 'F', 'F'};
    byte[][] splitted = splitByteArray(bytes, regex, charset);
    for (byte[] arr : splitted) {
        System.out.print("[");
        for (byte b : arr) {
            System.out.print((char) b);
        }
        System.out.println("]");
    }
}

Answer 3

Answer by avgvstvs

Rolling your own is the only way to go here. The best idea I can offer if you're open to non-standard libraries is this class from Apache:


http://commons.apache.org/proper/commons-primitives/apidocs/org/apache/commons/collections/primitives/ArrayByteList.html

Knuth's solution is probably the best, but I would treat the array as a stack and do something like this:


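// Assumes `stack` is a java.util.Deque<Byte> whose top is the first byte of the
// input, and that the delimiter is fixed at FF FF.
List<ArrayByteList> targetList = new ArrayList<ArrayByteList>();
ArrayByteList tmp = new ArrayByteList();
while (!stack.isEmpty()) {
  byte top = stack.pop();

  // Java bytes are signed, so compare against (byte) 0xff; `top == 0xff` is never true.
  if (top == (byte) 0xff && !stack.isEmpty() && stack.peek() == (byte) 0xff) {
    stack.pop();                 // consume the second delimiter byte
    targetList.add(tmp);         // close the current piece
    tmp = new ArrayByteList();
  } else {
    tmp.add(top);                // ordinary data byte
  }
}
targetList.add(tmp);             // the piece after the last delimiter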

I'm aware that this is pretty quick and dirty but it should deliver O(n) in all cases.


Answer 4

Answer by Maysam Torabi

You can use Arrays.copyOfRange() for that.
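As a small illustration, Arrays.copyOfRange(array, from, to) copies the bytes from index from (inclusive) to index to (exclusive), which is the slicing step once a delimiter position is known. In the sketch below the class CopyOfRangeDemo and the helper piecesAround are hypothetical, and the delimiter index is simply given rather than searched for.

```java
import java.util.Arrays;

// Hypothetical demo class; it only shows the slicing step, not the search.
public class CopyOfRangeDemo {
    // Returns the bytes before and after a delimiter that starts at delimiterIndex.
    static byte[][] piecesAround(byte[] input, int delimiterIndex, int delimiterLength) {
        byte[] before = Arrays.copyOfRange(input, 0, delimiterIndex);
        byte[] after = Arrays.copyOfRange(input, delimiterIndex + delimiterLength, input.length);
        return new byte[][] {before, after};
    }

    public static void main(String[] args) {
        byte[] input = {0x11, 0x11, (byte) 0xFF, (byte) 0xFF, 0x22, 0x22, 0x22};
        // Assumption: some earlier search step found the delimiter FF FF at index 2.
        byte[][] pieces = piecesAround(input, 2, 2);
        System.out.println(Arrays.toString(pieces[0]));
        System.out.println(Arrays.toString(pieces[1]));
    }
}
```

On its own this does not locate the delimiter, so it still has to be combined with a search such as the ones in the other answers.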

Answer 5

Answer by L. Blanc

Here is a straightforward solution.


Unlike avgvstvs's approach, it handles delimiters of arbitrary length. The top answer is also good, but the author hasn't fixed the issue pointed out by Eitan Perkal. That issue is avoided here using the approach Perkal suggests.

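import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public static List<byte[]> tokens(byte[] array, byte[] delimiter) {
    List<byte[]> byteArrays = new LinkedList<>();
    if (delimiter.length == 0) {
        return byteArrays;
    }
    int begin = 0;

    outer:
    for (int i = 0; i < array.length - delimiter.length + 1; i++) {
        // Compare the delimiter against the bytes starting at i.
        for (int j = 0; j < delimiter.length; j++) {
            if (array[i + j] != delimiter[j]) {
                continue outer;
            }
        }
        byteArrays.add(Arrays.copyOfRange(array, begin, i));
        begin = i + delimiter.length;
        // Resume right after the delimiter (the loop's i++ follows). Without this,
        // an overlapping match such as FF FF inside FF FF FF makes begin > i and
        // Arrays.copyOfRange throws IllegalArgumentException.
        i = begin - 1;
    }
    byteArrays.add(Arrays.copyOfRange(array, begin, array.length));
    return byteArrays;
}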

Answer 6

Answer by Roger

I modified L. Blanc's answer to handle delimiters at the very beginning and at the very end. I also renamed it to 'split'.

private List<byte[]> split(byte[] array, byte[] delimiter)
{
   List<byte[]> byteArrays = new LinkedList<byte[]>();
   if (delimiter.length == 0)
   {
      return byteArrays;
   }
   int begin = 0;

   outer: for (int i = 0; i < array.length - delimiter.length + 1; i++)
   {
      for (int j = 0; j < delimiter.length; j++)
      {
         if (array[i + j] != delimiter[j])
         {
            continue outer;
         }
      }

      // If delimiter is at the beginning then there will not be any data.
      // This also skips the empty piece between two adjacent delimiters.
      if (begin != i)
         byteArrays.add(Arrays.copyOfRange(array, begin, i));
      begin = i + delimiter.length;
      // Resume right after the delimiter (the loop's i++ follows), so an
      // overlapping match cannot leave begin > i and break copyOfRange.
      i = begin - 1;
   }

   // delimiter at the very end with no data following?
   if (begin != array.length)
      byteArrays.add(Arrays.copyOfRange(array, begin, array.length));

   return byteArrays;
}
