使用 Java 在二进制文件中搜索字节序列

Question

提问by Bassam

I have a sequence of bytes that I have to search for in a set of Binary files using Java.

我有一个必须使用 Java 在一组二进制文件中搜索的字节序列。

Example: I'm searching for the byte sequence DEADBEEF(in hex) in a Binary file. How would I go about doing this in Java? Is there a built-in method, like String.contains()for Binary files?

示例：我DEADBEEF在二进制文件中搜索字节序列（十六进制）。我将如何在 Java 中执行此操作？是否有内置方法，例如String.contains()二进制文件？

Answer 1

采纳答案by janko

No, there is no built-in method to do that. But, directly copied from HERE(with two fixes applied to the original code):

不，没有内置方法可以做到这一点。但是，直接从HERE复制（对原始代码应用了两个修复程序）：

/**
 * Knuth-Morris-Pratt Algorithm for Pattern Matching
 */
class KMPMatch {
    /**
     * Finds the first occurrence of the pattern in the text.
     */
    public static int indexOf(byte[] data, byte[] pattern) {
        if (data.length == 0) return -1;

        int[] failure = computeFailure(pattern);    
        int j = 0;

        for (int i = 0; i < data.length; i++) {
            while (j > 0 && pattern[j] != data[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == data[i]) { j++; }
            if (j == pattern.length) {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }

    /**
     * Computes the failure function using a boot-strapping process,
     * where the pattern is matched against itself.
     */
    private static int[] computeFailure(byte[] pattern) {
        int[] failure = new int[pattern.length];

        int j = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }

        return failure;
    }
}

Answer 2

回答by joseluisbz

private int bytesIndexOf(byte[] source, byte[] search, int fromIndex) {
    boolean find = false;
    int i;
    for (i = fromIndex; i < (source.length - search.length); i++) {
        if (source[i] == search[0]) {
            find = true;
            for (int j = 0; j < search.length; j++) {
                if (source[i + j] != search[j]) {
                    find = false;
                }
            }
        }
        if (find) {
            break;
        }
    }
    if (!find) {
        return -1;
    }
    return i;
}

Answer 3

回答by L. Blanc

For those who prefer libraries, there is an implementation (source below) of the Knuth-Morris-Pratt algorithm in Twitter's Elephant-Bird open source library (Apache license).

对于那些喜欢库的人，在 Twitter 的 Elephant-Bird 开源库（Apache 许可）中有 Knuth-Morris-Pratt 算法的实现（来源如下）。

You can find the library on Github at: https://github.com/twitter/elephant-bird

你可以在 Github 上找到这个库：https: //github.com/twitter/elephant-bird

package com.twitter.elephantbird.util;

import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

/**
 * An efficient stream searching class based on the Knuth-Morris-Pratt algorithm.
 * For more on the algorithm works see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm.
 */
public class StreamSearcher {

  protected byte[] pattern_;
  protected int[] borders_;

  // An upper bound on pattern length for searching. Throws exception on longer patterns
  public static final int MAX_PATTERN_LENGTH = 1024;

  public StreamSearcher(byte[] pattern) {
    setPattern(pattern);
  }

  /**
   * Sets a new pattern for this StreamSearcher to use.
   * @param pattern
   *          the pattern the StreamSearcher will look for in future calls to search(...)
   */
  public void setPattern(byte[] pattern) {
    if (pattern.length > MAX_PATTERN_LENGTH) {
      throw new IllegalArgumentException("The maximum pattern length is " + MAX_PATTERN_LENGTH);
    }

    pattern_ = Arrays.copyOf(pattern, pattern.length);
    borders_ = new int[pattern_.length + 1];
    preProcess();
  }

  /**
   * Searches for the next occurrence of the pattern in the stream, starting from the current stream position. Note
   * that the position of the stream is changed. If a match is found, the stream points to the end of the match -- i.e. the
   * byte AFTER the pattern. Else, the stream is entirely consumed. The latter is because InputStream semantics make it difficult to have
   * another reasonable default, i.e. leave the stream unchanged.
   *
   * @return bytes consumed if found, -1 otherwise.
   * @throws IOException
   */
  public long search(InputStream stream) throws IOException {
    long bytesRead = 0;

    int b;
    int j = 0;

    while ((b = stream.read()) != -1) {
      bytesRead++;

      while (j >= 0 && (byte)b != pattern_[j]) {
        j = borders_[j];
      }
      // Move to the next character in the pattern.
      ++j;

      // If we've matched up to the full pattern length, we found it.  Return,
      // which will automatically save our position in the InputStream at the point immediately
      // following the pattern match.
      if (j == pattern_.length) {
        return bytesRead;
      }
    }

    // No dice, Note that the stream is now completely consumed.
    return -1;
  }

  /**
   * Builds up a table of longest "borders" for each prefix of the pattern to find. This table is stored internally
   * and aids in implementation of the Knuth-Moore-Pratt string search.
   * <p>
   * For more information, see: http://www.inf.fh-flensburg.de/lang/algorithmen/pattern/kmpen.htm.
   */
  protected void preProcess() {
    int i = 0;
    int j = -1;
    borders_[i] = j;
    while (i < pattern_.length) {
      while (j >= 0 && pattern_[i] != pattern_[j]) {
        j = borders_[j];
      }
      borders_[++i] = ++j;
    }
  }
}

Answer 4

回答by riversun

You can find sequence of bytes from giga-bytes order file using bigdoc.

您可以使用 bigdoc 从千兆字节顺序文件中找到字节序列。

Lib and Example here on Github at: https://github.com/riversun/bigdoc

Github 上的 Lib 和示例：https: //github.com/riversun/bigdoc

package org.example;

import java.io.File;
import java.util.List;

import org.riversun.bigdoc.bin.BigFileSearcher;

public class Example {

    public static void main(String[] args) throws Exception {

        byte[] searchBytes = "hello world.".getBytes("UTF-8");

        File file = new File("/var/tmp/yourBigfile.bin");

        BigFileSearcher searcher = new BigFileSearcher();

        List<Long> findList = searcher.searchBigFile(file, searchBytes);

        System.out.println("positions = " + findList);
    }
}

If you want to search it on memory,check this. Examples here on Github at: https://github.com/riversun/finbin

如果你想在内存中搜索它，请检查这个。Github 上的示例：https: //github.com/riversun/finbin

 import java.util.List;

 import org.riversun.finbin.BigBinarySearcher;

 public class Example {

     public static void main(String[] args) throws Exception {

         BigBinarySearcher bbs = new BigBinarySearcher();

         byte[] iamBigSrcBytes = "Hello world.It's a small world.".getBytes("utf-8");

         byte[] searchBytes = "world".getBytes("utf-8");

         List<Integer> indexList = bbs.searchBytes(iamBigSrcBytes, searchBytes);

         System.out.println("indexList=" + indexList);
     }
 }

Returns all the matched positions in the array of bytes

返回字节数组中所有匹配的位置

It also can withstand a large array of bytes:)

它还可以承受大量字节:)

使用 Java 在二进制文件中搜索字节序列

提问by Bassam

采纳答案by janko

回答by joseluisbz

回答by L. Blanc

回答by riversun

相关推荐

最近更新

标签

使用 Java 在二进制文件中搜索字节序列

提问by Bassam

采纳答案by janko

回答by joseluisbz

回答by L. Blanc

回答by riversun

相关推荐

在java中用空格标记一个字符串

Java 更改 JTextArea 的背景颜色会引发异常

java中最常用的运行时异常有哪些？

字符串中 hh:mm:ss 的 Java 正则表达式匹配

相关推荐

最近更新

标签