Java: Find an array inside another larger array
Original source: http://stackoverflow.com/questions/3940194/
Warning: this content is provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must attribute it to the original authors (not me):
StackOverflow
Find an array inside another larger array
Asked by Roman
I was recently asked to write 3 test programs for a job. They would be written using just core Java API's and any test framework of my choice. Unit tests should be implemented where appropriate.
Although I haven't received any feedback at all, I suppose they didn't like my solutions (otherwise I would have heard from them), so I decided to show my programs here and ask if this implementation can be considered good, and, if not, then why?
To avoid confusion, I'll ask only about the first one for now.
Implement a function that finds an array in another larger array. It should accept two arrays as parameters and it will return the index of the first array where the second array first occurs in full. Eg, findArray([2,3,7,1,20], [7,1]) should return 2.
I didn't try to find any existing solution, but instead wanted to do it myself.
Possible reasons:
1. Should be static.
2. Should use line comments instead of block ones.
3. Didn't check for null values first (I know, just spotted too late).
4. ?
UPDATE:
Quite a few reasons have been presented, and it's very difficult for me to choose one answer as many answers have a good solution. As @adietrich mentioned, I tend to believe they wanted me to demonstrate knowledge of core API (they even asked to write a function, not to write an algorithm).
I believe the best way to secure the job would have been to provide as many solutions as possible, including:
1. Implement using the Collections.indexOfSubList() method, to show that I know the core collections API.
2. Implement using the brute-force approach, but provide a more elegant solution.
3. Implement using a search algorithm, for example Boyer-Moore.
4. Implement using a combination of System.arraycopy() and Arrays.equals(). Although not the best solution in terms of performance, it would show my knowledge of standard array routines.
Thank you all for your answers!
END OF UPDATE.
Here is what I wrote:
Actual program:
package com.example.common.utils;
/**
* This class contains functions for array manipulations.
*
* @author Roman
*
*/
public class ArrayUtils {
/**
* Finds a sub array in a large array
*
* @param largeArray
* @param subArray
* @return index of sub array
*/
public int findArray(int[] largeArray, int[] subArray) {
/* If any of the arrays is empty then not found */
if (largeArray.length == 0 || subArray.length == 0) {
return -1;
}
/* If subarray is larger than large array then not found */
if (subArray.length > largeArray.length) {
return -1;
}
for (int i = 0; i < largeArray.length; i++) {
/* Check if the next element of large array is the same as the first element of subarray */
if (largeArray[i] == subArray[0]) {
boolean subArrayFound = true;
for (int j = 0; j < subArray.length; j++) {
/* If outside of large array or elements not equal then leave the loop */
if (largeArray.length <= i+j || subArray[j] != largeArray[i+j]) {
subArrayFound = false;
break;
}
}
/* Sub array found - return its index */
if (subArrayFound) {
return i;
}
}
}
/* Return default value */
return -1;
}
}
Test code:
package com.example.common.utils;
import com.example.common.utils.ArrayUtils;
import junit.framework.TestCase;
public class ArrayUtilsTest extends TestCase {
private ArrayUtils arrayUtils = new ArrayUtils();
public void testFindArrayDoesntExist() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {8,9,10};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistSimple() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {3,4,5};
int expected = 2;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistFirstPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {1,2,3};
int expected = 0;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistLastPosition() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {5,6,7};
int expected = 4;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayDoesntExistPartiallyEqual() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {6,7,8};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistPartiallyEqual() {
int[] largeArray = {1,2,3,1,2,3,4,5,6,7};
int[] subArray = {1,2,3,4};
int expected = 3;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayEmpty() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArraySubArrayLargerThanArray() {
int[] largeArray = {1,2,3,4,5,6,7};
int[] subArray = {4,5,6,7,8,9,10,11};
int expected = -1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
public void testFindArrayExistsVeryComplex() {
int[] largeArray = {1234, 56, -345, 789, 23456, 6745};
int[] subArray = {56, -345, 789};
int expected = 1;
int actual = arrayUtils.findArray(largeArray, subArray);
assertEquals(expected, actual);
}
}
Accepted answer by adietrich
The requirement of "using just core Java API's" could also mean that they wanted to see whether you would reinvent the wheel. So in addition to your own implementation, you could give the one-line solution, just to be safe:
public static int findArray(Integer[] array, Integer[] subArray)
{
return Collections.indexOfSubList(Arrays.asList(array), Arrays.asList(subArray));
}
It may or may not be a good idea to point out that the example given contains invalid array literals.
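A note on the one-liner above: it takes Integer[] rather than int[], and passing an int[] to Arrays.asList yields a one-element List<int[]>, so primitive arrays have to be boxed first. The following is only a minimal sketch, assuming Java 8 or later; the class name BoxedSearch and the helper box are made up for illustration. It also writes the example from the task with valid Java array literals.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

final class BoxedSearch {
    // Boxes an int[] into a List<Integer>. Arrays.asList(int[]) would not do
    // this: it produces a List<int[]> with a single element.
    static List<Integer> box(int[] values) {
        return Arrays.stream(values).boxed().collect(Collectors.toList());
    }

    // Hypothetical int[] wrapper around Collections.indexOfSubList.
    static int findArray(int[] array, int[] subArray) {
        return Collections.indexOfSubList(box(array), box(subArray));
    }

    public static void main(String[] args) {
        int[] large = {2, 3, 7, 1, 20};
        int[] sub = {7, 1};
        System.out.println(findArray(large, sub)); // prints 2
    }
}
```

Be aware that Collections.indexOfSubList returns 0 for an empty target list, which differs from the -1 convention used in the question.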
Answer by Maurice Perry
I would suggest the following improvements:
- make the function static so that you can avoid creating an instance
- the outer loop condition could be i <= largeArray.length-subArray.length, to avoid a test inside the loop
- remove the test (largeArray[i] == subArray[0]) that is redundant
Answer by EboMike
Well, off the top of my head:
1. Yes, should be static.
2. A company complaining about that would not be worth working for.
3. Yeah, but what would you do? Return? Or throw an exception? It'll throw an exception the way it is already.
4. I think the main problem is that your code is not very elegant. Too many checks in the inner loop. Too many redundant checks.
Just raw, off the top of my head:
public int findArray(int[] largeArray, int[] subArray) {
    int subArrayLength = subArray.length;
    if (subArrayLength == 0) {
        return -1;
    }
    /* Last start index at which the sub array still fits */
    int limit = largeArray.length - subArrayLength;
    for (int i = 0; i <= limit; i++) {
        boolean subArrayFound = true;
        for (int j = 0; j < subArrayLength; j++) {
            if (subArray[j] != largeArray[i+j]) {
                subArrayFound = false;
                break;
            }
        }
        /* Sub array found - return its index */
        if (subArrayFound) {
            return i;
        }
    }
    /* Return default value */
    return -1;
}
You could keep that check for the first element so you don't have the overhead of setting up the boolean and the for loop for every single element in the array. Then you'd be looking at
public int findArray(int[] largeArray, int[] subArray) {
    int subArrayLength = subArray.length;
    if (subArrayLength == 0) {
        return -1;
    }
    /* Last start index at which the sub array still fits */
    int limit = largeArray.length - subArrayLength;
    for (int i = 0; i <= limit; i++) {
        /* Only compare the rest when the first element matches */
        if (subArray[0] == largeArray[i]) {
            boolean subArrayFound = true;
            for (int j = 1; j < subArrayLength; j++) {
                if (subArray[j] != largeArray[i+j]) {
                    subArrayFound = false;
                    break;
                }
            }
            /* Sub array found - return its index */
            if (subArrayFound) {
                return i;
            }
        }
    }
    /* Return default value */
    return -1;
}
Answer by Emil
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
for(int i=0;i<=lim;i++)
{
int[] tmpArr=Arrays.copyOfRange(arr,i,i+subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
UPDATE:
By reusing the same int array instance:
int findSubArr(int[] arr,int[] subarr)
{
int lim=arr.length-subarr.length;
int[] tmpArr=new int[subarr.length];
for(int i=0;i<=lim;i++)
{
System.arraycopy(arr,i,tmpArr,0,subarr.length);
if(Arrays.equals(tmpArr,subarr))
return i; //returns starting index of sub array
}
return -1;//return -1 on finding no sub-array
}
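As a possible further step in the same direction, the temporary array can be dropped altogether. This is only a sketch and assumes Java 9 or later, where Arrays.equals has an overload that compares two index ranges in place; the class name RangeSearch is made up for illustration.

```java
import java.util.Arrays;

final class RangeSearch {
    // Compares each window of arr with subarr in place, without copying.
    // Assumption: Java 9+ for the ranged Arrays.equals overload.
    static int findSubArr(int[] arr, int[] subarr) {
        int lim = arr.length - subarr.length;
        for (int i = 0; i <= lim; i++) {
            // Ranges are [i, i + subarr.length) in arr and [0, subarr.length) in subarr.
            if (Arrays.equals(arr, i, i + subarr.length, subarr, 0, subarr.length)) {
                return i; // starting index of the sub array
            }
        }
        return -1; // no sub array found
    }
}
```

Like the two versions above, this returns 0 for an empty subarr, because an empty range matches at the first position.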
Answer by Marc
For finding an array of integers in a larger array of integers, you can use the same kind of algorithms as for finding a substring in a larger string. Many algorithms are known for this (see Wikipedia). The Boyer-Moore string search in particular is efficient for large arrays. The algorithm that you are trying to implement is not very efficient (Wikipedia calls this the 'naive' implementation).
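To make the string-search idea concrete, here is a rough sketch of the Horspool simplification of Boyer-Moore, adapted to int arrays. It is not the full Boyer-Moore algorithm (only a bad-character style shift is used), and the class name HorspoolSearch and the -1 result for an empty needle are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;

final class HorspoolSearch {
    // Returns the index of the first occurrence of needle in haystack,
    // or -1 if there is none (also for an empty needle, by assumption).
    static int indexOf(int[] haystack, int[] needle) {
        int n = haystack.length;
        int m = needle.length;
        if (m == 0 || m > n) {
            return -1;
        }
        // Shift table: for each value in needle[0..m-2], the distance from its
        // last occurrence to the end of the needle. A map is used instead of a
        // fixed-size table because int values are not a small alphabet.
        Map<Integer, Integer> shift = new HashMap<>();
        for (int k = 0; k < m - 1; k++) {
            shift.put(needle[k], m - 1 - k);
        }
        int i = 0;
        while (i <= n - m) {
            // Compare the current window from right to left.
            int j = m - 1;
            while (j >= 0 && haystack[i + j] == needle[j]) {
                j--;
            }
            if (j < 0) {
                return i;
            }
            // Shift according to the value aligned with the needle's last
            // position; values that do not occur in the needle skip m places.
            i += shift.getOrDefault(haystack[i + m - 1], m);
        }
        return -1;
    }
}
```

The payoff comes from the skips: when the value under the end of the window does not occur in the needle at all, the window moves forward by the whole needle length instead of by one.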
For your questions:
- Yes, such a method should be static
- Don't care, that's a question of taste
- The null check can be included, or you should state in the JavaDoc that null values are not allowed, or JavaDoc should state that when either parameter is null a NullPointerException will be thrown.
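The last point could look roughly like the following: an explicit null check together with JavaDoc that states the contract. This is a sketch only; the class name NullCheckedSearch, the messages, and the choice of -1 for an empty sub array are assumptions, not something the question prescribes.

```java
import java.util.Objects;

final class NullCheckedSearch {
    /**
     * Returns the index of the first occurrence of subArray in largeArray,
     * or -1 if there is none or if subArray is empty.
     *
     * @throws NullPointerException if either argument is null
     */
    static int findArray(int[] largeArray, int[] subArray) {
        // Fail fast with a descriptive message instead of an accidental NPE.
        Objects.requireNonNull(largeArray, "largeArray must not be null");
        Objects.requireNonNull(subArray, "subArray must not be null");
        if (subArray.length == 0) {
            return -1;
        }
        int limit = largeArray.length - subArray.length;
        outer:
        for (int i = 0; i <= limit; i++) {
            for (int j = 0; j < subArray.length; j++) {
                if (largeArray[i + j] != subArray[j]) {
                    continue outer; // mismatch: try the next start index
                }
            }
            return i;
        }
        return -1;
    }
}
```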
Answer by Antonio
A slightly optimized version of the code that was posted before:
public int findArray(byte[] largeArray, byte[] subArray) {
if (subArray.length == 0) {
return -1;
}
int limit = largeArray.length - subArray.length;
next:
for (int i = 0; i <= limit; i++) {
for (int j = 0; j < subArray.length; j++) {
if (subArray[j] != largeArray[i+j]) {
continue next;
}
}
/* Sub array found - return its index */
return i;
}
/* Return default value */
return -1;
}
Answer by Kong
Here's #indexOf from String:
/**
* Code shared by String and StringBuffer to do searches. The
* source is the character array being searched, and the target
* is the string being searched for.
*
* @param source the characters being searched.
* @param sourceOffset offset of the source string.
* @param sourceCount count of the source string.
* @param target the characters being searched for.
* @param targetOffset offset of the target string.
* @param targetCount count of the target string.
* @param fromIndex the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount,
int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first);
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end && source[j]
== target[k]; j++, k++);
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}
Answer by Kumar Gaurav
Following is an approach using the KMP pattern matching algorithm. This solution takes O(n+m), where n = length of large array and m = length of sub array. For more information, check:
https://en.wikipedia.org/wiki/KMP_algorithm
Brute force takes O(n*m). I just checked that the Collections.indexOfSubList method is also O(n*m).
public static int subStringIndex(int[] largeArray, int[] subArray) {
if (largeArray.length == 0 || subArray.length == 0){
throw new IllegalArgumentException();
}
if (subArray.length > largeArray.length){
throw new IllegalArgumentException();
}
int[] prefixArr = getPrefixArr(subArray);
int indexToReturn = -1;
for (int m = 0, s = 0; m < largeArray.length; m++) {
if (subArray[s] == largeArray[m]) {
s++;
} else {
if (s != 0) {
s = prefixArr[s - 1];
m--;
}
}
if (s == subArray.length) {
indexToReturn = m - subArray.length + 1;
break;
}
}
return indexToReturn;
}
private static int[] getPrefixArr(int[] subArray) {
int[] prefixArr = new int[subArray.length];
prefixArr[0] = 0;
for (int i = 1, j = 0; i < prefixArr.length; i++) {
while (subArray[i] != subArray[j]) {
if (j == 0) {
break;
}
j = prefixArr[j - 1];
}
if (subArray[i] == subArray[j]) {
prefixArr[i] = j + 1;
j++;
} else {
prefixArr[i] = j;
}
}
return prefixArr;
}
Answer by Amit Kumar Sharma
Clean and improved code
public static int findArrayIndex(int[] subArray, int[] parentArray) {
    if (subArray.length == 0) {
        return -1;
    }
    // Last start index at which subArray still fits into parentArray
    int l = parentArray.length - subArray.length;
    for (int i = 0; i <= l; i++) {
        if (parentArray[i] == subArray[0]) {
            // Number of elements still to be matched at this start index
            int sL = subArray.length;
            for (int j = 0; j < subArray.length; j++) {
                if (parentArray[i + j] != subArray[j]) {
                    break;
                }
                sL--;
            }
            if (sL == 0) {
                return i;
            }
        }
    }
    return -1;
}
Answer by xehpuk
First to your possible reasons:
- Yes. And the class final with a private constructor.
- Shouldn't use this kind of comments at all. The code should be self-explanatory.
- You're basically implicitly checking for null by accessing the length field, which will throw a NullPointerException. Only in the case of a largeArray.length == 0 and a subArray == null will this slip through.
More potential reasons:
- The class doesn't contain any function for array manipulations, contrary to what the documentation says.
- The documentation for the method is very sparse. It should state when and which exceptions are thrown (e.g. NullPointerException) and which return value to expect if the second array isn't found or if it is empty.
- The code is more complex than needed.
  - Why is the equality of the first elements so important that it gets its own check?
  - In the first loop, it is assumed that the second array will be found, which is unintentional.
  - Unneeded variable and jump (boolean and break), further reducing legibility.
  - largeArray.length <= i+j is not easy to grasp. Should be checked before the loop, improving the performance along the way.
  - I'd swap the operands of subArray[j] != largeArray[i+j]. Seems more natural to me.
  - All in all too long.
- The test code is lacking more edge cases (null arrays, first array empty, both arrays empty, first array contained in second array, second array contained multiple times etc.).
- Why is the last test case named testFindArrayExistsVeryComplex?
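The edge cases named in the list could be exercised along these lines. To stay self-contained, this sketch uses plain checks instead of a test framework and a compact reference implementation; the class name EdgeCaseChecks, the helper check, and the expected values (-1 for "not found" and for an empty sub array) are assumptions chosen for illustration.

```java
final class EdgeCaseChecks {
    // Compact reference implementation, only here to make the checks runnable.
    static int findArray(int[] largeArray, int[] subArray) {
        if (subArray.length == 0) {
            return -1;
        }
        int limit = largeArray.length - subArray.length;
        for (int i = 0; i <= limit; i++) {
            int j = 0;
            while (j < subArray.length && largeArray[i + j] == subArray[j]) {
                j++;
            }
            if (j == subArray.length) {
                return i;
            }
        }
        return -1;
    }

    static void check(int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError("expected " + expected + " but got " + actual);
        }
    }

    public static void main(String[] args) {
        // First array empty
        check(-1, findArray(new int[] {}, new int[] {1}));
        // Both arrays empty
        check(-1, findArray(new int[] {}, new int[] {}));
        // First array contained in second array
        check(-1, findArray(new int[] {2, 3}, new int[] {1, 2, 3, 4}));
        // Second array contained multiple times: the first occurrence wins
        check(1, findArray(new int[] {0, 1, 2, 1, 2}, new int[] {1, 2}));
        // Null first array: largeArray.length dereferences null
        try {
            findArray(null, new int[] {1});
            throw new AssertionError("expected NullPointerException");
        } catch (NullPointerException expectedException) {
            // expected
        }
    }
}
```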
What the exercise is missing is a specification of the component type of the array parameters, or rather the signature of the method. It makes a huge difference whether the component type is a primitive type or a reference type. The solution of adietrich assumes a reference type (and thus could be generified as a further improvement), mine assumes a primitive type (int).
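For the reference-type case, a generified variant might look like the sketch below. The class name GenericArrayUtils is made up, and comparing elements with Objects.equals (so that null elements are tolerated) is a design assumption rather than anything the exercise specifies.

```java
import java.util.Objects;

final class GenericArrayUtils {
    // Generic variant for reference types. Elements are compared with
    // Objects.equals rather than ==, which matters for boxed values
    // such as Integer outside the small-value cache.
    static <T> int indexOf(T[] haystack, T[] needle) {
        for (int i = 0; i <= haystack.length - needle.length; i++) {
            if (containsAt(haystack, needle, i)) {
                return i;
            }
        }
        return -1;
    }

    private static <T> boolean containsAt(T[] haystack, T[] needle, int offset) {
        for (int i = 0; i < needle.length; i++) {
            if (!Objects.equals(haystack[i + offset], needle[i])) {
                return false;
            }
        }
        return true;
    }

    private GenericArrayUtils() {}
}
```

With this shape an empty needle is reported at index 0, the same convention String.indexOf uses for an empty search string.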
So here's my shot, concentrating on the code / disregarding documentation and tests:
public final class ArrayUtils {
// main method
public static int indexOf(int[] haystack, int[] needle) {
return indexOf(haystack, needle, 0);
}
// helper methods
private static int indexOf(int[] haystack, int[] needle, int fromIndex) {
for (int i = fromIndex; i <= haystack.length - needle.length; i++) {
if (containsAt(haystack, needle, i)) {
return i;
}
}
return -1;
}
private static boolean containsAt(int[] haystack, int[] needle, int offset) {
for (int i = 0; i < needle.length; i++) {
if (haystack[i + offset] != needle[i]) {
return false;
}
}
return true;
}
// prevent initialization
private ArrayUtils() {}
}