Java 数组,查找重复项

声明:本页面是StackOverFlow热门问题的中英对照翻译,遵循CC BY-SA 4.0协议,如果您需要使用它,必须同样遵循CC BY-SA许可,注明原文地址和作者信息,同时你必须将它归于原作者(不是我):StackOverFlow 原文地址: http://stackoverflow.com/questions/3951547/
Warning: these are provided under cc-by-sa 4.0 license. You are free to use/share it, But you must attribute it to the original authors (not me): StackOverFlow

提示:将鼠标放在中文语句上可以显示对应的英文。显示中英文
时间:2020-08-14 07:29:34  来源:igfitidea点击:

Java Array, Finding Duplicates

javaarrays

提问by Snowman

I have an array, and am looking for duplicates.

我有一个数组,正在寻找重复项。

duplicates = false;
for(j = 0; j < zipcodeList.length; j++){
    for(k = 0; k < zipcodeList.length; k++){
        if (zipcodeList[k] == zipcodeList[j]){
            duplicates = true;
        }
    }
}

However, this code doesnt work when there are no duplicates. Whys that?

但是,当没有重复项时,此代码不起作用。为什么?

采纳答案by andersoj

On the nose answer..

在鼻子上回答..

duplicates=false;
for (j=0;j<zipcodeList.length;j++)
  for (k=j+1;k<zipcodeList.length;k++)
    if (k!=j && zipcodeList[k] == zipcodeList[j])
      duplicates=true;

Edited to switch .equals()back to ==since I read somewhere you're using int, which wasn't clear in the initial question. Also to set k=j+1, to halve execution time, but it's still O(n2).

编辑切换.equals()回,==因为我在某个地方阅读了您正在使用的内容int,这在最初的问题中不清楚。还要 set k=j+1,将执行时间减半,但它仍然是 O(n 2)。

A faster (in the limit) way

更快(在极限中)的方式

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

这是一种基于哈希的方法。您必须为自动装箱付费,但它是 O(n) 而不是 O(n 2)。一个有进取心的人会去寻找一个原始的基于 int 的哈希集(我认为 Apache 或 Google Collections 有这样的东西。)

boolean duplicates(final int[] zipcodelist)
{
  Set<Integer> lump = new HashSet<Integer>();
  for (int i : zipcodelist)
  {
    if (lump.contains(i)) return true;
    lump.add(i);
  }
  return false;
}

Bow to HuyLe

向 HuyLe 鞠躬

See HuyLe's answerfor a more or less O(n) solution, which I think needs a couple of add'l steps:

有关或多或少 O(n) 的解决方案,请参阅HuyLe 的答案,我认为这需要几个附加步骤:

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];
   java.util.Arrays.fill(bitmap, false);
   for (int item : zipcodeList)
     if (!bitmap[item]) bitmap[item] = true;
     else return true;
   }
   return false;
}

Or Just to be Compact

或者只是为了紧凑

static boolean duplicates(final int[] zipcodelist)
{
   final int MAXZIP = 99999;
   boolean[] bitmap = new boolean[MAXZIP+1];  // Java guarantees init to false
   for (int item : zipcodeList)
     if (!(bitmap[item] ^= true)) return true;
   return false;
}

Does it Matter?

有关系吗?

Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:

好吧,所以我运行了一个小基准,它到处都是不确定的,但这是代码:

import java.util.BitSet;

class Yuk
{
  static boolean duplicatesZero(final int[] zipcodelist)
  {
    boolean duplicates=false;
    for (int j=0;j<zipcodelist.length;j++)
      for (int k=j+1;k<zipcodelist.length;k++)
        if (k!=j && zipcodelist[k] == zipcodelist[j])
          duplicates=true;

    return duplicates;
  }


  static boolean duplicatesOne(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;
    boolean[] bitmap = new boolean[MAXZIP + 1];
    java.util.Arrays.fill(bitmap, false);
    for (int item : zipcodelist) {
      if (!(bitmap[item] ^= true))
        return true;
    }
    return false;
  }

  static boolean duplicatesTwo(final int[] zipcodelist)
  {
    final int MAXZIP = 99999;

    BitSet b = new BitSet(MAXZIP + 1);
    b.set(0, MAXZIP, false);
    for (int item : zipcodelist) {
      if (!b.get(item)) {
        b.set(item, true);
      } else
        return true;
    }
    return false;
  }

  enum ApproachT { NSQUARED, HASHSET, BITSET};

  /**
   * @param args
   */
  public static void main(String[] args)
  {
    ApproachT approach = ApproachT.BITSET;

    final int REPS = 100;
    final int MAXZIP = 99999;

    int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
    long[][] times = new long[sizes.length][REPS];

    boolean tossme = false;

    for (int sizei = 0; sizei < sizes.length; sizei++) {
      System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
      for (int rep = 0; rep < REPS; rep++) {
        int[] zipcodelist = new int[sizes[sizei]];
        for (int i = 0; i < zipcodelist.length; i++) {
          zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
        }
        long begin = System.currentTimeMillis();
        switch (approach) {
        case NSQUARED :
          tossme ^= (duplicatesZero(zipcodelist));
          break;
        case HASHSET :
          tossme ^= (duplicatesOne(zipcodelist));
          break;
        case BITSET :
          tossme ^= (duplicatesTwo(zipcodelist));
          break;

        }
        long end = System.currentTimeMillis();
        times[sizei][rep] = end - begin;


      }
      long avg = 0;
      for (int rep = 0; rep < REPS; rep++) {
        avg += times[sizei][rep];
      }
      System.err.println("Size=" + sizes[sizei] + ", avg time = "
            + avg / (double)REPS + "ms");
    }
  }

}

With NSQUARED:

使用 NSQUARED:

Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms

With HashSet

使用哈希集

Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

With BitSet

使用位集

Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

BITSET Wins!

BITSET 赢了!

But only by a hair... .15ms is within the error for currentTimeMillis(), and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return truebecause there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:

但只有一根头发...... .15ms 在误差范围内currentTimeMillis(),并且我的基准测试中有一些大漏洞。请注意,对于任何超过 100000 的列表,您可以简单地返回,true因为会有重复。事实上,如果列表是随机的,您可以为更短的列表返回真正的 WHP。什么是道德?在极限情况下,最有效的实现是:

 return true;

And you won't be wrong very often.

而且你不会经常出错。

回答by codaddict

To check for duplicates you need to compare distinctpairs.

要检查重复项,您需要比较不同的对。

回答by james_bond

Cause you are comparing the first element of the array against itself so It finds that there are duplicates even where there aren't.

因为您正在将数组的第一个元素与其自身进行比较,因此即使没有重复,它也会发现存在重复项。

回答by Lie Ryan

Let's see how your algorithm works:

让我们看看你的算法是如何工作的:

an array of unique values:

[1, 2, 3]

check 1 == 1. yes, there is duplicate, assigning duplicate to true.
check 1 == 2. no, doing nothing.
check 1 == 3. no, doing nothing.
check 2 == 1. no, doing nothing.
check 2 == 2. yes, there is duplicate, assigning duplicate to true.
check 2 == 3. no, doing nothing.
check 3 == 1. no, doing nothing.
check 3 == 2. no, doing nothing.
check 3 == 3. yes, there is duplicate, assigning duplicate to true.

a better algorithm:

更好的算法:

for (j=0;j<zipcodeList.length;j++) {
    for (k=j+1;k<zipcodeList.length;k++) {
        if (zipcodeList[k]==zipcodeList[j]){ // or use .equals()
            return true;
        }
    }
}
return false;

回答by KitsuneYMG

Don't use ==use .equals.

不要使用==use .equals

try this instead (IIRC, ZipCode needs to implement Comparablefor this to work.

试试这个(IIRC,ZipCode 需要实现Comparable它才能工作。

boolean unique;
Set<ZipCode> s = new TreeSet<ZipCode>();
for( ZipCode zc : zipcodelist )
    unique||=s.add(zc);
duplicates = !unique;

回答by doobop

Initialize k = j+1. You won't compare elements to themselves and you'll also not duplicate comparisons. For example, j = 0, k = 1 and k = 0, j = 1 compare the same set of elements. This would remove the k = 0, j = 1 comparison.

初始化 k = j+1。您不会将元素与其自身进行比较,也不会重复比较。例如,j = 0, k = 1 和 k = 0, j = 1 比较同一组元素。这将删除 k = 0, j = 1 比较。

回答by Huy Le

You can use bitmap for better performance with large array.

您可以使用位图来提高大型数组的性能。

    java.util.Arrays.fill(bitmap, false);

    for (int item : zipcodeList)
        if (!bitmap[item]) bitmap[item] = true;
        else break;

UPDATE: This is a very negligent answer of mine back in the day, keeping it here just for reference. You should refer to andersoj's excellent answer.

更新:这是我当时非常疏忽的回答,将其保留在这里仅供参考。您应该参考 andersoj 的出色回答

回答by Sephiroth

How about using this method?

用这个方法怎么样?

HashSet<Integer> zipcodeSet = new HashSet<Integer>(Arrays.asList(zipcodeList));
duplicates = zipcodeSet.size()!=zipcodeList.length;

回答by Rolando F

You can also work with Set, which doesn't allow duplicates in Java..

您还可以使用 Set,它不允许在 Java 中重复。

    for (String name : names)
    {         
      if (set.add(name) == false) 
         { // your duplicate element }
    }

using add() method and check return value. If add() returns false it means that element is not allowed in the Set and that is your duplicate.

使用 add() 方法并检查返回值。如果 add() 返回 false,则表示 Set 中不允许该元素,并且该元素是您的重复项。

回答by Huy Hóm H?nh

@andersoj gave a great answer, but I also want add new simple way

@andersoj 给出了一个很好的答案,但我也想添加新的简单方法

    private boolean checkDuplicateBySet(Integer[] zipcodeList) {
        Set<Integer> zipcodeSet = new HashSet(Arrays.asList(zipcodeList));
        if (zipcodeSet.size() == zipcodeList.length) {
            return true;
        }
        return false;
    }

In case zipcodeList is int[], you need convert int[] to Integer[] first(It not auto-boxing), code here

如果 zipcodeList 是 int[],您需要先将 int[] 转换为 Integer[](它不是自动装箱),代码在这里

Complete code will be:

完整的代码将是:

    private boolean checkDuplicateBySet2(int[] zipcodeList) {
        Integer[] zipcodeIntegerArray = new Integer[zipcodeList.length];
        for (int i = 0; i < zipcodeList.length; i++) {
            zipcodeIntegerArray[i] = Integer.valueOf(zipcodeList[i]);
        }

        Set<Integer> zipcodeSet = new HashSet(Arrays.asList(zipcodeIntegerArray));
        if (zipcodeSet.size() == zipcodeList.length) {
            return true;
        }
        return false;
    }

Hope this helps!

希望这可以帮助!