Java 8, Streams to find the duplicate elements

Notice: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same license, cite the original URL and author information, and attribute it to the original authors (not me). StackOverflow original: http://stackoverflow.com/questions/27677256/

Time: 2020-08-11 04:50:10  Source: igfitidea

Tags: java, lambda, java-8, java-stream

Asked by Siva

I am trying to list the duplicate elements in a list of integers, for example:

List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});    

using the Streams of JDK 8. Has anybody tried this? To remove the duplicates we can use the distinct() API. But what about finding the duplicated elements? Can anybody help me out?

Answered by Oussama Zoghlami

You can get the duplicates like this:

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicated = numbers
  .stream()
  .filter(n -> numbers
        .stream()
        .filter(x -> x.equals(n)) // equals(), not ==: == compares Integer references
        .count() > 1)
   .collect(Collectors.toSet());

Answered by Dave

This needs an extra set (allItems below) that ends up holding every distinct element of the array, but it runs in O(n):

Integer[] numbers = new Integer[] { 1, 2, 1, 3, 4, 4 };
Set<Integer> allItems = new HashSet<>();
Set<Integer> duplicates = Arrays.stream(numbers)
        .filter(n -> !allItems.add(n)) //Set.add() returns false if the item was already in the set.
        .collect(Collectors.toSet());
System.out.println(duplicates); // [1, 4]

Answered by RobAu

Basic example. The first half builds the frequency map; the second half reduces it to a filtered list. It is probably not as efficient as Dave's answer, but it is more versatile (for example, if you want to detect elements that occur exactly twice).

     List<Integer> duplicates = IntStream.of( 1, 2, 3, 2, 1, 2, 3, 4, 2, 2, 2 )
       .boxed()
       .collect( Collectors.groupingBy( Function.identity(), Collectors.counting() ) )
       .entrySet()
       .stream()
       .filter( p -> p.getValue() > 1 )
       .map( Map.Entry::getKey )
       .collect( Collectors.toList() );
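
To illustrate the "more versatile" point, here is a minimal sketch that keeps only the values occurring exactly a given number of times. The class name FrequencyFilter and the method occurringExactly are made up for this example; the result is sorted only to make the output predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class FrequencyFilter {

    // Hypothetical helper: returns the values that occur exactly `times` times, sorted.
    static List<Integer> occurringExactly(List<Integer> numbers, long times) {
        // First half: build the frequency map (value -> number of occurrences).
        Map<Integer, Long> frequency = numbers.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Second half: keep only the keys whose count matches.
        return frequency.entrySet().stream()
                .filter(e -> e.getValue() == times)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 2, 1, 2, 3, 4, 2, 2, 2);
        System.out.println(occurringExactly(numbers, 2)); // [1, 3]
    }
}
```

Changing the filter to `e.getValue() > 1` gives back the original "all duplicates" behaviour.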

Answered by Thomas Mathew

An O(n) way would be as below:

List<Integer> numbers = Arrays.asList(1, 2, 1, 3, 4, 4);
Set<Integer> duplicatedNumbersRemovedSet = new HashSet<>();
Set<Integer> duplicatedNumbersSet = numbers.stream().filter(n -> !duplicatedNumbersRemovedSet.add(n)).collect(Collectors.toSet());

This approach roughly doubles the space used, but that space is not wasted: we end up with one Set containing only the duplicated values and another Set containing all the values with the duplicates removed.
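
A small sketch of that point, showing what each of the two sets holds afterwards. The class name TwoSets and the method collectDuplicates are hypothetical, and the sketch assumes a sequential stream because the filter mutates the `seen` set.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class TwoSets {

    // Hypothetical helper: fills `seen` with every distinct value and returns the duplicated ones.
    static Set<Integer> collectDuplicates(List<Integer> numbers, Set<Integer> seen) {
        // Set.add returns false when the value is already present, so the filter
        // lets through only the second and later occurrences.
        // Assumption: sequential stream, since the filter mutates `seen`.
        return numbers.stream()
                .filter(n -> !seen.add(n))
                .collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        Set<Integer> duplicated = collectDuplicates(Arrays.asList(1, 2, 1, 3, 4, 4), seen);
        System.out.println(duplicated); // the duplicated values: 1 and 4
        System.out.println(seen);       // the distinct values: 1, 2, 3 and 4
    }
}
```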

Answered by Tagir Valeev

My StreamEx library, which enhances the Java 8 streams, provides a special operation distinct(atLeast) that retains only the elements appearing at least the specified number of times. So your problem can be solved like this:

List<Integer> repeatingNumbers = StreamEx.of(numbers).distinct(2).toList();

Internally it is similar to @Dave's solution, but it counts objects in order to support other wanted quantities, and it is parallel-friendly (it uses ConcurrentHashMap for a parallel stream, but HashMap for a sequential one). For large amounts of data you can get a speed-up using .parallel().distinct(2).
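
For readers without StreamEx on the classpath, the same idea can be approximated with the plain JDK. This is only a sketch of the counting technique, not StreamEx's actual implementation; the class and method names are made up, and the TreeSet is used only to give a predictable order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class AtLeast {

    // Hypothetical helper, plain JDK only: values occurring at least `atLeast` times.
    static Set<Integer> atLeast(List<Integer> numbers, long atLeast) {
        // groupingByConcurrent accumulates into one ConcurrentMap, which suits parallel streams.
        ConcurrentMap<Integer, Long> counts = numbers.parallelStream()
                .collect(Collectors.groupingByConcurrent(Function.identity(), Collectors.counting()));
        // Keep the keys whose count reaches the threshold.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() >= atLeast)
                .map(Map.Entry::getKey)
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(atLeast(Arrays.asList(1, 2, 1, 3, 4, 4), 2)); // [1, 4]
    }
}
```

Unlike the side-effecting filter in the earlier answers, this version does not depend on the stream being sequential.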

Answered by Bao Dinh

You can use Collections.frequency:

numbers.stream().filter(i -> Collections.frequency(numbers, i) >1)
                .collect(Collectors.toSet()).forEach(System.out::println);

Answered by Zhurov Konstantin

I think I have a good solution for a problem like this: List => List with grouping by Something.a and Something.b. Here is the extended definition:

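import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class Test {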

    public static void test() {

        class A {
            private int a;
            private int b;
            private float c;
            private float d;

            public A(int a, int b, float c, float d) {
                this.a = a;
                this.b = b;
                this.c = c;
                this.d = d;
            }
        }


        List<A> list1 = new ArrayList<A>();

        list1.addAll(Arrays.asList(new A(1, 2, 3, 4),
                new A(2, 3, 4, 5),
                new A(1, 2, 3, 4),
                new A(2, 3, 4, 5),
                new A(1, 2, 3, 4)));

        Map<Integer, A> map = list1.stream()
                .collect(HashMap::new, (m, v) -> m.put(
                        Objects.hash(v.a, v.b, v.c, v.d), v),
                        HashMap::putAll);

        list1.clear();
        list1.addAll(map.values());

        System.out.println(list1);
    }

}

Class A and list1 are just the incoming data; the magic is in Objects.hash(...) :)
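
One thing to keep in mind is that two different field combinations can, in principle, produce the same Objects.hash value, in which case a map keyed on the hash would merge them. A possible variation, sketched below under that assumption, keys the map on a list of the field values instead. The Item class and the method name are hypothetical stand-ins for the answer's class A, reduced to two fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DedupByFields {

    // Hypothetical stand-in for the answer's class A, reduced to two fields.
    static class Item {
        final int a;
        final int b;
        Item(int a, int b) { this.a = a; this.b = b; }
    }

    // Keeps the first item for each (a, b) combination, preserving encounter order.
    static List<Item> distinctByFields(List<Item> items) {
        Map<List<Integer>, Item> byKey = items.stream()
                .collect(Collectors.toMap(
                        item -> Arrays.asList(item.a, item.b), // key compares the field values themselves
                        item -> item,
                        (first, second) -> first,              // on a duplicate key, keep the first item
                        LinkedHashMap::new));
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(new Item(1, 2), new Item(2, 3), new Item(1, 2));
        System.out.println(distinctByFields(items).size()); // 2
    }
}
```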

Answered by Victor

Do you have to use the Java 8 idioms (streams)? Perhaps a simple solution would be to move the complexity to a map-like data structure that holds each number as a key (without repetition) and the number of times it occurs as the value. You could then iterate over that map and only do something with the numbers that occur more than once.

import java.lang.Math;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.HashMap;
import java.util.Iterator;

public class RemoveDuplicates
{
  public static void main(String[] args)
  {
   List<Integer> numbers = Arrays.asList(new Integer[]{1,2,1,3,4,4});
   Map<Integer,Integer> countByNumber = new HashMap<Integer,Integer>();
   for(Integer n:numbers)
   {
     Integer count = countByNumber.get(n);
     if (count != null) {
       countByNumber.put(n,count + 1);
     } else {
       countByNumber.put(n,1);
     }
   }
   System.out.println(countByNumber);
   // Only report the numbers that occur more than once.
   for (Map.Entry<Integer,Integer> pair : countByNumber.entrySet()) {
       if (pair.getValue() > 1) {
           System.out.println(pair.getKey() + " = " + pair.getValue());
       }
   }
  }
}
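
The counting loop above can be shortened with Map.merge, which is available since Java 8. The following is a minimal sketch of the same idea; the class name CountWithMerge and the method duplicateCounts are made up for illustration, and a TreeMap is used only so that the printed order is predictable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CountWithMerge {

    // Hypothetical helper: counts occurrences and keeps only the entries with a count above 1.
    static Map<Integer, Integer> duplicateCounts(List<Integer> numbers) {
        Map<Integer, Integer> countByNumber = new TreeMap<>();
        for (Integer n : numbers) {
            // Inserts 1 for a new key, otherwise adds 1 to the existing count.
            countByNumber.merge(n, 1, Integer::sum);
        }
        // Drop the numbers that occur only once.
        countByNumber.values().removeIf(count -> count <= 1);
        return countByNumber;
    }

    public static void main(String[] args) {
        System.out.println(duplicateCounts(Arrays.asList(1, 2, 1, 3, 4, 4))); // {1=2, 4=2}
    }
}
```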

Answered by Ilia Galperin

Try this solution:

import java.util.HashMap;
import java.util.Map;
import java.util.Map.Entry;

public class Anagramm {

public static boolean isAnagramLetters(String word, String anagramm) {
    if (anagramm.isEmpty()) {
        return false;
    }

    Map<Character, Integer> mapExistString = CharCountMap(word);
    Map<Character, Integer> mapCheckString = CharCountMap(anagramm);
    return enoughLetters(mapExistString, mapCheckString);
}

private static Map<Character, Integer> CharCountMap(String chars) {
    HashMap<Character, Integer> charCountMap = new HashMap<Character, Integer>();
    for (char c : chars.toCharArray()) {
        if (charCountMap.containsKey(c)) {
            charCountMap.put(c, charCountMap.get(c) + 1);
        } else {
            charCountMap.put(c, 1);
        }
    }
    return charCountMap;
}

static boolean enoughLetters(Map<Character, Integer> mapExistString, Map<Character,Integer> mapCheckString) {
    for( Entry<Character, Integer> e : mapCheckString.entrySet() ) {
        Character letter = e.getKey();
        Integer available = mapExistString.get(letter);
        if (available == null || e.getValue() > available) return false;
    }
    return true;
}

}

Answered by Prashant

I think the basic solutions to the question are as follows:

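// Assumption: ls is the input list, e.g. List<Integer> ls = Arrays.asList(1, 2, 1, 3, 4, 4);
Supplier<Set<Integer>> supplier = HashSet::new;
Set<Integer> has = ls.stream().collect(Collectors.toCollection(supplier)); // all distinct values

// Values that occur more than once, each listed a single time.
List<Integer> lst = ls.stream().filter(e -> Collections.frequency(ls, e) > 1).distinct().collect(Collectors.toList());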

Well, performing a filter operation like this is not recommended, but I have used it here to make the idea easier to understand; moreover, future versions should offer some form of custom filtering.