Extract duplicate objects from a List in Java 8
Disclaimer: this page is a translation of a popular StackOverflow question, provided under the CC BY-SA 4.0 license. You are free to use/share it, but you must attribute it to the original authors (not me): StackOverflow
Original question: http://stackoverflow.com/questions/53164975/
Asked by Marco Tulio Avila Cerón
This code removes duplicates from the original list, but I want to extract the duplicates from the original list, not remove them (this package name is just part of another project).

Given a Person POJO:
package at.mavila.learn.kafka.kafkaexercises;

import org.apache.commons.lang3.builder.ToStringBuilder;

public class Person {

    private final Long id;
    private final String firstName;
    private final String secondName;

    private Person(final Builder builder) {
        this.id = builder.id;
        this.firstName = builder.firstName;
        this.secondName = builder.secondName;
    }

    public Long getId() {
        return id;
    }

    public String getFirstName() {
        return firstName;
    }

    public String getSecondName() {
        return secondName;
    }

    public static class Builder {

        private Long id;
        private String firstName;
        private String secondName;

        public Builder id(final Long builder) {
            this.id = builder;
            return this;
        }

        public Builder firstName(final String first) {
            this.firstName = first;
            return this;
        }

        public Builder secondName(final String second) {
            this.secondName = second;
            return this;
        }

        public Person build() {
            return new Person(this);
        }
    }

    @Override
    public String toString() {
        return new ToStringBuilder(this)
                .append("id", id)
                .append("firstName", firstName)
                .append("secondName", secondName)
                .toString();
    }
}
Duplicate extraction code. Notice that here we filter by the id and the first name to retrieve a new list; I saw this code someplace else, it is not mine:
package at.mavila.learn.kafka.kafkaexercises;

import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

import static java.util.Objects.isNull;

public final class DuplicatePersonFilter {

    private DuplicatePersonFilter() {
        // No instances of this class
    }

    public static List<Person> getDuplicates(final List<Person> personList) {
        return personList
                .stream()
                .filter(duplicateByKey(Person::getId))
                .filter(duplicateByKey(Person::getFirstName))
                .collect(Collectors.toList());
    }

    private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
    }
}
The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo]. Instead, I would expect to get [romualdo, otroRomualdo] as the extracted duplicates, given the id and the firstName:
package at.mavila.learn.kafka.kafkaexercises;

import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.util.ArrayList;
import java.util.List;

import static org.junit.Assert.*;

public class DuplicatePersonFilterTest {

    private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);

    @Test
    public void testList() {
        Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
        Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
        Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
        Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
        Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

        List<Person> personList = new ArrayList<>();
        personList.add(alex);
        personList.add(lolita);
        personList.add(elpidio);
        personList.add(romualdo);
        personList.add(otroRomualdo);

        final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
        LOGGER.info("Duplicates: {}", duplicates);
    }
}
At work I was able to get the desired result by using a Comparator with a TreeMap and an ArrayList, but this meant creating a list, filtering it, and then passing the filter again to a newly created list; that looks like bloated (and probably inefficient) code.

Does someone have a better idea of how to extract the duplicates, not remove them?

Thanks in advance.
Update:

Thanks everyone for your answers.

To remove the duplicates using the same approach with uniqueAttributes:
public static List<Person> removeDuplicates(final List<Person> personList) {
    return personList.stream().collect(Collectors
            .collectingAndThen(Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(
                    PersonListFilters::uniqueAttributes))),
                    ArrayList::new));
}

private static String uniqueAttributes(Person person) {
    if (Objects.isNull(person)) {
        return StringUtils.EMPTY;
    }
    return person.getId() + person.getFirstName();
}
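For completeness, here is a minimal self-contained sketch of that TreeSet-based de-duplication; the `Person` record and the empty-string fallback (in place of `StringUtils.EMPTY`) are stand-ins of mine, not part of the original code:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class RemoveDuplicatesSketch {

    // Minimal stand-in for the Person POJO from the question.
    record Person(long id, String firstName) {}

    static String uniqueAttributes(Person person) {
        if (Objects.isNull(person)) {
            return "";  // stand-in for StringUtils.EMPTY
        }
        return person.id() + person.firstName();
    }

    public static List<Person> removeDuplicates(List<Person> personList) {
        // A TreeSet ordered by the unique key admits only the first
        // element seen for each (id + firstName) combination.
        return personList.stream().collect(Collectors.collectingAndThen(
                Collectors.toCollection(
                        () -> new TreeSet<>(Comparator.comparing(RemoveDuplicatesSketch::uniqueAttributes))),
                ArrayList::new));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(4, "romualdo"),
                new Person(4, "romualdo"),
                new Person(1, "alex"));
        // One Person survives per unique (id + firstName) key.
        System.out.println(removeDuplicates(people));
    }
}
```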
Accepted answer by Magnilex
To identify duplicates, no method I know of is better suited than Collectors.groupingBy(). This allows you to group the list into a map based on a condition of your choice.

Your condition is a combination of id and firstName. Let's extract this part into its own method in Person:
String uniqueAttributes() {
    return id + firstName;
}
The getDuplicates() method is now quite straightforward:
public static List<Person> getDuplicates(final List<Person> personList) {
    return getDuplicatesMap(personList).values().stream()
            .filter(duplicates -> duplicates.size() > 1)
            .flatMap(Collection::stream)
            .collect(Collectors.toList());
}

private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
    return personList.stream().collect(groupingBy(Person::uniqueAttributes));
}
- The first line calls another method, getDuplicatesMap(), to create the map as explained above.
- It then streams over the values of the map, which are lists of persons.
- It filters out everything except lists with a size greater than 1, i.e. it finds the duplicates.
- Finally, flatMap() is used to flatten the stream of lists into one single stream of persons, and collects the stream to a list.
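As a concrete illustration, here is a minimal, self-contained sketch of the same groupingBy technique; the simplified Person record and the sample data are stand-ins for the question's classes, not part of the original answer:

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class GroupingByDuplicates {

    // Minimal stand-in for the Person POJO from the question.
    record Person(long id, String firstName, String secondName) {
        String uniqueAttributes() {
            return id + firstName;
        }
    }

    public static List<Person> getDuplicates(List<Person> personList) {
        return personList.stream()
                .collect(Collectors.groupingBy(Person::uniqueAttributes))
                .values().stream()
                .filter(group -> group.size() > 1)   // keep only keys seen more than once
                .flatMap(Collection::stream)         // flatten the groups into one stream
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1, "alex", "salgado"),
                new Person(4, "romualdo", "gomez"),
                new Person(4, "romualdo", "perez"));
        // Both "romualdo" entries share id and firstName, so both are returned.
        System.out.println(getDuplicates(people));
    }
}
```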
An alternative, if you truly identify persons as equal when they have the same id and firstName, is to go with the solution by Jonathan Johx and implement an equals() method.
Answer by Deadpool
In this scenario you need to write your own custom logic to extract the duplicates from the list; you will get all the duplicates in the Person list:
public static List<Person> extractDuplicates(final List<Person> personList) {
    return personList.stream().flatMap(i -> {
        final AtomicInteger count = new AtomicInteger();
        final List<Person> duplicatedPersons = new ArrayList<>();
        personList.forEach(p -> {
            if (p.getId().equals(i.getId()) && p.getFirstName().equals(i.getFirstName())) {
                count.getAndIncrement();
            }
            if (count.get() == 2) {
                duplicatedPersons.add(i);
            }
        });
        return duplicatedPersons.stream();
    }).collect(Collectors.toList());
}
Applied to:
List<Person> l = new ArrayList<>();

Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();

l.add(alex);
l.add(lolita);
l.add(elpidio);
l.add(romualdo);
l.add(otroRomualdo);
Output:
[Person [id=4, firstName=romualdo, secondName=gomez], Person [id=4, firstName=romualdo, secondName=perez]]
Answer by YoYo
List<Person> duplicates = personList.stream()
        .collect(Collectors.groupingBy(Person::getId))
        .entrySet().stream()
        .filter(e -> e.getValue().size() > 1)
        .flatMap(e -> e.getValue().stream())
        .collect(Collectors.toList());
That should give you a List of Person where the id has been duplicated.
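Grouping by id alone will also flag two different people who merely happen to share an id. The same pattern can key on both fields at once; in this sketch the composite List.of(id, firstName) key and the minimal Person record are my own additions for illustration:

```java
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

public class CompositeKeyDuplicates {

    // Minimal stand-in for the Person POJO from the question.
    record Person(long id, String firstName) {}

    public static List<Person> duplicates(List<Person> personList) {
        return personList.stream()
                // List.of(...) has value-based equals/hashCode, so two persons
                // land in the same group only when both id and firstName match.
                .collect(Collectors.groupingBy(p -> List.of(p.id(), p.firstName())))
                .values().stream()
                .filter(group -> group.size() > 1)
                .flatMap(Collection::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1, "alex"),
                new Person(4, "romualdo"),
                new Person(4, "romualdo"));
        // Both "romualdo" entries are returned; "alex" is not.
        System.out.println(duplicates(people));
    }
}
```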
Answer by Jonathan JOhx
I think first you should override the equals method of the Person class, focusing on id and firstName. After that you can update it by adding a filter for that.
@Override
public int hashCode() {
    return Objects.hash(id, firstName);
}

@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null) {
        return false;
    }
    if (getClass() != obj.getClass()) {
        return false;
    }
    final Person other = (Person) obj;
    if (!Objects.equals(firstName, other.firstName)) {
        return false;
    }
    if (!Objects.equals(id, other.id)) {
        return false;
    }
    return true;
}
personList
        .stream()
        .filter(p -> Collections.frequency(personList, p) > 1)
        .collect(Collectors.toList());
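With equals() and hashCode() overridden as above, the repeated elements can be kept with Collections.frequency. This is a sketch of that idea (the minimal Person record stands in for the real class, and its value-based equality mirrors the overrides above); note the lookup is O(n²) on large lists:

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class FrequencyDuplicates {

    // Record equality is value-based on id and name, mirroring the
    // equals()/hashCode() overrides shown above.
    record Person(long id, String name) {}

    public static List<Person> duplicates(List<Person> personList) {
        // Collections.frequency counts occurrences via equals(), so only
        // elements appearing more than once pass the filter.
        return personList.stream()
                .filter(p -> Collections.frequency(personList, p) > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person(1, "alex"),
                new Person(4, "romualdo"),
                new Person(4, "romualdo"));
        System.out.println(duplicates(people));
    }
}
```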
Answer by Lokesh Balaji
List<Person> arr = new ArrayList<>();
arr.add(alex);
arr.add(lolita);
arr.add(elpidio);
arr.add(romualdo);
arr.add(otroRomualdo);

Set<String> set = new HashSet<>();
List<Person> result = arr.stream()
        .filter(data -> !set.add(data.getFirstName() + ";" + Long.toString(data.getId())))
        .collect(Collectors.toList());

arr.removeAll(result);

Set<String> set2 = new HashSet<>();
result.forEach(data -> set2.add(data.getFirstName() + ";" + Long.toString(data.getId())));

List<Person> resultTwo = arr.stream()
        .filter(data -> !set2.add(data.getFirstName() + ";" + Long.toString(data.getId())))
        .collect(Collectors.toList());

result.addAll(resultTwo);
The above code will filter based on name and id. The result list will have all the duplicated Person objects.
Answer by Leonid Dashko
Solution based on a generic key:
import java.util.List;
import java.util.function.Function;

import static java.util.Collections.emptyList;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

public static <T> List<T> findDuplicates(List<T> list, Function<T, ?> uniqueKey) {
    if (list == null) {
        return emptyList();
    }
    Function<T, ?> notNullUniqueKey = el -> uniqueKey.apply(el) == null ? "" : uniqueKey.apply(el);
    return list.stream()
            .collect(groupingBy(notNullUniqueKey))
            .values()
            .stream()
            .filter(matches -> matches.size() > 1)
            .map(matches -> matches.get(0))
            .collect(toList());
}

// Example of usage:
List<Person> duplicates = findDuplicates(list, el -> el.getFirstName());
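Note that this variant returns only one representative per duplicated key (matches.get(0)), not every occurrence. A self-contained sketch of its behavior, with an illustrative Person record of my own:

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class GenericKeyDuplicates {

    public static <T> List<T> findDuplicates(List<T> list, Function<T, ?> uniqueKey) {
        if (list == null) {
            return Collections.emptyList();
        }
        // Replace null keys with "" so groupingBy never receives a null key.
        Function<T, ?> notNullUniqueKey = el -> uniqueKey.apply(el) == null ? "" : uniqueKey.apply(el);
        return list.stream()
                .collect(Collectors.groupingBy(notNullUniqueKey))
                .values()
                .stream()
                .filter(matches -> matches.size() > 1)
                .map(matches -> matches.get(0))   // one representative per duplicated key
                .collect(Collectors.toList());
    }

    // Illustrative stand-in for the Person POJO.
    record Person(long id, String firstName) {}

    public static void main(String[] args) {
        List<Person> list = List.of(
                new Person(4, "romualdo"),
                new Person(4, "romualdo"),
                new Person(1, "alex"));
        // Only one "romualdo" entry is returned for the duplicated firstName.
        System.out.println(findDuplicates(list, Person::firstName));
    }
}
```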