问题描述
此代码从原始列表中删除重复项,但我想从原始列表中提取重复项 -> 不删除它们(此包名称只是另一个项目的一部分):
This code removes duplicates from the original list, but I want to extract the duplicates from the original list -> not removing them (this package name is just part of another project):
给定:
一个人 pojo:
package at.mavila.learn.kafka.kafkaexercises;
import org.apache.commons.lang3.builder.ToStringBuilder;
public class Person {
private final Long id;
private final String firstName;
private final String secondName;
private Person(final Builder builder) {
this.id = builder.id;
this.firstName = builder.firstName;
this.secondName = builder.secondName;
}
public Long getId() {
return id;
}
public String getFirstName() {
return firstName;
}
public String getSecondName() {
return secondName;
}
public static class Builder {
private Long id;
private String firstName;
private String secondName;
public Builder id(final Long builder) {
this.id = builder;
return this;
}
public Builder firstName(final String first) {
this.firstName = first;
return this;
}
public Builder secondName(final String second) {
this.secondName = second;
return this;
}
public Person build() {
return new Person(this);
}
}
@Override
public String toString() {
return new ToStringBuilder(this)
.append("id", id)
.append("firstName", firstName)
.append("secondName", secondName)
.toString();
}
}
重复提取码.
注意这里我们过滤了 id 和名字来检索一个新列表,我在其他地方看到了这段代码,不是我的:
Notice here we filter the id and the first name to retrieve a new list, I saw this code someplace else, not mine:
package at.mavila.learn.kafka.kafkaexercises;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;
import static java.util.Objects.isNull;
public final class DuplicatePersonFilter {
private DuplicatePersonFilter() {
//No instances of this class
}
public static List<Person> getDuplicates(final List<Person> personList) {
return personList
.stream()
.filter(duplicateByKey(Person::getId))
.filter(duplicateByKey(Person::getFirstName))
.collect(Collectors.toList());
}
private static <T> Predicate<T> duplicateByKey(final Function<? super T, Object> keyExtractor) {
Map<Object,Boolean> seen = new ConcurrentHashMap<>();
return t -> isNull(seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE));
}
}
测试代码.如果你运行这个测试用例,你会得到 [alex, lolita, elpidio, romualdo].
The test code. If you run this test case you will get [alex, lolita, elpidio, romualdo].
我希望得到 [romualdo, otroRomualdo] 作为给定 id 和 firstName 的提取副本:
I would expect to get instead [romualdo, otroRomualdo] as the extracted duplicates given the id and the firstName:
package at.mavila.learn.kafka.kafkaexercises;
import org.junit.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.util.ArrayList;
import java.util.List;
import static org.junit.Assert.*;
public class DuplicatePersonFilterTest {
private static final Logger LOGGER = LoggerFactory.getLogger(DuplicatePersonFilterTest.class);
@Test
public void testList(){
Person alex = new Person.Builder().id(1L).firstName("alex").secondName("salgado").build();
Person lolita = new Person.Builder().id(2L).firstName("lolita").secondName("llanero").build();
Person elpidio = new Person.Builder().id(3L).firstName("elpidio").secondName("ramirez").build();
Person romualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("gomez").build();
Person otroRomualdo = new Person.Builder().id(4L).firstName("romualdo").secondName("perez").build();
List<Person> personList = new ArrayList<>();
personList.add(alex);
personList.add(lolita);
personList.add(elpidio);
personList.add(romualdo);
personList.add(otroRomualdo);
final List<Person> duplicates = DuplicatePersonFilter.getDuplicates(personList);
LOGGER.info("Duplicates: {}",duplicates);
}
}
在我的工作中,我能够通过使用 TreeMap 和 ArrayList 的 Comparator 来获得所需的结果,但这是创建一个列表然后对其进行过滤,再次将过滤器传递给新创建的列表,这看起来很臃肿的代码,(并且可能效率低下)
In my job I was able to get the desired result it by using Comparator using TreeMap and ArrayList, but this was creating a list then filtering it, passing the filter again to a newly created list, this looks bloated code, (and probably inefficient)
有人对如何提取重复项有更好的想法吗?而不是删除它们.
Does someone has a better idea how to extract duplicates?, not remove them.
提前致谢.
更新
感谢大家的回答
使用与 uniqueAttributes 相同的方法删除重复项:
To remove the duplicate using same approach with the uniqueAttributes:
public static List<Person> removeDuplicates(List<Person> personList) {
return getDuplicatesMap(personList).values().stream()
.filter(duplicates -> duplicates.size() > 1)
.flatMap(Collection::stream)
.collect(Collectors.toList());
}
private static Map<String, List<Person>> getDuplicatesMap(List<Person> personList) {
return personList.stream().collect(groupingBy(DuplicatePersonFilter::uniqueAttributes));
}
private static String uniqueAttributes(Person person){
if(Objects.isNull(person)){
return StringUtils.EMPTY;
}
return (person.getId()) + (person.getFirstName()) ;
}
更新 2
但@brett-ryan 提供的答案也是正确的:
But also the answer provided by @brett-ryan is correct:
public static List<Person> extractDuplicatesWithIdentityCountingV2(final List<Person> personList){
List<Person> duplicates = personList.stream()
.collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
.entrySet().stream()
.filter(n -> n.getValue() > 1)
.flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
.collect(toList());
return duplicates;
}
编辑
上面的代码可以在下面找到:
Above code can be found under:
https://gitlab.com/totopoloco/marco_utilities/-/tree/master/duplicates_exercises
请看:
用法:https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/test/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilterTest.java
实施:https://gitlab.com/totopoloco/marco_utilities/-/blob/master/duplicates_exercises/src/main/java/at/mavila/exercises/duplicates/lists/DuplicatePersonFilter.java
推荐答案
如果你可以在 Person
上实现 equals
和 hashCode
那么你就可以使用 groupingBy
的计数下游收集器来获取已重复的不同元素.
If you could implement equals
and hashCode
on Person
you could then use a counting down-stream collector of the groupingBy
to get distinct elements that have been duplicated.
List<Person> duplicates = personList.stream()
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.filter(n -> n.getValue() > 1)
.map(n -> n.getKey())
.collect(toList());
如果您想保留一个连续重复元素的列表,您可以使用 Collections.nCopies 将其展开.此方法将确保重复的元素排列在一起.
If you would like to keep a list of sequential repeated elements you can then expand this out using Collections.nCopies to expand it back out. This method will ensure repeated elements are ordered together.
List<Person> duplicates = personList.stream()
.collect(groupingBy(identity(), counting()))
.entrySet().stream()
.filter(n -> n.getValue() > 1)
.flatMap(n -> nCopies(n.getValue().intValue(), n.getKey()).stream())
.collect(toList());
这篇关于从 Java 8 中的列表中提取重复对象的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!