Question
I have two multisets, both IEnumerables, and I want to compare them.
string[] names1 = { "tom", "dick", "harry" };
string[] names2 = { "tom", "dick", "harry", "harry"};
string[] names3 = { "tom", "dick", "harry", "sally" };
string[] names4 = { "dick", "harry", "tom" };
I want names1 == names4 to return true (and self == self obviously returns true), but all other combinations should return false.
What is the most efficient way? These can be large sets of complex objects.
I was looking at doing:
var a = names1.OrderBy(n => n);
var b = names4.OrderBy(n => n);
return a == b;
Recommended answer
The most efficient way would depend on the datatypes. A reasonably efficient O(N) solution that's very short is the following:
var list1Groups = list1.ToLookup(i => i);
var list2Groups = list2.ToLookup(i => i);
return list1Groups.Count == list2Groups.Count
    && list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());
The items are required to have valid Equals and GetHashCode implementations.
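As a rough sketch of how this fits together for complex objects, the lookup comparison can be wrapped in a generic helper and used with a type that overrides Equals and GetHashCode. The names MultisetCompare, HasSameElementsAs, and Person are made up for illustration, and HashCode.Combine assumes .NET Core 2.1 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the name and shape are illustrative only.
public static class MultisetCompare
{
    public static bool HasSameElementsAs<T>(this IEnumerable<T> first, IEnumerable<T> second)
    {
        // Group equal items; the size of each group is that item's frequency.
        var firstGroups = first.ToLookup(i => i);
        var secondGroups = second.ToLookup(i => i);

        // Same number of distinct items, and every item occurs equally often in both.
        // Indexing a lookup with a missing key yields an empty sequence, not an exception.
        return firstGroups.Count == secondGroups.Count
            && firstGroups.All(g => g.Count() == secondGroups[g.Key].Count());
    }
}

// Hypothetical complex type with value-based equality.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age) { Name = name; Age = age; }

    public bool Equals(Person other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Must agree with Equals: equal objects produce equal hash codes.
    public override int GetHashCode() => HashCode.Combine(Name, Age);
}
```

With the arrays from the question, names1.HasSameElementsAs(names4) is true, while comparing names1 against names2 or names3 is false. Without the Equals and GetHashCode overrides, two Person instances with the same Name and Age would land in different groups, because the default comparison is by reference.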
If you want a faster solution, cdhowie's solution is comparably fast at 10,000 elements, and pulls ahead by a factor of 5 for large collections of simple objects, probably due to better memory efficiency.
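One common way to compare multisets with less memory than two lookups is a single dictionary of counts. The sketch below is an assumption about that general approach, not cdhowie's actual code; the names CountingCompare and HaveSameCounts are hypothetical, and it assumes the sequences contain no null items (Dictionary does not accept null keys).

```csharp
using System.Collections.Generic;

// Hypothetical helper illustrating a count-up / count-down comparison.
public static class CountingCompare
{
    public static bool HaveSameCounts<T>(IEnumerable<T> first, IEnumerable<T> second)
    {
        var counts = new Dictionary<T, int>();

        // Count up: record how often each item occurs in the first sequence.
        foreach (var item in first)
        {
            counts.TryGetValue(item, out var n); // n is 0 when the item is not present yet
            counts[item] = n + 1;
        }

        // Count down: every item of the second sequence must cancel one occurrence.
        foreach (var item in second)
        {
            if (!counts.TryGetValue(item, out var n))
                return false; // item is missing or already used up

            if (n == 1)
                counts.Remove(item);
            else
                counts[item] = n - 1;
        }

        // Leftover entries are items that occur more often in the first sequence.
        return counts.Count == 0;
    }
}
```

Like the lookup version, this relies on valid Equals and GetHashCode implementations, but it stores one integer per distinct item instead of grouping every element, and it can return early on the first mismatch.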
Finally, if you're really interested in performance, I'd definitely try the Sort-then-SequenceEqual approach. Although it has worse complexity, that's just a log N factor, and it can easily be drowned out by differences in the constant factor for all practical data set sizes. You might also be able to sort in place, use arrays, or even sort incrementally (which can be linear). Even at 4 billion elements, log base 2 is just 32; that's a relevant performance difference, but the difference in constant factor could conceivably be larger. For example, if you're dealing with arrays of ints and don't mind modifying the collection order, the following is faster than either option even for 10,000,000 items (at twice that I get an OutOfMemory on 32-bit):
Array.Sort(list1);
Array.Sort(list2);
return list1.SequenceEqual(list2);
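If modifying the inputs is not acceptable, one option is to sort copies instead, at the cost of extra memory. The following is a minimal sketch under two assumptions: the element type implements IComparable<T>, and its ordering is consistent with Equals (true for int, but not guaranteed for every type). SortedCompare and SortedEquals are hypothetical names.

```csharp
using System;
using System.Linq;

// Hypothetical helper: sort-then-SequenceEqual without touching the callers' arrays.
public static class SortedCompare
{
    public static bool SortedEquals<T>(T[] first, T[] second) where T : IComparable<T>
    {
        // Different lengths can never be equal as multisets; skip the sort entirely.
        if (first.Length != second.Length)
            return false;

        // Sort shallow copies so the original element order is preserved.
        var a = (T[])first.Clone();
        var b = (T[])second.Clone();
        Array.Sort(a);
        Array.Sort(b);

        // After sorting, equal multisets line up element by element.
        return a.SequenceEqual(b);
    }
}
```

The length check is cheap for arrays and avoids sorting at all in the common unequal case; the copies roughly double the memory use compared with sorting in place.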
YMMV depending on machine, data-type, lunar cycle, and the other usual factors influencing microbenchmarks.