问题描述
从 Java 8 开始,hashMap 稍作修改,如果同一存储桶上有超过 8 个 (TREEIFY_THRESHOLD=8) 项,则 hashMap 具有平衡树而不是链表.选择 8 有什么理由吗?
From Java 8, the hashMap modified slightly to have balanced tree instead of linkedlist if more than 8 (TREEIFY_THRESHOLD=8) items on same bucket. is there any reason choosing 8?
如果是 9 会影响性能吗?
would it impact the performance in case it is 9?
推荐答案
使用平衡树而不是链表是一种权衡.在列表的情况下,必须执行线性扫描以在存储桶中执行查找,而树允许日志时间访问.当列表很小时,查找速度很快,并且使用树实际上并没有提供任何好处,而大约 8 个左右的元素在列表中查找的成本变得足够显着,以至于树提供了加速.
The use of a balanced tree instead of a linked-list is a tradeoff. In the case of a list, a linear scan must be performed to perform a lookup in a bucket, while the tree allows for log-time access. When the list is small, the lookup is fast and using a tree doesn't actually provide a benefit while around 8 or so elements the cost of a lookup in the list becomes significant enough that the tree provides a speed-up.
我怀疑树的使用是针对密钥哈希被灾难性破坏(例如许多密钥冲突)的例外情况;虽然线性查找会导致性能严重下降,但使用树可以缓解这种情况性能有所损失,如果键可直接比较.
I suspect that the use of a tree is intended for the exceptional case where the key hash is catastrophically broken (e.g. many keys collide); while a linear lookup will cause performance to degrade severely the use of a tree mitigates this performance loss somewhat, if the keys are directly comparable.
因此,8 个条目的确切阈值可能不是非常重要:假设良好的密钥分布,树箱的机会是 0.00000006,因此在这种情况下显然很少使用树箱.当哈希算法灾难性地失败时,存储桶中的键数无论如何都远大于 8.
Therefore, the exact threshold of 8 entries may not be terribly significant: the chance of a tree bin is 0.00000006 assuming good key distribution, so tree bins are obviously used very rarely in such a case. When the hash algorithm is failing catastrophically, then the number of keys in the bucket is far greater than 8 anyway.
这会带来空间损失,因为树节点必须包含额外的引用:四个对树节点的引用和一个布尔值除了 LinkedHashMap.Entry
(见 它的来源).
This comes at a space penalty since the tree-node must include additional references: four references to tree nodes and a boolean in addition to the fields of a LinkedHashMap.Entry
(see its source).
来自 HashMap类源码中的注释:
因为 TreeNode 的大小大约是常规节点的两倍,我们仅当 bin 包含足够的节点以保证使用时才使用它们(参见 TREEIFY_THRESHOLD).当它们变得太小时(由于删除或调整大小)它们被转换回普通垃圾箱.在使用分布良好的用户哈希码,树箱是很少使用.理想情况下,在随机哈希码下,箱中的节点遵循泊松分布(http://en.wikipedia.org/wiki/Poisson_distribution)默认调整大小的平均参数约为 0.50.75 的阈值,尽管有很大的差异,因为调整粒度.忽略方差,预期列表大小 k 的出现次数为 (exp(-0.5) * pow(0.5, k)/阶乘(k)).
Because TreeNodes are about twice the size of regular nodes, we use them only when bins contain enough nodes to warrant use (see TREEIFY_THRESHOLD). And when they become too small (due to removal or resizing) they are converted back to plain bins. In usages with well-distributed user hashCodes, tree bins are rarely used. Ideally, under random hashCodes, the frequency of nodes in bins follows a Poisson distribution (http://en.wikipedia.org/wiki/Poisson_distribution) with a parameter of about 0.5 on average for the default resizing threshold of 0.75, although with a large variance because of resizing granularity. Ignoring variance, the expected occurrences of list size k are (exp(-0.5) * pow(0.5, k) / factorial(k)).
这篇关于Java Hashmap 中有什么理由在 TREEIFY_THRESHOLD 上有 8 个吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!