问题描述
所以我在这里问了另一个相关问题:java string hash function with avalanche effect,但我现在有一个不同的相关问题.
So I asked another related question here: java string hash function with avalanche effect, but I have a different, related question now.
我在那个问题中确定的是 String 的 hashCode() 函数没有雪崩效应.这意味着,例如,如果我有字符串k1"、k2"、k3",并且我在每个字符串上调用 hashCode(),则返回的值将是连续的.
What I established in that question was that the hashCode() function for String does not have an avalanche effect. This means, for example, that if I have strings "k1", "k2", "k3", and I call hashCode() on each, the values returned will be contiguous.
现在,根据我对数据结构 101 的回忆,我的印象是这是一件坏事.因为假设 HashMap 通过以下算法选择存储桶:
Now, based on my recollection of data structures 101, I was under the impression that this is a bad thing. Because assuming that HashMap chooses buckets by an algorithm something like:
class HashMap {
private int capacity;
private int chooseBucket(String key) {
return key.hashCode() % capacity;
}
}
这意味着相似的密钥存储在连续的桶中,导致更高的冲突率,将 big-O 查找时间从 O(1) 降级为......谁知道有多糟糕......可能比 O 更糟(日志 n).
It would mean that similar keys are stored in contiguous buckets, leading to a higher rate of collisions, degrading big-O lookup time from O(1) to be...who knows how bad...maybe worse than O(log n).
我对第一个问题的回答类型大致是这里不需要雪崩效应"、它仅适用于加密哈希函数"和字符串的 hashCode 实现速度很快,并且适用于小哈希图".
The types of answers I got to my first question were along the lines of 'avalanche effect isn't needed here', 'it's only for cryptography hash functions', and 'the hashCode implementation for strings is fast and works well for small hash maps'.
这让我很困惑.所有数据结构在很小的时候都很快.Sun 不会提供适用于 large 数据集的默认 hashCode 函数吗?那时 HashMap 的性能真的很重要,不是吗?
Which confuses me. All data structures are fast when they're small. Wouldn't Sun provide a default hashCode function that will work well for large data sets? That's when the performance of HashMap really matters anyway, isn't it?
或者,我错过了什么?请赐教.
Or, am I missing something? Please enlighten me.
推荐答案
将密钥存储在连续的桶中不会导致性能下降.将密钥存储在 same 存储桶中(例如,chaining).使用链式解决哈希冲突时:
Storing keys in contiguous buckets does not cause performance degradation. Storing keys in the same bucket (e.g., chaining) does. When using chaining to resolve hash collisions:
- 最坏的情况:每个哈希值都是相同的,所以所有元素最终都在同一个桶中,在这种情况下你会得到 O(n) 的性能(假设链是链表)
- 最佳情况:每个哈希值都不同,因此每个元素最终都位于不同的存储桶中,因此您可以获得预期的 O(1) 性能.
用于哈希表(等)的哈希代码不需要需要 雪崩效应.
Hash codes for use in hash tables (and the like) do not need an avalanche effect.
这篇关于hashCode、实现以及与 HashMap 的关系的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!