Problem description
I have a long list of English words that I would like to hash. What would be a good hash function? So far, my hash function sums the ASCII values of the letters and then takes the result modulo the table size. I'm looking for something efficient and simple.
Recommended answer
Simply summing the letters is not a good strategy, because any permutation of the same letters gives the same result.
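To make the permutation problem concrete, here is a minimal sketch of the additive hash described in the question. The name `sum_hash` and the table size used below are assumptions for illustration, not code from the question.

```c
/* Additive hash as described in the question: sum the byte values,
   then reduce modulo the table size.
   Assumption: table_size is nonzero. */
unsigned long sum_hash(const char *str, unsigned long table_size)
{
    unsigned long sum = 0;

    while (*str)
        sum += (unsigned char)*str++;  /* order of letters is ignored */

    return sum % table_size;
}
```

With any table size, `sum_hash("listen", 1009)` and `sum_hash("silent", 1009)` land in the same bucket, since addition does not depend on the order of the letters. English has many anagrams, so this produces avoidable collisions.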
This one (djb2) is quite popular and works nicely with ASCII strings.
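/* djb2: start from 5381 and fold in each byte as hash * 33 + c. */
unsigned long hashstring(const unsigned char *str)
{
    unsigned long hash = 5381;
    int c;

    /* Extra parentheses mark the assignment as intentional. */
    while ((c = *str++))
        hash = ((hash << 5) + hash) + c; /* hash * 33 + c */

    return hash;
}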
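The function above returns a full `unsigned long`, so it still has to be reduced to a table index, as the question does with its modulo step. The following is a sketch under assumptions: `djb2_hash` repeats the same algorithm under a different name so the snippet compiles on its own, and `bucket_index` is a hypothetical helper, not part of the original answer.

```c
/* Same algorithm as above, taking a plain C string for convenience.
   Unsigned overflow wraps around, which is well defined in C. */
unsigned long djb2_hash(const char *str)
{
    unsigned long hash = 5381;

    while (*str)
        hash = ((hash << 5) + hash) + (unsigned char)*str++; /* hash * 33 + c */

    return hash;
}

/* Hypothetical helper: map a word to a slot in a table of table_size
   buckets. Assumption: table_size is nonzero. */
unsigned long bucket_index(const char *word, unsigned long table_size)
{
    return djb2_hash(word) % table_size;
}
```

Unlike a plain sum, each step multiplies the running value by 33 before adding the next byte, so letter order matters: `djb2_hash("ab")` is 5863208 while `djb2_hash("ba")` is 5863240.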
More information here.
If you need more alternatives and some performance measurements, read here.
Added: These are general-purpose hash functions, where the input domain is not known in advance (except perhaps for some very general assumptions; e.g., the above works slightly better with ASCII input), which is the most common scenario. If you have a known, restricted domain (a fixed set of inputs), you can do better; see Fionn's answer.