全域哈希


Universal hashing

https://en.wikipedia.org/wiki/Universal_hashing

Notion

\[\delta_f(x,y) = \begin{cases}1 &f(x)==f(y)&and&x\neq y\\0&otherwise\end{cases} \]

\[\delta_H(x,S) = \sum_{f\in H}\sum_{y\in S}\delta_f(x,y) \]

S is a set. f is a mapping from A to B where A and B is a subset in set S. and |A| > |B|.

若x是一个实数,[x] 表示大于x的最小整数. Z_n表示(任意 mod n)的结果。

Let H be a class of functions from A to B. We say that H is universal, if for all x, y
in A, δ_H(x, y) ≤ |H| / |B|. That is, H is universal_2 (some other article called 2-Universal ), if no pair of distinct keys collide under more than (1 /|B|)th of the functions. The subscript “2” is intended to emphasize that the definition constrains the behavior of H only on pairs of elements of A. It turns
out that this is powerful enough for many purposes, as the propositions of this section suggest. However, for some applications of hashing, it is desirable to have a class of functions which distribute larger subsets of A in a uniform manner. This may be the subject of a future paper.

如果在任意x,y值的情况下,(没有一对xy会碰撞在超过1/|B|的函数中)哈希函数集中总共不超过1/|B|的函数会产生碰撞,则说这样的函数集是2-Universial。

这样的函数集有很多(在论文里作者给出了三组)

最常用的是一组,也就是H_1

\[a=|A|\\b=|B|\\A=\{0,1,...,a-1\}\\B=\{0,1,...,b-1\}\\p\ is\ prime\ where\ p \ge a\\g:Z_p\mapsto B\ and\ \ |\{y\in Z_p|g(y)=i\}|\le[p/b]\\m,n \in Z_p \ and \ m\ne0\\h_m,_n:A \mapsto Z_p \\h_m,_n=(mx+n)\ mod\ p\\f_m,_n=g(h_m,_n(x))\\ H_1=\{f_m,_n|m,n\in Z_p\ \} \]

g函数是Z_p映射到B,一般选择 mod b即可。因为mod b可以映射任意值到B中。m,n可以使用任意随机数 mod p。

需要注意的是,H_1全域哈希函数只能满足δ_H_1(x, y) ≤ 2|H| / |B| 。论文中有论述。

为了避免取余操作,可以使用优化方法。参见wiki页面。