The C Minimal Perfect Hashing Library is a portable LGPL library designed to create and operate minimal perfect hashing functions.
Perfect hash functions are vital for mapping a static set of n keys into a set of m integer numbers without any collisions. If m equals n, the function is called minimal. The purpose of perfect hash functions is for memory efficient storage and fast retrieval of items from static sets. These sets include items such as natural language words, reserved words in programming languages or interactive systems, URLs in Web search engines, and item sets in data mining techniques. These sets are critical in information retrieval systems, database systems, language translation systems, electronic commerce systems, compilers, operating systems, among other areas.
The only drawback of perfect hash functions is that current algorithms limit their use to small sets of keys. However, the C Minimal Perfect Hashing Library offers a solution; it gives the free software community an API that works efficiently for sets that contain billions of keys. As the most popular data structure used as an indexing structure in databases is the B+ tree, the use of minimal perfect hash functions becomes essential for larger sets of keys in temporary memory.
The B+ tree is useful for dynamic applications with frequent insertions and deletions of records. However, for applications with sporadic modifications and a vast number of queries, the B+ tree becomes insufficient due to the impracticality of the structure. The C Minimal Perfect Hashing Library offers a better alternative for obtaining minimal perfect hash functions. By using this library, the process of assigning ids to web pages from a collection becomes straightforward. Additionally, minimal perfect hash functions can easily scale to hundreds of millions of entries, using stock hardware.
The need for such efficient algorithms to build memory and time efficient minimal perfect hash functions has led to the creation of the C Minimal Perfect Hashing Library. The lack of similar free software libraries has been the primary motivation behind this library. The CMPH library was designed to create minimal perfect hash functions in very large sets of keys. However, gperf, although slightly different, was designed to create very fast perfect hash functions for small sets of keys. The C Minimal Perfect Hashing Library is a portable, LGPLed library that generates and works with highly efficient minimal perfect hash functions giving more control to the user.
Version 0.9: N/A