doc/lhash.doc - third_party/openssl - Git at Google

 The LHASH library.

 I wrote this library in 1991 and have since forgotten why I called it lhash.
 It implements a hash table from an article I read at the
 time from 'Communications of the ACM'.  What makes this hash
 table different is that as the table fills, the hash table is
 increased (or decreased) in size via realloc().
 When a 'resize' is done, instead of all hashes being redistributed over
 twice as many 'buckets', one bucket is split.  So when an 'expand' is done,
 there is only a minimal cost to redistribute some values.  Subsequent
 inserts will cause more single 'bucket' redistributions but there will
 never be a sudden large cost due to redistributing all the 'buckets'.

 The state for a particular hash table is kept in the LHASH structure.
 The LHASH structure also records statistics about most aspects of accessing
 the hash table.  This is mostly a legacy of my writing this library for
 the reasons of implementing what looked like a nice algorithm rather than
 for a particular software product.

 Internal stuff you probably don't want to know about.
 The decision to increase or decrease the hash table size is made depending
 on the 'load' of the hash table.  The load is the number of items in the
 hash table divided by the size of the hash table.  The default values are
 as follows.  If (hash->up_load < load) => expand.
 if (hash->down_load > load) =>  contract.  The 'up_load' has a default value of
 1 and 'down_load' has a default value of 2.  These numbers can be modified
 by the application by just playing with the 'up_load' and 'down_load'
 variables.  The 'load' is kept in a form which is multiplied by 256.  So
 hash->up_load=8*256; will cause a load of 8 to be set.

 If you are interested in performance the field to watch is
 num_comp_calls.  The hash library keeps track of the 'hash' value for
 each item so when a lookup is done, the 'hashes' are compared, if
 there is a match, then a full compare is done, and
 hash->num_comp_calls is incremented.  If num_comp_calls is not equal
 to num_delete plus num_retrieve it means that your hash function is
 generating hashes that are the same for different values.  It is
 probably worth changing your hash function if this is the case because
 even if your hash table has 10 items in a 'bucked', it can be searched
 with 10 'unsigned long' compares and 10 linked list traverses.  This
 will be much less expensive that 10 calls to you compare function.

 LHASH *lh_new(
 unsigned long (*hash)(),
 int (*cmp)());
 	This function is used to create a new LHASH structure.  It is passed
 	function pointers that are used to store and retrieve values passed
 	into the hash table.  The 'hash'
 	function is a hashing function that will return a hashed value of
 	it's passed structure.  'cmp' is passed 2 parameters, it returns 0
 	is they are equal, otherwise, non zero.
 	If there are any problems (usually malloc failures), NULL is
 	returned, otherwise a new LHASH structure is returned.  The
 	hash value is normally truncated to a power of 2, so make sure
 	that your hash function returns well mixed low order bits.

 void lh_free(
 LHASH *lh);
 	This function free()s a LHASH structure.  If there is malloced
 	data in the hash table, it will not be freed.  Consider using the
 	lh_doall function to deallocate any remaining entries in the hash
 	table.

 char *lh_insert(
 LHASH *lh,
 char *data);
 	This function inserts the data pointed to by data into the lh hash
 	table.  If there is already and entry in the hash table entry, the
 	value being replaced is returned.  A NULL is returned if the new
 	entry does not clash with an entry already in the table (the normal
 	case) or on a malloc() failure (perhaps I should change this....).
 	The 'char *data' is exactly what is passed to the hash and
 	comparison functions specified in lh_new().

 char *lh_delete(
 LHASH *lh,
 char *data);
 	This routine deletes an entry from the hash table.  The value being
 	deleted is returned.  NULL is returned if there is no such value in
 	the hash table.

 char *lh_retrieve(
 LHASH *lh,
 char *data);
 	If 'data' is in the hash table it is returned, else NULL is
 	returned.  The way these routines would normally be uses is that a
 	dummy structure would have key fields populated and then
 	ret=lh_retrieve(hash,&dummy);.  Ret would now be a pointer to a fully
 	populated structure.

 void lh_doall(
 LHASH *lh,
 void (*func)(char *a));
 	This function will, for every entry in the hash table, call function
 	'func' with the data item as parameters.
 	This function can be quite useful when used as follows.
 	void cleanup(STUFF *a)
 		{ STUFF_free(a); }
 	lh_doall(hash,cleanup);
 	lh_free(hash);
 	This can be used to free all the entries, lh_free() then
 	cleans up the 'buckets' that point to nothing.  Be careful
 	when doing this.  If you delete entries from the hash table,
 	in the call back function, the table may decrease in size,
 	moving item that you are
 	currently on down lower in the hash table.  This could cause
 	some entries to be skipped.  The best solution to this problem
 	is to set lh->down_load=0 before you start.  This will stop
 	the hash table ever being decreased in size.

 void lh_doall_arg(
 LHASH *lh;
 void(*func)(char *a,char *arg));
 char *arg;
 	This function is the same as lh_doall except that the function
 	called will be passed 'arg' as the second argument.

 unsigned long lh_strhash(
 char *c);
 	This function is a demo string hashing function.  Since the LHASH
 	routines would normally be passed structures, this routine would
 	not normally be passed to lh_new(), rather it would be used in the
 	function passed to lh_new().

 The next three routines print out various statistics about the state of the
 passed hash table.  These numbers are all kept in the lhash structure.

 void lh_stats(
 LHASH *lh,
 FILE *out);
 	This function prints out statistics on the size of the hash table,
 	how many entries are in it, and the number and result of calls to
 	the routines in this library.

 void lh_node_stats(
 LHASH *lh,
 FILE *out);
 	For each 'bucket' in the hash table, the number of entries is
 	printed.

 void lh_node_usage_stats(
 LHASH *lh,
 FILE *out);
 	This function prints out a short summary of the state of the hash
 	table.  It prints what I call the 'load' and the 'actual load'.
 	The load is the average number of data items per 'bucket' in the
 	hash table.  The 'actual load' is the average number of items per
 	'bucket', but only for buckets which contain entries.  So the
 	'actual load' is the average number of searches that will need to
 	find an item in the hash table, while the 'load' is the average number
 	that will be done to record a miss.
	The LHASH library.

	I wrote this library in 1991 and have since forgotten why I called it lhash.
	It implements a hash table from an article I read at the
	time from 'Communications of the ACM'. What makes this hash
	table different is that as the table fills, the hash table is
	increased (or decreased) in size via realloc().
	When a 'resize' is done, instead of all hashes being redistributed over
	twice as many 'buckets', one bucket is split. So when an 'expand' is done,
	there is only a minimal cost to redistribute some values. Subsequent
	inserts will cause more single 'bucket' redistributions but there will
	never be a sudden large cost due to redistributing all the 'buckets'.

	The state for a particular hash table is kept in the LHASH structure.
	The LHASH structure also records statistics about most aspects of accessing
	the hash table. This is mostly a legacy of my writing this library for
	the reasons of implementing what looked like a nice algorithm rather than
	for a particular software product.

	Internal stuff you probably don't want to know about.
	The decision to increase or decrease the hash table size is made depending
	on the 'load' of the hash table. The load is the number of items in the
	hash table divided by the size of the hash table. The default values are
	as follows. If (hash->up_load < load) => expand.
	if (hash->down_load > load) => contract. The 'up_load' has a default value of
	1 and 'down_load' has a default value of 2. These numbers can be modified
	by the application by just playing with the 'up_load' and 'down_load'
	variables. The 'load' is kept in a form which is multiplied by 256. So
	hash->up_load=8*256; will cause a load of 8 to be set.

	If you are interested in performance the field to watch is
	num_comp_calls. The hash library keeps track of the 'hash' value for
	each item so when a lookup is done, the 'hashes' are compared, if
	there is a match, then a full compare is done, and
	hash->num_comp_calls is incremented. If num_comp_calls is not equal
	to num_delete plus num_retrieve it means that your hash function is
	generating hashes that are the same for different values. It is
	probably worth changing your hash function if this is the case because
	even if your hash table has 10 items in a 'bucked', it can be searched
	with 10 'unsigned long' compares and 10 linked list traverses. This
	will be much less expensive that 10 calls to you compare function.

	LHASH *lh_new(
	unsigned long (*hash)(),
	int (*cmp)());
	This function is used to create a new LHASH structure. It is passed
	function pointers that are used to store and retrieve values passed
	into the hash table. The 'hash'
	function is a hashing function that will return a hashed value of
	it's passed structure. 'cmp' is passed 2 parameters, it returns 0
	is they are equal, otherwise, non zero.
	If there are any problems (usually malloc failures), NULL is
	returned, otherwise a new LHASH structure is returned. The
	hash value is normally truncated to a power of 2, so make sure
	that your hash function returns well mixed low order bits.

	void lh_free(
	LHASH *lh);
	This function free()s a LHASH structure. If there is malloced
	data in the hash table, it will not be freed. Consider using the
	lh_doall function to deallocate any remaining entries in the hash
	table.

	char *lh_insert(
	LHASH *lh,
	char *data);
	This function inserts the data pointed to by data into the lh hash
	table. If there is already and entry in the hash table entry, the
	value being replaced is returned. A NULL is returned if the new
	entry does not clash with an entry already in the table (the normal
	case) or on a malloc() failure (perhaps I should change this....).
	The 'char *data' is exactly what is passed to the hash and
	comparison functions specified in lh_new().

	char *lh_delete(
	LHASH *lh,
	char *data);
	This routine deletes an entry from the hash table. The value being
	deleted is returned. NULL is returned if there is no such value in
	the hash table.

	char *lh_retrieve(
	LHASH *lh,
	char *data);
	If 'data' is in the hash table it is returned, else NULL is
	returned. The way these routines would normally be uses is that a
	dummy structure would have key fields populated and then
	ret=lh_retrieve(hash,&dummy);. Ret would now be a pointer to a fully
	populated structure.

	void lh_doall(
	LHASH *lh,
	void (func)(char a));
	This function will, for every entry in the hash table, call function
	'func' with the data item as parameters.
	This function can be quite useful when used as follows.
	void cleanup(STUFF *a)
	{ STUFF_free(a); }
	lh_doall(hash,cleanup);
	lh_free(hash);
	This can be used to free all the entries, lh_free() then
	cleans up the 'buckets' that point to nothing. Be careful
	when doing this. If you delete entries from the hash table,
	in the call back function, the table may decrease in size,
	moving item that you are
	currently on down lower in the hash table. This could cause
	some entries to be skipped. The best solution to this problem
	is to set lh->down_load=0 before you start. This will stop
	the hash table ever being decreased in size.

	void lh_doall_arg(
	LHASH *lh;
	void(func)(char a,char *arg));
	char *arg;
	This function is the same as lh_doall except that the function
	called will be passed 'arg' as the second argument.

	unsigned long lh_strhash(
	char *c);
	This function is a demo string hashing function. Since the LHASH
	routines would normally be passed structures, this routine would
	not normally be passed to lh_new(), rather it would be used in the
	function passed to lh_new().

	The next three routines print out various statistics about the state of the
	passed hash table. These numbers are all kept in the lhash structure.

	void lh_stats(
	LHASH *lh,
	FILE *out);
	This function prints out statistics on the size of the hash table,
	how many entries are in it, and the number and result of calls to
	the routines in this library.

	void lh_node_stats(
	LHASH *lh,
	FILE *out);
	For each 'bucket' in the hash table, the number of entries is
	printed.

	void lh_node_usage_stats(
	LHASH *lh,
	FILE *out);
	This function prints out a short summary of the state of the hash
	table. It prints what I call the 'load' and the 'actual load'.
	The load is the average number of data items per 'bucket' in the
	hash table. The 'actual load' is the average number of items per
	'bucket', but only for buckets which contain entries. So the
	'actual load' is the average number of searches that will need to
	find an item in the hash table, while the 'load' is the average number
	that will be done to record a miss.