Remove two bn_wexpand() from BN_mul(), which is a step toward getting
BN_mul() correctly constified, avoids two realloc()'s that aren't
really necessary and saves memory to boot.  This required a small
change in bn_mul_part_recursive() and the addition of variants of
bn_cmp_words(), bn_add_words() and bn_sub_words() that can take arrays
with differing sizes.

The test results show a performance that very closely matches the
original code from before my constification.  This may seem like a
very small win from a performance point of view, but if one remembers
that the variants of bn_cmp_words(), bn_add_words() and bn_sub_words()
are not at all optimized for the moment (and there's no corresponding
assembler code), and that their use may be just as non-optimal, I'm
pretty confident there are possibilities...

This code needs reviewing!
diff --git a/CHANGES b/CHANGES
index a6f14fd..c54ab99 100644
--- a/CHANGES
+++ b/CHANGES
@@ -4,6 +4,15 @@
 
  Changes between 0.9.6 and 0.9.7  [xx XXX 2000]
 
+  *) Remove a few calls to bn_wexpand() in BN_sqr() (the one in there
+     was actually never needed) and in BN_mul().  The removal in BN_mul()
+     required a small change in bn_mul_part_recursive() and the addition
+     of the static functions bn_cmp_part_words(), bn_sub_part_words()
+     and bn_add_part_words() which do the same thing as bn_cmp_words(),
+     bn_sub_words() and bn_add_words() except they take arrays with
+     differing sizes.
+     [Richard Levitte]
+
   *) In 'openssl passwd', verify passwords read from the terminal
      unless the '-salt' option is used (which usually means that
      verification would just waste user's time since the resulting