Handle the common case first (where input size is a multiple of block size).
Worth around 5% for encrypt. Slows down decrypt slightly, but I expect to
regain that later.
1 file changed