Optimize png_do_expand_palette for ARM

Add an ARM-specific code path that expands 8 or 4 pixels at once.
This improves performance by around 10-22% on a recent ARM Chromebook.
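
For readers unfamiliar with the technique, here is a minimal portable C sketch of batched palette expansion. It is not the code in this change: the names rgb_entry, build_rgba_table and expand_palette_rgba are hypothetical, it writes to a separate output buffer, and it uses plain memcpy where an ARM build would presumably use vector loads and stores. It only illustrates the idea of packing the palette into fixed-size entries and handling several pixels per loop iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical palette entry type, used only for this illustration. */
typedef struct { uint8_t r, g, b; } rgb_entry;

/* Pack the palette and its optional alpha values into a flat table of
 * 256 four-byte RGBA entries, so each pixel becomes one 4-byte copy.
 * Indices beyond the palette map to black; missing alpha means opaque. */
static void build_rgba_table(const rgb_entry *palette, const uint8_t *alpha,
                             int num_entries, int num_alpha,
                             uint8_t *table /* 256 * 4 bytes */)
{
    for (int i = 0; i < 256; i++) {
        table[4 * i + 0] = (i < num_entries) ? palette[i].r : 0;
        table[4 * i + 1] = (i < num_entries) ? palette[i].g : 0;
        table[4 * i + 2] = (i < num_entries) ? palette[i].b : 0;
        table[4 * i + 3] = (i < num_alpha) ? alpha[i] : 255;
    }
}

/* Expand 8-bit palette indices to RGBA, four pixels per iteration.
 * A scalar tail loop handles widths that are not a multiple of four. */
static void expand_palette_rgba(const uint8_t *indices, size_t width,
                                const uint8_t *table, uint8_t *out)
{
    size_t i = 0;

    for (; i + 4 <= width; i += 4) {
        memcpy(out + 4 * (i + 0), table + 4 * indices[i + 0], 4);
        memcpy(out + 4 * (i + 1), table + 4 * indices[i + 1], 4);
        memcpy(out + 4 * (i + 2), table + 4 * indices[i + 2], 4);
        memcpy(out + 4 * (i + 3), table + 4 * indices[i + 3], 4);
    }

    for (; i < width; i++)
        memcpy(out + 4 * i, table + 4 * indices[i], 4);
}
```

The sketch assumes the caller provides an output buffer of at least 4 * width bytes.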
7 files changed