[ARM] __cxa_end_cleanup: avoid clobbering r4

The fix for D111703 clobbered r4 both to:
 - Save/restore the original lr.
 - Load the address of _Unwind_Resume for LIBCXXABI_BAREMETAL.

This patch saves and restores lr without clobbering any extra

For LIBCXXABI_BAREMETAL, it is still necessary to clobber one extra
register to hold the address of _Unwind_Resume, but it seems better to
use ip/r12 (intended for linker veneers/trampolines) than r4 for this

The function also clobbers r0 for the _Unwind_Resume function's
parameter, but that is unavoidable.

Reviewed By: danielkiss, logan, MaskRay

Differential Revision: https://reviews.llvm.org/D121432

GitOrigin-RevId: 659029302dfb16a29219f931f66e9ad483ad1445
1 file changed