Fix leakage when the cacheline is 32-bytes in CBC_MAC_ROTATE_IN_PLACE

rotated_mac is a 64-byte aligned buffer of size 64 and rotate_offset is secret.
Consider a weaker leakage model(CL) where only cacheline base address is leaked,
i.e address/32 for 32-byte cacheline(CL32).

Previous code used to perform two loads
    1. rotated_mac[rotate_offset ^ 32] and
    2. rotated_mac[rotate_offset++]
which would leak 2q + 1, 2q for 0 <= rotate_offset < 32
and 2q, 2q + 1 for 32 <= rotate_offset < 64

The proposed fix performs load operations which will always leak 2q, 2q + 1 and
selects the appropriate value in constant-time.

Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18033)
diff --git a/ssl/record/tls_pad.c b/ssl/record/tls_pad.c
index e559350..7311c82 100644
--- a/ssl/record/tls_pad.c
+++ b/ssl/record/tls_pad.c
@@ -207,6 +207,7 @@
 #if defined(CBC_MAC_ROTATE_IN_PLACE)
     unsigned char rotated_mac_buf[64 + EVP_MAX_MD_SIZE];
     unsigned char *rotated_mac;
+    char aux1, aux2, aux3, mask;
 #else
     unsigned char rotated_mac[EVP_MAX_MD_SIZE];
 #endif
@@ -288,12 +289,19 @@
 #if defined(CBC_MAC_ROTATE_IN_PLACE)
     j = 0;
     for (i = 0; i < mac_size; i++) {
-        /* in case cache-line is 32 bytes, touch second line */
-        ((volatile unsigned char *)rotated_mac)[rotate_offset ^ 32];
+        /*
+         * in case cache-line is 32 bytes,
+         * load from both lines and select appropriately
+         */
+        aux1 = rotated_mac[rotate_offset & ~32];
+        aux2 = rotated_mac[rotate_offset | 32];
+        mask = constant_time_eq_8(rotate_offset & ~32, rotate_offset);
+        aux3 = constant_time_select_8(mask, aux1, aux2);
+        rotate_offset++;
 
         /* If the padding wasn't good we emit a random MAC */
         out[j++] = constant_time_select_8((unsigned char)(good & 0xff),
-                                          rotated_mac[rotate_offset++],
+                                          aux3,
                                           randmac[i]);
         rotate_offset &= constant_time_lt_s(rotate_offset, mac_size);
     }