Tested to pass FATE on Linux and Windows.

Checkasm numbers vs the existing SSE2 code on Zen 5 (Strix Halo):
vp9_inv_adst_adst_16x16_sub16_add_10_sse2:       1041.8 ( 1.92x)
vp9_inv_adst_adst_16x16_sub16_add_10_avx512icl:   132.5 (15.06x)
vp9_inv_dct_adst_16x16_sub16_add_10_sse2:         901.0 ( 1.98x)
vp9_inv_dct_adst_16x16_sub16_add_10_avx512icl:    120.8 (14.79x)
vp9_inv_dct_dct_16x16_sub16_add_10_sse2:          750.6 ( 2.10x)
vp9_inv_dct_dct_16x16_sub16_add_10_avx512icl:     110.9 (14.18x)
vp9_inv_dct_dct_32x32_sub32_add_10_sse2:         3922.6 ( 2.24x)
vp9_inv_dct_dct_32x32_sub32_add_10_avx512icl:     506.6 (17.37x)