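The reason for using m1+le8 instead of a strided load with larger group multipliers is the same as in "[FFmpeg-devel] [PATCH 1/7] lavc/me_cmp: R-V V pix_abs". The test defines

    #define src (buf + 2 * SRC_BUF_STRIDE + 2 + 1)

so src sits at an odd byte offset from buf (2 * SRC_BUF_STRIDE is even, and 2 + 1 adds 3). Any element width larger than e8 therefore produces misaligned accesses, and the test fails with "fatal signal 7: Bus error".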