Add missing patch attachment...

On 2023/6/18 16:23, Logan.Lyu wrote:
> Hi Martin,
>
> I have modified it according to your comments. Please review again.
>
> Here are the checkasm benchmark results for the related functions:
>
> put_hevc_epel_h4_8_c: 67.1
> put_hevc_epel_h4_8_i8mm: 21.1
> put_hevc_epel_h6_8_c: 147.1
> put_hevc_epel_h6_8_i8mm: 45.1
> put_hevc_epel_h8_8_c: 237.4
> put_hevc_epel_h8_8_i8mm: 72.1
> put_hevc_epel_h12_8_c: 527.4
> put_hevc_epel_h12_8_i8mm: 115.4
> put_hevc_epel_h16_8_c: 943.6
> put_hevc_epel_h16_8_i8mm: 153.9
> put_hevc_epel_h24_8_c: 2105.4
> put_hevc_epel_h24_8_i8mm: 384.4
> put_hevc_epel_h32_8_c: 3631.4
> put_hevc_epel_h32_8_i8mm: 519.9
> put_hevc_epel_h48_8_c: 8082.1
> put_hevc_epel_h48_8_i8mm: 1110.4
> put_hevc_epel_h64_8_c: 14400.6
> put_hevc_epel_h64_8_i8mm: 2057.1
>
> put_hevc_qpel_h4_8_c: 124.9
> put_hevc_qpel_h4_8_neon: 43.1
> put_hevc_qpel_h4_8_i8mm: 33.1
> put_hevc_qpel_h6_8_c: 269.4
> put_hevc_qpel_h6_8_neon: 90.6
> put_hevc_qpel_h6_8_i8mm: 61.4
> put_hevc_qpel_h8_8_c: 477.6
> put_hevc_qpel_h8_8_neon: 82.1
> put_hevc_qpel_h8_8_i8mm: 99.9
> put_hevc_qpel_h12_8_c: 1062.4
> put_hevc_qpel_h12_8_neon: 226.9
> put_hevc_qpel_h12_8_i8mm: 170.9
> put_hevc_qpel_h16_8_c: 1880.6
> put_hevc_qpel_h16_8_neon: 302.9
> put_hevc_qpel_h16_8_i8mm: 251.4
> put_hevc_qpel_h24_8_c: 4221.9
> put_hevc_qpel_h24_8_neon: 893.9
> put_hevc_qpel_h24_8_i8mm: 626.1
> put_hevc_qpel_h32_8_c: 7437.6
> put_hevc_qpel_h32_8_neon: 1189.9
> put_hevc_qpel_h32_8_i8mm: 959.1
> put_hevc_qpel_h48_8_c: 16838.4
> put_hevc_qpel_h48_8_neon: 2727.9
> put_hevc_qpel_h48_8_i8mm: 2163.9
> put_hevc_qpel_h64_8_c: 29982.1
> put_hevc_qpel_h64_8_neon: 4777.6
>
>
> On 2023/6/12 16:12, Martin Storsjö wrote:
>> On Sun, 4 Jun 2023, Logan.Lyu@myais.com.cn wrote:
>>
>>> From: Logan Lyu
>>>
>>> Signed-off-by: Logan Lyu
>>> ---
>>> libavcodec/aarch64/hevcdsp_epel_neon.S    | 343 ++++++++++++++++++++++
>>> libavcodec/aarch64/hevcdsp_init_aarch64.c |   7 +-
>>> 2 files changed, 349 insertions(+), 1 deletion(-)
>>
>>
>>> +        st2             {v20.8h, v21.8h}, [x7]
>>> +        subs            w3, w3, #1   // height
>>> +        b.ne            1b
>>> +        ret
>>
>> In general, place the loop counter decrement somewhere other than
>> exactly before the branch that depends on its result. For example,
>> right after the initial loads is usually a good place, or between the
>> st1/st2 instructions and the instructions that calculate the final
>> output values.
>>
>> The same probably goes for all such places in all these patches.
>>
>>> @@ -283,13 +287,14 @@ av_cold void
>>> ff_hevc_dsp_init_aarch64(HEVCDSPContext *c, const int bit_depth)
>>>         NEON8_FNASSIGN_PARTIAL_4(c->put_hevc_qpel_uni_w, 1, 0,
>>> qpel_uni_w_v,);
>>>
>>>         if (have_i8mm(cpu_flags)) {
>>> +            NEON8_FNASSIGN(c->put_hevc_epel, 0, 1, epel_h, _i8mm);
>>>             NEON8_FNASSIGN(c->put_hevc_epel_uni_w, 0, 1,
>>> epel_uni_w_h ,_i8mm);
>>>             NEON8_FNASSIGN(c->put_hevc_qpel, 0, 1, qpel_h, _i8mm);
>>>             NEON8_FNASSIGN(c->put_hevc_qpel_uni_w, 0, 1,
>>> qpel_uni_w_h, _i8mm);
>>> NEON8_FNASSIGN_PARTIAL_5(c->put_hevc_qpel_uni_w, 1, 1,
>>> qpel_uni_w_hv, _i8mm);
>>>         }
>>> -
>>>     }
>>> +
>>>     if (bit_depth == 10) {
>>
>> Here are some stray, unrelated whitespace changes.
>>
>> Other than that, this patch looks mostly reasonable.
>>
>> // Martin
>>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".