Hi, Martin I have finished the modification, please review again. Thanks. 在 2023/5/26 16:34, Martin Storsjö 写道: > Hi, > > Overall these patches seem mostly ok, but I've got a few minor points > to make: > > - The usdot instruction requires the i8mm extension (part of > armv8.6-a), while udot or sdot would require the dotprod extension > (available in armv8.4-a). If you could manage with udot or sdot, these > functions would be usable on a wider set of CPUs. > > Therefore, the current guards are wrong. Also, I finally got support > implemented for optionally using these cpu extensions, even if the > baseline of the compile don't include it, by runtime enabling it. See > the patchset at > https://patchwork.ffmpeg.org/project/ffmpeg/list/?series=9009. > > To adapt your patches on top of this, see the two topmost commits at > https://github.com/mstorsjo/ffmpeg/commits/archext. Fixed. > - The indentation is inconsistent; in the first patch, you have some > instructions written like this: > > +        sqadd   v1.4s, v1.4s, v29.4s > > While you later use this style: > > +        dup             v1.16b, v28.b[1] > > The latter seems to match the style we commonly use; please reformat > your code to match that consistently. > > With some macro invocations in the first patch, you also seem to have > too much indentation in some places. See e.g. this: > > +1:      ldr             q23, [x2, x3] > +        add             x2, x2, x3, lsl #1 > +        QPEL_FILTER_B     v26, v16, v17, v18, v19, v20, v21, v22, v23 > +        QPEL_FILTER_B2    v27, v16, v17, v18, v19, v20, v21, v22, v23 > +        QPEL_UNI_W_V_16 > +        subs            w4, w4, #1 > +        b.eq            2f > > (If the macro name is too long, that's ok, but here there's no need to > have those lines unaligned.) Fixed. > > - In the third patch, you've got multiple parameters from the stack > like this: > > +        ldp             x14, x15, [sp]          // mx, my > +        ldr             w13, [sp, #16]          // width > > I see that the mx an my parameters are intptr_t; that's good, since if > they would be 32 bit integers, the ABI for such parameters on the > stack differ between macOS/Darwin and Linux. But as long as they're > intptr_t they behave the same. > > - At the same place, you're backing up a bunch of registers: > > +        stp             x20, x21, [sp, #-16]! > +        stp             x22, x23, [sp, #-16]! > +        stp             x24, x25, [sp, #-16]! > +        stp             x26, x27, [sp, #-16]! > +        stp             x28, x30, [sp, #-16]! > > This is inefficient; instead, do this: > > +        stp             x28, x30, [sp, #-80]! > +        stp             x20, x21, [sp, #16] > +        stp             x22, x23, [sp, #32] > +        stp             x24, x25, [sp, #48] > +        stp             x26, x27, [sp, #64] > > Also, following that, I see that you back up the stack pointer in x28. > Why do you use x28 for that? Using x29 would be customary as frame > pointer. Using more efficient implementation now.  And x28 in this case is a common callee save register. Anyway, I use x19 instead now. > > Aside for that, I think the rest of the patches is acceptable. > > // Martin > > _______________________________________________ > ffmpeg-devel mailing list > ffmpeg-devel@ffmpeg.org > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel > > To unsubscribe, visit link above, or email > ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".