Re: [FFmpeg-devel] [PATCH] checkasm: add support for vvc alf

From: Nuo Mi <nuomi2021@gmail.com>
To: "Martin Storsjö" <martin@martin.st>
Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] checkasm: add support for vvc alf
Date: Tue, 28 Feb 2023 23:06:36 +0800
Message-ID: <CAFXK13cdMJp61KC3ZTnGMVd6YS=JtmcwTyfBOC3QmPTnQjc65g@mail.gmail.com> (raw)
In-Reply-To: <64cc66f8-a2fd-20f1-543-cab5403bfc2f@martin.st>

On Tue, Feb 28, 2023 at 7:51 PM Martin Storsjö <martin@martin.st> wrote:

> On Sun, 26 Feb 2023, Nuo Mi wrote:
>
> > +#include <string.h>
> > +
> > +#include "libavutil/intreadwrite.h"
> > +#include "libavutil/mem_internal.h"
> > +
> > +#include "libavcodec/avcodec.h"
> > +
> > +#include "libavcodec/vvcdsp.h"
> > +#include "libavcodec/vvcdec.h"
> > +
> > +#include "checkasm.h"
> > +
> > +static const uint32_t pixel_mask[3] = { 0xffffffff, 0x03ff03ff,
> 0x0fff0fff };
> > +
> > +#define SIZEOF_PIXEL ((bit_depth + 7) / 8)
> > +#define PIXEL_STRIDE (ALF_SUBBLOCK_SIZE + 2 * ALF_PADDING_SIZE)
> > +#define BUF_SIZE (PIXEL_STRIDE * (MAX_CTU_SIZE + 3 * 2) * 2) //+3 * 2
> for top and bottom row, *2 for high bit depth
> > +#define LUMA_PARAMS_SIZE (ALF_SUBBLOCK_SIZE / 4 * ALF_SUBBLOCK_SIZE / 4
> * ALF_NUM_COEFF_LUMA)
> > +
> > +#define randomize_buffers(buf0, buf1, size)                 \
> > +    do {                                                    \
> > +        uint32_t mask = pixel_mask[(bit_depth - 8) >> 1];   \
> > +        int k;                                              \
> > +        for (k = 0; k < size; k += 4) {                     \
> > +            uint32_t r = rnd() & mask;                      \
> > +            AV_WN32A(buf0 + k, r);                          \
> > +            AV_WN32A(buf1 + k, r);                          \
> > +        }                                                   \
> > +    } while (0)
> > +
> > +#define randomize_buffers2(buf, size, filter)               \
> > +    do {                                                    \
> > +        int k;                                              \
> > +        if (filter) {                                       \
> > +            for (k = 0; k < size; k++) {                    \
> > +                uint8_t r = rnd();                          \
> > +                buf[k] = r;                                 \
> > +            }                                               \
> > +        } else {                                            \
> > +            for (k = 0; k < size; k++) {                    \
> > +                int16_t r = rnd();                          \
> > +                buf[k] = r;                                 \
> > +            }                                               \
> > +        }                                                   \
> > +    } while (0)
>
> I don't quite see the point of the extra uint8_t/int16_t variable r here -
> you could just as well assign it directly to buf[k], no? Unless you
> specifically want the effect where you're narrowing the random value to a
> smaller range beforehand.
>
yes. clips can only get special values.
https://github.com/ffvvc/FFmpeg/blob/main/libavcodec/vvc_filter_template.c#L653
fixed by
https://github.com/ffvvc/FFmpeg/pull/42/commits/3034ea8050f7f2db580ef4a36bd806e16be943d4#diff-99295cc07508b29c356b6432f8dbd72c5dceb37584fb8b4be0d34da19455864dR61

>
> > +
> > +static void check_alf_luma_filter(VVCDSPContext *c, const int bit_depth)
> > +{
> > +    LOCAL_ALIGNED_32(uint8_t, dst0, [BUF_SIZE]);
> > +    LOCAL_ALIGNED_32(uint8_t, dst1, [BUF_SIZE]);
> > +    LOCAL_ALIGNED_32(uint8_t, src0, [BUF_SIZE]);
> > +    LOCAL_ALIGNED_32(uint8_t, src1, [BUF_SIZE]);
> > +    int8_t filter[LUMA_PARAMS_SIZE];
> > +    int16_t clip[LUMA_PARAMS_SIZE];
> > +    ptrdiff_t stride = PIXEL_STRIDE * SIZEOF_PIXEL;
> > +    int offset = (3 * PIXEL_STRIDE + 3) * SIZEOF_PIXEL;
> > +
> > +    declare_func_emms(AV_CPU_FLAG_AVX2, void, uint8_t *dst, ptrdiff_t
> dst_stride, const uint8_t *src, ptrdiff_t src_stride,
> > +        int width, int height, const int8_t *filter, const int16_t
> *clip);
> > +
> > +    randomize_buffers(src0, src1, BUF_SIZE);
> > +    randomize_buffers2(filter, LUMA_PARAMS_SIZE, 1);
> > +    randomize_buffers2(clip, LUMA_PARAMS_SIZE, 2);
>
> Here, both invocations of randomize_buffers2 are called with filter=1 or
> 2, both which will pick the "if (filter) {" case, so the else in
> randomize_buffers2 is unused?
>
fixed

>
> > +
> > +    for (int h = 4; h < ALF_SUBBLOCK_SIZE; h += 4) {
> > +        for (int w = 4; w < ALF_SUBBLOCK_SIZE; w += 4) {
>
> Wouldn't you want to use <= instead of < for the comparisons here? I don't
> know vvc so I can't say for sure, but that would seem logical to me.
>
you are right, fixed.

>
> Are all aspect ratio combinations allowed here? E.g. the log mentions that
> you're testing 28x4 blocks. In dav1d, similar cases use logic of testing w
> in the range of [h/4, h*4] (plus limited to e.g. [4,64] at the same time).
> That would reduce the number of combinations to test, if they aren't
> really valid in practice.
>
yes. almost all 4x width and height are valid.  last col or row may have a
small width and height.

>
> > +            if (check_func(c->alf.filter[LUMA],
> "vvc_alf_filter_luma_%dx%d_%d", w, h, bit_depth)) {
> > +                memset(dst0, 0, BUF_SIZE);
> > +                memset(dst1, 0, BUF_SIZE);
> > +                call_ref(dst0, stride, src0 + offset, stride, w, h,
> filter, clip);
> > +                call_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > +                for (int i = 0; i < h; i++) {
> > +                    if (memcmp(dst0 + i * stride, dst1 + i * stride, w
> * SIZEOF_PIXEL))
> > +                        fail();
> > +                }
> > +                bench_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > +            }
> > +            if (check_func(c->alf.filter[CHROMA],
> "vvc_alf_filter_chroma_%dx%d_%d", w, h, bit_depth)) {
> > +                memset(dst0, 0, BUF_SIZE);
> > +                memset(dst1, 0, BUF_SIZE);
> > +                call_ref(dst0, stride, src0 + offset, stride, w, h,
> filter, clip);
> > +                call_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > +                for (int i = 0; i < h; i++) {
> > +                    if (memcmp(dst0 + i * stride, dst1 + i * stride, w
> * SIZEOF_PIXEL))
> > +                        fail();
> > +                }
> > +                bench_new(dst1, stride, src1 + offset, stride, w, h,
> filter, clip);
> > +            }
> > +        }
> > +    }
> > +}
> > +
> > +void checkasm_check_vvc_alf(void)
> > +{
> > +    int bit_depth = 10;
> > +    VVCDSPContext h;
> > +    ff_vvc_dsp_init(&h, bit_depth);
> > +    check_alf_luma_filter(&h, bit_depth);
> > +    report("alf_filter");
>
> Is 10 bits the only currently valid bitdepth here? Otherwise I think we
> should include all of them, even if there's only assembly for one case at
> the moment. (The pixel_mask seems to assume 8/10/12 bits.)
>
I use the 10 bits version to collect performance suggestions. After it
done, I will add 8 and 12 bits version
But you are right. I can add 8/12 bits test first.

>
> // Martin
>
> Hi Martin,
Thank you for the review. comments inline.
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".