From: Paul B Mahol <onemda@gmail.com>
To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Subject: Re: [FFmpeg-devel] [PATCH] area changed: scdet filter
Date: Sun, 12 May 2024 13:34:57 +0200
Message-ID: <CAPYw7P5pBy53AqVDHLR6oet7gDNziScMw6otv_ff1MhEC87Jbg@mail.gmail.com>
In-Reply-To: <000301daa45c$4236b900$c6a42b00$@gmail.com>

On Sun, May 12, 2024 at 1:05 PM <radu.taraibuta@gmail.com> wrote:

> Improve scene detection accuracy by comparing frame with both previous and
> next frame (creates one frame delay).
> Add new mode parameter and new method to compute the frame difference using
> cubic square to increase the weight of small changes and new mean formula.
> This improves accuracy significantly.
> Slightly improve performance by not using frame clone.
>

The code style is inconsistent with other filters (mostly AVFilterLink* link
instead of AVFilterLink *link).

There are unrelated changes; please split trivial unrelated changes into
separate patches.

Can't the tables be generated at .init/.config_props time? There is no point
in storing them in the binary.

Adding an extra delay is not a backward-compatible change. It should be
implemented properly by adding an option that lets users select the mode:
next & prev frame, or just next or prev frame.

The frame clone change could be split into an earlier, separate patch.

Where are the results showing the accuracy improvement, so that it can be
confirmed?

> Signed-off-by: raduct <radu.taraibuta@gmail.com>
> ---
>  doc/filters.texi            |  13 +++
>  libavfilter/scene_sad.c     | 167 +++++++++++++++++++++++++++++++++++-
>  libavfilter/scene_sad.h     |   2 +
>  libavfilter/vf_scdet.c      | 150 ++++++++++++++++++++------------
>  tests/fate/filter-video.mak |   3 +
>  5 files changed, 281 insertions(+), 54 deletions(-)
>
> diff --git a/doc/filters.texi b/doc/filters.texi
> index bfa8ccec8b..de83a5e322 100644
> --- a/doc/filters.texi
> +++ b/doc/filters.texi
> @@ -21797,6 +21797,19 @@ Default value is @code{10.}.
>  @item sc_pass, s
>  Set the flag to pass scene change frames to the next filter. Default value is @code{0}
>  You can enable it if you want to get snapshot of scene change frames only.
> +
> +@item mode
> +Set the scene change detection method. Default value is @code{0}
> +Available values are:
> +
> +@table @samp
> +@item 0
> +Regular sum of absolute linear differences.
> +
> +@item 1
> +Sum of mean of cubic root differences.
> +
> +@end table
>  @end table
>
>  @anchor{selectivecolor}
> diff --git a/libavfilter/scene_sad.c b/libavfilter/scene_sad.c
> index caf911eb5d..5280e356cc 100644
> --- a/libavfilter/scene_sad.c
> +++ b/libavfilter/scene_sad.c
> @@ -65,9 +65,174 @@ ff_scene_sad_fn ff_scene_sad_get_fn(int depth)
>      if (!sad) {
>          if (depth == 8)
>              sad = ff_scene_sad_c;
> -        if (depth == 16)
> +        else if (depth == 16)
>              sad = ff_scene_sad16_c;
>      }
>      return sad;
>  }
>
> +/*
> +* Lookup table for 40.25*pow(i,1/3) - a.k.a cubic root extended to 0 - 255 interval
> +* Increase the weight of small differences compared to linear
> +*/
> +static const uint8_t cbrtTable[256] = {
> +0, 40, 51, 58, 64, 69, 73, 77, 81, 84, 87, 90, 92, 95, 97, 99,
> +101, 103, 105, 107, 109, 111, 113, 114, 116, 118, 119, 121, 122, 124, 125, 126,
> +128, 129, 130, 132, 133, 134, 135, 136, 138, 139, 140, 141, 142, 143, 144, 145,
> +146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 158, 159, 160,
> +161, 162, 163, 163, 164, 165, 166, 167, 167, 168, 169, 170, 170, 171, 172, 173,
> +173, 174, 175, 176, 176, 177, 178, 178, 179, 180, 180, 181, 182, 182, 183, 184,
> +184, 185, 186, 186, 187, 187, 188, 189, 189, 190, 190, 191, 192, 192, 193, 193,
> +194, 195, 195, 196, 196, 197, 197, 198, 199, 199, 200, 200, 201, 201, 202, 202,
> +203, 203, 204, 204, 205, 205, 206, 206, 207, 207, 208, 208, 209, 209, 210, 210,
> +211, 211, 212, 212, 213, 213, 214, 214, 215, 215, 216, 216, 217, 217, 218, 218,
> +219, 219, 219, 220, 220, 221, 221, 222, 222, 223, 223, 223, 224, 224, 225, 225,
> +226, 226, 226, 227, 227, 228, 228, 229, 229, 229, 230, 230, 231, 231, 231, 232,
> +232, 233, 233, 233, 234, 234, 235, 235, 235, 236, 236, 237, 237, 237, 238, 238,
> +238, 239, 239, 240, 240, 240, 241, 241, 242, 242, 242, 243, 243, 243, 244, 244,
> +244, 245, 245, 246, 246, 246, 247, 247, 247, 248, 248, 248, 249, 249, 249, 250,
> +250, 250, 251, 251, 252, 252, 252, 253, 253, 253, 254, 254, 254, 255, 255, 255 };
> +
> +/*
> +* Lookup table for 101.52*pow(i,1/3) - a.k.a cubic root extended to 0 - 1023 interval
> +* Increase the weight of small differences compared to linear
> +*/
> +static const uint16_t cbrtTable10[1024] = {};
> +
> +void ff_scene_scrd_c(SCENE_SAD_PARAMS)
> +{
> +    uint64_t scrdPlus = 0;
> +    uint64_t scrdMinus = 0;
> +    int x, y;
> +
> +    for (y = 0; y < height; y++) {
> +        for (x = 0; x < width; x++)
> +            if (src1[x] > src2[x])
> +                scrdMinus += cbrtTable[src1[x] - src2[x]];
> +            else
> +                scrdPlus += cbrtTable[src2[x] - src1[x]];
> +        src1 += stride1;
> +        src2 += stride2;
> +    }
> +
> +    double mean = (sqrt(scrdPlus) + sqrt(scrdMinus)) / 2.0;
> +    *sum = 2.0 * mean * mean;
> +}
> +
> +void ff_scene_scrd2B_c(SCENE_SAD_PARAMS, int bitdepth)
> +{
> +    uint64_t scrdPlus = 0;
> +    uint64_t scrdMinus = 0;
> +    const uint16_t* src1w = (const uint16_t*)src1;
> +    const uint16_t* src2w = (const uint16_t*)src2;
> +    int x, y;
> +    int shift = FFABS(bitdepth - 10);
> +
> +    stride1 /= 2;
> +    stride2 /= 2;
> +
> +    if (bitdepth > 10) {
> +        for (y = 0; y < height; y++) {
> +            for (x = 0; x < width; x++)
> +                if (src1w[x] > src2w[x])
> +                    scrdMinus += cbrtTable10[(src1w[x] - src2w[x]) >> shift];
> +                else
> +                    scrdPlus += cbrtTable10[(src2w[x] - src1w[x]) >> shift];
> +            src1w += stride1;
> +            src2w += stride2;
> +        }
> +        scrdMinus <<= shift;
> +        scrdPlus <<= shift;
> +    }
> +    else {
> +        for (y = 0; y < height; y++) {
> +            for (x = 0; x < width; x++)
> +                if (src1w[x] > src2w[x])
> +                    scrdMinus += cbrtTable10[(src1w[x] - src2w[x]) << shift];
> +                else
> +                    scrdPlus += cbrtTable10[(src2w[x] - src1w[x]) << shift];
> +            src1w += stride1;
> +            src2w += stride2;
> +        }
> +        scrdMinus >>= shift;
> +        scrdPlus >>= shift;
> +    }
> +
> +    double mean = (sqrt(scrdPlus) + sqrt(scrdMinus)) / 2.0;
> +    *sum = 2.0 * mean * mean;
> +}
> +
> +void ff_scene_scrd9_c(SCENE_SAD_PARAMS)
> +{
> +    ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum, 9);
> +}
> +
> +void ff_scene_scrd10_c(SCENE_SAD_PARAMS)
> +{
> +    ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum, 10);
> +}
> +
> +void ff_scene_scrd12_c(SCENE_SAD_PARAMS)
> +{
> +    ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum, 12);
> +}
> +
> +void ff_scene_scrd14_c(SCENE_SAD_PARAMS)
> +{
> +    ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum, 14);
> +}
> +
> +void ff_scene_scrd16_c(SCENE_SAD_PARAMS)
> +{
> +    ff_scene_scrd2B_c(src1, stride1, src2, stride2, width, height, sum, 16);
> +}
> +
> +ff_scene_sad_fn ff_scene_scrd_get_fn(int depth)
> +{
> +    ff_scene_sad_fn scrd = NULL;
> +    if (depth == 8)
> +        scrd = ff_scene_scrd_c;
> +    else if (depth == 9)
> +        scrd = ff_scene_scrd9_c;
> +    else if (depth == 10)
> +        scrd = ff_scene_scrd10_c;
> +    else if (depth == 12)
> +        scrd = ff_scene_scrd12_c;
> +    else if (depth == 14)
> +        scrd = ff_scene_scrd14_c;
> +    else if (depth == 16)
> +        scrd = ff_scene_scrd16_c;
> +    return scrd;
> +}
> diff --git a/libavfilter/scene_sad.h b/libavfilter/scene_sad.h
> index 173a051f2b..af9b06201c 100644
> --- a/libavfilter/scene_sad.h
> +++ b/libavfilter/scene_sad.h
> @@ -41,4 +41,6 @@ ff_scene_sad_fn ff_scene_sad_get_fn_x86(int depth);
>
>  ff_scene_sad_fn ff_scene_sad_get_fn(int depth);
>
> +ff_scene_sad_fn ff_scene_scrd_get_fn(int depth);
> +
>  #endif /* AVFILTER_SCENE_SAD_H */
> diff --git a/libavfilter/vf_scdet.c b/libavfilter/vf_scdet.c
> index 15399cfebf..6162e4615b 100644
> --- a/libavfilter/vf_scdet.c
> +++ b/libavfilter/vf_scdet.c
> @@ -31,6 +31,17 @@
>  #include "scene_sad.h"
>  #include "video.h"
>
> +enum SCDETMode {
> +    MODE_DIFF = 0,
> +    MODE_MEAN_CBRT = 1
> +};
> +
> +typedef struct SCDETFrameInfo {
> +    AVFrame* picref;
> +    double mafd;
> +    double diff;
> +} SCDETFrameInfo;
> +
>  typedef struct SCDetContext {
>      const AVClass *class;
>
> @@ -39,11 +50,12 @@ typedef struct SCDetContext {
>      int nb_planes;
>      int bitdepth;
>      ff_scene_sad_fn sad;
> -    double prev_mafd;
> -    double scene_score;
> -    AVFrame *prev_picref;
> +    SCDETFrameInfo curr_frame;
> +    SCDETFrameInfo prev_frame;
> +
>      double threshold;
>      int sc_pass;
> +    enum SCDETMode mode;
>  } SCDetContext;
>
>  #define OFFSET(x) offsetof(SCDetContext, x)
> @@ -55,6 +67,7 @@ static const AVOption scdet_options[] = {
>      { "t", "set scene change detect threshold", OFFSET(threshold), AV_OPT_TYPE_DOUBLE, {.dbl = 10.}, 0, 100., V|F },
>      { "sc_pass", "Set the flag to pass scene change frames", OFFSET(sc_pass), AV_OPT_TYPE_BOOL, {.dbl = 0 }, 0, 1, V|F },
>      { "s", "Set the flag to pass scene change frames", OFFSET(sc_pass), AV_OPT_TYPE_BOOL, {.dbl = 0 }, 0, 1, V|F },
> +    { "mode", "scene change detection method", OFFSET(mode), AV_OPT_TYPE_INT, {.i64 = MODE_DIFF}, MODE_DIFF, MODE_MEAN_CBRT, V|F },
>      {NULL}
>  };
>
> @@ -85,13 +98,16 @@ static int config_input(AVFilterLink *inlink)
>      s->bitdepth = desc->comp[0].depth;
>      s->nb_planes = is_yuv ? 1 : av_pix_fmt_count_planes(inlink->format);
>
> -    for (int plane = 0; plane < 4; plane++) {
> +    for (int plane = 0; plane < s->nb_planes; plane++) {
>          ptrdiff_t line_size = av_image_get_linesize(inlink->format, inlink->w, plane);
>          s->width[plane] = line_size >> (s->bitdepth > 8);
> -        s->height[plane] = inlink->h >> ((plane == 1 || plane == 2) ? desc->log2_chroma_h : 0);
> +        s->height[plane] = plane == 1 || plane == 2 ? AV_CEIL_RSHIFT(inlink->h, desc->log2_chroma_h) : inlink->h;
>      }
>
> -    s->sad = ff_scene_sad_get_fn(s->bitdepth == 8 ? 8 : 16);
> +    if (s->mode == MODE_DIFF)
> +        s->sad = ff_scene_sad_get_fn(s->bitdepth == 8 ? 8 : 16);
> +    else if (s->mode == MODE_MEAN_CBRT)
> +        s->sad = ff_scene_scrd_get_fn(s->bitdepth);
>      if (!s->sad)
>          return AVERROR(EINVAL);
>
> @@ -101,46 +117,86 @@ static int config_input(AVFilterLink *inlink)
>  static av_cold void uninit(AVFilterContext *ctx)
>  {
>      SCDetContext *s = ctx->priv;
> -
> -    av_frame_free(&s->prev_picref);
>  }
>
> -static double get_scene_score(AVFilterContext *ctx, AVFrame *frame)
> +static void compute_diff(AVFilterContext *ctx)
>  {
> -    double ret = 0;
>      SCDetContext *s = ctx->priv;
> -    AVFrame *prev_picref = s->prev_picref;
> +    AVFrame *prev_picref = s->prev_frame.picref;
> +    AVFrame *curr_picref = s->curr_frame.picref;
>
> -    if (prev_picref && frame->height == prev_picref->height
> -                    && frame->width == prev_picref->width) {
> -        uint64_t sad = 0;
> -        double mafd, diff;
> -        uint64_t count = 0;
> +    if (prev_picref && curr_picref
> +        && curr_picref->height == prev_picref->height
> +        && curr_picref->width == prev_picref->width) {
>
> +        uint64_t sum = 0;
> +        uint64_t count = 0;
>          for (int plane = 0; plane < s->nb_planes; plane++) {
> -            uint64_t plane_sad;
> +            uint64_t plane_sum;
>              s->sad(prev_picref->data[plane], prev_picref->linesize[plane],
> -                    frame->data[plane], frame->linesize[plane],
> -                    s->width[plane], s->height[plane], &plane_sad);
> -            sad += plane_sad;
> +                    curr_picref->data[plane], curr_picref->linesize[plane],
> +                    s->width[plane], s->height[plane], &plane_sum);
> +            sum += plane_sum;
>              count += s->width[plane] * s->height[plane];
>          }
>
> -        mafd = (double)sad * 100. / count / (1ULL << s->bitdepth);
> -        diff = fabs(mafd - s->prev_mafd);
> -        ret = av_clipf(FFMIN(mafd, diff), 0, 100.);
> -        s->prev_mafd = mafd;
> -        av_frame_free(&prev_picref);
> +        s->curr_frame.mafd = (double)sum * 100. / count / (1ULL << s->bitdepth);
> +        s->curr_frame.diff = s->curr_frame.mafd - s->prev_frame.mafd;
> +    } else {
> +        s->curr_frame.mafd = 0;
> +        s->curr_frame.diff = 0;
>      }
> -    s->prev_picref = av_frame_clone(frame);
> -    return ret;
>  }
>
> -static int set_meta(SCDetContext *s, AVFrame *frame, const char *key, const char *value)
> +static int set_meta(AVFrame *frame, const char *key, const char *value)
>  {
>      return av_dict_set(&frame->metadata, key, value, 0);
>  }
>
> +static int filter_frame(AVFilterContext* ctx, AVFrame* frame)
> +{
> +    AVFilterLink* inlink = ctx->inputs[0];
> +    AVFilterLink* outlink = ctx->outputs[0];
> +    SCDetContext* s = ctx->priv;
> +
> +    s->prev_frame = s->curr_frame;
> +    s->curr_frame.picref = frame;
> +
> +    if (s->prev_frame.picref) {
> +        compute_diff(ctx);
> +
> +        if (s->prev_frame.diff < -s->curr_frame.diff) {
> +            s->prev_frame.diff = -s->curr_frame.diff;
> +            s->prev_frame.mafd = s->curr_frame.mafd;
> +        }
> +        double scene_score = av_clipf(FFMAX(s->prev_frame.diff, 0), 0, 100.);
> +
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "%0.3f", s->prev_frame.mafd);
> +        set_meta(s->prev_frame.picref, "lavfi.scd.mafd", buf);
> +        snprintf(buf, sizeof(buf), "%0.3f", scene_score);
> +        set_meta(s->prev_frame.picref, "lavfi.scd.score", buf);
> +
> +        if (scene_score >= s->threshold) {
> +            av_log(s, AV_LOG_INFO, "lavfi.scd.score: %.3f, lavfi.scd.time: %s\n",
> +                scene_score, av_ts2timestr(s->prev_frame.picref->pts, &inlink->time_base));
> +            set_meta(s->prev_frame.picref, "lavfi.scd.time",
> +                av_ts2timestr(s->prev_frame.picref->pts, &inlink->time_base));
> +        }
> +
> +        if (s->sc_pass) {
> +            if (scene_score >= s->threshold)
> +                return ff_filter_frame(outlink, s->prev_frame.picref);
> +            else
> +                av_frame_free(&s->prev_frame.picref);
> +        }
> +        else
> +            return ff_filter_frame(outlink, s->prev_frame.picref);
> +    }
> +
> +    return 0;
> +}
> +
>  static int activate(AVFilterContext *ctx)
>  {
>      int ret;
> @@ -148,6 +204,8 @@ static int activate(AVFilterContext *ctx)
>      AVFilterLink *outlink = ctx->outputs[0];
>      SCDetContext *s = ctx->priv;
>      AVFrame *frame;
> +    int64_t pts;
> +    int status;
>
>      FF_FILTER_FORWARD_STATUS_BACK(outlink, inlink);
>
> @@ -155,31 +213,17 @@ static int activate(AVFilterContext *ctx)
>      if (ret < 0)
>          return ret;
>
> -    if (frame) {
> -        char buf[64];
> -        s->scene_score = get_scene_score(ctx, frame);
> -        snprintf(buf, sizeof(buf), "%0.3f", s->prev_mafd);
> -        set_meta(s, frame, "lavfi.scd.mafd", buf);
> -        snprintf(buf, sizeof(buf), "%0.3f", s->scene_score);
> -        set_meta(s, frame, "lavfi.scd.score", buf);
> +    if (ret > 0)
> +        return filter_frame(ctx, frame);
>
> -        if (s->scene_score >= s->threshold) {
> -            av_log(s, AV_LOG_INFO, "lavfi.scd.score: %.3f, lavfi.scd.time: %s\n",
> -                    s->scene_score, av_ts2timestr(frame->pts, &inlink->time_base));
> -            set_meta(s, frame, "lavfi.scd.time",
> -                    av_ts2timestr(frame->pts, &inlink->time_base));
> -        }
> -        if (s->sc_pass) {
> -            if (s->scene_score >= s->threshold)
> -                return ff_filter_frame(outlink, frame);
> -            else {
> -                av_frame_free(&frame);
> -            }
> -        } else
> -            return ff_filter_frame(outlink, frame);
> +    if (ff_inlink_acknowledge_status(inlink, &status, &pts)) {
> +        if (status == AVERROR_EOF)
> +            ret = filter_frame(ctx, NULL);
> +
> +        ff_outlink_set_status(outlink, status, pts);
> +        return ret;
>      }
>
> -    FF_FILTER_FORWARD_STATUS(inlink, outlink);
>      FF_FILTER_FORWARD_WANTED(outlink, inlink);
>
>      return FFERROR_NOT_READY;
> @@ -190,12 +234,12 @@ static const AVFilterPad scdet_inputs[] = {
>          .name = "default",
>          .type = AVMEDIA_TYPE_VIDEO,
>          .config_props = config_input,
> -    },
> +    }
>  };
>
>  const AVFilter ff_vf_scdet = {
>      .name = "scdet",
> -    .description = NULL_IF_CONFIG_SMALL("Detect video scene change"),
> +    .description = NULL_IF_CONFIG_SMALL("Detect video scene change."),
>      .priv_size = sizeof(SCDetContext),
>      .priv_class = &scdet_class,
>      .uninit = uninit,
> @@ -203,5 +247,5 @@ const AVFilter ff_vf_scdet = {
>      FILTER_INPUTS(scdet_inputs),
>      FILTER_OUTPUTS(ff_video_default_filterpad),
>      FILTER_PIXFMTS_ARRAY(pix_fmts),
> -    .activate = activate,
> +    .activate = activate
>  };
> diff --git a/tests/fate/filter-video.mak b/tests/fate/filter-video.mak
> index ee9f0f5e40..cff48e33d9 100644
> --- a/tests/fate/filter-video.mak
> +++ b/tests/fate/filter-video.mak
> @@ -672,6 +672,9 @@ SCDET_DEPS = LAVFI_INDEV FILE_PROTOCOL MOVIE_FILTER SCDET_FILTER SCALE_FILTER \
>  FATE_METADATA_FILTER-$(call ALLYES, $(SCDET_DEPS)) += fate-filter-metadata-scdet
>  fate-filter-metadata-scdet: SRC = $(TARGET_SAMPLES)/svq3/Vertical400kbit.sorenson3.mov
>  fate-filter-metadata-scdet: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',scdet=s=1"
> +FATE_METADATA_FILTER-$(call ALLYES, $(SCDET_DEPS)) += fate-filter-metadata-scdet1
> +fate-filter-metadata-scdet1: SRC = $(TARGET_SAMPLES)/svq3/Vertical400kbit.sorenson3.mov
> +fate-filter-metadata-scdet1: CMD = run $(FILTER_METADATA_COMMAND) "sws_flags=+accurate_rnd+bitexact;movie='$(SRC)',scdet=s=1:t=6.5:mode=1"
>
>  CROPDETECT_DEPS = LAVFI_INDEV FILE_PROTOCOL MOVIE_FILTER MOVIE_FILTER MESTIMATE_FILTER CROPDETECT_FILTER \
>  SCALE_FILTER MOV_DEMUXER H264_DECODER
> --
> 2.43.0.windows.1
>
>
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".