* [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
@ 2024-06-07 14:05 Ramiro Polla
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange Ramiro Polla
                   ` (3 more replies)
  0 siblings, 4 replies; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 14:05 UTC
  To: ffmpeg-devel

---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..3ce152e818 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o
+SWSCALEOBJS += sw_rgb.o
+SWSCALEOBJS += sw_scale.o
 
 CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
-- 
2.30.2

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".
* [FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange
  2024-06-07 14:05 [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Ramiro Polla
@ 2024-06-07 14:05 ` Ramiro Polla
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 " Ramiro Polla
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 14:05 UTC
  To: ffmpeg-devel

---
 tests/checkasm/Makefile           |   1 +
 tests/checkasm/checkasm.c         |   1 +
 tests/checkasm/checkasm.h         |   1 +
 tests/checkasm/sw_range_convert.c | 134 ++++++++++++++++++++++++++++++
 4 files changed, 137 insertions(+)
 create mode 100644 tests/checkasm/sw_range_convert.c

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 3ce152e818..e4ec6a27ec 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -64,6 +64,7 @@ CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
 
 # swscale tests
 SWSCALEOBJS += sw_gbrp.o
+SWSCALEOBJS += sw_range_convert.o
 SWSCALEOBJS += sw_rgb.o
 SWSCALEOBJS += sw_scale.o
 
diff --git a/tests/checkasm/checkasm.c b/tests/checkasm/checkasm.c
index d7aa2a9c09..d2b50c023a 100644
--- a/tests/checkasm/checkasm.c
+++ b/tests/checkasm/checkasm.c
@@ -248,6 +248,7 @@ static const struct {
 #endif
 #if CONFIG_SWSCALE
     { "sw_gbrp", checkasm_check_sw_gbrp },
+    { "sw_range_convert", checkasm_check_sw_range_convert },
     { "sw_rgb", checkasm_check_sw_rgb },
     { "sw_scale", checkasm_check_sw_scale },
 #endif
diff --git a/tests/checkasm/checkasm.h b/tests/checkasm/checkasm.h
index 211d7f52e6..e544007b67 100644
--- a/tests/checkasm/checkasm.h
+++ b/tests/checkasm/checkasm.h
@@ -119,6 +119,7 @@ void checkasm_check_rv40dsp(void);
 void checkasm_check_svq1enc(void);
 void checkasm_check_synth_filter(void);
 void checkasm_check_sw_gbrp(void);
+void checkasm_check_sw_range_convert(void);
 void checkasm_check_sw_rgb(void);
 void checkasm_check_sw_scale(void);
 void checkasm_check_takdsp(void);
diff --git a/tests/checkasm/sw_range_convert.c b/tests/checkasm/sw_range_convert.c
new file mode 100644
index 0000000000..6d7e22ad40
--- /dev/null
+++ b/tests/checkasm/sw_range_convert.c
@@ -0,0 +1,134 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with FFmpeg; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+ */
+
+#include <string.h>
+
+#include "libavutil/common.h"
+#include "libavutil/intreadwrite.h"
+#include "libavutil/mem.h"
+#include "libavutil/mem_internal.h"
+
+#include "libswscale/swscale.h"
+#include "libswscale/swscale_internal.h"
+
+#include "checkasm.h"
+
+static void check_lumConvertRange(int from)
+{
+    const char *func_str = from ? "lumRangeFromJpeg" : "lumRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    struct SwsContext *ctx;
+
+    LOCAL_ALIGNED_32(int16_t, dst0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dst1, [LARGEST_INPUT_SIZE]);
+
+    declare_func(void, int16_t *dst, int width);
+
+    ctx = sws_alloc_context();
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P;
+    ctx->dstFormat = from ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P;
+    ctx->srcRange = from;
+    ctx->dstRange = !from;
+
+    for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) {
+        int width = input_sizes[dstWi];
+        for (int i = 0; i < width; i++) {
+            uint8_t r = rnd();
+            dst0[i] = (int16_t) r << 7;
+            dst1[i] = (int16_t) r << 7;
+        }
+        ff_sws_init_scale(ctx);
+        if (check_func(ctx->lumConvertRange, "%s_%d", func_str, width)) {
+            call_ref(dst0, width);
+            call_new(dst1, width);
+            if (memcmp(dst0, dst1, width * sizeof(int16_t)))
+                fail();
+            bench_new(dst1, width);
+        }
+    }
+
+    sws_freeContext(ctx);
+}
+#undef LARGEST_INPUT_SIZE
+#undef INPUT_SIZES
+
+static void check_chrConvertRange(int from)
+{
+    const char *func_str = from ? "chrRangeFromJpeg" : "chrRangeToJpeg";
+#define LARGEST_INPUT_SIZE 512
+#define INPUT_SIZES 6
+    static const int input_sizes[] = {8, 24, 128, 144, 256, 512};
+    struct SwsContext *ctx;
+
+    LOCAL_ALIGNED_32(int16_t, dstU0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstV0, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstU1, [LARGEST_INPUT_SIZE]);
+    LOCAL_ALIGNED_32(int16_t, dstV1, [LARGEST_INPUT_SIZE]);
+
+    declare_func(void, int16_t *dstU, int16_t *dstV, int width);
+
+    ctx = sws_alloc_context();
+    if (sws_init_context(ctx, NULL, NULL) < 0)
+        fail();
+
+    ctx->srcFormat = from ? AV_PIX_FMT_YUVJ444P : AV_PIX_FMT_YUV444P;
+    ctx->dstFormat = from ? AV_PIX_FMT_YUV444P : AV_PIX_FMT_YUVJ444P;
+    ctx->srcRange = from;
+    ctx->dstRange = !from;
+
+    for (int dstWi = 0; dstWi < INPUT_SIZES; dstWi++) {
+        int width = input_sizes[dstWi];
+        for (int i = 0; i < width; i++) {
+            uint8_t r = rnd();
+            dstU0[i] = (int16_t) r << 7;
+            dstV0[i] = (int16_t) r << 7;
+            dstU1[i] = (int16_t) r << 7;
+            dstV1[i] = (int16_t) r << 7;
+        }
+        ff_sws_init_scale(ctx);
+        if (check_func(ctx->chrConvertRange, "%s_%d", func_str, width)) {
+            call_ref(dstU0, dstV0, width);
+            call_new(dstU1, dstV1, width);
+            if (memcmp(dstU0, dstU1, width * sizeof(int16_t))
+                || memcmp(dstV0, dstV1, width * sizeof(int16_t)))
+                fail();
+            bench_new(dstU1, dstV1, width);
+        }
+    }
+
+    sws_freeContext(ctx);
+}
+#undef LARGEST_INPUT_SIZE
+#undef INPUT_SIZES
+
+void checkasm_check_sw_range_convert(void)
+{
+    check_lumConvertRange(1);
+    report("lumRangeFromJpeg");
+    check_chrConvertRange(1);
+    report("chrRangeFromJpeg");
+    check_lumConvertRange(0);
+    report("lumRangeToJpeg");
+    check_chrConvertRange(0);
+    report("chrRangeToJpeg");
+}
-- 
2.30.2
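For readers following along, the luma conversions exercised by this test can be modelled in scalar C. The sketch below is not part of the patch: the helper names are made up for illustration, the multiplier, offset and shift are the constants that PATCH 3/4 of this series passes to its macros, and the inputs use the same `value << 7` scaling as the random data in the test. It assumes `>>` on a negative int is an arithmetic shift, and it only covers inputs whose result fits in int16_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of limited -> full range luma conversion.
 * Constants taken from PATCH 3/4 (19077, -39057361, shift 14).
 * Assumes arithmetic right shift for negative intermediate values. */
static int16_t lum_to_jpeg_model(int16_t v)
{
    return (int16_t) ((v * 19077 - 39057361) >> 14);
}

/* Hypothetical scalar model of full -> limited range luma conversion.
 * Constants taken from PATCH 3/4 (14071, 33561947, shift 14). */
static int16_t lum_from_jpeg_model(int16_t v)
{
    return (int16_t) ((v * 14071 + 33561947) >> 14);
}

/* Worked values on the 15-bit intermediate scale (8-bit value << 7). */
static void lum_model_selfcheck(void)
{
    assert(lum_to_jpeg_model(16 << 7)   == 0);          /* limited black -> 0   */
    assert(lum_to_jpeg_model(235 << 7)  == (255 << 7)); /* limited white -> 255 */
    assert(lum_from_jpeg_model(0)       == (16 << 7));  /* full black -> 16     */
    assert(lum_from_jpeg_model(255 << 7) == (235 << 7)); /* full white -> 235   */
}
```

The worked values show the expected endpoints: 16 and 235 in limited range land on 0 and 255 in full range, and back again.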
* [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange
  2024-06-07 14:05 [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Ramiro Polla
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange Ramiro Polla
@ 2024-06-07 14:05 ` Ramiro Polla
  2024-06-07 17:38   ` Ramiro Polla
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon " Ramiro Polla
  2024-06-07 18:45 ` [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Andreas Rheinhardt
  3 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 14:05 UTC
  To: ffmpeg-devel

chrRangeFromJpeg_8_c: 19.9
chrRangeFromJpeg_8_sse4: 16.2
chrRangeFromJpeg_24_c: 60.7
chrRangeFromJpeg_24_sse4: 28.9
chrRangeFromJpeg_128_c: 325.7
chrRangeFromJpeg_128_sse4: 160.2
chrRangeFromJpeg_144_c: 364.2
chrRangeFromJpeg_144_sse4: 194.9
chrRangeFromJpeg_256_c: 630.7
chrRangeFromJpeg_256_sse4: 337.4
chrRangeFromJpeg_512_c: 1240.4
chrRangeFromJpeg_512_sse4: 668.4
chrRangeToJpeg_8_c: 37.7
chrRangeToJpeg_8_sse4: 19.7
chrRangeToJpeg_24_c: 114.7
chrRangeToJpeg_24_sse4: 30.2
chrRangeToJpeg_128_c: 636.4
chrRangeToJpeg_128_sse4: 161.7
chrRangeToJpeg_144_c: 715.7
chrRangeToJpeg_144_sse4: 272.9
chrRangeToJpeg_256_c: 1256.7
chrRangeToJpeg_256_sse4: 341.9
chrRangeToJpeg_512_c: 2498.7
chrRangeToJpeg_512_sse4: 668.4
lumRangeFromJpeg_8_c: 11.7
lumRangeFromJpeg_8_sse4: 12.4
lumRangeFromJpeg_24_c: 36.9
lumRangeFromJpeg_24_sse4: 17.7
lumRangeFromJpeg_128_c: 228.4
lumRangeFromJpeg_128_sse4: 85.2
lumRangeFromJpeg_144_c: 272.9
lumRangeFromJpeg_144_sse4: 96.9
lumRangeFromJpeg_256_c: 463.4
lumRangeFromJpeg_256_sse4: 183.9
lumRangeFromJpeg_512_c: 879.9
lumRangeFromJpeg_512_sse4: 355.9
lumRangeToJpeg_8_c: 17.7
lumRangeToJpeg_8_sse4: 15.4
lumRangeToJpeg_24_c: 56.2
lumRangeToJpeg_24_sse4: 18.4
lumRangeToJpeg_128_c: 331.4
lumRangeToJpeg_128_sse4: 84.4
lumRangeToJpeg_144_c: 375.2
lumRangeToJpeg_144_sse4: 96.9
lumRangeToJpeg_256_c: 649.7
lumRangeToJpeg_256_sse4: 184.4
lumRangeToJpeg_512_c: 1281.9
lumRangeToJpeg_512_sse4: 355.9
---
 libswscale/swscale_internal.h    |   1 +
 libswscale/utils.c               |   2 +
 libswscale/x86/Makefile          |   1 +
 libswscale/x86/range_convert.asm | 100 +++++++++++++++++++++++++++++++
 libswscale/x86/swscale.c         |  36 +++++++++++
 5 files changed, 140 insertions(+)
 create mode 100644 libswscale/x86/range_convert.asm

diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h
index d4b0c3cee2..92f6105443 100644
--- a/libswscale/swscale_internal.h
+++ b/libswscale/swscale_internal.h
@@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY);
 
 av_cold void ff_sws_init_range_convert(SwsContext *c);
 av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c);
+av_cold void ff_sws_init_range_convert_x86(SwsContext *c);
 
 SwsFunc ff_yuv2rgb_init_x86(SwsContext *c);
 SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c);
diff --git a/libswscale/utils.c b/libswscale/utils.c
index 476a24fea5..8dfa57b5ff 100644
--- a/libswscale/utils.c
+++ b/libswscale/utils.c
@@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4],
         ff_sws_init_range_convert(c);
 #if ARCH_LOONGARCH64
         ff_sws_init_range_convert_loongarch(c);
+#elif ARCH_X86
+        ff_sws_init_range_convert_x86(c);
 #endif
     }
 
diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile
index 68391494be..f00154941d 100644
--- a/libswscale/x86/Makefile
+++ b/libswscale/x86/Makefile
@@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o \
                x86/output.o \
                x86/scale.o \
                x86/scale_avx2.o \
+               x86/range_convert.o \
                x86/rgb_2_rgb.o \
                x86/yuv_2_rgb.o \
                x86/yuv2yuvX.o \
diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm
new file mode 100644
index 0000000000..333265fb65
--- /dev/null
+++ b/libswscale/x86/range_convert.asm
@@ -0,0 +1,100 @@
+;******************************************************************************
+;* Copyright (c) 2024 Ramiro Polla
+;*
+;* This file is part of FFmpeg.
+;*
+;* FFmpeg is free software; you can redistribute it and/or
+;* modify it under the terms of the GNU Lesser General Public
+;* License as published by the Free Software Foundation; either
+;* version 2.1 of the License, or (at your option) any later version.
+;*
+;* FFmpeg is distributed in the hope that it will be useful,
+;* but WITHOUT ANY WARRANTY; without even the implied warranty of
+;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+;* Lesser General Public License for more details.
+;*
+;* You should have received a copy of the GNU Lesser General Public
+;* License along with FFmpeg; if not, write to the Free Software
+;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+;******************************************************************************
+
+%include "libavutil/x86/x86util.asm"
+
+; NOTE: there is no need to clamp the input when converting to jpeg range
+;       (like we do in the C code) because packssdw will saturate the output.
+
+;-----------------------------------------------------------------------------
+; lumConvertRange
+;
+; void ff_lumRangeToJpeg_<opt>(int16_t *dst, int width);
+; void ff_lumRangeFromJpeg_<opt>(int16_t *dst, int width);
+;
+;-----------------------------------------------------------------------------
+
+%macro LUMCONVERTRANGE 4
+SECTION_RODATA
+mult_%1: times 4 dd %2
+offset_%1: times 4 dd %3
+SECTION .text
+cglobal %1, 2, 3, 3, dst, width, x
+    movsxdifnidn widthq, widthd
+    xor          xq, xq
+    mova         m1, [mult_%1]
+    mova         m2, [offset_%1]
+.loop:
+    pmovsxwd     m0, [dstq+xq*2]
+    pmulld       m0, m1
+    paddd        m0, m2
+    psrad        m0, %4
+    packssdw     m0, m0
+    movh         [dstq+xq*2], m0
+    add          xq, mmsize / 4
+    cmp          xd, widthd
+    jl .loop
+    RET
+%endmacro
+
+;-----------------------------------------------------------------------------
+; chrConvertRange
+;
+; void ff_chrRangeToJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width);
+; void ff_chrRangeFromJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width);
+;
+;-----------------------------------------------------------------------------
+
+%macro CHRCONVERTRANGE 4
+SECTION_RODATA
+mult_%1: times 4 dd %2
+offset_%1: times 4 dd %3
+SECTION .text
+cglobal %1, 3, 4, 4, dstU, dstV, width, x
+    movsxdifnidn widthq, widthd
+    xor          xq, xq
+    mova         m1, [mult_%1]
+    mova         m2, [offset_%1]
+.loop:
+    pmovsxwd     m0, [dstUq+xq*2]
+    pmulld       m0, m1
+    paddd        m0, m2
+    psrad        m0, %4
+    packssdw     m0, m0
+    movh         [dstUq+xq*2], m0
+    pmovsxwd     m0, [dstVq+xq*2]
+    pmulld       m0, m1
+    paddd        m0, m2
+    psrad        m0, %4
+    packssdw     m0, m0
+    movh         [dstVq+xq*2], m0
+    add          xq, mmsize / 4
+    cmp          xd, widthd
+    jl .loop
+    RET
+%endmacro
+
+%if ARCH_X86_64
+INIT_XMM sse4
+LUMCONVERTRANGE lumRangeToJpeg, 19077, -39057361, 14
+CHRCONVERTRANGE chrRangeToJpeg, 4663, -9289992, 12
+LUMCONVERTRANGE lumRangeFromJpeg, 14071, 33561947, 14
+CHRCONVERTRANGE chrRangeFromJpeg, 1799, 4081085, 11
+%endif
diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c
index fff8bb4396..c5ddfb5605 100644
--- a/libswscale/x86/swscale.c
+++ b/libswscale/x86/swscale.c
@@ -447,6 +447,38 @@ INPUT_PLANAR_RGB_UV_ALL_DECL(avx2);
 INPUT_PLANAR_RGB_A_ALL_DECL(avx2);
 #endif
 
+#if ARCH_X86_64
+#define RANGE_CONVERT_FUNCS(opt) do { \
+    if (c->dstBpc <= 14) { \
+        if (c->srcRange) { \
+            c->lumConvertRange = ff_lumRangeFromJpeg_ ##opt; \
+            c->chrConvertRange = ff_chrRangeFromJpeg_ ##opt; \
+        } else { \
+            c->lumConvertRange = ff_lumRangeToJpeg_ ##opt; \
+            c->chrConvertRange = ff_chrRangeToJpeg_ ##opt; \
+        } \
+    } \
+} while (0)
+
+#define RANGE_CONVERT_FUNCS_DECL(opt) \
+void ff_lumRangeFromJpeg_ ##opt(int16_t *dst, int width); \
+void ff_chrRangeFromJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width); \
+void ff_lumRangeToJpeg_ ##opt(int16_t *dst, int width); \
+void ff_chrRangeToJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width); \
+
+RANGE_CONVERT_FUNCS_DECL(sse4);
+
+av_cold void ff_sws_init_range_convert_x86(SwsContext *c)
+{
+    if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) {
+        int cpu_flags = av_get_cpu_flags();
+        if (EXTERNAL_SSE4(cpu_flags)) {
+            RANGE_CONVERT_FUNCS(sse4);
+        }
+    }
+}
+#endif
+
 av_cold void ff_sws_init_swscale_x86(SwsContext *c)
 {
     int cpu_flags = av_get_cpu_flags();
@@ -805,4 +837,8 @@ switch(c->dstBpc){ \
     }
 
 #endif
+
+#if ARCH_X86_64
+    ff_sws_init_range_convert_x86(c);
+#endif
 }
-- 
2.30.2
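The NOTE in range_convert.asm about not clamping the input can be checked with a scalar model of one SIMD lane. This is only a sketch under stated assumptions: the function names are hypothetical, the input clamp limit 30189 is recalled from the C reference and does not appear in this thread, right shift of a negative value is assumed to be arithmetic (as psrad is), and only non-negative inputs are compared, which matches the data the checkasm test generates.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate a 32-bit value to the int16_t range, as packssdw does per lane. */
static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t) v;
}

/* One lane of the lumRangeToJpeg loop: widen (pmovsxwd), multiply (pmulld),
 * add the offset (paddd), shift (psrad), pack with saturation (packssdw). */
static int16_t lum_to_jpeg_lane(int16_t v)
{
    int32_t t = (int32_t) v * 19077 - 39057361;
    return saturate_s16(t >> 14);
}

/* Clamp-the-input variant. The limit 30189 is an assumption (recalled from
 * the C code, not shown in this thread). */
static int16_t lum_to_jpeg_clamped(int16_t v)
{
    int32_t c = v < 30189 ? v : 30189;
    return (int16_t) ((c * 19077 - 39057361) >> 14);
}

/* Count non-negative inputs for which the two variants disagree. */
static int count_lum_to_jpeg_mismatches(void)
{
    int mismatches = 0;
    for (int32_t v = 0; v <= INT16_MAX; v++)
        if (lum_to_jpeg_lane((int16_t) v) != lum_to_jpeg_clamped((int16_t) v))
            mismatches++;
    return mismatches;
}

static void saturation_selfcheck(void)
{
    /* Saturating the output and clamping the input agree on [0, 32767]. */
    assert(count_lum_to_jpeg_mismatches() == 0);
    assert(lum_to_jpeg_lane(INT16_MAX) == INT16_MAX);
}
```

Because the conversion is monotonic, every input at or above the clamp limit produces a pre-pack value of at least 32767, so saturating afterwards gives the same result as clamping beforehand. The comparison deliberately leaves out negative inputs.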
* Re: [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 {lum, chr}ConvertRange
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 " Ramiro Polla
@ 2024-06-07 17:38   ` Ramiro Polla
  0 siblings, 0 replies; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 17:38 UTC
  To: ffmpeg-devel

[-- Attachment #1: Type: text/plain, Size: 10195 bytes --]

On Fri, Jun 7, 2024 at 4:05 PM Ramiro Polla <ramiro.polla@gmail.com> wrote:
> [...]

Attached version is a little bit different, moving the consts out of the
macro (so they can be reused by avx2) and processing twice the amount of
data per loop.
[-- Attachment #2: 0001-swscale-x86-add-sse4-lum-chr-ConvertRange.patch --] [-- Type: text/x-patch, Size: 10363 bytes --] From b8f72b1c4c8393becea9962378af6d7dffabbce2 Mon Sep 17 00:00:00 2001 From: Ramiro Polla <ramiro.polla@gmail.com> Date: Thu, 6 Jun 2024 18:33:34 +0200 Subject: [PATCH] swscale/x86: add sse4 {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 19.9 chrRangeFromJpeg_8_sse4: 16.2 chrRangeFromJpeg_24_c: 60.7 chrRangeFromJpeg_24_sse4: 28.9 chrRangeFromJpeg_128_c: 325.7 chrRangeFromJpeg_128_sse4: 160.2 chrRangeFromJpeg_144_c: 364.2 chrRangeFromJpeg_144_sse4: 194.9 chrRangeFromJpeg_256_c: 630.7 chrRangeFromJpeg_256_sse4: 337.4 chrRangeFromJpeg_512_c: 1240.4 chrRangeFromJpeg_512_sse4: 668.4 chrRangeToJpeg_8_c: 37.7 chrRangeToJpeg_8_sse4: 19.7 chrRangeToJpeg_24_c: 114.7 chrRangeToJpeg_24_sse4: 30.2 chrRangeToJpeg_128_c: 636.4 chrRangeToJpeg_128_sse4: 161.7 chrRangeToJpeg_144_c: 715.7 chrRangeToJpeg_144_sse4: 272.9 chrRangeToJpeg_256_c: 1256.7 chrRangeToJpeg_256_sse4: 341.9 chrRangeToJpeg_512_c: 2498.7 chrRangeToJpeg_512_sse4: 668.4 lumRangeFromJpeg_8_c: 11.7 lumRangeFromJpeg_8_sse4: 12.4 lumRangeFromJpeg_24_c: 36.9 lumRangeFromJpeg_24_sse4: 17.7 lumRangeFromJpeg_128_c: 228.4 lumRangeFromJpeg_128_sse4: 85.2 lumRangeFromJpeg_144_c: 272.9 lumRangeFromJpeg_144_sse4: 96.9 lumRangeFromJpeg_256_c: 463.4 lumRangeFromJpeg_256_sse4: 183.9 lumRangeFromJpeg_512_c: 879.9 lumRangeFromJpeg_512_sse4: 355.9 lumRangeToJpeg_8_c: 17.7 lumRangeToJpeg_8_sse4: 15.4 lumRangeToJpeg_24_c: 56.2 lumRangeToJpeg_24_sse4: 18.4 lumRangeToJpeg_128_c: 331.4 lumRangeToJpeg_128_sse4: 84.4 lumRangeToJpeg_144_c: 375.2 lumRangeToJpeg_144_sse4: 96.9 lumRangeToJpeg_256_c: 649.7 lumRangeToJpeg_256_sse4: 184.4 lumRangeToJpeg_512_c: 1281.9 lumRangeToJpeg_512_sse4: 355.9 --- libswscale/swscale_internal.h | 1 + libswscale/utils.c | 2 + libswscale/x86/Makefile | 1 + libswscale/x86/range_convert.asm | 130 +++++++++++++++++++++++++++++++ libswscale/x86/swscale.c | 36 +++++++++ 5 files changed, 170 
insertions(+) create mode 100644 libswscale/x86/range_convert.asm diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index d4b0c3cee2..92f6105443 100644 --- a/libswscale/swscale_internal.h +++ b/libswscale/swscale_internal.h @@ -698,6 +698,7 @@ void ff_updateMMXDitherTables(SwsContext *c, int dstY); av_cold void ff_sws_init_range_convert(SwsContext *c); av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); +av_cold void ff_sws_init_range_convert_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_x86(SwsContext *c); SwsFunc ff_yuv2rgb_init_ppc(SwsContext *c); diff --git a/libswscale/utils.c b/libswscale/utils.c index 476a24fea5..8dfa57b5ff 100644 --- a/libswscale/utils.c +++ b/libswscale/utils.c @@ -1082,6 +1082,8 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4], ff_sws_init_range_convert(c); #if ARCH_LOONGARCH64 ff_sws_init_range_convert_loongarch(c); +#elif ARCH_X86 + ff_sws_init_range_convert_x86(c); #endif } diff --git a/libswscale/x86/Makefile b/libswscale/x86/Makefile index 68391494be..f00154941d 100644 --- a/libswscale/x86/Makefile +++ b/libswscale/x86/Makefile @@ -12,6 +12,7 @@ X86ASM-OBJS += x86/input.o \ x86/output.o \ x86/scale.o \ x86/scale_avx2.o \ + x86/range_convert.o \ x86/rgb_2_rgb.o \ x86/yuv_2_rgb.o \ x86/yuv2yuvX.o \ diff --git a/libswscale/x86/range_convert.asm b/libswscale/x86/range_convert.asm new file mode 100644 index 0000000000..13983a386b --- /dev/null +++ b/libswscale/x86/range_convert.asm @@ -0,0 +1,130 @@ +;****************************************************************************** +;* Copyright (c) 2024 Ramiro Polla +;* +;* This file is part of FFmpeg. +;* +;* FFmpeg is free software; you can redistribute it and/or +;* modify it under the terms of the GNU Lesser General Public +;* License as published by the Free Software Foundation; either +;* version 2.1 of the License, or (at your option) any later version. 
+;* +;* FFmpeg is distributed in the hope that it will be useful, +;* but WITHOUT ANY WARRANTY; without even the implied warranty of +;* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;* Lesser General Public License for more details. +;* +;* You should have received a copy of the GNU Lesser General Public +;* License along with FFmpeg; if not, write to the Free Software +;* Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA +;****************************************************************************** + +%include "libavutil/x86/x86util.asm" + +SECTION_RODATA + +chr_to_mult: times 4 dd 4663 +chr_to_offset: times 4 dd -9289992 +%define chr_to_shift 12 + +chr_from_mult: times 4 dd 1799 +chr_from_offset: times 4 dd 4081085 +%define chr_from_shift 11 + +lum_to_mult: times 4 dd 19077 +lum_to_offset: times 4 dd -39057361 +%define lum_to_shift 14 + +lum_from_mult: times 4 dd 14071 +lum_from_offset: times 4 dd 33561947 +%define lum_from_shift 14 + +SECTION .text + +; NOTE: there is no need to clamp the input when converting to jpeg range +; (like we do in the C code) because packssdw will saturate the output. 
+ +;----------------------------------------------------------------------------- +; lumConvertRange +; +; void ff_lumRangeToJpeg_<opt>(int16_t *dst, int width); +; void ff_lumRangeFromJpeg_<opt>(int16_t *dst, int width); +; +;----------------------------------------------------------------------------- + +%macro LUMCONVERTRANGE 4 +cglobal %1, 2, 3, 3, dst, width, x + movsxdifnidn widthq, widthd + xor xq, xq + mova m4, [%2] + mova m5, [%3] +.loop: + pmovsxwd m0, [dstq+xq*2] + pmovsxwd m1, [dstq+xq*2+mmsize/2] + pmulld m0, m4 + pmulld m1, m4 + paddd m0, m5 + paddd m1, m5 + psrad m0, %4 + psrad m1, %4 + packssdw m0, m0 + packssdw m1, m1 + movq [dstq+xq*2], m0 + movq [dstq+xq*2+mmsize/2], m1 + add xq, mmsize / 2 + cmp xd, widthd + jl .loop + RET +%endmacro + +;----------------------------------------------------------------------------- +; chrConvertRange +; +; void ff_chrRangeToJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width); +; void ff_chrRangeFromJpeg_<opt>(int16_t *dstU, int16_t *dstV, int width); +; +;----------------------------------------------------------------------------- + +%macro CHRCONVERTRANGE 4 +cglobal %1, 3, 4, 4, dstU, dstV, width, x + movsxdifnidn widthq, widthd + xor xq, xq + mova m4, [%2] + mova m5, [%3] +.loop: + pmovsxwd m0, [dstUq+xq*2] + pmovsxwd m1, [dstUq+xq*2+mmsize/2] + pmovsxwd m2, [dstVq+xq*2] + pmovsxwd m3, [dstVq+xq*2+mmsize/2] + pmulld m0, m4 + pmulld m1, m4 + pmulld m2, m4 + pmulld m3, m4 + paddd m0, m5 + paddd m1, m5 + paddd m2, m5 + paddd m3, m5 + psrad m0, %4 + psrad m1, %4 + psrad m2, %4 + psrad m3, %4 + packssdw m0, m0 + packssdw m1, m1 + packssdw m2, m2 + packssdw m3, m3 + movq [dstUq+xq*2], m0 + movq [dstUq+xq*2+mmsize/2], m1 + movq [dstVq+xq*2], m2 + movq [dstVq+xq*2+mmsize/2], m3 + add xq, mmsize / 2 + cmp xd, widthd + jl .loop + RET +%endmacro + +%if ARCH_X86_64 +INIT_XMM sse4 +LUMCONVERTRANGE lumRangeToJpeg, lum_to_mult, lum_to_offset, lum_to_shift +CHRCONVERTRANGE chrRangeToJpeg, chr_to_mult, chr_to_offset, 
chr_to_shift +LUMCONVERTRANGE lumRangeFromJpeg, lum_from_mult, lum_from_offset, lum_from_shift +CHRCONVERTRANGE chrRangeFromJpeg, chr_from_mult, chr_from_offset, chr_from_shift +%endif diff --git a/libswscale/x86/swscale.c b/libswscale/x86/swscale.c index fff8bb4396..c5ddfb5605 100644 --- a/libswscale/x86/swscale.c +++ b/libswscale/x86/swscale.c @@ -447,6 +447,38 @@ INPUT_PLANAR_RGB_UV_ALL_DECL(avx2); INPUT_PLANAR_RGB_A_ALL_DECL(avx2); #endif +#if ARCH_X86_64 +#define RANGE_CONVERT_FUNCS(opt) do { \ + if (c->dstBpc <= 14) { \ + if (c->srcRange) { \ + c->lumConvertRange = ff_lumRangeFromJpeg_ ##opt; \ + c->chrConvertRange = ff_chrRangeFromJpeg_ ##opt; \ + } else { \ + c->lumConvertRange = ff_lumRangeToJpeg_ ##opt; \ + c->chrConvertRange = ff_chrRangeToJpeg_ ##opt; \ + } \ + } \ +} while (0) + +#define RANGE_CONVERT_FUNCS_DECL(opt) \ +void ff_lumRangeFromJpeg_ ##opt(int16_t *dst, int width); \ +void ff_chrRangeFromJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width); \ +void ff_lumRangeToJpeg_ ##opt(int16_t *dst, int width); \ +void ff_chrRangeToJpeg_ ##opt(int16_t *dstU, int16_t *dstV, int width); \ + +RANGE_CONVERT_FUNCS_DECL(sse4); + +av_cold void ff_sws_init_range_convert_x86(SwsContext *c) +{ + if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) { + int cpu_flags = av_get_cpu_flags(); + if (EXTERNAL_SSE4(cpu_flags)) { + RANGE_CONVERT_FUNCS(sse4); + } + } +} +#endif + av_cold void ff_sws_init_swscale_x86(SwsContext *c) { int cpu_flags = av_get_cpu_flags(); @@ -805,4 +837,8 @@ switch(c->dstBpc){ \ } #endif + +#if ARCH_X86_64 + ff_sws_init_range_convert_x86(c); +#endif } -- 2.30.2 [-- Attachment #3: Type: text/plain, Size: 251 bytes --] _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
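Editorial sketch of the NOTE in the x86 patch above ("there is no need to clamp the input when converting to jpeg range ... because packssdw will saturate the output"). This is a scalar C model of a single lane, not code from the patch. The function names are hypothetical, the 30189 clamp is the max constant that the NEON patch in this series passes for lumRangeToJpeg, and the input is assumed to be non-negative. The model also assumes >> on a negative int32_t is an arithmetic shift, as psrad is.

```c
#include <stdint.h>

/* Saturate a 32-bit value to the int16_t range, as packssdw does per lane. */
static int16_t saturate_i16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* lumRangeToJpeg with no input clamp: multiply, add offset, shift,
 * then rely on output saturation (the approach of the SSE4 loop). */
static int16_t lum_to_jpeg_saturating(int16_t v)
{
    /* Assumption: >> on a negative value is an arithmetic shift. */
    int32_t t = ((int32_t)v * 19077 - 39057361) >> 14;
    return saturate_i16(t);
}

/* Same conversion with the input clamped to 30189 first, so the
 * result never exceeds INT16_MAX and no saturation is needed. */
static int16_t lum_to_jpeg_clamped(int16_t v)
{
    int32_t c = v < 30189 ? v : 30189;
    return (int16_t)((c * 19077 - 39057361) >> 14);
}
```

For inputs in 0..32767 the two variants should agree, because 30189 is the largest input whose result still fits in int16_t. The chroma constants (4663, -9289992, shift 12, clamp 30775) can be checked the same way.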
* [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange 2024-06-07 14:05 [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Ramiro Polla 2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 2/4] checkasm: add tests for {lum, chr}ConvertRange Ramiro Polla 2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 3/4] swscale/x86: add sse4 " Ramiro Polla @ 2024-06-07 14:05 ` Ramiro Polla 2024-06-10 11:56 ` Martin Storsjö 2024-06-07 18:45 ` [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Andreas Rheinhardt 3 siblings, 1 reply; 11+ messages in thread From: Ramiro Polla @ 2024-06-07 14:05 UTC (permalink / raw) To: ffmpeg-devel chrRangeFromJpeg_8_c: 28.5 chrRangeFromJpeg_8_neon: 21.2 chrRangeFromJpeg_24_c: 81.2 chrRangeFromJpeg_24_neon: 34.7 chrRangeFromJpeg_128_c: 425.2 chrRangeFromJpeg_128_neon: 162.0 chrRangeFromJpeg_144_c: 480.2 chrRangeFromJpeg_144_neon: 180.2 chrRangeFromJpeg_256_c: 838.2 chrRangeFromJpeg_256_neon: 318.0 chrRangeFromJpeg_512_c: 1698.2 chrRangeFromJpeg_512_neon: 630.0 chrRangeToJpeg_8_c: 56.0 chrRangeToJpeg_8_neon: 23.5 chrRangeToJpeg_24_c: 147.7 chrRangeToJpeg_24_neon: 38.2 chrRangeToJpeg_128_c: 760.2 chrRangeToJpeg_128_neon: 182.5 chrRangeToJpeg_144_c: 857.7 chrRangeToJpeg_144_neon: 204.5 chrRangeToJpeg_256_c: 1504.2 chrRangeToJpeg_256_neon: 358.5 chrRangeToJpeg_512_c: 3025.7 chrRangeToJpeg_512_neon: 710.5 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 18.2 lumRangeFromJpeg_24_c: 64.0 lumRangeFromJpeg_24_neon: 22.2 lumRangeFromJpeg_128_c: 289.2 lumRangeFromJpeg_128_neon: 79.2 lumRangeFromJpeg_144_c: 334.7 lumRangeFromJpeg_144_neon: 87.7 lumRangeFromJpeg_256_c: 579.5 lumRangeFromJpeg_256_neon: 152.0 lumRangeFromJpeg_512_c: 1208.0 lumRangeFromJpeg_512_neon: 299.0 lumRangeToJpeg_8_c: 30.0 lumRangeToJpeg_8_neon: 19.0 lumRangeToJpeg_24_c: 82.2 lumRangeToJpeg_24_neon: 24.0 lumRangeToJpeg_128_c: 440.7 lumRangeToJpeg_128_neon: 90.5 lumRangeToJpeg_144_c: 502.0 lumRangeToJpeg_144_neon: 
102.2 lumRangeToJpeg_256_c: 893.7 lumRangeToJpeg_256_neon: 178.0 lumRangeToJpeg_512_c: 1793.7 lumRangeToJpeg_512_neon: 355.0 --- libswscale/aarch64/Makefile | 1 + libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++ libswscale/aarch64/swscale.c | 21 +++++ libswscale/swscale_internal.h | 1 + libswscale/utils.c | 4 +- 5 files changed, 129 insertions(+), 1 deletion(-) create mode 100644 libswscale/aarch64/range_convert_neon.S diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile index da1d909561..6923827f82 100644 --- a/libswscale/aarch64/Makefile +++ b/libswscale/aarch64/Makefile @@ -4,5 +4,6 @@ OBJS += aarch64/rgb2rgb.o \ NEON-OBJS += aarch64/hscale.o \ aarch64/output.o \ + aarch64/range_convert_neon.o \ aarch64/rgb2rgb_neon.o \ aarch64/yuv2rgb_neon.o \ diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S new file mode 100644 index 0000000000..5e104971f0 --- /dev/null +++ b/libswscale/aarch64/range_convert_neon.S @@ -0,0 +1,103 @@ +/* + * Copyright (c) 2024 Ramiro Polla + * + * This file is part of FFmpeg. + * + * FFmpeg is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public + * License as published by the Free Software Foundation; either + * version 2.1 of the License, or (at your option) any later version. + * + * FFmpeg is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. 
+ * + * You should have received a copy of the GNU Lesser General Public + * License along with FFmpeg; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + */ + +#include "libavutil/aarch64/asm.S" + +.macro lumConvertRange name max mult offset shift +const offset_\name, align=4 + .word \offset, \offset, \offset, \offset +endconst +function ff_\name, export=1 +.if \max != 0 + mov w3, #\max + dup v24.8h, w3 +.endif + mov w3, #\mult + dup v25.4s, w3 + movrel x3, offset_\name + ld1 {v26.4s}, [x3] +1: + ld1 {v0.8h}, [x0] +.if \max != 0 + smin v0.8h, v0.8h, v24.8h +.endif + mov v16.16b, v26.16b + mov v18.16b, v26.16b + sxtl v20.4s, v0.4h + sxtl2 v22.4s, v0.8h + mla v16.4s, v20.4s, v25.4s + mla v18.4s, v22.4s, v25.4s + shrn v0.4h, v16.4s, #\shift + shrn2 v0.8h, v18.4s, #\shift + subs w1, w1, #8 + st1 {v0.8h}, [x0], #16 + b.gt 1b + ret +endfunc +.endm + +.macro chrConvertRange name max mult offset shift +const offset_\name, align=4 + .word \offset, \offset, \offset, \offset +endconst +function ff_\name, export=1 +.if \max != 0 + mov w3, #\max + dup v24.8h, w3 +.endif + mov w3, #\mult + dup v25.4s, w3 + movrel x3, offset_\name + ld1 {v26.4s}, [x3] +1: + ld1 {v0.8h}, [x0] + ld1 {v1.8h}, [x1] +.if \max != 0 + smin v0.8h, v0.8h, v24.8h + smin v1.8h, v1.8h, v24.8h +.endif + mov v16.16b, v26.16b + mov v17.16b, v26.16b + mov v18.16b, v26.16b + mov v19.16b, v26.16b + sxtl v20.4s, v0.4h + sxtl v21.4s, v1.4h + sxtl2 v22.4s, v0.8h + sxtl2 v23.4s, v1.8h + mla v16.4s, v20.4s, v25.4s + mla v17.4s, v21.4s, v25.4s + mla v18.4s, v22.4s, v25.4s + mla v19.4s, v23.4s, v25.4s + shrn v0.4h, v16.4s, #\shift + shrn v1.4h, v17.4s, #\shift + shrn2 v0.8h, v18.4s, #\shift + shrn2 v1.8h, v19.4s, #\shift + subs w2, w2, #8 + st1 {v0.8h}, [x0], #16 + st1 {v1.8h}, [x1], #16 + b.gt 1b + ret +endfunc +.endm + +lumConvertRange lumRangeToJpeg_neon, 30189, 19077, -39057361, 14 +chrConvertRange chrRangeToJpeg_neon, 30775, 4663, -9289992, 12 
+lumConvertRange lumRangeFromJpeg_neon, 0, 14071, 33561947, 14 +chrConvertRange chrRangeFromJpeg_neon, 0, 1799, 4081085, 11 diff --git a/libswscale/aarch64/swscale.c b/libswscale/aarch64/swscale.c index bbd9719a44..7344f75b2e 100644 --- a/libswscale/aarch64/swscale.c +++ b/libswscale/aarch64/swscale.c @@ -201,6 +201,26 @@ void ff_yuv2plane1_8_neon( default: break; \ } +void ff_lumRangeFromJpeg_neon(int16_t *dst, int width); +void ff_chrRangeFromJpeg_neon(int16_t *dstU, int16_t *dstV, int width); +void ff_lumRangeToJpeg_neon(int16_t *dst, int width); +void ff_chrRangeToJpeg_neon(int16_t *dstU, int16_t *dstV, int width); + +av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c) +{ + if (c->srcRange != c->dstRange && !isAnyRGB(c->dstFormat)) { + if (c->dstBpc <= 14) { + if (c->srcRange) { + c->lumConvertRange = ff_lumRangeFromJpeg_neon; + c->chrConvertRange = ff_chrRangeFromJpeg_neon; + } else { + c->lumConvertRange = ff_lumRangeToJpeg_neon; + c->chrConvertRange = ff_chrRangeToJpeg_neon; + } + } + } +} + av_cold void ff_sws_init_swscale_aarch64(SwsContext *c) { int cpu_flags = av_get_cpu_flags(); @@ -212,5 +232,6 @@ av_cold void ff_sws_init_swscale_aarch64(SwsContext *c) if (c->dstBpc == 8) { c->yuv2planeX = ff_yuv2planeX_8_neon; } + ff_sws_init_range_convert_aarch64(c); } } diff --git a/libswscale/swscale_internal.h b/libswscale/swscale_internal.h index 92f6105443..1059f8a6de 100644 --- a/libswscale/swscale_internal.h +++ b/libswscale/swscale_internal.h @@ -697,6 +697,7 @@ void ff_yuv2rgb_init_tables_ppc(SwsContext *c, const int inv_table[4], void ff_updateMMXDitherTables(SwsContext *c, int dstY); av_cold void ff_sws_init_range_convert(SwsContext *c); +av_cold void ff_sws_init_range_convert_aarch64(SwsContext *c); av_cold void ff_sws_init_range_convert_loongarch(SwsContext *c); av_cold void ff_sws_init_range_convert_x86(SwsContext *c); diff --git a/libswscale/utils.c b/libswscale/utils.c index 8dfa57b5ff..12dba712c1 100644 --- a/libswscale/utils.c +++ 
b/libswscale/utils.c @@ -1080,7 +1080,9 @@ int sws_setColorspaceDetails(struct SwsContext *c, const int inv_table[4], if (need_reinit) { ff_sws_init_range_convert(c); -#if ARCH_LOONGARCH64 +#if ARCH_AARCH64 + ff_sws_init_range_convert_aarch64(c); +#elif ARCH_LOONGARCH64 ff_sws_init_range_convert_loongarch(c); #elif ARCH_X86 ff_sws_init_range_convert_x86(c); -- 2.30.2 _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe". ^ permalink raw reply [flat|nested] 11+ messages in thread
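Editorial sketch of one lane of the NEON loop above. It is a scalar C model only, with hypothetical function names. It illustrates why the macro takes a max argument for the ToJpeg variants: shrn narrows by keeping the low 16 bits after the shift and does not saturate, so the input is clamped with smin first. A max of 0 means "no clamp", mirroring the .if \max != 0 guard.

```c
#include <stdint.h>

/* Model of shrn: shift right, then keep the low 16 bits, reinterpreted
 * as a signed 16-bit value. No saturation takes place. */
static int16_t shift_right_narrow(int32_t acc, int shift)
{
    uint32_t bits = ((uint32_t)acc >> shift) & 0xFFFFu;
    int32_t v = (int32_t)bits;
    return (int16_t)(v >= 0x8000 ? v - 0x10000 : v);
}

/* One lane of the loop: optional smin clamp, mla onto the offset, shrn. */
static int16_t range_convert_lane(int16_t v, int max, int32_t mult,
                                  int32_t offset, int shift)
{
    int32_t x = v;
    if (max != 0 && x > max)            /* smin v0.8h, v0.8h, v24.8h */
        x = max;
    int32_t acc = offset + x * mult;    /* mla  v16.4s, v20.4s, v25.4s */
    return shift_right_narrow(acc, shift);  /* shrn v0.4h, v16.4s, #shift */
}
```

With the constants from the macro invocations, range_convert_lane(32767, 30189, 19077, -39057361, 14) gives 32767, while the same call with max set to 0 wraps around to a negative value.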
* Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange 2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon " Ramiro Polla @ 2024-06-10 11:56 ` Martin Storsjö 2024-06-11 12:33 ` Ramiro Polla 0 siblings, 1 reply; 11+ messages in thread From: Martin Storsjö @ 2024-06-10 11:56 UTC (permalink / raw) To: FFmpeg development discussions and patches On Fri, 7 Jun 2024, Ramiro Polla wrote: > chrRangeFromJpeg_8_c: 28.5 > chrRangeFromJpeg_8_neon: 21.2 > chrRangeFromJpeg_24_c: 81.2 > chrRangeFromJpeg_24_neon: 34.7 > chrRangeFromJpeg_128_c: 425.2 > chrRangeFromJpeg_128_neon: 162.0 > chrRangeFromJpeg_144_c: 480.2 > chrRangeFromJpeg_144_neon: 180.2 > chrRangeFromJpeg_256_c: 838.2 > chrRangeFromJpeg_256_neon: 318.0 > chrRangeFromJpeg_512_c: 1698.2 > chrRangeFromJpeg_512_neon: 630.0 > chrRangeToJpeg_8_c: 56.0 > chrRangeToJpeg_8_neon: 23.5 > chrRangeToJpeg_24_c: 147.7 > chrRangeToJpeg_24_neon: 38.2 > chrRangeToJpeg_128_c: 760.2 > chrRangeToJpeg_128_neon: 182.5 > chrRangeToJpeg_144_c: 857.7 > chrRangeToJpeg_144_neon: 204.5 > chrRangeToJpeg_256_c: 1504.2 > chrRangeToJpeg_256_neon: 358.5 > chrRangeToJpeg_512_c: 3025.7 > chrRangeToJpeg_512_neon: 710.5 > lumRangeFromJpeg_8_c: 24.0 > lumRangeFromJpeg_8_neon: 18.2 > lumRangeFromJpeg_24_c: 64.0 > lumRangeFromJpeg_24_neon: 22.2 > lumRangeFromJpeg_128_c: 289.2 > lumRangeFromJpeg_128_neon: 79.2 > lumRangeFromJpeg_144_c: 334.7 > lumRangeFromJpeg_144_neon: 87.7 > lumRangeFromJpeg_256_c: 579.5 > lumRangeFromJpeg_256_neon: 152.0 > lumRangeFromJpeg_512_c: 1208.0 > lumRangeFromJpeg_512_neon: 299.0 > lumRangeToJpeg_8_c: 30.0 > lumRangeToJpeg_8_neon: 19.0 > lumRangeToJpeg_24_c: 82.2 > lumRangeToJpeg_24_neon: 24.0 > lumRangeToJpeg_128_c: 440.7 > lumRangeToJpeg_128_neon: 90.5 > lumRangeToJpeg_144_c: 502.0 > lumRangeToJpeg_144_neon: 102.2 > lumRangeToJpeg_256_c: 893.7 > lumRangeToJpeg_256_neon: 178.0 > lumRangeToJpeg_512_c: 1793.7 > lumRangeToJpeg_512_neon: 355.0 > --- > libswscale/aarch64/Makefile | 1 + > 
libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++ > libswscale/aarch64/swscale.c | 21 +++++ > libswscale/swscale_internal.h | 1 + > libswscale/utils.c | 4 +- > 5 files changed, 129 insertions(+), 1 deletion(-) > create mode 100644 libswscale/aarch64/range_convert_neon.S > > diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile > index da1d909561..6923827f82 100644 > --- a/libswscale/aarch64/Makefile > +++ b/libswscale/aarch64/Makefile > @@ -4,5 +4,6 @@ OBJS += aarch64/rgb2rgb.o \ > > NEON-OBJS += aarch64/hscale.o \ > aarch64/output.o \ > + aarch64/range_convert_neon.o \ > aarch64/rgb2rgb_neon.o \ > aarch64/yuv2rgb_neon.o \ > diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S > new file mode 100644 > index 0000000000..5e104971f0 > --- /dev/null > +++ b/libswscale/aarch64/range_convert_neon.S > @@ -0,0 +1,103 @@ > +/* > + * Copyright (c) 2024 Ramiro Polla > + * > + * This file is part of FFmpeg. > + * > + * FFmpeg is free software; you can redistribute it and/or > + * modify it under the terms of the GNU Lesser General Public > + * License as published by the Free Software Foundation; either > + * version 2.1 of the License, or (at your option) any later version. > + * > + * FFmpeg is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + * Lesser General Public License for more details. > + * > + * You should have received a copy of the GNU Lesser General Public > + * License along with FFmpeg; if not, write to the Free Software > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + */ > + > +#include "libavutil/aarch64/asm.S" > + > +.macro lumConvertRange name max mult offset shift We usually use commas between the macro arguments here. 
Apparently it doesn't make any difference for any of the tools we
support, but it would be nice for consistency. (When invoking macros,
commas between arguments are optional for most platforms, but not when
targeting Apple platforms, so being strict with consistent use of
commas is generally good.)

> +const offset_\name, align=4
> +        .word \offset, \offset, \offset, \offset
> +endconst
> +function ff_\name, export=1
> +.if \max != 0
> +        mov w3, #\max
> +        dup v24.8h, w3
> +.endif
> +        mov w3, #\mult
> +        dup v25.4s, w3
> +        movrel x3, offset_\name
> +        ld1 {v26.4s}, [x3]

FWIW, I did see that you were recommended this form, over ld1r, based on
some microarchitectural performance numbers. However in our preexisting
assembly, manually pre-splatting vectors like this is unusual I would say.
I don't have a strong opinion on the matter though.

Anyway, the assembly looks reasonable to me.

// Martin
* Re: [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon {lum, chr}ConvertRange 2024-06-10 11:56 ` Martin Storsjö @ 2024-06-11 12:33 ` Ramiro Polla 0 siblings, 0 replies; 11+ messages in thread From: Ramiro Polla @ 2024-06-11 12:33 UTC (permalink / raw) To: FFmpeg development discussions and patches On Mon, Jun 10, 2024 at 1:56 PM Martin Storsjö <martin@martin.st> wrote: > On Fri, 7 Jun 2024, Ramiro Polla wrote: > > > chrRangeFromJpeg_8_c: 28.5 > > chrRangeFromJpeg_8_neon: 21.2 > > chrRangeFromJpeg_24_c: 81.2 > > chrRangeFromJpeg_24_neon: 34.7 > > chrRangeFromJpeg_128_c: 425.2 > > chrRangeFromJpeg_128_neon: 162.0 > > chrRangeFromJpeg_144_c: 480.2 > > chrRangeFromJpeg_144_neon: 180.2 > > chrRangeFromJpeg_256_c: 838.2 > > chrRangeFromJpeg_256_neon: 318.0 > > chrRangeFromJpeg_512_c: 1698.2 > > chrRangeFromJpeg_512_neon: 630.0 > > chrRangeToJpeg_8_c: 56.0 > > chrRangeToJpeg_8_neon: 23.5 > > chrRangeToJpeg_24_c: 147.7 > > chrRangeToJpeg_24_neon: 38.2 > > chrRangeToJpeg_128_c: 760.2 > > chrRangeToJpeg_128_neon: 182.5 > > chrRangeToJpeg_144_c: 857.7 > > chrRangeToJpeg_144_neon: 204.5 > > chrRangeToJpeg_256_c: 1504.2 > > chrRangeToJpeg_256_neon: 358.5 > > chrRangeToJpeg_512_c: 3025.7 > > chrRangeToJpeg_512_neon: 710.5 > > lumRangeFromJpeg_8_c: 24.0 > > lumRangeFromJpeg_8_neon: 18.2 > > lumRangeFromJpeg_24_c: 64.0 > > lumRangeFromJpeg_24_neon: 22.2 > > lumRangeFromJpeg_128_c: 289.2 > > lumRangeFromJpeg_128_neon: 79.2 > > lumRangeFromJpeg_144_c: 334.7 > > lumRangeFromJpeg_144_neon: 87.7 > > lumRangeFromJpeg_256_c: 579.5 > > lumRangeFromJpeg_256_neon: 152.0 > > lumRangeFromJpeg_512_c: 1208.0 > > lumRangeFromJpeg_512_neon: 299.0 > > lumRangeToJpeg_8_c: 30.0 > > lumRangeToJpeg_8_neon: 19.0 > > lumRangeToJpeg_24_c: 82.2 > > lumRangeToJpeg_24_neon: 24.0 > > lumRangeToJpeg_128_c: 440.7 > > lumRangeToJpeg_128_neon: 90.5 > > lumRangeToJpeg_144_c: 502.0 > > lumRangeToJpeg_144_neon: 102.2 > > lumRangeToJpeg_256_c: 893.7 > > lumRangeToJpeg_256_neon: 178.0 > > lumRangeToJpeg_512_c: 
1793.7 > > lumRangeToJpeg_512_neon: 355.0 > > --- > > libswscale/aarch64/Makefile | 1 + > > libswscale/aarch64/range_convert_neon.S | 103 ++++++++++++++++++++++++ > > libswscale/aarch64/swscale.c | 21 +++++ > > libswscale/swscale_internal.h | 1 + > > libswscale/utils.c | 4 +- > > 5 files changed, 129 insertions(+), 1 deletion(-) > > create mode 100644 libswscale/aarch64/range_convert_neon.S > > > > diff --git a/libswscale/aarch64/Makefile b/libswscale/aarch64/Makefile > > index da1d909561..6923827f82 100644 > > --- a/libswscale/aarch64/Makefile > > +++ b/libswscale/aarch64/Makefile > > @@ -4,5 +4,6 @@ OBJS += aarch64/rgb2rgb.o \ > > > > NEON-OBJS += aarch64/hscale.o \ > > aarch64/output.o \ > > + aarch64/range_convert_neon.o \ > > aarch64/rgb2rgb_neon.o \ > > aarch64/yuv2rgb_neon.o \ > > diff --git a/libswscale/aarch64/range_convert_neon.S b/libswscale/aarch64/range_convert_neon.S > > new file mode 100644 > > index 0000000000..5e104971f0 > > --- /dev/null > > +++ b/libswscale/aarch64/range_convert_neon.S > > @@ -0,0 +1,103 @@ > > +/* > > + * Copyright (c) 2024 Ramiro Polla > > + * > > + * This file is part of FFmpeg. > > + * > > + * FFmpeg is free software; you can redistribute it and/or > > + * modify it under the terms of the GNU Lesser General Public > > + * License as published by the Free Software Foundation; either > > + * version 2.1 of the License, or (at your option) any later version. > > + * > > + * FFmpeg is distributed in the hope that it will be useful, > > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > > + * Lesser General Public License for more details. 
> > + *
> > + * You should have received a copy of the GNU Lesser General Public
> > + * License along with FFmpeg; if not, write to the Free Software
> > + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
> > + */
> > +
> > +#include "libavutil/aarch64/asm.S"
> > +
> > +.macro lumConvertRange name max mult offset shift
>
> We usually use commas between the macro arguments here. Apparently it
> doesn't make any difference for any of the tools we support, but it would
> be nice for consistency. (When invoking macros, commas between arguments
> are optional for most platforms, but not when targeting Apple platforms,
> so being strict with consistent use of commas is generally good.)

Fixed in the new patchset.

> > +const offset_\name, align=4
> > +        .word \offset, \offset, \offset, \offset
> > +endconst
> > +function ff_\name, export=1
> > +.if \max != 0
> > +        mov w3, #\max
> > +        dup v24.8h, w3
> > +.endif
> > +        mov w3, #\mult
> > +        dup v25.4s, w3
> > +        movrel x3, offset_\name
> > +        ld1 {v26.4s}, [x3]
>
> FWIW, I did see that you were recommended this form, over ld1r, based on
> some microarchitectural performance numbers. However in our preexisting
> assembly, manually pre-splatting vectors like this is unusual I would say.
> I don't have a strong opinion on the matter though.
>
> Anyway, the assembly looks reasonable to me.

I changed it to movz/movk/dup in the new patchset (tested on rpi5, but
not on macos).

Thanks,
Ramiro
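Editorial sketch related to the movz/movk/dup change mentioned above. The new patchset is not part of this thread, so this is only an illustration of the general idea, not the author's code. On AArch64, a movz/movk pair builds a 32-bit constant from two 16-bit immediates (low half, then high half shifted left by 16). The helper names below are hypothetical, and the example value is the lumRangeToJpeg offset from the patch.

```c
#include <stdint.h>

/* Hypothetical helper type: the two 16-bit immediates of a movz/movk pair. */
typedef struct {
    uint16_t lo;   /* immediate for: movz w3, #lo          */
    uint16_t hi;   /* immediate for: movk w3, #hi, lsl #16 */
} Halves;

/* Split a 32-bit constant into its low and high 16-bit halves. */
static Halves split_imm32(int32_t value)
{
    uint32_t bits = (uint32_t)value;   /* two's-complement bit pattern */
    Halves h = { (uint16_t)(bits & 0xFFFFu), (uint16_t)(bits >> 16) };
    return h;
}

/* Recombine the halves, as the register would hold them after movz + movk. */
static uint32_t join_imm32(Halves h)
{
    return ((uint32_t)h.hi << 16) | h.lo;
}
```

For -39057361 the bit pattern is 0xFDAC082F, so the halves are 0x082F and 0xFDAC. A dup from the general-purpose register would then broadcast the value to all four 32-bit lanes, which removes the need for the offset_\name constant in memory.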
* Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
  2024-06-07 14:05 [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Ramiro Polla
                   ` (2 preceding siblings ...)
  2024-06-07 14:05 ` [FFmpeg-devel] [PATCH 4/4] swscale/aarch64: add neon " Ramiro Polla
@ 2024-06-07 18:45 ` Andreas Rheinhardt
  2024-06-07 19:09   ` Ramiro Polla
  3 siblings, 1 reply; 11+ messages in thread
From: Andreas Rheinhardt @ 2024-06-07 18:45 UTC
To: ffmpeg-devel

Ramiro Polla:
> ---
>  tests/checkasm/Makefile | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> index 6eb94d10d5..3ce152e818 100644
> --- a/tests/checkasm/Makefile
> +++ b/tests/checkasm/Makefile
> @@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
>  CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
>
>  # swscale tests
> -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> +SWSCALEOBJS += sw_gbrp.o
> +SWSCALEOBJS += sw_rgb.o
> +SWSCALEOBJS += sw_scale.o
>
>  CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
>

We use the multiple-objects-per-line style in all Makefiles.

- Andreas
* Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
  2024-06-07 18:45 ` [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile Andreas Rheinhardt
@ 2024-06-07 19:09   ` Ramiro Polla
  2024-06-07 19:12     ` Andreas Rheinhardt
  0 siblings, 1 reply; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 19:09 UTC
To: FFmpeg development discussions and patches

[-- Attachment #1: Type: text/plain, Size: 1146 bytes --]

On Fri, Jun 7, 2024 at 8:46 PM Andreas Rheinhardt
<andreas.rheinhardt@outlook.com> wrote:
>
> Ramiro Polla:
> > ---
> >  tests/checkasm/Makefile | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
> > index 6eb94d10d5..3ce152e818 100644
> > --- a/tests/checkasm/Makefile
> > +++ b/tests/checkasm/Makefile
> > @@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
> >  CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)
> >
> >  # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o
> > +SWSCALEOBJS += sw_rgb.o
> > +SWSCALEOBJS += sw_scale.o
> >
> >  CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
> >
>
> We use the multiple-objects in a line style in all Makefiles.

Then we should change the following:
libswscale/arm/Makefile (NEON_OBJS)
tests/checkasm/Makefile (AVUTILOBJS)
libavfilter/dnn/Makefile (OBJS-$(CONFIG_DNN))

New patch attached.

[-- Attachment #2: 0001-tests-checkasm-cosmetics-one-object-per-line-in-Make.patch --]
[-- Type: text/x-patch, Size: 938 bytes --]

From 4965ece9648be5da6e93b6bfa319b6a5fe92aee6 Mon Sep 17 00:00:00 2001
From: Ramiro Polla <ramiro.polla@gmail.com>
Date: Thu, 6 Jun 2024 15:40:03 +0200
Subject: [PATCH] tests/checkasm: cosmetics, one object per line in Makefile

---
 tests/checkasm/Makefile | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tests/checkasm/Makefile b/tests/checkasm/Makefile
index 6eb94d10d5..c2a41d7f7b 100644
--- a/tests/checkasm/Makefile
+++ b/tests/checkasm/Makefile
@@ -63,7 +63,9 @@ AVFILTEROBJS-$(CONFIG_SOBEL_FILTER) += vf_convolution.o
 CHECKASMOBJS-$(CONFIG_AVFILTER) += $(AVFILTEROBJS-yes)

 # swscale tests
-SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
+SWSCALEOBJS += sw_gbrp.o \
+               sw_rgb.o \
+               sw_scale.o \

 CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
--
2.30.2
* Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
  2024-06-07 19:09   ` Ramiro Polla
@ 2024-06-07 19:12     ` Andreas Rheinhardt
  2024-06-07 19:47       ` Ramiro Polla
  0 siblings, 1 reply; 11+ messages in thread
From: Andreas Rheinhardt @ 2024-06-07 19:12 UTC
To: ffmpeg-devel

Ramiro Polla:
> # swscale tests
> -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> +SWSCALEOBJS += sw_gbrp.o \
> +               sw_rgb.o \
> +               sw_scale.o \
>
> CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)

We typically only use a new line if the old line is full.

- Andreas
* Re: [FFmpeg-devel] [PATCH 1/4] tests/checkasm: cosmetics, one object per line in Makefile
  2024-06-07 19:12     ` Andreas Rheinhardt
@ 2024-06-07 19:47       ` Ramiro Polla
  0 siblings, 0 replies; 11+ messages in thread
From: Ramiro Polla @ 2024-06-07 19:47 UTC
To: FFmpeg development discussions and patches

On Fri, Jun 7, 2024 at 9:27 PM Andreas Rheinhardt
<andreas.rheinhardt@outlook.com> wrote:
>
> Ramiro Polla:
> > # swscale tests
> > -SWSCALEOBJS += sw_gbrp.o sw_rgb.o sw_scale.o
> > +SWSCALEOBJS += sw_gbrp.o \
> > +               sw_rgb.o \
> > +               sw_scale.o \
> >
> > CHECKASMOBJS-$(CONFIG_SWSCALE) += $(SWSCALEOBJS)
>
> We typically only use a new line if the old line is full.

There's currently a mix of everything in the Makefiles. One object per
line, multiple objects per line, mix of one or multiple objects per
line in the same statement, aligned and unaligned += between lines,
aligned and unaligned \ at the end of the lines, some have \ at the
last line, some don't...

I personally prefer += one object per line and no \ at the end of the
line everywhere. It makes the code look consistent and the patches are
cleaner and easier to understand. But I don't maintain this, so I have
no strong opinion in this case.

This patch was meant to simplify the next commit (checkasm: add tests
for {lum,chr}ConvertRange), but I can drop it if you prefer.