From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ffmpeg-devel-bounces@ffmpeg.org>
Received: from ffbox0-bg.ffmpeg.org (ffbox0-bg.ffmpeg.org [79.124.17.100])
	by master.gitmailbox.com (Postfix) with ESMTPS id 6729D4CA89
	for <ffmpegdev@gitmailbox.com>; Tue, 27 May 2025 08:52:03 +0000 (UTC)
Received: from [127.0.1.1] (localhost [127.0.0.1])
	by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id 470D568D546;
	Tue, 27 May 2025 11:52:00 +0300 (EEST)
Received: from haasn.dev (haasn.dev [78.46.187.166])
 by ffbox0-bg.ffmpeg.org (Postfix) with ESMTP id EDF7068D470
 for <ffmpeg-devel@ffmpeg.org>; Tue, 27 May 2025 11:51:53 +0300 (EEST)
Received: from haasn.dev (unknown [10.30.1.1])
 by haasn.dev (Postfix) with UTF8SMTP id B17F34076C;
 Tue, 27 May 2025 10:51:53 +0200 (CEST)
Date: Tue, 27 May 2025 10:51:53 +0200
Message-ID: <20250527105153.GF38697@haasn.xyz>
From: Niklas Haas <ffmpeg@haasn.xyz>
To: Kieran Kunhya via ffmpeg-devel <ffmpeg-devel@ffmpeg.org>,
 FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
In-Reply-To: <CABGuwEmPupY2QBxX+qw4sW-S+efMR8OuLLMRFEbWczJQzZANZg@mail.gmail.com>
References: <20250527081242.22892-1-ffmpeg@haasn.xyz>
 <CABGuwEmPupY2QBxX+qw4sW-S+efMR8OuLLMRFEbWczJQzZANZg@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
Subject: Re: [FFmpeg-devel] (no subject)
X-BeenThere: ffmpeg-devel@ffmpeg.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: FFmpeg development discussions and patches <ffmpeg-devel.ffmpeg.org>
List-Unsubscribe: <https://ffmpeg.org/mailman/options/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=unsubscribe>
List-Archive: <https://ffmpeg.org/pipermail/ffmpeg-devel>
List-Post: <mailto:ffmpeg-devel@ffmpeg.org>
List-Help: <mailto:ffmpeg-devel-request@ffmpeg.org?subject=help>
List-Subscribe: <https://ffmpeg.org/mailman/listinfo/ffmpeg-devel>,
 <mailto:ffmpeg-devel-request@ffmpeg.org?subject=subscribe>
Reply-To: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
Cc: Kieran Kunhya <kieran618@googlemail.com>
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Errors-To: ffmpeg-devel-bounces@ffmpeg.org
Sender: "ffmpeg-devel" <ffmpeg-devel-bounces@ffmpeg.org>
Archived-At: <https://master.gitmailbox.com/ffmpegdev/20250527105153.GF38697@haasn.xyz/>
List-Archive: <https://master.gitmailbox.com/ffmpegdev/>
List-Post: <mailto:ffmpegdev@gitmailbox.com>

On Tue, 27 May 2025 16:29:20 +0800 Kieran Kunhya via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
> >
> > - adding vzeroupper: ~12%
> >
>
> This seems quite suspicious.
> Can you explain what you are doing here?

I added a vzeroupper call whenever the code transitions from AVX to SSE. For
example:

Conversion pass for yuv444p -> rgba:
  [ u8 XXXX -> +++X] SWS_OP_READ         : 3 elem(s) planar >> 0
  [ u8 ...X -> +++X] SWS_OP_CONVERT      : u8 -> f32
  [f32 ...X -> ...X] SWS_OP_LINEAR       : matrix3+off3 [[85/73 0 1.596027 0 -222.921566] [85/73 -0.391762 -0.812968 0 135.575295] [85/73 2.017232 0 0 -276.835851] [0 0 0 1 0]]
  [f32 ...X -> ...X] SWS_OP_DITHER       : 16x16 matrix
  [f32 ...X -> ...X] SWS_OP_MAX          : {0 0 0 0} <= x
  [f32 ...X -> ...X] SWS_OP_MIN          : x <= {255 255 255 255}
  [f32 ...X -> +++X] SWS_OP_CONVERT      : f32 -> u8
                     ^-------- vzeroupper call added here
  [ u8 ...X -> ++++] SWS_OP_CLEAR        : {_ _ _ 255}
  [ u8 .... -> ++++] SWS_OP_WRITE        : 4 elem(s) packed >> 0
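
The placement rule ("whenever the code transitions from AVX to SSE") can be modelled in a few lines. This is a hypothetical sketch, not the actual swscale code. It assumes that an op runs on 256-bit AVX registers whenever it reads or writes f32 data, and that pure u8 ops run on 128-bit SSE registers. It places a vzeroupper after any AVX op that is followed by an SSE-only op:

```python
from typing import List, NamedTuple


class Op(NamedTuple):
    """One step of a conversion pass (hypothetical model, not swscale's real type)."""
    name: str
    in_type: str   # element type consumed, e.g. "u8" or "f32"
    out_type: str  # element type produced


def uses_avx(op: Op) -> bool:
    # Assumption: any op touching f32 data works on 256-bit ymm registers,
    # while pure u8 ops fit in 128-bit xmm registers (SSE).
    return "f32" in (op.in_type, op.out_type)


def insert_vzeroupper(ops: List[Op]) -> List[str]:
    """Return op names with 'vzeroupper' wherever an AVX op is followed by an SSE op."""
    out = []
    for i, op in enumerate(ops):
        out.append(op.name)
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if uses_avx(op) and nxt is not None and not uses_avx(nxt):
            out.append("vzeroupper")
    return out


# The yuv444p -> rgba pass from the listing.
PASS = [
    Op("SWS_OP_READ", "u8", "u8"),
    Op("SWS_OP_CONVERT", "u8", "f32"),
    Op("SWS_OP_LINEAR", "f32", "f32"),
    Op("SWS_OP_DITHER", "f32", "f32"),
    Op("SWS_OP_MAX", "f32", "f32"),
    Op("SWS_OP_MIN", "f32", "f32"),
    Op("SWS_OP_CONVERT", "f32", "u8"),
    Op("SWS_OP_CLEAR", "u8", "u8"),
    Op("SWS_OP_WRITE", "u8", "u8"),
]

print(insert_vzeroupper(PASS))
```

Running it yields a single vzeroupper between the f32 -> u8 SWS_OP_CONVERT and SWS_OP_CLEAR, matching the annotated position. The model only looks at transitions inside one pass and does not account for returning to the caller.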

yuv444p 1920x1080 -> rgba 1920x1080, flags=0x100000 dither=1, SSIM {Y=1.000000 U=0.999999 V=0.999997 A=1.000000}
  time=911 us, ref=4257 us, speedup=4.669x faster

With the vzeroupper commented out:

yuv444p 1920x1080 -> rgba 1920x1080, flags=0x100000 dither=1, SSIM {Y=1.000000 U=0.999999 V=0.999997 A=1.000000}
  time=1361 us, ref=4265 us, speedup=3.133x faster
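
The two runs can be compared directly from the reported figures. This is a minimal sketch. It assumes that the speedup column is ref / time, which roughly reproduces the printed values; the benchmark presumably computes them from unrounded timings.

```python
def speedup(time_us: float, ref_us: float) -> float:
    """Speedup over the reference implementation, computed as ref / time."""
    return ref_us / time_us


def slowdown(base_us: float, other_us: float) -> float:
    """Fractional increase in run time going from base_us to other_us."""
    return other_us / base_us - 1.0


with_vzu = 911      # us, with vzeroupper
without_vzu = 1361  # us, vzeroupper commented out

print(f"speedup with:    {speedup(with_vzu, 4257):.2f}x")     # 4.67x
print(f"speedup without: {speedup(without_vzu, 4265):.2f}x")  # 3.13x
print(f"slowdown without vzeroupper: {slowdown(with_vzu, without_vzu):.1%}")  # 49.4%
```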

In most other cases it makes no measurable difference, but in some cases, like
this one, omitting the vzeroupper call introduces false dependencies on the
upper halves of the ymm registers.

Another example is grayf32 -> yuv444p, which slows down from 268 us to 296 us
when I remove the vzeroupper calls. In general, any conversion that switches
between 32-bit floats (512 bits per block) and 8-bit integers (128 bits per
block) is affected.
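
The quoted block sizes imply 16 elements per block in both cases (512 / 32 = 128 / 8 = 16). The sketch below works through that arithmetic. It assumes 256-bit ymm registers on the AVX side and 128-bit xmm registers on the SSE side, which suggests why the f32 stages use AVX while the u8 stages fit in a single SSE register. It also computes the grayf32 -> yuv444p slowdown from the two timings.

```python
ELEMS_PER_BLOCK = 16        # implied by 512 bits / 32-bit floats = 128 bits / 8-bit ints
YMM_BITS, XMM_BITS = 256, 128  # AVX ymm and SSE xmm register widths


def block_bits(bits_per_elem: int, elems: int = ELEMS_PER_BLOCK) -> int:
    """Total width of one block in bits."""
    return bits_per_elem * elems


def regs_needed(total_bits: int, reg_bits: int) -> int:
    """Registers of width reg_bits needed to hold total_bits (ceiling division)."""
    return -(-total_bits // reg_bits)


f32_bits = block_bits(32)
u8_bits = block_bits(8)
print(f"f32 block: {f32_bits} bits -> {regs_needed(f32_bits, YMM_BITS)} ymm registers")  # 512 -> 2
print(f"u8 block:  {u8_bits} bits -> {regs_needed(u8_bits, XMM_BITS)} xmm register")     # 128 -> 1

# grayf32 -> yuv444p: 268 us with vzeroupper, 296 us without.
print(f"grayf32 -> yuv444p slowdown: {296 / 268 - 1:.1%}")  # 10.4%
```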

>
> Kieran
>
> >
> _______________________________________________
> ffmpeg-devel mailing list
> ffmpeg-devel@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
>
> To unsubscribe, visit link above, or email
> ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".