From: Henrik Gramner
Date: Fri, 2 Sep 2022 16:03:21 +0200
To: FFmpeg development discussions and patches
Subject: Re: [FFmpeg-devel] [PATCH v2] x86/tx_float: implement inverse MDCT AVX2 assembly

On Fri, Sep 2, 2022 at 7:55 AM Lynne wrote:
> +    movd xmm4, strided
> +    neg t2d
> +    movd xmm5, t2d
> +    SPLATD xmm4
> +    SPLATD xmm5
> +    vperm2f128 m4, m4, m4, 0x00 ; +stride splatted
> +    vperm2f128 m5, m5, m5, 0x00 ; -stride splatted

    movd xm4, strided
    pxor m5, m5
    vpbroadcastd m4, xm4

> +    mova m2, [lutq] ; load LUT indices
> +    pcmpeqd m0, m0 ; zero out a register
> +    pmulld m3, m2, m4 ; multiply by +stride
> +    pmulld m2, m5 ; multiply by -stride
> +    movaps m1, m0
> +    vgatherdps m6, [inq + 2*m3], m0 ; im
> +    vgatherdps m7, [t1q + 2*m2], m1 ; re

    pmulld m2, m4, [lutq]
    pcmpeqd m0, m0
    mova m1, m0
    vgatherdps m6, [inq + 2*m2], m0
    psubd m2, m5, m2
    vgatherdps m7, [t1q + 2*m2], m1

The comment for pcmpeqd is also wrong as bits are set to 1, not 0.
That instruction could also be moved outside the loop and replaced
with a cheaper register-register move inside the loop.
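
To make the index arithmetic above concrete, here is a small scalar model in Python. It is not part of the patch: the helpers `pmulld`, `psubd`, `pcmpeqd` and `splat` are made-up names that only mimic the per-lane behaviour of the instructions of the same name, and the LUT values are arbitrary. It checks that multiplying the LUT by a splatted -stride gives the same 32-bit lanes as subtracting the +stride product from zero, and that comparing a register with itself yields all-ones lanes.

```python
MASK32 = 0xFFFFFFFF  # lanes are modelled as unsigned 32-bit integers


def splat(value, lanes=8):
    """Broadcast one 32-bit value to every lane (vpbroadcastd / SPLATD)."""
    return [value & MASK32] * lanes


def pmulld(a, b):
    """Lane-wise multiply keeping only the low 32 bits of each product."""
    return [(x * y) & MASK32 for x, y in zip(a, b)]


def psubd(a, b):
    """Lane-wise a - b with 32-bit wraparound."""
    return [(x - y) & MASK32 for x, y in zip(a, b)]


def pcmpeqd(a, b):
    """Lane-wise equality: all bits set where equal, all bits clear otherwise."""
    return [MASK32 if x == y else 0 for x, y in zip(a, b)]


def indices_patch(lut, stride):
    """Quoted patch: two multiplies, by +stride and by -stride."""
    return pmulld(lut, splat(stride)), pmulld(lut, splat(-stride))


def indices_suggested(lut, stride):
    """Suggested form: one multiply, then 0 - product."""
    pos = pmulld(splat(stride), lut)
    return pos, psubd(splat(0), pos)


lut = [0, 1, 2, 3, 7, 100, 4095, 65535]  # arbitrary example indices
for stride in (1, 4, 8, 12345):
    assert indices_patch(lut, stride) == indices_suggested(lut, stride)

# A register compared with itself is equal in every lane, so every bit is 1.
assert pcmpeqd(lut, lut) == [MASK32] * 8
```

The all-ones result is what vgatherdps takes as its mask. The gather clears the mask register when it completes, so each gather needs a fresh copy, and a register-register move from a constant kept outside the loop is enough for that.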

> +    vperm2f128 m0, m0, 0x01 ; flip
> +    vperm2f128 m4, m4, 0x01 ; flip (2)
> +    shufpd m0, m0, 101b
> +    shufpd m4, m4, 101b

    vpermpd m0, m0, q0123
    vpermpd m4, m4, q0123
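
As a sanity check on the shuffle replacement, the following Python sketch models a ymm register as a list of four 64-bit elements. The function names and the `q` helper are hypothetical; `q` only mirrors how x86inc's qABCD constants pack four 2-bit selectors, and the two-source instructions are modelled only for the case used in the quoted code, where both sources are the same register. Under those assumptions, the lane swap followed by the in-lane swap and the single vpermpd both reverse the four elements.

```python
def vperm2f128_same(src, imm):
    """vperm2f128 with both sources equal: each output half picks a source half.

    Bit 0 selects the half for the low output lane, bit 4 for the high one.
    The zeroing bits (3 and 7) are not modelled; they are clear in 0x01.
    """
    halves = [src[0:2], src[2:4]]
    return halves[imm & 1] + halves[(imm >> 4) & 1]


def vshufpd_same(src, imm):
    """256-bit shufpd with both sources equal.

    Output element i stays in its own 128-bit lane; bit i of imm picks
    the even (0) or odd (1) element of that lane.
    """
    return [src[(i & ~1) | ((imm >> i) & 1)] for i in range(4)]


def vpermpd(src, imm):
    """vpermpd: output element i is src[bits 2i+1..2i of imm]."""
    return [src[(imm >> (2 * i)) & 3] for i in range(4)]


def q(a, b, c, d):
    """Hypothetical helper: pack selectors the way x86inc's qABCD does."""
    return (a << 6) | (b << 4) | (c << 2) | d


reg = ["e0", "e1", "e2", "e3"]  # four 64-bit elements, lowest first

two_step = vshufpd_same(vperm2f128_same(reg, 0x01), 0b0101)
one_step = vpermpd(reg, q(0, 1, 2, 3))

assert two_step == one_step == ["e3", "e2", "e1", "e0"]
```

vpermpd requires AVX2, whereas vperm2f128 and shufpd only need AVX, but the function already depends on AVX2 for the gathers.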