From: James Almer
Date: Mon, 14 Nov 2022 09:54:15 -0300
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
In-Reply-To: <20221104082925.25598-1-bin.wang@intel.com>
References: <20221104082925.25598-1-bin.wang@intel.com>

On 11/4/2022 5:29 AM, bin.wang-at-intel.com@ffmpeg.org wrote:
> +.loop2:
> +    xor  rd, rd
> +    pxor m4, m4
> +
> +    ;Gx
> +    SOBEL_MUL 0, data_n1
> +    SOBEL_MUL 1, data_n2
> +    SOBEL_MUL 2, data_n1
> +    SOBEL_ADD 6
> +    SOBEL_MUL 7, data_p2
> +    SOBEL_ADD 8
> +
> +    cvtsi2ss xmm4, rd
> +    mulss    xmm4, xmm4
> +
> +    xor rd, rd
> +    ;Gy
> +    SOBEL_MUL 0, data_n1
> +    SOBEL_ADD 2
> +    SOBEL_MUL 3, data_n2
> +    SOBEL_MUL 5, data_p2
> +    SOBEL_MUL 6, data_n1
> +    SOBEL_ADD 8
> +
> +    cvtsi2ss xmm5, rd
> +    fmaddss  xmm4, xmm5, xmm5, xmm4
> +
> +    sqrtps   xmm4, xmm4
> +    fmaddss  xmm4, xmm4, xmm0, xmm1 ;sum = sum * rdiv + bias

By using xmm# you're not taking into account any x86inc SWAPing, so this
is using xmm0 and xmm1 where the single scalar float input arguments
reside (at least on unix64), instead of xm0 and xm1 (xmm16 and xmm17)
where the broadcasted scalars were stored.
This, again, only worked by chance on unix64 because you're using scalar
fmadd, and shouldn't work at all on win64.

Also, all these as is are being encoded as VEX, not EVEX, but it should
be fine leaving them untouched instead of using xm#, since they will be
shorter (five bytes instead of six for some) by using the lower, non
callee-saved regs.

> +    cvttps2dq xmm4, xmm4 ; trunc to integer
> +    packssdw  xmm4, xmm4
> +    packuswb  xmm4, xmm4
> +    movd      rd, xmm4
> +    mov       [dstq + xq], rb
> +
> +    add xq, 1
> +    cmp xq, widthq
> +    jl .loop2
> +.end:
> +    RET