From: James Almer
Date: Sun, 13 Nov 2022 23:42:53 -0300
To: ffmpeg-devel@ffmpeg.org
Subject: Re: [FFmpeg-devel] [PATCH v7] libavfilter/x86/vf_convolution: add sobel filter optimization and unit test with intel AVX512 VNNI
In-Reply-To: <20221104082925.25598-1-bin.wang@intel.com>
References: <20221104082925.25598-1-bin.wang@intel.com>

On 11/4/2022 5:29 AM, bin.wang-at-intel.com@ffmpeg.org wrote:
> +%macro FILTER_SOBEL 0
> +%if UNIX64
> +cglobal filter_sobel, 4, 15, 7, dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%else
> +cglobal filter_sobel, 4, 15, 7, dst, width, rdiv, bias, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +%if WIN64
> +    SWAP xmm0, xmm2
> +    SWAP xmm1, xmm3
> +    mov r2q, matrixmp
> +    mov r3q, ptrmp
> +    DEFINE_ARGS dst, width, matrix, ptr, c0, c1, c2, c3, c4, c5, c6, c7, c8, r, x
> +%endif
> +    movsxdifnidn widthq, widthd
> +    VBROADCASTSS m0, xmm0
> +    VBROADCASTSS m1, xmm1

This and every other xmm# case should instead be xm#, to ensure the swapping is taken into account.
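
For illustration only, a sketch of the two broadcast lines from the quoted hunk with the xm# suggestion applied. It assumes x86inc.asm's usual aliases, where xm# names the xmm-sized view of whatever register m# currently maps to; the rest of the macro is left as posted and is not shown:

    ; sketch, not the final patch: only the xmm# -> xm# rename is applied
    VBROADCASTSS m0, xm0
    VBROADCASTSS m1, xm1

SWAP only renames the m#/xm# aliases at assembly time and emits no instructions. A literal xmm0 therefore keeps naming the physical register, while xm0 follows the renaming. On WIN64, where rdiv and bias are the third and fourth arguments and so would be expected in xmm2 and xmm3, that difference decides which value gets broadcast.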