On Mon, Apr 08, 2024 at 10:13:40PM +0200, Andreas Rheinhardt wrote: > Frame-threaded decoders with inter-frame dependencies > use the ThreadFrame API for syncing. It works as follows: > > During init each thread allocates an AVFrame for every > ThreadFrame. > > Thread A reads the header of its packet and allocates > a buffer for an AVFrame with ff_thread_get_ext_buffer() > (which also allocates a small structure that is shared > with other references to this frame) and sets its fields, > including side data. Then said thread calls ff_thread_finish_setup(). > From that moment onward it is not allowed to change any > of the AVFrame fields at all any more, but it may change > fields which are an indirection away, like the content of > AVFrame.data or already existing side data. > > After thread A has called ff_thread_finish_setup(), > another thread (the user one) calls the codec's update_thread_context > callback which in turn calls ff_thread_ref_frame() which > calls av_frame_ref() which reads every field of A's > AVFrame; hence the above restriction on modifications > of the AVFrame (as any modification of the AVFrame by A after > ff_thread_finish_setup() would be a data race). Of course, > this av_frame_ref() also incurs allocations and therefore > needs to be checked. ff_thread_ref_frame() also references > the small structure used for communicating progress. > > This av_frame_ref() makes it awkward to propagate values that > only become known during decoding to later threads (in case of > frame reordering or other mechanisms of delayed output (like > show-existing-frames) it's not the decoding thread, but a later > thread that returns the AVFrame). E.g. for VP9 when exporting video > encoding parameters as side data the number of blocks only > becomes known during decoding, so one can't allocate the side data > before ff_thread_finish_setup(). It is currently being done afterwards > and this leads to a data race in the vp9-encparams test when using > frame-threading. Returning decode_error_flags is also complicated > by this. > > To perform this exchange a buffer shared between the references > is needed (notice that simply giving the later threads a pointer > to the original AVFrame does not work, because said AVFrame will > be reused lateron when thread A decodes the next packet given to it). > One could extend the buffer already used for progress for this > or use a new one (requiring yet another allocation), yet both > of these approaches have the drawback of being unnatural, ugly > and requiring quite a lot of ad-hoc code. E.g. in case of the VP9 > side data mentioned above one could not simply use the helper > that allocates and adds the side data to an AVFrame in one go. > > The ProgressFrame API meanwhile offers a different solution to all > of this. It is based around the idea that the most natural > shared object for sharing information about an AVFrame between > decoding threads is the AVFrame itself. To actually implement this > the AVFrame needs to be reference counted. This is achieved by > putting a (ownership) pointer into a shared (and opaque) structure > that is managed by the RefStruct API and which also contains > the stuff necessary for progress reporting. > The users get a pointer to this AVFrame with the understanding > that the owner may set all the fields until it has indicated > that it has finished decoding this AVFrame; then the users are > allowed to read everything. Every decoder may of course employ > a different contract than the one outlined above. > > Given that there is no underlying av_frame_ref(), creating > references to a ProgressFrame can't fail. Only > ff_thread_progress_get_buffer() can fail, but given that > it will replace calls to ff_thread_get_ext_buffer() it is > at places where errors are already expected and properly > taken care of. > > The ProgressFrames are empty (i.e. the AVFrame pointer is NULL > and the AVFrames are not allocated during init at all) > while not being in use; ff_thread_progress_get_buffer() both > sets up the actual ProgressFrame and already calls > ff_thread_get_buffer(). So instead of checking for > ThreadFrame.f->data[0] or ThreadFrame.f->buf[0] being NULL > for "this reference frame is non-existing" one should check for > ProgressFrame.f. > > This also implies that one can only set AVFrame properties > after having allocated the buffer. This restriction is not deep: > if it becomes onerous for any codec, ff_thread_progress_get_buffer() > can be broken up. The user would then have to get a buffer > himself. > > In order to avoid unnecessary allocations, the shared structure > is pooled, so that both the structure as well as the AVFrame > itself are reused. This means that there won't be lots of > unnecessary allocations in case of non-frame-threaded decoding. > It might even turn out to have fewer than the current code > (the current code allocates AVFrames for every DPB slot, but > these are often excessively large and not completely used; > the new code allocates them on demand). Pooling relies on the > reset function of the RefStruct pool API, it would be impossible > to implement with the AVBufferPool API. > > Finally, ProgressFrames have no notion of owner; they are built > on top of the ThreadProgress API which also lacks such a concept. > Instead every ThreadProgress and every ProgressFrame contains > its own mutex and condition variable, making it completely independent > of pthread_frame.c. Just like the ThreadFrame API it is simply > presumed that only the actual owner/producer of a frame reports > progress on said frame. > > Signed-off-by: Andreas Rheinhardt > --- > libavcodec/Makefile | 1 + > libavcodec/avcodec.c | 1 + > libavcodec/codec_internal.h | 4 ++ > libavcodec/decode.c | 122 +++++++++++++++++++++++++++++++++ > libavcodec/internal.h | 2 + > libavcodec/progressframe.h | 133 ++++++++++++++++++++++++++++++++++++ > libavcodec/pthread_frame.c | 1 + > libavcodec/tests/avcodec.c | 3 +- > 8 files changed, 266 insertions(+), 1 deletion(-) > create mode 100644 libavcodec/progressframe.h breaks build without threads ./configure --disable-pthreads --cc='ccache gcc' && make -j32 ... libavcodec/threadprogress.c: In function ‘ff_thread_progress_report’: libavcodec/threadprogress.c:54:5: error: implicit declaration of function ‘pthread_mutex_lock’; did you mean ‘ff_mutex_lock’? [-Werror=implicit-function-declaration] pthread_mutex_lock(&pro->progress_mutex); ^~~~~~~~~~~~~~~~~~ ff_mutex_lock libavcodec/threadprogress.c:55:5: error: implicit declaration of function ‘pthread_cond_broadcast’; did you mean ‘ff_cond_broadcast’? [-Werror=implicit-function-declaration] pthread_cond_broadcast(&pro->progress_cond); ^~~~~~~~~~~~~~~~~~~~~~ ff_cond_broadcast libavcodec/threadprogress.c:56:5: error: implicit declaration of function ‘pthread_mutex_unlock’; did you mean ‘ff_mutex_unlock’? [-Werror=implicit-function-declaration] pthread_mutex_unlock(&pro->progress_mutex); ^~~~~~~~~~~~~~~~~~~~ ff_mutex_unlock libavcodec/threadprogress.c: In function ‘ff_thread_progress_await’: libavcodec/threadprogress.c:71:9: error: implicit declaration of function ‘pthread_cond_wait’; did you mean ‘ff_cond_wait’? [-Werror=implicit-function-declaration] pthread_cond_wait(&pro->progress_cond, &pro->progress_mutex); ^~~~~~~~~~~~~~~~~ ff_cond_wait cc1: some warnings being treated as errors ffbuild/common.mak:81: recipe for target 'libavcodec/threadprogress.o' failed make: *** [libavcodec/threadprogress.o] Error 1 make: *** Waiting for unfinished jobs.... [...] -- Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB The smallest minority on earth is the individual. Those who deny individual rights cannot claim to be defenders of minorities. - Ayn Rand