[FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch

Git Inbox Mirror of the ffmpeg-devel mailing list - see https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

* [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
@ 2025-07-29 20:02 Kieran Kunhya via ffmpeg-devel
  2025-07-29 20:11 ` James Almer
  2025-07-30  0:42 ` Jack Lau
  0 siblings, 2 replies; 6+ messages in thread
From: Kieran Kunhya via ffmpeg-devel @ 2025-07-29 20:02 UTC
  To: FFmpeg development discussions and patches; +Cc: Kieran Kunhya, tc

Hello,

It seems there is strong evidence that AI wrote TLS code as part of the
WHIP patch. It goes without saying why this is bad. Further discussion
here:
https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053

This patch was pushed without ML review.

I think this code should be removed before the FFmpeg release. I
include TC in this email for that reason.

Regards,
Kieran Kunhya
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-request@ffmpeg.org with subject "unsubscribe".


* Re: [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
  2025-07-29 20:02 [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch Kieran Kunhya via ffmpeg-devel
@ 2025-07-29 20:11 ` James Almer
  2025-07-29 20:56   ` Kacper Michajlow
  2025-07-30  0:42 ` Jack Lau
  1 sibling, 1 reply; 6+ messages in thread
From: James Almer @ 2025-07-29 20:11 UTC
  To: ffmpeg-devel

On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
> Hello,
> 
> It seems there is strong evidence that AI wrote TLS code as part of the
> WHIP patch. It goes without saying why this is bad. Further discussion
> here:
> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
> 
> This patch was pushed without ML review.
> 
> I think this code should be removed before the FFmpeg release. I
> include TC in this email for that reason.

The UTF8 dashes are not so much an indication of LLM output but one that 
it was written with an unusual locale, I'd say.

As for the allegation that the code was seemingly written with a single
objective in mind: sure, that can be what an LLM does based on prompts,
but it is also what a person does. It would not be the first time
someone wrote code without thinking about it having to work generically
(the amount of API that was made for a single, very specific use case
and then had to be replaced by a more general one is not small).

That said, I agree the changes did look weird in places, so I'm not 
against reverting them in the release/8.0 branch so we have more time to 
properly audit it.


* Re: [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
  2025-07-29 20:11 ` James Almer
@ 2025-07-29 20:56   ` Kacper Michajlow
  2025-07-29 22:39     ` James Almer
  0 siblings, 1 reply; 6+ messages in thread
From: Kacper Michajlow @ 2025-07-29 20:56 UTC
  To: FFmpeg development discussions and patches

On Tue, 29 Jul 2025 at 22:11, James Almer <jamrial@gmail.com> wrote:
>
> On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
> > Hello,
> >
> > It seems there is strong evidence that AI wrote TLS code as part of the
> > WHIP patch. It goes without saying why this is bad. Further discussion
> > here:
> > https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
> >
> > This patch was pushed without ML review.
> >
> > I think this code should be removed before the FFmpeg release. I
> > include TC in this email for that reason.
>
> The UTF8 dashes are not so much an indication of LLM output but one that
> it was written with an unusual locale, I'd say.

I disagree. I wouldn't call out AI if there weren't a good indication
that this is where those hyphens came from. I have tested many LLMs to
evaluate their usefulness, and this is the kind of thing that they love
to insert, even in code. I would expect any developer (even one natively
using a different locale) to use a plain ASCII - in a .c file; after
all, it is a common token in the code too.

Additionally, I now see there is also a ’ (0x2019) a few lines below, in
`to a av_malloc’d PEM string.`, which is also something that LLMs love
to insert. I can even remove those comments right now and ask one of
the biggest LLMs to comment the code, and reproduce the same 0x2019
being inserted.
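
For anyone who wants to check a file for such characters mechanically,
here is a minimal sketch in Python. It is not part of the patch or of
FFmpeg's tooling; the function name find_non_ascii and the sample string
are made up for illustration, and the non-breaking hyphen (U+2011) in the
sample is only an assumed example of a look-alike dash. It reports every
character outside the ASCII range with its position, code point and
Unicode name:

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, code point, name) for each non-ASCII character."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:
                # unicodedata.name() raises for unnamed code points unless
                # a default is given, so pass one.
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col_no, ch, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: the first line is modelled on the comment quoted
# above; the second line, with a U+2011 hyphen, is invented.
sample = (
    "/* to a av_malloc\u2019d PEM string. */\n"
    "/* self\u2011signed certificate */\n"
)

for line_no, col_no, ch, code_point, name in find_non_ascii(sample):
    print(f"line {line_no}, col {col_no}: {code_point} {name}")
```

Such a check only tells you where the characters are, not where they
came from, so it is useful for cleanup rather than as evidence.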

Lastly, a strong indication of LLM use is the dummy comments on every
operation. LLMs love to explain themselves. Comments in code are very
useful tools, but you don't have to comment every function call and
every label. IMHO it adds more noise than information, and SNR is
important. It's harmless, but look at pkey_to_pem_string() and tell me
it really is organic to add `// Copy data & NUL-terminate` to a memcpy
call. Again, I can reproduce this by querying an LLM to do so.

I'm not saying we should revert this code, but a good review would be
in order to ensure we are not shipping something bad in there.

Note that my intention was not to start some big discussion, just to
clean the file of unnecessary look-alike UTF-8 characters. I'm not
opposed to AI/LLM use, but the output should be heavily sanitized, as
LLMs are not reliable on their own.

- Kacper

* Re: [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
  2025-07-29 20:56   ` Kacper Michajlow
@ 2025-07-29 22:39     ` James Almer
  2025-07-29 22:59       ` Timo Rothenpieler
  0 siblings, 1 reply; 6+ messages in thread
From: James Almer @ 2025-07-29 22:39 UTC
  To: ffmpeg-devel


On 7/29/2025 5:56 PM, Kacper Michajlow wrote:
> On Tue, 29 Jul 2025 at 22:11, James Almer <jamrial@gmail.com> wrote:
>>
>> On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
>>> Hello,
>>>
>>> It seems there is strong evidence that AI wrote TLS code as part of the
>>> WHIP patch. It goes without saying why this is bad. Further discussion
>>> here:
>>> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
>>>
>>> This patch was pushed without ML review.
>>>
>>> I think this code should be removed before the FFmpeg release. I
>>> include TC in this email for that reason.
>>
>> The UTF8 dashes are not so much an indication of LLM output but one that
>> it was written with an unusual locale, I'd say.
> 
> I disagree. I wouldn't call out AI if there weren't a good indication
> that this is where those hyphens came from. I have tested many LLMs to
> evaluate their usefulness, and this is the kind of thing that they love
> to insert, even in code. I would expect any developer (even one natively
> using a different locale) to use a plain ASCII - in a .c file; after
> all, it is a common token in the code too.
> 
> Additionally, I now see there is also a ’ (0x2019) a few lines below, in
> `to a av_malloc’d PEM string.`, which is also something that LLMs love
> to insert. I can even remove those comments right now and ask one of
> the biggest LLMs to comment the code, and reproduce the same 0x2019
> being inserted.

Alright, I was not aware this was common behavior of LLMs.

> 
> Lastly, a strong indication of LLM use is the dummy comments on every
> operation. LLMs love to explain themselves. Comments in code are very
> useful tools, but you don't have to comment every function call and
> every label. IMHO it adds more noise than information, and SNR is
> important. It's harmless, but look at pkey_to_pem_string() and tell me
> it really is organic to add `// Copy data & NUL-terminate` to a memcpy
> call. Again, I can reproduce this by querying an LLM to do so.

This one I know is common.

> 
> I'm not saying we should revert this code, but a good review would be
> in order to ensure we are not shipping something bad in there.
I, however, am saying we should revert it in the release/8.0 branch
after the branch is made and before the release is tagged. A proper
review can then happen in the master branch without the risk of
realizing later that we shipped dubious code in a tarball.

> 
> Note that my intention was not to start some big discussion, just to
> clean the file of unnecessary look-alike UTF-8 characters. I'm not
> opposed to AI/LLM use, but the output should be heavily sanitized, as
> LLMs are not reliable on their own.
> 
> - Kacper

* Re: [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
  2025-07-29 22:39     ` James Almer
@ 2025-07-29 22:59       ` Timo Rothenpieler
  0 siblings, 0 replies; 6+ messages in thread
From: Timo Rothenpieler @ 2025-07-29 22:59 UTC
  To: ffmpeg-devel


On 7/30/2025 12:39 AM, James Almer wrote:
> On 7/29/2025 5:56 PM, Kacper Michajlow wrote:
>> On Tue, 29 Jul 2025 at 22:11, James Almer <jamrial@gmail.com> wrote:
>>>
>>> On 7/29/2025 5:02 PM, Kieran Kunhya via ffmpeg-devel wrote:
>>>> Hello,
>>>>
>>>> It seems there is strong evidence that AI wrote TLS code as part of the
>>>> WHIP patch. It goes without saying why this is bad. Further discussion
>>>> here:
>>>> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
>>>>
>>>> This patch was pushed without ML review.
>>>>
>>>> I think this code should be removed before the FFmpeg release. I
>>>> include TC in this email for that reason.
>>>
>>> The UTF8 dashes are not so much an indication of LLM output but one that
>>> it was written with an unusual locale, I'd say.
>>
>> I disagree. I wouldn't call out AI if there weren't a good indication
>> that this is where those hyphens came from. I have tested many LLMs to
>> evaluate their usefulness, and this is the kind of thing that they love
>> to insert, even in code. I would expect any developer (even one natively
>> using a different locale) to use a plain ASCII - in a .c file; after
>> all, it is a common token in the code too.
>>
>> Additionally, I now see there is also a ’ (0x2019) a few lines below, in
>> `to a av_malloc’d PEM string.`, which is also something that LLMs love
>> to insert. I can even remove those comments right now and ask one of
>> the biggest LLMs to comment the code, and reproduce the same 0x2019
>> being inserted.
> 
> Alright, I was not aware this was common behavior of LLMs.
> 
>>
>> Lastly, a strong indication of LLM use is the dummy comments on every
>> operation. LLMs love to explain themselves. Comments in code are very
>> useful tools, but you don't have to comment every function call and
>> every label. IMHO it adds more noise than information, and SNR is
>> important. It's harmless, but look at pkey_to_pem_string() and tell me
>> it really is organic to add `// Copy data & NUL-terminate` to a memcpy
>> call. Again, I can reproduce this by querying an LLM to do so.
> 
> This one I know is common.
> 
>>
>> I'm not saying we should revert this code, but a good review would be
>> in order to ensure we are not shipping something bad in there.
> I, however, am saying we should revert it in the release/8.0 branch
> after the branch is made and before the release is tagged. A proper
> review can then happen in the master branch without the risk of
> realizing later that we shipped dubious code in a tarball.

It has been heavily modified since, so I'm not sure reverting it is even 
realistic.
There have been at least two major patch-sets on top of it by now.

I cleaned up tls_openssl, and then ePirat did as well (potentially 
introducing a bug with the index exhaustion).

* Re: [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch
  2025-07-29 20:02 [FFmpeg-devel] [IMPORTANT] AI written TLS Code in WHIP patch Kieran Kunhya via ffmpeg-devel
  2025-07-29 20:11 ` James Almer
@ 2025-07-30  0:42 ` Jack Lau
  1 sibling, 0 replies; 6+ messages in thread
From: Jack Lau @ 2025-07-30  0:42 UTC
  To: FFmpeg development discussions and patches

> On Jul 30, 2025, at 04:02, Kieran Kunhya via ffmpeg-devel <ffmpeg-devel@ffmpeg.org> wrote:
> 
> Hello,
> 
> It seems there is strong evidence that AI wrote TLS code as part of the
> WHIP patch. It goes without saying why this is bad. Further discussion
> here:
> https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20053
Hello,

I would like to clarify a few things.

Before GSoC, the initial WHIP patches were written by Winlin and many other RTC developers. My first task during GSoC was to refactor the DTLS code and integrate it into the existing TLS code.

Additionally, I did use AI to speed up parts of the WHIP development. Specifically, the functions for converting cert/key between PEM and string formats were generated with the help of AI. I tested them locally before including them in the codebase.

At the time, I believed these functions were relatively straightforward and didn’t require referring to the RFCs, so I used the AI-generated output.

However, the rest of the WHIP and DTLS implementation — especially the parts that require RFC compliance — was written manually. That said, there's still room for improvement. For example, this pull request <https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20030> not only adds new features, but also refactors the ICE/DTLS logic in WHIP.

I sincerely apologize for introducing the AI-generated code snippet, and I will avoid this in the future.

Before GSoC ends, I plan to continue improving the WHIP codebase, including adding support for multiple SSL libraries and additional codecs.

I would like to thank my mentors, Steven and Zhao, as well as many RTC developers for their support. 

Thanks to Timo and Marvin for their help during this time. I'm also grateful to Timo for his patient guidance, and to Kieran for many suggestions that have helped FFmpeg improve. Due to language and cultural differences, it would be even better if we could communicate more directly through patches.

> 
> This patch was pushed without ML review.
> 
> I think this code should be removed before the FFmpeg release. I
> include TC in this email for that reason.
> 
> Regards,
> Kieran Kunhya

Regards,
Jack


