Using ffprobe to inspect the stream's metadata, I noticed that the audio start PTS is way off after a few minutes of streaming:
❯ ffprobe -hide_banner -i rtsp://192.168.1.150:8554/testing
Input #0, rtsp, from 'rtsp://192.168.1.150:8554/testing':
  Metadata:
    title           : testing
    comment         : testing
  Duration: N/A, start: 299.038000, bitrate: N/A
  Stream #0:0: Video: h264 (Main), yuv420p(tv, bt709, progressive), 1920x1080 [SAR 1:1 DAR 16:9], 60 fps, 60 tbr, 90k tbn, start 299.038000
  Stream #0:1: Audio: aac (LC), 48000 Hz, stereo, fltp, start 89777.512292
Notice that the video start PTS (299.038 s) matches the overall stream start, while the audio start PTS (89777.512292 s) is offset from it by roughly 89478 s, almost 25 hours. This throws off media players that sync video to the audio timestamps, such as mpv when --video-sync=audio is used.
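
To make the check repeatable, here is a minimal Python sketch that reads each stream's start time from ffprobe's JSON output and reports the audio/video gap. The helper names (probe_start_times, parse_start_times, audio_video_offset) are made up for illustration, and it assumes ffprobe is on PATH and that both streams report a numeric start_time.

```python
import json
import subprocess


def parse_start_times(ffprobe_json):
    """Map stream index -> (codec_type, start_time in seconds)."""
    starts = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        try:
            starts[stream["index"]] = (
                stream["codec_type"],
                float(stream["start_time"]),
            )
        except (KeyError, ValueError):
            # Skip streams with a missing or non-numeric start_time.
            continue
    return starts


def probe_start_times(url):
    """Ask ffprobe for per-stream start times (assumes ffprobe is on PATH)."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-show_entries", "stream=index,codec_type,start_time",
            "-of", "json",
            url,
        ],
        capture_output=True, text=True, check=True,
    )
    return parse_start_times(result.stdout)


def first_start(starts, kind):
    """Start time of the first stream of the given codec type."""
    for codec_type, start in starts.values():
        if codec_type == kind:
            return start
    raise ValueError(f"no {kind} stream with a start time")


def audio_video_offset(starts):
    """Audio start minus video start, in seconds."""
    return first_start(starts, "audio") - first_start(starts, "video")


if __name__ == "__main__":
    starts = probe_start_times("rtsp://192.168.1.150:8554/testing")
    print(f"audio - video start offset: {audio_video_offset(starts):.6f} s")
```

Running this periodically against the stream would show whether the gap is present from the start or only appears after a few minutes. With the two start values from the output above, audio_video_offset works out to about 89478.47 s.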