The transcript was generated with whisper.cpp. On a MacBook Air M1, it took only 4.75 minutes to transcribe a 2-hour video. Technology is amazing!
Enjoy the video.
The whole process is automated:
1. Use ffmpeg to extract the audio from the video and output a WAV file.
2. Use whisper.cpp to transcribe the audio and output an SRT file.
3. Use ffmpeg to generate segments with synced streams.
4. Use ffmpeg to concat the segments into a final output.
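
The four steps could be wired together roughly as in the Python sketch below. The actual script is not shown here, so this is only an illustration under assumptions: the file names, the model path, the `./main` binary name for whisper.cpp, the segment times, and the choice to re-encode each segment with libx264/AAC are all hypothetical, not the settings used for this video.

```python
import subprocess
from pathlib import Path


def extract_audio_cmd(video, wav):
    # Step 1: whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input.
    return ["ffmpeg", "-y", "-i", str(video), "-vn",
            "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", str(wav)]


def transcribe_cmd(wav, model, out_base, whisper_bin="./main"):
    # Step 2: -osrt writes <out_base>.srt.
    # whisper_bin and model are assumed paths; adjust to your build.
    return [whisper_bin, "-m", str(model), "-f", str(wav),
            "-osrt", "-of", str(out_base)]


def segment_cmd(video, start, duration, out):
    # Step 3 (assumption): cut one segment by start time and duration,
    # re-encoding so audio and video start together in the segment.
    return ["ffmpeg", "-y", "-ss", f"{start:.3f}", "-i", str(video),
            "-t", f"{duration:.3f}", "-c:v", "libx264", "-c:a", "aac", str(out)]


def concat_list_text(segments):
    # Input file for ffmpeg's concat demuxer: one "file '<path>'" line each.
    return "".join(f"file '{Path(s).as_posix()}'\n" for s in segments)


def concat_cmd(list_file, out):
    # Step 4: join the segments without re-encoding.
    return ["ffmpeg", "-y", "-f", "concat", "-safe", "0",
            "-i", str(list_file), "-c", "copy", str(out)]


def run_pipeline(video, model, spans, workdir="work"):
    """spans is a hypothetical list of (start_seconds, duration_seconds)."""
    work = Path(workdir)
    work.mkdir(exist_ok=True)
    wav = work / "audio.wav"
    subprocess.run(extract_audio_cmd(video, wav), check=True)
    subprocess.run(transcribe_cmd(wav, model, work / "transcript"), check=True)
    segments = []
    for i, (start, duration) in enumerate(spans):
        seg = (work / f"seg{i:03d}.mp4").resolve()
        subprocess.run(segment_cmd(video, start, duration, seg), check=True)
        segments.append(seg)
    list_file = work / "segments.txt"
    list_file.write_text(concat_list_text(segments))
    subprocess.run(concat_cmd(list_file, work / "final.mp4"), check=True)
```

Something like `run_pipeline("talk.mp4", "models/ggml-base.en.bin", [(0, 60), (120, 45)])` would run all four steps, provided ffmpeg and a whisper.cpp build are installed. Each command is built by its own function so it can be printed and checked before anything runs.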