Description
Overview
Subaligner as of fc18afa (current head of master) crashes when using Whisper to transcribe audio if Whisper outputs an empty segment. The crash output looks like:
subaligner.transcriber - INFO - MainThread - Finished transcribing the audio
ERROR: list index out of range
File "subaligner/.venv/bin/subaligner", line 10, in <module>
sys.exit(main())
File "subaligner/.venv/lib/python3.11/site-packages/subaligner/__main__.py", line 438, in main
"ERROR: {}\n{}".format(str(e), "".join(traceback.format_stack()) if FLAGS.debug else "")
File "subaligner/.venv/lib/python3.11/site-packages/subaligner/__main__.py", line 377, in main
subtitle, frame_rate = transcriber.transcribe(video_file_path=local_video_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "subaligner/.venv/lib/python3.11/site-packages/subaligner/transcriber.py", line 106, in transcribe
f"{Utils.format_timestamp(segment['words'][0]['start'])} --> {Utils.format_timestamp(segment['words'][-1]['end'])}\n" \
~~~~~~~~~~~~~~~~^^^
subaligner.transcriber - INFO - MainThread - process shutting down
subaligner.transcriber - DEBUG - MainThread - running all "atexit" finalizers with priority >= 0
subaligner.transcriber - DEBUG - MainThread - running the remaining "atexit" finalizers
Reproduction
I think it's probably input-file dependent and my file is large and private, but here's the command-line invocation that crashed:
CUDA_VISIBLE_DEVICES= subaligner -m transcribe -v '/tmp/myfile.webm' -ml deu -mr whisper -mf large-v3 -o '/tmp/myfile.de.srt' --debug
Analysis
I did some debugging to print out the result from
subaligner/subaligner/transcriber.py
Line 84 in fc18afa
result = self.__model.transcribe(audio,
One of the returned segments looked like this (an empty text field and an empty list of words):
{
'id': 70,
'seek': 14668,
'start': 163.36,
'end': 163.36,
'text': '',
'tokens': [],
'temperature': 0.0,
'avg_logprob': -0.785851426011934,
'compression_ratio': 1.8442211055276383,
'no_speech_prob': 0.034043002873659134,
'words': []
}
The crash is happening in
subaligner/subaligner/transcriber.py
Line 106 in fc18afa
f"{Utils.format_timestamp(segment['words'][0]['start'])} --> {Utils.format_timestamp(segment['words'][-1]['end'])}\n" \
because words is an empty array, so segment['words'][0] raises an IndexError ("list index out of range").
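The failure can be reproduced without Subaligner or Whisper. The sketch below is only an illustration: it reuses the segment from the debug output (trimmed to a few fields), and first_word_start is a hypothetical helper that mimics the indexing done on line 106, not a function from the library.

```python
# Segment copied from the debug output above, trimmed to the relevant fields.
empty_segment = {"id": 70, "start": 163.36, "end": 163.36, "text": "", "words": []}


def first_word_start(segment):
    """Hypothetical helper: same access pattern as the failing f-string."""
    return segment["words"][0]["start"]


error_message = None
try:
    first_word_start(empty_segment)
except IndexError as e:
    # Indexing an empty list raises IndexError, matching the reported error.
    error_message = str(e)
    print(f"ERROR: {error_message}")
```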
Possible Fix
In my local install, I patched the conditions in the for-loop that processes segments so that it skips empty segments, like so:
for segment in result["segments"]:
    if max_char_length is not None and len(segment["text"]) > max_char_length:
        srt_str, srt_idx = self._chunk_segment(segment, srt_str, srt_idx, max_char_length)
    elif with_word_time_codes:
        for word in segment["words"]:
            srt_str += f"{srt_idx}\n" \
                       f"{Utils.format_timestamp(word['start'])} --> {Utils.format_timestamp(word['end'])}\n" \
                       f"{word['word'].strip().replace('-->', '->')}\n" \
                       "\n"
            srt_idx += 1
    elif segment["words"]:
        srt_str += f"{srt_idx}\n" \
                   f"{Utils.format_timestamp(segment['words'][0]['start'])} --> {Utils.format_timestamp(segment['words'][-1]['end'])}\n" \
                   f"{segment['text'].strip().replace('-->', '->')}\n" \
                   "\n"
        srt_idx += 1
I don't know much about whisper's output format and I can imagine lots of ways this could still bail ungracefully... but handling empty segments in some fashion seems necessary for at least some input files.
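For anyone who wants to experiment outside the library, here is a minimal, self-contained sketch of one way to handle such segments. It is not Subaligner's code and it behaves slightly differently from the patch above: it skips segments with empty text, and if a segment has text but no words it falls back to the segment-level start and end values (which the debug output shows are present). Both segments_to_srt and format_timestamp are hypothetical names; format_timestamp is only a stand-in for Utils.format_timestamp and assumes the usual SRT HH:MM:SS,mmm layout.

```python
def format_timestamp(seconds):
    """Hypothetical stand-in for Utils.format_timestamp: seconds -> HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text from Whisper-style segment dicts, tolerating empty ones."""
    cues = []
    for segment in segments:
        text = segment.get("text", "").strip().replace("-->", "->")
        if not text:
            continue  # nothing to display, so emit no cue
        words = segment.get("words") or []
        if words:
            # Word-level timing is available: use first word start, last word end.
            start, end = words[0]["start"], words[-1]["end"]
        else:
            # Assumption: fall back to the segment-level timestamps.
            start, end = segment["start"], segment["end"]
        cues.append(
            f"{len(cues) + 1}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n\n"
        )
    return "".join(cues)


example_segments = [
    {"start": 0.0, "end": 1.5, "text": " Hallo ",
     "words": [{"word": " Hallo", "start": 0.2, "end": 1.1}]},
    {"start": 163.36, "end": 163.36, "text": "", "words": []},  # the empty case
    {"start": 2.0, "end": 3.0, "text": "Welt", "words": []},
]
srt_text = segments_to_srt(example_segments)
print(srt_text)
```

Numbering cues from the list length keeps the SRT indices contiguous even when segments are skipped.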