Convert PGS subtitles to SRT using OCR.
Tesseract language data files must be placed in the tessdata folder inside the PgsToSrt folder, or their path can be specified on the command line with the `--tesseractdata` parameter.
You only need data files for the language(s) you want to convert.

    dotnet PgsToSrt.dll [parameters]

| Parameter | Description |
|---|---|
| `--input` | Input filename; can be an .mkv file or a PGS subtitle extracted to a .sup file with mkvextract. |
| `--output` | Output SubRip (.srt) filename. Auto-generated from the input filename if not set. |
| `--track` | Track number of the subtitle to process in an .mkv file (only required when the input is a Matroska file). It can be obtained with mkvinfo. |
| `--tracklanguage` | Convert all tracks of the specified language (only works with .mkv input). |
| `--tesseractlanguage` | Tesseract language to use if multiple languages are available in the Tesseract data directory. |
| `--tesseractdata` | Path of the Tesseract language data files; by default, tessdata in the executable directory. |
| `--tesseractversion` | libtesseract version; supports 4 and 5 (default: 4). Ignored on Windows. |
| `--libleptname` | leptonica library name, usually lept or leptonica; the 'lib' prefix is added automatically (default: lept). Ignored on Windows. |
| `--libleptversion` | leptonica library version (default: 5). Ignored on Windows. |
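
The parameters above can be combined in a small shell wrapper for batch conversion. The following is a minimal sketch and not part of PgsToSrt: the function names `convert_sup` and `convert_dir` and the `PGSTOSRT_RUN` variable are hypothetical, and the output name is derived here by simply replacing the `.sup` extension. Setting `PGSTOSRT_RUN=echo` prints the arguments instead of running the converter.

```shell
#!/bin/sh
# Hypothetical helper (not part of PgsToSrt): convert one .sup file.
# PGSTOSRT_RUN is an assumed override; it defaults to the documented
# "dotnet PgsToSrt.dll" command.
convert_sup() {
    input=$1
    lang=$2
    output="${input%.sup}.srt"   # video1.fr.sup -> video1.fr.srt
    # Left unquoted on purpose so "dotnet PgsToSrt.dll" splits into two words.
    ${PGSTOSRT_RUN:-dotnet PgsToSrt.dll} \
        --input "$input" --output "$output" --tesseractlanguage "$lang"
}

# Convert every .sup file in a directory with the same Tesseract language.
convert_dir() {
    dir=$1
    lang=$2
    for f in "$dir"/*.sup; do
        [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
        convert_sup "$f" "$lang" || return 1
    done
}
```

After sourcing the file, `convert_dir ./subs fra` would process every `.sup` file under `./subs` with the French data file.
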
Examples:

    dotnet PgsToSrt.dll --input video1.fr.sup --output video1.fr.srt --tesseractlanguage fra
    dotnet PgsToSrt.dll --input video1.mkv --output video1.srt --track 4

When running with Docker, examine entrypoint.sh for the full list of available arguments:

    docker run -it --rm \
        -v /data:/data \
        -e INPUT=/data/myImageSubtitle.sup \
        -e OUTPUT=/data/myTextSubtitle.srt \
        -e LANGUAGE=eng \
        tentacule/pgstosrt

Hint: the default arguments from the Dockerfile are INPUT=/input.sup and OUTPUT=/output.srt, so you can simply mount your files at those paths.
Create the output file first; it must exist as a file, otherwise Docker assumes a directory mount and the run fails:

    touch output-file.srt
    # Absolute host paths are needed for bind mounts; a bare name is treated as a named volume.
    docker run -it --rm \
        -v "$(pwd)/source-file.sup:/input.sup" \
        -v "$(pwd)/output-file.srt:/output.srt" \
        -e LANGUAGE=eng \
        tentacule/pgstosrt

- Windows: no dependencies to install; the tesseract/leptonica libraries are included in the release package.
- Linux: libtesseract5 (`sudo apt install libtesseract5`, or whatever your distro requires)
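
On Linux, suitable values for `--libleptname` and `--libleptversion` depend on how the distribution names the leptonica library. As a rough sketch, the helper below scans `ldconfig -p` style output for a few candidate names and prints matching flags. The function name `detect_lept`, the candidate list, and the assumption that the library file is named `lib<name>.so.<version>` are illustrative and not taken from PgsToSrt itself.

```shell
#!/bin/sh
# Hypothetical helper: read `ldconfig -p` style output on stdin and print
# the first matching pair of leptonica flags.
# Assumption: the shared object is named lib<name>.so.<version>.
detect_lept() {
    listing=$(cat)
    for candidate in "lept 5" "leptonica 6" "leptonica 5" "lept 6"; do
        # Split "name version" into $1 and $2.
        set -- $candidate
        if printf '%s\n' "$listing" | grep -q "lib$1\.so\.$2"; then
            printf '%s\n' "--libleptname $1 --libleptversion $2"
            return 0
        fi
    done
    return 1   # none of the candidates was found
}
```

A typical call would be `ldconfig -p | detect_lept`; a non-zero exit status means none of the candidates was listed.
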
To build PgsToSrt.dll, execute the following commands in the src/ directory:

    dotnet restore
    dotnet publish -c Release -o out --framework net6.0
    # The file produced is PgsToSrt/out/PgsToSrt.dll

To build a Docker image for all languages:

    make build-all

To build a Docker image for a single language:

    make build-single LANGUAGE=eng # or any other Tesseract-available language code
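
To build images for a handful of languages without building all of them, the single-language target can be called in a loop. This is a minimal sketch, assuming `build-single` behaves as described above; the function name `build_languages` and the `MAKE_CMD` override are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: run the build-single target once per language code.
# MAKE_CMD is an assumed override (e.g. MAKE_CMD=echo for a dry run).
build_languages() {
    for lang in "$@"; do
        ${MAKE_CMD:-make} build-single LANGUAGE="$lang" || return 1
    done
}
```

For example, `build_languages eng fra` would build one image for English and one for French, stopping at the first failure.
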