
Development: Upstream Patch Series

A patch series that adds subtitle format conversion to FFmpeg, maintained against both master and the 8.1 release branch. It is currently in its sixth iteration (v0 through v5).

Approach

Prior art. Two earlier attempts stalled. Coza (2022) put libass in libavcodec, a dependency violation. softworkz (2021–2025) proposed a full subtitle filter infrastructure spanning 25 patches and 89 files; its design questions remain unresolved.

This series takes the minimal path. libass (rendering) and Tesseract (OCR) already live in libavfilter. Small utility APIs there, called from fftools in the same pattern as sub2video, handle both directions: text to bitmap and bitmap to text. No new filter infrastructure. No AVFrame changes.

Patches

Stable: pgs5-8.1 (23 patches). Latest: v5 (23 patches). Series A and B are independent of each other. C depends on B. D and E are independent additions. F builds on A and B.

A PGS Encoder + Quantization Infrastructure · 11 patches
  • PGS encoder with composition state machine, decoder model compliance, palette animation
  • Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
  • NeuQuant, Median Cut, and ELBG quantizers via av_quantize_* API
  • OkLab colour space, palette remapping and dithering extracted from vf_paletteuse
  • Region-weighted palette generation for overlapping subtitle events
  • Algorithm comparison with PSNR and timing data
B Text-to-Bitmap Conversion · 5 patches
  • libass rendering via internal ff_sub_render_* API in libavfilter
  • Subtitle bitmap utilities in libavutil/sub_util.{h,c}
  • Orchestration in fftools: ffmpeg_enc_sub.c (render, quantize, animate, coalesce)
  • Event lookahead window for overlapping subtitles with different durations
  • Per-packet DTS following the HDMV decoder timing model
C GIF Encoder · 1 patch
  • Direct RGBA-to-GIF encoding via av_quantize_* and ff_palette_map_apply
  • Configurable quantizer (Median Cut default) and dithering (Floyd-Steinberg default)
D OCR (Bitmap to Text) · 2 patches
  • Tesseract OCR via internal ff_sub_ocr_* API in libavfilter
  • Orchestration in fftools: ffmpeg_dec_sub.c (OCR, grayscale, dedup, positioning)
  • Bitmap deduplication for PGS fade sequences; PSM fallback for RTL scripts
E Pipeline Wiring + Muxer · 2 patches
  • Connects enc_sub and dec_sub to the ffmpeg_enc.c encoding pipeline, with clear Display Sets via AV_CODEC_PROP_EXPLICIT_END
  • Per-segment DTS in the SUP muxer, following the HDMV timing model
  • Adds the -sub_ocr_lang and -sub_ocr_datapath CLI options
F Tests + Documentation · 2 patches
  • 13 FATE tests covering encoder, quantization, animation, overlap, DTS, palette delta, OCR, and GIF
  • doc/encoders.texi (PGS), doc/ffmpeg.texi (OCR options), Changelog
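Of the three quantizers exposed through av_quantize_*, Median Cut is the easiest to show compactly. The sketch below is a textbook Median Cut, not the series' implementation, and it does not reproduce the av_quantize_* signatures. It assumes pixels are plain (r, g, b) tuples, ignores alpha, and returns the rounded mean colour of each box as the palette; the function name median_cut is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # assumed (r, g, b), 8 bits per channel, alpha ignored


def median_cut(pixels: List[Pixel], n_colours: int) -> List[Pixel]:
    """Textbook Median Cut: repeatedly split the box with the widest channel at its median."""
    boxes: List[List[Pixel]] = [list(pixels)] if pixels else []

    def widest(box: List[Pixel]) -> Tuple[int, int]:
        # (range, channel index) of the channel with the largest spread in this box
        spans = [(max(p[c] for p in box) - min(p[c] for p in box), c) for c in range(3)]
        return max(spans, key=lambda s: s[0])

    while len(boxes) < n_colours:
        candidates = [b for b in boxes if len(b) > 1]  # single-pixel boxes cannot be split
        if not candidates:
            break
        box = max(candidates, key=lambda b: widest(b)[0])
        spread, channel = widest(box)
        if spread == 0:
            break  # every splittable box already holds one distinct colour
        box.sort(key=lambda p: p[channel])
        mid = len(box) // 2
        # Replace the chosen box (by identity) with its two median halves.
        boxes = [b for b in boxes if b is not box] + [box[:mid], box[mid:]]

    # Palette entry = rounded mean colour of each box.
    return [tuple(round(sum(p[c] for p in b) / len(b)) for c in range(3)) for b in boxes]


if __name__ == "__main__":
    sample = [(0, 0, 0), (255, 0, 0), (0, 255, 0), (0, 0, 255)]
    print(sorted(median_cut(sample, 4)))  # [(0, 0, 0), (0, 0, 255), (0, 255, 0), (255, 0, 0)]
```

Splitting at the median pixel rather than the midpoint of the range gives each box roughly equal pixel counts, which is what distinguishes Median Cut from a plain uniform split.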

Quality

Palette quantization PSNR at 256 colours (the PGS maximum). NeuQuant is the default; Median Cut and ELBG are available via -quantize_method. See the full comparison for details.

Test case                     PSNR (algorithm)
Simple white text             99.99 dB (all)
Karaoke highlight             99.99 dB (all)
Multicolour (4 regions)       69.20 dB (Median Cut)
RGB gradient stress test      35.31 dB (ELBG)
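The PSNR figures use the standard definition for 8-bit samples, PSNR = 10·log10(255² / MSE). A minimal sketch, assuming the comparison runs over flat lists of 8-bit channel values; which channels the series actually compares (RGB, RGBA, or another layout) is not stated here, and the helper name psnr is hypothetical.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[int], test: Sequence[int], peak: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equal-length lists of 8-bit samples."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs: PSNR is unbounded
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # Every sample off by one gives MSE = 1, i.e. 20*log10(255), about 48.13 dB.
    print(round(psnr([100] * 4, [101] * 4), 2))
```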

OCR details

Tested across 114 languages with a UDHR Article 1 roundtrip (text → PGS → text): 105 pass and 9 fail, owing to Tesseract training data limitations. Bitmap deduplication skips OCR on palette-only changes (PGS fade sequences). A page segmentation mode (PSM) fallback handles RTL and complex scripts. Per-language preserve_interword_spaces tuning prevents word merging in Arabic, Hebrew, and Persian.
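A PGS fade rewrites only the palette, so the indexed bitmap stays byte-identical across the Display Sets of the fade. A minimal sketch of deduplication under that assumption: cache OCR results keyed on the bitmap's dimensions and index bytes, deliberately leaving the palette out of the key. The class OcrCache and its ocr callback are hypothetical stand-ins, not the ff_sub_ocr_* API.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical OCR backend signature: (width, height, indices, palette) -> text
OcrFn = Callable[[int, int, bytes, bytes], str]


class OcrCache:
    """Skip OCR when only the palette changed: key on geometry + palette indices."""

    def __init__(self, ocr: OcrFn) -> None:
        self._ocr = ocr
        self._seen: Dict[Tuple[int, int, str], str] = {}

    def recognise(self, width: int, height: int, indices: bytes, palette: bytes) -> str:
        # The palette is excluded from the key, so fade steps hit the cache.
        key = (width, height, hashlib.sha256(indices).hexdigest())
        if key not in self._seen:
            self._seen[key] = self._ocr(width, height, indices, palette)
        return self._seen[key]


if __name__ == "__main__":
    calls = []

    def fake_ocr(w: int, h: int, idx: bytes, pal: bytes) -> str:
        calls.append(pal)
        return "Hello"

    cache = OcrCache(fake_ocr)
    bitmap = bytes([0, 1, 1, 0])
    for alpha in (64, 128, 255):  # three fade steps: same indices, new palette
        cache.recognise(2, 2, bitmap, bytes([0, 0, 0, alpha]))
    print(len(calls))  # 1
```

One caveat of keying on indices alone: if a fade starts fully transparent, whatever the first OCR pass returns is reused for the remaining steps.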

Tests

13 FATE tests run on every push via CI. 12 are self-contained; sub-pgs requires the fate-suite samples.

PGS encoder

sub-pgs
sub-pgs-overlap
api-pgs-fade
api-pgs-palette-delta
api-pgs-dts
api-pgs-overlap-verify

Animation & pipeline

api-pgs-animation-util
api-pgs-animation-timing
api-pgs-coalesce
api-pgs-rectsplit

Quantization & GIF

quantize
gifenc-rgba

OCR

sub-ocr-roundtrip

Full OCR language coverage (114 languages, 92% pass rate) is listed on the OCR languages page.

Known upstream failures (not our patches)

sws-unscaled

Also fails on unpatched FFmpeg 8.1: a pre-existing libswscale issue.

Spec Compliance

Decoder model grounded in the Panasonic and Sony HDMV patents (US20090185789A1, US8638861B2, US7620297B2). Buffer sizes, transfer rates, and palette limits taken from the patents, not reverse-engineered from player behaviour.

SUPer by cubicibo used as a hardware-validated reference.

Series Evolution

Each iteration informed by testing, review, and upstream FFmpeg conventions. History preserved as git tags.

v0 Early experiments
  • Rapidly evolving, ported from PunkGraphicStream (Java/JNI)
  • PGS encoder, NeuQuant quantization, early text-to-bitmap work
  • Established the patent-grounded specification (docs/pgs-specification.md)
v1 Feature-complete series · 21 patches
  • First complete series: encoder, quantization API, text-to-bitmap, animation, GIF, OCR
  • Originally on a master snapshot, then rebased to FFmpeg 8.0.1
  • Released as n8.0.1-pgs1.0
v2 Consolidation · 17 patches
  • Consolidated from 21 to 17 patches
  • PGS decoder model compliance, GIF RGBA palette fix
v3 Structural refinement · 14 patches
  • Library/orchestration split: ff_sub_render_* / ff_sub_ocr_* in libavfilter, pipeline in fftools
  • ABI-safe enum ordering, FF-prefixed structs, no #include .c pattern
  • Released as n8.0.1-pgs3.0
v4 Expert review · 18 patches
  • 25 findings from Claude-driven review across 5 simulated perspectives (architecture, API, security, subtitle domain, process)
  • Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END, avpriv_ cross-library prefixes
  • Subtitle utilities moved to libavutil/sub_util.{h,c}, internal structs to palettemap_internal.h
  • Documentation: doc/encoders.texi (PGS), doc/ffmpeg.texi (OCR options), Changelog
  • Every patch verified to compile independently
v5 Encoder optimisations · 23 patches
  • Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
  • Per-segment DTS in the SUP muxer and per-packet DTS in fftools, following the HDMV timing model
  • Event lookahead window: overlapping subtitles with different durations produce correct Display Set sequence
  • Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END and av_subtitle_needs_clear()
  • Released as n8.1-pgs5.0
  • 13 FATE tests

v5 Data

Measured against real PGS output.

Metric                                Value
Palette delta (1 entry changed)       42 → 7 bytes (83% reduction)
Palette delta (2 entries changed)     42 → 12 bytes (71% reduction)
DTS decode window (1920×1080)         64.8 ms
DTS decode window (1280×720)          28.8 ms
DTS decode window (720×480)           10.8 ms
Clear Display Set                     17 bytes
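These byte counts are consistent with the HDMV segment layouts. A PDS payload is a palette ID and a version byte followed by 5 bytes per entry (entry ID, Y, Cr, Cb, alpha). On that reading the palette figures are payload sizes and the 42-byte baseline is an 8-entry palette. The 17-byte Clear Display Set matches a PCS with zero composition objects (11-byte payload) plus an empty END segment, counting each segment's 3-byte type/length header but not the 10-byte "PG"/PTS/DTS prefix that SUP files add. A minimal sketch of delta selection and that arithmetic, assuming the decoder keeps entries omitted from a delta PDS; all names are hypothetical.

```python
from typing import Dict, Tuple

Entry = Tuple[int, int, int, int]  # (Y, Cr, Cb, alpha)

SEGMENT_HEADER = 3  # segment type (1) + segment length (2); SUP "PG" prefix excluded
PDS_FIXED = 2       # palette ID (1) + palette version (1)
PDS_ENTRY = 5       # entry ID, Y, Cr, Cb, alpha
PCS_FIXED = 11      # width, height, frame rate, composition no., state, update flag, palette ID, object count


def changed_entries(old: Dict[int, Entry], new: Dict[int, Entry]) -> Dict[int, Entry]:
    """Entries a delta PDS must carry: new, or different from the previous palette."""
    return {i: e for i, e in new.items() if old.get(i) != e}


def pds_payload_size(n_entries: int) -> int:
    return PDS_FIXED + PDS_ENTRY * n_entries


def clear_display_set_size() -> int:
    # PCS with zero composition objects, then an END segment with no payload.
    return (SEGMENT_HEADER + PCS_FIXED) + (SEGMENT_HEADER + 0)


if __name__ == "__main__":
    base = {i: (16, 128, 128, 255) for i in range(8)}
    fade = {**base, 3: (16, 128, 128, 128)}  # one entry's alpha changes
    delta = changed_entries(base, fade)
    full, partial = pds_payload_size(len(fade)), pds_payload_size(len(delta))
    print(full, partial, f"{1 - partial / full:.0%}")  # 42 7 83%
    print(clear_display_set_size())  # 17
```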
Overlapping events: A 1s-5s, B 3s-7s

DS 1  T=1.0s  Epoch Start  A alone
DS 2  T=3.0s  Epoch Start  A+B composite
DS 3  T=5.0s  Epoch Start  B alone (A expired)
DS 4  T=7.0s  Normal       clear (B expired)
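The sequence above follows from splitting the timeline at every event start and end, then compositing whatever is active in each interval. A minimal sketch of that boundary sweep, assuming events are (name, start, end) tuples with an exclusive end. It copies the table's labelling (Epoch Start whenever something is shown, Normal for the final clear), but it is not the series' lookahead-window code, and display_sets is a hypothetical name.

```python
from typing import List, Tuple

Event = Tuple[str, float, float]        # (name, start, end), end exclusive
DisplaySet = Tuple[float, str, List[str]]  # (time, composition state, visible events)


def display_sets(events: List[Event]) -> List[DisplaySet]:
    """Emit one Display Set at each boundary where the visible set of events changes."""
    boundaries = sorted({t for _, start, end in events for t in (start, end)})
    result: List[DisplaySet] = []
    previous: List[str] = []
    for t in boundaries:
        active = [name for name, start, end in events if start <= t < end]
        if active == previous:
            continue  # nothing visible changed at this boundary
        state = "Epoch Start" if active else "Normal"  # mirrors the table's labels
        result.append((t, state, active))
        previous = active
    return result


if __name__ == "__main__":
    sets = display_sets([("A", 1.0, 5.0), ("B", 3.0, 7.0)])
    for i, (t, state, shown) in enumerate(sets, 1):
        print(f"DS {i}  T={t}s  {state:<11}  {'+'.join(shown) or 'clear'}")
```

Because every boundary is known before the first Display Set is emitted, B's later end time can be accounted for while A is still on screen; a streaming encoder needs a lookahead window for the same reason.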

Links