
Development: Upstream Patch Series

A patch series adding subtitle format conversion to FFmpeg, maintained against master and the 8.1 release branch. Currently in its seventh iteration.

Approach

Prior art. Two earlier attempts stalled. Coza (2022) put libass in libavcodec, a dependency violation. softworkz (2021–2025) proposed a full subtitle filter infrastructure across 25 patches and 89 files; its design questions remain unresolved.

This series takes the minimal path. libass (rendering) and Tesseract (OCR) already live in libavfilter. Small utility APIs there, called from fftools in the same pattern as sub2video, handle both directions. No filter infrastructure. No AVFrame changes.

Patches

Stable: pgs7-8.1 (29 patches). Latest: pgs7 (29 patches on master). Plus a standalone DVB fix (1 patch). Structured as independent submission groups for upstream review.

PGS Encoder · 11 patches
  • PGS encoder with composition state machine, decoder model compliance, palette animation
  • Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
  • Per-segment DTS in SUP muxer, per-packet DTS in fftools per HDMV timing model
  • force_all option for forced subtitles; max_cdb_usage CDB rate control
  • Forced disposition bridge: automatic AV_DISPOSITION_FORCED ↔ per-rect flag propagation
  • -forced_subs_filter CLI option to split streams by forced flag
  • 9 FATE tests (encoder, palette, DTS, overlap, forced, rate control)
Colour Quantization · 10 patches
  • NeuQuant, Median Cut, and ELBG quantizers via av_quantize_* API
  • OkLab colour space, palette remapping and dithering extracted from vf_paletteuse
  • Region-weighted palette generation for overlapping subtitle events
  • GIF encoder: direct RGBA-to-GIF via built-in quantization and dithering
  • Algorithm comparison with PSNR and timing data
Text-to-Bitmap Conversion · 6 patches
  • libass rendering utility in libavfilter; subtitle bitmap utilities in libavutil
  • Orchestration in fftools: ffmpeg_enc_sub.c (render, quantize, animate, coalesce)
  • Event lookahead window for overlapping subtitles with different durations
  • forced_style option: ASS style names → forced flag (comma-separated, opt-in)
OCR (Bitmap to Text) · 2 patches
  • Tesseract OCR in libavfilter; orchestration in fftools (ffmpeg_dec_sub.c)
  • Bitmap deduplication for PGS fade sequences; PSM fallback for RTL scripts
  • -sub_ocr_lang, -sub_ocr_datapath CLI options

Quality

Palette quantization PSNR at 256 colours (the PGS maximum). NeuQuant is the default; Median Cut and ELBG are available via -quantize_method. Each row lists the best result and the quantizer that achieved it. Full comparison.

Simple white text         99.99 dB (all)
Karaoke highlight         99.99 dB (all)
Multicolour (4 regions)   69.20 dB (Median Cut)
RGB gradient stress test  35.31 dB (ELBG)
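For readers unfamiliar with the metric, the following is a minimal sketch of how a PSNR figure like those in the table can be computed between an original RGBA bitmap and its quantized version. It is not the series' measurement code. The function name psnr and the 99.99 dB cap for identical images are assumptions; the cap is only a guess at why lossless cases report 99.99 dB rather than infinity.

```python
import math

def psnr(original, quantized, peak=255, cap=99.99):
    """PSNR in dB between two equal-length sequences of RGBA tuples.

    The cap (an assumption, not taken from the series) stands in for the
    infinite PSNR of identical images.
    """
    if len(original) != len(quantized):
        raise ValueError("bitmaps must have the same number of pixels")
    squared_error = 0
    samples = 0
    for px_a, px_b in zip(original, quantized):
        for ch_a, ch_b in zip(px_a, px_b):
            squared_error += (ch_a - ch_b) ** 2
            samples += 1
    if squared_error == 0:
        return cap  # identical images: report the cap instead of infinity
    mse = squared_error / samples
    return min(cap, 10 * math.log10(peak * peak / mse))
```

Under this definition, white text on a transparent background uses far fewer than 256 distinct colours, so any quantizer reproduces it exactly and hits the cap. A continuous gradient cannot fit in 256 palette entries, which is why the stress test scores lowest.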

OCR details

Tested across 114 languages with UDHR Article 1 roundtrip (text → PGS → text). 105 pass, 9 fail (Tesseract training data limitations). Bitmap deduplication skips OCR on palette-only changes (PGS fade sequences). PSM fallback handles RTL and complex scripts. Per-language preserve_interword_spaces tuning prevents word merging in Arabic, Hebrew, and Persian.
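The deduplication idea can be sketched as follows: key each bitmap on its dimensions and palette indices only, so that a frame differing from its predecessor only in palette (as in a PGS fade) reuses the previous OCR result. This is an illustration under assumptions, not the code in ffmpeg_dec_sub.c. The class name DedupOcr and the ocr_fn callback signature are hypothetical.

```python
import hashlib

class DedupOcr:
    """Skip OCR when only the palette changed since the previous bitmap."""

    def __init__(self, ocr_fn):
        # ocr_fn(width, height, indices, palette) -> str is a hypothetical
        # callback standing in for the real Tesseract call.
        self._ocr_fn = ocr_fn
        self._last_key = None
        self._last_text = None

    def recognize(self, width, height, indices, palette):
        # The key covers the size and index bitmap but deliberately not
        # the palette, so fades hash to the same value.
        key = hashlib.sha256(
            width.to_bytes(4, "big") + height.to_bytes(4, "big") + bytes(indices)
        ).digest()
        if key != self._last_key:
            self._last_text = self._ocr_fn(width, height, indices, palette)
            self._last_key = key
        return self._last_text
```

A fade of N steps then costs one OCR call instead of N. The sketch compares only against the immediately preceding bitmap, which is enough for consecutive fade frames.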

Tests

18 FATE tests run on every push via CI. 17 are self-contained; sub-pgs uses fate-suite samples. See FFmpeg FATE.

PGS encoder

sub-pgs
sub-pgs-overlap
api-pgs-fade
api-pgs-palette-delta
api-pgs-palette-reuse
api-pgs-multi-object
api-pgs-dts
api-pgs-ap-interval
api-pgs-overlap-verify

Forced subtitles & rate control

api-pgs-forced
api-pgs-rate-control

Animation & pipeline

api-pgs-animation-util
api-pgs-animation-timing
api-pgs-coalesce
api-pgs-rectsplit

Quantization & GIF

quantize
gifenc-rgba

OCR

sub-ocr-roundtrip

Full OCR language coverage (114 languages, 92% pass rate) is on the OCR languages page.

Known upstream failures (not our patches)

sws-unscaled

Fails on clean FFmpeg 8.1 without our patches. Pre-existing libswscale issue.

Spec Compliance

Decoder model grounded in the Panasonic and Sony HDMV patents (US20090185789A1, US8638861B2, US7620297B2). Buffer sizes, transfer rates, and palette limits taken from the patents, not reverse-engineered from player behaviour.
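As a worked example of the transfer-rate arithmetic, the DTS decode window figures reported under Data (64.8 ms, 28.8 ms, 10.8 ms) are consistent with filling one byte per pixel of the graphics plane at 32,000,000 bytes per second (256 Mbit/s), expressed on a 90 kHz timestamp clock. The rate is inferred from those figures, not quoted from the series, and the function names are hypothetical.

```python
PLANE_FILL_RATE = 32_000_000  # bytes per second; inferred, see lead-in
PTS_CLOCK_HZ = 90_000         # MPEG timestamp clock

def decode_window_ticks(width, height):
    """Time to fill a width x height plane, in 90 kHz ticks (rounded up)."""
    plane_bytes = width * height  # one palette index byte per pixel
    return -(-plane_bytes * PTS_CLOCK_HZ // PLANE_FILL_RATE)

def decode_window_ms(width, height):
    """Same window expressed in milliseconds."""
    return decode_window_ticks(width, height) * 1000 / PTS_CLOCK_HZ
```

Under these assumptions a 1920×1080 plane needs 5832 ticks, which is the 64.8 ms lead that DTS would have to take ahead of PTS.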

SUPer by cubicibo used as a hardware-validated reference.

Series Evolution

Each iteration informed by testing, review, and upstream FFmpeg conventions. History preserved as git tags.

v0 Early experiments
  • Rapidly evolving, ported from PunkGraphicStream (Java/JNI)
  • PGS encoder, NeuQuant quantization, early text-to-bitmap work
  • Established the patent-grounded specification (docs/pgs-specification.md)
v1 Feature-complete series · 21 patches
  • First complete series: encoder, quantization API, text-to-bitmap, animation, GIF, OCR
  • Originally on a master snapshot, then rebased to FFmpeg 8.0.1
  • Released as n8.0.1-pgs1.0
v2 Consolidation · 17 patches
  • Consolidated from 21 to 17 patches
  • PGS decoder model compliance, GIF RGBA palette fix
v3 Structural refinement · 14 patches
  • Library/orchestration split: ff_sub_render_* / ff_sub_ocr_* in libavfilter, pipeline in fftools
  • ABI-safe enum ordering, FF-prefixed structs, no #include .c pattern
  • Released as n8.0.1-pgs3.0
v4 Expert review · 18 patches
  • 25 findings from Claude-driven review across 5 simulated perspectives (architecture, API, security, subtitle domain, process)
  • Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END, avpriv_ cross-library prefixes
  • Subtitle utilities moved to libavutil/sub_util.{h,c}, internal structs to palettemap_internal.h
  • Documentation: doc/encoders.texi (PGS), doc/ffmpeg.texi (OCR options), Changelog
  • Every patch verified to compile independently
v5 Encoder optimisations · 23 patches
  • Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
  • Per-segment DTS in SUP muxer and per-packet DTS in fftools per HDMV timing model
  • Event lookahead window: overlapping subtitles with different durations produce correct Display Set sequence
  • Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END and av_subtitle_needs_clear()
  • Released as n8.1-pgs5.0
  • 13 FATE tests
v6 Forced subtitle pipeline · 32 patches
  • force_all option, disposition bridge (AV_DISPOSITION_FORCED ↔ per-rect flags)
  • DVB forced types 0x30–0x35 in MPEG-TS demuxer and muxer
  • -forced_subs_filter CLI option: split streams by forced/non-forced
  • forced_style: ASS style names → forced flag (comma-separated, opt-in)
  • CDB rate control: max_cdb_usage drops events exceeding HDMV buffer model
  • 5 new FATE tests (18 total), shared pgs-test-util.h
v7 Upstream restructuring · 30 patches
  • Restructured into 4 independent submission groups for upstream review
  • Encoder group merges without quantization API or text-to-bitmap infrastructure
  • Rebased onto FFmpeg master and 8.1
  • Released as n8.1-pgs7.0

Data

Measured against real PGS output.

Palette delta (1 entry changed)    42 → 7 bytes (83%)
Palette delta (2 entries changed)  42 → 12 bytes (71%)
DTS decode window (1920×1080)      64.8 ms
DTS decode window (1280×720)       28.8 ms
DTS decode window (720×480)        10.8 ms
Clear Display Set                  17 bytes
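The palette delta rows can be reproduced with a short sketch. It assumes the usual PDS payload layout (one palette id byte, one version byte, then five bytes per entry: index, Y, Cr, Cb, alpha) and that the 42-byte baseline is an 8-entry palette, which is what that layout implies. The helper names are hypothetical and this is not the encoder's code.

```python
def pds_payload(palette_id, version, entries):
    """Serialize a PDS payload from {index: (y, cr, cb, alpha)}."""
    out = bytearray([palette_id, version])
    for index in sorted(entries):
        y, cr, cb, alpha = entries[index]
        out += bytes([index, y, cr, cb, alpha])
    return bytes(out)

def changed_entries(old, new):
    """Keep only the entries whose colour differs from the old palette."""
    return {i: colour for i, colour in new.items() if old.get(i) != colour}

def reduction_percent(full_size, delta_size):
    """Size reduction of the delta relative to the full payload."""
    return round(100 * (full_size - delta_size) / full_size)
```

With an 8-entry palette, the full payload is 2 + 8 × 5 = 42 bytes. Changing one entry gives 2 + 5 = 7 bytes, and changing two gives 2 + 10 = 12 bytes, matching the 83% and 71% figures.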

Overlapping events: A 1s-5s, B 3s-7s

DS 1  T=1.0s  Epoch Start  A alone
DS 2  T=3.0s  Epoch Start  A+B composite
DS 3  T=5.0s  Epoch Start  B alone (A expired)
DS 4  T=7.0s  Normal       clear (B expired)
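The sequence above can be derived mechanically: every event start or end is a change point, and each change point emits a Display Set containing whatever is active at that instant. The sketch below illustrates that idea only. The function name is hypothetical, and the rule of marking non-empty composites as Epoch Start and the final clear as Normal is read off the table rather than taken from the encoder.

```python
def display_sets(events):
    """Map (name, start, end) events to (time, type, active names) tuples."""
    # Every start or end time is a point where the composite changes.
    change_points = sorted({t for _, start, end in events for t in (start, end)})
    result = []
    for t in change_points:
        # An event is active on the half-open interval [start, end).
        active = sorted(name for name, start, end in events if start <= t < end)
        kind = "Epoch Start" if active else "Normal"
        result.append((t, kind, active))
    return result
```

Calling display_sets([("A", 1.0, 5.0), ("B", 3.0, 7.0)]) yields four entries at 1.0, 3.0, 5.0 and 7.0 seconds with active sets A, A+B, B, and empty, in line with DS 1 through DS 4.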

Links