Development: Upstream Patch Series

A patch series adding subtitle format conversion to FFmpeg, maintained against master and the 8.1 release branch. Currently in its seventh iteration.

Approach

Prior art. Two attempts stalled. Coza (2022) put libass in libavcodec — a dependency violation. softworkz (2021–2025) proposed full subtitle filter infrastructure across 25 patches and 89 files; design questions remain unresolved.

This series takes the minimal path. libass (rendering) and Tesseract (OCR) already live in libavfilter. Small utility APIs there, called from fftools in the same pattern as sub2video, handle both directions. No filter infrastructure. No AVFrame changes.

Patches

Stable: pgs7-8.1 (29 patches). Latest: pgs7 (29 patches on master). Plus a standalone DVB fix (1 patch). Structured as independent submission groups for upstream review.

PGS Encoder · 11 patches

PGS encoder with composition state machine, decoder model compliance, palette animation
Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
Per-segment DTS in SUP muxer, per-packet DTS in fftools per HDMV timing model
force_all option for forced subtitles; max_cdb_usage CDB rate control
Forced disposition bridge: automatic AV_DISPOSITION_FORCED ↔ per-rect flag propagation
-forced_subs_filter CLI option to split streams by forced flag
9 FATE tests (encoder, palette, DTS, overlap, forced, rate control)
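
The palette delta figures can be checked with a little arithmetic. The sketch below assumes a PDS payload of a 2-byte header (palette id and version) followed by 5 bytes per entry (index, Y, Cr, Cb, alpha). Under that assumption, the 42-byte full palette quoted in the Data section corresponds to 8 entries. The function names are hypothetical and this is not the encoder's code.

```python
# Illustrative only: payload layout and the 8-entry full palette are assumptions
# chosen because they reproduce the 42 -> 7 and 42 -> 12 byte figures.
PDS_HEADER_BYTES = 2   # palette id + palette version
PDS_ENTRY_BYTES = 5    # entry index, Y, Cr, Cb, alpha


def pds_payload_size(num_entries):
    """Size in bytes of a PDS payload carrying num_entries palette entries."""
    return PDS_HEADER_BYTES + PDS_ENTRY_BYTES * num_entries


def changed_entries(prev, curr):
    """Return only the entries of curr that differ from prev.

    Both palettes are dicts mapping entry index -> (Y, Cr, Cb, alpha).
    """
    return {i: e for i, e in curr.items() if prev.get(i) != e}


def saving_percent(full_entries, delta_entries):
    """Rounded percentage saved by writing delta_entries instead of full_entries."""
    full = pds_payload_size(full_entries)
    delta = pds_payload_size(delta_entries)
    return round(100 * (full - delta) / full)


# A single-entry fade: only the alpha of entry 1 changes between frames.
prev = {0: (16, 128, 128, 0), 1: (235, 128, 128, 255)}
curr = {0: (16, 128, 128, 0), 1: (235, 128, 128, 128)}
delta = changed_entries(prev, curr)
print(len(delta), pds_payload_size(len(delta)))
```

For a fade that touches one entry, only that entry is written, so the payload shrinks from 42 to 7 bytes under these assumptions.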

Colour Quantization · 10 patches

NeuQuant, Median Cut, and ELBG quantizers via av_quantize_* API
OkLab colour space, palette remapping and dithering extracted from vf_paletteuse
Region-weighted palette generation for overlapping subtitle events
GIF encoder: direct RGBA-to-GIF via built-in quantization and dithering
Algorithm comparison with PSNR and timing data

Text-to-Bitmap Conversion · 6 patches

libass rendering utility in libavfilter; subtitle bitmap utilities in libavutil
Orchestration in fftools: ffmpeg_enc_sub.c (render, quantize, animate, coalesce)
Event lookahead window for overlapping subtitles with different durations
forced_style option: ASS style names → forced flag (comma-separated, opt-in)
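
As a rough illustration of the forced_style mapping, the sketch below parses a comma-separated list of ASS style names and flags events whose style is in the list. The helper names are hypothetical. Trimming whitespace and matching case-sensitively are assumptions, not documented behaviour of the option.

```python
def parse_forced_styles(option):
    """Split a comma-separated option value into a set of style names.

    Assumption: surrounding whitespace is ignored and empty items are dropped.
    """
    return {name.strip() for name in option.split(",") if name.strip()}


def is_forced(event_style, forced_styles):
    """True when the event's ASS style name is listed (exact match assumed)."""
    return event_style in forced_styles


styles = parse_forced_styles("Signs, Foreign")
print(is_forced("Signs", styles), is_forced("Default", styles))
```

Because the option is opt-in, an empty value yields an empty set and no event is marked forced.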

OCR (Bitmap to Text) · 2 patches

Tesseract OCR in libavfilter; orchestration in fftools (ffmpeg_dec_sub.c)
Bitmap deduplication for PGS fade sequences; PSM fallback for RTL scripts
-sub_ocr_lang, -sub_ocr_datapath CLI options

Quality

Palette quantization PSNR at 256 colours (PGS maximum). NeuQuant is the default; Median Cut and ELBG are available via -quantize_method. The full comparison is on the Quantizer comparison page.

Simple white text: 99.99 dB (all)
Karaoke highlight: 99.99 dB (all)
Multicolour (4 regions): 69.20 dB (Median Cut)
RGB gradient stress test: 35.31 dB (ELBG)
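
For readers unfamiliar with the metric, here is a minimal PSNR sketch over flat lists of 8-bit samples. The 99.99 dB entries above suggest a cap for results with no measurable error. That cap, the 8-bit peak value, and averaging over all samples equally are assumptions made for illustration, not a description of how the series measures quality.

```python
import math

PSNR_CAP_DB = 99.99  # assumed ceiling reported when the images are identical


def psnr(original, quantized, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    if not original or len(original) != len(quantized):
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, quantized)) / len(original)
    if mse == 0:
        return PSNR_CAP_DB
    return min(PSNR_CAP_DB, 10 * math.log10(peak * peak / mse))


print(psnr([10, 20, 30, 40], [10, 20, 30, 40]))   # identical: capped value
print(round(psnr([10, 20, 30, 40], [11, 21, 31, 41]), 2))
```

Content with fewer distinct colours than the palette holds quantizes without loss, which would explain why simple text hits the cap while a gradient does not.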

OCR details

Tested across 114 languages with UDHR Article 1 roundtrip (text → PGS → text). 105 pass, 9 fail (Tesseract training data limitations). Bitmap deduplication skips OCR on palette-only changes (PGS fade sequences). PSM fallback handles RTL and complex scripts. Per-language preserve_interword_spaces tuning prevents word merging in Arabic, Hebrew, and Persian.
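
The deduplication idea can be sketched as follows: hash the indexed bitmap without its palette, and reuse the previous OCR result when the hash is unchanged. A fade that only rewrites palette entries then triggers one OCR call instead of one per frame. The function names and the frame tuple layout are hypothetical, and run_ocr stands in for whatever calls Tesseract.

```python
import hashlib


def bitmap_key(width, height, indices):
    """Digest of the bitmap geometry and palette indices; the palette is excluded."""
    h = hashlib.sha256()
    h.update(width.to_bytes(4, "big"))
    h.update(height.to_bytes(4, "big"))
    h.update(bytes(indices))
    return h.digest()


def ocr_with_dedup(frames, run_ocr):
    """Run OCR once per distinct consecutive bitmap.

    frames: iterable of (width, height, indices, palette).
    run_ocr: callable(width, height, indices, palette) -> str (placeholder).
    """
    results = []
    last_key = None
    last_text = None
    for width, height, indices, palette in frames:
        key = bitmap_key(width, height, indices)
        if key != last_key:
            # Pixels changed, so the text may have changed: run OCR again.
            last_text = run_ocr(width, height, indices, palette)
            last_key = key
        results.append(last_text)
    return results


calls = []


def fake_ocr(width, height, indices, palette):
    """Stub recogniser that records how often it is invoked."""
    calls.append(indices)
    return "hello" if indices == b"\x00\x01\x01\x00" else "world"


fade = [
    (2, 2, b"\x00\x01\x01\x00", [0, 64]),    # fade-in, step 1
    (2, 2, b"\x00\x01\x01\x00", [0, 128]),   # same pixels, new palette
    (2, 2, b"\x00\x01\x01\x00", [0, 255]),   # same pixels, new palette
    (2, 2, b"\x01\x01\x00\x00", [0, 255]),   # different bitmap
]
texts = ocr_with_dedup(fade, fake_ocr)
print(texts, len(calls))
```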

Tests

18 FATE tests run on every push via CI. 17 are self-contained; sub-pgs uses fate-suite samples.

PGS encoder

✓ sub-pgs
✓ sub-pgs-overlap
✓ api-pgs-fade
✓ api-pgs-palette-delta
✓ api-pgs-palette-reuse
✓ api-pgs-multi-object
✓ api-pgs-dts
✓ api-pgs-ap-interval
✓ api-pgs-overlap-verify

Forced subtitles & rate control

✓ api-pgs-forced
✓ api-pgs-rate-control

Animation & pipeline

✓ api-pgs-animation-util
✓ api-pgs-animation-timing
✓ api-pgs-coalesce
✓ api-pgs-rectsplit

Quantization & GIF

✓ quantize
✓ gifenc-rgba

OCR

✓ sub-ocr-roundtrip

Full OCR language coverage (114 languages, 92% pass rate) is listed on the OCR languages page.

Known upstream failures (not our patches)

✗ sws-unscaled

Fails on clean FFmpeg 8.1 without our patches. Pre-existing libswscale issue.

Spec Compliance

Decoder model grounded in the Panasonic and Sony HDMV patents (US20090185789A1, US8638861B2, US7620297B2). Buffer sizes, transfer rates, and palette limits taken from the patents, not reverse-engineered from player behaviour.

SUPer by cubicibo used as a hardware-validated reference.

Series Evolution

Each iteration informed by testing, review, and upstream FFmpeg conventions. History preserved as git tags.

v0 Early experiments

Rapidly evolving, ported from PunkGraphicStream (Java/JNI)
PGS encoder, NeuQuant quantization, early text-to-bitmap work
Established the patent-grounded specification (docs/pgs-specification.md)

v1 Feature-complete series · 21 patches

First complete series: encoder, quantization API, text-to-bitmap, animation, GIF, OCR
Originally on a master snapshot, then rebased to FFmpeg 8.0.1
Released as n8.0.1-pgs1.0

v2 Consolidation · 17 patches

Consolidated from 21 to 17 patches
PGS decoder model compliance, GIF RGBA palette fix

v3 Structural refinement · 14 patches

Library/orchestration split: ff_sub_render_* / ff_sub_ocr_* in libavfilter, pipeline in fftools
ABI-safe enum ordering, FF-prefixed structs, no #include .c pattern
Released as n8.0.1-pgs3.0

v4 Expert review · 18 patches

25 findings from Claude-driven review across 5 simulated perspectives (architecture, API, security, subtitle domain, process)
Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END, avpriv_ cross-library prefixes
Subtitle utilities moved to libavutil/sub_util.{h,c}, internal structs to palettemap_internal.h
Documentation: doc/encoders.texi (PGS), doc/ffmpeg.texi (OCR options), Changelog
Every patch verified to compile independently

v5 Encoder optimisations · 23 patches

Palette delta encoding: only changed PDS entries written (83% reduction for single-entry fades)
Per-segment DTS in SUP muxer and per-packet DTS in fftools per HDMV timing model
Event lookahead window: overlapping subtitles with different durations produce correct Display Set sequence
Clear Display Sets via AV_CODEC_PROP_EXPLICIT_END and av_subtitle_needs_clear()
Released as n8.1-pgs5.0
13 FATE tests

v6 Forced subtitle pipeline · 32 patches

force_all option, disposition bridge (AV_DISPOSITION_FORCED ↔ per-rect flags)
DVB forced types 0x30–0x35 in MPEG-TS demuxer and muxer
-forced_subs_filter CLI option: split streams by forced/non-forced
forced_style: ASS style names → forced flag (comma-separated, opt-in)
CDB rate control: max_cdb_usage drops events exceeding HDMV buffer model
5 new FATE tests (18 total), shared pgs-test-util.h

v7 Upstream restructuring · 30 patches

Restructured into 4 independent submission groups for upstream review
Encoder group merges without quantization API or text-to-bitmap infrastructure
Rebased onto FFmpeg master and 8.1
Released as n8.1-pgs7.0

Data

Measured against real PGS output.

Palette delta (1 entry changed): 42 → 7 bytes (83%)
Palette delta (2 entries changed): 42 → 12 bytes (71%)
DTS decode window (1920×1080): 64.8 ms
DTS decode window (1280×720): 28.8 ms
DTS decode window (720×480): 10.8 ms
Clear Display Set: 17 bytes
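
The three decode window figures are consistent with dividing the plane area by a rate of 32,000,000 pixels per second. That rate is inferred from the numbers above rather than quoted from the series, so treat the sketch as a sanity check. The tick conversion assumes the 90 kHz clock used for PGS timestamps, and rounding up is an assumption.

```python
PLANE_RATE = 32_000_000  # assumed pixels per second; reproduces the table above
CLOCK_HZ = 90_000        # PGS timestamps count 90 kHz ticks


def decode_window_ms(width, height, rate=PLANE_RATE):
    """Time in milliseconds to process a full plane of width x height pixels."""
    return width * height * 1000 / rate


def decode_window_ticks(width, height, rate=PLANE_RATE):
    """The same window in 90 kHz ticks, rounded up (rounding is an assumption)."""
    pixels = width * height
    return (pixels * CLOCK_HZ + rate - 1) // rate


for w, h in [(1920, 1080), (1280, 720), (720, 480)]:
    print(w, h, round(decode_window_ms(w, h), 1), decode_window_ticks(w, h))
```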

Overlapping events: A 1s-5s, B 3s-7s

DS 1  T=1.0s  Epoch Start  A alone
DS 2  T=3.0s  Epoch Start  A+B composite
DS 3  T=5.0s  Epoch Start  B alone (A expired)
DS 4  T=7.0s  Normal       clear (B expired)
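
The sequence above can be reproduced by emitting one Display Set at every event boundary and compositing whichever events are active at that instant. The sketch below is a simplified model. The rule that any visible composite is an Epoch Start and an empty one is a Normal clear is read off the example, not taken from the encoder, and the function name is hypothetical.

```python
def display_set_timeline(events):
    """Return (time, kind, active_names) for every start/end boundary.

    events: list of (name, start_seconds, end_seconds).
    An event is active on the half-open interval [start, end).
    """
    boundaries = sorted({t for _, start, end in events for t in (start, end)})
    timeline = []
    for t in boundaries:
        active = [name for name, start, end in events if start <= t < end]
        # Assumed rule, matching the example: visible content starts an epoch,
        # an empty composite is written as a Normal (clear) Display Set.
        kind = "Epoch Start" if active else "Normal"
        timeline.append((t, kind, active))
    return timeline


timeline = display_set_timeline([("A", 1.0, 5.0), ("B", 3.0, 7.0)])
for number, (t, kind, active) in enumerate(timeline, start=1):
    label = "+".join(active) if active else "clear"
    print(f"DS {number}  T={t}s  {kind}  {label}")
```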

Links

Patches (v7, latest) · Patches (8.1 stable) · Quantizer comparison · OCR languages · Colour distance metrics · #3819 · #6843