WebVTT: Modern Web Subtitles

Posted on December 12, 2024 by SubZap • 7 min read
Technical, Web, Creative, Accessibility

As video streaming becomes ubiquitous, subtitles need to adapt to web platforms. WebVTT (Web Video Text Tracks) builds on SRT's simplicity while adding features specifically designed for web delivery.

A Brief History

When HTML5 video emerged, it became clear that subtitles needed to evolve. Browser vendors and streaming platforms required a format that could handle modern web needs: precise styling, multiple languages, and accessibility features. WebVTT was born from these requirements, becoming the W3C standard for web subtitles.

From SRT to WebVTT

If you're familiar with SRT files, WebVTT will feel natural. Let's look at the same subtitle in both formats:

srt
1
00:00:01,000 --> 00:00:04,000
Hello, world!

vtt
WEBVTT

00:00:01.000 --> 00:00:04.000
Hello, world!

The similarities are clear, but WebVTT introduces some key differences:

  • Required "WEBVTT" header

  • Cue identifiers (the numbers before SRT timestamps) are optional, though still useful, and can be any text rather than just a number

  • Periods instead of commas before the milliseconds (00:00:01.000 instead of 00:00:01,000); the hours may also be omitted, as in 00:01.000
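Because the differences are this small, converting a basic SRT file is mostly mechanical. The sketch below shows one minimal way to do it in Python, assuming well-formed SRT with two-digit hours. The function name srt_to_vtt is illustrative, and SRT formatting tags such as <font> are passed through untouched rather than translated.

```python
import re

# An SRT timestamp: hh:mm:ss,mmm (comma before the milliseconds)
SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert basic SRT text to WebVTT (minimal sketch, not a full converter)."""
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for line in srt_text.lstrip("\ufeff").replace("\r\n", "\n").split("\n"):
        if "-->" in line:
            # Only timing lines change: swap the comma for a period
            line = SRT_TIME.sub(r"\1.\2", line)
        # Cue numbers are kept; WebVTT accepts them as cue identifiers
        lines.append(line)
    return "\n".join(lines).rstrip("\n") + "\n"


print(srt_to_vtt("1\n00:00:01,000 --> 00:00:04,000\nHello, world!\n"))
```

Subtitle text is left alone, so a comma inside dialogue such as "Hello, world!" is not converted.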

WebVTT also introduces web-specific styling options using limited CSS (Cascading Style Sheets) syntax, along with support for regions and positioning. We'll get into this in the next section.

CSS-Style Formatting

WebVTT's power comes from its CSS-like styling system. Using STYLE blocks, you can define how different elements appear:

vtt
WEBVTT

STYLE
::cue {
  color: white;
  background-color: rgba(0, 0, 0, 0.7);
  font-family: Arial, sans-serif;
}

::cue(b) {
  color: yellow;
  font-weight: bold;
}

::cue(.important) {
  color: red;
  font-weight: bold;
}

::cue(v[voice="narrator"]) {
  color: cyan;
  font-style: italic;
}

Styling Elements

Different selectors target specific elements:

vtt
WEBVTT

STYLE
::cue(b) {
  color: yellow;
}

::cue(i) {
  font-style: italic;
  color: cyan;
}

00:00:01.000 --> 00:00:04.000
This is <b>bold</b> and this is <i>italic</i>

Class-Based Styling

You can define custom classes for different types of text:

vtt
WEBVTT

STYLE
::cue(.important) {
  color: red;
  font-weight: bold;
}

::cue(.whisper) {
  color: gray;
  font-style: italic;
}

00:00:01.000 --> 00:00:05.000
<c.important>Critical announcement!</c>

00:00:06.000 --> 00:00:10.000
<c.whisper>secret message</c>

Voice-Based Styling

Speakers can have distinct styles:

vtt
WEBVTT

STYLE
::cue(v[voice="narrator"]) {
  color: yellow;
  font-family: "Times New Roman", serif;
}

::cue(v[voice="character"]) {
  color: cyan;
  font-family: Arial, sans-serif;
}

00:00:01.000 --> 00:00:04.000
<v narrator>The story begins...</v>

00:00:04.000 --> 00:00:08.000
<v character>Hello, world!</v>

Language-Specific Styling

Different languages can have distinct appearances:

vtt
WEBVTT

STYLE
::cue(:lang(en)) {
  color: white;
  font-family: Arial, sans-serif;
}

::cue(:lang(ja)) {
  color: yellow;
  font-family: "Noto Sans JP", sans-serif;
}

00:00:01.000 --> 00:00:04.000
<lang en>Welcome to the tutorial</lang>

00:00:04.000 --> 00:00:08.000
<lang ja>チュートリアルへようこそ</lang>

Styling Limitations

While WebVTT's styling system is powerful, it has some important restrictions:

  • Cannot load external resources

  • Limited to text-related CSS properties

  • Cue box layout (position, size, alignment) comes from cue settings, not from CSS

  • No animation or transition effects

Anatomy of a WebVTT File

Now that we understand WebVTT's styling capabilities, let's look at how a complete file comes together:

vtt
WEBVTT
Kind: captions
Language: en

STYLE
::cue {
  color: white;
  background-color: rgba(0, 0, 0, 0.7);
}

NOTE
This is a comment - it won't be displayed

1
00:00:01.000 --> 00:00:04.000
In today's video, we'll explore
the latest web technologies.

2
00:00:04.500 --> 00:00:08.000 align:end line:90%
Subscribe for more tutorials!

3
00:00:08.100 --> 00:00:12.000
<v Presenter>Thanks for watching!

Each file contains:

  • The WEBVTT header (required)

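  • Optional header lines such as Kind and Language (browsers ignore them; with the HTML <track> element, the kind and srclang attributes set these instead)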

  • STYLE blocks for formatting (optional)

  • NOTE blocks for comments (optional, never displayed)

  • Cue blocks with timing and text

  • Optional positioning attributes
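To see that structure in code, here is a simplified Python sketch that groups a file into blank-line-separated blocks and labels each one. The split_blocks helper is hypothetical, not from any library, and it follows the layout described above rather than the full parsing algorithm in the WebVTT specification, so REGION blocks and other edge cases are simply labelled "other".

```python
def split_blocks(vtt_text: str) -> list[tuple[str, list[str]]]:
    """Group a WebVTT file into blocks and label each one (simplified)."""
    text = vtt_text.lstrip("\ufeff").replace("\r\n", "\n")

    # Blocks are runs of non-empty lines separated by blank lines
    groups: list[list[str]] = []
    current: list[str] = []
    for line in text.split("\n"):
        if line:
            current.append(line)
        elif current:
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    labelled = []
    for index, lines in enumerate(groups):
        first = lines[0]
        if index == 0 and first.startswith("WEBVTT"):
            kind = "header"  # signature plus any header lines
        elif first.strip() == "STYLE":
            kind = "style"
        elif first.strip() == "NOTE" or first.startswith(("NOTE ", "NOTE\t")):
            kind = "note"
        elif any("-->" in line for line in lines[:2]):
            kind = "cue"  # timing line, optionally preceded by an identifier
        else:
            kind = "other"
        labelled.append((kind, lines))
    return labelled
```

Running it on the example file above yields the labels header, style, note, cue, cue, cue.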

Positioning and Layout

Beyond styling, WebVTT offers precise control over subtitle positioning. Unlike traditional formats, WebVTT uses a web-native positioning system:

vtt
WEBVTT

00:00:04.000 --> 00:00:08.000 align:end position:90%
Right-aligned subtitle

00:00:08.000 --> 00:00:12.000 line:10%
Subtitle near the top

00:00:12.000 --> 00:00:16.000 size:40%
Narrower subtitle

Common positioning properties:

  • align: Text alignment within the cue box (start, center, end, left, or right)

  • line: Vertical position (percentage or line number)

  • position: Horizontal position (percentage)

  • size: Width of the cue box, as a percentage of the video width
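If you generate WebVTT programmatically, cue settings are simply space-separated name:value pairs after the end timestamp. Here is a minimal Python sketch; timestamp and cue are hypothetical helper names, and the code does not validate the setting values it is given.

```python
def timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def cue(start: float, end: float, text: str, **settings: str) -> str:
    """Build one cue block; keyword arguments become name:value settings."""
    timing = f"{timestamp(start)} --> {timestamp(end)}"
    if settings:
        timing += " " + " ".join(f"{name}:{value}" for name, value in settings.items())
    return f"{timing}\n{text}\n"


print(cue(4, 8, "Right-aligned subtitle", align="end", position="90%"))
```

The sketch always writes the hours field; WebVTT would also accept the shorter 00:04.000 form.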

Voice and Speaker Support

For content with multiple speakers, WebVTT provides clear identification through voice tags, which can be styled as we saw earlier:

vtt
WEBVTT

STYLE
::cue(v[voice="host"]) {
  color: yellow;
}

::cue(v[voice="guest"]) {
  color: cyan;
}

00:00:01.000 --> 00:00:04.000
<v host>Welcome to the show!

00:00:04.000 --> 00:00:08.000
<v guest>Thanks for having me.

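This feature is particularly valuable for interviews, panel discussions, and educational materials. Keep in mind that browsers do not display the voice name itself, only the cue text, so when viewers need to know who is speaking (an important accessibility requirement for captions), include a visible speaker label in the cue text as well.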
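Voice tags also make speaker information easy to pull out of a file, for example to build a speaker list or to check that labels are spelled consistently. A sketch in Python using a regular expression; the pattern approximates the tag syntax rather than implementing a full parser, and speakers is a hypothetical name.

```python
import re

# "<v" + optional ".class" parts + whitespace + speaker name, up to ">"
VOICE_TAG = re.compile(r"<v(?:\.[^\s.>]+)*[ \t]+([^>]+)>")


def speakers(vtt_text: str) -> list[str]:
    """List speaker names from voice tags, in order of first appearance."""
    names: list[str] = []
    for name in VOICE_TAG.findall(vtt_text):
        name = name.strip()
        if name not in names:
            names.append(name)
    return names


sample = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v host>Welcome to the show!

00:00:04.000 --> 00:00:08.000
<v guest>Thanks for having me.
"""
print(speakers(sample))  # ['host', 'guest']
```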

Working with WebVTT Files

While WebVTT offers powerful styling and positioning features, keeping subtitles simple often works best. Follow these guidelines for reliable results:

Improving Readability

The same principles that work for SRT apply to WebVTT:

  • Two lines maximum per subtitle

  • Around 40 characters per line

  • No more than 20-25 characters per second

  • Natural line breaks
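These limits are easy to check automatically. The Python sketch below uses the numbers listed above as defaults (2 lines, 40 characters per line, and 25 characters per second, the upper end of that range). It counts characters after stripping tags and line breaks, which is one convention among several, and readability_issues is a hypothetical name.

```python
import re


def readability_issues(start: float, end: float, text: str,
                       max_lines: int = 2, max_chars: int = 40,
                       max_cps: float = 25.0) -> list[str]:
    """Check one cue's text against simple readability limits."""
    plain = re.sub(r"<[^>]*>", "", text)  # ignore tags such as <v host> or <b>
    lines = plain.split("\n")
    issues = []
    if len(lines) > max_lines:
        issues.append(f"{len(lines)} lines (max {max_lines})")
    for line in lines:
        if len(line) > max_chars:
            issues.append(f"{len(line)}-character line (max {max_chars})")
    duration = end - start
    chars = len(plain.replace("\n", ""))  # spaces count, line breaks do not
    if duration <= 0:
        issues.append("cue has no duration")
    elif chars / duration > max_cps:
        issues.append(f"{chars / duration:.1f} characters per second (max {max_cps:g})")
    return issues
```

Natural line breaks are a judgment call, so they are left to a human reviewer.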

Technical Recommendations

For robust WebVTT files:

  • Always use UTF-8 encoding

  • Test positioning on different screen sizes

  • Verify speaker labels work in your player

  • Keep styling consistent throughout
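For the encoding rule, a strict UTF-8 decode catches files saved in another encoding before they reach a player. This Python sketch also checks the signature: the first line must be WEBVTT, either alone or followed by a space or tab and more text, and a leading byte order mark is allowed. The decode_vtt helper is hypothetical.

```python
def decode_vtt(data: bytes) -> str:
    """Decode a WebVTT file strictly as UTF-8 and check its signature."""
    # "utf-8-sig" drops a leading byte order mark if there is one;
    # bytes that are not valid UTF-8 raise UnicodeDecodeError
    text = data.decode("utf-8-sig")
    first_line = (text.splitlines() or [""])[0]
    # The signature is "WEBVTT", alone or followed by a space or tab
    if first_line != "WEBVTT" and not first_line.startswith(("WEBVTT ", "WEBVTT\t")):
        raise ValueError("first line must start with the WEBVTT signature")
    return text
```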

Platform Support

WebVTT enjoys strong support across modern platforms, but capabilities vary.

Most players reliably support:

  • Basic subtitle display

  • Simple positioning

  • Speaker identification

  • Standard timing

However, test carefully when using:

  • Complex positioning

  • Custom styling

  • Regions

  • Advanced features

These features depend on CSS layout and styling, which web browsers implement fully but many players outside the browser support only partially.

Common Use Cases

Video streaming platforms have embraced WebVTT for its reliability and web-native features. The format particularly shines in online learning, where clear speaker identification and precise timing help viewers follow along.

Accessibility is another key strength. WebVTT is the format browsers support natively for captions through the HTML <track> element, and its semantic markup (voice, class, and language spans) helps create more inclusive content. The combination of CSS-like styling and semantic structure makes it possible to create subtitles that are both visually appealing and accessible.

Tools and Validation

While any text editor can handle WebVTT files, specialized tools make creation and testing easier:

Professional subtitle editors include:

  • Aegisub: Supports WebVTT export

  • Subtitle Edit: Strong WebVTT support

  • Caption Maker: Web-focused editor
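Simple validation can also be scripted. As an illustration rather than a replacement for the editors above, this Python sketch checks only cue timing lines: each must match the WebVTT timestamp format (optional hours, then mm:ss.ttt with a period) and must end after it starts. The names are hypothetical.

```python
import re

# Optional hours (two or more digits), then mm:ss.ttt with a period
TIMESTAMP = r"(?:(\d{2,}):)?([0-5]\d):([0-5]\d)\.(\d{3})"
TIMING_LINE = re.compile(rf"^{TIMESTAMP}[ \t]+-->[ \t]+{TIMESTAMP}(?:[ \t].*)?$")


def to_ms(hours, minutes, seconds, millis) -> int:
    """Convert matched timestamp parts to milliseconds (hours may be None)."""
    return ((int(hours or 0) * 60 + int(minutes)) * 60 + int(seconds)) * 1000 + int(millis)


def timing_problems(vtt_text: str) -> list[str]:
    """Report malformed or backwards cue timing lines, by line number."""
    problems = []
    for number, line in enumerate(vtt_text.splitlines(), start=1):
        if "-->" not in line:
            continue
        match = TIMING_LINE.match(line)
        if match is None:
            problems.append(f"line {number}: malformed timing line")
            continue
        parts = match.groups()
        start, end = to_ms(*parts[:4]), to_ms(*parts[4:])
        if end <= start:
            problems.append(f"line {number}: end time is not after start time")
    return problems
```

Timing lines copied from SRT with commas are reported as malformed, which catches one of the most common conversion slips.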

Common Mistakes

Here are some typical WebVTT-specific issues to watch out for:

Incorrect STYLE block placement

This example shows a STYLE block placed incorrectly. STYLE blocks must come after the WEBVTT header and before the first cue; a STYLE block that follows a cue is not treated as a style sheet, so its rules are ignored.

vtt
WEBVTT

1
00:00:01.000 --> 00:00:04.000
First subtitle

STYLE
::cue {
  color: red;
}
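A quick automated check for this mistake, sketched in Python. It assumes blocks are separated by blank lines and treats any block with "-->" in its first two lines as a cue; style_after_cue is a hypothetical name.

```python
def style_after_cue(vtt_text: str) -> bool:
    """Return True if a STYLE block appears after the first cue."""
    seen_cue = False
    text = vtt_text.replace("\r\n", "\n")
    for block in text.split("\n\n"):
        lines = block.strip("\n").split("\n")
        if lines[0].strip() == "STYLE" and seen_cue:
            return True  # past the first cue, STYLE is not read as a style sheet
        if any("-->" in line for line in lines[:2]):
            seen_cue = True
    return False
```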

Invalid CSS syntax

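This example shows a common CSS mistake: the semicolon after "color: red" is missing. The parser then reads "red font-weight: bold" as a single, invalid value for color and discards the whole declaration, so the cue is neither red nor bold. For more on CSS syntax, W3Schools has many good introductory articles.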

vtt
WEBVTT

STYLE
::cue {
  color: red
  font-weight: bold;
}
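Missing semicolons are easy to overlook by eye. The Python heuristic below flags a declaration line that lacks a semicolon when another declaration follows it. It is a hypothetical helper, not a CSS parser; the last declaration before a closing brace may legally omit its semicolon, so that case is not flagged.

```python
import re

# A line that looks like "property: value" (heuristic, not a CSS parser)
DECLARATION = re.compile(r"^\s*[a-zA-Z-]+\s*:")


def missing_semicolons(style_css: str) -> list[str]:
    """Flag declarations that are followed by another declaration
    but do not end with a semicolon."""
    lines = style_css.splitlines()
    flagged = []
    for current, following in zip(lines, lines[1:]):
        if (DECLARATION.match(current) and DECLARATION.match(following)
                and not current.rstrip().endswith(";")):
            flagged.append(current.strip())
    return flagged
```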

Mixing class and voice tags incorrectly

This example demonstrates invalid use of XML-like tags for class and voice (speaker labeling).

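  • <v.important> is invalid because a voice tag must include a speaker name. To style text by class, use a class span such as <c.important> (c stands for class, not cue). A voice tag can also carry a class directly, as in <v.important speaker>.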

vtt
WEBVTT

STYLE
::cue(.important) {
  color: red;
}

00:00:01.000 --> 00:00:04.000
<v.important>Wrong syntax</v>

00:00:01.000 --> 00:00:04.000
<v speaker><c.important>Correct syntax</c></v>
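To catch this mistake automatically, a Python sketch can look for opening voice tags that have nothing after the tag name and its classes. It uses a simplified regular expression rather than the full WebVTT tag grammar, and the function name is hypothetical.

```python
import re

# An opening voice tag: "<v", any ".class" parts, then the rest up to ">"
VOICE_OPEN = re.compile(r"<v((?:\.[^\s.>]+)*)([^>]*)>")


def voice_tags_without_name(cue_text: str) -> list[str]:
    """Return opening voice tags that have no speaker name."""
    return [match.group(0) for match in VOICE_OPEN.finditer(cue_text)
            if not match.group(2).strip()]
```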

Invalid positioning values

This example demonstrates an invalid positioning value as well as an invalid alignment value.

  • position is set to 101%, which is invalid because percentages must be between 0 and 100.

  • align is set to middle, when it should be center.

vtt
WEBVTT

00:00:01.000 --> 00:00:04.000 position:101%
First subtitle

00:00:04.000 --> 00:00:08.000 align:middle
Second subtitle
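Both mistakes can be caught with a small check on each timing line. This Python sketch covers only align, position, and size: it allows the optional alignment suffix on position (for example, position:30%,line-left) and skips line, which accepts more forms. The name setting_problems is hypothetical.

```python
import re

PERCENT = re.compile(r"^\d+(?:\.\d+)?%$")
ALIGN_VALUES = {"start", "center", "end", "left", "right"}


def setting_problems(timing_line: str) -> list[str]:
    """Check align, position, and size settings on one cue timing line."""
    # Settings follow the end timestamp: "start --> end name:value ..."
    _, _, rest = timing_line.partition("-->")
    settings = rest.split()[1:]
    problems = []
    for item in settings:
        name, _, value = item.partition(":")
        if name == "align" and value not in ALIGN_VALUES:
            problems.append(f"align:{value} is not a valid alignment")
        elif name in ("position", "size"):
            # position may end in an alignment such as ",line-left"
            percent = value.split(",", 1)[0] if name == "position" else value
            if not PERCENT.match(percent) or float(percent[:-1]) > 100:
                problems.append(f"{name}:{value} must be a percentage from 0% to 100%")
    return problems
```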

What's Next?

Now that you understand WebVTT's capabilities, from its CSS-like styling system to positioning controls, you'll want to explore the tools that can create and edit these files efficiently. In our next article, we'll look at subtitle editors that support modern formats like WebVTT.

Time to put your web subtitles to work!