Internet-Draft hang May 2025
Curley Expires 23 November 2025

Workgroup: moq
Internet-Draft: draft-lcurley-moq-hang-latest
Published: May 2025
Intended Status: Informational
Expires: 23 November 2025
Author: L. Curley

Media over QUIC - Hang

Abstract

Hang is a real-time conferencing protocol built on top of moq-lite. A room consists of multiple participants who publish media tracks. All updates, such as a change in participants or media tracks, are delivered live.

Discussion Venues

This note is to be removed before publishing as an RFC.

Discussion of this document takes place on the Media Over QUIC Working Group mailing list (moq@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/moq/.

Source for this draft and an issue tracker can be found at https://github.com/kixelated/moq-drafts.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 23 November 2025.

1. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

2. Terminology

Hang is built on top of moq-lite [moql] and uses much of the same terminology. A quick recap:

Hang introduces additional terminology:

3. Discovery

The first requirement for a real-time conferencing application is to discover other participants in the same room. Hang does this using moq-lite's ANNOUNCE capabilities.

A room is identified by a path. Each participant in the room MUST publish a broadcast whose path has the room path as a prefix. The broadcast path SHOULD end with the .hang suffix.

For example, alice and bob are in the room /room/, while zoe is in the room /other/:

/room/alice.hang
/room/bob.hang
/other/zoe.hang
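
The naming convention above can be sketched with two small helpers. This is a minimal illustration only: the function names and the trailing-slash normalization are assumptions of this sketch and are not defined by Hang or moq-lite.

```typescript
// Hypothetical helpers; the names are illustrative and not defined by Hang.
const HANG_SUFFIX = ".hang";

// Ensure the room path ends with a slash so it can be used as a prefix.
function roomPrefix(room: string): string {
    return room.endsWith("/") ? room : room + "/";
}

// Build a broadcast path such as "/room/alice.hang".
function broadcastPath(room: string, name: string): string {
    return roomPrefix(room) + name + HANG_SUFFIX;
}

// A broadcast belongs to a room when the room path is a prefix of its path.
// The .hang suffix is only a SHOULD, so it is not required here.
function inRoom(room: string, path: string): boolean {
    return path.startsWith(roomPrefix(room));
}
```

With the paths above, inRoom("/room/", "/room/alice.hang") holds while inRoom("/room/", "/other/zoe.hang") does not.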

A participant issues an ANNOUNCE_PLEASE message to discover the other participants in the same room. The server (relay) then responds with an ANNOUNCE message for each matching broadcast, including the participant's own.

For example:

ANNOUNCE_PLEASE prefix=/room/
ANNOUNCE suffix=alice.hang active=true
ANNOUNCE suffix=bob.hang   active=true

If a publisher no longer wants to participate, or is disconnected, its presence will be unannounced. Publishers and subscribers SHOULD terminate any subscriptions to a participant's broadcast once it is unannounced.

ANNOUNCE suffix=alice.hang active=false
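
One way an application might track room membership from this message flow is sketched below. The Announce shape mirrors the fields shown in the examples, but it and the Room class are assumptions of this sketch; the actual moq-lite API may expose announcements differently.

```typescript
// Hypothetical view of an ANNOUNCE message; field names follow the examples.
type Announce = { suffix: string; active: boolean };

class Room {
    // Broadcast suffixes of the participants currently announced.
    readonly participants = new Set<string>();

    // Apply one ANNOUNCE. Returns true if the participant set changed.
    onAnnounce(msg: Announce): boolean {
        if (msg.active) {
            if (this.participants.has(msg.suffix)) return false;
            this.participants.add(msg.suffix);
            return true;
        }
        // active=false: the participant left or was disconnected.
        // The caller is expected to terminate subscriptions to this broadcast.
        return this.participants.delete(msg.suffix);
    }
}
```

The boolean result lets the caller react only to real membership changes, for example by subscribing to a newly announced catalog or closing subscriptions after an unannounce.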

4. Catalog

The catalog describes the available media tracks for a single participant. It's a JSON document that extends the W3C WebCodecs specification.

The catalog is published as a catalog.json track within the broadcast so it can be updated live as the participant's media tracks change. A participant MAY forgo publishing a catalog if it does not wish to publish any media tracks, now or in the future.

The catalog track consists of multiple groups, one for each update. Each group contains a single frame with UTF-8 JSON. A publisher MUST NOT write multiple frames to a group until a future specification includes a delta-encoding mechanism.
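
A minimal sketch of reading and writing the single frame of a catalog group follows. The function names are hypothetical, and treating a missing "audio" or "video" list as empty is an assumption of this sketch rather than a rule stated by Hang.

```typescript
// Minimal catalog shape for this sketch; track entries are left opaque.
type CatalogSnapshot = { audio: unknown[]; video: unknown[] };

// Decode the single frame of a catalog group: UTF-8 JSON bytes to an object.
function decodeCatalog(frame: Uint8Array): CatalogSnapshot {
    // fatal: true rejects invalid UTF-8 instead of substituting U+FFFD.
    const text = new TextDecoder("utf-8", { fatal: true }).decode(frame);
    const value = JSON.parse(text);
    if (typeof value !== "object" || value === null || Array.isArray(value)) {
        throw new Error("catalog must be a JSON object");
    }
    // Keep unknown fields so application-specific extensions survive.
    return {
        ...value,
        audio: Array.isArray(value.audio) ? value.audio : [],
        video: Array.isArray(value.video) ? value.video : [],
    };
}

// Encode a catalog as the single UTF-8 JSON frame of a new group.
function encodeCatalog(catalog: CatalogSnapshot): Uint8Array {
    return new TextEncoder().encode(JSON.stringify(catalog));
}
```

Because no delta encoding is defined yet, this sketch treats every decoded frame as a complete catalog that replaces the previous one.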

4.1. Root

The root of the catalog is a JSON document with the following schema:

type Catalog = {
        "audio": AudioTrack[],
        "video": VideoTrack[],
}

When there are multiple audio or video tracks, they SHOULD describe the same content. For example, different resolutions, codecs, bitrates, etc. If a participant wants to publish unrelated content, for example sharing the screen in addition to a webcam, it SHOULD publish a separate broadcast (and catalog).
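
Since multiple tracks describe the same content, a subscriber can choose among them. The sketch below shows one possible policy: pick the highest-bitrate rendition that fits a bandwidth budget. Both the policy and the names (Rendition, pickRendition) are assumptions for illustration; Hang does not mandate a selection algorithm.

```typescript
// Subset of the video track fields used by this sketch.
type Rendition = {
    track: { name: string; priority: number };
    config: { codec: string; bitrate?: number };
};

// Pick the highest-bitrate rendition within the budget (bits per second).
// Falls back to the lowest-bitrate rendition when none fit.
function pickRendition(tracks: Rendition[], budget: number): Rendition | undefined {
    // Sort a copy in ascending bitrate order; a missing bitrate counts as 0.
    const sorted = [...tracks].sort(
        (a, b) => (a.config.bitrate ?? 0) - (b.config.bitrate ?? 0),
    );
    let best: Rendition | undefined = sorted[0];
    for (const t of sorted) {
        if ((t.config.bitrate ?? 0) <= budget) best = t;
    }
    return best;
}
```

An empty list yields undefined, which a caller can treat as "nothing to subscribe to".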

Additional fields MAY be added based on the application. The catalog SHOULD be mostly static, delegating any dynamic content to other tracks. Additionally, a catalog SHOULD describe optional content, allowing the client to decide if it wants to subscribe.

For example, a "chat" field should include the name of a chat track, not individual chat messages. This way catalog updates are rare and a client MAY choose to not subscribe.
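
As a hedged illustration of such an extension, the sketch below adds an optional "chat" field that only references a track. The field layout, which reuses the name and priority shape of media tracks, is an assumption of this sketch and is not defined by Hang.

```typescript
// Hypothetical application-specific extension; "chat" is not defined by Hang.
type ChatCatalog = {
    audio: unknown[];
    video: unknown[];
    // Only the track reference lives in the catalog; messages flow on the track.
    chat?: { track: { name: string; priority: number } };
};

// Return the chat track name to subscribe to, if the participant offers one.
function chatTrackName(catalog: ChatCatalog): string | undefined {
    return catalog.chat?.track.name;
}
```

A client that does not care about chat simply ignores the field and never subscribes to the referenced track.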

4.2. Video

A video track contains the necessary information to decode a video stream.

The config field contains a VideoDecoderConfig. This contains all of the information needed to configure a video decoder.

The track field includes the name and priority of the track within the broadcast.

type VideoTrack = {
        "track": {
                "name": string,
                "priority": number,
        },
        "config": VideoDecoderConfig,
}

For example:

{
        "track": {
                "name": "video",
                "priority": 2
        },
        "config": {
                "codec": "avc1.64001f",
                "dimensions": {
                        "width": 1280,
                        "height": 720
                },
                "bitrate": 6000000,
                "framerate": 30.0
        }
}

4.3. Audio

An audio track contains the necessary information to decode an audio stream.

The track field includes the name and priority of the track within the broadcast.

The config field contains an AudioDecoderConfig. This contains all of the information needed to configure an audio decoder.

type AudioTrack = {
        "track": {
                "name": string,
                "priority": number,
        },
        "config": AudioDecoderConfig,
}

For example:

{
        "track": {
                "name": "audio",
                "priority": 1
        },
        "config": {
                "codec": "opus",
                "sampleRate": 48000,
                "numberOfChannels": 2,
                "bitrate": 128000
        }
}

5. Media

Media tracks are split into groups and further into frames.

A group consists of one or more frames in decode order. Each group MUST start with a keyframe. If a codec supports delta frames (video), then all subsequent frames MUST be delta frames. Otherwise, a group MAY consist of multiple keyframes (audio).
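
The group rules above can be expressed as a small check. The FrameInfo type and the codecHasDeltaFrames flag are hypothetical; how an implementation learns whether a frame is a keyframe is outside the scope of this sketch.

```typescript
// Hypothetical per-frame metadata; Hang does not define this structure.
type FrameInfo = { keyframe: boolean };

// Check the group rules: at least one frame, the first frame is a keyframe,
// and, when the codec supports delta frames, every later frame is a delta.
function isValidGroup(frames: FrameInfo[], codecHasDeltaFrames: boolean): boolean {
    if (frames.length === 0 || !frames[0].keyframe) return false;
    // Without delta frames (e.g. audio), later keyframes are allowed.
    if (!codecHasDeltaFrames) return true;
    return frames.slice(1).every((f) => !f.keyframe);
}
```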

Each frame consists of a tiny "container": a timestamp followed by a codec-specific payload. The timestamp is the presentation timestamp in microseconds, encoded as a QUIC variable-length integer (62-bit max). The remainder of the frame is the codec-specific payload.
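
The container layout can be sketched as follows, using the variable-length integer encoding from RFC 9000, Section 16. The function and type names are hypothetical. As a simplification, this sketch uses JavaScript numbers and therefore rejects timestamps above 2^53 - 1, although the wire format allows up to 2^62 - 1.

```typescript
// Encode a non-negative integer as a QUIC variable-length integer.
function encodeVarint(value: number): Uint8Array {
    if (!Number.isSafeInteger(value) || value < 0) {
        throw new RangeError("unsupported varint value");
    }
    // Smallest encoding: 1, 2, 4 or 8 bytes hold 6, 14, 30 or 62 bits.
    const prefix = value < 64 ? 0 : value < 16384 ? 1 : value < 1073741824 ? 2 : 3;
    const out = new Uint8Array(1 << prefix);
    let rest = value;
    // Write big-endian, using division to stay exact beyond 32 bits.
    for (let i = out.length - 1; i >= 0; i--) {
        out[i] = rest % 256;
        rest = Math.floor(rest / 256);
    }
    out[0] |= prefix << 6; // the two high bits carry the length
    return out;
}

// Decode a varint from the start of buf; size is the bytes consumed.
function decodeVarint(buf: Uint8Array): { value: number; size: number } {
    if (buf.length === 0) throw new RangeError("empty buffer");
    const size = 1 << (buf[0] >> 6);
    if (buf.length < size) throw new RangeError("truncated varint");
    let value = buf[0] & 0x3f;
    for (let i = 1; i < size; i++) value = value * 256 + buf[i];
    if (!Number.isSafeInteger(value)) throw new RangeError("value exceeds 2^53 - 1");
    return { value, size };
}

// timestamp is the presentation timestamp in microseconds.
type MediaFrame = { timestamp: number; payload: Uint8Array };

function encodeFrame(frame: MediaFrame): Uint8Array {
    const header = encodeVarint(frame.timestamp);
    const out = new Uint8Array(header.length + frame.payload.length);
    out.set(header, 0);
    out.set(frame.payload, header.length);
    return out;
}

function decodeFrame(buf: Uint8Array): MediaFrame {
    const { value, size } = decodeVarint(buf);
    // The payload is a view into buf, not a copy.
    return { timestamp: value, payload: buf.subarray(size) };
}
```

For example, a frame at one second (1000000 microseconds) uses a 4-byte timestamp, so a 3-byte payload yields a 7-byte frame.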

6. Security Considerations

TODO Security

7. IANA Considerations

This document has no IANA actions.

8. Normative References

[moql]
"*** BROKEN REFERENCE ***".
[moqt]
Nandakumar, S., Vasiliev, V., Swett, I., and A. Frindell, "Media over QUIC Transport", Work in Progress, Internet-Draft, draft-ietf-moq-transport-11, <https://datatracker.ietf.org/doc/html/draft-ietf-moq-transport-11>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.

Acknowledgments

TODO acknowledge.

Author's Address

Luke Curley