RTP vs. RTSP Streaming Media Protocols: Everything You Need To Know About RTSP Video Streams

There are two protocols commonly used to stream video over the internet: Real-Time Transport Protocol (RTP) and Real Time Streaming Protocol (RTSP). In this post, we’ll discuss the differences between RTP and RTSP, the pros and cons of using each protocol and how to implement these streaming media protocols on your devices.


Introduction to RTP and RTSP

What is RTP, and what is RTSP? The two names are often used interchangeably, but they do different jobs: RTSP is a control protocol, while RTP is the transport protocol that carries the media data negotiated over RTSP. RTSP is used to command servers: setting up, playing, pausing or tearing down a stream. RTP, on the other hand, transports the audio and video and supports related tasks such as packetization, reordering, jitter control, quality of service and lip sync.

In practice, RTP is most often used to stream H.264 or MPEG-4 video. RTP is a system protocol that provides mechanisms to synchronize the presentation of different streams. As such, it performs some of the same functions as an MPEG-2 transport or program stream.

RTP is codec-agnostic, which means it can carry a large number of codec types. For each codec, the Internet Engineering Task Force defines an RTP payload format (often loosely called a profile) that specifies the codec-specific details of mapping data from the codec into RTP packets. Payload formats are defined for H.264, MPEG-4 video and audio and many more. Even VC-1, the “standardized” form of Windows Media Video, has one.
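To make the packet layout concrete, here is a minimal sketch of parsing the 12-byte fixed RTP header defined in RFC 3550. The names RtpHeader and parse_rtp_header are my own, and the sketch does not walk the optional header extension. The payload type field is what tells a decoder which codec mapping applies to the bytes that follow.

```python
import struct
from typing import NamedTuple


class RtpHeader(NamedTuple):
    version: int
    padding: bool
    extension: bool
    csrc_count: int
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    header_length: int  # fixed header plus CSRC list, in bytes


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header (RFC 3550, section 5.1)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    version = first >> 6
    if version != 2:
        raise ValueError(f"unsupported RTP version {version}")
    csrc_count = first & 0x0F
    return RtpHeader(
        version=version,
        padding=bool(first & 0x20),
        extension=bool(first & 0x10),
        csrc_count=csrc_count,
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=seq,
        timestamp=timestamp,
        ssrc=ssrc,
        header_length=12 + 4 * csrc_count,
    )
```

A decoder would typically use sequence_number to reorder packets and detect loss, and timestamp to schedule presentation.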

There are several common ways to send MPEG-4 or H.264 video using RTP, each of which follows a relevant standard. If you’re writing a decoder, you’ll typically need to support all of them, so here’s a quick overview.

Techniques and Best Practices for RTP and RTSP Streaming

Multicast Delivery: RTP Over UDP

In an environment with one source of a video stream and many viewers, each frame of video and audio ideally only transits the network once. This is how multicast delivery works. In a multicast network, each viewer must retrieve an SDP file through some unspecified mechanism, which is usually HTTP. Once retrieved, the SDP file gives enough information for the viewer to find the multicast streams on the network and begin playback.
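As a rough illustration of what the viewer gets out of the SDP file, here is a minimal sketch that pulls the multicast address, port and codec mapping for each stream. The SDP text, addresses and ports are made-up example values, parse_sdp_streams is a hypothetical helper, and a real SDP parser handles many more line types.

```python
EXAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=Example multicast program
c=IN IP4 239.1.1.1/16
t=0 0
m=video 5004 RTP/AVP 96
a=rtpmap:96 H264/90000
m=audio 5006 RTP/AVP 97
a=rtpmap:97 MPEG4-GENERIC/48000/2
"""


def parse_sdp_streams(sdp: str) -> list:
    """Return one dict per m= section with its address, port and rtpmap."""
    session_address = None
    streams = []
    current = None
    for raw in sdp.splitlines():
        line = raw.strip()
        if len(line) < 2 or line[1] != "=":
            continue
        kind, value = line[0], line[2:]
        if kind == "m":
            # m=<media> <port> <proto> <fmt> ...
            media, port, _proto, *formats = value.split()
            current = {
                "media": media,
                "port": int(port.split("/")[0]),
                "address": session_address,
                "formats": formats,
                "rtpmap": {},
            }
            streams.append(current)
        elif kind == "c":
            # c=<nettype> <addrtype> <address>[/<ttl>]
            address = value.split()[2].split("/")[0]
            if current is None:
                session_address = address  # session-level default
            else:
                current["address"] = address  # media-level override
        elif kind == "a" and current is not None and value.startswith("rtpmap:"):
            payload_type, encoding = value[len("rtpmap:"):].split(None, 1)
            current["rtpmap"][payload_type] = encoding
    return streams
```

With the example above, the player would learn that it should join 239.1.1.1 and listen on ports 5004 and 5006 for the video and audio RTP streams.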

In the multicast delivery scenario, each individual stream is sent on a pair of different UDP ports — one for data and the second for the related RTP Control Protocol (RTCP). That means for a video program consisting of a video stream and two audio streams, you’ll see packets being delivered to six UDP ports:

Video data delivered over RTP.
The related RTCP port for the video stream.
Primary audio data delivered over RTP.
The related RTCP port for the primary audio stream.
Secondary audio data delivered over RTP.
The related RTCP port for the secondary audio stream.
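The port layout above can be sketched in a few lines. This assumes the common RFC 3550 convention that RTP uses an even port and RTCP the next higher odd port; the port numbers and the rtp_rtcp_ports helper are illustrative, not taken from any particular product.

```python
def rtp_rtcp_ports(rtp_ports: dict) -> list:
    """List every UDP port a receiver must listen on, RTP and RTCP."""
    ports = []
    for name, rtp_port in rtp_ports.items():
        ports.append((f"{name} RTP", rtp_port))
        # Assumption: RTCP sits on the port directly above the RTP port.
        ports.append((f"{name} RTCP", rtp_port + 1))
    return ports


program = {"video": 5004, "primary audio": 5006, "secondary audio": 5008}
listen_ports = rtp_rtcp_ports(program)
```

For a program with one video and two audio streams, listen_ports contains six entries, matching the list above.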

Timestamps in the RTP headers can be used to synchronize the presentation of the various streams.

As a side note, RTCP is almost vestigial for most applications. It’s specified in RFC 3550 along with RTP. If you’re implementing a decoder, you’ll need to listen on the RTCP ports, but you can ignore almost all of the data sent to you. The exceptions are the sender report, which you’ll need in order to match up the timestamps between the streams, and the BYE packet, which some sources send as they tear down a stream.
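Here is a minimal sketch of how a sender report can be used to line streams up. Each report pairs an NTP wall-clock time with the RTP timestamp that corresponds to it, so any later RTP timestamp on that stream can be mapped onto the shared wall clock. The function names are my own, and the sketch ignores reception report blocks and compound packets.

```python
import struct

RTCP_SENDER_REPORT = 200  # RTCP packet type for SR (RFC 3550)


def parse_sender_report(packet: bytes):
    """Return (ssrc, ntp_seconds, rtp_timestamp) from an RTCP sender report."""
    if len(packet) < 28:
        raise ValueError("sender report shorter than 28 bytes")
    first, packet_type, _length, ssrc, ntp_sec, ntp_frac, rtp_ts = struct.unpack(
        "!BBHIIII", packet[:20]
    )
    if first >> 6 != 2 or packet_type != RTCP_SENDER_REPORT:
        raise ValueError("not an RTCP sender report")
    # NTP time is 32 bits of seconds plus a 32-bit binary fraction.
    return ssrc, ntp_sec + ntp_frac / 2**32, rtp_ts


def rtp_to_wallclock(rtp_ts, sr_ntp_seconds, sr_rtp_ts, clock_rate):
    """Map an RTP timestamp to NTP seconds using the latest sender report."""
    # Treat the difference as a signed 32-bit value so wraparound is handled.
    delta = (rtp_ts - sr_rtp_ts) & 0xFFFFFFFF
    if delta >= 0x80000000:
        delta -= 0x100000000
    return sr_ntp_seconds + delta / clock_rate
```

Once video and audio timestamps are both expressed in wall-clock seconds, frames with the same wall-clock time should be presented together.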

Multicast video delivery works best for live content. Because each viewer is viewing the same stream, individual viewers can’t pause, seek, rewind or fast forward the stream.

Unicast Delivery: RTP Over UDP

Sending unicast video over UDP is also possible, with one copy of the video transiting the network for each client. Unicast delivery can be used for both live and stored content. In the stored content case, additional control commands can pause, seek and enter fast-forward and rewind modes.

In this case, the player generally first establishes a control connection to a server using RTSP. In theory, RTSP can be used over UDP or TCP, but in practice, it is almost always used over TCP.

The player is usually started with an rtsp:// URL, which causes it to connect over TCP to the RTSP server. After some back and forth between the player and the RTSP server, during which the server sends the client an SDP file describing the stream, the server begins sending video to the client over UDP. As with the multicast delivery case, a pair of UDP ports is used for each elementary stream.
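The “back and forth” is a text exchange that looks much like HTTP. As a minimal sketch, the requests a player might send can be built like this; the URL, track name, ports and session ID are hypothetical, and a real client must read each response (for example, to learn the session ID and control URLs) before sending the next request.

```python
def build_rtsp_request(method: str, url: str, cseq: int, headers=None) -> bytes:
    """Serialize one RTSP/1.0 request with a CSeq header."""
    lines = [f"{method} {url} RTSP/1.0", f"CSeq: {cseq}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


base_url = "rtsp://camera.example/stream"  # hypothetical server

# 1. Ask for the SDP description of the stream.
describe = build_rtsp_request("DESCRIBE", base_url, 1, {"Accept": "application/sdp"})

# 2. Set up one elementary stream, offering a pair of UDP ports (RTP, RTCP).
setup = build_rtsp_request(
    "SETUP",
    base_url + "/trackID=0",  # assumed control URL; the real one comes from the SDP
    2,
    {"Transport": "RTP/AVP;unicast;client_port=5004-5005"},
)

# 3. Start playback, quoting the session ID returned by the SETUP response.
play = build_rtsp_request("PLAY", base_url, 3, {"Session": "12345678"})
```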

For seekable streams, once the video is playing, the player has additional control using RTSP: It can cause playback to pause, seek to a different position or enter fast-forward or rewind mode.
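As a rough sketch of those controls, seeking and trick play can be expressed with the Range and Scale headers from RFC 2326. The helper names, URL and session ID are hypothetical, and whether a given server honors Scale is up to that server.

```python
def play_request(url, cseq, session, start_seconds=None, scale=None) -> str:
    """Build a PLAY request, optionally seeking or changing playback speed."""
    lines = [f"PLAY {url} RTSP/1.0", f"CSeq: {cseq}", f"Session: {session}"]
    if start_seconds is not None:
        # Seek: play from this normal play time (npt) to the end.
        lines.append(f"Range: npt={start_seconds:.3f}-")
    if scale is not None:
        # Scale above 1 requests fast forward; negative values request rewind.
        lines.append(f"Scale: {scale:.1f}")
    return "\r\n".join(lines) + "\r\n\r\n"


def pause_request(url, cseq, session) -> str:
    """Build a PAUSE request for an established session."""
    return f"PAUSE {url} RTSP/1.0\r\nCSeq: {cseq}\r\nSession: {session}\r\n\r\n"


url = "rtsp://media.example/movie"  # hypothetical stored stream
seek = play_request(url, 4, "12345678", start_seconds=30)
fast_forward = play_request(url, 5, "12345678", scale=2)
rewind = play_request(url, 6, "12345678", scale=-2)
pause = pause_request(url, 7, "12345678")
```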

RTSP Interleaved Mode: RTP and RTSP Over TCP

I’m not a fan of streaming video over TCP. In the event a packet is lost in the network, it’s usually worse to wait for retransmission (which is what happens with TCP’s guaranteed delivery) than it is to allow the resulting video glitch to pass through to the user (which is what happens with UDP).

However, a handful of different networking configurations would block UDP video; in particular, firewalls historically have interacted badly with the two modes of UDP delivery summarized above.

So the RTSP RFC (RFC 2326), in section 10.12, briefly outlines a mode of interleaving the RTP and RTCP packets onto the existing TCP connection being used for RTSP. Each RTP and RTCP packet is given a four-byte prefix (a “$” character, a one-byte channel identifier and a two-byte length) and dropped onto the TCP stream. The result is that the player connects to the RTSP server, and all communication flows over a single TCP connection between the two.
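Here is a minimal sketch of pulling interleaved packets back out of the TCP byte stream. It assumes the framing from RFC 2326: a “$” byte, a one-byte channel identifier and a two-byte big-endian length, followed by the packet. The function name is my own, and a real client also has to handle RTSP text responses that arrive between frames.

```python
import struct


def split_interleaved_frames(buffer: bytes):
    """Split a TCP buffer into (channel, packet) frames plus leftover bytes."""
    frames = []
    offset = 0
    while len(buffer) - offset >= 4:
        if buffer[offset] != 0x24:  # "$" marks a binary frame
            break  # an RTSP text message starts here; leave it for the caller
        channel, length = struct.unpack_from("!BH", buffer, offset + 1)
        end = offset + 4 + length
        if end > len(buffer):
            break  # frame is incomplete; wait for more data from the socket
        frames.append((channel, buffer[offset + 4:end]))
        offset = end
    return frames, buffer[offset:]
```

The channel numbers are negotiated in the SETUP request through the interleaved parameter of the Transport header; typically the even channel of a pair carries RTP and the odd one carries RTCP.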

HTTP Tunneled Mode: RTP and RTSP Over HTTP Over TCP

You would think RTSP interleaved mode, which was designed to get video across firewalls, would be the end of the story. Still, it turns out that many firewalls aren’t configured to allow connections to RTSP port 554.

Consequently, Apple invented a method of mapping the entire RTSP interleaved communication on top of HTTP, meaning the video ultimately flows across TCP port 80. To my knowledge, this HTTP tunneled mode is not standardized in any official RFC but is so widely implemented that it has become a de facto standard.
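As a rough sketch of how the tunnel is usually set up, the client opens two HTTP connections tied together by a shared session cookie: a GET that the server answers with the RTSP responses and interleaved media, and a POST whose body carries base64-encoded RTSP requests. The header names below reflect my understanding of Apple’s scheme rather than any RFC, and the host, path and cookie values are hypothetical.

```python
import base64


def tunnel_requests(host: str, path: str, session_cookie: str):
    """Build the GET and POST request heads that open an RTSP-over-HTTP tunnel."""
    common = (
        f"Host: {host}\r\n"
        f"x-sessioncookie: {session_cookie}\r\n"  # ties the two connections together
        "Pragma: no-cache\r\n"
        "Cache-Control: no-cache\r\n"
    )
    get = f"GET {path} HTTP/1.0\r\n{common}Accept: application/x-rtsp-tunnelled\r\n\r\n"
    post = (
        f"POST {path} HTTP/1.0\r\n{common}"
        "Content-Type: application/x-rtsp-tunnelled\r\n"
        "Content-Length: 32767\r\n\r\n"  # large placeholder; the body is streamed
    )
    return get.encode("ascii"), post.encode("ascii")


def encode_rtsp_for_post(rtsp_request: bytes) -> bytes:
    """Base64-encode one RTSP request for sending on the POST connection."""
    return base64.b64encode(rtsp_request)


get_head, post_head = tunnel_requests("camera.example", "/stream", "abc123")
options = encode_rtsp_for_post(b"OPTIONS * RTSP/1.0\r\nCSeq: 1\r\n\r\n")
```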

Why the RTSP Protocol is Best for Surveillance

In our modern world, video streaming protocols are the heart and soul of how video content gets from point A to point B. RTP and RTSP work together rather than compete: RTSP adds the session control (setup, play, pause and teardown) that RTP alone lacks, while RTP carries the media itself. Although RTSP no longer dominates the world of streaming media, it remains the standard in many surveillance and closed-circuit television (CCTV) architectures because it is still the IP camera protocol of choice.

If your team needs assistance building a digital video or streaming media product, our team of experts can help you launch world-class innovations that meet your business objectives and delight customers. Let us know how we can support your next project!
