Under The Hood

This guide explains how the Pupil Labs’ Realtime API works on the wire and how this client library abstracts away some of the complexities of the underlying protocols.


HTTP REST API

The Pupil Invisible Companion app hosts an HTTP REST API that can be used to query the phone’s current state, remote control it, and look up information about available data streams.

By default, the API is hosted at http://pi.local:8080/. The app will fall back to a different DNS name and/or port if the default values are already taken by another app. The current connection details can be looked up under the app’s main menu → Streaming. Alternatively, you can use Service discovery in the local network to find available devices.


The device serves the built-in monitor web app (to be released soon!) at the document root /. The API is served under the /api path. You can find the full OpenAPI 3 specification of the REST API here.

Start/stop/cancel recordings

By sending HTTP POST requests to the /api/recording:* endpoints, you can start, stop, and cancel recordings.


In specific situations, the app will not comply with the request to start a new recording:

  • the selected template has required fields

  • the available storage amount is too low

  • the device battery is too low

  • no wearer has been selected

  • no workspace has been selected

  • the setup bottom sheets have not been completed
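A minimal sketch of such a request, using only the Python standard library. The action names (start, stop_and_save, cancel) and the helper names recording_request and send are assumptions made for illustration; check the actual endpoint names against the OpenAPI specification. The sketch also assumes that a refused request is answered with an HTTP error status.

```python
import urllib.error
import urllib.request

BASE_URL = "http://pi.local:8080"  # default address; your device may use another


def recording_request(action: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST request for /api/recording:<action> (action name is assumed)."""
    url = f"{base_url.rstrip('/')}/api/recording:{action}"
    return urllib.request.Request(url, data=b"", method="POST")


def send(request: urllib.request.Request, timeout: float = 5.0) -> tuple[int, str]:
    """Send the request and return (HTTP status code, response body as text)."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # Non-2xx answers end up here, e.g. if the app refuses to start a recording.
        return error.code, error.read().decode("utf-8", errors="replace")
```

Calling send(recording_request("start")) would then issue the request against a device on your network and return its answer for inspection.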

Send events

By sending HTTP POST requests to the /api/event endpoint, you can send labeled events to the device. By default, events are timestamped on reception. Alternatively, you can provide a Unix-epoch timestamp in nanoseconds. This is recommended if you want to control the timing of the event.
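The following sketch builds such a request with the Python standard library. The JSON field names name and timestamp are assumptions, as is the helper name event_request; verify the payload format against the OpenAPI specification. A client-side timestamp is only meaningful if the client's clock agrees with the phone's.

```python
import json
import time
import urllib.request


def event_request(name, timestamp_unix_ns=None, base_url="http://pi.local:8080"):
    """Build a POST request for /api/event with an optional nanosecond timestamp."""
    payload = {"name": name}  # field name is an assumption
    if timestamp_unix_ns is not None:
        # Unix-epoch nanoseconds; omit to let the device timestamp on reception.
        payload["timestamp"] = timestamp_unix_ns  # field name is an assumption
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Build (but do not send) an event stamped with the client's current time.
request = event_request("stimulus onset", timestamp_unix_ns=time.time_ns())
# urllib.request.urlopen(request, timeout=5.0) would send it to the device.
```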

Get Current Status

By sending an HTTP GET request to the /api/status endpoint, you can receive information about the device’s current status. This includes information about the battery and storage capacities, connected sensors, and running recordings.

See also

Asynchronous implementation: pupil_labs.realtime_api.device.Device.get_status()
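If you query the endpoint without the client library, you need to sort the returned components yourself. The sketch below assumes the response is a JSON envelope with a result list whose items carry a model name and a data object; this layout and every field in SAMPLE_STATUS are hypothetical and should be checked against the OpenAPI specification.

```python
import json
from collections import defaultdict


def group_by_model(status_body: str) -> dict:
    """Group status components by their model name (assumed envelope layout)."""
    envelope = json.loads(status_body)
    components = defaultdict(list)
    for item in envelope.get("result", []):
        components[item["model"]].append(item["data"])
    return dict(components)


# Hypothetical response body, for illustration only.
SAMPLE_STATUS = """{"message": "Success", "result": [
  {"model": "Phone", "data": {"battery_level": 87}},
  {"model": "Sensor", "data": {"sensor": "gaze", "conn_type": "DIRECT"}},
  {"model": "Sensor", "data": {"sensor": "world", "conn_type": "DIRECT"}}
]}"""

status = group_by_model(SAMPLE_STATUS)
sensor_names = [sensor["sensor"] for sensor in status["Sensor"]]
```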

Websocket API

In addition to the HTTP REST API above, the Pupil Invisible Companion device also pushes status updates via a websocket connection. It is hosted on the same port as the REST API. By default, you can connect to it via ws://pi.local:8080/api/status.


You can use this website to test the websocket connection.

The messages published via this connection have the same format as the Get Current Status endpoint.
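Because the websocket shares host and port with the REST API, its URL can be derived from the REST base URL. A small sketch with a hypothetical helper name, status_websocket_url; mapping https to wss is an assumption that only matters if your setup uses TLS.

```python
from urllib.parse import urlsplit, urlunsplit


def status_websocket_url(rest_base_url: str) -> str:
    """Derive the status websocket URL from the REST API base URL."""
    parts = urlsplit(rest_base_url)
    scheme = "wss" if parts.scheme == "https" else "ws"
    # Same host and port as the REST API, fixed /api/status path.
    return urlunsplit((scheme, parts.netloc, "/api/status", "", ""))
```

The resulting URL can be passed to any websocket client; each received message can then be handled like a Get Current Status response.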

Streaming API

The Pupil Invisible Companion app uses the RTSP protocol (RFC 2326) to stream scene video and gaze data. Under the hood, communication is three-fold:

  • RTSP (Real Time Streaming Protocol) - Provides metadata about the corresponding stream

  • RTP (Real-time Transport Protocol) - Data delivery channel, contains actual payloads

  • RTCP (RTP Control Protocol) - Provides absolute time information to align multiple streams

The necessary connection information is made available via the Sensor model as part of the Get Current Status and Websocket API.

The RTSP connection URL follows the following pattern:



Each stream is available via two connection types:

  • DIRECT - direct RTSP connection, as described in this document

  • WEBSOCKET - tunneling RTSP over a websocket connection to make it available to web browsers


RTSP

The Real Time Streaming Protocol, or RTSP, is an application-level protocol for control over the delivery of data with real-time properties.

Source: https://datatracker.ietf.org/doc/html/rfc2326

Of the various methods defined in the RTSP protocol, SETUP and DESCRIBE are particularly important for the transmission of the stream’s meta and connection information.

During the SETUP method, client and server exchange information about their corresponding port numbers for the RTP and RTCP connections.

The DESCRIBE response contains SDP (Session Description Protocol) data, describing the following stream attributes (via the media’s rtpmap):

  • encoding - The encoding of the stream, e.g. H264

  • clockRate - The clock rate of the stream’s relative clock

For video, it also exposes the sprop-parameter-sets via its format-specific parameters (fmtp). These contain crucial information in order to initialize the corresponding video decoder.
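To illustrate, the sketch below extracts these attributes from an SDP description. SAMPLE_SDP is a hypothetical example rather than output captured from a device, and the function name is made up for illustration. The sprop-parameter-sets value is treated as a comma-separated list of base64 strings, as defined for H.264 in RFC 6184.

```python
import base64


def parse_sdp_media_attributes(sdp: str) -> dict:
    """Extract encoding, clock rate and sprop-parameter-sets from SDP text."""
    info = {"encoding": None, "clock_rate": None, "sprop_parameter_sets": []}
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("a=rtpmap:"):
            # a=rtpmap:<payload type> <encoding>/<clock rate>
            _, _, mapping = line.partition(" ")
            parts = mapping.split("/")
            info["encoding"] = parts[0]
            info["clock_rate"] = int(parts[1])
        elif line.startswith("a=fmtp:"):
            # a=fmtp:<payload type> <key>=<value>;<key>=<value>;...
            _, _, params = line.partition(" ")
            for param in params.split(";"):
                key, _, value = param.strip().partition("=")
                if key == "sprop-parameter-sets":
                    info["sprop_parameter_sets"] = [
                        base64.b64decode(part) for part in value.split(",")
                    ]
    return info


# Hypothetical SDP excerpt, for illustration only.
SAMPLE_SDP = """\
m=video 0 RTP/AVP 96
a=rtpmap:96 H264/90000
a=fmtp:96 packetization-mode=1;sprop-parameter-sets=Z0LAHtkDxWhAAAADAEAAAAwDxYuS,aMuMsg==
"""

attributes = parse_sdp_media_attributes(SAMPLE_SDP)
```

The decoded parameter sets are raw NAL units; in this sample the first is a sequence parameter set and the second a picture parameter set.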


Each stream has its own clock rate. For temporal alignment, the clock offset between the stream’s relative clock and the absolute NTP clock has to be calculated. See RTCP below.

See also

To encode gaze data, a custom encoding called com.pupillabs.gaze1 is used. You can find more information about it below.


RTP

[The real-time transport protocol] provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. […] The data transport is augmented by a control protocol (RTCP) […]. RTP and RTCP are designed to be independent of the underlying transport and network layers.

Source: https://datatracker.ietf.org/doc/html/rfc3550

Payloads can be split across multiple RTP packets. Their order can be identified via the packet header’s sequence number. Packets belonging to the same payload have the same timestamp. The payloads can be decoded individually. See Decoding Gaze Data and Decoding Video Data below.
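The sequence number and timestamp live in the fixed 12-byte RTP header defined in RFC 3550. The following sketch parses that header with the standard library; the names RTPPacket and parse_rtp_packet are made up for illustration and are not part of the client library.

```python
import struct
from typing import NamedTuple


class RTPPacket(NamedTuple):
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    payload: bytes


def parse_rtp_packet(packet: bytes) -> RTPPacket:
    """Split a raw RTP packet into its header fields and payload (RFC 3550)."""
    if len(packet) < 12:
        raise ValueError("RTP packets have at least a 12-byte header")
    first, second, sequence_number, timestamp, ssrc = struct.unpack(
        "!BBHII", packet[:12]
    )
    if first >> 6 != 2:
        raise ValueError("only RTP version 2 is supported")
    has_padding = bool(first & 0x20)
    has_extension = bool(first & 0x10)
    csrc_count = first & 0x0F
    offset = 12 + 4 * csrc_count  # skip optional CSRC identifiers
    if has_extension:
        # Extension header: 16-bit profile id, 16-bit length in 32-bit words.
        (extension_words,) = struct.unpack("!H", packet[offset + 2 : offset + 4])
        offset += 4 + 4 * extension_words
    # With padding, the last byte holds the number of padding bytes.
    end = len(packet) - packet[-1] if has_padding else len(packet)
    return RTPPacket(
        marker=bool(second & 0x80),
        payload_type=second & 0x7F,
        sequence_number=sequence_number,
        timestamp=timestamp,
        ssrc=ssrc,
        payload=packet[offset:end],
    )
```

Packets that share a timestamp can then be ordered by sequence_number before their payloads are joined.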

See also

Read more about the RTP timestamp mechanism here.

See also

The Realtime Python API exposes raw RTP data via pupil_labs.realtime_api.streaming.base.RTSPRawStreamer.receive() and calculates relative RTP packet timestamps in pupil_labs.realtime_api.streaming.base._WallclockRTSPReader.relative_timestamp_from_packet().


RTCP

The most important role that the RTP control protocol plays for the Pupil Labs Realtime Network API is to provide timestamps in relative stream time and in absolute NTP time (SR RTCP Packet type).

Relative timestamps are calculated by dividing the packet timestamp (numerator) by the clock rate (denominator), e.g. a timestamp of 250 at a clock rate of 50 Hz corresponds to 250 / 50 = 5 seconds.

Wallclock time (absolute date and time) is represented using the timestamp format of the Network Time Protocol (NTP), which is in seconds relative to 1 January 1900 00:00:00 UTC. The full resolution NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits.

Source: https://datatracker.ietf.org/doc/html/rfc3550#section-4

Knowing the same time point in both clocks, the relative stream clock and the absolute NTP clock, allows one to calculate the offset between them: subtract the relative timestamp from the absolute one. Adding this offset to any later relative timestamp yields the corresponding absolute time.
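The arithmetic can be sketched as follows. The function names are hypothetical, times are kept in floating-point seconds for readability, and the wraparound of the 32-bit RTP timestamp is ignored.

```python
def relative_seconds(rtp_timestamp: int, clock_rate: int) -> float:
    """Convert an RTP timestamp to seconds on the stream's relative clock."""
    return rtp_timestamp / clock_rate


def clock_offset(sr_absolute_seconds: float, sr_rtp_timestamp: int, clock_rate: int) -> float:
    """Offset between absolute and relative clock, taken from one sender report."""
    return sr_absolute_seconds - relative_seconds(sr_rtp_timestamp, clock_rate)


def absolute_seconds(rtp_timestamp: int, clock_rate: int, offset: float) -> float:
    """Map a packet's RTP timestamp to absolute time using a known offset."""
    return relative_seconds(rtp_timestamp, clock_rate) + offset


# A sender report pairs absolute time 1000.0 s with RTP timestamp 90000 at 90 kHz.
offset = clock_offset(1000.0, 90000, 90000)
# A later packet with RTP timestamp 180000 is one second after the report.
packet_time = absolute_seconds(180000, 90000, offset)
```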


The Realtime Python API converts absolute NTP timestamps to nanoseconds in Unix epoch (time since 1 January 1970 00:00:00 UTC). This corresponds to the same time base and unit returned by time.time_ns().
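A sketch of such a conversion, not necessarily the way the library implements it: the 64-bit NTP value is split into its integer and fractional halves, and the 2,208,988,800 seconds between the NTP epoch (1900) and the Unix epoch (1970) are subtracted. The function name is hypothetical.

```python
NTP_UNIX_OFFSET_SECONDS = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01


def ntp_to_unix_ns(ntp_timestamp: int) -> int:
    """Convert a 64-bit NTP fixed-point timestamp to Unix-epoch nanoseconds."""
    seconds = ntp_timestamp >> 32            # upper 32 bits: whole seconds
    fraction = ntp_timestamp & 0xFFFFFFFF    # lower 32 bits: units of 1 / 2**32 s
    nanoseconds = (fraction * 1_000_000_000) >> 32
    return (seconds - NTP_UNIX_OFFSET_SECONDS) * 1_000_000_000 + nanoseconds
```

Integer arithmetic is used throughout so that no precision is lost to floating-point rounding.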

Decoding Gaze Data

Gaze data is encoded in network byte order (big-endian) and consists of

  1. x - horizontal component of the gaze location in pixels within the scene camera’s coordinate system. The value is encoded as a 32-bit float.

  2. y - vertical component of the gaze location in pixels within the scene camera’s coordinate system. The value is encoded as a 32-bit float.

  3. worn - a boolean indicating whether the user is wearing the device. The value is encoded as an unsigned 8-bit integer as either 255 (device is being worn) or 0 (device is not being worn).

Each RTP packet contains one gaze datum and therefore has a payload length of 9 bytes (4 + 4 + 1).
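Given this layout, a 9-byte payload can be decoded with Python's struct module. This is a minimal sketch; GazeDatum and decode_gaze are illustrative names, not the library's own types.

```python
import struct
from typing import NamedTuple


class GazeDatum(NamedTuple):
    x: float
    y: float
    worn: bool


# "!" = network byte order, "f" = 32-bit float, "B" = unsigned 8-bit integer.
GAZE_FORMAT = struct.Struct("!ffB")


def decode_gaze(payload: bytes) -> GazeDatum:
    """Decode one gaze datum from a 9-byte RTP payload."""
    if len(payload) != GAZE_FORMAT.size:
        raise ValueError(f"expected {GAZE_FORMAT.size} bytes, got {len(payload)}")
    x, y, worn = GAZE_FORMAT.unpack(payload)
    return GazeDatum(x, y, worn == 255)
```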

See also

The Realtime Python API exposes gaze data via pupil_labs.realtime_api.streaming.gaze.RTSPGazeStreamer.receive().

Decoding Video Data

Video frames are split across multiple RTP packets. The payload is wrapped in the additional Network Abstraction Layer (NAL). This allows finding frame boundaries across fragmented payloads without relying on the RTP meta information.

Once the data is unpacked from the NAL, it can be passed to a corresponding video decoder, e.g. pyav's av.CodecContext.
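As a rough sketch of what such unpacking can look like, the code below assumes the stream follows the H.264 RTP payload format of RFC 6184 and handles only single NAL unit packets and FU-A fragments. The source does not confirm that the library works this way; the function name is hypothetical, and packet loss and other packet types are not handled.

```python
START_CODE = b"\x00\x00\x00\x01"  # Annex B start code expected by many decoders
FU_A = 28  # NAL unit type of FU-A fragmentation units


def depacketize_h264(payloads):
    """Yield complete NAL units, prefixed with a start code, from RTP payloads."""
    fragments = None  # buffer of the NAL unit currently being reassembled
    for payload in payloads:
        nal_type = payload[0] & 0x1F
        if 1 <= nal_type <= 23:
            # Single NAL unit packet: the payload is the NAL unit itself.
            yield START_CODE + payload
        elif nal_type == FU_A and len(payload) >= 2:
            fu_indicator, fu_header = payload[0], payload[1]
            if fu_header & 0x80:  # start bit: first fragment
                # Rebuild the original NAL header from indicator and FU header.
                nal_header = (fu_indicator & 0xE0) | (fu_header & 0x1F)
                fragments = bytearray([nal_header])
            if fragments is None:
                continue  # fragment without a start; skip it
            fragments += payload[2:]
            if fu_header & 0x40:  # end bit: last fragment
                yield START_CODE + bytes(fragments)
                fragments = None
```

The yielded byte strings are in a form that can be fed to a decoder's parser one after another.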


The video decoder needs to be initialized with the sprop-parameter-sets exposed via the RTSP DESCRIBE method.

See also

The Realtime Python API implements the NAL unpacking here

Service discovery in the local network

To avoid having to manually copy the IP address from the Pupil Invisible Companion user interface, the application announces its REST API endpoint via multicast DNS service discovery. Specifically, it announces a service of type _http._tcp.local. and uses the following naming pattern:

PI monitor:<phone name>:<phone hardware id>._http._tcp.local.

The client’s pupil_labs.realtime_api.discovery module uses the zeroconf Python package under the hood to discover services.
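If you run your own mDNS browser, you can recognize Companion devices by matching the naming pattern above. The sketch below only parses an announced name and does not perform discovery itself; parse_service_name and the sample names are hypothetical.

```python
SERVICE_TYPE = "_http._tcp.local."
NAME_PREFIX = "PI monitor:"


def parse_service_name(full_name: str):
    """Return (phone name, hardware id), or None if the name does not match."""
    suffix = "." + SERVICE_TYPE
    if not full_name.startswith(NAME_PREFIX) or not full_name.endswith(suffix):
        return None
    instance = full_name[len(NAME_PREFIX) : -len(suffix)]
    # Split at the last colon so that colons in the phone name are preserved.
    phone_name, separator, hardware_id = instance.rpartition(":")
    if not separator:
        return None
    return phone_name, hardware_id
```

Other services of type _http._tcp.local., such as printers or routers, yield None and can be ignored.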