Under The Hood#
This guide explains how the Pupil Labs Realtime API works on the wire and how this client library abstracts away some of the complexities of the underlying protocols.
HTTP REST API#
The Pupil Invisible Companion app hosts an HTTP REST API that can be used to query the phone’s current state, remote control it, and look up information about available data streams.
By default, the API is hosted at http://pi.local:8080/. The app will fall back to a different DNS name and/or port if the default values are already taken by another app. The current connection details can be looked up under the app’s main menu → Streaming. Alternatively, you can use Service discovery in the local network to find available devices.
The device serves the built-in monitor web app (to be released soon!) at the root path /. The API is served under the /api path.
You can find the full OpenAPI 3 specification of the REST API
By sending HTTP POST requests to the /api/recording:* endpoints, you can start, stop, and cancel recordings:
POST /api/recording:start - Starts a recording if possible
POST /api/recording:stop_and_save - Stops and saves the running recording if possible
POST /api/recording:cancel - Stops and discards the running recording if possible
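The three endpoints above can be exercised with nothing but the Python standard library. The following is a minimal sketch, not the client library's implementation: the helper names `build_recording_request` and `send` are hypothetical, the base URL is the default address mentioned earlier, and the request is assumed to need no body. The shape of the response is not described here, so the sketch returns the raw status code and bytes.

```python
import urllib.request

# Default address from this guide; the app may fall back to another name or port.
DEFAULT_BASE_URL = "http://pi.local:8080"
RECORDING_ACTIONS = ("start", "stop_and_save", "cancel")


def build_recording_request(action, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) a POST request for /api/recording:<action>."""
    if action not in RECORDING_ACTIONS:
        raise ValueError(f"unknown recording action: {action!r}")
    url = f"{base_url}/api/recording:{action}"
    return urllib.request.Request(url, method="POST")


def send(request, timeout=5.0):
    """Send a prepared request and return (status code, raw response body).

    urllib raises urllib.error.HTTPError for 4xx/5xx responses, so callers
    should be prepared to catch it when the app declines a request.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read()
```

With a device reachable on the network, `send(build_recording_request("start"))` would issue the start request. Keeping request construction separate from sending makes the URL logic easy to check without a device.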
In specific situations, the app will not comply with the request to start a new recording:
the selected template has required fields
the available storage amount is too low
the device battery is too low
no wearer has been selected
no workspace has been selected
the setup bottom sheets have not been completed
simple blocking implementations
By sending HTTP POST requests to the /api/event endpoint, you can send labeled events to the device.
Events will be timestamped on reception. Alternatively, you can provide a Unix-epoch
timestamp in nanoseconds. This is recommended if you want to control the timing of the event.
POST /api/event - Sends an event to the device
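As a rough illustration of sending an event with a client-side timestamp, the sketch below builds the POST request with the standard library. The JSON field names `name` and `timestamp`, the JSON content type, and the helper name `build_event_request` are assumptions made for this example; the OpenAPI specification is the authoritative source for the request schema.

```python
import json
import time
import urllib.request


def build_event_request(name, timestamp_unix_ns=None, base_url="http://pi.local:8080"):
    """Build (but do not send) a POST request for /api/event.

    The payload keys "name" and "timestamp" are assumed, not taken from
    this guide. If no timestamp is given, the key is omitted so that the
    device timestamps the event on reception.
    """
    payload = {"name": name}
    if timestamp_unix_ns is not None:
        payload["timestamp"] = int(timestamp_unix_ns)
    return urllib.request.Request(
        f"{base_url}/api/event",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Capture the time as close to the event as possible, then build the request.
event_time_ns = time.time_ns()  # nanoseconds since the Unix epoch
request = build_event_request("stimulus onset", timestamp_unix_ns=event_time_ns)
```

Taking the timestamp before building and sending the request keeps network latency out of the recorded event time, which is the reason for supplying a timestamp in the first place.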
Get Current Status#
By sending an HTTP GET request to the /api/status endpoint, you can receive information about the device’s
current status. This includes information about the battery and storage capacities,
connected sensors, and running recordings.
GET /api/status - Receive status from device
In addition to the HTTP REST API above, the Pupil Invisible Companion device also
pushes status updates via a websocket connection. It is
hosted on the same port as the REST API. By default, you can connect to it via
You can use this website to test the websocket connection.
The messages published via this connection have the same format as the Get Current Status endpoint.
The Pupil Invisible Companion app uses the RTSP protocol (RFC 2326) to stream scene video and gaze data. Under the hood, communication is three-fold:
RTSP (Real Time Streaming Protocol) - Provides metadata about the corresponding stream
RTP (Real-time Transport Protocol) - Data delivery channel, contains actual payloads
RTCP (RTP Control Protocol) - Provides absolute time information to align multiple streams
The RTSP connection URL has the following pattern:
Each stream is available via two connection types:
DIRECT - direct RTSP connection, as described in this document
WEBSOCKET - tunneling RTSP over a websocket connection to make it available to web browsers
The Realtime Network API exposes this information via
The Real Time Streaming Protocol, or RTSP, is an application-level protocol for control over the delivery of data with real-time properties.
encoding - The encoding of the stream, e.g.
clockRate - The clock rate of the stream’s relative clock
Each stream has its own clock rate. For temporal alignment, the clock offset between the stream’s relative clock and the absolute NTP clock has to be calculated. See RTCP below.
To encode gaze data, a custom encoding called com.pupillabs.gaze1 is used. You can find more information about it below.
[The real-time transport protocol] provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. […] The data transport is augmented by a control protocol (RTCP) […]. RTP and RTCP are designed to be independent of the underlying transport and network layers.
Payloads can be split across multiple RTP packets. Their order can be identified via the packet header’s sequence number. Packets belonging to the same payload have the same timestamp. The payloads can be decoded individually. See Decoding Gaze Data and Decoding Video Data below.
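The grouping rule described above (same timestamp, ordered by sequence number) can be sketched as follows. The fixed 12-byte header layout comes from RFC 3550; the names `RtpPacket`, `parse_rtp_packet`, and `group_by_timestamp` are hypothetical and are not part of the Realtime Python API. The sketch ignores header extensions, padding, and 16-bit sequence number wraparound, all of which a robust receiver would need to handle.

```python
import struct
from collections import defaultdict
from typing import NamedTuple


class RtpPacket(NamedTuple):
    sequence_number: int
    timestamp: int
    payload: bytes


def parse_rtp_packet(data: bytes) -> RtpPacket:
    """Parse the fixed RTP header (RFC 3550, section 5.1) of a raw packet."""
    if len(data) < 12:
        raise ValueError("RTP packet is shorter than the 12-byte fixed header")
    first_byte, _marker_and_type, sequence_number, timestamp, _ssrc = struct.unpack(
        "!BBHII", data[:12]
    )
    csrc_count = first_byte & 0x0F  # low 4 bits: number of 4-byte CSRC entries
    header_length = 12 + 4 * csrc_count
    return RtpPacket(sequence_number, timestamp, data[header_length:])


def group_by_timestamp(packets):
    """Map each RTP timestamp to its packet payloads in sequence-number order."""
    groups = defaultdict(list)
    for packet in packets:
        groups[packet.timestamp].append(packet)
    return {
        timestamp: [p.payload for p in sorted(group, key=lambda p: p.sequence_number)]
        for timestamp, group in groups.items()
    }
```

The function returns the ordered fragments rather than concatenating them, because how fragments are joined depends on the stream's encoding.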
Read more about the RTP timestamp mechanism here.
The Realtime Python API exposes raw RTP data via
calculates relative RTP packet timestamps in
The most important role that the RTP control protocol plays for the Pupil Labs Realtime Network API is to provide timestamps in relative stream time and in absolute NTP time (SR RTCP Packet type).
Relative timestamps are calculated by dividing the packet timestamp (numerator) by the
clock rate (denominator), e.g. a timestamp of 250 at a clock rate of 50 Hz corresponds
to 250 / 50 = 5 seconds.
Wallclock time (absolute date and time) is represented using the timestamp format of the Network Time Protocol (NTP), which is in seconds relative to 1 January 1900 00:00:00 UTC. The full resolution NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits.
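To make the fixed-point layout concrete, here is a small sketch that splits a 64-bit NTP timestamp into its integer and fractional parts and expresses the result in nanoseconds since the Unix epoch. The function name `ntp_to_unix_ns` is hypothetical; the constant 2,208,988,800 is the number of seconds between 1 January 1900 and 1 January 1970.

```python
NTP_TO_UNIX_EPOCH_SECONDS = 2_208_988_800  # seconds from 1900-01-01 to 1970-01-01
NANOSECONDS_PER_SECOND = 1_000_000_000


def ntp_to_unix_ns(ntp_timestamp: int) -> int:
    """Convert a 64-bit NTP timestamp to nanoseconds since the Unix epoch."""
    seconds = ntp_timestamp >> 32          # upper 32 bits: whole seconds since 1900
    fraction = ntp_timestamp & 0xFFFFFFFF  # lower 32 bits: fraction in units of 2**-32 s
    fraction_ns = (fraction * NANOSECONDS_PER_SECOND) >> 32
    return (seconds - NTP_TO_UNIX_EPOCH_SECONDS) * NANOSECONDS_PER_SECOND + fraction_ns
```

Using integer arithmetic throughout avoids the precision loss that a 64-bit float would introduce at nanosecond resolution.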
Knowing the same point in time on both clocks, the relative and the absolute one, allows one to calculate the offset between the two clocks by subtracting the relative time from the absolute time. The offset is then added to new relative timestamps to get the corresponding absolute time.
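The offset calculation can be sketched in a few lines. The function names below are hypothetical, times are kept in seconds for readability, and 32-bit RTP timestamp wraparound is not handled. The pair of corresponding time points is assumed to come from an RTCP sender report.

```python
def relative_seconds(rtp_timestamp: int, clock_rate: int) -> float:
    """Relative stream time: packet timestamp divided by the clock rate."""
    return rtp_timestamp / clock_rate


def clock_offset_seconds(sr_rtp_timestamp: int, sr_absolute_seconds: float, clock_rate: int) -> float:
    """Offset between the absolute clock and the stream's relative clock."""
    return sr_absolute_seconds - relative_seconds(sr_rtp_timestamp, clock_rate)


def to_absolute_seconds(rtp_timestamp: int, clock_rate: int, offset_seconds: float) -> float:
    """Map a new relative RTP timestamp onto the absolute clock."""
    return relative_seconds(rtp_timestamp, clock_rate) + offset_seconds


# A sender report pairs RTP timestamp 250 (50 Hz clock) with absolute time 1005.0 s.
offset = clock_offset_seconds(250, 1005.0, clock_rate=50)
# A later packet with RTP timestamp 300 is one second later on both clocks.
absolute = to_absolute_seconds(300, clock_rate=50, offset_seconds=offset)
```

In this example the relative time of the report is 5 s, so the offset is 1000 s and the later packet maps to 1006 s.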
The Realtime Python API converts absolute NTP timestamps to nanoseconds since the Unix
epoch (1 January 1970 00:00:00 UTC). This corresponds to the same
time base and unit returned by
Decoding Gaze Data#
Gaze data is encoded in network byte order (big-endian) and consists of:
x - horizontal component of the gaze location in pixels within the scene camera’s coordinate system. The value is encoded as a 32-bit float.
y - vertical component of the gaze location in pixels within the scene camera’s coordinate system. The value is encoded as a 32-bit float.
worn - a boolean indicating whether the user is wearing the device. The value is encoded as an unsigned 8-bit integer as either 255 (device is being worn) or 0 (device is not being worn).
Each RTP packet contains one gaze datum and therefore has a payload length of 9 bytes.
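The layout described above maps directly onto Python's `struct` module: two big-endian 32-bit floats followed by one unsigned byte. The following sketch uses hypothetical names (`GazeDatum`, `decode_gaze`) and is not the Realtime Python API's own decoder.

```python
import struct
from typing import NamedTuple

# "!" selects network byte order without padding, "f" is a 32-bit float,
# "B" is an unsigned 8-bit integer: 4 + 4 + 1 = 9 bytes in total.
GAZE_STRUCT = struct.Struct("!ffB")


class GazeDatum(NamedTuple):
    x: float    # horizontal gaze position in scene camera pixels
    y: float    # vertical gaze position in scene camera pixels
    worn: bool  # True if the device is being worn


def decode_gaze(payload: bytes) -> GazeDatum:
    """Decode one 9-byte gaze payload taken from an RTP packet."""
    if len(payload) != GAZE_STRUCT.size:
        raise ValueError(f"expected {GAZE_STRUCT.size} bytes, got {len(payload)}")
    x, y, worn = GAZE_STRUCT.unpack(payload)
    return GazeDatum(x, y, worn == 255)
```

Checking the payload length up front gives a clear error if a packet from a different stream, or a fragmented payload, is passed in by mistake.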
The Realtime Python API exposes gaze data via
Decoding Video Data#
Video frames are split across multiple RTP packets. The payload is wrapped in an additional Network Abstraction Layer (NAL). This allows finding frame boundaries across fragmented payloads without relying on the RTP meta information.
Once the data is unpacked from the NAL, it can be passed to a corresponding video
The Realtime Python API implements the
NAL unpacking here
Service discovery in the local network#
To avoid having to manually copy the IP address from the Pupil Invisible Companion user
interface, the application announces its REST API endpoint via multicast DNS service discovery.
Specifically, it announces a service of type _http._tcp.local. and uses the following naming pattern:
PI monitor:<phone name>:<phone hardware id>._http._tcp.local.
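A service name following this pattern can be split into its components with plain string handling. The sketch below uses a hypothetical helper, `parse_service_name`, and assumes that the hardware id itself contains no colon, so the last colon separates the phone name from the hardware id.

```python
from typing import NamedTuple

SERVICE_PREFIX = "PI monitor:"
SERVICE_SUFFIX = "._http._tcp.local."


class DiscoveredDevice(NamedTuple):
    phone_name: str
    hardware_id: str


def parse_service_name(full_name: str) -> DiscoveredDevice:
    """Split 'PI monitor:<phone name>:<phone hardware id>._http._tcp.local.'."""
    if not (full_name.startswith(SERVICE_PREFIX) and full_name.endswith(SERVICE_SUFFIX)):
        raise ValueError(f"not a PI monitor service name: {full_name!r}")
    body = full_name[len(SERVICE_PREFIX):-len(SERVICE_SUFFIX)]
    # Assumption: the hardware id has no colon, so split on the last one.
    phone_name, separator, hardware_id = body.rpartition(":")
    if not separator or not phone_name or not hardware_id:
        raise ValueError(f"missing phone name or hardware id: {full_name!r}")
    return DiscoveredDevice(phone_name, hardware_id)
```

Filtering on the prefix also helps to ignore unrelated services of the same `_http._tcp.local.` type, such as printers or other web servers on the network.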
The service name is exposed via
The phone name component is exposed via
The phone hardware id component is exposed via