Under The Hood#
This guide explains how the Pupil Labs Realtime API works on the wire and how this client library abstracts away some of the complexities of the underlying protocols.
HTTP REST API#
The Pupil Invisible Companion app hosts an HTTP REST API that can be used to query the phone’s current state, remote control it, and look up information about available data streams.
By default, the API is hosted at http://pi.local:8080/. The app will fall back to a different DNS name and/or port if the default values are already taken by another app. The current connection details can be looked up in the app’s main menu → Streaming. Alternatively, you can use Service discovery in the local network to find available devices.
Note
The device serves the built-in monitor web app (to be released soon!) at the document root /. The API is served under the /api path. You can find the full OpenAPI 3 specification of the REST API here.
Start/stop/cancel recordings#
By sending HTTP POST requests to the /api/recording:* endpoints, you can start, stop, and cancel recordings.
POST /api/recording:start - Starts a recording if possible
POST /api/recording:stop_and_save - Stops and saves the running recording if possible
POST /api/recording:cancel - Stops and discards the running recording if possible
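As a sketch of how these endpoints can be driven from Python, the snippet below POSTs to each recording endpoint using only the standard library. The host and port are the defaults mentioned above and may differ on your device; the shape of the JSON response body is not documented here, so the helpers simply return the parsed JSON.

```python
# Minimal sketch: control recordings via the REST API.
# Assumes the default address http://pi.local:8080 (verify via the
# app's main menu -> Streaming, or via service discovery).
import json
import urllib.request

BASE_URL = "http://pi.local:8080/api"


def _post(path: str) -> dict:
    """POST to the given API path and return the parsed JSON response."""
    request = urllib.request.Request(f"{BASE_URL}{path}", method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def start_recording() -> dict:
    return _post("/recording:start")


def stop_and_save_recording() -> dict:
    return _post("/recording:stop_and_save")


def cancel_recording() -> dict:
    return _post("/recording:cancel")
```

Note that the app may reject a start request for the reasons listed in the Attention box below; a rejected request surfaces here as an HTTP error.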
Attention
In specific situations, the app will not comply with the request to start a new recording:
the selected template has required fields
the available storage amount is too low
the device battery is too low
no wearer has been selected
no workspace has been selected
the setup bottom sheets have not been completed
See also
Simple blocking implementations
Send events#
By sending HTTP POST requests to the /api/event endpoint, you can send labeled events to the device. Events are timestamped on reception. Alternatively, you can provide a Unix-epoch timestamp in nanoseconds; this is recommended if you want to control the timing of the event.
POST /api/event - Sends an event to the device
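A sketch of sending an event with an optional client-side timestamp is shown below. The payload keys (`name`, `timestamp`) follow the description above; the default address is an assumption, and the response shape is returned as-is.

```python
# Minimal sketch: send a labeled event to the device, optionally with
# an explicit Unix-epoch timestamp in nanoseconds.
import json
import time  # used in the usage example below
import urllib.request
from typing import Optional

API_EVENT_URL = "http://pi.local:8080/api/event"  # assumed default address


def send_event(name: str, timestamp_unix_ns: Optional[int] = None) -> dict:
    payload = {"name": name}
    if timestamp_unix_ns is not None:
        # Providing our own timestamp avoids network-latency jitter.
        payload["timestamp"] = timestamp_unix_ns
    request = urllib.request.Request(
        API_EVENT_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Usage: timestamp the event on the client to control its timing:
# send_event("stimulus onset", timestamp_unix_ns=time.time_ns())
```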
See also
Implementations
Simple blocking: pupil_labs.realtime_api.simple.Device.send_event()
Asynchronous: pupil_labs.realtime_api.device.Device.send_event()
Get Current Status#
By sending an HTTP GET request to the /api/status endpoint, you can receive information about the device’s current status. This includes information about the battery and storage capacities, connected sensors, and running recordings.
GET /api/status - Receive status from device
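A one-function sketch of querying this endpoint, again assuming the default address; consult the OpenAPI specification for the exact schema of the returned document.

```python
# Minimal sketch: fetch the device's current status document
# (battery, storage, sensors, recordings) as parsed JSON.
import json
import urllib.request


def get_status() -> dict:
    url = "http://pi.local:8080/api/status"  # assumed default address
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```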
See also
Asynchronous implementation: pupil_labs.realtime_api.device.Device.get_status()
Websocket API#
In addition to the HTTP REST API above, the Pupil Invisible Companion device also pushes status updates via a websocket connection. It is hosted on the same port as the REST API. By default, you can connect to it via ws://pi.local:8080/api/status.
Tip
You can use this website to test the websocket connection.
The messages published via this connection have the same format as the Get Current Status endpoint.
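The update stream can be consumed with any websocket client; the sketch below uses the third-party `websockets` package (an assumption, not what the client library itself uses) against the default address.

```python
# Sketch: subscribe to status updates pushed by the device.
# Each received message has the same JSON format as GET /api/status.
import asyncio
import json


async def watch_status() -> None:
    # Lazy import so the module loads without the dependency installed.
    import websockets  # third-party: pip install websockets

    async with websockets.connect("ws://pi.local:8080/api/status") as ws:
        async for message in ws:
            update = json.loads(message)
            print(update)


# Usage:
# asyncio.run(watch_status())
```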
Streaming API#
The Pupil Invisible Companion app uses the RTSP protocol (RFC 2326) to stream scene video and gaze data. Under the hood, communication is three-fold:
RTSP (Real-Time Streaming Protocol) - Provides metadata about the corresponding stream
RTP (Real-time Transport Protocol) - Data delivery channel that contains the actual payloads
RTCP (RTP Control Protocol) - Provides absolute time information to align multiple streams
The necessary connection information is made available via the Sensor model as part of the Get Current Status and Websocket API.
The RTSP connection URL follows this pattern:
rtsp://<ip>:<port>/?<params>
Caution
Each stream is available via two connection types:
DIRECT - direct RTSP connection, as described in this document
WEBSOCKET - tunneling RTSP over a websocket connection to make it available to web browsers
See also
The Realtime Network API exposes this information via pupil_labs.realtime_api.models.Status.direct_world_sensor() and pupil_labs.realtime_api.models.Status.direct_gaze_sensor(), returning pupil_labs.realtime_api.models.Sensor instances.
RTSP#
The Real Time Streaming Protocol, or RTSP, is an application-level protocol for control over the delivery of data with real-time properties.
Of the various methods defined in the RTSP protocol, SETUP and DESCRIBE are particularly important for the transmission of the stream’s meta and connection information.
During the SETUP method, client and server exchange information about their corresponding port numbers for the RTP and RTCP connections.
The DESCRIBE response contains SDP (Session Description Protocol) data describing the following stream attributes (via the media’s rtpmap):
encoding - The encoding of the stream, e.g. H264
clockRate - The clock rate of the stream’s relative clock
For video, it also exposes the sprop-parameter-sets via its format-specific parameters (fmtp). These contain information required to initialize the corresponding video decoder.
Attention
Each stream has its own clock rate. For temporal alignment, the clock offset between the stream’s relative clock and the absolute NTP clock has to be calculated. See RTCP below.
See also
To encode gaze data, a custom encoding called com.pupillabs.gaze1 is used. You can find more information about it below.
RTP#
[The real-time transport protocol] provides end-to-end network transport functions suitable for applications transmitting real-time data, such as audio, video or simulation data, over multicast or unicast network services. […] The data transport is augmented by a control protocol (RTCP) […]. RTP and RTCP are designed to be independent of the underlying transport and network layers.
Payloads can be split across multiple RTP packets. Their order can be identified via the packet header’s sequence number. Packets belonging to the same payload have the same timestamp. The payloads can be decoded individually. See Decoding Gaze Data and Decoding Video Data below.
See also
Read more about the RTP timestamp mechanism here.
See also
The Realtime Python API exposes raw RTP data via pupil_labs.realtime_api.streaming.base.RTSPRawStreamer.receive() and calculates relative RTP packet timestamps in pupil_labs.realtime_api.streaming.base._WallclockRTSPReader.relative_timestamp_from_packet().
RTCP#
The most important role that the RTP control protocol plays for the Pupil Labs Realtime Network API is to provide timestamps in relative stream time and in absolute NTP time (SR RTCP Packet type).
Relative timestamps are calculated by dividing the packet timestamp (numerator) by the clock rate (denominator), e.g. a timestamp of 250 at a clock rate of 50 Hz corresponds to 250 / 50 = 5 seconds.
Wallclock time (absolute date and time) is represented using the timestamp format of the Network Time Protocol (NTP), which is in seconds relative to 1 January 1900 00:00:00 UTC. The full resolution NTP timestamp is a 64-bit unsigned fixed-point number with the integer part in the first 32 bits and the fractional part in the last 32 bits.
Source: https://datatracker.ietf.org/doc/html/rfc3550#section-4
Knowing a pair of corresponding time points in both clocks, the relative and the absolute one, allows calculating the clock offset between the two by subtracting one from the other. The offset is then added to new relative timestamps to get the corresponding absolute time.
Attention
The Realtime Python API converts absolute NTP timestamps to nanoseconds in Unix epoch (time since 1 January 1970 00:00:00 UTC). This corresponds to the same time base and unit returned by time.time_ns().
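The NTP-to-Unix conversion and the offset arithmetic described above can be sketched as follows. The function names are illustrative, not the client library's API; the constant is the well-known 70-year gap between the NTP epoch (1900) and the Unix epoch (1970).

```python
# Sketch: convert a 64-bit NTP timestamp (from an RTCP sender report)
# to nanoseconds in Unix epoch, and compute the clock offset between a
# stream's relative clock and wallclock time.

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX_OFFSET_S = 2_208_988_800


def ntp_to_unix_ns(ntp_timestamp: int) -> int:
    """Split the 64-bit fixed-point value into integer and fractional seconds."""
    seconds = ntp_timestamp >> 32
    fraction = ntp_timestamp & 0xFFFFFFFF
    unix_seconds = seconds - NTP_TO_UNIX_OFFSET_S
    return unix_seconds * 1_000_000_000 + (fraction * 1_000_000_000) // 2**32


def clock_offset_ns(relative_ns: int, absolute_unix_ns: int) -> int:
    """Offset computed from one matched pair of clock readings."""
    return absolute_unix_ns - relative_ns


# A new relative timestamp maps to wallclock time by adding the offset:
# absolute_ns = relative_ns + offset_ns
```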
Decoding Gaze Data#
Gaze data is encoded in network byte order (big-endian) and consists of:
x - horizontal component of the gaze location in pixels within the scene camera’s coordinate system, encoded as a 32-bit float
y - vertical component of the gaze location in pixels within the scene camera’s coordinate system, encoded as a 32-bit float
worn - a boolean indicating whether the user is wearing the device, encoded as an unsigned 8-bit integer: either 255 (device is being worn) or 0 (device is not being worn)
Each RTP packet contains one gaze datum and therefore has a payload length of 9 bytes.
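The layout above maps directly onto a `struct` format string: two big-endian 32-bit floats followed by one unsigned byte, 9 bytes in total. A sketch of decoding one payload:

```python
# Sketch: decode one 9-byte com.pupillabs.gaze1 payload.
# "!" selects network byte order (big-endian); "ffB" is two 32-bit
# floats (x, y) followed by one unsigned 8-bit integer (worn).
import struct


def decode_gaze(payload: bytes) -> tuple:
    x, y, worn = struct.unpack("!ffB", payload)
    return x, y, worn == 255


# Example with a synthetic payload:
# decode_gaze(struct.pack("!ffB", 512.0, 384.0, 255))
# -> (512.0, 384.0, True)
```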
See also
The Realtime Python API exposes gaze data via pupil_labs.realtime_api.streaming.gaze.RTSPGazeStreamer.receive().
Decoding Video Data#
Video frames are split across multiple RTP packets. The payload is wrapped in additional Network Abstraction Layer (NAL) units. This allows finding frame boundaries across fragmented payloads without relying on the RTP meta information.
Once the data is unpacked from the NAL, it can be passed to a corresponding video decoder, e.g. pyav’s av.CodecContext.
Important
The video decoder needs to be initialized with the sprop-parameter-sets exposed via the RTSP DESCRIBE method.
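One way to perform this initialization is sketched below: the sprop-parameter-sets SDP attribute is a comma-separated list of base64-encoded SPS/PPS NAL units, which can be prefixed with Annex B start codes and fed to the decoder before any frame data. The helper names are illustrative, and feeding the parameter sets via `parse()` is one approach, not necessarily the one the client library uses.

```python
# Sketch: prime an H.264 decoder with the SPS/PPS carried in the SDP's
# sprop-parameter-sets attribute (comma-separated base64 blobs).
import base64


def sprop_to_annex_b(sprop_parameter_sets: str) -> bytes:
    """Decode the SDP attribute into start-code-prefixed NAL units."""
    start_code = b"\x00\x00\x00\x01"
    return b"".join(
        start_code + base64.b64decode(blob)
        for blob in sprop_parameter_sets.split(",")
    )


def make_decoder(sprop_parameter_sets: str):
    # Lazy import so the module loads without PyAV installed.
    import av  # third-party: pip install av

    decoder = av.CodecContext.create("h264", "r")
    # Parsing the parameter sets first lets the decoder learn the
    # stream configuration before frame data arrives.
    decoder.parse(sprop_to_annex_b(sprop_parameter_sets))
    return decoder
```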
See also
The Realtime Python API implements the NAL unpacking here
Service discovery in the local network#
To avoid having to manually copy the IP address from the Pupil Invisible Companion user interface, the application announces its REST API endpoint via multicast DNS service discovery.
Specifically, it announces a service of type _http._tcp.local. and uses the following naming pattern:
PI monitor:<phone name>:<phone hardware id>._http._tcp.local.
See also
The service name is exposed via
The phone name component is exposed via
The phone hardware id component is exposed via
The client’s pupil_labs.realtime_api.discovery module uses the zeroconf Python package under the hood to discover services.
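As a sketch of what such discovery looks like with `zeroconf` directly, the listener below browses for _http._tcp.local. services and filters on the "PI monitor" name prefix from the naming pattern above. The class is duck-typed against zeroconf's listener interface; this is an illustration, not the client library's implementation.

```python
# Sketch: discover Companion devices via mDNS, filtering announced
# _http._tcp.local. services by the "PI monitor" name prefix.
SERVICE_TYPE = "_http._tcp.local."
NAME_PREFIX = "PI monitor"


class CompanionListener:
    """Duck-typed zeroconf listener collecting matching service names."""

    def __init__(self) -> None:
        self.devices = []

    def add_service(self, zeroconf, service_type, name) -> None:
        if name.startswith(NAME_PREFIX):
            info = zeroconf.get_service_info(service_type, name)
            if info is not None:
                self.devices.append(name)
                print(f"Found {name} on port {info.port}")

    def update_service(self, zeroconf, service_type, name) -> None:
        pass

    def remove_service(self, zeroconf, service_type, name) -> None:
        pass


# Usage (requires `pip install zeroconf`):
# from zeroconf import Zeroconf, ServiceBrowser
# browser = ServiceBrowser(Zeroconf(), SERVICE_TYPE, CompanionListener())
```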