Within the SODALITE H2020 project, one of the areas we are focusing on is the deployment of optimized containers with accelerator-specific ML models for Edge-based inference across heterogeneous Edge Gateways in a Connected Vehicle Fleet. The main motivation for this is three-fold:
- While we can benefit from Cloud and HPC resources for base model training (e.g. TensorFlow), adaptations for specific accelerators are still required (e.g. preparing derivative TFLite models for execution on a GPU or EdgeTPU).
- The lifecycle of a vehicle far exceeds that of a specific Cloud service, meaning that we cannot make assumptions about the environment we are deploying into over time.
- Different services may have a stronger need for a specific type of accelerator (e.g. GPU, FPGA), meaning that an existing service may need to be re-scheduled and re-deployed onto another available resource on a best-fit basis.
Node Labelling in Kubernetes
While there is existing work on node feature labelling in heterogeneous Kubernetes clusters, the most prominent being the official node feature discovery project, support for Edge Gateways, which are typically embedded SBCs (either as a standalone blackbox, or as part of a pre-existing In-Vehicle Infotainment (IVI) system), has been found to be somewhat lacking. In the case of the official NVIDIA GPU device plugin, for example, detection of the GPU requires use of the NVIDIA Management Library (NVML), which, in turn, assumes an enumerable PCI bus. Jetson Nano users with an integrated GPU are therefore simply out of luck at the moment. Others, such as the Coral Dev Board, provide for enumeration of the EdgeTPU via the PCI bus, but do not yet provide a specific device plugin to manage and expose the EdgeTPU device. In both of these cases, we can work around these limitations by tagging the node with platform-specific properties / device capabilities and deploying a container with a targeted accelerator-specific run-time environment.
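As a point of reference, this kind of workaround can be as simple as manually labelling the node with kubectl (the node name here is a placeholder, and the label mirrors the convention used by the labeller described below):
$ kubectl label nodes jetson-nano-01 beta.devicetree.org/nvidia-gm20b=1
A Pod can then be steered to the node by matching on that label with a nodeSelector, as shown later in this post. Doing this by hand for every Gateway does not scale, however, which is what motivates the automated approach below.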
Enter Device Tree
A common feature across most of these Edge Gateways is the existence of a semi-standard devicetree blob (DTB), which exposes a static description of hardware and its corresponding topology through a special tree structure. While a comprehensive explanation of the devicetree is out of scope for this post, those that are so inclined can read through the specification here. A general overview of the devicetree and its node structure is as follows:
/dts-v1/;

/ {
    nvidia,fastboot-usb-pid = <0xb442>;
    compatible = "nvidia,jetson-nano", "nvidia,tegra210";
    nvidia,proc-boardid = "3448";
    nvidia,pmu-boardid = "3448";
    serial-number = "xxxxxxxxxxxxxxxxx";
    nvidia,dtbbuildtime = "Jul 16 2019", "17:09:35";
    model = "NVIDIA Jetson Nano Developer Kit";
    ...
    gpu {
        compatible = "nvidia,tegra210-gm20b", "nvidia,gm20b";
        access-vpr-phys;
        resets = <0x21 0xb8>;
        status = "okay";
        interrupts = <0x0 0x9d 0x4 0x0 0x9e 0x4>;
        reg = <0x0 0x57000000 0x0 0x1000000 0x0 0x58000000 0x0 0x1000000 0x0 0x538f0000 0x0 0x1000>;
        iommus = <0x2b 0x1f>;
        reset-names = "gpu";
        nvidia,host1x = <0x78>;
        interrupt-names = "stall", "nonstall";
    };
    ...
While, in general, the model property of the root node should provide us with a unique identifier for the specific model of the system board in a standard manufacturer,model format, the specification unfortunately opts to take the easy way out and only recommends (rather than mandates) this format. This watering down of the specification means that we are, unfortunately, unable to use the model property as a consistent source for node labelling, and must fall back on the compatible properties instead — while the specification provides no firm requirements here either, these are at least implicitly forced to adopt a standard convention in order to match the Linux kernel naming conventions.
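As an illustration (not part of the labeller itself), the root-node compatible property can be inspected directly on the node. It is stored as a NUL-separated list of strings, and /proc/device-tree is a commonly available symlink to /sys/firmware/devicetree/base:
$ tr '\0' '\n' < /proc/device-tree/compatible
nvidia,jetson-nano
nvidia,tegra210
These are exactly the strings that we would like to surface as node labels.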
Generating Node Labels from DeviceTree Properties
In order to generate node labels from DeviceTree properties, we developed a custom Kubernetes controller, k8s-dt-node-labeller, specifically for this purpose.
Given the lack of consistency of the model encoding, as mentioned above, the approach taken by our DeviceTree node labeller is therefore to iterate over the compatible properties within the root node, as well as within any designated children for which we wish to expose labels — such as, in the Jetson Nano case, the gpu node. A simple dry-run on the node with the children of interest defined demonstrates the labels that will be generated:
$ k8s-dt-node-labeller -d -n gpu
Discovered the following devicetree properties:
beta.devicetree.org/nvidia-jetson-nano: 1
beta.devicetree.org/nvidia-tegra210: 1
beta.devicetree.org/nvidia-tegra210-gm20b: 1
beta.devicetree.org/nvidia-gm20b: 1
Deploying the Node Labeller into a Heterogeneous Cluster
The node labeller itself is intended to be deployed into a heterogeneous cluster as a DaemonSet, so that a labelling Pod runs on each eligible node.
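As a rough sketch, a DaemonSet definition along the following lines could be used. The image name, namespace, and ServiceAccount below are illustrative assumptions rather than the project's actual manifest; the ServiceAccount would need RBAC permission to update node objects, and the Pod runs privileged in order to read /sys/firmware (see Limitations below):
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: k8s-dt-node-labeller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: k8s-dt-node-labeller
  template:
    metadata:
      labels:
        name: k8s-dt-node-labeller
    spec:
      # ServiceAccount with permission to update node labels (assumed to exist)
      serviceAccountName: k8s-dt-node-labeller
      # Constrain scheduling to architectures known to expose a devicetree
      nodeSelector:
        kubernetes.io/arch: arm64
      containers:
      - name: k8s-dt-node-labeller
        # Illustrative image name - substitute the actual labeller image
        image: adaptant/k8s-dt-node-labeller
        # Additionally generate labels for the 'gpu' child node, as in the dry-run above
        args: [ "-n", "gpu" ]
        securityContext:
          # Privileged mode is currently required for /sys/firmware access
          privileged: true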
Targeted Pod Placement
Once the labeller is up and running, it’s now possible to target specific Gateways or Gateway + accelerator pairs. To target the Jetson Nano, for example, the model-specific beta.devicetree.org/nvidia-jetson-nano label can be used as the basis for node selection. To target the specific GPU, beta.devicetree.org/nvidia-gm20b can be used. To further constrain the selection, multiple labels can be used together to define the selection basis.
Using an HTTP echo server as a simple deployment example, a targeted Pod description for a Jetson Nano with a GM20B GPU can be written as follows:
apiVersion: v1
kind: Pod
metadata:
  name: http-echo-gpu-pod
  labels:
    app: http-echo
spec:
  containers:
  - name: http-echo
    image: adaptant/http-echo
    imagePullPolicy: IfNotPresent
    args: [ "-text", "hello from a Jetson Nano with an NVIDIA GM20B GPU" ]
    ports:
    - containerPort: 5678
  nodeSelector:
    beta.devicetree.org/nvidia-jetson-nano: "1"
    beta.devicetree.org/nvidia-gm20b: "1"
Which can then be exposed through a simple service definition, as follows:
kind: Service
apiVersion: v1
metadata:
  name: http-echo-service
spec:
  type: NodePort
  selector:
    app: http-echo
  ports:
  - port: 5678
    protocol: TCP
    name: http
For testing from outside the cluster, we can further forward the service port locally:
$ kubectl port-forward service/http-echo-service 5678:5678
and demonstrate connectivity to the appropriate node:
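For illustration, assuming the adaptant/http-echo image simply echoes back the configured text, a request through the forwarded port would look along these lines:
$ curl http://localhost:5678
hello from a Jetson Nano with an NVIDIA GM20B GPU
A response containing the Nano-specific text confirms that the Pod has been placed on the intended node.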
Next Steps
Specific base container run-time environments for the different accelerator types are being prepared separately and will be made available during later stages of the SODALITE project.
For an example of getting started with an NVIDIA GPU container runtime targeting the Jetson Nano, please refer to the official guidance from NVIDIA here. The scheduling of the Pod within the cluster can be carried out using the aforementioned node selection criteria and Pod template.
Limitations
- In order to walk the devicetree, access to the node’s /sys/firmware directory is required — this is presently enabled by running the Pod in privileged mode. It may be possible to leverage allowedProcMountTypes to disable path masking within the Pod and run without privileged mode, but this has not yet been verified.
- At present there is no mechanism by which a DaemonSet can gracefully terminate without triggering a restart, due to the Pod RestartPolicy being forced to Always in DaemonSet Pod specifications. This means that, at the moment, the initial node selector for the DaemonSet must constrain itself to nodes that are known to be DeviceTree-capable in order to avoid spurious restarts. This has not been an issue yet when targeting primarily arm64 and armhf nodes, but could be problematic for other architectures.
- While the labeller can attest to the existence of a node in the devicetree, it offers no detailed device-specific information or control - all of which would need to be implemented through a fit-for-purpose device plugin (or baked into the container runtime, as in the GPU case). The labeller can, however, be used as a basis for scheduling device plugins on nodes with matching capabilities.