The Microkernel Meets High-Performance Computing

AUTOMOTIVE & IOT / 01.11.24 / Louay Abdelkader

How QNX Software Development Platform 8.0 is changing how we think about microkernel architecture.

Adoption of operating systems for Internet of Things (IoT) devices, and of systems based on the microkernel architecture, is on the rise. The focus on IoT- and microkernel-capable OS platforms is only natural, given the substantial advantages they can provide in safety, security, software isolation, modularity, and portability for critical systems such as vehicles, robotics, and industrial systems. Notable examples of companies putting such OSes at the center of device innovation include Apple, with its incorporation of the Mach microkernel into OS X, and Google, with its creation of Fuchsia for IoT devices and KataOS for machine learning (ML) applications.

The BlackBerry® QNX® RTOS (real-time OS) reflects several decades of experience with microkernels. From our first commercial QNX offering over 40 years ago to our recent announcement of QNX® OS 8.0, we’ve persistently focused on the capabilities and performance of our microkernel. This blog explains some concepts underlying the current resurgence in OSes based on the microkernel architecture, and shares the recent QNX OS 8.0 architectural updates that support its one-of-a-kind capabilities running on the latest high-performance computing (HPC) hardware.

What Is a Microkernel?

A microkernel is fundamentally an OS design architecture in which the kernel runs in its own address space in memory, also known as kernel space, and provides only the most essential set of services. These typically include memory management, thread scheduling, and communication between software components, including other OS services that sit outside kernel space. The kernel is intentionally kept small and contains only critical OS functions. Device drivers, networking, graphics, and filesystems each run in their own dedicated address space.

The main advantage of this design is that it provides a secure, reliable, and modular OS. Additional services provided by a typical OS (graphics, device drivers, filesystems, networking, and more) operate outside the kernel space in user mode. This leaves only a small amount of code within the CPU’s supervisory ring, which significantly limits risk and exposure to security and safety bugs.

In contrast, an OS based on a monolithic kernel has the kernel, networking, filesystems, device drivers, graphics, and other OS services all running in kernel space, where they share the same address space in memory. From a performance standpoint, this design is more favorable in certain conditions, as OS services run within kernel space and there is no context switch when those services communicate with the kernel. However, with all services running in kernel space, this design greatly increases risk and exposure to security and safety bugs: a bug that arises in one OS service, such as networking, could impact the kernel or other OS services. For example, if a security bug associated with memory corruption is triggered in the networking stack, it will likely impact other OS services and the kernel, since the networking stack shares the same memory space. The resulting domino effect at the system level could be significant. It could spiral into the overall system behaving erratically, which, in mission- or safety-critical systems, could put lives at risk.

Traditional Microkernel Attributes

The beauty of microkernel design is its modularity: security, safety, or other bugs in an OS service are contained within that service’s own address space in user space, which limits exposure of the kernel and other OS services.

Outside the kernel, a microkernel design helps develop a robust and modular system. The bulk of the system software, including OS services, runs in user mode, which limits the impact on system integrity if a component fails, as the kernel is running in its own address space. Should a driver, filesystem, networking stack, or other system service fail on a microkernel system, that service crashes immediately, with limited impact on the kernel and other OS services, and that’s a good thing! The kernel, which is the core of any OS, keeps running and keeps the essential functions of the system up. The system can also go into a design-safe state and inform the software team of the specific issue so they can troubleshoot and address it quickly.
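The containment behavior described above can be approximated with ordinary processes. The following is a minimal sketch, not QNX code: it assumes a POSIX host, and the names `SERVICE_SOURCE`, `call_service`, and `supervise` are hypothetical. A stand-in "service" runs in its own process and simulates a memory fault on a bad request, while the supervising process (playing the role of the part of the system that keeps running) records the crash and carries on.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a user-space OS service (for example, a driver).
# A request of "bad" simulates a memory fault by raising SIGSEGV in the service.
SERVICE_SOURCE = """
import os, signal, sys
request = sys.argv[1]
if request == "bad":
    os.kill(os.getpid(), signal.SIGSEGV)
print("handled " + request)
"""


def call_service(request):
    """Run the service in its own process; return (exit status, output)."""
    proc = subprocess.run(
        [sys.executable, "-c", SERVICE_SOURCE, request],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stdout.strip()


def supervise(requests):
    """Survive service crashes: log each one and keep serving requests."""
    log = []
    for request in requests:
        status, output = call_service(request)
        if status < 0:
            # On POSIX, a negative status means the child was killed by a signal.
            name = signal.Signals(-status).name
            log.append(f"service crashed ({name}) on {request!r}; restarting")
        else:
            log.append(output)
    return log


if __name__ == "__main__":
    # A fresh service process is started for every request, so the request
    # after the crash is handled normally.
    for line in supervise(["read", "bad", "write"]):
        print(line)
```

The fault stays inside the service's own address space: the supervisor only observes an abnormal exit status and is free to restart the service or move to a safe state.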

In contrast, monolithic OSes run the kernel and OS services like networking, filesystems, hardware drivers, and graphics in the kernel address space, resulting in a design that can allow bugs to corrupt or crash the kernel and other OS services. This can have a widespread, even catastrophic impact on the overall system.

Additional Microkernel Advantages

Because drivers run in user space, a microkernel-based OS is also essentially hardware-agnostic. Of course, the OS, including the kernel, still supports specific hardware architectures, such as Armv8, Armv9, or x86-64. But for the special-purpose hardware that silicon vendors provide in their implementations of those architectures, the board support package (BSP) runs in user space. This is an excellent benefit for system integrators who want the ability to change the hardware from one vendor to another without having to re-spin the OS. With the microkernel, all that’s needed is a new BSP for the hardware and voilà, you are up and running with no changes to core services, which means minimal changes to the software the system integrator deployed on top of the OS.

An additional benefit is the development model. Since all non-core components (like device drivers and protocol stacks) operate as user-space applications, they can be developed and debugged using standard tools, negating the need for specialized debug APIs or dedicated debuggers.

These benefits are some of the reasons why BlackBerry’s foundational QNX software products are built on this architecture. Since 1982, every QNX OS released has been based on a microkernel design. It’s why QNX software is the trusted choice for so many mission-critical applications, like space and avionics, medical, defense, nuclear power plants, locomotive control systems, and automobiles.

Why Isn’t Everything a Microkernel?

Given the tangible improvement in reliability provided by microkernels, it might seem that most operating systems should have adopted this direction long ago. While many common OSes, such as OS X®, iOS®, Windows®, and even Linux®, have been leveraging microkernel principles and features, this development direction has been dogged by the frequent objection that “microkernels have an extra context switch” and thus could lag in throughput compared to monolithic OSes. This concern has deterred deeper exploration in many cases, and has consigned microkernels primarily to applications where reliable and robust operation is a higher priority than execution speed. Yet the idea that “microkernels are slow by design” is now a misconception, rooted in the capabilities and requirements of decades-old hardware and software.

To trace the origins of this perception, we must recognize that, in most cases, the architecture of a microkernel turns operating system requests into IPC calls between the application and the service, while a monolithic kernel manages this with a direct call.

Figure 1 – The performance cost of monolithic versus microkernel system calls
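To make the contrast between a direct call and an IPC request concrete, here is a small illustrative sketch. It is not the QNX API: pipes between two ordinary processes stand in for kernel message passing, and `read_direct`, `SERVER_SOURCE`, and `FileServiceClient` are hypothetical names invented for this example. The same lookup is served once as an in-process function call and once by a separate server process that receives a request message and sends back a reply.

```python
import subprocess
import sys

# Hypothetical in-memory "filesystem" used by the direct-call path.
FILES = {"/etc/motd": "hello"}


def read_direct(path):
    """Monolithic style: the request is an ordinary in-process call."""
    return FILES.get(path, "ENOENT")


# Microkernel style: the same lookup lives in a separate server process
# with its own address space. It loops on receive, look up, reply.
SERVER_SOURCE = r"""
import sys
files = {"/etc/motd": "hello"}
while True:
    line = sys.stdin.readline()
    if not line:  # client closed the channel
        break
    sys.stdout.write(files.get(line.strip(), "ENOENT") + "\n")
    sys.stdout.flush()
"""


class FileServiceClient:
    """Sends each request as a message and blocks until the reply arrives."""

    def __init__(self):
        self.server = subprocess.Popen(
            [sys.executable, "-c", SERVER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def read(self, path):
        self.server.stdin.write(path + "\n")
        self.server.stdin.flush()
        return self.server.stdout.readline().rstrip("\n")

    def close(self):
        self.server.stdin.close()  # signals end-of-input to the server
        self.server.wait()
        self.server.stdout.close()


if __name__ == "__main__":
    client = FileServiceClient()
    try:
        print(read_direct("/etc/motd"), client.read("/etc/motd"))
    finally:
        client.close()
```

Both paths return the same answer. The difference is that the second one crosses a process boundary on the way in and on the way out, which is where the extra switches discussed in this section come from.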

The performance cost associated with microkernels comes down to two additional mode switches and two additional context switches. This additional time is only burdensome when the call cost makes up a large proportion of the total time spent or, more specifically, when the system service incurs this overhead to perform a very small operation. While a couple of additional context and mode switches may have had a significant impact in the early days of protected-mode architectures, modern processors keep shrinking this disadvantage. This is especially true given the speeds of today’s high-performance CPUs.
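A back-of-the-envelope model shows why the extra switches only matter for very small operations. The sketch below simply divides the added switch time by the total request time. The function name and all nanosecond figures are illustrative assumptions, not measurements of QNX or of any particular processor.

```python
def ipc_overhead_fraction(service_work_ns, mode_switch_ns, context_switch_ns):
    """Fraction of a request's total time spent on the extra switches."""
    # Per the text: two additional mode switches and two additional
    # context switches per request.
    extra_ns = 2 * mode_switch_ns + 2 * context_switch_ns
    return extra_ns / (service_work_ns + extra_ns)


if __name__ == "__main__":
    # Assumed costs, for illustration only.
    MODE_SWITCH_NS = 100
    CONTEXT_SWITCH_NS = 900

    tiny = ipc_overhead_fraction(2_000, MODE_SWITCH_NS, CONTEXT_SWITCH_NS)
    large = ipc_overhead_fraction(2_000_000, MODE_SWITCH_NS, CONTEXT_SWITCH_NS)
    print(f"tiny operation:  {tiny:.1%} overhead")   # prints 50.0%
    print(f"large operation: {large:.1%} overhead")  # prints 0.1%
```

With these assumed numbers, a 2-microsecond operation spends half its time on switching, while a 2-millisecond operation spends about a tenth of a percent. As switch costs fall on newer hardware, the first case shrinks as well.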

The Microkernel’s Moment to Shine

As the performance impact of the microkernel continues to shrink, it becomes clear how HPC development can benefit from microkernel-specific features:

Fault containment: Fault containment is expanding beyond the realm of mission-critical software to encompass many types of applications, including those in HPC. A fault containment region is defined as the set of OS services, user applications, or drivers that share one or more common resources and may be affected by a single fault. A microkernel-based OS is fault-tolerant due to its architecture, and allows the system to efficiently manage redundancy strategies, such as hot-swapping failed hardware or restarting services and drivers. The result is a consistent implementation of fault tolerance features across the system, as exemplified in BlackBerry’s High Availability Manager. This approach minimizes the impact of fault recovery, including recovery time, making a full restart a rare necessity.
Noise-sensitive apps: HPC applications that use a bulk-synchronous programming model rely on computation and communication threads operating in perfect synchronization to maximize throughput. However, “OS noise” created by sporadic OS housekeeping and background tasks can have a dramatic effect on performance. Pre-empting a single thread can squander several thousand cycles by causing synchronized processes to fall out of step, amounting to a significant slowdown overall. The QNX microkernel, with its capacity to manage the bulk of hardware interrupt servicing in user-level code, helps reduce the variable background processing that can disrupt well-synchronized tasks.
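The effect of OS noise on a bulk-synchronous workload can be illustrated with a toy model. In the sketch below, every step ends at a barrier, so each step lasts as long as its slowest worker. The function name, worker count, and timings are assumptions chosen for illustration and do not describe any real system.

```python
def bsp_run_time(compute_ns, noise_ns_by_step):
    """Total time for a bulk-synchronous run.

    compute_ns: useful work each worker performs per step.
    noise_ns_by_step: one list per step holding each worker's delay
        caused by OS noise (for example, being pre-empted).
    Each step ends at a barrier, so it takes as long as its slowest worker.
    """
    return sum(compute_ns + max(noise) for noise in noise_ns_by_step)


if __name__ == "__main__":
    COMPUTE_NS = 10_000  # assumed useful work per worker per step

    # Four workers, three steps, no interference at all.
    quiet = [[0, 0, 0, 0]] * 3

    # In each step a different worker is delayed by 5 microseconds, so no
    # single worker is ever interrupted more than once.
    noisy = [
        [5_000, 0, 0, 0],
        [0, 5_000, 0, 0],
        [0, 0, 5_000, 0],
    ]

    print(bsp_run_time(COMPUTE_NS, quiet))  # 30000
    print(bsp_run_time(COMPUTE_NS, noisy))  # 45000
```

Although each worker loses at most 5 microseconds in this example, the run as a whole is 15 microseconds longer, because every delay is paid by all workers at the next barrier. Reducing variable background activity therefore benefits the entire job, not just the interrupted thread.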

QNX SDP 8.0 and HPC: Enhancing Performance and Flexibility

The latest release of the QNX microkernel, QNX® SDP 8.0, has numerous improvements in key areas that make it particularly suited to HPC applications and HPC-based systems. These include:

Next-generation microkernel tailor-made for HPC
New and updated kernel schedulers
Hard real-time determinism
Uniform parallelism and pre-emption
High-performance networking

These are just a few of the new capabilities that enable QNX SDP 8.0 to be fine-tuned for an HPC environment — without sacrificing safety and security.

Embracing the Next Era of Innovation

The growing interest in microkernels is driven by their ability to improve reliability, while efficiency concerns continue to wane. The latest updates in QNX SDP 8.0 and QNX OS 8.0 show a clear commitment to adapting microkernels to modern needs, offering improvements in areas that benefit overall performance, especially in HPC designs. These changes reflect not only the traditional strengths of microkernels but also the ongoing dedication to innovation at BlackBerry QNX.

To learn more about BlackBerry QNX and the features offered by QNX SDP 8.0, visit this page.

About Louay Abdelkader

Louay Abdelkader is Senior Manager, Product Management at BlackBerry QNX.
