Software guide

CUDA architecture and programming model

CUDA is NVIDIA’s parallel computing architecture, which lets developers and users alike harness the computing capabilities of modern GPUs (graphics processing units). The CUDA parallel hardware architecture is accompanied by the CUDA parallel programming model, in which developers express the parallelism of their software in high-level languages such as C, C++ and Fortran, or through driver APIs such as OpenCL™ and DirectX™ 11 Compute.

The CUDA architecture consists of several components:
  • Parallel compute engines inside NVIDIA GPUs
  • PTX instruction set architecture (ISA) for parallel computing kernels and functions
  • OS kernel-level support for hardware initialization and configuration
  • User-mode driver, which provides a device-level API for developers (including CUDA C and OpenCL)
For further detail on the CUDA architecture refer to this overview.

Several programming languages can be used to implement an application, with language bindings for the CUDA device-level APIs available for most popular languages. Developers can also choose between CUDA C, OpenCL and even Fortran to implement the kernels that execute on the GPU's compute cores.
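As an illustration of what a kernel written in CUDA C looks like, here is a minimal sketch of element-wise vector addition. The names vector_add and add_on_gpu are hypothetical and chosen only for this example, and the fixed block size of 256 threads is an assumption rather than a tuned value.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vector_add(const float* a, const float* b, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {  // guard: the last block may have spare threads
        out[i] = a[i] + b[i];
    }
}

// Hypothetical host helper: copies the inputs to the GPU, launches the
// kernel and copies the result back. Returns false on any CUDA error.
bool add_on_gpu(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out)
{
    if (a.size() != b.size()) return false;
    const int n = static_cast<int>(a.size());
    out.resize(a.size());
    if (n == 0) return true;

    const size_t bytes = a.size() * sizeof(float);
    float* d_a = nullptr;
    float* d_b = nullptr;
    float* d_out = nullptr;

    bool ok = cudaMalloc(&d_a, bytes) == cudaSuccess
           && cudaMalloc(&d_b, bytes) == cudaSuccess
           && cudaMalloc(&d_out, bytes) == cudaSuccess
           && cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        const int threads = 256;  // assumed block size, not tuned
        const int blocks = (n + threads - 1) / threads;  // round up
        vector_add<<<blocks, threads>>>(d_a, d_b, d_out, n);
        // The blocking copy back also waits for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree accepts null pointers, so this is safe after a partial failure.
    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_out);
    return ok;
}

int main()
{
    std::vector<float> a(1000, 1.0f), b(1000, 2.0f), out;
    if (!add_on_gpu(a, b, out)) {
        std::printf("CUDA error (is a CUDA-capable GPU available?)\n");
        return 1;
    }
    std::printf("out[0] = %.1f, out[999] = %.1f\n", out[0], out[999]);
    return 0;
}
```

With the NVIDIA toolkit installed, a file like this would typically be built with nvcc. The same kernel could equally be written in OpenCL C; only the host-side API calls and the kernel qualifiers would differ.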


OpenCL™ and heterogeneous computing: The open route

The Khronos Group manages and promotes the Open Computing Language (OpenCL), an open standard for parallel programming on heterogeneous systems. OpenCL provides a uniform programming environment for software developers to write efficient and portable code for high-performance compute servers, desktop computer systems and potentially handheld devices. Petapath is a member of the Khronos Group, participates in the OpenCL Working Group and is a strong proponent of the OpenCL standard.


CUDA, OpenCL™ and performance portability

While OpenCL provides a uniform programming environment, it is worth noting that the underlying architectures can differ significantly, sometimes even between different products from the same vendor. Good programming practice and the run-time specialisation that the CUDA and OpenCL APIs support will help even out these differences, making it possible to create code that runs on all equivalent OpenCL platforms. The developer may still have to embrace the concept of Performance Portability to eke out the best performance from a diverse range of platforms. As professional developers and consultants, Petapath can help you navigate the shoals of platform optimisation.
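One simple form of run-time specialisation is to query the device and let the launch configuration follow from what is found, rather than hard-coding it. The sketch below uses the CUDA runtime calls cudaGetDeviceProperties and cudaOccupancyMaxPotentialBlockSize. The kernel scale, the LaunchConfig struct and the helper pick_launch_config are hypothetical names introduced for this example, and device 0 is assumed to be the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Example kernel: multiplies every element by a constant factor.
__global__ void scale(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical container for a launch configuration.
struct LaunchConfig {
    int blocks;
    int threads;
};

// Hypothetical helper: asks the runtime which block size gives the best
// occupancy for `scale` on the current device, then sizes the grid to
// cover n elements. Returns false on any CUDA error.
bool pick_launch_config(int n, LaunchConfig& cfg)
{
    int min_grid = 0;  // reported by the runtime, unused in this sketch
    int block = 0;
    if (cudaOccupancyMaxPotentialBlockSize(&min_grid, &block, scale, 0, 0)
            != cudaSuccess || block <= 0) {
        return false;
    }
    cfg.threads = block;
    cfg.blocks = (n + block - 1) / block;  // round up
    return true;
}

int main()
{
    // Assumption: device 0 is the GPU we want to run on.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        std::printf("No usable CUDA device found\n");
        return 1;
    }
    std::printf("%s: %d multiprocessors, up to %d threads per block\n",
                prop.name, prop.multiProcessorCount, prop.maxThreadsPerBlock);

    const int n = 1 << 20;
    LaunchConfig cfg;
    if (!pick_launch_config(n, cfg)) {
        std::printf("Could not determine a launch configuration\n");
        return 1;
    }
    std::printf("Launching %d blocks of %d threads\n", cfg.blocks, cfg.threads);

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) return 1;
    cudaMemset(d_data, 0, n * sizeof(float));
    scale<<<cfg.blocks, cfg.threads>>>(d_data, 2.0f, n);
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    cudaFree(d_data);
    return ok ? 0 : 1;
}
```

The same source then adapts its block size to whichever GPU it finds. This addresses only one aspect of performance portability; memory access patterns and algorithm choice may still need per-platform attention.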