Vendor and Third-Party Interop
Sooner or later you will want to use native vendor implementations, for example BLAS or FFT routines, together with alpaka.
You might want to call thrust::transform on CUDA, rocPRIM on HIP, a oneAPI library on SYCL, or even a CPU-side library function on the host backend.
alpaka provides a function-symbol interface for this purpose. It keeps calls to vendor functions clean and readable without wrapping them in preprocessor macros.
The following steps are required:
Define an alpaka function symbol with ALPAKA_FN_SYMBOL(symbolName).
Optionally implement a generic alpaka fallback for any device.
Specialize implementations for the backends that have a special vendor path.
Call the symbol via symbolName::call(queue, ...) or symbolName{}(queue, ...) and the correct implementation will be dispatched based on the API and device kind derived from the first argument, in this case the queue.
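The dispatch idea behind these steps can be sketched in plain C++ without alpaka. The following is only an illustration under assumptions, not alpaka's implementation: `ScaleSymbol`, `ScaleSpec`, `Queue`, `dispatch`, and `callScale` are hypothetical names. A specification tag derives from the symbol tag, so an overload that takes the plain symbol acts as the fallback, while an overload that takes an exact specification wins when the queue matches.

```cpp
#include <cassert>
#include <string>

// Hypothetical API and device-kind tags (stand-ins, not alpaka types).
struct HostApi {};
struct GpuApi {};
struct CpuKind {};
struct GpuKind {};

// The function symbol is an empty tag type.
struct ScaleSymbol {};

// A specification tag derives from the symbol, so it converts to the symbol.
template<typename Api, typename Kind>
struct ScaleSpec : ScaleSymbol {};

// Hypothetical queue that only carries its API and device kind as types.
template<typename Api, typename Kind>
struct Queue
{
    using api = Api;
    using kind = Kind;
};

// Generic fallback: accepts every specification via derived-to-base conversion.
template<typename T_Queue>
std::string dispatch(ScaleSymbol, T_Queue&, int)
{
    return "generic";
}

// Backend-specific overload: an exact match beats the conversion above.
template<typename T_Queue>
std::string dispatch(ScaleSpec<HostApi, CpuKind>, T_Queue&, int)
{
    return "host-cpu";
}

// The call site only passes the queue; the specification is derived from it.
template<typename T_Queue>
std::string callScale(T_Queue& queue, int value)
{
    return dispatch(ScaleSpec<typename T_Queue::api, typename T_Queue::kind>{}, queue, value);
}
```

With a `Queue<HostApi, CpuKind>` the sketch selects the specific overload; any other queue type falls through to the generic one, and the call site never mentions a backend.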
The example uses a tiny image-processing operation.
Each input value is read as a pixel intensity, and the operation computes scale * value + shift.
The affine operation itself is a tiny functor.
struct AffineTransformOp
{
    float scale;
    float shift;

    ALPAKA_FN_ACC auto operator()(float const& value) const -> float
    {
        return scale * value + shift;
    }
};

For the API host and the device kind cpu we will fall back to the C++ standard implementation via std::transform.
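As a quick reference for what the host fallback computes, here is a minimal sketch in standard C++ only. `AffineOp` and `applyAffine` are hypothetical names for this illustration; the functor mirrors the tutorial's operation but omits the alpaka attribute so the snippet compiles on its own.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// Plain C++ copy of the affine operation (no ALPAKA_FN_ACC, host only).
struct AffineOp
{
    float scale;
    float shift;

    float operator()(float value) const
    {
        return scale * value + shift;
    }
};

// Apply scale * value + shift to every element with std::transform.
std::array<float, 5> applyAffine(std::array<float, 5> const& input, float scale, float shift)
{
    std::array<float, 5> output{};
    std::transform(input.begin(), input.end(), output.begin(), AffineOp{scale, shift});
    return output;
}
```

For the input 1, 2, 3, 4, 5 with scale 2 and shift 0.5 this yields 2.5, 4.5, 6.5, 8.5, 10.5, the same values the tutorial's test checks.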
Defining a Dispatchable Function
/* The function symbol is only defined without specifying the argument signature.
 * You need to provide at least a generic function dispatch signature for the symbol.
 */
ALPAKA_FN_SYMBOL(AffineTransform);

You may later declare additional dispatch function signatures for the same function symbol. The dispatch order follows the C++ rules for function overloading.
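To make the overloading remark concrete, here is a small sketch in plain C++. It assumes a hypothetical tag `AffineSymbol` and a free function `dispatch`; neither is alpaka API. Several signatures share the same tag, and the compiler picks among them with the ordinary rules: arity first, then the best conversion, with a non-template overload preferred over an equally good template.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical symbol tag shared by all signatures below.
struct AffineSymbol {};

// (1) Full signature: scale and shift.
template<typename T>
T dispatch(AffineSymbol, T value, T scale, T shift)
{
    return scale * value + shift;
}

// (2) Different arity for the same symbol: scale only, shift defaults to zero.
template<typename T>
T dispatch(AffineSymbol, T value, T scale)
{
    return dispatch(AffineSymbol{}, value, scale, T{0});
}

// (3) Non-template overload for int; preferred over (2) for two int arguments.
inline long dispatch(AffineSymbol, int value, int scale)
{
    return static_cast<long>(value) * scale;
}

// The return type reveals which overload was chosen for int arguments.
static_assert(std::is_same_v<decltype(dispatch(AffineSymbol{}, 2, 3)), long>);
```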
Registering a Generic Fallback
/* Generic function dispatch signature which is used if no more specific specification for the symbol is provided.
 * `input` and `output` should be one-dimensional, enforced by the requires clause, to build a unified interface
 * because std::transform used in the host CPU overload only supports one-dimensional memory.
 */
template<typename T_Queue, alpaka::concepts::IMdSpan T_Output, alpaka::concepts::IMdSpan T_Input>
constexpr void alpakaFnDispatch(
    AffineTransform,
    T_Queue&& queue,
    T_Output&& output,
    float scale,
    float shift,
    T_Input&& input) requires(concepts::Dim<ALPAKA_TYPEOF(input), 1u> && concepts::Dim<ALPAKA_TYPEOF(output), 1u>)
{
    // Forward all arguments because the function signature chosen here matches the alpaka transform interface.
    alpaka::onHost::transform(
        ALPAKA_FORWARD(queue),
        ALPAKA_FORWARD(output),
        ScalarFunc{AffineTransformOp{scale, shift}},
        ALPAKA_FORWARD(input));
}

This overload is the portable baseline. It works on every backend that can run the normal alpaka algorithm path, so it is a good default even when you later add CUDA-, HIP-, or SYCL-specific overloads.
Registering an API-Device-Specific Overload
/* This overload is used if the queue API is `api::Host` and the device kind is `deviceKind::Cpu`.
 * `input` and `output` should be one-dimensional, enforced by the requires clause, due to the limitations of
 * std::transform used for the implementation.
 */
template<typename T_Queue, alpaka::concepts::IMdSpan T_Output, alpaka::concepts::IMdSpan T_Input>
constexpr void alpakaFnDispatch(
    AffineTransform::Spec<alpaka::api::Host, alpaka::deviceKind::Cpu>,
    T_Queue&& queue,
    T_Output&& output,
    float scale,
    float shift,
    T_Input&& input) requires(concepts::Dim<ALPAKA_TYPEOF(input), 1u> && concepts::Dim<ALPAKA_TYPEOF(output), 1u>)
{
    /* Enqueue supports only const lambdas/functors, but the pointer must be writable. Therefore create a copy of
     * the pointer before the multidimensional span becomes const due to the lambda. For IMdSpan the constness
     * will be propagated to the data.
     */
    auto outPtr = output.data();
    // Enqueue the operation via a host function to ensure the order of executions within a non-blocking queue.
    queue.enqueueHostFn(
        [=]()
        {
            std::transform(
                input.data(),
                input.data() + input.getExtents().x(),
                outPtr,
                AffineTransformOp{scale, shift});
        });
}

This example uses std::transform as a small stand-in for a third-party backend function. The pattern is the same when the backend-specific code comes from a GPU vendor library. On CUDA, for example, this is where you would pass queue.getNativeHandle() to a library that expects a CUDA stream and then call the vendor routine there. This host-specific overload is intentionally constrained to 1D spans because the example forwards to std::transform over a single contiguous range.

The important part is the Spec<api, deviceKind> type:
it states which backend the overload belongs to,
it keeps the backend choice out of the call site,
and it lets the same public function symbol dispatch differently for different queues and devices.
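The host overload in this section copies the output pointer before it builds the lambda. The reason can be sketched in plain C++ with a hypothetical `FloatSpan` that propagates constness to its data, similar in spirit to what the code comment describes for IMdSpan. `FloatSpan`, `pointsToConst`, and `fillViaLambda` are assumptions made for this illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical span whose const access yields a pointer to const data.
struct FloatSpan
{
    float* ptr;
    std::size_t size;

    float* data() { return ptr; }
    float const* data() const { return ptr; }
};

// Helpers that report which data() overload was selected.
constexpr bool pointsToConst(float*) { return false; }
constexpr bool pointsToConst(float const*) { return true; }

// Fills the span through a non-mutable lambda and returns whether the span
// captured by the lambda exposed its data as const.
bool fillViaLambda(FloatSpan output, float value)
{
    // Take the writable pointer while the span is still non-const.
    float* outPtr = output.data();

    auto task = [=]()
    {
        // The pointer copy stays writable inside the lambda.
        for(std::size_t i = 0; i < output.size; ++i)
            outPtr[i] = value;
        // The captured span is const here, so data() returns float const*.
        return pointsToConst(output.data());
    };
    return task();
}
```

Writing through `output.data()` inside the lambda would not compile in this sketch, which is why the pointer is copied first.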
Calling the Function
The call itself stays simple. You pass the queue and the ordinary data arguments. alpaka looks at the queue’s specification and forwards the call to the best matching overload.
std::array<float, 5u> hostInput{1.f, 2.f, 3.f, 4.f, 5.f};
std::array<float, 5u> hostOutput{};

auto inputBuffer = onHost::allocLike(device, hostInput);
auto outputBuffer = onHost::allocLike(device, hostOutput);

onHost::memcpy(queue, inputBuffer, hostInput);

/* Call the function, the overload will be dispatched based on the properties of the queue.
 *
 * You can also create an instance of the alpaka function symbol instead of using ::call().
 * This allows using a function symbol as an argument of a method.
 *
 * example: `vendorTutorial::AffineTransform{}(....)`
 */
vendorTutorial::AffineTransform::call(queue, outputBuffer, 2.0f, 0.5f, inputBuffer);

onHost::memcpy(queue, hostOutput, outputBuffer);
onHost::wait(queue);
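The two call styles mentioned in the code comment, the static call() and an instance used as a callable, can be illustrated with a hypothetical symbol-like type. `Doubler` and `applyToAll` are made up for this sketch and do not show what ALPAKA_FN_SYMBOL actually generates; the point is only that an instance can be handed to another function as an argument.

```cpp
#include <cassert>
#include <vector>

// Hypothetical symbol-like type that offers both call styles.
struct Doubler
{
    // Static entry point, analogous to symbolName::call(...).
    static int call(int value)
    {
        return 2 * value;
    }

    // Instance entry point, analogous to symbolName{}(...).
    int operator()(int value) const
    {
        return call(value);
    }
};

// A function that accepts any callable, including a symbol instance.
template<typename T_Fn>
std::vector<int> applyToAll(T_Fn fn, std::vector<int> values)
{
    for(int& v : values)
        v = fn(v);
    return values;
}
```

Calling `applyToAll(Doubler{}, values)` passes the symbol like any other function object, which the static form alone would not allow.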
alpaka's function interface is not limited to the host side.
As shown in the function kernel tutorial you can write kernels which are specializable for a device kind and/or API.
You can also use it to call vendor/third-party functions from within a kernel (onAcc); in this case do not forget to mark the function with the attribute ALPAKA_FN_ACC or constexpr, otherwise some device compilers may fail to compile these functions.
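A common way to make one function usable from both host and device code is an attribute macro combined with constexpr. The sketch below assumes a hypothetical macro `MY_FN_ACC` and a hypothetical function `dispatchClamp`; it is not the definition of ALPAKA_FN_ACC, only an illustration of the pattern. It compiles as ordinary C++ when no CUDA or HIP compiler is active.

```cpp
#include <cassert>

// Hypothetical attribute macro: expands to device annotations only when a
// CUDA or HIP compiler is processing the file.
#if defined(__CUDACC__) || defined(__HIPCC__)
#    define MY_FN_ACC __host__ __device__
#else
#    define MY_FN_ACC
#endif

// Hypothetical symbol tag for a function intended to be called inside a kernel.
struct ClampSymbol {};

// Marked with the attribute macro and constexpr so it can be used in device
// code as well as on the host.
MY_FN_ACC constexpr float dispatchClamp(ClampSymbol, float value, float low, float high)
{
    return value < low ? low : (value > high ? high : value);
}

// constexpr also allows a compile-time check on the host.
static_assert(dispatchClamp(ClampSymbol{}, 2.0f, 0.0f, 1.0f) == 1.0f);
```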
Complete Source File
200_vendorInterop.cpp
/* Copyright 2026 René Widera
 * SPDX-License-Identifier: ISC
 */

#include "docsTest.hpp"

#include <alpaka/alpaka.hpp>

#include <catch2/catch_template_test_macros.hpp>
#include <catch2/catch_test_macros.hpp>

#include <algorithm>
#include <array>

using namespace alpaka;

namespace vendorTutorial
{
    struct AffineTransformOp
    {
        float scale;
        float shift;

        ALPAKA_FN_ACC auto operator()(float const& value) const -> float
        {
            return scale * value + shift;
        }
    };


    /* The function symbol is only defined without specifying the argument signature.
     * You need to provide at least a generic function dispatch signature for the symbol.
     */
    ALPAKA_FN_SYMBOL(AffineTransform);


    /* Generic function dispatch signature which is used if no more specific specification for the symbol is provided.
     * `input` and `output` should be one-dimensional, enforced by the requires clause, to build a unified interface
     * because std::transform used in the host CPU overload only supports one-dimensional memory.
     */
    template<typename T_Queue, alpaka::concepts::IMdSpan T_Output, alpaka::concepts::IMdSpan T_Input>
    constexpr void alpakaFnDispatch(
        AffineTransform,
        T_Queue&& queue,
        T_Output&& output,
        float scale,
        float shift,
        T_Input&& input) requires(concepts::Dim<ALPAKA_TYPEOF(input), 1u> && concepts::Dim<ALPAKA_TYPEOF(output), 1u>)
    {
        // Forward all arguments because the function signature chosen here matches the alpaka transform interface.
        alpaka::onHost::transform(
            ALPAKA_FORWARD(queue),
            ALPAKA_FORWARD(output),
            ScalarFunc{AffineTransformOp{scale, shift}},
            ALPAKA_FORWARD(input));
    }


    /* This overload is used if the queue API is `api::Host` and the device kind is `deviceKind::Cpu`.
     * `input` and `output` should be one-dimensional, enforced by the requires clause, due to the limitations of
     * std::transform used for the implementation.
     */
    template<typename T_Queue, alpaka::concepts::IMdSpan T_Output, alpaka::concepts::IMdSpan T_Input>
    constexpr void alpakaFnDispatch(
        AffineTransform::Spec<alpaka::api::Host, alpaka::deviceKind::Cpu>,
        T_Queue&& queue,
        T_Output&& output,
        float scale,
        float shift,
        T_Input&& input) requires(concepts::Dim<ALPAKA_TYPEOF(input), 1u> && concepts::Dim<ALPAKA_TYPEOF(output), 1u>)
    {
        /* Enqueue supports only const lambdas/functors, but the pointer must be writable. Therefore create a copy of
         * the pointer before the multidimensional span becomes const due to the lambda. For IMdSpan the constness
         * will be propagated to the data.
         */
        auto outPtr = output.data();
        // Enqueue the operation via a host function to ensure the order of executions within a non-blocking queue.
        queue.enqueueHostFn(
            [=]()
            {
                std::transform(
                    input.data(),
                    input.data() + input.getExtents().x(),
                    outPtr,
                    AffineTransformOp{scale, shift});
            });
    }

} // namespace vendorTutorial

TEMPLATE_LIST_TEST_CASE("tutorial vendor interop dispatch", "[docs]", docs::test::TestBackends)
{
    auto selector = onHost::makeDeviceSelector(TestType::makeDict()[object::deviceSpec]);
    if(!selector.isAvailable())
        return;
    onHost::concepts::Device auto device = selector.makeDevice(0);
    onHost::Queue queue = device.makeQueue(queueKind::blocking);

    std::array<float, 5u> hostInput{1.f, 2.f, 3.f, 4.f, 5.f};
    std::array<float, 5u> hostOutput{};

    auto inputBuffer = onHost::allocLike(device, hostInput);
    auto outputBuffer = onHost::allocLike(device, hostOutput);

    onHost::memcpy(queue, inputBuffer, hostInput);

    /* Call the function, the overload will be dispatched based on the properties of the queue.
     *
     * You can also create an instance of the alpaka function symbol instead of using ::call().
     * This allows using a function symbol as an argument of a method.
     *
     * example: `vendorTutorial::AffineTransform{}(....)`
     */
    vendorTutorial::AffineTransform::call(queue, outputBuffer, 2.0f, 0.5f, inputBuffer);

    onHost::memcpy(queue, hostOutput, outputBuffer);
    onHost::wait(queue);

    CHECK(hostOutput[0] == 2.5f);
    CHECK(hostOutput[1] == 4.5f);
    CHECK(hostOutput[2] == 6.5f);
    CHECK(hostOutput[3] == 8.5f);
    CHECK(hostOutput[4] == 10.5f);
}