Professional Documents
Culture Documents
Vulkan
Maik Klein
University of Koblenz
Email: maikklein@uni-koblenz.de
applications might not benefit as much from using Vulkan can be specified when a render pass is started. Drawing can
compared to OpenGL [5, p. 9]. only happen inside a render pass. More information in section
GPU limited applications: If the application already uses XIII.
the GPU to its full extent, then Vulkan might not improve A VkPipeline contains all the state that is required for draw-
the performance [5, p. 9]. ing or computations. Vulkan differentiates between graphics
Portability: Vulkan exposes hardware capabilities and it and compute pipelines. It stores state such as the shader mod-
is the job of the developer to implement a fast path for ules, render and subpasses, depth, blending, multisampling,
the specific target hardware. This might add additional tesselation and so on. The state is mostly immutable which
complexity that the developer does not want to manage means that the state can not be changed after a pipeline is
[5, p. 9]. created. More information in section XIV.
Requires additional thought: Porting a D3D11/OpenGL VkCommandBuffer is able to record commands and can
application directly to Vulkan might not result in a net then be executed by a VkQueue. Command buffers offer
benefit. Additional thought is required to take advantage functionality for binding a pipeline, drawing, binding buffers,
of the explicit API [5, p. 9]. binding desciprtor sets and more. The main benefit of com-
mand buffers is that they can be recorded on multiple threads.
V. OVERVIEW OF THE ECO - SYSTEM More information in section IX.
Vulkan also offers extensions and layers. More information in
LunarG Vulkan SDK: [28] Includes Vulkan documen-
section XVI.
tation, samples and demos. It provides Vulkan runtime
libraries, a Vulkan loader, Validation layers, Vulkan trace
VII. L OADING V ULKAN
tools and SPIR-V tools.
Glslang [26]: The official reference compiler for SPIR-V. A. Loading Entry, Instance and Device level functions
SPIRV-Cross [27]: Exposes a reflection API for SPIR-V After the Vulkan library has been loaded into memory,
and can cross compile SPIR-V into HLSL, GLSL, and vkGetInstanceProcAddr can be resolved. This function can
MSL. be used to resolve the entry level functions.
Renderdoc [29]: A free and open source debugger for 1
Vulkan. 2 v k G e t I n s t a n c e P r o c A d d r (NULL, v k C r e a t e I n s t a n c e ) ;
SaschaWillemss Vulkan Samples [30]: Extensive list of 3 v k G e t I n s t a n c e P r o c A d d r (NULL,
vkEnumerateInstanceExtensionProperties ) ;
documented Vulkan samples. 4 v k G e t I n s t a n c e P r o c A d d r (NULL,
Gpuinfo [32]: A public Vulkan hardware database. This is vkEnumerateInstanceLayerProperties ) ;
useful to get an overview of which hardware capabilities
are available on consumer hardware.
After vkCreateInstance has successfully been resolved, it
VI. H IGHLEVEL OVERVIEW can be used to resolve instance level functions.
1
VkInstance serves as an entry point for the Vulkan API. 2 vkGetInstanceProcAddr ( instance , vkCreateDevice ) ;
It exposes functions to find physical devices, create logical 3 vkGetInstanceProcAddr ( instance ,
devices. VkPhysicalDevice usually represents a single de- 4 vkGetDeviceProcAddr ) ;
5 ...
vice in a system. It is possible to query various information
about a physical device such as memory porperties, supported
image formats, driver version, device name, API version and After vkCreateDevice and vkGetDeviceProcAddr have
so on. With a physical device a logical device can be created. been resolved, device level functions can be loaded.
A logical device VkDevice offers functionality to interact
1 vkGetDeviceProcAddr ( device , vkGetDeviceQueue ) ;
with a device. It can allocate memory, create buffers, create 2 v k G e t D e v i c e P r o c A d d r ( d e v i c e , vkQueueSubmit ) ;
command buffers. More information in section VIII. 3 ...
VkBuffer can store data in an linear array and which can be
bound by graphics or compute pipelines inside a command
buffer. Unlike in OpenGL, memory allocation and buffers It should be noted that it is also possible to resolve device
are seperate concepts. Memory allocation happens through level function pointers from vkGetInstanceProcAddr but this
vkAllocateMemory(), and this allocation can then be bound will introduce one level of indirection [31].
to a buffer. It is possible to suballocate many buffers into one
big allocation. More information in section X. VIII. I NSTANCES AND D EVICES
VkDescriptorSet describes a list of resources that are used Instances can be created with vkCreateInstance. When
by a shader. A shader can have more than one descriptor set. the instance has been created, it is then possible to create a
More information in section XII. logical device with vkCreateDevice. Before a logical device
A VkRenderPass describes how the rendering is going to can be created, the developer has to find a suitable physical
happen. A render pass consist of a series subpasses [38]. device. All physical devices that have Vulkan support can be
A VkFramebuffer is used to store the information from a found with vkEnumeratePhysicalDevices. A physical device
renderpass in images through images views. A frame buffer exposes all available hardware capabilities.
3
The developer might want to check at this time which queues Secondary command buffers are more limited and can not be
and device features are available for a particular device. submitted to a queue directly but they can be executed by
For example if the application intents to use graphics com- a primary command buffer with vkCmdExecuteCommands.
mands and to display the results onto a surface, the developer One approach would be to only create one primary command
has to find one or more queues that support these two features. buffer, create N secondary command buffers on different
vkGetPhysicalDeviceQueueFamilyProperties can be used to threads and then execute all secondary command buffers in
get an array of VkQueueFamilyProperties. the primary command buffer.
queueFlags contains information of the capabilities that Secondary command buffer:
this queue supports. The VK QUEUE GRAPHICS BIT Bind DescriptorSet
will be set if the queue supports graphics commands. vkGet- Bind Vertex/Index
PhysicalDeviceSurfaceSupportKHR can be used to find a Set Viewport
queue family index that supports presentation. A queue family Bind Pipeline
index is the index of a VkQueueFamilyProperties object Bind DescriptorSet
inside the array that is returned from vkGetPhysicalDevice- Draw
QueueFamilyProperties starting from 0. A queue is allowed Primary command buffer:
to have all VkQueueFlagBits set. In fact all modern Nvidia Begin Renderpass
GPUs currently expose 16 queues that support graphics, com- Execute N secondary command buffer
pute, transfer, and sparse binding commands. Specialized hard- End Renderpass
ware can be found by looking at queues that do not support
A command buffer can either be freed with vkFreeCom-
every feature. Most modern GPUs expose a specialized queue
mandBuffers, reset individually with vkResetCommand-
that excels at moving data from RAM to GPU memory. Such a
Buffer or every command buffer inside a pool can be re-
queue might only have the VK QUEUE TRANSFER BIT
set with vkResetCommandPool. It should be noted that a
set.
pool needs to be created with the VK COMMAND POOL
Before the device can be created, the developer has to
CREATE RESET COMMAND BUFFER BIT flag if the
specify which and how many queues the application should
application wants to free a command buffer individually.
use.
Lastly vkGetPhysicalDeviceFeatures can be used to re-
X. B UFFER AND A LLOCATION
trieve the supported features for a specific physical device.
These features can then be enabled before the logical device Buffers in Vulkan can be created by filling out the Vk-
is created. BufferCreateInfo struct and calling vkCreateBuffer. size
is the size of the buffer in bytes. It should be noted that
vkCreateBuffer does not do any allocation. Any combination
IX. C OMMAND P OOLS AND C OMMAND B UFFERS
of bits can be specified for usage, but at least one of the bits
Command buffers are objects used to record commands must be set in order to create a valid buffer.
which can be submitted to a queue. [1, Chapter 5] Command After the buffer has been created, vkGetBufferMemoryRe-
buffers are allocated from a VkCommandPool. A command quirements can be used to retrieve the memory requirements
buffer can be recorded with vkCmd* commands called of that buffer.
in between vkBeginCommandBuffer and vkEndCommand- memoryTypeBits is a bitmask and contains one bit set for
Buffer. Then the command buffer can be submitted to a queue every supported memory type for the resource. Bit i is set if
with vkQueueSubmit. It is important to know that command and only if the memory type i in the VkPhysicalDeviceMem-
buffers are implicit externally synchronized, which means that oryProperties structure for the physical device is supported
they are tied to a specific pool. In practice this means that it is for the resource [1, Chapter 10].
not allowed to concurrently record two command buffers, that 1 / / F i n d a memory t y p e i n memoryTypeBits t h a t
are allocated from the same pool, on two different threads. i n c l u d e s a l l of p r o p e r t i e s
To avoid synchronization, the developer is allowed to create 2 i n t 3 2 t F i n d P r o p e r t i e s ( u i n t 3 2 t memoryTypeBits ,
VkMemoryPropertyFlags p r o p e r t i e s )
one pool per thread. 3 {
A draw call might look like this: [34] 4 f o r ( i n t 3 2 t i = 0 ; i < memoryTypeCount ; ++ i )
Begin Renderpass
5 {
6 i f ( ( memoryTypeBits & ( 1 << i ) ) &&
Bind Vertex/Index 7 ( ( memoryTypes [ i ] . p r o p e r t y F l a g s &
Set Viewport p r o p e r t i e s ) == p r o p e r t i e s ) )
Bind Pipeline
8 return i ;
9 }
Bind DescriptorSet 10 r e t u r n 1;
Draw 11 }
12
End Renderpass
13 / / Try t o f i n d an o p t i m a l memory t y p e , o r i f i t d o e s
While it is possible to create a command buffer for every not e x i s t
draw call, a better approach would be to record every draw call 14 / / f i n d any c o m p a t i b l e memory t y p e
15 VkMemoryRequirements m e m o r y R e q u i r e m e n t s ;
directly inside the command buffer. This is why Vulkan dif- 16 vkGetImageMemoryRequirements ( d e v i c e , image , &
ferentiates between primary and secondary command buffers. memoryRequirements ) ;
4
describes how the attachments are used over the course of can insert themselves into the call chain. An example for a
the subpasses [1, Chapter 7]. layer would be the standard validation layer. This layer is
A renderpass can be used inside a command buffer with able to validate API calls during development. An extension
vkCmdBeginRenderPass and vkCmdEndRenderPass. is allowed to add functionality. [1, Chaper 30.1] [31]
Sometimes the application might want to use the output of one Commonly used extensions:
rendering job as the input of another rendering job. A common VK KHR surface abstracts over native surface or
example of this is deferred rendering where the application window objects. This extension is commonly used
writes diffuse color, specular color, depth, normal information with the following platform specific extensions
into separate textures. These textures are then used as the a) VK KHR win32 surface, b) VK KHR xlib surface,
input of another rendering job to light the scene, but rendering VK KHR swapchain provides the ability to present ren-
to separate textures is slow [35, Page. 17]. Subpasses are dering results to a surface.
able to express dependencies between different rendering VK EXT debug report is able to output warnings,
jobs and allow the driver to produce a better mapping to the errors, performance warnings, and general informa-
hardware. Subpasses can be executed inside a renderpass tion. This extension is commonly used with the
with vkCmdNextSubpass. If the images are created with VK LAYER LUNARG standard validation layer.
VK IMAGE USAGE TRANSIENT ATTACHMENT BIT, Instance extensions, and layers can be enabled at instance
the driver might not write to the texture at all and it can creation, they can be explicitly enabled in the VkInstance-
directly pass the results to the next subpass. The downside is CreateInfo struct.
that the render job can only access the current pixel. Device level extensions can be enabled in the VkDevice-
CreateInfo struct.
XIV. P IPELINES The naming convention for extensions is
Vulkan supports two types of pipelines, a compute and a VK VENDORNAME extension name. The vendor
graphics pipeline. OpenGL has no real concept of pipelines name for extensions developed by multiple vendors is EXT.
and render state might change from draw call to draw call. Additionally the vendor name for experimental extensions
Vulkan encapsulates the state in a pipeline which has to be is KHX. An example for such a extension would be
explicitly created with vkCreategraphicsPipelines or vkCre- VK KHX device group which is able to group multiple
ateComputePipelines. Pipeline creation can be cached with physical devices into one logical device.
a VkPipelineCache object. Pipelines are mostly immutable Previously Vulkan distinguished between instance and device
and can not change state after they have been created, unless level layers, but this feature has been deprecated and all
the state was defined to be dynamic at pipeline creation time. layers can now be enabled at instance creation time [1,
Pipelines can be bound with vkCmdBindPipeline inside a Chapter 30.1.1].
render pass. A complete list of extensions can be found at [1, Appendix
C: Layers & Extensions].
XV. S WAPCHAIN
The swapchain is a device level extension which allows XVII. T HREADING B EHAVIOR
images to be presented to a surface. A swapchain can be All functions in Vulkan can be called concurrently from
created with vkCreateSwapchainKHR. The presentable im- different threads but certain parameters or components of
ages are owned by the swapchain and the minImageCount parameters are defined to be externally synchronized.This
has to be specified before the swapchain can be created. means that the caller must guarantee that no more than
vkGetPhysicalDeviceSurfaceCapabilitiesKHR can be used one thread is using such a parameter at a given time [1,
to retrieve information of the minimum and maximum number Chapter 2.5] Additionally there are also functions with implicit
of images the underlying hardware supports. vkGetPhysi- externally synchronized parameters. For example command
calDeviceSurfacePresentModesKHR can be used to retrieve buffers that are allocated from a VkCommandPool are tied
the available present modes. The images inside the swapchain to this specific pool and therefore share the same threading
can be accessed with vkGetSwapchainImagesKHR. These behavior.
images can be used to create images views which can then be
used to create framebuffers. A renderpass is able to render XVIII. S YNCHRONIZATION
to a VkFramebuffer. vkAquireNextImageKHR can be Vulkan offers 4 synchronization primitives, fences,
used to retrieve the index of framebuffer, which can then be semaphores, events, and pipeline barriers. Fences are used
used inside a renderpass. After the draw call has finished, for communication with the host. One example would be
vkQueuePresentKHR can present the image to a surface. vkQueueSubmit. After the application has submitted work
If the application wants to resize the framebuffers, a new to the queue, a fence can be used with vkGetFenceStatus
swapchain has to be created. to check if the work has been completed. Additionally
it is also possible to wait for the work to complete with
XVI. E XTENSIONS AND L AYERS vkWaitForFences.
Vulkan offers two ways to add functionality, Extensions and A semaphore is used to synchronize dependencies on the
Layers. Layers can not add or modify Vulkan commands but GPU and it can only be in two states, signaled or unsignaled.
6
One example would be to synchronize the draw call with the [4] https://github.com/KhronosGroup/Vulkan-LoaderAndValidationLayers/
presentation. Semaphores do not need to be reset manually. blob/master/loader/LoaderAndLayerInterface.md#device-related-objects
Accessed on 16.07.2016
Events provide a fine-grained synchronization primitive which [5] http://developer.download.nvidia.com/gameworks/events/GDC2016/
can be signaled either within a command buffer or by the Vulkan Essentials GDC16 tlorach.pdf Accessed on 16.07.2016
host, and can be waited upon within a command buffer or [6] https://developer.nvidia.com/transitioning-opengl-vulkan Accessed on
16.07.2016
queried on the host [1, Chapter 6]. [7] https://developer.nvidia.com/transitioning-opengl-vulkan and
Pipeline barriers also provide synchronization control within http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/
a command buffer, but at a single point, rather than with uploads/2016/03/d3d12 vulkan lessons learned.pdf Accessed on
16.07.2016
separate signal and wait operations. One example of a barrier [8] https://www.amazon.com/Vulkan-Programming-Guide-Official-Learning/
would be the VkImageMemoryBarrier which can be used dp/0134464540 Accessed on 16.07.2016
to transition images to different image layouts. [9] http://www.croteam.com/house-interview-vulkan/ Accessed on
16.07.2016
[10] https://forums.robertsspaceindustries.com/discussion/comment/
XIX. C ONCLUSION 7581676/#Comment 7581676 Accessed on 16.07.2016
[11] https://www.unrealengine.com/en-US/blog/unreal-engine-4-15-released
Vulkan delivers a low level, explicit and cross platform API Accessed on 16.07.2016
that does not hide important details from the developer and [12] https://blogs.unity3d.com/2017/03/31/5-6-is-now-available-and-completes-the-unity-
Accessed on 16.07.2016
therefore makes the API more transparent. It is already used [13] https://www.cryengine.com/roadmap Accessed on 16.07.2016
in production by various games and game engines, has driver [14] http://xenko.com/ Accessed on 16.07.2016
support from all major hardware vendors and simplifies shader [15] http://www.valvesoftware.com/news/?id=16000 Accessed on 16.07.2016
[16] https://developer.nvidia.com/vulkan-driver Accessed on 16.07.2016
backends by providing a low level shading language. But other [17] http://www.amd.com/en-us/innovations/software-technologies/
options still exist. technologies-gaming/vulkan Accessed on 16.07.2016
[18] http://airlied.livejournal.com/83102.html Accessed on 16.07.2016
OpenGL: OpenGL supports all platforms that Vulkan [19] https://developer.android.com/ndk/guides/graphics/index.html Accessed
currently supports but can also be run in the browser and on 16.07.2016
on iOS and macOS. Modern OpenGL can still be a good [20] https://www.khronos.org/conformance/adopters/conformant-products
Accessed on 16.07.2016
choice, especially with AZDO [37] but it does not have [21] http://doom.com/en-us/ Accessed on 16.07.2016
the same level of support for multithreading that Vulkan [22] http://www.valvesoftware.com/games/dota2.html Accessed on
has. 16.07.2016
[23] http://www.ashesofthesingularity.com/ Accessed on 16.07.2016
Direct3D 12: Direct3D 12 has a very similar API com- [24] http://madmaxgame.com Accessed on 16.07.2016
pared to Vulkan but only runs on Windows 10. Because [25] MoltenVk https://moltengl.com/moltenvk/ Accessed on 16.07.2016
D3D 12 only supports Windows 10, Star Citizen might [26] https://www.khronos.org/opengles/sdk/tools/Reference-Compiler/
Accessed on 16.07.2016
exclusively use Vulkan. But because the APIs are so [27] https://github.com/KhronosGroup/SPIRV-Cross Accessed on 16.07.2016
similar, it should not be too hard to find a common [28] https://www.lunarg.com/vulkan-sdk/ Accessed on 16.07.2016
abstraction. [29] https://renderdoc.org/ Accessed on 16.07.2016
[30] https://github.com/SaschaWillems/Vulkan Accessed on 16.07.2016
Metal: Metal is a graphics and compute API from Apple. [31] http://gpuopen.com/using-the-vulkan-validation-layers/ Accessed on
It only supports iOS, macOS, and tvOS. It also is not as 16.07.2016
low level as Vulkan and Direct3D 12. MoltenVK [25] is [32] http://vulkan.gpuinfo.org/ Accessed on 16.07.2016
[33] https://developer.nvidia.com/vulkan-shader-resource-binding Accessed
a library that uses Metal to emulate Vulkan on macOS on 16.07.2016
and iOS. [34] https://developer.nvidia.com/engaging-voyage-vulkan Accessed on
16.07.2016
The spec [1] for Vulkan is very readable and the standard [35] https://www.khronos.org/assets/uploads/developers/library/
validation layer does a very good job of finding errors and 2016-vulkan-devday-uk/6-Vulkan-subpasses.pdf Accessed on 16.07.2016
reporting errors in a very understandable way. Often the error [36] https://www.khronos.org/registry/spir-v/specs/1.2/SPIRV.html Accessed
on 16.07.2016
messages even quote the spec to better explain the error [37] https://www.khronos.org/assets/uploads/developers/library/2014-gdc/
message. Khronos-OpenGL-Efficiency-GDC-Mar14.pdf Accessed on 16.07.2016
Vulkan has a pretty steep learning curve and it can be very [38] https://renderdoc.org/vulkan-in-30-minutes.html Accessed on
16.07.2016
tedious to render a simple triangle because of its explicit [39] https://medium.com/@kennyalive/quake-3-vulkanized-245cc349fdcf
API. But after rendering a simple triangle, it is not hard to Accessed on 16.07.2016
implement more complex tasks because rendering a triangle
already requires the developer to understand a big part of the
API. Rendering a triangle requires roughly 1000 LOC, and the
quake 3 vulkan renderer [39] only requires 3200 LOC.
R EFERENCES
[1] https://www.khronos.org/registry/vulkan/specs/1.0-extensions/html/
vkspec.html Accessed on 16.07.2016
[2] https://www.khronos.org/assets/uploads/developers/library/overview/
vulkan-overview.pdf Accessed on 16.07.2016
[3] http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/
uploads/2016/03/d3d12 vulkan lessons learned.pdf Accessed on
16.07.2016