Building a Vulkan Renderer from Scratch

Every graphics programmer eventually asks the same question: how does a triangle actually get from application memory to the screen? After years of working inside engines that abstract that pipeline away, I decided to find out by building a Vulkan renderer from the ground up.

Why Vulkan Over OpenGL

OpenGL served the industry well for decades, but its driver-managed state machine hides critical decisions from the developer. Vulkan inverts that relationship. You control memory allocation, synchronization, and command recording explicitly. That extra responsibility is exactly the point — you cannot optimize what you cannot see.

For a portfolio project meant to demonstrate graphics engineering depth, Vulkan was the obvious choice. It forces you to understand every stage of the pipeline, from device selection and queue families through render pass compatibility and descriptor set layouts.

PBR Pipeline Architecture

The renderer uses a physically based rendering (PBR) pipeline that follows the metallic-roughness workflow. The core loop looks deceptively simple: record commands, submit to a queue, present the swapchain image. The complexity lives in what gets recorded.

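// Simplified render loop structure (one frame in flight, error handling omitted)
void Renderer::drawFrame() {
    // Block until the GPU has finished the previous submission.
    vkWaitForFences(device, 1, &inFlightFence, VK_TRUE, UINT64_MAX);

    uint32_t imageIndex;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                          imageAvailableSemaphore, VK_NULL_HANDLE,
                          &imageIndex);

    // A fence stays signaled until it is reset, and vkQueueSubmit
    // requires the fence it is given to be unsignaled.
    vkResetFences(device, 1, &inFlightFence);

    recordCommandBuffer(commandBuffer, imageIndex);

    VkSubmitInfo submitInfo{};
    submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
    // ... semaphore configuration ...
    submitInfo.commandBufferCount = 1;
    submitInfo.pCommandBuffers = &commandBuffer;

    // inFlightFence is signaled once this submission completes.
    vkQueueSubmit(graphicsQueue, 1, &submitInfo, inFlightFence);
    // Present swapchain image
}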

Each frame, the command buffer records draw calls that bind the PBR pipeline, bind descriptor sets containing the material inputs (albedo, metallic, roughness, and normal maps), and issue indexed draws. The fragment shader samples those textures and evaluates the Cook-Torrance BRDF to compute the final lighting.
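To make the BRDF step concrete, here is a minimal CPU-side sketch of the Cook-Torrance specular term for a single light and a single color channel. It is not the project's shader code: the function names, the GGX/Smith/Schlick choices, and the roughness remapping are assumptions based on the formulation most metallic-roughness tutorials use.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Illustrative names only; none of these come from the project's shader.
constexpr float kPi = 3.14159265358979f;

// GGX (Trowbridge-Reitz) normal distribution function.
float distributionGGX(float nDotH, float roughness) {
    const float a  = roughness * roughness;  // assumed perceptual remapping
    const float a2 = a * a;
    const float d  = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}

// Schlick-GGX geometry term for one direction, using the direct-lighting k.
float geometrySchlickGGX(float nDotX, float roughness) {
    const float r = roughness + 1.0f;
    const float k = (r * r) / 8.0f;
    return nDotX / (nDotX * (1.0f - k) + k);
}

// Smith's method: masking toward the viewer times shadowing toward the light.
float geometrySmith(float nDotV, float nDotL, float roughness) {
    return geometrySchlickGGX(nDotV, roughness) *
           geometrySchlickGGX(nDotL, roughness);
}

// Fresnel-Schlick approximation; f0 is the reflectance at normal incidence.
float fresnelSchlick(float cosTheta, float f0) {
    const float c = std::min(std::max(1.0f - cosTheta, 0.0f), 1.0f);
    return f0 + (1.0f - f0) * std::pow(c, 5.0f);
}

// Specular term D * G * F / (4 * nDotV * nDotL), with a small epsilon
// in the denominator to avoid dividing by zero at grazing angles.
float cookTorranceSpecular(float nDotV, float nDotL, float nDotH,
                           float vDotH, float roughness, float f0) {
    assert(roughness > 0.0f && roughness <= 1.0f);
    const float D = distributionGGX(nDotH, roughness);
    const float G = geometrySmith(nDotV, nDotL, roughness);
    const float F = fresnelSchlick(vDotH, f0);
    const float denom =
        4.0f * std::max(nDotV, 0.0f) * std::max(nDotL, 0.0f) + 1e-4f;
    return D * G * F / denom;
}
```

A scalar reference like this is handy for sanity-checking shader output: when the light sits on the horizon (nDotL of zero) the geometry term, and therefore the whole specular term, drops to zero.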

Shader Compilation and SPIR-V

Vulkan does not accept GLSL directly. Shaders must be compiled to SPIR-V bytecode, and I integrated glslangValidator into the CMake build so shaders recompile automatically when modified. This caught syntax errors at build time rather than at runtime — a small workflow win that saved real debugging hours.
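On the loading side, vkCreateShaderModule expects the bytecode as 32-bit words, with codeSize given in bytes and required to be a multiple of four. The sketch below shows one way to read a compiled .spv file and sanity-check it before handing it to Vulkan. The helper names readBinaryFile and toSpirvWords are hypothetical, and the magic-number check assumes the file was written with the host's endianness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// First word of every SPIR-V module, per the SPIR-V specification.
constexpr uint32_t kSpirvMagic = 0x07230203u;

// Hypothetical helper: read a whole file into memory.
std::vector<char> readBinaryFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary | std::ios::ate);
    if (!file) throw std::runtime_error("cannot open " + path);
    const std::streamsize size = file.tellg();
    std::vector<char> bytes(static_cast<std::size_t>(size));
    file.seekg(0);
    file.read(bytes.data(), size);
    return bytes;
}

// Hypothetical helper: convert raw bytes to the uint32_t words that
// VkShaderModuleCreateInfo::pCode points at, rejecting obviously bad input.
std::vector<uint32_t> toSpirvWords(const std::vector<char>& bytes) {
    if (bytes.size() < 4 || bytes.size() % 4 != 0)
        throw std::runtime_error("SPIR-V size must be a non-zero multiple of 4");
    std::vector<uint32_t> words(bytes.size() / 4);
    assert(!words.empty());
    std::memcpy(words.data(), bytes.data(), bytes.size());
    if (words[0] != kSpirvMagic)
        throw std::runtime_error("missing SPIR-V magic number");
    return words;
}
```

Copying into a std::vector<uint32_t> rather than casting a char pointer keeps the data correctly aligned for pCode; codeSize would then be words.size() * sizeof(uint32_t).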

The PBR fragment shader alone runs around 200 lines of GLSL. It handles image-based lighting with a prefiltered environment map, a BRDF lookup texture, and an irradiance cubemap for diffuse ambient. Getting the energy conservation right between specular and diffuse terms was one of the most satisfying debugging sessions of the entire project.
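For readers unfamiliar with the energy conservation constraint, one common convention (assumed here, and not necessarily the exact form the shader uses) treats the Fresnel term as the specular weight and gives the diffuse term whatever is left, scaled down further for metals, which have no diffuse response. The struct and function names are illustrative.

```cpp
#include <cassert>

// Illustrative names; a scalar stand-in for the per-channel shader math.
struct LightingWeights {
    float specular;  // kS: fraction of incoming light that is reflected
    float diffuse;   // kD: fraction that is refracted and scattered
};

// kS is the Fresnel reflectance; kD takes the remainder and is
// removed entirely for fully metallic surfaces.
LightingWeights splitEnergy(float fresnel, float metallic) {
    assert(fresnel >= 0.0f && fresnel <= 1.0f);
    assert(metallic >= 0.0f && metallic <= 1.0f);
    const float kS = fresnel;
    const float kD = (1.0f - kS) * (1.0f - metallic);
    return {kS, kD};
}
```

Because kD can never exceed 1 - kS, the two weights sum to at most one, so the surface never reflects more light than it receives.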

Memory Management Challenges

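Vulkan's explicit memory model means every buffer and image needs a backing VkDeviceMemory allocation. Naively allocating per resource hits driver limits fast: the number of live allocations is capped by VkPhysicalDeviceLimits::maxMemoryAllocationCount, which the specification only guarantees to be at least 4096, and some drivers report exactly that minimum.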

I built a custom allocation layer that sub-allocates from larger memory blocks, grouping resources by usage type (vertex/index buffers in device-local memory, uniform buffers in host-visible memory). This pattern mirrors what production engines like Unreal use internally, and it eliminated allocation-related crashes on lower-end hardware during testing.
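The core of sub-allocation is small enough to sketch without any Vulkan calls. The class below is a simplified stand-in, not the project's allocator: it models one large memory block as a capacity plus a cursor and hands out aligned offsets linearly. In a real renderer the alignment would come from VkMemoryRequirements::alignment and the returned offset would be passed to vkBindBufferMemory or vkBindImageMemory. The names LinearBlock and kAllocFailed are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel returned when a request does not fit in the block.
constexpr uint64_t kAllocFailed = UINT64_MAX;

// Models a single large VkDeviceMemory block; tracks offsets only.
class LinearBlock {
public:
    explicit LinearBlock(uint64_t capacity) : capacity_(capacity) {}

    // Returns an offset aligned to `alignment`, or kAllocFailed.
    uint64_t allocate(uint64_t size, uint64_t alignment) {
        // Vulkan alignments are powers of two, so a mask is enough.
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const uint64_t start = (cursor_ + alignment - 1) & ~(alignment - 1);
        if (start > capacity_ || size > capacity_ - start) return kAllocFailed;
        cursor_ = start + size;
        return start;
    }

    // Frees everything at once, e.g. when a level is unloaded.
    void reset() { cursor_ = 0; }

    uint64_t used() const { return cursor_; }

private:
    uint64_t capacity_;
    uint64_t cursor_ = 0;
};
```

A linear scheme like this cannot free individual resources, which is why allocators that need per-resource lifetimes usually layer a free list or buddy scheme on top. It does show the key point: hundreds of buffers can share a single driver-level allocation.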

Frame Graph Design

Rather than hardcoding render pass order, I implemented a lightweight frame graph that declares resource dependencies between passes. The graph resolves execution order, inserts pipeline barriers for image layout transitions, and reuses transient attachments where lifetimes don't overlap.
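The ordering step can be sketched as a topological sort over declared reads and writes. This is a simplified illustration rather than the project's frame graph: resources are identified by plain strings, the Pass struct and resolveOrder function are hypothetical, and barrier insertion and attachment aliasing are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical pass declaration: just a name and resource names.
struct Pass {
    std::string name;
    std::vector<std::string> reads;
    std::vector<std::string> writes;
};

// Returns pass indices ordered so every writer runs before its readers
// (Kahn's algorithm). Returns an empty vector if the graph has a cycle.
std::vector<std::size_t> resolveOrder(const std::vector<Pass>& passes) {
    const std::size_t n = passes.size();
    std::vector<std::vector<std::size_t>> dependents(n);
    std::vector<std::size_t> pending(n, 0);

    // Add an edge writer -> reader for every shared resource name.
    for (std::size_t w = 0; w < n; ++w)
        for (const std::string& res : passes[w].writes)
            for (std::size_t r = 0; r < n; ++r) {
                if (r == w) continue;
                for (const std::string& in : passes[r].reads)
                    if (in == res) {
                        dependents[w].push_back(r);
                        ++pending[r];
                    }
            }

    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::size_t> order;
    while (!ready.empty()) {
        const std::size_t p = ready.front();
        ready.pop();
        order.push_back(p);
        for (std::size_t d : dependents[p]) {
            assert(pending[d] > 0);
            if (--pending[d] == 0) ready.push(d);
        }
    }
    if (order.size() != n) order.clear();  // cycle: some pass never became ready
    return order;
}
```

Each writer-to-reader edge found here is also where a real frame graph would record the image layout transition and pipeline barrier the reading pass needs.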

For a renderer this size, a frame graph is arguably overkill. But it made adding shadow mapping trivial — I declared a depth-only pass, the graph inserted the correct barriers, and the shadow map was available as a sampled image in the lighting pass without any manual synchronization code.

Performance Optimization: A 35% Throughput Gain

The initial implementation ran at about 90 FPS on my test scene (a few hundred meshes with PBR materials). Profiling with RenderDoc revealed two bottlenecks: redundant pipeline state changes and per-frame command buffer allocation.

Pipeline state caching eliminated redundant vkCmdBindPipeline calls by sorting draw calls by material, then only binding when the pipeline handle actually changed. This alone recovered about 15% of frame time.
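The sort-then-skip idea is easy to show without a GPU. In this sketch pipelines are plain integers standing in for VkPipeline handles, and the function counts how many binds would be issued instead of calling vkCmdBindPipeline. DrawCall and countBindsAfterSorting are hypothetical names, not the renderer's own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical draw record; `pipeline` stands in for a VkPipeline handle.
struct DrawCall {
    uint64_t pipeline;
    uint32_t meshId;
};

// Sorts draws by pipeline, then walks them and counts a bind only
// when the pipeline differs from the one currently bound.
std::size_t countBindsAfterSorting(std::vector<DrawCall>& draws) {
    std::stable_sort(draws.begin(), draws.end(),
                     [](const DrawCall& a, const DrawCall& b) {
                         return a.pipeline < b.pipeline;
                     });

    std::size_t binds = 0;
    uint64_t bound = 0;  // 0 plays the role of VK_NULL_HANDLE
    for (const DrawCall& d : draws) {
        assert(d.pipeline != 0);
        if (d.pipeline != bound) {
            bound = d.pipeline;  // vkCmdBindPipeline would be recorded here
            ++binds;
        }
        // vkCmdDrawIndexed for d.meshId would be recorded here
    }
    return binds;
}
```

With five draws that alternate between three pipelines, sorting first means three binds instead of five. std::stable_sort keeps the original submission order within each pipeline group.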

Command buffer reuse replaced the pattern of allocating and recording fresh command buffers every frame. By pre-recording static geometry into secondary command buffers and only re-recording the dynamic portions, I cut CPU-side recording time roughly in half. Together with pooled descriptor set allocation, these changes improved total throughput by 35% over the 90 FPS baseline.

What I Took Away

Building a renderer from scratch changed how I think about every engine I use professionally. When I work in Unreal Engine now, I understand why certain material configurations are expensive, why draw call batching matters, and what the RHI abstraction layer is actually doing underneath.

The project is open source. If you're considering your own Vulkan journey, the repository includes validation layer setup, the full PBR shader source, and the CMake build configuration with automatic SPIR-V compilation.