[Translation] Vulkan Tutorial (23): Staging buffer


Staging buffer

Introduction

The vertex buffer we have right now works correctly, but the memory type that allows us to access it from the CPU may not be the optimal memory type for the graphics card itself to read from. The optimal memory has the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT flag and is usually not accessible by the CPU on dedicated graphics cards. In this chapter we're going to create two vertex buffers: a staging buffer in CPU-accessible memory to upload the data from the vertex array to, and the final vertex buffer in device-local memory. We'll then use a buffer copy command to move the data from the staging buffer to the actual vertex buffer.

Transfer queue

The buffer copy command requires a queue family that supports transfer operations, which is indicated using VK_QUEUE_TRANSFER_BIT. The good news is that any queue family with VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT capabilities already implicitly supports VK_QUEUE_TRANSFER_BIT operations. The implementation is not required to explicitly list it in queueFlags in those cases.

If you like a challenge, then you can still try to use a different queue family specifically for transfer operations. It will require you to make the following modifications to your program:

  • Modify QueueFamilyIndices and findQueueFamilies to explicitly look for a queue family with the VK_QUEUE_TRANSFER_BIT, but not the VK_QUEUE_GRAPHICS_BIT.
  • Modify createLogicalDevice to request a handle to the transfer queue.
  • Create a second command pool for command buffers that are submitted on the transfer queue family.
  • Change the sharingMode of resources to be VK_SHARING_MODE_CONCURRENT and specify both the graphics and transfer queue families.
  • Submit any transfer commands like vkCmdCopyBuffer (which we'll be using in this chapter) to the transfer queue instead of the graphics queue.

It's a bit of work, but it'll teach you a lot about how resources are shared between queue families.
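The first step of that challenge, picking a family that advertises transfer support without graphics, can be sketched in isolation. This is a minimal model, not the tutorial's findQueueFamilies: the constants are stand-ins that mirror the values of the Vulkan queue flag bits so the snippet compiles without the Vulkan headers, and findDedicatedTransferFamily is a hypothetical helper name. It takes one queueFlags value per family, in the order vkGetPhysicalDeviceQueueFamilyProperties would report them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Stand-ins mirroring VK_QUEUE_GRAPHICS_BIT, VK_QUEUE_COMPUTE_BIT and
// VK_QUEUE_TRANSFER_BIT (assumption: used here only to avoid the Vulkan headers).
constexpr std::uint32_t kGraphicsBit = 0x1;
constexpr std::uint32_t kComputeBit  = 0x2;
constexpr std::uint32_t kTransferBit = 0x4;

// Hypothetical helper: returns the index of the first family that lists the
// transfer bit but not the graphics bit, or -1 if no such family exists.
int findDedicatedTransferFamily(const std::vector<std::uint32_t>& queueFlagsPerFamily) {
    for (std::size_t i = 0; i < queueFlagsPerFamily.size(); ++i) {
        const std::uint32_t flags = queueFlagsPerFamily[i];
        const bool hasTransfer = (flags & kTransferBit) != 0;
        const bool hasGraphics = (flags & kGraphicsBit) != 0;
        if (hasTransfer && !hasGraphics) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

A result of -1 would mean falling back to the graphics family, which supports transfers implicitly. Note that this sketch only matches families that list the transfer bit explicitly.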

Abstracting buffer creation

Because we're going to create multiple buffers in this chapter, it's a good idea to move buffer creation to a helper function. Create a new function createBuffer and move the code in createVertexBuffer (except mapping) to it.

void createBuffer(VkDeviceSize size, VkBufferUsageFlags usage, VkMemoryPropertyFlags properties, VkBuffer& buffer, VkDeviceMemory& bufferMemory) {
    VkBufferCreateInfo bufferInfo = {};
    bufferInfo.sType = VK_STRUCTURE_TYPE_BUFFER_CREATE_INFO;
    bufferInfo.size = size;
    bufferInfo.usage = usage;
    bufferInfo.sharingMode = VK_SHARING_MODE_EXCLUSIVE;

    if (vkCreateBuffer(device, &bufferInfo, nullptr, &buffer) != VK_SUCCESS) {
        throw std::runtime_error("failed to create buffer!");
    }

    VkMemoryRequirements memRequirements;
    vkGetBufferMemoryRequirements(device, buffer, &memRequirements);

    VkMemoryAllocateInfo allocInfo = {};
    allocInfo.sType = VK_STRUCTURE_TYPE_MEMORY_ALLOCATE_INFO;
    allocInfo.allocationSize = memRequirements.size;
    allocInfo.memoryTypeIndex = findMemoryType(memRequirements.memoryTypeBits, properties);

    if (vkAllocateMemory(device, &allocInfo, nullptr, &bufferMemory) != VK_SUCCESS) {
        throw std::runtime_error("failed to allocate buffer memory!");
    }

    vkBindBufferMemory(device, buffer, bufferMemory, 0);
}

Make sure to add parameters for the buffer size, memory properties and usage so that we can use this function to create many different types of buffers. The last two parameters are output variables to write the handles to.
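The properties parameter matters because createBuffer forwards it to findMemoryType, which this chapter calls but does not show. As a rough illustration of how such a lookup can work, the sketch below models the device's memory types as a plain vector of property flags. The constants are stand-ins mirroring the Vulkan memory property bits, and pickMemoryType is a hypothetical name, not the tutorial's function.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Stand-ins mirroring VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, _HOST_VISIBLE_BIT
// and _HOST_COHERENT_BIT (assumption: avoids depending on the Vulkan headers).
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;

// Hypothetical helper: typeProperties[i] holds the property flags of memory
// type i; typeFilter is a bitmask of the types the buffer may use (the role
// played by memRequirements.memoryTypeBits).
std::uint32_t pickMemoryType(const std::vector<std::uint32_t>& typeProperties,
                             std::uint32_t typeFilter,
                             std::uint32_t required) {
    assert(typeProperties.size() <= 32);  // Vulkan exposes at most 32 memory types
    for (std::uint32_t i = 0; i < typeProperties.size(); ++i) {
        const bool allowed  = (typeFilter & (1u << i)) != 0;
        const bool suitable = (typeProperties[i] & required) == required;
        if (allowed && suitable) {
            return i;
        }
    }
    throw std::runtime_error("failed to find suitable memory type!");
}
```

Asking for kHostVisible | kHostCoherent corresponds to the staging-style buffer we map from the CPU, while asking for kDeviceLocal corresponds to memory the graphics card reads fastest.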

You can now remove the buffer creation and memory allocation code from createVertexBuffer and just call createBuffer instead:

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();
    createBuffer(bufferSize, VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, vertexBuffer, vertexBufferMemory);
 
    void* data;
    vkMapMemory(device, vertexBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, vertexBufferMemory);
}

Run your program to make sure that the vertex buffer still works properly.

Using a staging buffer

We're now going to change createVertexBuffer to use a host-visible buffer only as a temporary buffer, and a device-local one as the actual vertex buffer.

void createVertexBuffer() {
    VkDeviceSize bufferSize = sizeof(vertices[0]) * vertices.size();

    VkBuffer stagingBuffer;
    VkDeviceMemory stagingBufferMemory;
    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_SRC_BIT, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, stagingBuffer, stagingBufferMemory);

    void* data;
    vkMapMemory(device, stagingBufferMemory, 0, bufferSize, 0, &data);
        memcpy(data, vertices.data(), (size_t) bufferSize);
    vkUnmapMemory(device, stagingBufferMemory);

    createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);
}

We're now using a new stagingBuffer with stagingBufferMemory for mapping and copying the vertex data. In this chapter we're going to use two new buffer usage flags:

  • VK_BUFFER_USAGE_TRANSFER_SRC_BIT: Buffer can be used as source in a memory transfer operation.
  • VK_BUFFER_USAGE_TRANSFER_DST_BIT: Buffer can be used as destination in a memory transfer operation.

The vertexBuffer is now allocated from a memory type that is device local, which generally means that we're not able to use vkMapMemory. However, we can copy data from the stagingBuffer to the vertexBuffer. We have to indicate that we intend to do that by specifying the transfer source flag for the stagingBuffer and the transfer destination flag for the vertexBuffer, along with the vertex buffer usage flag.

We're now going to write a function to copy the contents from one buffer to another, called copyBuffer.

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
 
}

Memory transfer operations are executed using command buffers, just like drawing commands. Therefore we must first allocate a temporary command buffer. You may wish to create a separate command pool for these kinds of short-lived buffers, because the implementation may be able to apply memory allocation optimizations. You should use the VK_COMMAND_POOL_CREATE_TRANSIENT_BIT flag when creating the command pool in that case.

void copyBuffer(VkBuffer srcBuffer, VkBuffer dstBuffer, VkDeviceSize size) {
    VkCommandBufferAllocateInfo allocInfo = {};
    allocInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO;
    allocInfo.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    allocInfo.commandPool = commandPool;
    allocInfo.commandBufferCount = 1;
 
    VkCommandBuffer commandBuffer;
    vkAllocateCommandBuffers(device, &allocInfo, &commandBuffer);
}

And immediately start recording the command buffer:

VkCommandBufferBeginInfo beginInfo = {};
beginInfo.sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO;
beginInfo.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
 
vkBeginCommandBuffer(commandBuffer, &beginInfo);

The VK_COMMAND_BUFFER_USAGE_SIMULTANEOUS_USE_BIT flag that we used for the drawing command buffers is not necessary here, because we're only going to use the command buffer once and won't return from the function until the copy operation has finished executing. It's good practice to tell the driver about our intent using VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT.

VkBufferCopy copyRegion = {};
copyRegion.srcOffset = 0; // Optional
copyRegion.dstOffset = 0; // Optional
copyRegion.size = size;
vkCmdCopyBuffer(commandBuffer, srcBuffer, dstBuffer, 1, &copyRegion);

Contents of buffers are transferred using the vkCmdCopyBuffer command. It takes the source and destination buffers as arguments, and an array of regions to copy. The regions are defined in VkBufferCopy structs and consist of a source buffer offset, destination buffer offset and size. It is not possible to specify VK_WHOLE_SIZE here, unlike the vkMapMemory command.
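To make the region semantics concrete, here is a small CPU-side model of what a list of copy regions describes. It is only an illustration under stated assumptions: CopyRegion is a stand-in for VkBufferCopy, the two byte vectors stand in for the staging and vertex buffers, and copyRegions is a hypothetical function that performs the copies immediately instead of recording them into a command buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Stand-in for VkBufferCopy: byte offsets into each buffer plus a byte count.
struct CopyRegion {
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t size;
};

// Hypothetical CPU model of a buffer copy with several regions.
void copyRegions(const std::vector<std::uint8_t>& src,
                 std::vector<std::uint8_t>& dst,
                 const std::vector<CopyRegion>& regions) {
    for (const CopyRegion& r : regions) {
        // Every region needs an explicit, non-zero size that fits in both
        // buffers; there is no "whole buffer" shorthand.
        assert(r.size > 0);
        assert(r.srcOffset <= src.size() && r.size <= src.size() - r.srcOffset);
        assert(r.dstOffset <= dst.size() && r.size <= dst.size() - r.dstOffset);
        std::memcpy(dst.data() + r.dstOffset, src.data() + r.srcOffset, r.size);
    }
}
```

In this chapter we only need the simplest case: a single region with both offsets at 0 and size equal to the full buffer size.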

vkEndCommandBuffer(commandBuffer);

This command buffer only contains the copy command, so we can stop recording right after that. Now execute the command buffer to complete the transfer:

VkSubmitInfo submitInfo = {};
submitInfo.sType = VK_STRUCTURE_TYPE_SUBMIT_INFO;
submitInfo.commandBufferCount = 1;
submitInfo.pCommandBuffers = &commandBuffer;
 
vkQueueSubmit(graphicsQueue, 1, &submitInfo, VK_NULL_HANDLE);
vkQueueWaitIdle(graphicsQueue);

Unlike the draw commands, there are no events we need to wait on this time. We just want to execute the transfer on the buffers immediately. There are again two possible ways to wait for this transfer to complete. We could use a fence and wait with vkWaitForFences, or simply wait for the transfer queue to become idle with vkQueueWaitIdle. A fence would allow you to schedule multiple transfers simultaneously and wait for all of them to complete, instead of executing one at a time. That may give the driver more opportunities to optimize.

vkFreeCommandBuffers(device, commandPool, 1, &commandBuffer);

Don't forget to clean up the command buffer used for the transfer operation.

We can now call copyBuffer from the createVertexBuffer function to move the vertex data to the device local buffer:

createBuffer(bufferSize, VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_VERTEX_BUFFER_BIT, VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, vertexBuffer, vertexBufferMemory);
 
copyBuffer(stagingBuffer, vertexBuffer, bufferSize);

After copying the data from the staging buffer to the device buffer, we should clean up the staging buffer and its memory:

    ...
 
    copyBuffer(stagingBuffer, vertexBuffer, bufferSize);
 
    vkDestroyBuffer(device, stagingBuffer, nullptr);
    vkFreeMemory(device, stagingBufferMemory, nullptr);
}

Run your program to verify that you're seeing the familiar triangle again. The improvement may not be visible right now, but its vertex data is now being loaded from high performance memory. This will matter once we start rendering more complex geometry.

Conclusion

It should be noted that in a real world application, you're not supposed to actually call vkAllocateMemory for every individual buffer. The maximum number of simultaneous memory allocations is limited by the maxMemoryAllocationCount physical device limit, which may be as low as 4096 even on high end hardware like an NVIDIA GTX 1080. The right way to allocate memory for a large number of objects at the same time is to create a custom allocator that splits up a single allocation among many different objects by using the offset parameters that we've seen in many functions.
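The idea of splitting one allocation among many objects can be sketched with a very simple linear ("bump") allocator that only hands out offsets. This is an assumption-laden toy, not how any particular library works: LinearSubAllocator and kOutOfSpace are hypothetical names, it never frees individual blocks, and it does not guard against arithmetic overflow for extreme sizes. In a real program the capacity would be the size of one vkAllocateMemory allocation, and each returned offset would be passed as the memory offset when binding a buffer, using the alignment reported in VkMemoryRequirements.

```cpp
#include <cassert>
#include <cstdint>

// Sentinel returned when the block cannot satisfy a request (hypothetical).
constexpr std::uint64_t kOutOfSpace = UINT64_MAX;

// Hypothetical bump allocator: carves aligned sub-ranges out of one big block.
class LinearSubAllocator {
public:
    explicit LinearSubAllocator(std::uint64_t capacity) : capacity_(capacity) {}

    // Returns the offset of a sub-range of `size` bytes whose start is a
    // multiple of `alignment` (a power of two), or kOutOfSpace if it won't fit.
    std::uint64_t allocate(std::uint64_t size, std::uint64_t alignment) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        // Round the next free position up to the requested alignment.
        const std::uint64_t aligned = (next_ + alignment - 1) & ~(alignment - 1);
        if (aligned > capacity_ || size > capacity_ - aligned) {
            return kOutOfSpace;
        }
        next_ = aligned + size;
        return aligned;
    }

private:
    std::uint64_t capacity_;
    std::uint64_t next_ = 0;  // first byte not yet handed out
};
```

With this scheme, many buffers share a single device memory allocation, so the allocation count stays far below maxMemoryAllocationCount regardless of how many buffers are created.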

You can either implement such an allocator yourself, or use the VulkanMemoryAllocator library provided by the GPUOpen initiative. However, for this tutorial it's okay to use a separate allocation for every resource, because we won't come close to hitting any of these limits for now.

C++ code / Vertex shader / Fragment shader
