vkQueueSubmit2 reports an error with enabled synchronization validation AND queue submit synchronization validation (alpha). #6177

Closed
nikitablack opened this issue Jul 21, 2023 · 19 comments
Labels
Synchronization Synchronization Validation Object Issue


@nikitablack

Environment:

  • OS: Ubuntu 20.04.6
  • GPU: NVIDIA RTX A5500 Laptop
  • SDK or header version if building from repo: 1.3.250
  • Options enabled (synchronization, best practices, etc.): synchronization, queue submit synchronization

Describe the Issue

A very simple application: in the very first frame, a swapchain image is acquired with vkAcquireNextImageKHR, no rendering is done, the image is transitioned to the PRESENT_SRC_KHR layout via an image barrier, and the command buffer is submitted. Here's the API dump of the relevant parts:

Thread 0, Frame 0, Time 297458 us:
vkAcquireNextImageKHR(device, swapchain, timeout, semaphore, fence, pImageIndex) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555b7e110
    swapchain:                      VkSwapchainKHR = 0x555555fbb1d0
    timeout:                        uint64_t = 18446744073709551615
    semaphore:                      VkSemaphore = 0x55555613d660
    fence:                          VkFence = 0
    pImageIndex:                    uint32_t* = 0

Thread 0, Frame 0, Time 297786 us:
vkBeginCommandBuffer(commandBuffer, pBeginInfo) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x555556171770
    pBeginInfo:                     const VkCommandBufferBeginInfo* = 0x7fffffffcff0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO (42)
        pNext:                          const void* = NULL
        flags:                          VkCommandBufferUsageFlags = 1 (VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT)
        pInheritanceInfo:               const VkCommandBufferInheritanceInfo* = UNUSED

Thread 0, Frame 0, Time 298232 us:
vkCmdPipelineBarrier2(commandBuffer, pDependencyInfo) returns void:
    commandBuffer:                  VkCommandBuffer = 0x555556171770
    pDependencyInfo:                const VkDependencyInfo* = 0x7fffffffcf40:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO (1000314003)
        pNext:                          const void* = NULL
        dependencyFlags:                VkDependencyFlags = 0
        memoryBarrierCount:             uint32_t = 0
        pMemoryBarriers:                const VkMemoryBarrier2* = NULL
        bufferMemoryBarrierCount:       uint32_t = 0
        pBufferMemoryBarriers:          const VkBufferMemoryBarrier2* = NULL
        imageMemoryBarrierCount:        uint32_t = 1
        pImageMemoryBarriers:           const VkImageMemoryBarrier2* = 0x7fffffffcf80
            pImageMemoryBarriers[0]:        const VkImageMemoryBarrier2 = 0x7fffffffcf80:
                sType:                          VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2 (1000314002)
                pNext:                          const void* = NULL
                srcStageMask:                   VkPipelineStageFlags2 = 0 (VK_PIPELINE_STAGE_2_NONE)
                srcAccessMask:                  VkAccessFlags2 = 0 (VK_ACCESS_2_NONE)
                dstStageMask:                   VkPipelineStageFlags2 = 0 (VK_PIPELINE_STAGE_2_NONE)
                dstAccessMask:                  VkAccessFlags2 = 0 (VK_ACCESS_2_NONE)
                oldLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_UNDEFINED (0)
                newLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR (1000001002)
                srcQueueFamilyIndex:            uint32_t = 4294967295
                dstQueueFamilyIndex:            uint32_t = 4294967295
                image:                          VkImage = 0x5555560f4280
                subresourceRange:               VkImageSubresourceRange = 0x7fffffffcfc8:
                    aspectMask:                     VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
                    baseMipLevel:                   uint32_t = 0
                    levelCount:                     uint32_t = 1
                    baseArrayLayer:                 uint32_t = 0
                    layerCount:                     uint32_t = 1

Thread 0, Frame 0, Time 298414 us:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x555556171770

With synchronization validation and queue submit synchronization validation enabled, the validation layers report this error:

SYNC-HAZARD-WRITE-AFTER-READ(ERROR / SPEC): msgNum: 929810911 - Validation Error: [ SYNC-HAZARD-WRITE-AFTER-READ ] Object 0: handle = 0x555556c4ffe0, name = Graphics queue 0., type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0x376bc9df | vkQueueSubmit2: Hazard WRITE_AFTER_READ for entry 0, VkCommandBuffer 0x555557133af0[], Recorded access info (recorded_usage: SYNC_IMAGE_LAYOUT_TRANSITION, command: vkCmdPipelineBarrier2, seq_no: 1, reset_no: 1). Access info (prior_usage: SYNC_PRESENT_ENGINE_SYNCVAL_PRESENT_ACQUIRE_READ_SYNCVAL, read_barriers: VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT|VK_PIPELINE_STAGE_2_BOTTOM_OF_PIPE_BIT, , batch_tag: 1, vkAcquireNextImageKHR aquire_tag:1: VkSwapchainKHR 0xfab64d0000000002[], image_index: 0image: VkImage 0xfa21a40000000003[]).
Objects: 1
[0] 0x555556c4ffe0, type: 4, name: Graphics queue 0.

The subsequent vkQueuePresentKHR call produces this error:

SYNC-HAZARD-PRESENT-AFTER-WRITE(ERROR / SPEC): msgNum: -512052050 - Validation Error: [ SYNC-HAZARD-PRESENT-AFTER-WRITE ] Object 0: handle = 0x555556c4ffe0, name = Graphics queue 0., type = VK_OBJECT_TYPE_QUEUE; | MessageID = 0xe17ab4ae | vkQueuePresentKHR: Hazard PRESENT_AFTER_WRITE for present pSwapchains[0] , swapchain VkSwapchainKHR 0xfab64d0000000002[], image index 0 VkImage 0xfa21a40000000003[], Access info (usage: SYNC_PRESENT_ENGINE_SYNCVAL_PRESENT_PRESENTED_SYNCVAL, prior_usage: SYNC_IMAGE_LAYOUT_TRANSITION, write_barriers: 0, queue: VkQueue 0x555556c4ffe0[Graphics queue 0.], submit: 0, batch: 0, batch_tag: 2, command: vkCmdPipelineBarrier2, seq_no: 2, command_buffer: VkCommandBuffer 0x555557133af0[], reset_no: 1).
Objects: 1
[0] 0x555556c4ffe0, type: 4, name: Graphics queue 0.

Expected behavior

According to the Specification:

When transitioning the image to VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR or VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, there is no need to delay subsequent processing, or perform any visibility operations (as vkQueuePresentKHR performs automatic visibility operations). To achieve this, the dstAccessMask member of the VkImageMemoryBarrier should be set to 0, and the dstStageMask parameter should be set to VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.

and

The TOP and BOTTOM pipeline stages are deprecated, and applications should prefer VK_PIPELINE_STAGE_2_ALL_COMMANDS_BIT and VK_PIPELINE_STAGE_2_NONE.

the destination stage and access masks are correct.

Since this is the first frame, the swapchain was created as usual, and the image has never been used before, the source stage and access masks of 0 are also correct.

The expected behavior is to not have any validation errors.

artem-lunarg self-assigned this Jul 21, 2023
artem-lunarg added the Synchronization Synchronization Validation Object Issue label Jul 21, 2023
@artem-lunarg
Contributor

artem-lunarg commented Jul 27, 2023

@nikitablack Could you provide a dump from the first frame (the one that causes the issue) for the following commands?

vkAcquireNextImageKHR
vkCmdPipelineBarrier2
vkQueueSubmit2
vkQueuePresentKHR

The first two commands are still needed, even though they are provided above, because IDs might change on each run.

I can reproduce the first error message (SYNC-HAZARD-WRITE-AFTER-READ) and I will investigate that use case (artem-lunarg@1831fcf).

The second message can be reported when VkPresentInfoKHR does not specify a semaphore to wait for. You might check if that's the case for your application. If not, then the dump I asked above might be helpful to understand the root cause.

@nikitablack
Author

@artem-lunarg I do provide a semaphore in present. Here's the dump:

Thread 0, Frame 0, Time 270425 us:
vkAcquireNextImageKHR(device, swapchain, timeout, semaphore, fence, pImageIndex) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555c79d80
    swapchain:                      VkSwapchainKHR = 0x5555560b6350
    timeout:                        uint64_t = 18446744073709551615
    semaphore:                      VkSemaphore = 0x555556215900
    fence:                          VkFence = 0
    pImageIndex:                    uint32_t* = 0

Thread 0, Frame 0, Time 270460 us:
vkWaitForFences(device, fenceCount, pFences, waitAll, timeout) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555c79d80
    fenceCount:                     uint32_t = 1
    pFences:                        const VkFence* = 0x7fffffffcf90
        pFences[0]:                     const VkFence = 0x5555562160b0
    waitAll:                        VkBool32 = 1
    timeout:                        uint64_t = 18446744073709551615

Thread 0, Frame 0, Time 270481 us:
vkResetFences(device, fenceCount, pFences) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555c79d80
    fenceCount:                     uint32_t = 1
    pFences:                        const VkFence* = 0x7fffffffcf90
        pFences[0]:                     const VkFence = 0x5555562160b0

Thread 0, Frame 0, Time 270496 us:
vkResetCommandPool(device, commandPool, flags) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555c79d80
    commandPool:                    VkCommandPool = 0x555555b2c4b0
    flags:                          VkCommandPoolResetFlags = 1 (VK_COMMAND_POOL_RESET_RELEASE_RESOURCES_BIT)

allocating 10 command buffers for resource index 0
Thread 0, Frame 0, Time 270530 us:
vkAllocateCommandBuffers(device, pAllocateInfo, pCommandBuffers) returns VkResult VK_SUCCESS (0):
    device:                         VkDevice = 0x555555c79d80
    pAllocateInfo:                  const VkCommandBufferAllocateInfo* = 0x7fffffffce90:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO (40)
        pNext:                          const void* = NULL
        commandPool:                    VkCommandPool = 0x555555b2c4b0
        level:                          VkCommandBufferLevel = VK_COMMAND_BUFFER_LEVEL_PRIMARY (0)
        commandBufferCount:             uint32_t = 10
    pCommandBuffers:                VkCommandBuffer* = 0x555556214d10
        pCommandBuffers[0]:             VkCommandBuffer = 0x55555625dbc0
        pCommandBuffers[1]:             VkCommandBuffer = 0x555556257b80
        pCommandBuffers[2]:             VkCommandBuffer = 0x5555562a05b0
        pCommandBuffers[3]:             VkCommandBuffer = 0x5555562a3370
        pCommandBuffers[4]:             VkCommandBuffer = 0x5555562a6570
        pCommandBuffers[5]:             VkCommandBuffer = 0x5555562a9860
        pCommandBuffers[6]:             VkCommandBuffer = 0x5555562acbe0
        pCommandBuffers[7]:             VkCommandBuffer = 0x5555562aff60
        pCommandBuffers[8]:             VkCommandBuffer = 0x5555562b3410
        pCommandBuffers[9]:             VkCommandBuffer = 0x5555562b6830

Thread 0, Frame 0, Time 270626 us:
vkBeginCommandBuffer(commandBuffer, pBeginInfo) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x5555562b6830
    pBeginInfo:                     const VkCommandBufferBeginInfo* = 0x7fffffffcfc0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO (42)
        pNext:                          const void* = NULL
        flags:                          VkCommandBufferUsageFlags = 1 (VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT)
        pInheritanceInfo:               const VkCommandBufferInheritanceInfo* = UNUSED

Thread 0, Frame 0, Time 270846 us:
vkCmdBindDescriptorSets(commandBuffer, pipelineBindPoint, layout, firstSet, descriptorSetCount, pDescriptorSets, dynamicOffsetCount, pDynamicOffsets) returns void:
    commandBuffer:                  VkCommandBuffer = 0x5555562b6830
    pipelineBindPoint:              VkPipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS (0)
    layout:                         VkPipelineLayout = 0x55555620eb20
    firstSet:                       uint32_t = 0
    descriptorSetCount:             uint32_t = 2
    pDescriptorSets:                const VkDescriptorSet* = 0x7fffffffcff0
        pDescriptorSets[0]:             const VkDescriptorSet = 0x55555620b9e0
        pDescriptorSets[1]:             const VkDescriptorSet = 0x55555620c0c0
    dynamicOffsetCount:             uint32_t = 0
    pDynamicOffsets:                const uint32_t* = NULL

Thread 0, Frame 0, Time 270879 us:
vkCmdPipelineBarrier2(commandBuffer, pDependencyInfo) returns void:
    commandBuffer:                  VkCommandBuffer = 0x5555562b6830
    pDependencyInfo:                const VkDependencyInfo* = 0x7fffffffcf10:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO (1000314003)
        pNext:                          const void* = NULL
        dependencyFlags:                VkDependencyFlags = 0
        memoryBarrierCount:             uint32_t = 0
        pMemoryBarriers:                const VkMemoryBarrier2* = NULL
        bufferMemoryBarrierCount:       uint32_t = 0
        pBufferMemoryBarriers:          const VkBufferMemoryBarrier2* = NULL
        imageMemoryBarrierCount:        uint32_t = 1
        pImageMemoryBarriers:           const VkImageMemoryBarrier2* = 0x7fffffffcf50
            pImageMemoryBarriers[0]:        const VkImageMemoryBarrier2 = 0x7fffffffcf50:
                sType:                          VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2 (1000314002)
                pNext:                          const void* = NULL
                srcStageMask:                   VkPipelineStageFlags2 = 0 (VK_PIPELINE_STAGE_2_NONE)
                srcAccessMask:                  VkAccessFlags2 = 0 (VK_ACCESS_2_NONE)
                dstStageMask:                   VkPipelineStageFlags2 = 0 (VK_PIPELINE_STAGE_2_NONE)
                dstAccessMask:                  VkAccessFlags2 = 0 (VK_ACCESS_2_NONE)
                oldLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_UNDEFINED (0)
                newLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR (1000001002)
                srcQueueFamilyIndex:            uint32_t = 4294967295
                dstQueueFamilyIndex:            uint32_t = 4294967295
                image:                          VkImage = 0x5555561ef400
                subresourceRange:               VkImageSubresourceRange = 0x7fffffffcf98:
                    aspectMask:                     VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
                    baseMipLevel:                   uint32_t = 0
                    levelCount:                     uint32_t = 1
                    baseArrayLayer:                 uint32_t = 0
                    layerCount:                     uint32_t = 1

Thread 0, Frame 0, Time 271018 us:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x5555562b6830

Thread 0, Frame 0, Time 271032 us:
vkQueueSubmit2(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 0x555555ea2620
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo2* = 0x7fffffffcfc0
        pSubmits[0]:                    const VkSubmitInfo2 = 0x7fffffffcfc0:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO_2 (1000314004)
            pNext:                          const void* = NULL
            flags:                          VkSubmitFlags = 0
            waitSemaphoreInfoCount:         uint32_t = 1
            pWaitSemaphoreInfos:            const VkSemaphoreSubmitInfo* = 0x7fffffffcf60
                pWaitSemaphoreInfos[0]:         const VkSemaphoreSubmitInfo = 0x7fffffffcf60:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO (1000314005)
                    pNext:                          const void* = NULL
                    semaphore:                      VkSemaphore = 0x555556215900
                    value:                          uint64_t = 0
                    stageMask:                      VkPipelineStageFlags2 = 1024 (VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT)
                    deviceIndex:                    uint32_t = 0
            commandBufferInfoCount:         uint32_t = 1
            pCommandBufferInfos:            const VkCommandBufferSubmitInfo* = 0x7fffffffcf40
                pCommandBufferInfos[0]:         const VkCommandBufferSubmitInfo = 0x7fffffffcf40:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_SUBMIT_INFO (1000314006)
                    pNext:                          const void* = NULL
                    commandBuffer:                  VkCommandBuffer = 0x5555562b6830
                    deviceMask:                     uint32_t = 0
            signalSemaphoreInfoCount:       uint32_t = 1
            pSignalSemaphoreInfos:          const VkSemaphoreSubmitInfo* = 0x7fffffffcf90
                pSignalSemaphoreInfos[0]:       const VkSemaphoreSubmitInfo = 0x7fffffffcf90:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO (1000314005)
                    pNext:                          const void* = NULL
                    semaphore:                      VkSemaphore = 0x555556215a20
                    value:                          uint64_t = 0
                    stageMask:                      VkPipelineStageFlags2 = 1024 (VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT)
                    deviceIndex:                    uint32_t = 0
    fence:                          VkFence = 0x5555562160b0

Thread 0, Frame 0, Time 2181818 us:
vkQueuePresentKHR(queue, pPresentInfo) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 0x555555ea2620
    pPresentInfo:                   const VkPresentInfoKHR* = 0x7fffffffcfc0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR (1000001001)
        pNext:                          const void* = NULL
        waitSemaphoreCount:             uint32_t = 1
        pWaitSemaphores:                const VkSemaphore* = 0x7fffffffd658
            pWaitSemaphores[0]:             const VkSemaphore = 0x555556215a20
        swapchainCount:                 uint32_t = 1
        pSwapchains:                    const VkSwapchainKHR* = 0x7fffffffd598
            pSwapchains[0]:                 const VkSwapchainKHR = 0x5555560b6350
        pImageIndices:                  const uint32_t* = 0x7fffffffd188
            pImageIndices[0]:               const uint32_t = 0
        pResults:                       VkResult* = NULL

@artem-lunarg
Contributor

Thank you!

@jzulauf-lunarg
Contributor

@artem-lunarg -- looks like two separate errors:

  1. the semaphore wait for the acquire doesn't seem to be guarding the swapchain image correctly
  2. the model doesn't implement the implicit ordering between PRESENT image layout transitions (ILTs) and present operations

@jzulauf-lunarg
Contributor

@nikitablack -- any way to get a simple gfxr file or repro case?

@artem-lunarg
Contributor

@artem-lunarg
Contributor

The repro case is only for the first error, can't reproduce the second one.

@nikitablack
Author

@artem-lunarg
Contributor

artem-lunarg commented Aug 10, 2023

@nikitablack here's an update on the current progress. One issue is on the VVL side, the other one is on the app side.

a) PRESENT_AFTER_WRITE

The second error message might point to a missing part of our implementation (as noted by @jzulauf-lunarg). The part of the spec you mentioned above (duplicated below for visibility) clearly indicates that a visibility operation before vkQueuePresentKHR is not required. The visibility operation is determined by the destination scope, so we can use dstStageMask = STAGE_NONE and dstAccessMask = 0. Please note that the availability operation (defined by the source scope) still has to be specified properly.

When transitioning the image to VK_IMAGE_LAYOUT_SHARED_PRESENT_KHR or VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, there is no need to delay subsequent processing, or perform any visibility operations (as vkQueuePresentKHR performs automatic visibility operations). To achieve this, the dstAccessMask member of the VkImageMemoryBarrier should be set to 0, and the dstStageMask parameter should be set to VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.

b) WRITE_AFTER_READ

The first error message should be fixed on the application side. The availability operation of the image barrier must be properly synchronized with the acquire semaphore signal.

vkQueueSubmit2 specifies VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT as the stage to wait for, but vkCmdPipelineBarrier2 must still be synchronized with this stage; otherwise the image layout transition can start before the acquire semaphore is signaled.

A good explanation can be found in this blog post, https://themaister.net/blog/2019/08/14/yet-another-blog-explaining-vulkan-synchronization/, in the section "Execution dependency chain with semaphore".

The specification also gives the same recommendation (although in our case we don't need the destination scope):

When the presentable image will be accessed by some stage S, the recommended idiom for ensuring correct synchronization is:

  • The VkSubmitInfo used to submit the image layout transition for execution includes vkAcquireNextImageKHR::semaphore in its pWaitSemaphores member, with the corresponding element of pWaitDstStageMask including S.
  • The synchronization command that performs any necessary image layout transition includes S in both the srcStageMask and dstStageMask.

SUMMARY
Specifying image_barrier.srcStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT fixes the first issue.
Now I also have a reproducible PRESENT_AFTER_WRITE error, which will be the next focus.

@artem-lunarg
Contributor

artem-lunarg commented Aug 10, 2023

For documentation purposes, here's a separate explanation of why we need to specify COLOR_ATTACHMENT_OUTPUT_BIT in the source stage of the barrier command when vkQueueSubmit2 already specifies that stage for the semaphore wait.

vkQueueSubmit2's pWaitSemaphoreInfos.stageMask says that the COLOR_ATTACHMENT_OUTPUT_BIT stage should be blocked until the acquire signal is received. This works for commands like vkCmdDraw that include the COLOR_ATTACHMENT_OUTPUT_BIT stage as part of their execution. But barrier commands like vkCmdPipelineBarrier2 do not execute this stage, so the semaphore wait defined by vkQueueSubmit2 does not block them, and they can start execution before the acquire signal.

By including COLOR_ATTACHMENT_OUTPUT_BIT as the source stage of the barrier command, we create an execution dependency chain with the second synchronization scope of the vkQueueSubmit2 semaphore wait. It does not matter that no command actually runs the COLOR_ATTACHMENT_OUTPUT_BIT stage: for the dependency chain, it is enough to specify an overlapping set of stages.

@nikitablack
Author

nikitablack commented Aug 21, 2023

@artem-lunarg thank you for the explanation. I've read themaister's blog post a dozen times during my Vulkan developer career and continue to refer to it quite often, but some topics are still hard to understand. You mentioned:

But barrier commands like vkCmdPipelineBarrier2 do not execute this stage, so the semaphore wait defined by vkQueueSubmit2 does not block them. They can start execution before the acquire signal.

What does it mean that barrier commands do not execute this stage? How does the semaphore in vkQueueSubmit2 relate to a barrier defined earlier?

Anyway, I added the corresponding stages to the barrier before the submit, but the error is still there. Here's the relevant API dump:

vkCmdPipelineBarrier2(commandBuffer, pDependencyInfo) returns void:
    commandBuffer:                  VkCommandBuffer = 0x5555562e61d0
    pDependencyInfo:                const VkDependencyInfo* = 0x7fffffffcdb0:
        sType:                          VkStructureType = VK_STRUCTURE_TYPE_DEPENDENCY_INFO (1000314003)
        pNext:                          const void* = NULL
        dependencyFlags:                VkDependencyFlags = 0
        memoryBarrierCount:             uint32_t = 0
        pMemoryBarriers:                const VkMemoryBarrier2* = NULL
        bufferMemoryBarrierCount:       uint32_t = 0
        pBufferMemoryBarriers:          const VkBufferMemoryBarrier2* = NULL
        imageMemoryBarrierCount:        uint32_t = 1
        pImageMemoryBarriers:           const VkImageMemoryBarrier2* = 0x7fffffffcdf0
            pImageMemoryBarriers[0]:        const VkImageMemoryBarrier2 = 0x7fffffffcdf0:
                sType:                          VkStructureType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER_2 (1000314002)
                pNext:                          const void* = NULL
                srcStageMask:                   VkPipelineStageFlags2 = 1024 (VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT)
                srcAccessMask:                  VkAccessFlags2 = 256 (VK_ACCESS_2_COLOR_ATTACHMENT_WRITE_BIT)
                dstStageMask:                   VkPipelineStageFlags2 = 0 (VK_PIPELINE_STAGE_2_NONE)
                dstAccessMask:                  VkAccessFlags2 = 0 (VK_ACCESS_2_NONE)
                oldLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL (2)
                newLayout:                      VkImageLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR (1000001002)
                srcQueueFamilyIndex:            uint32_t = 4294967295
                dstQueueFamilyIndex:            uint32_t = 4294967295
                image:                          VkImage = 0x55555625bf50
                subresourceRange:               VkImageSubresourceRange = 0x7fffffffce38:
                    aspectMask:                     VkImageAspectFlags = 1 (VK_IMAGE_ASPECT_COLOR_BIT)
                    baseMipLevel:                   uint32_t = 0
                    levelCount:                     uint32_t = 1
                    baseArrayLayer:                 uint32_t = 0
                    layerCount:                     uint32_t = 1

Thread 0, Frame 0, Time 7953641 us:
vkEndCommandBuffer(commandBuffer) returns VkResult VK_SUCCESS (0):
    commandBuffer:                  VkCommandBuffer = 0x5555562e61d0

Thread 0, Frame 0, Time 13393352 us:
vkQueueSubmit2(queue, submitCount, pSubmits, fence) returns VkResult VK_SUCCESS (0):
    queue:                          VkQueue = 0x555555f0f160
    submitCount:                    uint32_t = 1
    pSubmits:                       const VkSubmitInfo2* = 0x7fffffffce60
        pSubmits[0]:                    const VkSubmitInfo2 = 0x7fffffffce60:
            sType:                          VkStructureType = VK_STRUCTURE_TYPE_SUBMIT_INFO_2 (1000314004)
            pNext:                          const void* = NULL
            flags:                          VkSubmitFlags = 0
            waitSemaphoreInfoCount:         uint32_t = 1
            pWaitSemaphoreInfos:            const VkSemaphoreSubmitInfo* = 0x7fffffffce00
                pWaitSemaphoreInfos[0]:         const VkSemaphoreSubmitInfo = 0x7fffffffce00:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO (1000314005)
                    pNext:                          const void* = NULL
                    semaphore:                      VkSemaphore = 0x555556282590
                    value:                          uint64_t = 0
                    stageMask:                      VkPipelineStageFlags2 = 1024 (VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT)
                    deviceIndex:                    uint32_t = 0
            commandBufferInfoCount:         uint32_t = 1
            pCommandBufferInfos:            const VkCommandBufferSubmitInfo* = 0x7fffffffcde0
                pCommandBufferInfos[0]:         const VkCommandBufferSubmitInfo = 0x7fffffffcde0:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_SUBMIT_INFO (1000314006)
                    pNext:                          const void* = NULL
                    commandBuffer:                  VkCommandBuffer = 0x5555562e61d0
                    deviceMask:                     uint32_t = 0
            signalSemaphoreInfoCount:       uint32_t = 1
            pSignalSemaphoreInfos:          const VkSemaphoreSubmitInfo* = 0x7fffffffce30
                pSignalSemaphoreInfos[0]:       const VkSemaphoreSubmitInfo = 0x7fffffffce30:
                    sType:                          VkStructureType = VK_STRUCTURE_TYPE_SEMAPHORE_SUBMIT_INFO (1000314005)
                    pNext:                          const void* = NULL
                    semaphore:                      VkSemaphore = 0x5555562826b0
                    value:                          uint64_t = 0
                    stageMask:                      VkPipelineStageFlags2 = 1024 (VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT)
                    deviceIndex:                    uint32_t = 0
    fence:                          VkFence = 0x555556282d40

As you can see, the barrier's srcStageMask contains VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT, as does the wait semaphore's stageMask.

@artem-lunarg
Contributor

artem-lunarg commented Aug 21, 2023

but some topics are still hard to understand

Yes, that's nontrivial. While working on this item I clarified some concepts related to stages and synchronization commands. The next version of the specification will contain updated wording related to this.

What does it mean that barrier commands do not execute this stage?

Mostly to emphasize that the execution of barrier commands (e.g. vkCmdPipelineBarrier2) is not defined in terms of stages: they do not execute any stage. Draw, dispatch, and copy commands execute one or more stages. The current version of the specification (1.3.261) contains wording that can suggest that barrier command execution is also defined as running some stage, but that's not the case. The next version of the specification will fix this (section 7.1.2, Pipeline Stages, of the specification with all extensions).

This clarification to the specification is important. Synchronization scopes are often described in terms of stages (e.g. whether stages are blocked or allowed to continue execution). If a command's execution is not described in terms of stages, it won't be included in such a synchronization scope.

How does the semaphore in vkQueueSubmit2 relate to a barrier defined earlier?

The barrier is defined earlier only in the source code. When we consider the actual execution of the commands at runtime, the logical order is: first comes the wait operation specified by vkQueueSubmit2's pWaitSemaphoreInfos, and after that execution of the commands recorded in the command buffer starts (with one exception: stages that were not included in pWaitSemaphoreInfos can start execution before the pWaitSemaphoreInfos wait finishes).

So vkQueueSubmit2 on its own cannot ensure that vkCmdPipelineBarrier2 (including its image layout transition) won't start executing before the pWaitSemaphoreInfos wait finishes, because vkCmdPipelineBarrier2 does not run any stages that pWaitSemaphoreInfos could include in its second synchronization scope. That's why it is the responsibility of vkCmdPipelineBarrier2 to create a proper execution dependency with the pWaitSemaphoreInfos sync point of vkQueueSubmit2, so that it does not start earlier. One way to create this execution dependency is to have vkCmdPipelineBarrier2 wait for the same stages that are specified in vkQueueSubmit2's pWaitSemaphoreInfos (i.e. include them in its srcStageMask).
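The chaining rule described above can be sketched as a simple mask check. This is a toy model under stated assumptions, not the Vulkan API or the validation layer's implementation: the `STAGE_*` bits are arbitrary, meta-stages such as ALL_COMMANDS are not modeled, and `SemaphoreWait`, `Barrier`, and `transition_waits_for_acquire` are hypothetical names. The struct fields mirror `VkSemaphoreSubmitInfo::stageMask` and `VkImageMemoryBarrier2::srcStageMask`/`dstStageMask`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model, NOT the Vulkan API: stage bits are arbitrary illustrative values. */
typedef uint64_t StageMask;
#define STAGE_NONE                    ((StageMask)0)
#define STAGE_COLOR_ATTACHMENT_OUTPUT ((StageMask)1 << 0)
#define STAGE_TRANSFER                ((StageMask)1 << 1)

/* Mirrors pWaitSemaphoreInfos[i].stageMask for the acquire semaphore. */
typedef struct {
    StageMask stage_mask;
} SemaphoreWait;

/* Mirrors the stage masks of the layout-transition VkImageMemoryBarrier2. */
typedef struct {
    StageMask src_stage_mask;
    StageMask dst_stage_mask;
} Barrier;

/* The semaphore wait blocks only the stages in its stage_mask, and the
   barrier (which runs no stages itself) waits only on its src_stage_mask.
   The two dependencies chain, so the layout transition cannot start before
   the acquire wait finishes, only if the masks share at least one stage. */
static bool transition_waits_for_acquire(SemaphoreWait wait, Barrier barrier) {
    return (wait.stage_mask & barrier.src_stage_mask) != 0;
}
```

With the acquire wait at `STAGE_COLOR_ATTACHMENT_OUTPUT`, a barrier whose `src_stage_mask` is `STAGE_NONE` or `STAGE_TRANSFER` does not chain (the WRITE_AFTER_READ situation), while one that also names `STAGE_COLOR_ATTACHMENT_OUTPUT` does.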

@artem-lunarg
Contributor

artem-lunarg commented Aug 21, 2023

Anyway, I added the corresponding stages to the barrier before the submit, but the error is still there.

Hm... Let me clarify something before we continue further. This ticket reports two errors WRITE_AFTER_READ and PRESENT_AFTER_WRITE. The second one is not solved yet, hopefully I'll be working on it this week. The above discussion is related to WRITE_AFTER_READ. Could you confirm that with the above change WRITE_AFTER_READ is still being reported?

Here's my repro case, artem-lunarg@33f99d6, which fixes WRITE_AFTER_READ by synchronizing vkCmdPipelineBarrier2 with the pWaitSemaphoreInfos waits; with that change it reports only PRESENT_AFTER_WRITE.

If I remove image_barrier.srcStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT, it reports WRITE_AFTER_READ. If srcStageMask does not define an execution dependency, the image layout transition can start before the acquire semaphore wait finishes. In that case the layout transition access (WRITE+READ) is not separated by any barrier from the swapchain acquire (so the presentation engine can still be doing a READ), hence the WRITE_AFTER_READ hazard.

@nikitablack
Author

Sorry, in the meantime I had added another barrier, and the validation error was about that one. After adding the suggested stage flag to srcStageMask, I no longer see the WRITE_AFTER_READ error.
Thanks for the detailed answer; it has a lot of VERY useful information that helps in understanding the subject. I'll share this thread in the relevant chats, because I've seen similar questions come up from time to time.

@artem-lunarg
Contributor

artem-lunarg commented Aug 25, 2023

@nikitablack I discussed the second issue (PRESENT_AFTER_WRITE) with our synchronization expert, and it turns out it can also be fixed on the app side. I'm sorry I did not notice this sooner, since the cause is very similar to the first error (WRITE_AFTER_READ).

The first error (WRITE_AFTER_READ) was that a barrier command did not synchronize with the submit's pWaitSemaphoreInfos, so the layout transition could start before the image was acquired.

The second error (PRESENT_AFTER_WRITE) occurs because the barrier command does not synchronize with the submit's pSignalSemaphoreInfos, so the layout transition can still be active after the semaphore is signaled, and thus when the presentation starts. That's why PRESENT_AFTER_WRITE is reported.

Adding COLOR_ATTACHMENT_OUTPUT to the second scope of the barrier creates the necessary execution dependency to finish the layout transition before the presentation starts. Here's an example where I fixed the same issue in the test:

layout_transition.dstStageMask = VK_PIPELINE_STAGE_2_COLOR_ATTACHMENT_OUTPUT_BIT;

This is only an execution dependency. Memory dependencies are handled as you quoted above: the spec guarantees that image writes automatically become visible to the presentation engine.
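Both fixes can be seen together in a toy checker over the four stage masks involved in the acquire, transition, and present sequence. This is a sketch under stated assumptions, not the validation layer's algorithm: the `STAGE_*` bits are arbitrary, meta-stages like ALL_COMMANDS are not modeled, `FrameSync` and `frame_hazards` are hypothetical names, and the test cases assume the present semaphore is signaled at COLOR_ATTACHMENT_OUTPUT (the thread does not show the signal stage). The rule it encodes is the one explained above: srcStageMask must share a stage with the wait semaphore's stageMask, and dstStageMask must share a stage with the signal semaphore's stageMask.

```c
#include <stdint.h>

/* Toy model, NOT the Vulkan API: stage bits are arbitrary illustrative values. */
typedef uint64_t StageMask;
#define STAGE_NONE                    ((StageMask)0)
#define STAGE_COLOR_ATTACHMENT_OUTPUT ((StageMask)1 << 0)
#define STAGE_TRANSFER                ((StageMask)1 << 1)

typedef enum {
    HAZARD_NONE = 0,
    /* Transition may start before the acquire semaphore wait finishes. */
    HAZARD_WRITE_AFTER_READ = 1 << 0,
    /* Transition may still be running when the present semaphore signals. */
    HAZARD_PRESENT_AFTER_WRITE = 1 << 1
} HazardFlags;

typedef struct {
    StageMask wait_stage_mask;   /* pWaitSemaphoreInfos[i].stageMask (acquire) */
    StageMask signal_stage_mask; /* pSignalSemaphoreInfos[i].stageMask (present) */
    StageMask src_stage_mask;    /* layout-transition barrier srcStageMask */
    StageMask dst_stage_mask;    /* layout-transition barrier dstStageMask */
} FrameSync;

/* Returns a bitwise OR of HazardFlags the toy model would report. */
static unsigned frame_hazards(FrameSync s) {
    unsigned hazards = HAZARD_NONE;
    /* No chain with the wait: the transition is not ordered after acquire. */
    if ((s.wait_stage_mask & s.src_stage_mask) == 0)
        hazards |= HAZARD_WRITE_AFTER_READ;
    /* No chain with the signal: the signal is not ordered after the transition. */
    if ((s.dst_stage_mask & s.signal_stage_mask) == 0)
        hazards |= HAZARD_PRESENT_AFTER_WRITE;
    return hazards;
}
```

With dstStageMask left at NONE (under synchronization2, BOTTOM_OF_PIPE in a second scope is equivalent to NONE), the model flags only PRESENT_AFTER_WRITE; setting dstStageMask to COLOR_ATTACHMENT_OUTPUT clears it, matching the fix shown in the test above.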

@artem-lunarg
Copy link
Contributor

Closing this issue as not a VVL bug.

Thanks @nikitablack for reporting this, it helped to understand this part of synchronization better. Feel free to re-open if we missed something.

@nikitablack
Author

nikitablack commented Aug 25, 2023

@artem-lunarg that fixes it for me too. But it conflicts with the specification; I'll quote it here again for convenience:

When transitioning the image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, there is no need to delay subsequent processing, or perform any visibility operations. To achieve this, the dstAccessMask member of the VkImageMemoryBarrier should be set to 0, and the dstStageMask parameter should be set to VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT.

The keywords here are:

  • there is no need to delay subsequent processing (i.e. no execution dependency needed)
  • should be set to VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT (which is deprecated in favor of VK_PIPELINE_STAGE_2_NONE).

@artem-lunarg
Contributor

artem-lunarg commented Aug 25, 2023

That's a good point! Let me clarify this with the specification authors. I suspect that part was written when only vkQueueSubmit existed, which does not provide a way to specify the scope of a signal semaphore. It may need to be rephrased, because vkQueueSubmit2 can create hazard situations.

@artem-lunarg
Contributor

Request to clarify this part of the specification (internal link): https://gitlab.khronos.org/vulkan/vulkan/-/issues/3608

johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Mar 21, 2024
johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Mar 23, 2024
johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Mar 26, 2024
johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Jul 31, 2024
* Towards fixing everything.
Most importantly these issues: KhronosGroup/Vulkan-ValidationLayers#6177

* Update cross-platform-check.yml

Using Vulkan SDK 1.3.216.0 instead of 1.3.204.1. The latter is no longer available for download as it appears.

* Attempted to fix sync issues in compute_image_processing.cpp with auto barriers (only able to fix them with explicit heavy barrier)

* Fake news: auto barriers work

* Fixed synchronization in dynamic_rendering example application

* Enabling synchronization validation in multi_invokee_rendering example

* Still problems in the multiple_queues example

* Added custom tangent space calculation (based on new (and shitty) ASSIMP code.

* Reverted bitangent calculation to ASSIMP's good old code in model.cpp

* Linked to latest Auto-Vk after hot fix to development branch.

* Added avk::projection parameter to camera::set_projection_matrix in order to not loose the information about the projection kind

* Linked to Auto-Vk after documentation for read has been added.

* Updated to The Assimp 5.4.2 Bugfix Release

* Added the CMake-generated config.h file

* ASSIMP .lib and .dll updates => Debug == Release actually

* Trying to fix workflows/cross-platform-check.yml

* assimp-vc143-mt.lib in assimp.cmake

* No idea what I'm doing...

* Updated auto_vk submodule

* vk::detail::createResultValueType for breaking change in latest Vulkan SDK
johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Aug 1, 2024
* Towards fixing everything.
Most importantly these issues: KhronosGroup/Vulkan-ValidationLayers#6177

* Update cross-platform-check.yml

Using Vulkan SDK 1.3.216.0 instead of 1.3.204.1. The latter is no longer available for download as it appears.

* Attempted to fix sync issues in compute_image_processing.cpp with auto barriers (only able to fix them with explicit heavy barrier)

* Fake news: auto barriers work

* Fixed synchronization in dynamic_rendering example application

* Enabling synchronization validation in multi_invokee_rendering example

* Still problems in the multiple_queues example

* Fixed multiple_queues example

* One more synchronization hint fixes a hazard from image layout transition in orca_loader example

* Fixed synchronization in vertex_buffers example

* Fixed synchronizationin ray tracing examples

* Not going to fix synchronization in framebuffer example and present_from_compute example, sorry bruh!
johannesugb added a commit to cg-tuwien/Auto-Vk-Toolkit that referenced this issue Aug 1, 2024
Mrkol added a commit to AlexandrShcherbakov/etna that referenced this issue Sep 17, 2024
1) Proper synchronization for swapchain images. See this for explanation: KhronosGroup/Vulkan-ValidationLayers#6177
2) More debug naming: for swapchain images and all image views.
3 participants