[Bug fix] Register Fusion Pass fuse policy assign wrong output edges #514

LeiWang1999 · 2023-04-11T12:27:32Z

1. Add support for int16_t load (bloom fp16 model).

2. For the Register Fusion pass (welder), the current code assigns the wrong output edges to a fused node with multiple outputs, which produces incorrect results in some cases.

3. Rewrite the CUDA_ARCH string in the CUDA codegen CMakeLists.txt in a friendlier way.

With the current flags:

-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86

if we want to use features that require a newer architecture such as sm_86, we have to comment out the gencode flags for the lower CUDA architectures; otherwise we get a compilation error:

ptxas /tmp/tmpxft_0000e00e_00000000-11_nnfusion_rt.compute_60.ptx, line 43059; error   : Feature '.m16n8k16' requires .target sm_80 or higher

With the new CUDA_ARCH setting, which uses code=compute_XX instead of code=sm_XX:

SET(CUDA_ARCH "-gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_80,code=compute_80" CACHE STRING "target architecture")

we no longer have this concern.
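The difference between the two flag styles can be sketched with a small string builder. This is only an illustration: `make_gencode_flags` is a hypothetical helper, not part of NNFusion or nvcc. It assumes the only change is the `code=` target, where `code=sm_XX` asks nvcc to build SASS for each listed architecture and `code=compute_XX` embeds PTX that is compiled when the module is loaded.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of NNFusion): builds an nvcc -gencode flag
// string for the given compute capabilities.
//   ptx_only == true  -> code=compute_XX (embed PTX only)
//   ptx_only == false -> code=sm_XX      (embed SASS built at compile time)
std::string make_gencode_flags(const std::vector<int>& archs, bool ptx_only)
{
    std::string flags;
    for (int arch : archs)
    {
        if (!flags.empty())
            flags += ' ';
        const std::string cc = std::to_string(arch);
        flags += "-gencode=arch=compute_" + cc + ",code=" +
                 (ptx_only ? "compute_" : "sm_") + cc;
    }
    return flags;
}
```

Calling `make_gencode_flags({60, 61, 70, 75, 80}, true)` reproduces the flag list inside the new CUDA_ARCH string, while passing `false` gives the older `code=sm_XX` form.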

Bug fix: in the current code, the last line of set_launch_config() reads grid[1] into m_gridDim.z:

void cuda::FusionCudaEmitter::set_launch_config()
{
    auto block = m_fusion_group["block_size"];
    auto grid = m_fusion_group["grid_size"];
    block[0].get_to(m_blockDim.x);
    block[1].get_to(m_blockDim.y);
    block[2].get_to(m_blockDim.z);
    grid[0].get_to(m_gridDim.x);
    grid[1].get_to(m_gridDim.y);
    grid[1].get_to(m_gridDim.z);
}

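The last line should be grid[2].get_to(m_gridDim.z); so that m_gridDim.z is read from the third grid entry instead of repeating the second.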
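One way to make this kind of copy-paste index slip harder to write is to drive the assignment from a table, so each index maps to exactly one field. The following is a minimal sketch under assumptions, not the change made in this PR: `Dim3` and `assign_dim` are hypothetical stand-ins, and a plain `std::array` replaces the JSON object the emitter actually reads from.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3.
struct Dim3
{
    unsigned x = 1;
    unsigned y = 1;
    unsigned z = 1;
};

// Copies three extents into a Dim3. The pointer-to-member table ties
// index 0 to x, 1 to y and 2 to z, so no index is written out by hand twice.
inline void assign_dim(const std::array<unsigned, 3>& src, Dim3& dst)
{
    unsigned Dim3::* const fields[3] = {&Dim3::x, &Dim3::y, &Dim3::z};
    for (std::size_t i = 0; i < 3; ++i)
        dst.*fields[i] = src[i];
}
```

With this shape, block and grid sizes would go through the same function, for example `assign_dim(grid_values, grid_dim)`, where both names are again placeholders.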

LeiWang1999 added 2 commits April 11, 2023 04:23

add support for int16_t load (bloom fp16 model)

556158b

fix bugs of register fusion pass

4cb6ce3

LeiWang1999 requested a review from xiayuqing0622 April 11, 2023 12:27

re-type the CUDA_ARCH String

b62ce84

xiayuqing0622 approved these changes Apr 12, 2023


LeiWang1999 added 4 commits April 11, 2023 23:52

bug fix ..

6b681e3

add dot permutation pass

7b605e3

support layout of layoutdot

b03e0a9

lowbit update

bcbe7d0
