
[Bug fix] Register Fusion Pass fuse policy assign wrong output edges #514

Open
wants to merge 7 commits into
base: xbox
from


LeiWang1999
Contributor

@LeiWang1999 LeiWang1999 commented Apr 11, 2023

  1. Add support for int16_t loads (bloom fp16 model).

  2. For the Register Fusion pass (welder): when a fused node has multiple outputs, the current code assigns the output edges incorrectly, which produces wrong results in some cases.

  3. Rewrite the CUDA_ARCH string in the CUDA codegen CMakeLists.txt in a friendlier way.

The current flags are:

-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86

With these flags, if we want to use features that only newer architectures such as sm_86 support, we have to comment out the gencode flags for the lower CUDA architectures; otherwise we get a compilation error:

ptxas /tmp/tmpxft_0000e00e_00000000-11_nnfusion_rt.compute_60.ptx, line 43059; error   : Feature '.m16n8k16' requires .target sm_80 or higher

With the new CUDA_ARCH setting:

SET(CUDA_ARCH "-gencode=arch=compute_60,code=compute_60 -gencode=arch=compute_61,code=compute_61 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_75,code=compute_75 -gencode=arch=compute_80,code=compute_80" CACHE STRING "target architecture")

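Each entry now uses code=compute_XX, which embeds PTX rather than building SASS for that architecture, and we no longer hit this error.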

  4. Bug fix in set_launch_config. The current code is:
void cuda::FusionCudaEmitter::set_launch_config()
{
    auto block = m_fusion_group["block_size"];
    auto grid = m_fusion_group["grid_size"];
    block[0].get_to(m_blockDim.x);
    block[1].get_to(m_blockDim.y);
    block[2].get_to(m_blockDim.z);
    grid[0].get_to(m_gridDim.x);
    grid[1].get_to(m_gridDim.y);
    grid[1].get_to(m_gridDim.z);
}

The last assignment reads grid[1] a second time, so m_gridDim.z receives the y extent. It should be grid[2].get_to(m_gridDim.z);
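This kind of copy-paste index slip could be made harder to repeat by filling the three fields in a loop. Below is a minimal sketch of that idea, not the code in this PR: Dim3 and to_dim3 are hypothetical names, and a std::array stands in for the JSON arrays read from m_fusion_group.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for CUDA's dim3; not the NNFusion type.
struct Dim3
{
    unsigned x = 1;
    unsigned y = 1;
    unsigned z = 1;
};

// Copy three extents into a Dim3. Each source index is used exactly once,
// paired with its destination field through the pointer table below,
// so an index cannot be repeated by accident.
inline Dim3 to_dim3(const std::array<unsigned, 3>& extents)
{
    Dim3 d;
    unsigned* fields[3] = {&d.x, &d.y, &d.z};
    for (std::size_t i = 0; i < 3; ++i)
    {
        *fields[i] = extents[i];
    }
    return d;
}
```

With a helper like this, set_launch_config would call it once for block_size and once for grid_size, assuming the JSON values are first read into a three-element array.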


2 participants