
SGEMM kernel function for Nvidia Pascal GPUs, able to reach 60% of theoretical peak performance.


wjc404/Simple_CUDA_GEMM


Simple_CUDA_GEMM

SGEMM and DGEMM kernel functions on Nvidia GPUs.
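SGEMM and DGEMM are the BLAS names for single- and double-precision general matrix multiply, which computes C = alpha * A * B + beta * C. The sketch below is a naive CPU reference, not this repository's kernel: `gemm_ref` is a hypothetical name, and it assumes the BLAS column-major layout with no transposition of A or B. A reference like this is mainly useful for checking a fast GPU kernel's output on small inputs.

```cpp
#include <cstddef>

// Hypothetical naive reference GEMM: C = alpha * A * B + beta * C.
// Column-major storage (BLAS convention), no transposes.
// A is M x K (leading dimension lda), B is K x N (ldb), C is M x N (ldc).
// T = float gives SGEMM semantics, T = double gives DGEMM semantics.
template <typename T>
void gemm_ref(std::size_t M, std::size_t N, std::size_t K, T alpha,
              const T* A, std::size_t lda, const T* B, std::size_t ldb,
              T beta, T* C, std::size_t ldc) {
    for (std::size_t j = 0; j < N; ++j) {          // column of C
        for (std::size_t i = 0; i < M; ++i) {      // row of C
            T acc = T(0);
            for (std::size_t p = 0; p < K; ++p) {  // inner product over K
                acc += A[i + p * lda] * B[p + j * ldb];
            }
            T& c = C[i + j * ldc];
            // As in BLAS, beta == 0 means the old contents of C are not read,
            // so an uninitialized (even NaN) C does not leak into the result.
            c = (beta == T(0)) ? alpha * acc : alpha * acc + beta * c;
        }
    }
}
```

For example, with A = [1 2; 3 4] stored as {1, 3, 2, 4} and B = [5 6; 7 8] stored as {5, 7, 6, 8}, calling `gemm_ref<float>(2, 2, 2, 1.0f, A, 2, B, 2, 0.0f, C, 2)` fills C with {19, 43, 22, 50}, i.e. [19 22; 43 50] in column-major order.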

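Efficiency of the SGEMM kernel, as a fraction of theoretical peak single-precision performance: 30-40% on GTX Titan Black, 60% on Tesla P4 and Tesla P100, 80% on Tesla V100.

Efficiency of the DGEMM kernel, as a fraction of theoretical peak double-precision performance: 40% on GTX Titan Black, 70-80% on Tesla P100 and Tesla V100.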
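The efficiency figures above compare achieved throughput with theoretical peak. A minimal sketch of that arithmetic, assuming the common convention that a GEMM of size M x N x K performs 2 * M * N * K floating-point operations and that each arithmetic unit retires one fused multiply-add (2 operations) per cycle. The helper names are hypothetical, and the unit count and clock passed in must match the precision being measured (FP32 units for SGEMM, FP64 units for DGEMM).

```cpp
#include <cstdint>

// Floating-point operations in one GEMM call, counting only the
// 2 * M * N * K multiply-adds (the usual convention; the alpha/beta
// scaling adds O(M * N) work and is ignored here).
double gemm_flops(std::int64_t M, std::int64_t N, std::int64_t K) {
    return 2.0 * static_cast<double>(M) * static_cast<double>(N) *
           static_cast<double>(K);
}

// Theoretical peak in FLOP/s, assuming each unit retires one
// fused multiply-add (2 FLOPs) per clock cycle.
double peak_flops(std::int64_t units, double clock_hz) {
    return static_cast<double>(units) * clock_hz * 2.0;
}

// Efficiency as a fraction of theoretical peak, given the measured
// wall-clock time of one GEMM call in seconds.
double gemm_efficiency(std::int64_t M, std::int64_t N, std::int64_t K,
                       double seconds, double peak) {
    return gemm_flops(M, N, K) / seconds / peak;
}
```

Whether the base or boost clock is used changes the peak, and therefore the reported percentage; the README does not state which clock its figures assume.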
