Simple_CUDA_GEMM

SGEMM (single-precision) and DGEMM (double-precision) matrix-multiplication kernel functions for NVIDIA GPUs.
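For readers unfamiliar with the operation, GEMM in the BLAS sense computes C = alpha * A * B + beta * C. The following is a minimal CPU reference sketch in plain C, useful for checking the results of a GPU kernel. The function name `sgemm_ref`, its argument order, and the column-major layout without transposition are assumptions made for illustration; they are not taken from this repository's kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference (not this repository's API):
 * C = alpha * A * B + beta * C, single precision.
 * Column-major storage, no transposition.
 * A is m x k (leading dimension lda), B is k x n (ldb), C is m x n (ldc). */
void sgemm_ref(size_t m, size_t n, size_t k, float alpha,
               const float *A, size_t lda,
               const float *B, size_t ldb,
               float beta, float *C, size_t ldc)
{
    assert(A && B && C);
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {          /* column of C */
        for (size_t i = 0; i < m; ++i) {      /* row of C */
            float acc = 0.0f;
            for (size_t p = 0; p < k; ++p)    /* dot product over k */
                acc += A[i + p * lda] * B[p + j * ldb];
            C[i + j * ldc] = alpha * acc + beta * C[i + j * ldc];
        }
    }
}
```

A DGEMM reference would be identical with `double` in place of `float`. Comparing a GPU result against such a reference on small matrices is a common correctness check before measuring performance.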

Efficiency of the SGEMM kernel: 30-40% on GTX Titan Black, 60% on Tesla P4 and Tesla P100, 80% on Tesla V100.

Efficiency of the DGEMM kernel: 40% on GTX Titan Black, 70-80% on Tesla P100 and Tesla V100.
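The README does not define "efficiency". A common convention for GEMM benchmarks, assumed here, is achieved floating-point throughput divided by the device's theoretical peak, with one GEMM counted as 2·m·n·k operations. The sketch below shows that arithmetic; the helper names `gemm_gflops` and `gemm_efficiency` are hypothetical, and the peak value must come from the vendor's specification for the GPU and precision in question.

```c
#include <assert.h>

/* Achieved throughput in GFLOP/s for one m x n x k GEMM that took
 * `seconds` to run, counting 2*m*n*k floating-point operations
 * (one multiply and one add per inner-product term). */
double gemm_gflops(double m, double n, double k, double seconds)
{
    assert(seconds > 0.0);
    return 2.0 * m * n * k / seconds / 1e9;
}

/* Efficiency as a fraction of theoretical peak (assumed definition).
 * Multiply by 100 to express it as a percentage. */
double gemm_efficiency(double achieved_gflops, double peak_gflops)
{
    assert(peak_gflops > 0.0);
    return achieved_gflops / peak_gflops;
}
```

For example, a kernel that sustains half of a device's peak GFLOP/s would report an efficiency of 0.5, i.e. 50%. On the GPU, `seconds` would typically be measured with CUDA events around the kernel launch, averaged over several runs.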