OCML User Guide


What Is OCML

OCML is an LLVM-IR bitcode library designed to relieve language compiler and runtime implementers of the burden of implementing efficient and accurate mathematical functions. It is essentially a “libm” in intermediate representation with a fixed, simple API that can be linked in to supply the implementations of most standard low-level mathematical functions provided by the language.

Using OCML

Standard Usage

OCML is expected to be used in a standard LLVM compilation flow as follows:

  • Compile source modules to LLVM-IR bitcode (clang)
  • Link program bitcode, “wrapper” bitcode, OCML bitcode, and OCML control functions (llvm-link)
  • Generic optimizations (opt)
  • Code generation (llc)

Here, “wrapper” bitcode denotes a thin library responsible for mapping mangled built-in function calls as produced by clang to the OCML API. An example in C might look like

inline float sqrt(float x) { return __ocml_sqrt_f32(x); }

The next section describes the OCML controls and how to define them.

Controls

OCML supports a number of controls that are provided by linking in specifically named inline functions. These functions are inlined at optimization time, so each control selects a specific code path with no control flow overhead. These functions all have the form (in C)

__attribute__((always_inline, const)) int
{control function name}(void)
{ return 1; } // or 0 to disable

The currently supported controls are

  • finite_only_opt - floating point Inf and NaN are never expected to be consumed or produced
  • unsafe_math_opt - lower accuracy results may be produced with higher performance
  • daz_opt - subnormal values consumed and produced may be flushed to zero
  • correctly_rounded_sqrt32 - float square root must be correctly rounded
  • ISA_version - an integer representation of the ISA version of the target device
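
As a rough illustration of how a control of this shape can be consumed, the sketch below defines one control function and a helper that branches on it. The names `example_daz_opt` and `flush_subnormal_if_daz` are hypothetical and chosen for this example only; this guide does not spell out the exact symbol names OCML expects, and the helper is not taken from the OCML sources. `static inline` is added so the sketch compiles on a host compiler.

```c
#include <float.h>

/* Hypothetical control function (the name is an assumption, not the
   symbol OCML looks for). Return 1 to enable the control, 0 to disable. */
static inline __attribute__((always_inline, const)) int
example_daz_opt(void)
{ return 1; }

/* Hypothetical consumer of the control. Once example_daz_opt() is
   inlined, the condition starts with a compile-time constant, so the
   optimizer can remove the path that is not selected. */
static inline float
flush_subnormal_if_daz(float x)
{
    float mag = x < 0.0f ? -x : x;          /* magnitude without libm */
    if (example_daz_opt() && mag != 0.0f && mag < FLT_MIN)
        return x < 0.0f ? -0.0f : 0.0f;     /* flush, keeping the sign */
    return x;
}
```

With the control returning 0 instead, the same helper would compile down to a plain `return x;`, which is the "no control flow overhead" property described above.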


Versioning

OCML ships as a single LLVM-IR bitcode file named

ocml-{LLVM rev}-{OCML rev}.bc

where {LLVM rev} is the version of LLVM used to create the file, of the form X.Y, e.g. 3.8, and {OCML rev} is the OCML library version of the form X.Y, currently 0.9.
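
A build script or loader may want to check which versions a given bitcode file was built for. The following is a minimal sketch of parsing the file name pattern described above; the struct and function names are hypothetical, and it assumes both revisions are plain decimal `X.Y` pairs.

```c
#include <stdio.h>

/* Hypothetical container for the two X.Y revisions in the file name. */
struct ocml_file_version {
    int llvm_major, llvm_minor;
    int ocml_major, ocml_minor;
};

/* Returns 1 if name has the form ocml-X.Y-X.Y.bc, 0 otherwise. */
static int
parse_ocml_filename(const char *name, struct ocml_file_version *v)
{
    int end = 0;
    /* %n records how many characters were consumed, so a missing or
       extended ".bc" suffix can be detected afterwards. */
    int fields = sscanf(name, "ocml-%d.%d-%d.%d.bc%n",
                        &v->llvm_major, &v->llvm_minor,
                        &v->ocml_major, &v->ocml_minor, &end);
    return fields == 4 && end > 0 && name[end] == '\0';
}
```

For the example values given above, `parse_ocml_filename("ocml-3.8-0.9.bc", &v)` succeeds and fills in LLVM 3.8 and OCML 0.9.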


Tables

Some OCML functions require access to tables of constants. These tables are currently named with the prefix __ocmltbl_ and are placed in LLVM address space 2.

Naming convention

OCML functions follow a simple naming convention:

__ocml_{function}_{type suffix}

where {function} is generally the familiar libm name of the function, and {type suffix} indicates the type of the floating point arguments or results, and is one of

  • f16 – 16 bit floating point (half precision)
  • f32 – 32 bit floating point (single precision)
  • f64 – 64 bit floating point (double precision)

For example, __ocml_sqrt_f32 is the name of the OCML single precision square root function.

OCML does not currently support precisions higher than double because most devices do not support them.
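
Because the naming scheme is regular, a wrapper library can generate OCML names mechanically. The sketch below uses token pasting to build names from a function and a type suffix; the macro names (`OCML_NAME`, `OCML_NAME_STR`) are hypothetical helpers for this example and are not part of OCML.

```c
/* Paste the pieces of an OCML name together:
   OCML_NAME(sqrt, f32) expands to __ocml_sqrt_f32. */
#define OCML_NAME(function, suffix) __ocml_##function##_##suffix

/* Two-level stringification so the pasted name is expanded first. */
#define OCML_STR_(x) #x
#define OCML_STR(x) OCML_STR_(x)
#define OCML_NAME_STR(function, suffix) OCML_STR(OCML_NAME(function, suffix))

/* Prototypes written through the macro. These are declarations only;
   the definitions are expected to come from the OCML bitcode. */
float  OCML_NAME(sqrt, f32)(float x);
double OCML_NAME(sqrt, f64)(double x);
```

Here `OCML_NAME_STR(sqrt, f32)` yields the string `"__ocml_sqrt_f32"`, which can be handy for diagnostics or symbol lookups.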

Supported functions

The following table contains a list of {function} currently supported by OCML, a brief description of each, and the maximum relative error in ULPs for each floating point type. A “c” in the last 3 columns indicates that the function is required to be correctly rounded.

| {function} | Description | f32 max err | f64 max err | f16 max err |
| --- | --- | --- | --- | --- |
| acos | arc cosine | 4 | 4 | 2 |
| acosh | arc hyperbolic cosine | 4 | 4 | 2 |
| acospi | arc cosine / π | 5 | 5 | 2 |
| add_{rm} | add with specific rounding mode | c | c | c |
| asin | arc sine | 4 | 4 | 2 |
| asinh | arc hyperbolic sine | 4 | 4 | 2 |
| asinpi | arc sine / π | 5 | 5 | 2 |
| atan2 | two argument arc tangent | 6 | 6 | 2 |
| atan2pi | two argument arc tangent / π | 6 | 6 | 2 |
| atan | single argument arc tangent | 5 | 5 | 2 |
| atanh | arc hyperbolic tangent | 5 | 5 | 2 |
| atanpi | single argument arc tangent / π | 5 | 5 | 2 |
| cbrt | cube root | 2 | 2 | 2 |
| ceil | round upwards to integer | c | c | c |
| copysign | copy sign of second argument to absolute value of first | 0 | 0 | 0 |
| cos | cosine | 4 | 4 | 2 |
| cosh | hyperbolic cosine | 4 | 4 | 2 |
| cospi | cosine of argument times π | 4 | 4 | 2 |
| div_{rm} | correctly rounded division with specific rounding mode | c | c | c |
| erf | error function | 16 | 16 | 4 |
| erfc | complementary error function | 16 | 16 | 4 |
| erfcinv | inverse complementary error function | 7 | 8 | 3 |
| erfcx | scaled error function | 6 | 6 | 2 |
| erfinv | inverse error function | 3 | 8 | 2 |
| exp10 | 10^x | 3 | 3 | 2 |
| exp2 | 2^x | 3 | 3 | 2 |
| exp | e^x | 3 | 3 | 2 |
| expm1 | e^x - 1, accurate at 0 | 3 | 3 | 2 |
| fabs | absolute value | 0 | 0 | 0 |
| fdim | positive difference | c | c | c |
| floor | round downwards to integer | c | c | c |
| fma[_{rm}] | fused (i.e. singly rounded) multiply-add, with optional specific rounding | c | c | c |
| fmax | maximum, avoids NaN | 0 | 0 | 0 |
| fmin | minimum, avoids NaN | 0 | 0 | 0 |
| fmod | floating point remainder | 0 | 0 | 0 |
| fpclassify | classify floating point | - | - | - |
| fract | fractional part | c | c | c |
| frexp | extract significand and exponent | 0 | 0 | 0 |
| hypot | length, with overflow control | 4 | 4 | 2 |
| i0 | modified Bessel function of the first kind, order 0, I_0 | 6 | 6 | 2 |
| i1 | modified Bessel function of the first kind, order 1, I_1 | 6 | 6 | 2 |
| ilogb | extract exponent | 0 | 0 | 0 |
| isfinite | test for finiteness | - | - | - |
| isinf | test for Inf | - | - | - |
| isnan | test for NaN | - | - | - |
| isnormal | test for normal | - | - | - |
| j0 | Bessel function of the first kind, order 0, J_0 | 6 (<12) | 6 (<12) | 2 (<12) |
| j1 | Bessel function of the first kind, order 1, J_1 | 6 (<12) | 6 (<12) | 2 (<12) |
| ldexp | multiply by 2 raised to an integral power | c | c | c |
| len3 | three argument hypot | 2 | 2 | 2 |
| len4 | four argument hypot | 2 | 2 | 2 |
| lgamma | log Γ function | 6 (>0) | 4 (>0) | 3 (>0) |
| lgamma_r | log Γ function with sign | 6 (>0) | 4 (>0) | 3 (>0) |
| log10 | log base 10 | 3 | 3 | 2 |
| log1p | log base e accurate near 1 | 2 | 2 | 2 |
| log2 | log base 2 | 3 | 3 | 2 |
| log | log base e | 3 | 3 | 2 |
| logb | extract exponent | 0 | 0 | 0 |
| mad | multiply-add, implementation defined if fused | c | c | c |
| max | maximum without special NaN handling | 0 | 0 | 0 |
| maxmag | maximum magnitude | 0 | 0 | 0 |
| min | minimum without special NaN handling | 0 | 0 | 0 |
| minmag | minimum magnitude | 0 | 0 | 0 |
| modf | extract integer and fraction | 0 | 0 | 0 |
| mul_{rm} | multiply with specific rounding mode | c | c | c |
| nan | produce a NaN with a specific payload | 0 | 0 | 0 |
| ncdf | standard normal cumulative distribution function | 16 | 16 | 4 |
| ncdfinv | inverse standard normal cumulative distribution function | 16 | 16 | 4 |
| nearbyint | round to nearest integer (see also rint) | 0 | 0 | 0 |
| nextafter | next closest value above or below | 0 | 0 | 0 |
| pow | general power | 16 | 16 | 4 |
| pown | power with integral exponent | 16 | 16 | 4 |
| powr | power with positive floating point exponent | 16 | 16 | 4 |
| rcbrt | reciprocal cube root | 2 | 2 | 2 |
| remainder | floating point remainder | 0 | 0 | 0 |
| remquo | floating point remainder and lowest integral quotient bits | 0 | 0 | 0 |
| rhypot | reciprocal hypot | 2 | 2 | 2 |
| rint | round to nearest integer | c | c | c |
| rlen3 | reciprocal len3 | 2 | 2 | 2 |
| rlen4 | reciprocal len4 | 2 | 2 | 2 |
| rootn | nth root | 16 | 16 | 4 |
| round | round to integer, always away from 0 | c | c | c |
| rsqrt | reciprocal square root | 2 | 2 | 1 |
| scalb | multiply by 2 raised to a power | c | c | c |
| scalbn | multiply by 2 raised to an integral power (see also ldexp) | c | c | c |
| signbit | nonzero if argument has sign bit set | - | - | - |
| sin | sine function | 4 | 4 | 2 |
| sincos | simultaneous sine and cosine evaluation | 4 | 4 | 2 |
| sincospi | sincos function of argument times π | 4 | 4 | 2 |
| sinh | hyperbolic sine | 4 | 4 | 2 |
| sinpi | sine of argument times π | 4 | 4 | 2 |
| sqrt | square root | 3/c | 3/c | c |
| sub_{rm} | subtract with specific rounding mode | c | c | c |
| tan | tangent | 5 | 5 | 2 |
| tanh | hyperbolic tangent | 5 | 5 | 2 |
| tanpi | tangent of argument times π | 6 | 6 | 2 |
| tgamma | true Γ function | 16 | 16 | 4 |
| trunc | round to integer, towards zero | c | c | c |
| y0 | Bessel function of the second kind, order 0, Y_0 | 2 (<12) | 6 (<12) | 6 (<12) |
| y1 | Bessel function of the second kind, order 1, Y_1 | 2 (<12) | 6 (<12) | 6 (<12) |

For the functions supporting specific roundings, the rounding mode {rm} can be one of

  • rte – round towards nearest even
  • rtp – round towards positive infinity
  • rtn – round towards negative infinity
  • rtz – round towards zero
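
Reading the naming convention together with the rounding modes, a function listed as `add_{rm}` would expand to a name such as `__ocml_add_rtz_f64`. The helper below sketches that composition at run time; `ocml_symbol_name` is a hypothetical function for illustration, and placing the rounding mode between the function name and the type suffix is an interpretation of the convention above rather than something this guide states explicitly.

```c
#include <stddef.h>
#include <stdio.h>

/* Build "__ocml_{function}[_{rm}]_{type suffix}" into buf.
   Pass rm == NULL for functions without a rounding-mode variant.
   Returns the snprintf result (length of the full name). */
static int
ocml_symbol_name(char *buf, size_t size, const char *function,
                 const char *rm, const char *suffix)
{
    if (rm != NULL)
        return snprintf(buf, size, "__ocml_%s_%s_%s", function, rm, suffix);
    return snprintf(buf, size, "__ocml_%s_%s", function, suffix);
}
```

For `fma[_{rm}]`, where the rounding mode is optional, passing `NULL` gives the plain `__ocml_fma_f32` form.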