11 minutes

Written: 2021-08-14 00:28 +0000

# GSoC21 W10: LFortran Runtime Library Design

The road to accurate elementary intrinsic functions is paved with IEEE/ISO standards

## Background

Serialized update for the 2021 Google Summer of Code under the fortran-lang organization, mentored by Ondrej Certik.

### Series

This post is part of a series based around my weekly GSoC21 project check-ins.

- GSoC21 W1: LFortran Kickoff
- GSoC21 W2: LFortran Unraveling
- GSoC21 W3: Kind, Characters, and Standards
- GSoC21 W4: LFortran, Backends and Bugs
- GSoC21 W5: LFortran Design Details and
`minidftatom`

- GSoC21 W6: LFortran ASR and Values
- GSoC21 W7: LFortran Workflow Basics
- GSoC21 W8: LFortran Refactors, and Compile Time Evaluations
- GSoC21 W9: LFortran Bug Hunting Bonanza
**GSoC21 W10: LFortran Runtime Library Design**<– You are here!

## Logistics

- Met with Ondrej daily from Sunday till Friday
- On Sunday we finished the compile time evaluation design
(partially)
- Ondrej finished the rest overnight

- On Monday, we started co working on runtime implemenations
- Also added some more compile time stuff and pointers to clean things up
- Ondrej fixed a longstanding bug in the deserialise of serialized mod

- On Wednesday there was more scaffolding work done on the
generics
- Added another implicit cast

- On Thursday we implemented an all fortran
`sin`

function- Just as accurate as
`gfortran`

- Compiles with LFortran
- Also fixed another serialization bug (floats were being truncated)

- Just as accurate as
- On Friday we redesigned the runtime library
- Also discussed future plans and longterm goals
- Opened discussions around the runtime library and elementary function accuracy on the Zulip instance
- Considered adding
`fortran`

versions of LAMMPS (F77 and F90) to the compilation plans

- On Sunday we finished the compile time evaluation design
(partially)

## Overview

This was most definitely my most productive week by far.

### New Merge Requests

- llvm: More values when possible (1113)
- Backend values from compile time evaluations
- kind: Compile time (1132)
- A rather thorny refactor; but one with great benefits
- asr: Clean up extract_kind (1136)
- Fruitful usage of the new functions
- Compile Time Intrinsic: Real (1114)
- A
`cast`

based approach - selected_kinds: Body visitors (1167)
- Possibly my favorite MR, a nice pure
`fortran`

implementation backed by a rough`sollya`

approximation - Initial sin implementation (1167)
- Possibly my favorite MR, a nice pure
`fortran`

implementation backed by a rough`sollya`

approximation - sin: Rework runtime to get better approximations (1173)
- Possibly my favorite MR, a nice pure
`fortran`

implementation backed by a rough`sollya`

approximation - Compile time floor (1176)
- Fairly standard implementation requiring casts
- Integer2real (1162)
- An implicit cast which had not previously been implement
- Compile time int (1169)
- Along the lines of
`real()`

- Sincos tests (1194)
- Benchmarking and setting up of the new test harness

### Freshly Assigned Issues

- Semantics: Implement correct handling of PARAMETER (510)
- ASR Pass: Order functions (507)
- Semantics: Implement FLOOR (A [, KIND]) (506)
- Semantics: Implement MODULO (A, [, KIND]) (504)
- [RFC] Compile Time Intrinsic Design (501)
- Strange Runtime issue with Implicit Casts (500)
- Implicit cast for parameter (499)
- Better handling of intrinsic module names (494)

### Additional Tasks

Will write a closing report for next week.

## Misc Logs

This was a really packed week. Much of the discussion has already taken place in the relevant issues and MRs; but some notes are in order for posterity.

### Items to fix

- Some compile time function implementations are not standard
compliatn
`real`

- Implicit casting rules fail for certain cases
- Notably for all
`Arrays`

- Notably for all

Strange bug when intrinsic functions are called on implicit cast

### Generic Proceedures

This was fun to work on.

### Fortran Runtime Intrinsics

A long term goal is to have a dog-fooded runtime library. We zeroed in on this design over on the wiki a few weeks ago. Now with generic proceedures in; we’ve begun implementing some of the intrinsic functions.

#### Simple Functions

The simplest ones are of course trivial:

```
1abs
```

#### Trignometric functions

For more “mathematical” methods, we needed a template. One we found
which was compatible with our license is the `fdlibm`

versions. Consider
for `sin`

:

Older compilers are much more readable compared to the newer ones.

## Trignometric Functions

The quote is motivated by my (limited) reading of two texts in the field of implementing elementary functions, namely Muller (2016) and Muller et al. (2018).

### Brainstorming Sin

- This and then the simpler version here
- Multiple interval based approximation used for AMD’s optimized sin

We were looking for a pedagogical, simple approach to calculating `sin`

.
Unsurprisingly, the entire ecosystem of compilers seems to mostly follow
one implementation from the early 90s. Which is great, but Fortran
doesn’t really implement IEEE 754’s argument reduction routine (a.k.a.
`quadmath`

). It turns out with a little massaging; the pedagogical
approach presented in this blog post by Christer
Ericson can be
adapted for use.

We needed to do slightly better. For starters, the errors of around \(10^{-16}\) could be stomached, but not the ones around \(10^{-10}\). Think of this as the same sort of problem as the standard fitting of a potential energy surface.

How do we test these? Import from the intrinsic, and rename to a different symbol, then run a loop.

## Sollya and Function Approximations

Rather than go rooting around the annals of outdated compilers; and instead of boldly copying from existing libraries; an interesting approach is to calculate precise approximations including the Remez (or minmax), Taylor approximations, and Chebyshev approximations.

Thankfully testing this is trivial with Sollya. Since it does nothave a
terminal inbuild we need `rlwrap`

too.

```
1nix-shell -p sollya rlwrap
```

Consider a trial implementation:

```
1! lfortran_intrinsic_sin.f90
2module lfortran_intrinsic_sin
3use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
4implicit none
5
6interface sin
7 module procedure dsin
8end interface
9
10contains
11
12elemental real(dp) function dsin(x) result(r)
13real(dp), parameter :: pi = 3.1415926535897932384626433832795_dp
14real(dp), intent(in) :: x
15real(dp) :: y
16integer :: n
17if (abs(x) < pi/2) then
18 r = kernel_dsin(x)
19else ! fold to pi/2
20! https://realtimecollisiondetection.net/blog/?p=9
21 y = modulo(x, 2*pi)
22 y = min(y, pi - y)
23 y = max(y, -pi - y)
24 y = min(y, pi - y)
25 r = kernel_dsin(y)
26end if
27end function
28! Implement kernel_dsin with Sollya (read post)
```

The general approach to testing this is then:

```
1! test_sina.f90
2program main
3 use :: lfortran_intrinsic_sin, only: sin2 => dsin
4 real(kind=8) :: x,y
5 integer(kind=8) :: i
6 do i=1,1000000
7 y = real(i)/100.0
8 print*, y, abs(sin(y)-sin2(y))
9 end do
10end program
```

Which can be compiled with:

```
1gfortran -ffree-line-length-none lfortran_intrinsic_sin.f90 test_sina.f90 && ./a.out
```

Where the line length is important since we want to be able to copy and paste expressions from Sollya.

### Remez

The test expressions can be generated as follows:

```
1nix-shell -p sollya rlwrap
2rlwrap -A sollya
3prec=120;
4f=sin(x);
5d=[-pi/2;pi/2];
6p=remez(f,16,d,default,1e-16,1e-30);
7p;
81.430896392112983438690712915149618875e-17 + x * (0.9999999999999989714281994951487325976 + x * (-3.349530544567829511297283045847078806e-16 + x * (-0.1666666666666470131512533841331726124 + x * (2.06038248351960358085265034956173799e-15 + x * (8.33333333322297677610158866083573121e-3 + x * (-5.200037492550081347598442689963074914e-15 + x * (-1.984126981335937217532222096272076263e-4 + x * (6.779661505197324337448766161086156236e-15 + x * (2.755731547177283303366066506722157076e-6 + x * (-4.994033469644991382509162854527493863e-15 + x * (-2.505182167856013032537240066679691216e-8 + x * (2.101039405474559290547827191709051192e-15 + x * (1.604654097883256924434495617086734883e-10 + x * (-4.71033700061518989435986419744537652e-16 + x * (-7.356804698944816248097669067062646244e-13 + x * 4.36551332303930597866240036869066221e-17)))))))))))))))
```

Where the signature of `remez`

is
`func, degree, interval, default=1, quality=1e-5, bounds=[0;infty]`

; Now
we can implement `kernel_dsin`

as follows:

```
1elemental real(dp) function kernel_dsin(x) result(res)
2 real(dp), intent(in) :: x
3 res = 1.430896392112983438690712915149618875e-17_dp + x * (0.9999999999999989714281994951487325976_dp + x * (-3.349530544567829511297283045847078806e-16_dp + x * (-0.1666666666666470131512533841331726124_dp + x * (2.06038248351960358085265034956173799e-15_dp + x * (8.33333333322297677610158866083573121e-3_dp + x * (-5.200037492550081347598442689963074914e-15_dp + x * (-1.984126981335937217532222096272076263e-4_dp + x * (6.779661505197324337448766161086156236e-15_dp + x * (2.755731547177283303366066506722157076e-6_dp + x * (-4.994033469644991382509162854527493863e-15_dp + x * (-2.505182167856013032537240066679691216e-8_dp + x * (2.101039405474559290547827191709051192e-15_dp + x * (1.604654097883256924434495617086734883e-10_dp + x * (-4.71033700061518989435986419744537652e-16_dp + x * (-7.356804698944816248097669067062646244e-13_dp + x * 4.36551332303930597866240036869066221e-17_dp)))))))))))))))
4
5end function
```

Note that we need to use double precision for the numbers, so we can
replace `SPACE+`

with `_dpSPACE+`

. We would like to also get some
statistics of the output, in `bash`

just becuse we can ^{1}.

```
1gfortran -ffree-line-length-none lfortran_intrinsic_sin.f90 test_sina.f90 && ./a.out > remez16
2awk 'NR == 1 || $2 < min {line = $0; min = $2}END{print line}' remez16
3awk 'NR == 1 || $2 > max {line = $0; max = $2}END{print line}' remez16
```

### Taylor and Chebyshev

To extract information from the “certified” functions in `Sollaya`

we
need to remember that the return value is a list.

```
1nix-shell -p sollya rlwrap
2rlwrap -A sollya
3prec=120;
4f=sin(x);
5d=[-pi/2;pi/2];
6p=chebyshevform(f,16,d);
7p[2];
8p[0];
9quit;
```

Similarly `taylorform`

can be used to get Taylor series expansions.
Given that we know exactly what we want to generate; we can also simply
use `sollya`

with an input script:

```
1sollya < taylorform16
```

Where `taylorform16`

contains:

```
1f=sin(x);
2d=[-pi/2;pi/2];
3p=taylorform(f,16,d);
4p[0];
```

This workflow can be easily extended into a script based runner.

## Better Polynomials

Some basic understanding of series expansions and / or visual inspection
would make it clear that for `sin`

at-least; the odd terms are
essentially \(0\). `Sollya`

can handle these; as described in Muller
(2016). We opted to go the
rounding error route.

## Rounding Errors

With Ondrej, we were able to isolate the inaccuracy in our pure
`fortran`

implementation and we realized it stemmed from the manner in
which our argument reduction is carried out. This is a well known
problem with attempted solutions from 1980s Payne and Hanek
(1983) (including a
related implementation in
BSD)^{2}.
In particular, as per this WG document of the SunPro
group
Ng (2006) describing the cannonical
implementation; the error is actually from the division and difference.
This is also touched upon elsewhere in the literature, Brisebarre et al.
(2005) and Defour et
al. (n.d.), along with Muller et al.
(2018) . Naturally one
of first places to check is the relevant ISO draft standard on
elementary
functions
(**N462Pdf?**) .

This isn’t immediately actionable; but might be something for later.
Naturally the discussions here in context of IEEE 754
(**IEEEStd754?**) (which the Fortran standard does
not fully support) and presumes basic knowledge of floating point
numbers Goldberg (1991) . The
clearest explanation for requiring higher precision for some constants
in these reduction algorithms is given nicely by the draft standard:

For reduction of an argument given in radians, implementations use one or several approximate value(s) of π (or of a multiple of π), valid to, say, n digits. The division implied in the argument reduction cannot be valid to more than n digits, which implies a maximum absolute angle value for which the reduction yields an accurate reduced angle value.

## Benchmarking and Testing

To make the benchmarking of new `lfortran`

runtime intrinsic functions
as painless as the existing test runner framework; I ended up
implementing a new framework for these. Currently this takes different
compiler backends and a TOML file along with some `Jinja2`

templates and
makes a `CMake`

based project which is built and gives out a grid of
points along with the `gfortran`

values and `abs`

differences. Some
things to do there:

- Plot things automatically
- Use non-uniform grids
- Other QoL changes

This is really easy to work with:

```
1cd src/runtime/pure/benchmarks
2nix-shell -p 'python38.withPackages(ps: with ps; [seaborn matplotlib click jinja2 tomli pandas])'
3# Or just micromamba install seaborn matplotlib pandas click jinja2 tomli
4python gen_test.py --execute --build --compiler "gfortran"
```

The TOML file has simple naming rules:

```
1[[trig]] # [name of lfortran_MODULE.f90]
2fname = "sin" # Function to generate test for; must be the same as for gfortran
3only = "dsin" # if not testing same function
4range = 20 # [-range, +range]
5stepsize = 0.01
```

As does the template:

```
1program {{ fname }}_test
2 use, intrinsic :: iso_fortran_env, only: sp => real32, dp => real64
3 use :: lfortran_intrinsic_trig, only: {{ fname }}2 => {{ fname }}
4 real(dp), parameter :: pi = 3.1415926535897932384626433832795_dp
5 real(dp) :: ir, gf, lf
6 print*, "Compiled with {{ compiler }}"
7 print*, "x ", "gfortran_{{fname}}(x) ", &
8 "lfortran_{{fname}}(x) ", "abs_error "
9 ir = -{{ range }}
10 do while (ir<={{ range }})
11 gf = {{ fname }}(ir)
12 lf = {{ fname }}2(ir)
13 print*, ir, gf, lf, abs(gf-lf)
14 ir = ir + {{ stepsize }}
15 end do
16end program
```

## Conclusions

Rather than move forward with the closing comments; and going into more
detail about the manner in which further `sollya`

based polynomial
approximations are being looked into; I will end this post here; as a
trailer and invitation to the closing report. This has been the last of
the weekly updates. This doesn’t mean anything will change other than
the format. I quite like keeping track of the work I do in this manner;
keeps me honest and not complacent ^{3}. For those of you following
along; this has been a fun ride; for those who come to this in the
future; feel free to follow along from the beginning and drop comments
below.

## References

Brisebarre, Nicolas, David Defour, Peter Kornerup, and Jean-Michel
Muller. 2005. “A New Range-Reduction Algorithm.” *IEEE TRANSACTIONS ON
COMPUTERS* 54 (3): 9.

Defour, P David, Peter Kornerup, Jean-Michel Muller, and Nathalie Revol. n.d. “A New Range Reduction Algorithm,” 13.

Goldberg, David. 1991. “What Every Computer Scientist Should Know about
Floating-Point Arithmetic.” *ACM Computing Surveys* 23 (1): 5–48.
https://doi.org/10.1145/103162.103163.

Muller, Jean-Michel. 2016. *Elementary Functions*. Boston, MA:
Birkhäuser Boston. https://doi.org/10.1007/978-1-4899-7983-4.

Muller, Jean-Michel, Nicolas Brunie, Florent de Dinechin, Claude-Pierre
Jeannerod, Mioara Joldes, Vincent Lefèvre, Guillaume Melquiond, Nathalie
Revol, and Serge Torres. 2018. *Handbook of Floating-Point Arithmetic*.
Cham: Springer International Publishing.
https://doi.org/10.1007/978-3-319-76526-6.

Ng, K C. 2006. “Argument Reduction for Huge Arguments: Good to the Last Bit.”* Introduction*, 8.

Payne, Mary H., and Robert N. Hanek. 1983. “Radian Reduction for
Trigonometric Functions.” *ACM SIGNUM Newsletter* 18 (1): 19–24.
https://doi.org/10.1145/1057600.1057602.

From this stackoverflow post of course ↩︎

From this blog post by Bert Hubert ↩︎

I maintain similar notes for my research, but those sadly never see the light of day ↩︎