MPL-v0.2 Release Notes (December 16, 2020)

It's been about a year since we released MPL-v0.1, and now we're ready for round 2! Here's what's changed.

Concurrent, Parallel GC is here! (#129) This is the biggest change, and has been in the works for over a year. Awesome work Jatin @typerSniper! Many of the details are published in our recent POPL'21 paper: Provably Space-Efficient Parallel Functional Programming.

Concurrent: heaps are collected while the mutator(s) continue running.
Parallel: multiple heaps are collected simultaneously.
Based on a provably efficient algorithm, described in the paper.
Consists of two cooperating forms of collection:
- Non-moving mark-sweep collections of shallow internal heaps.
- Compactifying (copying) collections of deep processor-local heaps.
Utilizes a novel snapshot-at-fork technique which efficiently computes root-sets.
- Closely integrated with the scheduler.
- No interference with parallelism! The GC essentially requires no additional synchronization. (Never stop-the-world, even to determine roots.)

General performance improvements

(#119) Scheduler-allocated data is now subject to GC, including thread objects (and their stacks) allocated to execute stolen work, as well as any data allocated during idle loops. This is accomplished by generalizing the heap hierarchy, as discussed in 02c9b3e.
(#120) Stacks now resize in-place, by giving each stack its own chunk. For stack-stress benchmarks, the performance benefits can be significant. A secondary benefit is that stacks no longer "move around" due to GC, so it's easier to reason about stack placement in the hierarchy. (Previously, when the stack was resized, it would be copied to the current depth of the executing thread! Gross.)
(#124) Local collections now allocate chunks lazily, which avoids some fragmentation and simplifies the algorithm. (Previously, local collection relied on having a valid frontier ready for every depth in the to-space. But on each forwarding copy, we had to check that there was enough space for the object anyway, which sometimes would leave behind an empty chunk! It's much simpler and more efficient to assume that you don't have a valid frontier, and only allocate chunks when you need them.)

Upstream MLton merge (thanks @MatthewFluet!)

(#127) Merges new MLton features, including the static/dynamic heaps distinction.

New features

(#113) Primitives for fast parallel file I/O based on mmap/munmap. The existing basis library I/O functions were essentially entirely sequential. By mmaping a file, we can then load it into a heap array in parallel.
(#118) Primitives for GC statistics.

New examples

General maintenance

Bug fixes

(#110, #111) Fixes an error in SimplifyTypes SSA optimization pass.
(19d0c63) Prevent scheduler crashes by sequentializing instead of exceeding the deque capacity.

Provide feedback