Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update TLA+ conformance monitor chart to support windowed model checking #35

Merged
merged 1 commit into from
Mar 6, 2020
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
85 changes: 65 additions & 20 deletions tlaplus-monitor/README.md
Original file line number Diff line number Diff line change
@@ -1,11 +1,25 @@
# TLA+ Conformance Monitor

Provides a Helm chart that enables near-real time conformance monitoring with [TLA+].
Provides a Helm chart for near-real time [TLA+] conformance monitoring using TLC.

## How It Works

The conformance monitor uses the TLA+ specification language and the TLC model checker to evaluate
system trace streams to detect violations of system safety properties. This chart can configure and
run arbitrary TLA+ conformance monitoring specs and their dependencies.

Conformance monitoring specifications evaluate infinite streams of system traces to detect
safety violations in near-real time. To limit resource usage over infinite streams, the
conformance monitor batches streams into time-based overlapping windows. For each window,
the conformance monitor creates a separate TLC process to evaluate traces within the window.
Once all traces within a window have been evaluated, the conformance monitor starts a new process
for the next window.

The conformance monitor supports configurable _sources_ and _sinks_, allowing monitoring specs to
consume traces from different systems and publish alerts to different systems and in different
formats. Sources (e.g. Kafka topics) can also be partitioned. The conformance monitor will scale
model checking across multiple processes when multiple source partitions are available.

### Prerequisites

The conformance monitor currently requires Kafka to stream traces to TLC.
Expand All @@ -23,6 +37,8 @@ to consume traces and produce alerts:
replicas: 1
zookeeper:
replicaCount: 1
configurationOverrides:
"log.message.timestamp.type": "LogAppendTime"
topics:
- name: traces
partitions: 3
Expand All @@ -32,7 +48,11 @@ topics:
replicationFactor: 1
```
Then, install the `incubator/kafka` chart with the desired configuration overrides:
*Traces must be written to Kafka with a timestamp, so either the trace logger must write messages with
a client-provided timestamp, or Kafka must be configured to attach a timestamp to messages in the log
with `configurationOverrides."log.message.timestamp.type"="LogAppendTime"`.*

Install the `incubator/kafka` chart with the desired configuration overrides:

```bash
$ helm install kafka incubator/kafka --values values.yaml
Expand All @@ -48,13 +68,15 @@ monitoring on Kafka streams. The monitor uses [TLA+] specifications to evaluate
and invariants specified in the chart configuration can detect safety violations in the trace stream.

Several artifacts are required to by the chart:

* `model` - the name of the module to evaluate
* `modules` - an array of TLA+ module files to mount to the monitor pod
* `spec` - the specification to evaluate
* `init` - the state initialization predicate (required if `spec` is not configured)
* `next` - the next state relation (required if `spec` is not configured)

Additional options can be used to specify invariants and other constraints on the model checker:

* `invariants` - an array of invariants to check for each trace
* `constants` - a mapping of constant values to assign to the model
* `constraints` - an array of state constraints
Expand All @@ -64,6 +86,19 @@ Additional options can be used to specify invariants and other constraints on th
$ helm install my-monitor --set modules={Cache.tla,CacheHistory.tla} --set model=Cache.tla --set config.spec=Spec --set config.invariants={TypeOK}
```

The conformance monitor evaluates an infinite stream of system traces by checking traces in batches
using a sliding window over the stream. The window is a time range which selects and evaluates
traces within a time window. By default, the time window is `5m` (five minutes).

To override the default time window, set the `window` configuration:

```
helm install my-monitor --set modules={Cache.tla,CacheHistory.tla} --set model=Cache.tla --set window=1h
```

Note that the monitor may consume disk space and memory proportional to the window duration.
Limit the monitoring window to reduce resource usage overhead.

## Specifications

The role of the conformance monitor is to track the system state over time and detect violations
Expand Down Expand Up @@ -97,31 +132,36 @@ Spec == Init /\ [][Next]_<<vars>>

Typically, the TLA+ model checker, TLC, computes and evaluates every state that can be reached
by the spec according to its initial state and next state relation. Conformance monitoring specs
operate on an infinite stream of traces, using the `NextTrace` operator to consume and check
traces in near-real time:
operate on an infinite stream of traces that are consumed by calling the `Trace` operator with
a sequential offset:

```
INSTANCE Traces
\* The current trace offset
VARIABLE offset
\* The previous trace version
VARIABLE prevVersion
\* The current trace version
VARIABLE nextVersion
\* A list of variables in the spec
vars == <<prevVersion, nextVersion>>
vars == <<offset, prevVersion, nextVersion>>
\* Read a trace record from the stream and update the previous and next versions
Read ==
LET record == NextTrace
IN
/\ PrintT(record)
/\ prevVersion' = nextVersion
/\ nextVersion' = record.version
/\ offset' = offset + 1
/\ LET trace == Trace(offset')
IN
/\ PrintT(trace)
/\ prevVersion' = nextVersion
/\ nextVersion' = trace.version
\* The init predicate describing the initial state of the system
Init ==
/\ offset = 0
/\ prevVersion = 0
/\ nextVersion = 0
Expand Down Expand Up @@ -150,9 +190,9 @@ INSTANCE Alerts
\* An invariant verifying that the trace version is monotonically increasing
TypeOK ==
\/ nextVersion # 0 => nextVersion < prevVersion
\/ PublishAlert([msg |-> "Invariant violated",
prevVersion |-> prevVersion,
nextVersion |-> nextVersion])
\/ Alert([msg |-> "Invariant violated",
prevVersion |-> prevVersion,
nextVersion |-> nextVersion])
```
With the initial state predicate, the next state relation, and the type invariants,
Expand All @@ -167,14 +207,17 @@ INSTANCE Traces

INSTANCE Alerts

\* The current trace offset
VARIABLE offset

\* The previous trace version
VARIABLE prevVersion

\* The current trace version
VARIABLE nextVersion

\* A list of variables in the spec
vars == <<prevVersion, nextVersion>>
vars == <<offset, prevVersion, nextVersion>>

\* An invariant verifying that the trace version is monotonically increasing
TypeOK ==
Expand All @@ -183,16 +226,18 @@ TypeOK ==
prevVersion |-> prevVersion,
nextVersion |-> nextVersion])

\* Read a trace record from the traces stream and update the previous and next versions
\* Read a trace record from the stream and update the previous and next versions
Read ==
LET record == NextTrace
IN
/\ PrintT(record)
/\ prevVersion' = nextVersion
/\ nextVersion' = record.version
/\ offset' = offset + 1
/\ LET trace == Trace(offset')
IN
/\ PrintT(trace)
/\ prevVersion' = nextVersion
/\ nextVersion' = trace.version

\* The init predicate describing the initial state of the system
Init ==
/\ offset = 0
/\ prevVersion = 0
/\ nextVersion = 0

Expand Down
6 changes: 3 additions & 3 deletions tlaplus-monitor/templates/configmap.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -11,13 +11,13 @@ metadata:
heritage: "{{ .Release.Service }}"
data:
{{ base .Values.model | trimSuffix ".tla" }}.cfg: |
{{ if .Values.config.spec }}
{{- if .Values.config.spec }}
SPECIFICATION {{ .Values.config.spec }}
{{ end }}
{{ if .Values.config.init }}
{{- if .Values.config.init }}
INIT {{ .Values.config.init }}
{{ end }}
{{ if .Values.config.next }}
{{- if .Values.config.next }}
NEXT {{ .Values.config.next }}
{{ end }}
{{- range .Values.config.invariants }}
Expand Down
10 changes: 6 additions & 4 deletions tlaplus-monitor/templates/deployment.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -30,13 +30,15 @@ spec:
imagePullPolicy: IfNotPresent
args:
- /opt/tlaplus/model/{{ .Values.model }}
- -metadir
- /opt/tlaplus/data
- -monitor
- -trace
- -source
- "kafka://{{ .Values.kafka.service }}:{{ .Values.kafka.port }}/{{ .Values.kafka.topics.traces }}"
- -alert
- -sink
- "kafka://{{ .Values.kafka.service }}:{{ .Values.kafka.port }}/{{ .Values.kafka.topics.alerts }}"
- -window
- {{ .Values.window }}
- -metadir
- /opt/tlaplus/data
volumeMounts:
- name: models
mountPath: /opt/tlaplus/model
Expand Down
3 changes: 3 additions & 0 deletions tlaplus-monitor/values.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,9 @@ model: ""
# 'modules' is a list of module files to include. This should contain at least the 'model' file.
modules: []

# 'window' is the size of the sliding window within which to process traces
window: "5m"

# 'config' is the TLC configuration
config:
# 'spec' is the specification expression
Expand Down