
Storage

Paula Gearon edited this page Mar 8, 2017 · 5 revisions

Overview

Naga abstracts storage behind a Clojure protocol.

Storage is expected to be a graph store with a basic set of primitive operations. If the storage requires declaration of properties before use, then this should be done before the rules engine is run against the store.

Registering a Storage Module

Storage modules should register themselves with a public keyword that will be used to find the storage. This is done with the register-storage! function.

Namespace: naga.store
Function:
  (register-storage! registry-key construction-fn)

Registers a storage type with Naga. This should be called by the storage module when it is loaded.

registry-key: A keyword that will be used to address the module. The :memory module is always registered.

construction-fn: A function that takes a single parameter, a map of storage-specific parameters, and returns an implementation of the naga.store.Storage protocol.

e.g.

(naga.store/register-storage! :datomic create-datomic-store)
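To illustrate the idea, here is a minimal sketch of a registry, assuming it is a map held in an atom. The names registry, register-storage-sketch! and get-storage-sketch are hypothetical and do not describe Naga's actual implementation.

```clojure
;; Hypothetical registry: keyword -> construction function.
(def registry (atom {}))

(defn register-storage-sketch!
  "Associates registry-key with construction-fn. Illustrative only."
  [registry-key construction-fn]
  (swap! registry assoc registry-key construction-fn))

(defn get-storage-sketch
  "Looks up the construction function and calls it with the config map."
  [registry-key config]
  (if-let [construct (get @registry registry-key)]
    (construct config)
    (throw (ex-info "Unknown storage type" {:type registry-key}))))

;; A stand-in construction function. A real module would return an
;; implementation of the naga.store.Storage protocol instead of a map.
(register-storage-sketch! :example
                          (fn [config] {:kind :example :config config}))
```

Looking up a key that was never registered throws, which makes a missing storage module visible early.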

Storage Protocol

The storage protocol contains a set of basic operations required for all storage types. Many of these are expected to be simple wrappers or stubs for most graph stores. Implementations are not expected to be stateful. Functions that change the state of storage all return a storage object, which the implementor may choose to be the same object as the original, with modified state.

Namespace: naga.store
Functions:
  (start-tx store)
  (commit-tx store)
  (new-node store)
  (node-type? store property n)
  (data-property store data)
  (container-property store data)
  (resolve-pattern store pattern)
  (count-pattern store pattern)
  (query store output-patterns patterns)
  (assert-data store data)
  (query-insert store assertion-patterns patterns)

(start-tx store)

This starts a transaction for the store, when transactions are supported. If transactions are unsupported, just returns the original store.

(commit-tx store)

Commits an outstanding transaction on the store, when transactions are supported and one is pending. If transactions are unsupported, this just returns the original store. If no transaction is pending, this may also return the store as a no-op. An implementation may choose to throw an exception when a commit fails.
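For a store without transaction support, both functions can be stubs. The sketch below uses a hypothetical two-function protocol, TxSketch, that stands in for the transaction part of naga.store.Storage. NoTxStore is likewise an invented name.

```clojure
;; Hypothetical protocol covering only the two transaction functions.
(defprotocol TxSketch
  (start-tx [store])
  (commit-tx [store]))

;; A store with no transaction support: both operations return the
;; store unchanged.
(defrecord NoTxStore [triples]
  TxSketch
  (start-tx [store] store)
  (commit-tx [store] store))

(def store (->NoTxStore #{}))

;; Returns the very same store object.
(commit-tx (start-tx store))
```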

(new-node store)

Creates a new graph node for representing an entity. For some systems, this may be a simple unique identifier, but for others this may require a function call (such as the [datomic.api/tempid](http://docs.datomic.com/clojure/index.html#datomic.api/tempid) function in Datomic).

(node-type? store property n)

Tests if n may be a graph node. Some systems may use the same data type in different contexts, so the property (or edge) that refers to n is also provided. Returns true when the property refers to an n that is a graph node.
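As a sketch of new-node and node-type? for a simple store, assume nodes are keywords in a "node" namespace. This convention and the function names are hypothetical, chosen only for illustration.

```clojure
;; Assumed convention: nodes are keywords in the "node" namespace.
(defn new-node-sketch
  "Returns a fresh node identifier. gensym makes it unique within
  the running process."
  []
  (keyword "node" (name (gensym "n"))))

(defn node-type-sketch?
  "True when n looks like a node. The property argument is ignored
  because this convention makes nodes recognizable on their own. A
  typed store would inspect the property instead."
  [_store _property n]
  (and (keyword? n) (= "node" (namespace n))))

(node-type-sketch? nil :owner (new-node-sketch))
```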

(data-property store data)

Returns a property that can refer to the value for data. This must be a keyword in the naga namespace whose name starts with first, for instance :naga/first.long for a long value. Storage that has untyped properties (like :memory) may just return :naga/first.

(container-property store data)

Returns a property that can refer to the value for data, and indicates membership in a container. This must be a keyword in the naga namespace, and should differ from the value returned by data-property, for instance :naga/contains.long for a long value. Storage that has untyped properties (like :memory) may return a hardcoded value, such as :naga/contains.
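A typed store might derive both properties from the type of the data. The sketch below assumes a small suffix table. The suffixes other than long, the fallback to the untyped property, and the function names are all assumptions made for illustration.

```clojure
;; Hypothetical suffix table. A real store would use its own type names.
(defn type-suffix [data]
  (cond
    (string? data)  "string"
    (integer? data) "long"
    (float? data)   "double"
    (boolean? data) "boolean"
    :else           nil))

(defn data-property-sketch
  "Property for a single value. Falls back to the untyped property
  when the type is not in the table (an assumption of this sketch)."
  [_store data]
  (if-let [suffix (type-suffix data)]
    (keyword "naga" (str "first." suffix))
    :naga/first))

(defn container-property-sketch
  "Property for a value that is a member of a container."
  [_store data]
  (if-let [suffix (type-suffix data)]
    (keyword "naga" (str "contains." suffix))
    :naga/contains))

(data-property-sketch nil 5)
(container-property-sketch nil 5)
```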

(resolve-pattern store pattern)

Takes a single query pattern, and returns a set of bindings for it. These are the same patterns that appear in rule bodies, and have the form:
[entity attribute value]
Bindings are a sequence of vectors, where each vector contains the requested columns.

As an example, if a database contained the following data:

[:a :p :b]
[:a :p :c]
[:m :q :x]
[:m :q :y]

Then resolving the pattern: [?u :p ?v]
will match every element that contains the :p property. The result would be this sequence:
[[:a :b] [:a :c]]
Note that the results only contain the variable values (since the property was specified and known).
The variables bound in the results (?u and ?v) are stored as metadata on the results:

=> (def results (resolve-pattern store '[?u :p ?v]))
#'results
=> results
[[:a :b] [:a :c]]
=> (meta results)
{:cols [?u ?v]}
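The behavior above can be sketched over a plain collection of triples. This is an illustration, not the :memory implementation. It takes the triples directly instead of a store, and it does not handle a variable that is repeated within one pattern. The names variable? and resolve-pattern-sketch are hypothetical.

```clojure
(defn variable?
  "Variables are symbols whose name starts with ?."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn resolve-pattern-sketch
  "Matches one [entity attribute value] pattern against triples.
  Returns a vector of rows holding only the variable positions, with
  the variable names attached as :cols metadata."
  [triples pattern]
  (let [var-idx  (keep-indexed (fn [i x] (when (variable? x) i)) pattern)
        matches? (fn [triple]
                   (every? identity
                           (map (fn [p t] (or (variable? p) (= p t)))
                                pattern triple)))]
    (with-meta
      (vec (for [t triples :when (matches? t)]
             (mapv #(nth t %) var-idx)))
      {:cols (mapv #(nth pattern %) var-idx)})))

(def data [[:a :p :b] [:a :p :c] [:m :q :x] [:m :q :y]])

(resolve-pattern-sketch data '[?u :p ?v])
```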

(count-pattern store pattern)

Similar to resolve-pattern, but this returns the count of the results. This can be implemented with a simple wrapper:

;; inside storage definition
(count-pattern [store pattern]
  (count (resolve-pattern store pattern)))

Most databases have an operation to perform this counting directly, which will be more efficient than this approach.

(query store output-patterns patterns)

This performs a full query against the storage. The API is loosely based on Datomic queries.

Queries are based on patterns with the results being projected to only use the variables in the output-patterns.

patterns is a seq in which each element is either a 3-element pattern or a list containing a filtering operation.

list

An operation that returns a truthy value. This filters results to only include those where the value is truthy. For instance, '(> ?x 3) will return true when ?x is greater than 3, and only those bindings will end up in the result.

pattern

A seq with values and variables in it. Variables are symbols that start with the ? character.

  • The first element must be a valid "node" value or a variable.
  • The second element must be a keyword property or a variable.
  • The third element may be any kind of value supported by the store, or a variable.
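The filtering lists described above can be sketched by substituting bound values into the expression and evaluating it. This assumes bindings are maps from variable to value and uses eval for brevity. A real store would more likely translate the filter into its own query language. The name filter-bindings-sketch is hypothetical.

```clojure
(require '[clojure.walk :as walk])

(defn filter-bindings-sketch
  "Keeps the rows (maps of variable -> value) for which filter-expr is
  truthy once the variables have been replaced by their values."
  [filter-expr rows]
  (filterv (fn [row]
             (eval (walk/postwalk-replace row filter-expr)))
           rows))

(filter-bindings-sketch '(> ?x 3) '[{?x 1} {?x 4} {?x 7}])
```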

The output is determined from the inner join of all the pattern resolutions, filtered by the lists. The resulting bindings are then "projected" through the output-patterns. This is a seq of patterns containing variables, and directs the format of the results. The output will be in the same structure as the output-patterns with each variable being replaced by the associated bound value.

If a query gives a raw set of bindings of:
columns: ?a ?b ?c ?d ?e

[:t :u :v :v 3]
[:t :x :y :z 3]
[:m :u :v :n 4]

Then for output-patterns of [[?a ?b] [?c ?e]], the result would be:

[[:t :u] [:v 3]]
[[:t :x] [:y 3]]
[[:m :u] [:v 4]]
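The projection step can be sketched as follows, assuming the raw bindings arrive as rows plus a list of column names. The name project-sketch and its argument order are hypothetical.

```clojure
(defn project-sketch
  "Replaces each variable in output-patterns with its value from every
  row. cols names the columns of the rows, in order. Elements that are
  not bound variables are passed through unchanged."
  [output-patterns cols rows]
  (for [row rows
        :let [bound (zipmap cols row)]]
    (mapv (fn [pattern] (mapv #(get bound % %) pattern))
          output-patterns)))

(project-sketch '[[?a ?b] [?c ?e]]
                '[?a ?b ?c ?d ?e]
                [[:t :u :v :v 3]
                 [:t :x :y :z 3]
                 [:m :u :v :n 4]])
```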

(assert-data store data)

Insert data into the store. The data parameter is a seq of 3 element seqs in [entity attribute value] form. They may not contain variables.

Entities must be a valid node type for the store. Attributes are properties that the store will accept (we expect this to be in keyword form). Values are any supported datatype in the store, that is compatible with the given attribute.

Data that duplicates statements in the store will be silently ignored.
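If the store is modeled as a set of triples, ignoring duplicates comes for free. The sketch below makes that assumption. The name assert-data-sketch is hypothetical, and it does not check node or attribute validity.

```clojure
(defn assert-data-sketch
  "Adds 3-element statements to a store modeled as a set of vectors.
  Set semantics make a duplicate statement a no-op."
  [store data]
  {:pre [(every? #(= 3 (count %)) data)]}
  (into store (map vec) data))

(assert-data-sketch #{[:a :p :b]}
                    [[:a :p :b] [:a :p :c]])
```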

(query-insert store assertion-patterns patterns)

Works like the query function, except that the resulting data is inserted into the store, as with the assert-data function. For this reason, the assertion-patterns must all be 3 elements wide. These elements may contain fixed values, rather than only variables.

If any variables are left unbound in the assertion-patterns parameter, this will indicate a new node that must be created for each set of bindings (or "row" in a result). Using the same unbound variable in multiple places in the assertion-patterns parameter will reuse that node during the same binding. This allows new elements to be created for each binding, and multiple attributes to be attached to them.

Using the sample results in the description of the query function above, then these assertion patterns:
[[?new :first ?b] [?new :second ?e]]
Would assert the following data back in the database:

[:new_node1 :first :u]
[:new_node1 :second 3]
[:new_node2 :first :x]
[:new_node2 :second 3]
[:new_node3 :first :u]
[:new_node3 :second 4]

Note that each result row produces 2 assertions, one per assertion pattern, so the 3 result rows were converted into 6 assertions in the storage.
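The node-creation rule can be sketched as below: any variable in the assertion patterns that is not among the bound columns gets one fresh node per row, shared by every pattern that mentions it. The names instantiate-sketch, next-node and counter are hypothetical, and the function returns the statements instead of inserting them.

```clojure
(defn variable? [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn instantiate-sketch
  "Builds statements from assertion-patterns for each row of bindings.
  new-node-fn is a zero-argument node allocator, called once per
  unbound variable per row."
  [assertion-patterns cols rows new-node-fn]
  (let [pattern-vars (distinct (filter variable?
                                       (apply concat assertion-patterns)))
        unbound      (remove (set cols) pattern-vars)]
    (vec
     (for [row rows
           :let [fresh (into {} (map (fn [v] [v (new-node-fn)])) unbound)
                 bound (merge (zipmap cols row) fresh)]
           pattern assertion-patterns]
       (mapv #(get bound % %) pattern)))))

;; Hypothetical allocator producing :new_node1, :new_node2, ...
(def counter (atom 0))
(defn next-node [] (keyword (str "new_node" (swap! counter inc))))

(instantiate-sketch '[[?new :first ?b] [?new :second ?e]]
                    '[?a ?b ?c ?d ?e]
                    [[:t :u :v :v 3]
                     [:t :x :y :z 3]
                     [:m :u :v :n 4]]
                    next-node)
```

With three rows and one unbound variable, the allocator is called three times, so each row's pair of statements shares a single new node.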
