feat: data hypercores, schemas, validation, indexing #16

Merged
merged 9 commits into from
Oct 3, 2022
2 changes: 1 addition & 1 deletion .github/workflows/node.yml
@@ -17,8 +17,8 @@ jobs:
matrix:
os: [macos-latest, ubuntu-latest, windows-latest]
node-version:
- '14.x'
- '16.x'
- '18.x'
steps:
- uses: actions/checkout@v2
- name: Use Node.js ${{ matrix.node-version }}
55 changes: 55 additions & 0 deletions example.js
@@ -0,0 +1,55 @@
import Corestore from 'corestore'
import ram from 'random-access-memory'
import Sqlite from 'better-sqlite3'
import { Mapeo, DataType } from './index.js'
import { Observation, validateObservation } from 'mapeo-schema'

const corestore = new Corestore(ram)

// writer has to be ready for multi-core-indexer to know it exists
const writer = corestore.get({ name: 'writer' })
await writer.ready()

// TODO: actual schema from mapeo-schema
const observation = new DataType({
name: 'observation',
blockPrefix: '6f62', // could make this automatically set based on the name, but that might be too magic
validate: validateObservation,
schema: Observation,
})

const sqlite = new Sqlite(':memory:')

const mapeo = new Mapeo({
corestore,
sqlite,
dataTypes: [observation],
})

await mapeo.ready()

const doc = await mapeo.observation.create({
id: '79be849f934590ec',
version: '4d822ba6f2e502a5a944f50476217fe90ed5927fe92e71e7d94b0849a65929f3',
created_at: '2018-12-28T21:25:01.689Z',
timestamp: '2019-01-13T19:27:39.983Z',
type: 'observation',
schemaVersion: 4,
tags: {
notes: 'example note',
},
})

const newDocVersion = Object.assign({}, doc, {
tags: {
notes: 'updated note',
},
links: [doc.version],
})

console.log('doc', doc)
await mapeo.observation.update(newDocVersion)

const gotDoc = mapeo.observation.query()

console.log(gotDoc)
91 changes: 91 additions & 0 deletions index.js
@@ -0,0 +1,91 @@
import MultiCoreIndexer from 'multi-core-indexer'
import b4a from 'b4a'
import ram from 'random-access-memory'

import { DataStore } from './lib/datastore/index.js'
import { Indexer } from './lib/indexer/index.js'
export { DataType } from './lib/datatype/index.js'

export class Mapeo {
#indexers = new Map()
#multiCoreIndexer
#corestore
#dataTypes

constructor(options) {
const { corestore, dataTypes, sqlite } = options
this.#corestore = corestore
this.#dataTypes = dataTypes

for (const dataType of dataTypes) {
const extraColumns = Object.keys(dataType.schema.properties)
.filter((key) => {
return (
['id', 'version', 'links', 'forks', 'properties'].includes(key) ===
false
)
})
.map((key) => {
// TODO: better support for various types
if (
['string', 'array', 'object'].includes(
dataType.schema.properties[key].type
)
) {
return `${key} TEXT`
} else if (dataType.schema.properties[key].type === 'number') {
return `${key} REAL`
} else if (dataType.schema.properties[key].type === 'integer') {
return `${key} INTEGER`
}
})
.join(', ')

const indexer = new Indexer({
dataType,
sqlite,
extraColumns,
})

this.#indexers.set(dataType.name, indexer)
this[dataType.name] = new DataStore({ dataType, corestore, indexer })
}

this.#multiCoreIndexer = new MultiCoreIndexer(this.cores, {
storage: (key) => {
return new ram(key)
},
batch: (entries) => {
for (const entry of entries) {
const { block } = entry
const dataType = this.getDataType(block)
if (!dataType) continue
const doc = dataType.decode(block)
const indexer = this.#indexers.get(dataType.name)
indexer.batch([doc])
}
},
})
}

async ready() {
for (const dataType of this.#dataTypes) {
await this[dataType.name].ready()
}
}

get coreKeys() {
return [...this.#corestore.cores.keys()]
}

get cores() {
return [...this.#corestore.cores.values()]
}

getDataType(block) {
const typeHex = b4a.toString(block, 'utf-8', 0, 4)
return this.#dataTypes.find((dataType) => {
return dataType.blockPrefix === typeHex
})
}
}
23 changes: 23 additions & 0 deletions lib/datastore/README.md
@@ -0,0 +1,23 @@
# DataStore

> Create, read, update, delete, and query data.

## Purpose

The `DataStore` class composes our [`Indexer` class](../indexer/) with the [`Corestore` instance](https://npmjs.com/corestore) used to store the local writer [hypercore](https://npmjs.com/hypercore) and all the relevant hypercores of peers in a project.

## Usage

The `DataStore` class is used internally by the main [`Mapeo` class](../../index.js).

It isn't currently usable on its own: it requires an instance of the `Indexer` class, which in turn assumes it is used alongside [multi-core-indexer](https://npmjs.com/multi-core-indexer) as part of the `Mapeo` class.

The API of this module is primarily a convenient wrapper around the [`DataType`](../datatype/) and `Indexer` classes.
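
As one illustration of what the wrapper adds: `create` and `update` stamp each doc with a `version` string made of the writer hypercore's key (as hex) and the core's length at write time, joined by `@`. The sketch below reproduces that format outside the class. `makeVersion` and `parseVersion` are hypothetical helper names that are not part of this module, and the key is a made-up placeholder.

```js
// Hypothetical helpers illustrating the `<hex key>@<length>` version format
// used by DataStore#create and DataStore#update. Not part of the module.
function makeVersion(writerKey, writerLength) {
  return `${writerKey.toString('hex')}@${writerLength}`
}

function parseVersion(version) {
  const separator = version.lastIndexOf('@')
  if (separator === -1) throw new Error(`Invalid version: ${version}`)
  return {
    coreKey: version.slice(0, separator),
    index: Number(version.slice(separator + 1)),
  }
}

// Placeholder 32-byte key; a real key would come from `writer.key`
const exampleKey = Buffer.alloc(32, 0xab)
const exampleVersion = makeVersion(exampleKey, 3)

console.log(exampleVersion)
console.log(parseVersion(exampleVersion))
```

Because the length is read before the append, the number after `@` should be the index the new block occupies in the writer core, assuming no other append happens in between.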

## API docs

TODO!

## Tests

Tests for this module are in [tests/datastore.js](../../tests/datastore.js)
135 changes: 135 additions & 0 deletions lib/datastore/index.js
@@ -0,0 +1,135 @@
import { randomBytes } from 'crypto'

/**
* The DataStore class provides methods for managing a single type of data.
*/
export class DataStore {
#dataType
#corestore
#indexer
#writer

/**
* @param {Object} options
* @param {import('../datatype/index.js').DataType} options.dataType an instance of the [DataType](../datatype/) class
* @param {Corestore} options.corestore an instance of the [Corestore](https://npmjs.com/corestore) class
* @param {import('../indexer/index.js').Indexer} options.indexer an instance of the [Indexer](../indexer/) class
*/
constructor({ dataType, corestore, indexer }) {
this.#dataType = dataType
this.#corestore = corestore
this.#indexer = indexer
this.#writer = this.#corestore.get({ name: 'writer' })
}

/**
* Wait for the corestore and writer hypercore to be ready
* @returns {Promise<void>}
*/
async ready() {
await this.#writer.ready()
await this.#corestore.ready()
}

/**
* Validate a doc
* @param {Doc} doc
* @returns {Boolean}
* @throws {Error}
*/
validate(doc) {
return this.#dataType.validate(doc)
}

/**
* Encode a doc (an object), to a block (a Buffer)
* @param {Doc} doc
* @returns {Block}
*/
encode(doc) {
return this.#dataType.encode(doc)
}

/**
* Decode a block (a Buffer), to a doc (an object)
* @param {Block} block
* @returns {Doc}
*/
decode(block) {
return this.#dataType.decode(block)
}

/**
* Get a doc by id
* @param {string} id
* @returns {Doc}
*/
getById(id) {
return this.#indexer.get(
`SELECT * from ${this.#indexer.name} where id = :id`,
{ id }
)
}

/**
* Create a doc
* @param {Doc} data
* @returns {Promise<Doc>}
*/
async create(data) {
// Copy the input so the caller's object is not mutated (matches `update`)
const doc = Object.assign({}, data, {
id: data.id || randomBytes(8).toString('hex'),
version: `${this.#writer.key.toString('hex')}@${this.#writer.length}`,
created_at: new Date().toISOString(),
})

if (!doc.links) {
doc.links = []
}

this.validate(doc)
const encodedDoc = this.encode(doc)

// Register the listener before appending so the index write cannot be missed
const indexing = new Promise((resolve) => {
this.#indexer.onceWriteDoc(doc.version, (doc) => {
resolve(doc)
})
})

await this.#writer.append(encodedDoc)
await indexing
return doc
}

/**
* Update a doc
* @param {Doc} data
* @returns {Promise<Doc>}
*/
async update(data) {
const doc = Object.assign({}, data, {
version: `${this.#writer.key.toString('hex')}@${this.#writer.length}`,
})

const indexing = new Promise((resolve) => {
this.#indexer.onceWriteDoc(doc.version, (doc) => {
resolve(doc)
})
})

this.validate(doc)
const encodedDoc = this.encode(doc)
await this.#writer.append(encodedDoc)
await indexing
return doc
}

/**
* Query indexed docs
* @param {string} where sql where clause
* @returns {Doc[]}
*/
query(where) {
return this.#indexer.query(where)
}
}
68 changes: 68 additions & 0 deletions lib/datatype/README.md
@@ -0,0 +1,68 @@
# DataType

> Define schemas and encode/decode functions for data models.

## Purpose

We created the `DataType` class to establish a clear method for creating and using multiple data types in Mapeo. We provide an array of DataTypes to the Mapeo class to determine which [`DataStore` instances](../datastore/) are made available.

For example, to be able to manage GeoJSON Point data:

```js
const points = new DataType({
// ... provide relevant options
})

const mapeo = new Mapeo({
dataTypes: [points],
// ... additional required and optional options
})

// there is now a `points` property on `mapeo` with methods for managing data, including:
await mapeo.points.create()
await mapeo.points.update()
await mapeo.points.getById()
await mapeo.points.query()
```
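
The `points` property appears because the `Mapeo` constructor in index.js assigns `this[dataType.name] = new DataStore(...)` for each data type it receives. A stripped-down sketch of that pattern follows; `TinyStore` and `TinyMapeo` are hypothetical stand-ins, and the real classes take more options.

```js
// Stand-ins for illustration only; not the real DataStore or Mapeo classes
class TinyStore {
  constructor(dataType) {
    this.dataType = dataType
  }
}

class TinyMapeo {
  constructor({ dataTypes }) {
    for (const dataType of dataTypes) {
      // Same pattern as index.js: the data type's name becomes the property name
      this[dataType.name] = new TinyStore(dataType)
    }
  }
}

const tiny = new TinyMapeo({
  dataTypes: [{ name: 'points' }, { name: 'observation' }],
})

console.log(Object.keys(tiny)) // one property per data type name
```

One consequence of this pattern is that data type names share a namespace with the class's own members, so a name such as `ready` or `cores` would likely collide with the `ready` method or the `cores` getter.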

## Usage

While this is primarily an internal class used by the main [`Mapeo` class](../../index.js), the `DataType` class can be used on its own.

Here's an example creating a GeoJSON Point `DataType`:

```js
import { DataType } from '@mapeo/core/lib/datatype/index.js'

const point = new DataType({
name: 'point',
blockPrefix: 'abcd', // magic string that is prefixed onto each block of this DataType for easy identification
schema: {
title: 'GeoJSON Point',
type: 'object',
required: ['type', 'coordinates'],
properties: {
type: {
type: 'string',
enum: ['Point'],
},
coordinates: PointCoordinates,
bbox: BoundingBox,
},
},
encode: (obj) => {
return JSON.stringify(obj)
},
decode: (str) => {
return JSON.parse(str)
},
})
```
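
To show what the `blockPrefix` is for: `getDataType` in index.js reads the first four bytes of a block as UTF-8 and looks for a data type whose `blockPrefix` matches. The sketch below assumes the prefix is simply prepended to the JSON payload. That layout, and the helper names `encodeBlock`, `decodeBlock`, and `findDataType`, are assumptions for illustration rather than the actual implementation in this module.

```js
// Assumed layout: 4 UTF-8 characters of prefix, followed by the JSON payload
const PREFIX_LENGTH = 4

// Hypothetical helper: prepend the data type's prefix to the serialized doc
function encodeBlock(dataType, doc) {
  return Buffer.from(dataType.blockPrefix + JSON.stringify(doc), 'utf-8')
}

// Mirrors the lookup in Mapeo#getDataType: match on the first four bytes
function findDataType(dataTypes, block) {
  const prefix = block.toString('utf-8', 0, PREFIX_LENGTH)
  return dataTypes.find((dataType) => dataType.blockPrefix === prefix)
}

// Hypothetical helper: skip the prefix and parse the remaining JSON
function decodeBlock(block) {
  return JSON.parse(block.toString('utf-8', PREFIX_LENGTH))
}

const knownTypes = [
  { name: 'point', blockPrefix: 'abcd' },
  { name: 'observation', blockPrefix: '6f62' },
]

const block = encodeBlock(knownTypes[0], { type: 'Point', coordinates: [1, 2] })
const matched = findDataType(knownTypes, block)

console.log(matched.name, decodeBlock(block))
```

A block whose prefix matches no registered data type yields `undefined` from `findDataType`, which mirrors how the `batch` callback in index.js skips blocks it cannot attribute to a data type.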

## API docs

TODO!

## Tests

Tests for this module are in [tests/datatype.js](../../tests/datatype.js)