Frictionless Data Julia Libraries - Design Document

Oleg Lavrovsky ~ @loleg Last updated: November 20, 2017

Overview

Frictionless Data is a set of lightweight specifications, libraries and software improving the ways to get, share, and validate data.

This design document focuses on the functional specification and design of two code libraries written in Julia: "Table Schema" and "Data Package". The design follows the general design principles described at specs.frictionlessdata.io and in the V1 announcement (blog.okfn.org, hackmd.io).

Functional Specification

Each library needs to implement a set of core “actions” that are further described in the implementation documentation. For simplicity, these core actions are reproduced here, along with links to the corresponding unit test cases:

Table Schema

read and validate a Table Schema descriptor (schema.jl)
create/edit a Table Schema descriptor (schema.jl)
provide a model-type interface to interact with a descriptor (schema.jl)
infer a Table Schema descriptor from a supplied sample of data (infer.jl)
validate a data source against the Table Schema descriptor (validate.jl)
validate in response to editing the descriptor (changes.jl)
enable streaming and reading of a data source through a Table Schema (read.jl)
read a data source with values cast on iteration (schema.jl)
save a descriptor to disk (save.jl)
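The following sketch makes two of these actions concrete: descriptor validation and cast-on-iteration. It works on a descriptor that has already been parsed into a `Dict`, and it checks only part of the specification. Those checks are: a `fields` array is present, every field has a `name`, and every field has a known `type`, which defaults to "string". It casts only integers, numbers and booleans, using the spec's default true/false values. The names `validate_descriptor`, `cast_value` and `cast_row` are hypothetical and are not the TableSchema.jl API.

```julia
# Hypothetical sketch, not the TableSchema.jl API.
# Field types defined by the Table Schema v1 specification.
const TABLE_SCHEMA_TYPES = Set(["string", "number", "integer", "boolean",
    "object", "array", "date", "time", "datetime", "year", "yearmonth",
    "duration", "geopoint", "geojson", "any"])

# Return a list of error messages; an empty list means the descriptor
# passed the (partial) checks performed here.
function validate_descriptor(d::AbstractDict)
    errors = String[]
    fields = get(d, "fields", nothing)
    if !(fields isa AbstractVector)
        push!(errors, "descriptor must contain a \"fields\" array")
        return errors
    end
    for (i, f) in enumerate(fields)
        if !(f isa AbstractDict) || !haskey(f, "name")
            push!(errors, "field $i must be an object with a \"name\"")
            continue
        end
        t = get(f, "type", "string")  # the spec's default type is "string"
        if !(t in TABLE_SCHEMA_TYPES)
            push!(errors, "field $(f["name"]) has unknown type \"$t\"")
        end
    end
    return errors
end

# Cast one string cell; types other than integer/number/boolean stay strings.
function cast_value(t::AbstractString, s::AbstractString)
    if t == "integer"
        return parse(Int, s)
    elseif t == "number"
        return parse(Float64, s)
    elseif t == "boolean"
        s in ("true", "True", "TRUE", "1") && return true
        s in ("false", "False", "FALSE", "0") && return false
        throw(ArgumentError("cannot cast \"$s\" to boolean"))
    end
    return s
end

# Cast a whole row by pairing each cell with its field descriptor.
cast_row(fields, row) = [cast_value(get(f, "type", "string"), v) for (f, v) in zip(fields, row)]

descriptor = Dict("fields" => [Dict("name" => "id", "type" => "integer"),
                               Dict("name" => "city")])
rows = [["1", "London"], ["2", "Paris"]]
# The generator casts each row lazily, only when the loop reaches it.
for values in (cast_row(descriptor["fields"], r) for r in rows)
    println(values)
end
```

A generator keeps casting lazy, so rows streamed from a large file are cast only as they are consumed.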

Data Package

read an existing Data Package descriptor (read.jl)
validate an existing Data Package descriptor, including profile-specific validation via the registry of JSON Schemas
create a new Data Package descriptor
edit an existing Data Package descriptor
as part of editing a descriptor, helper methods to add and remove resources from the resources array
validate edits made to a data package descriptor
save a Data Package descriptor to a file path
zip a Data Package descriptor and its co-located references (more generically: "zip a data package")
read a zip file that "claims" to be a data package
save a zipped Data Package to disk
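The resource helpers and descriptor validation listed above could look roughly like the sketch below, which edits a descriptor held as a `Dict`. It checks only two rules from the Data Package v1 specification. First, `resources` must be a non-empty array. Second, each resource must locate its data through exactly one of `path` or `data`. Full profile-specific validation would instead check the descriptor against the registry's JSON Schemas. `add_resource!`, `remove_resource!` and `validate_package` are hypothetical names.

```julia
# Hypothetical helpers; not the API of any released Frictionless package.

# Append a resource, creating the "resources" array if it does not exist yet.
function add_resource!(pkg::AbstractDict, resource::AbstractDict)
    push!(get!(pkg, "resources", Any[]), resource)
    return pkg
end

# Remove every resource whose "name" equals `name`.
function remove_resource!(pkg::AbstractDict, name::AbstractString)
    filter!(r -> !(r isa AbstractDict && get(r, "name", nothing) == name),
            get(pkg, "resources", Any[]))
    return pkg
end

# Check two Data Package v1 rules; an empty result means both passed.
function validate_package(pkg::AbstractDict)
    errors = String[]
    resources = get(pkg, "resources", nothing)
    if !(resources isa AbstractVector) || isempty(resources)
        push!(errors, "\"resources\" must be a non-empty array")
        return errors
    end
    for (i, r) in enumerate(resources)
        # exactly one of "path" or "data" must locate the resource's data
        if !(r isa AbstractDict) || !xor(haskey(r, "path"), haskey(r, "data"))
            push!(errors, "resource $i must have exactly one of \"path\" or \"data\"")
        end
    end
    return errors
end

pkg = Dict{String,Any}("name" => "demo")
add_resource!(pkg, Dict{String,Any}("name" => "cities", "path" => "cities.csv"))
println(validate_package(pkg))  # an empty list means both checks passed
```

Returning a list of messages, rather than throwing on the first problem, lets the editing helpers re-validate after every change and report all issues at once.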

API Proposal

Package names should be short, match the base name of their source directory, and use CamelCase, following the conventions described in Julia's Manual on Packages.

We will have two central types within the project: Schema and Table. These allow constructions like Schema.infer(), which are desirable for readability.
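The sketch below shows what inferring field types from a sample of string cells might involve. Each column is tested in order:

1. It is an integer if every cell parses as `Int`.
2. Otherwise, it is a number if every cell parses as `Float64`.
3. Otherwise, it is a boolean if every cell is one of the spec's default true/false strings.
4. Otherwise, it is a string.

The sketch uses plain functions, the usual Julia style, rather than the `Schema.infer()` form. `infer_type` and `infer_descriptor` are hypothetical names. Real inference would also cover types such as dates and tolerate inconsistent rows.

```julia
# Hypothetical sketch of type inference; not the TableSchema.jl API.
# Strings the spec accepts by default as boolean true/false values.
const BOOLEAN_STRINGS = ("true", "True", "TRUE", "1", "false", "False", "FALSE", "0")

# Guess a type for one column; checks run in order, so a 0/1 column is "integer".
function infer_type(values)
    isempty(values) && return "string"
    all(v -> tryparse(Int, v) !== nothing, values) && return "integer"
    all(v -> tryparse(Float64, v) !== nothing, values) && return "number"
    all(v -> v in BOOLEAN_STRINGS, values) && return "boolean"
    return "string"
end

# Build a descriptor with one field per header column.
function infer_descriptor(header::Vector{String}, rows::Vector{Vector{String}})
    fields = [Dict("name" => name, "type" => infer_type([row[j] for row in rows]))
              for (j, name) in enumerate(header)]
    return Dict("fields" => fields)
end

header = ["id", "score", "active", "city"]
rows = [["1", "2.5", "true", "London"], ["2", "3", "FALSE", "Paris"]]
descriptor = infer_descriptor(header, rows)
```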

This first design proposal follows the basic usages described in tableschema-py, tableschema-js and tableschema-go.

The Schema() type constructor accepts a stream (file I/O), string (JSON) or dictionary (parsed object) representation of a table schema:

# each method returns a Schema; problems are reported through its errors field
Schema(dictionary::Dict) -> Schema
Schema(filename::String) -> Schema
Schema(stream::IO) -> Schema
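In Julia, these three constructor forms map naturally onto multiple dispatch. Each method converts its input one step closer to a dictionary and hands it on, so validation lives in a single place.

The sketch below uses a hypothetical `SketchSchema` type. Instead of returning an error value, it records problems in an `errors` field, as `schema.errors` does in the usage example. It treats the `String` argument as a file path, following the `filename` parameter name.

The Julia standard library has no JSON parser, so the file and stream methods take the parser as a required `parser` keyword. `JSON.parse` from the JSON.jl package would be one candidate.

```julia
# Hypothetical sketch of constructor dispatch; not the TableSchema.jl API.
struct SketchSchema
    descriptor::Dict{String,Any}
    errors::Vector{String}
end

# Core method: a parsed descriptor. The other methods end up here.
function SketchSchema(d::AbstractDict)
    errors = haskey(d, "fields") ? String[] : ["descriptor has no \"fields\""]
    return SketchSchema(Dict{String,Any}(d), errors)
end

# Stream method: read the stream as text, parse it, delegate to the Dict method.
SketchSchema(stream::IO; parser) = SketchSchema(parser(read(stream, String)))

# File method: open the file (closed automatically afterwards), delegate to the IO method.
SketchSchema(filename::AbstractString; parser) =
    open(io -> SketchSchema(io; parser = parser), filename)

s = SketchSchema(Dict("fields" => Any[]))
isempty(s.errors)  # true: the descriptor has a "fields" key
# With JSON.jl installed, the file method could be called as:
# using JSON; SketchSchema("schema.json"; parser = JSON.parse)
```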

Table represents a table that is an instance of the schema, and is validated by it.

Field represents a single field of the schema, that is, the description of one column in a table.
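One way to express the relationship between these types is with plain structs. A schema owns a vector of fields, and a table pairs a schema with rows that are checked against it.

The sketch is hypothetical: `DemoField`, `DemoSchema`, `DemoTable` and `validate_table` are illustrative names. It checks only row width and integer fields.

```julia
# Hypothetical type layout; not the TableSchema.jl API.
struct DemoField
    name::String
    type::String
end

struct DemoSchema
    fields::Vector{DemoField}
end

struct DemoTable
    schema::DemoSchema
    rows::Vector{Vector{String}}
end

# Check every row against the table's schema and collect readable errors.
function validate_table(t::DemoTable)
    errors = String[]
    n = length(t.schema.fields)
    for (i, row) in enumerate(t.rows)
        if length(row) != n
            push!(errors, "row $i has $(length(row)) values, expected $n")
            continue
        end
        for (f, v) in zip(t.schema.fields, row)
            if f.type == "integer" && tryparse(Int, v) === nothing
                push!(errors, "row $i: \"$v\" is not an integer ($(f.name))")
            end
        end
    end
    return errors
end

schema = DemoSchema([DemoField("id", "integer"), DemoField("city", "string")])
table = DemoTable(schema, [["1", "London"], ["x", "Paris"], ["3"]])
validate_table(table)  # reports the non-integer id and the short third row
```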

Usage

For an example usage sequence please see runtests.jl in the test subfolder. Tables and schema can be loaded as follows:

using TableSchema

# read a Table Schema from a JSON file:
filestream = open("schema.json")
schema = Schema(filestream)

# errors is empty if the descriptor is valid, otherwise it lists the problems:
err = schema.errors

# read a Table from a CSV file:
filestream = open("data.csv")
table = Table(filestream)
rows = read(table)   # proposed read method, called in Julia's function style

# save the Schema back to a file if the table and its schema are valid:
if isempty(table.errors) && table.schema.valid
    save(table.schema, "data_schema.json")   # proposed save method
end
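The usage above opens files without closing them. A common Julia idiom is to pass a function to `open`, which closes the stream when that function returns, even if it throws.

The sketch below applies this idiom to reading rows from a CSV file. `read_csv_rows` and `read_csv_file` are hypothetical helpers. The comma splitting is deliberately naive and ignores quoted fields, so a real implementation would rely on a proper CSV parser.

```julia
# Hypothetical helpers; the comma split ignores CSV quoting rules.
function read_csv_rows(io::IO)
    lines = filter(l -> !isempty(l), readlines(io))
    isempty(lines) && return (String[], Vector{String}[])
    header = String.(split(lines[1], ','))
    rows = [String.(split(l, ',')) for l in lines[2:end]]
    return header, rows
end

# open(f, path) calls f with the stream and closes it afterwards, even on error.
read_csv_file(path::AbstractString) = open(read_csv_rows, path)

# header, rows = read_csv_file("data.csv")
```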

Implementation

At a minimum, complete the basic implementation level. Interfaces are described here.
The code must follow the OKI coding standards.
The development process is described here.
For code style and linting, we will follow the Julia Style Guide and use the Lint.jl tool for static analysis.
The code will be written and tested against Julia 1.0, whose latest stable release as of September 2018 is 1.0.0.
We will write the documentation in the format of the standard Julia manual, which can be generated locally using Documenter.jl.
The library documentation must be searchable at https://pkg.julialang.org.
Unit and integration tests will be written with the Test module of the Julia Standard Library.
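A test file using the standard library's `Test` module typically groups assertions with `@testset`. The sketch below shows the shape such a `runtests.jl` could take. `is_valid_field_name` is a hypothetical stand-in for library code, not a function of the package.

```julia
using Test

# Hypothetical stand-in for a library function under test.
is_valid_field_name(name::AbstractString) = !isempty(strip(name))

@testset "TableSchema sketch" begin
    @testset "field names" begin
        @test is_valid_field_name("id")
        @test !is_valid_field_name("   ")
    end
    # @test_throws checks that an expression raises the given exception type.
    @test_throws ArgumentError parse(Int, "not a number")
end
```

Nested test sets report their results per group, and the outer set throws at the end if any test inside failed. This lets a test runner detect failures without stopping at the first one.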
