teran/archived

archived

Cloud native service to store versioned data in space-efficient manner

archived is a good fit when you have some amount of low-cardinality data to share with a number of users or systems. A good example of such a task is an APT/RPM repository.

Project status & roadmap

archived is under active development and almost everything is subject to change. An MVP was implemented in v0.0.1 to prove the concepts used in archived.

The complete feature list is available in the repository issues.

How it works

archived is inspired by rsync --link-dest, which has for decades made it possible to store package mirrors without duplicating data. archived frees this approach from local file systems by using modern storage services such as S3 under the hood.
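To make the rsync --link-dest idea concrete, here is a minimal sketch of the underlying mechanism: an unchanged file in a new snapshot becomes a hard link to the file in the previous snapshot, so the data exists on disk only once. The directory and file names and the linkSnapshot helper are invented for illustration; this shows the file system technique, not any code from archived.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkSnapshot mimics what rsync --link-dest does for an unchanged file:
// instead of copying the bytes into the new snapshot directory, it creates
// a hard link to the file that already exists in the previous snapshot.
// It reports whether both paths refer to the same file on disk.
func linkSnapshot() (bool, error) {
	root, err := os.MkdirTemp("", "link-dest-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	prev := filepath.Join(root, "snapshot-1")
	next := filepath.Join(root, "snapshot-2")
	for _, dir := range []string{prev, next} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return false, err
		}
	}

	oldPath := filepath.Join(prev, "package.deb")
	newPath := filepath.Join(next, "package.deb")
	if err := os.WriteFile(oldPath, []byte("package payload"), 0o644); err != nil {
		return false, err
	}
	// A hard link is a second directory entry for the same data.
	if err := os.Link(oldPath, newPath); err != nil {
		return false, err
	}

	oldInfo, err := os.Stat(oldPath)
	if err != nil {
		return false, err
	}
	newInfo, err := os.Stat(newPath)
	if err != nil {
		return false, err
	}
	return os.SameFile(oldInfo, newInfo), nil
}

func main() {
	same, err := linkSnapshot()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("both snapshots share one copy:", same)
}
```

Hard links only work within a single file system, which is the limitation that motivates moving the same idea to a remote storage service.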

To do so, archived relies on two storages: metadata and CAS.

Metadata storage is a database that holds all of the entities:

  • namespaces - groups of containers
  • containers - something like directories
  • versions - immutable versions of the data in a container
  • objects - named data BLOBs with some additional metadata

A good example of metadata storage is a PostgreSQL database.

CAS storage is BLOB storage that holds the data behind objects. CAS stands for Content Addressed Storage, which describes exactly how it operates: it stores each BLOB under a unique key derived from its content (SHA256 is used by default).

A good example of CAS storage is S3.

This approach reduces raw data usage by linking duplicates instead of storing copies.
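The split between metadata and CAS can be sketched in a few lines of Go. This is a toy in-memory model, not archived's implementation: the CAS and Version types, the object names and the payloads are all made up for illustration, with a map standing in for S3 and another map standing in for the metadata database. The only detail taken from the text above is that the key is the SHA256 of the content.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// CAS is a toy in-memory content-addressed store (a stand-in for S3).
type CAS struct {
	blobs map[string][]byte
}

func NewCAS() *CAS {
	return &CAS{blobs: make(map[string][]byte)}
}

// Put stores data under the hex-encoded SHA-256 of its content and
// returns that key. Storing the same content twice keeps a single copy.
func (c *CAS) Put(data []byte) string {
	sum := sha256.Sum256(data)
	key := hex.EncodeToString(sum[:])
	if _, ok := c.blobs[key]; !ok {
		c.blobs[key] = append([]byte(nil), data...)
	}
	return key
}

// Len returns the number of distinct BLOBs held by the store.
func (c *CAS) Len() int {
	return len(c.blobs)
}

// Version maps object names to CAS keys (a stand-in for metadata storage).
type Version map[string]string

func main() {
	cas := NewCAS()

	v1 := Version{
		"pool/a.deb": cas.Put([]byte("package a")),
		"pool/b.deb": cas.Put([]byte("package b")),
	}
	v2 := Version{
		"pool/a.deb": cas.Put([]byte("package a")),          // unchanged since v1
		"pool/b.deb": cas.Put([]byte("package b, rebuilt")), // new content
	}

	fmt.Println("objects referenced:", len(v1)+len(v2))
	fmt.Println("BLOBs stored:", cas.Len())
}
```

In this sketch the two versions reference four objects in total, but the store holds only three BLOBs, because the unchanged object in the second version resolves to a key that already exists.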

archived components

archived is built with a microservice architecture containing the following components:

  • archived-publisher - HTTP server to allow data listing and fetching
  • archived-manager - gRPC API to manage namespaces, containers, versions and objects
  • archived-exporter - Prometheus metrics exporter for metadata entities
  • archived-cli - CLI application to interact with the manager component
  • archived-migrator - metadata migration tool
  • archived-gc - garbage collector
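The README does not describe what archived-gc collects. In a model where versions reference BLOBs by key, one plausible job for a garbage collector is to find BLOBs that no version references any more. The sketch below illustrates that idea only: the unreferencedKeys function, the key names and the data layout are assumptions, not archived's actual behavior.

```go
package main

import (
	"fmt"
	"sort"
)

// unreferencedKeys returns the CAS keys that no version points to.
// versions is a list of "object name -> CAS key" maps, one per version.
// Hypothetical helper: the real archived-gc may work differently.
func unreferencedKeys(casKeys []string, versions []map[string]string) []string {
	// Mark phase: collect every key that is still referenced.
	live := make(map[string]struct{})
	for _, objects := range versions {
		for _, key := range objects {
			live[key] = struct{}{}
		}
	}

	// Sweep phase: anything in CAS that was not marked is garbage.
	var garbage []string
	for _, key := range casKeys {
		if _, ok := live[key]; !ok {
			garbage = append(garbage, key)
		}
	}
	sort.Strings(garbage)
	return garbage
}

func main() {
	casKeys := []string{"key-a", "key-b", "key-c"}
	versions := []map[string]string{
		{"pool/a.deb": "key-a"},
		{"pool/a.deb": "key-a", "pool/c.deb": "key-c"},
	}
	fmt.Println(unreferencedKeys(casKeys, versions)) // prints [key-b]
}
```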

Deploy

archived is distributed as a set of prebuilt binaries, which lets you choose any deployment method, from systemd services to Kubernetes.

The main things to know before deployment:

  • archived-publisher can use a RO replica of PostgreSQL and can scale
  • archived-manager requires a RW PostgreSQL instance since it performs writes; it can also scale
  • archived-exporter only needs to run as a single copy since it just provides metrics for the database entities; RO replica access is enough for it as well
  • archived-migrator must be run on every upgrade of archived, right before the other components
  • archived-cli can run anywhere and requires network access to archived-manager
  • there is no authentication at any stage at the moment (yes, even for cli/manager)

An example of Kubernetes deployment specs is available in the docs/examples/deploy/k8s directory.

The full configuration reference is available at docs/configuration.md.

CLI

archived-cli provides a command-line interface to operate archived, including creating namespaces, containers, versions and objects. It works with archived-manager to handle requests.

usage: archived-cli --endpoint=ENDPOINT [<flags>] <command> [<args> ...]

CLI interface for archived


Flags:
      --[no-]help            Show context-sensitive help (also try --help-long and --help-man).
  -d, --[no-]debug           Enable debug mode ($ARCHIVED_CLI_DEBUG)
  -t, --[no-]trace           Enable trace mode (debug mode on steroids) ($ARCHIVED_CLI_TRACE)
  -s, --endpoint=ENDPOINT    Manager API endpoint address ($ARCHIVED_CLI_ENDPOINT)
      --[no-]insecure        Do not use TLS for gRPC connection
      --[no-]insecure-skip-verify
                             Do not perform TLS certificate verification for gRPC connection
      --cache-dir="~/.cache/archived/cli/objects"
                             Stat-cache directory for objects ($ARCHIVED_CLI_STAT_CACHE_DIR)
  -n, --namespace="default"  namespace for containers to operate on

Commands:
help [<command>...]
    Show help.

namespace create <name>
    create new namespace

namespace rename <old-name> <new-name>
    rename the given namespace

namespace delete <name>
    delete the given namespace

namespace list
    list namespaces

container create <name>
    create new container

container move <name> <namespace>
    move container to another namespace

container rename <old-name> <new-name>
    rename the given container

container delete <name>
    delete the given container

container list
    list containers

version create [<flags>] <container>
    create new version for given container

version delete <container> <version>
    delete the given version

version list <container>
    list versions for the given container

version publish <container> <version>
    publish the given version

object list <container> <version>
    list objects in the given container and version

object create <container> <version> <path>
    create object(s) from location

object url <container> <version> <key>
    get URL for the object

object delete <container> <version> <key>
    delete object

stat-cache show-path
    print actual cache path
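The --cache-dir flag and the stat-cache command suggest that the CLI avoids re-hashing files it has already seen. The README does not explain how, so the following is only a sketch of the general stat-cache technique: a checksum is reused as long as the file's path, size and modification time are unchanged. The StatCache type and the checksumTwice helper are hypothetical, and unlike the real cache this one lives in memory rather than in a directory on disk.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// statKey identifies a file state without reading the file content.
type statKey struct {
	path    string
	size    int64
	modTime int64 // modification time, nanoseconds since the Unix epoch
}

// StatCache remembers checksums of files whose stat data has not changed.
// Hypothetical type: it illustrates the technique, not archived-cli itself.
type StatCache struct {
	sums   map[statKey]string
	Hashed int // how many files were actually read and hashed
}

func NewStatCache() *StatCache {
	return &StatCache{sums: make(map[statKey]string)}
}

// Checksum returns the hex SHA-256 of the file, reusing a cached value
// when path, size and modification time all match a previous call.
func (c *StatCache) Checksum(path string) (string, error) {
	info, err := os.Stat(path)
	if err != nil {
		return "", err
	}
	key := statKey{path, info.Size(), info.ModTime().UnixNano()}
	if sum, ok := c.sums[key]; ok {
		return sum, nil
	}

	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	h := sha256.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	sum := hex.EncodeToString(h.Sum(nil))
	c.sums[key] = sum
	c.Hashed++
	return sum, nil
}

// checksumTwice writes a temporary file and asks for its checksum twice.
// It reports whether both answers match and how many times the file was hashed.
func checksumTwice() (bool, int, error) {
	dir, err := os.MkdirTemp("", "stat-cache-demo")
	if err != nil {
		return false, 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "a.deb")
	if err := os.WriteFile(path, []byte("package a"), 0o644); err != nil {
		return false, 0, err
	}

	cache := NewStatCache()
	first, err := cache.Checksum(path)
	if err != nil {
		return false, 0, err
	}
	second, err := cache.Checksum(path)
	if err != nil {
		return false, 0, err
	}
	return first == second, cache.Hashed, nil
}

func main() {
	same, hashed, err := checksumTwice()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("checksums match: %v, times hashed: %d\n", same, hashed)
}
```

The trade-off of this technique is that a file modified without changing its size or modification time would not be detected, which is why such caches are usually safe to delete and rebuild.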

How to build the project manually

archived requires the following dependencies to build:

  • Go v1.22+ (prior versions not tested)
  • goreleaser v2.0+ (prior versions not tested)
  • protoc-gen-go v1.34+ (prior versions not tested)
  • protoc-gen-go-grpc v1.4 (prior versions not tested)
  • docker (to build container images, run some tests)

To build the project, run:

go generate ./...
goreleaser build --snapshot --clean

To build container images:

docker-compose build

or build them manually by running:

docker build -f Dockerfile.component .

Where component is one of publisher, manager, migrator, etc.

Local development

In some cases it's convenient to run the whole stack locally. archived provides a docker-compose setup to do that from prebuilt images:

docker-compose up

or from a custom build:

go generate -v ./... && \
goreleaser build --snapshot --clean && \
docker-compose build && \
docker-compose up || docker-compose down

Please note that docker-compose down at the end will automatically remove the containers on stop. Remove it if you don't need such behavior.

Run tests locally

Simply run:

go test ./...

Please note that running the tests requires docker, since the tests use go-docker-testsuite to run component dependencies such as PostgreSQL or memcached.