Skip to content

Commit

Permalink
feat: composite primary key -> unique Collection id (Attempt #2) (#2372)
Browse files Browse the repository at this point in the history
* feat: migration - use unique id for Collections

This migration abandons the composite primary key (id, visibility)
approach for the project / Collections table and enforces a unique id.
Going forward, revisions will be identified by their unique id rather
than by the id of their public-facing Collection coupled with
visibility=PRIVATE.

At the time of this commit, there are no revisions in progress in
the corpora_prod database.

* fix: prefix revision id comment with seq num

Co-authored-by: Kuni Katsuya <kkatsuya@chanzuckerberg.com>

* fix: update visibility to PRIVATE when downgrading

* feat: add 'revision' relationship to DbCollection

This commit also refactors foreign keys to inline

* refactor: improve foreign key ref column notations

Referencing `id` within DbCollection for `revision_of` by using `project.id` means that `id` doesn't have to be redefined (already in Base class but interpreter otherwise sees the builtin id method)

Co-authored-by: Trent Smith <1429913+Bento007@users.noreply.github.com>

* fix: add unique constraint to project.revision_of

There should only ever be a single revision open at one time for a given
public Collection

* feat!: backend changes for Collection primary key migration

BREAKING_CHANGE: This commit is a temporary stopping point and
represents the migration of all backend service code from using the
composite key for Collections (`id`, `visibility`) in the `project`
table to using a unique `id` as the primary key for every Collection.

* fix: move a commit operation plus refactoring

Move the commit for Collection revision creation to after the creation
of individual Dataset revisions. Remove unnecessary comments and add
Session type hint to function definitions.

* refactor: revert format for wmg example in corpora-api spec

* fix: update unit tests and work in migration

* Return revison_of when listing collections

* fix tests.unit.processing_container.test_process.TestDatasetProcessing

* feat: remove visibility from url's and api requests (#2197)

* add revision_off attribute

* alter queries and mutations

* change query/mutation callsites

* revert to passing only attributes and not whole obj

* 7 Levels of prop drilling 💀

* retype new file mutation

* fix: rename migration

* fix: add a cognitive-complexity lint skip

* fix: 'collections' path for legacy links

* remove private redirects

* fix cognitive complexity

* handle new revision creation flow

* handle collections with new `revision_of`

* fix typo in collection link

* fix: remove window location manipulation

* fix: DatasetUploadStatus import bug fix

* properly assign key id

* debugging logs

* one more log

* redirect properly on revision publish

* remove non-existant attr

* first past of revision/published id links

* first past of revision/published id links

* remove unneeded revision_of checks

* fix diffing

* don't redirect if private publish

* handle private collection publish

* add handled success state

* invalidate collection on private publish

* refresh on publish

* one more ignored collection field for diff

* possibly fix edit collection issue

* fix: add `id` to ignored fields (#2240)

* add logs

* do not check collection id for diffs

* Revert "add logs"

This reverts commit d177ef1.

Co-authored-by: Daniel Hegeman <daniel.hegeman@chanzuckerberg.com>
Co-authored-by: Trent Smith <1429913+Bento007@users.noreply.github.com>

* fix call to test snapshot module in setup_dev_data.sh

* incorporate removal of DatasetArtifactType

* pkey migration changes in wmg tests

* lint cleanup

* clean up migration

* incorporate cloudfront invalidation test

* more cleanup

* cleanup, docstring

* fix import scheme to prevent circular import error

* chore: attempt to fix circular import

Unit tests are passing locally but not in Github Actions

* fix module access typo

* another import fix attempt

* Revert "another import fix attempt"

This reverts commit ff55991.

* Revert "fix module access typo"

This reverts commit 70ae87d.

* Revert "chore: attempt to fix circular import"

This reverts commit 01e712a.

* Revert "fix import scheme to prevent circular import error"

This reverts commit 70fbcd0.

* Move CollectionLink to own file

* fix: black version bump to be compatible with click

* fix accidental blank lines

Co-authored-by: Kuni Katsuya <kkatsuya@chanzuckerberg.com>
Co-authored-by: Trent Smith <1429913+Bento007@users.noreply.github.com>
Co-authored-by: Trent Smith <trent.smith@chanzuckerberg.com>
Co-authored-by: Severiano Badajoz <sbadajoz@chanzuckerberg.com>
  • Loading branch information
5 people authored Apr 15, 2022
1 parent b52dc1a commit 3570af7
Show file tree
Hide file tree
Showing 53 changed files with 900 additions and 626 deletions.
41 changes: 17 additions & 24 deletions backend/config/corpora-api.yml
Original file line number Diff line number Diff line change
Expand Up @@ -19,12 +19,12 @@ paths:
- wmg
summary: Get WMG cell types (WMG FEATURE PROTOTYPE)
security:
- cxguserCookieLenient: [ ]
- { }
- cxguserCookieLenient: []
- {}
description: >-
JSON of all genes
operationId: backend.corpora.lambdas.api.v1.wmg.get_cell_types
parameters: [ ]
parameters: []
responses:
"200":
description: OK
Expand All @@ -50,12 +50,12 @@ paths:
- wmg
summary: Get WMG genes (WMG FEATURE PROTOTYPE)
security:
- cxguserCookieLenient: [ ]
- { }
- cxguserCookieLenient: []
- {}
description: >-
JSON of all cell types
operationId: backend.corpora.lambdas.api.v1.wmg.get_genes
parameters: [ ]
parameters: []
responses:
"200":
description: OK
Expand All @@ -81,18 +81,18 @@ paths:
- wmg
summary: Get WMG gene expression for a given gene name (WMG FEATURE PROTOTYPE)
security:
- cxguserCookieLenient: [ ]
- { }
- cxguserCookieLenient: []
- {}
description: >-
JSON of gene expression for a given gene
operationId: backend.corpora.lambdas.api.v1.wmg.get_gene_expression
parameters:
- name: gene_name
in: path
required: true
schema:
type: string
description: The name of the gene
- name: gene_name
in: path
required: true
schema:
type: string
description: The name of the gene
responses:
"200":
description: OK
Expand Down Expand Up @@ -235,7 +235,6 @@ paths:
operationId: backend.corpora.lambdas.api.v1.collection.get_collection_details
parameters:
- $ref: "#/components/parameters/path_collection_uuid"
- $ref: "#/components/parameters/query_visibility"
responses:
"200":
description: OK
Expand Down Expand Up @@ -338,7 +337,6 @@ paths:
operationId: backend.corpora.lambdas.api.v1.collection.delete_collection
parameters:
- $ref: "#/components/parameters/path_collection_uuid"
- $ref: "#/components/parameters/query_visibility"
responses:
"204":
$ref: "#/components/responses/204"
Expand Down Expand Up @@ -396,7 +394,7 @@ paths:
description: >-
An authenticated user can upload a file from a shared link to replace an existing dataset in their collection.
The existing dataset must be in an error start or completely uploaded before a replacement can be attempted.
The existing dataset must be in an error state or completely uploaded before a replacement can be attempted.
Once the upload of the replacement dataset is complete, the previous version of the dataset is removed and is
no longer available for downloading through this collection. The replacement datasets are not publicly available
until the collection revision is published. The old datasets are available in the public collection until the
Expand Down Expand Up @@ -1000,7 +998,7 @@ paths:
tags:
- wmg
operationId: backend.wmg.api.v1.primary_filter_dimensions
parameters: [ ]
parameters: []
responses:
"200":
description: OK
Expand All @@ -1026,7 +1024,7 @@ paths:
tags:
- wmg
operationId: backend.wmg.api.v1.query
parameters: [ ]
parameters: []
requestBody:
content:
application/json:
Expand Down Expand Up @@ -1205,7 +1203,6 @@ paths:
ethnicity_terms:
$ref: "#/components/schemas/wmg_ontology_term_id_label_list"


components:
schemas:
user_uuid:
Expand Down Expand Up @@ -1362,7 +1359,6 @@ components:
- name
- revision
- collection_id
- collection_visibility
properties:
id:
$ref: "#/components/schemas/dataset_uuid"
Expand Down Expand Up @@ -1401,8 +1397,6 @@ components:
$ref: "#/components/schemas/dataset_asset"
collection_id:
$ref: "#/components/schemas/collection_uuid"
collection_visibility:
$ref: "#/components/schemas/visibility"
is_primary_data:
$ref: "#/components/schemas/is_primary_data"
cell_type:
Expand Down Expand Up @@ -1513,7 +1507,6 @@ components:
type: string
format: uuid


parameters:
path_collection_uuid:
name: collection_uuid
Expand Down
42 changes: 15 additions & 27 deletions backend/corpora/common/corpora_orm.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,12 +3,12 @@
from datetime import datetime
from sqlalchemy import (
Boolean,
CheckConstraint,
Column,
DateTime,
Enum,
Float,
ForeignKey,
ForeignKeyConstraint,
Integer,
String,
UniqueConstraint,
Expand Down Expand Up @@ -216,7 +216,7 @@ class DbCollection(Base, AuditMixin, TimestampMixin):
# the tablename is "project" instead of "collection" to avoid migrating the database
__tablename__ = "project"

visibility = Column(Enum(CollectionVisibility), primary_key=True, nullable=False)
visibility = Column(Enum(CollectionVisibility), nullable=False)
owner = Column(StrippedString, nullable=False)
name = Column(StrippedString)
description = Column(StrippedString)
Expand All @@ -226,12 +226,16 @@ class DbCollection(Base, AuditMixin, TimestampMixin):
data_submission_policy_version = Column(StrippedString, nullable=True)
tombstone = Column(Boolean, default=False, nullable=False)
publisher_metadata = Column(JSON, nullable=True)
revision_of = Column(String, ForeignKey("project.id"), nullable=True, unique=True)

# Relationships
revision = relationship("DbCollection", cascade="all, delete-orphan", uselist=False)
links = relationship("DbProjectLink", back_populates="collection", cascade="all, delete-orphan")
datasets = relationship("DbDataset", back_populates="collection", cascade="all, delete-orphan")
genesets = relationship("DbGeneset", back_populates="collection", cascade="all, delete-orphan")

__table_args__ = (CheckConstraint("revision_of IS NULL OR visibility = 'PRIVATE'", "ck_project_private_revision"),)


class DbProjectLink(Base, AuditMixin):
"""
Expand All @@ -241,21 +245,14 @@ class DbProjectLink(Base, AuditMixin):
# the tablename is "project_link" instead of "collection_link" to avoid migrating the database
__tablename__ = "project_link"

collection_id = Column(String, nullable=False)
collection_visibility = Column(Enum(CollectionVisibility), nullable=False)
collection_id = Column(String, ForeignKey("project.id"), nullable=False)
link_name = Column(StrippedString)
link_url = Column(StrippedString)
link_type = Column(Enum(CollectionLinkType))

# Relationships
collection = relationship("DbCollection", uselist=False, back_populates="links")

# Composite FK
__table_args__ = (
ForeignKeyConstraint([collection_id, collection_visibility], [DbCollection.id, DbCollection.visibility]),
{},
)


# provide a consistent name
DbCollectionLink = DbProjectLink
Expand Down Expand Up @@ -283,8 +280,7 @@ class DbDataset(Base, AuditMixin, TimestampMixin):
cell_count = Column(Integer)
is_valid = Column(Boolean, default=False)
is_primary_data = Column(Enum(IsPrimaryData))
collection_id = Column(String, nullable=False)
collection_visibility = Column(Enum(CollectionVisibility), nullable=False)
collection_id = Column(String, ForeignKey("project.id"), nullable=False)
tombstone = Column(Boolean, default=False, nullable=False)
original_id = Column(String)
published = Column(Boolean, default=False)
Expand All @@ -304,11 +300,7 @@ class DbDataset(Base, AuditMixin, TimestampMixin):
genesets = relationship("DbGeneset", secondary="geneset_dataset_link", back_populates="datasets")

# Composite FK
__table_args__ = (
ForeignKeyConstraint([collection_id, collection_visibility], [DbCollection.id, DbCollection.visibility]),
UniqueConstraint("collection_id", "curator_tag", name="_dataset__collection_id_curator_tag"),
{},
)
__table_args__ = (UniqueConstraint("collection_id", "curator_tag", name="_dataset__collection_id_curator_tag"),)

def to_dict(self, *args, **kwargs):
kwargs["remove_attr"] = kwargs.get("remove_attr", []) + ["genesets"]
Expand All @@ -326,7 +318,7 @@ class DbDatasetArtifact(Base, AuditMixin):
__tablename__ = "dataset_artifact"
__mapper_args__ = {"confirm_deleted_rows": False}

dataset_id = Column(ForeignKey("dataset.id"), nullable=False)
dataset_id = Column(String, ForeignKey("dataset.id"), nullable=False)
filename = Column(String)
filetype = Column(Enum(DatasetArtifactFileType))
user_submitted = Column(Boolean)
Expand Down Expand Up @@ -418,7 +410,7 @@ class DbDatasetProcessingStatus(Base, AuditMixin):

__tablename__ = "dataset_processing_status"

dataset_id = Column(ForeignKey("dataset.id"), nullable=False)
dataset_id = Column(String, ForeignKey("dataset.id"), nullable=False)
upload_status = Column(Enum(UploadStatus))
upload_progress = Column(Float)
upload_message = Column(String)
Expand All @@ -443,15 +435,11 @@ class DbGeneset(Base, AuditMixin):
name = Column(String, nullable=False)
description = Column(String)
genes = Column(JSONB)
collection_id = Column(String, nullable=False)
collection_visibility = Column(Enum(CollectionVisibility), nullable=False)
collection_id = Column(String, ForeignKey("project.id"), nullable=False)
collection = relationship("DbCollection", uselist=False, back_populates="genesets")
datasets = relationship("DbDataset", secondary="geneset_dataset_link", back_populates="genesets")

__table_args__ = (
ForeignKeyConstraint([collection_id, collection_visibility], [DbCollection.id, DbCollection.visibility]),
UniqueConstraint("name", "collection_id", "collection_visibility", name="_geneset_name__collection_uc"),
)
__table_args__ = (UniqueConstraint("name", "collection_id", name="_geneset_name__collection_uc"),)

def to_dict(self, *args, **kwargs):
kwargs["remove_attr"] = kwargs.get("remove_attr", []) + ["datasets"]
Expand All @@ -467,5 +455,5 @@ class DbGenesetDatasetLink(Base, AuditMixin):

__tablename__ = "geneset_dataset_link"

geneset_id = Column(String, ForeignKey("geneset.id"), index=True)
dataset_id = Column(String, ForeignKey("dataset.id"), index=True)
geneset_id = Column(String, ForeignKey("geneset.id"), index=True, nullable=False)
dataset_id = Column(String, ForeignKey("dataset.id"), index=True, nullable=False)
Loading

0 comments on commit 3570af7

Please sign in to comment.