Publishing mechanism #60

Open

mishaschwartz opened this issue Jun 27, 2024 · 1 comment

@mishaschwartz

We would like to provide users with a mechanism that allows them to publish a workflow or data product (product) that they have created on a Marble node.

When a user wants to publish a product, it should be accompanied by metadata that helps other users search for and use the product in their own research. Another user should also be able to recreate the product by following the same steps as the original author.

Publishing requirements (note that this is not an exhaustive list, feel free to add to it; a sketch of how these fields might map onto a STAC item follows the list):

  • author(s)
  • input data (description and provenance)
  • data description:
    • spatial/temporal extents
    • resolution
    • variables
  • steps to recreate the data (e.g. an accompanying workflow script or similar?)
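
A minimal sketch of how these fields might be captured as a STAC Item using pystac; the `marble:*` property names below are hypothetical placeholders, not an agreed schema, and a real design should reuse existing extensions (e.g. scientific, processing, datacube) where they fit:

```python
# Hedged sketch: capture the publishing requirements above on a STAC Item.
# All "marble:*" fields are hypothetical placeholders.
from datetime import datetime, timezone

import pystac

item = pystac.Item(
    id="example-published-product",
    geometry={  # spatial extent as GeoJSON
        "type": "Polygon",
        "coordinates": [[[-80, 40], [-70, 40], [-70, 50], [-80, 50], [-80, 40]]],
    },
    bbox=[-80, 40, -70, 50],
    # temporal extent (a single timestamp here; start/end datetimes also work)
    datetime=datetime(2024, 1, 1, tzinfo=timezone.utc),
    properties={
        # author(s): STAC common metadata already covers providers
        "providers": [{"name": "Jane Researcher", "roles": ["producer"]}],
        # hypothetical fields for the remaining requirements
        "marble:input_data": "description and provenance of the input data",
        "marble:resolution": "0.25 deg",
        "marble:variables": ["tas"],
        "marble:recreation_steps": "workflow.cwl",  # pointer to the steps/workflow
    },
)

item.validate()  # checks against the core STAC schema (requires jsonschema)
```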

The entire publishing mechanism should include:

  • a UI for users to request that their product be published
  • a UI for node administrators to accept, request changes, or reject a request
  • a (mostly automated, if possible) process to publish the data on the THREDDS/GeoServer server, with metadata hosted in the STAC catalog.
    • this should be integrated with the data ingestion workflow project if possible

Suggested steps to take for this project:

  1. Research the following if you are not already familiar with them:

  2. Research/compile examples of the sorts of data and workflows that users may want to publish on Marble

    • you may want to do a literature review of climate research papers that have accompanying data products
  3. Compile a list of metadata that should accompany a published product

    • describe which metadata are always required, which are required depending on the product type, and which are optional
    • describe acceptable values for each metadata type
  4. Translate the metadata described above into one or more STAC extensions so that published products can be stored as STAC entries (a hypothetical extension sketch follows this list)

  5. Design the UI for users to request that their product be published

  6. Design the UI for node administrators to accept, request changes, or reject a request

    • requests for changes need to be communicated to the original requestor, and there needs to be a UI for them to amend their request and re-submit it for review
  7. Implement the UI for steps 5 and 6 above

  8. Write software that takes a data product (and accompanying metadata), once it has been accepted for publishing, and:

    • makes it available through the THREDDS server (or similar as appropriate)
      • data can/should be shared through THREDDS or GeoServer
      • workflows will probably need a different way of hosting them; consider that workflows can be defined as Jupyter notebooks, Weaver jobs, CWL files, etc.
    • adds the metadata as an entry on the STAC API so that the new product is searchable (a rough publishing sketch follows this list)
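
To make step 4 concrete, here is a hedged sketch of an item declaring a custom extension; the schema URL and field names are entirely hypothetical, and an actual extension would need a published JSON Schema following the stac-extensions template:

```python
# Hypothetical sketch for step 4: an item declaring a custom "marble"
# STAC extension. The schema URL and "marble:*" fields are placeholders;
# a real extension would publish a versioned JSON Schema and document
# the acceptable values for each field (step 3).
item_dict = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "example-published-product",
    "stac_extensions": [
        "https://example.org/marble/v1.0.0/schema.json",  # hypothetical, unpublished
    ],
    "geometry": None,
    "properties": {
        "datetime": "2024-01-01T00:00:00Z",
        "marble:product_type": "workflow",          # e.g. "dataset" or "workflow"
        "marble:recreation_steps": "workflow.cwl",  # how to recreate the product
    },
    "links": [],
    "assets": {},
}
```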
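
And a rough sketch of step 8, under two assumptions that must be checked against the actual Marble deployment: that the node's STAC API implements the Transaction extension (so new items can be POSTed to a collection), and that THREDDS is configured with a datasetScan over a known directory (so files copied there are served automatically). All names and paths below are hypothetical:

```python
# Rough sketch of step 8 under the assumptions stated above.
import shutil
from pathlib import Path

import requests

STAC_API = "https://stac.example.org"          # hypothetical STAC API root
COLLECTION = "published-products"              # hypothetical target collection
PUBLISH_DIR = Path("/data/thredds/published")  # hypothetical datasetScan root


def publish(product_file: Path, item_dict: dict) -> None:
    """Make an accepted product available via THREDDS and searchable via STAC."""
    # 1. copy the data into the directory served by THREDDS
    shutil.copy2(product_file, PUBLISH_DIR / product_file.name)

    # 2. register the metadata with the STAC API (Transaction extension)
    resp = requests.post(
        f"{STAC_API}/collections/{COLLECTION}/items",
        json=item_dict,
        timeout=30,
    )
    resp.raise_for_status()
```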

Deliverables:

  • Internal report describing steps 2 and 3 above
  • STAC extension(s) required to store published metadata (if needed)
  • a UI design (wireframe) for users to request that their product be published
  • a UI design (wireframe) for node administrators to accept, request changes, or reject a request
  • implementation of the two UIs described above
  • software to publish the requested data (see step 8)

Participants/Roles:

  • Student (TBD): research and software development, UX design/consultation
  • Shruti Katkar: UX design and UX research assistance
  • Alex Yu: UI development, UI consultation
  • Steve Easterbrook: consult on metadata and publishing requirements
@fmigneault commented Jul 12, 2024

We are currently working on similar requirements for GeoDataCubes (GDC) in OGC Testbed-20: how to run workflow processing on multi-dimensional spatio-temporal data, and how the resulting data products/collections can track and report their provenance through the processing pipeline (see also the Integrity, Provenance, and Trust (IPT) track), i.e. the FAIR principles.

A few relevant documents/issues of ongoing work items for metadata:

Below are previous searches I did for relevant STAC extensions involving metadata for machine learning or notable filtering of data:

→ Accuracy: combination of STAC extensions

→ Filtering / pre-processing:

→ Data Sample Elements (samples from DataLoader?):

Note that there are many more extensions for different use cases and data types, and the list often expands:
https://stac-extensions.github.io/
