
PEP 458: RSTUF Integration #2

Closed
wants to merge 4 commits into from

Conversation

@kairoaraujo (Owner) commented Jun 13, 2023

This PR implements PEP 458 by adding a setup for Repository Service for TUF (RSTUF) and connecting Warehouse to it.

Context

Unlike previous attempts (pypi#7488, pypi#10870) to implement PEP 458, this PR does not deeply integrate TUF repository management into Warehouse. Instead, it configures a stand-alone RSTUF service, which maintains the TUF metadata repository as a black box and accepts calls to a REST API, so that Warehouse can signal the addition or removal of release files and trigger TUF metadata changes.

Additionally, RSTUF provides a CLI for root signing, used to initialize the RSTUF metadata repository.

See RSTUF docs for details.
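For orientation, the Warehouse-to-RSTUF interaction described above boils down to a REST call per change. The sketch below shows roughly what that could look like; the endpoint path, payload schema, response shape, and the `RSTUF_API_URL` value are assumptions for illustration, not the API as shipped, so check the RSTUF docs before relying on them.

```python
import hashlib
import json
from urllib import request as urlrequest

# Hypothetical dev address; in this PR the real value comes from the
# RSTUF_API_URL environment setting.
RSTUF_API_URL = "http://rstuf-api"


def make_targets_payload(path: str, data: bytes) -> dict:
    """Build an 'add target' payload (the schema here is an assumption)."""
    return {
        "targets": [
            {
                "path": path,
                "info": {
                    "length": len(data),
                    "hashes": {
                        "blake2b-256": hashlib.blake2b(
                            data, digest_size=32
                        ).hexdigest()
                    },
                },
            }
        ]
    }


def post_targets(payload: dict) -> str:
    """POST the payload to the RSTUF targets endpoint and return the
    task id (endpoint path and response shape are assumptions)."""
    req = urlrequest.Request(
        f"{RSTUF_API_URL}/api/v1/targets",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlrequest.urlopen(req) as resp:
        return json.load(resp)["data"]["task_id"]
```

Since RSTUF is asynchronous by design, the returned task id is all Warehouse gets back immediately; the metadata change happens in the RSTUF worker.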

Description of this PR

  • Configure RSTUF (development instance)

    • Add development dependencies
    • Add RSTUF worker services to docker-compose.yml
      • RSTUF uses the same Redis Server, but unique Redis DB ids 1 and 2
      • RSTUF uses the same PostgreSQL, but a specific rstuf database
    • Add a pre-generated bootstrap file, with RSTUF config and root of trust
    • Add RSTUF commands to the Makefile
      • make tufinit to bootstrap the RSTUF service
      • make tufimport to create TUF metadata for all existing release files from the example Warehouse database
        It wraps a Warehouse CLI command (warehouse tuf dev import-all)
  • Add calls to RSTUF API upon package addition and removal

Status of RSTUF

RSTUF is close to releasing a beta version, called Minimum Working Version (MWV). Two of the three components (RSTUF Worker and API) are already tagged MWV. The third component (RSTUF CLI) is missing one feature, which is not relevant for this PR and is not expected to break compatibility for the MWV release.

TODO

  • Unit tests
  • Pin RSTUF versions (docker-compose.yml)

@kairoaraujo kairoaraujo force-pushed the rstuf_integration branch 2 times, most recently from e032cea to e70a3b7 Compare June 13, 2023 09:32
@kairoaraujo (Owner, Author) left a comment:

My comments about the infrastructure commit

"bins": 1
},
"services": {
"number_of_delegated_bins": 256,
Owner Author:

Should we choose a smaller number of delegations for easy debugging?

Collaborator:

The number should fit the expected number of target files. How many target files do we expect in a dev setup? If only a few for illustration, we can definitely start with a smaller number.

(ideally, the number is chosen automatically anyway, see e.g. https://gitlab.com/rugged/rugged/-/issues/137)

Owner Author:

Yes, it could be a feature in RSTUF, and RSTUF is ready to implement it after the latest changes in the database model.
The current number in PEP 458 is 16384. We can choose a smaller number, such as 16.
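For reference, the relation between the number of delegated bins and the hash bit length is simple arithmetic, which is why switching between 16 bins for development and PEP 458's 16384 only changes one payload value. A small sketch (the helper name is made up):

```python
def bit_length_for_bins(number_of_bins: int) -> int:
    """Number of hash bits needed to address the given number of bins.

    Succinct hash-bin delegation requires a power of two.
    """
    if number_of_bins < 2 or number_of_bins & (number_of_bins - 1):
        raise ValueError("number_of_delegated_bins must be a power of two >= 2")
    return number_of_bins.bit_length() - 1


# PEP 458's production default vs. smaller dev-friendly values:
print(bit_length_for_bins(16384))  # 14
print(bit_length_for_bins(256))    # 8
print(bit_length_for_bins(16))     # 4
```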



@tasks.task(ignore_result=True, acks_late=True)
def generate_projects_simple_detail(request, projects):
Owner Author:

About this task

  1. I need this task to have the request object. I couldn't find another way to retrieve the request for use by the render_simple_detail function. You can see it being used during the warehouse tuf dev import-all command:

    request = config.task(_generate_projects_simple_detail).get_request()

  2. I also intended to use it later, for the "Manage packages/project and simple details to TUF" commit. Still, it wasn't possible, because I could not get hold of the simple detail generated in the background; I need the response to use in the TUF metadata.

If I can retrieve the request object, or generate one with render_simple_detail, we can remove this task.

Collaborator:

Warehouse folks should be able to help with that.

@kairoaraujo (Owner, Author) left a comment:

RSTUF, by design, is asynchronous.

The targets.* endpoints always return the task id(s).

Should Warehouse wait for and monitor the task result, to make it "synchronous"?
Should Warehouse set up asynchronous monitoring of the tasks, to take some action depending on the result?

Currently, we store the task id in the events, but the task id remains available in RSTUF for only 24 hours (the Celery default).

Should Warehouse store the task results?
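For the "wait and monitor" option, a minimal polling helper could look like the sketch below. The set of task states is an assumption based on Celery's defaults, and `fetch_status` stands in for a hypothetical call to an RSTUF task-status endpoint (not shown):

```python
import time


def wait_for_task(fetch_status, task_id: str, timeout: float = 60.0,
                  interval: float = 1.0) -> str:
    """Poll a task until it reaches a final state, or raise on timeout.

    fetch_status(task_id) -> state string, e.g. obtained via a GET to
    RSTUF's task-status endpoint.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = fetch_status(task_id)
        # Non-final Celery-style states (assumption): keep polling.
        if state not in ("PENDING", "RECEIVED", "STARTED", "RUNNING"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")


# Example with a fake status source:
states = iter(["PENDING", "STARTED", "SUCCESS"])
result = wait_for_task(lambda tid: next(states), "abc123", interval=0)
print(result)  # SUCCESS
```

Whether this belongs in the upload request path or in a background task is exactly the open question above; polling inline would make uploads block on RSTUF.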

@lukpueh lukpueh self-requested a review June 13, 2023 14:54
@lukpueh lukpueh self-assigned this Jun 13, 2023
@lukpueh lukpueh removed their request for review June 13, 2023 14:58
"""


def _fetch_succinct_roles(config):
Collaborator:

I think this could use some explanation. It looks like you manually TOFU (trust-on-first-use) the first version of root in RSTUF, and then use the client updater to update timestamp, snapshot and targets, to get the hash bin delegation info from the latest targets, without first updating root?

I have a few questions:

  • Is it unsafe to expect that top-level targets has not yet been updated when calling import_all? If not, couldn't we just use the bin numbers from rstuf-bootstrap-payload.json to generate the Delegations object?
  • If we do need to look at the metadata, does Warehouse have no other way to get the latest metadata than using the client updater?
  • If we do need the client updater, why not use it all the way, via refresh(), and also update root?
  • If Warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from rstuf-bootstrap-payload.json?

Owner Author:

I think this could use some explanation. It looks like you manually TOFU (trust-on-first-use) the first version of root in RSTUF, and then use the client updater to update timestamp, snapshot and targets, to get the hash bin delegation info from the latest targets, without first updating root?

Let me explain the problem. I think you already gave some solutions for it.

The Warehouse user, in this specific case a Warehouse developer who wants to develop or test something related to RSTUF / Warehouse, will use this command to import all existing data from the development Warehouse database into the RSTUF database.

warehouse tuf dev import-all is a subcommand of the dev command, to make clear it is for development only. (Maybe we need better protection to prevent it from running in production?)

It requires Warehouse database initialization (make initdb) and TUF initialization (make tufinit) first.

What I'm trying to do here: I need to identify the delegation role name, in order to fetch the role id from the RSTUF database (table rstuf_target_roles) when inserting the artifact (package or simple detail index).

I have a few questions:

  • Is it unsafe to expect that top-level targets has not yet been updated when calling import_all? If not, couldn't we just use the bin numbers from rstuf-bootstrap-payload.json to generate the Delegations object?

Do we need to care about that safety here, in the development process?

I could use it from rstuf-bootstrap-payload.json, but what if I generated a custom ceremony with the RSTUF CLI?
For example, as a developer I might want to test a new feature, or test performance with 16384 delegation roles, and use kairo-payload.json in the Makefile, or just run the bootstrap command from the Makefile. That could create an inconsistency, unless I also change a specific call in cli/tuf.py.

  • If we do need to look at the metadata, does Warehouse have no other way to get the latest metadata than using the client updater?

I could do it as before, loading the files from /var/opt/warehouse/metadata:

- rstuf-metadata:/var/opt/warehouse/metadata

Another option is to get it from config.registry.settings["tuf.metadata.url"] (http://files:9001/metadata), fetch 1.targets.json, import it as TUF Metadata, and get the succinct roles.

  • If we do need the client updater, why not use it all the way, via refresh(), and also update root?

I assumed that there were no updates to the metadata and that the developer ran make initdb, make tufinit and make tufimport. I only tried to cover the scenario where they run a custom ceremony to generate a custom payload, but not a root metadata update.

  • If Warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from rstuf-bootstrap-payload.json?

I assumed yes. IMHO, we shouldn't care about root here.

Collaborator:

What I'm trying to do here: I need to identify the delegation role name, in order to fetch the role id from the RSTUF database (table rstuf_target_roles) when inserting the artifact (package or simple detail index).

Wait, why does Warehouse even need to be concerned with those technical details? Shouldn't import-all just throw target files at the RSTUF API, like normal release uploads do?

IMO, using the tuf client here makes sense for two reasons:

  1. security -- the tuf client guarantees that we get the correct metadata
  2. abstraction -- the tuf client hides the complexity of getting metadata

At the same time, we don't actually care about security here, because we ignore root updates. And we violate layering, because we handle TUF metadata and manually resolve delegations inside Warehouse, all of which could be hidden behind the RSTUF API.

If this is a workaround, because RSTUF does not yet implement an API endpoint for "backsigning" existing packages, and is only used in a controlled dev environment, then we should use the easiest way of getting this information, e.g. ls /var/opt/warehouse/metadata/*.targets.json | sort -r | head -1, or just create the object ourselves:

succinct_roles = SuccinctRoles(
    keyids,       # irrelevant
    threshold,    # irrelevant
    bit_length,   # Can't we get this from warehouse config or db?
    name_prefix,  # Can't we get this from warehouse config or db?
)
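As an aside, the shell one-liner suggested above has a subtle pitfall: sort -r sorts lexically, so 9.targets.json would rank above 10.targets.json. A Python equivalent with a numeric sort avoids that (a sketch; the helper name and directory argument are made up):

```python
import os


def latest_targets_filename(metadata_dir: str) -> str:
    """Return the highest-versioned consistent-snapshot targets file,
    e.g. '2.targets.json' from ['1.targets.json', '2.targets.json']."""
    candidates = [
        name for name in os.listdir(metadata_dir)
        if name.endswith(".targets.json") and name.split(".", 1)[0].isdigit()
    ]
    if not candidates:
        raise FileNotFoundError(f"no targets metadata in {metadata_dir}")
    # Compare by the numeric version prefix, not lexically.
    return max(candidates, key=lambda name: int(name.split(".", 1)[0]))
```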

Collaborator:

Please see the previous comment for how I think we should do this. The replies below are just to sort out misunderstandings...

I could use it from rstuf-bootstrap-payload.json, but what if I generated a custom ceremony with the RSTUF CLI? For example, as a developer I might want to test a new feature, or test performance with 16384 delegation roles, and use kairo-payload.json in the Makefile, or just run the bootstrap command from the Makefile. That could create an inconsistency, unless I also change a specific call in cli/tuf.py.

I didn't mean the default rstuf-bootstrap-payload.json included in this PR, but the one that the admin/developer actually uses to initialise RSTUF. I assumed that Warehouse would somehow keep a copy of that.

  • If warehouse uses the client updater and needs to bootstrap trust, can't it use the initial root metadata from rstuf-bootstrap-payload.json?

I assumed yes. IMHO, We shouldn't care here about the root.

I didn't mean the initial root metadata that is already in RSTUF, but the one that is uploaded to RSTUF. Again I assumed that Warehouse would somehow keep a copy of that when doing tufinit.

Owner Author:

Wait, why does Warehouse even need to be concerned with those technical details? Shouldn't import-all just throw target files at the RSTUF API, like normal release uploads do?

We are doing an import here, not the regular flow of adding targets through the PyPI API.

This is a procedure to speed up the import of existing targets.

Unfortunately, Warehouse can't use the existing import tool from RSTUF, due to the SQLAlchemy version, so we implemented it in the Warehouse CLI for development.

At the same time, we don't actually care about security here, because we ignore root updates. And we violate layering, because we handle TUF metadata and manually resolve delegations inside Warehouse, all of which could be hidden behind the RSTUF API.

If this is a workaround, because RSTUF does not yet implement an API endpoint for "backsigning" existing packages, and is only used in a controlled dev environment,

RSTUF Supports adding all targets to the API and publishing them to the metadata files in the storage by another API call.
But we don't want to run this procedure (designed for high-traffic), adding many artifacts (hundred thousand to millions) at once, so we would do a kind of data migration first.

then we should use the easiest way of getting this information, e.g. ls /var/opt/warehouse/metadata/*.targets.json | sort -r | head -1, or just create the object by ourselves:

succinct_roles = SuccinctRoles(
    keyids,       # irrelevant
    threshold,    # irrelevant
    bit_length,   # Can't we get this from warehouse config or db?
    name_prefix,  # Can't we get this from warehouse config or db?
)

Yes, I think we can go the simpler way.

bit_length: we can get it from the RSTUF API.
name_prefix: I would say it is irrelevant; RSTUF has it predefined as "bins".
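For illustration, the path-to-bin mapping that SuccinctRoles encapsulates is roughly the following. This is a dependency-free sketch: python-tuf's exact digest truncation and name formatting may differ slightly, so this should not be used in place of the real class.

```python
import hashlib


def succinct_role_for_target(path: str, bit_length: int = 8,
                             name_prefix: str = "bins") -> str:
    """Map a target path to a hash-bin role name such as 'bins-2f'."""
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    # Keep the leading bit_length bits of the hash as the bin index.
    bin_index = int.from_bytes(digest[:4], "big") >> (32 - bit_length)
    # Zero-pad to the hex width of the largest possible index.
    suffix_len = len(format((1 << bit_length) - 1, "x"))
    return f"{name_prefix}-{bin_index:0{suffix_len}x}"
```

With bit_length=8 (256 bins), every path lands in one of bins-00 through bins-ff, which is the role name a lookup in rstuf_target_roles would key on.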

Collaborator:

Oh wow, you did think all of this through! ❤️ I strongly suggest putting this info (plus a link) into a code comment, i.e. why we don't want to use the API, and can't use the CLI, here.

@lukpueh (Collaborator) left a comment:

This is amazing, Kairo! Let's resolve the discussion about "backsigning" and share this with Warehouse folks. 🚀

@lukpueh (Collaborator) commented Jun 14, 2023

One more thing: let's add a bit more context about RSTUF and its current state to the PR description when submitting upstream.

@kairoaraujo kairoaraujo force-pushed the rstuf_integration branch 10 times, most recently from 0e59f72 to 818210a Compare June 15, 2023 05:14
Kairo de Araujo added 4 commits July 29, 2023 12:21
Adds the RSTUF in the Warehouse infrastructure

* Include the RSTUF Ceremony payload file
  - It is generated using `rstuf admin ceremony`, and the keys
* Add the development dependencies
  - RSTUF CLI and dependencies
* Include RSTUF components to the `docker-compose.yml`
  - RSTUF uses the same Redis Server but uses unique Redis DB ids `1` and `2`
  - RSTUF uses the same PostgreSQL, but a specific database rstuf
* Add the RSTUF environment configuration for development
* Define the Makefile commands for RSTUF
  - `make tufinit` to bootstrap the RSTUF service
  - `make tufimport` to import all project packages to the RSTUF service
* Define the basic commands for RSTUF within Warehouse
  - Command to import all existing packages and indexes to TUF
    metadata (`warehouse tuf dev import-all`)
* Add TUF development documentation

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
* Adding packages

After a package is added to the Warehouse database, Warehouse generates
and stores the Simple Index, and sends a request to the RSTUF backend to
include the package and its simple index in the TUF metadata.

* Removing package or Project Release

On PyPI Management, when a user removes a file or a project release,
it is also removed from the TUF metadata, and the simple details index is updated.

Co-authored-by: Lukas Puehringer <lukas.puehringer@nyu.edu>
Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>

simplify code in warehouse.tuf.targets

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Reduce the number of delegated hash-bin roles for the development
environment.

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>
Rename the environment variable setting `TUF_API_URL` to `RSTUF_API_URL`
as this API is provided by Repository Service for TUF (RSTUF).

Signed-off-by: Kairo de Araujo <kdearaujo@vmware.com>