Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reading some attributes of a delegated targets role after loading a repository throws a KeyError #574

Closed
Tracked by #10672
trishankatdatadog opened this issue Dec 28, 2017 · 16 comments

Comments

@trishankatdatadog
Copy link
Member

trishankatdatadog commented Dec 28, 2017

Description of issue or feature request:

After loading a repository, reading the threshold of a delegated targets role throws a KeyError.

Current behavior:

  1. Create a new repository with a delegated targets role.
  2. Load the repository.
  3. Try to read the threshold of this delegated targets role:
(Pdb) delegated_targets_role.threshold
*** KeyError: 'threshold'
  1. Try to read the keys of this delegated targets role:
  File "/usr/lib/python3.6/site-packages/tuf/repository_tool.py", line 1008, in keys
    keyids = roleinfo['keyids']
KeyError: 'keyids'

Expected behavior:

Should return the threshold / keys of the delegated targets role. I suspect this is due to missing assignments [1].

Ideally, the fix should include checking the schema of the delegated targets role after assignments.

@trishankatdatadog trishankatdatadog changed the title Reading threshold of a delegated targets role after loading a repository throws a KeyError Reading some attributes of a delegated targets role after loading a repository throws a KeyError Dec 29, 2017
@trishankatdatadog
Copy link
Member Author

Unfortunately, this one is a bit of a showstopper. As a workaround, I am forced to use a kludge (e.g., manually reading from tuf.roledb.get_roleinfo('targets', 'default') to get the keyids and threshold for a delegated targets role off the top-level targets role).

@JustinCappos
Copy link
Member

JustinCappos commented Dec 29, 2017 via email

@trishankatdatadog
Copy link
Member Author

The interface is already there (e.g., delegated_targets_role.threshold and delegated_targets_role.keys). Unless I'm doing something wrong, it just looks like the implementation of this interface is incomplete due to missing parsing and assignments.

@vladimir-v-diaz
Copy link
Contributor

Asking about the threshold and keyids of a delegated role (e.g., my_role.threshold) is complicated by the fact that my_role can be a delegation of more than one role. For instance, foo can delegate to my_role some paths with a threshold of X and keyids A, B, C. The bar role can also delegate to my_role some different paths with a threshold of Y and keyids D, E.

So when you give the my_role.threshold query, which value do you expect?

my_role.threshold was actually a thing in the past, back when delegations resembled a tree rather than a graph. I suppose it's clear which value is meant via repository.targets.foo.my_role.threshold. Unfortunately, it was decided that this command was unweildy, so it was replaced by, for example, repository.targets("my_role").delegate() and repository.targets("foo").version = 2).

What should the interface be to modify a delegation?

Is this good?

repository.targets("foo").revoke("my_role")
repository.targets("foo").delegate("my_role", new_threshold, new_keyids...)

Hopefully we can settle on a design that doesn't negatively impact code maintainability and my well being :)

FYI: I think you can still give a repository.targets.threshold. It is allowed because there is a definitive value.

@vladimir-v-diaz
Copy link
Contributor

As a workaround, I am forced to use a kludge (e.g., manually reading from tuf.roledb.get_roleinfo('targets', 'default') to get the keyids and threshold for a delegated targets role off the top-level targets role).

Should we add a wrapper for this? repository.get_roleinfo("rolename")? The returned dict should contain all of the metadata fields you'd want.

@trishankatdatadog
Copy link
Member Author

I don't know, I have to think about this a bit. Your proposal for modifying a delegation looks fine to me. In general, I'm in favour of being explicit.

My original question, though, was simply about reading the properties of existing delegations loaded from a repository off disk. For example, why do the following not work?

print(str(repository.targets("foo").threshold))
print(str(repository.targets("foo".keys)))

Am I missing something?

@trishankatdatadog
Copy link
Member Author

(As an aside: I, too, am in favour of any solution that maintains your well-being.)

@vladimir-v-diaz
Copy link
Contributor

The problem is that a delegated role doesn't have one threshold value. It is set by one or more delegators. When a delegated targets file is loaded from disk, the repository cannot, for example, set my_delegated_role.threshold to one unique value.

Nevertheless, we should probably still have an interface to inspect all fields of metadata.

Does repository.get_roleinfo("rolename") work for you?

@trishankatdatadog
Copy link
Member Author

trishankatdatadog commented Jan 16, 2018

Vlad, I understand what you mean, but I still don't see why we can't read the threshold without ambiguity. Let me use an example to clarify what I mean.

Suppose that:

  • The targets role delegates "*" to both A and B.
  • A delegates "a*" with a threshold of (2,3) to C.
  • B delegates "b*" with a threshold of (1,2) also to C.

Now why can't I read the threshold of 2 using something like repository("targets")("A")("C")? Similarly, I would get the threshold of 1 when using repository("targets")("B")("C").

Personally, I much prefer the more Pythonic array indexing, but this should work, no? I'm explicitly walking through the graph to get exactly the delegation chain I want. Maybe I missed some reason why this wouldn't work?

@vladimir-v-diaz
Copy link
Contributor

vladimir-v-diaz commented Jan 16, 2018

Sebastien and I discussed this issue today. He noted that it's unclear what is meant by the repository("targets")("A")("C") notation. Does it return the "delegations" field for C (as it appears in A), or the contents of C?

I wonder if it might be simpler to remove these getter and setter functions and give users the metadata itself (exactly as it appears in the specification). What do you think of the following interface for accessing and updating a delegations field?

from tuf.repository_tool import *
repo = load_repository("/repo_dir", "my_repo")
targets_metadata = repo.get_metadata("targets")

delegation1 = targets_medatata["delegations"]["roles"][0]
delegation1["threshold"] = 2

repo.update_metadata("targets", targets_metadata)

What I like about this approach (for the low-level repo tool) is that there is no ambiguity and users are given a data structure that matches exactly what's in the specification.

@trishankatdatadog
Copy link
Member Author

Without having thought deeply about it, my gut response is that there should be harmony between the read and write interfaces. Let me think about it a bit more, and get back to you, please...

@trishankatdatadog
Copy link
Member Author

I see what you're saying, but that approach seems a bit "un-Pythonic" (requires explicitly knowing the metadata specification, whereas I think a more ORM approach would be nice).The proposed read interface is also radically different from the write interface.

What do you think about the following?

repository["targets"]["A"]["C"].targets #gives you targets
repository["targets"]["A"]["C"].threshold #gives you threshold
repository["targets"]["A"]["C"].version #gives you version number
# and so on and so forth

I hope what I'm proposing is not too onerous. What do you think?

@vladimir-v-diaz
Copy link
Contributor

What do you think about the following?

repository["targets"]["A"]["C"].targets #gives you targets
repository["targets"]["A"]["C"].threshold #gives you threshold
repository["targets"]["A"]["C"].version #gives you version number
# and so on and so forth

I have several problems with this notation.

  • repository["targets"]["A"]["C"].threshold reads or modifies an A attribute. In other words, threshold is found in A's metadata. In contrast, repository["targets"]["A"]["C"].version reads or modifies a C attribute, where version is found in C's metadata.

  • In Python, brackets are normally used for lists. It isn't obvious that you are reading/modifying attributes that are actually found in dictionaries, sublists, etc. A notation is being used that doesn't match the format of the actual metadata.

  • One of the tenets of Python is that "there should be one-- and preferably only one --obvious way to do it." If the metadata is written as JSON and loaded as Python dictionaries, there should be one way to read and modify this data.

  • Getter and setter methods are needed for every single attribute found in metadata. This complicates the code and degrades maintainability.

It might be best to load metadata and give it to the integrator, in my opinion. Of course, we would still provide functions to handle things like performing a delegation, fetching specific metadata, etc.

A command-line tool can be provided to make it easier to work with a TUF repository, and the calls under discussion for integrators that want something more low-level. The low-level calls should be kept low level, and the CLI should focus on usability. Isn't the interface provided by Notary, and its Flynn.io predecessor, a CLI?

I'm curious what others think? @JustinCappos @awwad @lukpueh @SantiagoTorres

P.S. The Zen of Python:

>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

@awwad
Copy link
Contributor

awwad commented Jan 18, 2018 via email

@trishankatdatadog
Copy link
Member Author

Oh, I see what you guys are saying now. Yeah, using dictionary-like methods to get/set is fine with me, as long as it is consistent (both read/write uses this interface). Thanks!

@jku
Copy link
Member

jku commented Feb 17, 2022

Closing this issue as it was filed against (what is now known as) the legacy codebase: issue seems to not be relevant anymore. Please re-open or file a new issue if you feel that the issue is revelant to current python-tuf.

repository_tool specifically does not exist anymore: currently the expectation is that repository actions are implemented using the low-level Metadata API.

More details

Current source code (and upcoming 1.0 release) only contains the modern components

  • a low-level Metadata API (tuf.api) and
  • tuf.ngclient that implements the client workflow,

Legacy components (e.g. tuf.client, tuf.repository_tool, tuf.repository_lib as well as the repo and client scripts) are no longer included. See announcement and API reference for more details.

@jku jku closed this as completed Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants