-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151
Comments
What if a Dataverse user with a local account wants to start using Shib or Oauth? In that case, I believe this identifier does change in the role assignment table. It also happens in the opposite case, when someone goes from Shib to a local user. When a role assignment string may change:
If this understanding is correct, it's another reason why everyone should also have a username--b/c usernames may be added or lost depending on the direction of the move. /cc @pdurbin @scolapasta |
@michbarsinai any thoughts on this? |
Username changes were not a requirement. Moving between user directories was addressed by the authenticate user lookup table (which now has a 1-to-1 relation rather than 1-to-many). Are these official requirements? We are using the string there in order to support assignees (not users!) that do not live in the DB (:guest, &authenticatedUsers...) and to support a standard resolution of the assignee - note that different assignees live in different tables, or in memory. The resolution is done via the service bean, and not via SQL. If everything must be expressed in SQL (which I don't think), we can add columns of nullable foreign keys for the various assignees tables, and a VARCHAR field for the in-memory ones. I don't think this is going to be performant (multiple joins on large tables). Also, it's not a sound solution, as there's no way to tell the SQL that only a single foreign key should be non-null. Replacing the numbers with strings (@2) feels wrong, as it's a very expensive way of storing a number. We could go with a triplet of fields, one denoting a type (VARCHAR) another holding an ID (INT) and another one holding assignee name (for the in-memory assignees). The ID field won't be a foreign key though, since it may refer to different tables. |
In discussing this issue we really, really need to stop conflating userIdentifier with username. They are very different things. It only confuses things. A username is a string that's probably no more than a dozen characters long that you can imagine telling someone over the phone. ("On Twitter, I'm 'philipdurbin' but on GitHub I'm 'pdurbin'.") Every authenticated user should have a username, I've argued for a long time. Both Twitter and GitHub allow you to change your username and in fact on Twitter the Dataverse account was recently changed from "thedataorg" to "dataverseorg" in IQSS/dataverse.org#32 As of this writing, only builtin users have usernames. This is unfortunate. Currently, what all authenticated users have (both builtin and shib) is a userIdentifier, a potentially long, ugly, unfriendly, unreadable string. The userIdentifier for Shibboleth users looks something like I don't necessarily want to store a role assignment for pete as the string |
The use case of converting a local/builtin account is being discussed in #796 and #1387 as well as the Shibboleth Functional Requirements Document for Dataverse 4.0. It's something we'd like for 4.0. I assume in the UI, the user will click "Confirm" on a dialog that says something like "Really convert my local account to a Shibboleth account." The question then is what needs to happen in the database. In the screenshot below, I'm trying to capture a list of database tables that need to be updated. I'm sure more tables should be added to the list, especially when explicit groups are shipped as part of #91 or #92 or whatever ticket. And there well could be other tables I'm not even thinking of. #983 for example, hints that we should persist the creator of a dataverse, for example. (We can only hope that what is persisted is the primary key from the authenticateduser table rather than a string like In the example below, imagine that user "Fiona Finch" wants to start using Shibboleth. Here's what I know needs to change in the database (again there is or will be more, I assume):
Again, these I'm pretty sure about but where else is "finch" or "@finch" squirreled away in the database? Coming soon, I think:
I could use some help tracking these down. |
It should be enough to change the authenticatedUserLookup table. The row that points to @finch (i.e. has @finch's id in the authenticateduser_id column) should have the shib provider id in the authenticationproviderid column, and https://idp.testshib.org/idp/shibboleth|ec5b1df9 in the persistentuserid column.
|
I disagree. For example, as new roles are assigned to the converted user, the @scolapasta opined that the He also pointed out that comments in tickets may not be the best way of discussing all this. :) I hope my concerns are clear, at least. @raprasad has written about similar concerns in #1387 |
I agree with everything here, except for the shib persistent identifiers (https://idp.testshib.org/idp/shibboleth|ec5b1df9) leaking beyond the user lookup process. The shib provider has all the information it needs to automatically create an OK identifier. We could (should?) make a flow where the user can select/amend this username before it's created.
|
@michbarsinai is sounds like you'd like to have the Shibboleth code refactored to store an "OK identifier". I believe your definition of OK is this:
Does that sound right? Basically, you want me to create a new database table for Shibboleth users and when Shibboleth users log in there will be a double lookup. First you check the new Shibboleth user table (which doesn't yet exist). Then you check theauthenticateduserlookup table, based on the useridentifier in the Shibboleth user table. Finally, you have the primary key (and entire row) of the user from the authenticateduser table. There's already a ticket to refactor the Shibboleth code at #963 This will take time to implement, of course. By the way, what is actually leaking beyond the lookup process right into the bowels of the system at the level of assigning permissions (roles) in the current design is the overloaded concept of a username/useridentifer. Every other system has a clean separation between username and useridentifier. My Twitter username is "philipdurbin" but my Twitter useridentifer is 18326200. My GitHub username is "pdurbin" but my GitHub useridentifier is 21006. I'm sure in Twitter and GitHub when permissions are assigned they are not assigned to "philipdurbin" or "pdurbin" in their respective systems. Rather, permissions are most likely assigned to numbers. I continue to object to the current design, which is why I opened this ticket in the first place. |
Part of this was answered by email. The rest - good enough means human readable, memorable, and based on the user data. Actual algorithm needs to be approved by Liz, who asked for this explicitly. It may require a some UI. As for:
No. The pair (idprovider id, persistent user ID) has to be unique. authenticateduserlookup can have duplicate values. Sent from my iPhone
|
Now it should be, as of this commit I just made: 20c38b6 :) As @michbarsinai and @scolapasta and I discovered, the
@scolapasta didn't want to change the behavior of that method so now it calls a new From a Shib perspective, I'm now no longer blocked. I change the Shib code to use the new Now, when I create Shib users, the Shib-specific
Now I should be able to start taking a closer look at converting local users to Shib users in #796 |
The persistentuserid field in authenticateduserlookup now contains an immutable value (i.e. "1938468") rather than a mutable one such as a username (i.e. "jane_doe"). This is much preferred because most systems allow you to change usernames (from "jane_doe" to "jane_smith", for example, sometimes with side effects) in the case of divorce, etc. Until IQSS#1445 is fixed, we'll persist the mutable value ("jane_doe") in the useridentifier field in the authenticateduser column because it's what is (unfortunately) displayed in the UI. This value is also used for the assigneeidentifier field in the roleassignment table (see IQSS#1151).
I desperately wish this had all been designed differently but I'm going just have to be at peace with it. Closing. |
The more I stare at the role assignment table (screenshot attached) the more I'm completely freaked out by the use of strings in role assignments rather than id numbers.
When I assign a role to Pete it gets persisted as "@pete" rather than "@2".
I'm fine with the use of @ for users and some other symbol (&) for groups.
But what if pete wants to change his username?
Why should I have to do extra database lookups to figure out the database id of the authenticated user? Why can't I just have the authenticated user's database id right there in front of me while I'm looking at the role assignment table?
Here are some more examples with
@admin
,@finch
, and@spruce
:The text was updated successfully, but these errors were encountered: