Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

Closed
pdurbin opened this issue Nov 20, 2014 · 12 comments
Labels

Comments

@pdurbin
Copy link
Member

pdurbin commented Nov 20, 2014

The more I stare at the role assignment table (screenshot attached) the more I'm completely freaked out by the use of strings in role assignments rather than id numbers.

When I assign a role to Pete it gets persisted as "@pete" rather than "@2".

I'm fine with the use of @ for users and some other symbol (&) for groups.

But what if pete wants to change his username?

Why should I have to do extra database lookups to figure out the database id of the authenticated user? Why can't I just have the authenticated user's database id right there in front of me while I'm looking at the role assignment table?

Here are some more examples with @admin, @finch, and @spruce:

brittle

@pdurbin pdurbin added this to the In Review - Dataverse 4.0 milestone Nov 20, 2014
@raprasad
Copy link
Contributor

What if a Dataverse user with a local account wants to start using Shib or Oauth?

In that case, I believe this identifier does change in the role assignment table. It also happens in the opposite case, when someone goes from Shib to a local user.

When a role assignment string may change:

  1. A Person who authenticates via an institution (Shib users) wants to become a local user
  2. Person who is a local user wants to use her/his institutional auth

If this understanding is correct, it's another reason why everyone should also have a username--b/c usernames may be added or lost depending on the direction of the move.

/cc @pdurbin @scolapasta

@scolapasta
Copy link
Contributor

@michbarsinai any thoughts on this?

@scolapasta scolapasta modified the milestones: Beta 11 - Dataverse 4.0, In Review - Dataverse 4.0 Jan 6, 2015
@michbarsinai
Copy link
Member

Username changes were not a requirement. Moving between user directories was addressed by the authenticate user lookup table (which now has a 1-to-1 relation rather than 1-to-many).

Are these official requirements?

We are using the string there in order to support assignees (not users!) that do not live in the DB (:guest, &authenticatedUsers...) and to support a standard resolution of the assignee - note that different assignees live in different tables, or in memory. The resolution is done via the service bean, and not via SQL.

If everything must be expressed in SQL (which I don't think), we can add columns of nullable foreign keys for the various assignees tables, and a VARCHAR field for the in-memory ones. I don't think this is going to be performant (multiple joins on large tables). Also, it's not a sound solution, as there's no way to tell the SQL that only a single foreign key should be non-null.

Replacing the numbers with strings (@2) feels wrong, as it's a very expensive way of storing a number. We could go with a triplet of fields, one denoting a type (VARCHAR) another holding an ID (INT) and another one holding assignee name (for the in-memory assignees). The ID field won't be a foreign key though, since it may refer to different tables.

@scolapasta scolapasta modified the milestones: Dataverse 4.0: Final, Beta 11 - Dataverse 4.0 Jan 8, 2015
@pdurbin
Copy link
Member Author

pdurbin commented Jan 13, 2015

In discussing this issue we really, really need to stop conflating userIdentifier with username. They are very different things. It only confuses things.

A username is a string that's probably no more than a dozen characters long that you can imagine telling someone over the phone. ("On Twitter, I'm 'philipdurbin' but on GitHub I'm 'pdurbin'.") Every authenticated user should have a username, I've argued for a long time. Both Twitter and GitHub allow you to change your username and in fact on Twitter the Dataverse account was recently changed from "thedataorg" to "dataverseorg" in IQSS/dataverse.org#32

As of this writing, only builtin users have usernames. This is unfortunate.

Currently, what all authenticated users have (both builtin and shib) is a userIdentifier, a potentially long, ugly, unfriendly, unreadable string. The userIdentifier for Shibboleth users looks something like @https://idp.testshib.org/idp/shibboleth|2c9c47ca-dfc9-41f7-a04d-e47e42b7871c@testshib.org. I had opened #1123 because at one point we were exposing these nasty things in the UI.

I don't necessarily want to store a role assignment for pete as the string @2. A foreign key would be better, if we can figure out how to do this and still account for groups.

@scolapasta scolapasta modified the milestones: In Review - Dataverse 4.0, Dataverse 4.0: Final Jan 23, 2015
@pdurbin
Copy link
Member Author

pdurbin commented Jan 29, 2015

What if a Dataverse user with a local account wants to start using Shib or Oauth?

The use case of converting a local/builtin account is being discussed in #796 and #1387 as well as the Shibboleth Functional Requirements Document for Dataverse 4.0. It's something we'd like for 4.0.

I assume in the UI, the user will click "Confirm" on a dialog that says something like "Really convert my local account to a Shibboleth account." The question then is what needs to happen in the database.

In the screenshot below, I'm trying to capture a list of database tables that need to be updated. I'm sure more tables should be added to the list, especially when explicit groups are shipped as part of #91 or #92 or whatever ticket. And there well could be other tables I'm not even thinking of. #983 for example, hints that we should persist the creator of a dataverse, for example. (We can only hope that what is persisted is the primary key from the authenticateduser table rather than a string like @finch.)

In the example below, imagine that user "Fiona Finch" wants to start using Shibboleth. Here's what I know needs to change in the database (again there is or will be more, I assume):

  • authenticateduser table:
    • change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
    • update "Fiona Finch" with whatever came in from Shibboleth
  • authenticateduserlookup table:
    • change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
    • change builtin to shib
  • builtinuser table:
    • delete entire row where the username is "finch" (no longer a builtin user)
  • roleassignment table:
    • change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
  • datasetversion_dataverseuser table:
    • change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9

Again, these I'm pretty sure about but where else is "finch" or "@finch" squirreled away in the database? Coming soon, I think:

  • explicit group assignments
  • dataverse creator
  • other places??

I could use some help tracking these down.

migrate

@michbarsinai
Copy link
Member

It should be enough to change the authenticatedUserLookup table. The row that points to @finch (i.e. has @finch's id in the authenticateduser_id column) should have the shib provider id in the authenticationproviderid column, and https://idp.testshib.org/idp/shibboleth|ec5b1df9 in the persistentuserid column.

On Jan 29, 2015, at 18:28, Philip Durbin notifications@github.com wrote:

What if a Dataverse user with a local account wants to start using Shib or Oauth?

The use case of converting a local/builtin account is being discussed in #796 and #1387 as well as the Shibboleth Functional Requirements Document for Dataverse 4.0. It's something we'd like for 4.0.

I assume in the UI, the user will click "Confirm" on a dialog that says something like "Really convert my local account to a Shibboleth account." The question then is what needs to happen in the database.

In the screenshot below, I'm trying to capture a list of database tables that need to be updated. I'm sure more tables should be added to the list, especially when explicit groups are shipped as part of #91 or #92 or whatever ticket. And there well could be other tables I'm not even thinking of. #983 for example, hints that we should persist the creator of a dataverse, for example. (We can only hope that what is persisted is the primary key from the authenticateduser table rather than a string like @finch.)

In the example below, imagine that user "Fiona Finch" wants to start using Shibboleth. Here's what I know needs to change in the database (again there is or will be more, I assume):

• authenticateduser table:
• change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
• update "Fiona Finch" with whatever came in from Shibboleth
• authenticateduserlookup table:
• change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
• change builtin to shib
• builtinuser table:
• delete entire row where the username is "finch" (no longer a builtin user)
• roleassignment table:
• change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
• datasetversion_dataverseuser table:
• change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
Again, these I'm pretty sure about but where else is "finch" or "@finch" squirreled away in the database? Coming soon, I think:

• explicit group assignments
• dataverse creator
• other places??
I could use some help tracking these down.


Reply to this email directly or view it on GitHub.

@pdurbin
Copy link
Member Author

pdurbin commented Jan 29, 2015

It should be enough to change the authenticatedUserLookup table.

I disagree. For example, as new roles are assigned to the converted user, the roleassignment table would have a weird mix of old pre-conversion identifiers (@finch) and growing number of post-conversion identifiers (@https://idp.testshib.org/idp/shibboleth|ec5b1df9) for the same (converted) user.

@scolapasta opined that the datasetversion_dataverseuser could probably use the primary key of the authenticateduser rather than @finch. This is because only users (never groups) will be editing datasets. We should make a ticket for this refactoring, probably, at some point.

He also pointed out that comments in tickets may not be the best way of discussing all this. :)

I hope my concerns are clear, at least. @raprasad has written about similar concerns in #1387

@michbarsinai
Copy link
Member

I agree with everything here, except for the shib persistent identifiers (https://idp.testshib.org/idp/shibboleth|ec5b1df9) leaking beyond the user lookup process. The shib provider has all the information it needs to automatically create an OK identifier. We could (should?) make a flow where the user can select/amend this username before it's created.

On Jan 29, 2015, at 21:28, Philip Durbin notifications@github.com wrote:

It should be enough to change the authenticatedUserLookup table.

I disagree. For example, as new roles are assigned to the converted user, the roleassignment table would have a weird mix of old pre-conversion identifiers (@finch) and growing number of post-conversion identifiers (@https://idp.testshib.org/idp/shibboleth|ec5b1df9) for the same (converted) user.

@scolapasta https://github.com/scolapasta opined that the datasetversion_dataverseuser could probably use the primary key of the authenticateduser rather than @finch. This is because only users (never groups) will be editing datasets. We should make a ticket for this refactoring, probably, at some point.

He also pointed out that comments in tickets may not be the best way of discussing all this. :)

I hope my concerns are clear, at least. @raprasad https://github.com/raprasad has written about similar concerns in #1387 #1387

Reply to this email directly or view it on GitHub #1151 (comment).

@pdurbin
Copy link
Member Author

pdurbin commented Jan 30, 2015

The shib provider has all the information it needs to automatically create an OK identifier.

@michbarsinai is sounds like you'd like to have the Shibboleth code refactored to store an "OK identifier". I believe your definition of OK is this:

  • the identifier should be a String
  • the identifier should be unique for the useridentifier column in the authenticateduser table
  • the identifier should be unique for the persistentuserid column in the authenticateduserlookup table
  • the identifier should never change

Does that sound right? Basically, you want me to create a new database table for Shibboleth users and when Shibboleth users log in there will be a double lookup. First you check the new Shibboleth user table (which doesn't yet exist). Then you check theauthenticateduserlookup table, based on the useridentifier in the Shibboleth user table. Finally, you have the primary key (and entire row) of the user from the authenticateduser table. There's already a ticket to refactor the Shibboleth code at #963

This will take time to implement, of course.

By the way, what is actually leaking beyond the lookup process right into the bowels of the system at the level of assigning permissions (roles) in the current design is the overloaded concept of a username/useridentifer.

Every other system has a clean separation between username and useridentifier. My Twitter username is "philipdurbin" but my Twitter useridentifer is 18326200. My GitHub username is "pdurbin" but my GitHub useridentifier is 21006. I'm sure in Twitter and GitHub when permissions are assigned they are not assigned to "philipdurbin" or "pdurbin" in their respective systems. Rather, permissions are most likely assigned to numbers. I continue to object to the current design, which is why I opened this ticket in the first place.

@michbarsinai
Copy link
Member

Part of this was answered by email.

The rest - good enough means human readable, memorable, and based on the user data. Actual algorithm needs to be approved by Liz, who asked for this explicitly. It may require a some UI.

As for:

the identifier should be unique for the persistentuserid column in the authenticateduserlookup table

No. The pair (idprovider id, persistent user ID) has to be unique. authenticateduserlookup can have duplicate values.

Sent from my iPhone

On 30 בינו׳ 2015, at 16:38, Philip Durbin notifications@github.com wrote:

The shib provider has all the information it needs to automatically create an OK identifier.

@michbarsinai is sounds like you'd like to have the Shibboleth code refactored to store an "OK identifier". I believe your definition of OK is this:

the identifier should be a String
the identifier should be unique for the useridentifier column in the authenticateduser table
the identifier should be unique for the persistentuserid column in the authenticateduserlookup table
the identifier should never change
Does that sound right? There's already a ticket to refactor the Shibboleth code at #963

This will take time to implement, of course.

By the way, what is actually leaking beyond the lookup process right into the bowels of the system at the level of assigning permissions (roles) in the current design is the overloaded concept of a username/useridentifer.

Every other system has a clean separation between username and useridentifier. My Twitter username is "philipdurbin" but my Twitter useridentifer is 18326200. My GitHub username is "pdurbin" but my GitHub useridentifier is 21006. I'm sure in Twitter and GitHub when permissions are assigned they are not assigned to "philipdurbin" or "pdurbin" in their respective systems. Rather, permissions are most likely assigned to numbers. I continue to object to the current design, which is why I opened this ticket in the first place.


Reply to this email directly or view it on GitHub.

@eaquigley eaquigley modified the milestones: Post 4.0, In Review - Dataverse 4.0 Feb 2, 2015
@pdurbin
Copy link
Member Author

pdurbin commented Feb 2, 2015

It should be enough to change the authenticatedUserLookup table

Now it should be, as of this commit I just made: 20c38b6 :)

As @michbarsinai and @scolapasta and I discovered, the createAuthenticatedUser persists the same authPrvUserPersistentId string to the following places:

  • the useridentifier column of the authenticateduser table
  • the persistentuserid column of the authenticateduserlookup table

@scolapasta didn't want to change the behavior of that method so now it calls a new createAuthenticatedUserWithDecoupledIdentifiers method but to preserve the same behavior it sets both identifiers to be the same string. Again this is in 20c38b6

From a Shib perspective, I'm now no longer blocked. I change the Shib code to use the new createAuthenticatedUserWithDecoupledIdentifiers method in 74795ba

Now, when I create Shib users, the Shib-specific persistentuserid does not leak from the authenticateduserlookup table:

+ psql -c 'select id,name,useridentifier from authenticateduser;' dataverse_db
 id |       name       |  useridentifier  
----+------------------+------------------
  3 | Sammy Sparrow    | sparrow
  4 | Wilbur Wren      | wren
  6 | Caleb Chestnut   | chestnut
  2 | Fiona Finch      | finch
  5 | Sabrina Spruce   | spruce
  7 | Cathy Medina     | tinywolf257
  8 | Isaac Hall       | orangekoala902
  9 | Rodney Alexander | beautifulbird545
  1 | Admin Dataverse  | admin
(9 rows)

+ psql -c 'select * from authenticateduserlookup;' dataverse_db
                 persistentuserid                 | authenticationproviderid | authenticateduser_id 
--------------------------------------------------+--------------------------+----------------------
 admin                                            | builtin                  |                    1
 finch                                            | builtin                  |                    2
 sparrow                                          | builtin                  |                    3
 wren                                             | builtin                  |                    4
 spruce                                           | builtin                  |                    5
 chestnut                                         | builtin                  |                    6
 https://idp.pluto.com/idp/shibboleth|6decc4dc    | shib                     |                    7
 https://idp.sabrina.com/idp/shibboleth|03646a11  | shib                     |                    8
 https://idp.sabrina1.com/idp/shibboleth|6715945b | shib                     |                    9
(9 rows)

+ psql -c 'select * from roleassignment;' dataverse_db
 id |  assigneeidentifier  | definitionpoint_id | role_id 
----+----------------------+--------------------+---------
  1 | @admin               |                  1 |       1
  2 | :authenticated-users |                  1 |       4
 14 | @beautifulbird545    |                 13 |       1
 15 | @beautifulbird545    |                 14 |       6
(4 rows)

+ psql -c 'select datasetversionid,useridentifier from datasetversion_dataverseuser;' dataverse_db
 datasetversionid |  useridentifier   
------------------+-------------------
                3 | @beautifulbird545
(1 row)

Now I should be able to start taking a closer look at converting local users to Shib users in #796

@scolapasta scolapasta modified the milestones: In Review - Long Term, In Review - Short Term May 8, 2015
@mheppler mheppler added Feature: Permissions Component: Code Infrastructure formerly "Feature: Code Infrastructure" and removed Component: High-level labels Jan 27, 2016
@scolapasta scolapasta removed their assignment Jan 27, 2016
@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
pdurbin added a commit to pdurbin/dataverse that referenced this issue Oct 5, 2016
The persistentuserid field in authenticateduserlookup now contains an
immutable value (i.e. "1938468") rather than a mutable one such as a
username (i.e. "jane_doe"). This is much preferred because most systems
allow you to change usernames (from "jane_doe" to "jane_smith", for
example, sometimes with side effects) in the case of divorce, etc.

Until IQSS#1445 is fixed, we'll persist the mutable value ("jane_doe") in
the useridentifier field in the authenticateduser column because it's
what is (unfortunately) displayed in the UI. This value is also used for
the assigneeidentifier field in the roleassignment table (see IQSS#1151).
@pdurbin
Copy link
Member Author

pdurbin commented Jun 28, 2017

I desperately wish this had all been designed differently but I'm going just have to be at peace with it. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants