Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

pdurbin · 2014-11-20T16:09:35Z

The more I stare at the role assignment table (screenshot attached) the more I'm completely freaked out by the use of strings in role assignments rather than id numbers.

When I assign a role to Pete it gets persisted as "@pete" rather than "@2".

I'm fine with the use of @ for users and some other symbol (&) for groups.

But what if pete wants to change his username?

Why should I have to do extra database lookups to figure out the database id of the authenticated user? Why can't I just have the authenticated user's database id right there in front of me while I'm looking at the role assignment table?

Here are some more examples with @admin, @finch, and @spruce:

The text was updated successfully, but these errors were encountered:

raprasad · 2014-12-14T02:21:03Z

What if a Dataverse user with a local account wants to start using Shib or Oauth?

In that case, I believe this identifier does change in the role assignment table. It also happens in the opposite case, when someone goes from Shib to a local user.

When a role assignment string may change:

A Person who authenticates via an institution (Shib users) wants to become a local user
Person who is a local user wants to use her/his institutional auth

If this understanding is correct, it's another reason why everyone should also have a username--b/c usernames may be added or lost depending on the direction of the move.

/cc @pdurbin @scolapasta

scolapasta · 2015-01-06T20:37:41Z

@michbarsinai any thoughts on this?

michbarsinai · 2015-01-07T22:35:20Z

Username changes were not a requirement. Moving between user directories was addressed by the authenticate user lookup table (which now has a 1-to-1 relation rather than 1-to-many).

Are these official requirements?

We are using the string there in order to support assignees (not users!) that do not live in the DB (:guest, &authenticatedUsers...) and to support a standard resolution of the assignee - note that different assignees live in different tables, or in memory. The resolution is done via the service bean, and not via SQL.

If everything must be expressed in SQL (which I don't think), we can add columns of nullable foreign keys for the various assignees tables, and a VARCHAR field for the in-memory ones. I don't think this is going to be performant (multiple joins on large tables). Also, it's not a sound solution, as there's no way to tell the SQL that only a single foreign key should be non-null.

Replacing the numbers with strings (@2) feels wrong, as it's a very expensive way of storing a number. We could go with a triplet of fields, one denoting a type (VARCHAR) another holding an ID (INT) and another one holding assignee name (for the in-memory assignees). The ID field won't be a foreign key though, since it may refer to different tables.

pdurbin · 2015-01-13T13:10:50Z

In discussing this issue we really, really need to stop conflating userIdentifier with username. They are very different things. It only confuses things.

A username is a string that's probably no more than a dozen characters long that you can imagine telling someone over the phone. ("On Twitter, I'm 'philipdurbin' but on GitHub I'm 'pdurbin'.") Every authenticated user should have a username, I've argued for a long time. Both Twitter and GitHub allow you to change your username and in fact on Twitter the Dataverse account was recently changed from "thedataorg" to "dataverseorg" in IQSS/dataverse.org#32

As of this writing, only builtin users have usernames. This is unfortunate.

Currently, what all authenticated users have (both builtin and shib) is a userIdentifier, a potentially long, ugly, unfriendly, unreadable string. The userIdentifier for Shibboleth users looks something like @https://idp.testshib.org/idp/shibboleth|2c9c47ca-dfc9-41f7-a04d-e47e42b7871c@testshib.org. I had opened #1123 because at one point we were exposing these nasty things in the UI.

I don't necessarily want to store a role assignment for pete as the string @2. A foreign key would be better, if we can figure out how to do this and still account for groups.

pdurbin · 2015-01-29T16:28:33Z

What if a Dataverse user with a local account wants to start using Shib or Oauth?

The use case of converting a local/builtin account is being discussed in #796 and #1387 as well as the Shibboleth Functional Requirements Document for Dataverse 4.0. It's something we'd like for 4.0.

I assume in the UI, the user will click "Confirm" on a dialog that says something like "Really convert my local account to a Shibboleth account." The question then is what needs to happen in the database.

In the screenshot below, I'm trying to capture a list of database tables that need to be updated. I'm sure more tables should be added to the list, especially when explicit groups are shipped as part of #91 or #92 or whatever ticket. And there well could be other tables I'm not even thinking of. #983 for example, hints that we should persist the creator of a dataverse, for example. (We can only hope that what is persisted is the primary key from the authenticateduser table rather than a string like @finch.)

In the example below, imagine that user "Fiona Finch" wants to start using Shibboleth. Here's what I know needs to change in the database (again there is or will be more, I assume):

authenticateduser table:
- change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
- update "Fiona Finch" with whatever came in from Shibboleth
authenticateduserlookup table:
- change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
- change builtin to shib
builtinuser table:
- delete entire row where the username is "finch" (no longer a builtin user)
roleassignment table:
- change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
datasetversion_dataverseuser table:
- change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9

Again, these I'm pretty sure about but where else is "finch" or "@finch" squirreled away in the database? Coming soon, I think:

explicit group assignments
dataverse creator
other places??

I could use some help tracking these down.

michbarsinai · 2015-01-29T18:58:18Z

It should be enough to change the authenticatedUserLookup table. The row that points to @finch (i.e. has @finch's id in the authenticateduser_id column) should have the shib provider id in the authenticationproviderid column, and https://idp.testshib.org/idp/shibboleth|ec5b1df9 in the persistentuserid column.

On Jan 29, 2015, at 18:28, Philip Durbin notifications@github.com wrote:

What if a Dataverse user with a local account wants to start using Shib or Oauth?

The use case of converting a local/builtin account is being discussed in #796 and #1387 as well as the Shibboleth Functional Requirements Document for Dataverse 4.0. It's something we'd like for 4.0.

I assume in the UI, the user will click "Confirm" on a dialog that says something like "Really convert my local account to a Shibboleth account." The question then is what needs to happen in the database.

In the screenshot below, I'm trying to capture a list of database tables that need to be updated. I'm sure more tables should be added to the list, especially when explicit groups are shipped as part of #91 or #92 or whatever ticket. And there well could be other tables I'm not even thinking of. #983 for example, hints that we should persist the creator of a dataverse, for example. (We can only hope that what is persisted is the primary key from the authenticateduser table rather than a string like @finch.)

In the example below, imagine that user "Fiona Finch" wants to start using Shibboleth. Here's what I know needs to change in the database (again there is or will be more, I assume):

• authenticateduser table:
• change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
• update "Fiona Finch" with whatever came in from Shibboleth
• authenticateduserlookup table:
• change finch to https://idp.testshib.org/idp/shibboleth|ec5b1df9
• change builtin to shib
• builtinuser table:
• delete entire row where the username is "finch" (no longer a builtin user)
• roleassignment table:
• change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
• datasetversion_dataverseuser table:
• change @finch to @https://idp.testshib.org/idp/shibboleth|ec5b1df9
Again, these I'm pretty sure about but where else is "finch" or "@finch" squirreled away in the database? Coming soon, I think:

• explicit group assignments
• dataverse creator
• other places??
I could use some help tracking these down.

—
Reply to this email directly or view it on GitHub.

pdurbin · 2015-01-29T19:28:33Z

It should be enough to change the authenticatedUserLookup table.

I disagree. For example, as new roles are assigned to the converted user, the roleassignment table would have a weird mix of old pre-conversion identifiers (@finch) and growing number of post-conversion identifiers (@https://idp.testshib.org/idp/shibboleth|ec5b1df9) for the same (converted) user.

@scolapasta opined that the datasetversion_dataverseuser could probably use the primary key of the authenticateduser rather than @finch. This is because only users (never groups) will be editing datasets. We should make a ticket for this refactoring, probably, at some point.

He also pointed out that comments in tickets may not be the best way of discussing all this. :)

I hope my concerns are clear, at least. @raprasad has written about similar concerns in #1387

michbarsinai · 2015-01-29T19:32:24Z

I agree with everything here, except for the shib persistent identifiers (https://idp.testshib.org/idp/shibboleth|ec5b1df9) leaking beyond the user lookup process. The shib provider has all the information it needs to automatically create an OK identifier. We could (should?) make a flow where the user can select/amend this username before it's created.

On Jan 29, 2015, at 21:28, Philip Durbin notifications@github.com wrote:

It should be enough to change the authenticatedUserLookup table.

I disagree. For example, as new roles are assigned to the converted user, the roleassignment table would have a weird mix of old pre-conversion identifiers (@finch) and growing number of post-conversion identifiers (@https://idp.testshib.org/idp/shibboleth|ec5b1df9) for the same (converted) user.

@scolapasta https://github.com/scolapasta opined that the datasetversion_dataverseuser could probably use the primary key of the authenticateduser rather than @finch. This is because only users (never groups) will be editing datasets. We should make a ticket for this refactoring, probably, at some point.

He also pointed out that comments in tickets may not be the best way of discussing all this. :)

I hope my concerns are clear, at least. @raprasad https://github.com/raprasad has written about similar concerns in #1387 #1387
—
Reply to this email directly or view it on GitHub #1151 (comment).

pdurbin · 2015-01-30T14:38:18Z

The shib provider has all the information it needs to automatically create an OK identifier.

@michbarsinai is sounds like you'd like to have the Shibboleth code refactored to store an "OK identifier". I believe your definition of OK is this:

the identifier should be a String
the identifier should be unique for the useridentifier column in the authenticateduser table
the identifier should be unique for the persistentuserid column in the authenticateduserlookup table
the identifier should never change

Does that sound right? Basically, you want me to create a new database table for Shibboleth users and when Shibboleth users log in there will be a double lookup. First you check the new Shibboleth user table (which doesn't yet exist). Then you check theauthenticateduserlookup table, based on the useridentifier in the Shibboleth user table. Finally, you have the primary key (and entire row) of the user from the authenticateduser table. There's already a ticket to refactor the Shibboleth code at #963

This will take time to implement, of course.

By the way, what is actually leaking beyond the lookup process right into the bowels of the system at the level of assigning permissions (roles) in the current design is the overloaded concept of a username/useridentifer.

Every other system has a clean separation between username and useridentifier. My Twitter username is "philipdurbin" but my Twitter useridentifer is 18326200. My GitHub username is "pdurbin" but my GitHub useridentifier is 21006. I'm sure in Twitter and GitHub when permissions are assigned they are not assigned to "philipdurbin" or "pdurbin" in their respective systems. Rather, permissions are most likely assigned to numbers. I continue to object to the current design, which is why I opened this ticket in the first place.

michbarsinai · 2015-01-30T22:07:08Z

Part of this was answered by email.

The rest - good enough means human readable, memorable, and based on the user data. Actual algorithm needs to be approved by Liz, who asked for this explicitly. It may require a some UI.

As for:

the identifier should be unique for the persistentuserid column in the authenticateduserlookup table

No. The pair (idprovider id, persistent user ID) has to be unique. authenticateduserlookup can have duplicate values.

Sent from my iPhone

On 30 בינו׳ 2015, at 16:38, Philip Durbin notifications@github.com wrote:

The shib provider has all the information it needs to automatically create an OK identifier.

@michbarsinai is sounds like you'd like to have the Shibboleth code refactored to store an "OK identifier". I believe your definition of OK is this:

the identifier should be a String
the identifier should be unique for the useridentifier column in the authenticateduser table
the identifier should be unique for the persistentuserid column in the authenticateduserlookup table
the identifier should never change
Does that sound right? There's already a ticket to refactor the Shibboleth code at #963

This will take time to implement, of course.

By the way, what is actually leaking beyond the lookup process right into the bowels of the system at the level of assigning permissions (roles) in the current design is the overloaded concept of a username/useridentifer.

Every other system has a clean separation between username and useridentifier. My Twitter username is "philipdurbin" but my Twitter useridentifer is 18326200. My GitHub username is "pdurbin" but my GitHub useridentifier is 21006. I'm sure in Twitter and GitHub when permissions are assigned they are not assigned to "philipdurbin" or "pdurbin" in their respective systems. Rather, permissions are most likely assigned to numbers. I continue to object to the current design, which is why I opened this ticket in the first place.

—
Reply to this email directly or view it on GitHub.

pdurbin · 2015-02-02T22:47:47Z

It should be enough to change the authenticatedUserLookup table

Now it should be, as of this commit I just made: 20c38b6 :)

As @michbarsinai and @scolapasta and I discovered, the createAuthenticatedUser persists the same authPrvUserPersistentId string to the following places:

the useridentifier column of the authenticateduser table
the persistentuserid column of the authenticateduserlookup table

@scolapasta didn't want to change the behavior of that method so now it calls a new createAuthenticatedUserWithDecoupledIdentifiers method but to preserve the same behavior it sets both identifiers to be the same string. Again this is in 20c38b6

From a Shib perspective, I'm now no longer blocked. I change the Shib code to use the new createAuthenticatedUserWithDecoupledIdentifiers method in 74795ba

Now, when I create Shib users, the Shib-specific persistentuserid does not leak from the authenticateduserlookup table:

+ psql -c 'select id,name,useridentifier from authenticateduser;' dataverse_db
 id |       name       |  useridentifier  
----+------------------+------------------
  3 | Sammy Sparrow    | sparrow
  4 | Wilbur Wren      | wren
  6 | Caleb Chestnut   | chestnut
  2 | Fiona Finch      | finch
  5 | Sabrina Spruce   | spruce
  7 | Cathy Medina     | tinywolf257
  8 | Isaac Hall       | orangekoala902
  9 | Rodney Alexander | beautifulbird545
  1 | Admin Dataverse  | admin
(9 rows)

+ psql -c 'select * from authenticateduserlookup;' dataverse_db
                 persistentuserid                 | authenticationproviderid | authenticateduser_id 
--------------------------------------------------+--------------------------+----------------------
 admin                                            | builtin                  |                    1
 finch                                            | builtin                  |                    2
 sparrow                                          | builtin                  |                    3
 wren                                             | builtin                  |                    4
 spruce                                           | builtin                  |                    5
 chestnut                                         | builtin                  |                    6
 https://idp.pluto.com/idp/shibboleth|6decc4dc    | shib                     |                    7
 https://idp.sabrina.com/idp/shibboleth|03646a11  | shib                     |                    8
 https://idp.sabrina1.com/idp/shibboleth|6715945b | shib                     |                    9
(9 rows)

+ psql -c 'select * from roleassignment;' dataverse_db
 id |  assigneeidentifier  | definitionpoint_id | role_id 
----+----------------------+--------------------+---------
  1 | @admin               |                  1 |       1
  2 | :authenticated-users |                  1 |       4
 14 | @beautifulbird545    |                 13 |       1
 15 | @beautifulbird545    |                 14 |       6
(4 rows)

+ psql -c 'select datasetversionid,useridentifier from datasetversion_dataverseuser;' dataverse_db
 datasetversionid |  useridentifier   
------------------+-------------------
                3 | @beautifulbird545
(1 row)

Now I should be able to start taking a closer look at converting local users to Shib users in #796

The persistentuserid field in authenticateduserlookup now contains an immutable value (i.e. "1938468") rather than a mutable one such as a username (i.e. "jane_doe"). This is much preferred because most systems allow you to change usernames (from "jane_doe" to "jane_smith", for example, sometimes with side effects) in the case of divorce, etc. Until IQSS#1445 is fixed, we'll persist the mutable value ("jane_doe") in the useridentifier field in the authenticateduser column because it's what is (unfortunately) displayed in the UI. This value is also used for the assigneeidentifier field in the roleassignment table (see IQSS#1151).

pdurbin · 2017-06-28T00:27:06Z

I desperately wish this had all been designed differently but I'm going just have to be at peace with it. Closing.

pdurbin added Type: Suggestion an idea Status: Design labels Nov 20, 2014

pdurbin added this to the In Review - Dataverse 4.0 milestone Nov 20, 2014

raprasad added the Component: High-level label Dec 14, 2014

scolapasta modified the milestones: Beta 11 - Dataverse 4.0, In Review - Dataverse 4.0 Jan 6, 2015

scolapasta assigned michbarsinai Jan 6, 2015

michbarsinai assigned scolapasta and unassigned michbarsinai Jan 7, 2015

scolapasta modified the milestones: Dataverse 4.0: Final, Beta 11 - Dataverse 4.0 Jan 8, 2015

scolapasta modified the milestones: In Review - Dataverse 4.0, Dataverse 4.0: Final Jan 23, 2015

pdurbin mentioned this issue Jan 29, 2015

The Ticket Formerly Known as Log In & Shibboleth: Onboarding Process #1387

Closed

eaquigley modified the milestones: Post 4.0, In Review - Dataverse 4.0 Feb 2, 2015

pdurbin added a commit that referenced this issue Feb 2, 2015

support different values for user identifier vs. lookup string #1151

20c38b6

pdurbin added a commit that referenced this issue Feb 2, 2015

use createAuthenticatedUserWithDecoupledIdentifiers from Shib #1151

74795ba

pdurbin mentioned this issue Feb 10, 2015

Users are unable to change their username #1445

Closed

This was referenced Mar 6, 2015

Manage Permissions: Add Identifier for Users/Groups #1584

Closed

Stop using @pete and @uma in the datasetversion_dataverseuser table #1611

Closed

pdurbin mentioned this issue Apr 9, 2015

API: Deleting a user does not clean up roles, can introduce undesirable behavior. #1929

Closed

scolapasta modified the milestones: In Review - Long Term, In Review - Short Term May 8, 2015

eaquigley removed the Status: Design label Jun 25, 2015

mheppler added Feature: Permissions Component: Code Infrastructure formerly "Feature: Code Infrastructure" and removed Component: High-level labels Jan 27, 2016

scolapasta removed their assignment Jan 27, 2016

scolapasta added Status: Triaged and removed Status: Dev labels Jan 28, 2016

scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016

pdurbin mentioned this issue Feb 5, 2016

Support login using email address (as well as username) for builtin/local users #2115

Closed

pdurbin mentioned this issue Mar 9, 2016

API: add endpoint for listing users #3009

Closed

pdurbin closed this as completed Jun 28, 2017

pdurbin mentioned this issue Aug 30, 2019

Shibboleth login fails (in specific case involving multiple accounts and multiple e-mails) #6014

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

pdurbin commented Nov 20, 2014

raprasad commented Dec 14, 2014

scolapasta commented Jan 6, 2015

michbarsinai commented Jan 7, 2015

pdurbin commented Jan 13, 2015

pdurbin commented Jan 29, 2015

michbarsinai commented Jan 29, 2015

pdurbin commented Jan 29, 2015

michbarsinai commented Jan 29, 2015

pdurbin commented Jan 30, 2015

michbarsinai commented Jan 30, 2015

pdurbin commented Feb 2, 2015

pdurbin commented Jun 28, 2017

Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

Stop using @pete and @uma (strings) in role assignments rather than @2 and @3 (database IDs) #1151

Comments

pdurbin commented Nov 20, 2014

raprasad commented Dec 14, 2014

scolapasta commented Jan 6, 2015

michbarsinai commented Jan 7, 2015

pdurbin commented Jan 13, 2015

pdurbin commented Jan 29, 2015

michbarsinai commented Jan 29, 2015

pdurbin commented Jan 29, 2015

michbarsinai commented Jan 29, 2015

pdurbin commented Jan 30, 2015

michbarsinai commented Jan 30, 2015

pdurbin commented Feb 2, 2015

pdurbin commented Jun 28, 2017