Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using allow_broker with device code flow causes token refreshing to fail: Account with id '(pii)' not found. #563

Closed
jiasli opened this issue May 18, 2023 · 7 comments · Fixed by #569

Comments

@jiasli
Copy link
Contributor

jiasli commented May 18, 2023

Describe the bug
Using allow_broker with device code flow causes token refreshing to fail: Account with id '(pii)' not found.

To Reproduce

# Opt-in WAM
az config set core.allow_broker=true

# Turn off token encryption so that we can edit it manually later
az config set core.encrypt_token_cache=false

# Log in with device code, but with an account not in WAM, such as domain account testuser@azuresdkteam.onmicrosoft.com
az login --tenant 54826b22-38d6-4fb2-bad9-b7b93a3e9c5a --use-device-code

# Edit ~/.azure/msal_token_cache.json. Change AccessToken.<key>.expires_on to 0 to make the access token expire
# {
#     "AccessToken": {        
#         "...": {
#             ...
#             "expires_on": "0",

# Trigger token refreshing
az group list

Expected behavior
Token refreshing should succeed.

What you see instead

Account with id '(pii)' not found. Status: Response_Status.Status_AccountNotFound, Error code: 0, Tag: 525678464

Apparently, this is because device code's refresh token is saved to MSAL token cache, not WAM, but during token refreshing, MSAL tries to get access token from WAM.

The MSAL Python version you are using
Latest dev branch: 56256e4

Additional context
I understand there is no point to use device code flow when broker is available, but it's better to make the whole workflow more user-friendly.

If MSAL exposes enable_broker rather than allow_broker (like what it is doing internally):

life will be much easier - simply raise an error if acquire_token_by_device_flow is called on a PublicClientApplication with enable_broker=True.

Such allow_... design makes the logic far more complex - both acquire_token_by_device_flow and acquire_token_silent_with_error need to implement such fallback logic and be consistent with each other. In this issue, acquire_token_by_device_flow falls back to MSAL token cache, but acquire_token_silent_with_error doesn't and still reads from WAM.

@rayluo
Copy link
Collaborator

rayluo commented May 27, 2023

Yes, I can reproduce this issue. And @jiasli guessed the reason right, "Apparently, this is because device code's refresh token is saved to MSAL token cache, not WAM, but during token refreshing, MSAL tries to get access token from WAM."

The reason for that is, back then when we implemented the Broker/MsalRuntime/WAM integration, the guidance was "surface all MsalRuntime errors, instead of fallback to non-MsalRuntime code path". Now that I think about it, an MsalRuntime error of "hey I cannot find this account inside my (MsalRuntime's) cache so it is not my business", is not really an error, but an invite to tell MSAL Python to fallback gracefully.

I incline to change MSAL Python to also fallback when the error happens in read_account_by_id(), unless @MSamWils has different opinion. Feel free to contact me if you need more info or discussion.

@MSamWils
Copy link

@rayluo , my understanding is that MSALRuntime do not support device code flow.
Should MSAL Python not even invoking MSALRuntime in this scenario?

@rayluo
Copy link
Collaborator

rayluo commented Jun 1, 2023

@rayluo , my understanding is that MSALRuntime do not support device code flow. Should MSAL Python not even invoking MSALRuntime in this scenario?

MSAL Python already does not invoke MsalRuntime in device code flow. The current issue is on the second leg i.e. the acquire_token_silent(). At that point, MSAL Python will attempt MsalRuntime based on allow_broker=True. My proposal above is to gracefully absorb the "account not found" error from MsalRuntime, and then fallback to the RT in native MSAL's token cache. I built a proof-of-concept and it worked well. Shall I proceed?

@jiasli
Copy link
Contributor Author

jiasli commented Jun 1, 2023

My proposal above is to gracefully absorb the "account not found" error from MsalRuntime, and then fallback to the RT in native MSAL's token cache.

I personally don't like "fallback" as "fallback" introduces unpredictability.

@rayluo
Copy link
Collaborator

rayluo commented Jun 1, 2023

My proposal above is to gracefully absorb the "account not found" error from MsalRuntime, and then fallback to the RT in native MSAL's token cache.

I personally don't like "fallback" as "fallback" introduces unpredictability.

Fair enough, and perhaps "fallback" is not the right word here. In my book (and probably in your "fallback introduces unpredictability" statement, too), fallback hints "some unknown error occurred and we do not know how to deal with it, so let's fallback to a default option". But in this case, "account not found" is actually expected, and going back to the non-broker code path is the right move.

If there are different proposal, please let us know.

@rayluo
Copy link
Collaborator

rayluo commented Oct 26, 2023

in this case, "account not found" is actually expected, and going back to the non-broker code path is the right move.

If there are different proposal, please let us know.

An update on this topic. We tried #569 but that approach was controversial. How about MSAL simply emits a warning during the initial Device Code Flow saying something like "Broker was enabled. Subsequent acquire_token_silent() will only work after a successful acquire_token_interactive()"? Azure CLI can then use with warnings.catch_warnings() to catch that warning, and then also emit a more user friendly warning from Azure CLI to end users.

What do you think, @jiasli ?

@jiasli
Copy link
Contributor Author

jiasli commented Dec 21, 2023

How about MSAL simply emits a warning during the initial Device Code Flow saying something like "Broker was enabled. Subsequent acquire_token_silent() will only work after a successful acquire_token_interactive()"?

This is a regression in my humble opinion.

As #569 as been merged, after bumping MSAL in Azure/azure-cli#27726, I can verify the device code flow now works as before even when broker is enabled (by running az config set core.enable_broker_on_windows=true).

I can see in ~/.azure/msal_token_cache.json, the account looks like

    "Account": {
        "...": {
            ...
            "username": "xxx@xxx.onmicrosoft.com",
            "authority_type": "MSSTS",
            "account_source": "urn:ietf:params:oauth:grant-type:device_code"
        }
    },

In addition, for auth code flow, account_source is authorization_code; for username password flow, account_source is password.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants