Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

streamline authentication #74

Open
dataders opened this issue Oct 11, 2023 · 7 comments
Open

streamline authentication #74

dataders opened this issue Oct 11, 2023 · 7 comments

Comments

@dataders
Copy link
Collaborator

dataders commented Oct 11, 2023

after spending all day trying to debug a connection with ActiveDirectoryServicePrincipal authentication, I feel strongly that this spaghetti code that I helped to create must be refactored, for maintainer sanity :)

For example I spent the better part of a day helping a developer debug some credentials, and I got lost in introspecting the authentication control flow. At the end of the day, the issue was that the developer was passing 1443 as the port instead of 1433.💀 💀 .

Things I noticed

holdovers from on-prem SQL Server

  1. server:port syntax
  2. authentication: sql is default and indicates SqlPassword
  3. windows_login
  4. perhaps some non-Azure, on-prem ActiveDirectory stuff?

authentication simplification

Right now there's a lot of get_*_access_token() methods -- I'm not sure they're all needed.

Right now there's three authentication scenarios for Fabric:

  1. User provides credentials within profiles.yml and decalres the correct ActiveDirectory* type.
  2. User authenticates with az login prior to invoking dbt-fabric . This path happens when user passes one of the following to authentication: "serviceprincipal", "cli", "msi", "auto", "environment"

We should predominately rely upon the first path, and extend only when needed. What doesn't fit into user-provided credentials are:

  • ActiveDirectoryInteractive on non-Windows machines, we should support an authentication=interactive path
  • use a token I already have: perhaps this is called something like cached_az_token or something?
  • not sure if auto is useful. like when would a user ever want to do what's specified in azure.identity.DefaultAzureCredential(). Hey try a lot of things in a specific order? Seems like a recipe for trouble. but happy to be wrong

sane defaults

alignment with underlying driver's defaults

I'm rather torn as to the value of having defaults that are the same as the underlying driver. This is especially complicated given that dbt Cloud also has a distinct set of defaults.

For example, if for msodbc, the default value of Encrypt is Yes, must we also specify it as a default? Perhaps so in this instance because with YAML it makes more sense to provide this as a Boolean T/F? Actually I think yaml handles Yes as True anyway so 🤷🏻 ?

Another question is if a user does not provide a value for Encrypt, should we still pass Encrypt=Yes into the connection string? or leave it out because it is equivalent?

authentication

There should be no default authentication method (right now it is sql, but we should validate that user input is supported).

driver: str
host: str
database: str
schema: str
port: Optional[int] = 1433
UID: Optional[str] = None
PWD: Optional[str] = None
windows_login: Optional[bool] = False
tenant_id: Optional[str] = None
client_id: Optional[str] = None
client_secret: Optional[str] = None
authentication: Optional[str] = "sql"
encrypt: Optional[bool] = True # default value in MS ODBC Driver 18 as well
trust_cert: Optional[bool] = False # default value in MS ODBC Driver 18 as well

general cleanup

there's some bare asserts, which is a python anti-pattern. e.g. - asserting that authentication is None when the Credentials dataclass defines it as a default.

Also the conditionals within the .open() method are hard to grok. Perhaps a standalone Class/function for connection string generation would be more self-evident as to what's going on.

For example ActiveDirectoryServicePrincipal is not supported right now, but easily could be. #75 is one way to do this, but it smells wonky to have to add yet another conditional statement. either we exhaustively model all possible configurations, or make the logic simpler.

if "\\" in credentials.host:
# If there is a backslash \ in the host name, the host is a
# SQL Server named instance. In this case then port number has to be omitted.
con_str.append(f"SERVER={credentials.host}")
else:
con_str.append(f"SERVER={credentials.host},{credentials.port}")
con_str.append(f"Database={credentials.database}")
assert credentials.authentication is not None
if "ActiveDirectory" in credentials.authentication:
con_str.append(f"Authentication={credentials.authentication}")
if credentials.authentication == "ActiveDirectoryPassword":
con_str.append(f"UID={{{credentials.UID}}}")
con_str.append(f"PWD={{{credentials.PWD}}}")
elif credentials.authentication == "ActiveDirectoryInteractive":
con_str.append(f"UID={{{credentials.UID}}}")
elif credentials.windows_login:
con_str.append("trusted_connection=Yes")
elif credentials.authentication == "sql":
con_str.append(f"UID={{{credentials.UID}}}")
con_str.append(f"PWD={{{credentials.PWD}}}")

@prdpsvs
Copy link
Collaborator

prdpsvs commented Nov 19, 2023

@dataders

Great feedback

  • Updated explicit ActiveDirectoryServicePrincipal authentication
  • Removed port from the connection string. It is no longer required in Fabric.
  • Encrypt and trust_cert combination should be provided by user. I am evaluating how SaaS is different from PaaS version.
  • We should support SPN, AAD Password (service accounts) or CLI authentication for now. We should remove auto/environment-based profiles for testing. I will remove the unnecessary get_*_access_token() methods.
  • Raising an error if sql authentication is used. I will default to ActiveDirectoryServicePrincipal authentication and user can change it to CLI.

@prdpsvs
Copy link
Collaborator

prdpsvs commented Nov 22, 2023

Windows authentication depends on Kerberos (or NTLM), which needs an Active Directory domain to authenticate the user in. Azure Active Directory does not handle Kerberos tokens. So users from on-premises can use windows authentication instead of AAD.

@prdpsvs
Copy link
Collaborator

prdpsvs commented Nov 22, 2023

az cli should be the way forward for interactive authentication as dbt- core based project development. I would suggest using az cli to fetch user token for interactive dbt-cloud integration as well.

This will address non windows machines.

@dataders , let me know what you think

@prdpsvs
Copy link
Collaborator

prdpsvs commented Nov 22, 2023

MSI profile does not make sense for dbt projects.
I believe, cli and auto are the same but not sure why we use auto and CLI interchangeably?
What is the purpose of "environment" in profiles?

@dataders
Copy link
Collaborator Author

dataders commented Nov 22, 2023

I would suggest using az cli to fetch user token for interactive dbt-cloud integration as well.

Yeah we should start this conversation separately.

So users from on-premises can use windows authentication instead of AAD.

this should only be a consideration for SQL Server, right?

What is the purpose of "environment" in profiles?

in this scenario, users could have their token/secret in an environment variable? perhaps this would be helpful in Codespace-based envs where they're auto-provisioned.

@prdpsvs
Copy link
Collaborator

prdpsvs commented Nov 22, 2023

I would suggest using az cli to fetch user token for interactive dbt-cloud integration as well.

Yeah we should start this conversation separately.
yes, we should.

So users from on-premises can use windows authentication instead of AAD.

this should only be a consideration for SQL Server, right?
This applies to Fabric DW connecting from non-windows machines or onprem databases

What is the purpose of "environment" in profiles?

in this scenario, users could have their token/secret in an environment variable? perhaps this would be helpful in Codespace-based envs where they're auto-provisioned.

@sdebruyn
Copy link
Collaborator

I don't really agree with this proposal as I use auto and the other env related things all the time. In most places on Azure, auto will get you the right credential and most apps also do this with the DefaultAzureCredential. This allows for easy mixing between places where you have MSI (e.g. in a k8s pod or most Azure services), local (e.g. when you're working on a machine with CLI installed or when you're using the VS Code credential), and so on. The interactive auth fails in a lot of places (e.g. in DevContainers, a popular place to work on dbt projects).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants