Can't deal with partial resources due to the lack of permissions for other resources #104

Closed
yu-iskw opened this issue Jan 20, 2022 · 4 comments · Fixed by #108
Labels
bug Something isn't working good_first_issue Good for newcomers

Comments

Contributor

yu-iskw commented Jan 20, 2022

Describe the bug

Our dbt project for BigQuery consists of mutually exclusive sub-DAGs, each identified by a dbt tag per service, so that we can manage all dbt resources in a single place. We would like to use a separate IAM service account for each sub-DAG selected by its dbt tag, because we don't want to create a single overly powerful service account that can access every BigQuery table.

As far as I can tell, the fix belongs in dbt-core rather than in this repository, dbt-bigquery. However, I am opening the issue in dbt-bigquery first because I am not sure whether it also affects other warehouses.

Steps To Reproduce

I created a GitHub repository that reproduces the issue:
https://github.com/yu-iskw/dbt-issue-with-multiple-service-accounts-on-bigquery

Expected behavior

The image illustrates a simplified version of our desired use case. Suppose the service account for GCP project A has permissions only on GCP project A and none on GCP project B. Currently, dbt fails because of the missing permissions on project B.

When we run only the resources in subgraph X (the blue area) using the service account for GCP project A, we want dbt to skip fetching and creating the schemas of the non-selected resources in subgraph Y (the green area).
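
To make the desired behavior concrete, here is a minimal Python sketch of restricting schema lookups to the selected sub-DAG. It is not dbt-core code: `Node`, `schemas_for_selection`, and the project names `project-a` and `project-b` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical, simplified stand-in for a dbt model node."""

    name: str
    project: str
    dataset: str
    tags: FrozenSet[str]


def schemas_for_selection(nodes: Iterable[Node], tag: str) -> Set[Tuple[str, str]]:
    """Collect the (project, dataset) pairs used by nodes carrying `tag`."""
    return {(n.project, n.dataset) for n in nodes if tag in n.tags}


# Two disjoint sub-DAGs, one per tag (names are illustrative only).
nodes = [
    Node("model_x", "project-a", "test_dataset1", frozenset({"test_dataset1"})),
    Node("model_y", "project-b", "test_dataset2", frozenset({"test_dataset2"})),
]

# Only the schemas returned here would need to be listed or created.
selected = schemas_for_selection(nodes, "test_dataset1")
```

With this kind of filtering, a run selected by tag:test_dataset1 would never need to touch test_dataset2, so the service account would not need permissions on it.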

image

Screenshots and log output

dbt failed because the service account for tag:test_dataset1 has no permissions on the dataset test_dataset2.
The complete log is at https://github.com/yu-iskw/dbt-issue-with-multiple-service-accounts-on-bigquery/blob/main/resources/dbt.log .

image

dbt compile

dbt compile --profiles-dir profiles --target dataset1 --select "tag:test_dataset1"
Running with dbt=1.0.0
Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
Partial parsing enabled, no changes found, skipping parsing
Found 2 models, 0 tests, 0 snapshots, 0 analyses, 185 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

Encountered an error:
403 GET https://bigquery.googleapis.com/bigquery/v2/projects/your-gcp-project/datasets/test_dataset2/tables?maxResults=100000&prettyPrint=false: Access Denied: Dataset your-gcp-project:test_dataset2: Permission bigquery.tables.list denied on dataset your-gcp-project:test_dataset2 (or it may not exist).

dbt run

dbt run --profiles-dir profiles --target dataset1 --select "tag:test_dataset1" 
Running with dbt=1.0.0
Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
Partial parsing enabled, no changes found, skipping parsing
Found 2 models, 0 tests, 0 snapshots, 0 analyses, 185 macros, 0 operations, 0 seed files, 0 sources, 0 exposures

Encountered an error:
403 GET https://bigquery.googleapis.com/bigquery/v2/projects/ubie-yu-sandbox/datasets/test_dataset2/tables?maxResults=100000&prettyPrint=false: Access Denied: Dataset ubie-yu-sandbox:test_dataset2: Permission bigquery.tables.list denied on dataset ubie-yu-sandbox:test_dataset2 (or it may not exist).

System information

The output of dbt --version:

installed version: 1.0.0
   latest version: 1.0.0

Up to date!

Plugins:
  - bigquery: 1.0.0

The operating system you're using:
macOS 12.1

The output of python --version:
Python 3.8.8


@yu-iskw yu-iskw added bug Something isn't working triage labels Jan 20, 2022
Contributor Author

yu-iskw commented Jan 20, 2022

@jtcohen6 Do you have any ideas on how to resolve this issue?

@McKnight-42
Contributor

@yu-iskw Thank you so much for such a detailed write-up. The reasoning is really clear.
After talking with @jtcohen6 and doing some research, there seem to be a couple of ways to potentially resolve this issue:

  • Cache only the selected resources. This would be a bigger lift in dbt-core and could have unintended side effects.
  • Catch exceptions (including those caused by missing permissions) and keep going.

The second seems to be the clearer path, as it is also what the Snowflake adapter does.
Updating list_relations_without_caching to ignore more permission-related errors should resolve this issue. It currently handles only NotFound:

try:
    return [self._bq_table_to_relation(table) for table in all_tables]
except google.api_core.exceptions.NotFound:
    return []

I believe this could be a one-line change that updates our exception handling during caching, so I'm going to mark this a good first issue. Would you be interested in contributing the fix?
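
For illustration, the exception-handling approach could look roughly like the sketch below. This is not the merged change from #108. To keep it runnable without the Google client library, `NotFound` and `Forbidden` are local stand-ins for `google.api_core.exceptions.NotFound` and `google.api_core.exceptions.Forbidden`, and `list_relations_ignoring_access_errors` and `fake_list_tables` are hypothetical helpers.

```python
from typing import Callable, Iterable, List


class NotFound(Exception):
    """Stand-in for google.api_core.exceptions.NotFound (HTTP 404)."""


class Forbidden(Exception):
    """Stand-in for google.api_core.exceptions.Forbidden (HTTP 403)."""


def list_relations_ignoring_access_errors(
    list_tables: Callable[[str], Iterable[str]], dataset: str
) -> List[str]:
    """Return the tables in `dataset`, or [] if it is missing or unreadable."""
    try:
        # Build the list inside the try block: a lazy iterator may only
        # raise once it is consumed.
        return [f"{dataset}.{table}" for table in list_tables(dataset)]
    except (NotFound, Forbidden):
        # Treat an inaccessible dataset as empty for caching purposes.
        return []


def fake_list_tables(dataset: str) -> Iterable[str]:
    """Hypothetical lister: the account can only read test_dataset1."""
    if dataset == "test_dataset1":
        return ["model_a", "model_b"]
    raise Forbidden(f"Permission bigquery.tables.list denied on dataset {dataset}")
```

In this sketch, listing test_dataset2 with an account that lacks access returns an empty list instead of aborting the run, while test_dataset1 is listed normally. One trade-off worth noting: silently ignoring 403 errors can hide a genuinely misconfigured account, so logging the ignored exception at debug level would probably be sensible.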

@McKnight-42 McKnight-42 added good_first_issue Good for newcomers and removed triage labels Jan 24, 2022
Contributor Author

yu-iskw commented Jan 25, 2022

@McKnight-42 Thank you for the comment. I completely agree that changing the behavior of dbt-core could be difficult because of unintended side effects. For now, we can go with catching exceptions in list_relations_without_caching. I will send a pull request with the update.

yu-iskw added a commit to yu-iskw/dbt-bigquery that referenced this issue Jan 25, 2022
yu-iskw added a commit to yu-iskw/dbt-bigquery that referenced this issue Jan 25, 2022
Contributor Author

yu-iskw commented Jan 25, 2022

@McKnight-42 I have sent the pull request. Can you take a look?
#108

@McKnight-42 McKnight-42 mentioned this issue Jan 31, 2022
4 tasks
McKnight-42 added a commit that referenced this issue Feb 7, 2022
…ching` (#108)

* [#104] Ignore the forbidden exception in `list_relations_without_caching`

* Update CHANGELOG.md

* Further update CHANGELOG.md

Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>
siephen pushed a commit to AgencyPMG/dbt-bigquery that referenced this issue May 16, 2022
…thout_caching` (dbt-labs#108)

* [dbt-labs#104] Ignore the forbidden exception in `list_relations_without_caching`

* Update CHANGELOG.md

* Further update CHANGELOG.md

Co-authored-by: Matthew McKnight <91097623+McKnight-42@users.noreply.github.com>