[GCP] [BigQuery] Handle totalBytesProcessed NoneType (apache#27474)
* [GCP] [BigQuery] Handle totalBytesProcessed NoneType

* Update CHANGES.md

* lint / whitespace

---------

Co-authored-by: Yi Hu <yathu@google.com>
2 people authored and bullet03 committed Aug 11, 2023
1 parent 1aadfee commit 0633841
Showing 2 changed files with 24 additions and 4 deletions.
2 changes: 2 additions & 0 deletions CHANGES.md
@@ -79,6 +79,8 @@

## Bugfixes

* Fixed DirectRunner bug in Python SDK where GroupByKey gets empty PCollection and fails when pipeline option `direct_num_workers!=1`. ([#27373](https://github.com/apache/beam/pull/27373))
* Fixed BigQuery I/O bug when estimating size on queries that utilize row-level security ([#27474](https://github.com/apache/beam/pull/27474))
* Fixed X (Java/Python) ([#X](https://github.com/apache/beam/issues/X)).

## Known Issues
26 changes: 22 additions & 4 deletions sdks/python/apache_beam/io/gcp/bigquery.py
@@ -751,8 +751,17 @@ def estimate_size(self):
           kms_key=self.kms_key,
           job_labels=self._get_bq_metadata().add_additional_bq_job_labels(
               self.bigquery_job_labels))
-      size = int(job.statistics.totalBytesProcessed)
-      return size
+
+      if job.statistics.totalBytesProcessed is None:
+        # Some queries may not have access to `totalBytesProcessed` as a
+        # result of row-level security.
+        # > BigQuery hides sensitive statistics on all queries against
+        # > tables with row-level security.
+        # See cloud.google.com/bigquery/docs/managing-row-level-security
+        # and cloud.google.com/bigquery/docs/best-practices-row-level-security
+        return None
+
+      return int(job.statistics.totalBytesProcessed)
     else:
       # Size estimation is best effort. We return None as we have
       # no access to the query that we're running.
@@ -1104,8 +1113,17 @@ def estimate_size(self):
           kms_key=self.kms_key,
           job_labels=self._get_bq_metadata().add_additional_bq_job_labels(
               self.bigquery_job_labels))
-      size = int(job.statistics.totalBytesProcessed)
-      return size
+
+      if job.statistics.totalBytesProcessed is None:
+        # Some queries may not have access to `totalBytesProcessed` as a
+        # result of row-level security
+        # > BigQuery hides sensitive statistics on all queries against
+        # > tables with row-level security.
+        # See cloud.google.com/bigquery/docs/managing-row-level-security
+        # and cloud.google.com/bigquery/docs/best-practices-row-level-security
+        return None
+
+      return int(job.statistics.totalBytesProcessed)
     else:
       # Size estimation is best effort. We return None as we have
       # no access to the query that we're running.

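The guard introduced in both `estimate_size` hunks can be exercised in isolation. The following is a minimal sketch, not the Beam implementation: `estimate_size_from_job` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for the dry-run job object (only the `statistics.totalBytesProcessed` attribute path is taken from the diff). It illustrates why the check is needed: calling `int(None)` raises a `TypeError`, so the old code failed whenever BigQuery withheld the statistic.

```python
from types import SimpleNamespace
from typing import Optional


def estimate_size_from_job(job) -> Optional[int]:
    """Return the dry-run byte estimate, or None when it is unavailable.

    Hypothetical helper mirroring the guard added in this commit.
    """
    total = job.statistics.totalBytesProcessed
    if total is None:
        # The statistic may be hidden for queries against tables with
        # row-level security; int(None) would raise TypeError here.
        return None
    return int(total)


# Stand-ins for a dry-run job object (assumed shape, for illustration only).
visible_job = SimpleNamespace(
    statistics=SimpleNamespace(totalBytesProcessed="1048576"))
hidden_job = SimpleNamespace(
    statistics=SimpleNamespace(totalBytesProcessed=None))

visible_estimate = estimate_size_from_job(visible_job)
hidden_estimate = estimate_size_from_job(hidden_job)
```

Returning `None` matches the existing `else` branch in the diff, where size estimation is described as best effort, so callers should already be prepared for a missing estimate.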