Make max_stream_count configurable when using BigQuery Storage API #2030
Labels: api: bigquery
Currently, for APIs that can use the BQ Storage client to fetch data, such as `to_dataframe_iterable` or `to_arrow_iterable`, the client library always uses the maximum number of read streams recommended by the BQ server (see `google/cloud/bigquery/_pandas_helpers.py`, line 840 and lines 854 to 858 at ef8e927).
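For context, here is a minimal standalone sketch of the underlying RPC call (the project, dataset, and table names are hypothetical). Passing `max_stream_count=0` asks the server to pick the stream count itself, which matches the behavior described above:

```python
from google.cloud import bigquery_storage_v1

# Hypothetical names, for illustration only.
project = "my-project"
table_path = f"projects/{project}/datasets/my_dataset/tables/my_table"

client = bigquery_storage_v1.BigQueryReadClient()
session = client.create_read_session(
    parent=f"projects/{project}",
    read_session=bigquery_storage_v1.types.ReadSession(
        table=table_path,
        data_format=bigquery_storage_v1.types.DataFormat.ARROW,
    ),
    # 0 means "let the server decide"; for a large table the session can
    # come back with hundreds of streams.
    max_stream_count=0,
)
print(f"server opened {len(session.streams)} streams")
```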
This behavior has the advantage of maximizing throughput, but it can lead to out-of-memory issues when too many streams are opened and results are not read fast enough: we've encountered queries that open hundreds of streams and consume GBs of memory.
The BQ Storage API documentation also suggests capping `max_stream_count` when resources are constrained: https://cloud.google.com/bigquery/docs/reference/storage/rpc/google.cloud.bigquery.storage.v1#createreadsessionrequest
This problem has been encountered by others before and can be worked around by monkey-patching `create_read_session` on the BQ Storage client object: #1292. A sketch of that workaround is shown below.
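A minimal sketch of that workaround, assuming a hypothetical `my_dataset.my_table` and an illustrative cap of 2 streams:

```python
from google.cloud import bigquery, bigquery_storage

bq_client = bigquery.Client()
bqstorage_client = bigquery_storage.BigQueryReadClient()

# Monkey-patch: wrap create_read_session so every read session is capped
# at a fixed number of streams, regardless of what the library requests.
original_create_read_session = bqstorage_client.create_read_session

def capped_create_read_session(*args, **kwargs):
    kwargs["max_stream_count"] = 2  # illustrative cap
    return original_create_read_session(*args, **kwargs)

bqstorage_client.create_read_session = capped_create_read_session

rows = bq_client.query("SELECT * FROM `my_dataset.my_table`").result()
for frame in rows.to_dataframe_iterable(bqstorage_client=bqstorage_client):
    ...  # at most 2 streams feed this iterator, bounding memory use
```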
However, it should really be fixed by allowing the `max_stream_count` parameter to be set through the public API.
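For illustration, one possible shape of such an API, reusing the clients from the sketch above (the `max_stream_count` parameter on `to_dataframe_iterable` is hypothetical here, not something the library currently exposes):

```python
rows = bq_client.query("SELECT * FROM `my_dataset.my_table`").result()

# Hypothetical parameter: an explicit cap would replace the
# monkey-patch shown above.
for frame in rows.to_dataframe_iterable(
    bqstorage_client=bqstorage_client,
    max_stream_count=2,
):
    ...
```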