Environment details
OS type and version: Linux; also reproduces in the python:3.9-slim-bullseye Docker container on GKE.
Python version: 3.9.12
pip version: 22.0.4
google-cloud-bigquery version: 3.2.0
Steps to reproduce
Read from a large table with to_dataframe_iterable(bqstorage_client).
Memory usage keeps growing until the OOMKiller kicks in.
Disable bqstorage_client and the problem is gone. (EDIT: not entirely sure this is true; I think it still happens, just astronomically more slowly. Iterating by row is different, though.)
Code example
```python
# Runs out of memory:
bqstorage_client = bigquery_storage.BigQueryReadClient()
for df in bigquery_result.result().to_dataframe_iterable(
    bqstorage_client=bqstorage_client, max_queue_size=2
):
    pass

# Works fine:
for row in bigquery_result.result():
    pass
```
Is max_queue_size not propagated, or something like that? The table I'm reading from is 24 GB in size and not partitioned. I've been trying to use tracemalloc etc. to track down what's going on, but haven't been successful. Happy to help add debug information if anyone has ideas on how to resolve this one.
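For anyone who wants to reproduce the measurement, here is a minimal tracemalloc sketch along the lines of what I tried. The byte allocation is just a stand-in for the to_dataframe_iterable loop, which needs real BigQuery credentials:

```python
import tracemalloc

tracemalloc.start()

# Stand-in allocation; in the real repro this would be the
# to_dataframe_iterable(bqstorage_client=...) loop.
chunks = [bytes(1024) for _ in range(1000)]

# Current and peak traced allocations, in bytes.
current, peak = tracemalloc.get_traced_memory()
print(f"current={current} bytes, peak={peak} bytes")

# Top allocation sites; most useful for spotting where a leak originates.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:5]:
    print(stat)

tracemalloc.stop()
```

In my case the top statistics pointed into library internals without an obvious single culprit, which is why I'm asking here.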
We had this issue as well. We started to suspect that to_dataframe_iterable can kick off loads of threads, each of which uses a lot of memory, and we also suspected that max_queue_size didn't affect it. Not using the BQ Storage client also seemed to fix the memory issue, but was much slower.
To have the best of both worlds, we came up with a bit of a hack that wraps the BQ Storage client's create_read_session method, forcing its max_stream_count argument to always be 1.
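A sketch of that hack, assuming create_read_session receives max_stream_count as a keyword argument (which is how the BigQuery client calls it in the versions we looked at; verify against your installed google-cloud-bigquery-storage):

```python
import functools


def force_single_stream(create_read_session):
    """Wrap a create_read_session-style method so that every call
    requests at most one read stream, regardless of what the caller asked for."""
    @functools.wraps(create_read_session)
    def wrapper(*args, **kwargs):
        kwargs["max_stream_count"] = 1  # override whatever was passed
        return create_read_session(*args, **kwargs)
    return wrapper


# In real code (names assumed from google-cloud-bigquery-storage):
# bqstorage_client = bigquery_storage.BigQueryReadClient()
# bqstorage_client.create_read_session = force_single_stream(
#     bqstorage_client.create_read_session
# )
```

With a single stream the reader can't fan out across many threads, which kept memory bounded for us while still being much faster than dropping the BQ Storage client entirely.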