
clean_tables script to prepare a database for import can not handle too many rows in a particular table #2459

Closed
vickyszuchin opened this issue Jul 12, 2024 · 2 comments · Fixed by #2482
Labels
bug (Something that isn't working as intended) · carryover (Carryover from a previous sprint) · dev · pulled in (pulled into a sprint after sprint planning)

Comments

vickyszuchin commented Jul 12, 2024

Current Behavior

When running the clean_tables script to prepare a database for import, the process gets killed if there are too many rows in a particular table.

Expected Behavior

When running the clean_tables script to prepare a database for import, the process succeeds regardless of the number of rows in a table.

Steps to Reproduce

  1. Execute the clean_tables script against a database that contains a table with a large number of rows.

Environment

Staging

Additional Context

Per Dave Kennedy:

  • To fix this, row deletions should be batched in groups of 1,000.
  • Incorporate this code directly into the clean_tables script (a usage sketch follows the snippet):

import logging
from time import sleep

from django.db import transaction

logger = logging.getLogger(__name__)
BATCH_SIZE = 1000  # delete rows in batches of 1,000

def delete_objects_in_batches(model):
    total_deleted = 0
    while True:
        # Fetch up to BATCH_SIZE primary keys that still remain in the table.
        pks = list(model.objects.values_list('pk', flat=True)[:BATCH_SIZE])
        if not pks:
            break
        # Delete this batch in its own transaction so each batch commits independently.
        with transaction.atomic():
            deleted, _ = model.objects.filter(pk__in=pks).delete()
            total_deleted += deleted
            logger.info(f"Deleted {deleted} objects, total deleted: {total_deleted}")
        sleep(0.1)  # brief pause between batches to ease load on the database
    logger.info(f"Finished deleting. Total deleted: {total_deleted}")

Issue Links

No response

@vickyszuchin (Author) commented:

Per approval by Alysia, moving this up a sprint since it's already in "In review" status. Moved from sprint 50 to sprint 49.

vickyszuchin added the "pulled in" label Jul 22, 2024
@vickyszuchin (Author) commented:

Per guideline, all "In review" tickets from Sprint 49 will be moved forward to the current sprint (Sprint 50) at noon EST on Day 4.

vickyszuchin added the "carryover" label Jul 29, 2024
dave-kennedy-ecs added a commit that referenced this issue Jul 31, 2024
Issue #2459: updated clean_tables to run in batches of 1000 rows
Projects
Status: ✅ Done

2 participants