Create dispatch_computed_file_generation command #841

Open
avrohomgottlieb opened this issue Aug 9, 2024 · 1 comment

avrohomgottlieb commented Aug 9, 2024

Context

As mentioned in Issue #840, a dispatch_computed_file_generation command must be built, which will be responsible for dispatching file computation jobs to Batch.

Problem or idea

The dispatch_computed_file_generation command should query the Project model for all projects in the database that have no ComputedFiles associated with them, iterate over those projects, and submit them to the Batch queue one by one.

The query could look as follows:

Project.objects.filter(project_computed_files__isnull=True)

Solution or next step

import os

import boto3
from django.core.management.base import BaseCommand

from scpca_portal.config.logging import get_and_configure_logger
from scpca_portal.models import Project  # assumed import path for the Project model

batch = boto3.client('batch', region_name=os.environ["AWS_REGION"])
logger = get_and_configure_logger(__name__)


class Command(BaseCommand):
    def handle(self, *args, **kwargs) -> None:
        self.dispatch_computed_file_generation()

    def submit_batch_job(self, project_id: str) -> None:
        # jobName, jobQueue and jobDefinition are placeholders.
        response = batch.submit_job(
            jobName='job-name',
            jobQueue='job-queue',
            jobDefinition='job-definition',
            containerOverrides={
                # Pass the actual project id, not a string literal.
                'command': ['python', 'manage.py', 'generate_computed_files', project_id]
            },
        )
        logger.info(f'{project_id} submitted to Batch as job {response["jobId"]}')

    def dispatch_computed_file_generation(self) -> None:
        # Projects with no related computed files.
        for project in Project.objects.filter(project_computed_files__isnull=True):
            self.submit_batch_job(project.scpca_id)

For error handling, submit_batch_job could return a bool indicating whether the job was successfully added to the queue. If it was not, one tactic is to submit the job again. A possible way of doing that is as follows:

from queue import Queue

dispatch_queue = Queue()
for project in Project.objects.filter(project_computed_files__isnull=True):
    dispatch_queue.put(project)

# Re-queue any project whose job was not accepted (assumes submit_batch_job returns a bool).
while not dispatch_queue.empty():
    project = dispatch_queue.get()
    if not self.submit_batch_job(project.scpca_id):
        dispatch_queue.put(project)
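
One caveat with the retry loop above is that a job which can never be submitted would be re-queued forever. A minimal sketch of a bounded variant follows. It is not part of the proposed command: the function name dispatch_with_retries, the max_attempts parameter and the submit callable (standing in for a submit_batch_job that returns a bool) are all hypothetical, and it uses collections.deque instead of queue.Queue because no threads are involved.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple


def dispatch_with_retries(
    project_ids: Iterable[str],
    submit: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[List[str], List[str]]:
    """Submit each id, re-queueing failures until max_attempts is reached.

    `submit` is a hypothetical stand-in for submit_batch_job and is assumed
    to return True when the job was accepted by the queue.
    Returns a (submitted, failed) pair of id lists.
    """
    # Each entry carries the id and the number of the attempt about to be made.
    pending = deque((project_id, 1) for project_id in project_ids)
    submitted: List[str] = []
    failed: List[str] = []

    while pending:
        project_id, attempt = pending.popleft()
        if submit(project_id):
            submitted.append(project_id)
        elif attempt < max_attempts:
            # Put the project at the back of the queue for another try.
            pending.append((project_id, attempt + 1))
        else:
            # Give up on this project so the loop is guaranteed to terminate.
            failed.append(project_id)

    return submitted, failed
```

The failed list could then be logged so that projects whose jobs were never accepted can be dispatched again later.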

avrohomgottlieb self-assigned this Sep 24, 2024

avrohomgottlieb commented Oct 2, 2024

Updates for this Issue:

  • This command should take an optional project argument. If the argument is not passed, the command should submit computed file generation jobs for all projects that have no computed files associated with them.
  • submit_job should set command to python manage.py generate_computed_file --project/sample [scpca_id] --config_name [config name]
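
The two updates above could be sketched roughly as follows, using plain argparse (which Django management commands build on) so the snippet runs on its own. Everything here is an assumption made for illustration: the --project-id flag name, the helper names, and the reading of --project/sample as "either --project or --sample".

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="dispatch_computed_file_generation")
    # Hypothetical flag name. When omitted, all projects without computed files are dispatched.
    parser.add_argument("--project-id", type=str, default=None)
    return parser


def select_project_ids(requested: Optional[str], without_files: List[str]) -> List[str]:
    """Return only the requested project, or every project lacking computed files."""
    return [requested] if requested else list(without_files)


def build_job_command(scpca_id: str, config_name: str, is_sample: bool = False) -> List[str]:
    """Build the container command for one Batch job.

    Assumes "--project/sample" means one of two alternative flags.
    """
    id_flag = "--sample" if is_sample else "--project"
    return [
        "python",
        "manage.py",
        "generate_computed_file",
        id_flag,
        scpca_id,
        "--config_name",
        config_name,
    ]
```

In the real command, the parser setup would live in the command's add_arguments method, and the list returned by build_job_command would be passed as the command entry of containerOverrides.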
