[8.0] fix: JobAgent fails when checking status of the jobs using JobMonitoringClient #7380

Merged: 1 commit into DIRACGrid:rel-v8r0 on Jan 8, 2024


aldbr
Contributor

@aldbr aldbr commented Dec 21, 2023

Pilots with failed jobs are failing with:

Traceback (most recent call last):
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.27-1702983535/Linux-x86_64/lib/python3.11/site-packages/DIRAC/Core/Base/AgentReactor.py", line 130, in __finalize
    self.__agentModules[agentName]["instanceObj"].finalize()
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.27-1702983535/Linux-x86_64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 881, in finalize
    result = self._checkSubmittedJobs()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cvmfs/lhcb.cern.ch/lhcbdirac/versions/v11.0.27-1702983535/Linux-x86_64/lib/python3.11/site-packages/DIRAC/WorkloadManagementSystem/Agent/JobAgent.py", line 725, in _checkSubmittedJobs
    if res["Value"][jobID]["Status"] == JobStatus.RUNNING:
       ~~~~~~~~~~~~^^^^^^^
KeyError: '827068098'

2023-12-21T07:59:07.609424Z INFO [LaunchAgent] DiskSpace (MB) = 6430006

The bug was introduced with dab5430.

Cause of the issue: JobAgent handles the jobID as a str, whereas JobMonitoringClient returns it as an int, if I understand correctly (type hints would have been more than welcome here). The lookup res["Value"][jobID] in _checkSubmittedJobs therefore raises a KeyError.
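
To illustrate the mismatch, here is a minimal standalone sketch. It does not reproduce the change made in this PR; the `lookup_status` helper and the shape of the `statuses` dict are hypothetical, assumed only from the traceback and the description above (int keys on the client side, a str jobID on the agent side).

```python
from __future__ import annotations


def lookup_status(statuses: dict[int, dict[str, str]], job_id: str | int) -> str | None:
    """Hypothetical helper: normalise job_id to int before the lookup.

    Assumes the status dict is keyed by int job IDs, as suggested in
    the PR description. Returns None when the job is not present.
    """
    entry = statuses.get(int(job_id))
    return entry["Status"] if entry is not None else None


# Assumed shape of the monitoring result: int keys, one dict per job.
statuses = {827068098: {"Status": "Failed"}}

# The agent holds the job ID as a string.
job_id = "827068098"

# A direct lookup with the str key fails, because "827068098" != 827068098.
try:
    statuses[job_id]
    raised = False
except KeyError:
    raised = True

# Converting the ID to the key type first makes the lookup succeed.
status = lookup_status(statuses, job_id)
```

Normalising the ID at the point of lookup is only one option; converting once where the jobID enters JobAgent, or adding type hints on both sides, would address the same mismatch.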

BEGINRELEASENOTES
*WorkloadManagement
FIX: JobAgent interaction with JobMonitoringClient
ENDRELEASENOTES

@DIRACGridBot DIRACGridBot added the alsoTargeting:integration Cherry pick this PR to integration after merge label Dec 21, 2023
@aldbr aldbr requested a review from fstagni December 21, 2023 09:07
@fstagni fstagni merged commit 6aae4e4 into DIRACGrid:rel-v8r0 Jan 8, 2024
25 checks passed
@DIRACGridBot DIRACGridBot added the sweep:done All sweeping actions have been done for this PR label Jan 8, 2024
DIRACGridBot pushed a commit to DIRACGridBot/DIRAC that referenced this pull request Jan 8, 2024
@DIRACGridBot

Sweep summary

Sweep ran in https://github.com/DIRACGrid/DIRAC/actions/runs/7446214355

Successful:

  • integration
