Pipeline script linting #147

Merged
@nritsche merged 18 commits into master from rn/scriptlinter on Jan 5, 2021
Conversation

@nritsche (Contributor) commented Dec 9, 2020

Examples:

Value of requires has no corresponding out:

pipeline:
  tasks:
  - in: input_files
    out: siderealday
    type: caput.tests.test_lint.DoNothing
  - in: siderealday
    out: tstream
    params:
      a_list:
      - 1
    requires: input_files
    type: caput.tests.test_lint.DoNothing2

output:

caput.pipeline.PipelineConfigError: Setting up task 0 caused an error - Value 'in' for task <class 'caput.tests.test_lint.DoNothing'> has no corresponding 'out' from another task (Value input_files is not in []).
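
This first check amounts to tracking which 'out' names have already been produced by earlier tasks when each task's 'in' and 'requires' values are read. A minimal, hypothetical sketch of that idea (the function name and details here are illustrative only, not the actual caput.pipeline code):

import yaml


def check_task_connections(config_yaml):
    """Raise if any 'in'/'requires' value has no corresponding 'out' above it."""
    tasks = yaml.safe_load(config_yaml)["pipeline"]["tasks"]
    available = []  # 'out' names produced so far (tasks are assumed to be ordered)
    for i, task in enumerate(tasks):
        for key in ("in", "requires"):
            value = task.get(key)
            if value is None:
                continue
            # a value may be a single name or a list of names
            names = value if isinstance(value, list) else [value]
            for name in names:
                if name not in available:
                    raise ValueError(
                        f"Setting up task {i} caused an error - Value '{key}' "
                        f"for task {task['type']} has no corresponding 'out' "
                        f"from another task (Value {name} is not in {available})."
                    )
        out = task.get("out")
        if out is not None:
            available += out if isinstance(out, list) else [out]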

Input file doesn't exist:

pipeline:
  tasks:
  - out: input_files
    params:
      files: /this/does/not/exist.h5
    type: draco.core.io.LoadFilesFromParams
  - in: input_files
    out: siderealday
    type: ch_pipeline.analysis.sidereal.SiderealGrouper
  - in: siderealday
    out: maskedday
    params:
      output_name: rfi_filtered.h5
      save: true
    requires: input_files
    type: ch_pipeline.analysis.flagging.RFIFilter
  - in: maskedday
    out: tstream
    params:
      flag_type:
      - globalflag
      - bad_calibration_fpga_restart
      period: 2
      phase:
      - 1
    type: ch_pipeline.analysis.calibration.NoiseSourceFold

output:

caput.pipeline.PipelineConfigError: Setting up task 0 caused an error - Failed instanciating <class 'draco.core.io.LoadFilesFromParams'> from config: File not found: /this/does/not/exist.h5

Task doesn't exist:

pipeline:
  tasks:
  - out: input_files
    params:
      files: testdata/*.h5
    type: draco.core.io.OupsTypo
  - in: input_files
    out: siderealday
    type: ch_pipeline.analysis.sidereal.SiderealGrouper

output:

caput.pipeline.PipelineConfigError: Setting up task 0 caused an error - Loading task 'draco.core.io.OupsTypo' caused error - AttributeError: module 'draco.core.io' has no attribute 'OupsTypo'
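
The last example exercises the task-loading step: the dotted 'type' string has to be resolved to an importable class before anything else can be checked. A hypothetical sketch of that lookup (the real loading code lives in caput and may differ):

import importlib


def resolve_task(path):
    """Import 'package.module.ClassName' and return the class, or raise a config-style error."""
    # assumes a dotted path with at least one '.' separating module and class
    module_name, _, class_name = path.rpartition(".")
    try:
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError) as err:
        raise ValueError(f"Loading task '{path}' caused error - {err!r}") from err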

@nritsche requested a review from jrs65 on December 11, 2020 18:18
@nritsche force-pushed the rn/scriptlinter branch 5 times, most recently from ce42258 to f7c1a70 on December 11, 2020 19:22
@nritsche marked this pull request as ready for review on December 11, 2020 20:00
@nritsche (Contributor, Author) commented Dec 11, 2020

Tasks have to be ordered according to how their in/out/requires values connect. Will this be very annoying? I changed that.

@jrs65 (Contributor) left a comment
@nritsche this is looking great.

One issue I can think of is that, because of the virtualenv parameter in the cluster config, the pipeline might run within a different Python environment than the one used by caput-pipeline. So you could in theory have issues with tasks that are missing or changed between those versions. I don't know exactly how to work around that; potentially you could add the virtualenv's site packages to sys.path before doing any of the task imports for validation. Or potentially just warn if this is happening.

Although this might seem like a strange situation to be in, it's actually the way the automated pipeline runs as it tries to maintain strict isolation of the code that is used in the pipeline for consistency.
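
A minimal sketch of the sys.path workaround described above, assuming a standard POSIX virtualenv layout (lib/pythonX.Y/site-packages); the function name and path handling are illustrative, not what the PR ended up implementing:

import sys
from pathlib import Path


def add_venv_site_packages(venv_path):
    """Prepend the virtualenv's site-packages directory to sys.path so task imports resolve there first."""
    for site_packages in Path(venv_path).glob("lib/python*/site-packages"):
        if str(site_packages) not in sys.path:
            sys.path.insert(0, str(site_packages))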

caput/pipeline.py (review thread resolved, outdated)
caput/tests/test_lint.py (review thread resolved)
@nritsche (Contributor, Author) replied:

> @nritsche this is looking great.
>
> One issue I can think of is that, because of the virtualenv parameter in the cluster config, the pipeline might run within a different Python environment than the one used by caput-pipeline. So you could in theory have issues with tasks that are missing or changed between those versions. I don't know exactly how to work around that; potentially you could add the virtualenv's site packages to sys.path before doing any of the task imports for validation. Or potentially just warn if this is happening.
>
> Although this might seem like a strange situation to be in, it's actually the way the automated pipeline runs as it tries to maintain strict isolation of the code that is used in the pipeline for consistency.

It's a bit hacky but I found a solution that activates the venv for linting.

@nritsche (Contributor, Author) commented Dec 17, 2020

TODO

  • print line & column of error (one possible approach is sketched below)
  • warn about unused keys
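
For the first item, a hypothetical sketch of how line and column numbers can be recovered from a YAML config using PyYAML's node marks; the actual implementation in caput may look quite different:

import yaml


def locate_key(config_yaml, key):
    """Return the 1-based (line, column) of the first mapping key named `key`, or None."""

    def walk(node):
        if isinstance(node, yaml.MappingNode):
            for key_node, value_node in node.value:
                if key_node.value == key:
                    mark = key_node.start_mark  # marks are 0-based
                    return mark.line + 1, mark.column + 1
                found = walk(value_node)
                if found:
                    return found
        elif isinstance(node, yaml.SequenceNode):
            for item in node.value:
                found = walk(item)
                if found:
                    return found
        return None

    return walk(yaml.compose(config_yaml))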

@jrs65 (Contributor) commented Dec 18, 2020

Here's an issue I've just found: if the config changes the logging level, the linter spits out a bunch of messages:

$ caput-pipeline queue rev02_winter1_restack.yaml
     6.1s [MPI 0/1] - DEBUG    caput.pipeline: Added SetMPILogging to task list.
     6.3s [MPI 0/1] - DEBUG    caput.pipeline: Added LoadProductManager to task list.
     7.2s [MPI 0/1] - DEBUG    caput.pipeline: Added LoadFilesFromParams to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added MaskDay to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added DataFlagger to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added SiderealMean to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added ChangeSiderealMean to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added SiderealStacker to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added DelaySpectrumEstimator to task list.
     7.5s [MPI 0/1] - DEBUG    caput.pipeline: Added RingMapMaker to task list.
Submitted batch job 57624086

- merge pipeline.PipelineConfigError and config.CaputConfigError
- add line number and file name to exception
- check for unused keys (ignore the ones added by pipeline setup, not by user)
- add (very stupid) indentation to linter output
@nritsche requested a review from jrs65 on December 19, 2020 00:53
@nritsche (Contributor, Author) replied:

> Here's an issue I've just found: if the config changes the logging level, the linter spits out a bunch of messages (log output quoted above).


This is because the linter really just sets up the pipeline, so these logs are still correct. I don't want to shoehorn a flag into the already chaotic task setup just to toggle debug logs. Do you have a better idea?

@jrs65 (Contributor) commented Dec 19, 2020

Override the root handler that the manager sets up?
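
A minimal sketch of one way to silence these messages during linting, using logging.disable() rather than literally replacing the root handler; this is illustrative only, not what the PR implements:

import logging
from contextlib import contextmanager


@contextmanager
def suppress_pipeline_logs():
    """Silence all records up to CRITICAL while the linter sets up the pipeline,
    regardless of which handlers or levels the config installs, then restore."""
    logging.disable(logging.CRITICAL)
    try:
        yield
    finally:
        logging.disable(logging.NOTSET)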

@nritsche merged commit 8f22399 into master on Jan 5, 2021
@nritsche deleted the rn/scriptlinter branch on January 5, 2021 22:51