Daily Pipeline rev_08 Config #229

Merged
ljgray merged 9 commits into master from ljg/pipeline-run-order
Jun 24, 2024

Conversation

ljgray
Contributor

@ljgray ljgray commented Jan 16, 2024

Updated RFI masking

  • Uses the new RFIStokesIMask task, with significant improvements to RFI flagging
  • Adds RFI flagging based on a high-delay chi-squared metric
  • Adds RFI flagging based on a sensitivity metric

Sidereal rebinning

Uses the new SiderealRebinner task in place of regridding. Rebinning is significantly simpler than the previous regridding task and eliminates the artifacts introduced at this stage. Also removes the ThresholdVisWeightBaseline task, which was required to remove regridding artifacts.

The rebinning process results in a difference between the true and effective RA bin centres, so a correction is required. On the daily level, a local gradient is calculated based on a stack dataset with full sidereal and frequency coverage. This correction will be applied after stacking when producing a stack from this data.
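One way to picture this correction is as a first-order Taylor expansion around the bin centre. The sketch below is illustrative only and is not the pipeline's implementation: the function names `local_gradient` and `correct_rebinned`, the array layout, and the use of `np.gradient` on the reference stack are all assumptions.

```python
import numpy as np


def local_gradient(stack_vis, ra):
    """Estimate d(vis)/d(RA) along the last axis of a reference stack."""
    return np.gradient(stack_vis, ra, axis=-1)


def correct_rebinned(vis, ra_centre, ra_effective, gradient):
    """Shift rebinned data from the effective RA back to the bin centre.

    First-order expansion:
        V(ra_centre) ~ V(ra_effective) - dV/dRA * (ra_effective - ra_centre)
    """
    return vis - gradient * (ra_effective - ra_centre)


# Toy check with a signal that is exactly linear in RA.
ra = np.linspace(0.0, 10.0, 11)
ra_eff = ra + 0.1                  # samples landed slightly off-centre
stack = 2.0 * ra + 1.0             # reference stack on the true bin centres
observed = 2.0 * ra_eff + 1.0      # rebinned data at the effective RA
corrected = correct_rebinned(observed, ra, ra_eff, local_gradient(stack, ra))
```

For a signal that is linear in RA the correction is exact; for real data it is only as good as the local gradient estimated from the stack.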

Restrict run order for BeamFormCat tasks

Slightly modifies the run order around BeamFormCat tasks. This task can store a large copy of the sidereal stream data in its setup method, so these changes try to (a) block setup from happening until the task's next method can run immediately afterwards and (b) restrict the number of outputs based on the number of catalogs being loaded, so that the task moves to its finish state as soon as the last iteration is done.
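The memory argument can be pictured with a toy task that has setup, next and finish stages. This is a hypothetical stand-in, not BeamFormCat or the real pipeline scheduler: the class `ToyBeamFormTask`, its `n_catalogs` argument, and the placeholder arithmetic in `next` are assumptions made for illustration.

```python
import numpy as np


class ToyBeamFormTask:
    """Hypothetical task with setup / next / finish stages."""

    def __init__(self, n_catalogs):
        self.remaining = n_catalogs   # number of iterations expected
        self.data = None

    def setup(self, sstream):
        # The large copy lives from here until finish() runs.
        self.data = sstream.copy()

    def next(self, catalog):
        self.remaining -= 1
        result = float(self.data.sum()) * len(catalog)   # placeholder work
        if self.remaining == 0:
            # Last expected iteration: release the copy right away.
            self.finish()
        return result

    def finish(self):
        self.data = None


task = ToyBeamFormTask(n_catalogs=2)
task.setup(np.ones(4))
outputs = [task.next(cat) for cat in (["a"], ["b", "c"])]
```

Because the task knows how many catalogs to expect, the copy is dropped after the second call rather than when the whole pipeline winds down.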

Fixes and improvements to file online/offline management

Properly manage weather files, flag checks, and file pad lag.

Dependencies

@ljgray ljgray requested a review from jrs65 January 16, 2024 00:55
@ljgray ljgray changed the title from "feat(daily): reorder a section of the pipeline for better memory management" to "feat(daily): restrict run order for BeamFormCat tasks" Jan 17, 2024
@ljgray ljgray changed the title from "feat(daily): restrict run order for BeamFormCat tasks" to "Daily Pipeline rev_08 Config" May 3, 2024
@ljgray ljgray marked this pull request as draft May 3, 2024 20:18
@sjforeman
Contributor

Suggested changes for updated rainfall flagging:

  1. Remove rain1mm from flags in the type: ch_pipeline.analysis.flagging.DataFlagger block
  2. Add FlagRainfall (feat(flagging): add pipeline task for rainfall flagging #244) after DataFlagger:

    - type: ch_pipeline.analysis.flagging.FlagRainfall
      in: sstream_mask3
      out: sstream_mask4
      params:
        accumulation_time: 30.0
        threshold: 1.0

  3. Update later sstream_maskN references for consistency
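For readers unfamiliar with this kind of flag, the two parameters can be read as a trailing-window rule: flag a sample when the rainfall accumulated over the preceding `accumulation_time` exceeds `threshold`. The sketch below is an assumption about the general idea, not the FlagRainfall implementation from #244; the function `flag_rainfall` is hypothetical, and the units of the two parameters are not stated in this thread, so the toy data simply uses the same arbitrary time units throughout.

```python
import numpy as np


def flag_rainfall(time, rain, accumulation_time, threshold):
    """Flag samples whose trailing accumulated rainfall exceeds threshold.

    time and accumulation_time share the same units; rain holds the
    rainfall recorded in each sample.
    """
    time = np.asarray(time, dtype=float)
    rain = np.asarray(rain, dtype=float)
    csum = np.concatenate(([0.0], np.cumsum(rain)))
    # First sample inside the trailing window (t - accumulation_time, t].
    start = np.searchsorted(time, time - accumulation_time, side="right")
    accumulated = csum[1:] - csum[start]
    return accumulated > threshold


time = np.arange(0.0, 80.0, 10.0)
rain = np.array([0.0, 0.6, 0.6, 0.0, 0.0, 0.0, 0.0, 0.0])
flags = flag_rainfall(time, rain, accumulation_time=30.0, threshold=1.0)
```

In the toy data the flag stays raised for one sample after the rain stops, until the wet samples fall out of the trailing window.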

@ljgray
Contributor Author

ljgray commented May 13, 2024

Done

@ssiegelx
Contributor

The new data quality metric and RFI mask introduced in #246 and draco #268 can be incorporated into the configuration file as follows:

    # Smooth the noise estimates which suffer from sample variance
    - type: draco.analysis.flagging.SmoothVisWeight
      in: tstream_thermal_corrected
      out: tstream_day_smoothweight

    # Mask out the short baselines.  Make a copy since the next task
    # will apply a delay filter in place.
    - type: draco.analysis.flagging.MaskBaselines
      requires: manager
      in: tstream_day_smoothweight
      out: tstream_day_noshort
      params:
        zero_data: No
        mask_short: 20.0
        share: "none"

    # Apply an aggressive delay filter.
    - type: draco.analysis.dayenu.DayenuDelayFilterFixedCutoff
      in: tstream_day_noshort
      out: tstream_day_filtered
      params:
        tauw: 0.400
        single_mask: false
        atten_threshold: 0.0

    # Check consistency of data with noise at high delay.
    - type: draco.analysis.transform.ReduceChisq
      in: tstream_day_filtered
      out: chisq_day_filtered
      params:
        axes:
          - "stack"
        dataset: "vis"
        save: true
        output_name: "chisq_{tag}.h5"

    # Generate an RFI mask from the chi-squared test statistic.
    - type: draco.analysis.flagging.RFIMaskChisqHighDelay
      in: chisq_day_filtered
      out: rfimask2
      params:
        save: true
        output_name: "rfi_mask_chisq_{tag}.h5"

    # Apply the RFI mask. This will modify the data in place.
    - type: draco.analysis.flagging.ApplyTimeFreqMask
      in: [tstream_day_smoothweight, rfimask2]
      out: tstream_day_rfi

Note, however, that we have to make a copy of the timestream in order to filter it, which could cause memory issues.
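The statistic behind the "consistency of data with noise" step can be sketched as follows. This is a simplified illustration rather than the ReduceChisq code: the function `reduce_chisq`, the (freq, stack, time) layout, and the assumption that the filtered visibilities are pure noise with variance 1 / weight are all mine.

```python
import numpy as np


def reduce_chisq(vis, weight, baseline_length, mask_short):
    """Chi-squared per degree of freedom over the stack (baseline) axis.

    vis, weight: (freq, stack, time); baseline_length: (stack,).
    """
    keep = baseline_length > mask_short              # drop short baselines
    w = np.where(keep[np.newaxis, :, np.newaxis], weight, 0.0)
    chisq = np.sum(w * np.abs(vis) ** 2, axis=1)
    ndof = np.sum(w > 0.0, axis=1)
    return chisq / np.maximum(ndof, 1)


vis = np.ones((2, 3, 4), dtype=complex)
vis[0, 1, 2] = 3.0                                   # one contaminated sample
weight = np.ones((2, 3, 4))
lengths = np.array([10.0, 30.0, 50.0])
chisq = reduce_chisq(vis, weight, lengths, mask_short=20.0)
```

Samples consistent with the noise model sit near 1, while the contaminated (freq, time) sample stands out at 5.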

@ljgray
Contributor Author

ljgray commented May 14, 2024

Looks good, I'll test it out. I have a feeling that we will end up having memory issues. If we do, how much more computationally expensive would it be to do the delay filter for all baselines and then add a min/max baseline length parameter to ReduceChisq? We could potentially get around having to make a copy that way.

@ljgray
Contributor Author

ljgray commented May 14, 2024

Ah, never mind. I see now that the dayenu filter is applied in place.

@ssiegelx
Contributor

Yeah exactly. I think if it was an issue, then we could just create a single task that does everything (baseline masking, filtering, and sum over the stack axis). It would do this for each time sample and then output a (freq, time) container for masking.
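A rough sketch of that single-task idea: mask, filter and reduce one time sample at a time, so that no filtered copy of the full timestream is ever held in memory. The function `chisq_per_time` and the generic `filt` matrix acting along the frequency axis are placeholders, not the dayenu filter or the task later pushed to draco #268.

```python
import numpy as np


def chisq_per_time(vis, weight, keep, filt):
    """Filter and reduce one time sample at a time.

    vis, weight: (freq, stack, time); keep: (stack,) bool; filt: (freq, freq).
    Returns a (freq, time) array without storing a filtered copy of vis.
    """
    nfreq, _, ntime = vis.shape
    out = np.zeros((nfreq, ntime))
    for t in range(ntime):
        v = filt @ vis[:, :, t][:, keep]             # (freq, kept baselines)
        w = weight[:, :, t][:, keep]
        ndof = np.maximum(np.sum(w > 0.0, axis=1), 1)
        out[:, t] = np.sum(w * np.abs(v) ** 2, axis=1) / ndof
    return out


vis = np.ones((2, 3, 4), dtype=complex)
vis[0, 1, 2] = 3.0
weight = np.ones((2, 3, 4))
keep = np.array([False, True, True])                 # short baseline dropped
result = chisq_per_time(vis, weight, keep, filt=np.eye(2))
```

Peak extra memory here is a single (freq, kept baselines) slice plus the (freq, time) output, instead of a second full timestream.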

@ljgray
Contributor Author

ljgray commented May 16, 2024

@ssiegelx I tested with the standard pipeline config (12 nodes) and it unfortunately produced a memory error during analysis.dayenu.DayenuDelayFilterFixedCutoff.

If it's straightforward to combine the tasks, then that would probably be the best way to go. I can also try increasing the number of nodes per pipeline job to 16, but with our lower allocation this year that wouldn't be ideal.

@ssiegelx
Contributor

Ok thanks! I'll combine the tasks. I think I should have something by tomorrow.

@ssiegelx
Contributor

The updated task has been pushed to draco #268. I am still waiting to get a test job to run. The config should now look like:

    # Smooth the noise estimates which suffer from sample variance
    - type: draco.analysis.flagging.SmoothVisWeight
      in: tstream_thermal_corrected
      out: tstream_day_smoothweight

    # Apply an aggressive delay filter and
    # check consistency of data with noise at high delay.
    - type: draco.analysis.dayenu.DayenuDelayFilterFixedCutoff
      in: tstream_day_smoothweight
      out: chisq_day_filtered
      params:
        tauw: 0.400
        single_mask: false
        atten_threshold: 0.0
        reduce_baseline: true
        mask_short: 20.0
        save: true
        output_name: "chisq_{tag}.h5"

    # Generate an RFI mask from the chi-squared test statistic.
    - type: draco.analysis.flagging.RFIMaskChisqHighDelay
      in: chisq_day_filtered
      out: rfimask2
      params:
        save: true
        output_name: "rfi_mask_chisq_{tag}.h5"

    # Apply the RFI mask. This will modify the data in place.
    - type: draco.analysis.flagging.ApplyTimeFreqMask
      in: [tstream_day_smoothweight, rfimask2]
      out: tstream_day_rfi
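The last two steps of this config, building a (freq, time) mask from the chi-squared statistic and applying it to the timestream, might look roughly like the sketch below. The thresholding rule (median plus a multiple of a MAD-based scale), the `nsigma` parameter, and both function names are assumptions for illustration; the actual criteria used by RFIMaskChisqHighDelay and ApplyTimeFreqMask are not described in this thread.

```python
import numpy as np


def mask_from_chisq(chisq, nsigma=5.0):
    """Flag (freq, time) samples far above the median, using a MAD scale."""
    med = np.median(chisq)
    mad = np.median(np.abs(chisq - med))
    sigma = 1.4826 * mad                             # MAD to Gaussian sigma
    return chisq > med + nsigma * max(sigma, 1e-12)


def apply_time_freq_mask(weight, mask):
    """Zero the weights of every baseline at masked (freq, time) samples."""
    return np.where(mask[:, np.newaxis, :], 0.0, weight)


chisq = np.ones((4, 5))
chisq[1, 2] = 10.0                                   # one bad sample
mask = mask_from_chisq(chisq)
weight = np.ones((4, 3, 5))                          # (freq, stack, time)
masked_weight = apply_time_freq_mask(weight, mask)
```

This version returns a new weight array for clarity, whereas the config comment notes that the pipeline task modifies the data in place.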

@ljgray ljgray force-pushed the ljg/pipeline-run-order branch 3 times, most recently from a889ba6 to 09319ac Compare May 23, 2024 23:50
@ljgray ljgray force-pushed the ljg/pipeline-run-order branch 10 times, most recently from 8b172b8 to c5b9183 Compare May 31, 2024 00:41
@ljgray ljgray force-pushed the ljg/pipeline-run-order branch 6 times, most recently from f581f76 to 1bc36aa Compare June 13, 2024 21:36
@ljgray ljgray force-pushed the ljg/pipeline-run-order branch 3 times, most recently from c61be01 to fa4bdbb Compare June 20, 2024 19:27
@ljgray ljgray force-pushed the ljg/pipeline-run-order branch 2 times, most recently from 2c06f62 to 3fe6bc9 Compare June 20, 2024 23:03
@ljgray ljgray marked this pull request as ready for review June 21, 2024 21:33
@ljgray ljgray requested a review from jmaceachern June 22, 2024 00:02
@ljgray ljgray merged commit ffd893c into master Jun 24, 2024
2 checks passed
@ljgray ljgray deleted the ljg/pipeline-run-order branch June 24, 2024 18:12
3 participants