Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sidereal, containers): sample variance over sidereal days #104

Merged
merged 8 commits into from
May 28, 2021

Conversation

ssiegelx
Copy link
Contributor

@ssiegelx ssiegelx commented Sep 1, 2020

sidereal.SiderealStacker:

  • Add the capability to use uniform or inverse variance weighting.
  • Calculate the sample variance over sidereal days.
  • Switch to an algorithm that calculates the current mean at each
    iteration and updates the sum of square differences from the
    current mean, which is less prone to numerical issues.

containers.SampleVarianceContainer

  • Acts as super class for both SiderealStream and TrackBeam.
  • Sets default component axis on instantiation.
  • Defines properties to access common datasets.
  • Defines properties for rotating sample_variance to common bases.
  • Defines sample_weight property that can be compared to the weight
    dataset.
  • Add sample_variance and nsample datasets to SiderealStream
    and TrackBeam. By default these are not initialized.

Seth Siegel added 2 commits August 28, 2020 11:31
sidereal.SiderealStacker:
- Add the capabililty to use uniform or inverse variance weighting.
- Calculate the sample variance over sidereal days.
- Switch to an algorithm that calculates the current mean at each
iteration and updates the sum of square differences from the
current mean, which is less prone to numerical issues.

containers.SampleVarianceContainer
- Acts as super class for both SiderealStream and TrackBeam.
+ Sets default component axis on instantiation.
+ Defines properties to access common datasets.
+ Defines properties for rotating sample_variance to common bases.
+ Defines sample_weight property that can be compared to the weight
dataset.
- Add sample_variance and nsample datasets to SiderealStream
and TrackBeam.  By default these are not initialized.
@ssiegelx
Copy link
Contributor Author

ssiegelx commented Sep 1, 2020

This PR replaces PR #68. See that PR for an example of the variance over holographic observations.

example_baselines.pdf contains plots of the sample variance for a 16-day stack of data from Feb 2019 generated with this branch. Each page is a different unique baseline (see title for the baseline distance). The bottom two panels show the sample variance in the I-Q and amp-phase basis. The sample variance is compared to the expected noise, which is obtained by propagating the weight dataset.

In general there is decent agreement between variance over days and the expectation based on weights. We definitely see an increase above expected in the regions around the brightest point sources -- likely due to variations in complex gain -- with the phase direction showing larger variations than the amplitude direction.

As you move to shorter baseline distances, you do begin to see an excess noise. excess_noise_versus_baseline.pdf shows the median (over the sidereal day) ratio of measured to expected noise as a function of baseline distance. For the intercylinder baselines the day-to-day variations are only 10-50% larger than expected based on the weights, but for the intracylinder baselines the day-to-day variations increase from 10% to 1000% of the expected as you move to the shortest baseline distances. This is likely due to variations in additive cross-talk and complex gain variations modulating the diffuse sky.

Copy link
Contributor

@tristpinsm tristpinsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took so long to get to this. It got lost in email while I was away in the summer.

(I made some comments but I'm not done yet)

norm = self.stack.weight[:]

# Perform Bessel's correction. In the case of
# uniform weighting, norm will be equal to nsample - 1.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this equal to nsample - 1 only if each of the weights is 1 ? Otherwise they make a distinction between "frequency weights" and "reliability weights" here https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance. This is probably not terribly important?

draco/analysis/sidereal.py Show resolved Hide resolved
Copy link
Contributor

@tristpinsm tristpinsm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for adding the sample variance capabilities to the holography container as well

Seth Siegel added 2 commits May 14, 2021 18:23
Now SiderealStacker only calculates the sample variance if requested.
Copy link
Contributor

@jrs65 jrs65 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear what to do about making nested stacking work, but otherwise I think this is good.

draco/analysis/beamform.py Outdated Show resolved Hide resolved
self.stack.vis[:] += delta_before * tools.invert_no_zero(sum_coeff)

# The calculations below are only required if the sample variance was requested
if self.with_sample_variance:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed on Friday, I don't think this sample variance estimation works with nested stacking, i.e. if I run a set of individual days through this, get a couple of stacks like that, and stack those together with this, this will essentially take the variance of the means. I think what we really want it to do is to take the means of the variances (in some heuristic sense). Does that seem reasonable?

I think that means switching behaviour if you find that your input data already has a sample_variance dataset, but that might be a bit too much work to squeeze into this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah sorry about that, I forgot what was done in this PR.

Are there scenarios in the current stacking pipeline where this task is used for nested stacking? Or is the SiderealStackMatch task used exclusively for that purpose?

I can update this task (and also SiderealStackMatch) so that it handles the sample variance calculation during nested stacking properly, but I think it will be low priority. If you think that it would be useful to have the sample variance of the quarter stacks on the next version of the pipeline processing, then I would merge this PR. Then the sample variance calculation can be turned off when generating a full stack until I can issue a separate PR for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just SiderealStackerMatch. I agree that it's a low priority to get the variance carried through to the nested stacks, so maybe add it to the Pipeline issues list, and then we'll get this one merged.

@ssiegelx
Copy link
Contributor Author

chime-experiment/Pipeline/issues/103 describes remaining issues related to nested stacking that are left over from this PR.

@ssiegelx ssiegelx merged commit f139036 into master May 28, 2021
@ssiegelx ssiegelx deleted the ss/sample_variance branch May 28, 2021 23:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants