-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sidereal, containers): sample variance over sidereal days #104
Conversation
sidereal.SiderealStacker: - Add the capabililty to use uniform or inverse variance weighting. - Calculate the sample variance over sidereal days. - Switch to an algorithm that calculates the current mean at each iteration and updates the sum of square differences from the current mean, which is less prone to numerical issues. containers.SampleVarianceContainer - Acts as super class for both SiderealStream and TrackBeam. + Sets default component axis on instantiation. + Defines properties to access common datasets. + Defines properties for rotating sample_variance to common bases. + Defines sample_weight property that can be compared to the weight dataset. - Add sample_variance and nsample datasets to SiderealStream and TrackBeam. By default these are not initialized.
This PR replaces PR #68. See that PR for an example of the variance over holographic observations. example_baselines.pdf contains plots of the sample variance for a 16-day stack of data from Feb 2019 generated with this branch. Each page is a different unique baseline (see title for the baseline distance). The bottom two panels show the sample variance in the I-Q and amp-phase basis. The sample variance is compared to the expected noise, which is obtained by propagating the weight dataset. In general there is decent agreement between variance over days and the expectation based on weights. We definitely see an increase above expected in the regions around the brightest point sources -- likely due to variations in complex gain -- with the phase direction showing larger variations than the amplitude direction. As you move to shorter baseline distances, you do begin to see an excess noise. excess_noise_versus_baseline.pdf shows the median (over the sidereal day) ratio of measured to expected noise as a function of baseline distance. For the intercylinder baselines the day-to-day variations are only 10-50% larger than expected based on the weights, but for the intracylinder baselines the day-to-day variations increase from 10% to 1000% of the expected as you move to the shortest baseline distances. This is likely due to variations in additive cross-talk and complex gain variations modulating the diffuse sky. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry it took so long to get to this. It got lost in email while I was away in the summer.
(I made some comments but I'm not done yet)
draco/analysis/sidereal.py
Outdated
norm = self.stack.weight[:] | ||
|
||
# Perform Bessel's correction. In the case of | ||
# uniform weighting, norm will be equal to nsample - 1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this equal to nsample - 1
only if each of the weights is 1 ? Otherwise they make a distinction between "frequency weights" and "reliability weights" here https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_variance. This is probably not terribly important?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks for adding the sample variance capabilities to the holography container as well
Now SiderealStacker only calculates the sample variance if requested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's unclear what to do about making nested stacking work, but otherwise I think this is good.
self.stack.vis[:] += delta_before * tools.invert_no_zero(sum_coeff) | ||
|
||
# The calculations below are only required if the sample variance was requested | ||
if self.with_sample_variance: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Friday, I don't think this sample variance estimation works with nested stacking, i.e. if I run a set of individual days through this, get a couple of stacks like that, and stack those together with this, this will essentially take the variance of the means. I think what we really want it to do is to take the means of the variances (in some heuristic sense). Does that seem reasonable?
I think that means switching behaviour if you find that your input data already has a sample_variance
dataset, but that might be a bit too much work to squeeze into this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah sorry about that, I forgot what was done in this PR.
Are there scenarios in the current stacking pipeline where this task is used for nested stacking? Or is the SiderealStackMatch
task used exclusively for that purpose?
I can update this task (and also SiderealStackMatch
) so that it handles the sample variance calculation during nested stacking properly, but I think it will be low priority. If you think that it would be useful to have the sample variance of the quarter stacks on the next version of the pipeline processing, then I would merge this PR. Then the sample variance calculation can be turned off when generating a full stack until I can issue a separate PR for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just SiderealStackerMatch
. I agree that it's a low priority to get the variance carried through to the nested stacks, so maybe add it to the Pipeline issues list, and then we'll get this one merged.
chime-experiment/Pipeline/issues/103 describes remaining issues related to nested stacking that are left over from this PR. |
sidereal.SiderealStacker:
iteration and updates the sum of square differences from the
current mean, which is less prone to numerical issues.
containers.SampleVarianceContainer
dataset.
and TrackBeam. By default these are not initialized.