
gaussNorm: I don't understand the resulting transformation #54

Open
SamGG opened this issue Aug 13, 2024 · 3 comments
SamGG commented Aug 13, 2024

I see problems with this code:

  • I don't understand why the intensity of the signal is changed. Using the Gaussian function to weight the shift makes sense (and would lead to slightly different code), but changing the intensity itself produces shrinking effects at the ends.
  • Setting a specific s (i.e. width) for each landmark leads to unwanted results, IMHO.

flowStats/R/gaussNorm.R

Lines 497 to 500 in fabb131

sh=(shift[1]-shift[2])
data=data+gau(data, s[1], m[1])*(sh/2)
data=data-gau(data, s[2], m[2])*(sh/2)
return(data+shift[1]-sh/2)
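For reference, here is a small standalone sketch of why the ends shrink. I'm assuming gau(x, s, m) is the unnormalized Gaussian exp(-(x - m)^2 / (2 s^2)) (my reading of the flowStats source; treat it as an assumption). Far from both landmarks, both weights vanish, so every extreme value receives the average shift (shift[1] + shift[2]) / 2 instead of the shift of its nearest landmark:

```r
# Assumed form of flowStats:::gau (unnormalized Gaussian) -- an assumption,
# not copied from the package
gau <- function(x, s, m) exp(-((x - m)^2) / (2 * s^2))

# Standalone re-implementation of the two-landmark branch quoted above
register2 <- function(data, s, m, shift) {
  sh <- shift[1] - shift[2]
  data <- data + gau(data, s[1], m[1]) * (sh / 2)
  data <- data - gau(data, s[2], m[2]) * (sh / 2)
  data + shift[1] - sh / 2
}

# Landmarks at 0 and 1.24; leave the first alone, shift the second by +0.5
m <- c(0, 1.24); shift <- c(0, 0.5); s <- c(0.5, 0.5)

# Both weights vanish at the extremes, so the net shift there is
# shift[1] - sh/2 = (shift[1] + shift[2]) / 2 = 0.25 on both sides
register2(-10, s, m, shift) - (-10)  # ~0.25, although the nearest landmark is unshifted
register2( 10, s, m, shift) -   10   # ~0.25, although the nearest landmark moves by +0.5
```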

Code to reproduce my trials

# aim: leave the peak at 0 untouched and shift the peak at 1.24 to 1.74, i.e. add 0.5
landm = c(0, 0, 1.24, 1.74)
# white noise to visualize the transform
N = 1e4  # number of points (any reasonably large value)
xx = runif(N, -1, 3)
# s estimated from the data: sd(data) ~ 0.6

# problem: ends are not equal to the shift at the nearest landmark
# the negative peak ends at -0.5 in real data
# the positive peak ends at +2.5 in real data
yy = flowStats:::register.function(
    xx, c(.5, .5), landm[c(1,3)], landm[c(1,3)+1]-landm[c(1,3)])
plot(xx, yy); abline(c(0,1)); abline(c(0.5,1))
plot(xx, yy-xx, asp = 1); abline(h=c(0,0.5))

# users should be cautious when setting too small a value for s
yy = flowStats:::register.function(
    xx, c(.25, .5), landm[c(1,3)], landm[c(1,3)+1]-landm[c(1,3)])
plot(xx, yy); abline(c(0,1)); abline(c(0.5,1))
plot(xx, yy-xx, asp = 1); abline(h=c(0,0.5))

Here are the 2nd and 4th graphs, showing the shift applied to the data as a function of intensity (from asinh(FluoIntensity/cofactor)).

image

image

@mikejiang @SofieVG @cciccole

gfinak (Member) commented Aug 13, 2024

@SamGG, just a heads-up that the gaussNorm approach comes from the authors of this paper:
https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.20823 (14 years ago).
I'm not sure any of them are still involved in computational flow. gaussNorm, as far as I know, has long been orphaned and unsupported. I'm not sure who can provide insight into what the code is doing or why, or into any possible bugs, known or unknown.

SamGG (Author) commented Aug 13, 2024

Hi Greg. Thanks for your quick feedback.

Except for Ryan, I don't see these names around nowadays, and he doesn't seem to be on GitHub. Could @tslumley or @fhahne answer?

My main point was to warn potential users and to keep a record in case someone wants to investigate.

What I coded for 2 peaks is:

  if(length(m)==2){
    sh=(shift[1]-shift[2])
    w1 = flowStats:::gau(data, s[1], m[1])
    w2 = flowStats:::gau(data, s[2], m[2])
    ws = w1 + w2
    data=data+w1/ws*(sh/2)
    data=data-w2/ws*(sh/2)
    return(data+shift[1]-(sh/2))
  }

This leads to correct results IMHO when s is the same for the 2 peaks.
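A quick numeric check of the limit behaviour of this variant (again assuming gau is the unnormalized Gaussian exp(-(x - m)^2 / (2 s^2)), which is my reading of the source): with normalized weights, each extreme now inherits the full shift of its nearest landmark instead of the average:

```r
gau <- function(x, s, m) exp(-((x - m)^2) / (2 * s^2))  # assumed form of flowStats:::gau

# Two-landmark transform with normalized Gaussian weights, as proposed above
register2.norm <- function(data, s, m, shift) {
  sh <- shift[1] - shift[2]
  w1 <- gau(data, s[1], m[1])
  w2 <- gau(data, s[2], m[2])
  ws <- w1 + w2
  data <- data + w1 / ws * (sh / 2)
  data <- data - w2 / ws * (sh / 2)
  data + shift[1] - sh / 2
}

m <- c(0, 1.24); shift <- c(0, 0.5); s <- c(0.5, 0.5)

# At the extremes one weight dominates, so the nearest landmark's shift applies
register2.norm(-10, s, m, shift) - (-10)  # ~0   (left landmark is unshifted)
register2.norm( 10, s, m, shift) -   10   # ~0.5 (right landmark moves by +0.5)
```

One caveat with this sketch: for values extreme enough that both weights underflow to zero, w1/ws becomes NaN, so a hardened version would need to guard against ws == 0.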

s = c(0.5, 0.5)

image

s = c(0.25, 0.25)

image

s = c(0.25, 0.5)

image

The latter is still slightly weird (although understandable), as s is not the same for the two peaks. This should not arise in practice, as the API does not allow adjusting it. BTW, s is estimated as the sd of all the intensities of the channel (code below). This relies on all the data rather than on each peak, which is a coarse estimation in my view.

flowStats/R/gaussNorm.R

Lines 472 to 475 in fabb131

for(i in seq(1,length(matched.lms), by=2)){
shift=append(shift, matched.lms[i+1]-matched.lms[i])
lms=append(lms, matched.lms[i])
s=append(s, sd(na.omit(data)))

cciccole commented

@SamGG I would need to see more practical examples that show the data -- not just the diff -- in order to understand more clearly, though I don't doubt you've found some sensible improvement.

However, separately, I also think this algorithm has other noteworthy problems, so I'm not sure how much thought I can give this particular one 😁 If I recall correctly:

  • Cannot normalize matrices with only one column (the classic R pitfall where a 2D object is automatically dropped to 1D, causing an error).
  • Extremely inefficient (cannot scale to larger datasets).
  • Can't deal with individual files that are missing a peak compared to their counterparts, e.g. files that only have negatives in a particular channel.
  • Related to the above, it misidentifies landmarks and can apply crazy shifts as a result, e.g. negatives get shifted to positives or vice versa.
  • Maybe some more stuff.

The core of the warping, when it gets it right, seems pretty good. Unfortunately there are many situations where it fails. If the core warping strategy were wrapped with a more robust surrounding layer, that could be valuable.
