Validate ABC-SMC estimation for compartmental (SIR) models #42

ArtPoon · 2017-02-16T16:49:55Z

Just like kernel-ABC validation


gtng92 · 2017-06-08T20:42:29Z

This is more of a minor annoyance, but I'm unsure about the format of the variable sampleStates.
I created an example using an SIR non-dynamic compartmental model in pkg/examples/example-compartmental.R, and this warning appears on the console:

Warning message:  
In if (!is.na(sampleStates)) 
    if (!is.matrix(sampleStates)) stop("sampleStates must be a matrix (not a data.frame)") :  
      the condition has length > 1 and only the first element will be used

Looking into compartmental-models.R, I've set sampleStates as:

# sample states
  sampleStates <- matrix(1, nrow=tips, ncol=length(demes))
  colnames(sampleStates) <- demes
  rownames(sampleStates) <- 1:tips

This code was derived from kamphir/simulate.SI.R. With a single deme, it creates a matrix with one column and one row per tip, with every entry equal to 1.

I traced this warning back to line 119 of rcolgem.R (Erik Volz's package). It appears even though sampleStates is a matrix.

When we run the compartmental.model function, we don't want this warning to appear every time it simulates a tree.

--Tammy Ng

ArtPoon · 2017-06-12T14:18:00Z

This is a bug in rcolgem that I fixed a while back, but for some reason the fix didn't make it into this release. We need to replace:

if (!is.na(sampleStates))

with

if (all(!is.na(sampleStates)))

This should result in a condition of length 1. We could make this change to our fork of rcolgem and direct people to that version.
The alternative is to use the R function suppressWarnings so that this warning message doesn't appear in our wrapper function.
I think the best approach is to make our fork of rcolgem a submodule of Kaphi and edit this line, since there will be other issues that arise.
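A minimal R sketch of why the original check misfires and why wrapping it in all() fixes it. `check.sampleStates` is a hypothetical stand-alone function that mimics the patched line; it is not rcolgem's own code. Note that since R 4.2.0 an if() condition of length greater than one is an error rather than a warning, so on current R the unpatched line would stop the simulation outright.

```r
# Build sampleStates as in compartmental-models.R (single deme "I").
tips <- 5
demes <- c("I")
sampleStates <- matrix(1, nrow = tips, ncol = length(demes))
colnames(sampleStates) <- demes
rownames(sampleStates) <- 1:tips

# is.na() on a matrix returns one logical per cell, so if() would
# receive a condition of length 5 instead of length 1.
n.cond.original <- length(!is.na(sampleStates))

# all() collapses the cell-wise test into a single TRUE/FALSE.
n.cond.patched <- length(all(!is.na(sampleStates)))

# Hypothetical stand-alone version of the patched check.
check.sampleStates <- function(sampleStates) {
  if (all(!is.na(sampleStates))) {
    if (!is.matrix(sampleStates)) {
      stop("sampleStates must be a matrix (not a data.frame)")
    }
  }
  invisible(TRUE)
}

check.sampleStates(sampleStates)  # passes silently for a matrix
```

Unlike suppressWarnings, this keeps the data.frame check meaningful: passing `as.data.frame(sampleStates)` still triggers the stop().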


gtng92 · 2017-06-16T16:22:01Z

I've simulated a target tree and was able to calculate kernel distances for a varying parameter.
I can initialize the workspace, but when I call run.smc I keep getting the following error:

Error in sample.int(m^2, size = 1, prob = as.vector(.lambdamat)) : 
  negative probability

Traceback:

11. sample.int(m^2, size = 1, prob = as.vector(.lambdamat)) at rcolgem.R#1473
10. simulate.binary.dated.tree.fgy(fgy[[1]], fgy[[2]], fgy[[3]],
    fgy[[4]], sampleTimes, sampleStates, integrationMethod = integrationMethod) at compartmental-models.R#38
9. .call.rcolgem(x0, t0, t.end, sampleTimes, sampleStates, births,
    migrations, deaths, nonDemeDynamics, parms, fgyResolution,
    integrationMethod)
8. FUN(X[[i]], ...)
7. lapply(X = X, FUN = FUN, ...)
6. sapply(integer(n), eval.parent(substitute(function(...) expr)),
    simplify = simplify)
5. replicate(nsim, .call.rcolgem(x0, t0, t.end, sampleTimes, sampleStates,
    births, migrations, deaths, nonDemeDynamics, parms, fgyResolution,
    integrationMethod), simplify = FALSE) at compartmental-models.R#211
4. config$model(theta = theta, nsim = config$nsample, tips = workspace$tip.heights,
    model = model, seed = seed, labels = workspace$tip.labels,
    ...)
3. simulate.trees(ws, ws$particles[i, ], model = model, ...)
2. initialize.smc(ws, model)
1. run.smc(ws, trace.file = "pkg/examples/example-compartmental.tsv",
    model = "sir.dynamic")

I noticed that when I calculated kernel distances for varying t.end, the same error is thrown if:

I first set t.end = 30.*52 in theta for the target tree
then as I vary t.end, the t.end value exceeds the value 30.*52

One possible reason may be how I've set my prior distribution for t.end. Currently it is a normal distribution with mean=100 and sd=5.
N has a lognormal prior, and the remaining parameters have gamma priors. I will check whether this error also shows up when varying those parameters.

gtng92 · 2017-06-19T14:14:43Z

The function make.fgy in rcolgem.R assigns the following:

demeNames <- rownames(births)
m <- nrow(births)
nonDemeNames <- names(nonDemeDynamics)
mm <- length(nonDemeNames)

and then checks

if (length(x0)!=m + mm) stop('initial conditons incorrect dimension', x0, m, mm)

births = (ODE expr) therefore m = 1
nonDemeNames = ('S') therefore mm = 1
x0 = (S=S, I=I) therefore length(x0) = 2
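A small R sketch reproducing the arithmetic of this check. It assumes, as in rcolgem, that births is a deme-by-deme matrix with row names and that nonDemeDynamics is a named character vector; the initial values 4999 and 1 are arbitrary illustrations, and `dims.ok` is a hypothetical helper, not part of rcolgem.

```r
# One deme (I): births is a 1x1 matrix of expression strings.
births <- rbind(c('parms$beta * S * I / (S+I)'))
rownames(births) <- colnames(births) <- 'I'

# One non-deme compartment (S), assumed to be a named character vector.
nonDemeDynamics <- c(S = 'parms$mu * (I+R) - parms$beta * S * I / (S+I)')

m  <- nrow(births)                    # number of demes
mm <- length(names(nonDemeDynamics))  # number of non-deme compartments

x0.si  <- c(S = 4999, I = 1)          # S and I only
x0.sir <- c(S = 4999, I = 1, R = 0)   # R added as an extra state

# Hypothetical helper mirroring the make.fgy dimension test.
dims.ok <- function(x0) length(x0) == m + mm

dims.ok(x0.si)   # TRUE
dims.ok(x0.sir)  # FALSE: make.fgy would stop with a dimension error
```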

The dimension check passes for the sir.nondynamic model but not for the other models, since I'm including compartment R in the ODE expressions (and compartment E for the seir model). This changes length(x0) to either 3 or 4.
Should I then put the recovered and exposed expressions together with the births expression?
I would keep the part of the expression that is strictly "deaths" in the deaths expression.

Before:

births <- rbind(c('parms$beta * S * I / (S+I) - (parms$gamma + parms$mu) * I'))
migrations <- rbind(c('0'))
deaths <- rbind(c('parms$gamma * I - parms$mu * R'))
nonDemeDynamics <- rbind(c('parms$mu * (I+R) - parms$beta * S * I / (S+I)'))

to:

births <- cbind(c('parms$beta * S * I / (S+I) - (parms$gamma + parms$mu) * I ', '- parms$mu * R'))
migrations <- rbind(c('0'))
deaths <- rbind(c('parms$gamma * I '))
nonDemeDynamics <- rbind(c('parms$mu * (I+R) - parms$beta * S * I / (S+I)'))

I would also have to change births to use cbind instead of rbind so that rcolgem's nrow(births) registers the extra expression.
Alternatively, I could modify rcolgem.R, but I'm not sure how much I should change it.
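A quick R check of what rbind versus cbind does to the shape of a vector of expression strings (the strings are copied from the proposal above). This only illustrates the nrow() behaviour; as far as I understand rcolgem, births and migrations are meant to be deme-by-deme (m x m) matrices, so a 2 x 1 births matrix is not square and swapping rbind for cbind alone may not yield a valid specification.

```r
exprs <- c('parms$beta * S * I / (S+I) - (parms$gamma + parms$mu) * I',
           '- parms$mu * R')

as.row <- rbind(exprs)  # 1 x 2 matrix, so nrow() is 1
as.col <- cbind(exprs)  # 2 x 1 matrix, so nrow() is 2

dim(as.row)
dim(as.col)
```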

ArtPoon · 2017-06-20T18:01:20Z

We don't want to incorporate the recovered compartment in the rcolgem expressions, so change this:

deaths <- rbind(c('parms$gamma * I - parms$mu * R'))

to this:

deaths <- rbind(c('parms$gamma * I'))

and exclude R from the initial conditions vector x0.

gtng92 · 2017-06-21T15:54:25Z

Update on the following error:

Error in sample.int(m^2, size = 1, prob = as.vector(.lambdamat)) : 
  negative probability

With t.end = 30.*52, make.fgy returns a list with five elements:

fgy[[1]]  # time axis of ODE solution
fgy[[2]]  # births
fgy[[3]]  # migrations
fgy[[4]]  # deme sizes
fgy[[5]]  # ODE solution

Plot for `x=fgy[[2]], y=fgy[[1]]`

With t.end = 1500, simulating a single tree works. When I later call run.smc, however, I need to be more stringent and keep t.end at or below about 1300 to avoid the negative probability error.

gtng92 · 2017-06-28T16:46:38Z

Individual validation of each parameter:
Target tree parameters (true values):

theta <- c(t.end=50, N=5000, beta=0.1, gamma=1/520, mu=1/3640, alpha=0)

Configuration settings of priors:

> config$priors
$t.end
[1] "rnorm(n=1,mean=50,sd=5)"

$N
[1] "rnorm(n=1,mean=5000,sd=100)"

$beta
[1] "rgamma(n=1,shape=50,rate=1)"

$gamma
[1] "rgamma(n=1,shape=20,rate=1)"

$mu
[1] "rgamma(n=1,shape=20,rate=1)"

$alpha
[1] "rnorm(n=1,mean=1,sd=0.1)"
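A minimal sketch of how string-valued priors like these can be turned into one particle, which makes it easy to compare prior draws against the target theta. `sample.prior` is a hypothetical helper, not Kaphi's internal sampler.

```r
# Prior strings as listed in config$priors above.
priors <- list(
  t.end = "rnorm(n=1,mean=50,sd=5)",
  N     = "rnorm(n=1,mean=5000,sd=100)",
  beta  = "rgamma(n=1,shape=50,rate=1)",
  gamma = "rgamma(n=1,shape=20,rate=1)",
  mu    = "rgamma(n=1,shape=20,rate=1)",
  alpha = "rnorm(n=1,mean=1,sd=0.1)"
)

# Hypothetical helper: evaluate each prior string once.
sample.prior <- function(priors) {
  sapply(priors, function(expr) eval(parse(text = expr)))
}

set.seed(1)
particle <- sample.prior(priors)
particle  # named numeric vector, one value per parameter
```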

Kernel distances for varying N, using seq(100, 10100, 1000) # (from, to, step)

Kernel distances for varying beta using seq(0.01, 0.36, 0.025)

Kernel distances for varying gamma using seq(0.001, 0.08, 0.0025)

Kernel distances for varying mu using seq(0.0001, 0.01, 0.0005)

ArtPoon · 2017-06-28T17:38:49Z

There's something funny going on with the kernel scores here. Can you please check whether they're being normalized correctly? I've run into a similar problem before; I'll dig around and make an issue.
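One common normalization for tree kernels divides the raw score by the geometric mean of the two self-kernels, so a tree compared with itself scores exactly 1, and a distance can then be taken as 1 minus the normalized score. The sketch below assumes that convention (Kaphi's norm.mode options may differ in detail) and shows simple sanity checks: self-distances should be 0, and all distances should fall in [0, 1]. The raw scores are illustrative numbers only.

```r
# Normalize a raw kernel score k.xy by the self-kernels k.xx and k.yy.
normalize.kernel <- function(k.xy, k.xx, k.yy) {
  k.xy / sqrt(k.xx * k.yy)
}

# Distance derived from the normalized score (assumed convention).
kernel.distance <- function(k.xy, k.xx, k.yy) {
  1 - normalize.kernel(k.xy, k.xx, k.yy)
}

k.xx <- 120; k.yy <- 80; k.xy <- 60   # illustrative raw scores
kernel.distance(k.xx, k.xx, k.xx)     # a tree against itself: 0
kernel.distance(k.xy, k.xx, k.yy)     # strictly between 0 and 1 here
```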


ArtPoon · 2017-06-28T19:08:03Z

Okay, now that I understand these are kernel distances I think we need to redo the analyses for mu and gamma to account for their low target values (0.00192 and 0.000274, respectively). Might also be worth applying a log-transform on the x-axis for these figures.
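A minimal sketch of a simulation grid centred on a small target value and evenly spaced on a log scale, which fits the suggestion to refine the range and log-transform the x-axis. `log.grid` is a hypothetical helper, and one decade on either side of the target is an arbitrary choice.

```r
# Target values from theta.
target <- c(gamma = 1/520, mu = 1/3640)

# Hypothetical helper: n values evenly spaced in log10,
# spanning `decades` orders of magnitude either side of center.
log.grid <- function(center, decades = 1, n = 11) {
  10^seq(log10(center) - decades, log10(center) + decades, length.out = n)
}

gamma.grid <- log.grid(target[["gamma"]])
mu.grid    <- log.grid(target[["mu"]])

# When plotting kernel distance d against these values, log = "x"
# puts the x-axis on a log scale, e.g. plot(gamma.grid, d, log = "x").
```

With an odd n, the middle grid point coincides with the target value.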

I also expect some of these parameters to be confounded, like beta and N, and mu and gamma. Eventually it will be informative to vary these pairs jointly and then look at the distribution of kernel distances: see Rosemary's paper on contact networks and SMC-ABC for reference.

gtng92 · 2017-06-28T21:00:39Z

I've set gamma to 1/520. and mu to 1/3640. Should I switch the values?
norm.mode is set to 'NONE'

ArtPoon · 2017-06-28T21:48:22Z

Those target values are fine, but I would refine the range for simulations to evaluate against the target.


gtng92 · 2017-06-29T14:48:02Z

Kernel distances for varying N, for nsim=100 and seq(1000, 10000, 1000)
Rosemary's paper reports that parameters N and m had little to no identifiability with ABC. Is that a different kind of scenario?

Kernel distances for beta for nsim=100 and seq(0.01, 0.28, 0.025)

Kernel distances for gamma for nsim=100 and seq(0.0005, 0.03, 0.0015)

ArtPoon · 2017-06-29T15:11:07Z

Rosemary's paper focuses on a very different kind of model (a network-based epidemic). However, there are certainly similarities: it is a susceptible-infected process on a network. Her model actually found some signal for N, although I (the number of infected) and N are confounded. At this point I think we should switch our emphasis to bringing in other measures of tree shape, to see if we can improve the performance of ABC-SMC. Let's plan it out at our meeting Monday.


ArtPoon · 2017-06-29T15:13:13Z

By the way, plots are much improved, thanks for refining the range on beta and gamma.
Until Monday, can you proceed with running ABC-SMC to assess performance and also to identify usability problems?

gtng92 · 2017-06-29T15:30:47Z

Yep, I'll look for a theta that reproduces the negative probability error.

ArtPoon · 2017-07-03T14:19:04Z

Try using the following approach to examine the distribution of kernel score over two model parameters:

plot(x, y, cex=sqrt(z))  # where x and y are two parameters and z is kernel score/distance

You can also pass the kernel distance to the rgb function (through the col argument) as the red, green or blue channel (or two channels simultaneously).
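A minimal sketch combining both suggestions. The x, y and z values here are random placeholders, not results. Because cex scales the symbol's diameter, using sqrt(z) makes the plotted area roughly proportional to z; rgb() channels must lie in [0, 1], so z is rescaled first.

```r
set.seed(42)
x <- runif(50, 0.01, 0.3)    # e.g. beta (placeholder values)
y <- runif(50, 2000, 8000)   # e.g. N (placeholder values)
z <- runif(50, 0, 1)         # placeholder kernel distances

# Rescale z to [0, 1] so it can be used as an rgb() channel.
z.scaled <- (z - min(z)) / (max(z) - min(z))

pdf(tempfile(fileext = ".pdf"))
# Size encodes z (area proportional to z), colour runs blue -> red.
plot(x, y, cex = 2 * sqrt(z.scaled) + 0.2,
     col = rgb(red = z.scaled, green = 0, blue = 1 - z.scaled),
     pch = 16, xlab = "beta", ylab = "N")
dev.off()
```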

gtng92 · 2017-07-05T13:27:17Z

Am I doing this correctly?

ArtPoon · 2017-07-05T13:38:37Z

Well, the trick is that you need to vary both beta and N - I suggest doing a "grid search" where you examine all pairwise combinations of values {beta} x {N}.
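A minimal sketch of such a grid search. expand.grid enumerates every {beta} x {N} pair, and a user-supplied function scores each pair. `grid.search` and its `dist.fun` argument are hypothetical; in practice dist.fun would simulate trees at that parameter pair and return their kernel distance to the target tree. The value ranges are illustrative.

```r
beta.values <- seq(0.05, 0.15, by = 0.025)
N.values    <- seq(4000, 6000, by = 500)

# All pairwise combinations: one row per grid point.
grid <- expand.grid(beta = beta.values, N = N.values)

# Hypothetical driver: apply dist.fun(beta, N) to every grid point.
grid.search <- function(grid, dist.fun) {
  grid$distance <- mapply(dist.fun, grid$beta, grid$N)
  grid
}
```

The resulting data frame has columns beta, N and distance, which map directly onto x, y and z in the plot(x, y, cex=sqrt(z)) suggestion.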


MathiasRenaud · 2017-07-06T14:23:06Z

I have started a grid search script in pkg/examples/example-bd.R for the birth-death model, which can be adapted to other models. I'm still working on the plot and testing.

gtng92 · 2017-07-11T16:08:40Z

This is the heatmap for N and beta with the different kernel distances. I can create a more specific one as well.
True value for beta is 0.1, and true value for N is 5000.

gtng92 · 2017-07-12T14:52:33Z

Heatmap for pairwise comparisons of gamma and mu.
True value for gamma is 0.00192, and true value for mu is 0.00027. It looks like the gamma values are consistently overestimated.

ArtPoon · 2017-07-17T14:43:08Z

There are a few things to try here:

Increase the resolution of the numerical ODE approximation (fgy.resol parameter) to reduce the chance of numerical error causing a negative probability.
Ignore simulations where the negative probability error arises and re-attempt the simulation.
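A minimal sketch of the second option: call a simulator, catch only the "negative probability" error raised by sample.int(), and retry up to a limit. `simulate.with.retry` and `make.flaky` are hypothetical; any other error is re-raised unchanged, and NULL is returned if every attempt fails so the caller can decide how to treat that particle.

```r
simulate.with.retry <- function(sim.fun, max.tries = 5) {
  for (attempt in seq_len(max.tries)) {
    result <- tryCatch(sim.fun(), error = function(e) {
      # Swallow only the negative-probability failure; re-raise others.
      if (grepl("negative probability", conditionMessage(e))) NULL else stop(e)
    })
    if (!is.null(result)) return(result)
  }
  NULL  # all attempts failed
}

# Toy simulator that fails n.fail times with the same sample.int() error.
make.flaky <- function(n.fail) {
  calls <- 0
  function() {
    calls <<- calls + 1
    if (calls <= n.fail) {
      sample.int(2, size = 1, prob = c(0.5, -0.5))
    } else {
      "tree"
    }
  }
}

simulate.with.retry(make.flaky(2))  # succeeds on the third attempt
```

Retrying helps only if the failure is stochastic. If a parameter combination fails deterministically, every attempt will hit the same error.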

ArtPoon · 2017-07-17T18:06:48Z

There's a bug in compartmental-models.R: the deaths term should not appear in the births expression. For example:

    births <- rbind(c('parms$beta * S * I / (S+I) - parms$gamma * I'))

should be

    births <- rbind(c('parms$beta * S * I / (S+I)'))

The only place that parms$gamma * I should appear is in the subsequent deaths expression.
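A minimal R sketch of why this matters. For a single deme with no migration, the net change in I is births minus deaths, so keeping `parms$gamma * I` in births removes it twice. The buggy births term can also become negative once susceptibles are depleted, which is one plausible route to negative birth rates (and hence negative probabilities) in the coalescent. The S and I values are hypothetical; beta and gamma are taken from theta; `eval.rate` is a hypothetical helper.

```r
parms <- list(beta = 0.1, gamma = 1/520)

# Hypothetical helper: evaluate an expression string against a state.
eval.rate <- function(expr, env) eval(parse(text = expr), envir = env)

births.buggy <- 'parms$beta * S * I / (S+I) - parms$gamma * I'
births.fixed <- 'parms$beta * S * I / (S+I)'
deaths       <- 'parms$gamma * I'

early <- list(parms = parms, S = 4990, I = 10)   # early epidemic
late  <- list(parms = parms, S = 10, I = 990)    # susceptibles depleted

# Net change in I = births - deaths.
dI.buggy <- eval.rate(births.buggy, early) - eval.rate(deaths, early)
dI.fixed <- eval.rate(births.fixed, early) - eval.rate(deaths, early)
dI.fixed - dI.buggy  # equals gamma * I: the double-counted term

# Late in the epidemic the buggy births term itself is negative.
eval.rate(births.buggy, late)
eval.rate(births.fixed, late)
```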

ArtPoon · 2017-07-24T14:17:04Z

Correcting the births matrix specification did not resolve the problem. Currently when the pseudorandom number generator is fixed with a given seed, the sixth particle being updated in SMC runs into a "negative probability" error (presumably associated with coalescent simulation).

Post the parameter values of particle 6 here.
Examine the population trajectories associated with the parameter values of particle 6. If any of F, G or Y are negative, we need to determine why. Do the parameter values make sense?
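A minimal sketch for locating negative values along a trajectory. It assumes the make.fgy layout described earlier in this thread (fgy[[1]] time axis, fgy[[2]] births F, fgy[[3]] migrations G, fgy[[4]] deme sizes Y), with one list entry per time point. `find.negative.fgy` and the toy object are hypothetical.

```r
# Return the times at which any entry of F, G or Y is below -tol.
find.negative.fgy <- function(fgy, tol = 0) {
  times <- fgy[[1]]
  has.neg <- function(x) any(x < -tol)
  neg <- vapply(seq_along(times), function(i) {
    has.neg(fgy[[2]][[i]]) || has.neg(fgy[[3]][[i]]) || has.neg(fgy[[4]][[i]])
  }, logical(1))
  times[neg]
}

# Toy fgy-like object with three time points; Y goes negative at t = 2.
toy <- list(
  c(0, 1, 2),
  list(matrix(0.5), matrix(0.4), matrix(0.1)),
  list(matrix(0), matrix(0), matrix(0)),
  list(c(I = 10), c(I = 5), c(I = -0.2))
)

find.negative.fgy(toy)
```

A small positive tol can separate genuine sign errors from tiny negative values caused by numerical integration error.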

gtng92 · 2017-07-24T16:28:49Z

Some of the parameter values actually didn't make sense. In a previous comment, I set the target tree with theta of:

theta <- c(t.end=50, N=5000, beta=0.1, gamma=1/520, mu=1/3640, alpha=0)

Configuration settings of priors beta, gamma, and mu in particular were previously:

> config$priors
$beta
[1] "rgamma(n=1,shape=50,rate=1)"
$gamma
[1] "rgamma(n=1,shape=20,rate=1)"
$mu
[1] "rgamma(n=1,shape=20,rate=1)"

I've now changed them to:

> config$priors
$beta
[1] "rgamma(n=1,shape=1,rate=5)"
$gamma
[1] "rgamma(n=1,shape=1,rate=10)"
$mu
[1] "rgamma(n=1,shape=0.2,rate=50)"
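One way to see why the original priors were a problem is to compute how much prior probability each gamma prior places at or below the true value, using pgamma with the shape and rate from the prior strings. The values come from theta and the two sets of config$priors shown here; `mass.below` is a hypothetical helper.

```r
truth <- c(beta = 0.1, gamma = 1/520, mu = 1/3640)

# (shape, rate) of each gamma prior, before and after the change.
old <- list(beta = c(50, 1), gamma = c(20, 1), mu = c(20, 1))
new <- list(beta = c(1, 5),  gamma = c(1, 10), mu = c(0.2, 50))

# Prior probability of drawing a value at or below the true value.
mass.below <- function(pr) {
  sapply(names(truth), function(p)
    pgamma(truth[[p]], shape = pr[[p]][1], rate = pr[[p]][2]))
}

rbind(old = mass.below(old), new = mass.below(new))
```

The old priors (means of 50 and 20) put essentially no mass near the true values, so nearly every particle started in an implausible region of parameter space.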

With these priors, I can now run initialize.smc for 100 particles (as far as I've tested).
Thanks!

--Tammy Ng

gtng92 · 2017-08-17T16:05:22Z

I ran with the parameters in commit 4d37ca4 using 500 particles; the run finished in approximately 12.39 hours:
Step 110 epsilon: 0.1023741 ESS: 187.2476 accept: 0.2204082 elapsed: 44595.8 s

Trajectories of the mean estimates of the parameters in theta can be viewed in this file:
example-compartmental.pdf

The plots suggest that beta, gamma and mu still have room to converge. However, if I lower the acceptance rate threshold, the run will take even longer.
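For reading the ESS column in the log line above, here is a minimal sketch of the usual effective-sample-size formula for SMC particle weights: normalize the weights to sum to 1, then take 1 / sum(w^2). This assumes Kaphi follows the standard definition. Under it, an ESS of about 187 out of 500 particles means the weights are concentrated on roughly 37% of the particles' worth of information.

```r
# Effective sample size of a vector of (unnormalized) particle weights.
ess <- function(w) {
  w <- w / sum(w)
  1 / sum(w^2)
}

n <- 500
ess(rep(1, n))               # equal weights: ESS equals n
ess(c(1, rep(1e-6, n - 1)))  # one dominant particle: ESS close to 1
```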

ArtPoon · 2017-08-21T14:16:21Z

Check whether collecting samples with invalid values of t.end is skewing the other parameter estimates: fix t.end to the actual value and compare against the current set of results. (Invalid values of t.end can result in negative probabilities in the coalescent simulation, but this only throws a warning, not an exception.)

ArtPoon · 2017-08-28T14:08:16Z

@gtng92 observed that fixing t.end to its actual value produces the negative probability error, so let's try a uniform distribution with its lower bound at the actual t.end and its upper bound a small amount above that value, and vary these bounds until we no longer get this error.
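A minimal sketch of generating such a prior in the same string format as config$priors, so the upper bound can be varied systematically. `uniform.prior` is a hypothetical helper. t.end = 50 is the target value from the earlier theta, and the width of 5 is an arbitrary starting point.

```r
# Hypothetical helper: uniform prior string from true.value to true.value + width.
uniform.prior <- function(true.value, width) {
  sprintf("runif(n=1,min=%g,max=%g)", true.value, true.value + width)
}

p <- uniform.prior(50, 5)
p  # "runif(n=1,min=50,max=55)"

# Draws never fall below the true t.end.
draws <- replicate(1000, eval(parse(text = p)))
range(draws)

# Candidate priors for a sweep over the upper bound.
sapply(c(1, 5, 10), uniform.prior, true.value = 50)
```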

gtng92 · 2017-10-30T15:05:40Z

Following Issue #120, I implemented the idea of substituting a dummy tree whenever rcolgem encounters an error. With that in place, I was able to run a grid search over a wide range of values for all parameters in theta (except alpha, which isn't used in the sir.dynamic model; I'm still working on stripping it out when it's irrelevant to the model).

True Params:

theta <- c(t.end=200, N=10000, beta=0.1, gamma=0.2, mu=0.01, alpha=5)

Grid-search Params:

t.end <- seq(50, 400, 50)
N <- seq(7000, 14000, 1000)
beta <- seq(0.01, 0.36, 0.05)
gamma <- seq(0.01, 0.36, 0.05)
mu <- seq(0.001, 0.071, 0.01)
alpha <- 5

Average of top 10 thetas with lowest kernel distances:

avg.theta <- c(t.end=215, N=10900, beta=0.12, gamma=0.235, mu=0.03451, alpha=5)

Average of top 20 thetas:

avg.theta <- c(t.end=217.5, N=10277.78, beta=0.115, gamma=0.23, mu=0.0365, alpha=5)

ArtPoon · 2017-10-30T15:13:46Z

Ok, careful about this though -- if there was no real difference among trees with respect to kernel score, one could arbitrarily pick 10 or 20 trees and the mean parameter estimates would still be around these values based on the way you've designed the grid. It will be more convincing when we see the actual distribution of distances over the grid. Thanks!
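A minimal sketch of the baseline this comment describes. It rebuilds the 8^5 = 32768-point grid from the grid-search parameters above, computes each parameter's mean over the whole grid, and draws random subsets of 10 grid points as a null distribution for "top 10" averages. For example, the grid mean of mu is (0.001 + 0.071) / 2 = 0.036 and the grid mean of t.end is 225. `random.topk` and `topk.mean` are hypothetical helpers, and `distance` would be the vector of kernel distances for each grid row.

```r
grid <- expand.grid(
  t.end = seq(50, 400, 50),
  N     = seq(7000, 14000, 1000),
  beta  = seq(0.01, 0.36, 0.05),
  gamma = seq(0.01, 0.36, 0.05),
  mu    = seq(0.001, 0.071, 0.01),
  alpha = 5
)

# What an average of arbitrarily chosen grid points tends toward.
grid.means <- colMeans(grid)

# Average of the k grid points with the lowest kernel distance.
topk.mean <- function(grid, distance, k) {
  colMeans(grid[order(distance)[seq_len(k)], ])
}

# Null baseline: average k randomly chosen grid points, many times.
random.topk <- function(grid, k) colMeans(grid[sample(nrow(grid), k), ])
set.seed(1)
null.means <- t(replicate(200, random.topk(grid, 10)))
```

A top-k average is informative only where it falls outside the spread of null.means for that parameter.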

gtng92 · 2017-10-30T15:25:07Z

From the distributions, varying t.end and N seems to have little to no effect on the kernel distance. beta and gamma in particular seem the most estimable, which agrees with what was estimable before.

It's a little hard to see the distribution in this plot, but I can show the distributions at the dev meeting with a 3D scatterplot that is rotatable.

gtng92 · 2017-11-06T15:41:18Z

The trajectory of the mean t.end seems to be improving, although admittedly my priors were relatively narrow. It does look better than the t.end trajectory in the PDF file linked in my comment from Aug 17, 2017. Parameter beta still shows a strange spike within the first 40 iterations.

ArtPoon · 2017-11-06T15:42:57Z

I think we're in danger of fishing for results here. Let's see what distances are like using other metrics. Thanks!

gtng92 · 2017-11-06T20:52:12Z

To check, I widened the prior to a normal distribution with a mean of 300 and a standard deviation of 100. True value was 200.

ArtPoon · 2017-11-14T16:00:27Z

Deprecated by #123

ArtPoon changed the title ~~Validate method on birth-death trees~~ Validate method on birth-death or coalescent trees Feb 16, 2017

ArtPoon mentioned this issue Feb 16, 2017

Implement skyline coalescent simulator #43

Open

ArtPoon added this to the Implement and test standard model library milestone May 8, 2017

ArtPoon changed the title ~~Validate method on birth-death or coalescent trees~~ Validate ABC-SMC estimation for compartmental (SIR) models Jun 5, 2017

ArtPoon assigned gtng92 Jun 5, 2017

ArtPoon mentioned this issue Jun 5, 2017

Validate ABC-SMC parameter estimation on speciation models #62

Closed

gtng92 added a commit that referenced this issue Jun 8, 2017

Working on issue #42

68e45b5

--Tammy Ng

gtng92 added a commit that referenced this issue Jun 9, 2017

Converted simulated tree to phylo object issue #42

c4aef3f

--Tammy Ng

gtng92 added a commit that referenced this issue Jun 12, 2017

resolves issue #42 no more warnings

13e8a42

--Tammy Ng

gtng92 mentioned this issue Jul 14, 2017

Extra attributes in simulate.binary.dated.tree.fgy output uses a lot of space #89

Closed

gtng92 mentioned this issue Jul 27, 2017

Overestimating beta parameter value for compartmental (SIR) models #93

Closed

gtng92 added a commit that referenced this issue Aug 17, 2017

Working on issue#93 and #42

4d37ca4

--Tammy Ng

ArtPoon mentioned this issue Nov 6, 2017

MASTER Newick output tree errors #120

Closed

gtng92 mentioned this issue Nov 8, 2017

Comprehensive overview of past runs of SIR models using rcolgem vs MASTER frameworks #123

Closed

ArtPoon closed this as completed Nov 14, 2017
