Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start adding GRW RV Op and distribution #5298

Merged
merged 13 commits into from
Apr 3, 2022
Merged

Start adding GRW RV Op and distribution #5298

merged 13 commits into from
Apr 3, 2022

Conversation

canyon289
Copy link
Member

Starting to add time series support to v4 on step at a time (See what I did there?)

Scope of this PR is just to add the random variable, not the distribution as this might take me a while and I'll likely have a lot of basic questions

#5290

@canyon289 canyon289 added the v4 label Dec 31, 2021
@ricardoV94
Copy link
Member

ricardoV94 commented Dec 31, 2021

Code inside the rng_fn should be pure non-symbolic numpy/python.

Anyway the idea with aeppl is that we don't need to build a new RandomVariable for every possible distribution, but use Aesara directly (as you are doing with that scan) to generate random draws.

Similarly, instead of needing a custom logp method for every distribution we use aeppl to generate the logp graph, given the corresponding Aesara random generating graph.

The thing that is needed is to make pm.GaussianRandomWalk return the right Aesara random generating graph. We still have to figure out the right API for this type of "distributions'. I have a WIP proposal in #5169 under the name of DerivedDistribution (feedback and better name appreciated)

@ricardoV94
Copy link
Member

ricardoV94 commented Dec 31, 2021

Also to illustrate the point of this approach, the only thing that is needed to create any other random walk is to change the distribution used in that scan graph

rv, updates = aesara.scan(
  fn=lambda prev_value: pm.StudentT.dist(prev_value + mu, sigma, nu),
  outputs_info=np.array(0.0),
  n_steps=size,
)
# ignoring: updates have to be managed differently when using RandomVariables via `.dist`

Instead of having to create them one by one we can offer all at once in a pm.RandomWalk by letting users specify which base distribution to use as one of the arguments.

my_srw = pm.RandomWalk("my_srw", dist=pm.StudentT.dist(mu, sigma, nu), size=10)

pm.GaussianRandomWalk can then be a short convenience function that calls pm.RandomWalk(name, dist=pm.Normal.dist(mu, sigma), size=size) under the hood.

@canyon289
Copy link
Member Author

These are great pointers thank you!

@canyon289
Copy link
Member Author

canyon289 commented Dec 31, 2021

@ricardoV94, reviewed the other prs and thought through this some. Do you mind if I learn how to implement a "fixed" GRW RV first in this PR, get that working, then follow up with a RW RV generalization? Thatll help me break up unknown bits that I need to learn into smaller unknown bits for me

@ricardoV94
Copy link
Member

@ricardoV94, reviewed the other prs and thought through this some. Do you mind if I learn how to implement a "fixed" GRW RV first in this PR, get that working, then follow up with a RW RV generalization? Thatll help me break up unknown bits that I need to learn into smaller unknown bits for me

If you want to take it in small steps I think limiting yourself to the case of Gaussian innovations might help. I would still suggest you sidestep writing the RandomVariable altogether and try to use the WIP DerivedDistribution from my branch.

Otherwise you'll just be practicing writing traditional PyMC distributions. That can be useful for temporary backwards compatibility of V4, but it will have a smaller impact than understanding how to make use of aeppl's flexibility.

@canyon289
Copy link
Member Author

Great thanks for this advice, I'll start working off of that branch

@codecov
Copy link

codecov bot commented Jan 1, 2022

Codecov Report

Merging #5298 (3af1b59) into main (02897d1) will increase coverage by 0.23%.
The diff coverage is 90.76%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5298      +/-   ##
==========================================
+ Coverage   88.58%   88.82%   +0.23%     
==========================================
  Files          75       75              
  Lines       13716    13736      +20     
==========================================
+ Hits        12151    12201      +50     
+ Misses       1565     1535      -30     
Impacted Files Coverage Δ
pymc/distributions/timeseries.py 47.31% <90.76%> (+24.42%) ⬆️

@canyon289
Copy link
Member Author

Why is precommit failing here? Same commit, different results locally than in Ci?

image

@canyon289 canyon289 changed the title [WIP] Start adding GRW random variable Start adding GRW random variable Jan 2, 2022
@canyon289
Copy link
Member Author

canyon289 commented Jan 2, 2022

@ricardoV94 Let me know if you like this GRW RV implementation.

Per our conversation for now we'll

  1. Skip init distribution
  2. Restrict innovations just to Gaussians to get something working
  3. Implement an old style distribution

Next steps for me to to create the GRW Distribution in old style

After thats working I'll work on

  1. Switching to Derived dist with more aeppl and aesara integration
  2. Generalize distributions
  3. Figure out init distribution (if we want it)

@ricardoV94
Copy link
Member

Are you planning to do this all in one PR or separate ones?

@canyon289
Copy link
Member Author

canyon289 commented Jan 3, 2022

Are you planning to do this all in one PR or separate ones?

Separate ones ideally

i updated the comment to make it more clear what my plan is, let me know if you think its a reasonable course of action

@ricardoV94
Copy link
Member

Are you planning to refactor the PyMC distribution to work with the explicit RV from this PR?

@canyon289 canyon289 changed the title Start adding GRW random variable Start adding GRW RV Op and distribution Jan 6, 2022
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
Copy link
Member

@ricardoV94 ricardoV94 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small tweak to comments

pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
pymc/distributions/timeseries.py Outdated Show resolved Hide resolved
@ricardoV94
Copy link
Member

There's some indentation error going on

@ricardoV94 ricardoV94 modified the milestones: v4.0.0b7, v4.0.0 Apr 1, 2022
@canyon289
Copy link
Member Author

Looks like the init dist is causing some test failure. I can look into this weekend, next 72 hours or so

https://github.com/pymc-devs/pymc/runs/5788430816?check_suite_focus=true#step:7:411
image

@canyon289
Copy link
Member Author

Alright well this passes except black which I think is broken, and mypy which Im pretty sure is broken, and flaky codecov. Should I just merge and those get fixed up in other prs?

canyon289 and others added 13 commits April 3, 2022 11:37
Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
Co-authored-by: Ricardo Vieira <28983449+ricardoV94@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants