
add binomial entropy and kl #149

Open · wants to merge 2 commits into master
Conversation

@alicanb (Collaborator) commented Jun 26, 2018

This is a larger PR than I intended, but it basically adds the Binomial entropy and the Binomial-Poisson and Binomial-Geometric KLs, together with some helper functions:

  • binomial._log1pmprobs(): I used this a lot, so I made it a separate function. It computes
    (-probs).log1p() safely (see the sketch after this list).
  • binomial._Elnchoosek(): for x ~ Bin(n, p), this computes E[log(nchoosek)], E[log(n!)], E[log(x!)], and E[log((n-x)!)].
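
A minimal sketch of what a safe log(1 - probs) helper can look like; the names below are illustrative only, since the PR's actual _log1pmprobs body is not shown in this excerpt:

import torch
import torch.nn.functional as F

def _log1pm_from_probs(probs):
    # log(1 - p) computed as log1p(-p); accurate as long as p is not extremely close to 1
    return torch.log1p(-probs)

def _log1pm_from_logits(logits):
    # 1 - sigmoid(l) == sigmoid(-l), so log(1 - p) == logsigmoid(-l) == -softplus(l);
    # this stays finite even when sigmoid(l) rounds to exactly 1
    return F.logsigmoid(-logits)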

@@ -77,6 +77,11 @@ def probs(self):
    def param_shape(self):
        return self._param.size()

    def _log1pmprobs(self):


Since it is a function for internal use, I think this can be moved to the top, like in MVN. Something like:

def _log1pmtensor(tensor):
    # Do the same thing

Uses of the function in kl.py can be done via importing this function along with Binomial.

@@ -109,3 +111,27 @@ def enumerate_support(self):
        values = values.view((-1,) + (1,) * len(self._batch_shape))
        values = values.expand((-1,) + self._batch_shape)
        return values

    def _Elnchoosek(self):


Same idea here.

        s = self.enumerate_support()
        s[0] = 1  # 0! = 1
        # x is factorial matrix i.e. x[k,...] = k!
        x = torch.cumsum(s.log(), dim=0)


x is the log of factorial matrix right?
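
A quick standalone check of that reading (illustrative, not code from the PR): the cumulative sum of logs over the support puts log(k!) in slot k.

import torch

s = torch.arange(0., 6.)          # support 0, 1, ..., 5
s[0] = 1                          # 0! = 1, so its contribution is log(1) = 0
x = torch.cumsum(s.log(), dim=0)  # x[k] = log(k!)
assert torch.allclose(x, torch.lgamma(torch.arange(0., 6.) + 1.))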

        indices[0] = torch.arange(x.size(0) - 1, -1, -1,
                                  dtype=torch.long, device=x.device)
        # x[tuple(indices)] is x reversed on first axis
        lnchoosek = x[-1] - x - x[tuple(indices)]


I think x.flip(dim=0) will exhibit same behaviour.

@alicanb (Collaborator, Author) replied:

weird, I tried using flip and it didn't work before; maybe I messed up the arguments...
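
For what it's worth, a small standalone check (not part of the PR) that the index-based reversal matches torch.flip; note that flip takes a dims sequence rather than a dim keyword, which may explain the argument trouble:

import torch

x = torch.arange(12.).view(4, 3)
rev_idx = torch.arange(x.size(0) - 1, -1, -1, dtype=torch.long, device=x.device)
# advanced indexing along the first axis reverses x ...
# ... and matches torch.flip, which expects a sequence of dims (e.g. [0]), not dim=0
assert torch.equal(x[rev_idx], x.flip([0]))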

        elognfac = x[-1]
        elogkfac = ((lnchoosek + s * self.logits + self.total_count * self._log1pmprobs()).exp() *
                    x).sum(dim=0)
        elognmkfac = ((lnchoosek + s * self.logits + self.total_count * self._log1pmprobs()).exp() *


E[log(n-k)!] = E[log k!] but for Bin(n, (1 - p)). Can we use this fact here?
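
A quick illustration of the symmetry being suggested (a standalone sketch, not the PR's code, assuming a reasonably recent torch.distributions.Binomial): if X ~ Bin(n, p), then n - X ~ Bin(n, 1 - p), so E[log (n - X)!] can be obtained by running the same E[log k!] machinery on a Binomial with probs 1 - p.

import torch
from torch.distributions import Binomial

n, p = 8, 0.25
x = Binomial(n, torch.tensor(p)).sample((100000,))      # X ~ Bin(n, p)
y = Binomial(n, torch.tensor(1 - p)).sample((100000,))  # Y ~ Bin(n, 1 - p), same law as n - X
# Monte Carlo estimates of E[log (n - X)!] and E[log Y!] agree up to sampling noise
print(torch.lgamma(n - x + 1).mean(), torch.lgamma(y + 1).mean())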

    inf_idxs = p.total_count > q.total_count
    kl[inf_idxs] = _infinite_like(kl[inf_idxs])
    return kl


@register_kl(Binomial, Poisson)


Heterogeneous combinations are placed below; this section is for homogeneous combinations.

q.rate)


@register_kl(Binomial, Geometric)


Same as above comment.

@@ -273,6 +290,11 @@ def _kl_geometric_geometric(p, q):
    return -p.entropy() - torch.log1p(-q.probs) / p.probs - q.logits


@register_kl(Geometric, Binomial)


Same as above comment.

@vishwakftw left a comment

Some comments have been given. Please check them.

Could you check if the KL test passes with lower tolerance, and how much time it takes in the default tolerance setting?

@fritzo left a comment

Thanks for adding these!

@alicanb (Collaborator, Author) commented Jun 26, 2018

@vishwakftw thanks for the comments! One thing I want us to work out before wrapping this up is an approximation to E[log k!] for large n. I tried Stirling's approximation but couldn't come up with a closed form. Any ideas?

@vishwakftw commented Jun 26, 2018

I think we have to make use of Stirling's inequality and the Taylor series to compute this. I guess the reason you are unable to come up with a closed form is the log(k) term.

I tried using them, and got about 0.5% relative error.

[image omitted]

This might help, after expanding with the bound log k! <= 1 + k log k + 0.5 log k - k:

(Source: Wikipedia, https://en.wikipedia.org/wiki/Taylor_expansions_for_the_moments_of_functions_of_random_variables)
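
For concreteness, here is one way those two pieces can be combined; this is a reconstruction from the bound above and the cited page, not necessarily the derivation in the omitted image or the gist mentioned later:

import torch

def elogkfac_stirling_taylor(total_count, probs):
    # Plug second-order Taylor expansions of the moments around mu = n*p
    # (with var = n*p*(1-p)) into the Stirling bound
    #   log k! <= 1 + k log k + 0.5 log k - k,
    # using E[X log X] ~ mu log mu + var/(2 mu) and E[log X] ~ log mu - var/(2 mu^2).
    mu = total_count * probs
    var = total_count * probs * (1 - probs)
    e_xlogx = mu * torch.log(mu) + var / (2 * mu)
    e_logx = torch.log(mu) - var / (2 * mu ** 2)
    return 1 + e_xlogx + 0.5 * e_logx - mu

def elogkfac_exact(total_count, probs):
    # brute force E[log X!] over the full support, for comparison
    k = torch.arange(0., total_count + 1.)
    log_pmf = (torch.lgamma(torch.tensor(total_count + 1.)) - torch.lgamma(k + 1.)
               - torch.lgamma(total_count - k + 1.)
               + k * torch.log(probs) + (total_count - k) * torch.log1p(-probs))
    return (log_pmf.exp() * torch.lgamma(k + 1.)).sum()

n, p = 100, torch.tensor(0.3)
print(elogkfac_exact(n, p), elogkfac_stirling_taylor(torch.tensor(float(n)), p))

Since the expansion is taken around the mean, the approximation degrades when n*p is small, which is consistent with the larger relative errors reported below for n = 10.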

@vishwakftw left a comment

Looks good to me!! @fritzo what do you think?

@vishwakftw commented

Also, are you going to try the large n approximation using Stirling and Taylor expansions? @alicanb

@alicanb (Collaborator, Author) commented Jun 27, 2018

@vishwakftw btw I tried it with 0.01 precision as well. Two things on my wishlist:

  • a large-n approximation for _Elnchoosek
  • KL(Bin(N,p)|Bin(M,p)) where M > N. Although we can calculate this expensively, making it work for batches is hard... Maybe it isn't worth the effort.

@vishwakftw commented

@alicanb I have a closed form solution for E[log x!], E[log (n - x)!] and E[log n!] (this is simply log n!) for large n.

@alicanb (Collaborator, Author) commented Jun 27, 2018

Great, have you experimented with any large n? n=30 doesn't seem large enough for KL(Bin|Geom) for me at 0.1 precision.

@vishwakftw commented

This is the gist for the approximations.

I ran some tests: n = {10, 20, 50, 75, 100} and p = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
Max relative error: 0.198 (n = 10, p = 0.9) and min relative error: 0.00025 (n = 100, p = 0.1). This is for E[log(n - x)!]

@alicanb (Collaborator, Author) commented Jun 27, 2018

btw lgamma(n * (1-p) + 1) + 0.5 * polygamma(1, n * (1-p) + 1) * n * p * (1-p) is a pretty good approximation even for small n, but it's non-differentiable since we don't have polygamma(2, x)...
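
Written out as code (a standalone sketch of the formula above, not part of the PR): this is a second-order expansion of E[lgamma(n - X + 1)] around the mean of n - X ~ Bin(n, 1 - p), and its gradient with respect to the parameters involves polygamma(2, .), which is the missing piece mentioned above.

import torch

def elognmkfac_delta(total_count, probs):
    # lgamma(n*(1-p) + 1) + 0.5 * polygamma(1, n*(1-p) + 1) * n*p*(1-p)
    mean = total_count * (1 - probs) + 1
    var = total_count * probs * (1 - probs)
    return torch.lgamma(mean) + 0.5 * torch.polygamma(1, mean) * var

print(elognmkfac_delta(torch.tensor(10.), torch.tensor(0.3)))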
