How does 'srilm gen num ' work?

Werkov edited this page Oct 11, 2011 · 2 revisions

The idea was that it chooses the most probable word for the given context when generating a sentence.

The actual implementation is different. The most important part is LM.cc lines 1097–1117 (LM::generateWord()). The conditional probabilities are laid end to end on the [0, 1] interval, and the word whose subinterval contains a randomly generated number (from [0, 1)) is chosen as the continuation.
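To make the sampling scheme concrete, here is a minimal Python sketch of the idea described above. It is not the SRILM code: the function name `generate_word`, its parameters, and the list-of-pairs input are all assumptions made for illustration.

```python
import random

def generate_word(cond_probs, rng=random.random):
    """Sample one word from (word, probability) pairs.

    Hypothetical helper, not SRILM's API. The probabilities are laid
    end to end on [0, 1]; the word whose subinterval contains the
    random number r is returned.
    """
    r = rng()              # random number from [0, 1)
    cumulative = 0.0
    last = None
    for word, p in cond_probs:
        cumulative += p    # right edge of this word's subinterval
        last = word
        if r < cumulative:
            return word
    # Reached only if rounding leaves the total slightly below r.
    return last
```

With `[("a", 0.5), ("b", 0.3), ("c", 0.2)]` and a random number of 0.6, the scan passes "a" (interval ends at 0.5) and stops at "b" (interval ends at 0.8). Note that the result is a random draw weighted by probability, not the single most probable word.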

I didn't do any calculations (the order of the intervals isn't specified in general), but my guess is that the mean number of tested words is half the size of the vocabulary. (There are two extremes: (i) most probable words first, (ii) most probable words last. Case (i) tests fewer words, whereas (ii) could be very time-consuming.)
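The estimate can be checked with a small sketch. If the word at position i of the scan has probability p_i, the mean number of tested words is the sum of i · p_i. The helper name `expected_tests` and the toy distribution below are assumptions for illustration only, not anything taken from SRILM.

```python
def expected_tests(probs):
    """Mean number of words scanned before the sampled one is found.

    probs lists the word probabilities in scan order; the word at
    position i (1-based) costs i comparisons and is hit with
    probability probs[i - 1].
    """
    return sum(i * p for i, p in enumerate(probs, start=1))

# Toy distribution (hypothetical numbers).
toy = [0.5, 0.3, 0.2]
best_case = expected_tests(sorted(toy, reverse=True))  # most probable first
worst_case = expected_tests(sorted(toy))               # most probable last
```

For the toy distribution this gives about 1.7 tests with the most probable words first and about 2.3 with them last. If the order is unrelated to the probabilities (a uniformly random permutation), each word sits at position (N + 1) / 2 on average, so the mean is (N + 1) / 2 for a vocabulary of N words, which agrees with the "half the vocabulary" guess.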

Conclusion: this is not what we want for a "nice" implementation of getAllPossibilities.