How does 'srilm gen num' work?
The idea was that it chooses the most probable word for the given context when generating a sentence.
The actual implementation is different. In LM.cc, lines 1097–1117 (LM::generateWord()) are the most important. The conditional probabilities are laid end to end on the [0, 1] interval, and the word whose subinterval contains a randomly generated number (from [0, 1)) is chosen as the continuation.
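To make the interval idea concrete, here is a minimal sketch of that kind of sampling. It is not SRILM's code: pickWord, its parameters and the tested counter are hypothetical names chosen for illustration, and the probabilities are assumed to be given as a plain vector that sums to (about) 1.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not taken from LM.cc.
// probs[i] is the conditional probability of word i in the current context.
// r is a random number drawn from [0, 1).
// Walks the cumulative sum until it exceeds r and returns that word's index.
// If tested is non-null, it receives the number of words examined.
std::size_t pickWord(const std::vector<double>& probs, double r,
                     std::size_t* tested = nullptr) {
    assert(!probs.empty());
    double cumulative = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i];  // right end of word i's subinterval
        if (r < cumulative) {
            if (tested) *tested = i + 1;
            return i;
        }
    }
    // Rounding can leave the total slightly below 1; fall back to the last word.
    if (tested) *tested = probs.size();
    return probs.size() - 1;
}
```

With probabilities 0.5, 0.3 and 0.2, any r below 0.5 selects word 0, so a word is chosen in proportion to its probability rather than by being the most probable one.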
I didn't do any calculations (as the order of the intervals isn't specified in general), but I estimate (guess) that the mean number of tested words is half the size of the vocabulary. (There are two extremes: (i) most probable words first and (ii) most probable words last. (i) tests fewer words, whereas (ii) could be very time consuming.)
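The guess can be checked for any concrete ordering. Assuming a plain linear scan over the intervals (an assumption about the scan, not a statement about LM.cc), the word at position i (counting from 1) is reached after i tests, so the mean number of tests is the sum of i times its probability. The helper name expectedTests below is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: expected number of words examined by a linear scan
// when the words are visited in the order given by probs.
// The word at index i is found after i + 1 tests, with probability probs[i].
double expectedTests(const std::vector<double>& probs) {
    double expected = 0.0;
    for (std::size_t i = 0; i < probs.size(); ++i) {
        expected += static_cast<double>(i + 1) * probs[i];
    }
    return expected;
}
```

For a uniform distribution over V words this gives (V + 1) / 2, which matches the "half the vocabulary" guess. For a skewed distribution the order matters: probabilities 0.7, 0.2, 0.1 give about 1.4 tests in that order and about 2.6 in reverse.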
Conclusion: this is not what we want for a "nice" implementation of getAllPossibilities.