Performance improvement 3: ngram profiles #203

Draft: wants to merge 3 commits into base: master


msm-code
Contributor

@msm-code msm-code commented Dec 23, 2022

I'm not super happy with this change, but it gives good results.

This PR adds a heuristic that estimates how many files match a given ngram. We prefer to evaluate the subqueries that are expected to return fewer files first, because there is a chance that we can "fail fast" and return from a subquery without doing all the work.
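A minimal sketch of the ordering idea, not the code from this PR: an AND over several ngrams fetches the posting list with the smallest estimate first and stops as soon as the running intersection is empty. The names `and_query`, `estimate` and `fetch` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Iterable, Optional, Set


def and_query(
    ngrams: Iterable[str],
    estimate: Callable[[str], int],
    fetch: Callable[[str], Set[int]],
) -> Set[int]:
    """Intersect posting lists, cheapest estimate first (hypothetical helper).

    `estimate` returns the expected number of matching files for an ngram,
    `fetch` does the expensive work of loading the actual file ids.
    """
    result: Optional[Set[int]] = None
    for ngram in sorted(ngrams, key=estimate):
        files = fetch(ngram)
        result = files if result is None else result & files
        if not result:
            # Fail fast: nothing can match, so skip the remaining ngrams.
            return set()
    return result if result is not None else set()


# Toy index: ngram -> ids of the files that contain it (made-up data).
toy_index = {"abc": {1, 2, 3, 4}, "bcd": {2, 3}, "xyz": set()}
fetched = []


def toy_fetch(ngram: str) -> Set[int]:
    fetched.append(ngram)  # record which posting lists were actually read
    return toy_index[ngram]


def toy_estimate(ngram: str) -> int:
    return len(toy_index[ngram])


matches = and_query(["abc", "bcd", "xyz"], toy_estimate, toy_fetch)
```

In this toy run the rarest ngram is read first and has no matches, so the two larger posting lists are never fetched. With a real estimate that is only approximate, the ordering can be wrong for some queries, but the result stays correct because only the evaluation order changes.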

The PR introduces a class called NgramProfile that stores this information. It adds 128MB to the memory footprint of each database, but that shouldn't matter too much in the real world (I hope).
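The description does not show how NgramProfile lays out its data, so the following is only a sketch under a stated assumption: one 8-byte counter per possible 3-byte ngram. That layout is a guess; it happens to give 2^24 × 8 bytes = 128 MiB, which matches the figure above, but the real class may work differently. `NgramProfileSketch` and its methods are hypothetical names.

```python
from array import array


class NgramProfileSketch:
    """Hypothetical stand-in for NgramProfile: a flat table of file counts.

    Assumption (not confirmed by the PR): one unsigned 64-bit counter for
    every possible ngram of `ngram_bytes` bytes, indexed by the ngram value.
    """

    def __init__(self, ngram_bytes: int = 3) -> None:
        self.ngram_bytes = ngram_bytes
        # Zero-filled array with 256**ngram_bytes entries of 8 bytes each.
        self.counts = array("Q", bytes(8 * 256 ** ngram_bytes))

    def _index(self, ngram: bytes) -> int:
        if len(ngram) != self.ngram_bytes:
            raise ValueError("ngram has the wrong length")
        return int.from_bytes(ngram, "big")

    def record(self, ngram: bytes, file_count: int) -> None:
        """Store how many files contain `ngram`."""
        self.counts[self._index(ngram)] = file_count

    def estimate(self, ngram: bytes) -> int:
        """Return the stored file count (0 if never recorded)."""
        return self.counts[self._index(ngram)]

    def size_bytes(self) -> int:
        """Memory used by the counter table itself."""
        return len(self.counts) * self.counts.itemsize


# Use 2-byte ngrams here so the example table stays small (512 KiB).
profile = NgramProfileSketch(ngram_bytes=2)
profile.record(b"ab", 5)
```

A flat table like this makes each lookup a single array access, at the cost of paying the full footprint even when most ngrams never occur, which is the trade-off behind the TODO about making the profile optional.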

TODO: maybe we should make this optional?
TODO: do we need a way to regenerate the profile?
TODO: measure the real-world impact (with cold and warm RAM) and assess the results. The change complicates the code significantly, so I think we need at least a 25% speedup to consider merging it.

@msm-code msm-code changed the title from "Fix/performance3 ngram profiles" to "Performance improvement 3: ngram profiles" Dec 23, 2022
WIP on a very promising approach to query planning

Close to a workable solution - ngram profiles

Make it production ready

Fix the comment location
@msm-code msm-code force-pushed the fix/performance3-ngram-profiles branch from b962ce7 to 6fe18ec December 24, 2022 15:40