Performance improvement 3: ngram profiles #203

msm-code · 2022-12-23T01:29:35Z

I'm not super happy with this change, but it gives good results.

What we have here is a heuristic that estimates how many files match a given ngram. We prefer to start with the subqueries that should return fewer files, because there is a chance that we can "fail fast" and return from a subquery without doing all the work.
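To make the idea concrete, here is a minimal sketch of the ordering for an AND over several ngrams. It is not the code from this PR: `and_query`, the `profile` dict and the `fetch` callback are hypothetical names used only for illustration, and the real query planner handles more cases than a flat AND.

```python
from typing import Callable, Dict, Iterable, Optional, Set


def and_query(
    ngrams: Iterable[bytes],
    profile: Dict[bytes, int],
    fetch: Callable[[bytes], Set[int]],
) -> Set[int]:
    """Intersect the file sets of all ngrams, cheapest estimate first.

    `profile` maps an ngram to its estimated number of matching files
    (hypothetical stand-in for the profile described above).
    `fetch` loads the real file set for one ngram (the expensive step).
    """
    # Ngrams missing from the profile are assumed to match no files.
    ordered = sorted(ngrams, key=lambda g: profile.get(g, 0))

    result: Optional[Set[int]] = None
    for ngram in ordered:
        files = fetch(ngram)
        result = set(files) if result is None else result & files
        if not result:
            # Fail fast: the intersection is already empty, so the
            # remaining (larger) file sets never need to be read.
            return set()

    # Convention for this sketch only: an empty query matches nothing.
    return result if result is not None else set()
```

With this ordering, a query containing one ngram that matches no files stops after a single fetch instead of reading every posting list.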

This PR introduces a class called NgramProfile that stores this information. It adds 128MB to the memory footprint of each database, but that shouldn't matter too much in the real world (I hope).
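For a sense of where a number like that can come from, here is a rough sketch of one possible layout. This is an assumption, not the layout confirmed by this PR: it supposes one 8-byte counter for every possible 3-byte ngram, stored in a flat array. `NgramProfileSketch` and `profile_size_bytes` are hypothetical names.

```python
from array import array


def profile_size_bytes(ngram_bytes: int = 3, entry_bytes: int = 8) -> int:
    """Size of a flat table with one entry per possible ngram (assumed layout)."""
    return (256 ** ngram_bytes) * entry_bytes


class NgramProfileSketch:
    """Flat table of per-ngram file counts, indexed by the ngram's value."""

    def __init__(self, ngram_bytes: int = 3) -> None:
        self.ngram_bytes = ngram_bytes
        # One unsigned 64-bit counter per possible ngram, all starting at 0.
        self.counts = array("Q", [0]) * (256 ** ngram_bytes)

    def _index(self, ngram: bytes) -> int:
        if len(ngram) != self.ngram_bytes:
            raise ValueError("ngram has the wrong length")
        # Interpret the ngram bytes as a big-endian integer.
        return int.from_bytes(ngram, "big")

    def record(self, ngram: bytes, file_count: int) -> None:
        self.counts[self._index(ngram)] = file_count

    def estimate(self, ngram: bytes) -> int:
        return self.counts[self._index(ngram)]
```

Under these assumptions `profile_size_bytes()` gives 256^3 * 8 = 134217728 bytes, which is 128 MiB. Lookups are a single array access, which is the appeal of paying for the memory up front.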

TODO: maybe we should make this optional?
TODO: do we need a way to regenerate the profile?
TODO: measure the real-world impact (on cold and warm RAM) and assess the results. It complicates the code significantly, so I think we need at least a 25% speedup to consider merging it.

WIP on a very promising approach to query planning
Close to a workable solution - ngram profiles
Make it production ready
Fix the comment location

msm-code changed the title ~~Fix/performance3 ngram profiles~~ Performance improvement 3: ngram profiles Dec 23, 2022

msm-code added 2 commits December 24, 2022 16:19

Squash the commits:

1462971

WIP on a very promising approach to query planning
Close to a workable solution - ngram profiles
Make it production ready
Fix the comment location

Fix move ctor for database and accidental delete

6fe18ec

msm-code force-pushed the fix/performance3-ngram-profiles branch from b962ce7 to 6fe18ec December 24, 2022 15:40

Fix empty profile

614d9fa
