Add an option to turn off density-weighting
When searching, Pagefind applies a heuristic that often works quite well
to boost pages with a higher density, i.e. a higher number of hits
divided by the number of words on the page. This is called "density
weighting".

In some instances, though, it is desirable to use the number of
hits directly, without dividing by the number of words on the page.

Let's support this via the search option `use_weighting`, which
defaults to `true` to maintain the current behavior.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
dscho committed Jan 3, 2024
1 parent fd70ca9 commit b48ea1c
Showing 5 changed files with 46 additions and 4 deletions.
12 changes: 12 additions & 0 deletions docs/content/docs/api.md
@@ -239,6 +239,18 @@ const search = await pagefind.search("static", {
See [Sorting using the Pagefind JavaScript API](/docs/js-api-sorting/) for more details and functionality.
## Turning off density-weighting
By default, the results' weights are "density-weighted", i.e. each weight is calculated by dividing the number of matches within a page by the page's word count. This density-weighting can be turned off:
{{< diffcode >}}
```js
const search = await pagefind.search("term", {
+ use_weighting: false
});
```
{{< /diffcode >}}
## Re-initializing the search API
In some cases you might need to re-initialize Pagefind. For example, if you dynamically change the language of the page without reloading, Pagefind will need to be re-initialized to reflect this language change.
28 changes: 28 additions & 0 deletions pagefind/features/weighting.feature
@@ -224,3 +224,31 @@ Feature: Word Weighting
Then There should be no logs
# Treat the bal value here as a snapshot — update the expected value as needed
Then The selector "p" should contain "weight:1/bal:82.28572/loc:4"

Scenario: Density weighting can be turned off
Given I have a "public/single-word.html" file with the body:
"""
<p>word</p>
"""
Given I have a "public/three-words.html" file with the body:
"""
<p>I have a word and a word and another word</p>
"""
When I run my program
Then I should see "Running Pagefind" in stdout
When I serve the "public" directory
When I load "/"
When I evaluate:
"""
async function() {
let pagefind = await import("/pagefind/pagefind.js");
let search = await pagefind.search(`word`);
let search2 = await pagefind.search(`word`, { use_weighting: false });
let counts = [search, search2].map(s => s.results.map(r => r.words.length));
document.querySelector('p').innerText = JSON.stringify(counts);
}
"""
Then There should be no logs
# With density weighting, single-word should be the first hit, otherwise three-words
Then The selector "p" should contain "[[1,3],[3,1]]"
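The expected value in the scenario above can be reasoned through with a small model. The following is a minimal sketch, assuming a score of hits divided by word count when weighting is on and plain hits when it is off; the `Page` type and the `score` and `rankedHitCounts` helpers are hypothetical names used only for illustration and are not part of Pagefind.

```typescript
// Hypothetical, simplified model of the scenario above; not Pagefind's real scoring code.
interface Page {
  name: string;
  hits: number;      // occurrences of the search term on the page
  wordCount: number; // total number of words on the page
}

// Score is hits / wordCount when weighting is on, plain hits otherwise.
function score(page: Page, useWeighting: boolean): number {
  return useWeighting ? page.hits / page.wordCount : page.hits;
}

// Returns the hit counts ordered best-first, mirroring `r.words.length` in the test.
function rankedHitCounts(pages: Page[], useWeighting: boolean): number[] {
  return [...pages]
    .sort((a, b) => score(b, useWeighting) - score(a, useWeighting))
    .map((p) => p.hits);
}

const pages: Page[] = [
  { name: "single-word", hits: 1, wordCount: 1 },
  // "I have a word and a word and another word" is ten words with three hits.
  { name: "three-words", hits: 3, wordCount: 10 },
];

const counts = [rankedHitCounts(pages, true), rankedHitCounts(pages, false)];
console.log(JSON.stringify(counts)); // [[1,3],[3,1]]
```

With weighting on, the one-word page scores 1 against 0.3 and ranks first; with weighting off, the three-hit page scores 3 against 1 and takes the lead.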
4 changes: 2 additions & 2 deletions pagefind_web/src/lib.rs
@@ -210,7 +210,7 @@ pub fn filters(ptr: *mut SearchIndex) -> String {
}

#[wasm_bindgen]
-pub fn search(ptr: *mut SearchIndex, query: &str, filter: &str, sort: &str, exact: bool) -> String {
+pub fn search(ptr: *mut SearchIndex, query: &str, filter: &str, sort: &str, exact: bool, weighting: bool) -> String {
let search_index = unsafe { Box::from_raw(ptr) };

if let Some(generator_version) = search_index.generator_version.as_ref() {
@@ -225,7 +225,7 @@ pub fn search(ptr: *mut SearchIndex, query: &str, filter: &str, sort: &str, exac
let (unfiltered_results, mut results) = if exact {
search_index.exact_term(query, filter_set)
} else {
-search_index.search_term(query, filter_set)
+search_index.search_term(query, filter_set, weighting)
};
let unfiltered_total = unfiltered_results.len();
debug!({ format!("Raw total of {} results", unfiltered_total) });
3 changes: 2 additions & 1 deletion pagefind_web/src/search.rs
@@ -175,6 +175,7 @@ impl SearchIndex {
&self,
term: &str,
filter_results: Option<BitSet>,
+weighting: bool,
) -> (Vec<usize>, Vec<PageSearchResult>) {
debug!({
format! {"Searching {:?}", term}
@@ -318,7 +319,7 @@ impl SearchIndex {
.map(|BalancedWordScore { balanced_score, .. }| balanced_score)
.sum::<f32>()
/ 24.0)
-/ page.word_count as f32;
+/ (if weighting { page.word_count as f32 } else { 1.0 });

let search_result = PageSearchResult {
page: page.hash.clone(),
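The arithmetic changed in `search.rs` can be restated outside the Rust code. This is only a sketch in TypeScript: `pageScore` and its parameters are illustrative names, and the balanced scores passed in are made-up values, since the diff does not show how `balanced_score` is computed.

```typescript
// Sketch only: the names below are illustrative, not Pagefind's API.
// Mirrors the arithmetic visible in the diff: sum(balanced scores) / 24 / divisor,
// where the divisor is the page's word count only when weighting is enabled.
function pageScore(
  balancedScores: number[],
  wordCount: number,
  weighting: boolean,
): number {
  const total = balancedScores.reduce((sum, s) => sum + s, 0);
  const divisor = weighting ? wordCount : 1;
  return total / 24 / divisor;
}

// Three matches, each with an assumed balanced score of 24, on a ten-word page.
const exampleScores = [24, 24, 24];
console.log(pageScore(exampleScores, 10, true));  // 0.3
console.log(pageScore(exampleScores, 10, false)); // 3
```

Dividing by 1 when weighting is off keeps the rest of the expression untouched, which is why the change in the diff is a single line.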
3 changes: 2 additions & 1 deletion pagefind_web_js/lib/coupled_search.ts
@@ -390,6 +390,7 @@ class PagefindInstance {
verbose: false,
filters: {},
sort: {},
+use_weighting: true,
...options,
};
const log = (str: string) => { if (options.verbose) console.log(str) };
@@ -443,7 +444,7 @@ class PagefindInstance {
// pointer may have updated from the loadChunk calls
ptr = await this.getPtr();
let searchStart = Date.now();
-let result = this.backend.search(ptr, term, filter_list, sort_list, exact_search) as string;
+let result = this.backend.search(ptr, term, filter_list, sort_list, exact_search, options.use_weighting) as string;
log(`Got the raw search result: ${result}`);
let [unfilteredResultCount, all_results, filters, totalFilters] = result.split(/:([^:]*):(.*)__PF_UNFILTERED_DELIM__(.*)$/);
let filterObj = this.parseFilters(filters);
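The first hunk in `coupled_search.ts` relies on object spread to give `use_weighting` its default: defaults are listed first and the caller's `options` are spread last, so explicit values win. A minimal standalone sketch of that pattern follows; `SearchOptions` and `withDefaults` are hypothetical names, and the real options object has more fields than shown here.

```typescript
// Hypothetical option type for illustration; the real options have more fields.
interface SearchOptions {
  verbose?: boolean;
  use_weighting?: boolean;
}

// Defaults come first and caller options are spread last,
// so any explicitly passed value overrides the default.
function withDefaults(options: SearchOptions = {}) {
  return { verbose: false, use_weighting: true, ...options };
}

console.log(withDefaults().use_weighting);                         // true
console.log(withDefaults({ use_weighting: false }).use_weighting); // false
console.log(withDefaults({ verbose: true }).use_weighting);        // true
```

One property of this pattern worth keeping in mind: a key passed explicitly as `undefined` also overrides the default, because spread copies the key regardless of its value.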
