facet search
stereobooster committed Nov 20, 2023
1 parent faab0c1 commit 1eb8b70
Showing 8 changed files with 83 additions and 100 deletions.
46 changes: 2 additions & 44 deletions notes/README.md
@@ -1,47 +1,5 @@
# Inverted index experiment

## Idea

- different types of indexes:
- Inverted index based on Map
- Inverted index based on TrieMap
- Full-text index based on different implementations, like MiniSearch, Fuse, Flexsearch etc.
- except ids, it can return order of items (relevance)
- except ids, it can return `matches`
- ids from index can be returned as BitSet (preferable) or as array
- indexes can be combined with `union` (`or`), `intersection` (`and`)
- other features:
- `sort` - simple array sort
- `filter` - simple array filter
- pagination - simple array slice
- facets
- categorical facets
- collect all values, count, sort by frequency
- numerical facets
- min, max
- memoize one
- should help with pagination and sorting
- schema to use different types for different indexes: number, string, string[]
- id field
- date fields
- or rather put type in aggregation
- query as object `(and (eq index field) (eq index field) (or ...))`
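
The facet ideas in the notes above (categorical: collect all values, count, sort by frequency; numerical: min and max) can be sketched roughly as follows. This is only an illustration: `categoricalFacet` and `numericalFacet` are hypothetical names that do not appear in the repository, and plain arrays stand in for whatever the index actually stores.

```typescript
// Hypothetical helpers for illustration; not taken from the repository.

// Categorical facet: count each distinct value, then sort by frequency
// (descending), breaking ties alphabetically so the order is stable.
function categoricalFacet(values: string[]): [string, number][] {
  const counts = new Map<string, number>();
  values.forEach((v) => counts.set(v, (counts.get(v) || 0) + 1));
  return Array.from(counts.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Numerical facet: only the range is needed, so a single pass is enough.
function numericalFacet(
  values: number[]
): { min: number; max: number } | undefined {
  if (values.length === 0) return undefined;
  let min = Infinity;
  let max = -Infinity;
  values.forEach((v) => {
    if (v < min) min = v;
    if (v > max) max = v;
  });
  return { min, max };
}
```

With this shape, pagination of facet values is just an array slice over the sorted result, which matches the "simple array slice" idea in the notes.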

## TODO

- benchmark performance
- https://benchmarkjs.com/
- benchmark memory
- https://www.valentinog.com/blog/node-usage/

## data structure

- set of integers (BitSet, Roaring Bitmaps)
- sorted set
- bag of integers
- https://sair.synerise.com/efficient-integer-pairs-hashing/ ?
- sorted bag

## Intro

Just some experiments with inverted index and related subjects.
@@ -59,7 +17,7 @@ Options which I can think of:
- Modern JS
- Dictionary - `Map`
- Set - `Set`
- Relate [missing set operations](https://exploringjs.com/impatient-js/ch_sets.html#missing-set-operations)
- Related [missing set operations](https://exploringjs.com/impatient-js/ch_sets.html#missing-set-operations)
- Less naive approach
- Dictionary - some kind of TrieMap
- https://lucaong.github.io/minisearch/classes/SearchableMap_SearchableMap.SearchableMap.html
@@ -68,7 +26,7 @@ Options which I can think of:
- https://github.com/Sec-ant/trie-map
- https://github.com/mattbierner/hamt
- https://github.com/scttdavs/radix-trie
- Relate https://towardsdatascience.com/the-pruning-radix-trie-a-radix-trie-on-steroids-412807f77abc
- Related https://towardsdatascience.com/the-pruning-radix-trie-a-radix-trie-on-steroids-412807f77abc
- Set
- https://github.com/lemire/FastBitSet.js/
- https://github.com/lemire/TypedFastBitSet.js/
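
As a rough sketch of the "Modern JS" option listed above (dictionary as `Map`, set as `Set`), an inverted index with the missing set operations might look like this. The class and function names are assumptions made for illustration; the repository itself uses bit sets rather than `Set` for ids.

```typescript
// Hypothetical minimal inverted index; names are illustrative only.
type Id = number;

class InvertedIndex<V> {
  // Maps each field value to the set of item ids that contain it.
  private readonly postings = new Map<V, Set<Id>>();

  add(value: V, id: Id): void {
    let ids = this.postings.get(value);
    if (!ids) {
      ids = new Set<Id>();
      this.postings.set(value, ids);
    }
    ids.add(id);
  }

  // Returns an empty set for unknown values.
  get(value: V): Set<Id> {
    return this.postings.get(value) || new Set<Id>();
  }
}

// `and`: iterate over the smaller set and probe the larger one.
function intersection<T>(a: Set<T>, b: Set<T>): Set<T> {
  const small = a.size <= b.size ? a : b;
  const large = small === a ? b : a;
  const out = new Set<T>();
  small.forEach((x) => {
    if (large.has(x)) out.add(x);
  });
  return out;
}

// `or`: copy one set and add everything from the other.
function union<T>(a: Set<T>, b: Set<T>): Set<T> {
  const out = new Set<T>(a);
  b.forEach((x) => out.add(x));
  return out;
}
```

Swapping `Set` for a bit set keeps the same interface while making `intersection` and `union` word-wise operations, which is presumably why the bit set libraries are listed as the less naive option.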
2 changes: 0 additions & 2 deletions packages/demo/README.md
@@ -8,8 +8,6 @@ Data copied from [algolia/datasets](https://github.com/algolia/datasets/tree/mas

## TODO

- [ ] filter out 0 frequency facets
- [ ] for prices I only need `min` and `max`, I don't need to intersect all facets
- [ ] move index to worker?

### Snippeting doesn't work
15 changes: 11 additions & 4 deletions packages/demo/src/search.ts
@@ -63,7 +63,9 @@ const schema = {
// },
price: {
type: "number",
facet: true,
facet: {
showZeroes: true,
},
},
image: {
type: "string",
@@ -77,7 +79,9 @@ const schema = {
},
rating: {
type: "number",
facet: true,
facet: {
showZeroes: true,
},
},
// TODO: sort by popularity by default?
popularity: {
@@ -96,7 +100,10 @@ const schema = {

const data = await fetch("/records.json").then((x) => x.json());
// if the facets client ran in a web worker (i.e. asynchronously), it would need a separate adapter
const index = new Facets({ textIndex: TQuickscoreIndex, schema, idKey: "objectID" }, data);
const index = new Facets(
{ textIndex: TQuickscoreIndex, schema, idKey: "objectID" },
data
);
const searchClient = getSearchClient(index);
const search = instantsearch({
searchClient,
@@ -105,7 +112,7 @@ const search = instantsearch({
insights: false,
future: {
preserveSharedStateOnUnmount: true,
}
},
});

search.addWidgets([
17 changes: 11 additions & 6 deletions packages/facets-instantsearch/src/adaptRequest.ts
@@ -5,6 +5,16 @@ export function adaptRequest<S extends Schema>(
request: MultipleQueriesQuery,
schema: Schema
): SearchOptions<S> {
const highlight =
request.params?.highlightPostTag && request.params?.highlightPreTag
? {
start: request.params.highlightPreTag,
end: request.params.highlightPostTag,
key: "_highlightResult",
subKey: "value",
}
: undefined;

return {
query: request.params?.query,
page: request.params?.page,
@@ -14,12 +24,7 @@ export function adaptRequest<S extends Schema>(
...adaptFacetFilters(request.params?.facetFilters as any, schema),
...adaptNumericFilters(request.params?.numericFilters as any),
} as any,
highlight: {
start: "__ais-highlight__",
end: "__/ais-highlight__",
key: "_highlightResult",
subKey: "value",
},
highlight,
};
}
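
The highlight change in this hunk can be read in isolation as follows: highlight options are built only when the request carries both tags, instead of always using the hard-coded `__ais-highlight__` markers. The sketch below uses trimmed-down, hypothetical types (`HighlightParams`, `HighlightOptions`) and a hypothetical function name (`adaptHighlight`) rather than the real `MultipleQueriesQuery` and `SearchOptions` types.

```typescript
// Hypothetical, simplified types for illustration only.
type HighlightParams = {
  highlightPreTag?: string;
  highlightPostTag?: string;
};

type HighlightOptions = {
  start: string;
  end: string;
  key: string;
  subKey: string;
};

// Returns highlight options only when both tags are present;
// otherwise highlighting is left switched off (undefined).
function adaptHighlight(
  params?: HighlightParams
): HighlightOptions | undefined {
  const start = params ? params.highlightPreTag : undefined;
  const end = params ? params.highlightPostTag : undefined;
  if (!start || !end) return undefined;
  return { start, end, key: "_highlightResult", subKey: "value" };
}
```

A request that sets only one of the two tags gets no highlighting in this sketch, which mirrors the `&&` condition in the diff.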

23 changes: 10 additions & 13 deletions packages/facets-instantsearch/src/index.ts
@@ -23,7 +23,7 @@ export function getSearchClient<S extends Schema, I extends Item<S>>(
adaptResponse(
index.search(adaptRequest(request, index.config().schema)),
request.params?.query || "",
index.config().idKey,
index.config().idKey
)
),
}) as any,
@@ -40,23 +40,20 @@ export function getSearchClient<S extends Schema, I extends Item<S>>(
exhaustiveFacetsCount: true,
facetHits: index
.facet(
querie.params.facetName,
adaptRequest(
{ ...(querie.params as any), perPage: -1 },
index.config().schema
)
{
field: querie.params.facetName,
query: querie.params.facetQuery,
perPage:
querie.params.maxFacetHits || querie.params.maxValuesPerFacet,
},
adaptRequest(querie.params as any, index.config().schema)
)
.items.filter(
([x]) =>
x &&
(x as string).toLowerCase().startsWith(querie.params.facetQuery)
)
.map(
.items.map(
([value, count]) =>
({
value,
highlighted: value,
count,
highlighted: value,
} as FacetHit)
),
}))
18 changes: 9 additions & 9 deletions packages/facets/README.md
@@ -24,14 +24,13 @@ The demo works, as you can see, but beyond that, there was no exhaustive testing

## TODO

- default values for facet by type
- number
- sort by value asc
- `showZeroes=false`
- [initialization](/notes/initialization.md)
- facet request
- InstantSearch sometimes makes requests like this `facets: "price", hitsPerPage: 0, maxValuesPerFacet: 10`
- in this case there is no need to iterate over all facets (`facets: "price"`)
- it needs to respect `maxValuesPerFacet`
- search for facets
- right now the search for facet values is done outside the index; doing it inside would allow fewer intersections
- plus it would open the possibility to use TrieMap for search
- `maxValuesPerFacet`
- `facets: "price"`
- numeric filter
- support `>` (not just `>=`), `<` (not just `<=`)
- support multiple ranges e.g `[{ from, to }, { eq }, { neq }]`
@@ -40,5 +39,6 @@ The demo works, as you can see, but beyond that, there was no exhaustive testing
- warn if people try to use text search without providing text index
- [benchmarks](https://github.com/tinylibs/tinybench)
- performance seems to be good (except numeric range filter), but in order to be sure we need to do benchmark
- I interested if using TrieMap would allow to save memory

- I'm curious if using TrieMap would allow to save memory
- benchmark memory
- https://www.valentinog.com/blog/node-usage/
54 changes: 36 additions & 18 deletions packages/facets/src/Facets.ts
@@ -155,6 +155,12 @@ export type SearchResults<S extends Schema, I extends Item<S>> = {
facets: FacetResults<S>;
};

export type FacetOptions<S extends Schema> = {
field: keyof S;
query?: string;
perPage?: number;
};

export class Facets<S extends Schema, I extends Item<S>> {
#config: FacetsConfig<S>;
// @ts-expect-error it is assigned later
@@ -245,14 +251,17 @@ export class Facets<S extends Schema, I extends Item<S>> {
};
}

facet(filed: string, options?: SearchOptions<S>): FacetResult {
facet(
{ field, perPage, query }: FacetOptions<S>,
options?: SearchOptions<S>
): FacetResult {
const { idsByText, facetFilterInternal } = this.#searchAll(options);
return this.#getFacet(
filed,
field as string,
facetFilterInternal,
idsByText,
options?.page,
options?.perPage
query,
perPage
);
}

@@ -372,16 +381,16 @@ export class Facets<S extends Schema, I extends Item<S>> {
field: string,
facetFilter: FacetFilterInternal | undefined,
textSearch: SparseTypedFastBitSet | undefined,
page?: number,
facetQuery?: string,
perPage?: number
) {
const ff = this.#getFullFacets();

page = page || 0;
const page = 0;
// @ts-expect-error fix later
perPage = perPage || this.#config.schema[field].facet?.perPage || 20;
// @ts-expect-error fix later
const showZeroes = this.#config.schema[field].facet?.showZeroes || true;
const showZeroes = this.#config.schema[field].facet?.showZeroes || false;
const selectedFirst =
// @ts-expect-error fix later
this.#config.schema[field].facet?.selectedFirst || false;
@@ -398,19 +407,28 @@ export class Facets<S extends Schema, I extends Item<S>> {
resultSet = textSearch;
}

let newFacet = resultSet
? ff[field].map(([x, , z]) => {
const zz = z.new_intersection(resultSet!);
return [x, zz.size(), zz] as [
SupportedFieldTypes,
number,
SparseTypedFastBitSet
];
})
: ff[field];
let newFacet = ff[field];

if (this.#config.schema[field].type === "string" && facetQuery) {
// this can be filtered with TrieMap
newFacet = newFacet.filter(([x]) =>
(x as string).toLowerCase().startsWith(facetQuery)
);
}

if (resultSet) {
newFacet = newFacet.map(([x, , z]) => {
const zz = z.new_intersection(resultSet!);
return [x, zz.size(), zz] as [
SupportedFieldTypes,
number,
SparseTypedFastBitSet
];
});
}

if (resultSet && !showZeroes) {
newFacet = newFacet.filter(([x, y]) => y !== 0 || selected.includes(x));
newFacet = newFacet.filter(([x, y]) => y !== 0 || selected?.includes(x));
}

const sortConfig = this.#sortConfigForFacet(field);
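
The reordered steps in this hunk (filter facet values by prefix first, intersect only the survivors, then drop zero-count values unless they are selected) can be sketched on their own as below. This is a simplified illustration under stated assumptions: `filterFacet` and `FacetEntry` are hypothetical names, a plain `Set<number>` stands in for `SparseTypedFastBitSet`, sorting is left out, and, unlike the diff, the query is lowercased on both sides before comparison.

```typescript
// Hypothetical stand-in: [value, count, ids], with Set<number> instead of a bit set.
type FacetEntry = [string, number, Set<number>];

function filterFacet(
  entries: FacetEntry[],
  resultSet: Set<number> | undefined,
  opts: {
    query?: string;
    showZeroes?: boolean;
    selected?: string[];
    perPage?: number;
  } = {}
): FacetEntry[] {
  const { query, showZeroes = false, selected = [], perPage = 20 } = opts;
  let out = entries;

  // 1. Prefix filter first, so fewer values need an intersection later.
  if (query) {
    const q = query.toLowerCase(); // assumption: normalize both sides
    out = out.filter(([value]) => value.toLowerCase().startsWith(q));
  }

  if (resultSet) {
    const rs = resultSet;
    // 2. Recount each remaining value against the current result set.
    out = out.map(([value, , ids]): FacetEntry => {
      const hit = new Set<number>();
      ids.forEach((id) => {
        if (rs.has(id)) hit.add(id);
      });
      return [value, hit.size, hit];
    });
    // 3. Hide empty values unless the user has them selected.
    if (!showZeroes) {
      out = out.filter(
        ([value, count]) => count !== 0 || selected.indexOf(value) !== -1
      );
    }
  }

  // 4. Page size only; the page number is fixed at 0 as in the diff.
  return out.slice(0, perPage);
}
```

Doing the prefix filter before the intersection is the point of moving facet search inside the index: values that cannot match the query never pay for a set intersection.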
8 changes: 4 additions & 4 deletions packages/facets/tests/Facets.facet.test.ts
@@ -62,8 +62,8 @@ describe("Facets facets", () => {
]);
expect(result.facets.category.items).toEqual([
["ca", 1],
["cb", 0],
[null, 0],
// ["cb", 0],
// [null, 0],
]);
});

@@ -84,7 +84,7 @@ describe("Facets facets", () => {
const result = t.search({ facetFilter: { category: [null] } });
expect(result.facets.brand.items).toEqual([
["ba", 1],
["bb", 0],
// ["bb", 0],
]);
expect(result.facets.category.items).toEqual([
["ca", 3],
@@ -115,7 +115,7 @@ describe("Facets facets", () => {
expect(result.items.map((x) => x.price)).toEqual([10, 2, 10]);
expect(result.facets.brand.items).toEqual([
["ba", 3],
["bb", 0],
// ["bb", 0],
]);
expect(result.facets.category.items).toEqual([
["ca", 1],