FILTER keyword #129

rogerlucena · 2020-08-07T23:44:02Z

It would be very useful to have a FILTER keyword in BQL that lets us filter out part of a query's results at a level closer to the storage (closer to the driver), improving performance.

Some of the functionality that this FILTER keyword could support:

Filtering only the triples with immutable predicates for our query result.

The syntax for that could be something like:

FILTER isImmutable(?p)

This would come inside or after the WHERE clause to specify that the predicate bound to ?p in our query should be immutable.

Another function, isTemporal, could work similarly.

This could be a solution for what was asked in issue #115.
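As a rough illustration of what isImmutable and isTemporal would decide, here is a minimal Go sketch. The Predicate and Triple types, the sampleTriples helper and the convention that a nil Anchor means "immutable" are all assumptions made for this example, not BadWolf's actual types.

```go
package main

import (
	"fmt"
	"time"
)

// Predicate is a hypothetical, simplified stand-in for a predicate.
// Assumption: a nil Anchor marks it as immutable, a non-nil one as temporal.
type Predicate struct {
	ID     string
	Anchor *time.Time
}

// Triple is a hypothetical minimal triple used only for this sketch.
type Triple struct {
	Subject   string
	Predicate Predicate
	Object    string
}

func isImmutable(p Predicate) bool { return p.Anchor == nil }

func isTemporal(p Predicate) bool { return p.Anchor != nil }

// filterByPredicate keeps only the triples whose predicate satisfies keep.
func filterByPredicate(ts []Triple, keep func(Predicate) bool) []Triple {
	var out []Triple
	for _, t := range ts {
		if keep(t.Predicate) {
			out = append(out, t)
		}
	}
	return out
}

// sampleTriples returns one immutable and one temporal triple.
func sampleTriples() []Triple {
	met := time.Date(2020, time.August, 7, 0, 0, 0, 0, time.UTC)
	return []Triple{
		{"/u<joe>", Predicate{ID: "parent_of"}, "/u<mary>"},
		{"/u<joe>", Predicate{ID: "met", Anchor: &met}, "/u<peter>"},
	}
}

func main() {
	ts := sampleTriples()
	fmt.Println(len(filterByPredicate(ts, isImmutable)), len(filterByPredicate(ts, isTemporal)))
}
```

The performance gain discussed in this issue would come from the driver applying such a check itself, so that discarded triples never leave the storage layer.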

Filtering only the triples with the latest time anchor for our query result.

It is pretty common to be interested only in the latest triple of a time series. Instead of getting all the triples, sorting them by time anchor in decreasing order, and limiting the result to 1, as one may be doing today, we could use FILTER to do that much more cheaply, with a syntax like:

FILTER latest(?p)

This is a pretty common use case, already highlighted by issues #86 and #85.

It also opens the possibility for supporting the opposite: filtering only the earliest time anchor, as illustrated below.

FILTER earliest(?p)

One could also opt for a syntax that filters directly on the time binding, as in:

FILTER latest(?date)
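To make the cost argument concrete, the sketch below picks the latest (or earliest) anchored triple in a single pass instead of sorting everything and limiting to 1. AnchoredTriple, pick and sampleSeries are hypothetical names for this example, and returning exactly one triple is an assumption about the semantics, which this proposal does not pin down.

```go
package main

import (
	"fmt"
	"time"
)

// AnchoredTriple is hypothetical: a triple reduced to its object and time anchor.
type AnchoredTriple struct {
	Object string
	Anchor time.Time
}

// pick scans ts once and returns the element preferred by better.
// The boolean result is false when ts is empty.
func pick(ts []AnchoredTriple, better func(a, b time.Time) bool) (AnchoredTriple, bool) {
	if len(ts) == 0 {
		return AnchoredTriple{}, false
	}
	best := ts[0]
	for _, t := range ts[1:] {
		if better(t.Anchor, best.Anchor) {
			best = t
		}
	}
	return best, true
}

// latest returns the triple with the most recent anchor.
func latest(ts []AnchoredTriple) (AnchoredTriple, bool) {
	return pick(ts, func(a, b time.Time) bool { return a.After(b) })
}

// earliest returns the triple with the oldest anchor.
func earliest(ts []AnchoredTriple) (AnchoredTriple, bool) {
	return pick(ts, func(a, b time.Time) bool { return a.Before(b) })
}

// sampleSeries returns a small unsorted time series.
func sampleSeries() []AnchoredTriple {
	day := func(d int) time.Time {
		return time.Date(2020, time.January, d, 0, 0, 0, 0, time.UTC)
	}
	return []AnchoredTriple{{"b", day(2)}, {"c", day(3)}, {"a", day(1)}}
}

func main() {
	l, _ := latest(sampleSeries())
	e, _ := earliest(sampleSeries())
	fmt.Println(l.Object, e.Object)
}
```

A single scan is linear in the number of triples and keeps only one candidate in memory, whereas the sort-and-limit approach has to materialize and order the whole series first.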

Allow using regular expressions for matching.

We could use regex for filtering too. The syntax could be something like:

FILTER match(?obj, "ab+"^^type:text)

This also resonates with what was asked in issue #122.
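A minimal sketch of how such a match function could be evaluated in Go with the standard regexp package. The matchFilter name and the use of plain strings for objects are assumptions for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchFilter returns the objects that match pattern.
// The pattern is compiled once and reused for every object.
func matchFilter(objects []string, pattern string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid FILTER pattern %q: %v", pattern, err)
	}
	var out []string
	for _, o := range objects {
		if re.MatchString(o) {
			out = append(out, o)
		}
	}
	return out, nil
}

func main() {
	out, err := matchFilter([]string{"ab", "abbb", "a", "ba", "cab"}, "ab+")
	fmt.Println(out, err)
}
```

Note that MatchString reports a match anywhere in the string, so "cab" matches "ab+" here. Whether FILTER match should be anchored to the whole literal (as with "^ab+$") is a design choice this proposal leaves open.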

Filtering to satisfy comparisons (evaluated as boolean conditions).

For example:

FILTER greaterThan(?obj, "37"^^type:int64)

The functions lowerThan and equal would work analogously.
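The three comparison functions could share one evaluation path, as in the sketch below. The comparisons table and filterInt64 are hypothetical names, and restricting the example to int64 literals is an assumption made to keep it short.

```go
package main

import "fmt"

// comparisons maps a FILTER function name to its int64 comparison.
// Hypothetical table: only the three functions named in this issue are listed.
var comparisons = map[string]func(obj, value int64) bool{
	"greaterThan": func(obj, value int64) bool { return obj > value },
	"lowerThan":   func(obj, value int64) bool { return obj < value },
	"equal":       func(obj, value int64) bool { return obj == value },
}

// filterInt64 keeps the objects for which op(object, value) holds.
func filterInt64(objects []int64, op string, value int64) ([]int64, error) {
	cmp, ok := comparisons[op]
	if !ok {
		return nil, fmt.Errorf("unsupported comparison %q", op)
	}
	var out []int64
	for _, o := range objects {
		if cmp(o, value) {
			out = append(out, o)
		}
	}
	return out, nil
}

func main() {
	out, err := filterInt64([]int64{12, 37, 40, 99}, "greaterThan", 37)
	fmt.Println(out, err)
}
```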

Filtering to satisfy a combination of functions.

For example, one could write something like:

FILTER latest(lowerThan(?date, 2005-01-02T15:04:05.999999999Z07:00))

This would return only the latest element of a time series while also restricting the time interval to before a given date.

Another approach would be to build a dedicated function like:

FILTER latestBeforeUpperBound(?date, 2005-01-02T15:04:05.999999999Z07:00)
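Either syntax could be served by the same single pass: discard anchors at or after the bound, and keep the most recent of the rest. The sketch below assumes anchors and the bound are plain RFC 3339 strings; latestBefore is a hypothetical helper, not an existing BadWolf function.

```go
package main

import (
	"fmt"
	"time"
)

// latestBefore returns the most recent anchor strictly before bound.
// found is false when no anchor falls before the bound.
// Assumption: anchors and bound are RFC 3339 timestamps.
func latestBefore(anchors []string, bound string) (latest string, found bool, err error) {
	limit, err := time.Parse(time.RFC3339, bound)
	if err != nil {
		return "", false, err
	}
	var best time.Time
	for _, raw := range anchors {
		t, err := time.Parse(time.RFC3339, raw)
		if err != nil {
			return "", false, err
		}
		if !t.Before(limit) {
			continue // outside the allowed interval
		}
		if !found || t.After(best) {
			best, latest, found = t, raw, true
		}
	}
	return latest, found, nil
}

func main() {
	anchors := []string{
		"2003-01-01T00:00:00Z",
		"2004-06-01T00:00:00Z",
		"2006-03-01T00:00:00Z",
	}
	fmt.Println(latestBefore(anchors, "2005-01-02T15:04:05Z"))
}
```

Combining both conditions in one scan is what makes a nested form like latest(lowerThan(...)) cheap: the bound prunes candidates before the latest one is selected, with no intermediate result set.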

Others.

Other ideas for filtering functions could be the likes of:

FILTER isToday(?date)

That would compare a binding with a value computed at runtime (the current day in this example).

The above are just some examples. The FILTER keyword could make room for a number of other functionalities in the future, as we discover new ones that could be handy and implement them as functions for filtering (just like the functions isImmutable and latest above).

The idea is for all FILTER functions to have a signature like the one below:
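A minimal sketch of such a runtime comparison, with the current time passed in explicitly so the check stays testable. The isTodayAt name, the RFC 3339 string inputs and the choice to compare calendar days in the time zone of "now" are all assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// isTodayAt reports whether anchor falls on the same calendar day as now,
// evaluated in now's time zone. Both arguments are RFC 3339 timestamps.
func isTodayAt(anchor, now string) (bool, error) {
	n, err := time.Parse(time.RFC3339, now)
	if err != nil {
		return false, err
	}
	a, err := time.Parse(time.RFC3339, anchor)
	if err != nil {
		return false, err
	}
	ny, nm, nd := n.Date()
	ay, am, ad := a.In(n.Location()).Date()
	return ny == ay && nm == am && nd == ad, nil
}

func main() {
	// In a real filter, now would come from time.Now() at query time.
	fmt.Println(isTodayAt("2020-08-07T23:44:02Z", time.Now().UTC().Format(time.RFC3339)))
}
```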

FILTER myFunction(?binding, <value>)

The <value> argument above is optional: some functions do not need it (isImmutable, for example).

This way, adding a new function requires no changes inside the parser or inside lookupOptions (which communicates with the driver and is defined in storage.go). All the FILTER functions should be mapped to three variables there: operation, field (for the binding or its position in the clause) and value.

For other general ideas, one could take inspiration from SPARQL's FILTER keyword.

N.B.: Note how this FILTER keyword differs from HAVING: FILTER would work closer to the storage/driver level to improve query performance while filtering the results, whereas HAVING would work on aggregated data at a higher level, farther from the driver (as when using functions such as sum and count to write your HAVING conditions).
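The operation/field/value mapping could be sketched as below. FilterOptions, requiresValue and newFilterOptions are hypothetical names invented for this example, and the set of registered functions is an assumption; the point is only that a new function becomes one more table entry rather than a parser change.

```go
package main

import (
	"fmt"
	"strings"
)

// FilterOptions is a hypothetical container for the three variables named above.
type FilterOptions struct {
	Operation string // the FILTER function, e.g. "latest"
	Field     string // the binding the function applies to, e.g. "?p"
	Value     string // optional second argument; empty when unused
}

// requiresValue registers the known functions and whether they take a <value>.
// Supporting a new function would mean adding one entry here.
var requiresValue = map[string]bool{
	"isImmutable": false,
	"isTemporal":  false,
	"latest":      false,
	"greaterThan": true,
}

// newFilterOptions validates one FILTER call and maps it to FilterOptions.
func newFilterOptions(fn, binding, value string) (FilterOptions, error) {
	needsValue, known := requiresValue[fn]
	switch {
	case !known:
		return FilterOptions{}, fmt.Errorf("unknown FILTER function %q", fn)
	case !strings.HasPrefix(binding, "?"):
		return FilterOptions{}, fmt.Errorf("%q is not a binding", binding)
	case needsValue && value == "":
		return FilterOptions{}, fmt.Errorf("%s requires a value", fn)
	case !needsValue && value != "":
		return FilterOptions{}, fmt.Errorf("%s does not take a value", fn)
	}
	return FilterOptions{Operation: fn, Field: binding, Value: value}, nil
}

func main() {
	fmt.Println(newFilterOptions("latest", "?p", ""))
	fmt.Println(newFilterOptions("greaterThan", "?obj", `"37"^^type:int64`))
}
```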

rogerlucena · 2020-11-18T22:00:55Z

At the moment, BadWolf already supports the following FILTER functions:

latest: FILTER keyword for latest anchor queries (grammar/lexer/hooks) #149 and FILTER keyword for latest anchor queries (planner/memory) #150;
isImmutable: FILTER isImmutable #153;
isTemporal: FILTER isTemporal #154.

rogerlucena mentioned this issue Sep 21, 2020

FILTER keyword for latest anchor queries #146

Closed

This was referenced Oct 5, 2020

FILTER keyword for latest anchor queries (grammar/lexer/hooks) #149

Merged

FILTER keyword for latest anchor queries (planner/memory) #150

Merged

thiagovas assigned rogerlucena Oct 6, 2020

thiagovas added the feature request label Oct 6, 2020

This was linked to pull requests Oct 6, 2020

FILTER keyword for latest anchor queries (grammar/lexer/hooks) #149

Merged

FILTER keyword for latest anchor queries (planner/memory) #150

Merged

thiagovas closed this as completed in #149 Oct 9, 2020

This was referenced Oct 9, 2020

Allow more elaborate patterns (like specifying suffix/prefix) for bindings inside HAVING clauses #122

Open

FILTER isImmutable #153

Merged

FILTER isTemporal #154

Merged

thiagovas reopened this Oct 16, 2020
