Make use of Jena3 text index for better performance #273

osma · 2015-08-19T07:06:19Z

There are significant changes (implemented by Alexis Miara and myself) in the jena-text module of Jena 3.0.0 / Fuseki 1.3.0 / Fuseki 2.3.0. These include:

- support for storing the language tags of literals and limiting queries to a specific language
- support for storing full literal values in the index and accessing them at query time
- support for deleting obsolete entries from the text index

Together, these enable a new way of using the text index from Skosmos:

- Text queries could, in most cases, be limited to a specific language. This avoids false hits from the text index that would otherwise have to be filtered out in SPARQL, so it should speed up queries, particularly the alphabetical listing for large vocabularies.
- Since the text index can return full literal values, there is less need to work out which literal value actually matched the query (using regular expressions or string matching functions, as is done currently). This should make text-index-related SPARQL queries both simpler and faster.
- The uidField should be enabled so that stale entries are dropped from the text index. Currently the performance of text-index-related queries deteriorates slightly each time the vocabulary data is updated, probably because of stale entries. Cleaning them up should prevent this deterioration.
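A minimal sketch of what a language-limited text query could look like, written as a hypothetical Python helper that only assembles the SPARQL string. It assumes the jena-text property-function form `(?s ?score ?match) text:query (property "query" limit "lang:xx")`, where the `lang:` argument relies on `text:langField` and returning the matched literal relies on `text:storeValues`; the exact argument order and output variables should be checked against the jena-text documentation for the Jena version in use. The names `build_text_query` and `escape_sparql_string` are illustrative only and not part of Skosmos.

```python
# Hypothetical helper (not Skosmos code): builds a SPARQL query that lets the
# jena-text index apply the language restriction via a "lang:xx" argument
# instead of a SPARQL FILTER. Assumes an index configured with text:langField
# (for "lang:xx") and text:storeValues (so the matched literal is returned).

def escape_sparql_string(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_text_query(term: str, lang: str, limit: int = 100) -> str:
    """Return a query for resources whose skos:prefLabel matches `term` in `lang`."""
    lucene_query = escape_sparql_string(term)
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX text: <http://jena.apache.org/text#>
SELECT ?s ?match WHERE {{
  (?s ?score ?match) text:query (skos:prefLabel "{lucene_query}" {limit} "lang:{lang}") .
}}
ORDER BY DESC(?score)"""


if __name__ == "__main__":
    print(build_text_query("cat*", "en"))
```

With this shape the language restriction happens inside Lucene, so no `FILTER(langMatches(lang(?label), "en"))` is needed afterwards, and `?match` already carries the label that matched, which removes the need for regex-based matching in SPARQL.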

Text index related code in JenaTextSparql (and possibly GenericSparql) will need to be heavily rewritten. Luckily the new code should be simpler than the old one and we already have pretty good unit tests for this functionality, so it is easy to verify what works and what doesn't.

Text index configuration needs to be changed to enable the new features, and text indexes must then be rebuilt. Fuseki 1.3.0/2.3.0 will require Java 8 to be installed on servers, development machines and the Travis CI environment (where it should be available, but not used by default).

(Finto project note: this is a way of implementing FINTO-85: Support for very large datasets (Tuki hyvin suurille tietovarannoille))


osma · 2015-11-12T08:54:31Z

Started work on this in the jena3-text-index branch.

Travis tests are not currently working. Travis doesn't seem to provide an environment that would have both PHP and Java8 support. See travis-ci/travis-ci#4750

osma · 2015-11-12T11:30:09Z

Got the Travis tests working again by switching to the old, non-container-based Ubuntu 12.04 environment and installing Oracle Java 8 via the webupd8 repository installer. It's slow and inelegant (every test run downloads 180MB from Oracle) but works for the moment as a stopgap until we can switch to the Trusty environment, after the PHP issues are fixed by Travis.

osma · 2015-12-02T14:38:30Z

Merged the jena3-text-index branch to master. Still needs documentation and possible bugfixes.

osma · 2015-12-07T10:45:45Z

Documented in wiki: InstallFusekiJenaText and Upgrading

osma added the enhancement, performance, "size-large: more than 2 days" and branch labels Aug 19, 2015

osma added this to the Next Tasks milestone Aug 19, 2015

osma mentioned this issue Aug 20, 2015

Duplicate results in alphabetical listing #275

Closed

osma mentioned this issue Sep 22, 2015

Same concept appears multiple times in search results #53

Closed

osma modified the milestones: Next Tasks, 1.4 Sep 22, 2015

osma mentioned this issue Oct 1, 2015

Search should find also terms with diacritics, e.g. 'deja vu' -> 'déjà vu' #313

Closed

osma self-assigned this Nov 12, 2015

osma added the needs documentation label Dec 2, 2015

osma mentioned this issue Dec 2, 2015

Global search broken #370

Closed

osma closed this as completed Dec 7, 2015
