Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add "lexical" and "natural" sort strategies for notation codes #1205

Merged
merged 8 commits into from
Sep 16, 2021

Conversation

osma
Copy link
Member

@osma osma commented Sep 14, 2021

This PR makes it possible to select either "lexical" or "natural" sort strategy for sorting notation codes (see #937 for more background). The sorting is implemented both on the PHP and JS sides. The skosmos:sortByNotation configuration setting, which previously was just a boolean value, can now given a literal value to select the appropriate sort strategy:

skosmos:sortByNotation "false";    # disabled, i.e. sort by labels, not notations
skosmos:sortByNotation "lexical";  # lexical sorting
skosmos:sortByNotation "natural";  # natural sorting
skosmos:sortByNotation "true";     # default sorting i.e. decimal, as it's the most common

TODO:

  • more testing - it seems that sorting by notation is not always working on the JS side because notations are missing for the jsTree concept objects (fixed, the problem was in Finto layout footer.inc)
  • more unit tests to check that the sort order is as expected
  • update wiki documentation (Configuration)

Fixes #937

@osma osma added this to the 2.12 milestone Sep 14, 2021
@osma osma self-assigned this Sep 14, 2021
@codecov
Copy link

codecov bot commented Sep 14, 2021

Codecov Report

Merging #1205 (e53e82b) into master (4b4359d) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #1205      +/-   ##
============================================
- Coverage     68.26%   68.24%   -0.03%     
- Complexity     1609     1613       +4     
============================================
  Files            32       32              
  Lines          3955     3958       +3     
============================================
+ Hits           2700     2701       +1     
- Misses         1255     1257       +2     
Impacted Files Coverage Δ
model/Concept.php 80.95% <100.00%> (ø)
model/ConceptProperty.php 96.42% <100.00%> (-3.58%) ⬇️
model/ConceptPropertyValue.php 86.25% <100.00%> (-0.50%) ⬇️
model/VocabularyConfig.php 93.64% <100.00%> (+0.10%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4b4359d...e53e82b. Read the comment docs.

@@ -33,12 +33,7 @@ public function __construct($model, $vocab, $resource, $prop, $clang = '')

public function __toString()
{
$label = is_string($this->getLabel()) ? $this->getLabel() : $this->getLabel()->getValue();
if ($this->vocab->getConfig()->sortByNotation()) {
$label = ltrim($this->getNotation() . ' ') . $label;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the notation here doesn't seem to have any effect on the sort order so I just removed it.

*/
public function sortByNotation()
public function getSortByNotation(): ?string
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed the method for better consistency with other getters, and redefined its return value.

// Natural sort from: http://stackoverflow.com/a/15479354/3894569
// adapted to return only -1 or 1 using negVsPos function above
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the documentation of the jsTree sort plugin, it expects that the return value of the sort function is either -1 or 1 (anything other than -1 will be interpreted as 1, AFAICT). To make sure that the return values meet those expectations, I added a coercing function.

@osma
Copy link
Member Author

osma commented Sep 15, 2021

it seems that sorting by notation is not always working on the JS side because notations are missing for the jsTree concept objects

Finally found the cause of the missing notation information: the footer.inc customizations in the Finto layout were overriding key JS methods, and they were not in sync with changes that happened in the original Skosmos JS code (that they are based on) a long time ago. Fixed in this Finto-data commit. Now the notation information is available and sorting by notation code works (both for decimal and natural sort order) in the hierarchy tree.

However, it turns out that the natural sort order is a bit different for PHP vs JS. On the PHP side, numbers with leading zeros are always first (PHP uses Martin Pool's algorithm) The current JS implementation of naturalCompare ignores leading zeros. The differences can be seen in this screenshot, where the notation codes 33.03 and 33.04 are placed differently in the hierarchy sidebar (JS) and the concept view (PHP):

image

Not sure if this is a big issue - the example is a bit artificial, because YKL/PLC shouldn't use natural sorting but decimal, where this doesn't happen. And for vocabularies where natural sorting is preferred (i.e. PTVL), leading zeros probably shouldn't be used anyway.

@osma osma marked this pull request as ready for review September 15, 2021 13:59
@osma
Copy link
Member Author

osma commented Sep 15, 2021

I added unit tests that verify the sort order is correct on the PHP side in both "decimal" and "natural" cases. Ready for review.

I will update the wiki documentation after merging.

@osma osma requested a review from kouralex September 15, 2021 14:11
@kouralex
Copy link
Contributor

Looks great!

Three remarks, one that you already pinpointed:

Here (look at the image below) the natural order faces the same problem. I am not completely sure, if this vocabulary (HKLJ/HCLCS) should be using the "decimal" order. If that is the case (which may very well be), there is no problem as you mentioned and explained. (URL for your convenience: https://finto.fi/hklj/fi/page/001)
image

Here, a "real" problem where irregardless of the setting, this always shows up like this: (URL for your convenience: https://finto.fi/hklj/fi/page/691.1)
image

It seems like the sorting is done by the label only. For mapping properties one should probably also sort by its (if vocabulary known to Skosmos) sortByNotation setting. I am not sure if this complicates the issue a bit too much - maybe (I think) it is OK to use the same value.

Here it seems like the search is done using always the "decimal" order, even though one changes the value to "natural". This would mean that no vocabulary may use true "natural" sort strategy in search queries, am I right? The same naturally applies to the typeahead component, too (they use the same implementation under the hood).
image

Copy link
Contributor

@kouralex kouralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already commented in the PR where I had three remarks (of which at least two require some action or explanation).

@kouralex
Copy link
Contributor

kouralex commented Sep 15, 2021

As a fourth item, I also noticed that with "false" value only affects concept info part - it does not affect search nor JS (this may or may not be justified):
image
image

@osma
Copy link
Member Author

osma commented Sep 16, 2021

Thanks for the review @kouralex . You're right, I didn't consider ordering of search results at all in this PR. I will take a closer look at that and the other things you mentioned.

@osma osma changed the title Add "decimal" and "natural" sort strategies for notation codes Add "lexical" and "natural" sort strategies for notation codes Sep 16, 2021
@osma
Copy link
Member Author

osma commented Sep 16, 2021

Renamed the "decimal" strategy to "lexical". It's not really appropriate for decimal numbers (e.g. "110" sorts before "13") but is just based on lexical string sorting. Amended the PR title and OP accordingly.

@osma
Copy link
Member Author

osma commented Sep 16, 2021

Regarding the four comments:

1. Different natural sort orders in PHP and JS

Already discussed above, not a big issue IMHO.

2. Search result order

I looked at the ordering of search results by notation in more detail and opened an issue #1209. But I don't think there's anything worth changing in this area so I set that issue to the Blue Sky milestone.

3. Mapping property order

For mapping properties one should probably also sort by its (if vocabulary known to Skosmos) sortByNotation setting. I am not sure if this complicates the issue a bit too much - maybe (I think) it is OK to use the same value.

I opened a separate issue #1208 about ordering mapping property values. I think that's out of scope for this PR and issue.

4. Setting "false" has no effect on hierarchy

As a fourth item, I also noticed that with "false" value only affects concept info part - it does not affect search nor JS (this may or may not be justified):

You are right, I'm currently investigating this. I think this happens because when sorting by notation is disabled (set to false), the sorting of jsTree nodes (representing concepts) falls back to comparison of node text. But that text includes the notation code, so notation codes will affect the sorting anyway.

@osma
Copy link
Member Author

osma commented Sep 16, 2021

I fixed the JS side sorting in the hierarchy when the setting is false. I already responded to the three other comments above.
Ready for another round of review and (hopefully) merging @kouralex

@osma osma requested a review from kouralex September 16, 2021 10:32
Copy link
Contributor

@kouralex kouralex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to be working now and the issues I pinpointed have been addressed. Great job @osma !

I noticed that the default value is "false" if the property is not set at all, which kind of makes sense, however, the comments in the code imply it could be "lexical" (which is true if the value is "true"!).

Apart from that, ready for merging! 🙂

@osma
Copy link
Member Author

osma commented Sep 16, 2021

I fixed a minor JS issue and clarified the comments about default values. Will merge after tests have passed.

@sonarcloud
Copy link

sonarcloud bot commented Sep 16, 2021

SonarCloud Quality Gate failed.    Quality Gate failed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

No Coverage information No Coverage information
25.6% 25.6% Duplication

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The way concepts with notation codes are sorted should be described in config.ttl
2 participants