Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize the search term #1239

Merged
merged 1 commit into from
Nov 11, 2021
Merged

Normalize the search term #1239

merged 1 commit into from
Nov 11, 2021

Conversation

joelit
Copy link
Contributor

@joelit joelit commented Nov 9, 2021

Fixes #1184

@sonarcloud
Copy link

sonarcloud bot commented Nov 9, 2021

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@codecov
Copy link

codecov bot commented Nov 9, 2021

Codecov Report

Merging #1239 (5bfe646) into master (30ce5d0) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #1239   +/-   ##
=========================================
  Coverage     69.07%   69.08%           
  Complexity     1638     1638           
=========================================
  Files            32       32           
  Lines          4016     4017    +1     
=========================================
+ Hits           2774     2775    +1     
  Misses         1242     1242           
Impacted Files Coverage Δ
model/ConceptSearchParameters.php 84.44% <100.00%> (+0.17%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 30ce5d0...5bfe646. Read the comment docs.

@osma osma added the bug label Nov 10, 2021
@osma osma added this to the 2.13 milestone Nov 10, 2021
@osma
Copy link
Member

osma commented Nov 10, 2021

I tested this and it works as it should. The only issue is that highlighting (with bold text) the matching parts in the autocomplete box is not working when the input string is decomposed (composed on the left, decomposed on the right):

kuva

I tried to fix this by adding a .normalize('NFC') call here:

searchString = $(this).val();

...but it didn't help; I think it would require changes within typeahead.js as it seems to read the search string directly from the text field, so there is no easy opportunity to normalize it. Anyway this is no big deal; there are other similar problems with missing highlights (e.g. in case of accent folding) and these should be pretty rare cases anyway.

I think it's reasonable to place the normalization call in ConceptSearchParameters.getSearchTerm(), as that method also performs other types of search term normalization such as stripping whitespace.

@kinow
Copy link
Collaborator

kinow commented Nov 10, 2021

I think it would require changes within typeahead.js

I think #1182 is also updating typeahead. I can try to test it with the latest version and comment if we have a follow-up issue for that @osma 👍 (I will see if I have some text with surrogate characters or just manually edit a dataset and upload to my test fuseki)

@osma
Copy link
Member

osma commented Nov 10, 2021

will see if I have some text with surrogate characters or just manually edit a dataset and upload to my test fuseki

You don't need very special data for this, just text with non-ASCII characters (e.g. åäöéñ) that are stored as composed Unicode characters (NFC), which is the usual case for RDF data.

The problem was when the user enters a search string which contains decomposed (NFD) characters (typically copied and pasted from some other system - in our case an Aleph ILS). This comment in the original issue shows how to trigger it with Linux command line tools and the KANTO/FINAF data set in Finto.

@joelit
Copy link
Contributor Author

joelit commented Nov 11, 2021

Thanks for spotting this defect, and thanks for the valuable input. :) I'll make a new issue regarding the autocomplete highlighting bug, and call this one an incremental improvement on the situation (and I'll link this discussion there).

@joelit joelit merged commit 15b1674 into master Nov 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Searching with decomposed unicode characters
3 participants