This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Search user does not work with some specific Vietnamese letters #13655

Closed
belakalotay opened this issue Aug 29, 2022 · 10 comments · Fixed by #14464
Labels
A-I18n A-User-Directory O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@belakalotay

Description

If you search for a user whose name contains certain accented letters (such as "á"), the suggestions become empty as soon as you type the next letter after the accented character.

See also the attached video demonstrating the problem.

Steps to reproduce

Please refer to the attached video.

Homeserver

matrix.org

Synapse Version

{"server_version":"1.66.0rc1 (b=matrix-org-hotfixes,ce8f7d118c)","python_version":"3.8.12"}

Installation Method

No response

Platform

app.element.io as webclient, matrix.org as homeserver.

2022-08-29.13-42-55.mp4

Relevant log output

We have also reproduced the problem on our own debug server, the following log lines are the relevant part of the synapse server log:

2022-08-29 14:03:19,983 - synapse.storage.txn - 795 - DEBUG - expire_url_cache_data-763 - [TXN END] {get_url_cache_media_before-17c4} 0.001260 sec
2022-08-29 14:03:19,983 - synapse.rest.media.v1.preview_url_resource - 840 - DEBUG - expire_url_cache_data-763 - No media removed from url preview cache
2022-08-29 14:03:19,990 - synapse.storage.TIME - 602 - DEBUG - sentinel - Total database time: 0.092% {_prune_old_user_ips(2): 0.038%, _update_client_ips_batch(1): 0.031%, get_url_cache_media_before(1): 0.013%}
2022-08-29 14:03:21,645 - synapse.access.http.8008 - 405 - DEBUG - GET-2345 - ::ffff:127.0.0.1 - 8008 - Received request: GET /health
2022-08-29 14:03:21,646 - synapse.access.http.8008 - 450 - DEBUG - GET-2345 - ::ffff:127.0.0.1 - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 2B 200 "GET /health HTTP/1.1" "curl/7.74.0" [0 dbevts]
2022-08-29 14:03:23,330 - synapse.http.site - 533 - WARNING - sentinel - forwarded request lacks an x-forwarded-proto header: assuming https
2022-08-29 14:03:23,330 - synapse.access.http.8008 - 405 - DEBUG - OPTIONS-2346 - 185.150.4.97 - 8008 - Received request: OPTIONS /_matrix/client/r0/user_directory/search
2022-08-29 14:03:23,331 - synapse.access.http.8008 - 450 - DEBUG - OPTIONS-2346 - 185.150.4.97 - 8008 - {None} Processed request: 0.000sec/-0.000sec (0.000sec, 0.000sec) (0.000sec/0.000sec/0) 0B 204 "OPTIONS /_matrix/client/r0/user_directory/search HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" [0 dbevts]
2022-08-29 14:03:23,337 - synapse.http.site - 533 - WARNING - sentinel - forwarded request lacks an x-forwarded-proto header: assuming https
2022-08-29 14:03:23,337 - synapse.access.http.8008 - 405 - DEBUG - POST-2347 - 185.150.4.97 - 8008 - Received request: POST /_matrix/client/r0/user_directory/search
2022-08-29 14:03:23,338 - synapse.storage.txn - 691 - DEBUG - POST-2347 - [TXN START] {search_user_dir-17c5}
2022-08-29 14:03:23,338 - synapse.storage.SQL - 409 - DEBUG - POST-2347 - [SQL] {search_user_dir-17c5} SELECT d.user_id AS user_id, display_name, avatar_url FROM user_directory_search as t INNER JOIN user_directory AS d USING (user_id) WHERE user_id != ? AND vector @@ to_tsquery('simple', ?) ORDER BY (CASE WHEN d.user_id IS NOT NULL THEN 4.0 ELSE 1.0 END) * (CASE WHEN display_name IS NOT NULL THEN 1.2 ELSE 1.0 END) * (CASE WHEN avatar_url IS NOT NULL THEN 1.2 ELSE 1.0 END) * ( 3 * ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) + ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) ) DESC, display_name IS NULL, avatar_url IS NULL LIMIT ?
2022-08-29 14:03:23,339 - synapse.storage.SQL - 417 - DEBUG - POST-2347 - [SQL values] {search_user_dir-17c5} ('@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com', '(Gi:* | Gi)', 'Gi', 'Gi:*', 11)
2022-08-29 14:03:23,341 - synapse.storage.SQL - 438 - DEBUG - POST-2347 - [SQL time] {search_user_dir-17c5} 0.002775 sec
2022-08-29 14:03:23,342 - synapse.storage.txn - 795 - DEBUG - POST-2347 - [TXN END] {search_user_dir-17c5} 0.003450 sec
2022-08-29 14:03:23,343 - synapse.access.http.8008 - 450 - INFO - POST-2347 - 185.150.4.97 - 8008 - {@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com} Processed request: 0.005sec/0.001sec (0.002sec, 0.000sec) (0.000sec/0.003sec/1) 155B 200 "POST /_matrix/client/r0/user_directory/search HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" [0 dbevts]
2022-08-29 14:03:23,414 - synapse.metrics._gc - 118 - DEBUG - sentinel - Collecting gc 0
2022-08-29 14:03:24,654 - synapse.storage.txn - 691 - DEBUG - prune_old_user_ips-1528 - [TXN START] {_prune_old_user_ips-17c6}
2022-08-29 14:03:24,654 - synapse.storage.SQL - 409 - DEBUG - prune_old_user_ips-1528 - [SQL] {_prune_old_user_ips-17c6} DELETE FROM user_ips WHERE last_seen <= ( SELECT COALESCE(MAX(last_seen), -1) FROM ( SELECT last_seen FROM user_ips WHERE last_seen <= ? ORDER BY last_seen ASC LIMIT 5000 ) AS u )
2022-08-29 14:03:24,655 - synapse.storage.SQL - 417 - DEBUG - prune_old_user_ips-1528 - [SQL values] {_prune_old_user_ips-17c6} (1659355404653,)
2022-08-29 14:03:24,655 - synapse.storage.SQL - 438 - DEBUG - prune_old_user_ips-1528 - [SQL time] {_prune_old_user_ips-17c6} 0.000770 sec
2022-08-29 14:03:24,656 - synapse.storage.txn - 795 - DEBUG - prune_old_user_ips-1528 - [TXN END] {_prune_old_user_ips-17c6} 0.001809 sec
2022-08-29 14:03:24,757 - synapse.http.site - 533 - WARNING - sentinel - forwarded request lacks an x-forwarded-proto header: assuming https
2022-08-29 14:03:24,758 - synapse.access.http.8008 - 405 - DEBUG - POST-2348 - 185.150.4.97 - 8008 - Received request: POST /_matrix/client/r0/user_directory/search
2022-08-29 14:03:24,759 - synapse.storage.txn - 691 - DEBUG - POST-2348 - [TXN START] {search_user_dir-17c7}
2022-08-29 14:03:24,759 - synapse.storage.SQL - 409 - DEBUG - POST-2348 - [SQL] {search_user_dir-17c7} SELECT d.user_id AS user_id, display_name, avatar_url FROM user_directory_search as t INNER JOIN user_directory AS d USING (user_id) WHERE user_id != ? AND vector @@ to_tsquery('simple', ?) ORDER BY (CASE WHEN d.user_id IS NOT NULL THEN 4.0 ELSE 1.0 END) * (CASE WHEN display_name IS NOT NULL THEN 1.2 ELSE 1.0 END) * (CASE WHEN avatar_url IS NOT NULL THEN 1.2 ELSE 1.0 END) * ( 3 * ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) + ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) ) DESC, display_name IS NULL, avatar_url IS NULL LIMIT ?
2022-08-29 14:03:24,759 - synapse.storage.SQL - 417 - DEBUG - POST-2348 - [SQL values] {search_user_dir-17c7} ('@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com', '(Gia:* | Gia)', 'Gia', 'Gia:*', 11)
2022-08-29 14:03:24,762 - synapse.storage.SQL - 438 - DEBUG - POST-2348 - [SQL time] {search_user_dir-17c7} 0.002633 sec
2022-08-29 14:03:24,762 - synapse.storage.txn - 795 - DEBUG - POST-2348 - [TXN END] {search_user_dir-17c7} 0.003215 sec
2022-08-29 14:03:24,763 - synapse.access.http.8008 - 450 - INFO - POST-2348 - 185.150.4.97 - 8008 - {@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com} Processed request: 0.005sec/0.000sec (0.002sec, 0.000sec) (0.000sec/0.003sec/1) 155B 200 "POST /_matrix/client/r0/user_directory/search HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" [0 dbevts]
2022-08-29 14:03:24,908 - synapse.handlers.typing - 100 - DEBUG - typing._handle_timeouts-1528 - Checking for typing timeouts
2022-08-29 14:03:24,908 - synapse.handlers.presence - 901 - DEBUG - handle_presence_timeouts-1522 - Handling presence timeouts
2022-08-29 14:03:24,908 - synapse.util.metrics - 163 - DEBUG - handle_presence_timeouts-1522 - Entering block presence_update_states
2022-08-29 14:03:24,909 - synapse.util.metrics - 176 - DEBUG - handle_presence_timeouts-1522 - Exiting block presence_update_states
2022-08-29 14:03:26,325 - synapse.http.site - 533 - WARNING - sentinel - forwarded request lacks an x-forwarded-proto header: assuming https
2022-08-29 14:03:26,326 - synapse.access.http.8008 - 405 - DEBUG - POST-2349 - 185.150.4.97 - 8008 - Received request: POST /_matrix/client/r0/user_directory/search
2022-08-29 14:03:26,327 - synapse.storage.txn - 691 - DEBUG - POST-2349 - [TXN START] {search_user_dir-17c8}
2022-08-29 14:03:26,327 - synapse.storage.SQL - 409 - DEBUG - POST-2349 - [SQL] {search_user_dir-17c8} SELECT d.user_id AS user_id, display_name, avatar_url FROM user_directory_search as t INNER JOIN user_directory AS d USING (user_id) WHERE user_id != ? AND vector @@ to_tsquery('simple', ?) ORDER BY (CASE WHEN d.user_id IS NOT NULL THEN 4.0 ELSE 1.0 END) * (CASE WHEN display_name IS NOT NULL THEN 1.2 ELSE 1.0 END) * (CASE WHEN avatar_url IS NOT NULL THEN 1.2 ELSE 1.0 END) * ( 3 * ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) + ts_rank_cd( '{0.1, 0.1, 0.9, 1.0}', vector, to_tsquery('simple', ?), 8 ) ) DESC, display_name IS NULL, avatar_url IS NULL LIMIT ?
2022-08-29 14:03:26,328 - synapse.storage.SQL - 417 - DEBUG - POST-2349 - [SQL values] {search_user_dir-17c8} ('@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com', '(Gia:* | Gia) & (o:* | o)', 'Gia & o', 'Gia:* & o:*', 11)
2022-08-29 14:03:26,329 - synapse.storage.SQL - 438 - DEBUG - POST-2349 - [SQL time] {search_user_dir-17c8} 0.001021 sec
2022-08-29 14:03:26,329 - synapse.storage.txn - 795 - DEBUG - POST-2349 - [TXN END] {search_user_dir-17c8} 0.001580 sec
2022-08-29 14:03:26,330 - synapse.access.http.8008 - 450 - INFO - POST-2349 - 185.150.4.97 - 8008 - {@2_cb874bcb1f1b5219:anconnect-server-dev107.aarenet.com} Processed request: 0.004sec/0.001sec (0.002sec, 0.000sec) (0.001sec/0.002sec/1) 30B 200 "POST /_matrix/client/r0/user_directory/search HTTP/1.0" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36" [0 dbevts]
2022-08-29 14:03:28,370 - synapse.storage.txn - 691 - DEBUG - _get_stats_for_federation_staging-254 - [TXN START] {_get_stats_for_federation_staging-17c9}
2022-08-29 14:03:28,370 - synapse.storage.SQL - 409 - DEBUG - _get_stats_for_federation_staging-254 - [SQL] {_get_stats_for_federation_staging-17c9} SELECT count(*) FROM federation_inbound_events_staging
2022-08-29 14:03:28,371 - synapse.storage.SQL - 438 - DEBUG - _get_stats_for_federation_staging-254 - [SQL time] {_get_stats_for_federation_staging-17c9} 0.000456 sec

Anything else that would be useful to know?

No response

@DMRobertson DMRobertson added the A-Message-Search Searching messages label Aug 30, 2022
@DMRobertson
Contributor

DMRobertson commented Aug 30, 2022

Possibly related to #3116? Ah no, this is user directory search.

Can you please confirm what database you're using? If postgres, what is the synapse database's locale and encoding? (SHOW lc_ctype; SHOW lc_collate; SHOW server_encoding;)

@DMRobertson DMRobertson added S-Minor Blocks non-critical functionality, workarounds exist. T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-User-Directory X-Needs-Info This issue is blocked awaiting information from the reporter and removed A-Message-Search Searching messages labels Aug 30, 2022
@belakalotay
Author

Hi,

We reported the problem against matrix.org because the behaviour there is the same, but we originally encountered it on our own server, which uses a postgres DB.
The results on our server for your queries are as follows:

postgres=# SHOW lc_ctype;
 lc_ctype
----------
 C
(1 row)

postgres=# SHOW lc_collate;
 lc_collate
------------
 C
(1 row)

postgres=# SHOW server_encoding;
 server_encoding
-----------------
 UTF8
(1 row)

@DMRobertson DMRobertson added O-Occasional Affects or can be seen by some users regularly or most users rarely and removed X-Needs-Info This issue is blocked awaiting information from the reporter labels Aug 30, 2022
@squahtx
Contributor

squahtx commented Aug 30, 2022

It's worth noting that the "á" in the search term is a lowercase a followed by a U+0301 COMBINING ACUTE ACCENT, not a single precomposed character.

_parse_query_postgres notably tries to split the search term into words but gets it wrong:

import re
from typing import Tuple

def _parse_query_postgres(search_term: str) -> Tuple[str, str, str]:
    """Takes a plain unicode string from the user and converts it into a form
    that can be passed to database.
    We use this so that we can add prefix matching, which isn't something
    that is supported by default.
    """
    # Pull out the individual words, discarding any non-word characters.
    results = re.findall(r"([\w\-]+)", search_term, re.UNICODE)
    both = " & ".join("(%s:* | %s)" % (result, result) for result in results)
    exact = " & ".join("%s" % (result,) for result in results)
    prefix = " & ".join("%s:*" % (result,) for result in results)
    return both, exact, prefix

>>> from synapse.storage.databases.main.user_directory import _parse_query_postgres
>>> _parse_query_postgres("Gá")
('(Ga:* | Ga)', 'Ga', 'Ga:*')
>>> _parse_query_postgres("Gáo")
('(Ga:* | Ga) & (o:* | o)', 'Ga & o', 'Ga:* & o:*')

Line 918 probably needs to accept \p{Mark} code points, except that syntax isn't supported by the default re library in Python. The regex package does, or re could be given code point ranges to match.

There are probably other places in the code where \w is used with the same intent too.
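
As a rough illustration of the "code point ranges" idea using only the standard library, the sketch below treats any character whose Unicode general category starts with "M" (Mark) as part of the current word. The helper name `split_words` is hypothetical and is not part of Synapse; this is a minimal sketch, not a proposed patch.

```python
import re
import unicodedata
from typing import List


def split_words(text: str) -> List[str]:
    """Hypothetical helper: split text into runs of word characters,
    hyphens and combining marks (Unicode general category M*)."""
    words: List[str] = []
    current: List[str] = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if ch == "-" or is_mark or re.match(r"\w", ch):
            current.append(ch)
        elif current:
            # A non-word character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words


decomposed = "Ga\u0301o"  # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# The existing pattern splits at the combining mark.
print(re.findall(r"[\w\-]+", decomposed))  # ['Ga', 'o']

# The sketch keeps the mark attached to the preceding letter.
print(split_words(decomposed))
```

Note that this only addresses combining marks; it does nothing for languages written without spaces between words.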

@MurzNN

MurzNN commented Sep 2, 2022

We have a similar problem with all Russian characters too: case-insensitive search does not work. There is an issue about this, #3116, so I guess the solution could be the same.

@squahtx squahtx added the A-I18n label Sep 6, 2022
@squahtx
Contributor

squahtx commented Sep 7, 2022

Line 918 probably needs to accept \p{Mark} code points, except that syntax isn't supported by the default re library in Python. The regex package does, or re could be given code point ranges to match.

There are probably other places in the code where \w is used with the same intent too.

To expand on this: the quick and dirty proposal is to use something like regex.findall(r"\b\w.*?\b", search_term, regex.WORD) instead of re.findall(r"([\w\-]+)", search_term, re.UNICODE) to identify whole words.

This will fix exact matches not working, but will not resolve #1523, where Gao or Gáo (a with acute accent) will not match Gáo (a followed by combining acute accent). The latter may or may not already work depending on what postgres does.
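
To make the distinction between the two spellings concrete, here is a small standard-library sketch. It shows that the precomposed and decomposed forms are different strings, that NFC normalization maps one onto the other, and one common way to fold accents away. The helper names `to_nfc` and `strip_accents` are hypothetical; nothing here is claimed to be what Synapse or postgres does, and any such normalization would have to be applied identically to the indexed text and to the search term.

```python
import re
import unicodedata


def to_nfc(text: str) -> str:
    """Hypothetical helper: compose base letters with their combining marks."""
    return unicodedata.normalize("NFC", text)


def strip_accents(text: str) -> str:
    """Hypothetical helper: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


decomposed = "Ga\u0301o"   # 'a' + U+0301 COMBINING ACUTE ACCENT
precomposed = "G\u00e1o"   # U+00E1 LATIN SMALL LETTER A WITH ACUTE

# The two spellings render the same but compare unequal.
assert decomposed != precomposed

# After NFC normalization they are identical, and \w matches the whole word.
assert to_nfc(decomposed) == precomposed
assert re.findall(r"[\w\-]+", to_nfc(decomposed)) == [precomposed]

# Accent folding maps both spellings to the unaccented form.
assert strip_accents(decomposed) == strip_accents(precomposed) == "Gao"
```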


Note that this still performs poorly. There are languages whose words consist of a variable number of \w code points that do not have spaces between them. A much better solution is to integrate libicu's word boundaries (https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary), which is what chromium supposedly uses. In the comparison below, it can be seen that only icu does something vaguely reasonable for Japanese.

Test code to compare re, regex and icu
#!/usr/bin/env python3

import re
import regex
import icu

test_cases = [
    "It's a nice day outside.",
    "Received foo.png!",
    "Gáo",
    "C++20",
    "3.14159. 3.",
    "あなたはそれを行うべきではありません",
]

for text in test_cases:
    re1_output = re.findall(r"([\w\-]+)", text, re.UNICODE)
    re2_output = re.findall(r"\b\w.*?\b", text, re.UNICODE)
    regex_output = regex.findall(r"\b\w.*?\b", text, regex.WORD)
    icu_output = []
    breaker = icu.BreakIterator.createWordInstance(icu.Locale.getDefault())
    breaker.setText(text)
    i = 0
    while True:
        j = breaker.nextBoundary()
        if j < 0:
            break
        icu_output.append(text[i:j])
        i = j

    print(f"Text: {text!r}")
    print(f"    re.findall(r\"([\\w\\-]+)\"):    {re1_output!r}")
    print(f"    re.findall(r\"\\b\\w.*?\\b\"):    {re2_output!r}")
    print(f"    regex.findall(r\"\\b\\w.*?\\b\"): {regex_output!r}")
    print(f"    icu:                         {icu_output!r}")

Output:
Text: "It's a nice day outside."
    re.findall(r"([\w\-]+)"):    ['It', 's', 'a', 'nice', 'day', 'outside']
    re.findall(r"\b\w.*?\b"):    ['It', 's', 'a', 'nice', 'day', 'outside']
    regex.findall(r"\b\w.*?\b"): ["It's", 'a', 'nice', 'day', 'outside']
    icu:                         ["It's", ' ', 'a', ' ', 'nice', ' ', 'day', ' ', 'outside', '.']
Text: 'Received foo.png!'
    re.findall(r"([\w\-]+)"):    ['Received', 'foo', 'png']
    re.findall(r"\b\w.*?\b"):    ['Received', 'foo', 'png']
    regex.findall(r"\b\w.*?\b"): ['Received', 'foo.png']
    icu:                         ['Received', ' ', 'foo.png', '!']
Text: 'Gáo'
    re.findall(r"([\w\-]+)"):    ['Ga', 'o']
    re.findall(r"\b\w.*?\b"):    ['Ga', 'o']
    regex.findall(r"\b\w.*?\b"): ['Gáo']
    icu:                         ['Gáo']
Text: 'C++20'
    re.findall(r"([\w\-]+)"):    ['C', '20']
    re.findall(r"\b\w.*?\b"):    ['C', '20']
    regex.findall(r"\b\w.*?\b"): ['C', '20']
    icu:                         ['C', '+', '+', '20']
Text: '3.14159. 3.'
    re.findall(r"([\w\-]+)"):    ['3', '14159', '3']
    re.findall(r"\b\w.*?\b"):    ['3', '14159', '3']
    regex.findall(r"\b\w.*?\b"): ['3.14159', '3']
    icu:                         ['3.14159', '.', ' ', '3', '.']
Text: 'あなたはそれを行うべきではありません'
    re.findall(r"([\w\-]+)"):    ['あなたはそれを行うべきではありません']
    re.findall(r"\b\w.*?\b"):    ['あなたはそれを行うべきではありません']
    regex.findall(r"\b\w.*?\b"): ['あ', 'な', 'た', 'は', 'そ', 'れ', 'を', '行', 'う', 'べ', 'き', 'で', 'は', 'あ', 'り', 'ま', 'せ', 'ん']
    icu:                         ['あなた', 'は', 'それ', 'を', '行う', 'べ', 'き', 'では', 'ありま', 'せん']

This issue only concerns word boundaries, and not any sort of normalization, stemming or case/accent folding for searching. And we will have to ensure that the words in the postgres index are the same as the words we search for if/when we change the logic.
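
One way to keep the two sides consistent is to route both the indexed text and the search term through a single tokenizer function. The sketch below is only an illustration under that assumption: `build_tsqueries`, `index_text` and `default_tokenizer` are hypothetical names, the default tokenizer simply reuses the existing regular expression, and a different word splitter could be passed in its place.

```python
import re
from typing import Callable, List, Tuple

Tokenizer = Callable[[str], List[str]]


def default_tokenizer(text: str) -> List[str]:
    """Stand-in tokenizer using the pattern quoted earlier in this thread."""
    return re.findall(r"[\w\-]+", text)


def build_tsqueries(
    search_term: str, tokenize: Tokenizer = default_tokenizer
) -> Tuple[str, str, str]:
    """Hypothetical helper: build the (both, exact, prefix) query strings
    from whatever words the tokenizer returns."""
    words = tokenize(search_term)
    both = " & ".join(f"({w}:* | {w})" for w in words)
    exact = " & ".join(words)
    prefix = " & ".join(f"{w}:*" for w in words)
    return both, exact, prefix


def index_text(display_name: str, tokenize: Tokenizer = default_tokenizer) -> str:
    """Hypothetical helper: produce the space-separated words to be indexed,
    using the same tokenizer as the query side."""
    return " ".join(tokenize(display_name))


# Same shape as the values seen in the reporter's log for a term split in two.
print(build_tsqueries("Gia o"))
# ('(Gia:* | Gia) & (o:* | o)', 'Gia & o', 'Gia:* & o:*')
```

Because both helpers take the tokenizer as a parameter, swapping the word splitter changes indexing and searching together rather than only one of them.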

@reivilibre's view is that it would be best if we can find a way to have postgres or some library handle all this for us.

@DMRobertson
Contributor

@reivilibre's view is that it would be best if we can find a way to have postgres or some library handle all this for us.

Or even some external full-text search database. Lucene or something that uses it?

@belakalotay
Author

Line 918 probably needs to accept \p{Mark} code points, except that syntax isn't supported by the default re library in Python. The regex package does, or re could be given code point ranges to match.
There are probably other places in the code where \w is used with the same intent too.

To expand on this: the quick and dirty proposal is to use something like regex.findall(r"\b\w.*?\b", search_term, regex.WORD) instead of re.findall(r"([\w\-]+)", search_term, re.UNICODE) to identify whole words.

This will fix exact matches not working, but will not resolve #1523, where Gao or Gáo (a with acute accent) will not match Gáo (a followed by combining acute accent). The latter may or may not already work depending on what postgres does.

Note that this still performs poorly. There are languages whose words consist of a variable number of \w code points that do not have spaces between them. A much better solution is to integrate libicu's word boundaries (https://unicode-org.github.io/icu/userguide/boundaryanalysis/#word-boundary), which is what chromium supposedly uses. In the comparison below, it can be seen that only icu does something vaguely reasonable for Japanese.

Test code to compare re, regex and icu

Text: "It's a nice day outside."
    re.findall(r"([\w\-]+)"):    ['It', 's', 'a', 'nice', 'day', 'outside']
    re.findall(r"\b\w.*?\b"):    ['It', 's', 'a', 'nice', 'day', 'outside']
    regex.findall(r"\b\w.*?\b"): ["It's", 'a', 'nice', 'day', 'outside']
    icu:                         ["It's", ' ', 'a', ' ', 'nice', ' ', 'day', ' ', 'outside', '.']
Text: 'Received foo.png!'
    re.findall(r"([\w\-]+)"):    ['Received', 'foo', 'png']
    re.findall(r"\b\w.*?\b"):    ['Received', 'foo', 'png']
    regex.findall(r"\b\w.*?\b"): ['Received', 'foo.png']
    icu:                         ['Received', ' ', 'foo.png', '!']
Text: 'Gáo'
    re.findall(r"([\w\-]+)"):    ['Ga', 'o']
    re.findall(r"\b\w.*?\b"):    ['Ga', 'o']
    regex.findall(r"\b\w.*?\b"): ['Gáo']
    icu:                         ['Gáo']

We have tried this approach, but unfortunately it didn't help.

@chagai95
Contributor

Hey, just wondering, what would the timeline be for integrating such a library or some external full-text search database?

@benparsons
Member

A customer tried to use the fix proposed by @squahtx, but was unsuccessful.

babolivier added a commit that referenced this issue Dec 12, 2022
Fixes #13655

This change uses ICU (International Components for Unicode) to improve boundary detection in user search.

This change also adds a new dependency on libicu-dev and pkg-config for the Debian packages, which are available in all supported distros.