viewer : csw search doesn't return exact title match #760

Closed
fphg opened this issue Sep 8, 2014 · 8 comments

@fphg
Member

fphg commented Sep 8, 2014

Searching on a full title such as "Zones de cantonnement de pêche aux crustacés dans le Finistère" returns no match.

mapfishapp converts this search into word filters, then combines them with && (AND). Some words are excluded from the fr index (de? dans?), so the query fails.

This behavior could be improved by adding 2 filters:

title = fullquery*
OR
alternatetitle = fullquery*
OR
(word filter&&word filter&&word filter)

This way

  • the full-text search still returns the usual occurrences
  • it also returns any metadata record with a matching title or alternate title
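
A minimal Python sketch of the proposed logic, for illustration only. It is not mapfishapp code: the stopword list, the `Record` class and the function names are assumptions, and `fullquery*` is modelled as a simple prefix test.

```python
from dataclasses import dataclass

# Hypothetical stopword list: the real fr index configuration is not shown here.
INDEX_STOPWORDS = {"de", "dans", "aux", "le"}


@dataclass
class Record:
    title: str
    alternate_title: str = ""

    def indexed_words(self):
        # Words kept by the (assumed) index: lowercased, stopwords dropped.
        words = (self.title + " " + self.alternate_title).lower().split()
        return {w for w in words if w not in INDEX_STOPWORDS}


def word_filter_match(record, query):
    """Current behaviour: every query word must be found in the index (AND)."""
    indexed = record.indexed_words()
    return all(word in indexed for word in query.lower().split())


def combined_match(record, query):
    """Proposed behaviour: title prefix OR alternate title prefix OR word AND."""
    q = query.lower()
    return (
        record.title.lower().startswith(q)
        or record.alternate_title.lower().startswith(q)
        or word_filter_match(record, query)
    )


record = Record("Zones de cantonnement de pêche aux crustacés dans le Finistère")
full_title = record.title
```

With these assumptions, `word_filter_match(record, full_title)` fails because "de" is not in the index, while `combined_match(record, full_title)` succeeds through the title filter. A plain word query such as "crustacés Finistère" matches in both cases.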

I will contribute this if it suits you.

@fvanderbiest
Member

Looks very promising !

Thanks for the contribution Fabrice.

@fvanderbiest
Member

You can contribute to this as a bugfix on 14.01 (preferred) or 14.06 branch, or a new feature on master.

@fphg
Member Author

fphg commented Sep 8, 2014

thanks

@fvanderbiest fvanderbiest added this to the 14.12 milestone Sep 8, 2014
@hsquividant
Member

Very promising, certainly, but also very "equinox tide", isn't it?
When will we get sea bass spawning grounds or spiny lobster wrecks as OpenData?

@fphg
Member Author

fphg commented Sep 10, 2014

proposal in 48eb57a
basically :

AND AnyText~*wms*
AND type=dataset||series
AND (
    (Title||AlternateTitle||Identifier) == fullquery
    OR
    (word match && word match && word match)
)

Tests showed improvements and some unexpected effects, but no regression:

  • full id search works, very convenient
  • title search doesn't work for long titles. Index limit?
  • title search works for partial titles even if word search is off (i.e. "crustacés" finds the metadata). It shouldn't, as we are using PropertyIsEqualTo (exact match)

Organisation search (#geOrchestra PSC) didn't work because the words are split. Added _ to address this: #geOrchestra_PSC will return metadata with OrganisationName="geOrchestra PSC"
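
The `_` convention can be sketched as follows. This is an illustrative Python version of the idea, not the code from commit 48eb57a; the function name `parse_query` and the returned dictionary layout are assumptions.

```python
def parse_query(query):
    """Split a search string into organisation filters and plain word filters.

    Tokens starting with '#' are treated as organisation names, and '_' inside
    them stands for a space, so a multi-word name survives whitespace splitting.
    """
    organisations, words = [], []
    for token in query.split():
        if token.startswith("#") and len(token) > 1:
            organisations.append(token[1:].replace("_", " "))
        else:
            words.append(token)
    return {"organisations": organisations, "words": words}


without_underscore = parse_query("#geOrchestra PSC")
with_underscore = parse_query("#geOrchestra_PSC")
```

Without the underscore, "PSC" ends up as a separate word filter and the organisation name is truncated to "geOrchestra". With it, the organisation filter receives the full "geOrchestra PSC".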

If this behavior is OK for you, I will open a PR.

@fvanderbiest
Member

Remaining question for @fxp : "title doesn't work for long titles. index limit ?"

@fxprunayre
Member

"title doesn't work for long titles. index limit ?"

Probably not related to title length but rather to the language configuration defined for the index (see https://github.com/geonetwork/core-geonetwork/blob/develop/web/src/main/webapp/WEB-INF/classes/setup/sql/data/data-db-default.sql#L632). Language detection and an analyzer may be applied, which can cause trouble in such cases, so that an exact match does not work as expected. This really depends on the index language settings.

For your example, "Zones de cantonnement de pêche aux crustacés dans le Finistère", I would suspect an issue related to stopwords. In that case, does "Zones cantonnement pêche crustacés Finistère" return a match?

At indexing time: "Zones de cantonnement de pêche aux crustacés dans le Finistère" > after the analyzer is applied for a record in French, the tokens will be > "Zones" "cantonnement" "pêche" "crustacés" "Finistère"

At search time: the same analyzer should be applied, BUT if the search is detected as English or requested as English, then "de", "dans", ... will be part of the query and it will return no match (if the record was in French).
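
The mismatch between index-time and search-time analysis can be reproduced with a toy analyzer. This is a rough Python sketch, not the actual GeoNetwork/Lucene analyzer: the stopword sets are hypothetical subsets, and lowercasing is an added assumption.

```python
# Hypothetical, heavily reduced stopword sets (assumptions for this sketch).
FRENCH_STOPWORDS = {"de", "aux", "dans", "le"}
ENGLISH_STOPWORDS = {"the", "of", "in", "to"}


def analyze(text, stopwords):
    """Toy analyzer: lowercase, split on whitespace, drop stopwords."""
    return [token for token in text.lower().split() if token not in stopwords]


def matches(indexed_tokens, query_tokens):
    """AND semantics: every query token must exist in the indexed tokens."""
    return all(token in indexed_tokens for token in query_tokens)


title = "Zones de cantonnement de pêche aux crustacés dans le Finistère"

# Record indexed as French: "de", "aux", "dans", "le" are dropped.
indexed = analyze(title, FRENCH_STOPWORDS)

# Same query analyzed as French, then as English.
french_query = analyze(title, FRENCH_STOPWORDS)
english_query = analyze(title, ENGLISH_STOPWORDS)
```

Here `matches(indexed, french_query)` holds, while `matches(indexed, english_query)` does not: the English analysis keeps "de", "aux", "dans" and "le" in the query, and none of them exist in the French index.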

@fvanderbiest
Member

Thanks Fx !
