Add more complex context-aware methods for matching strings to entities #4

Open
rtroncy opened this issue Mar 11, 2019 · 0 comments
rtroncy commented Mar 11, 2019

When we applied string2vocabulary in SILKNOW to match strings representing cities and towns against Geonames, we obtained a lot of bad results.

Example: http://data.silknow.org/production/41481202-0c96-3171-82ca-099088faf425.
The original city mentioned is simply "Saint Etienne", identified by http://www.geonames.org/2980291/. Strangely, string2vocabulary matched it with a much smaller town, "Saint-Étienne-du-Rouvray", identified by http://sws.geonames.org/2980236/. That said, there are about 100 cities in France named "Saint Etienne something".

This shows the limits of pure fuzzy string matching. Should we consider more complex matching techniques, e.g. relying on pre-trained word embeddings? It is possible that "Saint Etienne", taken together with the other contextual words (satin, faille, soie, tissu façonné), would have led to the right city.
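
To make the idea concrete, here is a minimal sketch of context-aware re-ranking. It is not how string2vocabulary works; it only illustrates combining a fuzzy string score with an embedding-based context score. Everything in it is an assumption made for illustration: the `TOY_VECTORS` table stands in for real pre-trained embeddings, the keyword lists attached to each candidate are invented placeholders (not facts about these towns), and the `rank` function and its `alpha` weight are hypothetical names.

```python
from difflib import SequenceMatcher
from math import sqrt

# Toy 3-dimensional vectors, invented for illustration only.
# A real setup would load pre-trained word embeddings instead.
TOY_VECTORS = {
    "satin": (0.9, 0.1, 0.0),
    "faille": (0.8, 0.2, 0.0),
    "soie": (1.0, 0.0, 0.0),
    "textile": (0.9, 0.1, 0.1),
    "port": (0.0, 0.2, 0.9),
}

# Candidate labels mapped to made-up descriptive keywords (placeholders).
CANDIDATES = {
    "Saint-Étienne": ["textile"],
    "Saint-Étienne-du-Rouvray": ["port"],
}


def mean_vector(words):
    """Average the vectors of the known words; None if no word is known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0])))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def string_score(query, label):
    """Plain fuzzy string similarity in [0, 1]."""
    return SequenceMatcher(None, query.lower(), label.lower()).ratio()


def rank(query, context_words, candidates, alpha=0.5):
    """Sort candidates by a weighted mix of string and context similarity."""
    ctx = mean_vector(context_words)
    scored = []
    for label, keywords in candidates.items():
        kw = mean_vector(keywords)
        ctx_score = cosine(ctx, kw) if ctx is not None and kw is not None else 0.0
        score = alpha * string_score(query, label) + (1 - alpha) * ctx_score
        scored.append((score, label))
    return sorted(scored, reverse=True)


ranking = rank("Saint Etienne", ["satin", "faille", "soie"], CANDIDATES)
print([label for _, label in ranking])
```

With `alpha=1.0` this reduces to pure string matching, so the weight gives a simple way to compare both behaviours on the same data. How to obtain a useful context description for each Geonames entry is left open here.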
