Fuzzy match error: by_city_and_state changes the city when it is not ambiguously spelled. #35

Open
kosar opened this issue Mar 23, 2020 · 4 comments

kosar commented Mar 23, 2020

Describe the bug
Searching for a zip code by city (by_city_and_state) returns a zip code for a city whose name is merely close to the one provided. The fuzzy substitution is unnecessary, because the city provided is spelled correctly and is unambiguous.

Searching for city, state 'burien', 'wa' returns a zip code for 'Burlington, WA'.

To Reproduce
Call by_city_and_state with the test case above.

Steps to reproduce the behavior:
import pandas as pd
from uszipcode import SearchEngine
searchObject = SearchEngine(simple_zipcode=True)
strCity='burien'
strState='WA'
res = searchObject.by_city_and_state(strCity, strState, returns=100)
res[0]
SimpleZipcode(zipcode='98233', zipcode_type='Standard', major_city='Burlington', post_office_city='Burlington, WA', common_city_list=['Burlington'], county='Skagit County', state='WA', lat=48.5, lng=-122.4, timezone='Pacific', radius_in_miles=10.0, area_code_list=['360'], population=14871, population_density=439.0, land_area_in_sqmi=33.85, water_area_in_sqmi=0.26, housing_units=5897, occupied_housing_units=5522, median_home_value=232700, median_household_income=52906, bounds_west=-122.444478, bounds_east=-122.285302, bounds_north=48.620048, bounds_south=48.444658)

Expected behavior
The search should match 'burien' exactly, as that is a valid city in WA.

Screenshots
See the code snippet above.

Additional context

I love your work and hope it helps to report this issue. Keep it up.
I am working on a workaround for this issue: search by state, then try to find the match for the city name myself, which defeats some of the purpose of this awesome library. Is there a better pattern or workaround I could consider, if the above is just a side effect of fuzzy matching? Thanks so much.
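
The workaround described above (fetch candidates by state, then match the city name yourself) might look roughly like the sketch below. It is only an illustration, not part of uszipcode: `exact_city_matches` and `_norm` are hypothetical helpers, the stand-in records only mimic the `major_city` and `common_city_list` fields visible in the SimpleZipcode output above, and the "00000" zip code is a placeholder rather than real data.

```python
from types import SimpleNamespace


def _norm(name):
    """Collapse whitespace and ignore case so 'burien' equals ' Burien '."""
    return " ".join(name.split()).casefold()


def exact_city_matches(records, city):
    """Return the records whose city name equals `city` exactly (case-insensitive).

    `records` is any iterable of objects exposing `major_city` and
    `common_city_list`, as the SimpleZipcode objects in this issue do.
    """
    target = _norm(city)
    matches = []
    for rec in records:
        names = [rec.major_city or ""] + list(rec.common_city_list or [])
        if any(_norm(n) == target for n in names):
            matches.append(rec)
    return matches


# Stand-in records for demonstration only; "00000" is a placeholder zip code.
records = [
    SimpleNamespace(zipcode="98233", major_city="Burlington",
                    common_city_list=["Burlington"]),
    SimpleNamespace(zipcode="00000", major_city="Burien",
                    common_city_list=["Burien"]),
]

hits = exact_city_matches(records, "burien")
print([r.zipcode for r in hits])
```

With the real library, `records` would presumably come from a state-level query such as `searchObject.by_state('WA', ...)` with a `returns` value large enough to cover the whole state; that call and its limits are assumptions to check against the uszipcode documentation. If the exact pass finds nothing, falling back to `by_city_and_state` keeps the fuzzy behavior for genuinely misspelled input.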

@MacHu-GWU
Owner

@kosar good catch. I cannot fix it because it depends heavily on the fuzzy match algorithm I am using.

@Yossi

Yossi commented Apr 3, 2023

Perhaps this project could migrate to https://github.com/maxbachmann/RapidFuzz which seems to still be maintained.

see here seatgeek/fuzzywuzzy#318 (comment)

@MacHu-GWU
Owner

@Yossi it says

On Windows the [Visual C++ 2019 redistributable](https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads) is required

This would be too harsh for Windows users. Maybe I can use try ... except ... to let the user choose between fuzzywuzzy and rapidfuzz.
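
A minimal sketch of that try ... except idea, under a few assumptions: it is not how uszipcode is implemented, `similarity` and `best_match` are hypothetical names, and the last-resort fallback to the standard library's `difflib` is an addition that is not proposed in this thread. It relies only on `fuzz.ratio`, which both rapidfuzz and fuzzywuzzy provide on a 0 to 100 scale.

```python
import difflib

# Prefer rapidfuzz, fall back to fuzzywuzzy, and finally to the stdlib.
try:
    from rapidfuzz import fuzz as _fuzz
    BACKEND = "rapidfuzz"
except ImportError:
    try:
        from fuzzywuzzy import fuzz as _fuzz
        BACKEND = "fuzzywuzzy"
    except ImportError:
        _fuzz = None  # neither third-party library is installed
        BACKEND = "difflib"


def similarity(a, b):
    """Case-insensitive similarity score between 0 and 100."""
    a, b = a.strip().lower(), b.strip().lower()
    if _fuzz is not None:
        return float(_fuzz.ratio(a, b))
    return 100.0 * difflib.SequenceMatcher(None, a, b).ratio()


def best_match(query, choices):
    """Return the choice most similar to `query`, or None if there are none."""
    return max(choices, key=lambda c: similarity(query, c), default=None)


print(BACKEND, best_match("burien", ["Burlington", "Burien"]))
```

Scores for near misses can differ slightly between backends, so any cutoff threshold would need to be tuned per backend; an exact match scores 100 with all three.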

@maxbachmann

Actually, that is no longer the whole truth. If the C++ implementation is not available, it falls back to a pure Python implementation similar to fuzzywuzzy (but without behavior differences between the Python and C++ versions).

So while installing the C++ redistributable is still recommended for performance reasons, it is no longer required for the library to work.
