GitHub - Dineshkarthik/FuzzyStringComparison: Clustering of strings using Fuzzy String matching and KMeans Algorithm.

String Clustering:

Clustering of strings using Fuzzy String matching and KMeans Algorithm.

python string_clustering.py json_file_name field_name no_of_clusters

json_file_name: Name of the input JSON file
field_name: Name of the JSON field
no_of_clusters: Number of clusters into which the strings are to be grouped.

* If the input file is in another directory, enter its full path, e.g. D:/FuzzyStringMatch/data/sample_data.json
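The three positional arguments described above could be parsed roughly as follows. This is only a sketch using the standard-library argparse module; the function name build_parser is hypothetical, and the real string_clustering.py may read its arguments differently (for example directly from sys.argv).

```python
import argparse


def build_parser():
    """Build a parser for the three positional arguments (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="string_clustering.py",
        description="Cluster the strings of one JSON field.",
    )
    parser.add_argument("json_file_name", help="name or full path of the input JSON file")
    parser.add_argument("field_name", help="name of the JSON field to cluster")
    # type=int rejects non-numeric cluster counts before any work is done
    parser.add_argument("no_of_clusters", type=int, help="number of clusters")
    return parser


# Parse an explicit argument list instead of sys.argv, for demonstration.
args = build_parser().parse_args(
    ["D:/FuzzyStringcomparision/data/sample_data.json", "field04", "25"]
)
print(args.json_file_name, args.field_name, args.no_of_clusters)
```

Converting no_of_clusters to an integer at parse time means a mistyped value fails early with a usage message.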

CSV or TSV files can also be used; read them with the pandas read_csv function.
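As a rough illustration of the CSV/TSV route, the sketch below reads one column with pandas read_csv. The helper name load_field and the sample column names are assumptions made for this example; they are not taken from string_clustering.py.

```python
import io

import pandas as pd


def load_field(source, field_name, sep=","):
    """Return the non-empty values of one column as a list of strings.

    `source` may be a file path or a file-like object; pass sep="\t" for TSV.
    """
    frame = pd.read_csv(source, sep=sep)
    return frame[field_name].dropna().astype(str).tolist()


# In-memory TSV used here in place of a real file (hypothetical data).
sample_tsv = "field01\tfield04\n1\tAcme Corp\n2\tAcme Corporation\n"
strings = load_field(io.StringIO(sample_tsv), "field04", sep="\t")
print(strings)
```

The resulting list of strings can then be passed to the clustering step in place of the values read from the JSON field.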

python string_clustering.py D:/FuzzyStringcomparision/data/sample_data.json field04 25

This generates an output file named Report.txt in which the strings from the JSON field field04 are grouped into 25 clusters.
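The general idea of combining fuzzy matching with KMeans can be sketched as follows: each string is described by its fuzzy similarity to every other string, and those similarity rows are then clustered. This is not the code from string_clustering.py. As stand-ins, the sketch uses difflib.SequenceMatcher from the standard library for the fuzzy score and a small pure-Python k-means with a deterministic farthest-point start; all function names and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_matrix(strings):
    """Row i holds the similarity (0.0 to 1.0) of strings[i] to every string."""
    lowered = [s.lower() for s in strings]
    return [[SequenceMatcher(None, a, b).ratio() for b in lowered] for a in lowered]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, max_iter=50):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic farthest-point initialisation (a choice made for this sketch).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_strings(strings, k):
    """Group strings into k clusters; returns {label: [strings]}."""
    labels = kmeans(similarity_matrix(strings), k)
    clusters = {}
    for text, label in zip(strings, labels):
        clusters.setdefault(label, []).append(text)
    return clusters


names = ["new york", "new york city", "newyork",
         "los angeles", "los angles", "los angeles ca"]
for label, members in sorted(cluster_strings(names, 2).items()):
    print(label, members)
```

Building the full similarity matrix takes time and memory quadratic in the number of strings, so this approach is best suited to modest input sizes.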
