R project for cluster analysis of data on the dynamics of the COVID-19 epidemic with k-means and hierarchical clustering

dimapihtar/cluster-analysis-of-data-on-the-dynamics-of-the-covid-19-epidemic-with-R

cluster-analysis-of-data-on-the-dynamics-of-the-covid-19-epidemic-with-R

R project for cluster analysis of data on the dynamics of the COVID-19 epidemic with k-means and hierarchical clustering using DTW distances.

Data

Download the data for the countries you need as a CSV file from the link.

Data preparation

Distances

Distances between countries are computed with the Dynamic Time Warping (DTW) approach. DTW suits time series better than classical methods because it can align disease peaks that are shifted in time relative to each other and measure the distance between them, whereas classical methods compare only the corresponding pairs Xi and Yi at the same time index. For epidemic curves that follow a similar course but start at different times, this gives a more meaningful distance between countries.
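
To make the difference concrete, here is a minimal sketch of the textbook DTW recurrence next to a pointwise distance. It is written in Python purely for illustration and is not the project's R code. The function names, the absolute-difference local cost and the two made-up curves are assumptions, and the step pattern used in the project may differ.

```python
def dtw_distance(x, y):
    """Textbook DTW: O(len(x) * len(y)) dynamic programming
    with absolute difference as the local cost (an assumption)."""
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y stays
                cost[i][j - 1],      # y advances, x stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pointwise_distance(x, y):
    """Classical comparison of pairs (Xi, Yi) at the same time index."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Hypothetical epidemic curves: the same peak, shifted by two time steps
country_a = [0, 1, 5, 9, 5, 1, 0, 0, 0]
country_b = [0, 0, 0, 1, 5, 9, 5, 1, 0]

print(dtw_distance(country_a, country_b))        # 0.0: the peaks are aligned
print(pointwise_distance(country_a, country_b))  # 28: the shift is penalised
```

The two curves have identical shape, so DTW reports zero distance while the pointwise comparison treats them as very different.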

Example of DTW distances calculated between two time series (countries): example

Clustering

  • Multidimensional scaling (MDS) + k-means
  • Hierarchical clustering, with the optimal linkage method and the optimal number of clusters chosen using the gap statistic
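
As an illustration of the hierarchical step, the following minimal Python sketch runs agglomerative clustering directly on a precomputed distance matrix, such as one filled with DTW distances. It is not the project's R code: the function name `agglomerative`, the three linkage options and the toy matrix are assumptions, and the gap statistic used to choose the number of clusters is not shown.

```python
def agglomerative(dist, k, linkage="average"):
    """Merge the two closest clusters until k clusters remain.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    Returns one cluster label per item.
    """
    if linkage not in ("single", "complete", "average"):
        raise ValueError("unknown linkage: " + linkage)
    clusters = [[i] for i in range(len(dist))]

    def between(a, b):
        # distance between two clusters under the chosen linkage
        pair = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pair)
        if linkage == "complete":
            return max(pair)
        return sum(pair) / len(pair)

    while len(clusters) > k:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = between(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # q > p, so index p is unaffected

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical distances between four countries: 0 and 1 are close, 2 and 3 are close
toy_dist = [
    [0, 1, 8, 9],
    [1, 0, 7, 8],
    [8, 7, 0, 2],
    [9, 8, 2, 0],
]
print(agglomerative(toy_dist, k=2))  # [0, 0, 1, 1]
```

Changing the `linkage` argument changes how the distance between two clusters is measured, which is why the linkage method is worth tuning.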

Comparison of results

Comparison of the two clustering methods used (MDS + k-means and hclust) using the Rand Index and a contingency table.
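
For reference, both comparison tools can be sketched in a few lines. The Python below is an illustrative sketch rather than the project's R code. The function names and the two label vectors are made up, and it computes the plain (unadjusted) Rand Index.

```python
from collections import Counter
from itertools import combinations


def contingency_table(labels_a, labels_b):
    """Count how many items fall into each (cluster in A, cluster in B) cell."""
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Share of item pairs on which the two clusterings agree:
    both put the pair in the same cluster, or both separate it."""
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label lists of equal length >= 2")
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical cluster labels for five countries from two methods
method_a = [0, 0, 1, 1, 2]
method_b = [1, 1, 0, 0, 0]

print(contingency_table(method_a, method_b))
print(rand_index(method_a, method_b))  # 0.8
```

The index depends only on which items are grouped together, so the two methods do not need to use the same cluster numbering.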

Results Interpretation

Example of hierarchical clustering results for 25 European countries (map created with www.mapchart.net/europe.html): results example
