cluster-analysis-of-data-on-the-dynamics-of-the-covid-19-epidemic-with-R

An R project for cluster analysis of data on the dynamics of the COVID-19 epidemic, using k-means and hierarchical clustering with DTW distances.

Data

The data can be downloaded as a CSV file via the link, selecting the countries you need.

Data preparation

Distances

Distances between countries are computed with the Dynamic Time Warping (DTW) approach. DTW is better suited to comparing time series than classical pointwise methods because it can align disease peaks that are shifted in time relative to each other and measure the distance between the aligned points, rather than between the same-index pairs Xi and Yi as classical methods do. As a result, countries whose epidemic curves have a similar shape but different timing are not penalized for the shift alone.

Example of calculated DTW-distances between two time series (countries): example
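
To illustrate why alignment matters, here is a minimal sketch of DTW in plain Python (the project itself is written in R, and its actual DTW settings are not stated here). The function names, the absolute-difference local cost, the unweighted step pattern, and the two toy series are all assumptions made for this example.

```python
def dtw_distance(x, y):
    """Cumulative DTW cost between two numeric sequences.

    Assumptions for this sketch: local cost is |x_i - y_j| and the
    allowed steps are (i-1, j), (i, j-1), (i-1, j-1), all unweighted.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(x[i - 1] - y[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # x advances, y stays
                cost[i][j - 1],      # y advances, x stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def pointwise_distance(x, y):
    """Classical comparison of same-index pairs (Xi, Yi)."""
    return sum(abs(a - b) for a, b in zip(x, y))


# Two hypothetical series with the same peak, shifted by two time steps.
series_a = [0, 0, 1, 5, 1, 0, 0, 0]
series_b = [0, 0, 0, 0, 1, 5, 1, 0]

print(dtw_distance(series_a, series_b))        # 0.0
print(pointwise_distance(series_a, series_b))  # 12
```

The pointwise comparison treats the shifted peak as a large difference, while DTW warps the time axis so that the peaks coincide.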

Clustering

  • Multidimensional scaling + knn
  • Hierarchical clustering, selecting the optimal linkage method and the optimal number of clusters using the gap statistic
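
As a rough illustration of the hierarchical branch only, the sketch below runs agglomerative clustering on a precomputed distance matrix in plain Python. It is not the project's R code: the function name, the toy distance matrix, and the three linkage options are assumptions, and neither the gap statistic nor the multidimensional scaling step is shown.

```python
def agglomerative(dist, k, linkage="average"):
    """Merge clusters bottom-up until k remain; return one label per item.

    dist is a symmetric matrix (list of lists) of pairwise distances,
    for example DTW distances between countries.
    """
    clusters = [[i] for i in range(len(dist))]

    def cluster_distance(a, b):
        pairs = [dist[i][j] for i in a for j in b]
        if linkage == "single":
            return min(pairs)
        if linkage == "complete":
            return max(pairs)
        if linkage == "average":
            return sum(pairs) / len(pairs)
        raise ValueError("unknown linkage: " + linkage)

    while len(clusters) > k:
        best = None
        # Find the closest pair of clusters under the chosen linkage.
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = cluster_distance(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical distances: items 0 and 1 are close, items 2 and 3 are close.
toy_dist = [
    [0, 1, 9, 10],
    [1, 0, 8, 9],
    [9, 8, 0, 2],
    [10, 9, 2, 0],
]
print(agglomerative(toy_dist, k=2))  # [0, 0, 1, 1]
```

Rerunning with different `linkage` values on real distances is one simple way to see how much the choice of linkage changes the grouping.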

Comparison of results

Comparison between the clustering methods used (MDS + knn and hclust) using the Rand index and a contingency table.
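
For reference, both comparison tools can be sketched in a few lines of plain Python. The function names and the two label vectors are hypothetical, and this computes the unadjusted Rand index; whether the project uses the adjusted variant is not stated.

```python
from collections import Counter
from itertools import combinations


def contingency_table(labels_a, labels_b):
    """Count how many items fall into each (cluster in A, cluster in B) pair."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    return Counter(zip(labels_a, labels_b))


def rand_index(labels_a, labels_b):
    """Share of item pairs on which the two clusterings agree.

    A pair counts as an agreement if both clusterings put the two items
    together, or both put them apart. Label names themselves do not matter.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) < 2:
        raise ValueError("need two label vectors of equal length >= 2")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += same_a == same_b
        total += 1
    return agree / total


# Hypothetical cluster labels for five countries from two methods.
method_one = [0, 0, 1, 1, 2]
method_two = [1, 1, 0, 0, 0]

print(rand_index(method_one, method_two))               # 0.8
print(dict(contingency_table(method_one, method_two)))  # {(0, 1): 2, (1, 0): 2, (2, 0): 1}
```

A value of 1.0 means the two methods produce the same partition up to relabeling; the contingency table shows where the partitions differ.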

Results Interpretation

Example of hierarchical clustering results for 25 European countries (map created with www.mapchart.net/europe.html): results example