liangruibupt/covid_19_report_end2end_analytics

COVID-19: End-To-End Analytics With AWS Glue, Athena And Superset

Introduction

Build a pipeline for an open COVID-19 dataset. The dataset comes from corona-virus-report. It shows the cumulative confirmed, recovered, and death figures by day, country, and province. Latitude and longitude coordinates are also included at the country level. The pipeline performs the ETL of the raw data and visualizes the worldwide spread.

The pipeline works as follows:

  1. Save the raw, messy dataset to the covid-19-raw-data S3 bucket.
  2. Run the AWS Glue Crawler on the covid-19-raw-data S3 bucket to parse the JSON files and create the covid-19-raw-data table in the Glue Data Catalog.
  3. Run the Glue ETL Job on the covid-19-raw-data table to:
  • clean the data
  • save the ETL result as JSON to the covid-19-output-data S3 bucket.
  4. Run the AWS Glue Crawler on the covid-19-output-data S3 bucket to parse the JSON files and create the covid-19-output-data table in the Glue Data Catalog.
  5. Query the covid-19-output-data table in Amazon Athena. Remove duplicates and create the final covid19_app_data_athena table in the Glue Data Catalog.
  6. Connect Apache Superset to the covid19_app_data_athena table and build a visualization dashboard.
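
Steps 2 to 5 above could be orchestrated along the following lines. This is only a minimal sketch, not the repository's own code: the crawler names, the job name, the Glue database, and the Athena results location are all hypothetical placeholders, and the `SELECT DISTINCT` query is just one possible way to remove duplicates. The `glue` and `athena` arguments are assumed to be boto3 clients (or objects exposing the same methods).

```python
import time

# All of these names are hypothetical; replace them with your own resources.
RAW_CRAWLER = "covid-19-raw-data-crawler"
OUTPUT_CRAWLER = "covid-19-output-data-crawler"
ETL_JOB = "covid-19-etl-job"
DATABASE = "covid19"
ATHENA_RESULTS = "s3://my-athena-query-results/"

# Assumption: duplicates are exact row copies, so SELECT DISTINCT is enough.
# Adjust the source table name to whatever the crawler actually registered.
DEDUP_QUERY = (
    "CREATE TABLE covid19_app_data_athena AS "
    'SELECT DISTINCT * FROM "covid-19-output-data"'
)


def run_crawler(glue, name, poll_seconds=30):
    """Start a Glue crawler and wait until it returns to the READY state."""
    glue.start_crawler(Name=name)
    while glue.get_crawler(Name=name)["Crawler"]["State"] != "READY":
        time.sleep(poll_seconds)


def run_job(glue, job_name, poll_seconds=30):
    """Start a Glue ETL job and wait for it to succeed; raise if it fails."""
    run_id = glue.start_job_run(JobName=job_name)["JobRunId"]
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state == "SUCCEEDED":
            return run_id
        if state in ("FAILED", "STOPPED", "TIMEOUT", "ERROR"):
            raise RuntimeError(f"Glue job {job_name} ended in state {state}")
        time.sleep(poll_seconds)


def run_pipeline(glue, athena, poll_seconds=30):
    """Run steps 2-5 in order and return the Athena query execution id."""
    run_crawler(glue, RAW_CRAWLER, poll_seconds)      # step 2: catalog raw data
    run_job(glue, ETL_JOB, poll_seconds)              # step 3: clean the data
    run_crawler(glue, OUTPUT_CRAWLER, poll_seconds)   # step 4: catalog output
    response = athena.start_query_execution(          # step 5: deduplicate
        QueryString=DEDUP_QUERY,
        QueryExecutionContext={"Database": DATABASE},
        ResultConfiguration={"OutputLocation": ATHENA_RESULTS},
    )
    return response["QueryExecutionId"]
```

With boto3 installed and AWS credentials configured, this could be called as `run_pipeline(boto3.client("glue"), boto3.client("athena"))`. Passing the clients in as arguments keeps the sketch easy to try against stub objects before pointing it at a real account.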

COVID-19-Analytics-Pipeline

Detailed step-by-step guide

Reference

  • covid-19-end-to-end-analytics-with-aws-glue-athena-and-quicksight
  • A public data lake for analysis of COVID-19 data
