Skip to content

Automation with web-scrapping applications that allows user to retrieve all pre-defined Mars-related data by just "one-click". Utilized Chrome Driver with Python for automatically scraping multiple websites, retrieving data, and building Mongo DB to store retrieved data.

Notifications You must be signed in to change notification settings

henryle-n/Mission-to-Mars

Repository files navigation

Welcome to The Red Planet ... MARS !

1. Background

For over 30 years since the first human kind's close-up of this planet in 1965, Mars Exploration has never stopped being a top-trending topic. The more we look, the more interesting facts we found about this 4th planet from the Sun, such as "polar ice caps and clouds in its atmosphere, seasonal weather patterns, volcanoes, canyons and other recognizable features" (mars.nasa.gov).

In this project, a web application is built to scrape multiple websites for data related to NASA Mars Exploration Program. All scraped data is stored in a MongoDB table, queried, and displayed on a comprehensive single HTML page.

Click here to see the final page image.

Mars out of range ... Waiting for satellite signal ...


2. Languages, Tools & Techniques

  • Languges:
    • Python 3 | HTML 5 | CSS 3 | Markdown
  • Python Platform & Modules:
    • Flask | Anaconda 3 (Python 3.8) | Spinter (Chromedriver) | Beautiful Soup | Pandas | PyMongo | Jinja
  • HTML Libraries:
    • Bootstrap
  • noSQL Database:
  • Software/ Applications:
    • Jupyter Notebook | Visual Studio Code | Google Chrome v. 84 | Windows Terminal | Git Bash
  • Operating System:
    • Windows 10 Pro v1909

3. Table of Contents

All files are stored in the folder and sub-folder of "Missions_to_Mars"

FOLDER NAME CONTENTS
Chromedriver-v84 Chrome Driver for Google Chrome ver. 84
Jupyter Notebook Mission_to_Mars.ipynb, utilized during developement of scrape_mars.py
static contains style.css, and pictures of HTML background, readme & final website screenshot
templates contains 'index.html', main home page
application.py Flask App to drive the website
get_mars_data.py queries from MongoDB & shows data on main page data
load_mongo_db.py calls for new web-scraping and loading data into MongoDB
scrape_mars.py scraps the data and export new data into new table
requirements.txt all libraries/ modules were used in this project

4. Process Overview

4.1. Process at a Glance

For simplifying codes and easier debugging, the whole process was broken up into smaller processes and thus written in multiple Python Files. The below schematic decribes how all files are working together.

  • Green arrows indicate process when Fetch New Data button is clicked.
  • Orange arrows indicate how the data is retrieved, processed and displayed on the front-end.

4.1. Creating the App

Step 1 - Preparation & Develop Jupyter Notebook

  • Create new environment:
    • conda create -n <name-of-env> python=3.8
    • conda activate <name-of-env>
    • pip install -r requirements.txt
  • Download Chrome Driver:
    • In the address bar type: chrome://version/, look for Google Chrome header indicating current Chrome version.
    • For example: Google Chrome 84.0.4147.105 (Official Build) (64-bit) (cohort: Stable)
  • Download correct version of Chrome Driver, link here.
  • Websites visited for scraping:
WEBSITE WEB ADDRESS
Mars Latest News https://mars.nasa.gov/news
JPL Mars Featured Space Images https://www.jpl.nasa.gov/spaceimages
Mars Weather https://twitter.com/marswxreport
Mars Facts https://space-facts.com/mars
Mars Hemispheres https://astrogeology.usgs.gov/search/results?q=hemisphere+enhanced&k1=target&v1=Mars
  • Capture all scraped data into a dictionary.
  • Exported the Jupyter Notebook file to a Python file named scrap_mars.py.

Step 2 - Build MongoDB Application

File name: load_mongo_db.py

  • Download and install MongoDB.
  • Utilize PyMongo to establish the connection to MongoDB.
    • If there is server connection error, perform these steps:
      • Start -> Administrative Tools -> Services -> MongoDB.
      • Make sure MongoDB Server (MongoDB) Status is Running.
      • Click here for an image demo.
  • Write the dictionary of all data (scrapped from Step 1) into MongoDB.

Step 3 - Build Data Query Application from MongoDB

File name: get_mars_data.py

  • Utilize PyMongo to establish the connection to MongoDB.
  • Query data in MongoDB.
  • Prepare to send data back to Flask App.

Step 4 - Build Flask App & Home Page

File names: application.py & index.HTML

  • Build HTML template as the main webpage, formatted by CSS & Jinja codes embedded.
  • Build different RESTful APIs:
    • Render index.html.
    • Connect with previous Python apps for querying data and post on HTML template by Jinja.

Step 5 - Start-up Procedure

  • For the first time use, launch load_mongo_db.py to get some data for MongoDB and webpage.
  • Open and run application.py.

5. Summary

  • All data was successfully loaded into MongoDB. No significant issue occurs while web-scrapping.
  • Pending on internet connection, several websites took a very long time to load or not loading at all. This caused errors or missing data as the website has not yet done redering & loading. Delay time was added to multiple places in program to allow browser to catch up.
  • Many nested "if-elif-else" were used to patch up issues associated with slow or invalid website to ensure application processed without hiccup. A better way to do it is utilized "try / error" block to lean codes and improve working speed.

About

Automation with web-scrapping applications that allows user to retrieve all pre-defined Mars-related data by just "one-click". Utilized Chrome Driver with Python for automatically scraping multiple websites, retrieving data, and building Mongo DB to store retrieved data.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published