Programming for Data Analysis

PfDA Assignment 1 - README

Introduction - about the Project and Readme file

This repository was created as part of the coursework for the Programming for Data Analysis module in the Higher Diploma in Computer Programming in Data Analytics provided by Atlantic Technological University.

Purpose

This repository was created and edited using Jupyter Notebooks.

This project analyses data from an Airbnb dataset for Dublin, Ireland. It performs statistical analysis on certain variables and carries out similar analysis on a synthesised dataset. To get started, open the notebook at the following link, which contains all the analysis for this project: https://github.com/AndrewShanahan/PFDA_Assignment/blob/main/PfDA_Assignment.ipynb

From the Airbnb dataset, I have identified 5 variables that I have decided to focus on and analyse:

  • host_id (integer)
  • host_listings_count (float)
  • reviews_per_month (float)
  • review_scores_rating (float)
  • price (object) - Please note that during the project there were instances where an object was not the most useful data type. In the interest of time, I amended this to a float rather than performing a task to remove any symbols (e.g. €, $, £).
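
For readers who want to see what the symbol-removal step could look like, here is a minimal sketch using pandas. It illustrates the approach that was not taken in this project. The helper name `price_to_float` and the sample values are assumptions made for the example and do not come from the notebook or the dataset.

```python
import pandas as pd


def price_to_float(prices: pd.Series) -> pd.Series:
    """Convert strings such as '$1,250.00' to floats (hypothetical helper)."""
    # Keep only digits, the decimal point and a minus sign.
    cleaned = prices.astype(str).str.replace(r"[^0-9.\-]", "", regex=True)
    # Anything that still cannot be parsed becomes NaN instead of raising.
    return pd.to_numeric(cleaned, errors="coerce")


# Made-up sample values, not rows from the Airbnb dataset.
sample = pd.Series(["$85.00", "€1,250.00", "£40", "N/A"])
converted = price_to_float(sample)
print(converted)
```

Using `errors="coerce"` means unparseable entries end up as missing values, so they would need to be handled before any statistics are calculated.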

Please note that as the project evolved I also made use of some other variables, for example when plotting graphs and charts and during data synthesis.

The references section below may help users to understand the project.
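
To give a feel for what data synthesis can involve, here is a minimal sketch based on the NumPy random generator functions listed in the references (normal, lognormal and uniform). The choice of distribution for each variable, every parameter value, the sample size and the 0 to 5 rating scale are all assumptions made for illustration. They are not the settings used in the notebook.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is repeatable.
rng = np.random.default_rng(seed=42)
n_rows = 1000  # arbitrary sample size chosen for the example

synthetic = pd.DataFrame({
    # Lognormal gives positive, right-skewed values (assumed shape for price).
    "price": rng.lognormal(mean=4.5, sigma=0.6, size=n_rows),
    # Normal draws clipped to an assumed 0-5 rating scale.
    "review_scores_rating": np.clip(
        rng.normal(loc=4.6, scale=0.4, size=n_rows), 0, 5
    ),
    # Uniform draws between assumed lower and upper bounds.
    "reviews_per_month": rng.uniform(low=0.0, high=10.0, size=n_rows),
})

print(synthetic.describe())
```

Comparing the `describe()` output with the same summary for the real dataset is one simple way to judge how closely the synthesised values resemble the originals.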

System Requirements

Running or modifying the notebook on a local machine requires a recent version of Python. Anaconda is an easy-to-use distribution available for Windows, Mac or Linux operating systems. Alternatively, there are a number of web-based options available, such as Jupyter Notebooks, which was used during this project.

Information on how to install and run Jupyter Notebooks can be found through the following link:
https://docs.jupyter.org/en/latest/install.html
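
As a quick check before opening the notebook, the sketch below reports the Python version and any packages that appear to be missing. The package list is an assumption based on the libraries mentioned in the references, and `missing_packages` is a hypothetical helper, so adjust both to match the imports actually used in the notebook.

```python
import importlib.util
import sys

# Assumed package list; adjust to match the imports in the notebook.
ASSUMED_PACKAGES = ["numpy", "pandas", "matplotlib", "sklearn", "notebook"]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
print("Missing packages:", missing_packages(ASSUMED_PACKAGES) or "none")
```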

Running Jupyter Notebooks

The following link provides information on how to launch Jupyter Notebook from a terminal.
https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/execute.html

References:

Dataset:
[01] Airbnb - http://insideairbnb.com/get-the-data/
Datacamp - numerous courses/tracks completed over the last number of months have supported this exercise

Python/General:
[02] Udemy course - https://www.udemy.com/course/the-modern-python3-bootcamp/learn/lecture/8680110?start=94#overview
[03] Software Freedom Conservancy. Git - https://git-scm.com/.
[04] Datacamp - https://www.datacamp.com/
[05] W3Schools - https://www.w3schools.com/python/default.asp
[06] Stackoverflow - https://stackoverflow.com/
[07] Numpy/Random generator - https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.normal.html#numpy.random.Generator.normal
[08] Plot labels, titles, legend - lecturer video - https://web.microsoftstream.com/video/10974869-e53e-4621-961e-6a6922203374

Data Synthesis:
[09] Lecturer video - https://web.microsoftstream.com/video/84fb76a5-0c81-4ac9-8548-d8a6ed609366
[10] https://www.simplilearn.com/top-python-libraries-for-data-science-article#7_scikitlearn
[11] https://scikit-learn.org/stable/
[12] https://scikit-learn.org/stable/supervised_learning.html#supervised-learning
[13] https://www.freecodecamp.org/news/python-functions-define-and-call-a-function/#:~:text=Basic%20Syntax%20for%20Defining%20a,function%20to%20do%20for%20you.
[14] https://www.projectpro.io/recipes/create-simulated-data-for-classification-in-python
[15] https://www.geeksforgeeks.org/how-to-create-simulated-data-for-classification-in-python/
[16] https://towardsdatascience.com/https-medium-com-faizanahemad-generating-synthetic-classification-data-using-scikit-1590c1632922
[17] https://stackabuse.com/generating-synthetic-data-with-numpy-and-scikit-learn/
[18] Troubleshooting - https://stackoverflow.com/questions/45554008/error-in-python-script-expected-2d-array-got-1d-array-instead
[19] https://stackoverflow.com/questions/22071987/generate-random-array-of-floats-between-a-range
[20] https://www.w3schools.com/python/ref_random_uniform.asp
[21] https://en.wikipedia.org/wiki/NumPy

Distribution:
[22] https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.lognormal.html#numpy.random.Generator.lognormal

Jupyter Notebooks:
[23] https://stackoverflow.com/questions/48655801/tables-in-markdown-in-jupyter

Readme file editing:
[24] https://medium.com/analytics-vidhya/the-jupyter-notebook-formatting-guide-873ab39f765e
[25] https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-readmes

ATU:
[26] Software Freedom Conservancy. Git. https://git-scm.com/
[27] https://www.atu.ie/sites/default/files/2022-08/Student%20Code_Final_August_2022.pdf

Inspiration:
[28] Karsten Jeschkies: https://github.com/jeschkies/gensim