
Project for wrangling the Uber dataset. Missing values, falsified values, and multiple types of outliers have been removed from the dataset using data wrangling tools and techniques.


sohail-sankanur/Cleaning-Uber-Dataset


Cleaning-Uber-Dataset

Data wrangling, sometimes referred to as data munging, is the process of transforming and mapping data from one "raw" data form into another format with the intent of making it more appropriate and valuable for a variety of downstream purposes such as analytics.

Many people in the data science community put a lot of effort into learning machine learning, deep learning, and other new technologies, but we often forget that the main fuel for all of these tasks is good data. The world is producing data at an enormous rate, yet good data is very difficult to obtain. I found a lot of data online for Uber rides taken around the world, but it contained many anomalies and errors, so I decided to wrangle it to make it more valuable for analysis.

Here I have classified the datasets into three categories, each of which contains some form of error. The datasets have been wrangled using mathematical algorithms and techniques to remove these errors.

The project is written in Python 3. Every step is explained in detail, along with the reason for doing it. The datasets with "_dirty" in their names contain the errors; running the ipynb notebook from start to finish produces the clean datasets.
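
To give a feel for the kind of cleaning involved, here is a minimal sketch of two common steps: dropping rows with a missing value and removing outliers with the interquartile range (IQR) rule. It is not taken from the notebook. The `fare` column, the sample rows, and the helper names `iqr_bounds` and `clean_rows` are assumptions made for illustration, and the notebook may well use different techniques and thresholds.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a list of numbers."""
    # quantiles() sorts the data itself; n=4 yields the three quartile cut points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clean_rows(rows, column):
    """Drop rows where `column` is missing or lies outside the IQR fences."""
    # Step 1: remove rows with a missing value in the chosen column.
    present = [row for row in rows if row.get(column) is not None]
    # Step 2: compute the fences from the remaining values and filter on them.
    lower, upper = iqr_bounds([row[column] for row in present])
    return [row for row in present if lower <= row[column] <= upper]


# Hypothetical sample data, not rows from the actual "_dirty" datasets.
rides = [
    {"id": 1, "fare": 12.5},
    {"id": 2, "fare": 14.0},
    {"id": 3, "fare": None},   # missing value
    {"id": 4, "fare": 13.2},
    {"id": 5, "fare": 11.8},
    {"id": 6, "fare": 950.0},  # implausibly large fare
    {"id": 7, "fare": 15.1},
]

cleaned = clean_rows(rides, "fare")
print([row["id"] for row in cleaned])
```

In this sample, the row with the missing fare and the row with the extreme fare are dropped, while the other five rows are kept. The multiplier `k=1.5` is the conventional default for the IQR rule; a stricter or looser value may suit a particular dataset better.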
