Medical Insurance Pricing Analysis Project

Project Overview

This project analyzes the factors that influence medical insurance pricing. The key objective is to examine how demographic and lifestyle factors such as BMI, age, smoking status, and region relate to medical charges. The project also trains machine learning models to predict insurance charges from these attributes.

Key Steps of the Project

Data Preprocessing:
- Normalization of continuous variables (e.g., Age, BMI, Medical Charges).
- Encoding categorical variables (e.g., Sex, Smoker, Region) using one-hot encoding.
- Log transformation of skewed features (e.g., Medical Charges) to reduce skew and bring the distribution closer to normal.
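
The preprocessing steps above could look roughly like the following sketch. It is not taken from the notebook: the column names (`age`, `sex`, `bmi`, `children`, `smoker`, `region`, `charges`), the toy rows, and the helper `preprocess` are all assumptions for illustration, and "normalization" is assumed here to mean min-max scaling to the range [0, 1].

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy rows standing in for the real data; column names and values are assumed.
df = pd.DataFrame({
    "age": [19, 33, 45, 52, 61, 28],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "bmi": [27.9, 22.7, 30.1, 35.2, 29.0, 33.0],
    "children": [0, 1, 2, 0, 3, 1],
    "smoker": ["yes", "no", "no", "yes", "no", "no"],
    "region": ["southwest", "southeast", "northwest",
               "northeast", "southeast", "southwest"],
    "charges": [16884.92, 1725.55, 8240.59, 42000.00, 13000.00, 4449.46],
})


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: log-transform, one-hot encode, and min-max scale."""
    out = frame.copy()
    # Log-transform the right-skewed charges column to reduce skew.
    out["log_charges"] = np.log1p(out["charges"])
    # One-hot encode the categorical columns.
    out = pd.get_dummies(out, columns=["sex", "smoker", "region"], dtype=float)
    # Scale the continuous inputs to [0, 1] (assumed meaning of "normalization").
    continuous = ["age", "bmi"]
    out[continuous] = MinMaxScaler().fit_transform(out[continuous].astype(float))
    return out


processed = preprocess(df)
print(processed.columns.tolist())
```

In a real workflow the scaler would be fitted on the training split only and then applied to the test split, so that no information leaks from the held-out data.
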
Exploratory Data Analysis:
- Visualizing raw distributions of key variables.
- Performing correlation analysis to identify relationships between features.
- Using statistical tests like Kolmogorov-Smirnov, Mann-Whitney U, and ANOVA to test hypotheses.
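
As a sketch of how the three tests named above can be run with `scipy.stats`, the example below uses synthetic data. The hypotheses it tests (normality of log-charges, smokers versus non-smokers, differences across four regions) are assumptions chosen for illustration and may differ from those examined in the notebook.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic charges; the group sizes and distributions are made up.
smoker_charges = rng.lognormal(mean=10.3, sigma=0.4, size=200)
nonsmoker_charges = rng.lognormal(mean=9.0, sigma=0.6, size=800)
region_groups = [rng.lognormal(mean=9.2, sigma=0.7, size=250) for _ in range(4)]

# Kolmogorov-Smirnov: compare log-charges against a normal distribution.
# Estimating the mean and std from the same sample makes the p-value
# approximate, so treat it as a rough check only.
log_charges = np.log(nonsmoker_charges)
ks = stats.kstest(
    log_charges, "norm", args=(log_charges.mean(), log_charges.std(ddof=1))
)

# Mann-Whitney U: do smokers and non-smokers differ in charges?
mw = stats.mannwhitneyu(smoker_charges, nonsmoker_charges, alternative="two-sided")

# One-way ANOVA: do mean charges differ across regions?
anova = stats.f_oneway(*region_groups)

print(f"KS p-value:           {ks.pvalue:.4f}")
print(f"Mann-Whitney p-value: {mw.pvalue:.4g}")
print(f"ANOVA p-value:        {anova.pvalue:.4f}")
```

Mann-Whitney U makes no normality assumption, which suits skewed charges, whereas ANOVA assumes roughly normal groups with similar variances and is therefore often applied to the log-transformed values.
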
Feature Engineering:
- Outlier removal using standard deviation thresholds.
- Standardization and PCA for dimensionality reduction.
- Feature selection based on importance measures.
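
A minimal sketch of the outlier removal and PCA steps follows, on synthetic data. The threshold of three standard deviations, the helper name `remove_outliers`, and the choice of two principal components are assumptions; the notebook may use different values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def remove_outliers(X: np.ndarray, n_std: float = 3.0):
    """Keep rows whose every column lies within n_std standard deviations of its mean."""
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))
    mask = (z <= n_std).all(axis=1)
    return X[mask], mask


rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
X[0] = [25.0, 0.0, 0.0, 0.0]  # planted extreme row for demonstration

X_clean, kept = remove_outliers(X)

# Standardize before PCA so that no feature dominates because of its scale.
X_scaled = StandardScaler().fit_transform(X_clean)
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)

print("rows removed:", int((~kept).sum()))
print("explained variance ratio:", pca.explained_variance_ratio_)
```

Because the mean and standard deviation are themselves affected by extreme values, a standard deviation threshold can miss outliers in small samples; that limitation applies to this sketch as well.
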
Machine Learning Models:
- Linear Regression, Random Forest, Support Vector Machines, and XGBoost models were trained and evaluated.
- SHAP (SHapley Additive exPlanations) was used to interpret the models and identify the most important features for predicting medical charges.
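
To illustrate the train-and-evaluate loop, the sketch below fits two of the listed model families with scikit-learn on synthetic data and compares their R² scores on a held-out split. The data-generating formula, the hyperparameters, and the 80/20 split are assumptions. Support Vector Machines, XGBoost, and SHAP are left out to keep the example short and dependent only on scikit-learn.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 600

# Synthetic features loosely resembling the dataset's attributes.
age = rng.uniform(18, 64, n)
bmi = rng.normal(30, 6, n)
smoker = rng.integers(0, 2, n)
X = np.column_stack([age, bmi, smoker])

# Made-up target: charges rise with age and BMI and jump for smokers.
y = 250 * age + 300 * bmi + 20000 * smoker + rng.normal(0, 2000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
```

Evaluating every model on the same held-out split keeps the comparison fair. Cross-validation would give a more stable estimate on a dataset of this size.
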

Dataset

The dataset used in this project is publicly available on Kaggle and contains information about demographic and health-related attributes for 1,400 individuals, including:

- Age
- Sex
- BMI (Body Mass Index)
- Number of children
- Smoking status
- Region
- Medical charges

How to Run

Follow these steps to set up and run the project:

Clone the repository:

git clone https://github.com/Haimzis/KaggleMedicalInsurancePrediction.git

Navigate to the project directory:
```
cd KaggleMedicalInsurancePrediction
```
Install the required Python packages (Python 3.x must already be installed on your system):
```
pip install -r requirements.txt
```
Start Jupyter Notebook once the dependencies are installed:
```
jupyter notebook
```
Open notebook.ipynb and run all cells to reproduce the analysis and generate the results.
