Health-Care

DESCRIPTION

Cardiovascular diseases are the leading cause of death globally. It is therefore necessary to identify their causes and to develop a system that predicts heart attacks effectively. The dataset described below records factors that may affect cardiovascular health.

Problem statement:

Using the dataset described below, identify the factors associated with cardiovascular disease and build a system that predicts heart attacks effectively.

Dataset description:

Variable	Description
Age	Age in years
Sex	1 = male; 0 = female
cp	Chest pain type
trestbps	Resting blood pressure (in mm Hg on admission to the hospital)
chol	Serum cholesterol in mg/dl
fbs	Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
restecg	Resting electrocardiographic results
thalach	Maximum heart rate achieved
exang	Exercise induced angina (1 = yes; 0 = no)
oldpeak	ST depression induced by exercise relative to rest
slope	Slope of the peak exercise ST segment
ca	Number of major vessels (0-3) colored by fluoroscopy
thal	3 = normal; 6 = fixed defect; 7 = reversible defect
Target	1 or 0

Task to be performed:

Preliminary analysis:
a. Perform preliminary data inspection and report the findings on the structure of the data, missing values, duplicates, etc.

b. Based on these findings, remove duplicates (if any) and treat missing values using an appropriate strategy.
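The preliminary analysis above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the tiny data frame is made up and stands in for the real dataset, the helper names `inspect` and `clean` are hypothetical, and median imputation is only one possible strategy for missing values.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the real dataset (illustrative values only).
# With the real file you would load it instead, e.g. via pd.read_csv.
df = pd.DataFrame({
    "age": [63, 37, 41, 41, 56, 57],
    "sex": [1, 1, 0, 0, 1, 0],
    "trestbps": [145, 130, 130, 130, 120, np.nan],
    "chol": [233, 250, 204, 204, 236, 354],
    "target": [1, 1, 1, 1, 0, 0],
})


def inspect(frame):
    """Summarise structure, missing values and exact duplicate rows."""
    return {
        "shape": frame.shape,
        "dtypes": frame.dtypes.astype(str).to_dict(),
        "missing": frame.isna().sum().to_dict(),
        "duplicates": int(frame.duplicated().sum()),
    }


def clean(frame):
    """Drop exact duplicates, then fill numeric gaps with the column median.

    Median imputation is an assumption; another strategy may suit the data better.
    """
    out = frame.drop_duplicates().reset_index(drop=True)
    return out.fillna(out.median(numeric_only=True))


report = inspect(df)
cleaned = clean(df)
print(report)
print(cleaned)
```

Keeping inspection and cleaning in separate functions makes it easy to rerun `inspect(cleaned)` and confirm that no duplicates or missing values remain.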
Prepare a report about the data explaining the distribution of the disease and the related factors using the steps listed below:
a. Get a preliminary statistical summary of the data and explore the measures of central tendency and spread.

b. Identify the categorical variables, then describe and explore them using appropriate tools, such as count plots.

c. Study the occurrence of CVD across the Age category.

d. Study the composition of all patients with respect to the Sex category.

e. Study if one can detect heart attacks based on anomalies in the resting blood pressure (trestbps) of a patient.

f. Describe the relationship between cholesterol levels and the target variable.

g. State what relationship exists between peak exercise and the occurrence of a heart attack.

h. Check if thalassemia is a major cause of CVD.

i. List how the other factors determine the occurrence of CVD.

j. Use a pair plot to understand the relationship between all the given variables.
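A few of the report steps above (a, b, c, d, f and j) might look like the following in pandas and seaborn. It is a sketch under stated assumptions: the rows are made up for illustration, the "at most 5 distinct values" rule for spotting categorical columns and the age-band edges are arbitrary choices, and `plot_overview` is a hypothetical helper that is defined but not called so the numeric part runs without a plotting backend.

```python
import pandas as pd

# Made-up rows for illustration only; column names follow the dataset description.
df = pd.DataFrame({
    "age": [29, 41, 45, 52, 54, 58, 61, 67],
    "sex": [1, 0, 1, 1, 0, 1, 0, 1],
    "cp": [1, 2, 0, 2, 1, 0, 0, 3],
    "chol": [204, 198, 260, 233, 250, 318, 330, 286],
    "target": [1, 1, 0, 1, 1, 0, 0, 0],
})

# (a) Central tendency and spread of the continuous columns.
summary = df[["age", "chol"]].describe()

# (b) Treat low-cardinality columns as categorical (the threshold is an assumption).
categorical = [c for c in df.columns if c != "target" and df[c].nunique() <= 5]

# (c) Share of target == 1 within age bands (band edges are arbitrary).
age_band = pd.cut(df["age"], bins=[0, 40, 50, 60, 120],
                  labels=["<=40", "41-50", "51-60", ">60"])
rate_by_age = df.groupby(age_band, observed=False)["target"].mean()

# (d) Composition by sex, and the target split within each sex.
sex_counts = df["sex"].value_counts().sort_index()
sex_vs_target = pd.crosstab(df["sex"], df["target"], normalize="index")

# (f) Mean cholesterol per target class.
chol_by_target = df.groupby("target")["chol"].mean()


def plot_overview(frame, categorical_cols):
    """(b, j) Count plots per categorical column, then a pair plot."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    for col in categorical_cols:
        sns.countplot(data=frame, x=col, hue="target")
        plt.title(col)
        plt.show()
    sns.pairplot(frame, hue="target")
    plt.show()


print(summary)
print(rate_by_age)
print(sex_vs_target)
```

Calling `plot_overview(df, categorical)` on the real data would produce the count plots and the pair plot; the remaining steps (e, g, h, i) follow the same group-and-compare pattern with `trestbps`, `oldpeak`, `slope` and `thal`.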
Build baseline models to predict the risk of a heart attack using logistic regression and a random forest, and explore the results. For feature selection, use correlation analysis and logistic regression (leveraging the standard errors and p-values reported by statsmodels).
