Stroke has become a significant global public health issue in recent years. One way to reduce it is to control metabolic factors. However, medical staff have difficulty predicting who will have a stroke unless a patient's condition is obviously abnormal. It is therefore necessary to predict stroke using modelling and valid data.
In this project, the dataset covers people who have had a stroke and people who have not, described by several possible causes (features). The dataset is imbalanced: the target class has an uneven distribution of observations. Several supervised machine learning algorithms are then used to determine which factors influence stroke and which algorithm is most suitable for predicting it.
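Because the target class is imbalanced, plain accuracy can be a misleading way to compare algorithms. The sketch below is not the project's code. It uses invented label counts (950 non-stroke, 50 stroke) and two hypothetical sets of predictions to illustrate that a model which always predicts "no stroke" can score high accuracy while detecting no stroke cases at all. The function names and all numbers are assumptions made for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary target."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn


def evaluate(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for the positive (stroke) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented labels: 950 people without stroke (0), 50 with stroke (1).
y_true = [0] * 950 + [1] * 50

# Hypothetical baseline that always predicts "no stroke".
always_negative = [0] * 1000

# Hypothetical model: 40 false alarms, 30 of the 50 strokes detected.
some_model = [0] * 910 + [1] * 40 + [1] * 30 + [0] * 20

baseline = evaluate(y_true, always_negative)
model = evaluate(y_true, some_model)

print("always negative:", baseline)
print("hypothetical model:", model)
```

In this made-up case the baseline has the higher accuracy but zero recall, so metrics that focus on the stroke class (recall, precision, F1) are a more informative basis for choosing among algorithms on an imbalanced dataset.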
Conclusions:
- All features in the dataset are used in the analysis (no redundant features)
- An individual's age is the factor that most affects stroke
- Gender and residence have little effect on having a stroke; it depends more on genetics and lifestyle
- The best algorithm is the decision tree classifier
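A decision tree chooses its splits by how much they reduce class impurity, which is one way a feature such as age can emerge as the most influential. The sketch below is a simplified stand-in for that idea, not the project's actual procedure: it ranks two features by the best Gini impurity reduction a single threshold split can achieve. The eight rows, the feature encodings, and the helper names are all invented for illustration and are not taken from the project's dataset.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 or 1)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)  # equals 1 - p**2 - (1 - p)**2 for two classes


def best_gini_gain(values, labels):
    """Largest impurity reduction from one threshold split on a feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Try every distinct value except the largest as a "<= threshold" split.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - weighted)
    return best


# Invented toy rows (not real patient data).
age = [25, 34, 41, 52, 63, 68, 74, 80]
gender = [0, 1, 0, 1, 0, 1, 0, 1]   # hypothetical 0/1 encoding
stroke = [0, 0, 0, 0, 1, 0, 1, 1]   # target: 1 = had a stroke

gains = {
    "age": best_gini_gain(age, stroke),
    "gender": best_gini_gain(gender, stroke),
}
ranking = sorted(gains, key=gains.get, reverse=True)

print(gains)
print(ranking)
```

On this toy data a split on age separates the classes far better than a split on gender, so age ranks first. A full decision tree applies the same criterion recursively and aggregates the reductions across all of its splits.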
Recommendations:
Add more features about genetics and lifestyle. Examples:
- Parents' stroke history
- Daily diet
- Physical activity