This study focuses on developing an odor prediction model and interpreting the model's classification results using explainable AI (XAI) methods.
- Prevention of odor in pig barns by managing chemical substances (odor substances) that affect odor generation
- Creation of an optimal prediction model for complex odors using 15 odorous substances
- Identification of the influence of odorous substances on complex odors and the interaction effect between odorous substances
- Creation of a complex odor classification prediction model using 15 odorous substances and measurement-related variables
- Prevention of bad smell in pig houses by managing chemical substances (odor substances) that affect odor generation
- Response variable: Complex odor
- Explanatory variables: 15 odorous substances
  - Ammonia
  - Sulfur compounds: Hydrogen sulfide, Methyl mercaptan, Dimethyl sulfide, Dimethyl disulfide
  - Volatile organic compounds: Acetic acid, Propionic acid, Butyric acid, Iso-Butyric acid, Valeric acid, Iso-Valeric acid, Phenol, para-Cresol, Indole, Skatole
- Compare different analysis processes to find the optimal predictive model
- Data problems and solutions
  - High missing rate: Future sensor-based data collection is also likely to produce a high missing rate, so missing values are imputed rather than removed
  - Small sample size: Models are validated with Leave-One-Out Cross-Validation (LOOCV), which remains usable when data are scarce
- Data pre-processing
  - Missing-value imputation: simple imputation (mean, median), multivariate imputation (Bayesian), multiple imputation (Bayesian ridge, Gaussian process regression, KNN)
  - Feature preprocessing: standardization, Partial Least Squares (PLS), Principal Component Analysis (PCA)
- Prediction models: Regression, SVM, RandomForest, ExtraTree, XGBoost, DNN
- Model verification: R-squared (R²) and MAPE, computed through LOOCV
- Additional analysis: correlation analysis, Principal Component Analysis (PCA), identification of predictor feature importance
- Features related to measurement: measurement time (year, month, day), measurement location (inside the pig barn, outside the pig barn, site boundary)
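
The regression workflow above (imputation, standardization, LOOCV scored with R² and MAPE) could be sketched roughly as follows. This is a minimal illustration under assumptions, not the study's actual pipeline: the data are synthetic, the sample size and missing rate are invented, and mean imputation with ridge regression stands in for the full set of imputers and models that were compared.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_percentage_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 40 samples x 15 odorous substances.
n_samples, n_substances = 40, 15
X = rng.lognormal(mean=0.0, sigma=1.0, size=(n_samples, n_substances))
# Hypothetical target: complex odor driven by the first three substances.
y = 10.0 + 3.0 * X[:, 0] + 2.0 * X[:, 1] + X[:, 2] + rng.normal(0.0, 0.5, n_samples)

# Knock out about 20% of the entries to mimic a high missing rate.
X_missing = X.copy()
X_missing[rng.random(X.shape) < 0.2] = np.nan

# Imputation and scaling live inside the pipeline, so they are re-fitted on
# the training part of every LOOCV fold and never see the held-out sample.
model = make_pipeline(SimpleImputer(strategy="mean"), StandardScaler(), Ridge(alpha=1.0))

# Each sample is predicted once by a model trained on the other n - 1 samples.
y_pred = cross_val_predict(model, X_missing, y, cv=LeaveOneOut())

# R^2 is undefined on a single held-out sample, so both metrics are computed
# on the pooled out-of-fold predictions.
loocv_r2 = r2_score(y, y_pred)
loocv_mape = mean_absolute_percentage_error(y, y_pred)
print(f"LOOCV R^2 = {loocv_r2:.3f}, MAPE = {loocv_mape:.3f}")
```

Swapping `SimpleImputer` or `Ridge` for another imputer or model leaves the validation loop unchanged, which is what makes a like-for-like comparison of analysis processes possible.
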
- Summary
  - Perform data preprocessing based on the primary study and compare multiple machine learning models
  - Reduce the risk of overfitting by repeating the analysis 30 times, and select the optimal model using 8 evaluation indicators
  - Identify the influence and interaction effects of odorous substances through XAI methods
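
One possible reading of the repeated analysis in the summary is to re-split the data with 30 different random seeds and average each metric per model. The sketch below assumes that reading. The synthetic data, the stratified 80/20 split, the two candidate models, and the two metrics shown (out of the eight used in the study) are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary-classification data standing in for the odor dataset.
X, y = make_classification(n_samples=200, n_features=15, n_informative=5, random_state=0)


def make_models(seed):
    """Return fresh, unfitted candidate models for one repetition."""
    return {
        "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        "RandomForest": RandomForestClassifier(n_estimators=50, random_state=seed),
    }


n_repeats = 30
scores = {}  # model name -> list of (accuracy, F1), one pair per repetition

for seed in range(n_repeats):
    # A different seed gives a different stratified train/test split.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    for name, model in make_models(seed).items():
        y_hat = model.fit(X_tr, y_tr).predict(X_te)
        scores.setdefault(name, []).append(
            (accuracy_score(y_te, y_hat), f1_score(y_te, y_hat))
        )

# Average over the repetitions so that one lucky split cannot decide the ranking.
mean_scores = {name: np.mean(vals, axis=0) for name, vals in scores.items()}
for name, (acc, f1) in mean_scores.items():
    print(f"{name}: accuracy = {acc:.3f}, F1 = {f1:.3f}")
```
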
- Data pre-processing
  - Complex odor: Continuous values are converted into a binary class (emission permitted / emission not permitted) in accordance with the domestic odor prevention law
  - Measurement-related variables: Measurement time is converted into a seasonal variable and then one-hot encoded; measurement location is one-hot encoded
  - Variable preprocessing: multivariate imputation (Bayesian ridge) and standardization
- Prediction models: k-Nearest Neighbor, SVC, RandomForest, LightGBM, ExtraTree, XGBoost
- Model validation: F1-score, accuracy, sensitivity, specificity
- Identification of influence: XAI (Partial Dependence Plot, variable importance)
- Additional analysis: correlation analysis and VIF (continuous variables), ANOVA (categorical variables)
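
The classification steps above (binarizing complex odor, season and location one-hot encoding, Bayesian ridge imputation, standardization, the four validation metrics, and partial dependence) might look roughly like this in scikit-learn. Everything specific here is an assumption made for illustration: the data are synthetic, `THRESHOLD` is a placeholder rather than the legal limit, the month-to-season mapping is a guess, and a random forest stands in for the six models listed.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import IterativeImputer
from sklearn.inspection import partial_dependence
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 120

# Synthetic stand-ins for the measurements (not real data).
substances = rng.lognormal(size=(n, 15))
months = rng.integers(1, 13, size=n)
locations = rng.choice(["inside", "outside", "boundary"], size=n)
complex_odor = 5.0 * substances[:, 0] + 2.0 * substances[:, 1] + rng.normal(0.0, 1.0, n)

# Binarize the target. The median is a placeholder cut-off, NOT the legal limit.
THRESHOLD = float(np.median(complex_odor))
y = (complex_odor > THRESHOLD).astype(int)  # 1 = above the cut-off

# Month -> season (assumed mapping), then one-hot encode season and location.
SEASONS = {12: "winter", 1: "winter", 2: "winter", 3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer", 9: "autumn", 10: "autumn", 11: "autumn"}
seasons = np.array([SEASONS[int(m)] for m in months])
one_hot = OneHotEncoder().fit_transform(np.column_stack([seasons, locations])).toarray()

# Simulate about 20% missing concentrations, then join numeric and one-hot columns.
substances[rng.random(substances.shape) < 0.2] = np.nan
X = np.hstack([substances, one_hot])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

# Impute (Bayesian ridge) and standardize only the 15 substance columns;
# the one-hot columns pass through unchanged.
preprocess = ColumnTransformer(
    [("substances",
      make_pipeline(IterativeImputer(estimator=BayesianRidge(), random_state=0),
                    StandardScaler()),
      list(range(15)))],
    remainder="passthrough",
)
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(preprocess.fit_transform(X_tr), y_tr)

X_te_t = preprocess.transform(X_te)
y_hat = forest.predict(X_te_t)
accuracy = accuracy_score(y_te, y_hat)
f1 = f1_score(y_te, y_hat)
sensitivity = recall_score(y_te, y_hat, pos_label=1)  # true-positive rate
specificity = recall_score(y_te, y_hat, pos_label=0)  # true-negative rate
print(f"accuracy={accuracy:.3f} F1={f1:.3f} "
      f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")

# Impurity-based variable importance, one value per preprocessed column.
importances = forest.feature_importances_

# Partial dependence of the predicted probability on the first substance
# (column 0, in standardized units after preprocessing).
pd_result = partial_dependence(forest, X_te_t, features=[0], kind="average",
                               grid_resolution=20)
pd_curve = pd_result["average"][0]
print(pd_curve.round(3))
```

Passing two column indices as a tuple in `features` (for example `[(0, 1)]`) yields a two-way partial dependence surface, which is one way to inspect an interaction between two substances.
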