This project compares different supervised learning models for the classification of particles produced during inelastic electron-proton scattering. The goal is to identify each particle and determine the best model among the following:
- Decision Tree
- Random Forest
- Multilayer Perceptron
- K-Nearest Neighbor
The data are the simulated responses of six different detectors, produced with the GEANT4 simulation platform. The dataset is available on Kaggle.
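As a minimal sketch of what the loaded data look like, the frame below mimics the column layout described in the feature table; in the real project the frame would come from `pd.read_csv` on the Kaggle CSV (the file name is not given here, so synthetic rows are used instead):

```python
import pandas as pd

# Synthetic stand-in for the Kaggle detector-response dataset,
# with one illustrative row per particle class.
df = pd.DataFrame({
    "id":    [-11, 211, 321, 2212],     # particle identifier
    "p":     [1.2, 0.8, 1.5, 2.1],      # momentum
    "theta": [0.3, 0.5, 0.2, 0.4],      # scattering angle
    "beta":  [0.999, 0.98, 0.95, 0.91], # relativistic velocity
    "nphe":  [12, 0, 0, 0],             # number of photoelectrons
    "ein":   [0.09, 0.01, 0.0, 0.0],    # input energy
    "eout":  [0.08, 0.0, 0.0, 0.0],     # output energy
})
print(df.describe())
```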
The features in the dataset are as follows:

Feature | Meaning | Dimension |
---|---|---|
id | Particle name | |
p | Momentum | |
theta | Scattering angle | |
beta | Relativistic velocity | |
nphe | Number of photoelectrons | |
ein | Input energy | |
eout | Output energy | |
Each id corresponds to a specific particle:

id | Particle | Symbol | Mass (MeV) |
---|---|---|---|
-11 | Positron | e⁺ | 0.511 |
211 | Pion | π⁺ | 139.57 |
321 | Kaon | K⁺ | 493.68 |
2212 | Proton | p | 938.27 |
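The id-to-particle correspondence can be encoded as a simple lookup table; the dictionary below is illustrative (the names and standard rest masses in MeV follow the table above):

```python
# Map each dataset id to its particle name and rest mass (MeV).
PARTICLES = {
    -11:  ("positron", 0.511),
    211:  ("pion",     139.57),
    321:  ("kaon",     493.68),
    2212: ("proton",   938.27),
}

def particle_name(pid):
    """Return the particle name for a given dataset id."""
    return PARTICLES[pid][0]

def rest_mass(pid):
    """Return the rest mass in MeV for a given dataset id."""
    return PARTICLES[pid][1]

print(particle_name(2212), rest_mass(2212))
```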
Regardless of the underlying physical model (here, the Standard Model), it is sufficient to know that each particle is characterized by a specific set of values, in particular its rest mass.
The following libraries were used for preliminary data preparation:
- NumPy
- Matplotlib
- Pandas
- Seaborn
- Imbalanced-learn
In addition, physical and intuitive observations enabled further preparation and simplification of the dataset.
Finally, before building and training the machine learning models, a resampling procedure was applied to balance the dataset.
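The report does not specify which resampling strategy from Imbalanced-learn was used, so the sketch below shows plain random oversampling (a stand-in for `RandomOverSampler`), implemented with scikit-learn utilities: minority-class rows are replicated until every class matches the majority-class count.

```python
import numpy as np
from sklearn.utils import resample

def oversample(X, y, random_state=0):
    """Random oversampling: replicate minority-class rows until every
    class matches the majority-class count."""
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        Xc, yc = X[y == c], y[y == c]
        Xr, yr = resample(Xc, yc, replace=True, n_samples=n_max,
                          random_state=random_state)
        X_parts.append(Xr)
        y_parts.append(yr)
    return np.vstack(X_parts), np.concatenate(y_parts)

# Toy imbalanced set: four samples of class 0, one of class 1.
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = np.array([0, 0, 0, 0, 1])
X_bal, y_bal = oversample(X, y)
print(np.bincount(y_bal))  # both classes now have 4 samples
```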
Supervised learning models were chosen, in particular classifiers. These, based on the previously mentioned algorithms, are provided by the Scikit-learn library:
- DecisionTreeClassifier
- RandomForestClassifier
- MLPClassifier
- KNeighborsClassifier
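A minimal training-and-comparison loop over the four classifiers might look as follows; synthetic data from `make_classification` stands in for the six detector features and four particle classes, and default hyperparameters are assumed except where noted:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 6 features (like the detector responses), 4 classes.
X, y = make_classification(n_samples=500, n_features=6, n_informative=4,
                           n_classes=4, n_clusters_per_class=1,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "ML Perceptron": MLPClassifier(max_iter=500, random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-Nearest Neighbor": KNeighborsClassifier(),
}
results = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    results[name] = accuracy_score(y_te, model.predict(X_te))
    print(f"{name}: {results[name]:.3f}")
```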
Specifically, for the RandomForestClassifier the importance of each feature in model training was evaluated, confirming what had been inferred during the exploratory analysis.
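Feature importances come directly from the fitted forest's `feature_importances_` attribute; the sketch below uses synthetic data, with feature names taken from the dataset table:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Feature names follow the dataset; the data here are synthetic.
features = ["p", "theta", "beta", "nphe", "ein", "eout"]
X, y = make_classification(n_samples=400, n_features=6, n_informative=3,
                           random_state=0)
rf = RandomForestClassifier(random_state=0).fit(X, y)

# Importances sum to 1; sort to see the most informative features first.
for name, imp in sorted(zip(features, rf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")
```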
For the MLPClassifier, the hyperparameters were optimized by means of a grid search.
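The report does not list which hyperparameters were searched, so the grid below uses common MLP choices (hidden layer sizes and the L2 penalty `alpha`) purely as an example of `GridSearchCV`:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Hypothetical grid: values chosen for illustration only.
param_grid = {
    "hidden_layer_sizes": [(50,), (100,)],
    "alpha": [1e-4, 1e-3],
}

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
search = GridSearchCV(MLPClassifier(max_iter=300, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```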
Finally, accuracy and, visually, confusion matrices were used as metrics to evaluate the models. The results do not show a clear predominance of one model over the others, although the KNeighborsClassifier is the least efficient:
Classifier | Accuracy |
---|---|
Decision Tree | 89.6% |
Random Forest | 93.2% |
ML Perceptron | 93.1% |
K-Nearest Neighbor | 88.6% |
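Both metrics above are available in `sklearn.metrics`; a toy example on made-up labels shows how they are computed:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy true/predicted labels, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

print(accuracy_score(y_true, y_pred))   # fraction of correct labels
print(confusion_matrix(y_true, y_pred)) # rows: true class, cols: predicted
```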