python_machine_learning

Experiment Results

PCA(n_out=500) -> Random Forest(n_estimators = 100,'entropy')
n = 50000
n_train = 37500
n_test = 12500
time: 126.9m
f-score: 0.44
classification error: 0.46
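
The experiment above can be sketched with scikit-learn roughly as follows. This is a minimal sketch, not the original script: the synthetic data, the small sizes, the chronological 75/25 split, and the micro-averaged F1 are all assumptions made so the example runs quickly. `n_out` is taken to correspond to `n_components` in `PCA`.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the real data (hypothetical, kept small to run fast).
X, Y = make_multilabel_classification(
    n_samples=400, n_features=50, n_classes=5, random_state=0
)

# 75/25 split in row order, mirroring n_train = 37500 / n_test = 12500.
# Splitting chronologically (no shuffling) is an assumption.
n_train = int(0.75 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

model = make_pipeline(
    PCA(n_components=20, random_state=0),  # the experiment used 500 components
    RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0),
)
model.fit(X_train, Y_train)
Y_pred = model.predict(X_test)

# Micro averaging is an assumption; the averaging mode is not stated above.
f_score = f1_score(Y_test, Y_pred, average="micro")
print(f"f-score: {f_score:.2f}")
```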

Conducted a walk-through of the experiment above by fitting and testing the model on moving windows in time. The f-score was stable throughout the walk-through, ranging from 0.43 to 0.44.
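
One way to run such a walk-through is to slide a fixed-size train/test window forward through the time-ordered rows and refit at each step. The helper `moving_windows`, the window sizes, the step, and the synthetic data below are hypothetical; the actual window layout used for the result above is not recorded here.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score


def moving_windows(n_samples, n_train, n_test, step):
    """Yield (train_idx, test_idx) pairs for windows that slide forward in time."""
    start = 0
    while start + n_train + n_test <= n_samples:
        train_idx = np.arange(start, start + n_train)
        test_idx = np.arange(start + n_train, start + n_train + n_test)
        yield train_idx, test_idx
        start += step


# Synthetic stand-in; rows are assumed to be in time order.
X, Y = make_multilabel_classification(
    n_samples=600, n_features=30, n_classes=5, random_state=0
)

scores = []
for train_idx, test_idx in moving_windows(len(X), n_train=300, n_test=100, step=100):
    # Refit from scratch on each window so no test rows leak into training.
    clf = RandomForestClassifier(n_estimators=50, criterion="entropy", random_state=0)
    clf.fit(X[train_idx], Y[train_idx])
    Y_pred = clf.predict(X[test_idx])
    scores.append(f1_score(Y[test_idx], Y_pred, average="micro"))

print("f-score range:", min(scores), "to", max(scores))
```

A narrow range across windows is what "stable" means in the paragraph above.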

RBM(n_out=500) -> RandomForest(n_estimators = 100, 'entropy')
n = 50000
n_train = 37500
n_test = 12500
f-score: 0.44
classification error:0.46
* Experimented with different values for n_out, but the f-score didn't change significantly.
* Also experimented with multiple layers of RBM, but the f-score still didn't change much.
* Trained for a longer time period (n_train = 87500) and tested on the same set, but still got a similar f-score.
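
A stacked-RBM variant of this experiment could look like the sketch below, assuming scikit-learn's `BernoulliRBM` stands in for the RBM and `n_out` maps to `n_components`. The helper `build_model`, the layer sizes, the learning rate, and the synthetic data are illustrative assumptions. `BernoulliRBM` expects inputs in [0, 1], hence the `MinMaxScaler` step.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler


def build_model(rbm_layer_sizes):
    """Scale to [0, 1], stack one RBM per entry in rbm_layer_sizes, then classify."""
    steps = [MinMaxScaler()]
    for size in rbm_layer_sizes:
        steps.append(
            BernoulliRBM(n_components=size, learning_rate=0.05, n_iter=10, random_state=0)
        )
    steps.append(
        RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)
    )
    return make_pipeline(*steps)


# Synthetic stand-in for the real data (hypothetical, kept small to run fast).
X, Y = make_multilabel_classification(
    n_samples=400, n_features=50, n_classes=5, random_state=0
)
n_train = int(0.75 * len(X))

# Two RBM layers; pass a single-element list for the one-layer experiment.
model = build_model([32, 16])
model.fit(X[:n_train], Y[:n_train])
Y_pred = model.predict(X[n_train:])
```

Changing `n_out` or the number of layers then only means changing the list passed to `build_model`.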

RBM(n_out=1000) -> LogisticRegression
Since sklearn's LogisticRegression does not support labels with multiple columns, a separate LogisticRegression model was fitted for each label column, 43 in total. Grid search was used to find the optimal value of 'C' (the inverse of regularization strength).
n = 50000
n_train = 37500
n_test = 12500
f-score: 0.36
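
The label-by-label fitting with a grid search over 'C' can be sketched as below. The grid values, the `scoring="f1"` criterion, the 3-fold cross-validation, and the synthetic 5-label data are assumptions; the real run had 43 labels and used RBM features as input, which are omitted here to keep the example short.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (hypothetical, kept small to run fast).
X, Y = make_multilabel_classification(
    n_samples=400, n_features=30, n_classes=5, random_state=0
)
n_train = int(0.75 * len(X))
X_train, X_test = X[:n_train], X[n_train:]
Y_train, Y_test = Y[:n_train], Y[n_train:]

param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}  # assumed grid
Y_pred = np.zeros_like(Y_test)
best_C = []

# One binary classifier per label column.
for j in range(Y_train.shape[1]):
    y_col = Y_train[:, j]
    if np.unique(y_col).size < 2:
        # A label that never varies in training cannot be fitted; predict it as constant.
        Y_pred[:, j] = y_col[0]
        best_C.append(None)
        continue
    search = GridSearchCV(
        LogisticRegression(max_iter=1000), param_grid, scoring="f1", cv=3
    )
    search.fit(X_train, y_col)
    Y_pred[:, j] = search.predict(X_test)
    best_C.append(search.best_params_["C"])

# Micro averaging is an assumption; the averaging mode is not stated above.
f_score = f1_score(Y_test, Y_pred, average="micro")
```

scikit-learn's `OneVsRestClassifier` wraps the same one-model-per-label idea and accepts a multi-column label matrix, though the explicit loop makes it easy to tune 'C' separately for each label.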