Skip to content

Latest commit

 

History

History
12 lines (10 loc) · 1.5 KB

README.md

File metadata and controls

12 lines (10 loc) · 1.5 KB

Parallel Decision Tree Building

Sirui Li, Xueyan Xie

CSE 549 Final Project

We studied how to parallelize the Decision Tree Building procedure in a multi-processors setting. We first used Cilk spawn, sync, and parallel for to parallelize the algorithm. We tested its correctness as well as the time and speedup. Then we looked at an approximation algorithm, Streaming Parallel Decision Tree, which is designed for classifying large data sets and streaming data. It assumes a setting where the size of data set is so large that it has to be stored distributively in different processors. We implemented SPDT in shared memory machine, measured its accuracy and performance and compared it with our recursive decision tree. At last, we applied the decision tree algorithm as a subroutine to build Bagging Forest. We saw some very interesting behaviors in terms of accurracy, time and speedup.

Recursive Decision Tree Performance

SPDT Performance