3.2 Classification Algorithms

Linear Classifier

Uses a linear function (a line in 2-D, a hyperplane in general) to divide the instance space into two subsets

  • For 3 or more classes: fit N-1 lines for N classes; a minimal two-class decision rule is sketched below
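
A minimal sketch of the resulting decision rule, assuming two classes and already-learned weights `w` and bias `b` (the values below are hypothetical):

```python
import numpy as np

def linear_predict(x, w, b):
    """Classify x by which side of the line w . x + b = 0 it falls on."""
    return "A" if np.dot(w, x) + b > 0 else "B"

# Hypothetical weights, as if learned from training data.
w = np.array([0.8, -0.5])
b = 0.1
print(linear_predict(np.array([1.0, 2.0]), w, b))  # -> "B"
```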

Nearest Neighbor Classifier

Assign a previously unseen instance the class of its nearest training instance: if the nearest instance belongs to class A, the prediction is A; otherwise it is B.

  • the decision surface of a NN classifier (the regions are also called Thiessen regions, i.e. a Voronoi diagram) is implicit and divides the space into regions "belonging" to each training instance and its corresponding class.

KNN Algorithm

Generalization of the nearest neighbor algorithm

  • Find the K nearest instances (K is typically chosen to be an odd number, which avoids ties)
  • Each of the K instances casts a vote; the majority class wins (see the sketch below)
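
A minimal sketch of the voting rule, assuming Euclidean distance and a small hypothetical training set (with `k=1` this reduces to the plain nearest neighbor classifier):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Vote among the k training instances closest to x (k=1 is plain NN)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances
    nearest = np.argsort(dists)[:k]               # indices of the k closest
    votes = Counter(y_train[i] for i in nearest)  # one vote per neighbour
    return votes.most_common(1)[0][0]             # majority class wins

# Hypothetical training data: two tight clusters.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([0.2, 0.1])))  # -> "A"
```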

Sensitivity to Irrelevant Attributes

  • Using more attributes can lead to the wrong classification for an instance that fewer attributes classified correctly, because an irrelevant attribute can dominate the distance computation;
  • Nevertheless, using only a subset of the attributes can also lead to wrong classifications.

Advantages

  • Simple to implement
  • Handles correlated features
  • Defined for any distance measure
  • Handles streaming data trivially

Disadvantages

  • Very sensitive to irrelevant features
  • Slow classification time for large datasets
  • Works best for real-valued data only

Decision Tree Classifier

Avoid Overfitting

  • Prepruning: halt tree construction early
    • do not split a node if the split would make the goodness measure fall below a threshold
  • Postpruning: build a sequence of progressively pruned trees; use a set of data different from the training data to decide which is the "best pruned tree" (both strategies are sketched below)
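
A sketch of both strategies using scikit-learn, where `min_impurity_decrease` plays the role of the prepruning threshold and `ccp_alpha` indexes the sequence of progressively pruned trees; the Iris data and the split are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
# Hold out data separate from training to pick the "best pruned tree".
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Prepruning: refuse any split whose impurity gain falls below a threshold.
pre = DecisionTreeClassifier(min_impurity_decrease=0.01).fit(X_train, y_train)

# Postpruning: compute the sequence of progressively pruned trees, then
# keep the one that scores best on the held-out set.
path = DecisionTreeClassifier().cost_complexity_pruning_path(X_train, y_train)
best = max(
    (DecisionTreeClassifier(ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_val, y_val),
)
```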

Advantages

  • Easy to understand
  • Easy to generate rules

Disadvantages

  • Overfitting
  • Does not handle correlated features very well (the space is split by rectangular, axis-parallel partitions)
  • Can be quite large (pruning is necessary)

Bayes Classifier

Use Bayes theorem, which says:

  • The probability of an instance belonging to a certain class is the probability of that instance being generated by that class, times the probability of occurrence of that class, divided by the probability of the instance itself.
  • Assuming independent feature distributions, the probability of a certain class generating the instance is the product of the probabilities of that class generating the observed value of each of the instance's features (written out below).
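
Written out, with class $C$ and instance $x = (x_1, \dots, x_d)$ (notation ours), the two statements above are:

$$
P(C \mid x) = \frac{P(x \mid C)\,P(C)}{P(x)},
\qquad
P(x \mid C) = \prod_{i=1}^{d} P(x_i \mid C)
$$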

Advantages

  • Fast to train (single scan) and to classify
  • Not sensitive to irrelevant features
  • Handles real and discrete data
  • Handles streaming data well

Disadvantages

  • Assumes independence of features

Neural Network Learning

  • Set of neurons (input/output units) connected by weighted links
  • Supervised learning adjusts the weights so that the outputs produced for the training inputs match the expected ones (see the sketch below)
  • Predicting: feed in the input values; collect the outputs
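
A minimal sketch of that weight adjustment, assuming a single sigmoid neuron trained by gradient descent on a toy OR problem (the data and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task (hypothetical data): learn logical OR.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])

w, b = rng.normal(size=2), 0.0
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Supervised learning: repeatedly nudge the weights so the outputs
# produced for the training inputs move toward the expected ones.
for _ in range(2000):
    out = sigmoid(X @ w + b)         # feed inputs forward
    grad = out - y                   # error signal (cross-entropy gradient)
    w -= 0.5 * X.T @ grad / len(X)   # adjust weights
    b -= 0.5 * grad.mean()           # adjust bias

print(np.round(sigmoid(X @ w + b)))  # -> [0. 1. 1. 1.]
```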

Advantages

  • Universal: can approximate any continuous function
  • Versatile: output may be one or more discrete and real values
  • Online: application and learning are intertwined
  • Robust to errors and noisy data
  • Fast application to new examples
  • Parallel

Disadvantages

  • Slow training
  • Low usability: empirical parameter tuning (network topology, learning rate)
  • Low interpretability: the learned weights are hard to understand
  • Low Adaptability: not easy to incorporate domain knowledge

Support Vector Machines (SVM)

  • Linear learning machines that maximise the margin (better separation between classes);
    • The hyperplane farthest from both classes has the lowest risk of overfitting.
  • Higher robustness to the curse of dimensionality;
  • For non-linear functions, map the attributes to a space where linear discrimination is possible (see the sketch below).
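
A sketch using scikit-learn's `SVC`, assuming an illustrative two-moons dataset; the RBF kernel performs the implicit mapping mentioned above:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Non-linearly separable toy data (illustrative choice).
X, y = make_moons(noise=0.1, random_state=0)

# Margin-maximising linear machine; C trades margin width against
# training errors, and the RBF kernel maps the attributes to a space
# where linear discrimination is possible.
clf = SVC(kernel="rbf", C=1.0).fit(X, y)
print(clf.score(X, y))
```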

SVM for Regression:

  • Minimize the width of a tube "around" the data, instead of maximizing the distance to the closest examples from each class (sketched below).
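
A sketch using scikit-learn's `SVR`, where the `epsilon` parameter sets the half-width of the tube; the 1-D data below is illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-D regression target (illustrative).
X = np.linspace(0, 4, 40).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * np.random.default_rng(0).normal(size=40)

# epsilon is the half-width of the tube around the data: points inside
# the tube incur no loss, so the fit shrinks the tube rather than
# maximizing a margin between classes.
reg = SVR(kernel="rbf", epsilon=0.1).fit(X, y)
print(reg.predict([[2.0]]))
```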