TODO: Format of Machine Learning API

1. Problem-and-model-based

The whole machine learning algorithm takes the following form: a model assigns its optimization problems to different types of optimizers and uses the solutions to do inference. (@TatianaJin, 2016)

More accurately, each model could assign different optimization problems to different types of optimizers, but it uses the same algorithm to do inference with the solution. Model, optimization problem, and optimization algorithm are therefore three separate parts, and they should be organized in the following way:

class XXXModel {
public:
    double predict(Vector data);
protected:
    Vector w;
};

class XXXProblem : public XXXModel {
public:
    double cost_func(); // it may depend on predict() in the Model class
    Vector cost_func_grad();
    Vector cost_func_grad(int i); // if the objective function can be expressed as f(x) = f_1(x) + f_2(x) + ... + f_n(x), this returns the gradient of f_i(x)
    double regularization_func(); // seems not to be needed in most cases
    Vector regularization_func_grad(); // gradient of the regularization term w.r.t. w
    Vector prox_func(); // proximal operator, for proximal-gradient methods
    void update_w(Vector updates); // the problem may want to update the weights in some special way, such as dropout; if gradient descent is used, the gradient is passed as updates
private:
    Matrix data; // Matrix is just an abstract type; it could be a list of Vector objects, etc.
};

Users may want to implement only a subset of the above APIs, depending on which optimization algorithms they want to use. However, the machine learning algorithms inside the Husky lib should implement all of them.

Then, to optimize a problem, first initialize the Problem object, and then call something like some_optimization_algo(problem, some, parameters, for, that, algo).
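
For illustration, a hypothetical call sequence might look like the following (the sgd function and its parameters are made up for this sketch):

XXXProblem problem;                              // holds the data and the weights w
sgd(problem, /* learning_rate = */ 0.01, /* num_iters = */ 100);
double prediction = problem.predict(test_point); // inference comes from the Model part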

Note that, for example, L1-norm and L2-norm regularized logistic regression are two different optimization problems of the same model, so they correspond to two different Problem classes inheriting from the same Model class.
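
As a concrete (hypothetical) sketch, logistic regression under this design could look like the following; the class names are made up, and Vector/Matrix are the same abstract types as above:

class LogisticRegressionModel {
public:
    double predict(Vector data); // returns sigmoid(w' * data); shared by all problems of this model
protected:
    Vector w;
};

class L2LogisticRegressionProblem : public LogisticRegressionModel {
public:
    double cost_func();          // negative log-likelihood + lambda * ||w||_2^2
    Vector cost_func_grad();
    void update_w(Vector updates);
private:
    Matrix data;
    double lambda;
};

class L1LogisticRegressionProblem : public LogisticRegressionModel {
public:
    double cost_func();          // negative log-likelihood + lambda * ||w||_1
    Vector prox_func();          // soft-thresholding, for proximal-gradient methods
    void update_w(Vector updates);
private:
    Matrix data;
    double lambda;
};

Both problems reuse predict() from the shared Model class, so the inference code is written only once.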

2. Problem-based

(Deprecated. This should be worse than the problem-and-model-based design; it is left here for historical reasons.) Basically, each machine learning problem is an optimization problem that tries to optimize a function. Each problem can be made into an object that is responsible for telling others about the structure and characteristics of the problem. So, this object takes the following form (for example only):

class XXXProblem {
public:
    double cost_func();
    Vector cost_func_grad();
    Vector cost_func_grad(int i); // if the objective function can be expressed as f(x) = f_1(x) + f_2(x) + ... + f_n(x), this returns the gradient of f_i(x)
    double regularization_func(); // seems not to be needed in most cases
    Vector regularization_func_grad(); // gradient of the regularization term w.r.t. w
    Vector prox_func(); // proximal operator, for proximal-gradient methods
    void update_w(Vector updates); // the problem may want to update the weights in some special way, such as dropout; if gradient descent is used, the gradient is passed as updates
private:
    Matrix data; // Matrix is just an abstract type; it could be a list of Vector objects, etc.
    Vector w;
};

As in the problem-and-model-based design, users may implement only a subset of the above APIs, depending on which optimization algorithms they want to use, while the machine learning algorithms inside the Husky lib should implement all of them.

Optimization then works the same way: initialize the Problem object and call something like some_optimization_algo(problem, some, parameters, for, that, algo).

Note that the object represents an optimization problem, not a machine learning model. For logistic regression, for example, the L1-norm and L2-norm regularized versions are two different optimization problems and so correspond to two different Problem classes. However, the resulting duplication can be reduced through inheritance, as sketched below.
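
For instance, the shared (unregularized) part could live in a base Problem class, with the regularizers added in subclasses; this is a hypothetical sketch with made-up class names:

class LogisticRegressionProblem {
public:
    double cost_func();                // unregularized negative log-likelihood
    Vector cost_func_grad();
    void update_w(Vector updates);
protected:
    Matrix data;
    Vector w;
};

class L2LogisticRegressionProblem : public LogisticRegressionProblem {
public:
    Vector regularization_func_grad(); // gradient of lambda * ||w||_2^2
};

class L1LogisticRegressionProblem : public LogisticRegressionProblem {
public:
    Vector prox_func();                // soft-thresholding for the L1 term
};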

3. Model-based

This one is admittedly complicated. Roughly speaking, we construct a class like the following:

class XXXModel {
public:
    double cost_func();
    Vector cost_func_grad();
    Vector cost_func_grad(int i); // if the objective function can be expressed as f(x) = f_1(x) + f_2(x) + ... + f_n(x), this returns the gradient of f_i(x)
    void update_w(Vector updates); // the model may want to update the weights in some special way, such as dropout; if gradient descent is used, the gradient is passed as updates
private:
    Matrix data; // Matrix is just an abstract type; it could be a list of Vector objects, etc.
    Vector w;
};

Then, to train the model, first initialize the Model object, and then call something like some_optimization_algo(model, regularization_func_grad / prox_func, some, parameters, for, that, algo).

In this way, each type of model corresponds to one Model class. However, this is not rigorous enough: for the same model, the cost function can differ, e.g. linear regression vs. least absolute deviations (for both of them the model is f(x) = w' x + b), as illustrated below.
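
To make the issue concrete, here is a hypothetical illustration (LinearModel and the cost function names are made up): both objectives are defined over the very same model, so a single Model class cannot tell the optimizer which one to minimize.

// Both objectives share the same model f(x) = w' x + b.
double squared_error_cost(const LinearModel& model, const Matrix& data, const Vector& labels);  // linear regression
double absolute_error_cost(const LinearModel& model, const Matrix& data, const Vector& labels); // least absolute deviations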

4. Algorithm-based

4.1 Setting at runtime

An optimization algorithm is used in this way: xxx_opt_algo(cost_func, cost_grad_func, cost_partial_grad_func, regularization_func_grad / prox_func, some, parameters, for, the, algo); or

XXXOptAlgo xxx_opt_algo;
xxx_opt_algo.set_cost_func(cost_func);
xxx_opt_algo.set_cost_grad_func(cost_grad_func);
...
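
A minimal sketch of how the setter-based variant might be implemented, assuming std::function is used to store the callbacks (this is also the source of the overhead mentioned in the performance section below):

#include <functional>
#include <utility>

class XXXOptAlgo {
public:
    void set_cost_func(std::function<double()> f) { cost_func_ = std::move(f); }
    void set_cost_grad_func(std::function<Vector()> g) { cost_grad_func_ = std::move(g); }
    void train(/* some, parameters */); // calls cost_func_() / cost_grad_func_() internally
private:
    std::function<double()> cost_func_;
    std::function<Vector()> cost_grad_func_;
};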

4.2 Using a subclass

Each optimization algorithm is an abstract class like this:

class XXXOptAlgo {
public:
    virtual double cost_func() = 0;
    virtual Vector cost_func_grad() = 0;
    virtual Vector cost_func_grad(int i) = 0; // for something like SGD
    virtual double regularization_func() = 0; // seems not to be needed in most cases
    virtual Vector regularization_func_grad() = 0; // gradient of the regularization term w.r.t. w
    virtual Vector prox_func() = 0; // proximal operator, for proximal-gradient methods
    virtual void update_w(Vector updates) = 0; // the problem may want to update the weights in some special way, such as dropout; if gradient descent is used, the gradient is passed as updates
    void train(some, parameters);
};

For each optimization problem, the user inherits this class and implements the virtual functions, then constructs the object and calls train(some, parameters). In fact, this is similar to the first design, but a bit clumsier.
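
For example, user code under this design might look like the following hypothetical sketch (all pure virtual functions must be implemented, even those the chosen algorithm never calls):

class MyLogisticRegression : public XXXOptAlgo {
public:
    double cost_func() override;     // the user's objective
    Vector cost_func_grad() override;
    // ... the remaining pure virtual functions must also be overridden
};

MyLogisticRegression algo;
algo.train(some, parameters); // train() calls back into the overridden functions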

Performance issue

It seems that choices 1, 2 and 4.2 can give the highest performance when implemented in a suitable way. For choices 1 and 2, if the Problem object is passed as a template parameter, the calls into the problem become ordinary, statically dispatched function calls. For choice 4.2, the calls inside train() are virtual calls in principle, but the compiler may devirtualize them when the concrete type is statically known, e.g. when the object is not accessed through a pointer or reference of the parent type. In choices 3 and 4.1, however, a virtual call (hidden inside std::function) or a function pointer is used, which may affect performance.
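
For example, a hypothetical gradient-descent driver for choices 1 and 2 could take the problem as a template parameter, so the calls resolve statically and can be inlined:

template <typename Problem>
void gradient_descent(Problem& problem, int num_iters) {
    for (int t = 0; t < num_iters; ++t) {
        // Resolved at compile time against Problem's members: no virtual dispatch.
        problem.update_w(problem.cost_func_grad()); // the gradient is passed as updates
    }
}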