What fundamental difference distinguishes a labelling produced by classification from one produced by clustering?
What is eager classification? What is lazy classification? In terms of computational cost (space and time), what does each entail? Can you give one example of an eager classification model and one of a lazy classification model?
In the way they consider the attributes, a decision tree and a naive Bayes classifier can be seen as opposite algorithms. Why?
Between a decision tree and an SVM, which method usually provides the more accurate predictions? What makes the other one valuable in some circumstances?
What are over-fitting and over-generalizing? Illustrate both with the k-nearest neighbors algorithm. For other algorithms that learn classifiers, can you name hyperparameters to tune so that those problems are avoided?
Explain stratified 5-fold cross-validation in detail.
What distribution of classes makes many algorithms output bad classifiers? In what sense are they bad? Do you know ways to address this problem?
