Module 6#

Information Gain#

Pasted image 20220417111409.png
Pasted image 20220417111739.png
Pasted image 20220417112504.png
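The images above carry the worked examples; as a reference, here is a minimal sketch of entropy and information gain using their standard definitions (the toy 9+/5- split below is an assumption for illustration, not transcribed from the images):

```python
# A minimal sketch of entropy and information gain using the standard
# definitions; the toy 9+/5- split below is an assumption for illustration.
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum_i p_i * log2(p_i) over class proportions p_i."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Gain = Entropy(parent) - weighted average entropy of the child partitions."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)

parent = ["+"] * 9 + ["-"] * 5
children = [["+"] * 6 + ["-"] * 1, ["+"] * 3 + ["-"] * 4]
print(round(information_gain(parent, children), 3))  # ~0.152
```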

Gini Index#

Pasted image 20220417113241.png
Pasted image 20220417113622.png
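For reference alongside the images, a minimal sketch of the Gini index using its standard definition (the label counts are an assumption for illustration):

```python
# A minimal sketch of the Gini index using the standard definition;
# the 9+/5- label counts are an assumption for illustration.
from collections import Counter

def gini(labels):
    """Gini(S) = 1 - sum_i p_i^2 over class proportions p_i."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

print(round(gini(["+"] * 9 + ["-"] * 5), 3))  # 1 - (9/14)^2 - (5/14)^2 ~ 0.459
```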

  • Inexpensive to construct
  • Extremely fast at classifying unknown records

Inductive bias in decision tree learning#

  • Inductive bias is the set of assumptions a model makes in order to learn the target function and to generalize beyond the training data
  • What is the inductive bias of DT learning?
    • Shorter trees are preferred over longer trees
    • Prefer trees that place high-information-gain attributes close to the root

Limitations of decision tree learning#

  • Overfitting
  • Building trees that "adapt too much" to the training examples may lead to overfitting
  • Such trees may fail to fit additional data or predict future observations reliably

Pasted image 20220417114048.png

As the size of the tree increases, training accuracy keeps increasing while test accuracy eventually decreases (see the sketch below)
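A minimal sketch of this effect with scikit-learn; the dataset and depth values are assumptions for illustration:

```python
# Deeper trees fit the training split ever more closely, while test accuracy
# plateaus or drops; dataset and depth values are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 2, 4, 8, 16):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, round(tree.score(X_tr, y_tr), 3), round(tree.score(X_te, y_te), 3))
```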

  • Pre-pruning:
    • Stop the algorithm before the tree is fully grown
    • General stopping conditions for a node:
      • Stop if all instances belong to the same class
      • Stop if all the attribute values are the same
    • More restrictive conditions:
      • Stop if the number of instances is less than some user-specified threshold
      • Stop if the class distribution of instances is independent of the available features
      • Stop if expanding the current node does not improve impurity measures
  • Post-pruning (see the sketch after this list):
    • Grow the decision tree to its entirety
    • Trim the nodes of the decision tree in a bottom-up fashion
    • If the generalization error improves after trimming, replace the subtree with a leaf node
    • The majority class of instances in the subtree is used as the class label of the leaf node
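A minimal sketch of both strategies with scikit-learn; the hyperparameter values are assumptions for illustration, not tuned settings:

```python
# Pre-pruning limits growth up front; post-pruning grows the full tree and
# then trims it via cost-complexity pruning (ccp_alpha). Values are assumed.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early via restrictive conditions.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10, random_state=0)
pre.fit(X_tr, y_tr)

# Post-pruning: grow fully, then trim bottom-up using cost-complexity pruning.
post = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
post.fit(X_tr, y_tr)

print("pre-pruned: ", round(pre.score(X_te, y_te), 3))
print("post-pruned:", round(post.score(X_te, y_te), 3))
```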

Ensemble methods#

  • Use multiple learning algorithms to obtain better predictive performance than could be obtained from any one of the constituent learning algorithms
  • By combining individual models we can reduce both bias and variance
  • The ensemble performs worse than a single classifier if each base error rate is \(\varepsilon \gt 0.5\) (see the numeric sketch below)
    Pasted image 20220417120324.png

  • Each base classifier has to be independent of the others.

Pasted image 20220417121350.png
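A small numeric illustration of this threshold, assuming 25 independent base classifiers (a common textbook setup; the values are not from these notes). A majority vote errs only when more than half of the base classifiers err:

```python
# Probability that a majority vote of n independent base classifiers errs,
# given each errs with probability eps. n=25 and the eps values are assumptions.
from math import comb

def ensemble_error(n: int, eps: float) -> float:
    # The vote is wrong when more than half of the n classifiers err.
    return sum(comb(n, k) * eps**k * (1 - eps)**(n - k)
               for k in range(n // 2 + 1, n + 1))

print(round(ensemble_error(25, 0.35), 3))  # ~0.06: far below 0.35
print(round(ensemble_error(25, 0.65), 3))  # ~0.94: worse than a single classifier
```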

Methods for constructing an ensemble classifier#

  • Using different algorithms
  • Using different hyperparameters
  • Using different training sets
  • By manipulating input features
  • By manipulating the class labels

Types of ensemble methods#

  • Simple methods (see the voting sketch after this list):
    • Max voting
    • Averaging
    • Weighted averaging
  • Advanced methods:
    • Bagging: Homogeneous weak learners, trained independently of each other.
    • Boosting: Homogeneous weak learners, trained sequentially in an adaptive way.
    • Stacking: Heterogeneous weak learners, trained independently and combined by training a meta-model that outputs a prediction based on the weak models' predictions.
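A minimal sketch of max voting over heterogeneous models with scikit-learn; the choice of base models and dataset is an assumption for illustration:

```python
# voting="hard" implements max voting; voting="soft" averages predicted
# probabilities instead. Base models and dataset are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(random_state=0)),
                ("knn", KNeighborsClassifier())],
    voting="hard",
)
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```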

Bootstrap Sampling#

Pasted image 20220417123516.png
Pasted image 20220417123524.png
Pasted image 20220417123757.png
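Alongside the images, a minimal sketch of bootstrap sampling with NumPy (the toy dataset size is an assumption): draw n records with replacement, so on average about 63.2% of the distinct records appear in each sample.

```python
# Bootstrap sampling: draw n indices with replacement; roughly 1 - 1/e ~ 63.2%
# of distinct records appear in a sample on average. Toy data is an assumption.
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(10)  # toy dataset of 10 records

idx = rng.integers(0, len(data), size=len(data))  # indices drawn with replacement
sample = data[idx]
print(sample)
print("fraction of distinct records sampled:", len(np.unique(idx)) / len(data))
```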

Bagging#

Pasted image 20220417124104.png
Pasted image 20220417124123.png
Pasted image 20220417124150.png
Pasted image 20220417124457.png
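A minimal sketch of bagging with scikit-learn, assuming scikit-learn >= 1.2 (where the base-model parameter is named `estimator`); the parameter values are illustrative:

```python
# Bagging: train homogeneous weak learners on bootstrap samples and
# aggregate their votes. Parameter values are assumptions for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # homogeneous weak learners
    n_estimators=50,                     # one model per bootstrap sample
    bootstrap=True,                      # sample training records with replacement
    random_state=0,
)
bag.fit(X, y)
print(round(bag.score(X, y), 3))
```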


Tags: !AMLIndex