Best Split Doesn’t Necessarily Produce the Best Decision Tree


Decision trees are popular as predictive models because they are intuitive and perform competitively with other model building methodologies. A decision tree model is built from the data at hand, that is, the training data, by successively splitting the data into purer and purer subsets in a top-down manner. The quality of any potential split is measured by one of a handful of split quality measures, such as the Gini index or the entropy measure. These and similar measures essentially quantify the level of impurity in the subsets that a split produces.

Just to get an idea of how the quality of a potential split is determined, take a look at the slide below, which shows a set of possible splits for an illustrative example and how their quality is calculated using the entropy measure.

image
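
For readers who prefer code to slides, here is a minimal sketch of this kind of calculation. The toy labels and the two candidate splits are made up for illustration and are not the ones in the slide; the helper names (`entropy`, `gini`, `weighted_impurity`) are likewise my own.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_impurity(subsets, measure):
    """Impurity of a split: subset impurities weighted by subset size."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * measure(s) for s in subsets)

# Hypothetical parent node: 5 "yes" and 5 "no" examples.
parent = ["yes"] * 5 + ["no"] * 5

# Two hypothetical candidate splits of the same ten examples.
split_a = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]          # fairly pure subsets
split_b = [["yes"] * 3 + ["no"] * 2, ["yes"] * 2 + ["no"] * 3]  # barely purer than parent

# Information gain = parent entropy minus weighted entropy after the split.
gain_a = entropy(parent) - weighted_impurity(split_a, entropy)
gain_b = entropy(parent) - weighted_impurity(split_b, entropy)

print(f"gain of split A: {gain_a:.3f}")
print(f"gain of split B: {gain_b:.3f}")
print(f"Gini of parent:  {gini(parent):.3f}")
```

Split A leaves purer subsets than split B, so it gets the higher gain and would be the one chosen at this stage.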

Standard decision tree modeling methods go for the best split at each stage of the model building process, on the assumption that the resulting tree will be better than the tree that would result from not choosing the best split. Here I just want to show an example where this is not true. It is shown in the slide below, where the best split as per the entropy measure yields an inferior tree in terms of tree size and model clarity. The reason for this is the greedy nature of the split selection criterion, which doesn’t include any look-ahead component.

image
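
The same effect can be reproduced in a few lines of code. The sketch below is my own illustration, not the example from the slide: it assumes a made-up dataset of eight rows in which the label is the XOR of the first two features, while the third feature merely agrees with the label in six of the eight rows. The tiny ID3-style builder (`count_leaves`, a hypothetical helper) only counts leaves; it is not a full tree implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, f):
    """Information gain of splitting rows on feature index f."""
    parts = {}
    for x, y in rows:
        parts.setdefault(x[f], []).append(y)
    n = len(rows)
    after = sum(len(p) / n * entropy(p) for p in parts.values())
    return entropy([y for _, y in rows]) - after

def count_leaves(rows, features, forced_root=None):
    """Grow a tree greedily by information gain and return its leaf count.

    If forced_root is given, that feature is used at the root only;
    every split below the root is still chosen greedily.
    """
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not features:
        return 1
    if forced_root is not None:
        f = forced_root
    else:
        f = max(features, key=lambda g: gain(rows, g))
    rest = [g for g in features if g != f]
    values = {x[f] for x, _ in rows}
    return sum(count_leaves([r for r in rows if r[0][f] == v], rest)
               for v in values)

# Made-up data: ((x1, x2, x3), y) with y = x1 XOR x2; x3 is a noisy copy of y.
rows = [
    ((0, 0, 0), 0), ((0, 0, 0), 0), ((1, 1, 0), 0), ((1, 1, 1), 0),
    ((0, 1, 1), 1), ((0, 1, 1), 1), ((1, 0, 1), 1), ((1, 0, 0), 1),
]
features = [0, 1, 2]

root_gains = {f: gain(rows, f) for f in features}
greedy_root = max(features, key=lambda g: root_gains[g])
greedy_leaves = count_leaves(rows, features)                # root chosen by gain
alt_leaves = count_leaves(rows, features, forced_root=0)    # root forced to x1

print("gain at root per feature:", root_gains)
print("greedy root feature index:", greedy_root)
print("leaves with greedy root:", greedy_leaves)
print("leaves with x1 at root: ", alt_leaves)
```

On this data, x1 and x2 each have zero gain at the root, so the greedy criterion picks x3 and ends up with six leaves. Starting with x1, a split the criterion ranks last, gives a four-leaf tree that classifies the same rows perfectly.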

If you have come across any other example where a similar result is obtained, then please share that example with us.

Thanks.

Author: Krishan

I am a professional with over 40 years of experience in computer vision, data mining, machine learning, and pattern recognition. I provide consulting and training services through my company, Integrated Knowledge Solutions.
