Decision Tree Algorithm - Basic Implementation in Python With Examples

Hey folks! In this article, we will look at one of the most interesting concepts in Machine Learning and Artificial Intelligence: the Decision Tree.

Machine Learning is a vast domain with numerous algorithms that train a model on data and predict outcomes, which in turn helps improve business standards and strategies.

A Decision Tree delivers insights about the data in terms of predictions and probabilistic outcomes.

So, let us begin!


What is a Decision Tree?

A Decision Tree is a Machine Learning algorithm that builds a model of decisions and predicts the outcome of an event, optionally in terms of chances or probabilities.

It is a non-parametric, predictive algorithm that delivers the outcome based on decisions/rules framed from observing the patterns in the data.

Being a supervised Machine Learning algorithm, a Decision Tree learns from historic, labeled data. The algorithm trains and builds the model of decisions on that data, and then predicts the outcome for new records.

Decision Trees can perform either of the following tasks:

  • Classification of a record, i.e. predicting which category the record belongs to.
  • Estimation of the value of a numeric target variable.

We will learn more about the above tasks in the upcoming section.

In programming terms, a Decision Tree is a flowchart-like structure that delivers the outcome by evaluating a sequence of conditions.

[Figure: Decision Tree]

In a Decision Tree:
  • Each internal node represents a condition on an attribute/variable.
  • Every branch represents an outcome of the condition.
  • Each leaf node gives the decision to be taken after resolving all the conditions along the path.

Decision Trees are built with algorithms such as ID3, CART, and C5.0. These algorithms identify the best attribute to place at the root node, i.e. the attribute whose split produces the most homogeneous subsets of the data.
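To get a feel for what "most homogeneous" means, here is a minimal sketch that scores attributes with Gini impurity, the measure CART uses for classification. It is a simplified illustration and not how any library implements it: the toy records, the column names (weather, windy, rent) and the helper functions are all made up for this example, and it splits on every distinct value rather than the binary splits CART performs.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure group, larger for mixed groups."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def weighted_gini(rows, attribute, target):
    """Size-weighted impurity of the groups created by splitting on one attribute."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

def best_attribute(rows, attributes, target):
    """Pick the attribute whose split leaves the purest groups."""
    return min(attributes, key=lambda a: weighted_gini(rows, a, target))

# Hypothetical toy records, invented for illustration only.
rows = [
    {"weather": "sunny", "windy": "no", "rent": "YES"},
    {"weather": "sunny", "windy": "yes", "rent": "YES"},
    {"weather": "rainy", "windy": "no", "rent": "NO"},
    {"weather": "rainy", "windy": "yes", "rent": "NO"},
]

print(best_attribute(rows, ["weather", "windy"], "rent"))  # weather
```

Splitting on weather gives two pure groups (impurity 0.0), while splitting on windy leaves both groups mixed (impurity 0.5), so weather would be chosen as the root.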

Prediction begins at the root node, where the record's attribute value is compared against the node's condition. Based on the result, the record follows the matching branch to the next node. This continues until it reaches a leaf node, which holds the predicted class or the estimated value.
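The walk from the root to a leaf can be sketched in a few lines of plain Python. The tree below is written by hand as nested dictionaries purely for illustration; the attribute names and the predict() helper are assumptions of this sketch, not part of any library.

```python
# Hypothetical hand-written tree: internal nodes test one attribute,
# branches carry the outcome of that test, leaves hold the decision.
tree = {
    "attribute": "weather",
    "branches": {
        "sunny": {
            "attribute": "windy",
            "branches": {
                "yes": {"leaf": "NO"},
                "no": {"leaf": "YES"},
            },
        },
        "rainy": {"leaf": "NO"},
    },
}

def predict(node, record):
    """Start at the root and follow the matching branch until a leaf is reached."""
    while "leaf" not in node:
        value = record[node["attribute"]]
        node = node["branches"][value]
    return node["leaf"]

print(predict(tree, {"weather": "sunny", "windy": "no"}))  # YES
```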

Now, let us have a look at the different types of Decision Trees in detail.


Types of Decision Tree

A Decision Tree predicts the following types of values:

  • Classification: It predicts the class of a categorical target variable. For example, predicting whether the outcome is YES or NO.
  • Regression: It predicts the value of a numeric/continuous target variable. For example, predicting the bike rental count in the near future.
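The two types map onto two scikit-learn classes, DecisionTreeClassifier and DecisionTreeRegressor. Here is a small sketch on made-up numbers (the data is invented for illustration) showing that the same kind of input can feed either a class prediction or a numeric estimate.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# One made-up numeric feature, six records.
X = [[0], [1], [2], [3], [4], [5]]

# Classification: the target is a category (YES / NO).
y_class = ["NO", "NO", "NO", "YES", "YES", "YES"]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
print(clf.predict([[1], [4]]))

# Regression: the target is a number (for example, a rental count).
y_count = [10.0, 12.0, 11.0, 40.0, 42.0, 41.0]
reg = DecisionTreeRegressor(random_state=0).fit(X, y_count)
print(reg.predict([[1], [4]]))
```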

Assumptions of a Decision Tree

  • Before the first iteration, i.e. before the model of decisions is defined, the entire training dataset is treated as the root node.
  • The records are distributed recursively based on the attribute values.
  • A statistical approach is used to decide the order in which attributes are placed as root/internal nodes.

Implementation of Decision Tree using Python

Having understood the working of Decision Trees, let us now implement the same in Python.

Please visit the below link to find the entire dataset.

Bike.csv

This dataset contains a target variable, 'cnt'. It is a regression problem because we have to predict the count of customers who would rent a bike in the near future based on certain conditions.

First, we import the necessary libraries such as os, numpy, pandas, and sklearn.
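A sketch of what this setup step might look like is shown here. The exact imports and the location of Bike.csv are assumptions; the load is guarded so the snippet still runs when the file is not present.

```python
import os

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Assumption: Bike.csv has been downloaded into the current working directory.
csv_path = os.path.join(os.getcwd(), "Bike.csv")

# Load the data only if the file is actually there; otherwise leave it as None.
bike = pd.read_csv(csv_path) if os.path.exists(csv_path) else None
```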

Next, we define a function to measure the error of the Decision Tree's predictions.

We use the MAPE (Mean Absolute Percentage Error) metric for this algorithm, as this is a regression problem.
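MAPE is the mean of the absolute errors expressed as a percentage of the actual values. A minimal version could look like the following; the function name mape and its argument names are choices made for this sketch, and it assumes that no actual value is zero.

```python
import numpy as np

def mape(y_actual, y_predicted):
    """Mean Absolute Percentage Error, in percent.

    Assumes y_actual contains no zeros, since each error is divided by the actual value.
    """
    y_actual = np.asarray(y_actual, dtype=float)
    y_predicted = np.asarray(y_predicted, dtype=float)
    return np.mean(np.abs((y_actual - y_predicted) / y_actual)) * 100

# Both predictions are off by 10% of the actual value.
print(mape([100, 200], [110, 180]))
```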

Next, we separate the dataset into the independent variables (features) and the dependent variable (target). We then use the train_test_split() function to divide the data into training and testing sets.
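The split could be sketched as follows. Since the real Bike.csv is not bundled here, the snippet builds a small stand-in DataFrame: only the 'cnt' column name comes from the dataset description, while the feature columns (temp, hum), the random values, and the 80/20 split are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in data; replace with the real DataFrame loaded from Bike.csv.
rng = np.random.default_rng(0)
bike = pd.DataFrame({
    "temp": rng.uniform(0, 1, 100),     # hypothetical feature
    "hum": rng.uniform(0, 1, 100),      # hypothetical feature
    "cnt": rng.integers(100, 5000, 100),  # target: rental count
})

# Independent variables (everything except the target) and the dependent variable.
X = bike.drop(columns=["cnt"])
y = bike["cnt"]

# Hold out 20% of the rows for testing (the ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```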

Now it is time to apply the decision tree algorithm. We use the DecisionTreeRegressor() class to build the model and its fit() method to train it.

Further, we predict the values of the target variable in the testing dataset using the predict() function.

After this, we calculate the MAPE value (error value) and subtract it from 100 to get the accuracy of the model.
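Putting the last three steps together (train, predict, evaluate), a sketch might look like this. It runs on synthetic data rather than on Bike.csv, so the numbers it prints say nothing about the real dataset; the data-generating formula and the default hyperparameters are assumptions of this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the bike data: two features and a strictly positive target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = 1000 + 3000 * X[:, 0] - 500 * X[:, 1] + rng.normal(0, 50, 200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Build and train the model.
model = DecisionTreeRegressor(random_state=0)
model.fit(X_train, y_train)

# Predict the target for the testing data.
predictions = model.predict(X_test)

# MAPE in percent, then accuracy as 100 minus MAPE.
mape = np.mean(np.abs((y_test - predictions) / y_test)) * 100
accuracy = 100 - mape
print(f"MAPE: {mape:.2f}%  Accuracy: {accuracy:.2f}%")
```

Note that "100 minus MAPE" is a convenient shorthand rather than a standard accuracy metric, and it can even turn negative when the percentage error exceeds 100.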



Advantages of Decision Trees

  • Decision Trees are quite useful in Exploratory Data Analysis and Predictive Analysis, and help find relationships between the variables.
  • They implicitly select the most informative variables while building the model.
  • Decision Trees are largely unaffected by outlier values.
  • Moreover, it is a non-parametric method, i.e. it makes no assumptions about the distribution of the data or the structure of the classifier.

Limitations of Decision Trees

  • A Decision Tree may overfit the data, i.e. it can create overly complex tree structures that do not generalize well to new data.
  • A Decision Tree may create biased trees if the classes in the data are not balanced.
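The overfitting point is easy to see by comparing an unrestricted tree with one whose depth is capped. This is a small sketch on noisy synthetic data (the data and the choice of max_depth=3 are assumptions for illustration); limiting depth is one common way to keep the tree simple, not the only one.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy synthetic data: a smooth curve plus random noise.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(6 * X[:, 0]) + rng.normal(0, 0.3, 200)

# An unrestricted tree keeps splitting until it has memorized the noise.
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)

# Capping the depth forces a much simpler structure.
small_tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

print("Leaves without a limit:", full_tree.get_n_leaves())
print("Leaves with max_depth=3:", small_tree.get_n_leaves())
```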

Conclusion

With this, we have come to the end of this topic. Feel free to comment below in case you have any questions.

Please find the link to my GitHub profile, where you will find the entire code along with other machine learning algorithms.



By admin
