Sklearn Decision Tree | Decision Tree Classifier in Python

A decision tree is a supervised machine-learning algorithm. A decision tree can be used for classification and regression. It is mostly used for classification tasks. It looks like a tree structure. We can quickly implement a Decision Tree in Python using the Sklearn library. You have to follow the given steps to implement the decision tree classifier.

Step 1: Import the libraries

import seaborn as sns

Step 2: Import the iris dataset

iris_data = sns.load_dataset("iris")
print(iris_data.head())
   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

Step 3: Split the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X = iris_data.iloc[:, :-1].values
y = iris_data.iloc[:, 4].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

Step 4: Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

Step 5: Fitting Decision Tree to the Training set using Sklearn

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
model.fit(X_train, y_train)

Step 6: Prediction on the Test set

y_pred = model.predict(X_test)

Step 7: Accuracy on the training set and test set

from sklearn.metrics import accuracy_score
print("Accuracy on training set: ", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy on test set", accuracy_score(y_test, y_pred))
Accuracy on training set:  1.0
Accuracy on test set 0.9736842105263158

Step 8: Confusion Matrix

from sklearn.metrics import confusion_matrix
c_matric = confusion_matrix(y_test, y_pred)
print(c_matric)
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]

Step 9: Plot Decision Tree

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(15, 10))
plt.title("Decision Tree of Iris Dataset")
plot_tree(model, filled=True)
plt.show()

Complete Code:

# Step 1: Import the libraries
import seaborn as sns

# Step 2: Import the iris dataset
iris_data = sns.load_dataset("iris")

# Step 3: Split the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split

X = iris_data.iloc[:, :-1].values
y = iris_data.iloc[:, 4].values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state = 0)

# Step 4: Feature Scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Step 5: Fitting Decision Tree to the Training set
from sklearn.tree import DecisionTreeClassifier

model = DecisionTreeClassifier(criterion = 'entropy', random_state = 0)
model.fit(X_train, y_train)

# Step 6: Prediction on Test set
y_pred = model.predict(X_test)

# Step 7: Accuracy on training set and test set
from sklearn.metrics import accuracy_score
print("Accuracy on training set: ", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy on test set", accuracy_score(y_test, y_pred))

# Step 8: Confusion Matrix
from sklearn.metrics import confusion_matrix
c_matric = confusion_matrix(y_test, y_pred)
print(c_matric)

Output:

Accuracy on training set:  1.0
Accuracy on test set 0.9736842105263158
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]

Free learning resources: AiHintsCodeAllow

Leave a Comment

Your email address will not be published.