Sklearn KNN | k-nearest neighbor classifier in Python

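KNN (k-nearest neighbors) is a supervised machine learning algorithm that can be used for both classification and regression. To classify a new point, it finds the k training points closest to it and predicts the most common label among them. You can implement a KNN classifier in Python quickly with the scikit-learn (sklearn) library by following the steps below.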
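To make the idea concrete before using scikit-learn, here is a minimal from-scratch sketch of KNN classification with NumPy. It uses Euclidean distance and a plain majority vote. The function name `knn_predict` and the toy data are hypothetical and only for illustration. This is not how scikit-learn implements KNN internally (for example, its tie-breaking may differ).

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, X_new, k=5):
    """Hypothetical helper: predict a label for each row of X_new by
    majority vote among its k nearest training points (Euclidean distance)."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    X_new = np.asarray(X_new, dtype=float)
    predictions = []
    for x in X_new:
        # Euclidean distance from x to every training point
        distances = np.linalg.norm(X_train - x, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote; on a tie, the label that appears first in distance order wins
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return np.array(predictions)

# Toy data: two well-separated clusters in 2-D
X_toy = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_toy = np.array(["a", "a", "a", "b", "b", "b"])
print(knn_predict(X_toy, y_toy, [[0.5, 0.5], [5.5, 5.5]], k=3))  # ['a' 'b']
```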

Step 1: Import the libraries

import numpy as np
import seaborn as sns

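Step 2: Load the iris dataset

seaborn's load_dataset downloads the CSV file from the seaborn-data repository on first use, so this step needs an internet connection.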

iris_data = sns.load_dataset("iris")
print(iris_data.head())

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
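If you cannot download the file, the same dataset also ships with scikit-learn. The sketch below is an alternative, not the method used in this tutorial: it assumes scikit-learn 0.23 or newer (for `as_frame=True`) and builds a DataFrame with a `species` column. Note that the feature column names differ from seaborn's (for example `sepal length (cm)` instead of `sepal_length`). The name `df` is illustrative.

```python
from sklearn.datasets import load_iris

# as_frame=True returns pandas objects instead of plain NumPy arrays
iris = load_iris(as_frame=True)
df = iris.frame  # four feature columns plus an integer "target" column

# Map the integer targets 0, 1, 2 to species names, like seaborn's "species" column
df["species"] = iris.target_names[iris.target.to_numpy()]
df = df.drop(columns="target")

print(df.shape)                      # (150, 5)
print(df["species"].value_counts())  # 50 samples of each species
```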

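Step 3: Split the dataset into training and test sets

from sklearn.model_selection import train_test_split
# Features: the four measurement columns; target: the species column (the last one)
X = iris_data.iloc[:, :-1].values
y = iris_data.iloc[:, -1].values
# Hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)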
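A quick way to check what `test_size=0.25` means in practice: with 150 samples, scikit-learn rounds the test size up, giving 38 test and 112 training rows. The sketch also shows the `stratify` parameter, which keeps the class proportions roughly equal in both splits. It assumes scikit-learn's bundled copy of iris (`load_iris`) in place of the seaborn DataFrame, only so that it runs offline; the `Xs_`/`ys_` names are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled iris data stands in for the seaborn DataFrame
X, y = load_iris(return_X_y=True)

# test_size=0.25 puts ceil(0.25 * 150) = 38 samples in the test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# stratify=y keeps each species' share nearly the same in both splits
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)
print(np.bincount(ys_test))  # test-set count per class
```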

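Step 4: Feature scaling

KNN compares points by distance, so features measured on larger scales would otherwise dominate the result. StandardScaler rescales each feature to zero mean and unit variance.

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
# Learn the mean and standard deviation from the training set only...
X_train = sc.fit_transform(X_train)
# ...and apply the same transformation to the test set
X_test = sc.transform(X_test)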
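To see what StandardScaler computes, the sketch below reproduces it by hand on a tiny made-up array: subtract the training mean and divide by the training standard deviation (the population standard deviation, `ddof=0`). The test data is transformed with the training statistics, which is why Step 4 calls `transform`, not `fit_transform`, on `X_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration only
X_train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_test = np.array([[4.0, 40.0]])

sc = StandardScaler()
X_train_scaled = sc.fit_transform(X_train)
X_test_scaled = sc.transform(X_test)

# The same transformation by hand, using training-set statistics only
mean = X_train.mean(axis=0)  # [2, 20]
std = X_train.std(axis=0)    # population std (ddof=0)
manual_test = (X_test - mean) / std

print(np.allclose(X_test_scaled, manual_test))  # True
```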

Step 5: Fit the KNN classifier to the training set

from sklearn.neighbors import KNeighborsClassifier
# n_neighbors=5: each prediction is a vote among the 5 nearest training points
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
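The value `n_neighbors=5` is a choice, not a rule. One common way to pick k is cross-validation on the training set. The sketch below tries odd values of k from 1 to 15 and keeps the one with the best mean 5-fold accuracy. It assumes scikit-learn's bundled iris data (so it runs offline) and puts the scaler inside a pipeline so each fold is scaled with statistics from its own training portion. The range of k values is an arbitrary example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the seaborn DataFrame
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Mean 5-fold cross-validation accuracy for each candidate k (training set only)
scores = {}
for k in range(1, 16, 2):
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    scores[k] = cross_val_score(pipe, X_train, y_train, cv=5).mean()

best_k = max(scores, key=scores.get)
print("Best k:", best_k, "mean CV accuracy:", round(scores[best_k], 3))
```

Odd values of k are used because they avoid exact vote ties between two classes, although ties are still possible with three classes.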

Step 6: Predict on the test set

y_pred = model.predict(X_test)
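Besides `predict`, a fitted `KNeighborsClassifier` offers `predict_proba`, which returns the share of neighbors voting for each class, and `kneighbors`, which returns the distances and training-set indices of the nearest neighbors. The sketch below repeats Steps 3 to 5 on scikit-learn's bundled iris data (an assumption, so it runs offline) and inspects the first few test points.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the seaborn DataFrame
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# With the default weights="uniform", each probability is the fraction of the
# 5 neighbors carrying that label, so every value is a multiple of 0.2
proba = model.predict_proba(X_test[:3])
print(model.classes_)
print(proba)

# Distances to, and training-set indices of, the 5 neighbors of the first test point
distances, indices = model.kneighbors(X_test[:1])
print(y_train[indices[0]])  # labels of those neighbors
```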

Step 7: Measure accuracy on the training and test sets

from sklearn.metrics import accuracy_score
print("Accuracy on training set:", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy on test set:", accuracy_score(y_test, y_pred))

Accuracy on training set: 0.9642857142857143
Accuracy on test set: 0.9736842105263158
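Accuracy is simply the fraction of predictions that match the true labels. The test accuracy above, 0.9736842105263158, is 37/38: 37 of the 38 test samples are correct. The sketch below checks this on a few made-up labels. For classifiers, `model.score(X_test, y_test)` returns the same mean accuracy.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels, for illustration only
y_true = np.array(["setosa", "versicolor", "virginica", "versicolor"])
y_pred = np.array(["setosa", "virginica", "virginica", "versicolor"])

# Fraction of positions where prediction and truth agree
manual = np.mean(y_true == y_pred)
print(manual)                          # 0.75
print(accuracy_score(y_true, y_pred))  # 0.75
```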

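Step 8: Compute the confusion matrix

from sklearn.metrics import confusion_matrix
c_matrix = confusion_matrix(y_test, y_pred)
print(c_matrix)

[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]

Rows are the true classes and columns the predicted classes, both in sorted label order (setosa, versicolor, virginica). All but one of the 38 test samples lie on the diagonal; the single error is a versicolor flower predicted as virginica.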
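The confusion matrix contains enough information to recompute the accuracy and per-class metrics. The sketch below copies the matrix printed in Step 8 and derives accuracy (diagonal over total), precision (diagonal over column sums) and recall (diagonal over row sums). In practice, `sklearn.metrics.classification_report(y_test, y_pred)` prints these per-class numbers directly.

```python
import numpy as np

# Confusion matrix from Step 8 (rows = true class, columns = predicted class)
cm = np.array([[13, 0, 0],
               [0, 15, 1],
               [0, 0, 9]])
classes = ["setosa", "versicolor", "virginica"]

# Correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()
print(round(accuracy, 4))  # 0.9737, i.e. 37/38, matching Step 7

# Recall: correct predictions / true samples of that class (row sums)
recall = np.diag(cm) / cm.sum(axis=1)
# Precision: correct predictions / predictions of that class (column sums)
precision = np.diag(cm) / cm.sum(axis=0)
for name, p, r in zip(classes, precision, recall):
    print(f"{name:<11} precision={p:.3f} recall={r:.3f}")
```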

Complete Code:

# Step 1: Import the libraries
import numpy as np
import seaborn as sns

# Step 2: Load the iris dataset (downloaded by seaborn on first use)
iris_data = sns.load_dataset("iris")
print(iris_data.head())

# Step 3: Split the dataset into training and test sets
from sklearn.model_selection import train_test_split
X = iris_data.iloc[:, :-1].values  # four measurement columns
y = iris_data.iloc[:, -1].values   # species column
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Step 4: Feature scaling (fit on the training set only)
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

# Step 5: Fit the KNN classifier to the training set
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Step 6: Predict on the test set
y_pred = model.predict(X_test)

# Step 7: Measure accuracy on the training and test sets
from sklearn.metrics import accuracy_score
print("Accuracy on training set:", accuracy_score(y_train, model.predict(X_train)))
print("Accuracy on test set:", accuracy_score(y_test, y_pred))

# Step 8: Compute the confusion matrix
from sklearn.metrics import confusion_matrix
c_matrix = confusion_matrix(y_test, y_pred)
print(c_matrix)

Output:

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
Accuracy on training set: 0.9642857142857143
Accuracy on test set: 0.9736842105263158
[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]
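The same workflow can be written more compactly with a scikit-learn `Pipeline`, which chains the scaler and the classifier so the scaler is always fitted on training data only and reapplied automatically at prediction time. This sketch assumes scikit-learn's bundled iris data instead of seaborn's download. scikit-learn's copy corrects a couple of values relative to some other versions of iris, so the printed numbers may not exactly match the output above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled iris data stands in for the seaborn DataFrame
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fit() scales and fits on the training data; predict() reuses the training
# mean and standard deviation to scale the test data before the KNN vote
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier(n_neighbors=5)),
])
pipe.fit(X_train, y_train)
y_pred = pipe.predict(X_test)

print("Accuracy on test set:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
```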
