

Classification (iris data with KNeighborsClassifier)

import warnings

# Avoid unnecessary warning output.
warnings.filterwarnings('ignore')
import pandas as pd
import seaborn as sns
from IPython.display import Image
from sklearn.datasets import load_iris
iris = load_iris()
data = iris['data']
feature_names = iris['feature_names']
target = iris['target']
iris['target_names']
array(['setosa', 'versicolor', 'virginica'], dtype='<U10')

Make DataFrame

df_iris = pd.DataFrame(data, columns=feature_names)
df_iris.head()
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
df_iris['target'] = target
df_iris.head()
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) target
0 5.1 3.5 1.4 0.2 0
1 4.9 3.0 1.4 0.2 0
2 4.7 3.2 1.3 0.2 0
3 4.6 3.1 1.5 0.2 0
4 5.0 3.6 1.4 0.2 0
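The integer codes 0, 1, 2 in target correspond to the species names shown earlier. As a quick readability check, the codes can be mapped back to names; this is a small sketch, not part of the original notebook:

# Sketch: map the integer target codes back to species names.
df_iris['target'].map(dict(enumerate(iris['target_names']))).head()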
from sklearn.model_selection import train_test_split
x_train, x_valid, y_train, y_valid = train_test_split(df_iris.drop('target', axis=1), df_iris['target'], stratify=df_iris['target'])
sns.countplot(x=y_train)
<AxesSubplot:xlabel='target', ylabel='count'>

x_train.shape, y_train.shape
((112, 4), (112,))
x_valid.shape, y_valid.shape
((38, 4), (38,))
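Because stratify=df_iris['target'] was passed, the class proportions in the train and validation sets stay aligned. A minimal sketch (my addition) to verify this:

# Sketch: stratified splitting should keep the class ratios nearly identical.
print(y_train.value_counts(normalize=True))
print(y_valid.value_counts(normalize=True))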
from IPython.display import Image

KNeighborsClassifier

Image('https://res.cloudinary.com/dyd911kmh/image/upload/f_auto,q_auto:best/v1531424125/KNN_final_a1mrv9.png')

from sklearn.neighbors import KNeighborsClassifier
knc = KNeighborsClassifier()
knc.fit(x_train, y_train)
KNeighborsClassifier()
knc_pred = knc.predict(x_valid)
(knc_pred == y_valid).mean()
0.9736842105263158
knc = KNeighborsClassifier(n_neighbors=9)
knc.fit(x_train, y_train)
knc_pred = knc.predict(x_valid)
(knc_pred == y_valid).mean()
1.0
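n_neighbors=9 happened to reach perfect validation accuracy here. To see how accuracy varies with k, one could scan a few values; a hedged sketch, not in the original:

# Sketch: validation accuracy for several odd values of k.
for k in range(1, 12, 2):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(x_train, y_train)
    print(k, (model.predict(x_valid) == y_valid).mean())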

SVC

  • A non-probabilistic binary linear classifier: it builds a model that decides which of two categories a new sample belongs to.

  • The algorithm finds the decision boundary with the largest margin, i.e. the boundary with the widest gap between the two classes.

Image('https://csstudy.files.wordpress.com/2011/03/screen-shot-2011-02-28-at-5-53-26-pm.png')

Like LogisticRegression, the base algorithm is binary: only two classes can be discriminated at a time.

  • Multiclass problems are handled with a one-vs-rest (OvR) strategy, as sketched below.

document
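The OvR idea can be made explicit with scikit-learn's OneVsRestClassifier wrapper. This is a minimal sketch of the strategy, not part of the original notebook:

# Sketch: explicitly train one binary SVC per class (one-vs-rest).
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

ovr_svc = OneVsRestClassifier(SVC(random_state=0))
ovr_svc.fit(x_train, y_train)
(ovr_svc.predict(x_valid) == y_valid).mean()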

from sklearn.svm import SVC
svc = SVC(random_state=0)
svc.fit(x_train, y_train)
svc_pred = svc.predict(x_valid)
svc
SVC(random_state=0)
(svc_pred == y_valid).mean()
1.0
svc_pred[:5]
array([2, 0, 1, 0, 1])

decision_function() returns a decision score for each class, not a probability; for probability estimates, fit SVC with probability=True and use predict_proba().

svc.decision_function(x_valid)[:5]
array([[-0.23303451,  0.93162872,  2.240261  ],
       [ 2.23208462,  1.15637443, -0.25351021],
       [-0.23277667,  2.21638146,  1.1057563 ],
       [ 2.23200834,  1.1519243 , -0.25256858],
       [-0.23854618,  2.2012744 ,  1.16604144]])
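Each row holds one score per class, and the predicted class is the column with the largest score. A quick consistency check (my addition, a sketch):

import numpy as np

# Sketch: the argmax of each row of decision scores should agree with predict().
(np.argmax(svc.decision_function(x_valid), axis=1) == svc_pred).all()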

Decision Tree

document

from sklearn.tree import DecisionTreeClassifier
dtc = DecisionTreeClassifier(random_state=0)
dtc.fit(x_train, y_train)
DecisionTreeClassifier(random_state=0)
dtc_pred = dtc.predict(x_valid)
(dtc_pred == y_valid).mean()
0.9736842105263158
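A fitted tree also exposes feature_importances_, which shows how much each feature contributed to the splits. A small sketch, not in the original:

# Sketch: rank features by their importance in the fitted tree.
pd.Series(dtc.feature_importances_, index=x_train.columns).sort_values(ascending=False)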

Visualization of decision tree

from sklearn.tree import export_graphviz
from subprocess import call
def graph_tree(model):
    
    # Export to .dot file
    export_graphviz(model, out_file='tree.dot')

    # Convert the generated .dot file to .png
    call(['dot', '-Tpng', 'tree.dot', '-o', 'decision-tree.png', '-Gdpi=600'])

    # .png output
    return Image(filename='decision-tree.png', width=500)

Gini coefficient: a measure of node impurity; the higher the coefficient, the greater the entropy.

High entropy simply means that the classes in a node are mixed together.
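Concretely, the Gini impurity of a node is 1 - sum(p_i^2), where p_i is the fraction of samples in class i. A hand computation (my addition, a minimal sketch):

import numpy as np

# Sketch: Gini impurity by hand; 0 for a pure node, ~0.667 for an even 3-class mix.
def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

gini([0, 0, 0])   # 0.0: all samples share one class
gini([0, 1, 2])   # ~0.667: maximally mixed for three classes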

dtc = DecisionTreeClassifier(max_depth=2)
dtc.fit(x_train, y_train)
dtc_pred = dtc.predict(x_valid)
graph_tree(dtc)
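Limiting max_depth is a simple form of pruning: too shallow underfits, too deep can overfit the training set. A hedged sketch (not in the original) comparing depths:

# Sketch: validation accuracy for several tree depths.
for depth in range(1, 6):
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(x_train, y_train)
    print(depth, (model.predict(x_valid) == y_valid).mean())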
