
[Notice] [ML_8]

Regression(Boston data with evaluation metrics and regularization)

document

import pandas as pd
import numpy as np

# To show the numeric type well
np.set_printoptions(suppress=True)
from sklearn.datasets import load_boston
data = load_boston()
print(data['DESCR'])
.. _boston_dataset:

Boston house prices dataset
---------------------------

**Data Set Characteristics:**  

    :Number of Instances: 506 

    :Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

    :Attribute Information (in order):
        - CRIM     per capita crime rate by town
        - ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
        - INDUS    proportion of non-retail business acres per town
        - CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
        - NOX      nitric oxides concentration (parts per 10 million)
        - RM       average number of rooms per dwelling
        - AGE      proportion of owner-occupied units built prior to 1940
        - DIS      weighted distances to five Boston employment centres
        - RAD      index of accessibility to radial highways
        - TAX      full-value property-tax rate per $10,000
        - PTRATIO  pupil-teacher ratio by town
        - B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
        - LSTAT    % lower status of the population
        - MEDV     Median value of owner-occupied homes in $1000's

    :Missing Attribute Values: None

    :Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.
https://archive.ics.uci.edu/ml/machine-learning-databases/housing/


This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic
prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978.   Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980.   N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.   
     
.. topic:: References

   - Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity', Wiley, 1980. 244-261.
   - Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth International Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.

df = pd.DataFrame(data['data'], columns = data['feature_names'])
df['MEDV'] = data['target']
df.head()
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT MEDV
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2

About Columns

Number of attributes: 13

  • CRIM: per capita crime rate by town

  • ZN: proportion of residential land zoned for lots over 25,000 sq. ft.

  • INDUS: proportion of non-retail business acres per town

  • CHAS: Charles River dummy variable (1 if the tract bounds the river; 0 otherwise)

  • NOX: nitric oxide concentration (parts per 10 million)

  • RM: average number of rooms per dwelling

  • AGE: proportion of owner-occupied units built prior to 1940

  • DIS: weighted distances to five Boston employment centers

  • RAD: index of accessibility to radial highways

  • TAX: full-value property-tax rate per $10,000

  • PTRATIO: pupil-teacher ratio by town

  • B: 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town

  • LSTAT: percentage of lower-status population

  • MEDV: median value of owner-occupied homes (in $1,000s)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(df.drop('MEDV', axis=1), df['MEDV'])
x_train.shape, x_test.shape
((379, 13), (127, 13))
x_train.head()
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT
493 0.17331 0.0 9.69 0.0 0.585 5.707 54.0 2.3817 6.0 391.0 19.2 396.90 12.01
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33
120 0.06899 0.0 25.65 0.0 0.581 5.870 69.7 2.2577 2.0 188.0 19.1 389.15 14.37
358 5.20177 0.0 18.10 1.0 0.770 6.127 83.4 2.7227 24.0 666.0 20.2 395.43 11.48
97 0.12083 0.0 2.89 0.0 0.445 8.069 76.0 3.4952 2.0 276.0 18.0 396.90 4.21
y_train.head()
493    21.8
4      36.2
120    22.0
358    22.7
97     38.7
Name: MEDV, dtype: float64

Create evaluation metrics

MSE(Mean Squared Error)

${(\frac{1}{n})\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}$

Average of the squares of the difference between the predicted value and the actual value

MAE (Mean Absolute Error)

$\frac{1}{n}\sum_{i=1}^{n}\left| y_{i} - x_{i} \right|$

Average of the absolute value of the difference between the predicted value and the actual value

RMSE (Root Mean Squared Error)

$\sqrt{(\frac{1}{n})\sum_{i=1}^{n}(y_{i} - x_{i})^{2}}$

The square root of the average of the squared differences between the predicted values and the actual values

import numpy as np
pred = np.array([3, 4, 5])
actual = np.array([1, 2, 3])
def my_mse(pred, actual):
    return ((pred - actual)**2).mean()
my_mse(pred, actual)
4.0
def my_mae(pred, actual):
    return np.abs(pred - actual).mean()
my_mae(pred, actual)
2.0
def my_rmse(pred, actual):
    return np.sqrt(my_mse(pred, actual))
my_rmse(pred, actual)
2.0

Using sklearn’s evaluation metrics

from sklearn.metrics import mean_absolute_error, mean_squared_error
my_mae(pred, actual), mean_absolute_error(pred, actual)
(2.0, 2.0)
my_mse(pred, actual), mean_squared_error(pred, actual)
(4.0, 4.0)
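
Older sklearn releases do not expose a dedicated RMSE function; a simple, version-independent sketch (reusing pred and actual from above) is to take the square root of mean_squared_error. Newer versions also accept mean_squared_error(..., squared=False), depending on the installed version.

import numpy as np
from sklearn.metrics import mean_squared_error

# RMSE as the square root of sklearn's MSE
rmse_sklearn = np.sqrt(mean_squared_error(pred, actual))
my_rmse(pred, actual), rmse_sklearn  # both evaluate to 2.0 for the toy arrays above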

Functions to check the performance of each model

import matplotlib.pyplot as plt
import seaborn as sns

my_predictions = {}

colors = ['r', 'c', 'm', 'y', 'k', 'khaki', 'teal', 'orchid', 'sandybrown',
          'greenyellow', 'dodgerblue', 'deepskyblue', 'rosybrown', 'firebrick',
          'deeppink', 'crimson', 'salmon', 'darkred', 'olivedrab', 'olive', 
          'forestgreen', 'royalblue', 'indigo', 'navy', 'mediumpurple', 'chocolate',
          'gold', 'darkorange', 'seagreen', 'turquoise', 'steelblue', 'slategray', 
          'peru', 'midnightblue', 'slateblue', 'dimgray', 'cadetblue', 'tomato'
         ]

def plot_predictions(name_, pred, actual):
    df = pd.DataFrame({'prediction': pred, 'actual': actual})
    df = df.sort_values(by='actual').reset_index(drop=True)

    plt.figure(figsize=(12, 9))
    plt.scatter(df.index, df['prediction'], marker='x', color='r')
    plt.scatter(df.index, df['actual'], alpha=0.7, marker='o', color='black')
    plt.title(name_, fontsize=15)
    plt.legend(['prediction', 'actual'], fontsize=12)
    plt.show()

def mse_eval(name_, pred, actual):
    global my_predictions
    global colors

    plot_predictions(name_, pred, actual)

    mse = mean_squared_error(pred, actual)
    my_predictions[name_] = mse

    y_value = sorted(my_predictions.items(), key=lambda x: x[1], reverse=True)
    
    df = pd.DataFrame(y_value, columns=['model', 'mse'])
    print(df)
    min_ = df['mse'].min() - 10
    max_ = df['mse'].max() + 10
    
    length = len(df)
    
    plt.figure(figsize=(10, length))
    ax = plt.subplot()
    ax.set_yticks(np.arange(len(df)))
    ax.set_yticklabels(df['model'], fontsize=15)
    bars = ax.barh(np.arange(len(df)), df['mse'])
    
    for i, v in enumerate(df['mse']):
        idx = np.random.choice(len(colors))
        bars[i].set_color(colors[idx])
        ax.text(v + 2, i, str(round(v, 3)), color='k', fontsize=15, fontweight='bold')
        
    plt.title('MSE Error', fontsize=18)
    plt.xlim(min_, max_)
    
    plt.show()

def remove_model(name_):
    global my_predictions
    try:
        del my_predictions[name_]
    except KeyError:
        return False
    return True

LinearRegression

document

from sklearn.linear_model import LinearRegression
model = LinearRegression(n_jobs = -1)
  • n_jobs: the number of CPU cores to use (-1 means use all available cores)
model.fit(x_train, y_train)
LinearRegression(n_jobs=-1)
pred = model.predict(x_test)
mse_eval('LinearRegression', pred, y_test)

              model        mse
0  LinearRegression  25.817813

Regularization

Adding a penalty term to the loss to prevent the model from overfitting

L2 Regularization

  • The sum of the squared weights, multiplied by the regularization strength λ, is added to the error.

  • Increasing λ shrinks the weights more (regularization matters more), while decreasing λ lets the weights grow (regularization matters less).

L1 Regularization

  • Instead of the sum of squared weights, the sum of the absolute values of the weights, multiplied by the regularization strength λ, is added to the error.

  • Some weights are driven to exactly zero, meaning the corresponding features are completely excluded from the model.

L2 regularization is generally more stable than L1 regularization, so it is usually preferred.

Ridge - L2 regularization

$Error = MSE + \alpha \sum_{i} w_{i}^{2}$

Lasso - L1 regularization

$Error = MSE + \alpha \sum_{i} \left| w_{i} \right|$
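
To make the two penalty terms concrete, here is a minimal numeric sketch with a hypothetical weight vector w and regularization strength alpha (illustrative values only, not taken from the models below):

import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])     # hypothetical weights
alpha = 0.1                              # hypothetical regularization strength

l2_penalty = alpha * np.sum(w ** 2)      # Ridge adds alpha * sum(w_i^2) to the MSE
l1_penalty = alpha * np.sum(np.abs(w))   # Lasso adds alpha * sum(|w_i|) to the MSE
l2_penalty, l1_penalty                   # 0.1 * 13.25 = 1.325 and 0.1 * 5.5 = 0.55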

Ridge

from sklearn.linear_model import Ridge
# The larger the alpha, the stronger the regularization.
alphas = [100, 10, 1, 0.1, 0.01, 0.001, 0.0001]
for alpha in alphas:
    ridge = Ridge(alpha = alpha)
    ridge.fit(x_train, y_train)
    pred = ridge.predict(x_test)
    mse_eval('Ridge(alpha={})'.format(alpha), pred, y_test)

              model        mse
0  Ridge(alpha=100)  26.476653
1  LinearRegression  25.817813

              model        mse
0   Ridge(alpha=10)  26.775513
1  Ridge(alpha=100)  26.476653
2  LinearRegression  25.817813

              model        mse
0   Ridge(alpha=10)  26.775513
1  Ridge(alpha=100)  26.476653
2    Ridge(alpha=1)  26.307971
3  LinearRegression  25.817813

              model        mse
0   Ridge(alpha=10)  26.775513
1  Ridge(alpha=100)  26.476653
2    Ridge(alpha=1)  26.307971
3  Ridge(alpha=0.1)  25.882311
4  LinearRegression  25.817813

               model        mse
0    Ridge(alpha=10)  26.775513
1   Ridge(alpha=100)  26.476653
2     Ridge(alpha=1)  26.307971
3   Ridge(alpha=0.1)  25.882311
4  Ridge(alpha=0.01)  25.824345
5   LinearRegression  25.817813

                model        mse
0     Ridge(alpha=10)  26.775513
1    Ridge(alpha=100)  26.476653
2      Ridge(alpha=1)  26.307971
3    Ridge(alpha=0.1)  25.882311
4   Ridge(alpha=0.01)  25.824345
5  Ridge(alpha=0.001)  25.818467
6    LinearRegression  25.817813

                 model        mse
0      Ridge(alpha=10)  26.775513
1     Ridge(alpha=100)  26.476653
2       Ridge(alpha=1)  26.307971
3     Ridge(alpha=0.1)  25.882311
4    Ridge(alpha=0.01)  25.824345
5   Ridge(alpha=0.001)  25.818467
6  Ridge(alpha=0.0001)  25.817878
7     LinearRegression  25.817813

x_train.columns
Index(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX',
       'PTRATIO', 'B', 'LSTAT'],
      dtype='object')
ridge.coef_
array([ -0.12437217,   0.043921  ,  -0.02919101,   2.56563445,
       -15.77155659,   4.43365161,  -0.01219844,  -1.57385887,
         0.30938265,  -0.01215097,  -0.88381517,   0.00965481,
        -0.45669038])
def plot_coef(columns, coef):
    coef_df = pd.DataFrame(list(zip(columns, coef)))
    coef_df.columns=['feature', 'coef']
    coef_df = coef_df.sort_values('coef', ascending=False).reset_index(drop=True)
    
    fig, ax = plt.subplots(figsize=(9, 7))
    ax.barh(np.arange(len(coef_df)), coef_df['coef'])
    idx = np.arange(len(coef_df))
    ax.set_yticks(idx)
    ax.set_yticklabels(coef_df['feature'])
    fig.tight_layout()
    plt.show()
plot_coef(x_train.columns, ridge.coef_)

ridge_100 = Ridge(alpha = 100)
ridge_100.fit(x_train, y_train)
ridge_pred_100 = ridge_100.predict(x_test)

ridge_001 = Ridge(alpha = 0.001)
ridge_001.fit(x_train, y_train)
ridge_pred_001 = ridge_001.predict(x_test)
plot_coef(x_train.columns, ridge_100.coef_)

plot_coef(x_train.columns, ridge_001.coef_)

Lasso

from sklearn.linear_model import Lasso
alphas = [100, 10, 1, 0.1, 0.01, 0.001, 0.0001]
for alpha in alphas:
    lasso = Lasso(alpha = alpha)
    lasso.fit(x_train, y_train)
    pred = lasso.predict(x_test)
    mse_eval('Lasso(alpha={})'.format(alpha), pred, y_test)

                 model        mse
0     Lasso(alpha=100)  67.992572
1      Ridge(alpha=10)  26.775513
2     Ridge(alpha=100)  26.476653
3       Ridge(alpha=1)  26.307971
4     Ridge(alpha=0.1)  25.882311
5    Ridge(alpha=0.01)  25.824345
6   Ridge(alpha=0.001)  25.818467
7  Ridge(alpha=0.0001)  25.817878
8     LinearRegression  25.817813

                 model        mse
0     Lasso(alpha=100)  67.992572
1      Lasso(alpha=10)  38.306399
2      Ridge(alpha=10)  26.775513
3     Ridge(alpha=100)  26.476653
4       Ridge(alpha=1)  26.307971
5     Ridge(alpha=0.1)  25.882311
6    Ridge(alpha=0.01)  25.824345
7   Ridge(alpha=0.001)  25.818467
8  Ridge(alpha=0.0001)  25.817878
9     LinearRegression  25.817813

                  model        mse
0      Lasso(alpha=100)  67.992572
1       Lasso(alpha=10)  38.306399
2       Ridge(alpha=10)  26.775513
3      Ridge(alpha=100)  26.476653
4        Lasso(alpha=1)  26.318104
5        Ridge(alpha=1)  26.307971
6      Ridge(alpha=0.1)  25.882311
7     Ridge(alpha=0.01)  25.824345
8    Ridge(alpha=0.001)  25.818467
9   Ridge(alpha=0.0001)  25.817878
10     LinearRegression  25.817813

                  model        mse
0      Lasso(alpha=100)  67.992572
1       Lasso(alpha=10)  38.306399
2      Lasso(alpha=0.1)  27.228867
3       Ridge(alpha=10)  26.775513
4      Ridge(alpha=100)  26.476653
5        Lasso(alpha=1)  26.318104
6        Ridge(alpha=1)  26.307971
7      Ridge(alpha=0.1)  25.882311
8     Ridge(alpha=0.01)  25.824345
9    Ridge(alpha=0.001)  25.818467
10  Ridge(alpha=0.0001)  25.817878
11     LinearRegression  25.817813

                  model        mse
0      Lasso(alpha=100)  67.992572
1       Lasso(alpha=10)  38.306399
2      Lasso(alpha=0.1)  27.228867
3       Ridge(alpha=10)  26.775513
4      Ridge(alpha=100)  26.476653
5        Lasso(alpha=1)  26.318104
6        Ridge(alpha=1)  26.307971
7     Lasso(alpha=0.01)  25.984630
8      Ridge(alpha=0.1)  25.882311
9     Ridge(alpha=0.01)  25.824345
10   Ridge(alpha=0.001)  25.818467
11  Ridge(alpha=0.0001)  25.817878
12     LinearRegression  25.817813

                  model        mse
0      Lasso(alpha=100)  67.992572
1       Lasso(alpha=10)  38.306399
2      Lasso(alpha=0.1)  27.228867
3       Ridge(alpha=10)  26.775513
4      Ridge(alpha=100)  26.476653
5        Lasso(alpha=1)  26.318104
6        Ridge(alpha=1)  26.307971
7     Lasso(alpha=0.01)  25.984630
8      Ridge(alpha=0.1)  25.882311
9    Lasso(alpha=0.001)  25.831237
10    Ridge(alpha=0.01)  25.824345
11   Ridge(alpha=0.001)  25.818467
12  Ridge(alpha=0.0001)  25.817878
13     LinearRegression  25.817813

                  model        mse
0      Lasso(alpha=100)  67.992572
1       Lasso(alpha=10)  38.306399
2      Lasso(alpha=0.1)  27.228867
3       Ridge(alpha=10)  26.775513
4      Ridge(alpha=100)  26.476653
5        Lasso(alpha=1)  26.318104
6        Ridge(alpha=1)  26.307971
7     Lasso(alpha=0.01)  25.984630
8      Ridge(alpha=0.1)  25.882311
9    Lasso(alpha=0.001)  25.831237
10    Ridge(alpha=0.01)  25.824345
11  Lasso(alpha=0.0001)  25.819123
12   Ridge(alpha=0.001)  25.818467
13  Ridge(alpha=0.0001)  25.817878
14     LinearRegression  25.817813

lasso_100 = Lasso(alpha = 100)
lasso_100.fit(x_train, y_train)
lasso_pred_100 = lasso_100.predict(x_test)

lasso_001 = Lasso(alpha = 0.001)
lasso_001.fit(x_train, y_train)
lasso_pred_001 = lasso_001.predict(x_test)
plot_coef(x_train.columns, lasso_100.coef_)

lasso_100.coef_
array([-0.        ,  0.        , -0.        ,  0.        , -0.        ,
        0.        , -0.        ,  0.        ,  0.        , -0.02249985,
       -0.        ,  0.00144425, -0.        ])
plot_coef(x_train.columns, lasso_001.coef_)

lasso_001.coef_
array([ -0.12917321,   0.06504872,   0.00512375,   2.6515016 ,
       -18.61205761,   2.98149349,   0.00016926,  -1.71181198,
         0.31973993,  -0.01452793,  -0.8523832 ,   0.00733997,
        -0.53417737])
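
A quick check of the sparsity effect described earlier, assuming the lasso_100 and lasso_001 models fitted above are still in scope: the larger alpha should push far more coefficients to exactly zero.

import numpy as np

# Count the coefficients that L1 regularization has driven to exactly zero.
# From the arrays printed above: 11 of 13 for alpha=100, 0 of 13 for alpha=0.001.
for name, m in [('Lasso(alpha=100)', lasso_100), ('Lasso(alpha=0.001)', lasso_001)]:
    print(name, '->', np.sum(m.coef_ == 0), 'of', len(m.coef_), 'coefficients are exactly 0')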

ElasticNet

l1_ratio (default=0.5)

  • l1_ratio = 0: only the L2 penalty is used.

  • l1_ratio = 1: only the L1 penalty is used.

  • 0 < l1_ratio < 1: a mix of the L1 and L2 penalties is used (a numeric sketch of this mix follows below).
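
Below is a minimal numeric sketch of how l1_ratio mixes the two penalties, using hypothetical weights and settings; per the scikit-learn documentation, the L2 part of the ElasticNet objective additionally carries a factor of 1/2.

import numpy as np

w = np.array([0.5, -2.0, 0.0, 3.0])   # hypothetical weights
alpha, l1_ratio = 0.5, 0.2            # hypothetical settings

# ElasticNet penalty: a weighted mix of the L1 and L2 terms
penalty = alpha * (l1_ratio * np.sum(np.abs(w))
                   + 0.5 * (1 - l1_ratio) * np.sum(w ** 2))
penalty                               # 0.5 * (0.2*5.5 + 0.5*0.8*13.25) = 3.2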

from sklearn.linear_model import ElasticNet
ratios = [0.2, 0.5, 0.8]
for ratio in ratios:
    elasticnet = ElasticNet(alpha = 0.5, l1_ratio = ratio)
    elasticnet.fit(x_train, y_train)
    pred = elasticnet.predict(x_test)
    mse_eval('ElasticNet(l1_ratio={})'.format(ratio), pred, y_test)

                       model        mse
0           Lasso(alpha=100)  67.992572
1            Lasso(alpha=10)  38.306399
2           Lasso(alpha=0.1)  27.228867
3            Ridge(alpha=10)  26.775513
4   ElasticNet(l1_ratio=0.2)  26.630630
5           Ridge(alpha=100)  26.476653
6             Lasso(alpha=1)  26.318104
7             Ridge(alpha=1)  26.307971
8          Lasso(alpha=0.01)  25.984630
9           Ridge(alpha=0.1)  25.882311
10        Lasso(alpha=0.001)  25.831237
11         Ridge(alpha=0.01)  25.824345
12       Lasso(alpha=0.0001)  25.819123
13        Ridge(alpha=0.001)  25.818467
14       Ridge(alpha=0.0001)  25.817878
15          LinearRegression  25.817813

                       model        mse
0           Lasso(alpha=100)  67.992572
1            Lasso(alpha=10)  38.306399
2           Lasso(alpha=0.1)  27.228867
3            Ridge(alpha=10)  26.775513
4   ElasticNet(l1_ratio=0.2)  26.630630
5           Ridge(alpha=100)  26.476653
6   ElasticNet(l1_ratio=0.5)  26.473066
7             Lasso(alpha=1)  26.318104
8             Ridge(alpha=1)  26.307971
9          Lasso(alpha=0.01)  25.984630
10          Ridge(alpha=0.1)  25.882311
11        Lasso(alpha=0.001)  25.831237
12         Ridge(alpha=0.01)  25.824345
13       Lasso(alpha=0.0001)  25.819123
14        Ridge(alpha=0.001)  25.818467
15       Ridge(alpha=0.0001)  25.817878
16          LinearRegression  25.817813

                       model        mse
0           Lasso(alpha=100)  67.992572
1            Lasso(alpha=10)  38.306399
2           Lasso(alpha=0.1)  27.228867
3            Ridge(alpha=10)  26.775513
4   ElasticNet(l1_ratio=0.2)  26.630630
5           Ridge(alpha=100)  26.476653
6   ElasticNet(l1_ratio=0.5)  26.473066
7             Lasso(alpha=1)  26.318104
8             Ridge(alpha=1)  26.307971
9   ElasticNet(l1_ratio=0.8)  26.212880
10         Lasso(alpha=0.01)  25.984630
11          Ridge(alpha=0.1)  25.882311
12        Lasso(alpha=0.001)  25.831237
13         Ridge(alpha=0.01)  25.824345
14       Lasso(alpha=0.0001)  25.819123
15        Ridge(alpha=0.001)  25.818467
16       Ridge(alpha=0.0001)  25.817878
17          LinearRegression  25.817813

elsticnet_20 = ElasticNet(alpha = 5, l1_ratio = 0.2)
elsticnet_20.fit(x_train, y_train)
elasticnet_pred_20 = elsticnet_20.predict(x_test)

elsticnet_80 = ElasticNet(alpha = 5, l1_ratio = 0.8)
elsticnet_80.fit(x_train, y_train)
elasticnet_pred_80 = elsticnet_80.predict(x_test)
plot_coef(x_train.columns, elsticnet_20.coef_)

plot_coef(x_train.columns, elsticnet_80.coef_)

elsticnet_80.coef_
array([-0.        ,  0.03138366, -0.        ,  0.        ,  0.        ,
        0.        ,  0.0327152 , -0.        ,  0.01066803, -0.00894765,
       -0.00438333,  0.00478111, -0.74943405])
