Evaluating Classification Models#
OBJECTIVES
- Use the confusion matrix to evaluate classification models
- Explore precision and recall as evaluation metrics
- Determine the cost of predicting the highest-probability targets
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler, OneHotEncoder, PolynomialFeatures
from sklearn.compose import make_column_transformer
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix
from sklearn.datasets import load_breast_cancer, load_digits, fetch_openml
Evaluating Classifiers#
Today, we want to think a bit more about the appropriate classification metrics in different situations. Please use this form to summarize your work.
Problem#
Below, a dataset with measurements of cancerous and non-cancerous breast tumors is loaded and displayed. Use LogisticRegression and KNeighborsClassifier to build predictive models on a train/test split. Generate a confusion matrix for each and explore the classifiers' mistakes.
Which model do you prefer and why?
Do you care about predicting each of these classes equally?
Is there a ratio other than accuracy you think is more important based on the confusion matrix?
cancer = load_breast_cancer(as_frame=True).frame
cancer.head()
| | mean radius | mean texture | mean perimeter | mean area | mean smoothness | mean compactness | mean concavity | mean concave points | mean symmetry | mean fractal dimension | ... | worst texture | worst perimeter | worst area | worst smoothness | worst compactness | worst concavity | worst concave points | worst symmetry | worst fractal dimension | target |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | 0 |
| 1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | 0 |
| 2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | 0 |
| 3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | 0 |
| 4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | 0 |
5 rows × 31 columns
# changing target label
#cancer['target'] = np.where(cancer['target'] == 0, 1, 0)
from sklearn.model_selection import train_test_split, cross_val_score
X = cancer.iloc[:, :-1]
y = cancer['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 11)
lgr = LogisticRegression()
knn = KNeighborsClassifier(n_neighbors=30)
scaler = StandardScaler()
from sklearn.pipeline import Pipeline
lgr_pipe = Pipeline([('scale', scaler), ('model', lgr)])
knn_pipe = Pipeline([('scale', scaler), ('model', knn)])
lgr_pipe.fit(X_train, y_train)
knn_pipe.fit(X_train, y_train)
Pipeline(steps=[('scale', StandardScaler()),
                ('model', KNeighborsClassifier(n_neighbors=30))])
#plot confusion matrices
Problem#
Below, a dataset around customer churn is loaded and displayed. Build classification models on the data and visualize the confusion matrix.
Suppose you want to offer an incentive to customers you think are likely to churn. What is an appropriate evaluation metric?
Suppose you only have the budget to target 100 individuals you expect to churn. By targeting the 100 customers with the highest predicted probability of churning, what percent of the churned customers did you capture?
churn = fetch_openml(data_id = 43390).frame
churn.head()
| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
X = churn.iloc[:, :-1]
y = churn['Exited']
X = X.drop(['Surname', 'RowNumber', 'CustomerId'], axis = 1)  # avoid inplace drop on a slice of churn
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 11)
encoder = make_column_transformer((OneHotEncoder(drop = 'first'), ['Geography', 'Gender']),
remainder = StandardScaler())
knn_pipe = Pipeline([('transform', encoder), ('model', KNeighborsClassifier())])
lgr_pipe = Pipeline([('transform', encoder), ('model', LogisticRegression())])
knn_pipe.fit(X_train, y_train)
lgr_pipe.fit(X_train, y_train)
Pipeline(steps=[('transform',
                 ColumnTransformer(remainder=StandardScaler(),
                                   transformers=[('onehotencoder',
                                                  OneHotEncoder(drop='first'),
                                                  ['Geography', 'Gender'])])),
                ('model', LogisticRegression())])
#plot confusion matrices
Predicting Positives#
Return to the churn example and fit a LogisticRegression model on the data.
If you were to make predictions on a random 30% of the data, what percent of the true positives would you expect to capture?
Use the predict probability capabilities of the estimator to create a DataFrame with the following columns:

| probability of prediction = 1 | true label |
|---|---|
| .8 | 1 |
| .7 | 1 |
| .4 | 0 |
Sort the probabilities from largest to smallest. What percentage of the positives are in the first 3000 rows?
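The mechanics of this problem can be sketched as follows. The breast cancer data is used here only because it ships with sklearn and needs no download; for the churn question, substitute the fitted churn pipeline, its `X_test`/`y_test`, and `head(3000)`.

```python
# Sketch: build a (probability, true label) DataFrame from predict_proba,
# sort it, and measure what share of positives land in the top-k rows.
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

cancer = load_breast_cancer(as_frame=True).frame
X, y = cancer.iloc[:, :-1], cancer['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=11)
pipe = Pipeline([('scale', StandardScaler()),
                 ('model', LogisticRegression())]).fit(X_train, y_train)

# column 1 of predict_proba is P(prediction = 1)
probs = pd.DataFrame({'probability of prediction = 1': pipe.predict_proba(X_test)[:, 1],
                      'true label': y_test.values})
probs = probs.sort_values('probability of prediction = 1', ascending=False)

# fraction of all positives captured in the k highest-probability rows
k = 50
captured = probs['true label'].head(k).sum() / probs['true label'].sum()
print(f'{captured:.1%} of the positives are in the top {k} rows')
```

Compare `captured` against the baseline of targeting a random `k / len(probs)` fraction of the data: a random 30% slice is expected to contain about 30% of the true positives, so a useful model should beat that.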
scikit-learn visualizers#
PrecisionRecallDisplay
RocCurveDisplay

From the scikit-plot docs:

plot_cumulative_gain
from sklearn.metrics import PrecisionRecallDisplay, RocCurveDisplay
import scikitplot as skplot
# cumulative_gain_curve requires the true labels and predicted scores
percentages, gains = skplot.metrics.cumulative_gain_curve(y_test, lgr_pipe.predict_proba(X_test)[:, 1])
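The underlying computation is simple enough to sketch directly. The helper below is a hedged re-implementation of the cumulative gain idea, not scikit-plot's own code: for each cutoff k, it reports the fraction of all positives found among the k highest-scored rows.

```python
# Sketch: cumulative gain computed from scratch with numpy.
import numpy as np

def cumulative_gain(y_true, y_score):
    """Fraction of all positives captured in the top-k scored rows, for each k."""
    y_true = np.asarray(y_true)
    order = np.argsort(y_score)[::-1]                    # highest score first
    gains = np.cumsum(y_true[order]) / y_true.sum()      # positives captured so far
    percentages = np.arange(1, len(y_true) + 1) / len(y_true)
    return percentages, gains

# tiny example: the top 2 scores are both positives, so 2 of 3 are captured there
y_true = [1, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3]
pct, gains = cumulative_gain(y_true, y_score)
```

Plotting `gains` against `pct` gives the cumulative gain curve: a diagonal line corresponds to random targeting, and the gap above it is the lift from ranking customers by predicted probability, which is exactly the budget question from the churn problem.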