Streamlit#

OBJECTIVES

  • Save objects with the pickle module

  • Save sklearn models as .pkl files

  • Build and deploy basic Streamlit applications

import warnings
warnings.filterwarnings('ignore')

Serialization and pickle#

One way to save Python objects to disk is to serialize them, that is, convert them to a byte stream. The pickle module does this, though other options exist. Note: pickle files are not secure. Unpickling can execute arbitrary code, so never load a pickle file from a source you do not trust.

import pickle
v1 = [1, 2, 3]
with open('simple_list.pkl', 'wb') as f:
    pickle.dump(v1, f)
with open('simple_list.pkl', 'rb') as f:
    v2 = pickle.load(f)
v2
[1, 2, 3]
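To make the security note concrete, here is a minimal sketch of why loading an untrusted pickle is risky. The class name `Sneaky` is hypothetical and the payload is deliberately harmless (it only calls `print`), but the same mechanism lets a crafted file call any importable function during `pickle.load`.

```python
import pickle


class Sneaky:
    """Hypothetical class used only to illustrate the risk."""

    def __reduce__(self):
        # pickle stores "call print(...) with this argument" and
        # executes that call when the bytes are loaded.
        return (print, ("this ran during unpickling",))


payload = pickle.dumps(Sneaky())

# Loading runs the stored call; the result is whatever that call returns
# (print returns None), not a Sneaky instance.
result = pickle.loads(payload)
```

Nothing in `payload` looks dangerous from the outside, which is why pickle files should only be loaded when you know who created them.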

Example: Regression Model#

Below we build and save a pipeline to share with our Streamlit app. A Pipeline object bundles the preprocessing and the model, so the app can pass raw inputs straight to predict, and the whole pipeline can be pickled. Because of the security issues with pickle, consider an alternative such as skops if you need to load models from unknown sources.

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer

houses = fetch_openml(data_id = 43926)
data = houses.frame
# data.head()
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)
Pipeline(steps=[('transformer',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('onehotencoder',
                                                  OneHotEncoder(),
                                                  ['Overall_Qual',
                                                   'Sale_Condition'])])),
                ('model', LinearRegression())])
X.head()
   Gr_Liv_Area   Overall_Qual  Sale_Condition  Lot_Area
0         1656  Above_Average          Normal     31770
1          896        Average          Normal     11622
2         1329  Above_Average          Normal     14267
3         2110           Good          Normal     11160
4         1629        Average          Normal     13830
with open('lr_model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)
X['Overall_Qual'].unique()
['Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor']
Categories (10, object): ['Above_Average', 'Average', 'Below_Average', 'Excellent', ..., 'Poor', 'Very_Excellent', 'Very_Good', 'Very_Poor']
X['Sale_Condition'].unique()
['Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand']
Categories (6, object): ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']
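Before moving to the app, it can help to check that a pickled pipeline gives the same predictions after it is loaded back. The sketch below does this on a tiny made-up DataFrame instead of the downloaded housing data, so it runs offline. The toy values, the names `toy`, `toy_price`, and `toy_pipeline`, and the temporary file are all assumptions for illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny illustrative stand-in for the housing data (values are made up).
toy = pd.DataFrame({
    'Gr_Liv_Area': [1656, 896, 1329, 2110, 1629, 1200],
    'Overall_Qual': ['Above_Average', 'Average', 'Above_Average',
                     'Good', 'Average', 'Good'],
    'Sale_Condition': ['Normal', 'Normal', 'Partial',
                       'Normal', 'Partial', 'Normal'],
    'Lot_Area': [31770, 11622, 14267, 11160, 13830, 9000],
})
toy_price = [215000, 105000, 172000, 244000, 189900, 150000]

# Same pipeline shape as above: one-hot encode categoricals, pass the rest through.
cat_cols = ['Overall_Qual', 'Sale_Condition']
transformer = make_column_transformer((OneHotEncoder(), cat_cols),
                                      remainder='passthrough')
toy_pipeline = Pipeline([('transformer', transformer),
                         ('model', LinearRegression())])
toy_pipeline.fit(toy, toy_price)

# One new row with the same column names, like the app will build.
new_house = pd.DataFrame({'Gr_Liv_Area': 1500,
                          'Overall_Qual': 'Good',
                          'Sale_Condition': 'Normal',
                          'Lot_Area': 10000}, index=[0])

# Save to a temporary file, load it back, and compare predictions.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy_model.pkl')
    with open(path, 'wb') as f:
        pickle.dump(toy_pipeline, f)
    with open(path, 'rb') as f:
        loaded = pickle.load(f)

same = np.allclose(loaded.predict(new_house), toy_pipeline.predict(new_house))
```

Because the loaded object is the whole pipeline, it accepts raw category strings such as 'Good' and applies the encoding itself.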

Moving the model to Streamlit#

Now we will create a basic application using the Streamlit library. To do so, we first create a virtual environment for the project, then write the app script below in an editor such as VS Code.

import streamlit as st 
import pickle
import pandas as pd
### regression model
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer

houses = fetch_openml(data_id = 43926)
data = houses.frame
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)

st.header('Regression App')

gr_area = st.number_input('What is the above ground living area:')
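# st.slider defaults to a 0-100 range, which is too small for lot areas;
# the bounds below are illustrative, so adjust them to your data
lot_area = st.slider('What is the total lot area:', min_value=0, max_value=50000, value=10000)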
over_qual = st.selectbox('What was the quality?', 
                         ('Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor'))
sale_cond = st.selectbox("Condition at sale?",
                         ('Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand'))
#bring in our model
# with open('lr_model.pkl', 'rb') as f:
#     model = pickle.load(f)
    
X = pd.DataFrame({'Gr_Liv_Area': gr_area,
                  'Overall_Qual': over_qual,
                  'Sale_Condition': sale_cond,
                  'Lot_Area': lot_area}, index = [0])

# Predict with the full pipeline so the categorical inputs are one-hot encoded first
pred = pipeline.predict(X)
st.write(pred)
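The script above refits the model every time Streamlit reruns it, while the pickle loading step is commented out. One possible alternative, sketched here under assumptions, is to load the saved pipeline and build the input row with two small helpers. The names `FEATURES`, `load_pipeline`, and `build_input_row` are hypothetical, and the default path assumes `lr_model.pkl` sits next to the app script.

```python
import pickle

import pandas as pd

# Column names and order used when the pipeline was fitted.
FEATURES = ['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']


def load_pipeline(path='lr_model.pkl'):
    """Load the pickled pipeline. Only use files you created yourself."""
    with open(path, 'rb') as f:
        return pickle.load(f)


def build_input_row(gr_area, over_qual, sale_cond, lot_area):
    """Return a one-row DataFrame whose columns match the training data."""
    row = {'Gr_Liv_Area': gr_area,
           'Overall_Qual': over_qual,
           'Sale_Condition': sale_cond,
           'Lot_Area': lot_area}
    return pd.DataFrame([row], columns=FEATURES)


# Example with hand-picked values in place of the widget inputs.
row = build_input_row(1500.0, 'Good', 'Normal', 10000)
```

In the app, the widget values would be passed to `build_input_row`, and the prediction would come from `load_pipeline().predict(...)`. Decorating `load_pipeline` with Streamlit's `st.cache_resource` (available in recent Streamlit versions) should keep the file from being reloaded on every rerun.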