Streamlit

OBJECTIVES

Save objects with the pickle module
Save sklearn models as .pkl files
Build and deploy basic Streamlit applications

import warnings
warnings.filterwarnings('ignore')

Serialization and `pickle`

One way to save Python objects to disk is to serialize them, that is, convert them to a byte stream. The pickle module does this, though other options exist. Note: pickle files are not secure. Loading one can execute arbitrary code, so do not load pickled files from sources you do not trust.

import pickle

v1 = [1, 2, 3]

with open('simple_list.pkl', 'wb') as f:
    pickle.dump(v1, f)

with open('simple_list.pkl', 'rb') as f:
    v2 = pickle.load(f)

v2

[1, 2, 3]
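To make the "byte stream" idea concrete, here is a minimal sketch that skips the file and serializes in memory with `pickle.dumps` and `pickle.loads`. The `record` dictionary is a made-up example object, not part of the lesson data.

```python
import pickle

# Any picklable Python object works; this dict is purely illustrative.
record = {"name": "simple_list", "values": [1, 2, 3]}

# dumps() returns the serialized byte stream instead of writing it to a file.
payload = pickle.dumps(record)

# loads() rebuilds an equal, but distinct, object from those bytes.
restored = pickle.loads(payload)

print(type(payload))       # <class 'bytes'>
print(restored == record)  # True
print(restored is record)  # False
```

Writing to a `.pkl` file with `pickle.dump` stores the same kind of bytes on disk.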

Example: Regression Model

Below we build and save a pipeline to share with our Streamlit app. A Pipeline bundles the preprocessing and the model into a single object, so the app can pass raw inputs straight to it, and a fitted pipeline can be pickled. Because of the security issues with pickle, consider an alternative such as skops if you need to load models from unknown sources.

from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer

houses = fetch_openml(data_id = 43926)
data = houses.frame
# data.head()
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)

Pipeline(steps=[('transformer',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('onehotencoder',
                                                  OneHotEncoder(),
                                                  ['Overall_Qual',
                                                   'Sale_Condition'])])),
                ('model', LinearRegression())])

X.head()

	Gr_Liv_Area	Overall_Qual	Sale_Condition	Lot_Area
0	1656	Above_Average	Normal	31770
1	896	Average	Normal	11622
2	1329	Above_Average	Normal	14267
3	2110	Good	Normal	11160
4	1629	Average	Normal	13830

with open('lr_model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)
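Before relying on a saved model, it can help to check that it survives the round trip. The sketch below does this on a tiny made-up frame rather than the housing data; the names `X_demo`, `y_demo`, `demo_pipeline`, and `demo_model.pkl` are illustrative assumptions, and the values are invented.

```python
import os
import pickle
import tempfile

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Tiny made-up stand-in for the housing data (values are illustrative only).
X_demo = pd.DataFrame({
    "Gr_Liv_Area": [1656, 896, 1329, 2110],
    "Overall_Qual": pd.Categorical(["Good", "Average", "Good", "Average"]),
    "Lot_Area": [31770, 11622, 14267, 11160],
})
y_demo = [215000, 105000, 172000, 244000]

transformer = make_column_transformer(
    (OneHotEncoder(), ["Overall_Qual"]),
    remainder="passthrough",
)
demo_pipeline = Pipeline([("transformer", transformer),
                          ("model", LinearRegression())])
demo_pipeline.fit(X_demo, y_demo)

# Write the fitted pipeline to a temporary .pkl file and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(demo_pipeline, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# The restored pipeline should predict the same value for a new raw row.
new_row = pd.DataFrame(
    {"Gr_Liv_Area": [1500], "Overall_Qual": ["Good"], "Lot_Area": [12000]}
)
same = np.allclose(restored.predict(new_row), demo_pipeline.predict(new_row))
print(same)
```

Note that the new row is passed with its raw string category; the restored pipeline applies the one-hot encoding itself.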

X['Overall_Qual'].unique()

['Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor']
Categories (10, object): ['Above_Average', 'Average', 'Below_Average', 'Excellent', ..., 'Poor', 'Very_Excellent', 'Very_Good', 'Very_Poor']

X['Sale_Condition'].unique()

['Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand']
Categories (6, object): ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']
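Each output above shows the same values in two orders: order of first appearance, then the sorted category index. As a small sketch on a made-up column (not the housing data), both can be pulled out as plain Python lists, which is the form a select box needs for its options.

```python
import pandas as pd

# Illustrative categorical column; the real one is X['Sale_Condition'].
sale_condition = pd.Series(
    ["Normal", "Partial", "Family", "Abnorml", "Normal"], dtype="category"
)

# unique() keeps the order in which values first appear.
options_seen = sale_condition.unique().tolist()

# .cat.categories holds the category index, sorted for string data.
options_sorted = sale_condition.cat.categories.tolist()

print(options_seen)    # ['Normal', 'Partial', 'Family', 'Abnorml']
print(options_sorted)  # ['Abnorml', 'Family', 'Normal', 'Partial']
```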

Moving the model to Streamlit

Now we will create a basic application using the Streamlit library; see the Streamlit docs for reference. First, we will create a virtual environment for the project. Over to VS Code.

import streamlit as st 
import pickle
import pandas as pd
### regression model
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer

houses = fetch_openml(data_id = 43926)
data = houses.frame
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)






st.header('Regression App')

gr_area = st.number_input('What is the above ground living area:')
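# st.slider defaults to a 0-100 range, so take the bounds from the training data
lot_area = st.slider('What is the total lot area:',
                     min_value = int(X['Lot_Area'].min()),
                     max_value = int(X['Lot_Area'].max()))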
over_qual = st.selectbox('What was the quality?', 
                         ('Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor'))
sale_cond = st.selectbox("Condition at sale?",
                         ('Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand'))
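# Alternative to refitting above: load the pipeline pickled in the notebook
# with open('lr_model.pkl', 'rb') as f:
#     pipeline = pickle.load(f)

# One-row frame with the same column names the pipeline was fit on
X = pd.DataFrame({'Gr_Liv_Area': gr_area,
                  'Overall_Qual': over_qual,
                  'Sale_Condition': sale_cond,
                  'Lot_Area': lot_area}, index = [0])

# Predict with the full pipeline so the categorical inputs are one-hot encoded first
pred = pipeline.predict(X)
st.write(pred)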
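The app passes `index = [0]` because pandas cannot build a DataFrame from a dict of bare scalars without knowing how many rows to make. A minimal sketch of an equivalent approach wraps each scalar in a one-element list instead. `make_input_row` and `FEATURES` are hypothetical names introduced here for illustration, and the sample values are made up.

```python
import pandas as pd

# Column names the pipeline was fit on, in the same order.
FEATURES = ["Gr_Liv_Area", "Overall_Qual", "Sale_Condition", "Lot_Area"]


def make_input_row(gr_area, over_qual, sale_cond, lot_area):
    """Hypothetical helper: pack one set of widget values into a one-row frame."""
    values = [gr_area, over_qual, sale_cond, lot_area]
    # Wrapping each scalar in a list gives pandas a length-1 column,
    # which plays the same role as passing index=[0].
    return pd.DataFrame({name: [value] for name, value in zip(FEATURES, values)})


row = make_input_row(1500.0, "Good", "Normal", 12000)
print(row.shape)  # (1, 4)
```

Keeping the column names identical to the training frame matters because the column transformer selects its categorical columns by name.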