Streamlit#
OBJECTIVES
- Save objects with the pickle module
- Save sklearn models as .pkl files
- Build and deploy basic Streamlit applications
import warnings
warnings.filterwarnings('ignore')
Serialization and pickle#
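One way to write Python objects to disk is to serialize them, that is, to convert them to a byte stream that can later be read back into an equivalent object. The standard library's pickle module does this, though other options exist. Note: pickle files are not secure. Loading one can execute arbitrary code, so never unpickle a file from a source you do not trust.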
import pickle
v1 = [1, 2, 3]
with open('simple_list.pkl', 'wb') as f:
    pickle.dump(v1, f)
with open('simple_list.pkl', 'rb') as f:
    v2 = pickle.load(f)
v2
[1, 2, 3]
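To make the security note concrete, here is a minimal sketch of why loading an untrusted pickle is risky. The class name Payload is hypothetical and exists only for illustration; it uses the standard `__reduce__` hook to tell pickle which callable to invoke at load time. A harmless `print` stands in for what an attacker could replace with any other callable.

```python
import pickle


class Payload:
    """Hypothetical class for illustration only."""

    def __reduce__(self):
        # pickle stores this callable and its arguments in the byte stream,
        # then calls print(...) when the stream is loaded.
        return (print, ("this ran during pickle.loads",))


blob = pickle.dumps(Payload())

# Loading runs the stored call. No Payload object is rebuilt;
# the result is whatever the callable returned (None for print).
result = pickle.loads(blob)
print(result)
```

The point of the sketch is that `pickle.loads` executed a function chosen by whoever produced the bytes, which is why pickle files from unknown sources should not be opened.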
Example: Regression Model#
Below we build and save a pipeline to share with our Streamlit app. A Pipeline bundles the preprocessing and the model into one object, so the app can pass raw inputs straight to predict, and the whole fitted object can be pickled. Because of pickle's security issues, consider an alternative such as skops if you need to load models from sources you do not control.
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
houses = fetch_openml(data_id = 43926)
data = houses.frame
# data.head()
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)
Pipeline(steps=[('transformer',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('onehotencoder',
                                                  OneHotEncoder(),
                                                  ['Overall_Qual',
                                                   'Sale_Condition'])])),
                ('model', LinearRegression())])
X.head()
| | Gr_Liv_Area | Overall_Qual | Sale_Condition | Lot_Area |
|---|---|---|---|---|
| 0 | 1656 | Above_Average | Normal | 31770 |
| 1 | 896 | Average | Normal | 11622 |
| 2 | 1329 | Above_Average | Normal | 14267 |
| 3 | 2110 | Good | Normal | 11160 |
| 4 | 1629 | Average | Normal | 13830 |
with open('lr_model.pkl', 'wb') as f:
    pickle.dump(pipeline, f)
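Before handing a pickled pipeline to another program, it can be worth checking that the restored object predicts the same values as the original. The sketch below assumes a tiny made-up data set (the names toy_X, toy_y, and toy_pipeline and all values are illustrative, not the housing data) and round-trips through bytes in memory rather than a file.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for the housing data; values are illustrative only.
toy_X = pd.DataFrame({
    'area': [800.0, 1200.0, 1500.0, 2000.0, 950.0, 1750.0],
    'quality': ['Average', 'Good', 'Good', 'Excellent', 'Average', 'Excellent'],
})
toy_y = np.array([100.0, 150.0, 180.0, 260.0, 115.0, 230.0])

toy_transformer = make_column_transformer((OneHotEncoder(), ['quality']),
                                          remainder='passthrough')
toy_pipeline = Pipeline([('transformer', toy_transformer),
                         ('model', LinearRegression())])
toy_pipeline.fit(toy_X, toy_y)

# Serialize to bytes and back, then compare predictions from both objects.
restored = pickle.loads(pickle.dumps(toy_pipeline))
same = np.allclose(toy_pipeline.predict(toy_X), restored.predict(toy_X))
print(same)
```

The same comparison works with a file on disk by swapping `pickle.dumps`/`pickle.loads` for `pickle.dump`/`pickle.load` inside `with open(...)` blocks.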
X['Overall_Qual'].unique()
['Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor']
Categories (10, object): ['Above_Average', 'Average', 'Below_Average', 'Excellent', ..., 'Poor', 'Very_Excellent', 'Very_Good', 'Very_Poor']
X['Sale_Condition'].unique()
['Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand']
Categories (6, object): ['Abnorml', 'AdjLand', 'Alloca', 'Family', 'Normal', 'Partial']
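The category values printed above are what the app's dropdowns need to offer. As an alternative to copying them by hand, they can be read off the fitted encoder inside the pipeline. This is a minimal sketch on made-up data (toy_X, toy_y, and toy_pipeline are illustrative names); the step name 'onehotencoder' matches the name shown in the pipeline output earlier.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up stand-in for the housing frame; values are illustrative only.
toy_X = pd.DataFrame({
    'quality': ['Good', 'Average', 'Excellent', 'Good'],
    'area': [1500.0, 900.0, 2100.0, 1300.0],
})
toy_y = [180.0, 110.0, 270.0, 160.0]

toy_pipeline = Pipeline([
    ('transformer', make_column_transformer((OneHotEncoder(), ['quality']),
                                            remainder='passthrough')),
    ('model', LinearRegression()),
])
toy_pipeline.fit(toy_X, toy_y)

# make_column_transformer names each transformer after its lowercased class name.
encoder = toy_pipeline.named_steps['transformer'].named_transformers_['onehotencoder']

# categories_ holds one array per encoded column, in the order the columns were given.
quality_options = encoder.categories_[0].tolist()
print(quality_options)
```

The encoder stores its categories in sorted order. Since OneHotEncoder raises an error on unseen categories by default, limiting the user's choices to these values keeps predict from failing on an unknown label.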
Moving the model to Streamlit#
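Now we will create a basic application using the Streamlit library (see the Streamlit docs). To do so, we will first create a virtual environment for the project and install streamlit, pandas, and scikit-learn in it. Over to VSCode: the script below lives in its own .py file and is launched from the terminal with streamlit run followed by that file's name.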
import streamlit as st
import pickle
import pandas as pd
### regression model
from sklearn.datasets import fetch_openml
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
houses = fetch_openml(data_id = 43926)
data = houses.frame
X = data[['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']]
y = data['Sale_Price']
transformer = make_column_transformer((OneHotEncoder(), X.select_dtypes('category').columns.tolist()),
                                      remainder = 'passthrough')
model = LinearRegression()
pipeline = Pipeline([('transformer', transformer), ('model', model)])
pipeline.fit(X, y)
st.header('Regression App')
gr_area = st.number_input('What is the above ground living area:')
# st.slider defaults to a 0-100 range, so take the bounds from the training data instead
lot_area = st.slider('What is the total lot area:',
                     int(X['Lot_Area'].min()), int(X['Lot_Area'].max()))
over_qual = st.selectbox('What was the quality?',
                         ('Above_Average', 'Average', 'Good', 'Very_Good', 'Excellent', 'Below_Average', 'Fair', 'Poor', 'Very_Excellent', 'Very_Poor'))
sale_cond = st.selectbox("Condition at sale?",
                         ('Normal', 'Partial', 'Family', 'Abnorml', 'Alloca', 'AdjLand'))
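# Optionally bring in the saved pipeline instead of refitting above:
# with open('lr_model.pkl', 'rb') as f:
#     pipeline = pickle.load(f)
X = pd.DataFrame({'Gr_Liv_Area': gr_area,
                  'Overall_Qual': over_qual,
                  'Sale_Condition': sale_cond,
                  'Lot_Area': lot_area}, index = [0])
# Predict with the full pipeline: the bare LinearRegression in `model` expects
# already-encoded features and cannot take the raw categorical columns
pred = pipeline.predict(X)
st.write(pred)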
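The step that turns widget values into a one-row DataFrame can be pulled into a small function and checked outside Streamlit. This is a sketch only: build_input_frame and FEATURE_COLUMNS are hypothetical names, and the example values come from the first row of X.head() shown earlier.

```python
import pandas as pd

# Column order the pipeline was fitted on (taken from the notebook above).
FEATURE_COLUMNS = ['Gr_Liv_Area', 'Overall_Qual', 'Sale_Condition', 'Lot_Area']


def build_input_frame(gr_area, over_qual, sale_cond, lot_area):
    """Hypothetical helper: turn the four widget values into a one-row DataFrame."""
    row = {
        'Gr_Liv_Area': gr_area,
        'Overall_Qual': over_qual,
        'Sale_Condition': sale_cond,
        'Lot_Area': lot_area,
    }
    # Selecting by FEATURE_COLUMNS pins the column order regardless of dict order.
    return pd.DataFrame(row, index=[0])[FEATURE_COLUMNS]


example = build_input_frame(1656.0, 'Above_Average', 'Normal', 31770)
print(example.shape)
print(list(example.columns))
```

In the app, the result of this function would be passed to the pipeline's predict method in place of the inline DataFrame construction.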