Plotting with bokeh

Similar to plotly, the bokeh library offers interactive plots that come with basic tools like zoom, pan, and save. Personally, I find bokeh nicer to work with than plotly, and its documentation is very good. One key difference is that bokeh lets you create a plot object with figure(), assign it to a variable, modify it step by step, and display it whenever you like with the show function.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

To begin, a basic plot is created with the figure function, and the plot is displayed with the show function. To display plots in a notebook environment, call the output_notebook function at the start of the notebook.

# !pip install bokeh
from bokeh.plotting import figure
from bokeh.io import output_notebook, show
output_notebook()
# Simulate a noisy linear relationship: y = 3x + 4 + epsilon
x = np.random.normal(loc = 3, scale = 2, size = 100)
epsilon = np.random.normal(loc = 4, scale = 3, size = 100)

# Create the figure, add a scatter glyph, and display it
p = figure(width = 600, height = 300)
p.scatter(x, 3*x + 4 + epsilon)
show(p)

# Same plot with a title and styled markers
p = figure(width = 600, height = 300, title = 'Linear function with gaussian noise')
p.scatter(x, 3*x + 4 + epsilon,  color = 'red', size = 10, alpha = 0.5)

show(p)
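
Because the figure is an ordinary Python object, it can be built up in several steps before anything is displayed. The sketch below is one way to illustrate this: it adds a second glyph (the noise-free trend line), axis labels, and a legend to the same figure. The random seed, the legend labels, and the trend line are illustrative additions, not part of the example above.

```python
import numpy as np
from bokeh.plotting import figure

# Reproducible synthetic data (the seed is an illustrative choice)
rng = np.random.default_rng(42)
x = rng.normal(loc=3, scale=2, size=100)
epsilon = rng.normal(loc=4, scale=3, size=100)
y = 3 * x + 4 + epsilon

# Create the figure once and keep a handle on it
p = figure(width=600, height=300, title="Linear function with gaussian noise")

# Each glyph method adds another renderer to the same figure
p.scatter(x, y, color="red", size=10, alpha=0.5, legend_label="observations")

# The noise has mean 4, so the expected value of y is 3x + 4 + 4 = 3x + 8
xs = np.linspace(x.min(), x.max(), 50)
p.line(xs, 3 * xs + 8, color="navy", line_width=2, legend_label="3x + 8")

# Properties can be changed at any point before the figure is shown
p.xaxis.axis_label = "x"
p.yaxis.axis_label = "y"
p.legend.location = "top_left"

# show(p)  # uncomment to render (needs `from bokeh.io import show`)
```

Nothing is drawn until show is called, so the same p can be styled further or placed in a layout first.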

Hovering

Hover interactivity is added by passing a tooltips argument to figure and connecting the glyph to a data source with ColumnDataSource, which is also the easiest way to plot from DataFrames.

from bokeh.models import ColumnDataSource
from bokeh.layouts import gridplot
source = ColumnDataSource(data = dict(
    x = x,
    y = 3*x + 4 + epsilon))
TOOLTIPS = [ ("(x, y)", "($x, $y)")]


p = figure(width = 600, height = 300, title = 'Linear function with gaussian noise',
           tooltips = TOOLTIPS)
p.scatter('x', 'y', color = 'red', size = 10, alpha = 0.5, source = source)

show(p)
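
In a tooltip, fields that start with `$` are special fields such as the cursor position, while fields that start with `@` read values from a column of the data source. The sketch below puts both side by side so the difference shows up when hovering: `$x` and `$y` follow the mouse, whereas `@x` and `@y` report the hovered point itself. The seed and the `{0.00}` number formats are illustrative choices.

```python
import numpy as np
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

# Synthetic data in the same spirit as the example above (seed is arbitrary)
rng = np.random.default_rng(0)
x = rng.normal(loc=3, scale=2, size=100)
y = 3 * x + 4 + rng.normal(loc=4, scale=3, size=100)
source = ColumnDataSource(data=dict(x=x, y=y))

TOOLTIPS = [
    ("index", "$index"),                       # row number of the hovered point
    ("cursor (x, y)", "($x, $y)"),             # mouse position in data space
    ("point (x, y)", "(@x{0.00}, @y{0.00})"),  # values stored in the source
]

p = figure(width=600, height=300, tooltips=TOOLTIPS)
p.scatter("x", "y", color="red", size=10, alpha=0.5, source=source)

# Passing tooltips to figure() attaches a HoverTool behind the scenes
hover_tools = [t for t in p.tools if isinstance(t, HoverTool)]

# show(p)  # uncomment to render (needs `from bokeh.io import show`)
```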

Below, a similar example using a DataFrame as the source is shown.

from bokeh.models import ColumnDataSource
from bokeh.layouts import gridplot


df = pd.read_csv('https://raw.githubusercontent.com/jfkoehler/nyu_bootcamp_fa24/refs/heads/main/data/2017.csv')
df.rename({'Happiness.Score': 'Happiness_Score',
           'Economy..GDP.per.Capita.': 'GDP'}, axis = 'columns', inplace = True)
source = ColumnDataSource(df)
tooltips = [('Country', '@Country'),
            ('GDP', '@GDP'),
            ('Happiness Score', '@Happiness_Score')]
p1 = figure(title="Happiness and Economy", width = 600, height = 300, tooltips = tooltips)
p1.scatter("Happiness_Score", "GDP", color="firebrick", size = 10,
          alpha = 0.5,
          source=source)
p1.xaxis[0].axis_label = "Happiness Score"
p1.yaxis[0].axis_label = "GDP per Capita"

show(p1)
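
To see what ColumnDataSource does with a DataFrame without downloading the CSV, the sketch below uses a three-row stand-in. The country names and numbers are made up purely for illustration. It shows that every DataFrame column becomes a named column of the source, which is what the `@Column` references in the tooltips resolve against.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Toy stand-in for the happiness data; all values are invented for illustration
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Happiness.Score": [7.5, 6.1, 4.3],
    "Economy..GDP.per.Capita.": [1.62, 1.21, 0.74],
})
df = df.rename(columns={"Happiness.Score": "Happiness_Score",
                        "Economy..GDP.per.Capita.": "GDP"})

# Each DataFrame column (plus the index) becomes a column of the source
source = ColumnDataSource(df)
print(source.column_names)

# The {0.00} suffix formats the number shown in the tooltip
tooltips = [("Country", "@Country"),
            ("GDP", "@GDP{0.00}"),
            ("Happiness Score", "@Happiness_Score{0.0}")]

p1 = figure(title="Happiness and Economy", width=600, height=300, tooltips=tooltips)
p1.scatter("Happiness_Score", "GDP", color="firebrick", size=10, alpha=0.5,
           source=source)

# show(p1)  # uncomment to render (needs `from bokeh.io import show`)
```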

More Complex Plotting

The example below is from the bokeh documentation on the RangeTool and shows how to create subplots with the column function. Your job is to annotate each line of the code and explain what it is doing. Then, use the example to write a function that takes in any DataFrame

# pip install bokeh_sampledata
from bokeh.sampledata.stocks import GOOG
goog = pd.DataFrame(GOOG)
goog.head()
         date    open    high     low   close    volume  adj_close
0  2004-08-19  100.00  104.06   95.96  100.34  22351900     100.34
1  2004-08-20  101.01  109.08  100.50  108.31  11428600     108.31
2  2004-08-23  110.75  113.48  109.05  109.40   9137200     109.40
3  2004-08-24  111.24  111.60  103.57  104.87   7631300     104.87
4  2004-08-25  104.96  108.00  103.88  106.00   4598900     106.00
from bokeh.layouts import column
from bokeh.models import RangeTool
dates = np.array(goog['date'], dtype=np.datetime64)
source = ColumnDataSource(data=dict(date=dates, close=goog['adj_close']))

p = figure(height=300, width=800, tools="xpan", toolbar_location=None,
           x_axis_type="datetime", x_axis_location="above",
           background_fill_color="#efefef", x_range=(dates[500], dates[1000]))

p.line('date', 'close', source=source)
p.yaxis.axis_label = 'Price'

select = figure(title="Drag the middle and edges of the selection box to change the range above",
                height=130, width=800, y_range=p.y_range,
                x_axis_type="datetime", y_axis_type=None,
                tools="", toolbar_location=None, background_fill_color="#efefef")

range_tool = RangeTool(x_range=p.x_range)
range_tool.overlay.fill_color = "navy"
range_tool.overlay.fill_alpha = 0.2

select.line('date', 'close', source=source)
select.ygrid.grid_line_color = None
select.add_tools(range_tool)

show(column(p, select))
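
One possible starting point for the function part of the exercise is sketched below. The name `range_tool_plot`, its parameters, and the default of showing the first quarter of the series are all hypothetical choices, and the sketch assumes the x column holds dates and the DataFrame has at least two rows.

```python
import numpy as np
import pandas as pd
from bokeh.layouts import column
from bokeh.models import ColumnDataSource, RangeTool
from bokeh.plotting import figure


def range_tool_plot(df, x_col, y_col, start=None, end=None):
    """Line plot of df[y_col] over df[x_col] with a range selector underneath.

    Hypothetical helper, not part of bokeh. Assumes x_col holds dates.
    """
    dates = pd.to_datetime(df[x_col]).to_numpy()
    source = ColumnDataSource(data={"x": dates, "y": df[y_col].to_numpy()})

    # Initial window of the main plot: the first quarter unless told otherwise
    start = dates[0] if start is None else pd.Timestamp(start).to_datetime64()
    end = (dates[max(1, len(dates) // 4)] if end is None
           else pd.Timestamp(end).to_datetime64())

    # Main plot: shows only the window between start and end
    p = figure(height=300, width=800, tools="xpan", toolbar_location=None,
               x_axis_type="datetime", x_axis_location="above",
               x_range=(start, end))
    p.line("x", "y", source=source)
    p.yaxis.axis_label = y_col

    # Overview plot: shows the full series and shares the y-range
    select = figure(height=130, width=800, y_range=p.y_range,
                    x_axis_type="datetime", y_axis_type=None,
                    tools="", toolbar_location=None)
    select.line("x", "y", source=source)
    select.ygrid.grid_line_color = None

    # The RangeTool drives the main plot's x_range from the overview plot
    range_tool = RangeTool(x_range=p.x_range)
    range_tool.overlay.fill_color = "navy"
    range_tool.overlay.fill_alpha = 0.2
    select.add_tools(range_tool)

    return column(p, select)


# Usage with synthetic data
demo = pd.DataFrame({"date": pd.date_range("2020-01-01", periods=200, freq="D"),
                     "value": np.linspace(0, 10, 200)})
layout = range_tool_plot(demo, "date", "value")
# show(layout)  # uncomment to render (needs `from bokeh.io import show`)
```

Returning the layout instead of calling show inside the function leaves the caller free to display it, save it, or embed it in a larger layout.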

Candlestick Charts

Candlestick charts are a popular tool in financial visualization. Each candle summarizes a stock's open, high, low, and close prices over one unit of time. Below, a candlestick chart is created for Microsoft stock data. Your task is to add commentary for each line of code, explaining what it does. Then, modularize the code into a function that uses the Alpha Vantage API: the function takes in a ticker symbol and optional start and stop dates and returns a candlestick chart for that ticker.

import pandas as pd

from bokeh.models import BoxAnnotation
from bokeh.plotting import figure, show
from bokeh.sampledata.stocks import MSFT

df = pd.DataFrame(MSFT)[60:120]
df["date"] = pd.to_datetime(df["date"])

inc = df.close > df.open
dec = df.open > df.close

non_working_days = df[['date']].assign(diff=df['date'].diff()-pd.Timedelta('1D'))
non_working_days = non_working_days[non_working_days['diff']>=pd.Timedelta('1D')]

df['date'] += pd.Timedelta('12h') # move candles to the center of the day

TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

p = figure(x_axis_type="datetime", tools=TOOLS, width=1000, height=400,
           title="MSFT Candlestick", background_fill_color="#efefef")
p.xaxis.major_label_orientation = 0.8 # radians

boxes = [
    BoxAnnotation(fill_color="#bbbbbb", fill_alpha=0.2, left=date-diff, right=date)
    for date, diff in non_working_days.values
]
p.renderers.extend(boxes)

p.segment(df.date, df.high, df.date, df.low, color="black")

p.vbar(df.date[dec], pd.Timedelta('16h'), df.open[dec], df.close[dec], color="#eb3c40")
p.vbar(df.date[inc], pd.Timedelta('16h'), df.open[inc], df.close[inc], fill_color="white",
       line_color="#49a3a3", line_width=2)

show(p)
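
As a hedged starting point for the modularization task, the sketch below splits the work into two hypothetical helpers: `daily_json_to_frame` reshapes an already-downloaded daily payload into an OHLC DataFrame, and `candlestick` draws the chart. It assumes the payload follows the layout of Alpha Vantage's daily time series response (a "Time Series (Daily)" key mapping dates to fields such as "1. open"); check this against the API documentation before relying on it. The HTTP request and API key handling are left out, the sample values are made up, and the bar width is given in milliseconds instead of a Timedelta.

```python
import pandas as pd
from bokeh.plotting import figure


def daily_json_to_frame(payload, start=None, stop=None):
    """Reshape a daily time series payload into an OHLC DataFrame.

    Assumed layout: {"Time Series (Daily)": {date: {"1. open": "...", ...}}}.
    """
    raw = pd.DataFrame.from_dict(payload["Time Series (Daily)"], orient="index")
    # "1. open" -> "open", "5. volume" -> "volume"
    raw.columns = [c.split(". ", 1)[-1] for c in raw.columns]
    raw = raw.astype(float)
    raw.index = pd.to_datetime(raw.index)
    # Sort oldest first, then keep only the requested date window
    raw = raw.sort_index().loc[start:stop]
    return raw.rename_axis("date").reset_index()


def candlestick(df, title="Candlestick"):
    """Draw a candlestick chart from date, open, high, low, close columns."""
    df = df.copy()
    # Move candles to the center of the day
    df["date"] = pd.to_datetime(df["date"]) + pd.Timedelta("12h")
    inc = df["close"] > df["open"]
    dec = df["open"] > df["close"]
    width_ms = 16 * 60 * 60 * 1000  # 16 hours; datetime axes count milliseconds

    p = figure(x_axis_type="datetime", tools="pan,wheel_zoom,box_zoom,reset,save",
               width=1000, height=400, title=title)
    # Wicks span low to high; bodies span open to close
    p.segment(df["date"], df["high"], df["date"], df["low"], color="black")
    p.vbar(df["date"][dec], width_ms, df["open"][dec], df["close"][dec],
           color="#eb3c40")
    p.vbar(df["date"][inc], width_ms, df["open"][inc], df["close"][inc],
           fill_color="white", line_color="#49a3a3", line_width=2)
    return p


# Made-up sample payload, for illustration only
sample = {"Time Series (Daily)": {
    "2024-01-03": {"1. open": "11.0", "2. high": "11.5", "3. low": "9.5",
                   "4. close": "10.0", "5. volume": "1200"},
    "2024-01-02": {"1. open": "10.0", "2. high": "12.0", "3. low": "9.0",
                   "4. close": "11.0", "5. volume": "1000"},
}}
frame = daily_json_to_frame(sample)
chart = candlestick(frame, title="Sample Candlestick")
# show(chart)  # uncomment to render (needs `from bokeh.plotting import show`)
```

A wrapper that takes a ticker symbol would fetch the payload for that symbol and then call these two helpers in turn. Keeping the download separate makes the parsing and plotting easy to check offline.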