Predicting Visitors with Facebook Prophet

Facebook open sourced [Prophet][1], its forecasting tool for time series data. Although forecasting is not a trivial task, the library is very easy to use and produces nice results quickly. In this basic blog post, I am going to forecast visitor statistics based on the historical data I collected with Piwik.

## Python Prerequisites

Install and initialize a new virtual Python environment:

```
# Install virtual environments package
sudo pip3 install virtualenv
# Create a new folder for the project
mkdir python-projects
cd python-projects/
# Create a new virtual environment
virtualenv -p python3 py
```

## Install Prophet and its Dependencies

Within your new Python virtual environment, install the required dependencies first and then Prophet:

```
# Linux Dependencies
sudo apt-get install python3-tk
# Python Dependencies
./py/bin/pip3 install cython numpy
# Prophet
./py/bin/pip3 install fbprophet
```

## Get the Data from your Piwik Database

We aggregate the data from the visitors table per day and store the result in a CSV file. In the case of this blog, I started collecting visitor traffic data in early 2013. Prophet not only shows trends and seasonality in that history, it also forecasts into the future.

```
SELECT DATE_FORMAT(visit_first_action_time, '%Y-%m-%d'), SUM(visitor_count_visits)
FROM db_piwik.piwik_log_visit
GROUP BY 1
INTO OUTFILE '/tmp/visits.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```

Usually MySQL runs with a security setting that prevents writing files to arbitrary locations on the server's disk (for a good reason). Check the variable `secure_file_priv` to find the path you can use for exporting.

The data now looks similar to this:

~/python-projects $ head visits.csv 

This is exactly the format Prophet expects: one date column and one numeric value column.
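Before handing the file to Prophet, it can help to check that every row really has this shape. The following is only a minimal sketch using the standard library; the sample rows are invented, and the function names `read_daily_counts` and `missing_days` are hypothetical helpers, not part of Prophet or Piwik. It also lists calendar days without a row, since the `GROUP BY` query above only emits days that have at least one visit.

```python
import csv
import io
from datetime import date, timedelta

# Invented sample rows in the same shape as the export: date, visit count.
# Real data would come from open("visits.csv") instead of this string.
SAMPLE = "2013-01-01,12\n2013-01-02,15\n2013-01-04,9\n"


def read_daily_counts(handle):
    """Parse header-less 'YYYY-MM-DD,count' rows into (date, int) tuples."""
    rows = []
    for line_no, record in enumerate(csv.reader(handle), start=1):
        if len(record) != 2:
            raise ValueError(f"line {line_no}: expected 2 fields, got {len(record)}")
        day = date.fromisoformat(record[0])  # raises ValueError on a malformed date
        rows.append((day, int(record[1])))
    return rows


def missing_days(rows):
    """Return the calendar days between the first and last row that have no entry."""
    seen = {day for day, _ in rows}
    first, last = min(seen), max(seen)
    all_days = (first + timedelta(days=i) for i in range((last - first).days + 1))
    return [day for day in all_days if day not in seen]


rows = read_daily_counts(io.StringIO(SAMPLE))
gaps = missing_days(rows)
print(gaps)
```

A few gaps are usually not a problem for the forecast, but a long run of missing days is worth knowing about before interpreting the result.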

## Forecasting with Prophet

The short but [nice tutorial][2] shows pretty much everything you need. Here is the script, which is nearly identical to the one from the tutorial:

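```
import pandas as pd
import numpy as np
from fbprophet import Prophet
import matplotlib.pyplot as plt

# The export has no header row, so name the columns the way Prophet expects
df = pd.read_csv('visits.csv', header=None, names=['ds', 'y'])
# Model the logarithm of the daily visit counts
df['y'] = np.log(df['y'])

# Create the model and fit it to the historical data
m = Prophet()
m.fit(df)

# Extend the date range one year past the last observation
future = m.make_future_dataframe(periods=365)

forecast = m.predict(future)
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail())

# Plot the forecast and its trend/seasonality components
figure_forecast = m.plot(forecast)
figure_components = m.plot_components(forecast)
plt.show()
```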


The results are two figures: the forecast itself and a breakdown of its components. Facebook Prophet incorporates seasonal variations, holidays and trends derived from historical data.
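One thing to keep in mind when reading the numbers: the script takes `np.log` of `y` before fitting, so `yhat`, `yhat_lower` and `yhat_upper` are on the log scale as well. To read them as visits per day, the transform has to be undone with the exponential function. The sketch below illustrates this with the standard library only; the forecast row is invented and `to_visit_counts` is a hypothetical helper name. On the real DataFrame, the equivalent would presumably be `np.exp(forecast[['yhat', 'yhat_lower', 'yhat_upper']])`.

```python
import math

# Hypothetical forecast row on the log scale, shaped like the columns the
# script selects from Prophet's output (the values are invented).
log_forecast = [
    {"ds": "2017-07-01", "yhat": 4.0, "yhat_lower": 3.5, "yhat_upper": 4.5},
]

VALUE_COLUMNS = ("yhat", "yhat_lower", "yhat_upper")


def to_visit_counts(rows):
    """Undo the natural-log transform so the values read as visits per day."""
    converted = []
    for row in rows:
        out = dict(row)  # copy, so the log-scale input stays untouched
        for col in VALUE_COLUMNS:
            out[col] = math.exp(row[col])
        converted.append(out)
    return converted


visits = to_visit_counts(log_forecast)
print(round(visits[0]["yhat"], 1))
```

Because the exponential is monotonic, the lower and upper bounds keep their order after the back-transform, but the interval is no longer symmetric around `yhat`.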

[<img class="aligncenter size-full wp-image-2925" src="./media/2017/06/forcast.png" alt="" width="720" height="432" srcset="./media/2017/06/forcast.png 720w, ./media/2017/06/forcast-300x180.png 300w" sizes="(max-width: 720px) 100vw, 720px" />][3]

[<img class="aligncenter size-full wp-image-2926" src="./media/2017/06/forcast_component.png" alt="" width="648" height="648" srcset="./media/2017/06/forcast_component.png 648w, ./media/2017/06/forcast_component-150x150.png 150w, ./media/2017/06/forcast_component-300x300.png 300w, ./media/2017/06/forcast_component-60x60.png 60w" sizes="(max-width: 648px) 100vw, 648px" />][4]

As you can see, the weekends are rather low on visitors, and the beginning of summer is also rather weak.
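The weekly pattern can be cross-checked against the raw data without Prophet by averaging the visits per weekday. This is only a rough sketch: the two weeks of counts are invented, `mean_by_weekday` is a hypothetical helper, and plain averages ignore the long-term trend that Prophet separates out, so the numbers will not match the component plot exactly.

```python
from collections import defaultdict
from datetime import date

# Invented daily counts for two full weeks, Monday 2017-06-05 to Sunday 2017-06-18.
COUNTS = [50, 52, 51, 49, 45, 20, 18, 54, 50, 53, 47, 44, 22, 16]
daily = [(date(2017, 6, 5 + i), count) for i, count in enumerate(COUNTS)]

NAMES = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def mean_by_weekday(rows):
    """Average the counts per weekday (0 = Monday ... 6 = Sunday)."""
    totals = defaultdict(lambda: [0, 0])  # weekday -> [sum of counts, number of days]
    for day, count in rows:
        bucket = totals[day.weekday()]
        bucket[0] += count
        bucket[1] += 1
    return {wd: total / n for wd, (total, n) in sorted(totals.items())}


averages = mean_by_weekday(daily)
for wd, avg in averages.items():
    print(f"{NAMES[wd]}: {avg:.1f}")
```

If the weekend averages come out clearly below the weekday ones, as they do in this made-up sample, that agrees with what the weekly component shows.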


 [3]: ./media/2017/06/forcast.png
 [4]: ./media/2017/06/forcast_component.png