(GLM-poisson-regression)=
:::{post} November 30, 2022 :tags: regression, poisson :category: intermediate :author: Jonathan Sedar, Benjamin Vincent :::
This is a minimal reproducible example of Poisson regression to predict counts using dummy data.
This Notebook is basically an excuse to demo Poisson regression using PyMC, both manually and using bambi to demo interactions using the formulae library. We will create some dummy data, Poisson distributed according to a linear model, and try to recover the coefficients of that linear model through inference.
For more statistical detail see:
This very basic model is inspired by a project by Ian Osvald, which is concerned with understanding the various effects of external environmental factors upon the allergic sneezing of a test subject.
This dummy dataset is created to emulate some data created as part of a study into quantified self, and the real data is more complicated than this. Ask Ian Osvald if you'd like to know more @ianozvald.
nsneeze (int)alcohol (boolean)nomeds (boolean)Create 4000 days of data: daily counts of sneezes which are Poisson distributed w.r.t alcohol consumption and antihistamine usage
Observe:
nomeds == False and alcohol == False (top-left, akak antihistamines WERE used, alcohol was NOT drunk) the mean of the poisson distribution of sneeze counts is low.alcohol == True (top-right) increases the sneeze count nsneeze slightlynomeds == True (lower-left) increases the sneeze count nsneeze furtheralcohol == True and nomeds == True (lower-right) increases the sneeze count nsneeze a lot, increasing both the mean and variance.Our model here is a very simple Poisson regression, allowing for interaction of terms:
Create linear model for interaction of terms
Create Design Matrices
Create Model
Sample Model
View Diagnostics
Observe:
Observe:
The contributions from each feature as a multiplier of the baseline sneezecount appear to be as per the data generation:
exp(Intercept): mean=1.05 cr=[0.98, 1.10]
Roughly linear baseline count when no alcohol and meds, as per the generated data:
theta_noalcohol_meds = 1 (as set above) theta_noalcohol_meds = exp(Intercept) = 1
exp(alcohol): mean=2.86 cr=[2.67, 3.07]
non-zero positive effect of adding alcohol, a ~3x multiplier of baseline sneeze count, as per the generated data:
theta_alcohol_meds = 3 (as set above) theta_alcohol_meds = exp(Intercept + alcohol) = exp(Intercept) * exp(alcohol) = 1 * 3 = 3
exp(nomeds): mean=5.73 cr=[5.34, 6.08]
larger, non-zero positive effect of adding nomeds, a ~6x multiplier of baseline sneeze count, as per the generated data:
theta_noalcohol_nomeds = 6 (as set above) theta_noalcohol_nomeds = exp(Intercept + nomeds) = exp(Intercept) * exp(nomeds) = 1 * 6 = 6
exp(alcohol:nomeds): mean=2.10 cr=[1.96, 2.28]
small, positive interaction effect of alcohol and meds, a ~2x multiplier of baseline sneeze count, as per the generated data:
theta_alcohol_nomeds = 36 (as set above) theta_alcohol_nomeds = exp(Intercept + alcohol + nomeds + alcohol:nomeds) = exp(Intercept) * exp(alcohol) * exp(nomeds * alcohol:nomeds) = 1 * 3 * 6 * 2 = 36
bambiCreate Model
Alternative automatic formulation using bambi
Fit Model
View Traces
Observe:
We can use az.plot_ppc() to check that the posterior predictive samples are similar to the observed data.
For more information on posterior predictive checks, we can refer to {ref}pymc:posterior_predictive.
:::{include} ../page_footer.md :::