(GLM-negative-binomial-regression)=

GLM: Negative Binomial Regression

:::{post} September, 2023
:tags: negative binomial regression, generalized linear model
:category: beginner
:author: Ian Ozsvald, Abhipsha Das, Benjamin Vincent
:::

:::{include} ../extra_installs.md
:::

This notebook closely follows the GLM Poisson regression example by Jonathan Sedar (which is in turn inspired by a project by Ian Ozsvald), except that the data here are negative binomially distributed rather than Poisson distributed.

Negative binomial regression is used to model count data for which the variance is higher than the mean. The negative binomial distribution can be thought of as a Poisson distribution whose rate parameter is itself gamma distributed: because the rate varies from one observation to the next, the resulting counts have a variance that exceeds their mean.
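
The gamma-Poisson view can be checked by simulation. The following is a minimal sketch, not the notebook's own code: the values of `mu`, `alpha`, and `size` are illustrative assumptions, and the variance formula `mu + mu**2 / alpha` is the standard result for this parameterization.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

mu = 6.0        # assumed mean count (illustrative)
alpha = 10.0    # assumed dispersion parameter (illustrative)
size = 200_000  # assumed number of simulated observations

# Step 1: draw one rate per observation from a gamma distribution
# with mean mu and variance mu**2 / alpha.
rates = rng.gamma(shape=alpha, scale=mu / alpha, size=size)

# Step 2: draw each count from a Poisson distribution with its own rate.
counts = rng.poisson(rates)

sample_mean = counts.mean()
sample_var = counts.var()
theoretical_var = mu + mu**2 / alpha

print(f"sample mean:        {sample_mean:.2f}")
print(f"sample variance:    {sample_var:.2f}")
print(f"mu + mu**2 / alpha: {theoretical_var:.2f}")
```

Smaller values of `alpha` spread the rates more widely and so inflate the variance further; as `alpha` grows, the counts approach a plain Poisson distribution.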

Generate Data

As in the Poisson regression example, we assume that sneezing occurs at some baseline rate, and that consuming alcohol, not taking antihistamines, or doing both increases its frequency.

Poisson Data

First, let's look at some Poisson distributed data from the Poisson regression example.

Since the mean and variance of a Poisson distributed random variable are equal, the sample means and variances are very close.
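
The equality of mean and variance is easy to see on simulated draws. This is a small illustrative sketch; the `rate` and sample size are assumptions chosen for the demonstration, not values taken from the Poisson regression example.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

rate = 6.0  # assumed Poisson rate (illustrative)
poisson_counts = rng.poisson(lam=rate, size=100_000)

poisson_mean = poisson_counts.mean()
poisson_var = poisson_counts.var()

# For Poisson data both statistics estimate the same quantity, the rate.
print(f"mean:     {poisson_mean:.2f}")
print(f"variance: {poisson_var:.2f}")
```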

Negative Binomial Data

Now, suppose every subject in the dataset had the flu, increasing the variance of their sneezing (and causing an unfortunate few to sneeze over 70 times a day). If the mean number of sneezes stays the same but variance increases, the data might follow a negative binomial distribution.
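
One way to simulate such data is sketched below. It is not necessarily how the notebook generates its dataset: the group means (1, 3, 6, 36) and the dispersion of 10 are the values quoted in the results discussion, while `n_per_group`, the helper `draw_negative_binomial`, and the dictionary layout are assumptions made for illustration. NumPy parameterizes the distribution by `(n, p)`, so the sketch converts from the mean and dispersion first.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

alpha = 10.0        # dispersion parameter
n_per_group = 1000  # assumed group size (illustrative)

# Mean sneeze counts keyed by (alcohol, nomeds).
group_means = {
    (False, False): 1.0,
    (True, False): 3.0,
    (False, True): 6.0,
    (True, True): 36.0,
}


def draw_negative_binomial(mu, alpha, size, rng):
    """Draw counts with mean mu and variance mu + mu**2 / alpha."""
    # Convert (mu, alpha) to NumPy's (n, p) parameterization.
    p = alpha / (alpha + mu)
    return rng.negative_binomial(n=alpha, p=p, size=size)


samples = {
    group: draw_negative_binomial(mu, alpha, n_per_group, rng)
    for group, mu in group_means.items()
}

for (alcohol, nomeds), counts in samples.items():
    print(
        f"alcohol={alcohol!s:5} nomeds={nomeds!s:5} "
        f"mean={counts.mean():6.2f} var={counts.var():7.2f}"
    )
```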

As in the Poisson regression example, we see that drinking alcohol and/or not taking antihistamines increase the sneezing rate to varying degrees. Unlike in that example, for each combination of alcohol and nomeds, the variance of nsneeze is higher than the mean. This suggests that a Poisson distribution would be a poor fit for the data since the mean and variance of a Poisson distribution are equal.
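
To make the gap concrete, the sketch below compares the variance a Poisson model would imply with the variance of a negative binomial distribution for each group. It assumes the group means and dispersion quoted in the results discussion (1, 3, 6, 36 and 10) and the usual variance formula `mu + mu**2 / alpha`; the helper name `nb_variance` and the labels are illustrative.

```python
alpha = 10.0  # dispersion parameter

group_means = {
    "no alcohol, meds": 1.0,
    "alcohol, meds": 3.0,
    "no alcohol, no meds": 6.0,
    "alcohol, no meds": 36.0,
}


def nb_variance(mu, alpha):
    """Variance of a negative binomial with mean mu and dispersion alpha."""
    return mu + mu**2 / alpha


for label, mu in group_means.items():
    # A Poisson distribution with the same mean has variance equal to mu.
    print(
        f"{label:20} mean={mu:5.1f} "
        f"poisson var={mu:5.1f} negbin var={nb_variance(mu, alpha):6.1f}"
    )
```

The excess variance grows with the square of the mean, so the mismatch with a Poisson model is largest for the group with the highest sneezing rate.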

Visualize the Data

Negative Binomial Regression

Create GLM Model
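
The model code itself is not reproduced here. As a rough guide, a negative binomial GLM with a log link and an interaction term can be written as follows. The symbols $\beta_0, \dots, \beta_3$ and the index $i$ are notation assumed for this sketch, and the priors are deliberately left unspecified.

$$
\begin{aligned}
y_i &\sim \operatorname{NegativeBinomial}(\mu_i, \alpha) \\
\log \mu_i &= \beta_0 + \beta_1\,\text{alcohol}_i + \beta_2\,\text{nomeds}_i + \beta_3\,\text{alcohol}_i\,\text{nomeds}_i
\end{aligned}
$$

Under this formulation each exponentiated coefficient acts as a multiplier on the expected count, and $\alpha$ controls how far the variance exceeds the mean.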

View Results

The mean values are close to the values we specified when generating the data:

  • The base rate is a constant 1.
  • Drinking alcohol triples the base rate.
  • Not taking antihistamines increases the base rate by 6 times.
  • Drinking alcohol and not taking antihistamines doubles the rate that would be expected if their rates were independent. If they were independent, then doing both would increase the base rate by 3*6=18 times, but instead the base rate is increased by 3*6*2=36 times.
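
The arithmetic in the list above can be reproduced directly. This sketch assumes a log-link model in which each effect is the exponential of a coefficient; the coefficient names and the function `expected_sneezes` are hypothetical and chosen only for illustration.

```python
import math

# Coefficients on the log scale, chosen so that exponentiating them
# recovers the multipliers listed above.
intercept = math.log(1.0)      # base rate of 1
b_alcohol = math.log(3.0)      # alcohol triples the rate
b_nomeds = math.log(6.0)       # no antihistamines: 6 times the rate
b_interaction = math.log(2.0)  # extra doubling when both apply


def expected_sneezes(alcohol, nomeds):
    """Expected count for 0/1 indicators under a log link."""
    log_rate = (
        intercept
        + b_alcohol * alcohol
        + b_nomeds * nomeds
        + b_interaction * alcohol * nomeds
    )
    return math.exp(log_rate)


for alcohol in (0, 1):
    for nomeds in (0, 1):
        rate = expected_sneezes(alcohol, nomeds)
        print(f"alcohol={alcohol} nomeds={nomeds} expected rate={rate:.1f}")
```

Because the effects add on the log scale, they multiply on the count scale, which is why the combined rate is 3 × 6 × 2 = 36 rather than 18.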

Finally, the mean of nsneeze_alpha is also quite close to its actual value of 10.

See also Bambi's negative binomial example for further reference.

Authors

:::{include} ../page_footer.md
:::
