Setup details are described here, and the meta-post about these solutions is here.

Materials

The summer course1 is based on the second edition of Statistical Rethinking by Richard McElreath.

  • Chapter 13
    • E{1,2,3,4,5}
  • Chapter 14
    • E{1,2,3}

Packages

A colophon with details is provided at the end, but the following packages and theme parameters are used throughout.

libsUsed <- c("tidyverse", "tidybayes", "orgutils", "dagitty",
              "rethinking", "tidybayes.rethinking",
              "ggplot2", "kableExtra", "dplyr", "glue",
              "latex2exp", "data.table", "printr", "devtools")
invisible(lapply(libsUsed, library, character.only = TRUE))
theme_set(theme_grey(base_size = 24))
set.seed(1995)

Chapter XIII: Models With Memory

Easy Questions (Ch13)

13E1

Which of the following priors will produce more shrinkage in the estimates? (a) α_TANK ∼ Normal(0, 1); (b) α_TANK ∼ Normal(0, 2).

Solution

A normal distribution is centered on its mean, and its spread is given by its standard deviation. The prior in (a) has the smaller standard deviation, so it is more concentrated around zero and will therefore produce more shrinkage in the estimates.
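To make this concrete, here is a sketch under a simplifying assumption: a conjugate normal-normal model with known observation variance, rather than the binomial tank model itself. With prior α ∼ Normal(0, τ²), the posterior mean of α is a precision-weighted compromise between the prior mean and the data mean:

```latex
% Conjugate normal-normal sketch (simplifying assumption; the tank model
% in the book is binomial). Prior: \alpha \sim \mathrm{Normal}(0, \tau^2);
% data: \bar{y} \mid \alpha \sim \mathrm{Normal}(\alpha, \sigma^2/n).
\begin{align*}
\mathbb{E}[\alpha \mid y] &= \frac{\tau^2}{\tau^2 + \sigma^2 / n}\, \bar{y}
\end{align*}
% With \tau = 1 the weight on \bar{y} is smaller than with \tau = 2, so the
% posterior mean is pulled more strongly towards the prior mean of zero:
% prior (a) produces more shrinkage.
```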

curve(dnorm(x, 0, 1), from = -10, to = 10, col = "red", ylab = "density")
curve(dnorm(x, 0, 2), add = TRUE)
legend("topright",
       col = c("red", "black"),
       pch = 19,
       legend = c("Normal(0,1)", "Normal(0,2)"))

13E2

Rewrite the following model as a multilevel model.

Solution

The model can be expressed as:
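As a sketch, assuming the exercise's original model is the second edition's binomial regression with fixed group intercepts, the multilevel rewrite replaces the fixed prior on the group intercepts with an adaptive prior; the particular hyperprior choices below are my assumptions:

```latex
\begin{align*}
y_i &\sim \mathrm{Binomial}(1, p_i) \\
\operatorname{logit}(p_i) &= \alpha_{\mathrm{group}[i]} + \beta x_i \\
\alpha_j &\sim \mathrm{Normal}(\bar{\alpha}, \sigma_{\alpha}) \\
\bar{\alpha} &\sim \mathrm{Normal}(0, 1.5) \\
\sigma_{\alpha} &\sim \mathrm{Exponential}(1) \\
\beta &\sim \mathrm{Normal}(0, 0.5)
\end{align*}
```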

The hyperpriors are chosen to be only weakly informative, as is appropriate when no further insight about the hyperparameters is available.

13E3

Rewrite the following model as a multilevel model.

Solution

The model can be defined as:
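As a sketch, assuming the exercise's original model is a Gaussian regression with fixed group intercepts, the multilevel form again places an adaptive prior on the intercepts; the hyperprior choices below are my assumptions:

```latex
\begin{align*}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\mathrm{group}[i]} + \beta x_i \\
\alpha_j &\sim \mathrm{Normal}(\bar{\alpha}, \sigma_{\alpha}) \\
\bar{\alpha} &\sim \mathrm{Normal}(0, 5) \\
\sigma_{\alpha} &\sim \mathrm{Exponential}(1) \\
\beta &\sim \mathrm{Normal}(0, 1) \\
\sigma &\sim \mathrm{Exponential}(1)
\end{align*}
```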

13E4

Write a mathematical model formula for a Poisson regression with varying intercepts.

Solution
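One possible formula is sketched below, with a log link and an adaptive prior on the group intercepts; the weakly informative hyperprior choices are my assumptions:

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(\lambda_i) \\
\log \lambda_i &= \alpha_{\mathrm{group}[i]} + \beta x_i \\
\alpha_j &\sim \mathrm{Normal}(\bar{\alpha}, \sigma_{\alpha}) \\
\bar{\alpha} &\sim \mathrm{Normal}(0, 1) \\
\sigma_{\alpha} &\sim \mathrm{Exponential}(1) \\
\beta &\sim \mathrm{Normal}(0, 1)
\end{align*}
```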

13E5

Write a mathematical model formula for a Poisson regression with two different kinds of varying intercepts, a cross-classified model.

Solution

We will use the non-centered form for the cross-classified model.
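A sketch of the non-centered form, with A and B standing for the two cluster types (e.g., actor and block); each standardized offset z is scaled by its cluster-level standard deviation, and the prior choices are my assumptions:

```latex
\begin{align*}
y_i &\sim \mathrm{Poisson}(\lambda_i) \\
\log \lambda_i &= \bar{\alpha} + z_{A[i]}\,\sigma_A + z_{B[i]}\,\sigma_B + \beta x_i \\
z_{A,j} &\sim \mathrm{Normal}(0, 1) \\
z_{B,k} &\sim \mathrm{Normal}(0, 1) \\
\bar{\alpha} &\sim \mathrm{Normal}(0, 1) \\
\sigma_A, \sigma_B &\sim \mathrm{Exponential}(1) \\
\beta &\sim \mathrm{Normal}(0, 1)
\end{align*}
```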

Chapter XIV: Adventures in Covariance

Easy Questions (Ch14)

14E1

Add to the following model varying slopes on the predictor x.

Solution

Following the convention in the physical sciences, I will use square brackets for matrices and parentheses for vectors.
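A sketch of the varying-slopes extension, assuming the exercise's original model is a Gaussian regression with varying intercepts; the intercepts and slopes are drawn jointly from a multivariate normal, and the hyperprior choices are my assumptions:

```latex
\begin{align*}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha_{\mathrm{group}[i]} + \beta_{\mathrm{group}[i]}\, x_i \\
\begin{pmatrix} \alpha_j \\ \beta_j \end{pmatrix} &\sim
  \mathrm{MVNormal}\!\left( \begin{pmatrix} \alpha \\ \beta \end{pmatrix}, \mathbf{S} \right) \\
\mathbf{S} &= \begin{bmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{bmatrix}
  \mathbf{R}
  \begin{bmatrix} \sigma_{\alpha} & 0 \\ 0 & \sigma_{\beta} \end{bmatrix} \\
\alpha &\sim \mathrm{Normal}(0, 10) \\
\beta &\sim \mathrm{Normal}(0, 1) \\
\sigma, \sigma_{\alpha}, \sigma_{\beta} &\sim \mathrm{Exponential}(1) \\
\mathbf{R} &\sim \mathrm{LKJcorr}(1)
\end{align*}
```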

Since we have no prior information about the correlation structure, we use the standard LKJcorr prior for correlation matrices, which with shape parameter 1 is flat over all valid correlation matrices. We also use weakly informative priors for the standard deviations among slopes and intercepts.

14E2

Think up a context in which varying intercepts will be positively correlated with varying slopes. Provide a mechanistic explanation for the correlation.

Solution

We note at the outset that varying intercepts account for blocks or sub-groups in the data. A positive correlation between intercepts and slopes arises when the clusters with higher average values also show a stronger positive association with the predictor. To augment the tadpole example from the book, suppose the data are arranged as:

  • Tadpoles in tanks
  • Some tanks have larger tadpoles (different species) which grow faster

For repeated measurements over an interval of time, there will then be a positive correlation between the initial size (the intercept) and the growth rate (the slope).
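The tadpole growth scenario above can be written down as a sketch (the variable names and the structure are illustrative, not from the book):

```latex
\begin{align*}
\mathrm{size}_{ij} &\sim \mathrm{Normal}(\mu_{ij}, \sigma) \\
\mu_{ij} &= \alpha_{\mathrm{tank}[i]} + \beta_{\mathrm{tank}[i]}\, t_j \\
\begin{pmatrix} \alpha_k \\ \beta_k \end{pmatrix} &\sim
  \mathrm{MVNormal}\!\left( \begin{pmatrix} \alpha \\ \beta \end{pmatrix},
  \begin{bmatrix} \sigma_{\alpha}^2 & \rho\,\sigma_{\alpha}\sigma_{\beta} \\
                  \rho\,\sigma_{\alpha}\sigma_{\beta} & \sigma_{\beta}^2 \end{bmatrix} \right)
\end{align*}
% Tanks with initially larger tadpoles (high \alpha_k) also grow faster
% (high \beta_k), so we expect the posterior for \rho to concentrate on
% positive values.
```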

14E3

When is it possible for a varying slopes model to have fewer effective parameters (as estimated by WAIC or PSIS) than the corresponding model with fixed (unpooled) slopes? Explain.

Solution

Varying effects cause regularization, i.e. shrinkage towards the global mean, which prevents overfitting to individual data points. Consider the chimpanzee experiment from the text.

data(chimpanzees)
d <- chimpanzees
d$block_id <- d$block
d$treatment <- 1L + d$prosoc_left + 2L * d$condition
dat <- list(
  L = d$pulled_left,
  tid = as.integer(d$treatment),
  actor = d$actor
)

We will set up a simple fixed effects model.

m14fix <- ulam(
  alist(
    L ~ dbinom(1, p),
    logit(p) <- alpha[actor] + beta[tid],
    alpha[actor] ~ dnorm(0, 5),
    beta[tid] ~ dnorm(0, 0.5)
  ), data = dat, chains = 4, log_lik = TRUE)

Now we can extend this to a partially pooled model. The code below adds varying intercepts for actors in non-centered form: per-actor offsets a[actor] are scaled by a shared standard deviation vary_id, which is the adaptive-regularization mechanism at issue in this question.

m14ppool <- ulam(
  alist(
    L ~ dbinom(1, p),
    logit(p) <- alpha + a[actor] * vary_id + beta[tid],
    alpha ~ dnorm(0, 5),
    a[actor] ~ dnorm(0, 1),
    beta[tid] ~ dnorm(0, 0.5),
    vary_id ~ dexp(1)
  ), data = dat, chains = 4, log_lik = TRUE)

Now we can test the number of parameters.

compare(m14ppool, m14fix) %>% toOrg

| row.names |   WAIC |    SE | dWAIC |  dSE | pWAIC | weight |
|-----------+--------+-------+-------+------+-------+--------|
| m14ppool  | 532.21 | 19.52 |  0.00 |   NA |  9.13 |   0.80 |
| m14fix    | 534.94 | 18.09 |  2.73 | 1.66 |  8.10 |   0.20 |

As the pWAIC column shows, the partially pooled model has only about one more effective parameter than the fixed-effects model, even though it adds an overall intercept and a standard-deviation parameter on top of the per-actor offsets, while the fixed model estimates one unpooled intercept per actor.

Both models have around the same number of effective parameters, which means that the additional parameters do not cause additional overfitting. In Bayesian models, the raw number of parameters does not necessarily correspond to the degree of overfitting.

In general, we should keep in mind that when the variation among clusters is small, the effective number of parameters will be lower than the actual number of parameters, because adaptive regularization shrinks the cluster estimates aggressively towards the global mean.
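For reference, the effective number of parameters reported by WAIC is the sum of the posterior variances of the pointwise log-likelihood; heavily shrunk parameters contribute little posterior variance, which is why pooling lowers pWAIC:

```latex
\begin{align*}
p_{\mathrm{WAIC}} &= \sum_{i=1}^{N} \operatorname{Var}_{\theta}\!\left[ \log p(y_i \mid \theta) \right] \\
\mathrm{WAIC} &= -2 \left( \mathrm{lppd} - p_{\mathrm{WAIC}} \right)
\end{align*}
```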

A: Colophon

To ensure that this document is fully reproducible at a later date, we will record the session info.

devtools::session_info()

  1. Summer of 2020 ↩︎