Graph showing the impact of time-varying predictors on mental health in a multilevel model. It compares predicted mental health for individuals with different log income levels over time.

Understanding time-varying predictors in multilevel models for longitudinal data

Multilevel models for change are among the most popular approaches used to understand how people, groups and societies change over time. In previous posts, we introduced the model and showed how to include time-constant predictors to explain change in time. Here, we will discuss how to add time-varying predictors in such models to analyze dynamic processes. This technique allows researchers to account for variables that change over time, providing a nuanced understanding of how these predictors influence outcomes. For instance, in studying educational progress, time-varying predictors like student engagement or teacher support can be critical in explaining individual differences in achievement growth. Understanding how to include time-varying predictors in multilevel models is essential for accurately capturing the complexity of longitudinal data and enhancing the precision of statistical analyses.

Preparing the data

You can follow this guide by running the code in R on your computer. Our examples will use synthetic (simulated) data modelled after Understanding Society, a comprehensive panel study from the UK. The real data can be accessed for free from the UK Data Archive.

In a previous blog, we cleaned the data and we have it both in the long format and the wide one. We will use that cleaned data here. If you want to follow along you can download the data from here and all the code from here.

Access the code used here.

Access the data here.

Before we start, make sure you have the tidyverse and lme4 packages installed and loaded by running the following code:

install.packages("lme4")
install.packages("tidyverse")

library(lme4)
library(tidyverse)

We will mainly use the lme4 package to run multilevel models but will also use tidyverse for some light data cleaning.

Next, we will load the data prepared in the previous post. This contains both the long and the wide format data in an “RData” file. We will load the long-format data and have a quick look at it:

load("./data/us_clean_syn.RData")

glimpse(usl)
## Rows: 204,028
## Columns: 32
## $ pidp        <dbl> 5, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 10, 10, 10, 10, 11, 11…
## $ wave        <dbl> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4…
## $ age         <dbl+lbl> 28, 28, 28, 28, 80, 80, 80, 80, 60, 60, 60, 60, 42, 42…
## $ hiqual      <dbl+lbl> 1, 1, 1, 1, 9, 9, 9, 9, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, …
## $ single      <dbl+lbl>  1,  1,  1,  0,  0,  0,  0,  0,  1, NA, NA, NA,  0, NA…
## $ fimngrs     <dbl> 3283.87, 4002.50, 3616.67, 850.50, 896.00, 709.00, 702.00,…
## $ sclfsato    <dbl+lbl> -9,  6,  2,  1,  6,  6, -8,  3,  2, NA, NA, NA, -9, NA…
## $ sf12pcs     <dbl> 54.51, 62.98, 56.97, 56.15, 53.93, 46.39, NA, 46.16, 33.18…
## $ sf12mcs     <dbl> 55.73, 36.22, 60.02, 59.04, 46.48, 45.39, NA, 37.02, 46.80…
## $ istrtdaty   <dbl+lbl> 2009, 2010, 2011, 2012, 2010, 2011, 2012, 2013, 2009, …
## $ sf1         <dbl+lbl>  2,  2,  1,  3,  4,  3,  3,  3,  5, NA, NA, NA,  2, NA…
## $ present     <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, NA, …
## $ gndr.fct    <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, Male…
## $ single.fct  <fct> Single, Single, Single, In relationship, In relationship, …
## $ urban.fct   <fct> Urban, Urban, Urban, Urban, Urban, Urban, Urban, Urban, Ur…
## $ degree.fct  <fct> Degree, Degree, Degree, Degree, No degree, No degree, No d…
## $ vote6.fct   <fct> Not very, Not at all, Not at all, Not at all, Not at all, …
## $ sati        <dbl> NA, 6, 2, 1, 6, 6, NA, 3, 2, NA, NA, NA, NA, NA, 5, NA, NA…
## $ sati.fct    <fct> NA, Mostly satisfied, Mostly dissatisfied, Completely diss…
## $ fimngrs.cap <dbl> 3283.87, 4002.50, 3616.67, 850.50, 896.00, 709.00, 702.00,…
## $ logincome   <dbl> 8.099818, 8.297170, 8.196070, 6.757514, 6.809039, 6.577861…
## $ year        <dbl> 2009, 2010, 2011, 2012, 2010, 2011, 2012, 2013, 2009, NA, …
## $ sati.ind    <dbl> 3.000000, 3.000000, 3.000000, 3.000000, 5.000000, 5.000000…
## $ sati.dev    <dbl> NA, 3.000000, -1.000000, -2.000000, 1.000000, 1.000000, NA…
## $ sf12pcs.ind <dbl> 57.65250, 57.65250, 57.65250, 57.65250, 48.82667, 48.82667…
## $ sf12pcs.dev <dbl> -3.1425000, 5.3275000, -0.6825000, -1.5025000, 5.1033333, …
## $ sf12mcs.ind <dbl> 52.75250, 52.75250, 52.75250, 52.75250, 42.96333, 42.96333…
## $ sf12mcs.dev <dbl> 2.977500, -16.532500, 7.267500, 6.287500, 3.516667, 2.4266…
## $ waves       <int> 4, 4, 4, 4, 4, 4, 4, 4, 1, 1, 1, 1, 2, 2, 2, 2, 4, 4, 4, 4…
## $ sati.lag    <dbl> NA, NA, 6, 2, NA, 6, 6, NA, NA, 2, NA, NA, NA, NA, NA, 5, …
## $ sf12pcs.lag <dbl> NA, 54.51, 62.98, 56.97, NA, 53.93, 46.39, NA, NA, 33.18, …
## $ sf12mcs.lag <dbl> NA, 55.73, 36.22, 60.02, NA, 46.48, 45.39, NA, NA, 46.80, …

Including Time Varying Predictors

We discussed the basic multilevel model of change and included time-constant predictors in previous posts (here and here). Now we will build on that knowledge to include time-varying predictors in the model.

Before we run the model, it’s good practice to fix the time variable to start from 0 to make the intercept easier to interpret:

usl <- mutate(usl,
              wave0 = wave - 1) 

Including time-varying predictors in a multilevel model for change works in the same way as time-constant ones, but their interpretation is more complex as their values can change over time. Let’s look at a simple example where we include log income as a predictor of mental health:

m1 <- lmer(data = usl, sf12mcs ~ 1 + wave0 + logincome + 
             (1 + wave0 | pidp))
summary(m1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: sf12mcs ~ 1 + wave0 + logincome + (1 + wave0 | pidp)
##    Data: usl
## 
## REML criterion at convergence: 934469.1
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.3666 -0.4289  0.1551  0.5582  3.1007 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pidp     (Intercept) 39.740   6.304         
##           wave0        3.053   1.747    -0.38
##  Residual             59.077   7.686         
## Number of obs: 127863, groups:  pidp, 48615
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 48.77459    0.15088  323.28
## wave0       -0.44797    0.02191  -20.45
## logincome    0.22631    0.02166   10.45

Based on the model, the rate of change in mental health is -0.448. This indicates that with the passing of each wave, mental health decreases by 0.448, on average. Here, we see that an increase of log income by 1 leads to an increase in mental health by 0.226. This shift can happen multiple times because individuals can increase and decrease their income over time.

To help better understand how to interpret the results we can build three case studies. We will create a new small data with three individuals: one that always has logincome = 2, one that always has 9 and one that has 4 in the first wave and then switches to 5, 3 and 3.

pred_data <- tibble(pidp = rep(1:3, each = 4) |> as.factor(),
                    wave0 = rep(0:3, 3),
                    logincome = c(rep(c(2, 9), each = 4), 
                                  c(4, 5, 3, 3)))

pred_data
## # A tibble: 12 × 3
##    pidp  wave0 logincome
##    <fct> <int>     <dbl>
##  1 1         0         2
##  2 1         1         2
##  3 1         2         2
##  4 1         3         2
##  5 2         0         9
##  6 2         1         9
##  7 2         2         9
##  8 2         3         9
##  9 3         0         4
## 10 3         1         5
## 11 3         2         3
## 12 3         3         3

We can then predict the expected mental health for these individuals based on our model:

pred_data <- mutate(pred_data,
       pred_m1 = predict(m1, 
                         newdata = pred_data, 
                         allow.new.levels = TRUE))

pred_data
## # A tibble: 12 × 4
##    pidp  wave0 logincome pred_m1
##    <fct> <int>     <dbl>   <dbl>
##  1 1         0         2    49.2
##  2 1         1         2    48.8
##  3 1         2         2    48.3
##  4 1         3         2    47.9
##  5 2         0         9    50.8
##  6 2         1         9    50.4
##  7 2         2         9    49.9
##  8 2         3         9    49.5
##  9 3         0         4    49.7
## 10 3         1         5    49.5
## 11 3         2         3    48.6
## 12 3         3         3    48.1

Finally, we can visualize how their mental health would change over time.

pred_data |> 
  ggplot(aes(wave0, pred_m1, 
             color = pidp, 
             group = pidp)) +
  geom_line()

This way, we can easily see that the trends for the individuals are the same and how the shift in log income impacts the trajectory of mental health. This approach also helps communicate the findings of these complex models.

Let’s examine in more detail how these predicted scores were created using a slightly nice version of the graph.

Graph showing the impact of time-varying predictors on mental health in a multilevel model. It compares predicted mental health for individuals with different log income levels over time.

The first individual, line green, always has a log income of 9. Based on our model, this leads to better mental health. The 0.226 coefficient causes the shift in the line of this individual. On the other hand, we have an individual who always has a log income of 2 (red line). Their mental health is lower. We also note that the two lines are parallel. This is because, in the absence of an interaction with time, we assume that the rate of change is the same regardless of the log income value. Based on this model, log income leads to a shift in the trend line but not a change in the trajectory itself.

The last individual (blue line) starts with an initial log income of 4. In the second wave, their income increases to 5, shifting the trend line by 0.226. In the next wave, their income decreases by two, leading to another shift in their trend line of 2 * 0.226. In this way, we can see how the change in log income “shifts” individuals to different hypothetical trends.

In some situations, we might have a hypothesis that a variable not only increases or decreases the values of the outcomes but also impacts their trajectory. Alternatively, we can think that the effect of a variable might not be the same over time. Maybe we expect the effect of income to be lower as people get older. In such situations, we can expand the model to include an interaction between the predictor and time (similar to what we have done before).

m2 <- lmer(data = usl, sf12mcs ~ 1 + wave0 + 
             logincome + logincome:wave0 +
             (1 + wave0 | pidp))
summary(m2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: sf12mcs ~ 1 + wave0 + logincome + logincome:wave0 + (1 + wave0 |  
##     pidp)
##    Data: usl
## 
## REML criterion at convergence: 934472.2
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -5.3668 -0.4293  0.1552  0.5581  3.1014 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pidp     (Intercept) 39.756   6.305         
##           wave0        3.053   1.747    -0.38
##  Residual             59.077   7.686         
## Number of obs: 127863, groups:  pidp, 48615
## 
## Fixed effects:
##                 Estimate Std. Error t value
## (Intercept)     48.96708    0.18545 264.047
## wave0           -0.65471    0.11779  -5.558
## logincome        0.19769    0.02695   7.335
## wave0:logincome  0.02987    0.01672   1.787

Based on these results, log income not only has a positive effect on mental health, but its effect increases over time. In each wave, the effect is larger by 0.03

Let’s look again at three hypothetical trends to better understand the model’s implications.

Graph showing the effect of time-varying predictors on mental health in a multilevel model, highlighting the impact of log income changes and interaction over time.

For an individual who always has a log income of 9 (green line), we see that they start with higher mental health in the first wave. This is caused by the main effect of log income (0.198). We observe that their mental health decreases over time, but it does so at a lower rate compared to the other two individuals. The interaction effect causes this difference in the trend. On the other hand, we see that the individual who always has a log income of 2 (red line) not only starts with lower mental health but also decreases at a faster rate.

The third individual (blue line) starts with a log income of 4 and shifts to a value of 5 in the second wave. This increase in log income initially leads to a change in mental health of 0.198. But because of the interaction effect, as time passes they get an additional improvement in mental health. By wave 4, they get an additional 2 * 0.03 in the mental health score.

While we can interpret the coefficients like in a standard regression, the substantive interpretation can be more complex. The main reason for this is that time-varying variables also include two sources of variation. The effect of log income could be due to between variation, for example, because the respondent is more or less wealthy, which is a stable trait. The effect could be due to within variation, if the respondent had a significant increase or decrease in income, which could be important in relative terms (feeling more or less financially secure than usual) and less critical in absolute terms.

Conclusions

In conclusion, incorporating time-varying predictors into multilevel models offers a way to analyze dynamic processes in longitudinal data. By accounting for variables that change over time, such as income or educational support, researchers can gain deeper insights into the factors influencing outcomes like mental health or academic achievement. As demonstrated here, the interpretation of such models can be complex, especially when considering interactions. Building case studies of different individuals using predicted scores from the model can help better understand the implications of the findings.

This model can be easily extended to include multiple time constant and time-varying predictors. It can also include non-linear change in time, which we discuss here. To interpret time-varying predictors easier it is also possible to recode or decompose time-varying predictors using lagged effects or separating between and within variation. We will discuss how to do this in future posts.


Was the information useful?

Consider supporting the site by:

2 responses to “Understanding time-varying predictors in multilevel models for longitudinal data”