How does change in physical activity influence mental well-being trends? Do income and life satisfaction evolve in tandem during retirement? Can parenting stress and child behaviour patterns predict each other across the early years? These kinds of questions—spanning psychology, sociology, and public health—require a method that captures the simultaneous growth trajectories of two or more variables and examines their relationships over time. Parallel latent growth models (LGM) provide a robust framework for tackling these challenges. This post will explore how to estimate and interpret parallel LGMs using R, empowering you to answer complex longitudinal research questions.
Preparing the data for analysis
You can follow this guide by running the code in R on your computer. Our examples will use synthetic (simulated) data modelled after Understanding Society, a comprehensive panel study from the UK. The actual data can be accessed for free from the UK Data Archive.
In a previous blog post, we cleaned the data and created both the long and the wide formats. We will use that cleaned data here. If you want to follow along, you can download the data here and all the code here.
Before we start, make sure you have the tidyverse and lavaan packages loaded by running the following code:

library(tidyverse)
library(lavaan)
We will mainly use the lavaan package to run LGMs and tidyverse for light data cleaning.
Next, we will load the data prepared in the previous post. This is an “RData” file containing the data in both long and wide format. The growth models below use the wide-format data (usw), since lavaan expects one row per case with a separate column for each wave:
load("./data/us_clean_syn.RData")
Parallel Latent Growth Models
Latent Growth Models (LGM) are a type of structural equation model that estimates the growth trajectory of a variable over time. They are applied to longitudinal data, where the same cases are measured at multiple time points. LGMs can be used to estimate the average growth trajectory of a variable, as well as individual differences in growth rates. LGMs are estimated in the Structural Equation Modeling (SEM) framework, which allows the estimation of latent variables (unobserved constructs) and their relationships. This flexible framework can estimate many models, including parallel LGMs.
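As a point of reference, a basic linear LGM for a single variable, say log income over four waves, looks like this in lavaan syntax (a minimal sketch using the variable names from our data; the object names are just for illustration). The intercept factor loads 1 on every wave, while the slope loadings 0–3 code time:

# a simple linear LGM for one variable: log income over four waves
lgm_income <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
               s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4'

fit_income <- growth(lgm_income, data = usw)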
Parallel LGMs are a type of LGM that simultaneously estimates the growth trajectories of two or more variables. This allows researchers to investigate how the growth trajectories of different variables are related to each other over time. For example, researchers might be interested in how income and life satisfaction change over time and whether these changes are associated. Parallel LGMs can be used to estimate the growth trajectories of both variables and the relationship between them.
A visual representation of a parallel LGM is shown below. In this model, we have two latent variables representing the growth trajectories of two variables (e.g., income and life satisfaction). The arrows represent the relationships between the latent and observed variables (e.g., income and life satisfaction at each time point). The model estimates the growth trajectories of the two variables and the relationship between them over time.
In this context, the key coefficient of interest is the relationship between the rate of change of one variable (η1) and the rate of change of another (η3). In the simplest version, we can investigate this relationship using a correlation (as seen in the graph above). If, on the other hand, there is a clear expectation regarding the causal direction, the correlation can be replaced with a regression.
To show how to estimate a parallel LGM in practice, we will explore how change in income is related to change in satisfaction over four waves. To estimate the model, we need to specify the growth trajectories for both variables together. By default, lavaan includes covariances (correlations) between the latent variables, so we don’t need to add them explicitly.
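If you ever want to make these covariances explicit (for example, to later constrain some of them to zero), you can write them into the model string yourself; the specification below simply relies on lavaan's defaults. A minimal sketch of lines that could be added to the model string:

# optional: state the latent covariances explicitly instead of relying on the defaults
s_inc ~~ s_sati   # covariance between the two slopes
i_inc ~~ i_sati   # covariance between the two intercepts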
# intercept and slope factors for (log) income and for satisfaction
model <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

          i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
          s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4'

fit <- growth(model, data = usw)
summary(fit, standardized = TRUE)
## lavaan 0.6-19 ended normally after 75 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        22
## 
##                                                   Used       Total
##   Number of observations                         16360       51007
## 
## Model Test User Model:
## 
##   Test statistic                               763.682
##   Degrees of freedom                                22
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc =~
##     logincome_1       1.000                               1.163    0.871
##     logincome_2       1.000                               1.163    0.907
##     logincome_3       1.000                               1.163    0.986
##     logincome_4       1.000                               1.163    1.039
##   s_inc =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.285    0.222
##     logincome_3       2.000                               0.570    0.483
##     logincome_4       3.000                               0.855    0.764
##   i_sati =~
##     sati_1            1.000                               1.025    0.748
##     sati_2            1.000                               1.025    0.739
##     sati_3            1.000                               1.025    0.692
##     sati_4            1.000                               1.025    0.695
##   s_sati =~
##     sati_1            0.000                               0.000    0.000
##     sati_2            1.000                               0.328    0.236
##     sati_3            2.000                               0.655    0.442
##     sati_4            3.000                               0.983    0.666
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc ~~
##     s_inc            -0.222    0.006  -36.902    0.000   -0.670   -0.670
##     i_sati            0.001    0.013    0.063    0.950    0.001    0.001
##     s_sati            0.021    0.006    3.742    0.000    0.056    0.056
##   s_inc ~~
##     i_sati            0.011    0.004    2.605    0.009    0.038    0.038
##     s_sati           -0.003    0.002   -1.373    0.170   -0.027   -0.027
##   i_sati ~~
##     s_sati           -0.186    0.009  -21.488    0.000   -0.554   -0.554
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     i_inc             6.892    0.010  679.198    0.000    5.925    5.925
##     s_inc             0.075    0.003   22.691    0.000    0.262    0.262
##     i_sati            5.379    0.010  529.848    0.000    5.247    5.247
##     s_sati           -0.081    0.004  -18.241    0.000   -0.246   -0.246
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.430    0.012   34.849    0.000    0.430    0.241
##    .logincome_2       0.655    0.009   69.936    0.000    0.655    0.398
##    .logincome_3       0.602    0.008   71.703    0.000    0.602    0.432
##    .logincome_4       0.502    0.011   45.130    0.000    0.502    0.400
##    .sati_1            0.826    0.020   41.008    0.000    0.826    0.440
##    .sati_2            1.137    0.016   72.278    0.000    1.137    0.591
##    .sati_3            1.460    0.019   75.688    0.000    1.460    0.665
##    .sati_4            1.278    0.025   50.977    0.000    1.278    0.587
##     i_inc             1.353    0.020   67.722    0.000    1.000    1.000
##     s_inc             0.081    0.003   30.919    0.000    1.000    1.000
##     i_sati            1.051    0.022   47.533    0.000    1.000    1.000
##     s_sati            0.107    0.005   22.356    0.000    1.000    1.000
The coefficients of the parallel LGM can be interpreted as in the standard model. We discussed this in detail in this blog post.
The key coefficient of interest in the parallel LGM is the correlation between the two rates of change, in this case “s_inc” and “s_sati”. In our model, the standardized covariance (i.e., the correlation) between the two is -0.027, although it does not reach statistical significance (p = 0.170). Taken at face value, the point estimate suggests that people whose satisfaction increases more steeply tend to show a slower increase in income and, because this is a correlation, the reverse reading is equally valid: people whose income increases more steeply tend to show a slightly steeper decline in satisfaction.
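If you want to pull out just this estimate, together with a confidence interval, rather than scan the full summary, one option is lavaan's standardizedSolution() function; a small sketch:

# extract the standardized covariance (correlation) between the two slopes
standardizedSolution(fit) |>
  filter(lhs == "s_inc", op == "~~", rhs == "s_sati")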
Visualizing the results
To make things slightly easier to understand, we can consider what these latent variables represent and what the correlation implies. Based on our model, we can visualize the distribution of the latent variables.
# predict individual level values from model
pred_fit <- predict(fit)

# get means of the predicted values
pred_fit_means <- map_df(as_tibble(pred_fit), mean) |>
  gather(key = key, value = value)

# plot the distribution of the latent variables
pred_fit |>
  as_tibble() |>
  gather(key = key, value = value) |> # reshape to long format
  ggplot(aes(value)) +
  geom_histogram() +
  geom_vline(data = pred_fit_means, aes(xintercept = value), color = "red") +
  facet_wrap(~key, scales = "free") +
  theme_bw()
The graph’s red lines represent the average values (equivalent to the values under “Intercepts” in the output), while the grey bars are a histogram of the predicted scores for different individuals. For example, “i_inc” represents the expected log income at the start of the study. The average is close to 7. Some people have values above that at the beginning of the study (they are on the right of the graph), while some start lower. The interpretation of “i_sati” is similar but refers to satisfaction at the start of the study. Looking at the slopes, we see that the average slope for income is slightly positive, while for satisfaction, it is somewhat negative. So, overall, income increases over time while satisfaction decreases. That being said, there is some variation around that average value, especially for satisfaction.
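Another way to read the intercept and slope means is to plot the average trajectories they imply (a quick sketch; time is coded 0 to 3, matching the slope loadings):

# model-implied average trajectories based on the estimated latent means
latent_means <- parameterEstimates(fit) |>
  filter(op == "~1", lhs %in% c("i_inc", "s_inc", "i_sati", "s_sati")) |>
  select(lhs, est) |>
  deframe()

tibble(wave = 0:3) |>
  mutate(logincome = latent_means["i_inc"] + latent_means["s_inc"] * wave,
         satisfaction = latent_means["i_sati"] + latent_means["s_sati"] * wave) |>
  pivot_longer(-wave, names_to = "variable", values_to = "value") |>
  ggplot(aes(wave, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y") +
  theme_bw()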
When using the parallel LGM, we are interested in the relationship between these two slope latent variables. Our results show a slight negative correlation that can be visually represented using a scatter plot:
# scatter plot of the predicted slopes for income and satisfaction
pred_fit |>
  as_tibble() |>
  ggplot(aes(s_inc, s_sati)) +
  geom_point(alpha = 0.05) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
The graph shows that someone whose income increases more steeply (towards the right of the graph) tends to have a more steeply declining satisfaction (towards the bottom of the graph), although the association is weak.
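We can also compute the correlation between the predicted slope scores directly. Keep in mind that factor scores are only estimates of the latent variables, so this figure is a rough check and will not match the model-based correlation exactly:

# correlation between the predicted slope scores (approximates the latent correlation)
pred_fit |>
  as_tibble() |>
  summarise(cor_slopes = cor(s_inc, s_sati))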
Adding regressions in parallel LGMs
So far, we have not made any assumptions about the relationship between the two variables; we simply estimated their correlation. If we have a clear expectation about the causal direction of the relationship, we can specify a regression instead of a correlation. For example, if we expect that changes in income lead to changes in satisfaction, we can add a regression from the slope of income to the slope of satisfaction. The new model would be:
model <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

          i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
          s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4

          # regression between the two rates of change
          s_sati ~ s_inc'

fit2 <- growth(model, data = usw)
summary(fit2, standardized = TRUE)
## lavaan 0.6-19 ended normally after 63 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        20
## 
##                                                   Used       Total
##   Number of observations                         16360       51007
## 
## Model Test User Model:
## 
##   Test statistic                              1314.986
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc =~
##     logincome_1       1.000                               1.163    0.871
##     logincome_2       1.000                               1.163    0.907
##     logincome_3       1.000                               1.163    0.986
##     logincome_4       1.000                               1.163    1.038
##   s_inc =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.284    0.222
##     logincome_3       2.000                               0.568    0.482
##     logincome_4       3.000                               0.853    0.761
##   i_sati =~
##     sati_1            1.000                               0.821    0.607
##     sati_2            1.000                               0.821    0.600
##     sati_3            1.000                               0.821    0.553
##     sati_4            1.000                               0.821    0.538
##   s_sati =~
##     sati_1            0.000                               0.000    0.000
##     sati_2            1.000                               0.152    0.111
##     sati_3            2.000                               0.304    0.205
##     sati_4            3.000                               0.456    0.299
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   s_sati ~
##     s_inc            -0.074    0.021   -3.584    0.000   -0.139   -0.139
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc ~~
##     s_inc            -0.222    0.006  -36.853    0.000   -0.671   -0.671
##     i_sati            0.005    0.012    0.442    0.659    0.005    0.005
##   s_inc ~~
##     i_sati            0.017    0.004    4.265    0.000    0.073    0.073
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     i_inc             6.892    0.010  679.136    0.000    5.925    5.925
##     s_inc             0.075    0.003   22.705    0.000    0.262    0.262
##     i_sati            5.380    0.010  561.987    0.000    6.555    6.555
##    .s_sati           -0.076    0.004  -17.063    0.000   -0.498   -0.498
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.431    0.012   34.930    0.000    0.431    0.241
##    .logincome_2       0.655    0.009   69.914    0.000    0.655    0.398
##    .logincome_3       0.601    0.008   71.694    0.000    0.601    0.433
##    .logincome_4       0.505    0.011   45.361    0.000    0.505    0.403
##    .sati_1            1.158    0.016   70.655    0.000    1.158    0.632
##    .sati_2            1.175    0.016   72.824    0.000    1.175    0.629
##    .sati_3            1.442    0.019   74.439    0.000    1.442    0.654
##    .sati_4            1.453    0.025   58.166    0.000    1.453    0.624
##     i_inc             1.353    0.020   67.700    0.000    1.000    1.000
##     s_inc             0.081    0.003   30.759    0.000    1.000    1.000
##     i_sati            0.674    0.012   54.513    0.000    1.000    1.000
##    .s_sati            0.023    0.003    8.021    0.000    0.981    0.981
The model’s output is similar to the previous one, but now we have a regression from the slope of income to the slope of satisfaction. This regression is negative and statistically significant, indicating that a faster increase in income is associated with (and, under our causal assumption, leads to) a steeper decline in satisfaction. More precisely, a one-unit higher rate of change in log income per wave predicts a 0.074 lower rate of change in satisfaction per wave.
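To report this coefficient with its confidence interval, we can again extract it directly instead of reading it off the summary (a sketch):

# extract the slope-on-slope regression together with its 95% confidence interval
parameterEstimates(fit2) |>
  filter(op == "~")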
The model can be expanded in several ways. If, for example, we expect that some other variable mediates the relationship between income and satisfaction, we can include that in the model. We can also include non-linear effects, interactions, and other controls. The model can be expanded to include more than two variables. For example, we can include a third variable and investigate how the three variables are related over time.
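For instance, a time-constant covariate could be added as a predictor of the growth factors. The sketch below uses a hypothetical variable, female, which is not part of the dataset used here, purely to illustrate the syntax:

# hypothetical extension: 'female' is an illustrative covariate, not a variable in usw
model_ext <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
              s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

              i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
              s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4

              # rates of change regressed on each other and on the covariate
              s_sati ~ s_inc + female
              s_inc ~ female

              # starting points regressed on the covariate
              i_inc ~ female
              i_sati ~ female'

# fit_ext <- growth(model_ext, data = usw)  # would run once a real covariate is available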
Conclusion
Parallel latent growth models are a versatile and powerful tool for analysing the dynamic relationships between multiple trajectories over time. Whether you’re exploring how income and life satisfaction evolve together or investigating the interplay between physical activity and mental health, these models offer valuable insights into complex longitudinal data.
As you apply these techniques, remember that the flexibility of parallel LGMs allows for further customisation—whether through adding mediators, exploring non-linear effects, or testing directional hypotheses. These extensions can help you tailor the model to your research questions and datasets.
Learn more about LGMs by exploring our other blog posts on estimating and visualising LGMs and including time-constant and time-varying predictors. If you want to learn how to model non-linear LGMs, check out this post. To understand how LGM differs from the multilevel model for change, you can read more about it here.