How does change in physical activity influence mental well-being trends? Do income and life satisfaction evolve in tandem during retirement? Can parenting stress and child behaviour patterns predict each other across the early years? These kinds of questions—spanning psychology, sociology, and public health—require a method that captures the simultaneous growth trajectories of two or more variables and examines their relationships over time. Parallel latent growth models (LGM) provide a robust framework for tackling these challenges. This post will explore how to estimate and interpret parallel LGMs using R, empowering you to answer complex longitudinal research questions.
Preparing the data for analysis
You can follow this guide by running the code in R on your computer. Our examples will use synthetic (simulated) data modelled after Understanding Society, a comprehensive panel study from the UK. The actual data can be accessed for free from the UK Data Archive.
In a previous blog post, we cleaned the data and created both the long and the wide formats. We will use that cleaned data here. If you want to follow along, you can download the data here and all the code here.
Before we start, make sure you have the tidyverse and lavaan packages loaded by running the following code:

library(tidyverse)
library(lavaan)
We will mainly use the lavaan package to run LGMs and tidyverse for light data cleaning.
Next, we will load the data prepared in the previous post. This is an “RData” file containing the data in both long and wide format. The growth models below use the wide-format data (usw), since lavaan expects one row per case with a separate column for each wave:
load("./data/us_clean_syn.RData")
Parallel Latent Growth Models
Latent Growth Models (LGM) are a type of structural equation model that estimates the growth trajectory of a variable over time. They are applied to longitudinal data, where the same cases are measured at multiple time points. LGMs can be used to estimate the average growth trajectory of a variable, as well as individual differences in growth rates. LGMs are estimated in the Structural Equation Modeling (SEM) framework, which allows the estimation of latent variables (unobserved constructs) and their relationships. This flexible framework can estimate many models, including parallel LGMs.
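As a point of reference, a basic linear LGM for a single variable, say log income over four waves, looks like this in lavaan syntax (a minimal sketch using the variable names from our data; the object names are just for illustration). The intercept factor loads 1 on every wave, while the slope loadings 0–3 code time:

# a simple linear LGM for one variable: log income over four waves
lgm_income <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
               s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4'

fit_income <- growth(lgm_income, data = usw)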
Parallel LGMs are a type of LGM that simultaneously estimates the growth trajectories of two or more variables. This allows researchers to investigate how the growth trajectories of different variables are related to each other over time. For example, researchers might be interested in how income and life satisfaction change over time and whether these changes are associated. Parallel LGMs can be used to estimate the growth trajectories of both variables and the relationship between them.
A visual representation of a parallel LGM is shown below. In this model, we have two latent variables representing the growth trajectories of two variables (e.g., income and life satisfaction). The arrows represent the relationships between the latent and observed variables (e.g., income and life satisfaction at each time point). The model estimates the growth trajectories of the two variables and the relationship between them over time.
In this context, the key coefficient of interest is the relationship between the rate of change of one variable (η1) and the rate of change of another (η3). In the simplest version, we can investigate this relationship using a correlation (as seen in the graph above). If, on the other hand, there is a clear expectation regarding the causal direction, the correlation can be replaced with a regression.
To show how to estimate a parallel LGM in practice, we will explore how change in income is related to change in satisfaction over four waves. To estimate the model, we need to specify the growth trajectories for both variables together. By default, lavaan includes covariances (correlations) between the latent variables, so we don’t need to add them explicitly.
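If you ever want to make these covariances explicit (for example, to later constrain some of them to zero), you can write them into the model string yourself; the specification below simply relies on lavaan's defaults. A minimal sketch of lines that could be added to the model string:

# optional: state the latent covariances explicitly instead of relying on the defaults
s_inc ~~ s_sati   # covariance between the two slopes
i_inc ~~ i_sati   # covariance between the two intercepts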
# intercept and slope factors for (log) income and for satisfaction
model <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

          i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
          s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4'

fit <- growth(model, data = usw)
summary(fit, standardized = TRUE)
## lavaan 0.6-19 ended normally after 75 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        22
## 
##                                                   Used       Total
##   Number of observations                         16360       51007
## 
## Model Test User Model:
## 
##   Test statistic                               763.682
##   Degrees of freedom                                22
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc =~
##     logincome_1       1.000                               1.163    0.871
##     logincome_2       1.000                               1.163    0.907
##     logincome_3       1.000                               1.163    0.986
##     logincome_4       1.000                               1.163    1.039
##   s_inc =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.285    0.222
##     logincome_3       2.000                               0.570    0.483
##     logincome_4       3.000                               0.855    0.764
##   i_sati =~
##     sati_1            1.000                               1.025    0.748
##     sati_2            1.000                               1.025    0.739
##     sati_3            1.000                               1.025    0.692
##     sati_4            1.000                               1.025    0.695
##   s_sati =~
##     sati_1            0.000                               0.000    0.000
##     sati_2            1.000                               0.328    0.236
##     sati_3            2.000                               0.655    0.442
##     sati_4            3.000                               0.983    0.666
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc ~~
##     s_inc            -0.222    0.006  -36.902    0.000   -0.670   -0.670
##     i_sati            0.001    0.013    0.063    0.950    0.001    0.001
##     s_sati            0.021    0.006    3.742    0.000    0.056    0.056
##   s_inc ~~
##     i_sati            0.011    0.004    2.605    0.009    0.038    0.038
##     s_sati           -0.003    0.002   -1.373    0.170   -0.027   -0.027
##   i_sati ~~
##     s_sati           -0.186    0.009  -21.488    0.000   -0.554   -0.554
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     i_inc             6.892    0.010  679.198    0.000    5.925    5.925
##     s_inc             0.075    0.003   22.691    0.000    0.262    0.262
##     i_sati            5.379    0.010  529.848    0.000    5.247    5.247
##     s_sati           -0.081    0.004  -18.241    0.000   -0.246   -0.246
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.430    0.012   34.849    0.000    0.430    0.241
##    .logincome_2       0.655    0.009   69.936    0.000    0.655    0.398
##    .logincome_3       0.602    0.008   71.703    0.000    0.602    0.432
##    .logincome_4       0.502    0.011   45.130    0.000    0.502    0.400
##    .sati_1            0.826    0.020   41.008    0.000    0.826    0.440
##    .sati_2            1.137    0.016   72.278    0.000    1.137    0.591
##    .sati_3            1.460    0.019   75.688    0.000    1.460    0.665
##    .sati_4            1.278    0.025   50.977    0.000    1.278    0.587
##     i_inc             1.353    0.020   67.722    0.000    1.000    1.000
##     s_inc             0.081    0.003   30.919    0.000    1.000    1.000
##     i_sati            1.051    0.022   47.533    0.000    1.000    1.000
##     s_sati            0.107    0.005   22.356    0.000    1.000    1.000
The coefficients of the parallel LGM can be interpreted as in the standard model. We discussed this in detail in this blog post.
The key coefficient of interest in the parallel LGM is the correlation between the two rates of change, in this case “s_inc” and “s_sati”. In our model, the standardized covariance (i.e., the correlation) between the two is -0.027, although it does not reach statistical significance (p = 0.170). Taken at face value, the point estimate suggests that people whose satisfaction increases more steeply tend to show a slower increase in income and, because this is a correlation, the reverse reading is equally valid: people whose income increases more steeply tend to show a slightly steeper decline in satisfaction.
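If you want to pull out just this estimate, together with a confidence interval, rather than scan the full summary, one option is lavaan's standardizedSolution() function; a small sketch:

# extract the standardized covariance (correlation) between the two slopes
standardizedSolution(fit) |>
  filter(lhs == "s_inc", op == "~~", rhs == "s_sati")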
Visualizing the results
To make things slightly easier to understand, we can consider what these latent variables represent and what the correlation implies. Based on our model, we can visualize the distribution of the latent variables.
# predict individual level values from model
pred_fit <- predict(fit)

# get means of the predicted values
pred_fit_means <- map_df(as_tibble(pred_fit), mean) |>
  gather(key = key, value = value)

# plot the distribution of the latent variables
pred_fit |>
  as_tibble() |>
  gather(key = key, value = value) |> # reshape to long format
  ggplot(aes(value)) +
  geom_histogram() +
  geom_vline(data = pred_fit_means, aes(xintercept = value), color = "red") +
  facet_wrap(~key, scales = "free") +
  theme_bw()
The graph’s red lines represent the average values (equivalent to the values under “Intercepts” in the output), while the grey bars are a histogram of the predicted scores for different individuals. For example, “i_inc” represents the expected log income at the start of the study. The average is close to 7. Some people have values above that at the beginning of the study (they are on the right of the graph), while some start lower. The interpretation of “i_sati” is similar but refers to satisfaction at the start of the study. Looking at the slopes, we see that the average slope for income is slightly positive, while for satisfaction, it is somewhat negative. So, overall, income increases over time while satisfaction decreases. That being said, there is some variation around that average value, especially for satisfaction.
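Another way to read the intercept and slope means is to plot the average trajectories they imply (a quick sketch; time is coded 0 to 3, matching the slope loadings):

# model-implied average trajectories based on the estimated latent means
latent_means <- parameterEstimates(fit) |>
  filter(op == "~1", lhs %in% c("i_inc", "s_inc", "i_sati", "s_sati")) |>
  select(lhs, est) |>
  deframe()

tibble(wave = 0:3) |>
  mutate(logincome = latent_means["i_inc"] + latent_means["s_inc"] * wave,
         satisfaction = latent_means["i_sati"] + latent_means["s_sati"] * wave) |>
  pivot_longer(-wave, names_to = "variable", values_to = "value") |>
  ggplot(aes(wave, value)) +
  geom_line() +
  facet_wrap(~variable, scales = "free_y") +
  theme_bw()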
When using the parallel LGM, we are interested in the relationship between these two slope latent variables. Our results show a slight negative correlation that can be visually represented using a scatter plot:
# scatter plot of the predicted slopes for income and satisfaction
pred_fit |>
  as_tibble() |>
  ggplot(aes(s_inc, s_sati)) +
  geom_point(alpha = 0.05) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw()
The graph shows that someone whose income increases more steeply (towards the right of the graph) tends to have a more steeply declining satisfaction (towards the bottom of the graph), although the association is weak.
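We can also compute the correlation between the predicted slope scores directly. Keep in mind that factor scores are only estimates of the latent variables, so this figure is a rough check and will not match the model-based correlation exactly:

# correlation between the predicted slope scores (approximates the latent correlation)
pred_fit |>
  as_tibble() |>
  summarise(cor_slopes = cor(s_inc, s_sati))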
Adding regressions in parallel LGMs
So far, we have not made any assumptions about the relationship between the two variables; we simply estimated their correlation. If we have a clear expectation about the causal direction of the relationship, we can specify a regression instead of a correlation. For example, if we expect that changes in income lead to changes in satisfaction, we can add a regression from the slope of income to the slope of satisfaction. The new model would be:
model <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

          i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
          s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4

          # regression between the two rates of change
          s_sati ~ s_inc'

fit2 <- growth(model, data = usw)
summary(fit2, standardized = TRUE)
## lavaan 0.6-19 ended normally after 63 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        20
## 
##                                                   Used       Total
##   Number of observations                         16360       51007
## 
## Model Test User Model:
## 
##   Test statistic                              1314.986
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc =~
##     logincome_1       1.000                               1.163    0.871
##     logincome_2       1.000                               1.163    0.907
##     logincome_3       1.000                               1.163    0.986
##     logincome_4       1.000                               1.163    1.038
##   s_inc =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.284    0.222
##     logincome_3       2.000                               0.568    0.482
##     logincome_4       3.000                               0.853    0.761
##   i_sati =~
##     sati_1            1.000                               0.821    0.607
##     sati_2            1.000                               0.821    0.600
##     sati_3            1.000                               0.821    0.553
##     sati_4            1.000                               0.821    0.538
##   s_sati =~
##     sati_1            0.000                               0.000    0.000
##     sati_2            1.000                               0.152    0.111
##     sati_3            2.000                               0.304    0.205
##     sati_4            3.000                               0.456    0.299
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   s_sati ~
##     s_inc            -0.074    0.021   -3.584    0.000   -0.139   -0.139
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i_inc ~~
##     s_inc            -0.222    0.006  -36.853    0.000   -0.671   -0.671
##     i_sati            0.005    0.012    0.442    0.659    0.005    0.005
##   s_inc ~~
##     i_sati            0.017    0.004    4.265    0.000    0.073    0.073
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     i_inc             6.892    0.010  679.136    0.000    5.925    5.925
##     s_inc             0.075    0.003   22.705    0.000    0.262    0.262
##     i_sati            5.380    0.010  561.987    0.000    6.555    6.555
##    .s_sati           -0.076    0.004  -17.063    0.000   -0.498   -0.498
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.431    0.012   34.930    0.000    0.431    0.241
##    .logincome_2       0.655    0.009   69.914    0.000    0.655    0.398
##    .logincome_3       0.601    0.008   71.694    0.000    0.601    0.433
##    .logincome_4       0.505    0.011   45.361    0.000    0.505    0.403
##    .sati_1            1.158    0.016   70.655    0.000    1.158    0.632
##    .sati_2            1.175    0.016   72.824    0.000    1.175    0.629
##    .sati_3            1.442    0.019   74.439    0.000    1.442    0.654
##    .sati_4            1.453    0.025   58.166    0.000    1.453    0.624
##     i_inc             1.353    0.020   67.700    0.000    1.000    1.000
##     s_inc             0.081    0.003   30.759    0.000    1.000    1.000
##     i_sati            0.674    0.012   54.513    0.000    1.000    1.000
##    .s_sati            0.023    0.003    8.021    0.000    0.981    0.981
The model’s output is similar to the previous one, but now we have a regression from the slope of income to the slope of satisfaction. This regression is negative and statistically significant, indicating that a faster increase in income is associated with (and, under our causal assumption, leads to) a steeper decline in satisfaction. More precisely, a one-unit higher rate of change in log income per wave predicts a 0.074 lower rate of change in satisfaction per wave.
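To report this coefficient with its confidence interval, we can again extract it directly instead of reading it off the summary (a sketch):

# extract the slope-on-slope regression together with its 95% confidence interval
parameterEstimates(fit2) |>
  filter(op == "~")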
The model can be expanded in several ways. If, for example, we expect that some other variable mediates the relationship between income and satisfaction, we can include that in the model. We can also include non-linear effects, interactions, and other controls. The model can be expanded to include more than two variables. For example, we can include a third variable and investigate how the three variables are related over time.
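For instance, a time-constant covariate could be added as a predictor of the growth factors. The sketch below uses a hypothetical variable, female, which is not part of the dataset used here, purely to illustrate the syntax:

# hypothetical extension: 'female' is an illustrative covariate, not a variable in usw
model_ext <- 'i_inc =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
              s_inc =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

              i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
              s_sati =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4

              # rates of change regressed on each other and on the covariate
              s_sati ~ s_inc + female
              s_inc ~ female

              # starting points regressed on the covariate
              i_inc ~ female
              i_sati ~ female'

# fit_ext <- growth(model_ext, data = usw)  # would run once a real covariate is available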
Conclusion
Parallel latent growth models are a versatile and powerful tool for analysing the dynamic relationships between multiple trajectories over time. Whether you’re exploring how income and life satisfaction evolve together or investigating the interplay between physical activity and mental health, these models offer valuable insights into complex longitudinal data.
As you apply these techniques, remember that the flexibility of parallel LGMs allows for further customisation—whether through adding mediators, exploring non-linear effects, or testing directional hypotheses. These extensions can help you tailor the model to your research questions and datasets.
Learn more about LGMs by exploring our other blog posts on estimating and visualising LGMs and including time-constant and time-varying predictors. If you want to learn how to model non-linear LGMs, check out this post. To understand how LGM differs from the multilevel model for change, you can read more about it here.