In previous blog posts, I have introduced two of the most popular models for estimating change in time: the multilevel model for change (MLM) and the Latent Growth Model (LGM). In my experience teaching this topic, researchers tend to select one of these models depending on the framework they are more familiar with: regression/multilevel vs. Structural Equation Modeling. As a result, they know less about the other method and when to use it. Both models answer similar research questions: how does change happen over time, and how do individuals differ in their patterns of change?
There are a couple of reasons why you should try to understand both the multilevel model for change and the latent growth model. Firstly, while the two methods are similar, they have different strengths and weaknesses, so it can make sense to switch from one to the other in certain situations. Secondly, by default, the models make slightly different assumptions that you should be aware of. Thirdly, understanding how both of them work will help you engage with the academic literature.
In this post, I’m going to compare the two methods, discuss their differences in assumptions, and explain how to test them. Finally, I will discuss some of each method’s strengths and weaknesses.
Set-up
First, let’s set up things for the comparison. I’m going to use the “lavaan” package to run the LGM. This package was developed to estimate Structural Equation Models and is well suited to running LGMs. We will use “lme4” to run the MLM. Here, I load the two packages (they are already installed):
```r
# package for LGM
library(lavaan)

# package for MLM
library(lme4)
```
One crucial difference between the two models is that they use data that is structured differently. The LGM uses the wide data format, where each row represents an individual, and the variable of interest appears in multiple columns, one for each wave. Here is what the wide data looks like:
```r
# wide data
usw
## # A tibble: 8,752 x 5
##    pidp  logincome_1 logincome_2 logincome_3 logincome_4
##    <fct>       <dbl>       <dbl>       <dbl>       <dbl>
##  1 1            7.07        7.20        7.21        7.16
##  2 2            7.15        7.10        7.15        6.86
##  3 3            6.83        6.93        7.57        7.27
##  4 4            7.49        7.63        7.52        7.46
##  5 5            8.35        5.57        7.78        7.73
##  6 6            7.80        7.82        7.89        6.16
##  7 7            6.98        7.00        7.01        7.04
##  8 8            7.03        7.26        7.23        7.35
##  9 9            6.39        5.66        6.54        6.64
## 10 10           8.28        8.36        8.07        7.92
## # ... with 8,742 more rows
```
The MLM uses the data in long format, where each row is a combination of individual and wave. This format has fewer columns but more rows, as the values from different waves are stacked in the same column:
```r
# long data
usl
## # A tibble: 35,008 x 4
##    pidp   wave wave0 logincome
##    <fct> <dbl> <dbl>     <dbl>
##  1 1         1     0      7.07
##  2 1         2     1      7.20
##  3 1         3     2      7.21
##  4 1         4     3      7.16
##  5 2         1     0      7.15
##  6 2         2     1      7.10
##  7 2         3     2      7.15
##  8 2         4     3      6.86
##  9 3         1     0      6.83
## 10 3         2     1      6.93
## # ... with 34,998 more rows
```
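If your data only come in one of these shapes, moving between them is straightforward. Below is a minimal sketch using tidyr, assuming the variable names shown above; your own reshaping code may of course differ.

```r
# sketch: reshaping between wide and long (assumes the column names above)
library(tidyr)
library(dplyr)

# wide -> long: one row per person-wave
usl_from_wide <- usw %>%
  pivot_longer(
    cols = starts_with("logincome_"),
    names_to = "wave",
    names_prefix = "logincome_",
    values_to = "logincome"
  ) %>%
  mutate(wave = as.numeric(wave),
         wave0 = wave - 1)  # time recoded to start at 0

# long -> wide: one row per person
usw_from_long <- usl %>%
  select(pidp, wave, logincome) %>%
  pivot_wider(
    names_from = wave,
    values_from = logincome,
    names_prefix = "logincome_"
  )
```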
The statistical models
I have introduced the two models in previous posts (the multilevel model for change and the Latent Growth Model), so do check those out if this topic is new to you. Here, I want to present the statistical notation of the two models side by side to highlight the similarities.
The notation for the multilevel model for change is:
$$Y_{ij} = \gamma_{00} + \gamma_{10} TIME_{ij} + \xi_{0i} + \xi_{1i} TIME_{ij} + \epsilon_{ij}$$
The notation for the Latent Growth Model is:
$$y_{j} = \alpha_{0} + \alpha_{1} \lambda_{j} + \zeta_{00} + \zeta_{11} \lambda_{j} + \epsilon_{j}$$
A few things to note. Typically, the individual subscript, i, is omitted in SEM notation. Also, while time is an explicit variable in the MLM, in the LGM it enters through the factor loadings ($\lambda$), which we have to fix by hand. Otherwise, the two models are very similar.
Just a reminder of what all this Greek means:
- $Y_{ij}$/$y_{j}$ is the variable of interest (logincome for us) that changes over time (j).
- $\gamma_{00}$/$\alpha_{0}$ represents the average value at the start of the data collection.
- $\gamma_{10}$/$\alpha_{1}$ is the average rate of change over time.
- $\xi_{0i}$/$\zeta_{00}$ is the between variation at the start of the data, summarizing how different the individual starting points are from the average starting point.
- $\xi_{1i}$/$\zeta_{11}$ is the between variation in the rate of change, summarizing how different the individual slopes are from the average rate of change.
- $\epsilon_{ij}$/$\epsilon_{j}$ is the within variation, or how different the observed scores of each individual are from their expected values.
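To make the notation a little more concrete, here is a toy sketch with made-up numbers (not estimates from our data) showing how an individual’s expected trajectory is built from the average trajectory plus that person’s own deviations:

```r
# toy illustration with made-up numbers (not results from our data)
gamma_00 <- 7.05   # average starting point
gamma_10 <- 0.05   # average rate of change per wave
xi_0i    <- 0.30   # this person starts 0.30 above the average
xi_1i    <- -0.02  # this person changes 0.02 per wave slower than average

time <- 0:3
y_expected <- (gamma_00 + xi_0i) + (gamma_10 + xi_1i) * time
y_expected  # 7.35 7.38 7.41 7.44
```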
Comparing the results
Next, let’s compare two simple models that explore how log income changes over four waves of the Understanding Society survey.
The first model we run is the LGM (which we cover in more depth in this post):
```r
# latent growth model
model <- 'i =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4'

lgm1 <- growth(model, data = usw)
summary(lgm1, standardized = TRUE)
## lavaan 0.6-9 ended normally after 51 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
## 
##   Number of observations                          8752
## 
## Model Test User Model:
## 
##   Test statistic                                83.795
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~
##     logincome_1       1.000                               0.828    0.854
##     logincome_2       1.000                               0.828    0.924
##     logincome_3       1.000                               0.828    0.949
##     logincome_4       1.000                               0.828    0.958
##   s =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.148    0.165
##     logincome_3       2.000                               0.296    0.339
##     logincome_4       3.000                               0.444    0.513
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i ~~
##     s                -0.052    0.003  -16.641    0.000   -0.425   -0.425
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.000                               0.000    0.000
##    .logincome_2       0.000                               0.000    0.000
##    .logincome_3       0.000                               0.000    0.000
##    .logincome_4       0.000                               0.000    0.000
##     i                 7.048    0.010  715.699    0.000    8.512    8.512
##     s                 0.052    0.003   19.277    0.000    0.353    0.353
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.254    0.007   36.846    0.000    0.254    0.271
##    .logincome_2       0.200    0.004   47.413    0.000    0.200    0.249
##    .logincome_3       0.197    0.004   48.753    0.000    0.197    0.258
##    .logincome_4       0.177    0.006   31.356    0.000    0.177    0.237
##     i                 0.686    0.013   52.076    0.000    1.000    1.000
##     s                 0.022    0.001   16.826    0.000    1.000    1.000
```
Next, we run the MLM using the long data (which we cover in more depth in this post):
```r
mlm1 <- lmer(data = usl, logincome ~ 1 + wave0 + (1 + wave0 | pidp))
summary(mlm1)
## Linear mixed model fit by REML ['lmerMod']
## Formula: logincome ~ 1 + wave0 + (1 + wave0 | pidp)
##    Data: usl
## 
## REML criterion at convergence: 69594.4
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -7.1893 -0.2258  0.0543  0.3052  4.9783 
## 
## Random effects:
##  Groups   Name        Variance Std.Dev. Corr 
##  pidp     (Intercept) 0.7057   0.8401        
##           wave0       0.0238   0.1543   -0.46
##  Residual             0.2039   0.4515        
## Number of obs: 35008, groups:  pidp, 8752
## 
## Fixed effects:
##             Estimate Std. Error t value
## (Intercept) 7.044933   0.009846  715.52
## wave0       0.053728   0.002716   19.78
## 
## Correlation of Fixed Effects:
##       (Intr)
## wave0 -0.518
```
To make it easier to compare, we can extract the main coefficients in a table:
| Coefficient | Multilevel | Latent growth |
|---|---|---|
| Fixed effect: intercept | 7.045 | 7.048 |
| Fixed effect: slope | 0.054 | 0.052 |
| Between variance: intercept | 0.706 | 0.686 |
| Between variance: slope | 0.024 | 0.022 |
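In case you want to reproduce such a table yourself, here is a minimal sketch of where the numbers live in the two fitted objects above (the post-processing into a single table is left out):

```r
# sketch: where the coefficients live in the two fitted objects
fixef(mlm1)                   # MLM fixed effects (intercept and slope)
as.data.frame(VarCorr(mlm1))  # MLM random-effect variances and residual variance

parameterEstimates(lgm1)      # LGM estimates; look for the means and variances of i and s
```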
First, we see that the estimates are very similar but not identical. The main reason lies in the residuals, or within variation. The MLM estimates a single residual variance (0.204), while the LGM estimates four, one per wave. This reflects the big assumption the MLM makes by default: that the residuals, or within variation, are the same at every point in time. The LGM, by default, does not make this assumption and instead estimates a separate residual variance for each wave.
This assumption can matter. The residual variances are substantively interesting because they tell us how much within variation is left unexplained. If the residuals are not actually equal over time, a single pooled estimate can be misleading, and other coefficients in the model can be biased as a result.
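As an aside, this default can also be relaxed on the multilevel side, although not with lme4 itself. The sketch below (not run on our data) uses the nlme package, which allows a separate residual variance per wave via varIdent():

```r
# sketch (not run): MLM with a separate residual variance per wave, using nlme
library(nlme)

mlm_hetero <- lme(
  logincome ~ 1 + wave0,
  random  = ~ 1 + wave0 | pidp,
  weights = varIdent(form = ~ 1 | wave),  # one residual variance per wave
  data    = usl
)
summary(mlm_hetero)
```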
Restricting the LGM
While there is some debate about whether the assumption of equal residuals over time is reasonable, I believe the best way to deal with it is to investigate it empirically. This is easy to do in the LGM: we can run the model again, this time with the restriction that the residuals are equal over time, and then compare this model with the previous one to decide whether the assumption holds.
```r
# latent growth model with restriction
model <- 'i =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
          s =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

          logincome_1 ~~ a*logincome_1
          logincome_2 ~~ a*logincome_2
          logincome_3 ~~ a*logincome_3
          logincome_4 ~~ a*logincome_4'

lgm2 <- growth(model, data = usw)
summary(lgm2, standardized = TRUE)
## lavaan 0.6-9 ended normally after 50 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
##   Number of equality constraints                     3
## 
##   Number of observations                          8752
## 
## Model Test User Model:
## 
##   Test statistic                               195.275
##   Degrees of freedom                                 8
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~
##     logincome_1       1.000                               0.840    0.881
##     logincome_2       1.000                               0.840    0.932
##     logincome_3       1.000                               0.840    0.961
##     logincome_4       1.000                               0.840    0.961
##   s =~
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.154    0.171
##     logincome_3       2.000                               0.309    0.353
##     logincome_4       3.000                               0.463    0.530
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i ~~
##     s                -0.060    0.003  -20.757    0.000   -0.463   -0.463
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.000                               0.000    0.000
##    .logincome_2       0.000                               0.000    0.000
##    .logincome_3       0.000                               0.000    0.000
##    .logincome_4       0.000                               0.000    0.000
##     i                 7.045    0.010  715.556    0.000    8.387    8.387
##     s                 0.054    0.003   19.782    0.000    0.348    0.348
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincom_1 (a)    0.204    0.002   93.552    0.000    0.204    0.224
##    .logincom_2 (a)    0.204    0.002   93.552    0.000    0.204    0.251
##    .logincom_3 (a)    0.204    0.002   93.552    0.000    0.204    0.267
##    .logincom_4 (a)    0.204    0.002   93.552    0.000    0.204    0.267
##     i                 0.706    0.013   54.639    0.000    1.000    1.000
##     s                 0.024    0.001   22.261    0.000    1.000    1.000
```
Now we see that the residual variance is estimated to be the same at each wave. If we put all the coefficients in a table, we see that the restricted LGM and the MLM give identical results:
| Coefficient | Multilevel | Latent growth | Latent growth restricted |
|---|---|---|---|
| Fixed effect: intercept | 7.045 | 7.048 | 7.045 |
| Fixed effect: slope | 0.054 | 0.052 | 0.054 |
| Between variance: intercept | 0.706 | 0.686 | 0.706 |
| Between variance: slope | 0.024 | 0.022 | 0.024 |
| Within variation | 0.204 | 0.254 | 0.204 |
We can compare the model with restrictions to the original one to see which fits the data best. The anova() command is an easy way to do this:
```r
anova(lgm1, lgm2)
## Chi-Squared Difference Test
## 
##      Df   AIC   BIC   Chisq Chisq diff Df diff Pr(>Chisq)    
## lgm1  5 69483 69547  83.795                                  
## lgm2  8 69589 69631 195.275     111.48       3  < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
All the fit indices indicate that the original model, which allowed different residual variances at each wave, fits the data better. This suggests that the assumption of equal within variation over time may not be appropriate for these data.
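If you want to look beyond AIC, BIC and the chi-square difference, lavaan can also report other global fit indices for each model, for example:

```r
# additional global fit indices for the two LGMs
fitMeasures(lgm1, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
fitMeasures(lgm2, c("chisq", "df", "cfi", "tli", "rmsea", "srmr"))
```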
When to use each model
In addition to this assumption regarding the within variation, which can be freed and tested, there are a couple of other things to consider when choosing between the multilevel model for change and latent growth models.
The MLM is handy when analyzing data with continuous time or when data is collected at different points in time for each individual. Because it uses long data, it can easily deal with these situations, which can be problematic for the LGM. Additionally, if you have many time points, the model might be easier to specify.
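For instance, suppose that instead of the wave number we had a person-specific measure of time, such as a hypothetical months variable recording how long after their first interview each measurement was taken. The MLM barely changes:

```r
# sketch: MLM with continuous, person-specific time
# `months` is hypothetical: months since each person's first interview
mlm_cont <- lmer(logincome ~ 1 + months + (1 + months | pidp), data = usl)
```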
Conversely, the LGM is useful if you want to use some of the other tools available in the SEM framework. For example, you can easily do multi-group analysis, comparing trends for different groups. You can also combine the LGM with mixture (latent class) models to run a Mixture Latent Growth Model, which makes it possible to find clusters of people with similar patterns of change over time. Additionally, you can include the LGM in path models, making it possible to examine how the rate of change relates to other variables of interest. You can also correct for measurement error by using second-order latent growth models and investigate measurement invariance over time. Finally, SEM can deal effectively with missing data by using Full Information Maximum Likelihood.
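To give a flavour of how little extra code some of these extensions require, here is a sketch of a multi-group LGM and of estimation with Full Information Maximum Likelihood; the grouping variable sex is hypothetical and not part of the data used above:

```r
# sketch: two SEM extensions of the LGM
# multi-group analysis, assuming a hypothetical grouping variable `sex` in usw
lgm_group <- growth(model, data = usw, group = "sex")

# Full Information Maximum Likelihood to handle missing values
lgm_fiml <- growth(model, data = usw, missing = "fiml")
```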
Conclusions
Hopefully, that gave you an idea of the strengths and weaknesses of the multilevel model for change and the latent growth model. Both can be valuable tools for understanding individual-level change over time, but there are situations where one is a better fit than the other.