Longitudinal models, such as the latent growth model and the multilevel model for change, enable us to examine both individual and aggregate change over time. This gives us a more nuanced understanding of how individuals, organizations, and societies change. However, these models assume a single aggregate or average rate of change over time. In reality, there may be subgroups of individuals who follow different trajectories. Often, we do not know what these groups are in advance, and identifying them can significantly improve our understanding of what is happening in the data and help us develop theories and interventions.
Mixture Latent Growth Models (MLGM) can identify latent or underlying subgroups and model their unique trajectories. This blog post will explore how to fit MLGM in R, visualize the latent classes, and interpret the results.
Preparing the data for analysis
We start by preparing the data. We will look at satisfaction over four waves of data. We will use the wide version of the Understanding Society data (“usw”) to explore the different types of trajectories in satisfaction (more on how to prepare the data in this previous post). Here, satisfaction is measured on a scale from 1 to 7, with higher values indicating higher satisfaction levels. We will also use the tidySEM package to run the models. This package is a wrapper for the OpenMx package, which can run MLGM in R but has a more complex syntax.
Access the code used here.
Access the data here.
We start by loading the packages (we assume they have already been installed).
library(tidyverse)
library(tidySEM)
We then load the cleaned data from the “data” subfolder in our working directory:
load("./data/us_clean_syn.RData")
We will select the necessary variables to make the models easier to run.
df <- dplyr::select(usw, pidp, starts_with("sati_"))
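As a quick, optional sanity check, we can list the variable names to confirm the selection worked; we expect the person identifier plus the four satisfaction measures:

names(df) # should show "pidp" and the "sati_" variables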
We will also take a subsample of 3,000 individuals to estimate the models faster (using sample_n()). The set.seed() command makes the results reproducible (you will get the same results if you run the code yourself).
set.seed(1234)
df_small <- sample_n(df, 3000)
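As another optional check, we can confirm the size of the subsample and glance at the distribution of satisfaction at the first wave:

dim(df_small)            # 3000 rows, one per sampled individual
summary(df_small$sati_1) # distribution of wave 1 satisfaction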
What is the Mixture Latent Growth Model?
Understanding how individuals change over time is a central goal in longitudinal research. Traditional Latent Growth Models (LGM) assume that change follows a single average trajectory, with individual variation captured through random effects. However, in many real-world applications (such as education, health, and the behavioural sciences) subpopulations may exist with distinct patterns of change. A single growth trajectory may not adequately represent these subgroups.
Mixture Latent Growth Models (MLGM) extend traditional LGM by incorporating latent class analysis. They allow us to identify unobserved subgroups that follow different trajectories. Instead of assuming one common trajectory, MLGM assigns individuals probabilistically to latent classes, each characterised by its growth parameters. This provides a more nuanced understanding of heterogeneity in longitudinal data.
We can visualise the model using an SEM diagram:

The figure represents the structure of an MLGM. It extends a standard latent growth curve model by introducing a latent categorical variable (ηc) that accounts for subgroup membership. The model includes the following components:
- Observed variables (y1, y2, y3, y4): These are repeated measures collected over time.
- Latent intercept (η0) and slope (η1): These capture individual differences in the starting point and rate of change over time.
- Factor loadings (numbers on the paths): The intercept (η0) is linked to all time points with a loading of 1, representing a baseline level. The slope (η1) has loadings increasing over time (e.g., 0, 1, 2, 3), reflecting linear growth.
- Latent class variable (ηc): The dashed paths indicate that class membership influences the intercept and slope, meaning each subgroup has its own initial level and rate of change.
- Residual errors (ϵ1, ϵ2, ϵ3, ϵ4): These represent measurement errors or unexplained variance in observed outcomes.
In summary, the Mixture Latent Growth Model allows us to account for heterogeneous trajectories within a population by estimating separate growth parameters for each latent class. This approach is valuable in identifying subgroups that may respond differently to interventions or exhibit unique developmental patterns over time.
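To make the factor-loading logic concrete, here is a minimal sketch in base R of the model-implied means for one class. The intercept and slope values are purely hypothetical, not estimates from our data:

# Model-implied mean at wave t for one class: E[y_t] = eta0 + eta1 * loading_t
loadings <- c(0, 1, 2, 3) # linear slope loadings for the four waves
eta0 <- 5.0               # hypothetical class intercept (baseline level)
eta1 <- -0.5              # hypothetical class slope (change per wave)
eta0 + eta1 * loadings    # implied means at waves 1 to 4: 5.0 4.5 4.0 3.5

In an MLGM, each latent class gets its own intercept and slope, so each class traces its own line through the four waves.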
Running Mixture Latent Growth Models in R
We will initially use the tidySEM package to run this model. This package is a wrapper for the OpenMx package, which can run a wide range of models in R. We use tidySEM here because its syntax is simpler than the one in OpenMx.
We can start by creating a model using a similar syntax to that from lavaan (see an explanation of the code in this blogpost).
model <- 'i_sati =~ 1*sati_1 + 1*sati_2 + 1*sati_3 + 1*sati_4
          s_inc =~ 0*sati_1 + 1*sati_2 + 2*sati_3 + 3*sati_4'
The typical procedure for Mixture LGM is to run models with different numbers of classes and then compare them. We can use the results and fit indices to decide which model to select and interpret further. We can run multiple models at once using the mx_growth_mixture() function. Here, we will run models with 1 to 4 classes.
res_step <- mx_growth_mixture(
  model = model,
  classes = 1:4,
  data = df_small
)
The model with four classes had some estimation issues, so we can try running it a few more times to see if it helps.
res_step[[4]] <- mxTryHardWideSearch(res_step[[4]], extraTries = 200)
Once the models have run, we can look at the model fit:
tab_fit <- table_fit(res_step)
tab_fit
##   Name Classes        LL    n Parameters      AIC      BIC    saBIC   Entropy
## 1    1       1 -12540.12 3000          9 25098.23 25152.29 25123.69 1.0000000
## 2    2       2 -11258.28 3000         19 22554.56 22668.68 22608.31 0.6228562
## 3    3       3 -11079.30 3000         29 22216.60 22390.78 22298.64 0.5570806
## 4    4       4 -12540.12 3000         39 25158.23 25392.48 25268.56 1.0000000
##    prob_min  prob_max     n_min     n_max  np_ratio  np_local
## 1 1.0000000 1.0000000 1.0000000 1.0000000 333.33333 333.33333
## 2 0.8364136 0.8969973 0.4320000 0.5680000 157.89474 144.00000
## 3 0.4428439 0.9442728 0.1133333 0.5423333 103.44828  37.77778
## 4        NA        NA 0.0000000 1.0000000  76.92308   0.00000
The results for the four-class model seem suspect. We can explore the estimation by looking at the status and warnings.
res_step[[4]]$output$status
## $code
## [1] 0
## 
## $status
## [1] 0
res_step[[4]]$output$warnings
## NULL
While it appears the model has converged, the results again look strange. If we look at the class probabilities, we can see that three of the classes have no members (counts and proportions are very close to 0). This is a sign that the model may not be reliable, as it essentially allocated everyone to one class.
class_prob(res_step[[4]])[[1]]
##    class        count   proportion
## 1 class1 7.210771e-08 2.403590e-11
## 2 class2 3.468490e-08 1.156163e-11
## 3 class3 3.000000e+03 1.000000e+00
## 4 class4 2.102998e-07 7.009994e-11
On the other hand, the model with three classes seems to have a better class distribution. We can look at the results of this model.
class_prob(res_step[[3]])[[1]]
##    class     count proportion
## 1 class1  602.4605  0.2008202
## 2 class2 1155.8228  0.3852743
## 3 class3 1241.7167  0.4139056
To make the comparison between the models easier to interpret, we can extract just the fit indices we want. In general, lower BIC values indicate better-fitting models, while entropy, a measure of classification accuracy, should be close to 1.
tab_fit[c("Classes", "LL", "BIC", "Entropy")]
##   Classes        LL      BIC   Entropy
## 1       1 -12540.12 25152.29 1.0000000
## 2       2 -11258.28 22668.68 0.6228562
## 3       3 -11079.30 22390.78 0.5570806
## 4       4 -12540.12 25392.48 1.0000000
Based on the fit indices, we will select the model with three classes for now. However, because entropy is relatively low (ideally, it should be above 0.9), the classification may not be very accurate. This means that, for some individuals, we cannot be sure to which class they belong.
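If we want to inspect this uncertainty at the individual level, class_prob() can also return per-person posterior probabilities. This is a quick sketch assuming the "individual" type is available; check ?class_prob in your version of tidySEM:

# Per-person posterior probabilities of belonging to each class
post <- class_prob(res_step[[3]], type = "individual")
head(post$individual)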
We can extract the results of the three-class model using the table_results() function. This gives quite a large output. To focus on the things we care most about, we can extract just the “means” part of the model using a combination of filter() and str_detect().
table_results(res_step[[3]]) |>
  filter(str_detect(label, "Means"))
##          label  est_sig   se pval        confint  class
## 1 Means.i_sati  6.01*** 0.05 0.00   [5.91, 6.11] class1
## 2  Means.s_inc -0.61*** 0.03 0.00 [-0.68, -0.55] class1
## 3 Means.i_sati  3.92*** 0.07 0.00   [3.78, 4.05] class2
## 4  Means.s_inc  0.15*** 0.04 0.00   [0.07, 0.22] class2
## 5 Means.i_sati  6.03*** 0.02 0.00   [5.99, 6.07] class3
## 6  Means.s_inc    -0.00 0.01 0.81  [-0.02, 0.02] class3
We see that classes 1 and 3 start with relatively high levels of satisfaction, but while class 1 declines, class 3 remains stable. On the other hand, class 2 starts much lower (3.92) and improves slightly over time (0.15 per wave). In a previous table, we saw that class 3 is the largest (41% of the sample), while class 1 is the smallest (20%).
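Using the estimated means above, we can trace the model-implied trajectories by hand (intercept plus slope times the wave coding 0 to 3):

# Implied satisfaction at each wave, using the class means reported above
waves <- 0:3
rbind(
  class1 = 6.01 - 0.61 * waves,
  class2 = 3.92 + 0.15 * waves,
  class3 = 6.03 + 0.00 * waves
)

By the fourth wave, class 1 is expected to fall to around 4.18, just below class 2 at around 4.37; this crossover will be visible in the plot below.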
To facilitate interpretation, we can also plot the trajectories using plot_growth() from the tidySEM package.
plot_growth(res_step[[3]], rawdata = TRUE, alpha_range = c(0, .05))

We see the trends described in the previous table. Now, it also becomes obvious that class 1 members will end up with lower satisfaction levels than those in class 2, even though they start at significantly higher levels.
Running Mixture Latent Growth Models in R using MplusAutomation and MplusLGM
Unfortunately, the R packages for running mixture models still have some limitations. If the data is complex or we run into estimation issues (like we had in the four-class model above), one alternative is to use Mplus, specialised software for latent variable modelling. This is not free or open source, but if you do have Mplus, the models can be run directly from R using the MplusAutomation and MplusLGM packages.
Let’s start by loading the packages:
library(MplusLGM)
library(MplusAutomation)
The code for running Latent Growth Models in Mplus is slightly different from the one in lavaan. We will run the model using the fitGBTM() function, which facilitates running multiple models with different numbers of classes. The code below runs a similar model to the previous one in Mplus.
mixture_model <- fitGBTM(
  data = df_small,
  outvar = c("sati_1", "sati_2", "sati_3", "sati_4"),
  idvar = "pidp",
  starting_val = 500,
  min_k = 1L,
  max_k = 5L,
  timescores = c(0, 1, 2, 3),
  polynomial = 1,
  output = c("TECH1", "TECH11", "SAMPSTAT", "STANDARDIZED")
)
Similarly to what we did before, we can start by comparing the fit indices of the models:
fit_indices <- getFit(mixture_model)
fit_indices
##               Title Observations Parameters NLatentClasses        LL      BIC
## 1 GBTM_P1_K1_S1000;         2644          3              1 -12906.05 25835.74
## 2 GBTM_P1_K2_S1000;         2644          6              2 -12380.40 24808.07
## 3 GBTM_P1_K3_S1000;         2644          9              3 -12201.53 24473.98
## 4 GBTM_P1_K4_S1000;         2644         12              4 -12157.95 24410.47
## 5 GBTM_P1_K5_S1000;         2644         15              5 -12150.26 24418.73
##       aBIC      AIC     AICC     CAIC T11_LMR_Value T11_LMR_PValue count_1
## 1 25826.20 25818.10 25818.10 25838.73            NA             NA    2644
## 2 24789.01 24772.79 24772.83 24814.07      1008.635           0.00     611
## 3 24445.38 24421.06 24421.12 24482.98       343.219           0.00     413
## 4 24372.34 24339.91 24340.03 24422.47        83.610           0.00     368
## 5 24371.07 24330.53 24330.71 24433.73        14.759           0.05     352
##   count_2 count_3 count_4 count_5 proportion_1 proportion_2 proportion_3
## 1      NA      NA      NA      NA       100.00           NA           NA
## 2    2033      NA      NA      NA        23.11        76.89           NA
## 3     316    1915      NA      NA        15.62        11.95        72.43
## 4    1847     219     210      NA        13.92        69.86         8.28
## 5      62     237     168    1825        13.31         2.34         8.96
##   proportion_4 proportion_5 Proportion_criterion APPA1 APPA2 APPA3 APPA4 APPA5
## 1           NA           NA                 pass 1.000    NA    NA    NA    NA
## 2           NA           NA                 pass 0.878 0.937    NA    NA    NA
## 3           NA           NA                 pass 0.844 0.800 0.891    NA    NA
## 4         7.94           NA                 pass 0.775 0.874 0.714 0.653    NA
## 5         6.35        69.02                 fail 0.746 0.645 0.591 0.641 0.871
##   APPA_criterion Entropy Entropy_criterion
## 1           pass      NA              pass
## 2           pass   0.726              pass
## 3           pass   0.704              pass
## 4           fail   0.698              pass
## 5           fail   0.703              pass
##   Warnings
## 1 All continuous latent variable covariances involving I have been fixed to 0 because the variance of I is fixed at 0. TECH11 option is not available for TYPE=MIXTURE with only one class. Request for TECH11 is ignored. Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 356 3 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 2 All continuous latent variable covariances involving I have been fixed to 0 because the variance of I is fixed at 0. Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 356 2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 3 All continuous latent variable covariances involving I have been fixed to 0 because the variance of I is fixed at 0. Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 356 2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 4 All continuous latent variable covariances involving I have been fixed to 0 because the variance of I is fixed at 0. Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 356 2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
## 5 All continuous latent variable covariances involving I have been fixed to 0 because the variance of I is fixed at 0. Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 356 2 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
##   Errors
## 1   <NA>
## 2   <NA>
## 3   <NA>
## 4   <NA>
## 5   <NA>
Again, we get quite a lot of fit indices, so we can concentrate on the main ones:
fit_indices |>
  select(NLatentClasses, LL, BIC, Entropy, starts_with("Proportion_cri"))
##   NLatentClasses        LL      BIC Entropy Proportion_criterion
## 1              1 -12906.05 25835.74      NA                 pass
## 2              2 -12380.40 24808.07   0.726                 pass
## 3              3 -12201.53 24473.98   0.704                 pass
## 4              4 -12157.95 24410.47   0.698                 pass
## 5              5 -12150.26 24418.73   0.703                 fail
Based on these indicators, we can decide which model to choose. For example, the four-class model has the lowest BIC, but its entropy is quite low. We also see that the five-class model has a class with a very low proportion (below 5%).
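If we want to pick out the lowest-BIC model programmatically, here is a simple sketch using the columns printed above:

# Number of classes of the model with the lowest BIC (4 in our case)
fit_indices$NLatentClasses[which.min(fit_indices$BIC)]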
Alternatively, the MplusLGM package has a command, getBest(), to help us with the decision-making. Here, we ask it to select the best model based on the BIC and the adjusted Likelihood Ratio Test:
LGM_best <- getBest(
  lgm_object = mixture_model,
  ic = "BIC",
  lrt = "aLRT",
  p = 0.05
)

LGM_best
## [[1]]
## GBTM_P1_K4_S1000
## 
## Estimated using MLR
## Number of obs: 2644, number of (free) parameters: 12
## 
## Fit Indices:
## 
## CFI = NA, TLI = NA, SRMR = NA
## RMSEA = NA, 90% CI [NA, NA], p < .05 = NA
## AIC = 24339.91, BIC = 24410.47
## NULL
This would indicate the selection of the four-class model based on the criteria we gave.
We can see how large the classes are in the selected model by extracting the table of interest from the output:
mixture_model[[4]]$results$class_counts$posteriorProb
##   class     count proportion
## 1     1  483.5599    0.18289
## 2     2 1677.2632    0.63437
## 3     3  234.2893    0.08861
## 4     4  248.8876    0.09413
The majority of cases are in class 2 (63%), while classes 3 and 4 each contain under 10% of the sample.
We can extract the means of the classes from the output to see how their trajectories differ:
table_results(mixture_model[[4]]) |>
  filter(str_detect(label, "Means"))
##        label  est_sig   se pval        confint
## 1  Means.C#1  0.66*** 0.16 0.00 [ 0.35,  0.98]
## 2  Means.C#2  1.91*** 0.15 0.00 [ 1.62,  2.20]
## 3  Means.C#3    -0.06 0.28 0.83 [-0.60,  0.48]
## 4    Means.I  5.46*** 0.09 0.00 [ 5.28,  5.64]
## 5    Means.I  5.82*** 0.03 0.00 [ 5.77,  5.88]
## 6    Means.I  2.65*** 0.16 0.00 [ 2.33,  2.97]
## 7    Means.I  3.07*** 0.15 0.00 [ 2.78,  3.36]
## 8    Means.S -0.83*** 0.04 0.00 [-0.92, -0.75]
## 9    Means.S  -0.04** 0.01 0.00 [-0.06, -0.01]
## 10   Means.S    0.22* 0.10 0.02 [ 0.03,  0.42]
## 11   Means.S  0.79*** 0.07 0.00 [ 0.65,  0.93]
We see that the first two classes start with high levels of satisfaction, while the other two start with relatively low levels. Class 1's satisfaction decreases significantly, while classes 3 and 4 increase theirs.
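For example, combining intercepts and slopes with the time scores 0 to 3, class 1 is expected to fall from 5.46 to about 5.46 − 3 × 0.83 ≈ 2.97 by the last wave, while class 4 rises from 3.07 to about 3.07 + 3 × 0.79 ≈ 5.44, so these two classes effectively swap positions.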
Again, we can graph the trajectories for the different classes using the plotGrowthMixtures() function from the MplusAutomation package.
plotGrowthMixtures(
  LGM_best,
  bw = FALSE,
  rawdata = TRUE,
  alpha_range = c(0, 0.05)
)

Now the patterns are clearer. We see that class 1 shows a marked decrease in satisfaction, while class 2 is stable over time. Classes 3 and 4 show a more pronounced increase in satisfaction.
Conclusion
In this blog post, we explored how to run Mixture Latent Growth Models in R. We saw how to use the tidySEM package to simplify the code needed to run the model (compared to the original OpenMx package). We also saw how to run such models in Mplus directly from R using the MplusLGM and MplusAutomation packages. Hopefully, this has given you an idea of the insights this model makes possible and why it is so popular. The model could be extended by explaining the classes with covariates or by adding more complex growth patterns.
