Including time-constant controls in Latent Growth Models

In a previous post, we looked at Latent Growth Models (LGM) and how to estimate them in R. We used LGM to investigate both linear and non-linear change in time and to concurrently examine average change and individual-level change.

Once we understand the overall change in time, we typically want to include predictors to explain those changes. For example, individual characteristics might explain why some people start with a larger income and have different trajectories than others. Here, we will focus on including time-constant predictors in LGM to better understand the change in time. Time-constant variables do not change in time but might influence the change in the variables of interest. For example, we might want to use characteristics such as age or education at the start of the study to explain the change in income over time.

Preparing the data

You can follow this guide by running the code in R on your computer. Our examples will use synthetic (simulated) data modelled after Understanding Society, a comprehensive panel study from the UK. The actual data can be accessed for free from the UK Data Archive.

Access the code used here.

Access the data here.

In a previous blog post, we cleaned the data and saved it in both long and wide formats. We will use that cleaned data here. If you want to follow along, you can download the data here and all the code from here.

Before we start, make sure you have the tidyverse and lavaan packages installed and loaded by running the following code:

library(tidyverse)
library(lavaan)

We will mainly use the lavaan package to run LGMs and tidyverse for light data cleaning.

Next, we will load the data prepared in the previous post. This contains long and wide format data in an “RData” file. The models below use the wide-format data (the usw object), which has one log income column per wave:

load("./data/us_clean_syn.RData")

Before we get into the LGM, let’s examine the data we want to analyze. Imagine we are interested in how income changes over time and the main predictors of that change. We explored the basic LGM before in this blog post. Here, we want to build on that knowledge and include some predictors.

We will use two predictors from our data that are coded as time-constant. These are the sex and age at the start of the study. We coded the first one as a factor variable, while the latter is continuous.

We will centre the age variable to make the results easier to interpret. This means that the average age will become 0. This is a common practice in LGMs and makes interpreting the intercept easier. Also, the lavaan package does not work very well with factors, and when using categorical variables, it is better to use dummy variables (coded as 0 and 1). Here, we will create a new variable, “female”, that will take the value 1 for females and 0 otherwise.

Here is the code to create these new control variables:

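# Create a 0/1 dummy for sex (0 = "Male", 1 = any other category)
# and centre age around its mean, ignoring missing values
usw <- usw |> 
  mutate(female = ifelse(gndr.fct == "Male", 0, 1),
         age_center = age - mean(age, na.rm = TRUE))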
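To check that the recoding behaves as intended, here is a minimal sketch on a made-up toy data frame. The `toy` object and its values are hypothetical and not part of the Understanding Society data; the sketch uses base R so that it runs without extra packages, but it applies the same two transformations as the code above.

```r
# Hypothetical toy data, only to illustrate the recoding
toy <- data.frame(
  gndr.fct = factor(c("Male", "Female", "Female", "Male")),
  age      = c(20, 40, 60, NA)
)

# Dummy variable: 0 for males, 1 for any other category
toy$female <- ifelse(toy$gndr.fct == "Male", 0, 1)

# Centre age around its mean, ignoring missing values
toy$age_center <- toy$age - mean(toy$age, na.rm = TRUE)

toy
mean(toy$age_center, na.rm = TRUE)  # mean of the centred variable
```

After centring, a value of 0 on the new variable corresponds to the average age in the data, which is what makes the intercepts of the models below easier to read.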

LGM with time-constant predictors

We can expand the basic LGM model to investigate how time-constant predictors impact change estimates. This is relatively easy, given that we use the SEM framework. We simply include the predictor in the model and estimate two new regressions, explaining the intercept and slope of the growth model. We can represent this visually:

SEM figure of latent growth model (LGM) with one hypothetical time-constant predictor

Here, the regression coefficient 𝛽01 (“beta 01”) will show how 𝑥1 changes the expected score at the start of the study, while 𝛽11 indicates how the rates of change differ by the values of 𝑥1. This is the equivalent of introducing a main effect and an interaction with time in a multilevel model. Note that the two latent variables are now the outcomes of a regression. As a result, they have residuals: 𝜁1 (“zeta 1”) and 𝜁2. Also, the correlation between the intercept and slope, which tells us about the convergence or divergence in the data, is now between the residuals.
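Written as equations, the two new regressions could be sketched as follows. The symbols 𝛽00 and 𝛽10 for the intercepts of these regressions, and the subscript j for individuals, are assumptions of this sketch, chosen to match the multilevel-style subscripts used above.

\[i_j = \beta_{00} + \beta_{01} x_{1j} + \zeta_{1j}\]

\[s_j = \beta_{10} + \beta_{11} x_{1j} + \zeta_{2j}\]

With this notation, 𝛽00 and 𝛽10 are the expected intercept and slope when 𝑥1 is 0.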

Let’s see how sex and age impact the change in time of log income. We will use the two new variables we created. We expand the LGM to include two regressions with these two variables explaining the latent variables “i” and “s”:

model <- ' i =~ 1*logincome_1 + 1*logincome_2 + 
              1*logincome_3 + 1*logincome_4 
            s =~ 0*logincome_1 + 1*logincome_2 + 
              2*logincome_3 + 3*logincome_4
            
            i ~ female + age_center
            s ~ female + age_center'

fit1 <- growth(model, data = usw)

summary(fit1, standardized = TRUE)
## lavaan 0.6.17 ended normally after 45 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        13
## 
##                                                   Used       Total
##   Number of observations                         24653       78159
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1054.228
##   Degrees of freedom                                 9
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~                                                                  
##     logincome_1       1.000                               1.220    0.867
##     logincome_2       1.000                               1.220    0.908
##     logincome_3       1.000                               1.220    0.981
##     logincome_4       1.000                               1.220    1.044
##   s =~                                                                  
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.303    0.226
##     logincome_3       2.000                               0.606    0.487
##     logincome_4       3.000                               0.909    0.778
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i ~                                                                   
##     female           -0.399    0.017  -23.204    0.000   -0.327   -0.162
##     age_center        0.010    0.000   20.422    0.000    0.008    0.143
##   s ~                                                                   
##     female            0.002    0.006    0.381    0.703    0.007    0.004
##     age_center       -0.003    0.000  -18.472    0.000   -0.010   -0.173
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .i ~~                                                                  
##    .s                -0.244    0.005  -45.056    0.000   -0.687   -0.687
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .i                 6.980    0.013  537.425    0.000    5.722    5.722
##    .s                 0.096    0.004   22.271    0.000    0.317    0.317
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.493    0.011   43.628    0.000    0.493    0.249
##    .logincome_2       0.732    0.009   85.740    0.000    0.732    0.406
##    .logincome_3       0.704    0.008   89.210    0.000    0.704    0.456
##    .logincome_4       0.572    0.010   55.931    0.000    0.572    0.419
##    .i                 1.417    0.017   81.170    0.000    0.952    0.952
##    .s                 0.089    0.002   37.036    0.000    0.970    0.970

We can see the new coefficients under the “Regressions” section of the output. They are interpreted as standard regression coefficients. For example, based on the model, females have lower log income at the start of the study (by 0.399), but their rate of change is slightly higher (by 0.002, which is not significantly different from 0). On the other hand, if age is higher by one year, the expected log income at the start of the study is larger by 0.01, but the rate of change is slightly lower (by 0.003).

The interpretation of the intercepts now also changes because “i” and “s” are outcomes of a regression. Now, they refer to the expected value at the start of the study and the expected rate of change when all the predictors are 0 (similar to a regular regression). So, for males with average age, the expected log income at the start of the study is 6.98, and the rate of change is 0.096.

We can also represent the results using a SEM graph. This has the same information as the output but may be easier to understand.

Graph showing predicted income growth over time for males and females based on latent growth model using time-constant predictors.

Alternatively, we can write the regression equation and calculate the expected value at the start of the study and the rate of change under different scenarios.

For the intercept, the regression model is:

\[i = 6.98 - 0.399 * female + 0.01 * age\_center\]

While for the slope it is:

\[s = 0.096 + 0.002 * female - 0.003 * age\_center\]

We saw that the expected intercept and slope for males with average age (~46 years old) are 6.98 and 0.096. Let’s see what the expected value would be for females aged 16. Because age is centred, age 16 takes the value -30 for “age_center” while the “female” variable takes the value 1. If we add these values to the above formulas, we have:

\[i_{female\_16} = 6.98 - 0.399 * 1 + 0.01 * (-30) = 6.281\]

\[s_{female\_16} = 0.096 + 0.002 * 1 - 0.003 * (-30) = 0.188\]

Based on these results, females aged 16 start with a lower expected log income (6.281 vs 6.98) but increase faster (0.188 vs 0.096) than males with average age.
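The same calculation can be sketched in R. The coefficients are typed in by hand from the rounded output of the first model, and `expected_growth()` is a hypothetical helper written for this illustration, not a lavaan function.

```r
# Coefficients copied (rounded) from the output of the first model
coefs <- list(
  i = c(intercept = 6.980, female = -0.399, age_center =  0.010),
  s = c(intercept = 0.096, female =  0.002, age_center = -0.003)
)

# Hypothetical helper: expected intercept and slope for a given profile
expected_growth <- function(female, age_center, coefs) {
  x <- c(1, female, age_center)
  c(i = sum(coefs$i * x), s = sum(coefs$s * x))
}

expected_growth(female = 0, age_center = 0,   coefs)  # male, average age
expected_growth(female = 1, age_center = -30, coefs)  # female, aged 16
```

Because the inputs are rounded to three decimals, the results can differ slightly from values computed on the unrounded estimates.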

These results also impact the expected values of log income. For example, the predicted log income in wave 2 for males with an average age will be:

\[logincome\_2 = 6.98 + 0.096= 7.076\]

While for a female aged 16, it will be:

\[logincome\_2_{female\_16} = 6.281 + 0.188= 6.469\]
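Extending this to all four waves gives the full model-implied trajectory for each profile. The sketch below assumes the time coding used in the model (0, 1, 2, 3) and the rounded intercepts and slopes from the worked example; `trajectory()` is a hypothetical helper.

```r
time <- 0:3  # slope loadings for waves 1 to 4

# Expected intercept and slope for the two profiles (rounded)
male_avg  <- c(i = 6.980, s = 0.096)
female_16 <- c(i = 6.281, s = 0.188)

# Hypothetical helper: model-implied mean at each wave, i + s * time
trajectory <- function(g) unname(g["i"] + g["s"] * time)

pred <- data.frame(
  wave      = 1:4,
  male_avg  = trajectory(male_avg),
  female_16 = trajectory(female_16)
)
pred
```

In this sketch the female aged 16 stays below the male with average age in every wave, but the gap narrows over time because of the steeper slope.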

An alternative way to include time-constant predictors

This approach of introducing time-constant predictors in LGM is widely used, but it does come with an important assumption. Because the predictors act only through the intercept and slope, we assume that their effect on the outcome changes linearly across waves. This is not always the case. For example, the effect of age on income might change at a different pace at the start of the study compared to the end.

To account for this, we can use the predictors to explain observed variables in the model (instead of the latent ones). We could represent this in a new SEM graph as follows:

Alternative way to include time-constant predictors in LGM

Instead of two coefficients (on the intercept and slope), we have four different ones, one for each wave. This allows us to see if the effect of the predictors is constant in time.
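One way to write this alternative specification is sketched below. The notation is an assumption of this sketch: y is the observed outcome at wave t for person j, 𝜆 is the fixed time score (0, 1, 2, 3), 𝛾 is the wave-specific effect of the predictor and 𝜖 is the wave-specific residual.

\[y_{tj} = i_j + \lambda_t s_j + \gamma_t x_{1j} + \epsilon_{tj}\]

Under this notation, the earlier model, where the predictor explains the intercept and slope, corresponds to the special case in which the wave-specific effects fall on a straight line:

\[\gamma_t = \beta_{01} + \beta_{11} \lambda_t\]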

Returning to our example with log income, we can include female and age_center as predictors of log income at each wave. The syntax is similar to the one above, but we change the outcome variables:

model <- ' i =~ 1*logincome_1 + 1*logincome_2 + 
              1*logincome_3 + 1*logincome_4 
            s =~ 0*logincome_1 + 1*logincome_2 + 
              2*logincome_3 + 3*logincome_4
            
            logincome_1 ~ female + age_center
            logincome_2 ~ female + age_center
            logincome_3 ~ female + age_center
            logincome_4 ~ female + age_center'

fit2 <- growth(model, data = usw)

summary(fit2, standardized = TRUE)
## lavaan 0.6.17 ended normally after 57 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        17
## 
##                                                   Used       Total
##   Number of observations                         24653       78159
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1018.627
##   Degrees of freedom                                 5
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~                                                                  
##     logincome_1       1.000                               1.191    0.845
##     logincome_2       1.000                               1.191    0.889
##     logincome_3       1.000                               1.191    0.959
##     logincome_4       1.000                               1.191    1.018
##   s =~                                                                  
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.299    0.223
##     logincome_3       2.000                               0.597    0.481
##     logincome_4       3.000                               0.896    0.766
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   logincome_1 ~                                                         
##     female           -0.415    0.017  -23.819    0.000   -0.415   -0.146
##     age_center        0.011    0.001   20.447    0.000    0.011    0.128
##   logincome_2 ~                                                         
##     female           -0.367    0.015  -23.795    0.000   -0.367   -0.136
##     age_center        0.006    0.000   13.159    0.000    0.006    0.082
##   logincome_3 ~                                                         
##     female           -0.382    0.014  -26.983    0.000   -0.382   -0.153
##     age_center        0.004    0.000    8.721    0.000    0.004    0.055
##   logincome_4 ~                                                         
##     female           -0.407    0.014  -28.473    0.000   -0.407   -0.173
##     age_center        0.001    0.000    2.888    0.004    0.001    0.018
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i ~~                                                                  
##     s                -0.244    0.005  -45.110    0.000   -0.687   -0.687
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     i                 6.980    0.013  537.413    0.000    5.861    5.861
##     s                 0.096    0.004   22.273    0.000    0.322    0.322
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.492    0.011   43.584    0.000    0.492    0.248
##    .logincome_2       0.731    0.009   85.728    0.000    0.731    0.407
##    .logincome_3       0.704    0.008   89.220    0.000    0.704    0.456
##    .logincome_4       0.572    0.010   55.900    0.000    0.572    0.418
##     i                 1.418    0.017   81.210    0.000    1.000    1.000
##     s                 0.089    0.002   37.101    0.000    1.000    1.000

The results for the first wave are slightly different from before: the log income of females is lower than that of males by 0.415 instead of 0.399, while the effect of age is almost unchanged (0.011 vs 0.010). If we look at the other waves, we see that the effect of sex does not change linearly; for example, in wave 2, it is -0.367, while in wave 4, it is -0.407. This would indicate that the effect of female estimated in the previous model might have been misleading. On the other hand, the effect of age seems to change approximately linearly, decreasing with each wave. In this case, the results from the previous model may not have been a bad approximation.
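To make this comparison concrete, we can work out the effect at each wave that the first model implies (the effect on the intercept plus the effect on the slope multiplied by time) and set it next to the freely estimated effects. This is a sketch based on the rounded coefficients from the two outputs, typed in by hand.

```r
time <- 0:3  # slope loadings for waves 1 to 4

# Effects on i and s from the first model (rounded)
b_i <- c(female = -0.399, age_center =  0.010)
b_s <- c(female =  0.002, age_center = -0.003)

# Effect at each wave implied by the first model: b_i + b_s * time
implied <- sapply(names(b_i), function(v) b_i[[v]] + b_s[[v]] * time)

# Wave-specific effects estimated in the second model (rounded)
comparison <- data.frame(
  wave           = 1:4,
  female_implied = implied[, "female"],
  female_free    = c(-0.415, -0.367, -0.382, -0.407),
  age_implied    = implied[, "age_center"],
  age_free       = c(0.011, 0.006, 0.004, 0.001)
)
comparison
```

The implied and freely estimated effects of age are close at every wave, while the freely estimated effects of female move up and down in a way that a straight line cannot reproduce.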

We could also use a mix of these two approaches by explaining both the observed and the latent variables. For example, we could allow the effect of female to be different at each wave while keeping the effect of age on the intercept and slope, using this code:

model <- ' i =~ 1*logincome_1 + 1*logincome_2 + 
              1*logincome_3 + 1*logincome_4 
            s =~ 0*logincome_1 + 1*logincome_2 + 
              2*logincome_3 + 3*logincome_4
            
            logincome_1 ~ female 
            logincome_2 ~ female 
            logincome_3 ~ female 
            logincome_4 ~ female 
            
            i + s ~ age_center'

fit3 <- growth(model, data = usw)

summary(fit3, standardized = TRUE)
## lavaan 0.6.17 ended normally after 49 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        15
## 
##                                                   Used       Total
##   Number of observations                         24653       78159
## 
## Model Test User Model:
##                                                       
##   Test statistic                              1025.549
##   Degrees of freedom                                 7
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   i =~                                                                  
##     logincome_1       1.000                               1.203    0.854
##     logincome_2       1.000                               1.203    0.897
##     logincome_3       1.000                               1.203    0.969
##     logincome_4       1.000                               1.203    1.029
##   s =~                                                                  
##     logincome_1       0.000                               0.000    0.000
##     logincome_2       1.000                               0.303    0.226
##     logincome_3       2.000                               0.606    0.488
##     logincome_4       3.000                               0.909    0.778
## 
## Regressions:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   logincome_1 ~                                                         
##     female           -0.414    0.017  -23.764    0.000   -0.414   -0.146
##   logincome_2 ~                                                         
##     female           -0.369    0.015  -23.982    0.000   -0.369   -0.137
##   logincome_3 ~                                                         
##     female           -0.382    0.014  -27.061    0.000   -0.382   -0.153
##   logincome_4 ~                                                         
##     female           -0.406    0.014  -28.435    0.000   -0.406   -0.172
##   i ~                                                                   
##     age_center        0.010    0.000   20.422    0.000    0.008    0.145
##   s ~                                                                   
##     age_center       -0.003    0.000  -18.473    0.000   -0.010   -0.173
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .i ~~                                                                  
##    .s                -0.244    0.005  -45.100    0.000   -0.687   -0.687
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .i                 6.980    0.013  537.415    0.000    5.800    5.800
##    .s                 0.096    0.004   22.272    0.000    0.317    0.317
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .logincome_1       0.493    0.011   43.591    0.000    0.493    0.248
##    .logincome_2       0.731    0.009   85.732    0.000    0.731    0.407
##    .logincome_3       0.704    0.008   89.218    0.000    0.704    0.456
##    .logincome_4       0.572    0.010   55.904    0.000    0.572    0.418
##    .i                 1.418    0.017   81.202    0.000    0.979    0.979
##    .s                 0.089    0.002   37.091    0.000    0.970    0.970

The coefficients for age are similar to those in the original model. We can compare the last two models to see if assuming linear change for age significantly decreases the fit of the model:

anova(fit2, fit3)
## 
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC  Chisq Chisq diff     RMSEA Df diff Pr(>Chisq)  
## fit2  5 292997 293135 1018.6                                          
## fit3  7 293000 293121 1025.5     6.9213 0.0099906       2    0.03141 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

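The chi-square difference test (p = 0.031) and the AIC marginally favour the model in which the effect of age varies freely across waves (fit2). However, the BIC, which tends to select simpler models, is lower for the model with a constant effect of age on the intercept and slope (293121 vs 293135). With a sample this large, even small differences in fit can be statistically significant, so this would indicate that the assumption of constant effects of age on the change of income is not too wrong in this case.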
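The chi-square difference test in this output can be reproduced by hand from the test statistics and degrees of freedom reported in the two summaries. The sketch below assumes the standard (unscaled) ML test statistic and uses values typed in from the rounded output; `lr_test()` is a hypothetical helper, not a lavaan function.

```r
# Test statistics and degrees of freedom copied from the summaries above
chisq <- c(fit2 = 1018.627, fit3 = 1025.549)
df    <- c(fit2 = 5,        fit3 = 7)

# Hypothetical helper: chi-square difference test of nested models
lr_test <- function(restricted, free) {
  diff    <- chisq[[restricted]] - chisq[[free]]
  df_diff <- df[[restricted]] - df[[free]]
  c(chisq_diff = diff,
    df_diff    = df_diff,
    p          = pchisq(diff, df = df_diff, lower.tail = FALSE))
}

lr_test("fit3", "fit2")  # constant vs wave-specific effect of age
```

The difference in degrees of freedom is 2 because the four wave-specific age coefficients replace the two coefficients on the intercept and slope.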

We could also compare this model with the original one to see if the effect of female is constant in time.

anova(fit1, fit3)
## 
## Chi-Squared Difference Test
## 
##      Df    AIC    BIC  Chisq Chisq diff    RMSEA Df diff Pr(>Chisq)    
## fit3  7 293000 293121 1025.5                                           
## fit1  9 293024 293130 1054.2     28.679 0.023262       2  5.921e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

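Here, all the criteria agree: the chi-square difference test is significant, and both the AIC and the BIC are lower for the new model (fit3) than for the original one (fit1). This implies that constraining the effect of “female” to change linearly in time might be problematic, while the same assumption seems reasonable for age.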

We can still calculate the expected values for the different scenarios in this model. Typically it is best to calculate the expected intercept and slope for different cases and then calculate the expected values for each wave. For example, the expected intercept and slope for someone who is 16 based on our model would be:

\[i_{age16} = 6.98 + 0.01 \times (-30) = 6.68\]

\[s_{age16} = 0.096 - 0.003 \times (-30) = 0.186\]

We could use that information, for example, to calculate the expected log income for men and women at each wave.

\[logincome\_t_{male\_16} = i_{age16} + s_{age16} * \text{Time}_t + \beta_{female, t} * 0 = 6.68 + 0.186 * \text{Time}_t\]

\[logincome\_t_{female\_16} = i_{age16} + s_{age16} * \text{Time}_t + \beta_{female, t} * 1 = 6.68 + 0.186 * \text{Time}_t + \beta_{female, t}\]

Using this equation and the results from the model, we could calculate the expected values at wave 4 for males and females:

\[logincome\_4_{male\_16} = 6.68 + 0.186 * 3 = 7.238\]

\[logincome\_4_{female\_16} = 6.68 + 0.186 * 3 - 0.406 = 6.832\]

You could calculate the expected values for each wave and each scenario. This would allow you to see how the expected values change over time and how they differ between people. The results could then be visualized or presented in a table to make them easier to interpret.
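As a minimal sketch of that calculation, the code below computes the expected log income at each wave for males and females aged 16. It assumes the rounded coefficients from the mixed model, typed in by hand, and the time coding 0 to 3.

```r
time <- 0:3  # slope loadings for waves 1 to 4

# Expected intercept and slope at age 16 (age_center = -30)
i_16 <- 6.980 + 0.010 * -30
s_16 <- 0.096 - 0.003 * -30

# Wave-specific effects of female from the mixed model (rounded)
b_female <- c(-0.414, -0.369, -0.382, -0.406)

pred <- data.frame(
  wave   = 1:4,
  male   = i_16 + s_16 * time,
  female = i_16 + s_16 * time + b_female
)
pred
```

A table like `pred` can be passed directly to a plotting function to draw the two expected trajectories.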

Conclusions

Latent Growth Models provide a framework for studying change over time, and the inclusion of time-constant predictors helps better understand the predictors that explain both initial levels and the rate of change of key variables.

In this post, we explored how to incorporate time-constant predictors into an LGM, modelling their effects on intercept (initial status) and slope (rate of change). We demonstrated how to estimate these models in R, extract key coefficients, and interpret their impact on the trajectory of the outcome variable.

We also examined two different strategies for including time-constant predictors in the model. The first approach is more straightforward but assumes that the predictors act only through the intercept and slope, so their effects on the outcome change linearly in time. The second strategy creates a more complex model but allows the effects on the outcome to differ freely at each wave. We also considered how to decide between the two approaches and even combine them.

You can learn how to estimate non-linear LGM models here, and how to include time-varying predictors in this post.
