In a previous post, we looked at Latent Growth Models (LGMs) and how to estimate them in R. We used them to investigate both linear and non-linear change over time and to examine average change and individual-level change at the same time.
Once we understand the overall change over time, we typically want to include predictors that explain it. For example, individual characteristics might explain why some people start with a larger income and follow different trajectories than others. Here, we will focus on including time-constant predictors in LGMs. Time-constant variables do not change over time themselves but might influence how the variable of interest changes. For example, we might want to use characteristics such as age or education at the start of the study to explain the change in income over time.
Preparing the data
You can follow this guide by running the code in R on your computer. Our examples will use synthetic (simulated) data modelled after Understanding Society, a comprehensive panel study from the UK. The actual data can be accessed for free from the UK Data Archive.
In a previous post, we cleaned the data and saved it in both long and wide formats. We will use that cleaned data here. If you want to follow along, you can download the data here and all the code from here.
Before we start, make sure you have the tidyverse and lavaan packages installed and loaded by running the following code:
library(tidyverse)
library(lavaan)
We will mainly use the lavaan package to run LGMs and tidyverse for light data cleaning.
Next, we will load the data prepared in the previous post. The “RData” file contains both the long- and wide-format data; the models below use the wide-format object, usw:
load("./data/us_clean_syn.RData")
Before we get into the LGM, let’s examine the data we want to analyze. Imagine we are interested in how income changes over time and the main predictors of that change. We explored the basic LGM before in this blog post. Here, we want to build on that knowledge and include some predictors.
We will use two predictors from our data that are coded as time-constant: sex and age at the start of the study. The first is coded as a factor variable, while the second is continuous.
We will centre the age variable to make the results easier to interpret. This means that the average age will become 0. This is a common practice in LGMs and makes interpreting the intercept easier. Also, the lavaan package does not work very well with factors, and when using categorical variables, it is better to use dummy variables (coded as 0 and 1). Here, we will create a new variable, “female”, that will take the value 1 for females and 0 otherwise.
Here is the code to create these new control variables:
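A minimal sketch of this step on a toy data frame follows. The column names sex and age and the factor label "Female" are assumptions made for illustration; they may differ from the names used in usw.

```r
library(dplyr)

# Toy stand-in for the wide data.
# ASSUMPTION: the columns are called `sex` and `age`, and females are labelled "Female".
toy <- data.frame(
  sex = factor(c("Female", "Male", "Female", "Male")),
  age = c(16, 40, 52, 76)
)

toy <- toy %>%
  mutate(
    female = ifelse(sex == "Female", 1, 0),      # dummy: 1 = female, 0 = otherwise
    age_center = age - mean(age, na.rm = TRUE)   # centred age: the mean becomes 0
  )

toy
```

For the real data, the same mutate() call would be applied to usw instead of toy, after checking the actual column names and factor labels.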
We can expand the basic LGM model to investigate how time-constant predictors impact change estimates. This is relatively easy, given that we use the SEM framework. We simply include the predictor in the model and estimate two new regressions, explaining the intercept and slope of the growth model. We can represent this visually:
Here, the regression coefficient 𝛽01 (“beta 01”) shows how 𝑥1 changes the expected score at the start of the study, while 𝛽11 indicates how the rate of change differs by the values of 𝑥1. This is the equivalent of introducing a main effect and an interaction with time in a multilevel model. Note that the two latent variables are now the outcomes of a regression. As a result, they have residuals: 𝜁1 (“zeta 1”) and 𝜁2. Also, the correlation between the intercept and slope, which tells us about convergence or divergence in the data, is now a correlation between these residuals.
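In equation form, the model for a single predictor could be sketched as follows. The symbols α0 and α1 (the intercepts of the two regressions), λt (the fixed time scores, here 0 to 3) and εit (the wave-specific residual) are notation assumed for this sketch and do not come from the figure.

```latex
\begin{aligned}
y_{it}    &= \eta_{0i} + \lambda_t \,\eta_{1i} + \varepsilon_{it} \\
\eta_{0i} &= \alpha_0 + \beta_{01}\, x_{1i} + \zeta_{1i} \\
\eta_{1i} &= \alpha_1 + \beta_{11}\, x_{1i} + \zeta_{2i} \\[4pt]
\Rightarrow\; y_{it} &= \alpha_0 + \alpha_1 \lambda_t + \beta_{01}\, x_{1i}
   + \beta_{11}\, \lambda_t\, x_{1i}
   + \left(\zeta_{1i} + \lambda_t \zeta_{2i} + \varepsilon_{it}\right)
\end{aligned}
```

The last line makes the multilevel analogy visible: 𝛽01 plays the role of a main effect of 𝑥1 and 𝛽11 the role of its interaction with time.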
Let’s see how sex and age impact the change in time of log income. We will use the two new variables we created. We expand the LGM to include two regressions with these two variables explaining the latent variables “i” and “s”:
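model <- '
  # latent intercept: all loadings fixed to 1
  i =~ 1*logincome_1 + 1*logincome_2 +
       1*logincome_3 + 1*logincome_4
  # latent slope: loadings fixed to the time scores 0, 1, 2, 3
  s =~ 0*logincome_1 + 1*logincome_2 +
       2*logincome_3 + 3*logincome_4
  # time-constant predictors of the intercept and the slope
  i ~ female + age_center
  s ~ female + age_center
'
# fit the growth model on the wide-format data
fit1 <- growth(model, data = usw)
summary(fit1, standardized = TRUE)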
We can see the new coefficients under the “Regressions” section of the output. They are interpreted as standard regression coefficients. For example, based on the model, females have a lower log income at the start of the study (by 0.399), while their rate of change is slightly higher (by 0.002, although this difference is not significantly different from 0). For age, each additional year is associated with an expected log income at the start of the study that is higher by 0.01 and a rate of change that is slightly lower (by 0.003).
The interpretation of the intercepts also changes because “i” and “s” are now outcomes of a regression. They refer to the expected value at the start of the study and the expected rate of change when all the predictors are 0 (as in a regular regression). So, for males of average age, the expected log income at the start of the study is 6.98, and the rate of change is 0.096 per wave.
We can also represent the results using a SEM graph. This has the same information as the output but may be easier to understand.
Alternatively, we can write the regression equation and calculate the expected value at the start of the study and the rate of change under different scenarios.
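Using the rounded estimates reported above (so the last digit is approximate), the two regression equations can be sketched as:

```latex
\begin{aligned}
E[i] &= 6.98 - 0.399 \cdot \text{female} + 0.01 \cdot \text{age\_center} \\
E[s] &= 0.096 + 0.002 \cdot \text{female} - 0.003 \cdot \text{age\_center}
\end{aligned}
```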
We saw that the expected intercept and slope for males of average age (~46 years old) are 6.98 and 0.096. Let’s see what the expected values would be for females aged 16. Because age is centred, age 16 takes the value -30 for “age_center”, while the “female” variable takes the value 1. If we plug these values into the regression equations, we have:
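The calculation can be sketched in a few lines of base R. The coefficients are the rounded estimates quoted in the text, and the object names are made up for this illustration.

```r
# Rounded estimates taken from the text (not extracted from the fitted model)
int_i <- 6.98;  b_i_female <- -0.399; b_i_age <- 0.01
int_s <- 0.096; b_s_female <- 0.002;  b_s_age <- -0.003

# Scenario: female aged 16; the average age is about 46, so centred age is -30
female     <- 1
age_center <- 16 - 46

exp_i <- int_i + b_i_female * female + b_i_age * age_center
exp_s <- int_s + b_s_female * female + b_s_age * age_center

round(c(intercept = exp_i, slope = exp_s), 3)
# intercept: 6.281, slope: 0.188
```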
Based on these results, females aged 16 start with a lower expected log income (6.281 vs 6.98) but increase faster (0.188 vs 0.096) than males of average age.
These results also carry over to the expected values of log income at each wave. For example, because the slope loading for wave 2 is 1, the predicted log income in wave 2 for males of average age will be:
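A short sketch of this calculation, again based on the rounded estimates from the text: the predicted value at each wave is the expected intercept plus the time score multiplied by the expected slope.

```r
# Expected intercept and slope for males of average age (rounded, from the text)
exp_i <- 6.98
exp_s <- 0.096

# Time scores used as slope loadings in the model
time <- c(0, 1, 2, 3)

pred <- exp_i + time * exp_s
names(pred) <- paste0("wave_", 1:4)
round(pred, 3)
# wave_1: 6.980, wave_2: 7.076, wave_3: 7.172, wave_4: 7.268
```

So the predicted log income in wave 2 is 6.98 + 1 × 0.096 = 7.076.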
An alternative way to include time-constant predictors
This approach of introducing time-constant predictors in LGMs is widely used, but it comes with an important assumption. Because the predictors act only through the intercept and the linear slope, their effect on the outcome can only change at a constant rate from one wave to the next. This is not always the case. For example, the effect of age on income might change more between the first two waves than between the last two.
To account for this, we can use the predictors to explain observed variables in the model (instead of the latent ones). We could represent this in a new SEM graph as follows:
Instead of two coefficients per predictor (on the intercept and slope), we now have four, one for each wave. This allows us to see whether the effect of a predictor is constant over time.
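One way to write this alternative down, as a sketch for a single predictor: γt is a symbol assumed here for the wave-specific effect of 𝑥1, and λt stands for the time scores 0 to 3.

```latex
\begin{aligned}
y_{it} &= \eta_{0i} + \lambda_t \,\eta_{1i} + \gamma_t\, x_{1i} + \varepsilon_{it},
\qquad t = 1, \dots, 4 \\[4pt]
\text{first approach as a special case:}\quad
\gamma_t &= \beta_{01} + \lambda_t\, \beta_{11}
\end{aligned}
```

Read this way, the first approach is the restricted version in which the four wave-specific effects must lie on a straight line.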
Returning to our example with log income, we can include female and age_center as predictors of log income at each wave. The syntax is similar to the one above, but we change the outcome variables:
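The snippet below is a sketch of what this model could look like, following the pattern of the other models in this post. The object names model2 and fit2 are assumptions, and the code expects the wide data usw with the female and age_center variables to be available already.

```r
library(lavaan)

# Sketch only: `model2` and `fit2` are names chosen for this illustration.
# Assumes `usw` (wide data with female and age_center) is already in the session.
model2 <- '
  # latent intercept and slope, as in the first model
  i =~ 1*logincome_1 + 1*logincome_2 + 1*logincome_3 + 1*logincome_4
  s =~ 0*logincome_1 + 1*logincome_2 + 2*logincome_3 + 3*logincome_4

  # predictors now explain the observed outcome at each wave
  logincome_1 ~ female + age_center
  logincome_2 ~ female + age_center
  logincome_3 ~ female + age_center
  logincome_4 ~ female + age_center
'

fit2 <- growth(model2, data = usw)
summary(fit2, standardized = TRUE)
```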
The results for the first wave are slightly different from before: females’ log income is lower by 0.415 rather than 0.399 (compared to men), while the effect of age is the same. If we look at the other waves, we see that the effect of sex does not change linearly; for example, in wave 2 it is -0.367, while in wave 4 it is -0.407. This would indicate that the effect of being female on income in the previous model might have been misleading. On the other hand, the effect of age seems to change approximately linearly, decreasing with each wave. In this case, the results from the previous model may not have been a bad approximation.
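To make the comparison concrete, we can work out what the first model implies for each wave: the effect at a given wave is the coefficient on “i” plus the time score multiplied by the coefficient on “s”. The sketch below uses the rounded estimates from the first model, so the values are approximate.

```r
# Time scores used as slope loadings
time <- c(0, 1, 2, 3)

# Implied wave-specific effects under the first model (rounded estimates from the text)
implied_female <- -0.399 + 0.002 * time
implied_age    <-  0.01  - 0.003 * time

round(rbind(female = implied_female, age = implied_age), 3)
# female: -0.399 -0.397 -0.395 -0.393
# age:     0.010  0.007  0.004  0.001
```

The implied effect of being female moves in small, equal steps, whereas the freely estimated effects quoted above (-0.415 in wave 1, -0.367 in wave 2, -0.407 in wave 4) do not follow a straight line.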
We could also use a mix of these two approaches by explaining both the observed and the latent variables. For example, we could allow the effect of female to differ freely across waves while restricting the effect of age to the intercept and slope, using this code:
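model <- '
  # latent intercept and slope, as in the first model
  i =~ 1*logincome_1 + 1*logincome_2 +
       1*logincome_3 + 1*logincome_4
  s =~ 0*logincome_1 + 1*logincome_2 +
       2*logincome_3 + 3*logincome_4
  # female: a separate effect on the observed outcome at each wave
  logincome_1 ~ female
  logincome_2 ~ female
  logincome_3 ~ female
  logincome_4 ~ female
  # age: effect on the latent intercept and slope only
  i + s ~ age_center
'
fit3 <- growth(model, data = usw)
summary(fit3, standardized = TRUE)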
The coefficients for age are similar to those in the original model. We can compare the last two models to see whether restricting age to a linear effect significantly decreases the fit of the model:
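A sketch of how such a comparison could be run with lavaan. It assumes that the model with both predictors on the observed variables was stored in an object called fit2 (a name not shown in the text) and that fit3 is the mixed model fitted above.

```r
library(lavaan)

# ASSUMPTION: `fit2` holds the model with female and age_center predicting
# each observed wave; `fit3` is the mixed model fitted above.

# Likelihood ratio test for the nested models; the table also lists AIC and BIC
anova(fit2, fit3)

# The information criteria can also be requested directly
fitMeasures(fit2, c("aic", "bic"))
fitMeasures(fit3, c("aic", "bic"))
```

Lower AIC and BIC values indicate the preferred model.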
If we look at the BIC (which tends to select simpler models), the results indicate that the model in which age affects only the intercept and slope is preferred to the one that lets its effect vary freely across waves. This suggests that the assumption of a linearly changing effect of age on income is not too wrong in this case.
We could also compare this model with the original one to see whether the effect of female can be restricted in the same way.
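A sketch of this second comparison, using the fit1 and fit3 objects created earlier in this post:

```r
library(lavaan)

# fit1: female and age_center predict the latent intercept and slope
# fit3: female predicts each observed wave, age_center predicts i and s
# fit1 is the more restricted model, so the two are nested
anova(fit1, fit3)
```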
Here again, we see that the new model is better than the original one. This implies that restricting the effect of “female” to the intercept and slope might be problematic, while the same restriction is reasonable for age.
We can still calculate the expected values for different scenarios in this model. Typically, it is best to calculate the expected intercept and slope for different cases first and then calculate the expected values for each wave. For example, the expected intercept and slope for someone who is 16, based on our model, would be:
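Since the estimates for this model are not reproduced in the text, the sketch below defines a small helper instead of hard-coding numbers. The function name expected_growth is hypothetical; it expects a named vector of estimates such as the one returned by coef() for a lavaan model in which age_center predicts “i” and “s”.

```r
# Hypothetical helper (not part of lavaan): expected intercept and slope
# for a given centred age, from a named vector of estimates.
# Relies on lavaan's parameter names "i~1", "s~1", "i~age_center", "s~age_center".
expected_growth <- function(est, age_center) {
  c(
    intercept = unname(est["i~1"] + est["i~age_center"] * age_center),
    slope     = unname(est["s~1"] + est["s~age_center"] * age_center)
  )
}

# Possible usage with the fitted model (average age is about 46):
# expected_growth(coef(fit3), age_center = 16 - 46)
```

Because female enters this model through the wave-specific regressions, the expected value for a woman at a given wave would additionally include the corresponding coefficient (for example, the one named logincome_2~female for wave 2).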
You could calculate the expected values for each wave and each scenario. This would allow you to see how the expected values change over time and how they differ between people. The results could then be visualized or presented in a table to make them easier to interpret.
Conclusions
Latent Growth Models provide a framework for studying change over time, and including time-constant predictors helps us understand what explains both the initial levels and the rate of change of key variables.
In this post, we explored how to incorporate time-constant predictors into an LGM, modelling their effects on intercept (initial status) and slope (rate of change). We demonstrated how to estimate these models in R, extract key coefficients, and interpret their impact on the trajectory of the outcome variable.
We also examined two different strategies for including time-constant predictors in the model. The first approach is more straightforward but assumes that the effect of a predictor changes at a constant rate over time. The second strategy creates a more complex model but allows the effect of a predictor to differ freely from wave to wave. We also considered how to decide between the two approaches and even how to combine them.
You can learn how to estimate non-linear LGM models here, and how to include time-varying predictors in this post.