How to run a multiple mediation model in R

Representing complex social relationships using statistical models is often a challenge. Popular techniques like regression analysis do not really capture the full complexity of relationships between variables. One approach to moving beyond this simplistic model is to use techniques such as mediation analysis. This allows for the inclusion of multiple pathways to better represent the relationship between variables. In a previous post, we introduced mediation analysis, and we looked at some relatively simple examples. Here, we extend this to examine what happens in a more complex situation when using a multiple mediation model.

As an example, let’s consider the association between being single and life satisfaction. Physical health, mental health, and income could each contribute to that association. Estimating three separate single-mediator models would only give us a limited and bias image of the relationships in the data. To better understand these relationships we need a multiple mediation model.

What is a multiple mediation model?

A standard mediation model contains one mediator between a predictor and an outcome. A multiple mediation model contains two or more mediators. The mediators can operate in parallel, in sequence, or through a mixture of both.

Parallel mediation asks whether several mechanisms explain the a relationship at the same time. Serial mediation makes a stronger claim: one mediator changes another mediator, which then changes the outcome. The distinction matters because the indirect effect is calculated from different paths.

In the R package lavaan, an indirect effect is defined by labelling regression coefficients and multiplying them. For a pathway from X to M to Y, the indirect effect is a*b:

model <- '
  M ~ a*X
  Y ~ b*M + c*X

  indirect := a*b
  total := c + indirect
'

For a refresher on the single-mediator case, see the practical introduction to path and mediation analysis in R. The official lavaan mediation tutorial also explains parameter labels and defined effects.

How to fit a multiple mediation model in R

The examples use usw, which is the wide data from Understanding Society, where each respondent represents one row and repeated measurements have wave suffixes such as _1, _2, and _3.

load("../data/us_clean_syn.RData")

dat <- usw |>
  mutate(
    age_num = as.numeric(haven::zap_labels(age)),
    age10 = (age_num - mean(age_num, na.rm = TRUE)) / 10,
    female = ifelse(gndr.fct == "Female", 1, 0),
    degree = ifelse(degree.fct == "Degree", 1, 0),
    single1 = as.numeric(haven::zap_labels(single_1))
  )

dat |>
  select(single1, degree, age10, female,
         logincome_1, sati_1,
         sf12pcs_1, sf12mcs_1,
         sf12pcs_2, sf12mcs_2, sati_3) |>
  summary()
##     single1           degree           age10             female      
##  Min.   :0.0000   Min.   :0.0000   Min.   :-2.9534   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:-1.4534   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :-0.1534   Median :1.0000  
##  Mean   :0.3864   Mean   :0.3251   Mean   : 0.0000   Mean   :0.5473  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.: 1.3466   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   : 5.5466   Max.   :1.0000  
##                   NA's   :111                                        
##   logincome_1        sati_1        sf12pcs_1       sf12mcs_1    
##  Min.   :2.303   Min.   :1.000   Min.   : 4.41   Min.   : 0.00  
##  1st Qu.:6.267   1st Qu.:5.000   1st Qu.:45.08   1st Qu.:45.91  
##  Median :7.012   Median :6.000   Median :53.46   Median :52.99  
##  Mean   :6.633   Mean   :5.248   Mean   :49.49   Mean   :50.44  
##  3rd Qu.:7.581   3rd Qu.:6.000   3rd Qu.:57.20   3rd Qu.:57.16  
##  Max.   :9.211   Max.   :7.000   Max.   :74.90   Max.   :76.35  
##  NA's   :42      NA's   :11581   NA's   :3593    NA's   :3593   
##    sf12pcs_2       sf12mcs_2         sati_3     
##  Min.   : 5.06   Min.   : 0.00   Min.   :1.000  
##  1st Qu.:44.66   1st Qu.:44.62   1st Qu.:4.000  
##  Median :53.56   Median :51.83   Median :6.000  
##  Mean   :49.76   Mean   :49.84   Mean   :5.115  
##  3rd Qu.:57.49   3rd Qu.:57.09   3rd Qu.:6.000  
##  Max.   :71.95   Max.   :77.09   Max.   :7.000  
##  NA's   :23293   NA's   :23293   NA's   :23641

Age is centred and divided by ten, so a one-unit increase means ten years. The models adjust for age and gender to control for their possible confounding.

Example 1: Two parallel mediators

In the first example we explore whether being single is associated with life satisfaction through physical and mental health. The two mediators are estimated together and their residuals are allowed to correlate to account for any common variance.

model_1 <- '
  sf12pcs_1 ~ a1*single1 + age10 + female
  sf12mcs_1 ~ a2*single1 + age10 + female
  sati_1 ~ b1*sf12pcs_1 + b2*sf12mcs_1 +
           c*single1 + age10 + female

  sf12pcs_1 ~~ sf12mcs_1

  ind_physical := a1*b1
  ind_mental := a2*b2
  ind_total := ind_physical + ind_mental
  total := c + ind_total
'

fit_1 <- fit_mediation(model_1)
Flow diagram showing how being single relates to physical and mental health, which contribute to life satisfaction.

Blue arrows are positive and red arrows are negative. The labels show standardised coefficients. Age, gender, and the mediator residual correlation remain in the fitted model but are omitted from the diagram.

Here are the key coefficients of interest:

EffectEstimateSEp95% CIStandardized
ind_physical-0.0490.003<.001[-0.055, -0.044]-0.016
ind_mental-0.0760.006<.001[-0.087, -0.065]-0.025
ind_total-0.1250.006<.001[-0.138, -0.113]-0.042
total-0.2660.016<.001[-0.297, -0.236]-0.089
Indirect effects for the two-mediator model

Being single predicts poorer physical and mental health. Both health measures predict higher life satisfaction, so both indirect effects are negative. The indirect effect through physical health is -0.049; the effect through mental health is -0.076. Together they account for an indirect effect of -0.125 points on the satisfaction scale.

Mental health provides a stronger pathway, but both effects are small. With more than 50,000 respondents, small coefficients can be estimated precisely, so statistical significance should not be mistaken for practical importance.

Example 2: Three mediators in a multiple mediation model

A multiple mediation model can include more than two mediators. Here, we add log income to the physical- and mental-health pathways.

model_2 <- '
  sf12pcs_1 ~ a1*single1 + age10 + female
  sf12mcs_1 ~ a2*single1 + age10 + female
  logincome_1 ~ a3*single1 + age10 + female

  sati_1 ~ b1*sf12pcs_1 + b2*sf12mcs_1 + b3*logincome_1 +
           c*single1 + age10 + female

  sf12pcs_1 ~~ sf12mcs_1 + logincome_1
  sf12mcs_1 ~~ logincome_1

  ind_physical := a1*b1
  ind_mental := a2*b2
  ind_income := a3*b3
  ind_total := ind_physical + ind_mental + ind_income
  total := c + ind_total
'

fit_2 <- fit_mediation(model_2)

Here is a visual representation of the results and the main coefficients of interest:

Infographic: a flow diagram where 'Single' leads to 'Physical health', 'Mental health', and 'Log income', all pointing to 'Life satisfaction'.
EffectEstimateSEp95% CIStandardized
ind_physical-0.0510.003<.001[-0.057, -0.045]-0.017
ind_mental-0.0770.006<.001[-0.088, -0.065]-0.025
ind_income0.0150.002<.001[0.011, 0.019]0.005
ind_total-0.1130.007<.001[-0.126, -0.099]-0.037
total-0.2660.016<.001[-0.297, -0.236]-0.089
Indirect effects for the three-mediator model

The physical-health (-0.051) and mental-health (-0.077) pathways are negative. The income pathway is small and positive (0.015). It partially offsets the health pathways rather than reinforcing them.

This is exactly why separate mediation models are a poor substitute for one multiple-mediator model. Separate models would not show how the pathways compete after accounting for the same controls and the other mediators.

Example 3: Parallel and serial pathways together

The third model uses degree status as the predictor and mental health as the outcome. Education can operate through income alone, satisfaction alone, or income followed by satisfaction.

model_3 <- '
  logincome_1 ~ a*degree + age10 + female
  sati_1 ~ d*logincome_1 + e*degree + age10 + female
  sf12mcs_1 ~ b*sati_1 + f*logincome_1 +
              c*degree + age10 + female

  ind_income := a*f
  ind_satisfaction := e*b
  ind_serial := a*d*b
  ind_total := ind_income + ind_satisfaction + ind_serial
  total := c + ind_total
'

fit_3 <- fit_mediation(model_3)

Again, here is the visual representation and main coefficients.

Directed diagram of causal links: Degree influences Log income, Life satisfaction, and Mental health; Log income also affects Life satisfaction; a red arrow highlights a direct path from Log income to Life satisfaction; Life satisfaction influences Mental health.
EffectEstimateSEp95% CIStandardized
ind_income0.0560.0240.019[0.009, 0.103]0.003
ind_satisfaction0.4670.042<.001[0.384, 0.549]0.022
ind_serial-0.0320.0110.003[-0.054, -0.011]-0.001
ind_total0.4900.047<.001[0.399, 0.582]0.023
total1.2220.093<.001[1.04, 1.404]0.057
Indirect effects for the hybrid model

The satisfaction-only pathway is the largest (0.467). By comparison, the income-only pathway is positive but small (0.056). A negative serial pathway (-0.032) appears because, conditional on education and the other controls, income has a small negative association with satisfaction in this synthetic dataset.

Example 4: Longitudinal parallel mediation

The previous examples use only cross-sectional data. Their paths make assumptions about causal order but the variables are measured at the same wave. This means that we are just assuming the relationships go in that particular direction. When using longitudinal data we can use the temporal order to help us decide on the causal direction.

In this example, age at the first wave predicts physical and mental health at wave 2. Those mediators predict life satisfaction at wave 3. The model controls wave-1 physical health, mental health, and satisfaction in both the mediator and outcome equations.

model_4 <- '
  sf12pcs_2 ~ a1*age10 + sf12pcs_1 + sf12mcs_1 + sati_1 + female
  sf12mcs_2 ~ a2*age10 + sf12pcs_1 + sf12mcs_1 + sati_1 + female

  sati_3 ~ b1*sf12pcs_2 + b2*sf12mcs_2 + c*age10 +
           sati_1 + sf12pcs_1 + sf12mcs_1 + female

  sf12pcs_2 ~~ sf12mcs_2

  ind_physical := a1*b1
  ind_mental := a2*b2
  ind_total := ind_physical + ind_mental
  total := c + ind_total
'

fit_4 <- fit_mediation(model_4)
Diagram of a causal model: Age (+10 years) influences physical health and mental health, which in turn influence life satisfaction (W3). Arrows show pathways including a curved path from Age to Life satisfaction.

Baseline controls are omitted from the diagram but remain in the model and are shown below.

EffectEstimateSEp95% CIStandardized
ind_physical-0.0130.002<.001[-0.017, -0.009]-0.015
ind_mental0.0110.001<.001[0.009, 0.013]0.013
ind_total-0.0020.0020.365[-0.006, 0.002]-0.002
total0.0390.006<.001[0.027, 0.05]0.046
Indirect effects for the longitudinal model
OutcomePredictorEstimatepStandardized
sf12pcs_2age10-1.371<.001-0.226
sf12pcs_2sf12pcs_10.487<.0010.500
sf12pcs_2sf12mcs_10.092<.0010.084
sf12pcs_2sati_10.317<.0010.043
sf12pcs_2female0.0290.7910.001
sf12mcs_2age100.443<.0010.083
sf12mcs_2sf12pcs_10.029<.0010.034
sf12mcs_2sf12mcs_10.169<.0010.175
sf12mcs_2sati_11.568<.0010.238
sf12mcs_2female-0.896<.001-0.046
sati_3sf12pcs_20.009<.0010.068
sati_3sf12mcs_20.025<.0010.157
sati_3age100.041<.0010.048
sati_3sati_10.204<.0010.198
sati_3sf12pcs_10.0010.2530.010
sati_3sf12mcs_10.007<.0010.044
sati_3female0.0020.9140.001
Regression paths for the longitudinal model

The physical-health indirect effect is negative (-0.013), while the mental-health indirect effect is positive (0.011). Both are different from zero. Their sum is only -0.002 and is not statistically significant.

While longitudinal mediation improves our understanding of the relationships in the data it does come with assumptions. Time-varying events between waves could affect both health and satisfaction, and the spacing of the waves may not match the speed of the process. The model could be extended to include more controls and pathways to get closer the real world mechanisms.

For a fuller treatment of temporal ordering, see the guide to longitudinal mediation in R.

Choosing a multiple mediation model

What type of mediation processes should you use in your research? The main guide should be grounded in theory or prior empirical findings. Usually, the parallel mediation is a good starting point. Use a serial model only when the sequence is supported by theory or the timing of the data collection. If repeated waves are available, a longitudinal model with baseline controls is usually more credible, although it does not remove all time-varying confounding.

Where to go next

Mediation models (and SEM more generally) force users to put their theoretical expectations on paper. While this makes it more challenging, it often makes an important contribution to our understanding of the mechanisms underlying the relationships of interest.

If you want a structured treatment of mediation, moderation, assumptions, and interpretation in R, see the mediation and moderation course. For another worked example now, read the longitudinal mediation guide with R code.


Was the information useful?

Consider supporting the site: