Hands-On Introduction to Moderation Analysis

In real-world research, relationships between variables are rarely uniform. An intervention might work well for one group but barely move the needle for another; a strong association in one context may vanish when conditions change. Moderation analysis helps uncover these patterns by testing whether the effect of one variable on another depends on a third variable. Without it, analysts risk reporting averaged effects that mask meaningful heterogeneity—leading to misleading conclusions, poor policy decisions, and weak theoretical insights. In practice, moderation is essential for understanding subgroup differences, contextual influences, and conditional effects.

Access the code used here.

Access the data here.

Introduction to moderation analysis

Moderation analysis helps us understand when and for whom an effect holds. The link between a predictor (X) and an outcome (Y) is often not constant—its strength may depend on a third variable, the moderator (M). Examples include whether stress affects health differently by gender, whether life events predict illness depending on perceived control, or whether sleep benefits cognition more for active individuals. These context-dependent patterns reveal important nuance that average effects hide.

The figure below illustrates the basic idea. X predicts Y through the main path (c). M modifies this relationship: the dashed arrow represents the interaction effect (c×m), meaning the slope of X→Y changes depending on the level of M. Moderation is therefore about identifying the conditions that strengthen, weaken, or reverse the association between X and Y.

conceptual representation of moderation analysis

Conceptually, moderation clarifies the conditions under which an effect is present. When researchers suspect that the X→Y relationship varies across subgroups or situations, moderation provides a formal way to test this. Predictors and moderators sit at the same causal level, and we typically use moderation when results appear inconsistent or unexpectedly small. It tells us not how X produces Y, but under what circumstances the effect is stronger, weaker, or absent.

Several modelling tools are available for testing moderation. The most common is including an interaction term between X and M in a regression model. In SEM frameworks, multiple-group analysis examines whether paths differ across groups, and Johnson–Neyman intervals indicate the specific values of the moderator where the effect of X on Y becomes significant. These approaches make it straightforward to estimate and visualise moderation effects in R.

How is moderation different from mediation?

Although the terms sound similar, mediation and moderation answer very different research questions. Mediation focuses on how or why an effect occurs. A mediator sits in the causal chain between the predictor and the outcome, capturing the mechanism through which X influences Y. Mediation is most useful when theory or data suggest a strong relationship between the predictor of interest and the mediator, and when the goal is to uncover the underlying process leading to Y.

Moderation, in contrast, asks when, for whom, or under what conditions an effect is present. A moderator does not lie along the causal pathway between X and Y; instead, it alters the strength or direction of their relationship. Moderation is typically explored when results are inconsistent, unexpectedly weak, or vary across contexts. Importantly, predictors and moderators sit at the same causal level—neither causes the other—making moderation a tool for identifying conditional effects rather than mechanisms.

In short, mediation explains why an effect happens, while moderation shows when it happens.

Regression models in R

To explore moderation effects, we begin with a simple longitudinal regression model that predicts wave two (log)income using a set of baseline characteristics.

m1 <- lm(logincome_2 ~ age + degree.fct + single_1 + 
           urban.fct_1 + sf12mcs_1,
   data = usw)

summary(m1)
## 
## Call:
## lm(formula = logincome_2 ~ age + degree.fct + single_1 + urban.fct_1 + 
##     sf12mcs_1, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3363 -0.3662  0.2613  0.7536  2.9219 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          6.9715046  0.0423587 164.583   <2e-16 ***
## age                  0.0082467  0.0003794  21.739   <2e-16 ***
## degree.fctNo degree -0.6900677  0.0140985 -48.946   <2e-16 ***
## single_1            -0.1858810  0.0139242 -13.349   <2e-16 ***
## urban.fct_1Urban    -0.0120416  0.0161408  -0.746    0.456    
## sf12mcs_1            0.0008782  0.0006594   1.332    0.183    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.252 on 36085 degrees of freedom
##   (14916 observations deleted due to missingness)
## Multiple R-squared:  0.08089,    Adjusted R-squared:  0.08077 
## F-statistic: 635.2 on 5 and 36085 DF,  p-value: < 2.2e-16

The results indicate that older individuals report higher income at wave 2, and those without a degree earn substantially less, even after adjusting for other factors. Being single is also associated with lower income, while living in an urban area and baseline mental health show no meaningful association with income.

We can also write down the regression equation based on the estimated coefficients from the model:

log(income)2i = 6.97 + 0.008·agei – 0.69·degreeNoDegreei – 0.186·single1i – 0.012·urban1i + 0.001·sf12mcs1i

This type of notation can be useful when interpreting more complex models (as we will see below).

One thing to note about these results is that they are additive. We can say what the effect of a variable on the outcome is while holding the other predictors constant. For example, we can say that for each additional year of age, logged income at wave two increases by 0.008 units, controlling for education, partnership status, urban residence, and mental health. But these effects are assumed to be the same across levels of other predictors. For instance, the impact of age on income is assumed to be the same for both those who have a partner and those who are single.

Moderation analysis with two continuous variables

For now, we will use the strategy of including interaction terms between pairs of predictors to explore moderation effects. This allows us to see how the effect of one predictor varies with the level of another predictor. We will start by exploring the interaction between two continuous variables: age and mental health.

An interaction can be created by multiplying two variables. This interaction term is then included in the regression model alongside the main effects of the two original variables. By doing this, we can assess whether the relationship between a predictor and the outcome varies across levels of the other predictor.
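To see why multiplying the variables works, consider a small simulated example (not part of the survey data used here). When the outcome is generated with an interaction, the slope of x equals b1 + b3·m, and a regression that includes the product term recovers all four coefficients:

```r
# Simulated illustration: the true slope of x depends on m (slope = 0.5 + 0.2 * m)
set.seed(42)
n <- 1000
x <- rnorm(n)
m <- rnorm(n)
y <- 1 + 0.5 * x + 0.3 * m + 0.2 * x * m + rnorm(n)

# Including the product term lets lm() estimate how the slope of x varies with m
fit <- lm(y ~ x + m + x:m)
round(coef(fit), 2)  # estimates close to the true values 1, 0.5, 0.3, 0.2
```

The fitted coefficient on x:m tells us how much the slope of x changes for each one-unit increase in m, which is exactly the quantity moderation analysis targets.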

For example, if we want to find out whether the effect of age on logged income at wave 2 depends on mental health, we would create an interaction term between age and mental health. This variable captures how the slope of age changes across different levels of mental health. Including this interaction in the regression model allows us to test whether the association between age and income is stronger or weaker for individuals with better or worse mental health.

Here is how we can create the new variable:

usw <- mutate(usw, age_mcs = age * sf12mcs_1)

We can explore a few values of the new variable to see what it looks like, drawing a random sample of rows:

set.seed(1234)
sample_n(usw, 10) %>% 
  count(age, sf12mcs_1, age_mcs)

## # A tibble: 10 × 4
##      age sf12mcs_1 age_mcs     n
##    <dbl>     <dbl>   <dbl> <int>
##  1    16      54.2    867.     1
##  2    43      26.6   1146.     1
##  3    48      60.3   2895.     1
##  4    49      49.5   2424.     1
##  5    51      39.8   2031.     1
##  6    54      51.8   2796.     1
##  7    62      54.8   3396.     1
##  8    65      55.2   3585.     1
##  9    77      39.4   3036.     1
## 10    77      58.9   4536.     1

We see that the new variable is simply the product of age and mental health. We can now build a model that includes this interaction term alongside the main effects.

m2 <- lm(logincome_2 ~ age + sf12mcs_1 + age_mcs, 
         data = usw)

summary(m2)
## 
## Call:
## lm(formula = logincome_2 ~ age + sf12mcs_1 + age_mcs, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9524 -0.4163  0.2437  0.7898  2.5485 
## 
## Coefficients:
##               Estimate Std. Error t value            Pr(>|t|)    
## (Intercept) 6.51325967 0.10210298  63.791 <0.0000000000000002 ***
## age         0.00374905 0.00203224   1.845              0.0651 .  
## sf12mcs_1   0.00018642 0.00197429   0.094              0.9248    
## age_mcs     0.00007057 0.00003889   1.815              0.0696 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.298 on 36104 degrees of freedom
##   (14899 observations deleted due to missingness)
## Multiple R-squared:  0.01127,    Adjusted R-squared:  0.01119 
## F-statistic: 137.2 on 3 and 36104 DF,  p-value: < 0.00000000000000022

Alternatively, if we use the * operator between two variables in the regression formula, R automatically includes both main effects and their interaction term. This is a more concise way to specify the same model without creating a new variable.

m2 <- lm(logincome_2 ~ age*sf12mcs_1, 
         data = usw)

summary(m2)
## 
## Call:
## lm(formula = logincome_2 ~ age * sf12mcs_1, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9524 -0.4163  0.2437  0.7898  2.5485 
## 
## Coefficients:
##                 Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   6.51325967 0.10210298  63.791 <0.0000000000000002 ***
## age           0.00374905 0.00203224   1.845              0.0651 .  
## sf12mcs_1     0.00018642 0.00197429   0.094              0.9248    
## age:sf12mcs_1 0.00007057 0.00003889   1.815              0.0696 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.298 on 36104 degrees of freedom
##   (14899 observations deleted due to missingness)
## Multiple R-squared:  0.01127,    Adjusted R-squared:  0.01119 
## F-statistic: 137.2 on 3 and 36104 DF,  p-value: < 0.00000000000000022

The model includes an interaction between age and mental health, allowing the effect of age on income to vary by mental health level. The coefficient for age (0.0037) represents the effect of age when mental health is 0. The coefficient for “sf12mcs_1” (0.00019) represents the effect of mental health when age is 0. In this latter case, the main effect on its own is not meaningful as we don’t have any participants with age 0. The interaction term age_mcs (0.00007) indicates that the effect of age on income becomes slightly stronger as mental health increases. While small and only marginally significant, this suggests that older individuals with better mental health may experience somewhat higher income compared to those with lower mental health.

We can also write down the regression equation based on the estimated coefficients from the model:

log(income)2i = 6.513 + 0.0037·agei + 0.0002·sf12mcs1i + 0.00007·(agei × sf12mcs1i)

Based on this, let’s compute the predicted log income for two example individuals: someone who is 30 years old and has a mental health score of 40 (below the average of 50), and someone who is 60 years old and has a mental health score of 70.

ŷ30,40 = 6.513 + 0.0037·(30) + 0.0002·(40) + 0.00007·(30×40) ≈ 6.716

ŷ60,70 = 6.513 + 0.0037·(60) + 0.0002·(70) + 0.00007·(60×70) ≈ 7.043

The second individual (older, better mental health) has a higher predicted logged income. Most of the difference comes from the main effect of age, but the interaction term contributes as well. The pattern is consistent with the model: income increases with age, and this increase becomes marginally stronger when mental health is higher.

Instead of calculating predicted values by hand, we can use R. We can create a new data frame with hypothetical age and mental health values, and then use the predict() function to obtain the predicted log income for these scenarios.

mylist <- data.frame(age = seq(16, 100, by = 20),
                     sf12mcs_1 = seq(20, 80, by = 15))

mylist
##   age sf12mcs_1
## 1  16        20
## 2  36        35
## 3  56        50
## 4  76        65
## 5  96        80
cbind(mylist, predicted_mcs = predict(m2, mylist))
##   age sf12mcs_1 predicted_mcs
## 1  16        20      6.599556
## 2  36        35      6.743673
## 3  56        50      6.930134
## 4  76        65      7.158939
## 5  96        80      7.430089

We see, for example, that someone who is 16 and has 20 on the mental health scale has a predicted logged income of 6.59, while someone who is 96 and has 80 on the mental health scale has a predicted logged income of 7.43. This shows how both age and mental health contribute to higher predicted income.

To better understand how the interaction effect impacts the results, we can also visualise these patterns. We could do this by hand or use the emmeans package, which has some helpful functions. One of these allows us to create interaction plots that show how the effect of one predictor varies across levels of another variable. In this case, we can plot the predicted log income by age for several fixed values of mental health.

library(emmeans)
emmip(m2, sf12mcs_1 ~ age, at = mylist, CIs = TRUE)
moderation analysis for mental health and age on income

The upward-sloping lines indicate that income increases with age for all groups. The separation between the lines shows how this relationship varies by mental health: individuals with higher scores have slightly higher predicted income at every age. The lines also fan out gradually, indicating that the age–income relationship becomes slightly stronger at higher levels of mental health. Although the differences are modest, the pattern is consistent: better mental health is associated with somewhat higher income, and this gap widens as people get older.
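Because the interaction here is only marginally significant, it can be useful to identify the range of moderator values at which the age slope is itself significant. The Johnson–Neyman approach mentioned earlier does exactly this; a sketch, assuming the interactions package is installed and using the m2 model fitted above:

```r
# Sketch: Johnson-Neyman interval for the effect of age across mental health levels
# (assumes the interactions package is available and m2 includes age * sf12mcs_1)
library(interactions)
johnson_neyman(m2, pred = age, modx = sf12mcs_1)
```

The output reports the values of sf12mcs_1 for which the conditional slope of age is significant, along with a plot of the slope and its confidence band across the moderator's range.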

Moderation analysis with two categorical variables

So far, we have explored moderation effects between two continuous variables. Another common scenario is when both the predictor and the moderator are categorical variables. In this case, we can create interaction terms by combining the categories of the two variables. This allows us to see how the effect of one categorical variable varies across levels of another categorical variable.

As an example, we will explore how being single and having a degree moderate each other. We will create a single interaction variable combining the two categorical variables into four categories: “No degree, partnered”, “No degree, single”, “Degree, partnered”, and “Degree, single”.

usw <- usw %>%
  mutate(degree_single = case_when(
    degree.fct == "No degree" & single_1 == 0 ~ "No degree, partnered",
    degree.fct == "No degree" & single_1 == 1 ~ "No degree, single",
    degree.fct == "Degree" & single_1 == 0 ~ "Degree, partnered",
    degree.fct == "Degree" & single_1 == 1 ~ "Degree, single"
  ))

count(usw, degree.fct, single_1, degree_single)
## # A tibble: 6 × 4
##   degree.fct single_1  degree_single            n
##   <fct>      <dbl+lbl> <chr>                <int>
## 1 Degree     0 [No]    Degree, partnered    11156
## 2 Degree     1 [Yes]   Degree, single        5391
## 3 No degree  0 [No]    No degree, partnered 20074
## 4 No degree  1 [Yes]   No degree, single    14275
## 5 <NA>       0 [No]    <NA>                    67
## 6 <NA>       1 [Yes]   <NA>                    44

We can now include the new variable in the regression model to see how the effect of education on logged income at wave 2 varies by partnership status.

m3 <- lm(logincome_2 ~ degree_single - 1,
         data = usw)

summary(m3)
## 
## Call:
## lm(formula = logincome_2 ~ degree_single - 1, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0590 -0.3427  0.2862  0.7497  2.7749 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## degree_singleDegree, partnered     7.36160    0.01354   543.5   <2e-16 ***
## degree_singleDegree, single        7.21539    0.02066   349.3   <2e-16 ***
## degree_singleNo degree, partnered  6.76051    0.01024   659.9   <2e-16 ***
## degree_singleNo degree, single     6.43646    0.01263   509.6   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.272 on 38178 degrees of freedom
##   (12825 observations deleted due to missingness)
## Multiple R-squared:  0.9668, Adjusted R-squared:  0.9668 
## F-statistic: 2.782e+05 on 4 and 38178 DF,  p-value: < 2.2e-16

Note that here - 1 tells R to remove the intercept (the constant), so the model estimates a separate coefficient for each of the four groups.

We can interpret the regression coefficients as the expected log income for that group. Based on this, the highest income is for individuals with a degree who are partnered, while the lowest income is for those without a degree who are single. The other two groups fall in between, with individuals with a degree earning more regardless of their relationship status. The fact that the differences between groups are not constant suggests a moderation effect as well.

An alternative way to run this model is to use the * operator between the two categorical variables. This automatically includes both main effects and their interaction term in the regression model.

m4 <- lm(logincome_2 ~ degree.fct*single_1,
   data = usw)

summary(m4)
## 
## Call:
## lm(formula = logincome_2 ~ degree.fct * single_1, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0590 -0.3427  0.2862  0.7497  2.7749 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                   7.36160    0.01354 543.544  < 2e-16 ***
## degree.fctNo degree          -0.60109    0.01698 -35.396  < 2e-16 ***
## single_1                     -0.14621    0.02470  -5.919 3.26e-09 ***
## degree.fctNo degree:single_1 -0.17784    0.02957  -6.014 1.83e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.272 on 38178 degrees of freedom
##   (12825 observations deleted due to missingness)
## Multiple R-squared:  0.07029,    Adjusted R-squared:  0.07022 
## F-statistic: 962.1 on 3 and 38178 DF,  p-value: < 2.2e-16

The results show that both having a degree and being in a relationship are associated with higher logged income at wave 2. The interaction term between degree and partnership status is also significant, indicating that the effect of having a degree on income differs depending on whether the individual is single. Specifically, the income advantage of having a degree is larger for single individuals (0.60 + 0.18 = 0.78) than for partnered ones (0.60).

To see that this model is the same as the previous one and represents an alternative way to write it, we can look at the equation.

log(income)2i = 7.362 − 0.601·NoDegreei − 0.146·Singlei − 0.178·(NoDegreei × Singlei)

If we compute the expected score for someone who has a degree and is partnered (the reference category on both variables), we get the following value:

ŷdegree, not single = 7.362 − 0.601(0) − 0.146(0) − 0.178(0×0) = 7.362

On the other hand, if we look at someone who does not have a degree and is single, we get:

ŷno degree, single = 7.362 − 0.601(1) − 0.146(1) − 0.178(1×1) = 6.437

We can automate this process using the emmeans package to compute the estimated marginal means for each combination of degree and partnership status.

emmeans(m4, ~ degree.fct * single_1)
##  degree.fct single_1 emmean     SE    df lower.CL upper.CL
##  Degree            0  7.362 0.0135 38178    7.335    7.388
##  No degree         0  6.761 0.0102 38178    6.740    6.781
##  Degree            1  7.215 0.0207 38178    7.175    7.256
##  No degree         1  6.436 0.0126 38178    6.412    6.461
## 
## Confidence level used: 0.95

The results are the same as in our previous model, confirming that both approaches yield identical estimates for the interaction between education and partnership status. The decision between the two methods often comes down to personal preference. Including an interaction in the model yields a statistical test and an estimate of the moderation effect. That being said, it is also harder to interpret. A good approach is to use the interaction and then the emmeans() function to get the predicted values for easier interpretation.
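emmeans can also test which of these group means differ from each other. A sketch of pairwise comparisons for the model above (emmeans adjusts the p-values for multiple testing by default):

```r
# Pairwise differences between the four degree-by-partnership groups
pairs(emmeans(m4, ~ degree.fct * single_1))
```

Each row of the output is the estimated difference in logged income between two groups, with a standard error and an adjusted p-value, which makes it easy to report which specific contrasts drive the interaction.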

We can also visualise the interaction effect using emmip(). This plot shows how the predicted logged income at wave 2 varies across education levels for those who are single and those who are not.

emmip(m4, degree.fct ~ single_1, CIs = TRUE)
moderation analysis for being single and degree on income

We now see more clearly that for those without a degree, being single has a more negative effect.

Moderation analysis with categorical and continuous variables

When we want to investigate moderation effects between a continuous predictor and a categorical moderator, we can use the same strategy. For example, if we want to see whether there is a moderation effect between age and being single, we can create a new variable equal to the product of age and “single_1”.

usw <- mutate(usw, age_single = age * single_1)

set.seed(1234)
sample_n(usw, 10) %>% 
  count(age, single_1, age_single) 
## # A tibble: 10 × 4
##      age single_1  age_single     n
##    <dbl> <dbl+lbl>      <dbl> <int>
##  1    16 0 [No]             0     1
##  2    43 0 [No]             0     1
##  3    48 1 [Yes]           48     1
##  4    49 1 [Yes]           49     1
##  5    51 1 [Yes]           51     1
##  6    54 0 [No]             0     1
##  7    62 0 [No]             0     1
##  8    65 0 [No]             0     1
##  9    77 0 [No]             0     1
## 10    77 1 [Yes]           77     1

We can now run the model with the interaction and the main effects.

m5 <- lm(logincome_2 ~ age + single_1 + age_single,
         data = usw)

summary(m5)
## 
## Call:
## lm(formula = logincome_2 ~ age + single_1 + age_single, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3118 -0.4345  0.2299  0.7852  3.0118 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.3215468  0.0277520  263.82   <2e-16 ***
## age         -0.0070225  0.0005441  -12.91   <2e-16 ***
## single_1    -1.5096438  0.0370353  -40.76   <2e-16 ***
## age_single   0.0264041  0.0007458   35.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.283 on 38229 degrees of freedom
##   (12774 observations deleted due to missingness)
## Multiple R-squared:  0.05451,    Adjusted R-squared:  0.05444 
## F-statistic: 734.7 on 3 and 38229 DF,  p-value: < 2.2e-16

Again, this is equivalent to using the * operator between age and “single_1”, which automatically includes both main effects and their interaction term in the regression model.

m5 <- lm(logincome_2 ~ age * single_1,
         data = usw)

summary(m5)
## 
## Call:
## lm(formula = logincome_2 ~ age * single_1, data = usw)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3118 -0.4345  0.2299  0.7852  3.0118 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.3215468  0.0277520  263.82   <2e-16 ***
## age          -0.0070225  0.0005441  -12.91   <2e-16 ***
## single_1     -1.5096438  0.0370353  -40.76   <2e-16 ***
## age:single_1  0.0264041  0.0007458   35.40   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.283 on 38229 degrees of freedom
##   (12774 observations deleted due to missingness)
## Multiple R-squared:  0.05451,    Adjusted R-squared:  0.05444 
## F-statistic: 734.7 on 3 and 38229 DF,  p-value: < 2.2e-16

The model shows that the relationship between age and income differs substantially between single participants and those with a partner. For people with a partner, age has a negative effect on logged income (–0.007), indicating that income decreases slightly with age in this group. The main effect of being single is difficult to interpret, as it refers to the expected effect at age 0. The positive interaction term (+0.026) indicates that the effect of age is positive for those who are single.

To better understand this interaction, we can visualise it using an interaction plot.

emmip(m5,
      single_1 ~ age,
      at = list(age = seq(16, 100, by = 20),
                single_1 = c(0, 1)
                ),
      CIs = TRUE)
moderation analysis for being single and age on income

This shows that the effect of age on income for single and non-single individuals goes in opposite directions. For people who have a partner, predicted income starts higher but gradually declines as people get older. For single individuals, income begins noticeably lower but increases with age. The two lines cross around mid-life, meaning that while single individuals earn less when young, their income eventually surpasses that of non-single individuals later in life.

Using multiple models to investigate moderation

So far, we explored moderation effects using a single regression model that includes both the main effects and interaction terms. Another approach to investigate moderation is to run separate regression models for each level of the moderator variable. This allows us to see how the predictor’s effect on the outcome differs across groups defined by the moderator.

If we look at our previous example of degree and logged income, we can run two separate models: one for single individuals and another for those with a partner. In this way, we can see how the effect of degree on logged income differs between these two groups.

m4a <- lm(logincome_2 ~ degree.fct, 
         data = usw,
         subset = single_1 == 1) 

m4b <- lm(logincome_2 ~ degree.fct, 
         data = usw,
         subset = single_1 == 0) 

summary(m4a)
## 
## Call:
## lm(formula = logincome_2 ~ degree.fct, data = usw, subset = single_1 == 
##     1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9128 -0.2623  0.3627  0.7996  2.7749 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.21539    0.02218  325.25   <2e-16 ***
## degree.fctNo degree -0.77893    0.02600  -29.96   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.366 on 13936 degrees of freedom
##   (5772 observations deleted due to missingness)
## Multiple R-squared:  0.06049,    Adjusted R-squared:  0.06043 
## F-statistic: 897.3 on 1 and 13936 DF,  p-value: < 2.2e-16
summary(m4b)
## 
## Call:
## lm(formula = logincome_2 ~ degree.fct, data = usw, subset = single_1 == 
##     0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.0590 -0.3814  0.2443  0.7149  2.4508 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          7.36160    0.01293  569.22   <2e-16 ***
## degree.fctNo degree -0.60109    0.01622  -37.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.215 on 24242 degrees of freedom
##   (7053 observations deleted due to missingness)
## Multiple R-squared:  0.05364,    Adjusted R-squared:  0.0536 
## F-statistic:  1374 on 1 and 24242 DF,  p-value: < 2.2e-16

We see that having a degree has a larger effect on logged income for single individuals (a gap of 0.78) than for individuals with a partner (0.60). This suggests that the income advantage of having a degree is more pronounced for single people.

We could use a similar approach for the age and single interaction by running separate models.

m5a <- lm(logincome_2 ~ age,
         data = usw,
         subset = single_1 == 0)

summary(m5a)
## 
## Call:
## lm(formula = logincome_2 ~ age, data = usw, subset = single_1 == 
##     0)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9066 -0.4061  0.2222  0.7332  2.4165 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  7.3215468  0.0269068  272.11   <2e-16 ***
## age         -0.0070225  0.0005275  -13.31   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.244 on 24270 degrees of freedom
##   (7025 observations deleted due to missingness)
## Multiple R-squared:  0.007249,   Adjusted R-squared:  0.007208 
## F-statistic: 177.2 on 1 and 24270 DF,  p-value: < 2.2e-16
m5b <- lm(logincome_2 ~ age,
         data = usw,
         subset = single_1 == 1)

summary(m5b)
## 
## Call:
## lm(formula = logincome_2 ~ age, data = usw, subset = single_1 == 
##     1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.3118 -0.4796  0.2452  0.8814  3.0118 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 5.811903   0.025771  225.52   <2e-16 ***
## age         0.019382   0.000536   36.16   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.348 on 13959 degrees of freedom
##   (5749 observations deleted due to missingness)
## Multiple R-squared:  0.08564,    Adjusted R-squared:  0.08558 
## F-statistic:  1307 on 1 and 13959 DF,  p-value: < 2.2e-16

Now it is easier to see that age has very different effects on income for the two groups. For those with a partner, age has a small negative effect on logged income (–0.007), while for single individuals, age has a positive impact (+0.019). This confirms our earlier finding that the relationship between age and income differs substantially based on partnership status.

When we have a continuous moderator, we can also use a similar approach by splitting the data into groups based on the moderator’s levels. For example, we can create groups based on low, medium, and high levels of mental health, and then run separate regression models for each group to examine how age affects logged income.
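As a sketch of this idea (the cut points and group labels below are illustrative choices, not part of the original analysis), we could split mental health into tertiles and fit an age model within each group:

```r
# Split the continuous moderator into tertiles (illustrative labels)
usw <- mutate(usw,
              mcs_group = cut(sf12mcs_1,
                              breaks = quantile(sf12mcs_1,
                                                probs = c(0, 1/3, 2/3, 1),
                                                na.rm = TRUE),
                              labels = c("Low", "Medium", "High"),
                              include.lowest = TRUE))

# Fit a separate age model within each mental health group
models <- lapply(split(usw, usw$mcs_group),
                 function(d) lm(logincome_2 ~ age, data = d))

# Compare the age slopes across groups
sapply(models, coef)
```

Comparing the age coefficients across the three groups gives a rough picture of the moderation, at the cost of discarding information by categorising a continuous variable.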

The advantage of this approach is that it makes group differences easier to see and allows every model parameter, not just the coefficients but also the residual variance, to differ across groups. However, it also has limitations. Each model is fitted to a smaller subsample, which reduces statistical power, and comparing results across several models is more cumbersome than inspecting a single model with interaction terms.

Conclusions regarding moderation analysis

Moderation analysis allows us to move beyond simple average effects and examine how relationships vary across people, contexts, and time—an essential step in understanding heterogeneity in data. Regressions that include interactions, combined with tools such as marginal-effect plots, can make these patterns clearer, while alternative approaches, such as running group-specific models, can offer greater flexibility.

