CFA example using SPADI data

Data example: CFA for the SPADI data in `lavaan`

The shoulder pain and disability index (SPADI) consists of a five-item pain subscale and an eight-item disability subscale

Roach, K. E., Budiman-Mak, E., Songsiridej, N., & Lertratanakul, Y. (1991). Development of a shoulder pain and disability index. Arthritis Care and Research: The Official Journal of the Arthritis Health Professions Association, 4(4), 143–149.

Pain items

P1 At its worst
P2 Lying on affected side
P3 Reaching for object on a high shelf
P4 Touching the back of your neck
P5 Pushing with involved arm

Disability items

D1 Washing your hair
D2 Washing your back
D3 Putting on undershirt or jumper
D4 Putting on a shirt that buttons down the front
D5 Putting on your pants
D6 Placing an object on a high shelf
D7 Carry heavy object
D8 Removing something from your back pocket

A recent validation study identified strengths and limitations not previously observed using CTT methods and concluded that the SPADI should be treated as two separate subscales and that, while the pain subscale is valid, the disability subscale is not and that clinicians should exercise caution when interpreting score changes on the disability subscale and attempt to compare their scores to age- and sex-stratified data.

Jerosch-Herold, C., Chester, R., Shepstone, L., Vincent, J. I., & MacDermid, J. C. (2017). An evaluation of the structural validity of the shoulder pain and disability index (SPADI) using the Rasch model. Quality of Life Research, 27(2), 389–400.

Data from a Danish validation study are published in a data repository

Christensen, K. B. (2018). SPADI [Data set]. https://doi.org/10.17894/ucph.9c28c17b-83f4-411d-9115-d457c46828a1

We download the data set from the home page

SPADI <- read.csv(url("https://erda.ku.dk/public/archives/bacc560d26b01f7a65e77a9712a92e86/SPADI.csv"))

We will look only at the eight disability items in what follows and we omit observations with missing values and omit observations with missing values.

small <- SPADI[,9:16]
NM <- small[complete.cases(small),]

We use the cfa() from the lavaan package to fit a CFA model to the SPADI data.

library('lavaan')

## Warning: pakke 'lavaan' blev bygget under R version 4.1.2

SPADI.model1 <- '
  # loadings
  F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
'

fit1 <- cfa(
  SPADI.model1, 
  data=NM,
  ordered = names(NM)
)

fitMeasures(fit1, fit.measures = c("chisq.scaled",
                                  "df.scaled",
                                  "pvalue.scaled",
                                  "rmsea.scaled",
                                  "rmsea.ci.lower.scaled",
                                  "rmsea.ci.upper.scaled")
)

##          chisq.scaled             df.scaled         pvalue.scaled 
##                68.157                20.000                 0.000 
##          rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled 
##                 0.106                 0.079                 0.134

We observe that \(p \ll 0.05\) and therefore, we reject the null hypothesis that the proposed model fits as well as the unrestricted model: Discrepancy between model and data is significant.

We compute and compare the modification indices which measure how much better the model-fit would become if each fixed parameter was set free to be estimated.

MI <- modificationindices(fit1)
MI

##     lhs op rhs     mi    epc sepc.lv sepc.all sepc.nox
## 75   D1 ~~  D2  4.074  0.081   0.081    0.268    0.268
## 76   D1 ~~  D3  0.292 -0.022  -0.022   -0.079   -0.079
## 77   D1 ~~  D4  2.748 -0.079  -0.079   -0.240   -0.240
## 78   D1 ~~  D5  0.403 -0.029  -0.029   -0.096   -0.096
## 79   D1 ~~  D6  0.352  0.025   0.025    0.080    0.080
## 80   D1 ~~  D7  1.897  0.069   0.069    0.169    0.169
## 81   D1 ~~  D8  1.304 -0.052  -0.052   -0.176   -0.176
## 82   D2 ~~  D3  0.024  0.006   0.006    0.023    0.023
## 83   D2 ~~  D4  3.571 -0.104  -0.104   -0.312   -0.312
## 84   D2 ~~  D5  4.295 -0.118  -0.118   -0.378   -0.378
## 85   D2 ~~  D6  0.107  0.014   0.014    0.044    0.044
## 86   D2 ~~  D7  0.086 -0.015  -0.015   -0.037   -0.037
## 87   D2 ~~  D8  0.246  0.021   0.021    0.069    0.069
## 88   D3 ~~  D4  0.070  0.011   0.011    0.037    0.037
## 89   D3 ~~  D5  0.205 -0.020  -0.020   -0.071   -0.071
## 90   D3 ~~  D6  1.751  0.054   0.054    0.184    0.184
## 91   D3 ~~  D7  2.152 -0.079  -0.079   -0.210   -0.210
## 92   D3 ~~  D8  0.029  0.007   0.007    0.025    0.025
## 93   D4 ~~  D5 13.436  0.148   0.148    0.438    0.438
## 94   D4 ~~  D6  1.777 -0.077  -0.077   -0.221   -0.221
## 95   D4 ~~  D7  0.782 -0.054  -0.054   -0.121   -0.121
## 96   D4 ~~  D8  0.107  0.015   0.015    0.047    0.047
## 97   D5 ~~  D6  1.663 -0.072  -0.072   -0.222   -0.222
## 98   D5 ~~  D7  0.010 -0.006  -0.006   -0.015   -0.015
## 99   D5 ~~  D8  0.015 -0.006  -0.006   -0.018   -0.018
## 100  D6 ~~  D7  0.006  0.004   0.004    0.009    0.009
## 101  D6 ~~  D8  0.283 -0.025  -0.025   -0.080   -0.080
## 102  D7 ~~  D8  1.054  0.053   0.053    0.131    0.131

We observe that the pair D4 (Putting on a shirt that buttons down the front) and D5 (Putting on your pants) has the highest MI. Since it makes sense that these two items have correlated error terms (local dependence) the fit of the model can be improved by adding the correlated error terms.

SPADI.model2 <- '
  # loadings
  F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
  # correlated error terms (local dependence)
  D4 ~~ D5
'

fit2 <- cfa(
  SPADI.model2, 
  data=NM,
  ordered = names(NM)
)

fitMeasures(fit2, fit.measures = c("chisq.scaled",
                                  "df.scaled",
                                  "pvalue.scaled",
                                  "rmsea.scaled",
                                  "rmsea.ci.lower.scaled",
                                  "rmsea.ci.upper.scaled")
)

##          chisq.scaled             df.scaled         pvalue.scaled 
##                41.032                19.000                 0.002 
##          rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled 
##                 0.073                 0.042                 0.104

We observe that the fit of model is improved by adding the correlated error terms, but discrepancy between model and data is significant as \(p < 0.05\).

We can add more correlated error terms corresponding to the pairs with the highest MIs.

SPADI.model3 <- '
  # loadings
  F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
  # correlated error terms (local dependence)
  D4 ~~ D5
  D2 ~~ D5
  D1 ~~ D2
  D2 ~~ D4
  D1 ~~ D4
'

fit3 <- cfa(
  SPADI.model3, 
  data=NM,
  ordered = names(NM)
)

fitMeasures(fit3, fit.measures = c("chisq.scaled",
                                  "df.scaled",
                                  "pvalue.scaled",
                                  "rmsea.scaled",
                                  "rmsea.ci.lower.scaled",
                                  "rmsea.ci.upper.scaled")
)

##          chisq.scaled             df.scaled         pvalue.scaled 
##                24.634                15.000                 0.055 
##          rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled 
##                 0.055                 0.000                 0.092

Here, we have obtained model fit with \(p > 0.05\), but the model is difficult to understand.

Thresholds

Since we assume that the ordinal responses are generated by an underlying normal distribution the question arising is: What are the cut points, i.e., thresholds \(\tau_1,\ldots,\tau_5\), for the disability degree for a given item (D1\(,\ldots,\)D8), above which an individual scores \(1,\ldots,5\) on this item, respectively.

We can extract the estimated thresholds using the lavInspect() function from the lavaan package.

tau <- lavInspect(fit2, "sampstat")$th

We can illustrate this setup for the estimated thresholds for each item as follows: `

x <- seq(-4, 4, length=100)
hx <- dnorm(x)

for(j in 1:8) {
  plot(x, hx, type = "n", xlab = "x", ylab = "",
  main = paste0("D", j), yaxt = "n", xaxt = "n", bty = "n")

  lines(x, hx)
  for(i in grep(paste0("D", j), names(tau))) {
    segments(x0 = tau[i], y0 = -.2, x1 = tau[i], y1 = dnorm(tau[i]))
  }
  axis(c(-4,as.expression(sapply(1:5, function(x) bquote(tau["D"*.(x)]))),4), 
       side = 1, at = c(-4, tau[grep(paste0("D", j), names(tau))], 4))
}

Factor loadings

We can extract the estimated factor loadings (\(\lambda\)) using the inspect() function from the lavaan package.

lambda <- inspect(fit2, what = "std")$lambda

We can visualize the CFA path diagram with standardized estimates:

We observe that the factor loading for item 7 is somewhat small (0.674) which is indication of a bad item fit meaning that the item is perhaps not contributing in measuring the construct it self.

CFA example using SPADI data

Ann-Sophie Buchardt

11/5/2022

Data example: CFA for the SPADI data in `lavaan`

Thresholds

Factor loadings

CFA example using SPADI data

Ann-Sophie Buchardt

11/5/2022

Data example: CFA for the SPADI data in lavaan

Thresholds

Factor loadings

Data example: CFA for the SPADI data in `lavaan`