lavaan
The shoulder pain and disability index (SPADI) consists of a five-item pain subscale and an eight-item disability subscale
Roach, K. E., Budiman-Mak, E., Songsiridej, N., & Lertratanakul, Y. (1991). Development of a shoulder pain and disability index. Arthritis Care and Research: The Official Journal of the Arthritis Health Professions Association, 4(4), 143–149.
Pain items
P1 At its worst
P2 Lying on affected side
P3 Reaching for object on a high shelf
P4 Touching the back of your neck
P5 Pushing with involved arm
Disability items
D1 Washing your hair
D2 Washing your back
D3 Putting on undershirt or jumper
D4 Putting on a shirt that buttons down the front
D5 Putting on your pants
D6 Placing an object on a high shelf
D7 Carry heavy object
D8 Removing something from your back pocket
A recent validation study identified strengths and limitations not previously observed using CTT methods and concluded that the SPADI should be treated as two separate subscales and that, while the pain subscale is valid, the disability subscale is not and that clinicians should exercise caution when interpreting score changes on the disability subscale and attempt to compare their scores to age- and sex-stratified data.
Jerosch-Herold, C., Chester, R., Shepstone, L., Vincent, J. I., & MacDermid, J. C. (2017). An evaluation of the structural validity of the shoulder pain and disability index (SPADI) using the Rasch model. Quality of Life Research, 27(2), 389–400.
Data from a Danish validation study are published in a data repository
Christensen, K. B. (2018). SPADI [Data set]. https://doi.org/10.17894/ucph.9c28c17b-83f4-411d-9115-d457c46828a1
We download the data set from the home page
SPADI <- read.csv(url("https://erda.ku.dk/public/archives/bacc560d26b01f7a65e77a9712a92e86/SPADI.csv"))
We will look only at the eight disability items in what follows and we omit observations with missing values and omit observations with missing values.
small <- SPADI[,9:16]
NM <- small[complete.cases(small),]
We use the cfa()
from the lavaan
package to
fit a CFA model to the SPADI data.
library('lavaan')
## Warning: pakke 'lavaan' blev bygget under R version 4.1.2
SPADI.model1 <- '
# loadings
F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
'
fit1 <- cfa(
SPADI.model1,
data=NM,
ordered = names(NM)
)
fitMeasures(fit1, fit.measures = c("chisq.scaled",
"df.scaled",
"pvalue.scaled",
"rmsea.scaled",
"rmsea.ci.lower.scaled",
"rmsea.ci.upper.scaled")
)
## chisq.scaled df.scaled pvalue.scaled
## 68.157 20.000 0.000
## rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled
## 0.106 0.079 0.134
We observe that \(p \ll 0.05\) and therefore, we reject the null hypothesis that the proposed model fits as well as the unrestricted model: Discrepancy between model and data is significant.
We compute and compare the modification indices which measure how much better the model-fit would become if each fixed parameter was set free to be estimated.
MI <- modificationindices(fit1)
MI
## lhs op rhs mi epc sepc.lv sepc.all sepc.nox
## 75 D1 ~~ D2 4.074 0.081 0.081 0.268 0.268
## 76 D1 ~~ D3 0.292 -0.022 -0.022 -0.079 -0.079
## 77 D1 ~~ D4 2.748 -0.079 -0.079 -0.240 -0.240
## 78 D1 ~~ D5 0.403 -0.029 -0.029 -0.096 -0.096
## 79 D1 ~~ D6 0.352 0.025 0.025 0.080 0.080
## 80 D1 ~~ D7 1.897 0.069 0.069 0.169 0.169
## 81 D1 ~~ D8 1.304 -0.052 -0.052 -0.176 -0.176
## 82 D2 ~~ D3 0.024 0.006 0.006 0.023 0.023
## 83 D2 ~~ D4 3.571 -0.104 -0.104 -0.312 -0.312
## 84 D2 ~~ D5 4.295 -0.118 -0.118 -0.378 -0.378
## 85 D2 ~~ D6 0.107 0.014 0.014 0.044 0.044
## 86 D2 ~~ D7 0.086 -0.015 -0.015 -0.037 -0.037
## 87 D2 ~~ D8 0.246 0.021 0.021 0.069 0.069
## 88 D3 ~~ D4 0.070 0.011 0.011 0.037 0.037
## 89 D3 ~~ D5 0.205 -0.020 -0.020 -0.071 -0.071
## 90 D3 ~~ D6 1.751 0.054 0.054 0.184 0.184
## 91 D3 ~~ D7 2.152 -0.079 -0.079 -0.210 -0.210
## 92 D3 ~~ D8 0.029 0.007 0.007 0.025 0.025
## 93 D4 ~~ D5 13.436 0.148 0.148 0.438 0.438
## 94 D4 ~~ D6 1.777 -0.077 -0.077 -0.221 -0.221
## 95 D4 ~~ D7 0.782 -0.054 -0.054 -0.121 -0.121
## 96 D4 ~~ D8 0.107 0.015 0.015 0.047 0.047
## 97 D5 ~~ D6 1.663 -0.072 -0.072 -0.222 -0.222
## 98 D5 ~~ D7 0.010 -0.006 -0.006 -0.015 -0.015
## 99 D5 ~~ D8 0.015 -0.006 -0.006 -0.018 -0.018
## 100 D6 ~~ D7 0.006 0.004 0.004 0.009 0.009
## 101 D6 ~~ D8 0.283 -0.025 -0.025 -0.080 -0.080
## 102 D7 ~~ D8 1.054 0.053 0.053 0.131 0.131
We observe that the pair D4
(Putting on a shirt that
buttons down the front) and D5
(Putting on your pants) has
the highest MI. Since it makes sense that these two items have
correlated error terms (local dependence) the fit of the model can be
improved by adding the correlated error terms.
SPADI.model2 <- '
# loadings
F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
# correlated error terms (local dependence)
D4 ~~ D5
'
fit2 <- cfa(
SPADI.model2,
data=NM,
ordered = names(NM)
)
fitMeasures(fit2, fit.measures = c("chisq.scaled",
"df.scaled",
"pvalue.scaled",
"rmsea.scaled",
"rmsea.ci.lower.scaled",
"rmsea.ci.upper.scaled")
)
## chisq.scaled df.scaled pvalue.scaled
## 41.032 19.000 0.002
## rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled
## 0.073 0.042 0.104
We observe that the fit of model is improved by adding the correlated error terms, but discrepancy between model and data is significant as \(p < 0.05\).
We can add more correlated error terms corresponding to the pairs with the highest MIs.
SPADI.model3 <- '
# loadings
F =~ D1 + D2 + D3 + D4 + D5 + D6 + D7 + D8
# correlated error terms (local dependence)
D4 ~~ D5
D2 ~~ D5
D1 ~~ D2
D2 ~~ D4
D1 ~~ D4
'
fit3 <- cfa(
SPADI.model3,
data=NM,
ordered = names(NM)
)
fitMeasures(fit3, fit.measures = c("chisq.scaled",
"df.scaled",
"pvalue.scaled",
"rmsea.scaled",
"rmsea.ci.lower.scaled",
"rmsea.ci.upper.scaled")
)
## chisq.scaled df.scaled pvalue.scaled
## 24.634 15.000 0.055
## rmsea.scaled rmsea.ci.lower.scaled rmsea.ci.upper.scaled
## 0.055 0.000 0.092
Here, we have obtained model fit with \(p > 0.05\), but the model is difficult to understand.
Since we assume that the ordinal responses are generated by an underlying normal distribution the question arising is: What are the cut points, i.e., thresholds \(\tau_1,\ldots,\tau_5\), for the disability degree for a given item (D1\(,\ldots,\)D8), above which an individual scores \(1,\ldots,5\) on this item, respectively.
We can extract the estimated thresholds using the
lavInspect()
function from the lavaan
package.
tau <- lavInspect(fit2, "sampstat")$th
We can illustrate this setup for the estimated thresholds for each item as follows: `
x <- seq(-4, 4, length=100)
hx <- dnorm(x)
for(j in 1:8) {
plot(x, hx, type = "n", xlab = "x", ylab = "",
main = paste0("D", j), yaxt = "n", xaxt = "n", bty = "n")
lines(x, hx)
for(i in grep(paste0("D", j), names(tau))) {
segments(x0 = tau[i], y0 = -.2, x1 = tau[i], y1 = dnorm(tau[i]))
}
axis(c(-4,as.expression(sapply(1:5, function(x) bquote(tau["D"*.(x)]))),4),
side = 1, at = c(-4, tau[grep(paste0("D", j), names(tau))], 4))
}
We can extract the estimated factor loadings (\(\lambda\)) using the inspect()
function from the lavaan
package.
lambda <- inspect(fit2, what = "std")$lambda
We can visualize the CFA path diagram with standardized estimates:
We observe that the factor loading for item 7 is somewhat small (0.674) which is indication of a bad item fit meaning that the item is perhaps not contributing in measuring the construct it self.