If the packages are not already installed on your computer, install them with the install.packages() function.
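A one-off setup step could look like this (the package names are taken from the library() calls below; eRm is included since we use it later):
install.packages(c("iarm", "eRm", "corrplot", "readr", "DT"))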
Once you have done this, load the libraries:
library(iarm)
library(eRm)
library(corrplot)
library(readr)
library(DT)
The Hospital Anxiety and Depression Scale (HADS) was developed by
Zigmond & Snaith for the measurement of anxiety and depression. It is
a self-administered scale consisting of 14 polytomous items scored as
two 7-item subscales for anxiety and depression. Originally developed
for use in a hospital setting, it is now widely used across all
settings, including screening in normal populations. We read in the data
using the read_csv()
function from the readr
package.
myfile <- "https://raw.githubusercontent.com/ERRTG/ERRTG.github.io/master/HADS.csv"
HADS <- as.data.frame(read_csv(myfile))
We create a new data frame HADS.items
containing the
items.
HADS.items <- HADS[, c("AHADS1", "AHADS2", "AHADS3", "AHADS4", "AHADS5", "AHADS6",
"AHADS7", "AHADS8", "AHADS9", "AHADS10", "AHADS11", "AHADS12", "AHADS13", "AHADS14")]
dim(HADS)
[1] 296 14
colnames(HADS)
[1] "AHADS1" "AHADS2" "AHADS3" "AHADS4" "AHADS5" "AHADS6" "AHADS7"
[8] "AHADS8" "AHADS9" "AHADS10" "AHADS11" "AHADS12" "AHADS13" "AHADS14"
Unidimensionality is a fundamental assumption in all item response
theory (IRT) models. We look at how it can be evaluated in the Rasch
model. Since we have polytomous items we use the partial credit model
(PCM) to calculate the residuals. First, we use the PCM()
function from the eRm
package to compute the parameter
estimates.
HADS_PCM <- PCM(HADS.items)
Then, we use the person.parameter()
function to estimate
the person parameters.
HADS_PP <- person.parameter(HADS_PCM)
Finally, we use the residuals()
function to extract the
residuals.
HADS_res <- residuals(HADS_PP)
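The result should be a matrix of standardised residuals with one row per person (persons with extreme raw scores may be dropped) and one column per item. A quick check of its dimensions:
dim(HADS_res)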
We look at the empirical correlation matrix of the residuals. The
method we use for detecting the structure is principal component
analysis (PCA). We use the cor()
function to compute the
correlation matrix.
HADS_cor <- cor(HADS_res, use = "pairwise.complete.obs", method = "pearson")
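The corrplot package we loaded earlier gives a quick visual impression of this matrix; a minimal sketch, with items grouped by hierarchical clustering:
corrplot(HADS_cor, order = "hclust", tl.cex = 0.8)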
We perform principal component analysis on the matrix using the
prcomp()
function. We ask the function to standardise the
data for us (scale. = TRUE
) so that the Kaiser-Guttman criterion applied below can be read off the resulting eigenvalues.
HADS_PCA <- prcomp(HADS_cor, center = TRUE, scale. = TRUE)
A common method for determining the number of principal components
(PCs) to be retained is a graphical representation known as a “scree
plot”. A scree plot shows the eigenvalues for each individual component
on the y-axis and the number of PCs on the x-axis. A scree plot always
displays a downward curve and the “elbow” of the graph where the curve
seems to level off indicates the components to be retained as
significant (to the left of the elbow), as these explain much of the
overall variability. Since we standardised our data, we can use the
eigenvalues of the PCs to draw this boundary for us: an eigenvalue
\(<1\) means that the component explains less variance than a single
standardised variable, so such components should be discarded. This is
called the Kaiser-Guttman criterion. If our data are well suited for
PCA, we should be able to discard these components while still
retaining at least 70–80% of the cumulative variance. We
use the screeplot()
function and add a line corresponding
to an eigenvalue of 1 to indicate the Kaiser-Guttman criterion.
screeplot(HADS_PCA, type = "lines", main = "Screeplot")
abline(h = 1, lty = 2, col = 3)
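The eigenvalues themselves are the squared standard deviations stored in the prcomp object, so we can also apply the criterion numerically:
HADS_eigen <- HADS_PCA$sdev^2  # eigenvalues of the components
sum(HADS_eigen > 1)  # number of components retained by Kaiser-Guttman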
We notice that the first four components have an eigenvalue \(>1\). We can illustrate the variance explained by these four components as follows.
cumpro <- cumsum(HADS_PCA$sdev^2/sum(HADS_PCA$sdev^2))
plot(cumpro, xlab = "PC #", ylab = "Amount of explained variance", main = "Cumulative variance plot",
bty = "n")
abline(v = 4, col = 1, lty = 5)
segments(x0 = 0, x1 = 4, y0 = cumpro[4], y1 = cumpro[4], col = 3, lty = 5)
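To quantify this, we can print the cumulative proportion of variance explained by the first four components:
round(cumpro[4], 3)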
We observe that the first 4 components explain less than 70% of the variance. This is not optimal, as we “lose” more than 30% of the explained variance if we reduce the dimensionality from 14 to 4. The Kaiser-Guttman criterion often overestimates the number of dimensions. An alternative is Martin-Löf’s likelihood ratio test, described below.
We look at the loadings ordered according to the first column to see the positive and negative loadings.
HADS_Comp <- round(HADS_PCA$rotation, 3)
HADS_Comp[order(HADS_Comp[, 1]), ]
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
AHADS13 -0.414 0.067 0.046 0.049 0.123 -0.068 -0.029 0.081 0.012 -0.436
AHADS9 -0.400 -0.005 0.209 0.073 0.070 0.044 0.051 0.282 -0.074 -0.178
AHADS3 -0.361 0.101 0.223 -0.015 0.218 -0.354 -0.255 -0.223 -0.059 0.205
AHADS5 -0.226 0.090 -0.255 0.356 -0.526 -0.198 0.522 -0.055 0.144 0.254
AHADS11 -0.187 -0.416 -0.075 -0.286 0.291 0.479 0.179 -0.007 0.234 0.425
AHADS1 -0.116 -0.007 -0.540 -0.065 -0.352 0.256 -0.536 0.113 -0.346 0.076
AHADS6 0.093 0.480 -0.367 -0.023 0.265 0.073 -0.061 -0.127 0.173 -0.218
AHADS10 0.136 0.219 0.461 0.049 -0.409 0.413 -0.254 -0.244 0.434 -0.070
AHADS14 0.151 0.031 0.177 -0.729 -0.315 -0.343 0.080 -0.019 -0.194 0.045
AHADS4 0.206 0.458 -0.049 -0.103 0.228 0.256 0.415 -0.158 -0.328 0.029
AHADS8 0.226 -0.390 0.080 0.357 0.062 -0.026 -0.048 -0.575 -0.396 -0.071
AHADS7 0.255 -0.291 -0.355 -0.118 0.084 -0.310 -0.018 -0.096 0.489 -0.294
AHADS12 0.326 0.196 0.071 0.265 0.210 -0.255 -0.264 0.334 0.104 0.519
AHADS2 0.353 -0.200 0.153 0.150 -0.057 0.114 0.143 0.543 -0.162 -0.262
PC11 PC12 PC13 PC14
AHADS13 -0.048 -0.266 -0.675 0.272
AHADS9 -0.078 -0.366 0.675 0.263
AHADS3 -0.064 0.600 0.071 0.326
AHADS5 0.090 0.033 -0.028 0.251
AHADS11 0.149 -0.026 -0.107 0.298
AHADS1 -0.160 0.026 0.010 0.222
AHADS6 0.621 0.016 0.165 0.198
AHADS10 -0.085 -0.004 -0.005 0.239
AHADS14 0.211 -0.205 -0.024 0.251
AHADS4 -0.506 0.027 -0.017 0.234
AHADS8 0.178 -0.238 0.031 0.272
AHADS7 -0.416 0.011 0.153 0.274
AHADS12 -0.047 -0.361 -0.130 0.244
AHADS2 0.177 0.449 -0.057 0.356
It looks like the odd item numbers all load negatively, except for
number 7, while the even item numbers all load positively. We create
a vector (splitvec
) of allocations of items to the two
groups.
splitvec <- rep(1, 14)
splitvec[c(1, 3, 5, 9, 11, 13)] <- 0
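As a sanity check, we can line up the allocation with the sign of the first-component loadings; items in group 0 should be the ones loading negatively:
data.frame(item = colnames(HADS.items), group = splitvec, PC1 = round(HADS_PCA$rotation[, 1], 3))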
We plot the first and second component against each other. We add labels to identify the item loading locations and colour the items according to the two groups.
plot(HADS_PCA$rotation[, 1], HADS_PCA$rotation[, 2], pch = 19, bty = "n", col = ifelse(splitvec ==
0, 2, 3), asp = 1, xlim = range(HADS_PCA$rotation[, 1]), ylim = range(HADS_PCA$rotation[,
2]) * 1.3, xlab = "1st component", ylab = "2nd component", main = "HADS Item PCA-Loading")
labs <- gsub("AHADS", "", rownames(HADS_PCA$rotation))
text(HADS_PCA$rotation[, 1], HADS_PCA$rotation[, 2], labs, pos = 1)
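We can optionally add a legend matching the colours chosen above (group 0 in red, group 1 in green under the default palette):
legend("topright", legend = c("group 0", "group 1"), col = c(2, 3), pch = 19, bty = "n")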
We assume that there are at least two dimensions and test for
unidimensionality using Martin-Löf’s likelihood ratio test via the
MLoef()
function from the eRm
package.
MLoef(HADS_PCM, splitcr = splitvec)
Martin-Loef-Test (split criterion: user-defined)
LR-value: 425.622
Chi-square df: 431
p-value: 0.564
The null hypothesis is that the underlying latent variable is unidimensional, and since \(P>0.05\) this null hypothesis is not rejected. We conclude that the 14 items all measure the same unidimensional latent variable. However, the Martin-Löf test does not have a lot of statistical power, so this may be a type II error.