If the packages are not already installed on your computer, install them with the install.packages() function.
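A one-off setup step could look like this (the package names are taken from the library() calls below; eRm is included since we use it later):
install.packages(c("iarm", "eRm", "corrplot", "readr", "DT"))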
Once you have done this, load the libraries:
library(iarm)
library(eRm)
library(corrplot)
library(readr)
library(DT)
The Hospital Anxiety and Depression Scale (HADS) was developed by
Zigmond & Snaith for the measurement of anxiety and depression. It is
a self-administered scale consisting of 14 polytomous items scored as
two 7-item subscales for anxiety and depression. Originally developed
for use in a hospital setting, it is now widely used across all
settings, including screening in normal populations. We read in the data
using the read_csv()
function from the readr
package.
myfile <- "https://raw.githubusercontent.com/ERRTG/ERRTG.github.io/master/HADS.csv"
HADS <- as.data.frame(read_csv(myfile))
We create a new data frame HADS.items
containing the
items.
HADS.items <- HADS[, c("AHADS1", "AHADS2", "AHADS3", "AHADS4", "AHADS5", "AHADS6",
"AHADS7", "AHADS8", "AHADS9", "AHADS10", "AHADS11", "AHADS12", "AHADS13", "AHADS14")]
dim(HADS)
[1] 296 14
colnames(HADS)
[1] "AHADS1" "AHADS2" "AHADS3" "AHADS4" "AHADS5" "AHADS6" "AHADS7"
[8] "AHADS8" "AHADS9" "AHADS10" "AHADS11" "AHADS12" "AHADS13" "AHADS14"
Unidimensionality is a fundamental assumption in all item response
theory (IRT) models. We look at how it can be evaluated in the Rasch
model. Since we have polytomous items we use the partial credit model
(PCM) to calculate the residuals. First, we use the PCM()
function from the eRm
package to compute the parameter
estimates.
HADS_PCM <- PCM(HADS.items)
Then, we use the person.parameter()
function to estimate
the person parameters.
HADS_PP <- person.parameter(HADS_PCM)
Finally, we use the residuals()
function to extract the
residuals.
HADS_res <- residuals(HADS_PP)
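The result should be a matrix of standardised residuals with one row per person (persons with extreme raw scores may be dropped) and one column per item. A quick check of its dimensions:
dim(HADS_res)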
We look at the empirical correlation matrix of the residuals. The
method we use for detecting the structure is principal component
analysis (PCA). We use the cor()
function to compute the
correlation matrix.
HADS_cor <- cor(HADS_res, use = "pairwise.complete.obs", method = "pearson")
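The corrplot package we loaded earlier gives a quick visual impression of this matrix; a minimal sketch, with items grouped by hierarchical clustering:
corrplot(HADS_cor, order = "hclust", tl.cex = 0.8)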
We perform principal component analysis on the matrix using the
prcomp()
function. We ask the function to standardise the
data for us (scale. = TRUE
) so that the Kaiser-Guttman criterion applied below can be read off the resulting eigenvalues.
HADS_PCA <- prcomp(HADS_cor, center = TRUE, scale. = TRUE)
A common method for determining the number of principal components
(PCs) to be retained is a graphical representation known as a “scree
plot”. A scree plot shows the eigenvalues for each individual component
on the y-axis and the number of PCs on the x-axis. A scree plot always
displays a downward curve and the “elbow” of the graph where the curve
seems to level off indicates the components to be retained as
significant (to the left of the elbow), as these explain much of the
overall variability. Since we standardised our data, we can use the
eigenvalues of the PCs to draw this boundary for us: an eigenvalue
\(<1\) means that the component explains less variance than a single
standardised variable, so such components should be discarded. This is
called the Kaiser-Guttman criterion. If our data are well suited for
PCA, we should be able to discard these components while still
retaining at least 70–80% of the cumulative variance. We
use the screeplot()
function and add a line corresponding
to an eigenvalue of 1 to indicate the Kaiser-Guttman criterion.
screeplot(HADS_PCA, type = "lines", main = "Screeplot")
abline(h = 1, lty = 2, col = 3)
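The eigenvalues themselves are the squared standard deviations stored in the prcomp object, so we can also apply the criterion numerically:
HADS_eigen <- HADS_PCA$sdev^2  # eigenvalues of the components
sum(HADS_eigen > 1)  # number of components retained by Kaiser-Guttman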
We notice that the first four components have an eigenvalue \(>1\). We can illustrate the variance explained by these four components as follows.
cumpro <- cumsum(HADS_PCA$sdev^2/sum(HADS_PCA$sdev^2))
plot(cumpro, xlab = "PC #", ylab = "Amount of explained variance", main = "Cumulative variance plot",
bty = "n")
abline(v = 4, col = 1, lty = 5)
segments(x0 = 0, x1 = 4, y0 = cumpro[4], y1 = cumpro[4], col = 3, lty = 5)
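To quantify this, we can print the cumulative proportion of variance explained by the first four components:
round(cumpro[4], 3)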
We observe that the first 4 components explain less than 70% of the variance. This is not optimal, as we “lose” more than 30% of the explained variance if we reduce the dimensionality from 14 to 4. The Kaiser-Guttman criterion often overestimates the number of dimensions. An alternative is Martin-Löf’s likelihood ratio test, described below.
We look at the loadings ordered according to the first column to see the positive and negative loadings.
HADS_Comp <- round(HADS_PCA$rotation, 3)
HADS_Comp[order(HADS_Comp[, 1]), ]
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
AHADS13 -0.414 0.067 0.046 0.049 0.123 -0.068 -0.029 0.081 0.012 -0.436
AHADS9 -0.400 -0.005 0.209 0.073 0.070 0.044 0.051 0.282 -0.074 -0.178
AHADS3 -0.361 0.101 0.223 -0.015 0.218 -0.354 -0.255 -0.223 -0.059 0.205
AHADS5 -0.226 0.090 -0.255 0.356 -0.526 -0.198 0.522 -0.055 0.144 0.254
AHADS11 -0.187 -0.416 -0.075 -0.286 0.291 0.479 0.179 -0.007 0.234 0.425
AHADS1 -0.116 -0.007 -0.540 -0.065 -0.352 0.256 -0.536 0.113 -0.346 0.076
AHADS6 0.093 0.480 -0.367 -0.023 0.265 0.073 -0.061 -0.127 0.173 -0.218
AHADS10 0.136 0.219 0.461 0.049 -0.409 0.413 -0.254 -0.244 0.434 -0.070
AHADS14 0.151 0.031 0.177 -0.729 -0.315 -0.343 0.080 -0.019 -0.194 0.045
AHADS4 0.206 0.458 -0.049 -0.103 0.228 0.256 0.415 -0.158 -0.328 0.029
AHADS8 0.226 -0.390 0.080 0.357 0.062 -0.026 -0.048 -0.575 -0.396 -0.071
AHADS7 0.255 -0.291 -0.355 -0.118 0.084 -0.310 -0.018 -0.096 0.489 -0.294
AHADS12 0.326 0.196 0.071 0.265 0.210 -0.255 -0.264 0.334 0.104 0.519
AHADS2 0.353 -0.200 0.153 0.150 -0.057 0.114 0.143 0.543 -0.162 -0.262
PC11 PC12 PC13 PC14
AHADS13 -0.048 -0.266 -0.675 0.272
AHADS9 -0.078 -0.366 0.675 0.263
AHADS3 -0.064 0.600 0.071 0.326
AHADS5 0.090 0.033 -0.028 0.251
AHADS11 0.149 -0.026 -0.107 0.298
AHADS1 -0.160 0.026 0.010 0.222
AHADS6 0.621 0.016 0.165 0.198
AHADS10 -0.085 -0.004 -0.005 0.239
AHADS14 0.211 -0.205 -0.024 0.251
AHADS4 -0.506 0.027 -0.017 0.234
AHADS8 0.178 -0.238 0.031 0.272
AHADS7 -0.416 0.011 0.153 0.274
AHADS12 -0.047 -0.361 -0.130 0.244
AHADS2 0.177 0.449 -0.057 0.356
It looks like the odd item numbers all load negatively, except for
number 7, while the even item numbers all load positively. We create
a vector (splitvec
) of allocations of items to the two
groups.
splitvec <- rep(1, 14)
splitvec[c(1, 3, 5, 9, 11, 13)] <- 0
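As a sanity check, we can line up the allocation with the sign of the first-component loadings; items in group 0 should be the ones loading negatively:
data.frame(item = colnames(HADS.items), group = splitvec, PC1 = round(HADS_PCA$rotation[, 1], 3))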
We plot the first and second component against each other. We add labels to identify the item loading locations and colour the items according to the two groups.
plot(HADS_PCA$rotation[, 1], HADS_PCA$rotation[, 2], pch = 19, bty = "n", col = ifelse(splitvec ==
0, 2, 3), asp = 1, xlim = range(HADS_PCA$rotation[, 1]), ylim = range(HADS_PCA$rotation[,
2]) * 1.3, xlab = "1st component", ylab = "2nd component", main = "HADS Item PCA-Loading")
labs <- gsub("AHADS", "", rownames(HADS_PCA$rotation))
text(HADS_PCA$rotation[, 1], HADS_PCA$rotation[, 2], labs, pos = 1)
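We can optionally add a legend matching the colours chosen above (group 0 in red, group 1 in green under the default palette):
legend("topright", legend = c("group 0", "group 1"), col = c(2, 3), pch = 19, bty = "n")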
We assume that there are at least two dimensions and test for
unidimensionality using Martin-Löf’s likelihood ratio test via the
MLoef()
function from the eRm
package.
MLoef(HADS_PCM, splitcr = splitvec)
Martin-Loef-Test (split criterion: user-defined)
LR-value: 425.622
Chi-square df: 431
p-value: 0.564
The null hypothesis is that the underlying latent variable is unidimensional, and since \(P>0.05\) this null hypothesis is not rejected. We conclude that the 14 items all measure the same unidimensional latent variable. However, the Martin-Löf test does not have a lot of statistical power, so this may be a type II error.