If the packages are not already installed on your computer, install them first with the install.packages() function.
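For example, a single call covering the packages used below (run once, only for packages that are missing):
install.packages(c("foreign", "eRm", "iarm", "ggcorrplot", "DT"))
Then we load the libraries: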
library(foreign)
library(eRm)
Warning: package 'eRm' was built under R version 4.1.2
library(iarm)
Warning: package 'iarm' was built under R version 4.1.2
library(ggcorrplot)
Warning: package 'ggcorrplot' was built under R version 4.1.3
library(DT)
Warning: package 'DT' was built under R version 4.1.3
We use the AMTS data, which are included in the iarm package (Müller, 2020a). The AMTS (Abbreviated Mental Test Score) consists of ten dichotomous knowledge items (scored 0 for an incorrect and 1 for a correct response) and is used to screen patients for dementia. The included data set contains data for 197 respondents and 13 variables: first three person variables (id, agegrp, sex), then the ten AMTS items. We load the amts data from the iarm package using the data() function:
data("amts")
We get a list of the variable names using the names() function:
names(amts)
[1] "id" "agegrp" "sex" "age" "time" "address"
[7] "name" "year" "dob" "month" "firstww" "monarch"
[13] "countbac"
and make an object containing only the items:
items.AMTS <- amts[, -c(1, 2, 3)]
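To check that the subsetting worked as intended, the dimensions and first rows of the new object can be inspected (an optional check):
dim(items.AMTS)     # should be 197 rows (respondents) and 10 columns (items)
head(items.AMTS)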
Item fit statistics are the most commonly reported feature in Rasch validation studies. The most widely reported item fit statistics are (i) the infit and outfit mean-square item fit statistics (Winsteps software) and (ii) the fit residual, chi-square and ANOVA item fit statistics (RUMM2030 software).
Item fit statistics test the assumption of monotonicity, i.e., that the expected value of an item score increases when the value of the latent variable increases. We illustrate the evaluation of item fit by calculating two different item fit statistics and by graphical evaluation.
Infit and outfit are mean-square residual summary statistics that range from zero to infinity. Their expected value, under the null hypothesis that data fit the Rasch model, is one. Values greater than one indicate under-discrimination, while values below one indicate over-discrimination; both violate the Rasch model assumptions.
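For reference, the classical definitions are as follows (the exact computation in out_infit() may differ in details, but the interpretation is the same). With $z_{vi} = (x_{vi} - E[X_{vi}])/\sqrt{\operatorname{Var}(X_{vi})}$ denoting the standardized residual of person $v$ on item $i$,
$$\text{Outfit}_i = \frac{1}{N}\sum_{v=1}^{N} z_{vi}^{2}, \qquad \text{Infit}_i = \frac{\sum_{v=1}^{N} \operatorname{Var}(X_{vi})\, z_{vi}^{2}}{\sum_{v=1}^{N} \operatorname{Var}(X_{vi})}.$$
Outfit weights all persons equally and is therefore sensitive to outlying responses from persons far from the item location, whereas infit weights each squared residual by its variance and is more sensitive to misfit among persons close to the item location.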
Now we are able to fit a dichotomous Rasch model using the RM() function:
RM.AMTS <- RM(items.AMTS)
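Before turning to item fit, the estimated item parameters and their standard errors can be inspected with the generic summary() function (an optional check):
summary(RM.AMTS)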
We compute infit and outfit using the out_infit() function:
out_infit(RM.AMTS)
Outfit se pvalue padj sig Infit se pvalue padj sig
age 0.641 0.267 0.178 0.71 0.868 0.126 0.297 0.989
time 1.102 0.19 0.593 1 1.03 0.108 0.781 1
address 1.145 0.281 0.607 1 1.12 0.091 0.189 0.947
name 0.811 0.267 0.479 1 0.915 0.126 0.501 1
year 0.716 0.184 0.123 0.615 0.881 0.107 0.264 0.989
dob 0.82 0.493 0.715 1 1.092 0.182 0.611 1
month 0.59 0.171 0.017 0.166 0.677 0.104 0.002 0.028 *
firstww 1.328 0.21 0.117 0.615 1.202 0.112 0.072 0.482
monarch 0.888 0.182 0.538 1 0.964 0.106 0.733 1
countbac 1.446 0.171 0.009 0.166 1.31 0.104 0.003 0.028 *
P value adjustment: BH
The output shows '*' whenever a p-value is smaller than 5%. So these item fit statistics flag the two items month and countbac as anomalies. It is questionable whether we can trust the asymptotic distribution used for the computation of p-values (Christensen & Kreiner, 2013; Müller, 2020b), and for this reason it is preferable to use the bootstrap (Efron & Tibshirani, 1993). The boot_fit() function computes bootstrap p-values for the outfit and infit statistics:
set.seed(28)
boot_fit(RM.AMTS, 350)
Number of bootstrap samples: 50, 100, 150, 200, 250, 300, 350,
Outfit pvalue padj sig Infit pvalue padj sig
age 0.641 0.111 0.278 0.71 0.209 0.409
time 1.102 0.57 0.67 1 0.779 0.779
address 1.145 0.514 0.642 1 0.045 0.182
name 0.811 0.373 0.574 1 0.42 0.6
year 0.716 0.082 0.235 0.615 0.225 0.409
dob 0.82 0.719 0.76 1 0.368 0.574
month 0.59 0 0 *** 0.166 0 0 ***
firstww 1.328 0.159 0.354 0.615 0.08 0.235
monarch 0.888 0.513 0.642 1 0.722 0.76
countbac 1.446 0.024 0.118 0.166 0.006 0.039 *
P value adjustment: BH
Notice that we use the set.seed() function to choose a seed for generating random numbers. You can choose any number you like; if you do not specify a seed, results will change slightly each time you re-run your code. The p-values obtained via bootstrapping are more trustworthy. For large data sets (data sets with more than 1000 respondents) the two ways of computing p-values can yield quite different results. Whenever the two methods yield very different p-values, remember that the bootstrap p-values are always the more trustworthy ones.
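As a small side note on set.seed(): resetting the seed reproduces random results exactly. A minimal illustration in plain base R, unrelated to the Rasch analysis itself:
set.seed(28)
runif(2)    # two pseudo-random numbers
set.seed(28)
runif(2)    # resetting the seed gives exactly the same two numbers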
Item-total correlations and item-restscore correlations (so-called 'corrected' item-total correlations) are routinely reported in classical test theory. Kreiner (2011) used the simple structure of the Rasch model to compute the expected values of the item-restscore correlations, so that the observed correlations can be compared with what the model predicts.
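The restscore of an item is simply the sum of the remaining items. As a rough sketch of the idea for the first item (using a plain correlation, which is not necessarily the exact association measure used by item_restscore()):
rest.age <- rowSums(items.AMTS[, -1], na.rm = TRUE)    # restscore for the item 'age': the sum of the other nine items
cor(items.AMTS$age, rest.age, use = "pairwise.complete.obs")    # observed association between the item and its restscore
We compute observed and expected item-restscore correlations for all items using the item_restscore() function: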
item_restscore(RM.AMTS)
observed expected se pvalue padj.BH sig
age 0.8376 0.7511 0.0441 0.0498 0.1295
time 0.7087 0.7366 0.0660 0.6723 0.7470
address 0.6157 0.6874 0.0711 0.3127 0.5212
name 0.7973 0.7511 0.0544 0.3961 0.5455
year 0.8189 0.7350 0.0424 0.0476 0.1295
dob 0.7884 0.7812 0.0640 0.9115 0.9115
month 0.8842 0.7303 0.0377 0.0000 0.0004 ***
firstww 0.6170 0.7409 0.0779 0.1117 0.2235
monarch 0.7736 0.7342 0.0507 0.4364 0.5455
countbac 0.5869 0.7303 0.0738 0.0518 0.1295
Item 7 (month) looks bad. Two trustworthy tests agree about this. The standard implementation of infit and outfit does not give us a wrong result in this case (sample size approximately 200).
A nice way to illustrate item fit is to make an item fit plot. In the current implementation of Rasch models in R this only works for a complete case analysis, so we first restrict the data to respondents without missing item responses:
items.AMTS.complete <- items.AMTS[complete.cases(items.AMTS), ]
RM.AMTS.complete <- RM(items.AMTS.complete)
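As a quick check, one can count how many of the 197 respondents have complete item responses and are therefore used for the plots:
nrow(items.AMTS.complete)    # number of complete cases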
As an example, we look at the item month:
plotICC(RM.AMTS.complete, item.subset = 7, empICC = list("raw"), empCI = list(gamma = 0.95,
col = 2))
This plot shows the item characteristic curve for item 7 ('What is the current month?'). The x-axis shows the latent continuum; the y-axis shows the probability of a correct response. The curve describes the probability of a correct response as a function of the underlying latent construct. The difficulty of the item is the point where the probability of a correct response equals 0.5.
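For reference, the curve is the model-implied item characteristic curve of the dichotomous Rasch model: writing $\theta$ for the person's ability and $\beta_i$ for the difficulty of item $i$,
$$P(X_i = 1 \mid \theta) = \frac{\exp(\theta - \beta_i)}{1 + \exp(\theta - \beta_i)},$$
which equals 0.5 exactly when $\theta = \beta_i$; this is why the item difficulty is read off at a response probability of 0.5.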
The function option empICC = list("raw") also plots the relative frequencies of positive responses for each raw score group at the position of the corresponding ability level. The blue dotted lines represent the 95% confidence intervals for the relative frequencies and are shown when the empCI argument is specified. Refer to the documentation of plotICC() for more details. The item overfits the expectation, as the observed response probabilities (the dots) increase more steeply than the expected response probability curve.
When an item fits, the observed relative response frequencies agree with the curve of expected response probabilities, as seen for item 2:
plotICC(RM.AMTS.complete, item.subset = c(2), empICC = list("raw"), empCI = list(gamma = 0.95,
col = 2))
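To compare all ten items in a single panel, the eRm package also provides the plotjointICC() function, which overlays the model-implied curves (theoretical curves only, without the empirical frequencies shown above):
plotjointICC(RM.AMTS.complete)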
Christensen, K. B., & Kreiner, S. (2013). Item Fit Statistics. In Rasch Models in Health (pp. 83–104). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118574454.ch5
Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.
Müller, M. (2020a). iarm: Item Analysis in Rasch Models. R package version 0.4.1. https://cran.r-project.org/web/packages/iarm/index.html
Müller, M. (2020b). Item fit statistics for Rasch analysis: can we trust them? Journal of Statistical Distributions and Applications, 7(1), 5. https://doi.org/10.1186/s40488-020-00108-7