Getting ready

If the packages are not already installed on your computer, use the install.packages() function. Then we load the libraries:
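For example, all four packages used below can be installed in one call (this only needs to be done once per machine):

```r
# Install the packages used in this recipe (only needed once).
install.packages(c("foreign", "eRm", "iarm", "ggcorrplot", "DT"))
```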

library(foreign)
library(eRm)
library(iarm)
library(ggcorrplot)
library(DT)

We use the AMTS data, which are included in the iarm package (Müller, 2020a). The AMTS (Abbreviated Mental Test Score) consists of ten dichotomous knowledge items (scored 0 for incorrect and 1 for correct) and is used to screen patients for dementia. The data set contains records for 197 respondents and 13 variables: three person variables (id, agegrp, sex) followed by the 10 AMTS items. We load the amts data from the iarm package using the data() function:

data("amts")

We get a list of the variable names using the names() function:

names(amts)
 [1] "id"       "agegrp"   "sex"      "age"      "time"     "address" 
 [7] "name"     "year"     "dob"      "month"    "firstww"  "monarch" 
[13] "countbac"

and make an object containing only the items:

items.AMTS <- amts[, -c(1, 2, 3)]

Item Fit

Item fit statistics are the most commonly reported feature in Rasch validation studies. The most widely reported item fit statistics are

  • the infit and outfit mean square item fit statistics (Winsteps software)

  • the fit residual, Chi-square and ANOVA item fit statistics (RUMM2030 software).

Item fit statistics test the assumption of monotonicity, i.e., that the expected item score increases as the value of the latent variable increases. We illustrate the evaluation of item fit by calculating two different item fit statistics and by graphical evaluation.

Infit and Outfit

Infit and outfit are mean-square residual summary statistics ranging from zero to infinity. Under the null hypothesis that data fit the Rasch model, their expected value is one. Values greater than one indicate under-discrimination, while values below one indicate over-discrimination; both violate the Rasch model assumptions.
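To make the definitions concrete, here is a toy base-R sketch (with made-up probabilities and responses, not the AMTS data) of how the two mean squares are formed from standardized residuals:

```r
# Toy sketch for one item: hypothetical model probabilities of a correct
# response for four persons, and their observed responses.
p <- c(0.2, 0.5, 0.8, 0.9)       # P(X = 1) under the model (made up)
x <- c(0,   1,   1,   1)         # observed item responses
w <- p * (1 - p)                 # model variance of each response
z2 <- (x - p)^2 / w              # squared standardized residuals
outfit <- mean(z2)               # unweighted mean square
infit  <- sum(w * z2) / sum(w)   # information-weighted mean square
```

Outfit weights all squared residuals equally and is therefore sensitive to unexpected responses from persons far from the item location, while infit down-weights those persons.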

Now we are able to fit a dichotomous Rasch model using the RM() function:

RM.AMTS <- RM(items.AMTS)

We compute infit and outfit using the out_infit() function:

out_infit(RM.AMTS)

         Outfit se    pvalue padj  sig  Infit se    pvalue padj  sig  
age      0.641  0.267 0.178  0.71       0.868 0.126 0.297  0.989      
time     1.102  0.19  0.593  1          1.03  0.108 0.781  1          
address  1.145  0.281 0.607  1          1.12  0.091 0.189  0.947      
name     0.811  0.267 0.479  1          0.915 0.126 0.501  1          
year     0.716  0.184 0.123  0.615      0.881 0.107 0.264  0.989      
dob      0.82   0.493 0.715  1          1.092 0.182 0.611  1          
month    0.59   0.171 0.017  0.166      0.677 0.104 0.002  0.028  *   
firstww  1.328  0.21  0.117  0.615      1.202 0.112 0.072  0.482      
monarch  0.888  0.182 0.538  1          0.964 0.106 0.733  1          
countbac 1.446  0.171 0.009  0.166      1.31  0.104 0.003  0.028  *   

P value adjustment: BH

The output shows '*' whenever an adjusted p-value is smaller than 5%. These item fit statistics thus flag the two items month and countbac as anomalous. It is questionable whether we can trust the asymptotic distribution used to compute the p-values (Christensen & Kreiner, 2013; Müller, 2020b), and for this reason it is preferable to use the bootstrap (Efron & Tibshirani, 1993). The boot_fit() function computes bootstrap p-values for the outfit and infit statistics:

set.seed(28)
boot_fit(RM.AMTS, 350)

 Number of bootstrap samples:  50, 100, 150, 200, 250, 300, 350, 
 

         Outfit pvalue padj  sig    Infit pvalue padj  sig   
age      0.641  0.111  0.278        0.71  0.209  0.409       
time     1.102  0.57   0.67         1     0.779  0.779       
address  1.145  0.514  0.642        1     0.045  0.182       
name     0.811  0.373  0.574        1     0.42   0.6         
year     0.716  0.082  0.235        0.615 0.225  0.409       
dob      0.82   0.719  0.76         1     0.368  0.574       
month    0.59   0      0     ***    0.166 0      0     ***   
firstww  1.328  0.159  0.354        0.615 0.08   0.235       
monarch  0.888  0.513  0.642        1     0.722  0.76        
countbac 1.446  0.024  0.118        0.166 0.006  0.039  *    

P value adjustment: BH

Notice that we use the set.seed() function to choose a seed for the random number generator. You can choose any number you like; if you do not specify a seed, the results will change slightly each time you re-run the code. The p-values obtained via bootstrapping are more trustworthy. For large data sets (more than 1000 respondents) the two ways of computing p-values can yield quite different results; whenever they disagree, remember that the bootstrap p-values are the more trustworthy ones.
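A quick base-R illustration of why the seed matters:

```r
set.seed(28)
a <- runif(3)      # three pseudo-random numbers
set.seed(28)
b <- runif(3)      # resetting the seed reproduces the same numbers
identical(a, b)    # TRUE
```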

Item-Restscore Correlation

Item-total correlations and item-restscore correlations (called ‘corrected’ item-total correlations) are routinely reported in classical test theory. Kreiner (2011) used the simple structure in the Rasch model to compute the expected values of the item-restscore correlation. We compute the item-restscore correlation using the item_restscore() function.

item_restscore(RM.AMTS)
         observed expected se     pvalue padj.BH sig 
age      0.8376   0.7511   0.0441 0.0498 0.1295      
time     0.7087   0.7366   0.0660 0.6723 0.7470      
address  0.6157   0.6874   0.0711 0.3127 0.5212      
name     0.7973   0.7511   0.0544 0.3961 0.5455      
year     0.8189   0.7350   0.0424 0.0476 0.1295      
dob      0.7884   0.7812   0.0640 0.9115 0.9115      
month    0.8842   0.7303   0.0377 0.0000 0.0004  *** 
firstww  0.6170   0.7409   0.0779 0.1117 0.2235      
monarch  0.7736   0.7342   0.0507 0.4364 0.5455      
countbac 0.5869   0.7303   0.0738 0.0518 0.1295      
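For intuition, the numbers in the 'observed' column are ordinary correlations between an item and the sum of the remaining items; a toy sketch with made-up data (not the AMTS responses):

```r
# Hypothetical 4-person, 4-item response matrix (made up for illustration).
X <- matrix(c(1, 0, 1, 1,
              0, 0, 1, 0,
              1, 1, 1, 1,
              0, 1, 0, 1), nrow = 4, byrow = TRUE)
restscore <- rowSums(X[, -1])    # rest score: sum of the other items
r_obs <- cor(X[, 1], restscore)  # observed item-restscore correlation
```

What item_restscore() adds beyond this is the expected value of the correlation under the Rasch model (Kreiner, 2011), so that observed and expected values can be compared.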

Summary so far

Item 7 (month) looks problematic, and the two trustworthy tests agree about this. The standard implementation of infit and outfit does not give a wrong result in this case (sample size 197).

Visual Analysis

A nice way to illustrate item fit is to make an item fit plot. In the current implementation of Rasch models in R this only works for complete case analysis:

items.AMTS.complete <- items.AMTS[complete.cases(items.AMTS), ]
RM.AMTS.complete <- RM(items.AMTS.complete)

As an example, we look at the item month.

plotICC(RM.AMTS.complete, item.subset = 7, empICC = list("raw"), empCI = list(gamma = 0.95,
    col = 2))

This plot shows the item characteristic curve for item 7 ('What is the actual month?'). The x-axis shows the latent continuum; the y-axis shows the probability of a correct response. The curve describes the probability of a correct response as a function of the underlying latent construct. The difficulty of the item is the point where the probability of a correct response equals 0.5.

The option empICC = list("raw") also plots the relative frequency of positive responses for each raw-score group at the position of the corresponding ability level. The blue dotted lines represent the 95% confidence intervals for the relative frequencies and are shown when empCI is specified. Refer to ?plotICC for more details. The item over-discriminates relative to the model expectation, as the increase in the observed response probabilities (the dots) is steeper than the expected response probability curve.

When an item fits, the observed relative response frequencies agree with the curve of expected response probabilities, as seen for item 2:

plotICC(RM.AMTS.complete, item.subset = c(2), empICC = list("raw"), empCI = list(gamma = 0.95,
    col = 2))

References

Christensen, K. B., & Kreiner, S. (2013). Item Fit Statistics. In Rasch Models in Health (pp. 83–104). John Wiley & Sons, Inc. https://doi.org/10.1002/9781118574454.ch5

Efron, B., & Tibshirani, R. (1993). An Introduction to the Bootstrap. Chapman & Hall.

Müller, M. (2020a). iarm: Item Analysis in Rasch Models. R package version 0.4.1. https://cran.r-project.org/web/packages/iarm/index.html

Müller, M. (2020b). Item fit statistics for Rasch analysis: can we trust them? Journal of Statistical Distributions and Applications, 7(1), 5. https://doi.org/10.1186/s40488-020-00108-7