If the packages are not already installed on your computer use the
function install.packages()
. Then we include the
libraries:
library(eRm)
library(iarm)
library(scales)
library(DT)
If data fit the Rasch model, information regarding the item and person parameter can be exported for further use (e.g. item difficulties and item thresholds for computer adaptive testing; person parameter for any further person-based calculations).
In the present script the R-code for the production of different useful tables, graphs and export functions regarding the item and person parameter is shown.
If we want to save all produced output (e.g. a table of the item or
person parameter) in a folder, we can indicate this. This spares us the
need to type the whole path each time we want to save output. We create
the folder and save the path to the path_output
object
below. In a Windows computer, we have to keep in mind that we have to
replace the "\"
by "/"
. In this example the
copied path was "C:\Users\User\Documents\IARM\CPH"
.
path_output <- "C:/Users/User/Documents/IARM/CPH/"
We start with the dichotomous items. Before the item parameters are estimated using the Rasch model, we might want to have a look at some useful tables, graphs and functions regarding the sum score of the assessment instrument.
For demonstration the AMTS data are used, which are included in the
iarm
package. The AMTS (Abbreviated Mental Test Score)
consists of ten dichotomous items scored 0 and 1 and is used to identify
patients with dementia. One point is given for each correct answer. The
included dataset contains the data of 197 respondents. There are 13
variables (first, the three person variables id of patient, agegroup
& gender, then the 10 AMTS items).
Before estimating the item parameters, it might be useful to have a short look at the data.
We load the amts
data from the iarm
package
using the data()
function:
data("amts")
We get a list of the names of the variables using the function
names()
:
names(amts)
[1] "id" "agegrp" "sex" "age" "time" "address"
[7] "name" "year" "dob" "month" "firstww" "monarch"
[13] "countbac"
We make an object containing only items.
items <- amts[, -c(1, 2, 3)]
This way we remove the colums 1–3 (person variables); alternatively we could write
items <- amts[, c(4:13)]
in order to select the items (which are in columns 4 to 13).
To see the number of rows (persons) and number of columns (variables
[in this case: items]) we use the dim()
function.
dim(items)
[1] 197 10
To show the first six rows of the object items
we use
the head()
function.
head(items)
age time address name year dob month firstww monarch countbac
1 1 1 1 1 1 1 1 1 1 1
2 0 0 0 0 0 0 0 0 0 1
3 0 1 0 1 0 1 1 1 0 0
4 1 1 1 1 1 1 1 1 1 1
5 1 1 0 1 1 1 1 1 1 0
6 1 1 1 1 1 1 1 1 1 0
If we are interested in having a look at the distribution of the AMTS
total score across the 197 respondents, we can, for example, produce a
histogram. First, we calculate the sum score across the ten items using
the rowSums()
function.
score <- rowSums(items)
The we create histogram of the sum score using the
hist()
function.
hist(score)
If we don’t want to analyse all items of a dataset, we can select a
subsample of items. For example, we can select items 5, 7, 8, and 9 from
the items
object.
itemsNEW <- items[, c(5, 7, 8, 9)]
Using the head()
function we show the first six rows of
the newly created object itemsNEW
.
head(itemsNEW)
year month firstww monarch
1 1 1 1 1
2 0 0 0 0
3 0 1 1 0
4 1 1 1 1
5 1 1 1 1
6 1 1 1 1
The RM-function estimates the parameters of a Rasch model for dichotomous item responses by using conditional maximum likelihood estimation (CML). For the items the item easiness parameters are estimated. Further down it is demonstrated how the item easiness parameters can be converted to item difficulty parameters.
We estimate the parameter of a Rasch model for dichotomous item
responses using the RM()
function from the eRm
package.
rmAMTS <- RM(items)
The RM()
function returns an object which stores
different values such as the log-likelihood value, the parameter
estimates and their standard errors. We can get the names of the
rmAMTS
object using the names()
function.
names(rmAMTS)
[1] "X" "X01" "model" "loglik" "npar"
[6] "iter" "convergence" "etapar" "se.eta" "hessian"
[11] "betapar" "se.beta" "W" "call"
In the help page (?RM
) we find that the estimated item
easiness parameters are stored in betapar
. Therefore, to
extract and show for all selected items we need to run the
following:
rmAMTS$betapar
beta age beta time beta address beta name beta year
0.60226398 -0.05320746 -2.00187682 0.60226356 -0.14105077
beta dob beta month beta firstww beta monarch beta countbac
1.77796324 -0.37712813 0.14901048 -0.18110994 -0.37712813
To show the item parameters in a well-arranged table and to also show
the standard error of estimation (SEM) of the items, the
cbind()
function can be used. As most people are more used
to item difficulty as opposed to item easiness parameters, the item
parameters are multiplied by “-1”. If we want, we can round the item
parameters using the round()
function to have a clearer
table.
IDiffTable <- cbind(ItemDifficulty = -1 * rmAMTS$betapar, ItemDifficultySE = rmAMTS$se.beta)
round(IDiffTable, 2)
ItemDifficulty ItemDifficultySE
beta age -0.60 0.21
beta time 0.05 0.19
beta address 2.00 0.19
beta name -0.60 0.21
beta year 0.14 0.19
beta dob -1.78 0.26
beta month 0.38 0.19
beta firstww -0.15 0.20
beta monarch 0.18 0.19
beta countbac 0.38 0.19
We now have a static table with items in entry order, item difficulties, and SEMs. We can sort this table by item difficulty.
IDiffTableorder <- IDiffTable[order(IDiffTable[, "ItemDifficulty"]), ]
round(IDiffTableorder, 2)
ItemDifficulty ItemDifficultySE
beta dob -1.78 0.26
beta age -0.60 0.21
beta name -0.60 0.21
beta firstww -0.15 0.20
beta time 0.05 0.19
beta year 0.14 0.19
beta monarch 0.18 0.19
beta month 0.38 0.19
beta countbac 0.38 0.19
beta address 2.00 0.19
In case of a html-document it might be nicer to have a dynamic table in which the items can be sorted interactively in differen ways (e.g. item order or item difficulty). If a “static” document, e.g., a Word-document, shall be produced, we have to deactivate the following code by putting a ‘#’ in front of the functions.
In order to be able to also sort the data with regard to item entry order, we first have to add a column with entry number to the data frame.
IDiffTableWithEntry <- cbind(entry = 1:10, IDiffTable)
Then, we use the datatable()
function from the
DT
package to insert a dynamic table with rounded
values.
datatable(as.data.frame(round(IDiffTableWithEntry, 3)))
A figure commonly presented in Rasch-papers is the person-item map.
It shows the distributions of item and person parameters in the same
figure and is often used to discuss the targeting of an assessment
instrument. We create a person-item map using the
plotPImap()
function.
plotPImap(rmAMTS, sorted = TRUE)
In the top panel the grey bars form a bar chart showing the
distribution of the person locations. Each item has its own line and the
item location is shown as a dot. Note that this is different for for
polytomous items. In the AMTS example above we see that the item
dob
is quite easy, while the item address
is
quite difficult.
Sometimes we might want to export the item parameters to another
software. If, for example, we want to write object
IDiffTable
of item parameters to a CSV file we use the
function write.csv()
.
write.csv(IDiffTable, file = paste(path_output, "IDiffTable.csv", sep = ""))
Up to now we have used the AMTS-data as an exemplary dataset for
dichotomous items. Now we want to have a look at polytomous items. For
demonstration purposes we use the DESC dataset included in the
iarm
package (with data from 799 patients). The DESC
(Rasch-based depression screening) assesses depression severity with 10
items with higher scores indicating a higher depression severity. The
DESC items have five response categories (0 = never to 4 = always). The
DESC dataset contains data from 799 patients and 14 variables. The first
four variables are person factors (ID, indication group, gender,
agegroup), followed by the 10 DESC items.
In case of polytomous items we have to chose between the Partial credit model (PCM function) and the Rating scale model (RSM function). For further explanations regarding both models, please see the chapter on item thresholds (-> xxx).
In the following we use the PCM.
We define an object itemsDESC
containing the items:
without first 4 columns (person factors).
itemsDESC <- desc2[, -c(1:4)]
We estimate a partial credit model using the PCM()
function from the eRm
package.
pcmDESC <- PCM(itemsDESC)
We compute item-category threshold parameters using the
thresholds()
function and show item difficulty and
thresholds for each item.
itemthresholds <- thresholds(pcmDESC)
itemthresholds
Design Matrix Block 1:
Location Threshold 1 Threshold 2 Threshold 3 Threshold 4
DESC_2_1 0.65292 -0.40914 -0.24291 1.20347 2.06026
DESC_2_2 0.98857 -0.05231 -0.00417 1.51598 2.49478
DESC_2_3 -0.35516 -2.87772 -1.11056 0.63262 1.93504
DESC_2_4 -0.02759 -2.08191 -0.53246 0.60855 1.89547
DESC_2_5 0.88304 0.22493 0.14526 0.92916 2.23283
DESC_2_6 0.68453 -1.07368 0.10748 1.01861 2.68572
DESC_2_7 0.47966 -0.64090 -0.28742 0.95992 1.88703
DESC_2_8 0.31585 -1.58430 -0.47006 0.90555 2.41223
DESC_2_9 -0.01579 -1.85411 -0.90131 0.45180 2.24046
DESC_2_10 1.75650 1.30477 0.92155 2.20644 2.59322
The function thresholds()
returns a list containing
three objects: a vector with threshold parameters
(threshpar
), a vector with standard errors
(se.thresh
), and a list with location and threshold
parameters (threshtable
).
As we cannot export the list with location and threshold parameters
directly to EXCEL, we create an object itemdiff
containing
the matrix with location and threshold parameters within the list.
itemdiff <- itemthresholds$threshtable$`1`
itemdiff
Location Threshold 1 Threshold 2 Threshold 3 Threshold 4
DESC_2_1 0.65291965 -0.40914137 -0.242912433 1.2034742 2.060258
DESC_2_2 0.98857121 -0.05230655 -0.004165513 1.5159751 2.494782
DESC_2_3 -0.35515509 -2.87772262 -1.110562803 0.6326213 1.935044
DESC_2_4 -0.02758824 -2.08190657 -0.532464539 0.6085510 1.895467
DESC_2_5 0.88304446 0.22492597 0.145256684 0.9291611 2.232834
DESC_2_6 0.68453445 -1.07367698 0.107477966 1.0186123 2.685724
DESC_2_7 0.47965903 -0.64089670 -0.287416511 0.9599232 1.887026
DESC_2_8 0.31585465 -1.58430130 -0.470058953 0.9055468 2.412232
DESC_2_9 -0.01578898 -1.85410701 -0.901313981 0.4518011 2.240464
DESC_2_10 1.75649700 1.30477346 0.921550471 2.2064394 2.593225
We can order the items in relation to the item difficulty.
itemdifforder <- itemdiff[order(itemdiff[, "Location"]), ]
round(itemdifforder, 2)
Location Threshold 1 Threshold 2 Threshold 3 Threshold 4
DESC_2_3 -0.36 -2.88 -1.11 0.63 1.94
DESC_2_4 -0.03 -2.08 -0.53 0.61 1.90
DESC_2_9 -0.02 -1.85 -0.90 0.45 2.24
DESC_2_8 0.32 -1.58 -0.47 0.91 2.41
DESC_2_7 0.48 -0.64 -0.29 0.96 1.89
DESC_2_1 0.65 -0.41 -0.24 1.20 2.06
DESC_2_6 0.68 -1.07 0.11 1.02 2.69
DESC_2_5 0.88 0.22 0.15 0.93 2.23
DESC_2_2 0.99 -0.05 0.00 1.52 2.49
DESC_2_10 1.76 1.30 0.92 2.21 2.59
Like for the dichotomous data we could also produce a dynamic table for an html-document with dynamic ordering of items with regards to item difficulty and item order.
itemdiffWithEntry <- cbind(Entry = 1:10, itemdiff)
datatable(round(itemdiffWithEntry, 2))
We now want to have a look at the item-person map of the DESC data
set. As the DESC has polytomous items, not only the item difficulty
(black filled circle) is shown for each item, but also the item
thresholds (empty circles). We can specify sorted = TRUE
for the items to be shown in order of item difficulties.
plotPImap(pcmDESC, sorted = TRUE)
To export the above shown data (itemdifforder
) as an
csv.-file to excel, we use again the write.csv()
function.
write.csv(itemdifforder, file = paste(path_output, "ItemDifficultyTable_PCM.csv",
sep = ""))
We create a data frame of the item thresholds and their corresponding SEMs.
SEthreshTable <- data.frame(Entry = 1:40, Beta = itemthresholds$threshpar, BetaSE = itemthresholds$se.thresh)
Most of the time, we will also be interested in the person parameters. They are used for conversion tables that provide the information about which interval scale level person estimate corresponds to which raw sum score of the scale. This enables everybody to base further evaluations and calculations on the person location estimates. These can be rescaled using any linear transformation if we think that users (e.g. clinicians) would rather work with person estimates ranging from, e.g., zero to 100. If we want to conduct any further calculations with the individual person scores it is recommendable to use the individual person estimates. This also enables us to use parametric statistics (like e.g. analysis of variance).
Person location estimates can also be used to get information regarding individual person fit. We can then investigate the percentage of people with misfitting responses and we can see on individual level whether we are allowed to interpret an individual person parameter (since the estimate only makes sense if the person fit is good).
For demonstration purposes we use the AMTS data set which was already used for estimating the item parameters.
We use the person_estimates()
function from the
iarm
package to compute person estimates with maximum
likelihood estimation (MLE) and weighted likelihood estimation (WLE) for
raw scores and provide a conversion table.
conversionTab <- iarm::person_estimates(rmAMTS)
If we specify properties = TRUE
the function provides
again the information how to convert the sum scores into person
estimates, but this time additional information regarding person SEM
& Bias; output presented seperatly for MLE and WLE estimation.
ppest <- person_estimates(rmAMTS, properties = TRUE)
We can also use the person.estimates()
function from the
eRm
package to estimate person parameters.
personpar <- eRm::person.parameter(rmAMTS)
personpar
Person Parameters:
Person NA Group: 1
NA pattern: x x x x x x x x x x
Raw Score Estimate Std.Error
0 -3.45975738 NA
1 -2.49406106 1.0972872
2 -1.59775965 0.8386474
3 -0.98584021 0.7378874
4 -0.47873949 0.6925663
5 -0.01108283 0.6798199
6 0.45810811 0.6948887
7 0.97071966 0.7435132
8 1.59514226 0.8491420
9 2.51613719 1.1121728
10 3.50935614 NA
Person NA Group: 2
NA pattern: x NA x x x x x x x x
Raw Score Estimate Std.Error
2 -1.47619 0.8569951
Often we need the individual person parameter for further calculations. Most of the person output only contains the data of those respondents who don’t have extreme values (highest or lowest possible score). If we want the person parameters for all persons we have to use the theta.table-function. People with extreme scores get interpolated person estimates. This matrix contains information regarding the person parameter of each person in entry order. Moreover, it is indicated for each person whether the person estimate had to be interpolated because of an extreme score. And the NAgroup is indicated which we already know from above: All patients who gave a valid response to all items belong to NAgroup “1”. If patients have missing data, they are assigned to another NAgroup. In the AMTS example there is only one additional NAgroup - with the response pattern of a missing response on item 2 (x NA x x x x x x x x).
personpar <- person.parameter(rmAMTS)
Sometimes we might want to provide the person estimates in another
format. We may apply any linear transformation to the person estimates.
In the present example we want to rescale the person parameters such,
that the minimum value is “0” and the maximum value “100”. We use the
rescale()
function from the scales
package to
do this. Note that since several rescale()
functions exist
in R we specify scales::rescale
to make sure that the
intended function is used.
pptrans <- personpar$theta.table
pptrans[, "person parameter recoded"] <- scales::rescale(pptrans[, 1], c(0, 100))
We now want to round the person parameter and the rescaled person parameter - both to 2 decimal places.
pptrans[, c(1, 4)] <- round(pptrans[, c(1, 4)], 2)
In order to be able to also sort the data with regard to person entry order, we add a column with entry order to the data frame. In case of the AMTS data from 1 to 197.
pptransWithEntry <- cbind(Entry = 1:nrow(pptrans), pptrans)
Sometimes we might want to round different variables in different ways, e.g. the “person parameter” with two decimal places and the “person parameter recoded” without any decimal places. We can do this as follows:
pptrans2 <- pptrans
pptrans2[, 1] <- round(pptrans2[, 1], 2)
pptrans2[, 4] <- round(pptrans2[, 4], 0)
We check that the rounding was successful.
In the chapter on item fit we already learnt about the different fit statistics with respect to the items. We learnt that the question of fit is about the difference between observed responses and the expected by the model. Applying this rationale to the persons we can investigate whether there are any persons responding in an unexpected way to the items. The percentage of people with unexpected response patterns should not be to high.
Using the personfit()
function we produce a list with
person fit statistics. The fit is only reported for the persons with
unextreme scores.
pfit <- personfit(personpar)
The list pfit
contains the following values:
names(pfit)
[1] "p.fit" "p.df" "st.res" "p.outfitMSQ" "p.infitMSQ"
[6] "p.outfitZ" "p.infitZ" "excl_obs_num" "excl_obs_chr"
We would like to show a table of some of the statistics and create a matrix with the relevant values.
pfitmatrix <- as.data.frame(matrix(unlist(lapply(c(1, 2, 4:7), FUN = function(x) pfit[[x]])),
ncol = 6))
colnames(pfitmatrix) <- names(pfit)[c(1, 2, 4:7)]
rownames(pfitmatrix) <- rownames(pfit[[3]])
pfitmatrix$pvalue <- round(1 - pchisq(pfit$p.fit, pfit$p.df - 1), 3)
pfitmatrix[(1 - pchisq(pfit$p.fit, pfit$p.df - 1)) <= 0.05, ]
p.fit p.df p.outfitMSQ p.infitMSQ p.outfitZ p.infitZ pvalue
P2 18.83079 10 1.883079 1.279163 0.9740987 0.5876548 0.027
P18 20.24315 10 2.024315 1.436061 2.2813724 1.6777429 0.016
P51 25.75598 10 2.575598 1.814584 3.0188438 2.8747528 0.002
P63 34.93891 9 3.882101 1.049447 2.4459857 0.2599416 0.000
P92 36.39721 10 3.639721 1.563075 2.3970109 1.1163724 0.000
P115 30.87311 10 3.087311 1.663731 2.7608539 1.7769907 0.000
P131 20.37240 10 2.037240 1.041892 1.7640518 0.2347356 0.016
P170 23.84614 10 2.384614 1.368692 1.2122402 0.6802473 0.005
pfitmatrix$sig <- ""
pfitmatrix$sig[pfitmatrix$pvalue <= 0.05 & pfitmatrix$pvalue > 0.01] <- "*"
pfitmatrix$sig[pfitmatrix$pvalue <= 0.01] <- "**"
In a static table we can order data according to, e.g., the Chi-squared statistics, as follows:
pfitmatrix[order(pfitmatrix[, "p.fit"]), ]
We can count the number of persons who do not fit the Rasch model
using the PersonMisfit()
function.
personmisfitSum <- PersonMisfit(personpar)
summary(personmisfitSum)
Percentage of Misfitting Persons:
0.6849 %
(i.e., 1 out of 146 persons have Chi-square based Z-values > 1.96)