There’s nothing more practical than a good theory

From theoretical conceptualization to practical implementation of item response theory algorithms for item selection

Ottavia M. Epifania\(^{1, 2}\)

\(^1\) Psicostat, \(^2\) University of Trento, Rovereto

2026-05-27

Test development from validated item banks

Why Automatically Generated Tests Matter

Large validated item banks (\(B\)) and automatic selection of items to obtain \(Q \subseteq B\)

IRT models for the win

Because they focus on item information, that is, on how well each item measures different levels of the latent trait, IRT models provide an ideal framework for finding \(Q \subseteq B\)

Automated Test Assembly

Maximin algorithms

Maximize the minimum measurement precision that test \(Q\) provides in specific regions of interest for the assessment

Minimax algorithms

Minimize the maximum distance from a target function that describes the desired measurement precision of test \(Q\)

Item Response Theory

Item Response Function

According to the 4-Parameter Logistic Model:

\[P(x_{pi}=1|\theta_p, b_i, a_i, c_i, e_i) = P(\theta) = c_i + (e_i-c_i)\dfrac{\exp[a_i(\theta_p - b_i)]}{1 + \exp[a_i(\theta_p - b_i)]}\]
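The formula above translates almost literally into base R. The following is a minimal sketch, not the shortIRT implementation: the function name `irf_4pl` is a hypothetical helper, and it relies on `plogis()`, which computes \(\exp(x)/[1 + \exp(x)]\).

```r
# 4PL item response function (hypothetical helper, not part of shortIRT).
# theta: latent trait level(s); b: location; a: discrimination;
# c: lower asymptote; e: upper asymptote.
irf_4pl <- function(theta, b, a = 1, c = 0, e = 1) {
  c + (e - c) * plogis(a * (theta - b))
}

# At theta = b the logistic term equals 0.5, so the probability lies
# halfway between the lower and the upper asymptote.
irf_4pl(theta = 0, b = 0, a = 1.2, c = 0, e = 1)  # returns 0.5
```

Setting `c = 0` and `e = 1` reduces the sketch to the 2PL model, and fixing `a = 1` as well yields the 1PL model.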

\(c_3 = c_4 = 0\), \(e_3 = e_4 = 1\)

\(b_1 = b_2 = 0\), \(a_1 = a_2 = 1.2\)

Item Characteristic Curves (ICCs)

Information Functions

\[ \text{IIF}_{i}(\theta) = \dfrac{a_i^2[P(\theta)-c_i]^2[e_i - P(\theta)]^2}{(e_{i}-c_i)^2 P(\theta)[1-P(\theta)]}\]


\[\text{TIF}(\theta) = \sum_{i = 1}^{|B|} \text{IIF}_i(\theta)\]

where \(B\) is the set of items in a test and \(|X|\) denotes the cardinality of a set \(X\).
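As an illustration, the two information functions can be sketched in base R. The names `irf_4pl`, `iif_4pl`, and `tif_4pl` are hypothetical helpers written for this example (the package provides its own `item_info()` and `tif()`), and the two-item bank is made up for the demonstration.

```r
# 4PL response probability (hypothetical helper).
irf_4pl <- function(theta, b, a = 1, c = 0, e = 1) {
  c + (e - c) * plogis(a * (theta - b))
}

# Item information function under the 4PL model.
iif_4pl <- function(theta, b, a = 1, c = 0, e = 1) {
  p <- irf_4pl(theta, b, a, c, e)
  (a^2 * (p - c)^2 * (e - p)^2) / ((e - c)^2 * p * (1 - p))
}

# Test information function: sum of the IIFs of all items in item_par,
# a data frame with columns b, a, c, e (one row per item).
tif_4pl <- function(theta, item_par) {
  info <- vapply(seq_len(nrow(item_par)), function(i) {
    iif_4pl(theta, item_par$b[i], item_par$a[i], item_par$c[i], item_par$e[i])
  }, numeric(length(theta)))
  # Force a theta-by-item matrix so that a single theta value also works.
  rowSums(matrix(info, nrow = length(theta)))
}

# Made-up bank of two identical 2PL items (c = 0, e = 1).
bank <- data.frame(b = c(0, 0), a = c(1.2, 1.2), c = c(0, 0), e = c(1, 1))
iif_4pl(0, b = 0, a = 1.2)  # a^2 / 4 = 0.36 at theta = b
tif_4pl(0, bank)            # 0.36 + 0.36 = 0.72
```

With \(c_i = 0\) and \(e_i = 1\) the IIF reduces to \(a_i^2 P(\theta)[1 - P(\theta)]\), which peaks at \(\theta = b_i\) with value \(a_i^2/4\).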

Procedures for automatic test development

Benchmark Procedure

Create a short test form composed of \(N\) items from an item bank \(B\) \(\rightarrow\) Select the \(N\) items with the highest IIFs:

The IIFs of the items in the item bank are sorted in decreasing order:

\[\mathit{iif} = \left(\max_{1 \leq i \leq |B|} \text{IIF}_i(\theta), \ldots, \min_{1 \leq i \leq |B|} \text{IIF}_i(\theta)\right)\]

The items in positions 1 to \(N\) of \(\mathit{iif}\), with \(N < |B|\), are selected to be included in the short test form

Aim: Test with \(N = 3\) items from \(B\) (\(|B| = 10\)):

Item \( b_i \) \( a_i \) \( c_i \) \( e_i \) \( \max \text{IIF}_i(\theta) \)
1 1.65 1.32 0.10 1 0.36
2 -1.82 0.71 0.06 1 0.11
3 2.87 0.78 0.00 1 0.15
4 -1.79 0.81 0.01 1 0.16
5 -0.83 0.87 0.08 1 0.16
6 1.46 1.35 0.03 1 0.43
7 2.87 0.73 0.00 1 0.13
8 -0.01 1.41 0.06 1 0.44
9 -2.92 1.09 0.06 1 0.26
10 -1.44 1.07 0.09 1 0.24

Aim: Test with \(N = 3\) items from \(B\) (\(|B| = 10\)):

Item \( b_i \) \( a_i \) \( c_i \) \( e_i \) \( \max \text{IIF}_i(\theta) \)
8 -0.01 1.41 0.06 1 0.44
6 1.46 1.35 0.03 1 0.43
1 1.65 1.32 0.10 1 0.36
9 -2.92 1.09 0.06 1 0.26
10 -1.44 1.07 0.09 1 0.24
4 -1.79 0.81 0.01 1 0.16
5 -0.83 0.87 0.08 1 0.16
3 2.87 0.78 0.00 1 0.15
7 2.87 0.73 0.00 1 0.13
2 -1.82 0.71 0.06 1 0.11

The first \(N = 3\) items of the sorted item bank (items 8, 6, and 1) are selected to be included in the short test form.
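The sorting and selection step can be sketched in base R. The values below are the maximum IIFs reported in the table for the 10-item bank; the object names are chosen for this example only, and this is an illustration of the idea rather than the code behind `bench()`.

```r
# Maximum IIF of each item, copied from the table (items 1 to 10).
max_iif <- c(0.36, 0.11, 0.15, 0.16, 0.16, 0.43, 0.13, 0.44, 0.26, 0.24)
names(max_iif) <- seq_along(max_iif)

num_item <- 3  # N, the desired length of the short test form

# Sort the items by decreasing maximum IIF and keep the first N.
ranking <- order(max_iif, decreasing = TRUE)
selected <- ranking[seq_len(num_item)]
selected           # 8 6 1
max_iif[selected]  # 0.44 0.43 0.36
```

Because the ranking ignores where on the latent trait each item is most informative, the selected items may all target a similar region of \(\theta\).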

Warning!

\(\theta\)-target procedure

\(k = 0, \ldots, K\): Index of the iterations of the procedure (\(K = N-1\))

\(S^k \subseteq \{1, \ldots, |B|\}\): Set of items selected to be included in the short test form up to iteration \(k\)

\(Q^k \subseteq \{1, \ldots, N\}\): Set of \(\theta'\)s satisfied up to iteration \(k\)

At \(k=0\): \(S^0 = \emptyset\), \(Q^0 = \emptyset\)

The procedure cycles through steps 1 to 3 until \(k = K\):

  1. Select \(\mathit{iif}_{in}^k = \displaystyle \max_{i \in B\setminus S^k, \, n \in \{1, \ldots, N\} \setminus Q^k} \mathbf{IIF}(i,n)\);
  2. Compute \(S^{k+1} = S^k \cup \{i\}\), the set of items selected up to iteration \(k\);
  3. Compute \(Q^{k+1} = Q^k \cup \{n\}\), the set of \(\theta'\)s satisfied up to iteration \(k\).

After iteration \(K\), \(|Q^{K + 1}| = N\) and \(|S^{K + 1}| = N\)

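The matrix \(\mathbf{IIF}\) has one row per item of the bank and one column per \(\theta'\):

\[
\begin{array}{c|cccccc}
\text{Item} \backslash \theta' & 1 & 2 & \ldots & n & \ldots & N \\
\hline
1 & \mathit{iif}_{11} & \mathit{iif}_{12} & \ldots & \mathit{iif}_{1n} & \ldots & \mathit{iif}_{1N} \\
2 & \mathit{iif}_{21} & \mathit{iif}_{22} & \ldots & \mathit{iif}_{2n} & \ldots & \mathit{iif}_{2N} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
i & \mathit{iif}_{i1} & \mathit{iif}_{i2} & \ldots & \mathit{iif}_{in} & \ldots & \mathit{iif}_{iN} \\
\vdots & \vdots & \vdots & & \vdots & & \vdots \\
|B| & \mathit{iif}_{|B|1} & \mathit{iif}_{|B|2} & \ldots & \mathit{iif}_{|B|n} & \ldots & \mathit{iif}_{|B|N}
\end{array}
\]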

\(\theta\)-target definition

  • Intervals of different width defined on the latent trait

  • Cut-off based tests

  • \(\ldots\)

Aim: Develop a Test of \(N=3\) items from \(B\) with \(\theta' = (-2,0,2)\):

Item bank B
Item b a c e
1 1.65 1.32 0.10 1
2 -1.82 0.71 0.06 1
3 2.87 0.78 0.00 1
4 -1.79 0.81 0.01 1
5 -0.83 0.87 0.08 1
6 1.46 1.35 0.03 1
7 2.87 0.73 0.00 1
8 -0.01 1.41 0.06 1
9 -2.92 1.09 0.06 1
10 -1.44 1.07 0.09 1

IIF Matrix \(k = 0\)
-2 0 2
1 0.00 0.08 0.35
2 0.11 0.08 0.03
3 0.01 0.05 0.13
4 0.16 0.10 0.03
5 0.11 0.15 0.05
6 0.00 0.16 0.38
7 0.01 0.05 0.12
8 0.05 0.44 0.10
9 0.21 0.04 0.01
10 0.21 0.15 0.02

\(S^0 = \emptyset\)

\(Q^0 = \emptyset\)
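A matrix of this kind can be obtained by evaluating the IIF of every item at every \(\theta'\). The sketch below does so in base R for the item bank and the targets of this example; `iif_4pl` is a hypothetical helper. Because the item parameters shown on the slide are rounded to two decimals, single entries may differ from the matrix above in the last digit.

```r
# Item bank B as shown on the slide (parameters rounded to two decimals).
bank <- data.frame(
  b = c(1.65, -1.82, 2.87, -1.79, -0.83, 1.46, 2.87, -0.01, -2.92, -1.44),
  a = c(1.32, 0.71, 0.78, 0.81, 0.87, 1.35, 0.73, 1.41, 1.09, 1.07),
  c = c(0.10, 0.06, 0.00, 0.01, 0.08, 0.03, 0.00, 0.06, 0.06, 0.09),
  e = 1
)
targets <- c(-2, 0, 2)  # theta'

# Item information function under the 4PL model (hypothetical helper).
iif_4pl <- function(theta, b, a, c, e) {
  p <- c + (e - c) * plogis(a * (theta - b))
  (a^2 * (p - c)^2 * (e - p)^2) / ((e - c)^2 * p * (1 - p))
}

# One row per item, one column per theta target.
iif_mat <- t(sapply(seq_len(nrow(bank)), function(i) {
  iif_4pl(targets, bank$b[i], bank$a[i], bank$c[i], bank$e[i])
}))
dimnames(iif_mat) <- list(item = seq_len(nrow(bank)), theta_target = targets)

round(iif_mat, 2)
```

The largest entry sits in row 8 and in the column of \(\theta' = 0\), which is where the first iteration of the procedure starts.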


\(\mathit{iif}_{\text{max}}^0=\displaystyle \max_{i \in B\setminus S^0, \, n \in \{1, \ldots, N\} \setminus Q^0} \mathbf{IIF}(i,n) = \mathbf{IIF}(8,2) = 0.44\)

\(S^{1} = S^0 \cup \{8\} = \{8\}\)

\(Q^{1} = Q^0 \cup \{2\} = \{2\}\)

IIF Matrix \(k = 1\): same matrix as at \(k = 0\), with row 8 and the column of \(\theta' = 0\) excluded from the search

\(\mathit{iif}_{\text{max}}^1=\displaystyle \max_{i \in B\setminus S^1, \, n \in \{1, \ldots, N\} \setminus Q^1} \mathbf{IIF}(i,n) = \mathbf{IIF}(6,3)= 0.38\)

\(S^{2} = S^1 \cup \{6\} = \{8, 6\}\)

\(Q^{2} = Q^1 \cup \{3\} = \{2, 3\}\)

IIF Matrix \(k = 2\): same matrix as at \(k = 0\), with rows 8 and 6 and the columns of \(\theta' = 0\) and \(\theta' = 2\) excluded from the search

\(\mathit{iif}_{\text{max}}^2=\displaystyle \max_{i \in B\setminus S^2, \, n \in \{1, \ldots, N\} \setminus Q^2} \mathbf{IIF}(i,n) = \mathbf{IIF}(9,1)= 0.21\)

\(S^{3} = S^2 \cup \{9\} = \{8, 6, 9\}\)

\(Q^{3} = Q^2 \cup \{1\} = \{2, 3, 1\}\)

End

\(|S^3| = 3\), \(|Q^3| = 3\), \(K = 2\) \(\rightarrow\) end
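The iterations of this example can be replayed with a short base R sketch. The function `theta_target_sketch` is a hypothetical illustration of the three steps, not the code behind `theta_target()`; the matrix is the IIF matrix at \(k = 0\) from the example. Ties are broken by taking the first maximum found, which is an assumption of this sketch: at \(k = 2\) items 9 and 10 both show 0.21 at \(\theta' = -2\), and the sketch picks item 9, as in the example.

```r
# IIF matrix of the example: rows = items 1..10, columns = theta' (-2, 0, 2).
iif <- matrix(c(0.00, 0.08, 0.35,
                0.11, 0.08, 0.03,
                0.01, 0.05, 0.13,
                0.16, 0.10, 0.03,
                0.11, 0.15, 0.05,
                0.00, 0.16, 0.38,
                0.01, 0.05, 0.12,
                0.05, 0.44, 0.10,
                0.21, 0.04, 0.01,
                0.21, 0.15, 0.02),
              ncol = 3, byrow = TRUE)

theta_target_sketch <- function(iif) {
  S <- integer(0)  # items selected so far
  Q <- integer(0)  # theta targets satisfied so far
  for (k in seq_len(ncol(iif))) {  # N iterations, k = 0, ..., N - 1 in the slides
    avail <- iif
    avail[S, ] <- -Inf  # exclude items already selected
    avail[, Q] <- -Inf  # exclude targets already satisfied
    # Step 1: position of the largest remaining IIF (first one in case of ties).
    pos <- arrayInd(which.max(avail), dim(avail))
    S <- c(S, pos[1, 1])  # Step 2: update the set of selected items
    Q <- c(Q, pos[1, 2])  # Step 3: update the set of satisfied targets
  }
  data.frame(item = S, target = Q)
}

theta_target_sketch(iif)  # items 8, 6, 9 paired with targets 2, 3, 1
```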


Tip

The shortIRT package

It’s on CRAN!

install.packages("shortIRT")
library(shortIRT)



bench()

bench(item_par, iifs = NULL, theta = NULL, num_item = NULL)

set.seed(1312)
n = 10
# Define the item parameters in the item bank
item_par = data.frame(b = runif(n, -3, 3),
                      a = runif(n, .7, 1.5),
                      c = runif(n, 0, .10),
                      e = 1)
# Random values for the latent trait
theta = rnorm(1000)
# Generate the test with the benchmark procedure
test = bench(item_par, theta = theta, num_item = 3)
# Summary of the obtained test
summary(test)
# Plot the resulting TIF (as compared to the TIF obtained from B)
plot(test)
summary(test)         
The item selection is based on the benchmark procedure. 
The procedure selected the following 3 dichotomous items: 
10 2 9 
with parameters: 
           b        a          c e
10  2.243866 1.390367 0.02130947 1
2  -1.246596 1.283032 0.02087452 1
9  -1.171959 1.344295 0.07825710 1
These items maximize the information for thetas equal to: 
2.251 -1.215 -1.077
plot(test)

define_targets()

define_targets(theta, num_targets = NULL, method = c("equal", "clusters"))

targetsC = define_targets(theta, 
                          num_targets = 3, 
                          method = "clusters")
targetsC
          1           2           3 
-1.20701883  1.21315753  0.01562251 
attr(,"class")
[1] "clusters"

targetsE = define_targets(theta, 
                          num_targets = 3, 
                          method = "equal")
targetsE
[1] -2.39408579  0.04465118  2.48338815
attr(,"class")
[1] "equal"

theta_target()

theta_target(targets, item_par)
testC = theta_target(targetsC, item_par)
summary(testC)
The item selection is based on the theta-target procedure with cluster-defined targets. 
The procedure selected the following 3 dichotomous items: 
2 6 8 
with parameters: 
           b        a          c e
2 -1.2465959 1.283032 0.02087452 1
6  0.8283301 1.198013 0.01028904 1
8  0.3331977 1.208366 0.09370018 1
These items maximize the information for thetas equal to: 
-1.207019 1.213158 0.01562251
plot(testC, show_both = FALSE)

testE = theta_target(targetsE, item_par)
summary(testE)
The item selection is based on the theta-target procedure with equally-spaced targets. 
The procedure selected the following 3 dichotomous items: 
10 8 2 
with parameters: 
            b        a          c e
10  2.2438664 1.390367 0.02130947 1
8   0.3331977 1.208366 0.09370018 1
2  -1.2465959 1.283032 0.02087452 1
These items maximize the information for thetas equal to: 
2.483388 0.04465118 -2.394086
plot(testE, show_both = FALSE)

Additional functions

Function Description
IRT() Compute expected probability for a single item
mpirt() Compute expected probability for multiple items
obsirt() Simulate responses according to IRT probabilities
irt_estimate() Estimate of theta
item_info() Item Information Functions (multiple items, IIFs)
tif() Test Information Function (TIF)

& the methods defined for the S3 classes

Final remarks

Thinking before anything

Without the theoretical foundations, this work would not have been possible :)

A well-defined theory is 90% of the job

Practical skills will eventually come…and if they don’t, it’s fine! You’ll never walk alone

The idea that the solution comes from the intuition of the naturally gifted mad genius, rather than from the complicated, collective work of hundreds, thousands of scientists, is a false and mistaken idea, one that takes value away from the University […]

Matteo Bordone, February 2025

You can find the slides on my personal page https://ottaviae.github.io/presentations

What is Psicostat?


Activities and Website

  • We meet twice a month on Zoom
  • Each meeting lasts one hour and includes one presentation
  • A lot of space is reserved for discussion (the heart of Psicostat)
  • Topics: innovative statistical methods in Psychology, theoretical tutorials, projects, experimental designs, reflections on research, and societal impact
  • Anyone is welcome to attend and present!
  • Omnia sunt communia! – All materials are shared, presenters can be contacted for info or collaborations
  • Presentations can serve as informal oral pre-registrations
  • International mailing list with ~300 members
  • Stay tuned! https://psicostat.dpss.psy.unipd.it/pages/meetings.html
https://psicostat.dpss.psy.unipd.it

The Core Team

Gianmarco Altoè

Massimiliano Pastore

Livio Finos

Giulia Calignano

Ottavia Epifania

Filippo Gambarota

Enrico Toffalini

Luca Menghini

Ambra Perugini

Margherita Calderan

Marco Tullio Liuzza

Tommaso Feraco