There’s nothing more practical than a good theory

From theoretical conceptualization to practical implementation of item response theory algorithms for item selection

Ottavia M. Epifania\(^{1, 2}\)

\(^1\) Psicostat, \(^2\) University of Trento, Rovereto

2026-05-27

Test development from validated item banks

Why Automatically Generated Tests Matter

Large validated item banks (\(B\)) and automatic selection of items to obtain \(Q \subseteq B\)

IRT models for the win

Because IRT models focus on item information and on how well each item measures different levels of the latent trait, they provide an ideal framework for finding \(Q \subseteq B\)

Automated Test Assembly

Maximin algorithms

Maximize the minimum measurement precision that test \(Q\) provides in the specific regions of interest for the assessment

Minimax algorithms

Minimize the maximum distance from a target function that describes the desired measurement precision of test \(Q\)
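
As a rough sketch, the two criteria can be written as optimization problems. The symbols \(\Theta\) (the region of interest on the latent trait), \(T(\theta)\) (the target function), and \(TIF_Q(\theta)\) (the information provided by test \(Q\)) are notation assumed here for illustration; real automated test assembly models usually add further constraints, such as test length or content balance:

\[\text{maximin:}\quad \max_{Q \subseteq B} \; \min_{\theta \in \Theta} TIF_Q(\theta) \qquad\qquad \text{minimax:}\quad \min_{Q \subseteq B} \; \max_{\theta \in \Theta} \left| TIF_Q(\theta) - T(\theta) \right|\]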

Item Response Theory

Item Response Function

According to the 4-Parameter Logistic Model:

\[P(x_{pi}=1|\theta_p, b_i, a_i, c_i, e_i) = P(\theta) = c_i + (e_i-c_i)\dfrac{\exp[a_i(\theta_p - b_i)]}{1 + \exp[a_i(\theta_p - b_i)]}\]
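
The formula translates almost literally into base R. The function name `irf_4pl` is made up for this sketch and is not part of any package; `plogis()` is the base R logistic function.

```r
# 4PL item response function: probability of a correct response.
# theta: latent trait; b: difficulty; a: discrimination;
# c: lower asymptote; e: upper asymptote.
irf_4pl <- function(theta, b, a = 1, c = 0, e = 1) {
  c + (e - c) * plogis(a * (theta - b))
}

# At theta = b the logistic term is 0.5, so P = c + (e - c) / 2
p_mid <- irf_4pl(theta = 1.65, b = 1.65, a = 1.32, c = 0.10, e = 1)
p_mid  # 0.55
```

Setting `e = 1` gives the 3PL, `c = 0` as well gives the 2PL, and fixing `a = 1` on top of that gives the 1PL.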

Item Characteristic Curves (ICCs)

Information Functions

\[ \text{IIF}_{i}(\theta) = \dfrac{a_i^2[P(\theta)-c_i]^2[e_i - P(\theta)]^2}{(e_{i}-c_i)^2 P(\theta)[1-P(\theta)]}\]

\[TIF(\theta) = \sum_{i = 1}^{|B|} IIF_i(\theta)\]

where \(B\) is the set of items in the test and \(|X|\) denotes the cardinality of a set \(X\)
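
A minimal base R sketch of both information functions follows. The names `irf_4pl`, `iif_4pl`, and `tif_4pl` are hypothetical helpers written for this example, not functions of a package; the two items in `bank2` are items 8 and 6 of the example bank used in these slides.

```r
# 4PL response probability (vectorized over theta)
irf_4pl <- function(theta, b, a, c, e) {
  c + (e - c) * plogis(a * (theta - b))
}

# Item information function under the 4PL
iif_4pl <- function(theta, b, a, c, e) {
  p <- irf_4pl(theta, b, a, c, e)
  (a^2 * (p - c)^2 * (e - p)^2) / ((e - c)^2 * p * (1 - p))
}

# Test information function: sum of the IIFs of all items in `bank`,
# a data frame with columns b, a, c, e (one row per item)
tif_4pl <- function(theta, bank) {
  iifs <- sapply(seq_len(nrow(bank)), function(i) {
    iif_4pl(theta, bank$b[i], bank$a[i], bank$c[i], bank$e[i])
  })
  # sapply() gives a theta-by-item matrix for vector theta,
  # and a plain vector (one value per item) for a single theta
  if (is.matrix(iifs)) rowSums(iifs) else sum(iifs)
}

# With c = 0 and e = 1 the IIF at theta = b reduces to a^2 / 4
iif_2pl_check <- iif_4pl(0, b = 0, a = 2, c = 0, e = 1)   # 1

# Item 8 of the example bank at theta = 0: about 0.44
iif_item8 <- iif_4pl(0, b = -0.01, a = 1.41, c = 0.06, e = 1)

bank2 <- data.frame(b = c(-0.01, 1.46), a = c(1.41, 1.35),
                    c = c(0.06, 0.03), e = 1)
tif_at_0 <- tif_4pl(0, bank2)
```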

Procedures for automatic test development

Benchmark Procedure

Create a short test form composed of \(N\) items from an item bank \(B\) \(\rightarrow\) Select the \(N\) items with the highest IIFs:

The IIFs of the items in the item bank are sorted in decreasing order:

\[\mathit{iif} = \left(\max_{1 \leq i \leq |B|} IIF_i(\theta), \ldots, \min_{1 \leq i \leq |B|} IIF_i(\theta)\right)\]

The items whose IIFs occupy positions 1 to \(N\) of \(\mathit{iif}\), with \(N < |B|\), are included in the short test form

Aim: Test with \(N = 3\) items from \(B\) (\(|B| = 10\)):

Item	\( b_i \)	\( a_i \)	\( c_i \)	\( e_i \)	\( \max \text{IIF}_i(\theta) \)
1	1.65	1.32	0.10	1	0.36
2	-1.82	0.71	0.06	1	0.11
3	2.87	0.78	0.00	1	0.15
4	-1.79	0.81	0.01	1	0.16
5	-0.83	0.87	0.08	1	0.16
6	1.46	1.35	0.03	1	0.43
7	2.87	0.73	0.00	1	0.13
8	-0.01	1.41	0.06	1	0.44
9	-2.92	1.09	0.06	1	0.26
10	-1.44	1.07	0.09	1	0.24

Items sorted by decreasing \(\max \text{IIF}_i(\theta)\); the first \(N = 3\) (items 8, 6, and 1) form the short test:

Item	\( b_i \)	\( a_i \)	\( c_i \)	\( e_i \)	\( \max \text{IIF}_i(\theta) \)
8	-0.01	1.41	0.06	1	0.44
6	1.46	1.35	0.03	1	0.43
1	1.65	1.32	0.10	1	0.36
9	-2.92	1.09	0.06	1	0.26
10	-1.44	1.07	0.09	1	0.24
4	-1.79	0.81	0.01	1	0.16
5	-0.83	0.87	0.08	1	0.16
3	2.87	0.78	0.00	1	0.15
7	2.87	0.73	0.00	1	0.13
2	-1.82	0.71	0.06	1	0.11
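
The ranking above can be reproduced with a few lines of base R. This is a sketch of the idea, not the code of any package: `iif_4pl` is a hypothetical helper, the maximum of each IIF is approximated on a grid of \(\theta\) values chosen here for illustration, and the item parameters are the rounded values from the table.

```r
# Example item bank (rounded parameters from the table)
bank <- data.frame(
  b = c(1.65, -1.82, 2.87, -1.79, -0.83, 1.46, 2.87, -0.01, -2.92, -1.44),
  a = c(1.32, 0.71, 0.78, 0.81, 0.87, 1.35, 0.73, 1.41, 1.09, 1.07),
  c = c(0.10, 0.06, 0.00, 0.01, 0.08, 0.03, 0.00, 0.06, 0.06, 0.09),
  e = 1
)

# 4PL item information function
iif_4pl <- function(theta, b, a, c, e) {
  p <- c + (e - c) * plogis(a * (theta - b))
  (a^2 * (p - c)^2 * (e - p)^2) / ((e - c)^2 * p * (1 - p))
}

# Approximate each item's maximum IIF on a fine theta grid (assumed range)
theta_grid <- seq(-4, 4, by = 0.01)
max_iif <- sapply(seq_len(nrow(bank)), function(i) {
  max(iif_4pl(theta_grid, bank$b[i], bank$a[i], bank$c[i], bank$e[i]))
})

# Sort the items by decreasing maximum IIF and keep the first N
N <- 3
selected <- order(max_iif, decreasing = TRUE)[1:N]
selected                      # 8 6 1
round(max_iif[selected], 2)   # 0.44 0.43 0.36
```

Items whose maxima coincide after rounding (such as items 4 and 5 here) may come out in either order, which does not affect the three selected items.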

Warning!

\(\theta\)-target procedure

\(k = 0, \ldots, K\): Scalar denoting the iterations of the procedure (\(K = N-1\))

\(S^k \subseteq \{1, \ldots, |B|\}\): Set of items selected to be included in the short test form up to iteration \(k\)

\(Q^k \subseteq \{1, \ldots, N\}\): Set of \(\theta'\)s satisfied up to iteration \(k\)

At \(k=0\): \(S^0 = \emptyset\), \(Q^0 = \emptyset\)

The procedure cycles steps 1 to 3 until \(k = K\):

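1. Select \(iif_{in}^k = \displaystyle \max_{i \in B\setminus S^k, \, n \in \{1, \ldots, N\} \setminus Q^k} \mathbf{IIF}(i,n)\), the largest entry of the IIF matrix among the items and \(\theta'\)s still available;
2. Compute \(S^{k+1} = S^k \cup \{i\}\), the set of items selected up to and including iteration \(k\);
3. Compute \(Q^{k+1} = Q^k \cup \{n\}\), the set of \(\theta'\)s satisfied up to and including iteration \(k\);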

At iteration \(K\), \(|Q^{K + 1}| = N\) and \(|S^{K + 1}| = N\)

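\[
\mathbf{IIF} =
\begin{pmatrix}
\mathit{iif}_{11} & \mathit{iif}_{12} & \cdots & \mathit{iif}_{1n} & \cdots & \mathit{iif}_{1N} \\
\mathit{iif}_{21} & \mathit{iif}_{22} & \cdots & \mathit{iif}_{2n} & \cdots & \mathit{iif}_{2N} \\
\vdots & \vdots & & \vdots & & \vdots \\
\mathit{iif}_{i1} & \mathit{iif}_{i2} & \cdots & \mathit{iif}_{in} & \cdots & \mathit{iif}_{iN} \\
\vdots & \vdots & & \vdots & & \vdots \\
\mathit{iif}_{|B|1} & \mathit{iif}_{|B|2} & \cdots & \mathit{iif}_{|B|n} & \cdots & \mathit{iif}_{|B|N}
\end{pmatrix}
\]

Rows index the items \(i = 1, \ldots, |B|\); columns index the \(\theta'\) targets \(n = 1, \ldots, N\)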

\(\theta\)-target definition

Intervals of different width defined on the latent trait
Cut-off based tests
\(\ldots\)

Aim: Develop a Test of \(N=3\) items from \(B\) with \(\theta' = (-2,0,2)\):

Item bank B
Item	b	a	c	e
1	1.65	1.32	0.10	1
2	-1.82	0.71	0.06	1
3	2.87	0.78	0.00	1
4	-1.79	0.81	0.01	1
5	-0.83	0.87	0.08	1
6	1.46	1.35	0.03	1
7	2.87	0.73	0.00	1
8	-0.01	1.41	0.06	1
9	-2.92	1.09	0.06	1
10	-1.44	1.07	0.09	1

IIF Matrix \(k = 0\)
	-2	0	2
1	0.00	0.08	0.35
2	0.11	0.08	0.03
3	0.01	0.05	0.13
4	0.16	0.10	0.03
5	0.11	0.15	0.05
6	0.00	0.16	0.38
7	0.01	0.05	0.12
8	0.05	0.44	0.10
9	0.21	0.04	0.01
10	0.21	0.15	0.02

\(S^0 = \emptyset\)

\(Q^0 = \emptyset\)

\(\mathit{iif}_{\text{max}}^0=\displaystyle \max_{i \in B\setminus S^0, \, n \in \{1, \ldots, N\} \setminus Q^0} \mathbf{IIF}(i,n) = \mathbf{IIF}(8,2) = 0.44\)

\(S^{1} = S^0 \cup \{8\} = \{8\}\)

\(Q^{1} = Q^0 \cup \{2\} = \{2\}\)

IIF Matrix \(k = 1\)
	-2	0	2
1	0	0.08	0.35
2	0.11	0.08	0.03
3	0.01	0.05	0.13
4	0.16	0.1	0.03
5	0.11	0.15	0.05
6	0	0.16	0.38
7	0.01	0.05	0.12
8	0.05	0.44	0.1
9	0.21	0.04	0.01
10	0.21	0.15	0.02

\(\mathit{iif}_{\text{max}}^1=\displaystyle \max_{i \in B\setminus S^1, \, n \in \{1, \ldots, N\} \setminus Q^1} \mathbf{IIF}(i,n) = \mathbf{IIF}(6,3)= 0.38\)

\(S^{2} = S^1 \cup \{6\} = \{8, 6\}\)

\(Q^{2} = Q^1 \cup \{3\} = \{2, 3\}\)

IIF Matrix \(k = 2\)
	-2	0	2
1	0	0.08	0.35
2	0.11	0.08	0.03
3	0.01	0.05	0.13
4	0.16	0.1	0.03
5	0.11	0.15	0.05
6	0	0.16	0.38
7	0.01	0.05	0.12
8	0.05	0.44	0.1
9	0.21	0.04	0.01
10	0.21	0.15	0.02

\(\mathit{iif}_{\text{max}}^2=\displaystyle \max_{i \in B\setminus S^2, \, n \in \{1, \ldots, N\} \setminus Q^2} \mathbf{IIF}(i,n) = \mathbf{IIF}(9,1)= 0.21\)

\(S^{3} = S^2 \cup \{9\} = \{8, 6, 9\}\)

\(Q^{3} = Q^2 \cup \{1\} = \{2,3, 1\}\)

End
	-2	0	2
1	0	0.08	0.35
2	0.11	0.08	0.03
3	0.01	0.05	0.13
4	0.16	0.1	0.03
5	0.11	0.15	0.05
6	0	0.16	0.38
7	0.01	0.05	0.12
8	0.05	0.44	0.1
9	0.21	0.04	0.01
10	0.21	0.15	0.02

\(|S^3| = 3\), \(|Q^3| = 3\), \(K = 2\) \(\rightarrow\) end
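
The worked example can be replayed in base R. This is a sketch of the greedy loop described by the procedure, not the package implementation: `iif_4pl` is a hypothetical helper, and the IIF matrix is computed from the rounded parameters of the item bank, so single entries may differ from the matrix shown above in the last digit.

```r
# Example item bank (rounded parameters) and theta targets
bank <- data.frame(
  b = c(1.65, -1.82, 2.87, -1.79, -0.83, 1.46, 2.87, -0.01, -2.92, -1.44),
  a = c(1.32, 0.71, 0.78, 0.81, 0.87, 1.35, 0.73, 1.41, 1.09, 1.07),
  c = c(0.10, 0.06, 0.00, 0.01, 0.08, 0.03, 0.00, 0.06, 0.06, 0.09),
  e = 1
)
targets <- c(-2, 0, 2)

# 4PL item information function
iif_4pl <- function(theta, b, a, c, e) {
  p <- c + (e - c) * plogis(a * (theta - b))
  (a^2 * (p - c)^2 * (e - p)^2) / ((e - c)^2 * p * (1 - p))
}

# IIF matrix: one row per item, one column per theta target
iif_mat <- sapply(targets, function(th) {
  iif_4pl(th, bank$b, bank$a, bank$c, bank$e)
})

S <- integer(0)   # selected items
Q <- integer(0)   # satisfied targets (column indices)
work <- iif_mat   # working copy; used rows and columns are set to NA

for (k in seq_along(targets)) {
  # Step 1: largest IIF among the items and targets still available
  pos <- which(work == max(work, na.rm = TRUE), arr.ind = TRUE)[1, ]
  # Steps 2 and 3: update the selected items and the satisfied targets
  S <- c(S, pos[["row"]])
  Q <- c(Q, pos[["col"]])
  work[pos[["row"]], ] <- NA
  work[, pos[["col"]]] <- NA
}

S            # 8 6 9
Q            # 2 3 1
targets[Q]   # 0 2 -2
```

Marking used rows and columns as `NA` plays the role of the set differences \(B \setminus S^k\) and \(\{1, \ldots, N\} \setminus Q^k\): each item is selected at most once and each target is satisfied exactly once.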

Tip

The `shortIRT` package

It’s on CRAN!

install.packages("shortIRT")
library(shortIRT)

`bench()`

bench(item_par, iifs = NULL, theta = NULL, num_item = NULL)

set.seed(1312)
n = 10
# Define the item parameters in the item bank
item_par = data.frame(b = runif(n, -3, 3),    # difficulty
                      a = runif(n, .7, 1.5),  # discrimination
                      c = runif(n, 0, .10),   # lower asymptote
                      e = 1)                  # upper asymptote
# Random values for the latent trait
theta = rnorm(1000)
# Generate the test with the benchmark procedure
test = bench(item_par, theta = theta, num_item = 3)
# Summary of the obtained test
summary(test)
# Plot the resulting TIF (as compared to the TIF obtained from B)
plot(test)

summary(test)

The item selection is based on the benchmark procedure. 
The procedure selected the following 3 dichotomous items: 
10 2 9 
with parameters: 
           b        a          c e
10  2.243866 1.390367 0.02130947 1
2  -1.246596 1.283032 0.02087452 1
9  -1.171959 1.344295 0.07825710 1
These items maximize the information for thetas equal to: 
2.251 -1.215 -1.077

plot(test)

`define_targets()`

define_targets(theta, num_targets = NULL, method = c("equal", "clusters"))

targetsC = define_targets(theta, 
                          num_targets = 3, 
                          method = "clusters")
targetsC

          1           2           3 
-1.20701883  1.21315753  0.01562251 
attr(,"class")
[1] "clusters"

targetsE = define_targets(theta, 
                          num_targets = 3, 
                          method = "equal")
targetsE

[1] -2.39408579  0.04465118  2.48338815
attr(,"class")
[1] "equal"
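
To give an intuition for the two kinds of targets, here is a base R sketch of one possible reading of each method: centroids of k-means clusters for `"clusters"`, and midpoints of equal-width intervals for `"equal"`. This is an assumption made for illustration, not the code of `define_targets()`, and the seed and object names are arbitrary, so the values will not match the output above.

```r
set.seed(1)            # arbitrary seed, for reproducibility of the sketch
theta <- rnorm(1000)   # simulated latent trait values
num_targets <- 3

# Cluster-based targets: centroids of k-means clusters computed on theta
km <- kmeans(theta, centers = num_targets, nstart = 10)
targets_clusters <- sort(as.numeric(km$centers))

# Equally spaced targets: midpoints of equal-width intervals that
# span the observed range of theta
breaks <- seq(min(theta), max(theta), length.out = num_targets + 1)
targets_equal <- (breaks[-1] + breaks[-length(breaks)]) / 2

targets_clusters
targets_equal
```

Cluster-based targets follow the density of the observed trait values, whereas equally spaced targets cover the whole observed range regardless of how many respondents fall in each interval.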

`theta_target()`

theta_target(targets, item_par)

testC = theta_target(targetsC, item_par)
summary(testC)

The item selection is based on the theta-target procedure with cluster-defined targets. 
The procedure selected the following 3 dichotomous items: 
2 6 8 
with parameters: 
           b        a          c e
2 -1.2465959 1.283032 0.02087452 1
6  0.8283301 1.198013 0.01028904 1
8  0.3331977 1.208366 0.09370018 1
These items maximize the information for thetas equal to: 
-1.207019 1.213158 0.01562251

plot(testC, show_both = FALSE)

testE = theta_target(targetsE, item_par)
summary(testE)

The item selection is based on the theta-target procedure with equally-spaced targets. 
The procedure selected the following 3 dichotomous items: 
10 8 2 
with parameters: 
            b        a          c e
10  2.2438664 1.390367 0.02130947 1
8   0.3331977 1.208366 0.09370018 1
2  -1.2465959 1.283032 0.02087452 1
These items maximize the information for thetas equal to: 
2.483388 0.04465118 -2.394086

plot(testE, show_both = FALSE)

Additional functions

Function	Description
`IRT()`	Compute expected probability for a single item
`mpirt()`	Compute expected probability for multiple items
`obsirt()`	Simulate responses according to IRT probabilities
`irt_estimate()`	Estimate of theta
`item_info()`	Item Information Functions (multiple items, IIFs)
`tif()`	Test Information Function (TIF)

Plus the methods defined for the S3 classes

Final remarks

Thinking before anything

Without the theoretical foundations, this work would not have been possible :)

A well-defined theory is 90% of the job

Practical skills will eventually come…and if they don’t, it’s fine! You’ll never walk alone

The idea that the solution comes from the intuition of a mad, natural-born genius, and not from the complicated, collective work of hundreds, thousands of scientists, is a false and mistaken idea, one that takes value away from the University […]

Matteo Bordone, February 2025

You can find the slides on my personal page https://ottaviae.github.io/presentations

What is Psicostat?

Activities and Website

We meet twice a month on Zoom
Each meeting lasts one hour and includes one presentation
A lot of space is reserved for discussion (the heart of Psicostat)
Topics: innovative statistical methods in Psychology, theoretical tutorials, projects, experimental designs, reflections on research, and societal impact
Anyone is welcome to attend and present!
Omnia sunt communia! – All materials are shared, presenters can be contacted for info or collaborations
Presentations can serve as informal oral pre-registrations
International mailing list with ~300 members
Stay tuned! https://psicostat.dpss.psy.unipd.it/pages/meetings.html

https://psicostat.dpss.psy.unipd.it

The Core Team
