Say ‘when’: An item response theory algorithm for shortening tests while accounting for response fatigue

Ottavia M. Epifania\(^{1,2}\), Livio Finos\(^{2,3}\), Luigi Lombardi\(^{1}\)

\(^1\) University of Trento, Rovereto (IT), \(^2\) Psicostat, Padova (IT), \(^3\) University of Padova, Padova (IT)

Short Test Forms

Why?

Many items \(\rightarrow\) good measurement precision, great reliability and so on

Not always!

People might get tired & frustrated

\[Q \subset B\]

Item Response Theory models for the win

Because they focus on item information and on the ability of each item to measure different levels of the latent trait, IRT models provide an ideal framework for developing short test forms (STFs) (and not torturing people)

Automated test assembly and maxmin algorithms

AIM

Size matters: How well can we estimate the latent trait with fewer and fewer items?

The 4-Parameter Logistic Model (4-PL)

4-PL - Item Response Function

\[P(x_{pi}= 1| \theta_p, b_i, a_i, c_i, d_i) = c_i + (d_i -c_i) \dfrac{\exp[a_i(\theta_p - b_i)]}{1 + \exp[a_i(\theta_p - b_i)]}\]
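As a minimal sketch, the item response function above can be written in a few lines of Python. The function name `irf_4pl` and its argument names are illustrative choices, not part of the original slides.

```python
import math


def irf_4pl(theta: float, b: float, a: float = 1.0,
            c: float = 0.0, d: float = 1.0) -> float:
    """Probability of a correct response under the 4-PL model.

    theta: latent trait, b: difficulty, a: discrimination,
    c: lower asymptote, d: upper asymptote.
    """
    # exp(x) / (1 + exp(x)) rewritten as 1 / (1 + exp(-x))
    logistic = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return c + (d - c) * logistic


# When theta equals b the logistic term is 0.5,
# so the probability sits halfway between c and d.
print(irf_4pl(theta=0.0, b=0.0, a=1.5, c=0.0, d=0.9))  # 0.45
```

Lowering \(d\) below 1 caps the probability of a correct response even for very able respondents, which is how response fatigue enters the model.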

4-PL - Information Functions

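\[ \text{IIF}_{i}(\theta) = \dfrac{a_i^2[P(\theta)-c_i]^2[d_i - P(\theta)]^2}{(d_{i}-c_i)^2 P(\theta)Q(\theta)}\]

where \(P(\theta)\) is the item response function of item \(i\) and \(Q(\theta) = 1 - P(\theta)\) (not to be confused with the item subset \(Q\))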

\[\text{TIF}(\theta) = \sum_{i = 1}^{||B||} \text{IIF}_i(\theta)\]

\(B\): Set of items in a test (\(||X||\): cardinality of set \(X\))
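The two information functions can be sketched in Python as follows. The function names and the dictionary layout for items are assumptions made for illustration only.

```python
import math


def irf_4pl(theta, b, a, c, d):
    """4-PL probability of a correct response."""
    return c + (d - c) / (1.0 + math.exp(-a * (theta - b)))


def iif_4pl(theta, b, a, c, d):
    """Item information function of a 4-PL item at a given theta."""
    p = irf_4pl(theta, b, a, c, d)
    q = 1.0 - p
    return (a ** 2 * (p - c) ** 2 * (d - p) ** 2) / ((d - c) ** 2 * p * q)


def tif(theta, items):
    """Test information: sum of the item informations at theta."""
    return sum(iif_4pl(theta, **item) for item in items)


# With c = 0 and d = 1 the formula reduces to a^2 * P * Q,
# which equals 0.25 for a = 1 at theta = b.
item = {"b": 0.0, "a": 1.0, "c": 0.0, "d": 1.0}
print(iif_4pl(0.0, **item))    # 0.25
print(tif(0.0, [item, item]))  # 0.5
```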

Not a property of the item!

The upper asymptote \(d\) depends on the rank \(r\) at which the item is presented during the administration, \(d_r\).

Léon

The algorithm

At \(k = 0\): \(\text{TIF}^0(\theta) = 0 \, \forall \theta\), \(Q^0 = \emptyset\).

For \(k \geq 0\),

1. \(A^k = B \setminus Q^k\)
2. \(\forall i \in A^k\), \(\text{pTIF}_{i}^k = \frac{\text{TIF}^k + \text{IIF}_{i}}{||Q^k||+1}\), with \(r = \{0, 1, \ldots, ||Q^k||-1\}\)
3. \(i^* = \arg \min_{i \in A^k} (|\text{TIF}^* - \text{pTIF}_i^k|)\)
4. Termination criterion: \(|\text{TIF}^* - \text{pTIF}_{i^*}^k| \geq |\text{TIF}_B - \text{TIF}^{k}|\):
- FALSE: \(Q^{k+1} = Q^{k} \cup \{i^*\}\), \(\text{TIF}^{k+1} = \text{pTIF}_{i^*}^k\), repeat steps 1-4
- TRUE: Stop, \(Q_{\text{Léon}} = Q^k\)
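The iteration above can be sketched in Python as a rough illustration. Several details are assumptions not stated on the slides: the information functions are evaluated on a finite \(\theta\) grid, distances between functions are summarized as mean absolute differences over that grid, a running sum of item informations is kept so that \(\text{pTIF}\) is a mean, and a candidate item is evaluated with the \(d_r\) of the next free rank. All function and variable names are hypothetical, and how \(\text{TIF}_B\) is computed in the toy usage is also an assumption.

```python
import math


def iif_4pl(theta, b, a, c, d):
    """Item information of a 4-PL item at theta."""
    p = c + (d - c) / (1.0 + math.exp(-a * (theta - b)))
    return a**2 * (p - c)**2 * (d - p)**2 / ((d - c)**2 * p * (1.0 - p))


def mean_abs_diff(f, g):
    """Assumed distance between two functions evaluated on the theta grid."""
    return sum(abs(x - y) for x, y in zip(f, g)) / len(f)


def select_items(bank, tif_target, tif_bank, thetas, d_of_rank):
    selected = []                    # Q^k, in administration order
    info_sum = [0.0] * len(thetas)   # summed information of the items in Q^k
    tif_k = [0.0] * len(thetas)      # TIF^k, with TIF^0 = 0 for all theta
    while len(selected) < len(bank):
        rank = len(selected)         # assumed rank of the next selected item
        d = d_of_rank(rank)
        best = None
        for i, item in enumerate(bank):
            if i in selected:        # step 1: only items in A^k = B \ Q^k
                continue
            iif = [iif_4pl(t, item["b"], item["a"], item["c"], d) for t in thetas]
            # step 2: provisional TIF if item i were added
            ptif = [(s + v) / (rank + 1) for s, v in zip(info_sum, iif)]
            dist = mean_abs_diff(tif_target, ptif)
            if best is None or dist < best[0]:   # step 3: arg min
                best = (dist, i, iif, ptif)
        dist, i_star, iif, ptif = best
        # step 4: termination criterion
        if dist >= mean_abs_diff(tif_bank, tif_k):
            break
        selected.append(i_star)
        info_sum = [s + v for s, v in zip(info_sum, iif)]
        tif_k = ptif
    return selected


# Toy usage with a hypothetical 9-item bank.
thetas = [-3 + 0.5 * j for j in range(13)]
bank = [{"b": -2.0 + 0.5 * j, "a": 1.0 + 0.1 * (j % 3), "c": 0.0} for j in range(9)]


def fatigue(r):
    return math.exp(-0.01 * r)


# Target: mean information of the bank without fatigue (d = 1).
tif_target = [sum(iif_4pl(t, it["b"], it["a"], it["c"], 1.0) for it in bank) / len(bank)
              for t in thetas]
# Assumed TIF_B: mean information of the bank administered in bank order with fatigue.
tif_bank = [sum(iif_4pl(t, it["b"], it["a"], it["c"], fatigue(r))
                for r, it in enumerate(bank)) / len(bank)
            for t in thetas]
q_selected = select_items(bank, tif_target, tif_bank, thetas, fatigue)
print(q_selected)  # indices of the selected items, in administration order
```

Because each item is evaluated at the rank it would occupy, the returned list is ordered: shuffling it would change the \(d_r\) attached to each item.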

\[\text{TIF}^*\] (Target Test Information Function)

\[k = 0\]

\[k = 1\]

Simulation Study

1000 respondents with \(\theta \sim \mathcal{U}(-3,3)\)

Item bank \(B\) of 70 items:

\(b \sim \mathcal{U}(-3, 3)\)
\(a \sim \mathcal{U}(.90, 2.0)\)
\(c_i = 0\), \(\forall i \in B\)
\(d_r = \exp(-0.01 r)\), with \(r = \{0, \ldots, ||B|| -1\}\)
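A bank with these characteristics could be generated as in the following sketch. The helper names, the seed, and the dictionary layout are illustrative assumptions, not the code used in the study.

```python
import math
import random


def fatigue_upper_asymptote(rank: int, rate: float = 0.01) -> float:
    """Upper asymptote d_r = exp(-rate * r) for the item presented at rank r."""
    return math.exp(-rate * rank)


def simulate_item_bank(n_items: int = 70, seed: int = 1):
    """Draw b ~ U(-3, 3) and a ~ U(0.9, 2.0); c is fixed at 0."""
    rng = random.Random(seed)
    return [{"b": rng.uniform(-3, 3), "a": rng.uniform(0.9, 2.0), "c": 0.0}
            for _ in range(n_items)]


bank = simulate_item_bank()
d_by_rank = [fatigue_upper_asymptote(r) for r in range(len(bank))]

# The first item is unaffected (d = 1); by the 70th item (r = 69)
# the upper asymptote has dropped to about one half.
print(round(d_by_rank[0], 3), round(d_by_rank[-1], 3))  # 1.0 0.502
```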

\(\text{TIF}^* = \sum_{i = 1}^{||B||} \frac{\text{IIF}_i}{||B||}\), with \(d_i = 1\), \(\forall i \in B\) 🥇

Given \(\text{TIF}^*\), \(B\), and \(d_r\): 100 replications to find \(Q_{\text{Léon}} \subset B\)

Minimum number of items

10%, 25%, 50% of \(||B||\)

Responses are generated for:

\(Q_{\text{Léon}}\), with \(d_r = \exp(-0.01 r)\), and \(r = \{0, \ldots, ||Q_{\text{Léon}}|| -1\}\)
\(Q_{\text{Random}}\), where \(||Q_{\text{Léon}}|| = ||Q_{\text{Random}}||\) and with \(d_r = \exp(-0.01 r)\), and \(r = \{0, \ldots, ||Q_{\text{Léon}}|| -1\}\)
\(B\) with \(d_r = \exp(-0.01 r)\), and \(r = \{0, \ldots, ||B|| -1\}\)
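Response generation with a rank-dependent upper asymptote can be sketched as below. The function name, the seeds, and the 7-item form are hypothetical; the same function would apply to \(Q_{\text{Léon}}\), \(Q_{\text{Random}}\), or \(B\) by passing the corresponding ordered item list.

```python
import math
import random


def simulate_responses(thetas, items, rate=0.01, seed=1):
    """Binary responses where the item at rank r has d_r = exp(-rate * r)."""
    rng = random.Random(seed)
    data = []
    for theta in thetas:
        row = []
        for rank, item in enumerate(items):
            d = math.exp(-rate * rank)
            logistic = 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))
            p = item["c"] + (d - item["c"]) * logistic
            row.append(1 if rng.random() < p else 0)
        data.append(row)
    return data


# 1000 respondents with theta ~ U(-3, 3) answering a hypothetical 7-item form.
rng = random.Random(42)
thetas = [rng.uniform(-3, 3) for _ in range(1000)]
items = [{"b": -3.0 + j, "a": 1.2, "c": 0.0} for j in range(7)]
data = simulate_responses(thetas, items)
print(len(data), len(data[0]))  # 1000 7
```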

Latent trait estimation: Two conditions

Condition \(d\): Accounts for the response fatigue
Condition \(\lnot d\): Does not account for the response fatigue
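The two conditions differ only in the upper asymptote used when scoring. The slides do not state which estimator was used, so the sketch below assumes a simple maximum-likelihood estimate found by grid search; all names, the grid, and the toy items are illustrative.

```python
import math


def log_lik(theta, responses, items, d_values):
    """Log-likelihood of a response pattern under the 4-PL model."""
    ll = 0.0
    for x, item, d in zip(responses, items, d_values):
        logistic = 1.0 / (1.0 + math.exp(-item["a"] * (theta - item["b"])))
        p = item["c"] + (d - item["c"]) * logistic
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll


def estimate_theta(responses, items, account_for_fatigue, rate=0.01):
    """Grid-search ML estimate of theta (assumed estimator)."""
    grid = [-4 + 0.05 * j for j in range(161)]
    if account_for_fatigue:
        # Condition d: the item at rank r is scored with d_r = exp(-rate * r)
        d_values = [math.exp(-rate * r) for r in range(len(items))]
    else:
        # Condition not-d: fatigue is ignored, d = 1 for every item
        d_values = [1.0] * len(items)
    return max(grid, key=lambda t: log_lik(t, responses, items, d_values))


items = [{"b": -2.0 + j, "a": 1.2, "c": 0.0} for j in range(5)]
responses = [1, 1, 0, 1, 0]
print(estimate_theta(responses, items, account_for_fatigue=True))
print(estimate_theta(responses, items, account_for_fatigue=False))
```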

Results

Step 1

Step 2

Final remarks

Tip

Administering fewer, well-chosen items is better than administering all items

Warning

The order of the items selected by Léon cannot be randomized

ottavia.epifania@unitn.it