Item time!
Rovereto (TN)
2023-11-17
The fit of each item to the model can be evaluated:
\(S - X^2\) (Orlando & Thissen, 2000): A statistic based on the \(\chi^2\) distribution. If significant, the item does not fit the model (this method is not recommended)
Root Mean Squared Deviation (RMSD): Difference between what is expected under the model and the observed data (the lower, the better):
\(< .15\): acceptable fit of the item to the model
\(< .10\): optimal fit of the item to the model
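The RMSD cut-offs above can be applied with a small helper. This is a minimal base-R sketch; the function name `classify_rmsd` is made up for illustration, and the input would typically be the `$RMSD$Group1` column returned by `CDM::IRT.RMSD()`.

```r
# Hypothetical helper: label each item's RMSD with the cut-offs above
# (< .10 optimal, < .15 acceptable, otherwise poor fit).
classify_rmsd = function(rmsd) {
  ifelse(rmsd < .10, "optimal",
         ifelse(rmsd < .15, "acceptable", "poor"))
}

classify_rmsd(c(0.0139, 0.12, 0.2))
```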
data = read.csv("data/itemClass.csv", header = TRUE, sep = ",")
prop_item = data.frame(item = names(colMeans(data[, -c(1:2)])),
                       proportion = colMeans(data[, -c(1:2)]))
ggplot(prop_item,
       aes(x = reorder(item, proportion),
           y = proportion, fill = item)) +
  geom_bar(stat = "identity") + theme_light() +
  ylab("Proportion correct") + ylim(0, 1) +
  theme(legend.position = "none",
        axis.title = element_text(size = 26),
        axis.title.x = element_blank(),
        axis.text = element_text(size = 22))
$IC
Model loglike Deviance Npars Nobs AIC BIC AIC3 AICc
1 m1pl -5599.390 11198.78 11 1000 11220.78 11274.77 11231.78 11221.05
2 m2pl -5597.621 11195.24 20 1000 11235.24 11333.40 11255.24 11236.10
3 m3pl -5597.738 11195.48 31 1000 11257.48 11409.62 11288.48 11259.52
CAIC GHP
1 11285.77 0.5610390
2 11353.40 0.5617621
3 11440.62 0.5628738
$LRtest
Model1 Model2 Chi2 df p
1 m1pl m2pl 3.5389531 9 0.9390614
2 m1pl m3pl 3.3051431 20 0.9999906
3 m2pl m3pl -0.2338101 11 1.0000000
attr(,"class")
[1] "IRT.compareModels"
m1pl = tam.mml(data[, grep("item", colnames(data))], verbose = F)
m2pl = tam.mml.2pl(data[, grep("item", colnames(data))], irtmodel = "2PL", verbose = F)
m3pl = tam.mml.3pl(data[, grep("item", colnames(data))], est.guess = grep("item", colnames(data)), verbose = F)
IRT.compareModels(m1pl, m2pl, m3pl)
List of 11
$ MD :'data.frame': 10 obs. of 2 variables:
..$ item : chr [1:10] "item1" "item2" "item3" "item4" ...
..$ Group1: num [1:10] -2.21e-06 1.09e-06 -4.23e-06 -4.09e-06 1.97e-06 ...
$ RMSD :'data.frame': 10 obs. of 2 variables:
..$ item : chr [1:10] "item1" "item2" "item3" "item4" ...
..$ Group1: num [1:10] 0.01389 0.00794 0.01089 0.00595 0.01088 ...
$ RMSD_bc :'data.frame': 10 obs. of 2 variables:
..$ item : chr [1:10] "item1" "item2" "item3" "item4" ...
..$ Group1: num [1:10] -0.00371 -0.01037 -0.00785 -0.01225 -0.00296 ...
$ MAD :'data.frame': 10 obs. of 2 variables:
..$ item : chr [1:10] "item1" "item2" "item3" "item4" ...
..$ Group1: num [1:10] 0.01235 0.00612 0.00945 0.00538 0.00908 ...
$ chisquare_stat :'data.frame': 10 obs. of 2 variables:
..$ item : chr [1:10] "item1" "item2" "item3" "item4" ...
..$ Group1: num [1:10] 1.234 0.594 0.667 0.312 1.135 ...
$ call : language CDM::IRT.RMSD(object = object)
$ G : num 1
$ RMSD_summary :'data.frame': 1 obs. of 5 variables:
..$ Parm: chr "Group1"
....
The mass of Iodine-131 decreases by 1/2 every 8 days because of radioactive decay.
In a laboratory, there are 2 grams of Iodine-131. How many grams would there be after 16 days?
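The answer follows from the half-life formula. A quick check in base R:

```r
# Half-life decay: mass after t days = m0 * (1/2)^(t / half_life)
m0 = 2          # initial grams of Iodine-131
half_life = 8   # days
t = 16          # days
m0 * 0.5^(t / half_life)  # 16 days = 2 half-lives -> 2 * 1/4 = 0.5 grams
```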
The same item presented to two different groups paired for their level of latent trait… does not have the same probability of being endorsed!
The subjects are paired according to their level of the latent trait. Are there any differences in the performance to the item?
Theoretically: Different subjects but with the same level of the latent trait (i.e., paired) should have similar performances on the item
If this expectation is not met \(\rightarrow\) Differential Item Functioning (DIF)
The comparison to investigate items with DIF is between two groups:
Reference group: It is the “baseline” group. For instance, if the test/questionnaire has been validated in a second language, the reference group is the original group
Focal group: It is the focus of the DIF investigation, where we suspect the items of the test might be working differently
Uniform DIF: The item “favors” one of the two groups (either the focal or the reference one) constantly along the latent trait
Non-uniform DIF: The item favors one of the groups, but the advantage is not constant along the latent trait
IRT-based methods: Subjects are paired according to the estimates of their latent trait level \(\theta\)
Score-based methods: Subjects are paired according to the observed score
Uniform DIF
1PL
2PL
3PL
Non-uniform DIF
2PL
3PL
Two IRT models:
A “no DIF” model \(\rightarrow\) Parameters are constrained to be equal in the two groups
A “DIF” model \(\rightarrow\) Parameters are left free to vary between the two groups
It is like an LRT on a linear model where the group variable (focal vs. reference) is either included as a predictor or not
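The analogy can be sketched with a logistic regression on simulated data (not the course data; the variable names and the DIF effect of 0.8 logits are made up for illustration): the "no DIF" model omits the group predictor, the "DIF" model includes it, and the two are compared with a likelihood-ratio test.

```r
# Simulated illustration of the "no DIF" vs "DIF" model comparison.
set.seed(123)
n = 500
group = rep(c("reference", "focal"), each = n / 2)
theta = rnorm(n)  # stand-in for the pairing variable (latent trait / score)
# Simulate a uniform-DIF item: easier for the reference group by 0.8 logits
p = plogis(theta + ifelse(group == "reference", 0.8, 0))
correct = rbinom(n, 1, p)

no_dif = glm(correct ~ theta, family = binomial)          # constrained model
dif    = glm(correct ~ theta + group, family = binomial)  # group free to vary

anova(no_dif, dif, test = "LRT")  # significant -> evidence of uniform DIF
```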
The item parameters are estimated in both the reference and the focal groups
If there is a significant difference in the item estimates between groups \(\rightarrow\) DIF
Beyond significance, Lord’s \(\Delta\):
\(\Delta < 1.00\): Negligible DIF
\(1.00 \leq \Delta < 1.50\): Moderate DIF
\(\Delta \geq 1.50\): High DIF
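These cut-offs can be coded as a small helper; `classify_delta` is a hypothetical name, and the example values are taken from the delta columns of the difR output shown later.

```r
# Hypothetical helper: map |Delta| to the DIF labels above.
classify_delta = function(d) {
  cut(abs(d), breaks = c(0, 1, 1.5, Inf),
      labels = c("negligible", "moderate", "high"),
      right = FALSE)
}

as.character(classify_delta(c(0.9846, -1.3545, 2.1305)))
```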
It considers the DIF as the area between the ICCs of the items
If the area between the ICCs is 0, then the item presents no DIF
It is based on a \(Z\) statistic under the hypothesis that the area between the ICCs of the item in the two groups is 0
The parameters estimated in the focal and reference groups cannot be directly compared \(\rightarrow\) The parameters in one of the groups must be rescaled.
The rescaling can be done according to the equal means anchoring method (Cook & Eignor, 1991)
This method is already implemented in the difR package
First, a constant must be computed:
\[c = \bar{b}_R - \bar{b}_F\]
Then, it is subtracted from the estimates of the items in the reference group:
\[b^{'}_{Ri} = b_{Ri} - c\]
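The two formulas can be checked on toy numbers (the `b_R` and `b_F` vectors below are made-up difficulty estimates, not the course data):

```r
# Equal means anchoring (Cook & Eignor, 1991) on toy difficulty estimates.
b_R = c(0.10, -1.08, 0.93)  # reference-group estimates (made up)
b_F = c(0.00, -1.19, 1.00)  # focal-group estimates (made up)

const = mean(b_R) - mean(b_F)  # c = mean(b_R) - mean(b_F)
b_R_rescaled = b_R - const     # b' = b_R - c

all.equal(mean(b_R_rescaled), mean(b_F))  # TRUE: group means now coincide
```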
The data set is difClass.csv
long = pivot_longer(data, !1:2, names_to = "item",
                    values_to = "correct")
prop_gender = long %>%
  group_by(item, gender) %>%
  summarise(prop = mean(correct), sd = sd(correct))
ggplot(prop_gender,
       aes(x = item, y = prop, fill = gender)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  ylim(0, 1)
m1pl = tam.mml(data[, grep("item", colnames(data))], verbose = F)
m2pl = tam.mml.2pl(data[, grep("item", colnames(data))], irtmodel = "2PL", verbose = F)
m3pl = tam.mml.3pl(data[, grep("item", colnames(data))], est.guess = grep("item", colnames(data)),
verbose = F)
IRT.compareModels(m1pl, m2pl, m3pl)
$IC
Model loglike Deviance Npars Nobs AIC BIC AIC3 AICc
1 m1pl -5599.390 11198.78 11 1000 11220.78 11274.77 11231.78 11221.05
2 m2pl -5597.621 11195.24 20 1000 11235.24 11333.40 11255.24 11236.10
3 m3pl -5597.736 11195.47 31 1000 11257.47 11409.61 11288.47 11259.52
CAIC GHP
1 11285.77 0.5610390
2 11353.40 0.5617621
3 11440.61 0.5628736
$LRtest
Model1 Model2 Chi2 df p
1 m1pl m2pl 3.5389531 9 0.9390614
2 m1pl m3pl 3.3079257 20 0.9999905
3 m2pl m3pl -0.2310274 11 1.0000000
attr(,"class")
[1] "IRT.compareModels"
Strictly speaking, this is not the LRT but an approximation of it.
Detection of uniform Differential Item Functioning
using Logistic regression method, without item purification
and with LRT DIF statistic
Matching variable: specified matching variable
No set of anchor items was provided
Multiple comparisons made with Benjamini-Hochberg adjustement of p-values
Logistic regression DIF statistic:
Stat. P-value Adj. P
item1 9.5769 0.0020 0.0033 **
item2 7.8345 0.0051 0.0064 **
item3 2.8907 0.0891 0.0891 .
item4 6.6418 0.0100 0.0111 *
item5 28.1792 0.0000 0.0000 ***
item6 36.3876 0.0000 0.0000 ***
item7 18.8507 0.0000 0.0000 ***
item8 9.8524 0.0017 0.0033 **
item9 27.3310 0.0000 0.0000 ***
item10 8.9442 0.0028 0.0040 **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Detection threshold: 10.8276 (significance level: 0.001)
Items detected as uniform DIF items:
item5
item6
item7
item9
Effect size (Nagelkerke's R^2):
Effect size code:
'A': negligible effect
'B': moderate effect
'C': large effect
R^2 ZT JG
item1 0.0092 A A
item2 0.0086 A A
item3 0.0030 A A
item4 0.0069 A A
item5 0.0368 A B
item6 0.0394 A B
item7 0.0231 A A
item8 0.0098 A A
item9 0.0293 A A
item10 0.0130 A A
Effect size codes:
Zumbo & Thomas (ZT): 0 'A' 0.13 'B' 0.26 'C' 1
Jodoin & Gierl (JG): 0 'A' 0.035 'B' 0.07 'C' 1
Detection of Differential Item Functioning using Raju's method
with 1PL model and without item purification
Type of Raju's Z statistic: based on unsigned area
Engine 'ltm' for item parameter estimation
Common discrimination parameter: fixed to 1
No set of anchor items was provided
Multiple comparisons made with Benjamini-Hochberg adjustement of p-values
Raju's statistic:
Stat. P-value Adj. P
item1 -2.7358 0.0062 0.0104 *
item2 -2.5503 0.0108 0.0154 *
item3 -1.5133 0.1302 0.1302
item4 -2.2916 0.0219 0.0244 *
item5 4.4742 0.0000 0.0000 ***
item6 5.0150 0.0000 0.0000 ***
item7 3.6098 0.0003 0.0008 ***
item8 2.4605 0.0139 0.0173 *
item9 -4.6929 0.0000 0.0000 ***
item10 -2.9145 0.0036 0.0071 **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Detection thresholds: -3.2905 and 3.2905 (significance level: 0.001)
Items detected as DIF items:
item5
item6
item7
item9
Effect size (ETS Delta scale):
Effect size code:
'A': negligible effect
'B': moderate effect
'C': large effect
mF-mR deltaRaju
item1 -0.4190 0.9846 A
item2 -0.4244 0.9973 A
item3 -0.2458 0.5776 A
item4 -0.3677 0.8641 A
item5 0.9066 -2.1305 C
item6 0.8568 -2.0135 C
item7 0.6592 -1.5491 C
item8 0.3799 -0.8928 A
item9 -0.7692 1.8076 C
item10 -0.5764 1.3545 B
Effect size codes: 0 'A' 1.0 'B' 1.5 'C'
(for absolute values of 'deltaRaju')
Detection of Differential Item Functioning using Lord's method
with 1PL model and without item purification
Engine 'ltm' for item parameter estimation
Common discrimination parameter: fixed to 1
No set of anchor items was provided
Multiple comparisons made with Benjamini-Hochberg adjustement of p-values
Lord's chi-square statistic:
Stat. P-value Adj. P
item1 7.4848 0.0062 0.0104 *
item2 6.5039 0.0108 0.0154 *
item3 2.2901 0.1302 0.1302
item4 5.2515 0.0219 0.0244 *
item5 20.0188 0.0000 0.0000 ***
item6 25.1505 0.0000 0.0000 ***
item7 13.0309 0.0003 0.0008 ***
item8 6.0539 0.0139 0.0173 *
item9 22.0232 0.0000 0.0000 ***
item10 8.4941 0.0036 0.0071 **
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Items detected as DIF items:
item5
item6
item7
item9
Effect size (ETS Delta scale):
Effect size code:
'A': negligible effect
'B': moderate effect
'C': large effect
mF-mR deltaLord
item1 -0.4190 0.9846 A
item2 -0.4244 0.9973 A
item3 -0.2458 0.5776 A
item4 -0.3677 0.8641 A
item5 0.9066 -2.1305 C
item6 0.8568 -2.0135 C
item7 0.6592 -1.5491 C
item8 0.3799 -0.8928 A
item9 -0.7692 1.8076 C
item10 -0.5764 1.3545 B
Effect size codes: 0 'A' 1.0 'B' 1.5 'C'
(for absolute values of 'deltaLord')
mF-mR: Difference between the focal and the reference group (the estimates in the reference group are already rescaled)
deltaLord/deltaRaju: Effect size, obtained by multiplying mF-mR \(\times -2.35\) (Penfield & Camilli, 2007)
The size of the effect can be interpreted according to the values reported in the R output
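The \(\times -2.35\) multiplication can be verified against the output shown earlier (the mF-mR values below are copied from the difR tables for item5, item9, and item10):

```r
# Reproduce the delta effect size from the mF-mR column reported above.
mF_mR = c(item5 = 0.9066, item9 = -0.7692, item10 = -0.5764)
delta = -2.35 * mF_mR
round(delta, 4)  # -2.1305, 1.8076, 1.3545: matches deltaRaju/deltaLord
```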
The item parameters can be obtained as:
The object obtained from lordDif$itemParInit has a number of rows equal to 2 times the number of items \(I\):
Rows \(1, \ldots, I\): Estimates of the items in the REFERENCE GROUP
Rows \(I + 1, \ldots, 2I\): Estimates of the items in the FOCAL GROUP
Rows \(1, \ldots, I\): These estimates are not rescaled
Rows \(I + 1, \ldots, 2I\):
b se(b)
Item1 1.621222e-06 0.1084637
Item2 -1.187073e+00 0.1186087
Item3 1.001565e+00 0.1156194
Item4 8.361141e-01 0.1134093
Item5 -1.309367e+00 0.1209242
Item6 1.717348e+00 0.1307057
Item7 2.040499e+00 0.1409710
Item8 2.933326e-01 0.1090660
Item9 -1.223203e+00 0.1192650
Item10 1.872148e+00 0.1353199
bF se.b. bR se.b..1 constant new_bR
Item1 1.621222e-06 0.1084637 0.1003314 0.1081049 -0.3186282 0.41895953
Item2 -1.187073e+00 0.1186087 -1.0813481 0.1167024 -0.3186282 -0.76271998
Item3 1.001565e+00 0.1156194 0.9287588 0.1141034 -0.3186282 1.24738700
Item4 8.361141e-01 0.1134093 0.8852199 0.1135280 -0.3186282 1.20384806
Item5 -1.309367e+00 0.1209242 -2.5345451 0.1625745 -0.3186282 -2.21591698
Item6 1.717348e+00 0.1307057 0.5418750 0.1100347 -0.3186282 0.86050314
Item7 2.040499e+00 0.1409710 1.0626916 0.1160698 -0.3186282 1.38131979
Item8 2.933326e-01 0.1090660 -0.4051660 0.1092741 -0.3186282 -0.08653783
Item9 -1.223203e+00 0.1192650 -0.7726385 0.1124324 -0.3186282 -0.45401035
Item10 1.872148e+00 0.1353199 2.1299052 0.1442241 -0.3186282 2.44853338
DIF_correct
Item1 -0.4189579
Item2 -0.4243530
Item3 -0.2458222
Item4 -0.3677340
Item5 0.9065505
Item6 0.8568448
Item7 0.6591791
Item8 0.3798704
Item9 -0.7691922
Item10 -0.5763855
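The DIF_correct column is simply the focal estimate minus the rescaled reference estimate; it can be checked by hand (the bF and new_bR values below are copied from the tables above for Item1 and Item5):

```r
# Check DIF_correct: focal estimate minus rescaled reference estimate.
bF     = c(Item1 = 1.621222e-06, Item5 = -1.309367)
new_bR = c(Item1 = 0.41895953,   Item5 = -2.21591698)
bF - new_bR  # approx. -0.4189579 and 0.9065505, as in DIF_correct
```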