Can automated item generation be used to develop high quality MCQs that assess application of knowledge?

Research and Practice in Technology Enhanced Learning

Table 5 Wilcoxon two-sample test results (Holm-Bonferroni adjusted) for each of the six quality metrics and the overall cognitive domain judgment item

Item	Wilcoxon median z-statistic (out of 3)	Empirical type I error	Adjusted critical type I threshold
2^b	2.55	.01	.007
Overall cognitive domain judgment	2.02	.04	.008
1^a	1.91	.06	.010
4^d	1.01	.31	.013
6^f	0.92	.36	.017
3^c	0.46	.65	.025
5^e	−0.43	.67	.050

The overall cognitive domain judgment asks whether the item tests factual knowledge only or tests application of knowledge
^aThe central idea is in the stem (i.e., stem is required to answer the item)
^bThe directions in the stem are very clear
^cThere are no obvious cues or item flaws (grammatical cues, conspicuous right answer, etc.)
^dThe length of the choices is about equal
^eAll distractors are plausible
^fThis is a high-quality item
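
The adjusted thresholds in Table 5 follow the pattern of the Holm-Bonferroni step-down procedure: the p-values are sorted from smallest to largest, and the k-th smallest is compared against alpha / (m − k + 1), where m is the number of tests. The following is a minimal sketch of that arithmetic, not the authors' analysis code. It assumes a familywise alpha of .05 (consistent with the final threshold of .050 in the table) and uses the rounded p-values as tabulated; the function name `holm_step_down` and the item labels are hypothetical.

```python
# Rounded p-values copied from Table 5 (the "Empirical type I error" column).
p_values = {
    "Item 2": 0.01,
    "Overall cognitive domain judgment": 0.04,
    "Item 1": 0.06,
    "Item 4": 0.31,
    "Item 6": 0.36,
    "Item 3": 0.65,
    "Item 5": 0.67,
}


def holm_step_down(p_values, alpha=0.05):
    """Return (name, p, threshold, rejected) tuples in ascending p-value order.

    alpha=0.05 is an assumption; the table does not state it explicitly.
    """
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    results = []
    still_rejecting = True
    for rank, (name, p) in enumerate(ordered):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        threshold = alpha / (m - rank)
        # Step-down rule: once one test fails, all later tests are retained too.
        if still_rejecting and p > threshold:
            still_rejecting = False
        results.append((name, p, threshold, still_rejecting))
    return results


if __name__ == "__main__":
    for name, p, threshold, rejected in holm_step_down(p_values):
        print(f"{name}: p = {p:.2f}, threshold = {threshold:.4f}, rejected = {rejected}")
```

With the values as tabulated, the smallest p-value (.01) already exceeds its threshold (.05/7 ≈ .007), so the step-down procedure stops at the first comparison and no item is flagged as significant after adjustment.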