---
title: "Stimulus-Based Assessment in mstATA: Conditional Item Selection"
author: "Hong Chen"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
vignette: >
  %\VignetteIndexEntry{Stimulus-Based Assessment in mstATA: Conditional Item Selection}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 4.5
)
library(knitr)
library(mstATA)
library(highs)
library(ggplot2)
```

## Introduction

Some operational assessments are stimulus-based, where items are grouped under shared reading passages, scenarios, data sets, or graphics (collectively referred to as stimuli). In such designs, item selection is not independent: the selection of an item is meaningful only if its associated stimulus is included.

This vignette demonstrates how `mstATA` supports stimulus-based assembly through explicit conditional selection constraints. The key modeling idea is to enforce item–stimulus linking constraints so that:

- Items can be selected only if their stimulus is selected
- Stimulus-level requirements (counts, categories, quantitative targets) can be imposed
- Stimulus–item-set requirements can be expressed transparently

To keep vignette build times short (CRAN-friendly), solver runs are not executed here. Precomputed results can be loaded and visualized.

## Data Structure for Stimulus-Based Item Pools

A stimulus-based item pool typically contains:

- Unique item identifiers and stimulus identifiers that map each item to one stimulus
- Item-level categorical and quantitative attributes (e.g., IRT parameters, content area, response time)
- Optional stimulus-level categorical and quantitative attributes (e.g., stimulus category, reading level)

In `mstATA`, this mapping can be prepared using:

- `create_pivot_stimulus_map()` to construct item–stimulus relationships
- `get_attribute_val()` to extract item/stimulus categorical and quantitative attribute values as needed

## Modeling Strategy: Conditional Item Selection

### Decision variables

A stimulus-based model typically includes:

- Item selection variables, e.g., $x_{i,m}$ (item $i$ selected in module $m$)
- Stimulus selection variables, e.g., $z_{s,m}$ (stimulus $s$ selected in module $m$)

The item–stimulus conditional inclusion is: **if item $i$ belongs to stimulus $s$, then selecting item $i$ implies selecting stimulus $s$**. In other words, for each module $m$ and each item $i$ associated with stimulus $s$, $x_{i,m} \le z_{s,m}$. This ensures the model never selects an item without its stimulus.

The pivot-item method (van der Linden, 2000) provides a more efficient and elegant way to represent the dependency between items and their associated stimuli. Formally, a pivot item is an item that is selected if and only if its corresponding stimulus is selected. In practice, the pivot item is typically the item within a stimulus that best represents the stimulus, identified by content experts as having the most representative content or desirable psychometric properties. Each stimulus-based set includes exactly one pivot item. The binary decision variable associated with this pivot item therefore serves as a carrier for the selection of both the item and its associated stimulus.
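
As an illustration, the two ways of linking items to stimuli can be written side by side. The notation $V_s$ (the set of items attached to stimulus $s$) and $i_s^{*}$ (the pivot item of stimulus $s$) is introduced here for the sketch only and is not taken from the package documentation. With explicit stimulus variables, the link is

$$
x_{i,m} \le z_{s,m} \qquad \text{for all } i \in V_s,\ \text{all } s,\ \text{all } m .
$$

Under the pivot-item method, the pivot item's variable takes the place of $z_{s,m}$, so the same link becomes

$$
x_{i,m} \le x_{i_s^{*},m} \qquad \text{for all } i \in V_s \setminus \{i_s^{*}\},\ \text{all } s,\ \text{all } m .
$$

In the second form, stimulus $s$ counts as selected in module $m$ exactly when $x_{i_s^{*},m} = 1$.
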
Consequently, the pivot-item formulation allows `mstATA` to accommodate item pools consisting solely of discrete items, solely of stimulus-based items, or a mixture of both. In all cases, the same set of decision variables is used to represent item–module selection, ensuring a unified modeling structure regardless of item type.

### Logical constraints and linking

Keeping item and stimulus selection consistent requires logical constraints that govern item–stimulus conditional selection. van der Linden (2005, pp. 165-170) identifies three classes of such constraints: (1) limits on the number of items associated with a selected stimulus, (2) category-specific item limits within a stimulus, and (3) bounds on the sum of item-level quantitative attributes associated with a stimulus.

First, limits on the number of items associated with a selected stimulus allow two selection regimes: all-in/all-out selection and partial selection. Under all-in/all-out selection, selecting a stimulus implies that all associated items are selected, whereas under partial selection, only a subset of associated items may be selected. In `mstATA`, when only a minimum number of items associated with a selected stimulus is specified, an upper bound equal to the total number of items associated with that stimulus is automatically added. This upper bound provides a safe gating mechanism that preserves consistency between item and stimulus selection.

Second, category-specific item limits within a stimulus restrict the number of items from a particular category that may be selected within a selected stimulus. Similarly, in `mstATA`, if only a minimum category-specific requirement is specified, an upper bound equal to the total number of items in that category associated with the stimulus is automatically added to ensure consistency.
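
A minimal way to write the first two constraint classes under the pivot-item method is sketched below. The symbols are assumptions made for illustration: $V_s$ is the set of items attached to stimulus $s$, $V_s^{c} \subseteq V_s$ is the subset in category $c$, $i_s^{*}$ is the pivot item, and $n^{\min}$, $n^{\max}$ are the user-specified limits. The sketch counts the pivot item in the sums; the exact rows that `mstATA` generates are not shown in this vignette.

$$
n_s^{\min}\, x_{i_s^{*},m} \;\le\; \sum_{i \in V_s} x_{i,m} \;\le\; n_s^{\max}\, x_{i_s^{*},m}
$$

$$
n_{s,c}^{\min}\, x_{i_s^{*},m} \;\le\; \sum_{i \in V_s^{c}} x_{i,m} \;\le\; n_{s,c}^{\max}\, x_{i_s^{*},m}
$$

When $x_{i_s^{*},m} = 0$, the right-hand sides force each sum to zero, so none of the counted items can enter module $m$ without the stimulus; this is the gating role of the upper bound. All-in/all-out selection corresponds to $n_s^{\min} = n_s^{\max} = |V_s|$. If only a minimum is supplied, setting $n_s^{\max} = |V_s|$ (or $n_{s,c}^{\max} = |V_s^{c}|$) keeps the gate in place without restricting the selection any further.
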

Third, bounds on the sum of item-level quantitative attributes associated with a stimulus impose lower and/or upper limits on aggregated item attributes, such as total difficulty values or word counts. However, even when both lower and upper bounds are imposed, consistency between item and stimulus selection is not guaranteed unless all item-level quantitative attributes are strictly positive. To prevent users from inadvertently applying this specification without verifying the strict positivity assumption, `mstATA` detects whether this constraint type is the sole logical constraint governing item–stimulus conditional selection, that is, whether neither limits on the number of associated items nor category-specific limits are specified. In such cases, `mstATA` automatically applies a partial-selection rule, setting the minimum to the smallest number of items linked to any stimulus and the maximum to the largest, to ensure consistent and feasible item selection.

## Functions

Stimulus-related constraints:

**Stimulus-level**

- `stimcat_con()`: constrain whether a stimulus must or must not be selected.
- `stimquant_con()`: constrain a quantitative attribute of a stimulus for it to be selected.

**Itemset-level** (items linked to the same stimulus)

- `stim_itemcount_con()`: constrain the min/exact/max number of items selected conditional on the selection of a stimulus.
- `stim_itemcat_con()`: constrain the min/exact/max number of items selected from category c conditional on the selection of a stimulus.
- `stim_itemquant_con()`: constrain the min/exact/max values for the sum of item quantitative attribute values within a selected stimulus.

**Module-/Pathway-level**

- `test_stimcount_con()`: constrain the min/equal/max number of stimuli in a module or a pathway.
- `test_stimcat_con()`: constrain the min/equal/max number of stimuli from specific categories in a module or pathway.
- `test_stimquant_con()`: constrain the min/equal/max for the sum of the stimulus quantitative attribute in a module or pathway.

**Panel-level**

- `panel_stimcat_con()`: constrain the min/equal/max number of stimuli from specific categories within a panel.

**Solution-level**

- `solution_stimcount_con()`: constrain the min/equal/max number of unique stimuli across multiple panels.
- `solution_stimcat_con()`: constrain the min/equal/max number of unique stimuli from specific categories across multiple panels.

## Worked Example

### Item pool data

A simulated item pool is used for analysis and can be accessed via `data("reading_itempool")`. The pool contains 500 items nested within 64 passages, comprising 407 multiple-choice (MC) items and 93 technology-enhanced items (TEIs). The distribution of items across the four content domains is 120, 128, 135, and 117 items, respectively. Of the 64 passages, 36 belong to the history domain and 28 to social studies. The pool includes two enemy-item sets and one enemy-stimulus set.

Summary descriptive statistics for item- and stimulus-level quantitative attributes are presented below.

```{r,echo=FALSE}
tab <- data.frame(
  Attribute = c("Discrimination", "Difficulty", "Guessing",
                "Response Time", "Word Counts"),
  Level = c("Item", "Item", "Item", "Item", "Stimulus"),
  Mean = c(0.92, -0.01, 0.10, 120.11, 123.47),
  SD = c(0.19, 0.97, 0.04, 35.01, 35.01),
  Min = c(0.51, -3.24, 0.01, 60, 52),
  Max = c(1.59, 2.14, 0.26, 180, 199)
)
kable(tab,
      caption = "Descriptive statistics for item and stimulus quantitative attributes.",
      col.names = c("Attribute", "Level", "Mean", "SD", "Min", "Max"),
      digits = 2,
      align = c("l", "l", "r", "r", "r", "r"))
```

### Specifications

An MST panel with a 1-2-3 design is assembled using the pivot-item method to jointly select items and stimuli. Within each stimulus, the pivot item is defined as the item with the highest discrimination parameter.
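
The pivot rule just described (highest discrimination within each stimulus) can be sketched in base R on a toy pool. The data frame `toy_pool` and the logical column `is_pivot` are hypothetical names made up for this illustration; the format of the `pivot_item` column that `reading_itempool` actually uses is not shown in this vignette and may differ.

```r
# Toy pool: six items nested in two stimuli (hypothetical values).
toy_pool <- data.frame(
  item_id = paste0("item", 1:6),
  stimulus = rep(c("stim1", "stim2"), each = 3),
  discrimination = c(0.80, 1.20, 0.95, 1.10, 0.70, 1.05),
  stringsAsFactors = FALSE
)

# For each stimulus, find the row with the largest discrimination.
# which.max() returns the first maximum, so ties go to the earlier row.
pivot_rows <- tapply(
  seq_len(nrow(toy_pool)), toy_pool$stimulus,
  function(rows) rows[which.max(toy_pool$discrimination[rows])]
)

# Flag exactly one pivot item per stimulus.
toy_pool$is_pivot <- seq_len(nrow(toy_pool)) %in% as.integer(pivot_rows)
print(toy_pool[toy_pool$is_pivot, ])
```

In an operational pool, content experts may override this purely statistical choice, as noted in the modeling section.
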

Specifications include:

- 3 stages, 6 modules, 4 pathways (S1R-S2E-S3H and S1R-S2H-S3E are not allowed).
- Each module contains 12 items, so every pathway administers 12 items at each of the three stages.
- Routing decision points: 0 for routing into stage 2; -0.43 and 0.43 for routing into stage 3.
- Unique items are used across modules in the panel.
- Min and max number of items in content 1-4 per pathway: 7-11 for each content.
- Exact number of TEI items per module: 2.
- Average response time per module: 110-130 seconds per item.
- Number of passages and passage types: one history passage and one social studies passage in each module.
- Items in the same enemy item set cannot appear in the same pathway.
- Stimuli in the same enemy stimulus set cannot appear in the same pathway.
- If a stimulus is selected in a module, at least 4 and at most 8 items from that stimulus are selected.
- A selected stimulus must have at least 90 and at most 150 words.
- Maximize TIF values at $\theta = c(-1.39, -0.97, -0.68)$ for the S1R-S2E-S3E pathway, $\theta = c(-0.43, -0.21, 0)$ for the S1R-S2E-S3M pathway, $\theta = c(0, 0.21, 0.43)$ for the S1R-S2H-S3M pathway, and $\theta = c(0.68, 0.97, 1.39)$ for the S1R-S2H-S3H pathway.

For each pathway, the middle target theta point is prioritized over the other target theta points. This priority is operationalized by specifying that the TIF value at the middle theta point must be 1.5 times the TIF values at the remaining target theta points.
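
The 1 : 1.5 : 1 weighting can be made concrete with a small base R sketch. It rests on an assumption: that each proportion multiplies one common base level to give the required information at that theta point. This reading is inferred from the lower bounds reported in the pathway-level information table in Step 7 of this vignette (8.941002 at the outer points, 13.4115 at the middle points); the internals of `capped_maximin_obj()` are not shown here. The names `targets` and `base_level` are hypothetical.

```r
# Target theta points and pathway labels, copied from the specification list.
targets <- data.frame(
  pathway = rep(c("S1R-S2E-S3E", "S1R-S2E-S3M", "S1R-S2H-S3M", "S1R-S2H-S3H"),
                each = 3),
  theta = c(-1.39, -0.97, -0.68, -0.43, -0.21, 0,
            0, 0.21, 0.43, 0.68, 0.97, 1.39),
  stringsAsFactors = FALSE
)

# Relative weights: the middle point of each pathway gets 1.5, the others 1.
targets$proportion <- rep(c(1, 1.5, 1), times = 4)

# Assumed common base level, taken from the lower bound reported for the
# outer theta points in the Step 7 table.
base_level <- 8.941002
targets$lower_bound <- targets$proportion * base_level

print(targets)
```

Under this reading, the middle-point requirement is 1.5 x 8.941002 = 13.411503, which agrees with the reported 13.4115 up to rounding.
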

### Code

#### Step 1: Prepare the Item Pool

```{r}
data("reading_itempool")

# Target theta points for the four allowed pathways
REE <- c(-1.39, -0.97, -0.68)
REM <- c(-0.43, -0.21, 0)
RHM <- c(0, 0.21, 0.43)
RHH <- c(0.68, 0.97, 1.39)
theta_values <- unique(c(REE, REM, RHM, RHH))

# Item information at each target theta point, stored as new pool columns
item_par_cols <- list("3PL" = c("discrimination", "difficulty", "guessing"))
theta_iif <- compute_iif(reading_itempool,
                         item_par_cols = item_par_cols,
                         theta = theta_values, model_col = "model", D = 1.7)
reading_itempool[, paste0("iif(theta=", theta_values, ")")] <- theta_iif

# Enemy sets and the pivot-item/stimulus map
enemyitem_set <- create_enemy_sets(reading_itempool$item_id,
                                   reading_itempool$enemy_item)
enemystim_set <- create_enemy_sets(reading_itempool$stimulus,
                                   reading_itempool$enemy_stimulus)
pivot_stim_map <- create_pivot_stimulus_map(reading_itempool,
                                            item_id_col = "item_id",
                                            stimulus = "stimulus",
                                            pivot_item = "pivot_item")
```

#### Step 2: Specify the MST Structure

```{r}
mst_123 <- mst_design(itempool = reading_itempool, item_id_col = "item_id",
                      design = "1-2-3", rdps = list(c(0), c(-0.43, 0.43)),
                      exclude_pathways = c("1-1-3", "1-2-1"),
                      module_length = c(12, 12, 12, 12, 12, 12),
                      enemyitem_set = enemyitem_set,
                      enemystim_set = enemystim_set,
                      pivot_stim_map = pivot_stim_map)
```

#### Step 3: Identify hierarchical requirements

- MST structure: MST 1-2-3 (S1R-S2E-S3H and S1R-S2H-S3E are not allowed). Each module contains 12 items. Routing decision points: 0 for routing into stage 2; -0.43 and 0.43 for routing into stage 3.
- panel-level item reuse: unique items are used across modules in the panel.
- pathway-level: no enemy item pairs, no enemy stimulus pairs, 7-11 items for each content.
- module-level: 2 TEI items per module; one history passage and one social studies passage in each module; module mean response time of 110-130 seconds per item.
- itemset-level: if a stimulus is selected in a module, at least 4 and at most 8 items from that stimulus are selected.
- stim-level: a selected stimulus must have at least 90 and at most 150 words.
- objective: maximize TIF values at $\theta = c(-1.39, -0.97, -0.68)$ for S1R-S2E-S3E pathway, $\theta = c(-0.43,-0.21, 0)$ for S1R-S2E-S3M pathway, $\theta = c(0, 0.21, 0.43)$ for S1R-S2H-S3M pathway, $\theta = c(0.68, 0.97, 1.39)$ for S1R-S2H-S3H pathway.

#### Step 4: Translate specifications

```{r}
mst_structure <- mst_structure_con(x = mst_123, info_tol = 0.1)
mst_noreuse <- panel_itemreuse_con(x = mst_123, overlap = FALSE)
mst_noenemyitem <- enemyitem_exclu_con(x = mst_123)
mst_noenemystim <- enemystim_exclu_con(x = mst_123)
mst_content <- test_itemcat_range_con(x = mst_123, attribute = "content",
                                      cat_levels = paste0("content", 1:4),
                                      min = 7, max = 11,
                                      which_pathway = 1:4)
mst_tei <- test_itemcat_con(x = mst_123, attribute = "itemtype",
                            cat_levels = "TEI",
                            operator = "=", target_num = 2,
                            which_module = 1:6)
mst_passtype <- test_stimcat_con(x = mst_123, attribute = "stimulus_type",
                                 cat_levels = c("history", "social studies"),
                                 operator = "=", target_num = 1,
                                 which_module = 1:6)
mst_time <- test_itemquant_range_con(x = mst_123, attribute = "time",
                                     min = 110 * 12, max = 130 * 12,
                                     which_module = 1:6)
mst_stimitem <- stim_itemcount_con(x = mst_123, min = 4, max = 8,
                                   which_module = 1:6)
mst_stimquant <- stimquant_con(x = mst_123, attribute = "stimulus_words",
                               min = 90, max = 150,
                               which_module = 1:6)

obj1 <- objective_term(x = mst_123, attribute = "iif(theta=-1.39)",
                       applied_level = "Pathway-level", which_pathway = 1, sense = "max")
obj2 <- objective_term(x = mst_123, attribute = "iif(theta=-0.97)",
                       applied_level = "Pathway-level", which_pathway = 1, sense = "max")
obj3 <- objective_term(x = mst_123, attribute = "iif(theta=-0.68)",
                       applied_level = "Pathway-level", which_pathway = 1, sense = "max")
obj4 <- objective_term(x = mst_123, attribute = "iif(theta=-0.43)",
                       applied_level = "Pathway-level", which_pathway = 2, sense = "max")
obj5 <- objective_term(x = mst_123, attribute = "iif(theta=-0.21)",
                       applied_level = "Pathway-level", which_pathway = 2, sense = "max")
obj6 <- objective_term(x = mst_123, attribute = "iif(theta=0)",
                       applied_level = "Pathway-level", which_pathway = 2, sense = "max")
obj7 <- objective_term(x = mst_123, attribute = "iif(theta=0)",
                       applied_level = "Pathway-level", which_pathway = 3, sense = "max")
obj8 <- objective_term(x = mst_123, attribute = "iif(theta=0.21)",
                       applied_level = "Pathway-level", which_pathway = 3, sense = "max")
obj9 <- objective_term(x = mst_123, attribute = "iif(theta=0.43)",
                       applied_level = "Pathway-level", which_pathway = 3, sense = "max")
obj10 <- objective_term(x = mst_123, attribute = "iif(theta=0.68)",
                        applied_level = "Pathway-level", which_pathway = 4, sense = "max")
obj11 <- objective_term(x = mst_123, attribute = "iif(theta=0.97)",
                        applied_level = "Pathway-level", which_pathway = 4, sense = "max")
obj12 <- objective_term(x = mst_123, attribute = "iif(theta=1.39)",
                        applied_level = "Pathway-level", which_pathway = 4, sense = "max")

# Middle theta point of each pathway is weighted 1.5, the outer points 1
mst_obj <- capped_maximin_obj(x = mst_123,
                              multiple_terms = list(obj1, obj2, obj3,
                                                    obj4, obj5, obj6,
                                                    obj7, obj8, obj9,
                                                    obj10, obj11, obj12),
                              strategy_args = list(proportions = rep(c(1, 1.5, 1), 4)))

mst_model <- onepanel_spec(x = mst_123,
                           constraints = list(mst_structure, mst_noreuse,
                                              mst_content, mst_noenemyitem, mst_noenemystim,
                                              mst_tei, mst_passtype, mst_time,
                                              mst_stimitem,
                                              mst_stimquant),
                           objective = mst_obj)
```

#### Step 5: Execute assembly via solver

The model contains 2,146 linear constraints. Using HiGHS as the solver, an optimal solution is obtained within 2 minutes.

```{r, eval=FALSE}
# Not executed in the vignette to avoid long build times.
mst_result <- solve_model(model_spec = mst_model, solver = "HiGHS", time_limit = 5 * 60)
reading_panel <- assembled_panel(x = mst_123, result = mst_result)
```

#### Step 6: Diagnose infeasible model

An optimal solution was found, so this step is skipped.

#### Step 7: Evaluate panel

The assembled panel is saved as `data("reading_panel")`.
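
As a quick cross-check, the module composition reported in this step can be tallied against the specifications. The sketch below is illustrative only: the data frame `module_summary` is a hypothetical hand transcription of the module listing in this step, not an object returned by `mstATA`.

```r
# Hand-transcribed module composition (two stimuli per module).
module_summary <- data.frame(
  module = rep(c("S1R", "S2E", "S2H", "S3E", "S3M", "S3H"), each = 2),
  stimulus = c(12, 51, 1, 8, 38, 53, 37, 48, 4, 45, 30, 36),
  words = c(139, 132, 117, 101, 98, 116, 96, 122, 124, 124, 99, 148),
  type = c("social studies", "history", "history", "social studies",
           "history", "social studies", "social studies", "history",
           "social studies", "history", "social studies", "history"),
  n_mc = c(7, 3, 7, 3, 7, 3, 5, 5, 3, 7, 4, 6),
  n_tei = c(0, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)
module_summary$n_items <- module_summary$n_mc + module_summary$n_tei

# Aggregate to the module level.
items_per_module <- tapply(module_summary$n_items, module_summary$module, sum)
tei_per_module <- tapply(module_summary$n_tei, module_summary$module, sum)
types_per_module <- tapply(module_summary$type, module_summary$module,
                           function(t) length(unique(t)))

# One logical flag per specification.
checks <- c(
  twelve_items_per_module = all(items_per_module == 12),
  two_tei_per_module = all(tei_per_module == 2),
  both_passage_types = all(types_per_module == 2),
  items_per_stimulus_4_to_8 = all(module_summary$n_items >= 4 &
                                    module_summary$n_items <= 8),
  words_90_to_150 = all(module_summary$words >= 90 &
                          module_summary$words <= 150)
)
print(checks)
```

A tally like this only covers the module-level and stimulus-level requirements; pathway-level requirements such as content ranges and enemy sets are checked with the package's reporting functions.
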

- S1R contains stim 12 (139 words, social studies, 7 MC items) and stim 51 (132 words, history, 3 MC items and 2 TEI items).
- S2E contains stim 1 (117 words, history, 7 MC items and 1 TEI item) and stim 8 (101 words, social studies, 3 MC items and 1 TEI item).
- S2H contains stim 38 (98 words, history, 7 MC items and 1 TEI item) and stim 53 (116 words, social studies, 3 MC items and 1 TEI item).
- S3E contains stim 37 (96 words, social studies, 5 MC items and 1 TEI item) and stim 48 (122 words, history, 5 MC items and 1 TEI item).
- S3M contains stim 4 (124 words, social studies, 3 MC items and 1 TEI item) and stim 45 (124 words, history, 7 MC items and 1 TEI item).
- S3H contains stim 30 (99 words, social studies, 4 MC items and 1 TEI item) and stim 36 (148 words, history, 6 MC items and 1 TEI item).

No enemy item pair or enemy stimulus pair appears together in a pathway.

```{r,echo=FALSE}
data("reading_panel")

# RDP information check
RDP_check <- rbind(
  report_test_tif(assembled_panel = reading_panel, theta = 0,
                  item_par_cols = item_par_cols,
                  model_col = "model", D = 1.7,
                  which_module = 2:3),
  report_test_tif(assembled_panel = reading_panel, theta = -0.43,
                  item_par_cols = item_par_cols,
                  model_col = "model", D = 1.7,
                  which_module = 4:5),
  report_test_tif(assembled_panel = reading_panel, theta = 0.43,
                  item_par_cols = item_par_cols,
                  model_col = "model", D = 1.7,
                  which_module = 5:6)
)
kable(RDP_check,
      caption = "Routing decision points information check",
      digits = 2,
      align = c("l", "r", "r", "r"))

Content_check <- report_test_itemcat(assembled_panel = reading_panel,
                                     attribute = "content",
                                     cat_levels = paste0("content", 1:4),
                                     which_pathway = 1:4)
kable(Content_check,
      caption = "Number of items per content check")

time_check <- report_test_itemquant(assembled_panel = reading_panel,
                                    attribute = "time",
                                    statistic = "average",
                                    which_module = 1:6)
kable(time_check,
      caption = "Average response time per item check")

pathway_tifcheck <- data.frame(
  theta = c(-1.39, -0.97, -0.68, -0.43, -0.21, 0, 0,
            0.21, 0.43, 0.68, 0.97, 1.39),
  pathway_id = c("M-E-E", "M-E-E", "M-E-E", "M-E-M", "M-E-M", "M-E-M",
                 "M-H-M", "M-H-M", "M-H-M", "M-H-H", "M-H-H", "M-H-H"),
  must_greater_than = c(8.941002, 13.4115, 8.941002, 8.941002, 13.4115, 8.941002,
                        8.941002, 13.4115, 8.941002, 8.941002, 13.4115, 8.941002),
  realized_information = c(11.64047, 13.4115, 13.72838, 13.24364, 13.48899, 13.29367,
                           13.25245, 13.4666, 13.35719, 13.72612, 13.47349, 11.21304),
  must_lower_than = c(13.72838, 18.19888, 13.72838, 13.72838, 18.19888, 13.72838,
                      13.72838, 18.19888, 13.72838, 13.72838, 18.19888, 13.72838)
)
kable(pathway_tifcheck,
      caption = "Pathway-level information requirements and realized information at selected ability levels.",
      digits = 3,
      align = c("r", "l", "r", "r", "r"))

plot_panel_tif(assembled_panel = reading_panel, item_par_cols = item_par_cols,
               model_col = "model", D = 1.7, theta = seq(-3, 3, 0.1), unit = "pathway")
```

## Summary

`mstATA` supports stimulus-based assessment by translating stimulus and stimulus–item specifications into explicit linear constraints.

The main advantages are:

- Transparent conditional selection logic (items selected only when stimuli are selected)
- Natural support for stimulus-level and within-stimulus constraints

**Note**: Another stimulus-based MST panel assembly can be found in the author's PhD dissertation, which will be available through ProQuest in May 2026.

**Reference**

van der Linden, W. J. (2000). Optimal assembly of tests with item sets. *Applied Psychological Measurement, 24*(3), 225–240. https://doi.org/10.1177/01466210022031697

van der Linden, W. J. (2005). *Linear models for optimal test design.* Springer.