Mohammad Darbalaei
Daniel Hoffmann
Bioinformatics and Computational Biophysics, Faculty of Biology, University of Duisburg-Essen
Tumor heterogeneity is a major challenge in cancer research and precision oncology. In many cancers, including gastrointestinal stromal tumors (GIST), tumors are not uniform but consist of multiple genetically distinct subclones. These subclones can respond differently to treatment, and resistant populations often emerge during therapy due to secondary mutations in the KIT receptor or downstream signaling pathways.
Understanding how different genotypes respond to treatment is therefore essential. However, studying each clone individually is time-consuming and does not capture how clones behave when they grow together and compete within the same environment.
To overcome these limitations, DNA barcoding enables multiplexed experiments in which many clones are pooled and analyzed simultaneously. In this approach, each clone is labeled with a unique DNA barcode. After treatment, high-throughput sequencing is used to count these barcodes, allowing researchers to track how the composition of the population changes over time.
This makes it possible to study many genotypes in parallel within a single experiment, greatly increasing scalability and efficiency.
A key limitation of barcode sequencing data is that it provides only relative abundances. Because these data are compositional, all clone fractions within a sample are constrained to sum to one.
Consequently, relative changes cannot be directly interpreted as absolute changes in clone size. An increase in one clone’s fraction necessarily implies a decrease in others, even if all clones are expanding. Conversely, a clone may appear to decline despite continued growth, simply because competing clones expand more rapidly.
Relative changes alone are therefore insufficient to determine whether a treatment truly suppresses or enhances clonal growth.
To resolve this limitation, measurements of total population size are required.
The appropriate measurement depends on the experimental setting:
These measurements capture how the entire population expands or contracts under treatment and enable interpretation of clonal dynamics on an absolute scale.
The barmixR framework combines both data sources:
By jointly modeling these data in a Bayesian hierarchical framework, barmixR estimates clone-specific treatment effects on an absolute scale while accounting for uncertainty in both measurements.
The central output of the model is quantitative treatment resistance (QTR), which describes how each clone responds to treatment compared to a control condition:
Although barmixR was developed for multiplexed tumor experiments, the underlying approach is broadly applicable. Any experiment that combines:
can benefit from this framework.
Examples include:
To demonstrate the barmixR workflow, we generate a synthetic dataset that captures key features of multiplexed barcoded experiments.
The simulation includes 20 clones, 4 conditions (one control and three treatments), and 5 replicates per condition. For each sample, we simulate:
Counts are generated from a Dirichlet–multinomial distribution, reflecting overdispersion in sequencing data.
Clones are grouped into subsets with different treatment-specific patterns (enriched, neutral, or depleted), introducing structured heterogeneity across conditions.
Total population sizes are simulated using a log-normal distribution, capturing the variability and skewness typical of tumor growth. Treatment conditions differ in their overall growth behavior relative to control.
The simulation reflects the core principle of barmixR: combining relative composition and total population size to define the fractional size for each cell line \(l\) and condition \(k\):
\[ \nu_l^{(k)} = p_l^{(k)} \cdot V^{(k)}, \quad l = 1,\dots,L,\; k = 1,\dots,K \]
where \(l\) indexes cell lines and \(k\) indexes treatment conditions.
library(MGLM)
library(tidyverse)
library(barmixR)
set.seed(1001)
k <- 20
n_reps <- 5
n_reads <- 10000
conditions <- c("DMSO","TreatA","TreatB","TreatC")
generate_profile <- function(pattern,k){
breaks <- floor(seq(0,k,length.out=length(pattern)+1))
prof <- numeric(k)
for(i in seq_along(pattern)){
idx <- (breaks[i]+1):breaks[i+1]
prof[idx] <- pattern[i]
}
prof + runif(k,0,0.1)
}
alpha_profiles <- list(
DMSO = rep(1/k,k),
TreatA = generate_profile(c(5.5,1.1,0.25),k),
TreatB = generate_profile(c(0.2,6,1.0),k),
TreatC = generate_profile(c(1.05,0.5,4),k)
)
all_data <- list()
for(cond in conditions){
alpha <- alpha_profiles[[cond]]
alpha_scaled <- alpha/sum(alpha)*50
mat <- rdirmn(n_reps,n_reads,alpha_scaled)
rownames(mat) <- paste0(cond,"_rep",1:n_reps)
colnames(mat) <- paste0("clone",1:k)
all_data[[cond]] <- mat
}
data_count <- do.call(rbind,all_data)The object data_count contains the simulated barcode sequencing counts for all samples. Each row corresponds to a replicate sample under a specific treatment condition, and each column corresponds to one of the barcoded clones.
The entries in this matrix represent the number of sequencing reads assigned to each barcode in a given sample. Because sequencing measures relative composition, these counts primarily reflect the relative abundance of clones within the pooled population.
## clone1 clone2 clone3 clone4 clone5 clone6 clone7 clone8
## DMSO_rep1 395 1027 344 750 812 312 357 337
## DMSO_rep2 500 397 234 84 1024 1331 681 341
## DMSO_rep3 354 471 342 863 633 395 195 605
## DMSO_rep4 77 148 58 142 839 367 431 173
## DMSO_rep5 953 534 1157 616 174 202 440 494
## TreatA_rep1 470 1056 2673 1060 924 2584 123 193
In this preview:
DMSO_rep1, TreatA_rep2)clone1, clone2, …)These counts form the input to the barcode component of the barmixR model, which estimates clone composition under each treatment condition.
To complement the compositional barcode data, we simulate total population sizes. In an in vivo setting, these correspond to tumor volumes; in an in vitro setting, analogous measurements could be cellular confluency.
get_lognorm_params <- function(mean,sd){
sigma2 <- log(1+(sd/mean)^2)
mu <- log(mean)-sigma2/2
sigma <- sqrt(sigma2)
list(meanlog=mu,sdlog=sigma)
}
params_t0 <- get_lognorm_params(120,75)
params_t14 <- get_lognorm_params(600,500)
sim_vols <- function(params,seed){
set.seed(seed)
rlnorm(n_reps,params$meanlog,params$sdlog)
}
DMSO_t0 <- sim_vols(params_t0,1)
DMSO_t14 <- sim_vols(params_t14,101)
cond_means <- c(0.5,0.8,0.25)*600
cond_params <- lapply(cond_means,get_lognorm_params,sd=500)
cond1_t14 <- sim_vols(cond_params[[1]],102)
cond2_t14 <- sim_vols(cond_params[[2]],103)
cond3_t14 <- sim_vols(cond_params[[3]],104)
V_t0 <- round(c(DMSO_t0,DMSO_t0,DMSO_t0,DMSO_t0),1)
V_t14 <- round(c(DMSO_t14,cond1_t14,cond2_t14,cond3_t14),1)To run the barmixR model, all required information is organized into a structured list. This list includes barcode counts, treatment labels, population-size measurements, and clone identifiers.
condition_count <- factor(
sub("_rep[0-9]+$","",rownames(data_count)),
levels=conditions
)
cell_line <- factor(paste0("clone",1:k))
data <- list(
data_count=data_count,
condition_count=condition_count,
V=V_t14,
VT0=V_t0,
condition_v=condition_count,
cell_line=cell_line
)
time_d <- 14## $data_count
## clone1 clone2 clone3 clone4 clone5 clone6 clone7 clone8 clone9
## DMSO_rep1 395 1027 344 750 812 312 357 337 593
## DMSO_rep2 500 397 234 84 1024 1331 681 341 94
## DMSO_rep3 354 471 342 863 633 395 195 605 262
## DMSO_rep4 77 148 58 142 839 367 431 173 436
## DMSO_rep5 953 534 1157 616 174 202 440 494 631
## TreatA_rep1 470 1056 2673 1060 924 2584 123 193 334
## TreatA_rep2 1124 479 1564 965 794 2577 210 115 61
## TreatA_rep3 1291 1784 1467 860 1722 1193 366 207 125
## TreatA_rep4 1485 932 1281 1293 827 742 545 524 284
## TreatA_rep5 968 1795 919 1337 1109 803 1000 407 62
## TreatB_rep1 31 88 0 3 0 35 1290 711 1396
## TreatB_rep2 70 1 118 0 7 34 1284 860 1105
## TreatB_rep3 2 72 70 0 122 309 1022 2191 618
## TreatB_rep4 0 0 42 0 0 3 1947 1467 384
## TreatB_rep5 35 85 127 4 0 83 1088 1261 1183
## TreatC_rep1 997 114 88 97 190 409 73 43 170
## TreatC_rep2 257 379 399 747 229 651 403 24 23
## TreatC_rep3 1150 368 272 150 595 329 84 63 98
## TreatC_rep4 287 61 391 338 477 251 26 128 314
## TreatC_rep5 534 265 29 362 238 228 231 124 37
## clone10 clone11 clone12 clone13 clone14 clone15 clone16 clone17
## DMSO_rep1 527 401 176 910 279 735 441 355
## DMSO_rep2 434 149 282 461 541 714 336 430
## DMSO_rep3 157 651 267 469 946 505 411 993
## DMSO_rep4 589 1707 755 708 371 492 581 328
## DMSO_rep5 168 676 723 417 201 443 617 565
## TreatA_rep1 174 29 44 166 66 83 0 1
## TreatA_rep2 533 189 709 159 5 3 84 204
## TreatA_rep3 84 43 441 156 51 61 59 0
## TreatA_rep4 137 637 281 628 184 4 16 2
## TreatA_rep5 177 307 19 900 2 13 59 84
## TreatB_rep1 1886 1333 737 1286 158 447 115 0
## TreatB_rep2 1022 2006 1272 1517 28 102 16 126
## TreatB_rep3 1389 936 1218 1032 94 360 28 55
## TreatB_rep4 1227 1273 584 1271 752 112 505 61
## TreatB_rep5 1487 936 1072 579 111 524 515 167
## TreatC_rep1 171 320 103 22 1490 1271 531 524
## TreatC_rep2 65 16 218 119 1008 2022 623 722
## TreatC_rep3 191 33 27 118 1134 816 1380 1453
## TreatC_rep4 60 212 79 473 883 455 692 2496
## TreatC_rep5 10 15 58 321 979 855 704 708
## clone18 clone19 clone20
## DMSO_rep1 428 490 331
## DMSO_rep2 702 316 949
## DMSO_rep3 146 1235 100
## DMSO_rep4 1007 494 297
## DMSO_rep5 520 300 169
## TreatA_rep1 0 14 6
## TreatA_rep2 10 214 1
## TreatA_rep3 13 0 77
## TreatA_rep4 35 66 97
## TreatA_rep5 0 13 26
## TreatB_rep1 212 27 245
## TreatB_rep2 9 17 406
## TreatB_rep3 50 187 245
## TreatB_rep4 64 264 44
## TreatB_rep5 169 36 538
## TreatC_rep1 1174 907 1306
## TreatC_rep2 462 582 1051
## TreatC_rep3 874 214 651
## TreatC_rep4 894 1023 460
## TreatC_rep5 1245 1077 1980
##
## $condition_count
## [1] DMSO DMSO DMSO DMSO DMSO TreatA TreatA TreatA TreatA TreatA
## [11] TreatB TreatB TreatB TreatB TreatB TreatC TreatC TreatC TreatC TreatC
## Levels: DMSO TreatA TreatB TreatC
##
## $V
## [1] 363.8 688.5 282.3 538.6 577.6 190.1 381.4 32.4 1518.9 643.6
## [11] 169.5 348.4 121.7 288.0 67.2 24.9 116.1 119.1 26.3 229.1
##
## $VT0
## [1] 71.0 113.1 63.0 254.3 123.0 71.0 113.1 63.0 254.3 123.0 71.0 113.1
## [13] 63.0 254.3 123.0 71.0 113.1 63.0 254.3 123.0
##
## $condition_v
## [1] DMSO DMSO DMSO DMSO DMSO TreatA TreatA TreatA TreatA TreatA
## [11] TreatB TreatB TreatB TreatB TreatB TreatC TreatC TreatC TreatC TreatC
## Levels: DMSO TreatA TreatB TreatC
##
## $cell_line
## [1] clone1 clone2 clone3 clone4 clone5 clone6 clone7 clone8 clone9
## [10] clone10 clone11 clone12 clone13 clone14 clone15 clone16 clone17 clone18
## [19] clone19 clone20
## 20 Levels: clone1 clone10 clone11 clone12 clone13 clone14 clone15 ... clone9
For demonstration purposes, we use a reduced number of posterior sampling iterations to shorten computation time.
In practical applications, longer sampling runs are recommended to ensure stable and well-mixed posterior estimates.
fit <- barmixRQTR(
data=data,
time_d=time_d,
control=list(
chains=2,
iter_count=500, # reduced number of posterior sampling iterations for demonstration
iter_V=500, # reduced number of posterior sampling iterations for demonstration
cores=2
),
in_vivo=TRUE
)## mpling)
## Chain 2: Iteration: 450 / 500 [ 90%] (Sampling)
## Chain 2: Iteration: 500 / 500 [100%] (Sampling)
## Chain 2:
## Chain 2: Elapsed Time: 61.99 seconds (Warm-up)
## Chain 2: 11.15 seconds (Sampling)
## Chain 2: 73.14 seconds (Total)
## Chain 2:
The model is estimated using Hamiltonian Monte Carlo (HMC) via Stan.
Posterior sampling produces probability distributions for all parameters, including clone compositions, population sizes, and treatment resistance.
We next estimate ratios of relative barcode abundance between treatment groups and the control condition.
sampled_fraction <- ppc_barcode$sampled_fraction
fraction_ratio_results <- fractionRatio(model, sampled_fraction)
fraction_ratio_results$plot_ratio_fractionpopulation_ratio_results <- populationRatio(model, sampled_population)
population_ratio_results$plot_ratio_populationresistance <- QTRresistance(
model,
fraction_ratio_results$li_sam_ratio_relative,
population_ratio_results$li_sam_ratio_V
)
print(resistance$treatment_resistance) The Quantitative Treatment Resistance (QTR) is inferred by integrating information from:
within a joint Bayesian framework. This approach propagates uncertainty from both data sources to obtain posterior distributions of treatment response for each clone–treatment pair.
Each QTR value represents the log-scale change in fractional clone size under treatment relative to control.
When visualized (e.g., as density plots or violin plots), each distribution reflects the posterior uncertainty of treatment resistance for a given genotype–treatment combination. A vertical reference line at zero separates the sensitive and resistant regimes.
The heatmap summarizes clone-specific quantitative treatment resistance (QTR) across all treatments and cell lines, enabling direct comparison of drug effects in a multiplexed setting.
In analogy to the bubble plot representation used in the main analysis, QTR values can be interpreted jointly in terms of direction and magnitude:
This visualization provides a global overview of treatment performance across clones and highlights genotype-specific differences in drug response, facilitating intuitive comparison across all genotype–drug combinations.
One practical goal of multiplexed barcoded experiments is to enable treatment prioritization across genetically distinct tumor subclones. Pooled barcoding approaches were developed to increase throughput, support internally controlled comparisons, and make large-scale genotype-specific drug testing more feasible in preclinical research. In this setting, ranking treatment effects across genotypes can help highlight mutation-specific vulnerabilities while reducing the number of separate follow-up experiments needed for compound selection.
The QTRDecision() function summarizes posterior QTR distributions for each cell line and treatment using median, interquartile range, and whiskers derived from QTRresistance(). This provides a compact ordering of candidate treatments within each genotype and supports comparison of drug performance across heterogeneous cell populations.
cell_order <- paste0("clone", 1:20)
decision <- QTRDecision(
model = model,
summary_table = resistance$summary_table,
cell_order = cell_order,
ncol = 5
# legend_text = "DMSO = control condition"
)
print(decision$rank_plot) In the ranking panel, treatments are ordered within each cell line by their posterior median QTR. Values further to the left indicate lower QTR and therefore greater treatment sensitivity, whereas values further to the right indicate higher QTR and therefore greater resistance. The box shows the interquartile range, the central line marks the median, and the whiskers summarize the lower and upper bounds of the displayed posterior interval.
This type of ranking can support preclinical treatment prioritization by helping to:
In translational and pharmaceutical settings, this kind of summary may be useful as an early prioritization layer for selecting compounds, dose ranges, or combinations for additional validation.
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0 LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] barmixR_0.99.0 lubridate_1.9.5 forcats_1.0.1 stringr_1.6.0
## [5] dplyr_1.2.1 purrr_1.2.2 readr_2.2.0 tidyr_1.3.2
## [9] tibble_3.3.1 ggplot2_4.0.3 tidyverse_2.0.0 MGLM_0.2.1
##
## loaded via a namespace (and not attached):
## [1] sass_0.4.10 generics_0.1.4 stringi_1.8.7
## [4] hms_1.1.4 digest_0.6.39 magrittr_2.0.5
## [7] evaluate_1.0.5 grid_4.6.0 timechange_0.4.0
## [10] RColorBrewer_1.1-3 fastmap_1.2.0 jsonlite_2.0.0
## [13] processx_3.9.0 pkgbuild_1.4.8 ps_1.9.3
## [16] gridExtra_2.3 QuickJSR_1.9.2 scales_1.4.0
## [19] codetools_0.2-20 jquerylib_0.1.4 cli_3.6.6
## [22] rlang_1.2.0 withr_3.0.2 cachem_1.1.0
## [25] yaml_2.3.12 otel_0.2.0 StanHeaders_2.32.10
## [28] tools_4.6.0 rstan_2.32.7 inline_0.3.21
## [31] parallel_4.6.0 tzdb_0.5.0 curl_7.1.0
## [34] vctrs_0.7.3 R6_2.6.1 matrixStats_1.5.0
## [37] stats4_4.6.0 lifecycle_1.0.5 V8_8.2.0
## [40] callr_3.7.6 pkgconfig_2.0.3 RcppParallel_5.1.11-2
## [43] pillar_1.11.1 bslib_0.10.0 gtable_0.3.6
## [46] loo_2.9.0 Rcpp_1.1.1-1.1 glue_1.8.1
## [49] xfun_0.57 tidyselect_1.2.1 knitr_1.51
## [52] dichromat_2.0-0.1 farver_2.1.2 htmltools_0.5.9
## [55] patchwork_1.3.2 labeling_0.4.3 rmarkdown_2.31
## [58] compiler_4.6.0 S7_0.2.2