barmixR: Bayesian Modeling of Barcoded Tumor Mixtures for Quantitative Treatment Resistance Analysis

Mohammad Darbalaei
Daniel Hoffmann
Bioinformatics and Computational Biophysics, Faculty of Biology, University of Duisburg-Essen

Overview

Tumor heterogeneity is a major challenge in cancer research and precision oncology. In many cancers, including gastrointestinal stromal tumors (GIST), tumors are not uniform but consist of multiple genetically distinct subclones. These subclones can respond differently to treatment, and resistant populations often emerge during therapy due to secondary mutations in the KIT receptor or downstream signaling pathways.

Understanding how different genotypes respond to treatment is therefore essential. However, studying each clone individually is time-consuming and does not capture how clones behave when they grow together and compete within the same environment.

Multiplexed experiments with DNA barcoding

To overcome these limitations, DNA barcoding enables multiplexed experiments in which many clones are pooled and analyzed simultaneously. In this approach, each clone is labeled with a unique DNA barcode. After treatment, high-throughput sequencing is used to count these barcodes, allowing researchers to track how the composition of the population changes over time.

This makes it possible to study many genotypes in parallel within a single experiment, greatly increasing scalability and efficiency.

The compositional data challenge

A key limitation of barcode sequencing data is that it provides only relative abundances. Because these data are compositional, all clone fractions within a sample are constrained to sum to one.

Consequently, relative changes cannot be directly interpreted as absolute changes in clone size. An increase in one clone’s fraction necessarily implies a decrease in others, even if all clones are expanding. Conversely, a clone may appear to decline despite continued growth, simply because competing clones expand more rapidly.

Relative changes alone are therefore insufficient to determine whether a treatment truly suppresses or enhances clonal growth.

Integrating population size information

To resolve this limitation, measurements of total population size are required.
The appropriate measurement depends on the experimental setting:

These measurements capture how the entire population expands or contracts under treatment and enable interpretation of clonal dynamics on an absolute scale.

The barmixR framework

The barmixR framework combines both data sources:

By jointly modeling these data in a Bayesian hierarchical framework, barmixR estimates clone-specific treatment effects on an absolute scale while accounting for uncertainty in both measurements.

The central output of the model is quantitative treatment resistance (QTR), which describes how each clone responds to treatment compared to a control condition:

Scope and applications

Although barmixR was developed for multiplexed tumor experiments, the underlying approach is broadly applicable. Any experiment that combines:

can benefit from this framework.

Examples include:

Simulation of Barcode Counts and Population Size

To demonstrate the barmixR workflow, we generate a synthetic dataset that captures key features of multiplexed barcoded experiments.

The simulation includes 20 clones, 4 conditions (one control and three treatments), and 5 replicates per condition. For each sample, we simulate:

Barcode counts

Counts are generated from a Dirichlet–multinomial distribution, reflecting overdispersion in sequencing data.
Clones are grouped into subsets with different treatment-specific patterns (enriched, neutral, or depleted), introducing structured heterogeneity across conditions.

Population size

Total population sizes are simulated using a log-normal distribution, capturing the variability and skewness typical of tumor growth. Treatment conditions differ in their overall growth behavior relative to control.

Integration

The simulation reflects the core principle of barmixR: combining relative composition and total population size to define the fractional size for each cell line \(l\) and condition \(k\):

\[ \nu_l^{(k)} = p_l^{(k)} \cdot V^{(k)}, \quad l = 1,\dots,L,\; k = 1,\dots,K \]

where \(l\) indexes cell lines and \(k\) indexes treatment conditions.

library(MGLM)
library(tidyverse)
library(barmixR)

set.seed(1001)

k <- 20
n_reps <- 5
n_reads <- 10000

conditions <- c("DMSO","TreatA","TreatB","TreatC")

generate_profile <- function(pattern,k){
  breaks <- floor(seq(0,k,length.out=length(pattern)+1))
  prof <- numeric(k)
  for(i in seq_along(pattern)){
    idx <- (breaks[i]+1):breaks[i+1]
    prof[idx] <- pattern[i]
  }
  prof + runif(k,0,0.1)
}

alpha_profiles <- list(
  DMSO   = rep(1/k,k),
  TreatA = generate_profile(c(5.5,1.1,0.25),k),
  TreatB = generate_profile(c(0.2,6,1.0),k),
  TreatC = generate_profile(c(1.05,0.5,4),k)
)

all_data <- list()

for(cond in conditions){

  alpha <- alpha_profiles[[cond]]
  alpha_scaled <- alpha/sum(alpha)*50

  mat <- rdirmn(n_reps,n_reads,alpha_scaled)

  rownames(mat) <- paste0(cond,"_rep",1:n_reps)
  colnames(mat) <- paste0("clone",1:k)

  all_data[[cond]] <- mat
}

data_count <- do.call(rbind,all_data)

Structure of the Simulated Barcode Data

The object data_count contains the simulated barcode sequencing counts for all samples. Each row corresponds to a replicate sample under a specific treatment condition, and each column corresponds to one of the barcoded clones.

The entries in this matrix represent the number of sequencing reads assigned to each barcode in a given sample. Because sequencing measures relative composition, these counts primarily reflect the relative abundance of clones within the pooled population.

print(data_count[1:6,1:8])
##             clone1 clone2 clone3 clone4 clone5 clone6 clone7 clone8
## DMSO_rep1      395   1027    344    750    812    312    357    337
## DMSO_rep2      500    397    234     84   1024   1331    681    341
## DMSO_rep3      354    471    342    863    633    395    195    605
## DMSO_rep4       77    148     58    142    839    367    431    173
## DMSO_rep5      953    534   1157    616    174    202    440    494
## TreatA_rep1    470   1056   2673   1060    924   2584    123    193

In this preview:

These counts form the input to the barcode component of the barmixR model, which estimates clone composition under each treatment condition.

Population Size Simulation

To complement the compositional barcode data, we simulate total population sizes. In an in vivo setting, these correspond to tumor volumes; in an in vitro setting, analogous measurements could be cellular confluency.

get_lognorm_params <- function(mean,sd){

  sigma2 <- log(1+(sd/mean)^2)
  mu <- log(mean)-sigma2/2
  sigma <- sqrt(sigma2)

  list(meanlog=mu,sdlog=sigma)
}

params_t0 <- get_lognorm_params(120,75)
params_t14 <- get_lognorm_params(600,500)

sim_vols <- function(params,seed){
  set.seed(seed)
  rlnorm(n_reps,params$meanlog,params$sdlog)
}

DMSO_t0 <- sim_vols(params_t0,1)
DMSO_t14 <- sim_vols(params_t14,101)

cond_means <- c(0.5,0.8,0.25)*600
cond_params <- lapply(cond_means,get_lognorm_params,sd=500)

cond1_t14 <- sim_vols(cond_params[[1]],102)
cond2_t14 <- sim_vols(cond_params[[2]],103)
cond3_t14 <- sim_vols(cond_params[[3]],104)

V_t0 <- round(c(DMSO_t0,DMSO_t0,DMSO_t0,DMSO_t0),1)
V_t14 <- round(c(DMSO_t14,cond1_t14,cond2_t14,cond3_t14),1)

Constructing the barmixR Input Object

To run the barmixR model, all required information is organized into a structured list. This list includes barcode counts, treatment labels, population-size measurements, and clone identifiers.

condition_count <- factor(
  sub("_rep[0-9]+$","",rownames(data_count)),
  levels=conditions
)

cell_line <- factor(paste0("clone",1:k))

data <- list(
  data_count=data_count,
  condition_count=condition_count,
  V=V_t14,
  VT0=V_t0,
  condition_v=condition_count,
  cell_line=cell_line
)

time_d <- 14
print(data)
## $data_count
##             clone1 clone2 clone3 clone4 clone5 clone6 clone7 clone8 clone9
## DMSO_rep1      395   1027    344    750    812    312    357    337    593
## DMSO_rep2      500    397    234     84   1024   1331    681    341     94
## DMSO_rep3      354    471    342    863    633    395    195    605    262
## DMSO_rep4       77    148     58    142    839    367    431    173    436
## DMSO_rep5      953    534   1157    616    174    202    440    494    631
## TreatA_rep1    470   1056   2673   1060    924   2584    123    193    334
## TreatA_rep2   1124    479   1564    965    794   2577    210    115     61
## TreatA_rep3   1291   1784   1467    860   1722   1193    366    207    125
## TreatA_rep4   1485    932   1281   1293    827    742    545    524    284
## TreatA_rep5    968   1795    919   1337   1109    803   1000    407     62
## TreatB_rep1     31     88      0      3      0     35   1290    711   1396
## TreatB_rep2     70      1    118      0      7     34   1284    860   1105
## TreatB_rep3      2     72     70      0    122    309   1022   2191    618
## TreatB_rep4      0      0     42      0      0      3   1947   1467    384
## TreatB_rep5     35     85    127      4      0     83   1088   1261   1183
## TreatC_rep1    997    114     88     97    190    409     73     43    170
## TreatC_rep2    257    379    399    747    229    651    403     24     23
## TreatC_rep3   1150    368    272    150    595    329     84     63     98
## TreatC_rep4    287     61    391    338    477    251     26    128    314
## TreatC_rep5    534    265     29    362    238    228    231    124     37
##             clone10 clone11 clone12 clone13 clone14 clone15 clone16 clone17
## DMSO_rep1       527     401     176     910     279     735     441     355
## DMSO_rep2       434     149     282     461     541     714     336     430
## DMSO_rep3       157     651     267     469     946     505     411     993
## DMSO_rep4       589    1707     755     708     371     492     581     328
## DMSO_rep5       168     676     723     417     201     443     617     565
## TreatA_rep1     174      29      44     166      66      83       0       1
## TreatA_rep2     533     189     709     159       5       3      84     204
## TreatA_rep3      84      43     441     156      51      61      59       0
## TreatA_rep4     137     637     281     628     184       4      16       2
## TreatA_rep5     177     307      19     900       2      13      59      84
## TreatB_rep1    1886    1333     737    1286     158     447     115       0
## TreatB_rep2    1022    2006    1272    1517      28     102      16     126
## TreatB_rep3    1389     936    1218    1032      94     360      28      55
## TreatB_rep4    1227    1273     584    1271     752     112     505      61
## TreatB_rep5    1487     936    1072     579     111     524     515     167
## TreatC_rep1     171     320     103      22    1490    1271     531     524
## TreatC_rep2      65      16     218     119    1008    2022     623     722
## TreatC_rep3     191      33      27     118    1134     816    1380    1453
## TreatC_rep4      60     212      79     473     883     455     692    2496
## TreatC_rep5      10      15      58     321     979     855     704     708
##             clone18 clone19 clone20
## DMSO_rep1       428     490     331
## DMSO_rep2       702     316     949
## DMSO_rep3       146    1235     100
## DMSO_rep4      1007     494     297
## DMSO_rep5       520     300     169
## TreatA_rep1       0      14       6
## TreatA_rep2      10     214       1
## TreatA_rep3      13       0      77
## TreatA_rep4      35      66      97
## TreatA_rep5       0      13      26
## TreatB_rep1     212      27     245
## TreatB_rep2       9      17     406
## TreatB_rep3      50     187     245
## TreatB_rep4      64     264      44
## TreatB_rep5     169      36     538
## TreatC_rep1    1174     907    1306
## TreatC_rep2     462     582    1051
## TreatC_rep3     874     214     651
## TreatC_rep4     894    1023     460
## TreatC_rep5    1245    1077    1980
## 
## $condition_count
##  [1] DMSO   DMSO   DMSO   DMSO   DMSO   TreatA TreatA TreatA TreatA TreatA
## [11] TreatB TreatB TreatB TreatB TreatB TreatC TreatC TreatC TreatC TreatC
## Levels: DMSO TreatA TreatB TreatC
## 
## $V
##  [1]  363.8  688.5  282.3  538.6  577.6  190.1  381.4   32.4 1518.9  643.6
## [11]  169.5  348.4  121.7  288.0   67.2   24.9  116.1  119.1   26.3  229.1
## 
## $VT0
##  [1]  71.0 113.1  63.0 254.3 123.0  71.0 113.1  63.0 254.3 123.0  71.0 113.1
## [13]  63.0 254.3 123.0  71.0 113.1  63.0 254.3 123.0
## 
## $condition_v
##  [1] DMSO   DMSO   DMSO   DMSO   DMSO   TreatA TreatA TreatA TreatA TreatA
## [11] TreatB TreatB TreatB TreatB TreatB TreatC TreatC TreatC TreatC TreatC
## Levels: DMSO TreatA TreatB TreatC
## 
## $cell_line
##  [1] clone1  clone2  clone3  clone4  clone5  clone6  clone7  clone8  clone9 
## [10] clone10 clone11 clone12 clone13 clone14 clone15 clone16 clone17 clone18
## [19] clone19 clone20
## 20 Levels: clone1 clone10 clone11 clone12 clone13 clone14 clone15 ... clone9

Bayesian Model Fitting

For demonstration purposes, we use a reduced number of posterior sampling iterations to shorten computation time.
In practical applications, longer sampling runs are recommended to ensure stable and well-mixed posterior estimates.

fit <- barmixRQTR(
  data=data,
  time_d=time_d,
  control=list(
    chains=2,
    iter_count=500, # reduced number of posterior sampling iterations for demonstration
    iter_V=500,     # reduced number of posterior sampling iterations for demonstration
    cores=2
  ),
  in_vivo=TRUE
)
## mpling)
## Chain 2: Iteration: 450 / 500 [ 90%]  (Sampling)
## Chain 2: Iteration: 500 / 500 [100%]  (Sampling)
## Chain 2: 
## Chain 2:  Elapsed Time: 61.99 seconds (Warm-up)
## Chain 2:                11.15 seconds (Sampling)
## Chain 2:                73.14 seconds (Total)
## Chain 2:
model <- fit

The model is estimated using Hamiltonian Monte Carlo (HMC) via Stan.

Posterior sampling produces probability distributions for all parameters, including clone compositions, population sizes, and treatment resistance.

Posterior Predictive Checks for Barcode Counts

ppc_barcode <- ppcBarcodes(model)
print(ppc_barcode$ppc_plot)

Estimating Ratios of Relative Barcode Abundance

We next estimate ratios of relative barcode abundance between treatment groups and the control condition.

sampled_fraction <- ppc_barcode$sampled_fraction

fraction_ratio_results <- fractionRatio(model, sampled_fraction)

fraction_ratio_results$plot_ratio_fraction

Posterior Predictive Check for Population Size

ppc_population_results <- ppcPopulation(model)

ppc_population_results$ppc_plot

sampled_population <- ppc_population_results$sampled_population

Estimating Ratios of Population Size

population_ratio_results <- populationRatio(model, sampled_population)

population_ratio_results$plot_ratio_population

Estimation of Quantitative Treatment Resistance

resistance <- QTRresistance(
  model,
  fraction_ratio_results$li_sam_ratio_relative,
  population_ratio_results$li_sam_ratio_V
)

print(resistance$treatment_resistance)

The Quantitative Treatment Resistance (QTR) is inferred by integrating information from:

within a joint Bayesian framework. This approach propagates uncertainty from both data sources to obtain posterior distributions of treatment response for each clone–treatment pair.

Each QTR value represents the log-scale change in fractional clone size under treatment relative to control.

Interpretation

When visualized (e.g., as density plots or violin plots), each distribution reflects the posterior uncertainty of treatment resistance for a given genotype–treatment combination. A vertical reference line at zero separates the sensitive and resistant regimes.

Visualization of the Resistance Landscape

heatmap <- QTRheatmap(
  model,
  resistance$summary_table
)

print(heatmap$heatmap_within)

print(heatmap$heatmap_treatment)

The heatmap summarizes clone-specific quantitative treatment resistance (QTR) across all treatments and cell lines, enabling direct comparison of drug effects in a multiplexed setting.

In analogy to the bubble plot representation used in the main analysis, QTR values can be interpreted jointly in terms of direction and magnitude:

This visualization provides a global overview of treatment performance across clones and highlights genotype-specific differences in drug response, facilitating intuitive comparison across all genotype–drug combinations.

Decision Support for Treatment Prioritization

One practical goal of multiplexed barcoded experiments is to enable treatment prioritization across genetically distinct tumor subclones. Pooled barcoding approaches were developed to increase throughput, support internally controlled comparisons, and make large-scale genotype-specific drug testing more feasible in preclinical research. In this setting, ranking treatment effects across genotypes can help highlight mutation-specific vulnerabilities while reducing the number of separate follow-up experiments needed for compound selection.

The QTRDecision() function summarizes posterior QTR distributions for each cell line and treatment using median, interquartile range, and whiskers derived from QTRresistance(). This provides a compact ordering of candidate treatments within each genotype and supports comparison of drug performance across heterogeneous cell populations.

cell_order <- paste0("clone", 1:20)

decision <- QTRDecision(
  model = model,
  summary_table = resistance$summary_table,
  cell_order = cell_order,
  ncol = 5
  # legend_text = "DMSO = control condition"
)

print(decision$rank_plot)

In the ranking panel, treatments are ordered within each cell line by their posterior median QTR. Values further to the left indicate lower QTR and therefore greater treatment sensitivity, whereas values further to the right indicate higher QTR and therefore greater resistance. The box shows the interquartile range, the central line marks the median, and the whiskers summarize the lower and upper bounds of the displayed posterior interval.

This type of ranking can support preclinical treatment prioritization by helping to:

In translational and pharmaceutical settings, this kind of summary may be useful as an early prioritization layer for selecting compounds, dose ranges, or combinations for additional validation.

sessionInfo()
## R version 4.6.0 RC (2026-04-17 r89917)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.23-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] barmixR_0.99.0  lubridate_1.9.5 forcats_1.0.1   stringr_1.6.0  
##  [5] dplyr_1.2.1     purrr_1.2.2     readr_2.2.0     tidyr_1.3.2    
##  [9] tibble_3.3.1    ggplot2_4.0.3   tidyverse_2.0.0 MGLM_0.2.1     
## 
## loaded via a namespace (and not attached):
##  [1] sass_0.4.10           generics_0.1.4        stringi_1.8.7        
##  [4] hms_1.1.4             digest_0.6.39         magrittr_2.0.5       
##  [7] evaluate_1.0.5        grid_4.6.0            timechange_0.4.0     
## [10] RColorBrewer_1.1-3    fastmap_1.2.0         jsonlite_2.0.0       
## [13] processx_3.9.0        pkgbuild_1.4.8        ps_1.9.3             
## [16] gridExtra_2.3         QuickJSR_1.9.2        scales_1.4.0         
## [19] codetools_0.2-20      jquerylib_0.1.4       cli_3.6.6            
## [22] rlang_1.2.0           withr_3.0.2           cachem_1.1.0         
## [25] yaml_2.3.12           otel_0.2.0            StanHeaders_2.32.10  
## [28] tools_4.6.0           rstan_2.32.7          inline_0.3.21        
## [31] parallel_4.6.0        tzdb_0.5.0            curl_7.1.0           
## [34] vctrs_0.7.3           R6_2.6.1              matrixStats_1.5.0    
## [37] stats4_4.6.0          lifecycle_1.0.5       V8_8.2.0             
## [40] callr_3.7.6           pkgconfig_2.0.3       RcppParallel_5.1.11-2
## [43] pillar_1.11.1         bslib_0.10.0          gtable_0.3.6         
## [46] loo_2.9.0             Rcpp_1.1.1-1.1        glue_1.8.1           
## [49] xfun_0.57             tidyselect_1.2.1      knitr_1.51           
## [52] dichromat_2.0-0.1     farver_2.1.2          htmltools_0.5.9      
## [55] patchwork_1.3.2       labeling_0.4.3        rmarkdown_2.31       
## [58] compiler_4.6.0        S7_0.2.2