---
title: "Simulation of Dissolution Profiles"
author: "Zhengguo XU"
date: "`r format(Sys.Date())`"
output: 
  rmarkdown::html_vignette:
    keep_md: true
    toc: true
    toc_depth: 3
    fig_width: 7
    fig_height: 4.5
  highlight: "tango"
bibliography: ref.bib
notes-after-punctuation: false
link-citations: yes
csl: ref.csl
vignette: >
  %\VignetteIndexEntry{Simulation of Dissolution Profiles}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup0, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#"
)
```

```{r setup, echo=FALSE}
library(bootf2)
```

## Introduction

Dissolution profiles are simulated either by mathematical models, or by
multivariate normal distribution based on function `mvrnorm()` from R package
`MASS`.

### Mathematical Models

Two models are implemented at the moment: *first-order* model and *Weibull*
model (the default). 

The first-order model is expressed as 

$$f_t = f_\mathrm{max}\left(1 - e^{-k\cdot (t - t_\mathrm{lag})}\right),$$
and the Weibull model either as
$$f_t = f_\mathrm{max}\left(1 - e^{-\left(\frac{t - t_\mathrm{lag}}
  {\mathrm{MDT}}\right)^\beta}\right),$$
or as 
$$f_t = f_\mathrm{max}\left(1 - e^{-\frac{\left(t - t_\mathrm{lag}\right)^\beta}
  {\alpha}}\right).$$
Obviously, the two expressions of Weibull model are mathematically equivalent
with $\alpha = \mathrm{MDT}^\beta$. 

Parameter $f_\mathrm{max}$ is the maximum dissolution. In theory, it is 100,
but in practice, it might be slightly high than 100, or much less if the
dissolution is not complete. For immediate-release formulation, the lag time
$t_\mathrm{lag}$ is typically zero, but not necessary so for the extended
release formulation. Parameter $k$ is the first-order rate constant, and
$\alpha$ and $\beta$ are scale and shape parameters, and $\mathrm{MDT}$ is
the mean dissolution time.


### Simulation based on mathematical models

Let $P_\mu$ be the set of *population* model parameters, i.e., 
$P_\mu = \{f_\mathrm{max}, k, t_\mathrm{lag}\}$ for the first-order 
model and $P_\mu = \{f_\mathrm{max}, t_\mathrm{lag}, \mathrm{MDT}, \beta\}$
or $P_\mu = \{f_\mathrm{max}, t_\mathrm{lag}, \alpha, \beta\}$ for Weibull
model, given any time points `tp`, the dissolution profile $f_t$ will be
calculated according to the equations above with parameters $P_\mu$.
The generated profile is considered as *population* dissolution profile `dp`. 

Individual dissolution profiles are generated with the same equation, but with
individual model parameters $P_i$, which is simulated according to exponential
error model 
$$P_i = P_\mu \cdot e^{N\left(0,\, \sigma^2\right)},$$
where $N\left(0,\, \sigma^2\right)$ represents the normal distribution 
with mean zero and variance $\sigma^2$ ($\sigma = \mathrm{CV}/100$)

### Simulation based on multivariate normal distribution

Note that the *population* dissolution profile `dp` in the previous section
is generated by equation with *population* model parameters $P_\mu$. Sometime
we might have a mean dissolution profile `dp`, which is considered as population
profile, and we want to simulate many individual profiles such that the
calculated mean profile is equal to `dp`. In such case, multivariate normal
distribution approach will be used. Given any `tp`, `dp`, and the required
number of individual units, `n.units`, the function will simulate individual
profiles fulfil the above mentioned requirements.

### Recommendation

*Depending on the variability and the target profile, the run time of simulation
based on multivariate normal distribution might be very long, sometimes it might
even fail to finish. So in general, the modelling approach is recommended.*

## Usage

The complete list of arguments of the function is as follows:

```{r simdp-code0, eval=FALSE}
sim.dp(tp, dp, dp.cv, model = c("Weibull", "first-order"),
       model.par = NULL, seed = NULL, n.units = 12L, product,
       max.disso = 100, ascending = FALSE, message = FALSE,
       time.unit = c("min", "h"), plot = TRUE,
       plot.max.unit = 36L)
```

The approach used to simulate individual dissolution profiles depends on if the
target mean dissolution profile dp is provided by the user or not.

- If `dp` is not provided, then it will be calculated using `tp`, `model`, and
  `model.par`. The parameters defined by `model.par` are considered as the
  *population parameter*; consequently, the calculated `dp` is considered as the
  *targeted population profile*. In addition, `n.units` sets of *individual
  model parameters* will be simulated based on exponential error model, and
  *individual dissolution profiles* will be generated using those individual
  parameters. The mean of all simulated individual profiles, `sim.mean`,
  included in one of the outputs data frames, `sim.summary`, will be *similar, 
  but not identical, to `dp`*. The difference between `sim.mean` and `dp` is
  determined by the number of units and the variability of the model parameters.
  In general, the larger the number of units and the lower of the variability,
  the smaller the difference. 
- If `dp` is provided, then `n.units` of individual dissolution profiles will be
  simulated using multivariate normal distribution. The mean of all simulated
  individual profiles, `sim.mean`, will be *identical to `dp`*. In such case, 
  it is recommended that `dp.cv`, the CV at time points `tp`, should also be
  provided by the user. If `dp.cv` is not provided, it will be generated
  automatically such that the CV is between 10% and 20% for time points up to 
  10 min, between 1% and 3% for time points where the dissolution is more than
  95%, between 3% and 5% for time points where the dissolution is between 90% 
  and 95%, and between 5% and 10% for the rest of time points. Whether the 
  `dp.cv` is provided or generated automatically, the CV calculated using all
  individual profiles will be equal to `dp.cv`. 

### Minimum required arguments that must be provided by the user
If `dp` is provided by the user, logically, `tp` must also be provided, and the
approach based on multivariate normal distribution is used, as explained above.
If `dp` is not provided, `tp` could be missing, i.e., a simple function call
such as `sim.dp()` will simulate dissolution profiles. In such case, a default
`tp` will be generated depending on the `time.unit`: if the `time.unit` is
`"min"`, then `tp` would be `c(5, 10, 15, 20, 30, 45, 60)`; otherwise the `tp`
would be `c(1, 2, 3, 4, 6, 8, 10, 12)`. The rest arguments are either optional
or required by the function but default values will be used. 


### Notes on function arguments

1. Model parameter `model.par` has to be specified as *named list*, e.g., 
   `list(fmax = 100, fmax.cv = 3, tlag = 0, tlag.cv = 0, k = 0.9, k.cv = 30)`.
   The parameter `fmax/k/tlag` are used to generate `dp`, and the corresponding
   `.cv` parts are used to generate individual model parameters. If `model.par`
   is missing, it will be generated randomly.
1. When multivariate normal distribution approach is used, depending on the
   target profile and the variability, sometimes the simulated individual
   profiles will *decrease* in the end. This could also happen to the real-life
   profiles, especially for those products that are unstable in the dissolution
   media. To force the profiles *always increase* with time, set option
   `ascending = TRUE`. However, in that case, it is possible that the function
   takes long time to run or even fails.
1. Parameter `product` is not really necessary for the simulation. so if
   missing, it will be generated automatically with 3 letters and 4 digits. It
   might be useful in the situations such as many simulation will be run and
   output are pooled together for analysis.
1. Read the function manual by `help("sim.dp")` for more details on 
   each argument. 


## Examples

### Simple case

For the most basic use, the function can be run without any user provided 
arguments, e.g., `sim.dp()`. In such case, 12 units of individual dissolution
profiles will be simulated using Weibull model with a typical sampling time
points of 5, 10, 15, 20, 30, 45, and 60 min. A `seed` number will be randomly 
generated, if not provided by the user, and included in the output for
reproducibility purpose. 
 
```{r simple01-dat}
# simulation
tmp1 <- sim.dp(seed = 1234)
```

The output is a list of 5 components:

1. `sim.summary`: A *data frame* with summary statistics of all individual
    dissolution profiles.
    
```{r simple01-summary, error=TRUE}
tmp1$sim.summary
```
`dp` is the population mean profile obtained with the parameters in `sim.info`,
as explained in previous section. The rest columns with prefix `sim` are basic
descriptive statistics calculated from all simulated individual profiles.
`sim.qt05`, `sim.qt25`, ..., are 5%, 25%, .., quantiles.

2. `sim.disso`: A *data frame* with all individual dissolution profiles.

```{r simple01-disso, error=TRUE}
tmp1$sim.disso
```

3. `sim.info`:  A *data frame* with information of the simulation.

```{r simple01-info, error=TRUE}
tmp1$sim.info
```

4. `model.par.ind`: A *data frame* of individual model parameters that are used
    to simulate the individual dissolution profiles if mathematical models are
    chosen for the simulation.
```{r simple01-par, error=TRUE}
tmp1$model.par.ind
```

5. `sim.plot`: A plot since the default argument `plot` is set to be `TRUE`.

```{r simple01-plot, error=TRUE}
tmp1$sim.plot
```

When the number of individual units is not large, all individual profiles will
be plotted, together with the population profile in green (as target profile),
and mean profile in blue calculated with simulated individual profiles. When 
the number is small, there will be visible difference between the mean and the
target profile due to random error. When the number of units increase, the
difference will become smaller. 

The argument `plot.max.unit` control how individual profile will be represented
in the plot. When the actual number of units is greater than the value of
`plot.max.unit`, the individual profile will be represented as boxplots at each
time points, as shown below.

```{r simple01-boxplot, error=TRUE}
# default plot.max.unit = 36
sim.dp(n.units = 100)$sim.plot
```

### Example with model parameters

To obtain better controlled simulation, model parameters should be provided. 
CV should be specified in *percentage*.

```{r model01-dat, error=TRUE}
fo.par <- list(fmax = 100, fmax.cv = 3, k = 0.1, k.cv = 20, 
               tlag = 0, tlag.cv = 0)
fo.dat <- sim.dp(model = "first-order", model.par = fo.par, seed = 123)
fo.dat$sim.plot
fo.dat$sim.summary
```
Similarly, for Weibull model, the model parameter should be provided like the
example below. The order of the parameter does not matter, but the parameter
names have to be specified exactly as the following:
```{r model01-par, eval=FALSE}
# with alpha = xx and alpha.cv = yy to replace beta/beta.cv if alternative
# expression of Weibull model is used.
mod.par <- list(fmax = 100, fmax.cv = 3, tlag = 0, tlag.cv = 0, 
                mdt = 20, mdt.cv = 25, beta = 2, beta.cv = 30)
```

### Example with multivariate normal distribution

```{r mvn01-dat, error=TRUE}
# target mean profile 
dp <- c(39, 56, 67, 74, 83, 90, 94)

# CV at each time points
dp.cv <- c(19, 15, 10, 8, 8, 5, 3)

mvn.dat <- sim.dp(tp, dp = dp, dp.cv = dp.cv, seed = 1234)
mvn.dat$sim.summary
```
Notice that the the mean and CV of the simulated individual profile (
`sim.mean` and `sim.cv`) are equal to the target profile `dp` and `dp.cv`.
The plot look like this:

```{r mvn01-plot, error=TRUE}
mvn.dat$sim.plot
```

Another example with missing `dp.cv`.
```{r mvn02-dat, error=TRUE}
mvn.dat2 <- sim.dp(tp, dp = dp, seed = 123)
mvn.dat2$sim.summary
```
Notice that the `dp.cv` were generated automatically as explained in
[Notes on function arguments].