---
title: "Introduction to CohortSymmetry"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{a01_Introduction}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 5,
  eval = Sys.getenv("$RUNNER_OS") != "macOS"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

CohortSymmetry provides tools to perform Sequence Symmetry Analysis (SSA). Before using the package, it is highly recommended that this
method is tested beforehand against well-known positive and negative
controls. The details of SSA and the relevant controls could be found using Pratt et al (2015).

The functions you will interact with are:

1.  `generateSequenceCohortSet()`: this function will create a cohort with individuals present in both (the index and the marker) cohorts.

2.  `summariseSequenceRatios()`: this function will calculate sequence ratios.

3.  `tableSequenceRatios()` and `plotSequenceRatios()`: these functions will help us to visualise the sequence ratio results.

4. `summariseTemporalSymmetry()`: this function will produce aggregated results based on the time difference between two cohort start dates.

5. `plotTemporalSymmetry()`: this function will help us to visualise the results from summariseTemporalSymmetry().

Below, you will find an example analysis that offers a brief and comprehensive overview of the package's functionalities. More context and further examples for each of these functions are provided in later vignettes.

First, let’s load the relevant libraries.

```{r message= FALSE, warning=FALSE}
library(CDMConnector)
library(dplyr)
library(DBI)
library(omock)
library(CohortSymmetry)
library(duckdb)
```

The CohortSymmetry package works with data mapped to the OMOP CDM. Hence, the initial step involves connecting to a database. As an example, we will be using Omock package to generate a mock database with two mock cohorts: the **index_cohort** and the **marker_cohort**.

```{r message= FALSE, warning=FALSE}
cdm <- emptyCdmReference(cdmName = "mock") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod() |>
  mockCohort(
    name = "index_cohort",
    numberCohorts = 1,
    cohortName = c("index_cohort"),
    seed = 1,
  ) |>
  mockCohort(
    name = "marker_cohort",
    numberCohorts = 1,
    cohortName = c("marker_cohort"), 
    seed = 2
  )

con <- dbConnect(duckdb::duckdb())
cdm <- copyCdmTo(con = con, cdm = cdm, schema = "main", overwrite = T)

cdm$index_cohort |> 
  dplyr::glimpse()

cdm$marker_cohort |> 
  dplyr::glimpse()

```

Once we have established a connection to the database, we can use the `generateSequenceCohortSet()` function to find the intersection of the two cohorts. This function will provide us with the individuals who appear in both cohorts, which will be named **intersect** - another cohort in the cdm reference.

```{r message= FALSE, warning=FALSE}
cdm <- generateSequenceCohortSet(
  cdm = cdm,
  indexTable = "index_cohort",
  markerTable = "marker_cohort",
  name = "intersect",
  combinationWindow = c(0, Inf)
)
```

See below that the generated cohort follows the format of an OMOP CDM cohort with the addition of two extra columns: *index_date* and *marker_date*. These columns correspond to the *cohort_start_date* in the **index_cohort** and the **marker_cohort**, respectively.

```{r message= FALSE, warning=FALSE}
cdm$intersect |> 
  dplyr::glimpse()
```

Once we have the intersect cohort, you are able to explore the temporal symmetry by using `summariseTemporalSymmetry` and `plotTemporalSymmetry()`:

```{r message= FALSE, warning=FALSE}
result <- summariseTemporalSymmetry(cohort = cdm$intersect, 
                                    timescale = "year")
result |> dplyr::glimpse()

plotTemporalSymmetry(result = result)
```

Next, we will use the `summariseSequenceRatios()` function to get the crude sequence ratios, adjusted sequence ratios, and the corresponding confidence intervals. 
```{r message= FALSE, warning=FALSE}
result <- summariseSequenceRatios(cohort = cdm$intersect)

result |> dplyr::glimpse()
```
Finally, we can visualise the results using `tableSequenceRatios()`:
```{r message= FALSE, warning=FALSE}
tableSequenceRatios(result)
```
Or create a plot with the adjusted sequence ratios:

```{r message= FALSE, warning=FALSE}
plotSequenceRatios(result = result,
                  onlyaSR = T,
                  colours = "black")
```

## As a diagram

Diagrammatically, the work flow using CohortSymmetry resembles the following flow chat:

```{r, echo=FALSE, message=FALSE, out.width="100%", warning=FALSE}
library(here)
knitr::include_graphics(here("vignettes/workflow.png"))
```