---
title: "ABCDscores"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{ABCDscores}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Setup
After installing the package, you can load it with:
```{r setup}
library(ABCDscores)
```
Alternatively you can call functions directly, without loading the package,
using `::`, e.g., `ABCDscores::name_of_function(...)`
### Data preparation
To compute summary scores, you'll need to have downloaded data from the ABCD
Study®. To request access to the data, visit the
[NIH Brain Development Cohorts (NBDC) Data Hub](https://www.nbdc-datahub.org/)
Once you have access, you can use different tools to access and download the
data; they are described in more detail in the
[ABCD data documentation](https://docs.abcdstudy.org/latest/tools/tools.html).
Here we assume that you created a dataset containing the variables you want to
summarize in [DEAP](https://abcd.deapscience.com/) and downloaded it in the
`rds` format. Afterwards, unzip the `dataset.rds.zip` file to the working
directory (or move the zip file to the working directory and use
`utils::unzip("dataset.rds.zip")` to extract all files). The unzipped files
should consist of a `dataset.rds` file and an Excel file with the data
dictionary and categorical levels.
Load the data into R using the following command:
```r
data <- readRDS("dataset.rds")
```
## Computing summary scores
### Score naming convention
Before computing summary scores, it is important to understand the structure and
nomenclature of the functions in the package:
- For any given summary score, the function to compute it is named
`compute_()`. For example, the function to compute the score
`fc_p_psb_mean` is named `compute_fc_p_psb_mean()`.^[There are a few exceptions
to this rule—the summary scores in the tables `su_y_sui` and `su_y_tlfb` are
computed using higher level functions as explained in the [SUI](https://software.nbdc-datahub.org/ABCDscores/articles/sui.html) and
[TLFB](https://software.nbdc-datahub.org/ABCDscores/articles/tlfb.html) vignettes.]
- For any given measure/table, there exists a high-level
`compute__all()` function that computes all scores for that
measure/table. For example, the function to compute all scores for the
`fc_p_psb` measure/table is named `compute_fc_p_psb_all()`.
- For any given summary score function, certain columns—the columns that are
being summarized and in some cases additional columns like age or sex—are
required to be present in the data for the score to be computed. The function
documentation lists the required columns for a given function. In addition, the
columns that a summary score function summarizes are typically provided as a
character vector named `vars_`. For example, the vector with the
columns that are summarized by the `fc_p_psb_mean` function is named
`vars_fc_p_psb`.
The [references page](https://software.nbdc-datahub.org/ABCDscores/reference/index.html) provides a list of all available
functions and their parameters.
### Basic usage
After reading in the data, we can start to compute summary scores. As an
example, we will demonstrate how to compute the two summary scores for the
`fc_p_psb` measure/table (`fc_p_psb_mean` and `fc_p_psb_nm`) in two different
ways:
1. using the specific functions to compute one score at a time
1. using the `_all()` function to compute all scores for the measure/table at
once.
When we refer to the documentation for `compute_fc_p_psb_mean()`, we see that it
requires the following variables: `fc_p_psb_001`, `fc_p_psb_002`, and
`fc_p_psb_003`. If these variables are part of the dataset created in and
downloaded from [DEAP](https://abcd.deapscience.com/), they should be present
in the data after reading in `dataset.rds` as demonstrated above.
Here, for demonstration purposes, we will create a dummy data frame with these
columns:
```{r}
data <- tibble::tibble(
fc_p_psb_001 = c("1", "2", "3", "4", "5"),
fc_p_psb_002 = c("1", NA, "3", "4", NA),
fc_p_psb_003 = c("1", "2", "2", "4", NA)
)
data
```
For most summary score functions, only the `data` argument (input data frame) is
required, i.e., we can just use the function like this:
```{r}
compute_fc_p_psb_mean(data)
```
We can do the same using `fc_p_psb_nm()`:
```{r}
compute_fc_p_psb_nm(data)
```
We can also compute both scores at the same time by chaining the function calls
using the pipe operator:
```{r}
data |>
compute_fc_p_psb_mean() |>
compute_fc_p_psb_nm()
```
Lastly, if we want to compute all scores for the measure with one function call,
we can use the `compute__all()` function for the `fc_p_psb` table:
```{r}
compute_fc_p_psb_all(data)
```
### Important parameters and customization
#### `data`
The `data` argument is the input data frame that contains the columns required
to compute the score. The required columns are documented in the function
documentation for each score.
#### `name`
The `name` argument is used to specify the name of the output score. The default
default value for this parameter is the official name of the column in the
released data, but it can be overridden by users with a custom name.
```{r}
compute_fc_p_psb_mean(data, name = "my_custom_name")
```
For example, this is useful when the data frame specified in `data` contains the
official summary score that one is trying to reproduce. In this case, the user
is required to specify a different name; otherwise the function will return an
error.
#### `combine`
The `combine` argument is used to specify whether to combine the output score
with the input data frame. The default value is `TRUE`, which means the output
score is appended as a new column on the right hand side of the input data
frame. If the argument is set to `FALSE`, the output score is returned as a
single-column data frame:
```{r}
compute_fc_p_psb_mean(data, combine = FALSE)
```
#### `max_na`
The `max_na` argument is used to specify the maximum number of missing values
across all summarized variables a given row (or participant/event) can have for
the summary score to still be computed. If the number of missing values
in a row exceeds the specified value, the score for that row is set to `NA`.
Depending on the summary score, the number of missing values allowed may vary
and not all summary score functions have this argument.
- `NULL`: No limit on missing values.
- `0`: No missing values allowed.
- `1`: At most one missing value allowed.
- ...
For most summary scores in the ABCD data resource, `max_na` is set to a number
that ensures that >=80% of the variables that the given score summarizes have a
non-missing value. Users can use the `max_na` argument if they want to compute
the summary score in a more lenient or more restrictive manner.
As an example, let's explore how the summary score changes when we set `max_na`
argument to `1` (above we used the default, which in the case of
`compute_fc_p_psb_mean()` is `0`). Now a score is computed for the second row
which has one missing value but not for the last row which has two missing
values:
```{r}
compute_fc_p_psb_mean(data, max_na = 1)
```
When we change `max_na` to `2`, a score is also computed for the last row:
```{r}
compute_fc_p_psb_mean(data, max_na = 2)
```
#### `exclude`
The `exclude` argument is used to specify values that should be excluded
from the computation of the score. Some specific values in the data might
be considered as missing values, e.g., coded non-responses like "Don't know"
(`999`), "Decline to answer" (`777`), etc. This argument allows the user to
specify these values so that they are treated as missing values during the
computation of the score (importantly, the `max_na` argument applies to all
values that are either missing, `NA`, or specified as values to be excluded
using the `exclude` argument). Not all score functions have this argument.
In this example we use another score function `compute_mh_p_abcl__afs__frnd_sum`
which has the `exclude` argument. We first construct a dummy data frame:
```{r}
data <- tibble::tibble(
mh_p_abcl__frnd_001 = c(1, 2, 3, 4, 5),
mh_p_abcl__frnd_002 = c(1, 777, 3, 4, 777),
mh_p_abcl__frnd_003 = c(1, 2, NA, 4, 777),
mh_p_abcl__frnd_004 = c(1, 2, 3, 4, 999),
)
data
```
When we compute the score, only the 1, 4 rows are computed, because other rows
contain `777` or `999` or `NA` values.
```{r}
compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999"))
```
We can also exclude custom values, for example, we can exclude `4`, and then
only the first row is computed.
```{r}
compute_mh_p_abcl__afs__frnd_sum(data, exclude = c("777", "999", "4"))
```
### Utility functions
The `compute_()` functions are the main functions to compute summary
scores, with one summary score function for each score (besides a few exceptions
that are documented in the other [vignettes](https://software.nbdc-datahub.org/ABCDscores/articles/)). However, to be more
concise, the main functions often use utility functions. These utility functions
are not necessarily meant to be used directly by users of this package, but they
are documented and exported for transparency and reproducibility. For the
documentation of these functions, see the
[reference page](https://software.nbdc-datahub.org/ABCDscores/reference/index.html).