permRand
This vignette illustrates the functionality of the R package permRand. It demonstrates all of the functions in the package and uses an example pipeline to describe the flow of a randomization for an observational study.
We start with an example randomization problem for a cohort study with 2:1 case-control matching. The analyst would like to generate labels that fully randomize the serum samples into batches, and to specify a certain number of QC samples per batch to check the batches. Extra blood from the samples will be used for a secondary test. The apparatus for this test uses batches of a different size than the original batches, so we must reassign the samples to batches for the new apparatus. To make the work easier for the scientists in the lab, the number of “switches” to the new configuration must be minimized.
We must first generate a test dataset. The package includes the function testRand, which allows specification of the study size, the matching, the plate size, and the children-per-mother specification.
Variable descriptions of testRand include:
For this example, the function call is:
testRand(rowSize=20,colSize=15,studySize=1000,expNS=7000,numCC=2,QCpct=0.07,child=4)
For this test, we can instead load a saved copy of the dataset, testR:
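library(permRand)
data("testR",package="permRand")
# testR is a list of four test datasets
serumIDs <- testR[[1]]  # serum sample IDs
serumLoc <- testR[[2]]  # serum sample locations
emptyQC <- testR[[3]]
motherQC <- testR[[4]]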
The test dataset is created with a plate layout. We divide the output of testRand into four test datasets: serumIDs, serumLoc, emptyQC and motherQC.
Next, we check serumIDs and the individual IDs to ensure that a unique ID is assigned to each serum sample. The IDs are unique if the output dataset has no rows.
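A minimal base-R sketch of the kind of check that uniqueID performs, assuming the IDs sit in a column named serumID. The toy data and column name are hypothetical, and this is an illustration rather than the package's implementation. An empty result means every ID is unique.

```r
# Hypothetical toy data; the real serumIDs columns may differ
ids <- data.frame(serumID = c("A01", "A02", "A03", "A02", "A05"))

# Keep every row whose serumID occurs more than once (first and later copies)
isDup <- duplicated(ids$serumID) | duplicated(ids$serumID, fromLast = TRUE)
dups <- ids[isDup, , drop = FALSE]
```

Here dups holds both rows with serumID "A02"; with unique IDs it would have 0 rows.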
Next, we format the datasets for input to the randomization function.
We first assign IDs linking mother/child and events using the function randTest. The output of the function is a dataset containing the ID links. This function requires:
The function formatRand formats the dataset for the randomization function. It takes as input serum data for both the study subjects and the QC samples.
QCMaster <- randTest(dataMom=motherQC,dataChild=emptyQC,maxAliq=4, nEvent=c(28,27,28,30))
serumMaster <- formatRand(QCdata=QCMaster,serumIDR=serumIDs,serumPack=serumLoc)
The resulting dataset, serumMaster, is formatted and ready for randomization.
Although the practice dataset includes multiple events or labs that can process the samples, we randomize only a single event at a time. We subset serumMaster to a single event; in this example, event 3. The function allRand then randomizes the aliquots to batches.
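library(dplyr)  # provides %>% and filter()
# Keep only event 3
serumMaster3 <- serumMaster %>% filter(event == 3)
# Randomize the event-3 aliquots to batches
serumRand <- allRand(dataR=serumMaster3,batchTot=c(40,44), numQC=2,
withinN=2,numMatch=2,chkRep=1)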
This code demonstrates the use of the randomization function. Its output, serumRand, contains the randomized serum samples.
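To make the idea of randomizing while keeping matched sets together concrete, here is a toy base-R sketch. It is not the allRand algorithm: it assumes hypothetical matched sets of one case and two controls, shuffles the order of the sets and of the samples within each set, and then fills fixed-size batches in that order. The batch size is chosen as a multiple of the set size so that no set straddles two batches.

```r
# Hypothetical illustration of set-preserving randomization; not allRand itself
set.seed(2024)

# Toy data: 4 matched sets, each assumed to be 1 case + 2 controls
samples <- data.frame(
  sampleID = sprintf("S%02d", 1:12),
  setID    = rep(1:4, each = 3)
)
batchSize <- 6  # multiple of the set size, so sets never cross batches

# Randomize the order of the sets, then the order of samples within each set
setOrder <- sample(unique(samples$setID))
shuffled <- do.call(rbind, lapply(setOrder, function(s) {
  rows <- samples[samples$setID == s, ]
  rows[sample(nrow(rows)), ]
}))

# Fill batches sequentially in the randomized order
shuffled$batch <- (seq_len(nrow(shuffled)) - 1) %/% batchSize + 1
```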
We can verify the randomization using seven tests. Each test and the function that implements it are described below.
Test 1: checks that no case-control samples are split across batches or plates. The function testCCAcross takes the randomized dataset as input.
The output dataset, test, shows all of the case-control groups that are split across batches. In this example the output should have 0 rows.
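The logic behind Test 1 can be sketched in base R by counting how many distinct batches each case-control set occupies. The column names setID and batch are assumptions for illustration, not the columns testCCAcross necessarily uses.

```r
# Hypothetical toy data: setID and batch are assumed column names
randomized <- data.frame(
  setID = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  batch = c(1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# Number of distinct batches that each set occupies
nBatches <- tapply(randomized$batch, randomized$setID,
                   function(b) length(unique(b)))

# Sets spread over more than one batch are flagged
splitSets <- as.numeric(names(nBatches)[nBatches > 1])
```

Set 2 is flagged because it spans batches 1 and 2; a passing randomization would give an empty result.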
Test 2: checks that every batch has at least the specified number of matching QC sample sets. The function testQCmatch takes as inputs:
The output lists all batches that have too few QC sample sets or whose QC samples do not come from the same mother.
Test 3: checks that the serum samples in the randomized dataset are unique. The function uniqueID is repeated from above.
The output lists any duplicates of IDs in the dataset.
Test 4: checks that the samples in each set are next to each other and flags any set whose samples are not. The function testPair takes serumRand as input.
The output reports any sets whose samples are separated in the location (loc).
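One way to express the adjacency check in Test 4: if a set's locations are consecutive integers, the span from its first to its last location equals the number of samples in the set. This sketch assumes integer positions in a column named loc and a set column named setID (both hypothetical) and ignores batch boundaries.

```r
# Hypothetical toy data: loc is an assumed integer position within the batch
packed <- data.frame(
  setID = c(1, 1, 2, 3, 2, 3),
  loc   = 1:6
)

# Span covered by each set versus the number of samples in the set
span <- tapply(packed$loc, packed$setID, function(p) max(p) - min(p) + 1)
size <- tapply(packed$loc, packed$setID, length)

# A set is separated when its span is larger than its size
separated <- as.numeric(names(span)[span != size])
```

Set 1 sits at locations 1 and 2 and passes, while sets 2 and 3 are interleaved and are both flagged.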
Test 5: checks whether a large number of cases or controls are next to each other. The function orderCases has the inputs:
The output flags any cases or controls placed together beyond a specified number; in this example, the number is 4. The output lists the studyID of each group that has too many next to each other.
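Detecting long runs of cases or controls, as in Test 5, maps naturally onto run-length encoding. This base-R sketch uses rle() on a hypothetical vector of case/control labels in packing order and reports each run longer than the chosen limit; it is an illustration, not the orderCases implementation.

```r
# Hypothetical case/control labels in packing order
status <- c("case", "control", "control", "control",
            "control", "control", "case", "control")
maxRun <- 4

# Collapse the sequence into runs of identical labels
runs <- rle(status)
runEnd <- cumsum(runs$lengths)
runStart <- runEnd - runs$lengths + 1

# Keep only runs longer than maxRun
flagged <- data.frame(status = runs$values,
                      start  = runStart,
                      length = runs$lengths)[runs$lengths > maxRun, ]
```

The five consecutive controls starting at position 2 form the only flagged run.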
Test 6: counts the number of samples in each batch. The function batchCount has the inputs:
The output contains the ID of any batch that does not contain the specified number of samples, so it should be empty. The last batch is not reported if it has fewer than the specified number of samples.
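The counting rule in Test 6, including the exemption for a short final batch, can be sketched with table(). The batch vector below is hypothetical and the expected size of 44 is only for illustration; batchCount may apply the rule differently.

```r
# Hypothetical batch assignment: batch 2 is one sample short, batch 4 is the final batch
batch <- c(rep(1, 44), rep(2, 43), rep(3, 44), rep(4, 20))
expected <- 44

counts <- table(batch)
n <- as.vector(counts)
ids <- names(counts)

# A short final batch is allowed; any other mismatch is reported
isShortLast <- seq_along(ids) == length(ids) & n < expected
short <- ids[n != expected & !isShortLast]
```

Only batch 2 is reported; the final batch of 20 is exempt.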
Test 7: counts the QC samples in each batch and checks whether the count matches the specified number. The function countQC has the inputs:
The output includes any batches that do not contain the specified number of QC samples. In this example, there are 4.
Now that our randomized example dataset, serumRand, has passed all of the verification checks, we can create our output packing lists.
In our example, we output two packing lists, one blinded and one unblinded. We first create the unblinded packing list, which includes the original packing location, and name the dataset unBlind.
unBlind <- outputLab(dataOut=serumRand,blind=0,origP=1,maxRows=9,maxCols=9,newPack=1)
blind <- outputLab(dataOut=serumRand,blind=1,origP=0,maxRows=9,maxCols=9,newPack=1)
The function outputLab has the following options:
The second packing list, blind, is blinded and does not display the original packing location. These packing lists can be printed for use.
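The maxRows and maxCols options describe a rack layout. As a rough illustration of how a sequential sample position could map to a rack, row, and column in 9 x 9 racks, here is a hypothetical helper, positionToLoc, that assumes racks are filled row by row. It is not part of permRand, and outputLab may assign locations differently.

```r
# Hypothetical helper (not in permRand): map position 1, 2, ... to rack/row/col,
# assuming each rack is filled row by row
positionToLoc <- function(pos, maxRows = 9, maxCols = 9) {
  perRack <- maxRows * maxCols
  idx <- pos - 1  # zero-based position
  data.frame(
    rack = idx %/% perRack + 1,
    row  = (idx %% perRack) %/% maxCols + 1,
    col  = idx %% maxCols + 1
  )
}

loc <- positionToLoc(c(1, 9, 10, 81, 82))
```

Position 81 fills the last cell of the first rack (row 9, column 9), and position 82 starts rack 2.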
We do not want to completely re-randomize, but we may need to change parameters such as plate sizes or QC samples per plate. The analyst now wants to produce a new location list for serum samples, but the test bench has different sized batches.
In this example, the batch size is now 43 samples, with 2 QC samples and 2 matching per batch.
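To give a feel for what counting “switches” means, this sketch assumes a switch is a sample whose batch number changes. It keeps the randomized order and re-cuts 176 hypothetical samples from batches of 44 into batches of 43, then counts the samples that move. This is a simple illustration under that assumption, not the switchR method, and it ignores keeping case-control sets and QC samples together.

```r
# Hypothetical: 176 samples in randomized order, re-cut from batches of 44 to 43
n <- 176
oldBatch <- (seq_len(n) - 1) %/% 44 + 1
newBatch <- (seq_len(n) - 1) %/% 43 + 1

# A switch is counted for every sample whose batch number changes
switches <- sum(oldBatch != newBatch)
```

Keeping the order intact moves only 10 of the 176 samples, which shows why preserving the existing order helps keep switches low.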
As above, we check the randomization of the switched serum samples, serumSwitch.
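test1 <- testCCAcross(serumSwitch)         # Test 1: case-control sets split across batches
test2 <- testQCmatch(serumSwitch,2,2)      # Test 2: matching QC sample sets per batch
test3 <- uniqueID(serumSwitch,"serumID")   # Test 3: duplicate serum IDs
test4 <- testPair(serumSwitch)             # Test 4: sets not next to each other
test5 <- orderCases(serumSwitch,4)         # Test 5: runs of more than 4 cases or controls
test6 <- batchCount(serumSwitch,44)        # Test 6: batches without the specified sample count
test7 <- countQC(serumSwitch,2)            # Test 7: batches without the specified QC count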
As a reminder, the functions of the tests are repeated below.
Once these tests are complete, we can output the packing lists for these serum samples: an unblinded list (unBlindSw) and a blinded list (blindSw).
switchR Function
We output blinded and unblinded packing lists for the switchR results. We can again use the outputLab function, this time setting the newPack indicator variable to 0.
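# Carry the rack, row and col locations from the unblinded list over to serumSwitch
serumSwitchP <- unBlind %>%
select(serumID,rack,row,col) %>%
merge(.,serumSwitch,by='serumID')
# Unblinded (blind=0) and blinded (blind=1) packing lists with newPack=0
unBlindSw <- outputLab(serumSwitchP,blind=0,origP=.,maxRows=.,maxCols=.,newPack=0)
blindSw <- outputLab(serumSwitchP,blind=1,origP=.,maxRows=.,maxCols=.,newPack=0)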
This vignette has shown all of the example functions and demonstrated the package pipeline: the input datasets are formatted, randomized using two different methods, and checked, and packing lists are then generated.