permRand: Permutation Randomization

Example Use of Package Functions

We start with an example randomization problem for a cohort study in which two controls are matched to each case (2:1 matching). The analyst would like to generate labels for fully randomized serum samples in batches and to specify the number of QC samples in each batch. Extra blood from the samples will be used for a secondary test. The apparatus for this test uses batches of a different size than the original, so the samples must be reassigned to new batches. To make the work easier for the scientists in the lab, the number of “switches” to the new configuration must be minimized.

Step 1: Generate a Test Dataset

We first generate a test dataset. The package function testRand creates one, with options for study size, case-control matching, plate size, and the number of child aliquots per mother aliquot.

The arguments of testRand are:

rowSize: number of rows per plate
colSize: number of columns per plate
studySize: size of each study arm
expNS: expected number of samples across all study arms
numCC: expected number of controls for each case
QCpct: percentage of QC samples relative to the total study size
child: number of child aliquots per mother aliquot

For this example, the call is:

testRand(rowSize=20, colSize=15, studySize=1000, expNS=7000, numCC=2, QCpct=0.07, child=4)

Rather than regenerate the data, we load a saved test dataset, testR:

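library(permRand)
data("testR", package = "permRand")

# testR is a list of four datasets (described below)
serumIDs <- testR[[1]]  # serum IDs with participant IDs
serumLoc <- testR[[2]]  # serum packing locations
emptyQC  <- testR[[3]]  # empty QC vials
motherQC <- testR[[4]]  # QC "mother" samples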

The test dataset was created with the following settings:

20 rows per plate (rowSize)
15 columns per plate (colSize)
1000 individuals per arm (studySize)
7000 expected samples (expNS)
2 controls per case (numCC)
7% QC samples (QCpct)
4 child aliquots per mother (child)
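As a quick check on these settings, the base-R arithmetic below computes plate capacity, an approximate QC count, and the number of plates the expected samples would fill. It is a sketch, not a permRand function; it assumes one sample per plate position and, as an assumption, applies QCpct to expNS.

```r
# Hypothetical arithmetic check of the testRand settings (not a permRand function).
# Assumes one sample per plate position and that QCpct applies to expNS.
rowSize <- 20
colSize <- 15
expNS   <- 7000
QCpct   <- 0.07

platePositions <- rowSize * colSize               # positions on one plate
nQC            <- round(QCpct * expNS)            # approximate number of QC samples
platesNeeded   <- ceiling(expNS / platePositions) # plates for the study samples alone

cat("Positions per plate:", platePositions, "\n")
cat("Approximate QC samples:", nQC, "\n")
cat("Plates needed for", expNS, "samples:", platesNeeded, "\n")
```

Under these assumptions a plate holds 300 samples, and the 7000 expected samples need 24 plates, the last one only partly filled.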

The output of testRand is a list of four datasets, which we extract above:

serumIDs: serum IDs with participant IDs
serumLoc: serum packing locations
emptyQC: empty QC vials
motherQC: QC “mother” samples

Next, we check each of the four datasets to ensure that every serum sample has a unique ID. The function uniqueID lists any duplicated IDs, so the IDs are unique when the output dataset has no rows.

tests1 <- uniqueID(serumIDs,"serumID")
tests2 <- uniqueID(serumLoc,"serumID")
tests3 <- uniqueID(emptyQC,"serumID")
tests4 <- uniqueID(motherQC,"serumID")
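The logic behind this uniqueness check can be sketched in base R. The helper find_dup_ids is hypothetical and is not how uniqueID is implemented; like the description above, it returns every row whose ID occurs more than once, so zero rows means the IDs are unique.

```r
# Hypothetical uniqueness check (not permRand's uniqueID): return all rows
# whose value in idCol appears more than once.
find_dup_ids <- function(data, idCol) {
  ids <- data[[idCol]]
  # Flag both the first and the later occurrences of each repeated ID
  dupFlag <- duplicated(ids) | duplicated(ids, fromLast = TRUE)
  data[dupFlag, , drop = FALSE]
}

# Toy data: serum ID "S2" appears twice
toy <- data.frame(serumID = c("S1", "S2", "S3", "S2"),
                  studyID = c(101, 102, 103, 104))
find_dup_ids(toy, "serumID")              # rows 2 and 4
nrow(find_dup_ids(toy[1:3, ], "serumID")) # 0: IDs unique
```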

Step 2: Format Datasets

In this step we format the datasets for input to the randomization function.

We first use the function randTest to assign IDs that link mother and child aliquots and events. Its output is a dataset with these ID links. The function requires:

dataMom: the mother dataset
dataChild: the child dataset
maxAliq: number of child aliquots per mother aliquot
nEvent: number of aliquots for each event or lab

The function formatRand then formats the data for the randomization function. It takes serum data for both the study subjects and the QC samples:

QCdata: QC data
serumIDR: serum data with serum IDs
serumPack: serum data with packing lists

QCMaster <- randTest(dataMom=motherQC,dataChild=emptyQC,maxAliq=4, nEvent=c(28,27,28,30))

serumMaster <- formatRand(QCdata=QCMaster,serumIDR=serumIDs,serumPack=serumLoc)

The resulting dataset, serumMaster, is formatted and ready for the randomization function.
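To make the mother/child linking concrete, the sketch below splits each QC mother sample into maxAliq child aliquots and assigns aliquots to events in the counts given by nEvent. The function link_qc and its column names (motherID, childNo, childID, event) are hypothetical, and dealing aliquots out in row order is an assumption; randTest may link them differently.

```r
# Hypothetical sketch of mother/child/event linking (not randTest itself).
link_qc <- function(motherIDs, maxAliq, nEvent) {
  # One row per child aliquot, linked to its mother
  kids <- data.frame(
    motherID = rep(motherIDs, each = maxAliq),
    childNo  = rep(seq_len(maxAliq), times = length(motherIDs))
  )
  kids$childID <- paste(kids$motherID, kids$childNo, sep = "-")
  if (sum(nEvent) > nrow(kids)) stop("nEvent asks for more aliquots than exist")
  # Assumption: aliquots go to events 1, 2, ... in row order,
  # nEvent[i] aliquots to event i; leftovers stay unassigned (NA)
  kids$event <- NA_integer_
  kids$event[seq_len(sum(nEvent))] <- rep(seq_along(nEvent), times = nEvent)
  kids
}

qc <- link_qc(paste0("M", 1:3), maxAliq = 4, nEvent = c(3, 3, 4, 2))
head(qc)
table(qc$event)
```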

Step 3: Randomize Dataset

Although the practice dataset includes multiple events or labs that can process the samples, we randomize only one event at a time. We therefore subset serumMaster to a single event, event 3 in this example. The function allRand then randomizes aliquots to batches. Its arguments are:

dataR: data for randomization
batchTot: plate sizes, c(batchTot1, batchTot2), with one plate per batch; batch sizes include the QC samples
numQC: number of QC samples per batch
withinN: number of positions that must separate QC samples from each other
numMatch: number of QC samples from a single mother within a batch
chkRep: check whether any group is repeated within the batches


library(dplyr)  # provides %>% and filter()

serumMaster3 <- serumMaster %>% filter(event == 3)  # keep event 3 only
serumRand <- allRand(dataR = serumMaster3, batchTot = c(40, 44), numQC = 2,
                     withinN = 2, numMatch = 2, chkRep = 1)

The output, serumRand, contains the randomized serum samples.
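One way to randomize matched samples without ever splitting a set is to permute whole case-control sets and then pack them into batches in the permuted order. The sketch below illustrates that idea; rand_sets and the setID/batch column names are hypothetical, this is not the allRand algorithm, and it ignores QC placement (numQC, withinN, numMatch) entirely.

```r
# Hypothetical set-level permutation randomization (not allRand):
# shuffle whole case-control sets, then fill batches in the shuffled order,
# starting a new batch whenever the next set would not fit.
rand_sets <- function(setID, batchSize, seed = 1) {
  set.seed(seed)
  tab   <- table(setID)
  sizes <- setNames(as.vector(tab), names(tab))  # samples per set
  perm  <- sample(unique(as.character(setID)))   # random permutation of sets
  batch <- integer(length(perm))
  current <- 1
  used <- 0
  for (i in seq_along(perm)) {
    n <- sizes[[perm[i]]]
    if (used > 0 && used + n > batchSize) {      # set would overflow: new batch
      current <- current + 1
      used <- 0
    }
    batch[i] <- current
    used <- used + n
  }
  names(batch) <- perm
  data.frame(setID = setID, batch = unname(batch[as.character(setID)]))
}

# 12 sets of 3 (one case and two controls each) into batches of 9
setID <- rep(sprintf("set%02d", 1:12), each = 3)
out <- rand_sets(setID, batchSize = 9)
table(out$batch)
```

Because whole sets move together, no set can be split across batches, which is the property Test 1 in Step 4 verifies.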

Step 4: Check the Randomization

We can verify the randomization with seven tests. Each test and the function that implements it are described below.

Test 1: checks that no case-control set is split across batches or plates. The function testCCAcross takes the randomized dataset as input.

test <- testCCAcross(dataS=serumRand)

The output dataset, test, lists any case-control sets that are split across batches. In this example it should have 0 rows.
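A minimal base-R version of this check, assuming hypothetical columns setID and batch, counts the distinct batches per set and returns the sets that span more than one. It is illustrative only and not the testCCAcross implementation.

```r
# Hypothetical split-set check (not testCCAcross): return the IDs of sets
# whose members were assigned to more than one batch.
split_sets <- function(data, setCol = "setID", batchCol = "batch") {
  nBatches <- tapply(data[[batchCol]], data[[setCol]],
                     function(b) length(unique(b)))
  names(nBatches)[nBatches > 1]
}

toy <- data.frame(setID = c("A", "A", "A", "B", "B", "B"),
                  batch = c(1, 1, 1, 1, 2, 2))
split_sets(toy)   # "B"
```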

Test 2: checks that every batch has at least the specified number of matching QC sample sets. The function testQCmatch takes:

dataS: randomized data
numQCs: number of QCs specified per dataset
numMatch: number of QC samples from a single mother within a batch

test <- testQCmatch(dataS=serumRand,numQCs=4,numMatch=2)

The output lists all batches that have too few QC sample sets or whose QC samples do not come from the same mother.
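The sketch below encodes one reading of this check under stated assumptions: hypothetical columns batch, isQC, and motherID, and a rule that a batch fails when it has fewer than numQCs QC samples or when no single mother supplies at least numMatch of them. testQCmatch may define a failure differently.

```r
# Hypothetical QC-matching check (not testQCmatch): return batches with too
# few QC samples, or with no mother contributing numMatch QC samples.
qc_match_fail <- function(data, numQCs, numMatch) {
  batches <- sort(unique(data$batch))
  bad <- vapply(batches, function(b) {
    m <- data$motherID[data$batch == b & data$isQC]  # mothers of this batch's QCs
    tooFew  <- length(m) < numQCs
    noMatch <- numMatch > 0 && (length(m) == 0 || max(table(m)) < numMatch)
    tooFew || noMatch
  }, logical(1))
  batches[bad]
}

toy <- data.frame(
  batch    = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  isQC     = c(FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE),
  motherID = c(NA, "M1", "M1", NA, NA, NA, "M2", "M3", NA, NA)
)
qc_match_fail(toy, numQCs = 2, numMatch = 2)  # 2: its QCs come from different mothers
```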

Test 3: checks that the serum samples in the randomized dataset are unique. The function uniqueID was introduced in Step 1.

test <- uniqueID(serumRand,"serumID")

The output lists any duplicates of IDs in the dataset.

Test 4: checks that the members of each set are next to each other; any set whose members are not adjacent is flagged. The function testPair takes serumRand as input.

test <- testPair(dataS=serumRand)

The output reports any sets whose members are separated in the location variable, loc.
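Adjacency can be checked by sorting each set's positions and looking for gaps. In this hypothetical sketch, loc is assumed to be an integer running position in plate order and setID a set identifier; testPair's actual rules may differ.

```r
# Hypothetical adjacency check (not testPair): a set passes when its
# members occupy consecutive positions, i.e. no gap in sorted loc values.
nonadjacent_sets <- function(data, setCol = "setID", locCol = "loc") {
  gapFlag <- tapply(data[[locCol]], data[[setCol]], function(p) {
    any(diff(sort(p)) != 1)
  })
  names(gapFlag)[gapFlag]
}

toy <- data.frame(setID = c("A", "A", "A", "B", "B", "B"),
                  loc   = c(1, 2, 3, 4, 5, 7))
nonadjacent_sets(toy)   # "B": position 6 separates its members
```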

Test 5: checks whether a large number of cases or of controls are next to each other. The function orderCases takes:

dataI: test dataset
betW: threshold for the number of cases or controls that may be next to each other

test <- orderCases(dataI=serumRand,betW=4)

The output flags any cases or controls grouped together beyond the specified value, 4 in this example, and lists the studyID of the affected groups.
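Runs of consecutive cases or controls are what base R's rle() (run-length encoding) measures directly. The hypothetical long_runs helper below assumes a status vector already in plate order and flags runs longer than betW; how orderCases treats QC samples within a run is not covered, so the sketch uses whatever vector it is given.

```r
# Hypothetical run-length check (not orderCases): report runs of the same
# status (case or control) longer than betW, with their start/end positions.
long_runs <- function(status, betW) {
  r <- rle(as.character(status))
  ends   <- cumsum(r$lengths)
  starts <- ends - r$lengths + 1
  keep   <- r$lengths > betW
  data.frame(status = r$values[keep], start = starts[keep],
             end = ends[keep], length = r$lengths[keep])
}

status <- c("case", "control", "control", "control", "control",
            "control", "case", "control")
res <- long_runs(status, betW = 4)
res   # one run: 5 controls at positions 2 to 6
```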

Test 6: counts the samples in each batch. The function batchCount takes:

dataS: test dataset
batchSizeT: batch size to test against

test <- batchCount(dataS=serumRand,batchSizeT=84)

The output should be empty: it lists the ID of any batch that does not contain the specified number of samples. The last batch is not reported if it has fewer than the specified number of samples.
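A hypothetical base-R version of this count, assuming a batch vector with one entry per sample, tabulates samples per batch and skips a short final batch as described above. It is a sketch, not the batchCount implementation.

```r
# Hypothetical batch-size check (not batchCount): return batches whose
# size differs from batchSizeT, ignoring a short final batch.
wrong_size_batches <- function(batch, batchSizeT) {
  tab <- table(batch)
  cnt <- setNames(as.vector(tab), names(tab))  # samples per batch
  bad <- cnt != batchSizeT
  last <- names(cnt)[length(cnt)]
  if (cnt[[last]] < batchSizeT) bad[last] <- FALSE  # short last batch is allowed
  names(cnt)[bad]
}

batch <- c(rep(1, 84), rep(2, 83), rep(3, 84), rep(4, 20))
wrong_size_batches(batch, 84)   # "2"
```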

Test 7: counts the QC samples in each batch and flags batches where the count does not match the number specified. The function countQC takes:

dataS: test dataset
QCN: number of QC samples per batch

test <- countQC(dataS=serumRand,QCN=4)

The output lists any batches that do not contain the specified number of QC samples, 4 in this example.
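The QC count can be sketched the same way, assuming a hypothetical logical isQC vector aligned with the batch labels: sum it within each batch and return the batches whose total differs from QCN. This is illustrative, not the countQC implementation.

```r
# Hypothetical QC-count check (not countQC): TRUE values in isQC are summed
# per batch; batches whose total differs from QCN are returned.
qc_count_mismatch <- function(batch, isQC, QCN) {
  counts <- tapply(isQC, batch, sum)
  names(counts)[counts != QCN]
}

batch <- rep(1:3, each = 6)
isQC  <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE,   # batch 1: 2 QC
           TRUE, FALSE, FALSE, FALSE, FALSE, FALSE,  # batch 2: 1 QC
           FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)   # batch 3: 2 QC
qc_count_mismatch(batch, isQC, QCN = 2)   # "2"
```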

Now that our randomized example dataset, serumRand, has passed all of the verification checks, we can create our output packing lists.

Step 5: Packing Lists From the Randomization Results

In our example, we output two packing lists, one unblinded and one blinded. The first, unBlind, is unblinded and includes the original packing locations.

unBlind <- outputLab(dataOut=serumRand,blind=0,origP=1,maxRows=9,maxCols=9,newPack=1)
blind <- outputLab(dataOut=serumRand,blind=1,origP=0,maxRows=9,maxCols=9,newPack=1)

The function outputLab has the following options:

dataOut = dataset to be formatted for packing list
blind = 0/1 indicator to select if a blinded (1) or unblinded (0) packing list is to be generated
origP = 0/1 indicator for inclusion of the original packing location (1) or deletion of the packing location (0)
maxRows = maximum row for the output dataset
maxCols = maximum column for the output dataset
newPack = 0/1 indicator to generate new packing locations

The second packing list, blind, is blinded and does not display the original packing location. Both packing lists can be printed for use in the lab.
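Two ideas behind packing lists can be sketched in base R. pack_position maps a running position to box, row, and column in a maxRows by maxCols box filled row by row, and blind_list drops columns that would reveal case/control status or the original location. Both functions, the row-by-row fill order, and the column names caseStatus and origLoc are assumptions for illustration, not outputLab's behavior.

```r
# Hypothetical packing helpers (not outputLab).
# Map running positions 1, 2, ... to box/row/col, filling each box row by row.
pack_position <- function(pos, maxRows = 9, maxCols = 9) {
  perBox <- maxRows * maxCols
  data.frame(pos = pos,
             box = (pos - 1) %/% perBox + 1,
             row = ((pos - 1) %% perBox) %/% maxCols + 1,
             col = (pos - 1) %% maxCols + 1)
}

# Blind a list by dropping columns that identify status or original location
blind_list <- function(data, hideCols = c("caseStatus", "origLoc")) {
  data[, setdiff(names(data), hideCols), drop = FALSE]
}

pack_position(c(1, 9, 10, 81, 82))  # position 82 starts box 2 at row 1, col 1
```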

Step 6: Switching Without Re-Randomization

We do not want to re-randomize completely, but we may need to change parameters such as the plate size or the number of QC samples per plate. Here the analyst wants a new location list for the serum samples because the test bench uses batches of a different size. The function switchR takes:

dataIn: randomized dataset
numqc: number of QC samples per set
numqcM: number of matching QC samples
batchS: new batch size

serumSwitch <- switchR(dataIn=serumRand,numqc=2,numqcM=2,batchS=44)

In this example the new batch size is 44 samples, with 2 QC samples and 2 matching QC samples per batch.
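One way to change batch size without reshuffling is to keep the randomized order already on the bench and re-cut it, starting a new batch whenever the next set would not fit. The sketch below (recut_batches, hypothetical; not the switchR algorithm) assumes each set's samples are contiguous, which is what Test 4 checks, so samples keep their order and only their batch labels change.

```r
# Hypothetical re-batching without re-randomization (not switchR).
recut_batches <- function(setID, newSize) {
  runs  <- rle(setID)                     # contiguous sets become runs
  batch <- integer(length(runs$lengths))
  current <- 1
  used <- 0
  for (i in seq_along(runs$lengths)) {
    n <- runs$lengths[i]
    if (used > 0 && used + n > newSize) { # next set would not fit: new batch
      current <- current + 1
      used <- 0
    }
    batch[i] <- current
    used <- used + n
  }
  rep(batch, times = runs$lengths)        # one batch label per sample
}

setID <- rep(sprintf("set%02d", 1:28), each = 3)  # 84 samples in randomized order
newBatch <- recut_batches(setID, newSize = 44)
table(newBatch)
```

Here 28 sets of 3 split into two batches of 42, because a fifteenth set would push the first batch to 45.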

Step 7: Checking the Results

As in Step 4, we check the switched dataset, serumSwitch, with the seven tests.

test1 <- testCCAcross(serumSwitch)
test2 <- testQCmatch(serumSwitch,2,2)
test3 <- uniqueID(serumSwitch,"serumID")
test4 <- testPair(serumSwitch)
test5 <- orderCases(serumSwitch,4)
test6 <- batchCount(serumSwitch,44)
test7 <- countQC(serumSwitch,2)

As a reminder, the functions of the tests are repeated below.

Test 1: The output contains any case-control sets that are split across batches or plates.
Test 2: The output shows the batches with QC errors. In this example the results of test 2 are acceptable; the output only shows that certain batches contain extra QC samples.
Test 3: The output shows samples whose serum ID is repeated.
Test 4: The “test” dataset contains any case-control sets that are not together.
Test 5: The output shows cases or controls that are grouped together beyond the specified value.
Test 6: The output contains batches that do not contain the specified count (44), apart from a short final batch.
Test 7: The output contains batches that do not contain 2 QC samples.

Once these tests are complete, we can output the packing lists for the serum samples. The unblinded and blinded packing lists are stored in unBlindSw and blindSw, respectively.

Step 8: Output the Packing Lists for the `switchR` Function

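We output blinded and unblinded packing lists for the switchR results, again using the outputLab function. Because the samples keep the packing positions (rack, row, col) assigned in unBlind, we merge those positions onto serumSwitch and set the newPack indicator to 0 so that no new packing locations are generated.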

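# Carry the packing positions (rack, row, col) from unBlind over to serumSwitch
serumSwitchP <- unBlind %>% 
                select(serumID,rack,row,col) %>%
                merge(.,serumSwitch,by='serumID')
# newPack = 0 reuses these positions instead of generating new ones
unBlindSw <- outputLab(serumSwitchP,blind=0,origP=.,maxRows=.,maxCols=.,newPack=0)
blindSw <- outputLab(serumSwitchP,blind=1,origP=.,maxRows=.,maxCols=.,newPack=0)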

This completes the demonstration of the package functions and of the overall pipeline. The input datasets are formatted, then randomized using two different methods (allRand for the initial randomization and switchR for re-batching). The output datasets are checked, and packing lists are generated.

Michelle Mellers, PhD and Thaddeus Haight, PhD

Vignette for the Package: `permRand`
