--- title: "Automated 2D Peak Fitting Scripts" output: rmarkdown::html_vignette resource_files: - fit/noise_histograms.png - fit/fit_iterations_01.png - fit/fit_iterations_02.png - fit/fit_iterations_03.png - fit/fit_iterations_04.png - fit/fit_spectra.png - refine/1_fit.png - extend/2_fit.png - extend/assign_stages.png - extend/assign_omegas.png vignette: > %\VignetteIndexEntry{Automated 2D Peak Fitting Scripts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_knit$set( global.par = TRUE ) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", cache = FALSE, fig.retina = 2 ) #knitr::opts_chunk$set(optipng = "-o7 -strip all") #knitr::knit_hooks$set(optipng = knitr::hook_optipng) old_options_peak2d_scripts <- options(digits = 5, width = 90) old_options_mc_cores <- NULL check_limit_cores <- tolower(Sys.getenv("_R_CHECK_LIMIT_CORES_", "")) if (nzchar(check_limit_cores) && (check_limit_cores != "false")) { old_options_mc_cores <- options(mc.cores = 2) } # Set to FALSE to run the vignette in the current working directory. run_in_temp <- TRUE if (Sys.getenv("IN_PKGDOWN") == "" && ! interactive()) { knitr::opts_chunk$set(fig.retina = 1) } else { run_in_temp <- FALSE } original_wd <- getwd() if (run_in_temp) { vignette_wd <- file.path(tempdir(), "fitnmr_peak2d_scripts") dir.create(vignette_wd, showWarnings=FALSE, recursive=TRUE) knitr::opts_knit$set(root.dir = vignette_wd) } else { vignette_wd <- original_wd # Clean up old results unlink(c("fit", "refine", "extend"), recursive = TRUE, force = TRUE) } ``` ```{r results = if (Sys.getenv("IN_PKGDOWN")=="" && !interactive()) "asis" else "hide", echo = FALSE} cat("**Note:** A high-resolution version of this document is [available online](https://smith-group.github.io/fitnmr/articles/peak2d_scripts.html).") ``` FitNMR enables fully automated peak fitting of 2D NMR spectra. This document shows how the 2D fitting can be done with very little manual code entry using three convenience scripts, `fit_peaks_2d.R`, `refit_peaks_2d.R`, and `assign_peaks_2d.R`, that are included with FitNMR. For a description of how a similar workflow can be accomplished using `R` code and more details about algorithms, see the "Automated 2D Peak Fitting Code" document. The directory containing the three demo scripts that will be used here can be found by running the following from within R: ```{r, results = "hide"} system.file("demo", package="fitnmr") ``` For each spectrum, first use NMRPipe to produce an `*.ft2` file. The data must only be apodized with the `SP` function using an exponent of 1 or 2. In addition, use the `EXT` function to extract the smallest region that contains all the relevant peaks to be fit. Eliminating noise and other extraneous peaks speeds up the processing and simplifies the output. ```{r, include = FALSE} pdf <- function(file, width=7, height=7, pointsize=12) { if (file == "fit_iterations.pdf") { file <- "fit_iterations_%02i.pdf" width <- 3.4 height <- 3.4 pointsize <- 10 } file <- sub(".pdf$", ".png", file) width <- min(width, 7) height <- min(height, 7) if (Sys.getenv("IN_PKGDOWN") == "" && ! interactive()) { grDevices::png(file, width, height, "in", pointsize, res=96) } else { grDevices::png(file, width, height, "in", pointsize, res=192) } } update_cex <- function(file, cex) { script_lines <- readLines(file) script_lines <- sub("^cex <- 0.2", paste("cex <-", cex), script_lines) writeLines(script_lines, file) } ``` ## Fitting Initial Set of Peaks The `fit_peaks_2d.R` script in the `fitnmr` `demo` directory handles the initial fitting of peaks. You would typically use this script to find an initial set of peaks on the highest signal-to-noise spectrum in your dataset. The script can be customized by creating a copy then editing the parameters listed at the top prior to running it in R. In a typical workflow, you would create a directory to perform the initial fitting in. Here we will create that directory with R, but usually you would do so using your file manager, the `mkdir` function, etc. ```{r, results = "hide"} dir.create("fit") ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = file.path(vignette_wd, "fit")) ``` Next you need to switch to that directory. If that was successful the following command should output `fit`: ```{r} basename(getwd()) ``` Next copy the reference spectrum into that directory. That spectrum must have `.ft2` as the extension, otherwise the script won't be able to find it. Ordinarily you would do this manually, but here we will do so in code. The following copies the first T~1~ spectrum into it. ```{r, echo = FALSE} t1_dir <- system.file("extdata", "t1", package="fitnmr") t1_ft2_filenames <- list.files(t1_dir, pattern=".ft2") ``` ```{r, results = "hide"} file.copy(file.path(t1_dir, t1_ft2_filenames[1]), ".") ``` Next we need to make a copy of the `fit_peaks_2d.R` script in the current working directory, which may be easiest to do within R. ```{r, results = "hide"} file.copy(system.file("demo", "fit_peaks_2d.R", package="fitnmr"), ".") ``` ```{r, include = FALSE} update_cex("fit_peaks_2d.R", 0.6) ``` You can then customize any of the options at the top of `fit_peaks_2d.R`: ```{r, echo = FALSE, comment = NA} fit_peaks_2d_lines <- readLines("fit_peaks_2d.R") library_idx <- grep("^library", fit_peaks_2d_lines) cat(fit_peaks_2d_lines[seq(1, library_idx-3)], sep="\n") ``` In this case, we will use the script unmodified. The `fit_peaks_2d.R` script must be run from within the `fit` directory. As the script runs, a summary of the fitting progress will be automatically printed to the screen. ```{r} source("fit_peaks_2d.R") ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = vignette_wd) ``` ### Output Files Four output files are created by `fit_peaks_2d.R`: - `noise_histograms.pdf` - `fit_iterations.pdf` - `fit_spectra.pdf` - `fit_volume.csv` ```{r, include = FALSE} list.files("fit") ``` `noise_histograms.pdf` gives a histogram of the intensity values for each of the input spectra. The standard deviation of a Gaussian fit to that histogram is used to determine the noise level. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("fit", "noise_histograms.png")) ``` `fit_iterations.pdf` shows each iteration of the fitting, with the data shown in black and the fit contours shown in red. Negative contours are shown with lighter colors. The starting peak position is indicated with a green circle at the highest intensity in the spectrum after previous fits have been subtracted. The blue points show the positions of the peaks in the modeled doublet. The title text in parentheses gives the reason the next peak was rejected: either the p-value for the peak was greater than `f_alpha` or the peak had 0 volume. The first line of text under each peak gives the peak number and fraction of the total volume in that iteration. The second line gives the F-test p-value for that peak. All peaks in a given iteration are fit with the same peak shape parameters, including the peak position (`omega0`) and doublet scalar coupling (`sc`). Those peaks are all assigned to the same fitting group (`fit`) and continue to share the same peak shape parameters in subsequent fit optimizations. This parameter sharing can be important for both accuracy and stability when peaks are highly overlapped. If desired, you can manually modify the peak grouping in the peak list comma-separated value (CSV) file to change which peaks share parameters. The first four pages of `fit_iterations.pdf` are shown here: ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("fit", sprintf("fit_iterations_%02i.png", 1:4))) ``` `fit_spectra.pdf` gives the fit spectra. The doublet peak positions are shown with semi-transparent blue dots that are scaled in size such that the area is proportional to the peak volume. The first line of text under each peak gives the peak number and the fit group number separated by a colon. The second line gives the F-test p-value for that peak. By default, the lowest contour of this spectrum is set to four times the noise level, but that can be changed using the `plot_noise_cutoff` script parameter in 'fit_peaks_2d.R'. Furthermore, the size of the labels can be controlled using the `cex` (i.e. *c*haracter *ex*pansion) script parameter. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("fit", "fit_spectra.png")) ``` The `fit_volume.csv` file contains all the identified peaks, the fitting group for each peak (which in this case is the iteration it was found in), the peak position/shape parameters, and a column for the volume found in each spectrum. Only one spectrum has been fit here so only one column of volumes, '1.ft2', is shown. ```{r, echo = FALSE, results = "asis"} csv_table <- read.csv(file.path("fit", "fit_volume.csv"), check.names = FALSE) csv_table[,"f_pvalue"] <- sprintf("%0.2e", csv_table[,"f_pvalue"]) knitr::kable(csv_table) ``` ## Refining the Initial Fit The initial fit is done iteratively, so it is usually a good idea to refine this by editing the fitting groups, deleting extraneous peaks, and then doing a simultaneous refit of all peaks together with the `refit_peaks_2d.R` script. We will do this in another directory called `refine`, which you should create. Here the directory is created using R code: ```{r, results = "hide"} dir.create("refine") ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = file.path(vignette_wd, "refine")) ``` Next you need to switch to that directory. If that was successful the following command should output `refine`: ```{r} basename(getwd()) ``` Then the first spectrum and the `refit_peaks_2d.R` script should be copied into the `refine` directory: ```{r, results = "hide"} file.copy(file.path(t1_dir, t1_ft2_filenames[1]), ".") file.copy(system.file("demo", "refit_peaks_2d.R", package="fitnmr"), ".") ``` ```{r, include = FALSE} update_cex("refit_peaks_2d.R", 0.6) ``` Any of the options at the top of `refit_peaks_2d.R` can then be customized: ```{r, echo = FALSE, comment = NA} refit_peaks_2d_lines <- readLines("refit_peaks_2d.R") library_idx <- grep("^library", refit_peaks_2d_lines) cat(refit_peaks_2d_lines[seq(1, library_idx-3)], sep="\n") ``` However, in this case the script will be used unmodified. The next step is to copy the peak list from the `fit` directory to a file called `start_volume.csv` in the new `refine` directory: ```{r, results = "hide"} file.copy(file.path("..", "fit", "fit_volume.csv"), "start_volume.csv") ``` Once that file has been copied, you can then manually change the peak groups and delete rows for any peaks you want to discard. In this case, that will be done with the following `R` code, which puts peaks 5 and 7 into a new fitting group, and removes peaks 4, 14, and 16: ```{r, results = "hide"} input_table <- read.csv("start_volume.csv", check.names=FALSE) input_table[c(5,7),"fit"] <- max(input_table[,"fit"])+1 input_table <- input_table[-c(4,14,16),] write.csv(input_table, "start_volume.csv", row.names=FALSE) ``` The final step is to call the `refit_peaks_2d.R` script from within the `refine` directory: ```{r} source("refit_peaks_2d.R") ``` The `refit_peaks_2d.R` script first optimizes the volumes, keeping all other parameters fixed, then optimizes the volumes along with whatever other variables the user specifies in the script. Also, unlike the refinement done in "Automated 2D Peak Fitting Code", `refit_peaks_2d.R` only fits a single spectrum at a time. This avoids two problems: 1. For many experiments in which a series of spectra are acquired, different amounts of applied power can result in subtle shifts of the temperature/peak positions. Especially for overlapping peaks, this can cause systematic variations in the volumes. Allowing each spectrum to have slightly different peak positions within a small region around the starting values helps avoid this problem. 2. When many spectra are fit simultaneously, the memory usage for FitNMR can grow very large. This will hopefully be fixed in a future update. ```{r, include = FALSE} knitr::opts_knit$set(root.dir = vignette_wd) ``` ### Output Files The `refit_peaks_2d.R` script creates two files for each spectrum: - `*_fit.pdf` - `*_volume.csv` ```{r, include = FALSE} list.files("refine") ``` In practice, each '*' will be replaced by the spectrum filename. `*_fit.pdf` shows the overall spectrum as described above. In addition, it also shows the constraints on the `omega0` parameters using gray rectangles, the size of which is determined by the `omega0_r2_factor` parameter. The central peak position, shown as a small blue dot, is constrained to be within that gray rectangle. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("refine", "1_fit.png")) ``` `*_volume.csv` contains all the identified peaks, along with a column for the volume found in that spectrum. ```{r, echo = FALSE, results = "asis"} csv_table <- read.csv(file.path("refine", "1_volume.csv"), check.names = FALSE) knitr::kable(csv_table) ``` ## Extending the Fit to Other Spectra To extend the refined fit to a series of other spectra, we will create another directory called `extend`. ```{r, results = "hide"} dir.create("extend", FALSE) ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = file.path(vignette_wd, "extend")) ``` Next you need to switch to that directory. If that was successful the following command should output `extend`: ```{r} basename(getwd()) ``` We will then copy both T~1~ spectra into it. In addition, the fit parameter CSV file output from the refinement step will be copied to the file `start_volume.csv` for use as input. ```{r, results = "hide"} file.copy(file.path(t1_dir, t1_ft2_filenames), ".") file.copy(file.path("..", "refine", "1_volume.csv"), "start_volume.csv") ``` We will also need to copy the `refit_peaks_2d.R` script into the `extend` directory: ```{r, results = "hide"} file.copy(system.file("demo", "refit_peaks_2d.R", package="fitnmr"), ".") ``` ```{r, include = FALSE} update_cex("refit_peaks_2d.R", 0.6) script_lines <- readLines("refit_peaks_2d.R") script_lines <- sub("mc_cores <- parallel::detectCores\\(\\)", "mc_cores <- 1", script_lines) writeLines(script_lines, "refit_peaks_2d.R") ``` However, when extending the fit to other spectra, the parameters determining the peak shape (`sc` and `r2`) will be fixed, allowing only the volume and peak position to vary. This is done by changing `fit_sc` and `fit_r2` to `FALSE` at the top of the `refit_peaks_2d.R` script, which can be done using a text editor. For the purposes of this demonstration, the two parameters are changed with the following code: ```{r} script_lines <- readLines("refit_peaks_2d.R") script_lines <- sub("fit_sc <- TRUE", "fit_sc <- FALSE", script_lines) script_lines <- sub("fit_r2 <- TRUE", "fit_r2 <- FALSE", script_lines) writeLines(script_lines, "refit_peaks_2d.R") ``` The part of the script controlling the parameters to be fit should now read: ```{r, echo = FALSE, comment = NA} extend_peaks_2d_lines <- readLines("refit_peaks_2d.R") fit_omega0_idx <- grep("^fit_omega0", extend_peaks_2d_lines) cat(extend_peaks_2d_lines[seq(fit_omega0_idx-1, fit_omega0_idx+7)], sep="\n") ``` Next, run `refit_peaks_2d.R` from within the `extend` directory. It will do the refitting for each spectrum on a different CPU core, reducing the total time taken to fit a large number of spectra. ```{r} source("refit_peaks_2d.R") ``` ### Output Files As in the refinement step, the script creates two files for each spectrum, whose names are derived from the spectrum filenames. Here the output associated with `2.ft2` is shown, as the output for `1.ft2` is very similar to the previous section. ```{r, include = FALSE} list.files() ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = vignette_wd) ``` `*_fit.pdf` shows the overall spectrum. Pay particular attention to where the small blue circle is located in the gray rectangle. You may want to adjust `omega0_r2_factor` to increase or decrease the stringency of the `omega0` constraint, or possibly choose a different spectrum to use for `start_volume.csv`. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("extend", "2_fit.png")) ``` `*_volume.csv` contains all the identified peaks, along with a column for the volume found in that spectrum. ```{r, echo = FALSE, results = "asis"} csv_table <- read.csv(file.path("extend", "2_volume.csv"), check.names = FALSE) knitr::kable(csv_table) ``` ## Transferring Assignments ```{r, include = FALSE} knitr::opts_knit$set(root.dir = file.path(vignette_wd, "extend")) ``` FitNMR can also transfer previously determined peak assignments onto the peak lists. This will be done in the `extend` directory used above. For this tutorial, assignments for the subregion of the spectrum are also provided, which should be copied into that directory and given the name `assignments.csv` using: ```{r, results = "hide"} file.copy(system.file("extdata", "t1", "assignments.csv", package="fitnmr"), ".") ``` The `assignments.csv` file should contain a header and three columns. The text in the header is ignored, but the order of the columns must be: 1. Peak identifier 2. PPM in dimension 1 3. PPM in dimension 2 The `assignments.csv` file used here contains eight peaks: ```{r, echo = FALSE, results = "asis"} csv_table <- read.csv("assignments.csv", check.names = FALSE) knitr::kable(csv_table) ``` The script that transfers the assignments is called `assign_peaks_2d.R` and should also be copied into the `extend` directory: ```{r, results = "hide"} file.copy(system.file("demo", "assign_peaks_2d.R", package="fitnmr"), ".") ``` There are a number of parameters at the top of that script that can be customized: ```{r, echo = FALSE, comment = NA} assign_peaks_2d_lines <- readLines("assign_peaks_2d.R") library_idx <- grep("^library", assign_peaks_2d_lines) cat(assign_peaks_2d_lines[seq(1, library_idx-3)], sep="\n") ``` ```{r, include = FALSE} update_cex("assign_peaks_2d.R", 0.6) ``` The assignment transfer uses a FitNMR function called `height_assign()` that goes through each peak in the spectrum in order of decreasing height/volume, and assigns it to the closest position in the assignment list that is within some threshold distance. Internally, the chemical shifts in each dimension are normalized by the range of values in that dimension. The threshold is given in those normalized units, such that a value of 0.1 would allow a difference of no more than 10% of the spectral width in each dimension, or less for diagonally adjacent peaks. The algorithm does not assign multiple peaks to the same identifier. To help automatically handle differences in referencing between the spectra and input assignments, the algorithm uses `height_assign()` in two different stages. The first stage uses a larger threshold to handle cases where the referencing is quite different. After that initial assignment, the median offsets between the spectra and input assignments (in each dimension) are used to shift the input assignments. As long as there are enough isolated peaks that can be correctly assigned in the first stage, this is often sufficient to automatically correct the referencing for a more stringent second stage. The thresholds for both stages (`thresh_1` and `thresh_2`) can be tailored individually. In addition, the assigned chemical shifts can be manually adjusted for each stage (using `cs_adj_1` and `cs_adj_2`) to help in cases where the referencing cannot be automatically corrected. Because this is a subspectrum, the default values for the thresholds do not work as well. Furthermore, while the whole spectrum has enough peaks to automatically determine the correct offset after stage 1, the subspectrum does not, so the proton assignments are manually shifted 0.02 ppm to the left to compensate. For this tutorial, that is done with the following code: ```{r} script_lines <- readLines("assign_peaks_2d.R") script_lines <- sub("thresh_1 <- 0.1", "thresh_1 <- 0.2", script_lines) script_lines <- sub("cs_adj_2 <- c\\(0, 0\\)", "cs_adj_2 <- c(-0.02, 0)", script_lines) script_lines <- sub("thresh_2 <- 0.025", "thresh_2 <- 0.075", script_lines) writeLines(script_lines, "assign_peaks_2d.R") ``` The modified part of the script should now read: ```{r, echo = FALSE, comment = NA} extend_peaks_2d_lines <- readLines("refit_peaks_2d.R") thresh_1_idx <- grep("^thresh_1", script_lines) cat(script_lines[seq(thresh_1_idx-1, thresh_1_idx+6)], sep="\n") ``` The `assign_peaks_2d.R` script should then be run from within the `extend` directory: ```{r} source("assign_peaks_2d.R") ``` ### Output Files Three output files are generated by the `assign_peaks_2d.R` script: - `assign_stages.pdf` - `assign_omegas.pdf` - `assign_volume.csv` ```{r, include = FALSE} list.files() ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = vignette_wd) ``` `assign_stages.pdf` shows the two stages used for the assignment transfer. The gray points show the chemical shifts of the input assignments after `cs_adj_1` has been applied. The peaks matched to those assignments in stage 1 are connected by gray lines. A larger gray oval around each input assignment shows the size of `thresh_1`. The adjusted input assignment chemical shifts used for stage 2 are shown in green, with smaller ovals indicating a more stringent `thresh_2`. Any input assignment not matched to a peak has its label drawn in purple. Likewise, any peak not matched to an input assignment is labeled in purple. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("extend", "assign_stages.png")) ``` `assign_omegas.pdf` shows the final results of the assignment transfer, with the identifier of an assigned peak shown above the peak, and unassigned peaks labeled in purple. In addition, the peak positions from all of the fit spectra are shown as very small points, which are difficult to see here and are best viewed by zooming into the PDF generated by this step. An attempt is made to color each peak in a given group differently. ```{r, echo = FALSE, hold = TRUE} knitr::include_graphics(file.path("extend", "assign_omegas.png")) ``` `assign_volume.csv` contains all the peaks, with the assigned identifier in the first column. The peak positions, scalar couplings, and relaxation rates in this table give the median values from all the individual fits. There is also a column for the volume found in each spectrum. ```{r, echo = FALSE, results = "asis"} csv_table <- read.csv(file.path("extend", "assign_volume.csv"), check.names = FALSE) csv_table[,"assignment"] <- as.character(csv_table[,"assignment"]) csv_table[is.na(csv_table[,"assignment"]),"assignment"] <- "" knitr::kable(csv_table) ``` ```{r, include = FALSE} knitr::opts_knit$set(root.dir = original_wd) if (!is.null(old_options_mc_cores)) { options(old_options_mc_cores) } options(old_options_peak2d_scripts) ```