--- title: "eye" author: "Tjebo Heeren" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{eye} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ## See more with *eye* ```{r setup} library(eye) library(eyedata) ``` ## Introduction *eye* is dedicated to facilitate ophthalmic research, providing convenient application programming interfaces (API) for common tasks: - [Handling of visual acuity notations](#visual-acuity) - [Easy count of patients and eyes](#count-patients-and-eyes) - [Easy recoding of your eye variable](#recoding-the-eye-variable) - Reshape your eye data - [long](#myop) or [wide](#hyperop) - [Quick summary of your eye data](#blink) - [Get common summary statistics](#reveal) - [Calculate age](#getage) - [Clean NA equivalent entries](#clean-na-entries) *eye* includes a [visual acuity conversion chart](#va-conversion-chart). ## Visual acuity Pesky visual acuity notations are now a matter of the past. Convert between any of Snellen (meter/ feet/ decimal!), logMAR and ETDRS. The notation will be detected automatically and converted to the desired notation. For some more details see [VA conversion](#va-conversion). For entries with mixed notation, use `va_mixed` instead. You can also decide to simply "clean" your VA vector with `cleanVA(x)`. This will remove all entries that are certainly no VA. `va()` (and of course, its wrappers) cleans and converts visual acuity notations (classes) between Snellen (decimal, meter and feet), ETDRS, and logMAR. Each class can be converted from one to another. `va()` will detect the class automatically based on specific rules detailed below. Calling va() without specifying the "to" argument will simply clean the visual acuity entries - any notations will be accepted, no plausibility checks yet performed. This is then bascially a wrapper around `cleanVA`. It takes an (atomic) vector with visual acuity entries as the only required argument. The user can specify the original VA notation, but va will check that and ignore the argument if implausible. ### Conversion steps `va()` basically runs three main steps: 1. [Entry cleaning](#cleaning) with `clean_va()` 1. [Notation detection](#detection) with `which_va()` 1. [Plausibility checks](#plausibility) with `checkVA()` 1. [Conversion](#conversion) with the S3 generic `convertVA()` ### Cleaning 1. `NA` are assigned to missing entries or strings representing such entries (".", "", "{any number of spaces}", "N/A", "NA", "NULL", "-") 1. Notation for qualitative entries is simplified (NPL becomes NLP, PL becomes LP). 1. "plus" and "minus" from Snellen entries are converted: - if entry -2 to +2 : take same Snellen value - if < -2 : take Snellen value one line below - if >+2 : Snellen value one line above Snellen are unfortunately often entered with "+/-", which is a violation of psychophysical methods designed to assign one (!) unambiguous value to visual acuity, with non-arbitrary thresholds based on psychometric functions. Therefore, transforming "+/-" notation to actual results is in itself problematic and the below suggestion to convert it will remain an approximation to the most likely "true" result. Even more so, as the given conditions should work for charts with 4 or 5 optotypes in a line, and visual acuity is not always tested on such charts. 
### Detection

Detection is done internally with `which_va()`, based on the following rules:

- if x is integer and 3 < x <= 100: `etdrs`
- if x is integer and 0 <= x <= 3: `logmar`, `snellendec` or `etdrs`
- if x is numeric and -0.3 <= x <= 3: `logmar` or `snellendec`
- if x is a character in the format x/y: `snellen` (fraction)
- if x is one of "CF", "HM", "LP", "PL", "NLP", or "NPL": `quali`

#### Accepted VA formats / Plausibility checks {#plausibility}

- Snellen fractions (meter/feet) need to be entered as fractions with "/". **Any fraction is allowed**, e.g. 3/60 and 2/200 will also be recognized.
- ETDRS must be integer-equivalent between 0 and 100 (integer-equivalent means it can also be a character vector)
- logMAR must be between -0.3 and 3.0
- snellendec must be greater than 0 and smaller than or equal to 2
- Qualitative entries must be one of PL, LP, NLP, NPL, HM, CF (any case allowed)
- Any element that is implausible or not recognized will be converted to NA

### Conversion

- **logMAR to ETDRS**: logMAR is rounded to the first digit and converted with the chart.
- **Snellen to logMAR**: logMAR = -1 * log10(snellen_frac)
- **Snellen to ETDRS**: ETDRS = 85 + 50 * log10(snellen_frac) [Gregori et al.](https://doi.org/10.1097/iae.0b013e3181d87e04)
- **ETDRS to logMAR**: logMAR = -0.02 * etdrs + 1.7 [Beck et al.](https://doi.org/10.1016/s0002-9394(02)01825-1)
- **Hand movements and counting fingers** are converted following [Schulze-Bonsel et al.](https://doi.org/10.1167/iovs.05-0981)
- **(No) light perception** is converted following the suggestions by [Michael Bach](https://michaelbach.de/sci/acuity.html)
- **To Snellen**: Although there seems to be no good statistical reason to convert back to Snellen, it is very natural for eye specialists to think in Snellen, and a conversion to Snellen gives a good gauge of the patients' visual acuity. However, back-conversion should not be considered an exact science, and any attempt to use formulas will result in very weird Snellen values that have no correspondence to common charts. Therefore, the Snellen values matching the nearest ETDRS and logMAR values in [the VA conversion chart](#va-conversion-chart) are used.

### Examples

```{r va}
## automatic detection of VA notation and converting to logMAR by default
x <- c(23, 56, 74, 58) ## ETDRS letters
to_logmar(x) # wrapper of va(x, to = "logmar")
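
## the ETDRS-to-logMAR formula given above (logMAR = -0.02 * etdrs + 1.7),
## computed by hand as a small added sanity check (not part of the original example)
-0.02 * x + 1.7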

## ... or convert to snellen
to_snellen(x) # wrapper of va(x, to = "snellen")

## eye knows metric as well
to_snellen(x, type = "m")

## And the decimal snellen notation, so much loved in Germany
to_snellen(x, type = "dec")

## Remove weird entries and implausible entries depending on the VA choice
x <- c("NLP", "0.8", "34", "3/60", "2/200", "20/50", " ", ".", "-", "NULL")
to_snellen(x)
to_snellen(x, from = "snellendec")
to_snellen(x, from = "etdrs")
to_snellen(x, from = "logmar")

## "plus/minus" entries are converted to the most probable threshold (any spaces allowed)
x <- c("20/200 - 1", "6/6-2", "20/50 + 3", "6/6-4", "20/33 + 4")
to_logmar(x)

## or evaluating them as logmar values (each optotype equals 0.02 logmar)
to_logmar(x, smallstep = TRUE)

## or you can also decide to completely ignore them
## (converting them to the nearest snellen value in the VA chart)
to_snellen(x, noplus = TRUE)
```

## recodeye {#recode}

Makes recoding eye variables very easy. The following codes are recognized:

- integer coding 0:1 and 1:2, the right eye being the lower number
- right eyes: c("r", "re", "od", "right")
- left eyes: c("l", "le", "os", "left")
- both eyes: c("b", "both", "ou")

If you have different codes, you can change the recognized strings with the `eyestrings` argument, which needs to be a list. Remember to put the strings for right eyes first, or pass a *named list*. You can also change the recognized codes more globally with `set_eye_strings()`.

```{r}
x <- c("r", "re", "od", "right", "l", "le", "os", "left", "both", "ou")
recodeye(x)

## choose the resulting codes
recodeye(x, to = c("od", "os", "ou"))

## Numeric codes 0:1/ 1:2 are recognized
x <- 1:2
recodeye(x)

## with weird missing values
x <- c(1:2, ".", NA, "", " ")
recodeye(x)

## If you are using different strings to code for eyes, e.g. because you are
## using a different language, you can change this with the "eyestrings" argument
french <- c("OD", "droit", "gauche", "OG")
recodeye(french, eyestrings = list(r = c("droit", "od"), l = c("gauche", "og")))

## or change it more globally with `set_eye_strings`
set_eye_strings(right = c("droit", "od"), left = c("gauche", "og"))
recodeye(french)

# to restore the defaults, call set_eye_strings empty
set_eye_strings()
```

## Counting patients and eyes {#eyes}

`eyes` offers a very simple tool for counting patients and eyes. It returns a list object which gives you easy access to the data.

An important step in `eyes` is the guessing of the columns that identify patients and eyes. As for `myop` and, of course, `blink`, specific column naming is required for a reliable automatic detection of the patient and eye column(s) ([see Names and codes](#names-and-codes)).

The arguments **id** and **eye** overrule the name guessing for the respective columns.

### Guessing

#### Patient ID columns

- Names can be in any case.
- First, `eyes` looks for names that contain both the strings "pat" and "id" (the order doesn't matter) - you can change those codes with `set_eye_strings()`
- Next, it looks for columns that are plainly called "ID"
- Last, it searches for all names that contain either "pat" or "id"

#### Eye variable column

- Names can be in any case.
- `eyes` looks for columns that contain either the string "eye" or "eyes" - you can change those codes with `set_eye_strings()`
- Columns with the full string "eye" or "eyes" will be given precedence
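For example, with the `amd2` data set from *eyedata* (loaded above) - a minimal illustration in which the patient and eye columns are picked up by the guessing described above:

```{r}
eyes(amd2)
```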
### Counting

For counting eyes, they need to be coded in commonly used ways. You can use [recodeye](#recode) for very convenient recoding.

- `eyes` recognizes integer coding 0:1 and 1:2, with the right eye being the lower number.
- Or, arguably more appropriate in R, character coding for a categorical variable:
    - right eyes: c("r", "re", "od", "right")
    - left eyes: c("l", "le", "os", "left")
    - both eyes: c("b", "both", "ou")
    - you can change those codes with `set_eye_strings()`

### Report

`eyes` also includes a convenience function to turn the count into text. This is intended for integration into rmarkdown reports, or for easy copying and pasting. `eyes_to_string()` parses the output of `eyes` into text under the hood. Arguments to `eyes_to_string` are passed via **...**:

- **english**: which numbers are written in plain English - choose "small" for numbers up to 12, "all" for all numbers, or "none" (or any other string!) for none
- **caps**: the first number will be capitalized

**eyestr** will create a string which you can paste into a report. The name was chosen because it is a contraction of "eyes" and "strings", and it is a tiny bit easier to type than "eyetxt".

#### Use in rmarkdown

`eyestr` was designed with use in rmarkdown in mind, most explicitly for inline use. You can change the way numbers are converted to English with the `english` argument. By default, numbers smaller than or equal to 12 will be real English; all other numbers will be ... numbers. You can capitalize the first number with the `caps` argument.

We analyzed `` `r knitr::inline_expr("eyestr(amd2)")` `` gives: We analyzed `r eyestr(amd2)`

We analyzed `` `r knitr::inline_expr("eyestr(head(amd2, 100))")` `` gives: We analyzed `r eyestr(head(amd2, 100))`

We analyzed `` `r knitr::inline_expr('eyestr(amd2, english = "all")')` `` gives: We analyzed `r eyestr(amd2, english = "all")`

`` `r knitr::inline_expr("eyestr(head(amd2, 100), caps = TRUE)")` `` were analyzed gives: `r eyestr(head(amd2, 100), caps = TRUE)` were analyzed

We analyzed `` `r knitr::inline_expr('eyestr(head(amd2, 100), english = "none")')` `` gives: We analyzed `r eyestr(head(amd2, 100), english = "none")`

## Reshape eye data

Out of convenience, data is often entered in a "wide" format: in eye research, there are often two columns for the same variable, one for each eye. This may be a necessary data format for specific questions. However, "eye" is also a variable (a dimension of your observation), and it can also be stored in a separate column. Indeed, in my experience R often needs eyes to be in a single column, with every other variable having its own dedicated column.

### myop

Reshaping many such columns can be a daunting task, and `myop()` makes this easier. It removes duplicate rows, pivots the eye variable to one column and generates a single column for each variable, thus shaping the data for specific types of analysis. For example, eight columns that store data of four variables for right and left eyes will be pivoted to five columns (one eye column and four further variable columns). See also **Examples**.

As with `eyes()`, `myop()` requires a specific data format. [See names and codes](#names-and-codes)

If there is already a column called "eye" or "eyes", myop will not make any changes - the data is then assumed to already be in long format. If there are still variables spread over two columns for right and left eyes, then this is an example of messy data. A solution would be to remove or simply rename the "eye" column and then let myop do the work. However, in those cases you need to check very carefully whether the resulting data frame is plausible. [Learn about tidy data.](https://tidyr.tidyverse.org/articles/tidy-data.html)
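As a sketch of the situation just described (a made-up toy frame, purely illustrative) - because an "eye" column already exists, `myop()` leaves the remaining wide columns untouched:

```{r}
## "eye" column present, but VA still spread over two columns - myop() makes no changes here
part_long <- data.frame(
  id = c("a", "a", "b", "b"),
  eye = c("r", "l", "r", "l"),
  va_r = c(0.1, NA, 0.2, NA),
  va_l = c(NA, 0.3, NA, 0.4)
)
myop(part_long)
```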
#### Make myop work

myop will work reliably if you adhere to the following:

1. Common codes for eyes:
    - right eyes: *"r", "re", "od", "right"*
    - left eyes: *"l", "le", "os", "left"*
1. Strings for eyes need to be **separated by periods or underscores** (periods will be replaced by underscores).
1. Any order of substrings is allowed:
    - **Will work**: "va_r", "right_morningpressure", "night_iop.le", "gat_os_postop"
    - **Will fail**: "VAr", "rightmorningPressure", "night_IOPle", "gatOSpostop"

An exception is when there is only one column for each eye. Then the column names can consist of "eye strings" only, and the argument *var* will be used to name the resulting variable. If there are only eye columns in your data (which should actually not happen), myop will create identifiers by row position.

**Please always check the result for plausibility.** Depending on how the data was entered, the results could be quite surprising. There is a nearly infinite number of possible ways to enter data, and it is likely that myop will not be able to deal with all of them.

#### myop under the hood

`myop()` basically runs three main steps:

1. Removing duplicates
1. Renaming the data names with `myop_rename()` and `sort_substr()`:
    - replacing "." with "_"
    - re-arranging and recoding substrings so that the strings for eyes always appear in first position; they are recoded to "r" and "l"
1. Myopization: the actual work is done with `myopizer()` and `myop_pivot()` and itself consists of three steps:
    - all columns with an eye string in first position are selected and pivoted to two long columns (`key` and `value`) using `tidyr::pivot_longer`
    - the `key` column is split by position into an eye column and a `variable` column
    - the `variable` and `value` columns are pivoted wide again with `tidyr::pivot_wider`

```{r data frames}
wide1 <- data.frame(id = letters[1:3], r = 11:13, l = 14:16)

iop_wide <- data.frame(id = letters[1:3], iop_r = 11:13, iop_l = 14:16)

## Mildly messy data frame with several variables spread over two columns:
wide_df <- data.frame(
  id = letters[1:4],
  surgery_right = c("TE", "TE", "SLT", "SLT"),
  surgery_left = c("TE", "TE", "TE", "SLT"),
  iop_r_preop = 21:24, iop_r_postop = 11:14,
  iop_l_preop = 31:34, iop_l_postop = 11:14,
  va_r_preop = 41:44, va_r_postop = 45:48,
  va_l_preop = 41:44, va_l_postop = 45:48
)
```

```{r}
## the variable has not been exactly named (but it is probably IOP data),
## so you can specify the dimension with the var argument
myop(wide1, var = "iop")

## If the dimension is already part of the column names, this is not necessary.
myop(iop_wide)

## myop deals with this in a breeze:
myop(wide_df)
```

### hyperop

Basically the opposite of `myop()` - a slightly intelligent wrapper around `tidyr::pivot_longer()` and `tidyr::pivot_wider()`. It will find the eye column, unify the codes for the eyes (all to "r" and "l") and pivot wide the columns that have been specified in "cols".

**Again, good names and tidy data always help!**

The cols argument takes a tidyselection. [Read about tidyselection](https://tidyselect.r-lib.org/reference/language.html)

```{r}
myop_df <- myop(wide_df)
hyperop(myop_df, cols = matches("va|iop"))
```

## blink

Although kind of nice, blink is more of a nerdy extra and not likely to be used much. I have therefore decided to stop working on it. It will be left in the package as it is.
**See your data in a blink of an eye**

`blink()` is more than just a wrapper around `myop()`, `eyes()`, `va()` and `reveal()`. It will look for VA and IOP columns and provide summary statistics for the entire cohort and for right and left eyes, for each VA and IOP variable.

**This, again, requires a certain format of names and codes** - [see Names and codes](#names-and-codes)

### Work under the hood

1. Removing duplicates
1. Identifying the index of the VA and IOP columns
1. Removing certain columns from the analysis:
    - logical columns
    - character columns if they are not coding for Snellen fractions or "qualitative" visual acuity
    - columns whose unique values fall within the range defined by the argument "fct_level". These are removed because they are likely categorical codes. If you have VA or IOP columns that contain only 0 and 1, for example (which is unlikely, but not impossible), you can just set `fct_level = "x"` or any other arbitrary value.
1. Renaming the data names with `myop_rename()` and `sort_substr()`
1. Getting the new names based on the indices of the columns
1. Creating a vector of "expected new names" for the summary after myopization
1. Myopizing if possible
1. Converting VA vectors to logMAR
1. Applying `reveal()` to all VA and IOP columns

As you can imagine, many of those steps rely hugely on reasonable naming of your columns, which unfortunately makes this function a bit fragile. However, if you adhere to the naming conventions, blink (and myop) will do a great job for you.

If you are not happy with the automatic column selection, you can select the VA and IOP columns manually with the arguments `va_cols` or `iop_cols`. Both accept [tidyselection](https://tidyselect.r-lib.org/reference/language.html). I personally find `starts_with()`, `ends_with()`, `contains()` or the more general `matches()` very useful.

### Examples

```{r}
blink(wide_df)

blink(amd)
```

## Names and codes

**eye works smoother with tidy data and with good names** (any package does, really!)

### Tidy data

The basic principle of tidy data is: one column for each dimension and one row for each observation. [Learn more about tidy data.](https://tidyr.tidyverse.org/articles/tidy-data.html)

This chapter explains how you can improve names and codes so that `eye` will work like a charm.

### How do I rename columns in R?

When I started with R, I found renaming columns challenging, and the following methods were very helpful:

I've got a data frame with unfortunate names:

```{r}
name_mess <- data.frame(name = "a", oculus = "r", eyepressure = 14, vision = 0.2)
names(name_mess)
```

I can rename all names easily:

```{r}
names(name_mess) <- c("patID", "eye", "IOP", "VA")
names(name_mess)
```

To rename only specific columns, even if you are not sure about their exact position:

```{r, include=FALSE}
names(name_mess) <- c("name", "oculus", "eyepressure", "vision")
```

```{r}
## if you only want to rename one or a few columns:
names(name_mess)[names(name_mess) %in% c("name", "vision")] <- c("patID", "VA")
names(name_mess)
```

For even more methods, I found these two threads on Stack Overflow very helpful:

- [Rename single column](https://stackoverflow.com/q/7531868/7941188)
- [Rename columns with named vector](https://stackoverflow.com/q/20987295/7941188)

### Tips and rules for naming

1. Don't be too creative with your names!
1. Use common coding:
    - **eyes**: "r", "re", "od", "right" - or numeric coding r:l = 0:1 or 1:2
    - **visual acuity**: "VA", "BCVA", "Acuity"
    - **intraocular pressure**: "IOP", "GAT", "NCT", "pressure"
    - **patient identifier**: "pat", "patient", "ID" (ideally both: "patientID" or "patID")
1. Column names:
    - no spaces!
    - do not use numeric coding for eyes in column names
    - separate eye, VA and IOP codes with underscores ("bcva_l_preop", "VA_r", "left_va", "IOP_re")
    - keep names short
    - don't use underscores when you don't need to: consider each section divided by an underscore a relevant characteristic of your variable, e.g. "preop" instead of "pre_op", or simply "VA" instead of "VA_ETDRS_Letters"

### Name examples

Good names (`eye` will work nicely):

```{r}
## right and left eyes have common codes
## information on the tested dimension is included ("iop")
## VA and eye strings are separated by underscores
## No unnecessary underscores.
names(wide_df)
names(iop_wide)
```

OK names (`eye` will work):

```{r}
## Id and Eye are common names, there are no spaces
## VA is separated from the rest with an underscore
## BUT:
## The names are quite long
## There is an unnecessary underscore (ETDRS are always letters). Better just "VA"
c("Id", "Eye", "FollowupDays", "BaselineAge", "Gender", "VA_ETDRS_Letters", "InjectionNumber")

## All names are commonly used (good!)
## But which dimension of "r"/"l" are we exactly looking at?
c("id", "r", "l")
```

Bad names (`eye` will fail):

```{r}
## VA/IOP not separated with underscore
## `eye` won't be able to recognize IOP and VA columns
c("id", "iopr", "iopl", "VAr", "VAl")

## A human may think this is clear,
## but `eye` will fail to understand those variable names
c("person", "goldmann", "vision")

## Not even clear to humans
c("var1", "var2", "var3")
```

## Reveal common statistics {#reveal}

`reveal()` offers a simple API to show common summary statistics for all numeric columns of your data frame.

`reveal()` is basically a slightly complicated wrapper around `mean()`, `sd()`, `length()`, `min()` and `max()` (with `na.rm = TRUE` and `length()` counting only non-NA values). It is not really intended to replace other awesome data exploration packages / functions such as `skimr::skim`, and it will likely remain focussed on summarizing numerical data only.

It uses an S3 generic under the hood, with methods for atomic vectors, data frames, and lists of either atomic vectors or data frames. Character vectors will be omitted (and a warning is given that this has been done). `reveal()` takes the grouping argument `by` and returns a vector for atomic vectors or a data frame for lists.

### Examples

```{r stats, warning=FALSE, message=FALSE}
clean_df <- myop(wide_df)
reveal(clean_df)
reveal(clean_df, by = "eye")
reveal(clean_df, by = c("eye", "surgery"))
```

## Calculate age {#getage}

This is a simple function and should not require much explanation. It may, however, be worth mentioning the subtle distinction between periods and durations, an idiosyncrasy of time measurements that is well explained [in the lubridate vignette.](https://lubridate.tidyverse.org/articles/lubridate.html#time-intervals)

### Examples

```{r getage, warning=FALSE, message=FALSE}
dob <- c("1984-10-16", "2000-01-01")

## If no second date is given, the age today
getage(dob)

## If the second argument is specified, the age at that date
getage(dob, "2000-01-01")
```

## Important notes

**I do not assume responsibility for your data or analysis.**
Please always keep a critical mind when working with data - if you do get results that seem implausible, there may be a chance that the data is in an unfortunate shape for which `eye` may not be suitable.

### VA conversion chart

This chart is included in the package as `va_chart`.
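To take a quick look at it:

```{r}
## first rows of the conversion chart
head(va_chart)
```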