--- title: "exams.forge" author: "Sigbert Klinke, Kleio Chrysopoulou Tseva" date: "`r Sys.Date()`" output: pdf_document: extra_dependencies: ["environ"] includes: in_header: preamble.tex toc: yes rmarkdown::html_vignette: toc: true html_document: toc: yes vignette: > %\VignetteIndexEntry{exams.forge} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: markdown: wrap: 72 --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(exams.forge) ``` ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, echo=FALSE} suppressPackageStartupMessages({ library("extraDistr") library("exams") library("exams.forge") }) ``` # Introduction When devising exercises for students, two primary challenges emerge. Firstly, not all datasets prove suitable; students' tendencies to round intermediate results can lead to differing yet accurate solutions. Secondly, the lack of access to these intermediate values may hinder solution explanation. This can be addressed by expanding existing routines. The former problem arises mainly when students are prohibited from using computers for exercises. Genuine comprehension of statistical coefficients or graphical representations is best achieved through manual calculation —an approach similar to memorizing multiplication tables. Without this hands-on engagement, students risk becoming mere button-pushers without deeper understanding. To tackle these challenges effectively, a decision was made to create carefully curated datasets, allowing for precise control over their nuances. The general approach is outlined below: ```{r, eval=FALSE} library("exams") library("exams.forge") repeat { ... # some data generation if (condition_holds) break } ``` For instance, in calculating the median from five observations $x_i$, we determine that the solution lies with the third sorted observation, $x_{(3)}$. Yet, it's crucial to verify that this third sorted observation doesn't coincide with the third observation itself. Otherwise, a student might overlook a crucial step in median computation. This concern is resolved as follows: ```{r, eval=FALSE} library("exams") library("exams.forge") repeat { x <- sample(1:10, size=5) sx <- sort(x) if (x[3]!=sx[3]) break } x ``` The `exams.forge` package was developed with the primary objective of "forging" exam tasks in combination with the `exams` package, along with auxiliary functions aimed at streamlining the process of generating Moodle exercises. The package consists of various functions divided into 7 categories based on their attributes. The nomenclature of the categories is as follows: Data Conversion and Modification, Statistical Analysis, Mathematical Computations, Exercise Generation, String Manipulation, LaTeX and HTML Functions and General Purpose Functions. The `exams.forge` package is intended for educators responsible for crafting examination materials within the domain of statistics, for example fundamental courses like Statistics I and II, utilizing the R programming language. The objective is to optimize the process of generating a substantial array of assessment items, thus allowing instructors to channel their efforts toward enhancing the substantive quality of the tasks. To what extent has the package been employed thus far? 
As a result of the onset of the COVID-19 pandemic in the spring of 2020, the Chair of Statistics at the Humboldt University of Berlin implemented non-compulsory, remote, digital examinations accommodating up to 500 participants. These examinations were administered within the foundational courses Statistics I and II, using the Moodle platform. Each examination comprised a set of twenty questions, and for every question one hundred variants were crafted. These variants differed in numerical values, in content, or in the narrative woven around the task. Moodle then randomly assigned one of the hundred available variants to each student. This approach ensured that each student received a unique examination, as opposed to the two-variant approach frequently taken in traditional face-to-face examinations.

In summary, `exams.forge` is an R package designed for educators that simplifies the creation of statistical exercises. Beyond enhanced statistical functions, it offers specialized formatting tools, data generation functions, and adaptations of the XML files created by the `exams` package, to facilitate the integration of exercises into Moodle.

Now, let's explore the specifics of the first category, where we delve into a set of essential functions designed to enhance utility and streamline various data processing tasks.

# Data Conversion and Modification

## Classical Univariate Time Series

### `ts_data`

Creates a univariate time series by combining elements of a linear or exponential trend, additive or multiplicative seasonal adjustment, and white noise. The function generates a time series object with specified parameters, including the length of the series, trend presence and coefficients, seasonal adjustment (with coefficients), and error terms (with coefficients). The resulting time series is structured as a `ts_data` object, allowing for further analysis and exploration.

```{r}
# Generate a time series
ts_eg <- ts_data(end = 20, trend = TRUE, trend.coeff = c(1, 0.5),
                 season = TRUE, season.coeff = c(0.2, 0.1),
                 error = TRUE, error.coeff = 0.1, digits = 2)
print(ts_eg)
```

### `as_ts`

Transforms a `ts_data` object into a time series object (`ts`).

```{r}
ts <- ts_data(12, trend.coeff= c(sample(0:10, 1), sample(1+(1:10)/20, 1)))
as_ts(ts)
```

### `ts_moving_average`

The `ts_moving_average` function calculates the moving average for a `ts_data` object. This function takes a `ts_data` object (`ts`) and a user-defined order for the moving average (`order`). The result is an extended `ts_data` object containing information about the filter used for the moving average (`filter`) and the computed moving average values (`moving.average`).

```{r}
# Create a time series data object with a constant trend
ts <- ts_data(20, trend.coeff = c(2))
# Compute the moving average with an order of 5
result_ts <- ts_moving_average(ts, 5)
# Display the original and extended time series data objects
cat("Original Time Series Data:\n")
str(ts)
cat("\nExtended Time Series Data with Moving Average:\n")
str(result_ts)
```

### `ts_trend_season`

The `ts_trend_season` function estimates a trend and season model from a time series data object (`ts_data`). It allows for flexible modeling, enabling the specification of linear or exponential trends and additive or multiplicative seasonality.
The function returns an extended `ts_data` object with various components, including the estimated trend, season, combined trend and season, as well as the relevant coefficients. It also provides the variance of the residuals and the goodness of fit ($R^2$) of the final model.

```{r}
# Create a time series data object with a linear trend
ts <- ts_data(12, trend.coeff = c(sample(0:10, 1), sample(1 + (1:10)/20, 1)))
# Estimate trend and season
result_ts <- ts_trend_season(ts)
# Display the extended time series data object
str(result_ts)
```

## Confidence Intervals

### `CImulen_data`

::: {.infobox style="float: right;" data-latex=""}
```{=tex}
\begin{align*}
n &= \left\lceil \frac{\sigma^{2} \cdot z^2_{1 - \frac{\alpha}{2}}}{e^{2}} \right\rceil \\
  &= \left\lceil \frac{4 \cdot \sigma^{2} \cdot z^2_{1 - \frac{\alpha}{2}}}{l^{2}} \right\rceil
\end{align*}
```
:::

This function generates data to determine the required sample size for constructing a confidence interval for the population mean with minimal rounding operations. Either the estimation error (`e`) or the length of the interval (`l`) must be provided. The relationship between `l` and `e` is given by $l = 2 \cdot e$. The function ensures that the computed standard deviation (`s`) differs from the known population standard deviation (`sigma`).

```{r}
# Generate data for a confidence interval with estimation error ranging from 0.1 to 1.0
result <- CImulen_data(sigma = 1:10, e = (1:10)/10)
str(result)
result <- CImulen_data(sigma = 1:10, e = (1:10)/10, full=TRUE)
head(result)
```

### `CIpilen_data`

The `CIpilen_data` function is designed for generating data to determine the necessary sample size of a confidence interval for the population proportion, based on $n = \left\lceil \pi (1-\pi) \cdot z^2_{1-\frac{\alpha}{2}} / e^{2} \right\rceil$. The estimation error (`e`) or the length of the interval (`l`) must be provided, where the relationship between them is defined as $l = 2 \cdot e$. The function ensures that the computed sample proportion (`p`) deviates from the known population proportion (`pi`).

```{r}
# Generate data for a confidence interval with estimation error 0.1
result <- CIpilen_data(pi = (1:9/10), e = (1:9)/10)
# Display the result
str(result)
```

## Data Generation

### `add_data`

`add_data` adds data point(s) to the left and/or the right of a given data vector `x`.

1. A box and its width are determined by
   - `box="range"` gives a box width of `width=max(x)-min(x)` and two points `xleft=min(x)` and `xright=max(x)`
   - `box="box"` gives a box width of `width=IQR(x)` and two points `xleft=quantile(x, 0.25)` and `xright=quantile(x, 0.75)`
   - `box=c(xleft, xright)` gives a box width of `width=xright-xleft` and two points `xleft` and `xright`
2. The number of additional data points is determined by `n`
   - `n=c(nleft, nright)` gives the number of points to generate at the left and right
   - `n=1` is a short form of `c(0,1)` (the default)
3. Within the interval [`xleft-range[2]*width`, `xleft-range[1]*width`], `nleft` points are drawn uniformly, and within the interval [`xright+range[1]*width`, `xright+range[2]*width`], `nright` points are drawn uniformly (both intervals are colored in red in the sketch below).

```{r, echo=FALSE, fig.width=6, fig.height=3}
par(mar=c(0,0,0,0))
plot(c(0, 1), c(0.15,1.15), axes=FALSE, type="n", xlab="", ylab="")
rect(0.25, 0.25, 0.75, 0.75)
text(0.25, 0.25, labels="xleft", pos=1)
text(0.75, 0.25, labels="xright", pos=1)
text(0.5, 0.75, labels="width", pos=3)
arrows(0.25, 0.8, 0.75, 0.8, code=3, length=0.1)
arrows(0.0, 0.5, 0.2, 0.5, code=3, col="red", length=0.1)
arrows(0.8, 0.5, 1.0, 0.5, code=3, col="red", length=0.1)
text(0.8, 0.8, "xright+range[1]*width", col="red", srt=90)
text(1, 0.8, "xright+range[2]*width", col="red", srt=90)
text(0, 0.8, "xleft-range[2]*width", col="red", srt=90)
text(0.2, 0.8, "xleft-range[1]*width", col="red", srt=90)
```

```{r}
x <- runif(7, 165, 195)
xr <- add_data(x, "range", n=c(0,1), range=c(1,1.5))
round(xr)
xb <- add_data(x, "box", n=c(0,1), range=c(1,1.5))
round(xb)
x1 <- add_data(x, box=c(165,195), n=c(0,1), range=c(1,1.5))
round(x1)
```

### `cor_data` and `meanint_data`

In this exercise, researchers aim to determine which variable, "number of absences in high school" ($X$) or "monthly income of parents" ($Z$), better predicts students' average grade points ($Y$) in the baccalaureate exam.

```{r}
n <- sample(seq(25,50,5),1)
y <- meanint_data(n, c(2,12))
x <- meanint_data(n, c(36, 50))
z <- meanint_data(n, c(2,6))
yx <- cor_data(y, x, r=sample((5:9)/10, 1))
yz <- cor_data(y, z, r=sample((5:9)/10, 1))
```

Here, the function `meanint_data` is used to generate random data for the variables $X$ and $Z$. It takes two arguments: `n`, the number of observations, and `v`, a vector with two elements specifying the range within which the `n` values are allowed to lie; the first element is the lower limit and the second the upper limit.

The second function from the `exams.forge` package used in this exercise is `cor_data`. It creates a data set of two variables with a desired correlation coefficient `r`; its arguments here are the dependent variable $Y$, one of the independent variables ($X$ or $Z$), and the correlation coefficient `r`. The function has two more arguments not shown in this exercise: `method`, which indicates which correlation coefficient is to be computed (here the default Pearson correlation coefficient), and `maxit`, the maximal number of iterations, which defaults to 1000. Overall, these two functions help in generating random data that simulate the relationships between variables as described in the exercise.

## Number of Observations

### `data_n`, `data_n25` and `data_nsq`

The `data_n`, `data_nsq`, and `data_n25` functions are designed to generate sequences of sample sizes within a specified range, from `min` to `max`. Each function serves a unique purpose:

- `data_n` generates a sequence of sample sizes in the specified range.
- `data_n25` generates a sequence of sample sizes in the specified range that are divisible only by 2 and 5.
- `data_nsq` generates a sequence of sample sizes in the specified range whose square root is an integer.
```{r}
# Generate a sequence of sample sizes from 5 to 10
data_n(10)
# Generate a sequence of sample sizes whose square root is an integer, from 9 to 961
data_nsq(1000)
# Generate a sequence of sample sizes divisible only by 2 and 5, from 5 to 1000
data_n25(1000)
```

## Number Properties

### `all_integer`

Checks whether all elements of `x` are integers.

```{r}
numbers_check <- c(4, 10, 7.00001)
all_integer(numbers_check)
```

### `divisor_25`

Checks if a number can be represented as a product of powers of 2 and 5.

```{r}
number_check <- 0.3125
result <- divisor_25(number_check)
result
```

### `has_digits`

Verifies whether the decimal part of a number consists only of digits within a specified tolerance.

```{r}
# Taken from the exercise "Club_Raucher2"
maxn <- 100
repeat {
  n <- sample(seq(5, maxn, 5),1)
  p <- sample((1:20)/100, 1)
  x <- n*c(p, 1-p)
  if (all(has_digits(x, 0))) break
}
print(has_digits(x, 0))
```

### `prime_numbers`

Generates a list of prime numbers up to a specified limit.

```{r}
prime_numbers(20)
```

### `primes`

Computes the prime factorization of each element in a numeric vector, providing a matrix that delineates the power of each prime number.

```{r}
primes(1:5)
```

## Result Generation with Rounding

### `as_result`, `rounded`, `tol`, `val` and `digits`

This set of functions is designed to facilitate precise rounding of a numerical input `x` based on specified `digits` and a user-defined rounding function (`FUN`). Additionally, the functions offer a convenient way to set a tolerance for the result. If a tolerance is not explicitly provided, it defaults to $2 \cdot 10^{-\mathrm{digits}}$.

- `as_result(x, digits, tol = NA, FUN = round2)`: rounds the input `x` to the specified `digits` using the rounding function `FUN`, and allows for setting a tolerance (defaulting to $2 \cdot 10^{-\mathrm{digits}}$ if not provided).
- `tol(x)`: computes the tolerance for a given input `x`.
- `rounded(x)`: returns the rounded value of `x`.
- `val(x)`: returns the value of `x`.
- `digits(x)`: returns the specified digits for rounding `x`.

```{r}
x <- as_result(1/3, "prob")
tol(x)
rounded(x)
digits(x)
val(x)
```

## Tables

### `as_table`

Transforms a vector into a horizontal table, facilitating a more structured representation of the data. The parameters are the same as in `xtable`, which is used internally. It is intended for use as a (class) frequency table.

```{r}
x <- runif(3)
tab <- vec2mat(x, colnames=1:length(x))
as_table(tab)
tab <- vec2mat(x, colnames=sprintf("%.0f-%0.f", 0:2, 1:3))
as_table(tab)
```

### `assoc_data`

Reorders observations in a frequency table to approximate a specified target association, while maintaining unchanged marginal frequencies. The function utilizes a provided frequency table and computes an association (or correlation) measure using the specified function (`FUN`). The target association may not be achieved entirely, especially for extreme target values like +1 or -1.

- `target`: specifies the target association to be approximated. If set to `NA`, the original table is returned.
- `zero`: allows for zero entries in the common distribution.
- `tol`: sets the maximal deviation of the association measure from the target value.
- `maxit`: limits the number of optimization steps.

A solution is not assured, necessitating adjustments to parameters such as `maxit`, `tol`, or a reconsideration of the chosen target value. The resulting association value is stored in the attribute `"target"` (retrievable via `attr(result, "target")`).
```{r}
# Reordering observations in a frequency table to approximate a target association
# Creating a frequency table (2x2) with arbitrary values
frequency_table <- matrix(c(10, 20, 30, 40), nrow = 2, byrow = TRUE)
# Defining a target association value
target_association <- 0.5
# Applying assoc_data to reorder the frequency table to approximate the target association
result_table <- assoc_data(frequency_table, target = target_association,
                           zero = TRUE, tol = 0.1, maxit = 100)
# Displaying the resulting reordered table
print(result_table)
```

## Vector Generation and Transformation

### `random`

The `random` function generates a random permutation of indices from 1 to the length of the input vector `v`.

```{r}
random(-1:6)
```

### `refer`

The `refer` function facilitates the generation of names for elements within a vector. It provides a mechanism for assigning customized names based on a specified format, allowing us to enhance the interpretability of vector elements.

```{r}
# Generating a vector of 5 random uniform values
x <- runif(5)
# Applying refer with the LaTeX default format
latex_result <- refer(x)
str(latex_result)
# Applying refer with the R default format
r_default_result <- refer(x, fmt = "%s[%.0f]")
str(r_default_result)
```

In the first example, a vector `x` is created with 5 random uniform values using `runif(5)`. The `refer` function is then applied to `x` without specifying a custom format (`fmt`). By default, the LaTeX format is used, resulting in names that follow the pattern "x\_{1}", "x\_{2}", ..., "x\_{n}", where n is the length of the vector.

In the second example, the `refer` function is applied to the vector `x` with a custom format specified as `fmt="%s[%.0f]"`. This R default format results in names following the pattern "x[1]", "x[2]", ..., "x[n]", where n is the length of the vector.

### `transformif`

The `transformif` function offers conditional transformations for a vector `x` based on the specified condition `cond`. When the condition holds true, the transformation is applied to each element of `x`. The transformation is determined by the parameters `a`, `b`, and `p`, allowing for versatile adjustments. Specifically, if the condition is met and `p` is set to 0, the transformation becomes $\log(a + b \cdot x)$; otherwise, it is $(a + b \cdot x)^p$.

```{r}
# Generate a vector with a mix of positive and negative values
v <- c(2, -3, 1, 0, 5, -4)
# Transform only negative values using a custom shift (a) and scale (b)
transformed_vector <- transformif(v, v < 0, a = 2, b = 0.5)
# Display the original and transformed vectors
cat("Original Vector: ", v, "\n")
cat("Transformed Vector: ", transformed_vector, "\n")
```

### `vec2mat`

The `vec2mat` function transforms a vector into either a horizontal or vertical matrix, allowing users to specify new column and row names. Existing names are overwritten if `colnames` or `rownames` are provided.
```{r}
# Generate a vector
vec <- c(1, 2, 3, 4, 5)
# Convert the vector to a horizontal matrix with custom column names
mat_horizontal <- vec2mat(vec, colnames = c("A", "B", "C", "D", "E"))
# Display the resulting matrix
print(mat_horizontal)
# Convert the vector to a vertical matrix with custom row names
mat_vertical <- vec2mat(vec, rownames = c("First", "Second", "Third", "Fourth", "Fifth"),
                        horizontal = FALSE)
# Display the resulting matrix
print(mat_vertical)
```

# Statistical Analysis

## Approximations

### `binom2norm`, `clt2norm` and `t2norm`

`binom2norm` checks if the sample size and parameters of a binomial distribution are suitable for approximating it with a normal distribution. It returns TRUE if conditions based on the binomial distribution parameters (`size`, `prob`, and optionally `type`) are met. The default threshold is $c=9$; it can be overwritten with `options(distribution.binom2norm=5)` or set explicitly.

```{r}
# Single type
size <- 421
prob <- 0.5
cutoff <- 9
result_single <- binom2norm(size, prob, c=cutoff, type="single")
cat("Single type:", result_single, "\n")
# Double type
result_double <- binom2norm(size, prob, c=cutoff, type="double")
cat("Double type:", result_double, "\n")
```

`clt2norm` examines if the sample size (`n`) is large enough for the Central Limit Theorem to provide a reasonable approximation to a normal distribution. It returns TRUE if `n` is greater than a specified threshold (`c`). The default threshold $c=30$ can be overwritten with `options(distribution.clt2norm=5)` or set explicitly. Note that this function does not verify the existence of the expectation and variance, which are required by the Central Limit Theorem.

```{r}
# Check for a broader range of observations
observations <- c(20, 40, 80, 120, 200, 300, 500, 1000)
# Assess whether each observation size is suitable for CLT approximation
clt_approximation_results <- clt2norm(n = observations)
# Display the results
print(clt_approximation_results)
```

`t2norm` determines if the sample size (`n`) is large enough for a t-distribution to be reasonably approximated by a normal distribution. It returns TRUE if `n` is greater than a specified threshold (`c`). The default threshold $c=30$ can be overwritten with `options(distribution.t2norm=50)` or set explicitly.

```{r}
# Check for a range of observations
observations <- c(10, 30, 50, 100, 200)
# Assess whether each observation size is suitable for t-distribution approximation
approximation_results <- t2norm(n = observations)
# Display the results
print(approximation_results)
```

## Bivariate Descriptive Statistics

### `grouped_data`

Determines the mean, mode, quantile or median for data that has been grouped.

```{r}
popSize <- 100
classbreaks <- c(0, 50, 100, 200)
gd <- grouped_data(classbreaks, popSize*ddiscrete(runif(length(classbreaks)-1)), 0.5)
print(gd)
```

In this example we can observe how the `grouped_data` function calculates the median from the grouped data: it takes the `classbreaks`, the product of `popSize` and a random discrete probability distribution created with `ddiscrete(runif(length(classbreaks)-1))`, and a weighting factor of $0.5$ as parameters.

### `lcmval`

This function computes the least common multiple for a numeric vector `x`.

```{r}
lcmval(c(144, 160, 175))
```

### `mcval`

The function computes all the modes (most common values) of the data.
```{r}
# Numeric
x <- sample(1:5, size=25, replace = TRUE)
table(x)
mcval(x)
# Character
x <- sample(letters[1:5], size=25, replace = TRUE)
table(x)
mcval(x)
# Histogram
x <- hist(runif(100), plot=FALSE)
mcval(x)
mcval(x, exact=TRUE)
```

### `nom.cc`, `nom.cramer`, `ord.spearman` and `ord.kendall` (Association)

A set of functions which compute association measures based on a contingency table:

- `nom.cc` (corrected contingency coefficient): computes the corrected contingency coefficient, a statistical measure assessing the association between two categorical variables. This coefficient is an enhancement of the contingency coefficient, correcting for potential biases.
- `nom.cramer` (Cramer's V or Phi): calculates Cramer's V or Phi, providing a measure of association between categorical variables. Cramer's V is an extension of the phi coefficient, suitable for contingency tables larger than 2x2.
- `ord.spearman` (Spearman's rank correlation): computes Spearman's rank correlation, a non-parametric measure of association between two ordinal variables. It assesses the monotonic relationship between the variables, providing insights into their degree of association.
- `ord.kendall` (Kendall's rank correlation): computes Kendall's rank correlation, a non-parametric measure evaluating the strength and direction of the association between two ordinal variables. This method is particularly suitable for detecting monotonic relationships.

```{r}
tab <- matrix(round(10*runif(15)), ncol=5)
nom.cc(tab)
nom.cc(tab, correct=TRUE)
nom.cramer(tab)
ord.spearman(tab)
ord.kendall(tab)
```

### `pearson_data`

The following exercise asks for the calculation of the Bravais-Pearson correlation coefficient from the scores recorded for selected students in a mathematics and a statistics exam.

```{r}
data(sos)
n <- sample(4:8, 1)
rseq <- seq(-0.95, 0.95, by=0.05)
r <- sample(rseq, size=1, prob=rseq^2)
xy0 <- pearson_data(r=r, nmax=n, n=100, xsos=sos100)
str(xy0)
```

The `pearson_data` function is used to generate an integer data set that can be used to compute a correlation, specifically the Pearson correlation coefficient. It is designed to create a data set with a specified desired correlation value `r`, building on the function `sumofsquares`. We define four arguments for this function in our example:

1. `r`: as mentioned previously, the desired correlation we want to achieve in the generated data set. The Pearson correlation coefficient measures the linear relationship between two variables and ranges from -1 to 1.
2. `n`: the number that we want to decompose into a sum of squares. Here `n` is set to 100, meaning that the squared values underlying the generated data sum to 100.
3. `nmax`: the maximal number of squares in the sum of squares, and hence the maximal number of data points. The `sumofsquares` function is used internally to generate the data set, and `nmax` controls the number of squares allowed in the decomposition.
4. `xsos`: a precomputed sum-of-squares matrix, here `sos100` from the `sos` data set.

`maxt`, not used in this exercise, specifies the maximal number of seconds that the `pearson_data` routine should run. It sets a time limit on how long the function can take to generate the data set.
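Because the requested correlation is only approximated on integer data, it can be worth checking what was actually achieved. A minimal check, assuming (as `str(xy0)` above suggests) that `xy0` is a two-column matrix holding the x- and y-values:

```{r}
# Recompute the Pearson correlation of the generated data and compare it
# with the requested value r (two-column layout assumed)
cor(xy0[, 1], xy0[, 2])
r
```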
### `sumofsquares`

This function endeavors to express an integer, denoted as `n`, as a summation of squared integers ($n = \sum_{i=1}^k x_i^2$), where each $x_i$ lies within the range $1 \leq x_i < n$, and the count of terms ($k$) is bounded by $n_{\text{max}}$. If the parameter `zerosum` is set to `TRUE`, it ensures that the summation $\sum_{i=1}^k c_i x_i$ equals zero, where $c_i$ can take values of either -1 or +1. The computational process is constrained by a specified time limit, `maxt` seconds, which might lead to an incomplete identification of all potential solutions. To optimize efficiency, the use of `rbind` operations within the function has been replaced by the allocation of matrices with a defined number of rows, denoted as `size`, to systematically collate the results.

```{r}
# Example: Decomposing the integer 50 into a sum of squared integers
sos_example <- sumofsquares(50, nmax = 8, zerosum = FALSE, maxt = Inf, size = 100000L)
str(sos_example)
```

In this example, the `sumofsquares` function is employed to decompose the integer 50 into a sum of squared integers. The function allows a maximum of 8 terms in the decomposition (`nmax = 8`), does not enforce a zero sum (`zerosum = FALSE`), and has no time limit (`maxt = Inf`). The result is stored in the `sos_example` variable and its structure is displayed with `str()`.

## Univariate Descriptive Statistics

### `means` and `means_choice`

`means_choice` computes a list of mean values for a given data vector `x`:

- arithmetic mean,
- median,
- harmonic mean,
- geometric mean,
- (first) mode,
- trimmed mean, and
- winsorized mean.

If the parameter `trim` and/or `winsor` is set to `NA`, the corresponding means are not computed.

```{r}
digits <- 2 # round to two digits
repeat {
  x <- round(runif(7, min=165, max=195), digits)
  ms <- means_choice(x, digits)
  if (attr(ms, "mindiff")>0.1) break # make sure that all mean values differ by more than 0.1
}
ms <- unlist(ms)
sc <- to_choice(ms, names(ms)=='mean') # arithmetic mean is the correct solution
str(sc)
```

The attribute `mindiff` gives the minimal distance between two mean values. This might be important for setting `extol`, the tolerance for numeric solutions.

### `scale_to`

Given a numeric vector, `scale_to` uses a linear transformation to re-scale the data to a given mean and standard deviation. The default is to standardize the data.

```{r}
x <- runif(21)
y <- scale_to(x, mean=2, sd=0.5)
print(y)
```

## Combinatorics

### `combinatorics`, `permutation`, `variation` and `combination`

Computation of all results for variation, combination and permutation with and without repetition.

```{r}
variation(7,3)           # without repetition
variation(7,3, TRUE)     # with repetition
combination(7,3)         # without repetition
combination(7,3, TRUE)   # with repetition
permutation(7)
permutation(7, c(2,1,4)) # three groups with indistinguishable elements
z <- combinatorics(7, 4)
str(z)
```

```{r}
permutation(5, c(2, 2))
```

The warning is raised because the sum of the specified group sizes (`c(2, 2)`) is less than the total number of elements (`n = 5`). This implies that the specified groups do not cover all elements, leaving some elements without a designated group. In the context of permutations, the `permutation` function calculates the number of permutations of a set with specified group sizes. When the sum of the group sizes is less than the total number of elements, some elements would be left unassigned in the permutation process.
To account for these unmatched elements, the function automatically adds one or more groups, each containing a single element, to cover the remaining elements. This ensures that every element has a place in the permutations. In this case, we have 5 elements and specified two groups, each of size 2, so 1 element remains unassigned. The function adds a one-element group to accommodate the leftover element, and then calculates the permutations of the entire set. In summary, the warning indicates that the specified group sizes do not cover all the elements, and that the function has automatically added one or more one-element groups to make sure every element is considered in the permutation calculation.

### `lfact`, `lfactquot` and `lbinom`

`lfact` calculates the natural logarithm of the factorial of a given number `n`. The factorial of a non-negative integer `n`, denoted as `n!`, is the product of all positive integers less than or equal to `n`. The natural logarithm of the factorial is computed to avoid overflow errors when dealing with large numbers. This function helps in computing large factorial values efficiently by returning their natural logarithms.

`lfactquot` calculates the natural logarithm of a quotient of factorials. It takes a number `n` and additional arguments as factors, and computes the natural logarithm of the quotient of the factorial of `n` and the product of the factorials of the additional arguments. This function is useful in scenarios where calculating large factorials or their quotients is required, as it helps avoid numerical instability by working with logarithms.

`lbinom` computes the natural logarithm of the binomial coefficient, also known as "`n` choose `k`". The binomial coefficient represents the number of ways to choose `k` elements from a set of `n` elements without regard to the order of selection. The natural logarithm of the binomial coefficient is computed to handle large values efficiently and to avoid numerical overflow. This function is helpful in scenarios where the exact value of the binomial coefficient is not required, but its logarithm is sufficient for computation or analysis.

```{r}
lfact(5)
lfactquot(5,3,2)
lbinom(6,3)
```

## Distributions

### `ddunif2`, `pdunif2`, `qdunif2` and `rdunif2`

These functions provide the probability mass function, distribution function, quantile function, and random generation for the sum of two independent discrete uniform distributions. The minimum and maximum values for the uniform distributions can be specified using the `min` and `max` parameters.

- `ddunif2`: probability mass function.
- `pdunif2`: distribution function.
- `qdunif2`: quantile function.
- `rdunif2`: random generation.

```{r}
# Distribution function
pdunif2(1:13)
# Probability mass function
ddunif2(1:13)
# Quantile function
qdunif2((0:4)/4)
# Random generation
rdunif2(10)
```

### `distribution`

An object of class `distribution` holds a distribution (of a random variable). It is specified by a name and the distribution parameters.
The name is used to create the quantile function (`paste0("q", name)`) and the cumulative distribution function (`paste0("p", name)`), for example

- `binom`: binomial distribution with parameters: `size`, `prob`
- `hyper`: hypergeometric distribution with parameters: `m`, `n`, `k`
- `geom`: geometric distribution with parameter: `prob`
- `pois`: Poisson distribution with parameter: `lambda`
- `unif`: uniform distribution with parameters: `min`, `max`
- `exp`: exponential distribution with parameter: `rate`
- `norm`: normal distribution with parameters: `mean`, `sd`
- `lnorm`: log-normal distribution with parameters: `meanlog`, `sdlog`
- `t`: Student t distribution with parameter: `df`
- `chisq`: chi-squared distribution with parameter: `df`
- `f`: F distribution with parameters: `df1`, `df2`

The names of the above-mentioned distributions can be abbreviated; for all others the exact name must be given.

```{r}
d <- distribution("t", df=15)
quantile(d, c(0.025, 0.975))
d <- distribution("norm", mean=0, sd=1)
cdf(d, c(-1.96, +1.96))
d <- distribution("binom", size=9, prob=0.5)
pmdf(d, 5)
```

### `distribution` and `prob1`

The `exams.forge` package includes numerous functions designed to aid with exercises involving distributions. In this exercise, the functions `distribution` and `prob1` serve as fundamental building blocks to unravel the intricacies of a dice game, where chance and probability intersect to determine one's success.

```{r}
# Taken from the exercise "Würfel 2".
d <- distribution("dunif", min=1, max=6)
border <- sample(1:5, size=1)+1
ptype <- "point"
lsg <- prob1(d, border)
sc <- num_result(lsg, 4)
str(d)
print(lsg)
```

In the context of this exercise, the functions `distribution` and `prob1` play a crucial role in determining the probability of success in the dice game "Jule". `distribution` is used to model the outcomes of a six-sided die, while `prob1` calculates the probability of rolling the next required number, making them essential tools for understanding the game's dynamics.

`d <- distribution("dunif", min=1, max=6)`: this line defines a discrete uniform distribution called `d` with minimum value 1 and maximum value 6. Generally, the `distribution` function creates a distribution with a `name`, in this case `dunif`.

`lsg <- prob1(d, border)`: this is the key part of the code. It calculates the point probability using the `prob1` function, which takes two arguments:

- `d`: the probability distribution (in this case, the discrete uniform distribution representing the six-sided die).
- `border`: a randomly selected value from the integers 2 to 6 (inclusive), drawn via `sample(1:5, size=1)+1`.

The `prob1` function calculates the probability of rolling the next required number in the game, given the current state of the game (represented by the `border` value). It is an important function for this exercise, as it directly addresses the main question of the probability of rolling the next required number.

`sc <- num_result(lsg, 4)`: this line defines a numerical result named `sc`. It captures the result of the point probability calculation done by the `prob1` function.

### `is.distribution`

Checks if the `object` is a distribution object. If the `name` is given, it checks if the distribution type is the same.
```{r}
# Check if an object is a distribution
x <- distribution("norm", mean=1.4, sd=0.44)
is.distribution(x)
# Check if an object is a specific distribution type
is.distribution(x, "exp")
```

### `binom_param` and `sqrtnp`

The `binom_param` function computes parameters for a binomial distribution based on the number of trials (`n`) and the success probability (`p`). Optionally, it calculates the mean, standard deviation, and other measures. If mean, standard deviation, or other measures are not specified, they default to NA.

```{r}
# Generate binomial parameters for a specific case
params <- binom_param(600, 0.6, mean = 0, sd = 0)
# Display the generated parameters
print(params)
```

The `sqrtnp` function calculates the square root of the product of `n`, `p`, and `(1-p)` for all combinations of given `n` and `p` values. If the resulting value has at most `digits` digits after the decimal point, the corresponding `n`, `p`, and `sqrt(n*p*(1-p))` are presented in a structured data frame.

```{r}
# Calculate sqrtnp for different combinations of n and p
result <- sqrtnp(n = c(50, 100, 150), p = c(0.25, 0.5, 0.75), digits = 3)
# Display the resulting data frame
print(result)
```

In this example:

- The `sqrtnp` function is employed to compute the square root of the product of `n`, `p`, and `(1-p)` for various combinations of `n` and `p`.
- The vectors `c(50, 100, 150)` and `c(0.25, 0.5, 0.75)` represent different observation numbers and probabilities, respectively.
- The `digits` parameter is set to 3, specifying the number of decimal digits to consider.
- The resulting data frame, denoted as `result`, contains the combinations of `n`, `p`, and `sqrt(n*p*(1-p))` for which the computed value has at most `digits` digits after the decimal point.

This function is particularly useful for exploring the relationships between observation numbers, probabilities, and their respective square roots in a systematic manner. Adjusting the `digits` parameter allows users to control the precision of the results.

### `cdf`

Computes the cumulative distribution function of a distribution using `paste0('p', name)`.

```{r}
# Create a distribution object for a normal distribution
normal_distribution <- distribution("norm", mean = 0, sd = 1)
# Calculate the CDF for a normal distribution
quantiles <- seq(-3, 3, by = 0.5)                 # quantiles for which to compute the CDF
cdf_values <- cdf(normal_distribution, quantiles) # compute CDF values
# Display the results
cat("Quantile\tCDF Value\n")
cat("----------------------------\n")
for (i in 1:length(quantiles)) {
  cat(quantiles[i], "\t\t", cdf_values[i], "\n")
}
```

### `pmdf`

Computes the probability mass/density function of a distribution using `paste0('d', name)`.

```{r}
# Taken from the exercise "Haribo_3"
n <- sample(2:10, 1)      # group 1: no frogs and raspberries
nj <- 0
m <- sample(2:10, 1)      # group 2: frogs and raspberries
mj <- sample(1:(m-1), 1)
k <- mj+nj
d <- distribution(name="hyper", m=m, n=n, k=k)
lsg <- pmdf(d, k)
str(lsg)
```

### `sample_size_freq`

The `sample_size_freq` function assesses the compatibility of vectors containing possible sample sizes (`n`) and corresponding relative frequencies (`f`). It checks whether the product of sample sizes and relative frequencies results in integer absolute frequencies. This function is particularly useful in scenarios where the requirement for integer absolute frequencies is essential, such as in the design of experiments and statistical sampling.
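The condition being checked is easy to state directly in base R. A minimal sketch with hypothetical frequencies, for illustration only (not the package's implementation):

```{r}
# For a candidate sample size n, all absolute frequencies n * f must be integers
f <- c(0.2, 0.3, 0.5)   # hypothetical relative frequencies
n <- 40                 # candidate sample size
n * f                   # absolute frequencies: 8, 12, 20
all(abs(n * f - round(n * f)) < 1e-9)  # TRUE: n is compatible with f
```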
```{r}
# Generating a set of random discrete probabilities whose entries are multiples of 1/200
f <- ddiscrete(runif(6), unit=200)
# Checking compatibility for a sequence of sample sizes from 50 to 300 with a step of 1
result_default <- sample_size_freq(seq(50, 300, 1), f)
str(result_default)
# Checking compatibility for a sequence of sample sizes from 10 to 700 with a step of 1,
# with 'which' set to 200
result_specific <- sample_size_freq(seq(10, 700, 1), f, which=200)
str(result_specific)
```

- `f` is generated using the `ddiscrete` function. It creates a set of discrete probabilities based on a random uniform distribution with six elements. The `unit=200` argument ensures that all probabilities are multiples of `1/200`.
- `sample_size_freq` is first applied to a sequence of sample sizes ranging from 50 to 300 with a step of 1; it returns the first sample size in the sequence that results in integer absolute frequencies.
- `sample_size_freq` is then applied to a sequence of sample sizes ranging from 10 to 700 with a step of 1. With the `which=200` argument, the function returns the sample size 200 from the sequence, provided it satisfies the condition of creating integer absolute frequencies.

In summary, this example demonstrates the use of the `sample_size_freq` function to check the compatibility of different sequences of sample sizes with the given discrete probabilities. The results indicate which sample sizes, under the specified conditions, result in integer absolute frequencies.

### `q2norm`

The `q2norm` function takes two arguments: `x`, a numeric vector containing two quantiles, and `probs`, a numeric vector containing the corresponding probabilities (defaulting to `c(0.025, 0.975)`). The function calculates the z-scores corresponding to the input probabilities. Based on the quantiles and z-scores, it estimates the mean and standard deviation of the corresponding normal distribution. The results are returned as a list with components `mean` and `sd`. The example below uses a set of quantiles and probabilities to obtain an estimated mean and standard deviation for the normal distribution.

```{r}
# Estimate mean and standard deviation for a normal distribution based on quantiles.
quantiles <- c(10, 20)        # example quantiles
probabilities <- c(0.1, 0.9)  # example probabilities
result <- q2norm(quantiles, probabilities)
str(result)
```

## Histogram Manipulation and Analysis

### `histbreaks`

The `histbreaks` function is designed to randomly select breakpoints from a given set of `breaks` values. When the `outer` parameter is set to TRUE, it ensures that the first and last elements of the `breaks` values are always included in the resulting breakpoints. If `size` is provided as a vector, the number of breakpoints is first sampled from this vector, adding flexibility to the selection process.

```{r}
# Always includes 100 and 200 in the breakpoints
histbreaks(seq(100, 200, by = 10), 4)
# Always includes 100 and 200; randomly chooses between 3 to 5 breakpoints
histbreaks(seq(100, 200, by = 10), 3:5)
# May not include 100 and 200
histbreaks(seq(100, 200, by = 10), 4, outer = FALSE)
```

### `histdata`

`histdata` computes data about the corresponding histogram to a vector like `hist`, but returns more information, which might be necessary for exercises.
In contrast to `hist`, `histdata` requires that `breaks` covers the entire range of `x`. `histdata` has the additional parameter `probs`: if `breaks="quantiles"`, it determines which quantiles are used.

```{r}
x <- runif(25)
h1 <- hist(x, plot=FALSE)
str(h1)
h2 <- histdata(x)
str(h2)
```

The returned list contains the following elements:

- `x`: the finite data values used
- `class`: the class number in which a value falls, starting with 1 for the first class
- `xname`: the x argument name
- `breaks`: the class borders
- `lower`: the lower class borders
- `upper`: the upper class borders
- `width`: the class widths
- `mid`: the class mids
- `equidist`: whether the classes are equidistant or not
- `counts`: the number of observations in each class
- `relfreq`: the relative class frequency
- `density`: the frequency density, computed as relative frequency divided by class width

You can compute the mean, quantiles, median and mode for a histogram:

```{r}
x <- runif(25)
h <- histdata(x)
# mean
mean(h)
# median & quantile
median(h)
quantile(h)
# mode
mcval(h)
mcval(h, exact=TRUE)
```

### `histwidth`

Creates histogram data sampled from a set of class widths with the following properties:

- the class density has a unique maximum,
- the class densities are terminating numbers, and
- the class frequency maximum differs from the class density maximum.

```{r}
hw <- histwidth(1.6, 2.1, widths=0.05*(1:4))
str(hw)
x <- histx(hw$breaks, hw$n)
hist(x, hw$breaks)
rug(x)
```

### `histx`

Generates a data set based on specified class borders (`breaks`) and the desired number of observations for each class. The resulting data set is structured to distribute data points across the defined classes.

```{r}
breaks <- seq(1.6, 2.1, by=0.1)
x <- histx(breaks, sample(5:15, length(breaks)-1))
hist(x, breaks)
rug(x)
```

In this example, `histx()` is used to generate a data set based on the specified breaks and the number of observations in each class. The resulting data is then plotted using the `hist()` function, and a rug plot is added using the `rug()` function.

## Probability Theory

### `data_prob2`

The `data_prob2` function generates a matrix of probabilities or frequencies based on the specified parameters. If `data` is provided, it will be normalized so that the sum of its finite elements equals 1. If row and column names are not given, event names from the alphabet (`LETTERS`) are used. The resulting matrix has various attributes:

- `marginals`: a list of the row and column marginal distributions.
- `byrow`: a matrix with conditional probabilities by row.
- `bycol`: a matrix with conditional probabilities by column.
- `expected`: a matrix with the expected probabilities under independence.
- `prob`: a vector of all computed probabilities (excluding the expected ones).

```{r}
# Generate a data_prob2 object with default parameters
x <- data_prob2()
str(x)
# Generate a data_prob2 object with colnames="E"
data_prob2(colnames="E")
# Generate a data_prob2 object with nrow=3
data_prob2(nrow=3)
```

### `ddiscrete`

`ddiscrete` generates a finite one-dimensional discrete probability distribution. If the length of `x` is one, then `x` is the number of elements. Otherwise, `x` is considered a starting distribution and the length of `x` is the number of elements. The parameter `zero` determines whether the final distribution may contain zero probability entries or not.
Since, for the computation of exercises based on a one-dimensional discrete probability distribution, it is favorable that the entries are fractions having the same denominator, the parameter `unit` can be used for this purpose. Thus, if the smallest non-zero denominator should be `1/7`, use `unit=7`; the default is a power of 10.

```{r}
ddiscrete(6) # fair dice
x <- runif(6)
ddiscrete(x)
ddiscrete(x, zero=TRUE)
ddiscrete(x, unit=15)
fractions(ddiscrete(x, unit=15))
```

The next exercise acts as a second example for a better understanding of the `ddiscrete` function:

Exercise: Modify the discrete probability function for a biased coin. We consider a biased coin with an initial probability distribution represented as `c(0.8, 0.2, 0, 0, 0, 0)`, where the first element corresponds to the probability of getting heads, and the second element corresponds to the probability of getting tails. Here: Firstly, we use the `ddiscrete` function to create a discrete probability function for the biased coin. Secondly, we allow zeros in the final probabilities. And thirdly, we experiment with different resolutions by specifying different units.

Hints:

- We can use the `ddiscrete` function with the biased coin probabilities.
- Set `zero = TRUE` to allow zeros in the final probabilities.
- Experiment with different units, for example, `unit = 100` and `unit = 1000`.

```{r}
# Exercise: Modify the discrete probability function for a biased coin
# Given biased coin probabilities (Heads, Tails)
biased_coin_prob <- c(0.8, 0.2, 0, 0, 0, 0)
# 1. Create a discrete probability function for the biased coin
biased_coin_fun <- ddiscrete(biased_coin_prob)
print(biased_coin_fun)
# 2. Create a modified discrete probability function allowing zeros
modified_coin_fun <- ddiscrete(biased_coin_prob, zero = TRUE)
print(modified_coin_fun)
# 3. Experiment with different resolutions (units)
unit_100 <- ddiscrete(biased_coin_prob, unit = 100)
unit_1000 <- ddiscrete(biased_coin_prob, unit = 1000)
print(unit_100)
print(unit_1000)
```

This code performs the exercise steps, creating the original biased coin probability function, a modified version allowing zeros, and experimenting with different resolutions (units).

### `ddiscrete2`

`ddiscrete2` generates a finite two-dimensional discrete probability distribution. The generation has two steps:

1. Generate two finite one-dimensional discrete marginal distributions. Based on these, a joint probability table for two independent distributions is generated.
2. Define a target measure of association and a target value of the association for the joint distribution. The currently available association measures are:
   - `nom.cc`: (corrected) contingency coefficient
   - `nom.cramer`: Cramer's V or Phi
   - `ord.spearman`: Spearman's rank correlation
   - `ord.kendall`: Kendall's rank correlation

```{r}
r <- ddiscrete(6)
c <- ddiscrete(6)
ddiscrete2(r, c)
ddiscrete2(r, c, FUN=nom.cc, target=0.4)
ddiscrete2(r, c, FUN=nom.cc, target=1)
```

The units are determined as the units of `r` multiplied with the units of `c`. Since an iterative process is used, the parameter `maxit` is set to 500. If the attribute `iterations` is equal to `maxit`, then the iterative process has not finished. The attribute `target` gives the association value obtained.

### `is.prob`

The function `is.prob` serves the purpose of verifying whether a given numeric value `x` lies within the bounds of an open or closed interval defined by specified minimum (`min`) and maximum (`max`) values.
By default, the function checks whether `x` falls within the standard open interval (0, 1), often associated with probability values.

```{r}
is.prob(runif(1))
```

In this case, `runif(1)` generates a random numeric value between 0 and 1, and the `is.prob` function confirms that the generated value indeed falls within the standard open interval (0, 1). The result, in this instance, is `TRUE`. The function is particularly useful when it is essential to ascertain whether a given numeric value is within an expected range, such as verifying whether a number represents a valid probability within the unit interval (0, 1). The default settings of the function align with the typical interval used for probabilities, facilitating a straightforward validation process.

### `pprobability`

The `pprobability` function is designed to facilitate the generation and estimation of polynomials for discrete random variables. This versatile function allows us to construct polynomials, estimate both least squares and maximum likelihood solutions, and provides flexibility in specifying various parameters.

```{r}
y <- pprobability(0:2, coef=seq(-2, 2, by=0.1))
str(y)
```

The `pprobability` function, when called with the arguments `pprobability(0:2, coef = seq(-2, 2, by = 0.1))`, performs the following:

1. Generated polynomials: three linear polynomials are generated based on the user-defined coefficients. The coefficients are sampled from the sequence `-2` to `2` in increments of `0.1`. Each polynomial corresponds to a value of the discrete random variable `0:2`.
2. Estimated polynomial: the estimated polynomial is the sum of the generated polynomials.
3. Values of the random variable: the values of the discrete random variable: 0, 1, 2.
4. Sample structure: the sample structure represents the frequency of each value in the random variable. In this case, each value occurs once (`c(0, 1, 2)`).
5. Least squares results: the least squares method is applied to estimate a polynomial. The results include the estimated polynomial, its degree, and coefficients.
6. Maximum likelihood results: the maximum likelihood method is applied to estimate a polynomial. The results include the estimated polynomial, its degree, and coefficients.

The purpose of this function call is to generate and estimate polynomials for a discrete random variable (`0:2`) with a specified set of coefficients. The user-supplied coefficients (`seq(-2, 2, by = 0.1)`) influence the shape and characteristics of the generated polynomials. Both the least squares and the maximum likelihood method are used to estimate the polynomial parameters based on the generated data.

### `prob`

Computes the probability for an interval between `min` and `max` (`max` included, `min` excluded).

```{r}
# Compute the probability for an interval in a uniform distribution
d <- distribution("unif", min=1, max=7)
prob(d)
```

## Simple Linear Regression

### `lm1_data`

This function is designed to create data suitable for performing a simple linear regression with a predefined correlation coefficient. It accepts various parameters, including the desired correlation, the number of squares to decompose, and other options for data manipulation and scaling. The steps the function performs are as follows:

1. Generate `x` and `y` data so that the sum of squares of the values equals `n` and the sum of values equals 0 for both `x` and `y`.
2. Re-scale the data using user-defined center and scale values.
3. Conduct a simple linear regression analysis on the transformed data, allowing users to explore the relationship between `x` and `y` with the specified correlation.

```{r}
n <- sample(4:8, 1)
lm1 <- lm1_data(0.4, nmax=n, xsos=sos100)
print(lm1)
```

### `lmr_data`

The `lmr_data` function serves the purpose of generating data suitable for conducting a simple linear regression analysis. Arguments of the function include:

- `xr` and `yr`: the ranges for the `x` and `y` values can be defined, allowing for controlled data generation.
- `n`: this parameter specifies the number of observations to create.
- `r`: if desired, a target correlation coefficient can be specified. If not provided, the function defaults to a zero correlation.
- `digits`: there is the option to set the precision for rounding the `x` and `y` values individually.

Additional parameters can be passed to the function, which are further used in the underlying `cor_data` function. The function returns an `lm` object, which includes various components such as the generated `x` and `y` values, sums, means, variations, covariance, correlation, and the coefficients of a linear regression model.

```{r}
n <- sample(c(4,5,8,10),1)
lmr <- lmr_data(c(1,3), c(2,8), n=n, r=sample(seq(0.1, 0.9, by=0.05), 1))
print(lmr)
```

## Tables

### `incomplete_table`

The `incomplete_table` function is designed to complete a relative contingency table with missing values in such a way that the overall table entries can be recomputed. If a solution cannot be found, the function will generate an error. Consider a relative contingency table represented by the matrix `tab`, which has some missing values. Seven missing values must be filled in to make the table computationally complete.

```{r}
tab <- rbind(c(0.02, 0.04, 0.34), c(0.02, 0.28, 0.3))
result <- incomplete_table(tab, 7)
print(result)
# Here column no. 4 and row no. 3 constitute the summaries of their respective columns and rows.
```

Additionally, the function provides information about the filled-in values in the `fillin` attribute and the fully reconstructed table in the `full` attribute. The `fillin` matrix indicates which cells were filled and corresponds to the missing values in the incomplete table. The `full` matrix is the complete contingency table with all missing values filled.

```{r}
# attr(,"fillin")
#      [,1] [,2]
# [1,]    2    2
# [2,]    2    2
# [3,]    4    4
# [4,]    1    1
# [5,]    3    3
# [6,]    3    3
# [7,]    1    1
```

In summary, the `incomplete_table` function helps to impute missing values in a relative contingency table, ensuring that the resulting table remains consistent and computationally valid.

```{r}
# attr(,"full")
#      [,1] [,2] [,3] [,4]
# [1,] 0.02 0.04 0.34  0.4
# [2,] 0.02 0.28 0.30  0.6
# [3,] 0.04 0.32 0.64  1.0
```

### `table_data`

The `table_data` function is designed to generate a frequency table where each entry can be expressed in the form $2^{p_{ij}} \times 5^{q_{ij}}$. The function enforces the constraints $p_{ij} < m_2$ and $q_{ij} < m_5$. In the event that the algorithm fails to find a solution, an error is raised, prompting us to consider increasing the `unit` parameter for a more refined search. Once a valid table is identified, normalization is performed by dividing all entries by an appropriate factor to maintain integer values. Subsequently, a random multiplier of the form $2^p \times 5^q$ is selected, ensuring that the sum of the entries remains less than or equal to the specified limit `n`.
```{r} # Generate a frequency table with 4 rows and 3 columns generated_table <- table_data(nrow = 4, ncol = 3, unit = 20, n = 150, maxit = 5000) # Display the generated frequency table print(generated_table) ``` In this example: - The `table_data` function is applied to create a frequency table with 4 rows and 3 columns. - The `unit` parameter is set to 20, influencing the granularity of the search for a valid table. - The `n` parameter is set to 150, indicating the maximum sum of entries. - The resulting frequency table, denoted as `generated_table`, adheres to the conditions specified by the function, and all entries can be expressed in the form $2^{p_{ij}} \times 5^{q_{ij}}$. ## Tests ### `proptests`, `proptest_data` and `proptest_num` - `proptests` The `proptests` function systematically explores various modifications of the input parameters for `proptest` to generate a comprehensive set of proportion tests. If the `hyperloop` parameter is not specified, it will result in the generation of several hundred tests. The function returns a list of different tests, with the first element being the original `proptest` result. If only a specific element of a `proptest` result is of interest, providing the name of the element in `elem` will return all `proptests` where the specified element is different. ```{r} # Set up a base proportion test n <- 150 x <- sum(runif(n) < 0.6) basetest <- proptest_num(x = x, n = n) # Generate all different tests all_tests <- proptests(basetest, hyperloop = TRUE) str(all_tests) # Generate all different random sampling functions x_functions <- proptests(basetest, elem = "X", hyperloop = TRUE) str(x_functions) ``` In this example, a base proportion test (`basetest`) is created using a sample size (`n`) and the number of successes (`x`). The `proptests` function is then used to explore various modifications of the input parameters, generating all different tests in the first case and all different random sampling functions in the second case. - `proptest_data` Generates data for a binomial test based on specified test properties. This function is particularly useful for simulating scenarios and conducting binomial tests under different conditions. ```{r} # Generate binomial test data with default settings data_d <- proptest_data() # Generate binomial test data with custom settings data_c <- proptest_data( size = 20:50, # Vector of sample sizes prob = seq(0.1, 0.9, by = 0.2), # Vector of probabilities reject = FALSE, # Determines whether the generated data leads to a rejection of the null hypothesis alternative = "less", # Specifies the alternative hypothesis, must be "less" or "greater" alpha = 0.05, # Vector of significance levels norm.approx = TRUE, # Specifies whether a normal approximation should be used maxit = 500 # Maximum number of trials ) str(data_c) ``` - `proptest_num` Computes results for a test on proportions using either `stats::binom.test()` or a normal approximation without continuity correction. The function accepts named parameters or an argument list with parameters. - x: Number of successes. - n: Sample size (default: `sd(x)`). - pi0: True value of the proportion (default: 0.5). - alternative: A string specifying the alternative hypothesis (default: "two.sided"; "greater" or "less" can be used). - alpha: Significance level (default: 0.05). - binom2norm: Can the binomial distribution be approximated by a normal distribution (default: NA = use `binom2norm` function). 
The results may differ from `stats::binom.test()`, since `proptest_num` is designed for binomial tests computed by hand; the p-value it reports may therefore not be reliable.

```{r}
# Example with default parameters
n <- 100
x <- sum(runif(n) < 0.4)
result <- proptest_num(x = x, n = n)
str(result)
```

In this example, the `proptest_num` function is used to compute results for a binomial test with specified parameters. The function returns a list of relevant values, including test statistics, critical values, acceptance intervals, and p-values.

### `ttests`, `ttest_data` and `ttest_num`

The `ttest_data` function generates simulated data tailored for a t-test for a single mean, considering specified test properties. This facilitates the exploration of various scenarios and the evaluation of statistical hypotheses related to the mean. The `ttest_data` function takes the following arguments:

- `size`: a numeric vector specifying sample sizes to be generated, calculated as squares of integers ranging from 3 to 20.
- `mean`: a numeric vector defining potential mean values for the simulated data, ranging from -5 to 5.
- `sd`: a numeric vector determining standard deviations for the generated data, with values ranging from 0.1 to 1 in increments of 0.1.
- `reject`: a logical vector that determines whether the generated values of variable x should result in the rejection of the null hypothesis (default is TRUE). If set to NA, this condition will be disregarded.
- `alternative`: a character vector specifying the alternative hypothesis for the t-test, with options "two.sided", "less", or "greater".
- `alpha`: a numeric vector containing significance levels for hypothesis testing, including common values such as 0.01, 0.05, and 0.1.
- `z`: a numeric vector defining quantiles for the standard normal distribution, used in hypothesis testing; ranges from -4.49 to 4.49 with increments of 0.01.
- `use.sigma`: a logical value indicating whether the standard deviation (`sigma`) should be used in generating data; default is `TRUE`.

```{r}
# Generate t-test data
ttest_data_scenario1 <- ttest_data(
  size = c(25, 64, 121),
  mean = c(0, 2, -2),
  sd = c(0.5, 0.7, 1),
  reject = TRUE,          # Rejection condition
  alternative = "two.sided",
  alpha = c(0.01, 0.05, 0.1),
  z = seq(-3.49, 3.49, by = 0.01),
  use.sigma = TRUE
)
```

In summary, this example represents a situation where we generate t-test data for three different sample sizes and mean values, with specific rejection conditions. The generated data is tailored for hypothesis testing with a two-sided alternative hypothesis and varying significance levels. The condition `reject = TRUE` implies that the null hypothesis will be rejected based on the generated data.

- `ttest_num`

`ttest_num` is a function that computes all the results for a t-test. We test this function with the following exercise, which is intended to produce a one-sample t-test. The exercise is meant to assess whether a new variety of butter is worth launching, based on customers' willingness to pay a certain price.

```{r}
sigma <- sample(5:30, size=1)
ttest <- ttest_num(n = sample((4:8)^2, size=1),
                   mu0 = sample(seq(1.5, 3, by=0.1)+0.5, size=1),
                   mean = sample(seq(1.5, 3, by=0.1), size=1),
                   alternative = 'greater',
                   sd = sample((sigma-3):(sigma+3), size=1)/10,
                   sigma = sigma/10,
                   norm = TRUE)
str(ttest)
```

The exercise is set in the context of a butter manufacturer considering the launch of a new butter variety.
To determine whether it's worth launching, the manufacturer wants to know if customers are willing to pay at least a specific price per pack of the new butter. This is why we use the `ttest_num` function, in order to make an informed decision with the help of a t-test. `ttest_num` computes all the results of the t-test as we can observe: - `n`: The sample size, representing the number of customers randomly selected for the survey. - `mu0`: The price the manufacturer intends to test as its objective. - `mean`: The average spending level of the sample's respondents. - `alternative`: The alternative hypothesis, set to 'greater,' indicating that the manufacturer is interested in testing whether customers are willing to pay more than the target price. - `sd`: The sample standard deviation, which reflects the range of prices that customers are ready to accept. - `sigma`: The population standard deviation, representing the standard deviation of prices in the entire population (unknown by default). - `alpha`: The significance level (set to 0.05). - `ttests` The `ttests` function systematically explores various modifications of the input parameters for t-tests, generating a comprehensive set of possible t-tests. Details regarding the specific parameter values employed can be found below. It is important to note that omitting the hyperloop parameter may result in the generation of approximately 5000 t-tests. The function returns only distinct t-tests, with the primary t-test stored as the first element. If there is interest in a specific element of the t-test, users can specify it using the elem parameter, and the function will return all t-tests where that particular element differs. ```{r} # Generate a base t-test base_ttest <- ttest_num(mean = 1.2, sd = 0.8, n = 30, sigma = 1) # Vary the parameters for hyperloop hyperloop_variation <- list( mean = c(base_ttest$mean - 0.5, base_ttest$mean, base_ttest$mean + 0.5), n = c(20, 30, 40), sd = c(0.7, 0.8, 0.9) ) # Obtain different t-tests with varied parameters different_ttests <- ttests(base_ttest, hyperloop = hyperloop_variation) # Extract t-tests where the element "Conf.Int" differs confint_differing_ttests <- ttests(base_ttest, "Conf.Int", hyperloop = hyperloop_variation) ``` - We start by generating a base t-test (`base_ttest`) with specified parameters such as mean, standard deviation, sample size, and population standard deviation using the `ttest_num` function. - The `hyperloop_variation` parameter is utilized to systematically vary the mean, sample size, and standard deviation in different scenarios. - The `ttests` function is then employed to generate distinct t-tests by modifying the base t-test with the specified variations. The resulting t-tests are stored in the variable `different_ttests`. - Additionally, the function is called again, this time focusing on the specific element "Conf.Int," and returning t-tests where this element differs. The results are stored in the variable `confint_differing_ttests`. This example demonstrates how the `ttests` function can be applied to explore various t-tests by systematically varying parameters, and it highlights the flexibility of extracting t-tests based on specific elements of interest. # Mathematical Computations ## Intervals ### `dbl`, `pos` and `neg` The `pos`, `neg`, and `dbl` functions are designed to generate intervals based on powers of ten. - `pos(pow)`: Generates positive intervals based on powers of ten. - `neg(pow)`: Generates negative intervals based on powers of ten. 
- `dbl(pow)`: Generates intervals that include both positive and negative values based on powers of ten.

```{r}
# Generate double intervals
result_1 <- dbl(2)
print(result_1)

# Generate positive intervals
result_2 <- pos(3)
print(result_2)

# Generate negative intervals
result_3 <- neg(3)
print(result_3)
```

## Polynomials

### `monomial`

The `monomial` function constructs a polynomial in the form of $c \cdot x^d$, where $c$ is the coefficient and $d$ is the degree. The default values are set to create a monomial of degree 1 with a coefficient of 1.

```{r}
degree <- 3
coefficient <- 2

# Generate a monomial with the specified degree and coefficient
result_monomial <- monomial(degree, coefficient)
print(result_monomial)
```

In this example, the `monomial` function is utilized to create a monomial with a degree of 3 and a coefficient of 2. The resulting monomial $2 \cdot x^3$ is then printed.

### `pminimum`

The `pminimum` function calculates the minimum value of a polynomial within a specified interval $[lower, upper]$. It evaluates the polynomial at critical points within the given interval, including the interval's boundaries, and returns the minimum value.

```{r}
# Creating a polynomial and finding the minimum within a specified range
custom_polynomial <- polynomial(c(2, -1, 4, -2)) # Coefficients in increasing order: 2 - x + 4x^2 - 2x^3

# Finding the minimum of the polynomial within the range [-1, 2]
minimum_result <- pminimum(custom_polynomial, -1, 2)

# Displaying the result
print(minimum_result)
```

In this example, a custom polynomial `custom_polynomial` is created using the `polynomial` function with coefficients `c(2, -1, 4, -2)`; since the coefficients are given in increasing order of powers, this represents the polynomial $2 - x + 4x^2 - 2x^3$. The `pminimum` function is then applied to find the minimum value of the polynomial within the specified range $[-1, 2]$. The result is stored in `minimum_result` and represents the minimum value of the polynomial within the given range.

## Rational Approximation

### `fractions` and `is_terminal`

To overcome the rounding problem there is a simple approach: try to use (terminal) fractions. A terminal fraction generates a number with a finite number of digits, for example $\frac{1}{10}=0.1$. The command `fractions` simply calls `MASS::fractions()` to avoid explicitly loading the library `MASS`. The result of calling `fractions` has an attribute `fracs` which contains an (approximate) fraction as a $\frac{numerator}{denominator}$ representation.

```{r}
x <- c(1/5, 1/6)
x
fractions(x)
str(fractions(x))
```

Therefore, `is_terminal` tests if all entries are terminal fractions, which means the denominators may contain only the prime factors two and five.

```{r}
x <- c(1/5, 1/6)
is_terminal(x)
```

Unfortunately, we use a decimal numeral system, which limits the number of possible denominators that lead to terminal numbers; the ancient Babylonian cultures, using a sexagesimal numeral system, had a larger set of denominators leading to terminal numbers.

- `fractions`

`fractions` is a copy of `MASS::fractions`, computing fractions from numeric values.

```{r}
# Create a 5x5 matrix with random values
Y <- matrix(runif(25), 5, 5)

# Display the matrix as fractions using the `fractions` function
fractions(Y)

# Perform matrix operations and display the results as fractions
fractions(solve(Y, Y/5))
fractions(solve(Y, Y/5) + 1)
```

## Solving Equations

### `equal`

Checks whether two numeric values are equal within a given tolerance (default: 1e-6).
```{r}
x <- pi
y <- pi+1e-4
equal(x, y)
equal(x, y, tol=1e-3)
```

### `equations`

The `equations` function is used to define a set of equations using the formula interface. It also provides a LaTeX representation of the formulae. The resulting equations object includes information about the type of equation, its value, associated text, and the interval if applicable.

```{r}
# Defining a system of economics equations
econ_eq <- equations(
  Y ~ C + I + G + (X - M), "Y = C + I + G + (X - M)",
  C ~ c0 + c1*YD,          "C = c_0 + c_1\\cdot YD",
  I ~ I0 - i1*r + i2*Y,    "I = I_0 - i_1\\cdot r + i_2\\cdot Y",
  YD ~ Y - T,              "YD = Y - T",
  T ~ t0 + t1*Y,           "T = t_0 + t_1\\cdot Y",
  M ~ m0 + m1*Y,           "M = m_0 + m_1\\cdot Y",
  X ~ x0 + x1*Y,           "X = x_0 + x_1\\cdot Y",
  r ~ r0,                  "r = r_0"
)
print(econ_eq)
```

In this example, the equations represent components of the Keynesian aggregate expenditure model, where $Y$ is the national income, $C$ is consumption, $I$ is investment, $G$ is government spending, $X$ is exports, and $M$ is imports. The model includes consumption functions, investment functions, taxation, and trade balance.

### `print.equations`

The `print.equations` function serves as an S3 method designed for displaying an equations object containing equations and associated variables. Internally, it generates a data frame, providing a clear representation of the equations and their dependencies.

```{r}
# The equations describe the formulae for a confidence interval of the mean
e <- equations(o~x+c*s/sqrt(n), "v_o=\\bar{x}+c\\cdot\\frac{s^2}{n}",
               u~x-c*s/sqrt(n), "v_u=\\bar{x}-c\\cdot\\frac{s^2}{n}",
               e~c*s/sqrt(n),   "e =c\\cdot\\frac{s^2}{\\sqrt{n}}",
               l~2*e,           "l =2\\cdot e"
)
print(e)
```

In this example, a set of equations is defined to describe the formulae for a confidence interval of the mean. Let's break down the code and understand each part:

- The `equations` function is used to create an equations object (`e`).
- Four equations are defined in terms of variables (`o`, `u`, `e`, `l`) and involve the variables `x`, `c`, `s`, and `n`.
- Each equation is provided in a formula style, representing a statistical formula related to a confidence interval of the mean.
- The `print` function is used to display the equations object (`e`).
- The output presents the equations and associated variables in a structured format.

This example demonstrates how the `equations` function can be utilized to create a set of equations representing statistical formulas.

### `variables`

Is a function that allows the configuration of values, LaTeX representations, and solution intervals for variables within an `equations` object. The first argument must be the `equations` object, followed by named parameters to specify values, intervals, and LaTeX representations for specific variables. This function enables the modification of the `equations` object to incorporate specific variable information.
```{r}
# The equations describe the formulae for a confidence interval of the mean
e <- equations(o~x+c*s/sqrt(n), "v_o=\\bar{x}+c\\cdot\\frac{s^2}{n}",
               u~x-c*s/sqrt(n), "v_u=\\bar{x}-c\\cdot\\frac{s^2}{n}",
               e~c*s/sqrt(n),   "e =c\\cdot\\frac{s^2}{\\sqrt{n}}",
               l~2*e,           "l =2\\cdot e"
)
# Set variable values, intervals, and LaTeX representations
e <- variables(e,
               x=0, "\\bar{x}",
               c=2.58, dbl(2),
               s=1, pos(5), "s^2",
               n=25, pos(5),
               l=pos(5),
               e=pos(5),
               u="v_u",
               o="v_o")
# Print the modified equations object
print(e)
```

The provided R example involves creating a set of equations representing the formulae for a confidence interval of the mean, including variables such as `o`, `u`, `e`, and `l`. Subsequently, the `variables` function is applied to set specific values, intervals, and LaTeX representations for these variables. For instance, `x` is assigned a value of 0, `c` is set to 2.58 with a solution interval supplied by `dbl(2)`, and the LaTeX representation for `s` is defined as "s\^2". The modified equations object is then printed, showcasing the customized variable settings and representations. This approach demonstrates efficient manipulation and customization of mathematical expressions within the R environment.

### `num_solve`

The `num_solve` function is designed to compute the value of a target variable in a set of equations. The equations, representing relationships between variables, are transformed into root-finding problems, and the function attempts to find the roots using the `stats::uniroot()` function. If successful, the computed value of the target variable is returned; otherwise, `numeric(0)` is returned. If the target variable is not specified (`target==''`), the function returns all computed values and steps. The `compute` attribute contains a data frame with information about the computation steps.

```{r}
# The equations describe the formulae for a confidence interval of the mean
e <- equations(o~x+c*s/sqrt(n), "v_o=\\bar{x}+c\\cdot\\frac{s^2}{n}",
               u~x-c*s/sqrt(n), "v_u=\\bar{x}-c\\cdot\\frac{s^2}{n}",
               e~c*s/sqrt(n),   "e =c\\cdot\\frac{s^2}{\\sqrt{n}}",
               l~2*e,           "l =2\\cdot e"
)
# Setting variables and their values
e <- variables(e, x = 0, c = 2.58, s = 1, n = 25, l = pos(5), e = pos(5), u = "v_u", o = "v_o")

# Finding confidence interval length ('l')
ns <- num_solve('l', e)

# Computing all possible values
ns <- num_solve('', e)
print(ns)
```

In this example, the function is used to find the confidence interval length (`l`) based on a set of equations and variable values. Here, the function is also used to compute all possible values for the variables specified in the equations. In both cases, the resulting `ns` object contains information about the computation, including the values of variables and computation steps. The `compute` attribute provides a data frame with details about each variable's value in the computation process.

## Value and Extremes Analysis

### `extremes`

Calculates the real extrema of a univariate polynomial, including minima, maxima, and saddle points. The computation can be tailored to focus on specific categories of extrema.

```{r}
p <- polynomial(c(0,0,0,1))
extremes(p)
```

### `nearest_arg`

`nearest_arg` is a function designed to identify the closest candidate value for each element in the input argument (`arg`). This function serves as an enhanced alternative to the base R function `match.arg`, offering improved tolerance for potential typographical errors.
However, it's important to note that while `nearest_arg` enhances error resilience, detecting an incorrect choice may be challenging if one occurs.

```{r}
# Sample usage of nearest_arg
valid_colors <- c("red", "blue", "green", "yellow", "orange")

# Input color names with potential typos
input_colors <- c("rad", "blu", "grien", "yello", "ornge")

# Applying nearest_arg to find the closest valid color names
result_colors <- nearest_arg(input_colors, valid_colors)

# Displaying the result
cat("Input Colors:", input_colors, "\n")
cat("Nearest Valid Colors:", result_colors, "\n")
```

- `valid_colors`: A vector representing the valid color names.
- `input_colors`: A vector containing color names with potential typos or deviations.
- `result_colors`: The output of `nearest_arg` applied to `input_colors` and `valid_colors`.

In this example, `nearest_arg` is utilized to identify the nearest valid color name for each input color. The function demonstrates its effectiveness in handling potential typos or variations in the input color names. The result provides a vector of the nearest valid color names, showcasing how `nearest_arg` enhances error tolerance and accurately identifies the closest valid candidates in a given set.

### `unique_max`

Checks if the numeric vector `x` has a unique maximum. This function evaluates whether the gap between the largest and second-largest values in `x` is greater than a specified minimum distance, `tol`.

```{r}
# Generate a vector with a unique maximum
vec_unique_max <- c(3, 7, 5, 2, 8, 6, 4)

# Check if vec_unique_max has a unique maximum with the default tolerance (1e-3)
result_default_tol <- unique_max(vec_unique_max)

# Check if vec_unique_max has a unique maximum with a larger tolerance (1)
result_large_tol <- unique_max(vec_unique_max, tol = 1)

# Print the results
cat("Default Tolerance Result:", result_default_tol, "\n")
cat("Large Tolerance Result:", result_large_tol, "\n")
```

# Exercise Generation

## Structured Exercise Development

### `all_different`

For solutions in multiple-choice exercises you want to ensure that the numerical results are not too close to each other. Therefore, `all_different` checks if the differences between the entries in `obj` are larger than some given value `tol`.

```{r}
x <- runif(20)
all_different(x, 1)    # Is the minimal distance at least 1?
all_different(x, 1e-4) # Is the minimal distance at least 0.0001?
```

### `calledBy`

Checks if the call stack, obtained from `base::sys.calls`, contains a call from the specified function (`fun`).

```{r}
# Define functions funa and funb
funb <- function() { calledBy('funa') }
funa <- function() { funb() }

# Call funa; inside, funb checks whether it was called by funa
result <- funa()

# Display the result
str(result)
```

### `exercise`

The `exercise` function is used to create and modify a data structure for exercise data. `exer` represents an existing exercise data structure or NULL to create a new one.

```{r}
# Create a new exercise data structure
exer <- exercise()

# Add a parameter 'x' to the exercise data structure
exer <- exercise(exer, x = 3)
str(exer)
```

## Solution Handling and Result Formatting

### `solutions`

- `sol_num` generates a numerical solution object for a given numeric value. The function automatically determines tolerance if not provided, considering the range of values. Additionally, it captures relevant information about the source context, including the script's name or file path.
```{r} # Example 1: Calculating a solution with default parameters s <- sol_num(sqrt(2)) str(s) # Example 2: Numeric solution with tolerance and rounding sol_num(pi, tol=0.001, digits=3) ``` - `sol_int` extends the functionality of the `sol_num` function by rounding the given numeric value to the nearest integer. It generates an integer solution object with optional parameters for tolerance and rounding digits. ```{r} # Example: Creating an integer solution integer_solution <- sol_int(7.89, tol=0.01, digits=2) str(integer_solution) ``` - `sol_mc` generates a multiple-choice solution object by combining false (x) and true (y) answers. The number of false and true answers to include can be altered, shuffling options can be specified, and a default option when none of the choices apply can be provided. The resulting solution object captures the answer list, solution indicators, and relevant source context information. ```{r} # Example: Creating a multiple-choice solution for a biology quiz plants <- c("Moss", "Fern", "Pine", "Rose", "Tulip") flowering_plants <- c("Rose", "Tulip") non_flowering_plants <- setdiff(plants, flowering_plants) s_plants <- sol_mc(non_flowering_plants, flowering_plants, sample=c(2, 2), shuffle=FALSE, none="None of the above") str(s_plants) ``` - `sol_ans` extracts the answer list from a multiple-choice solution object created using the `sol_mc` function. It facilitates the presentation of correct and potential answer choices in various formats, including LaTeX for exams2pdf compatibility. ```{r} # Example: Extracting correct answers from a biology quiz s <- sol_mc(c("Oak", "Maple", "Rose"), c("Tulip", "Sunflower"), sample=c(2, 1), none="No valid options") sol_ans(s) ``` - `sol_tf` extracts the solution list (True or False) from a multiple-choice solution object created using the `sol_mc` function. It facilitates the presentation of binary representations of correct and incorrect choices in various formats, including LaTeX for exams2pdf compatibility. ```{r} # Example: Extracting True/False solutions from a chemistry quiz s <- sol_mc(c("Copper", "Silver", "Gold"), c("Oxygen", "Carbon"), sample=c(2, 1), none="None of the above") sol_tf(s) ``` - `sol_info` generates a Meta-Information block for a given solution object. It provides additional context and details about the solution, including its type, solution values, tolerance, and source context. ```{r} # Example: Displaying Meta-Information for a statistical analysis stat_analysis <- sol_num(mean(c(5, 8, 12, 15, 18)), tol = 0.01, digits = 2) info_stat <- sol_info(stat_analysis) cat(info_stat) ``` ### `int_result` and `num_result` `num_result` is a function that generates a list containing various elements for numeric results. The key components of this list include: - `x`: The original numeric values. - `fx`: The rounded values with the `exams::fmt()` function, represented as characters. - `tolerance`: The specified tolerance for rounding. - `digits`: The number of digits used for rounding. It's important to note that `x` can contain more than one numeric value, and in such cases, ensure using `...$x[1]` for numeric exercises. If `digits` are not explicitly provided and `length(x) > 1`, the function calculates `ceiling(-log10(min(diff(sort(x)), na.rm=TRUE)))`. If `digits` are not provided and `length(x) == 1`, it uses `3 + ceiling(-log10(abs(x)))`. If no tolerance is specified, `tolmult * 10^(1 - digits)` is employed. Additionally, the auxiliary function `int_result` can be used when the result is an integer number. 
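Before turning to `int_result`, the default rules above can be illustrated in plain base R; the following sketch re-implements the documented formulas for `digits` and the tolerance rather than calling any package internals:

```{r}
# Vector with more than one value: digits are derived from the smallest gap
x <- c(1.01, 1.02, 1.5)
digits <- ceiling(-log10(min(diff(sort(x)), na.rm = TRUE)))
digits                    # smallest gap is 0.01, hence 2 digits
tolmult <- 1
tolmult * 10^(1 - digits) # default tolerance: 0.1
```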
`int_result` itself calls `num_result(x, 0, 0.1, 1, ...)`, i.e. it works with a fixed tolerance of 0.1.

The exercise below ("Bluthochdruck") generates random values, for instance for the significance level `alpha`, the sample size `n`, and the observations `X`, and computes a confidence interval for the mean. The `num_result` function is then employed to format and round the results appropriately for use in statistical exercises. The overall goal is to create a dynamic and varied set of exercises with numerical outcomes based on the specified parameters.

```{r}
# Exercise "Bluthochdruck"
alpha <- sample(c(0.01, 0.02, 0.05, 0.1, 0.2), 1)
n <- sample(5:15, 1)
smean <- 80:160
ssig <- 1:50
ski <- sample(smean,1)
sigma <- sample(ssig,1)
a <- ski-sigma
b <- ski+sigma
X <- sample(seq(a,b,1),n,replace=TRUE)
#part a
xBar <- round(mean(X))
s2 <- var(X)
s2 <- round(s2)
s <- round(sqrt(s2),2)
#part c
c <- round(qt(1-alpha/2, n-1), 3)
v_u <- xBar - c * sqrt(s2/n)
v_o <- xBar + c * sqrt(s2/n)
dig <- 1-floor(log10((c-qnorm(1-alpha/2))*sqrt(s2/n)))
sc <- num_result(v_u, digits=dig, tolmult=1)
print(sc)
```

This example demonstrates how to generate random data, perform statistical calculations, and use the `num_result` function to obtain a numerical result for a confidence interval. The focus is on rounding precision and tolerance. Here the `num_result` function is called with the lower confidence limit `v_u`, specifying the desired precision (`digits`) and a tolerance multiplier (`tolmult`).

## File Manipulation and Document Enhancement

### `makekey`

The `makekey` function generates a character key from a vector of integers. It takes a numeric vector `index` as input and converts each element into a character, creating a comma-separated string representation of the indices.

```{r}
makekey(c(3, 7, 10))
```

- The function `makekey` is applied to the numeric vector `c(3, 7, 10)`.
- Each numeric value in the vector is converted to a character.
- The resulting characters are then joined into a single string, separated by commas.
- In this specific example, `makekey(c(3, 7, 10))` generates the key "3, 7, 10".

### `moodle_m2s`

The `moodle_m2s` function addresses a limitation in the `exams` package by enabling support for multiple-choice questions with multiple correct answers, a feature allowed by Moodle but not directly supported by `exams`. This function processes an XML file created by `exams.forge`, specifically adapting the representation of multiple-choice questions:

- Changes `<single>...</single>` to `<single>true</single>`.
- Adjusts the `fraction` attribute in `<answer fraction="...">` tags: if the fraction is less than 0, it is set to zero, and if it is greater than 0, it is set to 100.

If the file does not have a .xml extension, .xml is appended. Finally, the modified XML code is saved in `newfile`.
```{r}
# Modifying a Moodle XML file for multiple-choice questions with multiple correct answers

# Example 1: Using moodle_m2s on a specified file
# Assuming 'my_moodle_file.txt' is the original Moodle XML file
# original_file <- "my_moodle_file.txt"

# Applying moodle_m2s to modify the XML file
# modified_file <- moodle_m2s(original_file)

# Displaying the name of the modified XML file
# cat("Example 1: Modified XML file saved as:", modified_file, "\n")

# Example 2: Using moodle_m2s on a file from the exams.forge package
# if (interactive()) {
#   # Creating a temporary file with .xml extension
#   newfile <- tempfile(fileext=".xml")
#   # Using moodle_m2s on the 'klausur-test.xml' file from the exams.forge package
#   moodle_m2s(system.file("xml", "klausur-test.xml", package="exams.forge"), newfile=newfile)
#   # Opening the modified XML file for editing
#   file.edit(newfile)
# }
```

In the first example, the `moodle_m2s` function is applied to address the limitation in the `exams` package regarding multiple-choice questions with multiple correct answers. The original Moodle XML file is assumed to be named `my_moodle_file.txt`. The function processes this file, making the necessary adjustments, such as changing `<single>...</single>` to `<single>true</single>`. It also adjusts the `fraction` attribute in `<answer fraction="...">` tags, ensuring that it is set to zero if less than 0 and set to 100 if greater than 0. The modified XML code is then saved in a new file, and the name of the modified XML file is printed. It's important to note that the function automatically appends `.xml` to the file name if it does not already have a `.xml` extension.

The second example demonstrates the interactive use of the `moodle_m2s` function. It creates a temporary file with a `.xml` extension and applies the function to the `klausur-test.xml` file from the `exams.forge` package. The modified XML file is then opened for editing using `file.edit`. If run interactively, the modifications made by the function can also be viewed and edited.

### `spell`

The `spell` function conducts a spell check on RMarkdown files while selectively disregarding specified keywords commonly used in the context of `exams`. This is achieved through the utilization of the `spelling::spell_check_files()` function.

```{r}
# Perform spell check on an RMarkdown file, ignoring specific keywords
# spell_result <- spell("path/to/my/file.Rmd")

# Alternatively, perform spell check on multiple files
# spell_result_multiple <- spell(c("path/to/file1.Rmd", "path/to/file2.Rmd"))

# Display the spell check results
# print(spell_result)
```

In this example:

- The `spell` function is used to conduct a spell check on an RMarkdown file located at "path/to/my/file.Rmd" while ignoring specified keywords common in `exams`.
- Alternatively, the function is applied to multiple files by passing a vector of file paths.
- The results of the spell check are stored in the `spell_result` and `spell_result_multiple` variables.

# String Manipulation

## Conditional String Output

### `catif`

Calls `cat` if the specified condition (`cond`) is TRUE.

```{r}
# Call catif with TRUE condition
catif(TRUE, "PDF")

# Call catif with FALSE condition
catif(FALSE, "Moodle") # There is no output with this condition
```

### `nosanitize`

The `nosanitize` function allows us to bypass any sanitization procedures on character vectors. It is designed for situations where no additional sanitization or modification of strings is required, providing us with direct access to the original unaltered data.
```{r}
original_strings <- c("Hello, World!", "", "1234567890")

# Applying nosanitize to preserve original strings
unsanitized_strings <- nosanitize(original_strings)
print(unsanitized_strings)
```

In this example, the `nosanitize` function is used to process a vector of strings (`original_strings`) without performing any sanitization. The resulting `unsanitized_strings` vector preserves the original content, including any potentially unsafe characters or HTML tags.

## Number to String Conversion

### `fcvt`

The `fcvt` function converts a numeric vector to a string containing either a floating-point or a fractional number. It is particularly useful for representing repeating or recurring decimals as rational numbers. The function supports various options for controlling the output format.

- `x`: Numeric vector to be converted.
- `nsmall`: Number of decimal places for floating-point numbers.
- `plus`: Logical, indicating whether to include a plus sign for positive numbers.
- `denom`: Integer controlling the output format:
  - If negative, decimal-point numbers are always used (default).
  - If zero, a mix of decimal-point and fractional numbers is used (whichever is shorter).
  - If one, fractional numbers are used except for integers.
  - If larger than one, the denominator is set to `denom` if possible.

```{r test}
# Example 1
x3 <- c((0:16)/8, 1/3)
fcvt(x3)
# Example 2
fcvt(x3, denom=0)
# Example 3
fcvt(x3, denom=1)
# Example 4
fcvt(x3, denom=8)
```

### `num2str`

Converts a set of numeric variables to a list of string representations, allowing for both decimal and fractional number formats. The function takes numeric variables as arguments and an optional denominator for the fractional representation. The result is a list where each element corresponds to the string representation of a numeric variable.

```{r}
x <- 1
str(num2str(x))
y <- 2
str(num2str(x, y))
str(num2str(x, y, z=c(x,y)))
```

## Quote and Prefix and/or Suffix Manipulation

### `affix`, `unaffix`

- `affix` adds a specified prefix and/or suffix to a character vector.

```{r}
random_values <- runif(5)
new_value <- affix(random_values, prefix = "$", suffix = "$")
```

- `unaffix` removes specified prefixes and/or suffixes from a character vector.

```{r}
random_numbers <- c("$15.3", "$7.9", "$22.6")
new_numbers <- unaffix(random_numbers, prefix = "$", suffix = "")
```

### `cdata`, `uncdata`

- `cdata` adds a `<![CDATA[` prefix and `]]>` suffix to a character vector, ensuring proper encapsulation for XML or HTML data content.

```{r}
new_data <- c(5.5, 12.3, 8.9)
cdata_representation <- cdata(new_data)
```

- `uncdata` removes the `<![CDATA[` prefix and `]]>` suffix from a character vector, commonly used in XML and HTML processing.

```{r}
cdata_numbers <- c("<![CDATA[5.5]]>", "<![CDATA[12.3]]>", "<![CDATA[8.9]]>")
new_numbers <- uncdata(cdata_numbers)
```

### `bracket`

Adds a ( as prefix and ) as suffix to a (character) vector.

```{r}
existing_values <- c(10, 20, 30)
new_values <- bracket(existing_values)
```

### `math`

Encloses a character vector with the dollar symbol (\$) as both prefix and suffix, often used for mathematical expressions.

```{r}
numeric_vector <- c(3.14, 2.718, 1.618)
math_representation <- math(numeric_vector)
```

### `unquote`

Eliminates double quotes as both prefix and suffix from a character vector.

```{r}
quoted_values <- c("\"42.0\"", "\"8.8\"", "\"16.5\"")
unquoted_values <- unquote(quoted_values)
```

### `breaks`

Generates a set of breakpoints for a given data vector `x`. The breaks can be either equidistant or non-equidistant.
If the `width` parameter is not specified, it defaults to the first difference of the rounded values from `pretty(x)`. The `probs` parameter defines the number of quantiles or a vector of probabilities with values in [0, 1]. If the `width` is too large, using `probs` may result in equidistant breaks.

```{r}
# Generate breaks for a random normal distribution
x <- rnorm(100, mean = 1.8, sd = 0.1)
breaks(x)

# Generate breaks with specified width for the same distribution
breaks(x, 0.1)

# Generate quantile-based breaks with specified width for the distribution
breaks(x, 0.1, probs = 4)
```

## Vector to String Conversion

### `as_fraction`

Converts numeric values into fractions, optionally in LaTeX format and allowing sorting.

```{r}
x <- round(runif(5), 2)
as_fraction(x)
as_fraction(x, latex = TRUE)
```

### `as_obs`

Creates a string representing observations with optional sorting and LaTeX formatting.

```{r}
# Taken from the exercise "Niederschlag"
smean <- 250:350
ssig <- 1:10
ski <- sample(smean, 1)
sigma <- sample(ssig, 1)
a <- ski-sigma
b <- ski+sigma
repeat{
  X <- sample(seq(a,b,1),5,replace=TRUE)
  xbar <- sum(X)/5
  if (abs(xbar-round(xbar))<1e-3) break
}
#part a
sumSize <- sum(X)
xBar <- round(xbar,2)
S2 <- round(var(X), 2)
sx <- as_obs(X, last=" und ")
```

### `as_string`

Converts a vector or list of values into a readable string with specified separators.

```{r}
# Taken from the exercise "Dart 2"
fields <- c(6, 13, 4, 18, 1, 20, 5, 12, 9, 14, 11, 8, 16, 7, 19, 3, 17, 2, 15, 10)
N <- 82
ind <- sort(sample(20, 2))
mname <- paste0("eines der Felder, die zu den Nummern ",
                as_string(fields[ind[1]:ind[2]], last=" oder "),
                " gehören")
print(mname)
```

### `as_sum`

Creates a string representation of a sum expression for numeric values.

```{r}
x <- round(runif(5), 2)
as_sum(x)
```

## Miscellaneous Functions

## Function Helper

### `gapply`

The `gapply` function executes a given function (`FUN`) for all combinations of parameters specified in the ellipsis (`...`). This facilitates grid application, where each combination of parameters is applied to the function. Wrapping an argument in `I(.)` prevents it from being interpreted as grid values. If an error occurs during the execution of the function, the corresponding result will not be stored, and missing indices may be observed in the returned list.

```{r}
# Execute 4 function calls: sum(1,3,5:6), sum(1,4,5:6), ..., sum(2,4,5:6)
gapply("sum", 1:2, 3:4, I(5:6))
```

## Formatting

### `replace_fmt`

The `replace_fmt` function is designed to substitute names within a text with values that are formatted either through the `exams::fmt()` function or as strings. This facilitates the integration of formatted values or strings into a given text.

```{r}
# Formatting numeric values with a list specifying precision for each variable, overriding y's precision to 0
result1 <- replace_fmt("\\frac{x}{y}", x = 2, y = 3, digits = list(2, y = 0))

# Formatting LaTeX expressions as strings
result2 <- replace_fmt("\\frac{x}{y}", x = "\\\\sum_{i=1}^n x_i", y = "\\\\sum_{i=1}^n y_i")
```

The first example showcases custom precision for each variable using a list, with `y` overridden to have zero digits. The second example illustrates the use of LaTeX expressions as strings, incorporating them into the formatted LaTeX expression.

# LaTeX and HTML Functions (Multi-Format Rendering Functions)

## Introductory LaTeX Functions

### `answercol`

Customizes LaTeX documents by specifying the number of answer columns using the `\def\answercol{n}` command.
```{r}
# Set the number of answer columns to 2 in the LaTeX document
answercol(2)
```

### `hypothesis_latex`

This function generates a structured data frame to represent test hypotheses. The resulting data frame includes various columns:

- `h0.left`: Represents the left value in the null hypothesis, typically denoted as $\mu$ or $\pi$.
- `h0.operator`: Indicates the operator used in the null hypothesis, selected from eq, ne, lt, le, gt, or ge.
- `h0.right`: Denotes the right value in the null hypothesis, often expressed as $\mu_0$, $\pi_0$, or a hypothetical value.
- `h1.left`: Signifies the left value in the alternative hypothesis, typically $\mu$ or $\pi$.
- `h1.operator`: Specifies the operator in the alternative hypothesis, chosen from eq, ne, lt, le, gt, or ge.
- `h1.right`: Represents the right value in the alternative hypothesis, usually $\mu_0$, $\pi_0$, or a hypothetical value.
- `H0`: Provides the LaTeX representation of the null hypothesis.
- `H1`: Presents the LaTeX representation of the alternative hypothesis.
- `match.left`: Indicates whether the left values in the null and alternative hypotheses match.
- `match.right`: Specifies whether the right values in the null and alternative hypotheses match.
- `match.operator`: Determines whether the operators in the null and alternative hypotheses together cover all real numbers.
- `match.type`: Describes the matching type as wrong, left.sided, right.sided, two.sided, greater, or less.

If the null hypothesis is not provided, it is determined from the alternative hypothesis. Valid values for the alternative and null include two.sided, greater, less, eq, ne, lt, le, gt, or ge.

```{r}
hypothesis_latex("\\mu",
                 alternative=c("eq", "ne", "lt", "le", "gt", "ge"),
                 null=c("eq", "ne", "lt", "le", "gt", "ge"))
```

Here the function `hypothesis_latex` is used to generate a data frame that represents different hypotheses related to the population mean ($\mu$). Let's break down the key components of this example:

- `\\mu`: The symbol for the population mean in LaTeX format, which is specified as the first argument to the function.
- `alternative`: A vector specifying the alternative hypotheses. In this example, the alternatives include:
  - `eq`: Equality
  - `ne`: Inequality
  - `lt`: Less than
  - `le`: Less than or equal to
  - `gt`: Greater than
  - `ge`: Greater than or equal to
- `null`: A vector specifying the null hypotheses. It includes the same set of hypotheses as the `alternative` vector.

The function will generate a data frame with columns representing various aspects of the hypotheses, such as left and right values, operators, LaTeX representations, and matching criteria. The resulting data frame will contain rows corresponding to all possible combinations of operators in the null and alternative hypotheses. Each row represents a unique hypothesis scenario. The `match` columns indicate whether the left and right values, as well as the operators, match between the null and alternative hypotheses. In essence, this example explores and generates a comprehensive set of hypotheses involving the population mean with different combinations of operators in both null and alternative hypotheses.

### `latexdef`

Enhances LaTeX document customization by adding a `\def\name{body}` command, enabling the inclusion of personalized definitions within the document body.
```{r} latexdef("myvariable", "42") ``` ### `pdensity` and `toLatex` The `pdensity` function generates a density function in a specified interval [a, b], where the endpoints a and b are sampled from the input vector x. The function can create either a linear (power=1) or constant (power=0) density function. It samples a specified number of elements (`size`) without replacement and calculates the values of the distribution function. `toLatex` generates a LaTeX representation of the distribution and its parameters. ```{r} # Taken from the exercise "Constant_Density" ops <- c("\\leq", "<", "\\geq", ">") sym <- sample(1:2, size=2, replace=TRUE) dens <- pdensity(-5:5, size=4, power=0) xdens <- toLatex(dens$pcoeff, digits=FALSE) tdens <- toLatex(dens$pcoeff, digits=FALSE, variable="t") tdist <- toLatex(integral(dens$pcoeff), digits=FALSE, variable="t") str(dens) print(tdist) ``` In this exercise, the `pdensity` function is used to generate a density function within a specified interval. The `pdensity` function is called with the following parameters: - `x`: The vector `-5:5` is provided, from which the endpoints of the interval will be sampled. - `size`: `4` elements will be sampled without replacement. - `power`: `0` specifies that a constant density function should be generated. The resulting `dens` object contains information about the generated density function. Specifically, `dens$pcoeff` holds the coefficients of the generated density function. - `toLatex` is used to convert the coefficients of the density function to LaTeX format. - `xdens`: The coefficients without any specific variable, essentially the constant terms. - `tdens`: The coefficients with the variable "t" specified. - `tdist`: The integral of the density function with respect to "t" is converted to LaTeX. ### `toLatex` After getting a glimpse of the `toLatex` function in the previous example, let's now explore it further in detail. The `toLatex` S3 method is a versatile tool for generating LaTeX representations, focusing on statistical distributions and parameters. Derived functions cover a range of scenarios, including solution paths, matrices, polynomials, and equation solutions through tools like `num_solve()`. This suite provides a practical toolkit for producing LaTeX output across various mathematical and statistical contexts. #### `toLatex.distribution` Generates LaTeX representation for statistical distributions and their parameters. #### `toLatex.equation_solve` This function retrieves a LaTeX representation of the solution path obtained through the use of `num_solve()`. It inherits parameters from the base `utils::toLatex` function, providing compatibility with its usage. #### `toLatex.html_matrix` Produces a LaTeX representation for matrices with limited style options. #### `toLatex.polynomial` Generates a LaTeX representation for polynomials. #### `toLatex.prob_solve` Presents solution pathways in LaTeX/MathJax using an align* environment. ### `toHTMLorLatex` This function produces either an HTML or LaTeX representation of a matrix, contingent on whether the function is invoked within the context of `exams2pdf`. ```{r} # Example: Generating HTML or LaTeX representation based on context matrix_example <- html_matrix(matrix(1:4, nrow = 2)) result <- toHTMLorLatex(matrix_example) str(result) ``` In this example, the `toHTMLorLatex` function is employed to generate either an HTML or LaTeX representation of a matrix. 
The choice between HTML and LaTeX output depends on whether the function is called within the context of `exams2pdf`. The resulting representation is then printed to the console. Adjust the matrix content and structure as needed for the specific use case.

## Supporting Functions for Math LaTeX Output

### `lsumprod`, `lsum`, `lprod`, `lmean`, `lvar`, `lbr`, `lsgn` and `lvec`

1. `lsumprod`: Creates a LaTeX printout of the sum of the products of corresponding elements in vectors `x` and `y`, including brackets if any element in `x` or `y` starts with a minus sign.

```{r}
lsumprod(-2:2, (1:5)/10)
```

This example generates the LaTeX expression for the sum of products: $$ \left(-2\right) \cdot 0.1 + \left(-1\right) \cdot 0.2 + 0 \cdot 0.3 + 1 \cdot 0.4 + 2 \cdot 0.5 $$

2. `lsum`: Creates a LaTeX printout of the sum of elements in vector `x`.

```{r}
lsum(-2:2)
```

This example generates the LaTeX expression for the sum: $$ -2-1+0+1+2 $$

3. `lprod`: Creates a LaTeX printout of the product of elements in vector `x`.

```{r}
lprod(-3:2)
```

This example generates the LaTeX expression for the product: $$ (-3) \cdot (-2) \cdot (-1) \cdot 0 \cdot 1 \cdot 2 $$

4. `lmean`: Creates a LaTeX printout of the mean of elements in vector `x`.

```{r}
lmean(-2:2)
```

This example generates the LaTeX expression for the mean: $$ \frac{-2-1+0+1+2}{5} $$

5. `lvar`: Creates a LaTeX printout of the variance of elements in vector `x`.

```{r}
lvar(1:5)
```

`lvar(x)` will generate a LaTeX printout for the variance of the vector `x`. The output will be a mathematical representation of the variance formula: $$ \frac{(1 - \bar{x})^2 + (2 - \bar{x})^2 + (3 - \bar{x})^2 + (4 - \bar{x})^2 + (5 - \bar{x})^2}{5} $$ where $\bar{x}$ is the mean of the vector `x`.

6. `lbr`: Creates a LaTeX printout of the vector `x` with brackets if any element starts with a minus sign.

```{r}
lbr(-2:2)
```

This example generates the LaTeX expressions for each element with brackets: $$ \left(-2\right), \left(-1\right), 0, 1, 2 $$

7. `lsgn`: Creates a LaTeX printout of the vector `x` with a plus or minus sign at the beginning.

```{r}
lsgn(-3:1)
```

In this example, `lsgn` will generate a LaTeX printout with a plus or minus sign at the beginning of each element. The output will be a LaTeX representation of the vector: $$ -3, -2, -1, +0, +1 $$

8. `lvec`: is a versatile function designed to create a LaTeX printout of a vector `x`. This function allows for the specification of the left and right delimiters for the vector.

```{r}
# Using lvec to create a LaTeX representation of a vector with square brackets
# lvec(c(1, 2, 3), left = "[", right = "]")

# Using lvec to create a LaTeX representation of a vector with angle brackets and custom collapse
# lvec(c("a", "b", "c"), left = "<", collapse = " \\cdot ")
```

### `lprob` and `prob_solve`

- `prob_solve`: Given a set of events, it computes the total or conditional probability of the given event. If no solution is found, it returns NA. Events are specified using uppercase letters, and operators include ! (complementary event), \| (conditional event), and \^ (intersection of events). The `latex` attribute of the return value contains the computation steps; if `getprob` is TRUE, the return value also includes the `prob` vector and a `compute` element with all computation steps.
  - `print`: Shows the solution in ASCII format.
  - `toLatex`: Shows the solution in LaTeX/MathJax with an `align` environment.
- `lprob`: Converts `!A` to $\bar{A}$ and `A^B` to $A \cap B$.
```{r}
# Example: Solving a Genetics Problem
# Consider two genes A and B with the following probabilities:
# P(A) = 0.6, P(B) = 0.4
# P(A|B) = 0.3, P(B|A) = 0.2

# Compute the probability of having both genes A and B (A^B)
result_genetics <- prob_solve("A^B", "A" = 0.6, "B" = 0.4, "A|B" = 0.3, "B|A" = 0.2)

# Print the result
print(result_genetics)
```

In this genetics example, consider genes A and B. The probabilities of having each gene individually (P(A) and P(B)) and the conditional probabilities (P(A\|B) and P(B\|A)) are given. The `prob_solve` function is used to compute the probability of having both genes A and B (A\^B).

```{r}
# Example: Probability Expression Transformation
# Suppose we have a probability expression in a format using ^ and !:
expression <- "!A^B"

# Apply the lprob function to transform the expression
transformed_expression <- lprob(expression)

# Print the original and transformed expressions
cat("Original expression:", expression, "\n")
cat("Transformed expression:", transformed_expression, "\n")
```

In this example, we start with a probability expression `!A^B`. We then apply the `lprob` function to transform the expression by replacing `^` with the LaTeX representation for the intersection (`\\cap`) and `!A` with the LaTeX representation for the complement (`\\bar{A}`).

## Markdown Functions

### `inline`

This function is designed to knit (render) text within an R code chunk. It is utilized to incorporate text-based content into an R Markdown document, providing a convenient way to weave together narrative and code.

```{r}
result <- inline("2 + 2")
cat("The result of the calculation is:", result, "\n")
```

### `rv`

The following excerpt from an exercise defines a random variable `rvt` whose expected value is to be calculated later on.

```{r}
rateperhour <- sample(10:25, 1)
rate <- rateperhour/60
sec <- 60/rate
d <- distribution("exp", rate=rate)
number <- rateperhour
length <- 60
lambda <- rate
rvt <- rv("T", "Wartezeit in Minuten auf den nächsten Wähler")
str(rvt)
```

To define the random variable `rvt`, we use the function `rv`. Here `rv` formats a random variable and its meaning for R Markdown, using a symbol and the explanation of that symbol. The symbol "T" stands for the waiting time in minutes until the next voter arrives at a polling station. In this case, "T" follows an exponential distribution, as we can also observe from the call to `distribution`.

### `template`

This function creates a text template that allows the incorporation of R code snippets. The template, defined as a character string, can include placeholders marked by backticks, where the ellipsis represents variable names. The R code within these placeholders is then replaced by its corresponding evaluation based on the provided parameter values.

```{r}
# Example: Creating a dynamic template with embedded R code
tmpl <- "The sum of `r a` and `r b` is: `r a + b`"
result <- template(tmpl, a = 1, b = 2)
cat(result)
```

### `to_choice`

To determine the correct level of measurement of a variable, we use an Excel file with two columns containing the name of the variable and its level of measurement.

```{r}
# subset of variables we use, variable names are in German
data("skalenniveau")
skalen <- c("nominal", "ordinal", "metrisch")
stopifnot(all(skalenniveau$type %in% skalen)) # protect against typos
skala <- sample(skalenniveau$type, 1)
exvars <- sample(nrow(skalenniveau), 8)
tf <- (skalenniveau$type[exvars]==skala)
sc <- to_choice(skalenniveau$name[exvars], tf)
# Additional answer: Does none fit?
sc$questions <- c(sc$questions, "Keine der Variablen hat das gewünschte Skalenniveau")
sc$solutions <- c(sc$solutions, !any(tf))
sc
```

The `to_choice` function generates an object that can be used in `answerlist` and `mchoice2string`. The first parameter is either a vector or a data frame. The second parameter is a logical vector containing `TRUE` if the element in the vector (or row in the data frame) contains a true answer. The parameter `shuffle` samples from the correct and false answers. The following example could replace the main code from the example above.

```{r}
# Subset of variables we use, variable names are in German
data("skalenniveau")
skalen <- c("nominal", "ordinal", "metrisch")
skala <- sample(skalenniveau$type, 1)
exvars <- sample(nrow(skalenniveau), 8)
tf <- (skalenniveau$type[exvars]==skala)
# select one true and four false answers
sc <- to_choice(skalenniveau$name[exvars], tf, shuffle=c(1,4))
sc
```

By default the answers are sorted; the parameter `order` determines the function used to arrange them (default: `order`). To use the ordering as given, set `order=NULL`.

## HTML Functions

### `html_e2m`

The `html_e2m` function facilitates the creation of an HTML page containing the contents of XML tags that match a specified pattern. By default, it displays the contents of all XML tags. The resulting HTML page is stored in the specified HTML file name. If `name` is set to NULL (default), a temporary file is created. If the specified name does not end with .html, the function appends .html. When `browseURL` is set to TRUE (default), the HTML page is automatically opened in the default web browser. If needed, the contents of XML tags are concatenated with `\n`. Users have the flexibility to customize the concatenation for single XML tags using the `merge` parameter.

```{r}
# if (interactive()) {
#   # Read XML data from an RDS file
#   resexams <- readRDS(system.file("xml", "klausur-test.rds", package="exams.forge"))
#   # Create and display HTML page
#   html_e2m(resexams) # Opens HTML file in the browser
# }
```

### `html_matrix_sk`

A variant for creating an `html_matrix` object. It is important to note that the length of the `fmt` parameter must match either the number of rows (`nrow(m)`) or the number of columns (`ncol(m)`) in the matrix, depending on the `byrow` argument.

```{r}
# html_matrix_sk(m)
# tooltip(sprintf(tooltip, nrow(m), ncol(m)))
# hm_cell(fmt=fmt, byrow=byrow)
```

```{r}
# Create a matrix
m <- matrix(1:6, ncol=2)

# Generate and display an html_matrix object
html_matrix_sk(m, title="", fmt=c("%.0f", "%.1f"))

# Another small example taken from the exercise "Mobil Telephone 2"
a <- runif(4)
pa <- ddiscrete(a)
b <- dpois(0:3, 1)
pb <- ddiscrete(b)
studie <- cbind(pa, pb)
hstudie <- html_matrix_sk(studie, "Studie / $x$", fmt=rep("%3.1f", 2))
print(hstudie)
```

### `html_matrix`, `zebra` and `toHTML`

Returns an HTML representation of a matrix as a table. Any such matrix can be embedded as HTML in an exercise created for Moodle and will be translated by `exams.forge` into HTML.
```{r} library("magrittr") x <- matrix(1:12, ncol=3) hm <- html_matrix(x) toHTML(hm) # hm <- html_matrix(x) %>% zebra() %>% # sprintf("Table has %.0f rows and %.0f columns", nrow(.), ncol(.)) # toHTML(hm) ``` With parameters the appearance of the table can be influenced: - `title` entry at the top left (default: `""`) - `caption` entry for the caption (default: `""`) - `names$col` entry for the column names (default: `colnames(x)`) - `names$row` entry for the row names (default: `rownames(x)`) - `style$table` style for the table (default: `""`) - `style$caption` style for the caption (default: `""`) - `style$title` style for the caption (default: `"background-color:#999999;vertical-align:top;text-align:left;font-weight:bold;"`) - `style$row` style for the row names (default: `"background-color:#999999;vertical-align:top;text-align:right;font-weight:bold;"`) - `style$col` style for the col names (default: `"background-color:#999999;vertical-align:top;text-align:right;font-weight:bold;"`) - `style$cell` style for the col names (default: `c("background-color:#CCCCCC; vertical-align:top; text-align:right;", "background-color:#FFFFFF; vertical-align:top; text-align:right;")`) - `style$logical` style for a logical matrix entry (default: `c("background-color:#CCCCCC; vertical-align:top; text-align:right;", "background-color:#FFFFFF; vertical-align:top; text-align:right;")`) - `style$numeric` style for a numeric matrix entry (default: `c("background-color:#CCCCCC; vertical-align:top; text-align:right;", "background-color:#FFFFFF; vertical-align:top; text-align:right;")`) - `style$char` style for a character matrix entry (default: `c("background-color:#CCCCCC; vertical-align:top; text-align:right;", "background-color:#FFFFFF; vertical-align:top; text-align:left;"`) - `format$title$fmt` parameter to format the title via `sprintf` (default: `"\%s"`) - `format$row$fmt` parameter to format the row names via `sprintf` (default: `"\%s"`) - `format$col$fmt` parameter to format the col names via `sprintf` (default: `"\%s"`) - `format$cell$fmt` parameter to format a matrix entry via `sprintf` - `format$logical$fmt` parameter to format a logical matrix entry via `sprintf` (default: `"\%d"`) - `format$numeric$fmt` parameter to format a numeric matrix entry via `sprintf` (default: `"\%f"`) # General Purpose Functions ## Output Checker ### `firstmatch` Seeks matches for the elements of its first argument among those of its second. If multiple matches are found then the first match is returned, for further details see `charmatch`. ```{r} firstmatch("d", c("chisq", "cauchy")) firstmatch("c", c("chisq", "cauchy")) firstmatch("ca", c("chisq", "cauchy")) ``` ### `gsimplify` The `gsimplify` function is designed to simplify a hyperloop object, primarily utilized in the context of grid applications. The goal is to reduce the complexity of the hyperloop object if simplification is feasible. ```{r} # Execute three t-test calls: t.test(x, -1), t.test(x, 0), t.test(x, 1) ga <- gapply(t.test, x = I(rnorm(100)), mu = -1:1) # No simplification occurs in this case since `data.name` and `conf.int` have lengths larger than one str(gsimplify(ga)) ``` ### `hyperloop` and `unique_elem` For generating answers for multiple choice exercises it is helpful to run the same routine several times with different input parameters. For example students may forget to divide by n-1 or divide by n instead of n. `hyperloop` runs about all parameter combinations. 
# General Purpose Functions

## Output Checker

### `firstmatch`

Seeks matches for the elements of its first argument among those of its second. If multiple matches are found, the first one is returned; for further details see `charmatch`.

```{r}
firstmatch("d", c("chisq", "cauchy"))
firstmatch("c", c("chisq", "cauchy"))
firstmatch("ca", c("chisq", "cauchy"))
```

### `gsimplify`

The `gsimplify` function simplifies a `hyperloop` object, as produced by grid applications such as `gapply`, reducing the complexity of the object where simplification is feasible.

```{r}
# Execute three t-test calls: t.test(x, -1), t.test(x, 0), t.test(x, 1)
ga <- gapply(t.test, x = I(rnorm(100)), mu = -1:1)
# No simplification occurs in this case since `data.name` and `conf.int`
# have lengths larger than one
str(gsimplify(ga))
```

### `hyperloop` and `unique_elem`

For generating the answers of multiple-choice exercises it is helpful to run the same routine several times with input parameters that reproduce typical student errors; for example, students may divide by $n$ instead of $n-1$. `hyperloop` runs a routine over all combinations of the given parameters.

`unique_elem` removes duplicate elements from a `hyperloop` object by considering specific list elements for comparison. As the outcome of each execution might be a list, the deletion process focuses on maintaining distinct elements within the `hyperloop` structure.

`ttest_num` is a routine which computes all the information required for exercises with a $t$-test.

```{r}
x <- runif(100)
correct <- ttest_num(x=x, mu0=0.5, sigma=sqrt(1/12))
str(correct)
```

Now, let us run many $t$-tests (up to 384) with typical student errors. We extract all the different test statistics and choose seven wrong answers and one correct answer, with the condition that all solutions differ by at least 0.005.

```{r}
res <- hyperloop(ttest_num,
                 n     = list(1, correct$n, correct$n+1),
                 mu0   = list(correct$mu0, correct$mean),
                 mean  = list(correct$mu0, correct$mean),
                 sigma = list(correct$sigma, correct$sd, sqrt(correct$sigma), sqrt(correct$sd)),
                 sd    = list(correct$sigma, correct$sd, sqrt(correct$sigma), sqrt(correct$sd)),
                 norm  = list(TRUE, FALSE)
                 )
# extract all unique test statistics
stat <- unlist(unique_elem(res, "statistic"))
# select 7 wrong test statistics such that the difference
# between all chosen test statistics is at least 0.005
repeat {
  sc <- to_choice(stat, stat==correct$statistic, shuffle=c(1,7))
  if (all_different(sc$questions, 0.005)) break
}
# show possible results for a MC question
sc$questions
sc$solutions
```

## Text Processing and Formatting

### `knitif`

The `knitif` function evaluates a logical condition and returns the knitted result of the text argument selected by the outcome.

```{r}
knitif(runif(1) < 0.5, 'TRUE' = "`r pi`", 'FALSE' = "$\\pi=`r pi`$")
```

In the given example, the `knitif` function is employed with the logical condition `runif(1) < 0.5`. Since the condition is random, either text argument may be selected: if it evaluates to `FALSE`, the function knits the text argument associated with `FALSE`, which is "$\\pi=`r pi`$"; otherwise the text for `TRUE` is knitted.

### `now`

If we randomize the tasks and the stories, we end up with a lot of different tasks. If questions arise, we need to identify the exact task a student has received. Therefore we embed a (practically) unique number into each task:

```{r}
substring(now(), 10)
```

The `now` function uses `gsub('.', '', sprintf("%.20f", as.numeric(Sys.time())), fixed=TRUE)` and ensures that a different number is returned each time it is called.

### `nsprintf` (`round_de` and `schoice_de`)

The `nsprintf` function generates text based on the value(s) provided in `n`. Specifically, it includes two sub-functions:

- `round_de`: Returns text with rounding instructions, such as "Round your result to the nearest whole number," "Round your result to one decimal place," or "Round your result to $n$ decimal places."
- `schoice_de`: Returns text indicating that there can be one or more correct answers. It emphasizes that providing one correct answer is sufficient; if multiple answers are given and at least one of them is incorrect, the task is considered incorrectly answered.
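As a quick sketch, assuming `round_de(n)` takes the number of decimal places as its argument (the returned text is in German, as the `_de` suffix suggests):

```{r}
# Assumed usage: the argument is the number of decimal places;
# round_de(0) should ask for a whole number, round_de(2) for two decimals
round_de(0)
round_de(2)
```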
The following example is taken from the exercise "DSL 4":

```{r}
# Example taken from the exercise "DSL 4"
repeat {
  border <- sample(3:10, 1)-1
  lambda <- sample(seq(0.5, 6, by=0.1), 1)
  if (ppois(border, lambda = lambda)>1e-3) break
}
d <- distribution("pois", lambda=lambda)
ptype <- "less"
sc <- num_result(cdf(d, border), 4)
txt <- nsprintf(border, "%i Netzunterbrechungen",
                '0'="keine Netzunterbrechung",
                '1'="eine Netzunterbrechung")
str(txt)
```

In this exercise, the `nsprintf` function creates a text message based on the value of `border`, which represents the number of network interruptions in a specific context. The resulting text is then embedded in the question text for the exercise. Here, `nsprintf` is used with the following parameters:

- `border`: The value to be included in the text.
- `"%i Netzunterbrechungen"` ("%i network interruptions"): The format string indicating where the value of `border` should be inserted; `%i` is a placeholder for an integer.

The following arguments provide alternative text depending on the value of `border`:

- `'0'="keine Netzunterbrechung"`: If `border` is 0, the text "keine Netzunterbrechung" (no network interruption) is used.
- `'1'="eine Netzunterbrechung"`: If `border` is 1, the text "eine Netzunterbrechung" (one network interruption) is used.

The resulting `txt` variable contains a formatted text message that includes the value of `border` and provides context-specific information about network interruptions.

## MIME

### `mime_image`

The `mime_image` function returns the MIME type of an image based on the provided filename extension. In cases where no corresponding MIME type is identified for a given file extension, the function returns the extension itself.

```{r}
image_file <- "example_image.jpg"
# Retrieve MIME type for the given image file
mime_type <- mime_image(image_file)
# Display the result
cat("MIME Type for", image_file, ":", mime_type, "\n")
```

In this example, the `mime_image` function is used to obtain the MIME type for an image file named "example_image.jpg". The resulting MIME type is then printed using the `cat` function.
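As noted above, if no MIME type is known for an extension, the extension itself is returned. A quick sketch with a made-up extension (`someext` is hypothetical):

```{r}
# "someext" has no registered MIME type, so, as described above,
# mime_image should return the extension itself
mime_image("image.someext")
```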