Creating Forest Plots

Most of the forest plotting functions in R are built to be general(ish), but they do not anticipate wanting to plot two different confidence intervals for each estimate. Some of them, like forestploter::forest and forestplot::forestplot can be made to do this, but it takes a bit of hacking to get there. In an effort to make it easier to plot forests with our two sets of confidence intervals, we wrote a function called gg_forest() which constucts a table panel and a plot panel and uses patchwork to put them together in a coherent fashion. Many features of the underlying machinery are exposed in gg_forest, but the function itself relies on some geom_ and scale_ functions that are part of VizTest. This vignette will show some basic use cases of these tools.

The Data

The input data for gg_forest (and the underlying geom_ functions) should be organized in a data frame. It should have columns for estimates, standard lower and upper confidence intervals (e.g., 95 percent) and optionally inferential confidence intervals. While the inferential confidence intervals are optional, we built these functions to work with those intervals alongside the standard intervals. If you want a standard forest plot with only one set of intervals, it is likely that other packages, like those mentioned above, may provide more customization options. In addition to the estimates and intervals, if a summary row (or rows) exists, then there should be a logical variable indicating which rows are the summary - generally you’ll want to print those at the bottom. The data frame should also have a variable that indicates the character expansion for the point sizes. We will show a couple of examples with the built-in esoph data from the datasets package. We will use the emmeans package to make predictions from the model to plot.

First, we can load the data, we will change age into an unordered factor and then run two different models one with age as a covariate and a null model to generate the overall summary.

library(emmeans)
library(VizTest)
library(dplyr)
data(esoph)
esoph$age <- factor(as.character(esoph$age), 
                        levels=c("25-34", "35-44", "45-54", "55-64", "65-74", "75+"))
model1 <- glm(cbind(ncases, ncontrols) ~ age,
              data = esoph, family = binomial())
model_sum <- glm(cbind(ncases, ncontrols) ~ 1,
              data = esoph, family = binomial())

We can use emmeans to generate predictions and confidence intervals for each age group and save them in the data frame called fit.

em <- emmeans(model1, "age")
fit <- print(em)
#>  age   emmean    SE  df asymp.LCL asymp.UCL
#>  25-34 -4.745 1.000 Inf    -6.713    -2.776
#>  35-44 -3.050 0.341 Inf    -3.718    -2.381
#>  45-54 -1.289 0.167 Inf    -1.616    -0.963
#>  55-64 -0.781 0.138 Inf    -1.053    -0.510
#>  65-74 -0.656 0.166 Inf    -0.982    -0.330
#>  75+   -0.869 0.330 Inf    -1.517    -0.221
#> 
#> Results are given on the logit (not the response) scale. 
#> Confidence level used: 0.95

To get the inferential confidence intervals, we can execute viztest() on the emmeans object. We can use get_viztest_level() to pull the confidence level associated with the easiest tests to decipher.

vt <- viztest(em, test_level = .05)
vt
#> 
#> Correspondents of PW Tests with CI Tests
#>   level psame     pdiff       easy  method
#> 1  0.61     1 0.8095238 -0.2390388  Lowest
#> 2  0.75     1 0.8095238 -0.0058850  Middle
#> 3  0.90     1 0.8095238 -0.3696089 Highest
#> 4  0.75     1 0.8095238 -0.0058850 Easiest
#> 
#> All  21  tests properly represented for by CI overlaps.
ic_level <- get_viztest_levels(vt, "easiest")

Next, we can use confint() to get the standard and inferential confidence intervals and transform the predictions into the odds ratio scale.

fit$or <- exp(fit$emmean)
fit$lower <- exp(fit$asymp.LCL)
fit$upper <- exp(fit$asymp.UCL)
ag_data <- aggregate(esoph[,c("ncases", "ncontrols")], list(age = esoph$age), \(x)as.integer(sum(x)))
fit$n_cases <- ag_data$ncases
fit$n_controls <- ag_data$ncontrols
fit_ci <- confint(em, level = ic_level)
fit$lower_ici <- exp(fit_ci$asymp.LCL)
fit$upper_ici <- exp(fit_ci$asymp.UCL)

Now, we can do the same for the summary data. Here we simply generate a prediction for the model with predict() and confidence intervals using confint(). We do not need to make visual testing intervals for the summary value.

pred_sum <- predict(model_sum, type="link", se.fit=TRUE)
ci_sum <- confint(model_sum)
fit_sum <- data.frame(
  age = "Summary", 
  emmean = pred_sum$fit[1],
  SE = pred_sum$se.fit[1],
  asymp.LCL = ci_sum[1], 
  asymp.UCL = ci_sum[2],
  or = exp(pred_sum$fit[1]),
  lower = exp(ci_sum[1]),
  upper = exp(ci_sum[2]), 
  n_cases =sum(ag_data$ncases), 
  n_controls = sum(ag_data$ncontrols), 
  lower_ici = NA, 
  upper_ici = NA
)

We can then put the data together in the same place.

fit <- rbind(fit[,-4], #remove df column
      fit_sum) 

# add row_number indicator for later. 
fit$row_number <- 1:nrow(fit)

Now, there are some other variables we can add since the data are all in the same place. We need to add a logical flag for the summary row(s), a point_size scaling variable, which we will set to the inverse of the sampling variance estimate and a label variable that puts the stimuli in the right order. Sometimes, we may want to reorder the rows based on the value of the odds ratio. In those cases, you could use reorder_forest() for this. It works like reorder(), but it forces the summary row to be in the right place. In this case, since the age levels are already ordered in a useful way, we just want to reverse the values so 25 has the highest value (it will plot at the top) and Summary has the smallest value (it will plot at the bottom).

fit$is_summary <- fit$age == "Summary"
fit$point_size <- 1/fit$SE^2
fit$age_label <- factor(fit$age, levels=rev(fit$age))

The fit object now has all the relevant data to make a forest plot.

Forest Plots using `gg_forest()`.

We first show how to do this with the gg_forest() wrapper. We will show the function first and then explain its arguments.

out <- gg_forest(fit, 
                 y = "age_label", 
                 x = "or", 
                 xmin_std = "lower", 
                 xmax_std = "upper", 
                 xmin_ici = "lower_ici", 
                 xmax_ici = "upper_ici",
                 vline = 1, 
                 size_prop = "point_size", 
                 is_summary = "is_summary", 
                 use_log_scale = TRUE, 
                 data_cols = c("Age" = "age_label", "Controls" = "n_controls", "Cases"="n_cases", "OR" = "or"), 
                 max_size=5,
                 table_header_size = 16, 
                 table_text_size = 5, 
                 col_nudge=c(-.15, 0,0,0), 
                 diamond_aspect=15, 
                 diamond_row_frac = 2)

The first argument is the data. The names of the y-variable (the labelled indicator for the rows) and the other variables are in quotes. The _std variables are the standard confidence intervals and the _ici variables are the inferential confidence intervals. The size_prop variable is the variable that will be used to scale the point sizes. The is_summary variable is a logical variable indicating which rows are summary rows. The use_log_scale argument indicates whether to use a log scale for the x-axis, which is common for odds ratios. The data_cols argument takes a named vector of column names to be printed in the table panel and the corresponding variables in the data frame. The max_size argument sets the maximum size for the points. The table_header_size and table_text_size arguments set the font size for the table header and text, respectively. The col_nudge argument takes a vector of horizontal nudges for each column in the table to adjust their position. We use this for the first column which is left-aligned. In this case, the cell text aligns left with the anchor as the tick mark location. The header label is set centered over the tick mark regardless of the justification of the text. The col_nudge parameter allows you to move the cell text to be roughly in line with the header label. It may take a bit of trial and error to get it just right. Finally, the diamond_aspect and diamond_row_frac arguments control the aspect ratio and row fraction for summary rows, which are plotted as diamonds instead of points. If the summary diamond is too flat, increase diamond_aspect. The diamond_row_frac argument is intended to make is so that the diamond never spills over into another row.

The out object now contains two elements - a table and a plot. We can use the plot method to put them together, which is really just a call to patchwork functions.

plot(out)

Forest Plot with Standard (95%) and Inferential (75%) Confidence Intervals. The standard confidence intervals are the thin lines and the inferential confidence intervals are the thick lines. See Armstrong and Poirier (2025) for more details on inferential confidence intervals.

Using Individual `geom_` Functions.

The gg_forest() function is a wrapper that makes it easy to put together a forest plot with two sets of confidence intervals, but the underlying machinery is built on ggplot2 geoms and scales that are part of the VizTest package. If you want more control over the aesthetics, you can use those functions directly.

Let’s make the figure first. We can make confidence bounds with ggplot2’s geom_linerange() using different line widths. We made a utility - geom_foreststripe() which will make a striped figure that could match striping in the table if you use it. The striping should be added before you add other elements. The points are added with geom_forestpoint() the main utility of this function is that it plots the summary and non-summary dots as different shapes and the size argument works differently. The summary diamonds are as wide as the confidence bounds and the non-summary points are scaled proportional to the size_prop variable. We use other ggplot2 functions to add vertical lines, re-scale the x-axis and adjust the maximum size of the points.

library(ggplot2)
g <- ggplot(fit, aes(y=age_label, x=or)) + 
  geom_foreststripe(n_cols=NULL, start = 3) + 
  geom_linerange(aes(xmin = lower, xmax=upper, color="Standard CI"), linewidth=1) + 
  geom_linerange(aes(xmin=lower_ici, xmax=upper_ici, color="Inferential CI"), linewidth=5) + 
  geom_forestpoint(aes(is_summary = is_summary, size=point_size, xmin = lower, xmax=upper), fill="black", 
                   diamond_aspect=10, diamond_row_frac = .9) + 
  theme_classic() + 
  scale_color_manual(values=c("gray60", "black")) + 
  scale_x_log10() + 
  geom_vline(xintercept = 1, linetype="dashed") + 
  labs(x = "Odds Ratio", y="", colour="") + 
  scale_size_area(max_size=5) + 
  guides(size = "none")
g

To make this compatible with the table, we will also need to remove the y-axis ticks and labels.

g <- g + 
  theme(axis.ticks.y = element_blank(), 
        axis.text.y = element_blank(), 
        axis.line.y=element_blank())

Next, we can make the table. For this, we can use geom_foresttable() a function from this package. Again, we will make the table and then discuss the arguments.

data_cols <- c("Age" = "age_label", "Controls" = "n_controls", "Cases"="n_cases", "OR" = "or")
tab <- ggplot(fit, aes(y=age_label)) + 
  geom_foreststripe(n_cols=length(data_cols), start=3) +
  geom_foresttable(
    cols = data_cols,
    col_nudge = c(-.15, 0, 0, 0), 
    fmt = list(
      age_label = as.character, 
      n_controls = function(x)sprintf("%d",x), 
      n_cases = function(x)sprintf("%d",x), 
      or = function(x)sprintf("%.2f",x)
    ), 
    col_align = c("left", "center", "center", "center"),
    size = 4
  ) +
  scale_x_foresttable(data_cols, names(data_cols),
                      col_gap = 1) +   # returns list(scale, theme) — ggplot2 handles lists fine
  theme_void() +
  theme(
    axis.text.x.top   = element_text(face = "bold", size = 16, margin = margin(t = 8)),
    axis.line.x.top   = element_blank(),
    axis.ticks.x.top  = element_blank(),
    axis.ticks.length = unit(0, "pt")
  )
tab

Data Table for Forest Plot.

In the code above, you we start with a ggplot with just the y aesthetic. This should be the same y variable as in the figure. We add striping in the same way. We can specify n_cols to let the function know that there are four columns of data to stripe. The data_cols is a named vector of values. The values are variable names and the names of the vector are what will be printed in the table header row. The col_nudge argument takes a vector of horizontal nudges for each column in the table to adjust their position. We use this for the first column which is left-aligned as described above. The fmt argument takes a list of functions that format each variable for printing. The names of the list should be the same as the variable names in data_cols. I’ve used sprintf() here, but format() would work as well. In fact, any function that could be called on the values to render them to print would work. The col_align argument takes a vector of justifications for each column. Finally, we use scale_x_foresttable() to set the header labels and adjust the gap between columns with col_gap. We end with some theming that turns off elements that we do not need.

We could put these two elements together with patchwork:

library(patchwork)
(tab + g + plot_layout(widths = c(1.25, 1), guides = "collect")) & theme(legend.position="bottom")

Forest Plot with Data Table. The left panel is the data table and the right panel is the forest plot. The striping matches across panels.

Subgroup Analysis

Since these are just made with ggplot, in principle, there are a couple of ways that subgroup analysis could be done. First, it would be trivial to make these two different plots for models built on subsets of the data. Alternatively, you could build a single model with interactions and then subset the data once it is made and make two different plots. That is likelyt he easiest way and we will not pursue that more here.

Another option, would be to use the custom syntax we propose above and add grouping variables. For example, using the esoph data imagine we wanted to look at instances by age-group for high and low tobacco usage groups. We will pull the data together much in the same way that we did above.

data(esoph)
esoph <- subset(esoph, agegp != "25-34")
esoph$agegp <- factor(as.character(esoph$agegp), 
                        levels=c("25-34", "35-44", "45-54", "55-64", "65-74", "75+"))
esoph$high_tob <- ifelse(esoph$tobgp %in% c("20-29", "30+"), "High Tobacco", "Low Tobacco")
esoph$high_tob <- factor(esoph$high_tob, levels=c("Low Tobacco", "High Tobacco"))
model1 <- glm(cbind(ncases, ncontrols) ~ agegp*high_tob,
              data = esoph, family = binomial())
model_sum <- glm(cbind(ncases, ncontrols) ~ high_tob,
              data = esoph, family = binomial())
em <- emmeans(model1, c("agegp", "high_tob"))
em_sum <- emmeans(model_sum, "high_tob")
fit <- print(em)
#>  agegp high_tob     emmean    SE  df asymp.LCL asymp.UCL
#>  35-44 Low Tobacco  -3.212 0.416 Inf    -4.028   -2.3961
#>  45-54 Low Tobacco  -1.602 0.211 Inf    -2.015   -1.1885
#>  55-64 Low Tobacco  -1.027 0.168 Inf    -1.356   -0.6969
#>  65-74 Low Tobacco  -0.782 0.184 Inf    -1.143   -0.4213
#>  75+   Low Tobacco  -0.860 0.360 Inf    -1.565   -0.1552
#>  35-44 High Tobacco -2.615 0.598 Inf    -3.787   -1.4427
#>  45-54 High Tobacco -0.552 0.288 Inf    -1.117    0.0124
#>  55-64 High Tobacco -0.134 0.259 Inf    -0.641    0.3737
#>  65-74 High Tobacco  0.000 0.408 Inf    -0.800    0.8002
#>  75+   High Tobacco -0.916 0.837 Inf    -2.556    0.7235
#> 
#> Results are given on the logit (not the response) scale. 
#> Confidence level used: 0.95
fit_sum <- print(em_sum)
#>  high_tob     emmean     SE  df asymp.LCL asymp.UCL
#>  Low Tobacco  -1.381 0.0963 Inf    -1.569    -1.192
#>  High Tobacco -0.653 0.1540 Inf    -0.955    -0.351
#> 
#> Results are given on the logit (not the response) scale. 
#> Confidence level used: 0.95
fit_sum$agegp <- "Summary"
fit_sum$lower_ici <- fit_sum$upper_ici <- NA
ag_data <- aggregate(esoph[,c("ncases", "ncontrols")], list(high_tob = esoph$high_tob), \(x)as.integer(sum(x)))
fit_sum$n_cases <- ag_data$ncases
fit_sum$n_controls <- ag_data$ncontrols

vt <- viztest(em, include_zero = FALSE, test_level = .05)
vt
#> 
#> Correspondents of PW Tests with CI Tests
#>   level psame     pdiff        easy  method
#> 1  0.81     1 0.6666667 -0.07238479  Lowest
#> 2  0.84     1 0.6666667 -0.01736363  Middle
#> 3  0.87     1 0.6666667 -0.12092006 Highest
#> 4  0.83     1 0.6666667 -0.01385398 Easiest
#> 
#> All  45  tests properly represented for by CI overlaps.
ic_level <- get_viztest_levels(vt, "easiest")
fit_ci <- confint(em, level = ic_level)
fit$lower_ici <- exp(fit_ci$asymp.LCL)
fit$upper_ici <- exp(fit_ci$asymp.UCL)
ag_data <- aggregate(esoph[,c("ncases", "ncontrols")], list(age = esoph$age, high_tob = esoph$high_tob), \(x)as.integer(sum(x)))
fit$n_cases <- ag_data$ncases
fit$n_controls <- ag_data$ncontrols
fit <- rbind(fit, fit_sum)
fit$or <- exp(fit$emmean)
fit$lower <- exp(fit$asymp.LCL)
fit$upper <- exp(fit$asymp.UCL)
fit$is_summary  <- fit$agegp == "Summary"
fit$point_size <- 1/fit$SE^2
fit <- fit[order(fit$agegp, fit$high_tob), ]
fit$row <- 1:nrow(fit)
fit$row_fac <- as.factor(1:nrow(fit))
fit$row_fac <- reorder_forest(fit$row_fac, fit$row)

g <- ggplot(fit, aes(y=row_fac, x=or)) + 
  geom_foreststripe(n_cols=NULL, start = 3) + 
  geom_linerange(aes(xmin = lower, xmax=upper, color = high_tob, alpha="Standard CI"), linewidth=1) + 
  geom_linerange(aes(xmin=lower_ici, xmax=upper_ici, color=high_tob, alpha= "Inferential CI"), linewidth=5) + 
  geom_forestpoint(aes(is_summary = is_summary, size=point_size, xmin = lower, xmax=upper, color=high_tob, fill=high_tob), 
                   diamond_aspect=10, diamond_row_frac = .9) + 
  theme_classic() + 
  scale_alpha_manual(values=c(.25, .75)) +
  scale_x_log10() + 
  geom_vline(xintercept = 1, linetype="dashed") + 
  labs(x = "Odds Ratio", y="", alpha="") + 
  scale_size_area(max_size=5) + 
  guides(size = "none", color="none", fill="none") + 
  theme(axis.ticks.y = element_blank(), 
        axis.text.y = element_blank(), 
        axis.line.y=element_blank())
g

Forest Plot with Standard (95%) and Inferential (83%) Confidence Intervals. The standard confidence intervals are the thin lines and the inferential confidence intervals are the thick lines. See Armstrong and Poirier (2025) for more details on inferential confidence intervals.

Next, we can make the table. Because these work in th ggplot ecosystem, you can color the text by a variable if you like. Here, we color the text by the tobacco usage group. We can do this simply by adding a color aesthetic to geom_foresttable().

data_cols = c("Age" = "agegp", "Tobacco\nUsage" = "high_tob", "Controls" = "n_controls", "Cases"="n_cases", "OR" = "or")
tab <- ggplot(fit, aes(y=row_fac, x=or)) + 
  geom_foreststripe(n_cols=length(data_cols), start=3) +
  geom_foresttable(aes(color=high_tob), 
    cols = data_cols,
    col_nudge = c(-.15, -.35, 0, 0, 0), 
    fmt = list(
      agegp = as.character, 
      high_tob = as.character,
      n_controls = function(x)sprintf("%d",x), 
      n_cases = function(x)sprintf("%d",x), 
      or = function(x)sprintf("%.2f",x)
    ), 
    col_align = c("left", "left", "center", "center", "center"),
    size = 4
  ) +
  scale_x_foresttable(data_cols, names(data_cols),
                      col_gap = 1) +   # returns list(scale, theme) — ggplot2 handles lists fine
  theme_void() +
  theme(
    axis.text.x.top   = element_text(face = "bold", size = 12, margin = margin(t = 8)),
    axis.line.x.top   = element_blank(),
    axis.ticks.x.top  = element_blank(),
    axis.ticks.length = unit(0, "pt")
  )
tab

Data Table for Forest Plot with Subgroups.

And now putting them together.

library(patchwork)
(tab + g + plot_layout(widths = c(1.25, 1), guides = "collect")) & theme(legend.position="bottom")

Forest Plot with Data Table with Subgroups. The left panel is the data table and the right panel is the forest plot. The striping matches across panels.

Creating Forest Plots

William Poirier

David A. Armstrong

2026-02-21

The Data

Forest Plots using `gg_forest()`.

Using Individual `geom_` Functions.

Subgroup Analysis

Creating Forest Plots

William Poirier

David A. Armstrong

2026-02-21

The Data

Forest Plots using gg_forest().

Using Individual geom_ Functions.

Subgroup Analysis

Forest Plots using `gg_forest()`.

Using Individual `geom_` Functions.