Introduction

Visualisations are an important tool in systematic reviews for identifying, demonstrating, and investigating patterns in the reviewed literature. These are often more efficient and effective than using text, especially where meta-analysis is not possible (Cochrane Handbook). Harvest plots are one way to do this and have been defined by Ogilvie et al., 2008 as:

A novel and useful method for synthesising evidence about the differential effects of population-level interventions. It contributes to the challenge of making best use of all available evidence by incorporating all relevant data. The visual display assists both the process of synthesis and the assimilation of the findings. The method is suitable for adaptation to a variety of questions in evidence synthesis and may be particularly useful for systematic reviews addressing the broader type of research question which may be most relevant to policymakers.

We aim to demonstrate how to create a simple harvest plot using R and how this can be extended where there are multiple outcomes and exposures of interest. Our examples come from “Shifting towards healthier transport: carrots or sticks? Systematic review and meta-analysis of population-level interventions” (Xiao et al., 2022). This study aimed to compare the effectiveness of carrot, stick, and combined carrot-and-stick interventions in changing different transport behaviors.

With this in mind, let’s get plotting!




Setup

Load R Packages

Load the relevant packages:

library(readxl)
library(tidyverse)
library(knitr)
library(ggpubr)




Load data

We first start using the following dataset to create our simple version of the harvest plot.

df = tibble(
  direction_effect = factor(c("Decrease", "Decrease", "Decrease", "Decrease", "Decrease",
                              "Decrease", "Decrease", "Decrease", "Decrease", "Decrease",
                              "Decrease", "Decrease", "Decrease", "Decrease", "Decrease",
                              "Decrease", "Decrease", "Decrease", "Decrease", "Decrease",
                              "No change","No change","No change","No change","No change",
                              "No change", "Increase", "Increase", "Increase", "Increase",
                              "Increase", "Increase", "Increase", "Increase"),
                            levels = c("Decrease", "No change", "Increase")) ,
  stat_sig = factor(c("Yes","No","No","No","Yes","Yes","No","Yes","N/A","N/A","N/A",
                      "N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","N/A","No","N/A",
                      "No","No","N/A","N/A","N/A","No","Yes","Yes","No","N/A","N/A","No"), 
                    levels = c("N/A", "No", "Yes")),
  quality = c(3,3,3,3,2,2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,2,1,1,1,1,1,1,3,2,2,1,2,1,1),
  n_interventions = c(1,1,1,2,3,3,4,1,1,1,1,1,1,1,1,1,1,2,2,1,4,1,1,1,1,1,1,4,2,2,1,1,2,1),
  position = c(1,3,4,2,5,6,8,7,11,12,13,14,15,16,17,18,19,9,10,20,10,12,11,9,8,13,13,20,19,
               18,16,17,14,15))

And this is what the data frame looks like:

knitr::kable(head(df), format = "pipe")
direction_effect stat_sig quality n_interventions position
Decrease Yes 3 1 1
Decrease No 3 1 3
Decrease No 3 1 4
Decrease No 3 2 2
Decrease Yes 2 3 5
Decrease Yes 2 3 6




Defining variables

The following defines each variable in the simplified harvest plot dataset:

  • position: (Numeric) Placement of the individual study outcome in the harvest plot
  • direction_effect: (Factor) Whether there was an increase, decrease, or no change in the outcome
  • stat_sig: (Factor) Whether the difference was found to be statistically significant. This will be represented by the shading of the bars
  • quality: (Numeric) Indicates the study quality. We have three categories here: low, moderate, and high, converted into a numeric variable (1 = low, 2 = moderate, 3 = high). This will be represented by the height of the bars
  • n_interventions: (Numeric) The number of intervention components, as represented by a number above each bar



Setting the position of bars representing each study

For this basic example, we manually set the position of the bars using the variable position. Bars are positioned at the extremities of the facets (for decrease and increase) or in the middle (for no change). Within each facet, the bars are sorted by the characteristics plotted, in order of precedence. In this case bars were sorted by study quality, significant and number of intervention components. It is usual to sort initially by the same characteristic used to determine bar height. If this doesn’t immediately make sense, hopefully the examples below will illustrate how the position of the bars in the three columns.


Basic harvest plot

The data allow us to visualise the evidence to support each of three competing hypotheses about a change in travel mode as a result of an intervention, in addition to several additional characteristics of the studies and their outcomes. Initially we create a plot that conveys the evidence consistent with a decrease, increase or no change in travel, along with the study quality.

ggplot(df, aes(y=quality, x=position)) + geom_bar(stat="identity") +
  facet_grid(cols = vars(direction_effect)) 

Using the facet_grid option we disaggregate the studies based on their outcomes, allowing us to see the number of studies supporting each finding. The height of the bars indicating study quality facilitates a comparison of the strength of evidence for each finding.

We can add further data about each study or its findings to better visualise the nature of the evidence base.

To set the colour scheme:

cbPalette <- c("#A9C5D0","#FFB7B0","#ADA7A7")

Plotting the figure with colours and labels to further differentiate between studies:

ggplot(df, aes(y=quality, x=position, fill = stat_sig)) + geom_bar(stat="identity") +
  facet_grid(cols = vars(direction_effect)) + scale_fill_manual(values=cbPalette, breaks=c("Yes","No","N/A")) +
  guides(fill = guide_legend("Significant")) +
  geom_text(aes(label=n_interventions), position=position_dodge(width=0.9), vjust=-0.4)

We can now see the statistical significance of the findings for each study (colour) and the number of intervention components (number above each bar). The most important characteristics will likely differ depending on the specific research question being addressed, and reviewers can tailor the harvest plots to their needs.

The following code makes it easier to distinguish between the bars and the background.

ggplot(df, aes(y=quality, x=position, fill = stat_sig)) + geom_bar(stat="identity") +
  facet_grid(cols = vars(direction_effect)) + scale_fill_manual(values=cbPalette, breaks=c("Yes","No","N/A")) +
  guides(fill = guide_legend("Significant")) +
  geom_text(aes(label=n_interventions), position=position_dodge(width=0.9), vjust=-0.4) +  
  theme(strip.placement = "outside")+ scale_y_continuous("", limits=c(0,3.5)) +
  theme_minimal(base_size = 12) +
  annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf) +
  annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_blank())


Including more exposures

If you have a dataset with more than one outcome and exposure, the following section provides guidance on how to create a harvest plot that would allow you to compare multiple outcomes and exposures.

Loading data

First, we load the data:

#data_harvest <- readRDS(url("https://github.com/christina-s-xiao/harvest_plot/raw/revise/harvest_plot_data.rds"))

data_harvest <- readRDS("harvest_plot_data.rds")

And this is what the new loaded dataset looks like:

knitr::kable(head(data_harvest))
study_id outcome_mode stat_sig quality n_interventions cs_category direction_effect position
52 Driving Yes 3 1 Carrot Decrease 1
61 Driving No 3 2 Carrot Decrease 2
36 Driving No 3 1 Carrot Decrease 3
91 Driving No 3 1 Carrot Decrease 4
30 Driving Yes 2 3 Carrot Decrease 5
41 Driving Yes 2 3 Carrot Decrease 6



Here, we have a few more variables, as defined below:

  • study_id: Unique study identifier
  • outcome_mode: Outcome of interest in your review. This can be a single outcome or many. Here, we are interested in looking at driving and public transport
  • cs_category: Exposure or category of studies to be compared, which will be placed in separate rows


Postion

An alternative to manually specifying the position of observations in the harvest plot is to use some code. The example below works with this data and can be adapted as necessary.

data_harvest = data_harvest %>%
  group_by(cs_category, direction_effect, outcome_mode) %>% # group by column variable (and row variables if you're using them)
  arrange(desc(quality), desc(stat_sig), desc(n_interventions), .by_group = TRUE) # sort by quality measures in order of precedence

c = count(data_harvest, direction_effect, outcome_mode,cs_category, .drop = TRUE) # count the number of studies in each group

n = max(c$n) # number of studies in the group with the most studies

v = c(0, rep(1:n, each = 2, length.out = n-1)) * c(-1, 1) + floor(n/2) # creates a position vector starting in the middle and working out for "No change" column

c = left_join(c,
              tibble(
                direction_effect = c("Decrease", "No change" ,"Increase"),
                order = list(seq_len(n),
                             v[seq_len(n)],
                             seq(n,1,-1)))
) # joins the correct position vector to each group

## Joining, by = "direction_effect"

l = lapply(seq_len(nrow(c)), function(x) c$order[[x]][seq_len(c$n[x])]) # collect the position vector for each group

data_harvest$position = unlist(l) # add the new position variable to the dataset

Creating separate plots for each outcome

First, we will plot the harvest plot for driving outcomes, which we will later combine with the public transport outcomes in a separate figure. To do so, we create two new datasets called ‘drive’ and ‘public_transport’.

drive <- data_harvest[data_harvest$outcome_mode == 'Driving', ]
public_transport <- data_harvest[data_harvest$outcome_mode == 'Public transport', ]

Driving harvest plot

The following code plots the harvest plot for the driving outcome.

We start building the harvest plot, specifying that the colour of each bar is represented by the variable stat_significant, the height of each bar by quality, the number of interventions is labelled above each bar by n_interventions, and the position of each bar by position. We then add bars to represent each study.

drive_a<- ggplot(drive, aes(fill=stat_sig, y=quality, x=position, label = study_id)) + 
  geom_bar(position="dodge", stat="identity") + # Adding bars to the plot
  scale_fill_manual(values=cbPalette, na.translate=FALSE, breaks=c("Yes","No","N/A")) + # Adding colours to the bars
  geom_text(aes(label=n_interventions), position=position_dodge(width=0.9), vjust=-0.4) # Adding annotated number of interventions above each bar

Next, we use the facet_grid option to specify that we want cs_category as our rows and direction_effect as our columns.

drive_b<- drive_a + facet_grid(cs_category~direction_effect,  
   switch = "y", space='free_x') + theme_minimal(base_size = 14) 

The following code formats the figure so it’s easier to distinguish between the different rows and columns:

drive_c <- drive_b +  theme(strip.placement = "outside") + 
  scale_y_continuous("", limits=c(0,3.5)) +
  theme(plot.title = element_text(size = 14, face = "bold"),
        legend.title=element_text(size=12,  face = "bold"), 
        legend.text=element_text(size=10)) +
  annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf) +
  annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_blank())

Specifying the title and legends:

drive_final <- drive_c + guides(fill = guide_legend("Significant", nrow = 3)) +
  theme(legend.position="right") + labs(title="Driving") 

The final plot should look like this:

drive_final




Public transport harvest plot

Repeat the steps above for the public transport outcomes, as below:

pt_final<- ggplot(public_transport, aes(fill=stat_sig, y=quality, x=position, label = study_id)) + 
  geom_bar(position="dodge", stat="identity") + scale_fill_manual(values=cbPalette, na.translate=FALSE, breaks=c("Yes","No","N/A")) +
  geom_text(aes(label=n_interventions), position=position_dodge(width=0.9), vjust=-0.4) +
  facet_grid(cs_category~direction_effect,  
   switch = "y", space='free_x') + theme_minimal(base_size = 12) +
  theme(strip.placement = "outside")+ scale_y_continuous("", limits=c(0,3.5)) +
  theme(plot.title = element_text(size = 14, face = "bold"),
        legend.title=element_text(size=12,  face = "bold"), 
        legend.text=element_text(size=10)) +
  annotate("segment", x=-Inf, xend=Inf, y=-Inf, yend=-Inf)+
  annotate("segment", x=-Inf, xend=-Inf, y=-Inf, yend=Inf) +
  theme(axis.title.x=element_blank(),
        axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank(), 
        panel.background = element_blank()) +
  guides(fill = guide_legend("Significant", nrow = 3)) +
  theme(legend.position="right") + labs(title="Public transport")

The final plot should look like this:

pt_final




Creating the final combined harvest plot

We then use the ggarrange function to plot both the driving and public transport harvest plot figures together:

ggarrange(drive_final, pt_final, hjust = -0.5,
                    ncol = 1, nrow = 2, common.legend = TRUE, legend='right')