Supplementary Material - pilot data processing and visualization

PFAS exposure of humans, animals and the environment: protocol of an evidence review map and bibliometric analysis

In this Rmarkdown document we provide preliminary workflow and code for:

  • Objective 1. Mapping: What evidence on PFAS has been synthesized? We aim to reveal what areas in the realm of PFAS health-related and environmental research have been synthesized the most and where syntheses of evidence are lacking.

  • Objective 2. Critical appraisal: How robust are syntheses of PFAS evidence? We will examine the included syntheses for their reporting quality and potential biases, in order to assess reliability of reviews’ conclusions, reveal syntheses that should be re-done, and highlight the aspects of review methodology that need to be improved.

  • Objective 3. Bibliometrics: How is synthesized PFAS evidence connected? We will examine which countries and institutions are mostly involved in secondary PFAS research and what do the networks between these institutions look like.

  • Cross-Objective Insights: We will investigate how review type, indicators of review quality or transparency are related other review properties, such as publication date, main review topic and subject, etc.

Note: The code and all created outputs are based on a pilot set of data extracted from only 8 papers that fulfilled our inclusion criteria during the piloting stage. The code will need to be adjusted and expanded for the final data set (e.g., additional data cleaning steps or adjustments to the plot styling and contents). These changes will be documented via tracking on GitHub.

R code processing messages and warnings are suppressed from the output document.

Setup

Set global code chunk parameters and load packages.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.4     ✓ purrr   0.3.4
## ✓ tibble  3.1.2     ✓ dplyr   1.0.5
## ✓ tidyr   1.1.3     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(here)
## here() starts at /Users/itchyshin/Dropbox/Local_git_repos/UmbrellaReview_PFAS_pilot
library(stringr)
library(knitr)
library(formatR)
library(forcats)
library(ggplot2)
library(hrbrthemes) #for ggplot2
library(patchwork) #for ggplot2
library(bibliometrix)
## To cite bibliometrix in publications, please use:
## 
## Aria, M. & Cuccurullo, C. (2017) bibliometrix: An R-tool for comprehensive science mapping analysis, 
##                                  Journal of Informetrics, 11(4), pp 959-975, Elsevier.
##                         
## 
## https://www.bibliometrix.org
## 
##                         
## For information and bug reports:
##                         - Send an email to info@bibliometrix.org   
##                         - Write a post on https://github.com/massimoaria/bibliometrix/issues
##                         
## Help us to keep Bibliometrix free to download and use by contributing with a small donation to support our research team (https://bibliometrix.org/donate.html)
## 
##                         
## To start with the shiny web-interface, please digit:
## biblioshiny()
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:purrr':
## 
##     compose, simplify
## The following object is masked from 'package:tidyr':
## 
##     crossing
## The following object is masked from 'package:tibble':
## 
##     as_data_frame
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)

Load pilot data

Manually extracted pilot data is stored in four separate .csv files representing different aspects of the data (extracted via structured predefined Google Forms - one per table). Additional fixed table stores pre-extracted data on PFAS types included in this map.

Bibliographic data records are exported from Scopus (including cited references field) in .bib format and locally saved as scopus.bib. the pilot version of this file only includes bibliographic information of ‘systeamtic review family’-like reviews included in Objectives 1 and 2. In the actual work bibliographic information will be expanded to all reviews on PFAS exposure in humans, animals and the environment.

The study identity (authors, title) has been obscured in the pilot data extraction file. It will be public in the final data extraction files.

# Data from .csv file - main data sheet
mdata <- read_csv(here("data","ReviewMap_PFAS_main_pilot.csv"), skip = 0)
#dim(mdata) #8 rows 34 columns

# Data from .csv file - Species_info sheet
spdata <- read_csv(here("data","ReviewMap_PFAS_species_pilot.csv"), skip = 0) 
#dim(spdata) #2 rows 4 columns
# change to long format (one species per row - one or multple rows per study):
spdata %>% select(-X4) %>% separate_rows(Species_scientific_name, sep='; ') -> spdata
#dim(spdata) #29 rows 3 columns

# Data from .csv file - PFAS_types sheet:
ptdata <- read_csv(here("data","ReviewMap_PFAS_types_pilot.csv"), skip = 0) 
#dim(ptdata) #8 rows 4 columns
# change to long format (one type per row - one or multple rows per study):
ptdata %>% select(-X4) %>% separate_rows(PFAS_type, sep=', ') -> ptdata
#dim(ptdata) #70 rows 3 columns

# Data from .csv file - PFAS_info sheet:
pidata <- read_csv(here("data","PFAS_info.csv"), skip = 0) 
#dim(pidata) #35 rows 7 columns

# Critical apprisal(AMSTAR2) from .csv file:
qdata <- read_csv(here("data", "ReviewMap_PFAS_quality_pilot.csv"), skip = 0)
#dim(qdata) #11 rows 38 columns
qdata %>% select(-X34) -> qdata #remove empty column

# Bibliometric data of included papers (with references) dwonloaded from Scopus in .bib format:
bib_sco <- convert2df(here("data","scopus.bib"), dbsource = "scopus", format = "bibtex")
## 
## Converting your scopus collection into a bibliographic dataframe
## 
## Done!
## 
## 
## Generating affiliation field tag AU_UN from C1:  Done!
#dim(bib_sco) #8 rows 33 columns

Merge data stored across tables

The data stored across multiple flat spreadsheets needs merged for some of the analyses.

## Prepare - remove columns starting with "Checked" 
# (info on data checking - these fields will be present in the final extraction sheet), 
# also remove any other redundand columns:
# mdata %>% select(-starts_with("Checked")) -> mdata
# spdata %>% select(-starts_with("Checked")) -> spdata
# ptdata %>% select(-starts_with("Checked")) -> ptdata
# pidata %>% select(-starts_with("Checked")) -> pidata

## Merge main data with species info
mspdata <- left_join(mdata, spdata, by = "Study_ID")
#dim(mspdata) #8 rows 37 columns
#names(mspdata)
#str(mspdata)
#View(mspdata)

# Merge PFAS types data with PFAS info and then with main data
ptidata <- left_join(ptdata, pidata, by = "PFAS_type")
#dim(ptidata) #8 rows 10 columns
#names(ptidata)
#str(ptidata)
#View(ptidata)

mpidata <- left_join(mdata, ptidata, by = "Study_ID")
#dim(mpidata) #8 rows 43 columns
#names(mpidata)
#str(mpidata)
#View(mpidata)

Make a simple summary table of included papers

Note: authors names and paper titles are blinded for the protocol.

knitr::kable(select(mdata, Study_ID,    Author_year,    Paper_title), caption = "Table of included studies")
Table of included studies
Study_ID Author_year Paper_title
S_001 Authors1_2017 Title1
S_002 Authors2_2013 Title2
S_003 Authors3_2019 Title3
S_004 Authors4_2021 Title4
S_005 Authors5_2017 Title5
S_006 Authors6_2017 Title6
S_007 Authors7_2016 Title7
S_008 Authors8_2020 Title8

Objective 1. Mapping: What evidence on PFAS has been synthesized?

Here we aim to reveal what areas in the realm of PFAS health-related and environmental research have been synthesized the most and where syntheses of evidence are lacking. Below, we include examples of potential data summaries and visualizations that will form the evidence review map. The pilot dataset presented here consists of 8 review studies.

Focus on PFAS, or POPs, and how many PFAS

#str(mdata)
mdata %>% count(PFAS_focus, PFAS_one_many) %>% arrange(desc(n)) %>% 
  filter(PFAS_focus != "NA") -> t_PFASfocus #filter out NA

ggplot(t_PFASfocus, aes(x = PFAS_focus, y = n)) +
 theme_light() +
 labs(title = expression("PFAS focus")) + #~bold(A.)~' Type and subject'
 coord_flip() +
 scale_fill_brewer() +
 guides(fill=guide_legend(title="PFAS:")) + 
 geom_col(aes(fill = PFAS_one_many), width = 0.7) + 
 scale_y_continuous(name = "Article count") +
 theme_light() +
 theme(legend.position = "bottom", legend.box.background = element_rect(colour = "black"), 
       legend.title = element_text(size=10),  legend.text=element_text(size = 10), 
       axis.title.x = element_text(size = 10), axis.title.y = element_blank())

#ggsave(here("plots","figure_PFAS_focus.pdf"), width = 4, height = 6, units = "cm", scale = 2, device = cairo_pdf)

PFAS types

#str(ptidata)
ptidata %>% count(PFAS_type) %>% arrange(desc(n)) %>% filter(PFAS_type != "NA") -> t_PFAStype #filter out NA
t_PFAStype$PFAS_type <- factor(t_PFAStype$PFAS_type, levels = t_PFAStype$PFAS_type[order(t_PFAStype$n, decreasing = FALSE)])
PFAS_levels_order <- t_PFAStype$PFAS_type[order(t_PFAStype$n, decreasing = FALSE)] #save for another plot

# simple barplot with PFAS types
ggplot(t_PFAStype, aes(x = PFAS_type, y = n)) +
  geom_col(aes(fill = ""), width = 0.7) +
  theme_light() +
  labs(title = expression("PFAS types reviewed")) + #~bold(A.)~' Type and subject'
  coord_flip()+
  scale_y_continuous(name = "Article count") +
  scale_fill_manual(values = c("#919191")) +
  theme(legend.position = "none", axis.title.x = element_text(size = 10), axis.title.y = element_blank())

#ggsave(here("plots","figure_PFAS_types.pdf"), width = 4, height = 6, units = "cm", scale = 2, device = cairo_pdf)

Main review topics (MeSH terms)

Each records usually has multiple associated MeSH terms, which are separated with a semicolon (or a semicolon with a newline symbol).
The main topics of the article are denoted by the MeSH terms with asterisks *, but some records will not have main topics, so its safer to ignore this information.
Some MeSH headings also have qualifiers (sometimes called subheadings) - these are shown after a forward-slash and are used to “describe the specific aspects of the MeSH heading that are pertinent to the article” (https://www.nlm.nih.gov/bsd/indexing/training/SUB_010.html). MeSH terms and qualifiers could be analysed separately.

# change to long format (one species per row - one or multple rows per study):
mdata %>% select(MeSH_terms) %>% separate_rows(MeSH_terms, sep = ";\\\n") %>% separate_rows(MeSH_terms, sep = "; ") -> t_MESH
#bring to lower case and remove ending and trailing whitespace from character strings, single character vector:
tmesh <- str_to_lower(trimws(t_MESH$MeSH_terms)) 
#remove star character:
tmesh <- sub("[\\*]", "", tmesh) 

#separate terms before and after "/" into separate columns (headings and quantifiers, respectively)
tmesh2 <- str_split_fixed(tmesh, " / ", n = 2)

#create a MeSH dataframe with three columns: terms, headings, qualifiers
t_MESH2 <- bind_cols(tmesh, tmesh2[,1], tmesh2[,2])
colnames(t_MESH2)  <- c("MeSH_terms", "MeSH_headings", "MeSH_qualifiers")
#count frequencies of headings
t_MESH2 %>% select(MeSH_headings) %>% count(MeSH_headings) %>% arrange(desc(n)) %>% 
  filter(MeSH_headings != "NA") ->  MeSH_headings_counts 
#str(MeSH_headings_counts)

MeSH_headings_counts$MeSH_headings <- factor(MeSH_headings_counts$MeSH_headings, levels = MeSH_headings_counts$MeSH_headings[order(MeSH_headings_counts$n, decreasing = FALSE)])
MESH_headings_order <- MeSH_headings_counts$MeSH_headings[order(MeSH_headings_counts$n, decreasing = FALSE)] #save for the next plot

# simple barplot with top 10 MESH headings
ggplot(MeSH_headings_counts[1:10, ], aes(x = MeSH_headings, y = n)) +
  geom_col(aes(fill = ""), width = 0.7) +
  theme_light() +
  labs(title = expression("Top 10 MeSH headings reviewed")) + #~bold(A.)~' Type and subject'
  coord_flip() +
  scale_y_continuous(name = "Article count") +
  scale_fill_manual(values = c("#919191")) +
  theme(legend.position = "none", axis.title.x = element_text(size = 10), axis.title.y = element_blank())

#ggsave(here("plots","figure_MeSH_headings.pdf"), width = 4, height = 6, units = "cm", scale = 2, device = cairo_pdf)
#count frequencies of qualifiers
t_MESH2 %>% select(MeSH_qualifiers) %>% count(MeSH_qualifiers) %>% arrange(desc(n)) %>% filter(MeSH_qualifiers != "NA") %>% filter(MeSH_qualifiers != "")->  MeSH_qualifiers_counts 
#str(MeSH_qualifiers_counts)

MeSH_qualifiers_counts$MeSH_qualifiers <- factor(MeSH_qualifiers_counts$MeSH_qualifiers, levels = MeSH_qualifiers_counts$MeSH_qualifiers[order(MeSH_qualifiers_counts$n, decreasing = FALSE)])
MESH_qualifiers_order <- MeSH_qualifiers_counts$MeSH_qualifiers[order(MeSH_qualifiers_counts$n, decreasing = FALSE)] #save for the next plot

# simple barplot with MESH qualifiers
ggplot(MeSH_qualifiers_counts[, ], aes(x = MeSH_qualifiers, y = n)) +
  geom_col(aes(fill = ""), width = 0.7) +
  theme_light() +
  labs(title = expression("MeSH qualifiers reviewed")) + #~bold(A.)~' Type and subject'
  coord_flip() +
  scale_y_continuous(name = "Article count") +
  scale_fill_manual(values = c("#919191")) +
  theme(legend.position = "none", axis.title.x = element_text(size = 10), axis.title.y = element_blank())

#ggsave(here("plots","figure_MeSH_qualifiers.pdf"), width = 4, height = 6, units = "cm", scale = 2, device = cairo_pdf)

Main review subjects

#str(mdata)
mdata %>% count(Human_animal_environment) %>% arrange(desc(n)) %>% 
  filter(Human_animal_environment != "NA") %>% filter(Human_animal_environment != "") -> t_subject #filter out NA
t_subject$Human_animal_environment <- factor(t_subject$Human_animal_environment, levels = unique(t_subject$Human_animal_environment[order(t_subject$n, decreasing = FALSE)]))
subjects_levels_order <- t_subject$Human_animal_environment[order(t_subject$n, decreasing = FALSE)] #save for another plot

# simple barplot with MESH_heading
ggplot(t_subject, aes(x = Human_animal_environment, y = n)) +
  geom_col(aes(fill = ""), width = 0.7) +
  theme_light() +
  labs(title = expression("Main subjects reviewed")) + #~bold(A.)~' Type and subject'
  coord_flip()+
  scale_y_continuous(name = "Article count") +
  scale_fill_manual(values = c("#919191")) +
  theme(legend.position = "none", axis.title.x = element_text(size = 10), axis.title.y = element_blank())