Chapter 3 Conclusions

This review surveyed both the scientific literature and the R landscape for packages relevant to metabolomics research. Although relevant packages were easy to find on CRAN, and even more so on Bioconductor (BioC), many others are scattered across other source code hosting platforms. GitHub supports topics (see github.com/search?q=topic:metabolomics+topic:r), and crawlers such as rdrr.io can find R packages across several platforms, but the best findability can be achieved through well-integrated umbrella projects like Bioconductor, which provide additional infrastructure and also strengthen community interaction through conferences and workshops.

This also highlights the need for more detailed package metadata that makes it easier to mix and match R packages, an area in which Bioconductor already does a very good job. R packages have a long-standing history of metadata annotation via their DESCRIPTION and CITATION files, which link to other packages (e.g. dependencies and suggestions) and to the literature describing the package. Exposing package and vignette metadata with semantic approaches will support the community in developing further, more advanced multi-functional workflows for metabolomics. Authors have recently adopted Bioschemas [145] to make metadata easier to find. For example, first efforts to add annotation to vignettes allow the ELIXIR Training eSupport System TeSS (tess.oerc.ox.ac.uk) to pick up newer versions (see this git commit [@“attempt to add bioschemas.org json-ld to the vignette html · bridgedb/bridgedbr@40e741a · github”_n.d.]), and efforts are underway to expose content from the DESCRIPTION file as Bioschemas annotation on Bioconductor (see this pull request [@“added template for bioschemas tool annotation by egonw · pull request #25 · bioconductor/bioconductor.org · github”_n.d.]). Such efforts contribute to community adoption and encourage the reuse of R-based computational workflows in different use cases [141].
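To make the idea of exposing DESCRIPTION content as Bioschemas annotation more concrete, the following Python sketch parses a DESCRIPTION file (which uses the Debian Control File, DCF, format of `Field: value` lines with indented continuation lines) and maps a handful of its fields onto schema.org `SoftwareApplication` JSON-LD, the schema.org type that the Bioschemas tool profile is based on. The field mapping, the function names `parse_dcf`, `split_package_list` and `description_to_jsonld`, and the example package `exampleMetabo` are hypothetical illustrations; they do not reproduce the template in the Bioconductor pull request mentioned above, and a real implementation would need to follow the current Bioschemas profile.

```python
import json


def parse_dcf(text):
    """Parse one DCF record (the format of R DESCRIPTION files) into a dict.

    Lines starting with whitespace continue the previous field and are
    joined to it with a single space.
    """
    fields = {}
    key = None
    for line in text.splitlines():
        if not line.strip():
            continue  # a full DCF parser would treat blank lines as record separators
        if line[0] in " \t" and key is not None:
            fields[key] += " " + line.strip()  # continuation line
        else:
            key, _, value = line.partition(":")
            key = key.strip()
            fields[key] = value.strip()
    return fields


def split_package_list(value):
    """Turn 'methods, stats (>= 3.5.0)' into ['methods', 'stats'], dropping version constraints."""
    names = []
    for entry in value.split(","):
        name = entry.split("(")[0].strip()
        if name:
            names.append(name)
    return names


def description_to_jsonld(fields):
    """Map selected DESCRIPTION fields to schema.org SoftwareApplication JSON-LD (hypothetical mapping)."""
    doc = {
        "@context": "https://schema.org",
        "@type": "SoftwareApplication",
        "name": fields.get("Package"),
        "softwareVersion": fields.get("Version"),
        "description": fields.get("Description"),
        "license": fields.get("License"),
        "programmingLanguage": "R",
    }
    if "URL" in fields:
        # DESCRIPTION may list several comma-separated URLs; keep only the first.
        doc["url"] = fields["URL"].split(",")[0].strip()
    if "Imports" in fields:
        doc["softwareRequirements"] = split_package_list(fields["Imports"])
    # Drop properties whose source field was absent from the DESCRIPTION file.
    return {k: v for k, v in doc.items() if v is not None}


# A made-up DESCRIPTION file, used only for illustration.
EXAMPLE_DESCRIPTION = """Package: exampleMetabo
Type: Package
Title: Hypothetical Metabolomics Helpers
Version: 0.1.0
Description: A made-up package used only to illustrate how
    DESCRIPTION fields could be mapped to JSON-LD.
License: GPL-3
Imports: methods, stats (>= 3.5.0),
    jsonlite
URL: https://example.org/exampleMetabo
"""

if __name__ == "__main__":
    jsonld = description_to_jsonld(parse_dcf(EXAMPLE_DESCRIPTION))
    # This JSON could be embedded in a <script type="application/ld+json"> element.
    print(json.dumps(jsonld, indent=2))
```

Within R itself, the same DCF format can be read with `read.dcf()`; Python is used here only to keep the sketch self-contained outside an R installation.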

In some cases, software described in the literature was only available “on request”, which in practice often means that it is no longer available at all. This review also did not assess whether the R packages (and their dependencies) can be installed on a current R installation. A recent survey [146] showed that the repeatability of papers using scientific software drops when the software is not available or does not install. Issues or bug reports were filed for those packages found that could not be tested on contemporary operating systems. The way out of the (un-)repeatability trap can be expressed in a few, seemingly trivial, rules [147], together with hosting the code in open repositories, ideally with regular builds or even continuous integration. As discussed earlier, metabolomics packages are more tightly connected within an established community such as Bioconductor than within other package repositories.

In the last few years, Bioconductor packages for metabolomics and proteomics data analysis have started converging towards a common mass spectrometry infrastructure, which simplifies interoperability between these packages. Building on experience from these efforts, the RforMassSpectrometry (RforMassSpectrometry.org) initiative was recently started, aiming to provide efficient, thoroughly documented, tested and flexible R software for MS data import, handling and analysis. Significant improvements can thus be expected in the future, simplifying and unifying MS data handling for the benefit of end users. RforMassSpectrometry also hosts the metaRbolomics-book [148], which is intended to be a continuously developed resource with additional examples beyond this review.

The authors expect the metaRbolomics landscape to continue its steady growth and to keep pace with the evolving metabolomics experiments to come.